1
0
mirror of git://sourceware.org/git/lvm2.git synced 2025-01-07 21:18:59 +03:00
Commit Graph

6013 Commits

Author SHA1 Message Date
Zdenek Kabelac
7679ee9535 Use generic name for message sending function
Drop _thin_pool prefix for _node_send_message so it could be extended later.
Replace current_id with trans_id name.
2011-10-19 16:40:59 +00:00
Zdenek Kabelac
a374aed61c Simple validation of messages in mda
Check we do not combine multiple messages for same LV target
and switch to use  'delete_id' to make it clear for what this device_id
is being used.
2011-10-19 16:39:09 +00:00
Zdenek Kabelac
866b21532a Drop messages referencing deleted LV
lvremove may remove problematic LV for thin target.
2011-10-19 16:37:30 +00:00
Zdenek Kabelac
fc260f5bcb Just indent changes
Some tabs & spaces.
2011-10-19 16:36:39 +00:00
Zdenek Kabelac
c239c824f5 Add internal expected_errno dm_tast var
Certain errno codes could be expected in some situations thus
add experimental support for them.

When expected errno is set after ioctl error - function skips error
printing and exits succefully.

Currently only useful for thin pool messages.
2011-10-19 16:36:01 +00:00
Zdenek Kabelac
d1be6a0c37 Remove test for thin_pool
Since both functions are called during mda read - we don't have full LV info
at this moment.
2011-10-19 16:32:34 +00:00
Petr Rockai
9f4ef9fda6 Remove a redundant (and in some cases, misleading) message about snapshot
extension, in the snapshot dmeventd plugin. The reporting is done as needed by
the LVM command nowadays.
2011-10-19 14:31:49 +00:00
Petr Rockai
2a8662ce2e New. 2011-10-19 09:01:03 +00:00
Petr Rockai
6de64854a8 Keep the LVM-based dmeventd plugins from trying to manipulate the dmeventd
monitoring state of the logical volumes they are currently acting on.

Until now, every time a logical volume has been changed by a dmeventd plugin,
this plugin would have called back to dmeventd through the external FIFO
mechanism. I am fairly sure this was superfluous, inefficient and possibly even
dangerous.
2011-10-19 08:46:26 +00:00
Jonathan Earl Brassow
a4ddd21f8e Fix bad lvconvert help output.
The '--merge' option to lvconvert works on snapshots and RAID1.  The man
pages correctly reflect this, but the CLI help output still used the term,
'SnapshotLogicalVolume'.
2011-10-18 16:27:45 +00:00
Zdenek Kabelac
cd2eab0d10 Use zalloc for malloc,memset 2011-10-17 14:36:06 +00:00
Zdenek Kabelac
edb7aaf046 Drop messages from lvm app context
(revert)
Thinp target uses activation context.
2011-10-17 14:18:07 +00:00
Zdenek Kabelac
10c2510aaf Indent debug message 2011-10-17 14:17:30 +00:00
Zdenek Kabelac
f8690cf8d5 Message support for thin provisiong
lvm part of messaging.

Each message is now stored it's own thin pool section:

message1 {
	create = lv
}

Messages are queued to thin pool dm target when this target
is going to be resumed or used through some dependency.

Currently  'delete' message are purely queued and processed
with next thin pool resume operation (i.e. create_thin).

WARNING - thin provisioning support is developmental code.
2011-10-17 14:17:09 +00:00
Zdenek Kabelac
49e3017e4a Add thin_pool dm message support
Experimental support for kernel message via resume sequence.
2011-10-17 14:16:25 +00:00
Zdenek Kabelac
5c5acddf01 Add _thin_validate_device_id 2011-10-17 14:15:26 +00:00
Zdenek Kabelac
f5dace2cb6 Swap parameters
Use metadata uuid first (match kernel target).
2011-10-17 14:15:01 +00:00
Zdenek Kabelac
783b4e1068 Drop old check for transaction_id
(revert)
2011-10-17 14:14:33 +00:00
Milan Broz
c752c23d33 Fix alignment warning in bitcount calculation for raid segment. 2011-10-17 13:15:35 +00:00
Jonathan Earl Brassow
2fd1acc4dd Use a more correct macro for 'seg_is_linear'
It is better to check 'seg->area_count == 1' than '!seg->stripe_size'.
2011-10-14 14:21:32 +00:00
Jonathan Earl Brassow
c954b73149 cmirrord now returns log name to kernel in CTR so it can be registered
Version 2 of the userspace log protocol accepts return information during the
DM_ULOG_CTR exchange.  The return information contains the name of the log
device that is being used (if there is one).  The kernel can then register the
device via 'dm_get_device'.  Amoung other things, this allows for userspace to
assemble a correct dependency tree of devices - critical for LVM handling of
suspend/resume calls.

Also, update dm-log-userspace.h to match the kernel header associated with
this protocol change.  (Includes a version inc.)
2011-10-14 14:18:49 +00:00
Jonathan Earl Brassow
681ceb16d8 Update stale libdm/misc/dm-log-userspace.h
The upstream kernel version that this file mirrors has changed, here is the
commit message:

commit 86a54a4802df10d23ccd655e2083e812fe990243
Author: Jonathan Brassow <jbrassow@redhat.com>
Date:   Thu Jan 13 19:59:52 2011 +0000

    dm log userspace: add version number to comms

    This patch adds a 'version' field to the 'dm_ulog_request'
    structure.

    The 'version' field is taken from a portion of the unused
    'padding' field in the 'dm_ulog_request' structure.  This
    was done to avoid changing the size of the structure and
    possibly disrupting backwards compatibility.

    The version number will help notify user-space daemons
    when a change has been made to the kernel/userspace
    log API.

    Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
    Signed-off-by: Mike Snitzer <snitzer@redhat.com>
    Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2011-10-14 14:04:05 +00:00
Zdenek Kabelac
54e43fd3de Use pool for dm_tree allocation
Using the same pool allocation strategy as we use for vg,
so dm_tree structure is part of the pool itself.
2011-10-14 13:34:19 +00:00
Zdenek Kabelac
878960cdba Fix lv_info open_count test
When verify_udev_operations was disable, code for stacking fs operation for
lvm links was completely disable - but this code was also used for collecting
information, that a new node is being created.

Add a new flag which is set when a creation of lv symlinks is requested which
should restore old behaviour of lv_info function, that has called fs_sync()
before quere for open count on device.
2011-10-14 13:23:47 +00:00
Zdenek Kabelac
1ec22a4f73 Remove unused variables 2011-10-11 10:06:57 +00:00
Zdenek Kabelac
f4668058b4 Use constant for the repeated dlid size specification 2011-10-11 10:02:28 +00:00
Zdenek Kabelac
2e36a19e52 Add some fixme locking
Code here is using thread write protected variable without locking.
So add locking, for proper synchronization and a FIXME, since the
code needs closer look.
2011-10-11 09:56:44 +00:00
Zdenek Kabelac
e65626090b Simplify worker loop
Do not reacquire mutex several times without a real reason.
Code readability is also better.
2011-10-11 09:54:39 +00:00
Zdenek Kabelac
b39f294d02 Use barrier instead of mutex
Barrier is supposed to be used in situation like this
and replace tricky mutex usage, where mutex has been unlocked
by a different thread than the locking thread.
2011-10-11 09:26:04 +00:00
Zdenek Kabelac
0372454d07 Add FIXMEs for init_test
Usage of thread unprotected init_test is not correct and needs probably lvm lock
since it part of lvm library. Current implementation may probably fail with
test mode and actually create something unexpectedly (and vice versa).
2011-10-11 09:23:48 +00:00
Zdenek Kabelac
584a487461 Update 2011-10-11 09:20:17 +00:00
Zdenek Kabelac
e9bdc318fd Limit thread stack
Since default thread stack size is around 8MB and clvmd creates for now thread
for message, clvmd may easily reach multi GB size of in-memory locked pages
(runs with mlockall()).

This patch significantly reduces memory usage to just tens of MB,
and now different reasons are the cause of server overloading.
Now we are running out of free file descriptors mostly.
2011-10-11 09:18:49 +00:00
Zdenek Kabelac
d2cd3f4b76 Update whats new 2011-10-11 09:14:51 +00:00
Zdenek Kabelac
83b15720c0 Reduce preallocated stack size
Go with just 64KiB for stack.

Closer inspection should be made, whether we actually need to play with
settings at all.

Since default stack size is 8MB and gets mapped via page locking thus,
it seems there is no big help with preallocation of stack to some value.
2011-10-11 09:13:39 +00:00
Zdenek Kabelac
3701873dc9 Check for refresh_filter failure
Properly detect if the filters were refreshed properly.

(May needs few more fixes ??)

Filter refresh may fail because it may be out of free file descriptors
when clvmd gets overloaded.
2011-10-11 09:09:00 +00:00
Zdenek Kabelac
b105d7e207 Add missing log_error for alloc failure 2011-10-11 09:06:09 +00:00
Zdenek Kabelac
fea9b4eaa3 Use condition instead of sleep
Replace usleep with pthread condition to increase speed testing
(for simplicity just 1 condition for all locks).

Use thread mutex also for unlock resource (so it wakes up awaiting
threads)

Better check some error states and return error in fail case with
unlocked mutex.
2011-10-11 09:05:20 +00:00
Zdenek Kabelac
eb050343b9 Use shorter way for if() 2011-10-11 09:03:33 +00:00
Zdenek Kabelac
4802bbc548 Skip backtrace after log_error 2011-10-11 09:02:20 +00:00
Zdenek Kabelac
3822c98285 Replace with debug
Since the dm_tree_create already reports reason of error,
use log_debug for this message.
2011-10-11 09:01:38 +00:00
Zdenek Kabelac
d70b1eea5d Improve backtrace reporting
Add <backtrace> so the function appears logged for the fail path.
2011-10-11 08:59:42 +00:00
Zdenek Kabelac
0c92ec4d21 Change message severity
Using log_warn to report missing symlinks as warning, since the command
itself returns as successful, we should not produce log_error().
log_warn is better fit here.
2011-10-11 08:57:13 +00:00
Zdenek Kabelac
fdeda0b438 Skip r assignment
Cosmetic, since r is already 0 for the error path, no need to assign it there,
and r is assigned to 1 after switch command.
Also makes the code more readable.
2011-10-11 08:54:01 +00:00
Zdenek Kabelac
e7eebbc90f Reindent some thin functions 2011-10-11 08:51:56 +00:00
Zdenek Kabelac
fcbb8e5c5d Remove test for first_time with FIXME
Workaround for the current code with big FIXME,
since proper solution for pvmove needs to be developed.

Commiting this only for the purpose to get cluster testing covered.
2011-10-11 08:51:02 +00:00
Jonathan Earl Brassow
2c80ace622 Add the ability to convert LVs of "mirror" segtype to "raid1" segtype.
Example:
~> lvconvert --type raid1 vg/mirror_lv

Steps to convert "mirror" to "raid1"
1) Allocate a RAID metadata LV for each mirror image from the same PVs
   on which they are located.
2) Clear the metadata LVs.  This involves writing LVM metadata, so we don't
   change any aspects of the mirror LV before this so that the user can easily
   remove LVs from the failed convert attempt while retaining the original
   mirror.
3) Remove the mirror log, if it exists.
4) Add metadata LVs to mirror LV
5) Rename mirror sub-lvs (s/mimage/rimage/)
6) Change flags and segtype from mirror to raid1
2011-10-07 14:56:01 +00:00
Jonathan Earl Brassow
50a48b38f5 Add the ability to convert linear LVs to RAID1
Example:
~> lvconvert --type raid1 -m 1 vg/lv

The following steps are performed to convert linear to RAID1:
1) Allocate a metadata device from the same PV as the linear device
   to provide the metadata/data LV pair required for all RAID components.
2) Allocate the required number of metadata/data LV pairs for the
   remaining additional images.
3) Clear the metadata LVs.  This performs a LVM metadata update.
4) Create the top-level RAID LV and add the component devices.

We want to make any failure easy to unwind.  This is why we don't create the
top-level LV and add the components until the last step.  Should anything
happen before that, the user could simply remove the unnecessary images.  Also,
we want to ensure that the metadata LVs are cleared before forming the array to
prevent stale information from polluting the new array.

A new macro 'seg_is_linear' was added to allow us to distinguish linear LVs
from striped LVs.
2011-10-07 14:52:26 +00:00
Jonathan Earl Brassow
76ab264200 Allow 'nosync' extension of mirrors.
This patch allows a mirror to be extended without an initial resync of the
extended portion.  It compliments the existing '--nosync' option to lvcreate.
This action can be done implicitly if the mirror was created with the '--nosync'
option, or explicitly if the '--nosync' option is used when extending the device.

Here are the operational criteria:
1) A mirror created with '--nosync' should extend with 'nosync' implicitly
[EXAMPLE]# lvs vg; lvextend -L +5G vg/lv ; lvs vg
  LV   VG   Attr     LSize Pool Origin Snap%  Move Log     Copy%  Convert
  lv   vg   Mwi-a-m- 5.00g                         lv_mlog 100.00
  Extending 2 mirror images.
  Extending logical volume lv to 10.00 GiB
  Logical volume lv successfully resized
  LV   VG   Attr     LSize  Pool Origin Snap%  Move Log     Copy%  Convert
  lv   vg   Mwi-a-m- 10.00g                         lv_mlog 100.00

2) The 'M' attribute ('M' signifies a mirror created with '--nosync', while 'm'
signifies a mirror created w/o '--nosync') must be preserved when extending a
mirror created with '--nosync'.  See #1 for example of 'M' attribute.

3) A mirror created without '--nosync' should extend with 'nosync' only when
'--nosync' is explicitly used when extending.
[EXAMPLE]# lvs vg; lvextend -L +5G vg/lv; lvs vg
  LV   VG   Attr     LSize  Pool Origin Snap%  Move Log     Copy%  Convert
  lv   vg   mwi-a-m- 20.00m                         lv_mlog 100.00
  Extending 2 mirror images.
  Extending logical volume lv to 5.02 GiB
  Logical volume lv successfully resized
  LV   VG   Attr     LSize Pool Origin Snap%  Move Log     Copy%  Convert
  lv   vg   mwi-a-m- 5.02g                         lv_mlog   0.39
vs.
[EXAMPLE]# lvs vg; lvextend -L +5G vg/lv --nosync; lvs vg
  LV   VG   Attr     LSize  Pool Origin Snap%  Move Log     Copy%  Convert
  lv   vg   mwi-a-m- 20.00m                         lv_mlog 100.00
  Extending 2 mirror images.
  Extending logical volume lv to 5.02 GiB
  Logical volume lv successfully resized
  LV   VG   Attr     LSize Pool Origin Snap%  Move Log     Copy%  Convert
  lv   vg   Mwi-a-m- 5.02g                         lv_mlog 100.00

4) The 'm' attribute must change to 'M' when extending a mirror created without
'--nosync' is extended with the '--nosync' option.  (See #3 examples above.)

5) An inactive mirror's sync percent cannot be determined definitively, so it
must not be allowed to skip resync.  Instead, the extend should ask the user if
they want to extend while performing a resync.
[EXAMPLE]# lvchange -an vg/lv
[EXAMPLE]# lvextend -L +5G vg/lv
  Extending 2 mirror images.
  Extending logical volume lv to 10.00 GiB
  vg/lv is not active.  Unable to get sync percent.
Do full resync of extended portion of vg/lv?  [y/n]: y
  Logical volume lv successfully resized

6) A mirror that is performing recovery (as opposed to an initial sync) - like
after a failure - is not allowed to extend with either an implicit or
explicit nosync option.  [You can simulate this with a 'corelog' mirror because
when it is reactivated, it must be recovered every time.]
[EXAMPLE]# lvcreate -m1 -L 5G -n lv vg --nosync --corelog
  WARNING: New mirror won't be synchronised. Don't read what you didn't write!
  Logical volume "lv" created
[EXAMPLE]# lvs vg
  LV   VG   Attr     LSize Pool Origin Snap%  Move Log Copy%  Convert
  lv   vg   Mwi-a-m- 5.00g                             100.00
[EXAMPLE]# lvchange -an vg/lv; lvchange -ay vg/lv; lvs vg
  LV   VG   Attr     LSize Pool Origin Snap%  Move Log Copy%  Convert
  lv   vg   Mwi-a-m- 5.00g                               0.08
[EXAMPLE]# lvextend -L +5G vg/lv
  Extending 2 mirror images.
  Extending logical volume lv to 10.00 GiB
  vg/lv cannot be extended while it is recovering.

7) If 'no' is selected in #5 or if the condition in #6 is hit, it should not
result in the mirror being resized or the 'm/M' attribute being changed.


NOTE:  A mirror created with '--nosync' behaves differently than one created
without it when performing an extension.  The former cannot be extended when
the mirror is recovering (unless in-active), while the latter can.  This is
a reasonable thing to do since recovery of a mirror doesn't take long (at
least in the case of an on-disk log) and it would cause far more time in
degraded mode if the extension w/o '--nosync' was allowed.  It might be
reasonable to add the ability to force the operation in the future.  This
should /not/ force a nosync extension, but rather force a sync'ed extension.
IOW, the user would be saying, "Yes, yes... I know recovery won't take long
and that I'll be adding significantly to the time spent in degraded mode, but
I need the extra space right now!".
2011-10-06 15:32:26 +00:00
Jonathan Earl Brassow
1986e51928 Fix splitmirror in cluster having different DM/LVM views of storage.
This patch also does some clean-up of the splitmirrors code.

I've attempted to clean-up the splitmirrors code to make it easier to
understand with fewer operations.  I've tried to reduce the number of
metadata operations without compromising the intermediate stages which
are necessary for easy clean-up in the even of failure.

These changes now correctly handle cluster situations - including exclusive
cluster mirrors.  Whereas before, a splitmirror operation would result in
remote nodes having LVM commands report the newly split LV with a proper
name while DM commands would report the old (pre-split) names of the device.
IOW, there was a kernel/userspace mismatch.
2011-10-06 14:55:39 +00:00
Jonathan Earl Brassow
f7235e7cb4 Revert initial solution to bug 733114 - I/O error message during splitmirror
The original commit comments can be located via this git commit ID:
	7d8e615c0b

There were three possible solutions to the original problem proposed in the
initial check-in.  The one chosen was as follows:
    2) Do like _remove_mirror_images does and suspend the original, then suspend
    the sub-lv (the error target), then resume the sub-lv, and finally resume the
    original LV.  This seems like extra pointless operations to me, but it doesn't
    produce the error message (although, I'm not sure why) and it allows us to
    leave the visible flag in place.
Turns out, the cluster also views the extra suspend/resume operations as
pointless too and ignores them.  So, this solution doesn't work in a cluster.
Further, I've noticed that in addition to the remote cluster nodes still getting
I/O errors from scanning the error target, they also have a different LVM and
DM views of the same LV.  IOW, while the LVM level (gotten from the LVM metadata)
sees the correct name for the newly split LV, device-mapper still maintains the
old names.

Because the original fix failed to completely fix the problem (or work-around it)
and because a better solution must be found to address the additional cluster
issue of device renaming, I am reverting the above mentioned commit.
2011-10-06 14:49:16 +00:00