1
0
mirror of git://sourceware.org/git/lvm2.git synced 2024-12-30 17:18:21 +03:00
Commit Graph

252 Commits

Author SHA1 Message Date
Zdenek Kabelac
7ec8e691c4 libdm: use 64bit type for raid index
Used properly signed 64bit constant for shifting.
2014-02-15 11:36:37 +01:00
Jonathan Brassow
df181cc51e cache: Add DM interface for retrieving a cache's status
This patch defines a structure for holding all of the device-mapper
cache target's status information.  The associated function provides
an easy way for higher levels (LVM) to consume the information.

This patch finishes the device-mapper interface for the cache and
cachepool segment types (i.e. the cache target).
2014-01-27 05:30:42 -06:00
Jonathan Brassow
1ff7e214e0 cache: New 'cache' segment type
This patch adds the cache segment type - the second of two necessary
to create cache logical volumes.  This segment type references the
cachepool (the small fast device) and the origin (the large slow device);
linking them to create the cache device.  The cache device is the
hierarchical device-mapper device that the user ulitmately makes use
of.

The cache segment sources the information necessary to construct the
device-mapper cache target from the origin and cachepool segments to
which it links.
2014-01-27 05:29:35 -06:00
Zdenek Kabelac
0638d1d82e libdm: preload revert after failing callback
Revert activated volumes if callback fails.
This is currently used only for thin_check failure support.

When thin_check detects failure in thin metadata device, it deactivate
volumes in reversed order that have been preloaded for thin pool activation.
After this change lvm command will not leave active pool subvolumes
in dm table.
2014-01-17 10:48:49 +01:00
Zdenek Kabelac
d98511c717 cleanup: indent 2014-01-17 10:48:49 +01:00
Zdenek Kabelac
af7297c73e libdm: pass dnode to callback
Pass dnode  pointer instead of rather unknown child pointer.
The pointer is currently unused and passing child pointer
is quite undefined, while dnode has at least some usability.
2014-01-08 11:57:43 +01:00
Jonathan Brassow
ca51435153 Misc/RAID: Enable resume_lv to handle some renaming conflicts.
When images and their associated metadata are removed from a RAID1 LV,
the remaining sub-LVs are "shifted" down to fill the gaps.  For
example, if there is a 3-way mirror:
	[0][1][2]
and we remove device#0, the devices will be shifted down
	[1][2]
and renamed.
	[0][1]

This can create a problem for resume_lv (specifically,
dm_tree_activate_children) during the renaming process though.  This
is because it will attempt to rename the higher indexed sub-LVs first
and find that it cannot because there are currently other sub-LVs with
that name.  The solution is to check for a conflicting name before
attempting to rename.  If a conflict is found and that conflicting
sub-LV is also in the process of renaming, we can defer the current
rename until the conflicting sub-LV has renamed and cleared the
conflict.

Now that resume_lv can handle these types of rename conflicts, we can
remove the workaround in RAID that was attempting to resume a RAID1
LV from the bottom-up in order to force a proper rename in assending
order before attempting a resume on the top-level LV.  This "hack"
only worked for single machine use-cases of LVM.  Clearing this up
paves the way for exclusive activation of RAID LVs in a cluster.
2013-09-09 15:07:28 -05:00
Alasdair G Kergon
83fb622598 deptree: don't remove live node on resume failure
When resuming a node needed by a higher layer of the tree,
if the resume fails, only remove it if the node did not
originally have a live table.

Ref. 97f8454ecc
2013-07-23 13:33:35 +01:00
Zdenek Kabelac
5658ec2bdc libdm: thin pool target sends messages once
Clear send_messages flag when they have been delivered successfully.
There is no need to validate it for all other activations of the same
node in the dm_tree.

Also add extra debug message which shows the reason for skipping
sending of messages because the transaction_id has already the matching
value.
2013-07-15 15:45:28 +02:00
Zdenek Kabelac
47419d21ac cleanup: stack usage
Shortening code with macros return_0, return_NULL.
Add some missing stack prints in error paths.
2013-07-01 23:11:14 +02:00
Jonathan Brassow
8ac9791c36 RAID: s/int/uint32_t for dev_count in dm_status_raid struct
Device count is never negative.  Change 'dev_count' to be
uint32_t instead of int.
2013-06-17 12:58:38 -05:00
Zdenek Kabelac
861fd1108f libdm: move thin max size to header
Move max size of thin metadata into define.
Increase a bit the size to match the kernel size.
(16978542592->17112760320)
2013-06-11 14:21:00 +02:00
Jonathan Brassow
562c678ee2 DM RAID: Add ability to throttle sync operations for RAID LVs.
This patch adds the ability to set the minimum and maximum I/O rate for
sync operations in RAID LVs.  The options are available for 'lvcreate' and
'lvchange' and are as follows:
  --minrecoveryrate <Rate> [bBsSkKmMgG]
  --maxrecoveryrate <Rate> [bBsSkKmMgG]
The rate is specified in size/sec/device.  If a suffix is not given,
kiB/sec/device is assumed.  Setting the rate to 0 removes the preference.
2013-05-31 11:25:52 -05:00
Zdenek Kabelac
e4dfa785d1 libdm: compensate suspend counter for live table
This patch may not be fully correct. It tries to solve
the imbalanced suspend counter.

The problem starts when some LV is created and fails in resume path.
(i.e. resuming to large PV (enforced) over small loop devices)

This fails in _resume_node() after dm_task_run(). And while
existing device with empty table is left in inactive table,
further calls are reporting this device is in suspend state.

When later the lvm2 tries to rollback created device and deactivate it,
it will end with internal error, when we try to decrement
never incremented suspend counter.

As an 'easy fix' for now update suspend counter only for live nodes.

TODO: explore better fix.
2013-05-30 17:35:23 +02:00
Zdenek Kabelac
cb587fd100 libdm: free mem pool on err path
Since we use get_status also in dmeventd, which may use one pool
for a single device, in case it would be repeatedly returning error,
it may not be freeing the pool and would cause slow but steady growth.
To stay safe in the error path release any allocated memory.
2013-05-27 10:30:55 +02:00
Zdenek Kabelac
4707ac7200 libdm: add dm_get_status_snapshot
Add dm_get_status_snapshot() for parsing snapshot status.
2013-05-27 10:30:51 +02:00
Zdenek Kabelac
3ba3bc0d66 cleanup: drop backtrace
After log_error/log_warn there is no point to show <backtrace>
in debug log trace from the next code line.
2013-05-27 10:28:32 +02:00
Jonathan Brassow
2e0740f7ef RAID: Add writemostly/writebehind support for RAID1
'lvchange' is used to alter a RAID 1 logical volume's write-mostly and
write-behind characteristics.  The '--writemostly' parameter takes a
PV as an argument with an optional trailing character to specify whether
to set ('y'), unset ('n'), or toggle ('t') the value.  If no trailing
character is given, it will set the flag.
Synopsis:
        lvchange [--writemostly <PV>:{t|y|n}] [--writebehind <count>] vg/lv
Example:
        lvchange --writemostly /dev/sdb1:y --writebehind 512 vg/raid1_lv

The last character in the 'lv_attr' field is used to show whether a device
has the WriteMostly flag set.  It is signified with a 'w'.  If the device
has failed, the 'p'artial flag has priority.

Example ("nosync" raid1 with mismatch_cnt and writemostly):
[~]# lvs -a --segment vg
  LV                VG   Attr      #Str Type   SSize
  raid1             vg   Rwi---r-m    2 raid1  500.00m
  [raid1_rimage_0]  vg   Iwi---r--    1 linear 500.00m
  [raid1_rimage_1]  vg   Iwi---r-w    1 linear 500.00m
  [raid1_rmeta_0]   vg   ewi---r--    1 linear   4.00m
  [raid1_rmeta_1]   vg   ewi---r--    1 linear   4.00m

Example (raid1 with mismatch_cnt, writemostly - but failed drive):
[~]# lvs -a --segment vg
  LV                VG   Attr      #Str Type   SSize
  raid1             vg   rwi---r-p    2 raid1  500.00m
  [raid1_rimage_0]  vg   Iwi---r--    1 linear 500.00m
  [raid1_rimage_1]  vg   Iwi---r-p    1 linear 500.00m
  [raid1_rmeta_0]   vg   ewi---r--    1 linear   4.00m
  [raid1_rmeta_1]   vg   ewi---r-p    1 linear   4.00m

A new reportable field has been added for writebehind as well.  If
write-behind has not been set or the LV is not RAID1, the field will
be blank.
Example (writebehind is set):
[~]# lvs -a -o name,attr,writebehind vg
  LV            Attr      WBehind
  lv            rwi-a-r--     512
  [lv_rimage_0] iwi-aor-w
  [lv_rimage_1] iwi-aor--
  [lv_rmeta_0]  ewi-aor--
  [lv_rmeta_1]  ewi-aor--

Example (writebehind is not set):
[~]# lvs -a -o name,attr,writebehind vg
  LV            Attr      WBehind
  lv            rwi-a-r--
  [lv_rimage_0] iwi-aor-w
  [lv_rimage_1] iwi-aor--
  [lv_rmeta_0]  ewi-aor--
  [lv_rmeta_1]  ewi-aor--
2013-04-15 13:59:46 -05:00
Jonathan Brassow
faeea37057 RAID: Revert previous commit that allowed identical table loads.
Revert commit 31c24dd9f2.  This commit
was used to force a RAID device-mapper table to be loaded into the
kernel despite the fact that it was identical to the one already
loaded.  The effect allowed a RAID array with a transiently failed
device to refresh and reintegrate the failed device.  This operation
is better done in the kernel on a 'resume'.  Since,
'lvchange --refresh' already performs a suspend/resume cycle, the
above commit is not needed once the kernel change is made.  Reverting
the commit removes an unnecessary (at least for now) change to the
device-mapper interface.
2013-04-11 15:57:14 -05:00
Jonathan Brassow
38f8f4a958 RAID: Capture new RAID kernel sync_action status fields
I've updated the dm_status_raid structure and dm_get_status_raid()
function to make it handle the new kernel status fields that will
be coming in dm-raid v1.5.0.  It is backwards compatible with the
old status line - initializing the new fields to '0'.  The new
structure is also more amenable to future changes.  It includes a
'reserved' field that is currently initialized to zero but could
be used to hold flags describing new features.  It also now uses
pointers for the character strings instead of attempting to allocate
their space along with the structure (causing the size of the
structure to be variable).  This allows future fields to be appended.

The new fields that are available are:
 - sync_action : shows what the sync thread in the kernel is doing
                 (idle, frozen, resync, recover, check, repair, or
                 reshape)
 - mismatch_count: shows the number of discrepancies which were
                   found or repaired by a "check" or "repair"
                   process, respectively.
2013-04-08 15:04:08 -05:00
Zdenek Kabelac
3fd0242a0a libdm: validate params for NULL
Validate passed params and report error
instead of dereferencing NULL passed argument.
2013-04-05 14:13:12 +02:00
Jonathan Brassow
31c24dd9f2 RAID: Code changes missing from previous commit (bbc6378)
Previous commit included changes to WHATSNEW, but the code changes
were missing.  Here is the description from the previous commit:
commit bbc6378b73
Author: Jonathan Brassow <jbrassow@redhat.com>
Date:   Thu Feb 21 11:31:36 2013 -0600

    RAID:  Make 'lvchange --refresh' restore transiently failed RAID PVs

    A new function (dm_tree_node_force_identical_table_reload) was added to
    avoid the suppression of identical table reloads.  This allows RAID LVs
    to reload the on-disk superblock information that contains which devices
    have failed and the bitmaps.  If the failed device has returned, this has
    the effect of restoring the device and initiating recovery.  Without this
    patch, the user had to completely deactivate their RAID LV and re-activate
    it in order to restore the failed device.  Now they simply need to
    suspend and resume (which is done by 'lvchange --refresh').

    The identical table suppression is only avoided if the LV is not PARTAIL
    (i.e. all of it's devices can be seen and read by LVM) and the kernel
    status of the array contains failed devices.  In other words, the function
    will only be called in the case where we may have success in restoring
    a failed device in the array.
2013-03-06 10:17:11 -06:00
Zdenek Kabelac
a4870c79ca thin: use noflush for obtaining transaction_id
Do not flush thin pool data, when reading transation_id status.
2013-02-04 19:05:56 +01:00
Zdenek Kabelac
d2eae42c0e libdm: support newer thin pool status parameters
Support read_only and discards information.
2013-02-04 19:01:10 +01:00
Jonathan Brassow
c8242e5cf4 RAID: Add RAID status accessibility functions
Similar to the way thin* accesses its kernel status, we add a method
for RAID to grab the various values in its status output without the
higher levels (LVM) having to understand how to parse the output.
Added functions include:
        - lib/activate/dev_manager.c:dev_manager_raid_status()
          Pulls the status line from the kernel

        - libdm/libdm-deptree.c:dm_get_status_raid()
          Parses status line and puts components into dm_status_raid struct

        - lib/activate/activate.c:lv_raid_dev_health()
          Accesses dm_status_raid to deliver raid dev_health string

The new structure and functions can provide a more unified way to access
status information.  ('lv_raid_percent' could switch to using these
functions, for example.)
2013-02-01 11:31:47 -06:00
Alasdair G Kergon
06abb2dd4c logging: classify log_debug messages
Place most log_debug() messages into a class.
2013-01-07 22:30:29 +00:00
Zdenek Kabelac
97f8454ecc libdm: deactivate failed node in preload
If the resume of preloaded node fails, do not leave such
node in the table - since it may not be easy to detach such
node later when the node is i.e. internal.

i.e. failing activation of the thin pool with mismatching
chunk size may leave -tpool device in the table, which
could have been then removed only by dmsetup command.
2012-12-02 17:59:40 +01:00
Zdenek Kabelac
1946a45329 libdm: reset delay flag for devs used by thin
Patch clears the flag if thin pool is stacked over mirror.

Since thin pool could be used to stack device over mirrors,
it needs resume properly i.e. mirrors with corelog which are otherwise
unconditionally skipped (for pvmove functionality).
2012-10-03 15:04:41 +02:00
Jonathan Brassow
4047e4dfb1 RAID: Add support for RAID10
This patch adds support for RAID10.  It is not the default at this
stage.  The user needs to specify '--type raid10' if they would like
RAID10 instead of stacked mirror over stripe.
2012-08-24 15:34:19 -05:00
Zdenek Kabelac
ff86c6ed00 cleanup: keep MKNOD type cast clean
Setup major already a dev_t type before it gets shifted.
2012-08-23 14:37:21 +02:00
Zdenek Kabelac
286cd2006b cleanup: drop unneeded included header files
This headers were not resolving anything used for compiled .c files.
Remove unused util.c file.
2012-08-23 14:37:20 +02:00
Alasdair G Kergon
4dab0d3175 comments: misc updates
Miscellaneous clarifications to comments.
2012-08-07 18:34:30 +01:00
Zdenek Kabelac
c4db22bd4f libdm: support reserve and release metadata snap msg
Add support for new message types for thinp target 1.1
2012-07-18 14:34:19 +02:00
Zdenek Kabelac
dcd4afc716 libdm: add support for external origin and discard 2012-07-18 14:33:37 +02:00
Zdenek Kabelac
6fc4c99b2f cleanup: use dev_t type 2012-06-22 13:50:21 +02:00
Zdenek Kabelac
6f3cd63551 cleanup: replace memset with struct initilization
Simplifies the code, properly detects too long socket paths,
drops unused parameter.
2012-06-22 13:23:03 +02:00
Alasdair Kergon
f1aabd5c60 Set delay_resume_if_new on deptree snapshot origin.
(Must avoid activating snapshot origin more than once concurrently.)
2012-05-15 21:27:24 +00:00
Alasdair Kergon
61712a1f0d add major:minor to table size changed debug message 2012-05-15 20:03:12 +00:00
Alasdair Kergon
b96c213356 indicate when deptree detects but ignores size change in debug msg 2012-05-15 14:10:54 +00:00
Zdenek Kabelac
462de06d96 Return success for deactivation of thin pool
if the thin_check fail on thin pool - still return successful deactivation,
since lvremove would currently fail.

TODO: find some way to not run check with lvremove.
2012-03-04 17:36:23 +00:00
Zdenek Kabelac
b3103ef328 Remove part of FIXME
(and reindent a code below)
2012-03-04 16:05:42 +00:00
Zdenek Kabelac
7162a25b0b Support 16GB for thin pool metadata
Add some hack math to allow 16GB devices to be passed as thinpool metadata.
Since kernel has put in limit to not allow which are just bigger then
some predefined constant in kernel but not matching 16GB so any device bigger
is rejected.

FIXME: Current code still might need more tweaks to be more generic.
2012-03-02 21:53:17 +00:00
Zdenek Kabelac
4bcaf8086e Purge remaining trim bits from code 2012-03-02 21:43:26 +00:00
Zdenek Kabelac
7e35dfff3d Added dm_tree_node_set_callback() for preload and deactivation hooks
Run users hook after preload for the node is finished,
or after the node has been deactivated.
2012-03-02 17:31:21 +00:00
Zdenek Kabelac
6a5706a3a5 Remove support for TRIM message
It's been unsupporte for now - and it's not going to be
implemented for thin pool kernel driver - so dropping
appearence of TRIM from libdm and lvm.
2012-03-02 13:26:08 +00:00
Jonathan Earl Brassow
ad48a46fc9 Make conversion from a synced 'mirror' to 'raid1' not cause a full resync.
It was not possible to pass down the DM_[FORCE|NO]SYNC flags to
'dm_tree_node_add_raid_target'.  This meant that converting to 'raid1' from
'mirror' would cause a full resync.  (It also meant that '--nosync' was
ineffective when creating a 'raid1' LV.)

I've taken the 'reserved' parameter in 'dm_tree_node_add_raid_target' and
used it for the "flags" parameter.  Now it is possible to pass the sync
flags and any other flags that may come up.
2012-02-13 20:13:39 +00:00
Zdenek Kabelac
4d95ccc696 Check for deps pointer before dererence
As _deps() call may return NULL - check for it.
2012-02-10 14:48:28 +00:00
Zdenek Kabelac
3b5834d78b Add validation of name and uuid
Do not accept NULL pointers.
2012-02-10 14:42:28 +00:00
Zdenek Kabelac
a6292f2a6d Remove unneeded assignments
Variables have (or will have) those values set.
2012-02-08 11:36:18 +00:00
Zdenek Kabelac
fc5c61df97 Ensure whole info is initialised
Since _create_dm_tree_node is copying whole structure,
make sure all members are initialized.
2012-01-25 21:50:50 +00:00
Zdenek Kabelac
4173a22832 Thin send messages on activation resume code path
Using PRELOAD part would lead to problems when the problem
would happen before vg_write and vg_commit.
Also this change is necessary for snapshot creation sequence.
2012-01-25 08:46:21 +00:00
Alasdair Kergon
5c9eae9647 Reorder fns in libdm-deptree.
Tweak dm_config interface and remove FIXMEs.
2012-01-23 17:46:31 +00:00
Zdenek Kabelac
9568f1b5c3 Thin handle empty thin volume case
Report both values as 0 in case the volume is unused.
2012-01-19 15:22:32 +00:00
Zdenek Kabelac
5fd459f0ab Thin use consistentely metadata
Do not shortcut to 'meta' and stay with 'metadata'
Also matches kernel doc for dm API then.
2012-01-19 15:21:23 +00:00
Alasdair Kergon
2e5ff5d11c Add dm_uuid_prefix/dm_set_uuid_prefix for non-lvm users to override hard-coded
LVM- prefix.

Try harder not to leave stray empty devices around (locally or remotely) when
reverting changes after failures while there are inactive tables.
2012-01-10 02:03:31 +00:00
Zdenek Kabelac
077c4d1a35 Add Thin API for parsing thin status
Add dm_get_status_thin_pool and dm_get_status_thin functions to
parse 'params' argument which is received via dm_get_next_target.

Returns filed structure allocated from given mempool.
2011-12-21 12:52:38 +00:00
Zdenek Kabelac
6744c143a5 Thin remove unused define
Remove DM_THIN_ERROR_DEVICE_ID from API.
Remove API warning.
Drop code that was using DM_THIN_ERROR_DEVICE_ID (already commented)
Remove debug message which slipped in through some previous commit.
2011-11-12 22:44:10 +00:00
Zdenek Kabelac
19e3f8c30b Thin fix condition check for transation_id
id2 must be checked.
(missed in yesterday commit set).
2011-11-04 12:39:45 +00:00
Zdenek Kabelac
2e732e9628 Thin api change for passing message into libdm
Avoid exposing another struct to the libdm user and
use only simple dm_tree_node_add_thin_pool_message with
2 overloaded uint64_t values.
2011-11-03 14:45:01 +00:00
Zdenek Kabelac
4d25c81bdd Thin api change for dm_tree_node_add_thin_target
A little code shuffling and adding support for
DM_THIN_ERROR_DEVICE_ID which might be eventually be used
for activation of thin which is going to be deleted.
For now we do not need it lvm.
2011-11-03 14:43:21 +00:00
Zdenek Kabelac
25de9addb6 Thin fix compile warns
Test for dm_snprintf < 0.
Add header for moved backup.
2011-10-30 22:52:08 +00:00
Zdenek Kabelac
bbcd37e4b8 Thin segment transaction_id moved
Add a new node flag send_messages that is used to simplify
test when to call _node_send_messages().

Add call to _node_send_messages when pool is deeper in the tree.
2011-10-30 22:04:57 +00:00
Zdenek Kabelac
c590a9cdbc Thin error messages clenaup and some indent 2011-10-28 20:19:26 +00:00
Zdenek Kabelac
4ce43894d2 Trying to fix the retry logic
There should be no need for retry for our internal devices - it would be hinding
our own bug in the tree processing.
Update error messages to show also also device name.
No WHATS_NEW - in release fix.
2011-10-28 20:11:21 +00:00
Zdenek Kabelac
3d6782b3ff Just replace stack, return 0 with return_0 2011-10-20 10:39:07 +00:00
Zdenek Kabelac
ac08d9c028 Add last param 0 for thin-pool
So now the table suppression works for thin-pool.
2011-10-20 10:35:55 +00:00
Zdenek Kabelac
e9156c2bb9 Adapt to thin kernel target API
Since kernel target uses low_water_mark - use this name in libdm as well.
2011-10-20 10:33:30 +00:00
Zdenek Kabelac
7b199dc599 Use const pointers in thin API were appropriate 2011-10-20 10:31:27 +00:00
Zdenek Kabelac
3f53c059e9 Add _BLOCK_ to define
Use DM_THIN_MIN_DATA_BLOCK_SIZE and
DM_THIN_MAX_DATA_BLOCK_SIZE to make it more obvious, for which
this define is useful in thin API.
2011-10-20 10:28:41 +00:00
Zdenek Kabelac
2a0d806b3c Use structure copy
Since the code evolved a bit with current structures we could use C to
copy struct members.
2011-10-19 16:45:02 +00:00
Zdenek Kabelac
759b9592ba Update error message
Drop INTERNAL_ERROR from public API functions.
Improve some messages.
2011-10-19 16:42:14 +00:00
Zdenek Kabelac
11f64f0aeb Use generic name for message sending function
Drop _thin_pool prefix for _node_send_message so it could be extended later.
Replace current_id with trans_id name.
2011-10-19 16:40:59 +00:00
Zdenek Kabelac
97d0f72c92 Just indent changes
Some tabs & spaces.
2011-10-19 16:36:39 +00:00
Zdenek Kabelac
660a42bc78 Add internal expected_errno dm_tast var
Certain errno codes could be expected in some situations thus
add experimental support for them.

When expected errno is set after ioctl error - function skips error
printing and exits succefully.

Currently only useful for thin pool messages.
2011-10-19 16:36:01 +00:00
Zdenek Kabelac
25e6ab87d8 Add thin_pool dm message support
Experimental support for kernel message via resume sequence.
2011-10-17 14:16:25 +00:00
Zdenek Kabelac
5668fe04d9 Add _thin_validate_device_id 2011-10-17 14:15:26 +00:00
Zdenek Kabelac
5668fd6a7a Swap parameters
Use metadata uuid first (match kernel target).
2011-10-17 14:15:01 +00:00
Zdenek Kabelac
df6b1b8fe6 Drop old check for transaction_id
(revert)
2011-10-17 14:14:33 +00:00
Milan Broz
ad2432dc68 Fix alignment warning in bitcount calculation for raid segment. 2011-10-17 13:15:35 +00:00
Zdenek Kabelac
0395dd2250 Use pool for dm_tree allocation
Using the same pool allocation strategy as we use for vg,
so dm_tree structure is part of the pool itself.
2011-10-14 13:34:19 +00:00
Jonathan Earl Brassow
83c606ae30 This patch fixes issues with improper udev flags on sub-LVs.
The current code does not always assign proper udev flags to sub-LVs (e.g.
mirror images and log LVs).  This shows up especially during a splitmirror
operation in which an image is split off from a mirror to form a new LV.

A mirror with a disk log is actually composed of 4 different LVs: the 2
mirror images, the log, and the top-level LV that "glues" them all together.
When a 2-way mirror is split into two linear LVs, two of those LVs must be
removed.  The segments of the image which is not split off to form the new
LV are transferred to the top-level LV.  This is done so that the original
LV can maintain its major/minor, UUID, and name.  The sub-lv from which the
segments were transferred gets an error segment as a transitory process
before it is eventually removed.  (Note that if the error target was not put
in place, a resume_lv would result in two LVs pointing to the same segment!
If the machine crashes before the eventual removal of the sub-LV, the result
would be a residual LV with the same mapping as the original (now linear) LV.)
So, the two LVs that need to be removed are now the log device and the sub-LV
with the error segment.  If udev_flags are not properly set, a resume will
cause the error LV to come up and be scanned by udev.  This causes I/O errors.
Additionally, when udev scans sub-LVs (or former sub-LVs), it can cause races
when we are trying to remove those LVs.  This is especially bad during failure
conditions.

When the mirror is suspended, the top-level along with its sub-LVs are
suspended.  The changes (now 2 linear devices and the yet-to-be-removed log
and error LV) are committed.  When the resume takes place on the original
LV, there are no longer links to the other sub-lvs through the LVM metadata.
The links are implicitly handled by querying the kernel for a list of
dependencies.  This is done in the '_add_dev' function (which is recursively
called for each dependency found) - called through the following chain:
	_add_dev
	dm_tree_add_dev_with_udev_flags
	<*** DM / LVM divide ***>
	_add_dev_to_dtree
	_add_lv_to_dtree
	_create_partial_dtree
	_tree_action
	dev_manager_activate
	_lv_activate_lv
	_lv_resume
	lv_resume_if_active
When udev flags are calculated by '_get_udev_flags', it is done by referencing
the 'logical_volume' structure.  Those flags are then passed down into
'dm_tree_add_dev_with_udev_flags', which in turn passes them to '_add_dev'.
Unfortunately, when '_add_dev' is finding the dependencies, it has no way to
calculate their proper udev_flags.  This is because it is below the DM/LVM
divide - it doesn't have access to the logical_volume structure.  In fact,
'_add_dev' simply reuses the udev_flags given for the initial device!  This
virtually guarentees the udev_flags are wrong for all the dependencies unless
they are reset by some other mechanism.  The current code provides no such
mechanism.  Even if '_add_new_lv_to_dtree' were called on the sub-devices -
which it isn't - entries already in the tree are simply passed over, failing
to reset any udev_flags.  The solution must retain its implicit nature of
discovering dependencies and be able to go back over the dependencies found
to properly set the udev_flags.

My solution simply calls a new function before leaving '_add_new_lv_to_dtree'
that iterates over the dtree nodes to properly reset the udev_flags of any
children.  It is important that this function occur after the '_add_dev' has
done its job of querying the kernel for a list of dependencies.  It is this
list of children that we use to look up their respective LVs and properly
calculate the udev_flags.

This solution has worked for single machine, cluster, and cluster w/ exclusive
activation.
2011-10-06 14:45:40 +00:00
Zdenek Kabelac
565a4bfc49 Move defines to header
Make limits for thin data_block_size and device_id part of public API.

FIXME: read them possible from some kernel header file in the future ?
But we may need to support different values for different versions ?
2011-10-06 11:05:56 +00:00
Zdenek Kabelac
460c599143 Name changes
typo zeroeing->zeroing
add size low_water_mark->low_water_mark_size so it's more obvious its sector
related variable.
2011-10-04 16:22:38 +00:00
Zdenek Kabelac
e0ea24be1f Add intial code to check transaction_id
Fix typy in transaction_id.
Add this as node property, so it could be easily checked on resume.

Code is not yet finished.
2011-10-03 18:34:52 +00:00
Zdenek Kabelac
a5a31ce947 Move priority check in front
Just a minor code mode - make a test for priority before
more complex uuid checks.
2011-10-03 18:29:48 +00:00
Zdenek Kabelac
9a8f192a38 Update error path tracing for _resume_node
dm_task_create & dm_task_set_name produces it's own log_error
Add missing stacks for dm_task_set_cookie, dm_task_run,
dm_task_get_info.
2011-10-03 18:28:25 +00:00
Zdenek Kabelac
1419bf1c98 Transaction_id is property of thin_pool
Remove Transaction_id from thin target.
Store device_id for thin target.
2011-10-03 18:26:07 +00:00
Zdenek Kabelac
4251236efc Add supporting function for thinp
New dm_tree_node_add_thin_pool_target() and  dm_tree_node_add_thin_target()
This API is highly experimental and unstable for now.
2011-09-29 08:53:48 +00:00
Zdenek Kabelac
ee05be0872 Just add warning about potential problem exteding dm_segtypes
Since raid target is using now dm_segtypes also for search purpose.
2011-09-29 08:50:54 +00:00
Alasdair Kergon
10d0d9c7c4 Introduce revert_lv for better pvmove cleanup.
(One further fix needed to remove the stray pvmove LVs left behind.)
2011-09-27 22:43:40 +00:00
Peter Rajnoha
c3e5b4976d Add log_error even for general device in use when we can't do the sysfs checks. 2011-09-26 10:17:51 +00:00
Peter Rajnoha
787200efd6 Add dm_tree_retry_remove to use retry logic for device removal in a dm_tree. 2011-09-22 17:36:50 +00:00
Peter Rajnoha
125712bea0 Replace open_count check with holders/mounted_fs check on lvremove path.
Before, we used to display "Can't remove open logical volume" which was
generic. There 3 possibilities of how a device could be opened:
  - used by another device
  - having a filesystem on that device which is mounted
  - opened directly by an application

With the help of sysfs info, we can distinguish the first two situations.
The third one will be subject to "remove retry" logic - if it's opened
quickly (e.g. a parallel scan from within a udev rule run), this will
finish quickly and we can remove it once it has finished. If it's a
legitimate application that keeps the device opened, we'll do our best
to remove the device, but we will fail finally after a few retries.
2011-09-22 17:33:50 +00:00
Zdenek Kabelac
beecb1e160 Remove unused passed parameters 2011-09-07 08:37:48 +00:00
Alasdair Kergon
40e5fd8b3a spaces->tabs 2011-08-19 17:02:48 +00:00
Alasdair Kergon
415c0690af restrict dm_tree_node_add_null_area 2011-08-19 16:26:02 +00:00
Jonathan Earl Brassow
f439e65b64 Add support for m-way to n-way up-convert in RAID1 (no linear to n-way yet)
This patch adds the ability to upconvert a raid1 array - say from 2-way to
3-way.  It does not yet support upconverting linear to n-way.

The 'raid' device-mapper target allows for individual components (images) of
an array to be specified for rebuild.  This mechanism is used when adding
new images to the array so that the new images can be resync'ed while the
rest of the images in the array can remain 'in-sync'.  (There is no
mirror-on-mirror layering required.)
2011-08-18 19:41:21 +00:00
Jonathan Earl Brassow
6d04311efa Add the ability to split an image from the mirror and track changes.
~> lvconvert --splitmirrors 1 --trackchanges vg/lv
The '--trackchanges' option allows a user the ability to use an image of
a RAID1 array for the purposes of temporary read-only access.  The image
can be merged back into the array at a later time and only the blocks that
have changed in the array since the split will be resync'ed.  This
operation can be thought of as a partial split.  The image is never completely
extracted from the array, in that the array reserves the position the device
occupied and tracks the differences between the array and the split image via
a bitmap.  The image itself is rendered read-only and the name (<LV>_rimage_*)
cannot be changed.  The user can complete the split (permanently splitting the
image from the array) by re-issuing the 'lvconvert' command without the
'--trackchanges' argument and specifying the '--name' argument.
	~> lvconvert --splitmirrors 1 --name my_split vg/lv
Merging the tracked image back into the array is done with the '--merge'
option (included in a follow-on patch).
	~> lvconvert --merge vg/lv_rimage_<n>

The internal mechanics of this are relatively simple.  The 'raid' device-
mapper target allows for the specification of an empty slot in an array
via '- -'.  This is what will be used if a partial activation of an array
is ever required.  (It would also be possible to use 'error' targets in
place of the '- -'.)  If a RAID image is found to be both read-only and
visible, then it is considered separate from the array and '- -' is used
to hold it's position in the array.  So, all that needs to be done to
temporarily split an image from the array /and/ cause the kernel target's
bitmap to track (aka "mark") changes made is to make the specified image
visible and read-only.  To merge the device back into the array, the image
needs to be returned to the read/write state of the top-level LV and made
invisible.
2011-08-18 19:38:26 +00:00
Jonathan Earl Brassow
b2fa9b43dc Add some log_error msg's and fix potential segfault
Thanks to kabi for spotting these - especially the possibility for
segfault if a loop runs all the way through without finding a match.
2011-08-11 19:17:10 +00:00
Jonathan Earl Brassow
cac52ca4ce Add basic RAID segment type(s) support.
Implementation described in doc/lvm2-raid.txt.

Basic support includes:
- ability to create RAID 1/4/5/6 arrays
- ability to delete RAID arrays
- ability to display RAID arrays
Notable missing features (not included in this patch):
- ability to clean-up/repair failures
- ability to convert RAID segment types
- ability to monitor RAID segment types
2011-08-02 22:07:20 +00:00