mirror of git://sourceware.org/git/lvm2.git synced 2025-01-18 10:04:20 +03:00

231 Commits

Author SHA1 Message Date
Zdenek Kabelac
a900d150e4 thin: move pool messaging from resume to suspend
The existing messaging interface for thin-pool has a few 'weak' points:

* Messages were posted with each 'resume' operation, thus not allowing
activation of a thin-pool with its existing state.

* The acceleration that skipped the suspend step did not work in a cluster,
since clvmd resumes only nodes which are suspended (have the proper lock
state).

* Resume may fail, and the code is not really designed to 'fail' in this
phase (the general rule is that resume DOES NOT fail unless something serious
is wrong, and the lvm2 tool usually does not handle a recovery path in this case).

* A full thin-pool suspend happened when taking a thin-volume snapshot.

With this patch, the new method relocates message passing into the suspend
state.

This has a few drawbacks with the current API, but overall it performs
better and gives more possibilities to deal with errors.

The patch introduces new logic for 'origin-only' suspend of a thin-pool, and
this also relates to a thin-volume when taking a snapshot.

When a suspend_origin_only operation is invoked on a pool with
queued messages, only those messages are posted to the thin-pool, and the
actual suspend of the thin-pool and its data and metadata volumes is skipped.
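A minimal C sketch of that suspend decision. Everything here is hypothetical (`struct pool_node`, `pool_suspend()`, and the `send_msg_fn` callback standing in for the real message ioctl); it also assumes that an origin-only suspend of a pool without queued messages falls back to a normal suspend, which the commit message does not spell out:

```c
#include <stddef.h>

/* Hypothetical callback standing in for the thin-pool message ioctl; 1 = ok. */
typedef int (*send_msg_fn)(const char *msg);

/* Hypothetical, simplified model of a thin-pool node; not libdm's struct. */
struct pool_node {
	const char *queued[8];  /* messages waiting for delivery */
	size_t n_queued;
	int suspended;          /* 1 once pool, data and metadata are suspended */
};

/*
 * Origin-only suspend of a pool with queued messages only posts the
 * messages and skips the real suspend. Returns 1 on success, 0 on failure,
 * so a rejected message is reported from suspend rather than from resume.
 */
static int pool_suspend(struct pool_node *pool, int origin_only, send_msg_fn send_msg)
{
	size_t i;

	if (origin_only && pool->n_queued) {
		for (i = 0; i < pool->n_queued; i++)
			if (!send_msg(pool->queued[i]))
				return 0;   /* queue left intact for the caller */
		pool->n_queued = 0;
		return 1;               /* pool, data and metadata keep running */
	}

	pool->suspended = 1;            /* assumption: otherwise suspend as usual */
	return 1;
}
```

Returning 0 from this step is what lets a failed message surface on the ordinary suspend error path instead of during resume.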

This makes taking a snapshot of a thin-volume a lighter operation and
avoids blocking other unrelated active thin volumes.

Also, failure now happens in the 'suspend' state, where a 'Fail' is more expected
and is better handled through error paths.

Activation of a thin-pool now sends no messages and leaves it up to the tool
to decide later how to finish an unfinished double-commit transaction.

A problem which needs some API improvements relates to the lvm2 tree
construction. For the suspend tree we do not add a target table line
into the tree; only a device is inserted into the tree.
The current mechanism to attach messages for a thin-pool requires libdm
to know about the thin-pool target, so lvm2 currently assumes the node
really is a thin-pool and fills in the table line for this node (which
should be ensured by the PRELOAD phase, but it is a misuse of the internal API).
We would possibly need to be able to attach a message to 'any' node.

Another thing to notice: the current messaging interface of the thin-pool
target requires suspending the thin volume origin first and then sending
a create message, but this cannot have any 'nice' solution on the lvm2
side, and IMHO we should introduce something like a 'create_after_resume'
message.

The patch also changes the moment when the lvm2 transaction id is increased.
Now it happens only after the kernel transaction id change has finished
successfully. This change was needed to properly handle activation of a pool
which is in the middle of an unfinished transaction, and it also corrects
usage of thin-pool by external apps like Docker.
2015-07-03 16:13:14 +02:00
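A sketch of the new ordering, assuming a hypothetical `set_tid_fn` callback that stands in for sending the thin-pool `set_transaction_id <old> <new>` message: the lvm2-side id only moves once the kernel has confirmed the change. `struct pool_md` and `commit_pool_transaction()` are illustrative names:

```c
#include <stdint.h>

/* Hypothetical lvm2-side view of a pool's committed transaction id. */
struct pool_md {
	uint64_t transaction_id;
};

/* Hypothetical stand-in for "set_transaction_id <old> <new>"; 1 = success. */
typedef int (*set_tid_fn)(uint64_t old_tid, uint64_t new_tid);

static int commit_pool_transaction(struct pool_md *md, set_tid_fn set_tid)
{
	uint64_t next = md->transaction_id + 1;

	/* Kernel first: on failure the lvm2 id is left untouched. */
	if (!set_tid(md->transaction_id, next))
		return 0;

	/* Only a confirmed kernel change advances the lvm2 transaction id. */
	md->transaction_id = next;
	return 1;
}
```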
Zdenek Kabelac
5bef18f2eb libdm: support for posting messages in suspend
Add support for sending messages in the suspend tree for thin-pools.
When this operation is requested, the whole subtree suspend is skipped.

This is experimental support for new lvm2 code for sending messages
in the suspend phase, where 'thin-pool origin-only suspend' will send
messages instead of really suspending the thin-pool tree.

When suspending a thin volume origin-only, only the thin volume is suspended,
then messages are posted and the thin-pool suspend is skipped.
2015-07-03 16:13:14 +02:00
Zdenek Kabelac
21c0b1134f libdm: enhance tracing messages
Use the new _node_name() and print the name and major:minor for the thin-pool device.
2015-07-01 13:44:28 +02:00
Zdenek Kabelac
04ae5007e3 libdm: add helper function to print _node_name
_node_name() prepares the device name and its (major:minor) in the
dm_tree internal buffer, for easy use in debug messages.

To avoid any allocation, a small buffer in struct dm_tree is preallocated
to store this message.
2015-07-01 13:41:40 +02:00
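A sketch of the preallocated-buffer approach. The `tree`/`node` structs and `node_name()` are hypothetical stand-ins for the libdm internals, and the exact output format is an assumption:

```c
#include <stdio.h>

/* Hypothetical stand-ins for struct dm_tree and a tree node. */
struct tree {
	char buf[64];  /* preallocated, so formatting never allocates */
};

struct node {
	struct tree *tree;
	const char *name;
	unsigned major, minor;
};

/*
 * Format "name (major:minor)" into the tree's shared buffer.
 * The result is overwritten by the next call, so use it immediately.
 */
static const char *node_name(const struct node *n)
{
	/* snprintf truncates instead of overflowing the fixed buffer. */
	snprintf(n->tree->buf, sizeof(n->tree->buf), "%s (%u:%u)",
		 n->name, n->major, n->minor);
	return n->tree->buf;
}
```

Because the buffer is shared, two results cannot be used in one printf() call; that is the price of never allocating.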
Zdenek Kabelac
69132f55ea libdm: add dm_tree_node_set_thin_pool_read_only
Support thin-pool tree node with activation in read-only mode.
(Native kernel API).
2015-06-18 15:15:39 +02:00
Zdenek Kabelac
9a06ae7b35 libdm: better debug message
Print reason for failing ioctl if thin pool message fails.
2015-06-15 14:48:04 +02:00
Zdenek Kabelac
5232fd13f3 cleanup: cast minor to dev_t
Let the arithmetic run with a single dev_t type (Coverity).
2015-05-08 15:15:10 +02:00
Zdenek Kabelac
2908ab3eed thin: errorwhenfull support
Support the error_if_no_space feature for thin pools.
Report more info about thin-pool status
(out_of_data (D), metadata_read_only (M), failed (F)), also as a health
attribute.
2015-01-14 14:52:05 +01:00
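A sketch of folding the thin-pool status flags into one health character. The struct and the precedence (failed over out-of-data over metadata read-only) are assumptions for illustration; the commit message only names the three states and their letters:

```c
/* Hypothetical subset of a parsed thin-pool status. */
struct thin_pool_flags {
	int fail;               /* pool reported as failed */
	int out_of_data_space;  /* data device is full */
	int read_only;          /* metadata switched to read-only */
};

/* Map the status to a health letter; '-' means healthy. */
static char thin_pool_health(const struct thin_pool_flags *s)
{
	if (s->fail)
		return 'F';
	if (s->out_of_data_space)
		return 'D';
	if (s->read_only)
		return 'M';
	return '-';
}
```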
Zdenek Kabelac
20b22cd023 libdm: still better API
Do not use the 'any' policy name as a value in the config tree, so we stick
with 'policy_settings' and an extra 'policy_name' for libdm params.

Update lvm2 API as well.

Example of supported metadata:

 policy = "mq"
 policy_settings {
      migration_threshold = 2048
      sequential_threshold = 512
      random_threshold = 4
      read_promote_adjustment = 10
 }
2014-11-11 00:54:03 +01:00
Zdenek Kabelac
f12e3da639 cleanup: gcc warnings
2014-11-10 22:05:49 +01:00
Zdenek Kabelac
824019531c libdm: tuning cache API
Support the new PASSTHROUGH 'feature' flag.

Add dm_config_node to pass in policy args.

Really use origin_uuid instead of using an extra call
to pass seg_areas.

Switch to a 64bit feature flag bit set so there is
enough space for new bits in the future...
2014-11-10 22:05:48 +01:00
Zdenek Kabelac
89233544e0 libdm: allow activating any pool with tid == 0
When transaction_id is set to 0 for a thin-pool, libdm skips validation
of the thin-pool, unless there are real messages to be sent to it.
This relaxes the strict policy which always required knowing the
kernel target's transaction_id in advance.

It now allows activating a thin-pool with any transaction_id
(when a transaction_id of 0 is passed in).

It is now up to the application to validate the transaction_id of the live
thin-pool volume against the transaction_id within its own metadata.
2014-11-04 15:28:00 +01:00
Zdenek Kabelac
8f518cf197 libdm: add transaction_id check after message
Add extra safety detection for the thin pool transaction id
and query the pool status after a confirmed message.

In case there is a mismatch, immediately abort further
processing.
2014-08-26 14:12:20 +02:00
Alasdair G Kergon
7cff640d9a activation: Fix upgrades using uuid suffixes.
2.02.106 added suffixes to some LV uuids in the kernel.

If any of these LVs is activated with 2.02.105 or earlier,
and then a later version is used, the LVs appear invisible and
activation commands fail.

The code now has to check the kernel for both old and new uuids.
2014-07-30 21:55:11 +01:00
Jonathan Brassow
442820aae3 activation: Remove empty DM device when table fails to load.
As part of better error handling, remove DM devices that have been
successfully created but failed to load a table.  This can happen
when pvmove'ing in a cluster and the cluster mirror daemon is not
running on a remote node - the mapping table failing to load as a
result.  In this case, any revert would work on other nodes running
cmirrord because the DM devices on those nodes did succeed in loading.
However, because no table was able to load on the non-cmirrord nodes,
there is no table present that points to what needs to be reverted.
This causes the empty DM device to remain on the system without being
present in any LVM representation.

This patch should only be considered a partial fix to the overall
problem.  This is because only the device which failed to load a
table is removed.  Any LVs that may have been loaded as requirements
to the DM device that failed to load may be left in place.  Complete
clean-up will require tracking those devices which have been created
as dependencies and removing them along with the device that failed
to load a table.
2014-05-28 10:17:15 -05:00
Zdenek Kabelac
bfbf6b7c12 cleanup: libdm drop already zeroed elements
Drop zeroing of zalloc-ed memory.
2014-04-08 11:00:16 +02:00
Zdenek Kabelac
6190ded5f1 libdm: simplify segtype search
For the cache target, use SEG_CACHE directly.
Hide dm_segtypes as the internal static variable _dm_segtypes,
since no one is supposed to use it.
2014-04-08 11:00:13 +02:00
Zdenek Kabelac
bd2500e62e libdm: track implicit dependencies
When a node enters the dtree through an implicit dependency, it
automatically gets the udev flags of its parent node, and these
could not be changed later when the node was entered again,
e.g. via lvm's preload tracking.

Resolve this by tracking whether the node has been
created by implicit dependency tracking or has been
entered explicitly. An implicit node can later be
upgraded by an explicit _add_dev() with proper udev_flags.

For implicit devices, add special udev flags to avoid
any scan and udev rule processing if we resume such a device.

The patch allows easier removal of orphan nodes.
2014-04-08 11:00:12 +02:00
Zdenek Kabelac
e2ea3cd7ba cleanup: cache use const char policy
Policy should be const char pointer.
2014-04-01 20:54:09 +02:00
Zdenek Kabelac
a920bc1a40 cleanup: indent, drop unneeded braces
2014-02-24 21:13:35 +01:00
Zdenek Kabelac
203affffc7 libdm: enhance thin transaction_id validation
Reuse _node_send_messages just for checking
for a valid transaction_id with preload.

This allows earlier detection of an inconsistent thin pool.

The code does the same thing, except for sending messages.
2014-02-24 21:06:31 +01:00
Zdenek Kabelac
c7b7cb60e4 libdm: hardening transaction_id validation
Improve testing of transaction_id to allow no difference other
than: the kernel TID is equal, or it is lower by one and there
are queued messages for the transaction.

Mark messages as submitted if the transaction_id is already matching.

Do not try to deactivate the node on failure here; leave it to the
proper error path of the caller.
2014-02-24 21:04:50 +01:00
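The accepted cases can be written as a small decision function. This is a hypothetical sketch, not libdm's code; `expected_tid` is assumed to be the id lvm2 expects once the queued messages have been applied:

```c
#include <stddef.h>
#include <stdint.h>

enum tid_result {
	TID_MATCH,          /* kernel already at the expected id: mark messages submitted */
	TID_SEND_MESSAGES,  /* kernel one behind and messages queued: send them */
	TID_MISMATCH        /* anything else: inconsistent pool, caller handles the error */
};

static enum tid_result check_transaction_id(uint64_t kernel_tid, uint64_t expected_tid,
					    size_t queued_messages)
{
	if (kernel_tid == expected_tid)
		return TID_MATCH;
	if (queued_messages && kernel_tid + 1 == expected_tid)
		return TID_SEND_MESSAGES;
	return TID_MISMATCH;
}
```

Note that the function never deactivates anything itself; in line with the commit, a mismatch is only reported and the caller's error path decides what to do.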
Zdenek Kabelac
6116333ccc libdm: proper traversal of revert list
Deactivation of the top-level node has to happen
before traversing the subtree.

Swap the list logic: rather prepend new nodes to the head
and then use normal iteration.

(in-release update)
2014-02-24 21:01:59 +01:00
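A sketch of why prepending helps, assuming (hypothetically) that subvolumes are preloaded before their parent: pushing each preloaded node at the head makes a plain head-to-tail walk reach the top-level node first. `struct revert_item` and both helpers are illustrative names:

```c
#include <stddef.h>

/* Hypothetical revert-list entry: one preloaded device. */
struct revert_item {
	const char *name;
	struct revert_item *next;
};

/* Prepend: the most recently preloaded node ends up at the head. */
static void revert_push(struct revert_item **head, struct revert_item *item)
{
	item->next = *head;
	*head = item;
}

/* Normal head-to-tail walk; returns how many names were written to out. */
static size_t revert_order(const struct revert_item *head, const char **out, size_t max)
{
	size_t n = 0;

	for (; head && n < max; head = head->next)
		out[n++] = head->name;
	return n;
}
```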
Zdenek Kabelac
1911c61639 libdm: call preload callback only when success
Do not call the node's preload callback if there is
any failure during preload.
2014-02-24 21:01:13 +01:00
Zdenek Kabelac
c132fc3ff6 libdm: drop unneeded assignment
2014-02-24 20:59:10 +01:00
Zdenek Kabelac
6e2f706233 cleanup: use struct initializer
2014-02-15 11:36:53 +01:00
Zdenek Kabelac
a508786664 cleanup: indent spaces
2014-02-15 11:36:53 +01:00
Zdenek Kabelac
c651c614ec cache: using unsigned argc
Convert _argc to unsigned.
2014-02-15 11:36:53 +01:00
Zdenek Kabelac
da268eb4cc cache: convert libdm to use plain function call
Avoid introducing a libdm structure allocated by the library user.
Use a direct call with all currently supported args.
When a new arg is added, a new function will cover it.
2014-02-15 11:36:53 +01:00
Zdenek Kabelac
7ec8e691c4 libdm: use 64bit type for raid index
Use a properly signed 64bit constant for shifting.
2014-02-15 11:36:37 +01:00
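The shifting problem in a nutshell: with a plain `1 << idx` the shift is done in `int`, which (assuming a 32-bit `int`) is undefined for an index of 31 or more, whereas a 64-bit constant keeps all 64 bits usable. The helper below is a hypothetical illustration, not libdm code:

```c
#include <stdint.h>

/* Hypothetical: mark RAID image idx in a 64-bit bitmap (e.g. images to rebuild). */
static uint64_t raid_mark_image(uint64_t bits, unsigned idx)
{
	if (idx >= 64)
		return bits;  /* out of range: leave the bitmap unchanged */
	/* UINT64_C(1) makes the shift happen in 64-bit arithmetic. */
	return bits | (UINT64_C(1) << idx);
}
```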
Jonathan Brassow
df181cc51e cache: Add DM interface for retrieving a cache's status
This patch defines a structure for holding all of the device-mapper
cache target's status information.  The associated function provides
an easy way for higher levels (LVM) to consume the information.

This patch finishes the device-mapper interface for the cache and
cachepool segment types (i.e. the cache target).
2014-01-27 05:30:42 -06:00
Jonathan Brassow
1ff7e214e0 cache: New 'cache' segment type
This patch adds the cache segment type - the second of two necessary
to create cache logical volumes.  This segment type references the
cachepool (the small fast device) and the origin (the large slow device);
linking them to create the cache device.  The cache device is the
hierarchical device-mapper device that the user ultimately makes use
of.

The cache segment sources the information necessary to construct the
device-mapper cache target from the origin and cachepool segments to
which it links.
2014-01-27 05:29:35 -06:00
Zdenek Kabelac
0638d1d82e libdm: preload revert after failing callback
Revert activated volumes if callback fails.
This is currently used only for thin_check failure support.

When thin_check detects a failure in the thin metadata device, the volumes
that have been preloaded for thin pool activation are deactivated in reverse order.
After this change, the lvm command will not leave active pool subvolumes
in the dm table.
2014-01-17 10:48:49 +01:00
Zdenek Kabelac
d98511c717 cleanup: indent
2014-01-17 10:48:49 +01:00
Zdenek Kabelac
af7297c73e libdm: pass dnode to callback
Pass the dnode pointer instead of a rather unknown child pointer.
The pointer is currently unused, and passing the child pointer
is quite undefined, while dnode has at least some usability.
2014-01-08 11:57:43 +01:00
Jonathan Brassow
ca51435153 Misc/RAID: Enable resume_lv to handle some renaming conflicts.
When images and their associated metadata are removed from a RAID1 LV,
the remaining sub-LVs are "shifted" down to fill the gaps.  For
example, if there is a 3-way mirror:
	[0][1][2]
and we remove device#0, the devices will be shifted down
	[1][2]
and renamed.
	[0][1]

This can create a problem for resume_lv (specifically,
dm_tree_activate_children) during the renaming process though.  This
is because it will attempt to rename the higher indexed sub-LVs first
and find that it cannot because there are currently other sub-LVs with
that name.  The solution is to check for a conflicting name before
attempting to rename.  If a conflict is found and that conflicting
sub-LV is also in the process of renaming, we can defer the current
rename until the conflicting sub-LV has renamed and cleared the
conflict.
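The deferral can be sketched as repeated passes over the pending renames. The `rnode` struct and `rename_all()` are hypothetical, and the sketch renames in descending index order to reproduce the situation described above:

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical sub-LV: current name plus pending new name (NULL if none). */
struct rnode {
	char name[32];
	const char *new_name;
};

static struct rnode *find_name(struct rnode *v, size_t n, const char *name)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (!strcmp(v[i].name, name))
			return &v[i];
	return NULL;
}

/*
 * Rename all nodes, highest index first. A rename whose target name is
 * still held by another node that is itself renaming is deferred to a
 * later pass. Returns 1 when everything is renamed, 0 on a conflict with
 * a node that is not renaming, or when a whole pass makes no progress.
 */
static int rename_all(struct rnode *v, size_t n)
{
	for (;;) {
		int pending = 0, progress = 0;
		size_t i;

		for (i = n; i-- > 0; ) {
			struct rnode *r = &v[i], *c;

			if (!r->new_name)
				continue;
			c = find_name(v, n, r->new_name);
			if (c && c != r) {
				if (!c->new_name)
					return 0;   /* real conflict */
				pending = 1;        /* defer until c has moved */
				continue;
			}
			snprintf(r->name, sizeof(r->name), "%s", r->new_name);
			r->new_name = NULL;
			progress = 1;
		}
		if (!pending)
			return 1;
		if (!progress)
			return 0;   /* e.g. a swap: each rename waits on the other */
	}
}
```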

Now that resume_lv can handle these types of rename conflicts, we can
remove the workaround in RAID that was attempting to resume a RAID1
LV from the bottom-up in order to force a proper rename in ascending
order before attempting a resume on the top-level LV.  This "hack"
only worked for single machine use-cases of LVM.  Clearing this up
paves the way for exclusive activation of RAID LVs in a cluster.
2013-09-09 15:07:28 -05:00
Alasdair G Kergon
83fb622598 deptree: don't remove live node on resume failure
When resuming a node needed by a higher layer of the tree,
if the resume fails, only remove it if the node did not
originally have a live table.

Ref. 97f8454eccefe29464336ba1823448f4d1fa009b
2013-07-23 13:33:35 +01:00
Zdenek Kabelac
5658ec2bdc libdm: thin pool target sends messages once
Clear the send_messages flag when the messages have been delivered successfully.
There is no need to validate them for all other activations of the same
node in the dm_tree.

Also add an extra debug message which shows the reason for skipping
sending of messages: the transaction_id already has the matching
value.
2013-07-15 15:45:28 +02:00
Zdenek Kabelac
47419d21ac cleanup: stack usage
Shorten code with the macros return_0 and return_NULL.
Add some missing stack prints in error paths.
2013-07-01 23:11:14 +02:00
Jonathan Brassow
8ac9791c36 RAID: s/int/uint32_t for dev_count in dm_status_raid struct
Device count is never negative.  Change 'dev_count' to be
uint32_t instead of int.
2013-06-17 12:58:38 -05:00
Zdenek Kabelac
861fd1108f libdm: move thin max size to header
Move the max size of thin metadata into a define.
Increase the size slightly to match the kernel size
(16978542592 -> 17112760320 bytes).
2013-06-11 14:21:00 +02:00
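The two values quoted above can be checked with a little arithmetic: the new limit is 255 × 2^14 blocks of 4 KiB (16 GiB minus 64 MiB), and the increase is exactly 128 MiB. The decomposition is an observation about the numbers, not necessarily how the define is written in libdm; the macro names are hypothetical:

```c
#include <stdint.h>

/* Sizes quoted in the commit message, in bytes. */
#define OLD_THIN_MAX_MD_BYTES UINT64_C(16978542592)
#define NEW_THIN_MAX_MD_BYTES UINT64_C(17112760320)

/* Hypothetical decomposition: 255 * 2^14 blocks of 4 KiB. */
#define HYP_MD_BLOCKS      (UINT64_C(255) * (1u << 14))
#define HYP_MD_BLOCK_BYTES UINT64_C(4096)

/* Convert bytes to 512-byte sectors. */
static uint64_t bytes_to_sectors(uint64_t bytes)
{
	return bytes / 512;
}
```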
Jonathan Brassow
562c678ee2 DM RAID: Add ability to throttle sync operations for RAID LVs.
This patch adds the ability to set the minimum and maximum I/O rate for
sync operations in RAID LVs.  The options are available for 'lvcreate' and
'lvchange' and are as follows:
  --minrecoveryrate <Rate> [bBsSkKmMgG]
  --maxrecoveryrate <Rate> [bBsSkKmMgG]
The rate is specified in size/sec/device.  If a suffix is not given,
kiB/sec/device is assumed.  Setting the rate to 0 removes the preference.
2013-05-31 11:25:52 -05:00
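A sketch of turning a `--minrecoveryrate`/`--maxrecoveryrate` argument into KiB/sec/device. The function name is hypothetical, and it assumes both letter cases mean binary (1024-based) units and that fractions of a KiB are rounded down; lvm2's own size parser may behave differently:

```c
#include <ctype.h>
#include <stdint.h>
#include <stdlib.h>

/* Returns the rate in KiB/sec/device, or -1 if the argument is invalid. */
static int64_t parse_recovery_rate(const char *arg)
{
	char *end;
	unsigned long long v, mult = 1, divisor = 1;

	if (!isdigit((unsigned char)arg[0]))
		return -1;
	v = strtoull(arg, &end, 10);

	switch (tolower((unsigned char)*end)) {
	case '\0':                       /* no suffix: KiB/sec/device */
	case 'k':
		break;
	case 'b':                        /* bytes */
		divisor = 1024;
		break;
	case 's':                        /* 512-byte sectors */
		divisor = 2;
		break;
	case 'm':
		mult = 1024;
		break;
	case 'g':
		mult = 1024 * 1024;
		break;
	default:
		return -1;
	}
	if (*end && end[1] != '\0')
		return -1;               /* trailing garbage after the suffix */
	if (v > (unsigned long long)INT64_MAX / mult)
		return -1;               /* would overflow */
	return (int64_t)(v * mult / divisor);
}
```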
Zdenek Kabelac
e4dfa785d1 libdm: compensate suspend counter for live table
This patch may not be fully correct. It tries to solve
the imbalanced suspend counter.

The problem starts when some LV is created and fails in the resume path
(e.g. resuming to a large PV (enforced) over small loop devices).

This fails in _resume_node() after dm_task_run(). While the
existing device with an empty table is left in the inactive table,
further calls report that this device is in the suspended state.

When lvm2 later tries to roll back the created device and deactivate it,
it ends with an internal error when we try to decrement the
never-incremented suspend counter.

As an 'easy fix' for now, update the suspend counter only for live nodes.

TODO: explore a better fix.
2013-05-30 17:35:23 +02:00
Zdenek Kabelac
cb587fd100 libdm: free mem pool on err path
Since get_status is also used in dmeventd, which may use one pool
for a single device, a call that repeatedly returned an error would
not free the pool and would cause slow but steady growth.
To stay safe, release any allocated memory in the error path.
2013-05-27 10:30:55 +02:00
Zdenek Kabelac
4707ac7200 libdm: add dm_get_status_snapshot
Add dm_get_status_snapshot() for parsing snapshot status.
2013-05-27 10:30:51 +02:00
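A sketch of what parsing a snapshot status line involves, assuming the kernel's usual formats (`<allocated>/<total> <metadata>` in sectors, or one of the words `Invalid`, `Overflow`, `Merge failed`). The struct and function are hypothetical and do not reproduce the real dm_get_status_snapshot() signature:

```c
#include <inttypes.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical parsed snapshot status. */
struct snap_status {
	uint64_t used_sectors;
	uint64_t total_sectors;
	uint64_t metadata_sectors;
	int invalid, overflow, merge_failed;
};

/* Returns 1 on success, 0 if the status line is not recognised. */
static int parse_snapshot_status(const char *params, struct snap_status *s)
{
	memset(s, 0, sizeof(*s));

	if (!strcmp(params, "Invalid"))
		s->invalid = 1;
	else if (!strcmp(params, "Overflow"))
		s->overflow = 1;
	else if (!strcmp(params, "Merge failed"))
		s->merge_failed = 1;
	else if (sscanf(params, "%" SCNu64 "/%" SCNu64 " %" SCNu64,
			&s->used_sectors, &s->total_sectors,
			&s->metadata_sectors) != 3)
		return 0;

	return 1;
}
```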
Zdenek Kabelac
3ba3bc0d66 cleanup: drop backtrace
After log_error/log_warn there is no point in showing <backtrace>
in the debug log trace from the next code line.
2013-05-27 10:28:32 +02:00
Jonathan Brassow
2e0740f7ef RAID: Add writemostly/writebehind support for RAID1
'lvchange' is used to alter a RAID 1 logical volume's write-mostly and
write-behind characteristics.  The '--writemostly' parameter takes a
PV as an argument with an optional trailing character to specify whether
to set ('y'), unset ('n'), or toggle ('t') the value.  If no trailing
character is given, it will set the flag.
Synopsis:
        lvchange [--writemostly <PV>:{t|y|n}] [--writebehind <count>] vg/lv
Example:
        lvchange --writemostly /dev/sdb1:y --writebehind 512 vg/raid1_lv
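A sketch of splitting the `--writemostly` argument into a PV name and an operation, assuming the PV name itself contains no ':' (so paths such as some /dev/disk/by-path names are out of scope). The enum and function are hypothetical, not lvm2's parser:

```c
#include <string.h>

enum wm_op { WM_SET, WM_UNSET, WM_TOGGLE };

/*
 * Split "<PV>[:{t|y|n}]" into the PV name and the requested operation.
 * Returns 1 on success, 0 on a bad suffix or a name that does not fit.
 */
static int parse_writemostly(const char *arg, char *pv, size_t pv_size, enum wm_op *op)
{
	const char *colon = strrchr(arg, ':');
	size_t len = strlen(arg);

	*op = WM_SET;                       /* no suffix: set the flag */
	if (colon) {
		if (colon[1] == '\0' || colon[2] != '\0')
			return 0;           /* suffix must be exactly one letter */
		switch (colon[1]) {
		case 'y': *op = WM_SET; break;
		case 'n': *op = WM_UNSET; break;
		case 't': *op = WM_TOGGLE; break;
		default: return 0;
		}
		len = (size_t)(colon - arg);
	}
	if (!len || len >= pv_size)
		return 0;
	memcpy(pv, arg, len);
	pv[len] = '\0';
	return 1;
}
```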

The last character in the 'lv_attr' field is used to show whether a device
has the WriteMostly flag set.  It is signified with a 'w'.  If the device
has failed, the 'p'artial flag has priority.

Example ("nosync" raid1 with mismatch_cnt and writemostly):
[~]# lvs -a --segment vg
  LV                VG   Attr      #Str Type   SSize
  raid1             vg   Rwi---r-m    2 raid1  500.00m
  [raid1_rimage_0]  vg   Iwi---r--    1 linear 500.00m
  [raid1_rimage_1]  vg   Iwi---r-w    1 linear 500.00m
  [raid1_rmeta_0]   vg   ewi---r--    1 linear   4.00m
  [raid1_rmeta_1]   vg   ewi---r--    1 linear   4.00m

Example (raid1 with mismatch_cnt, writemostly - but failed drive):
[~]# lvs -a --segment vg
  LV                VG   Attr      #Str Type   SSize
  raid1             vg   rwi---r-p    2 raid1  500.00m
  [raid1_rimage_0]  vg   Iwi---r--    1 linear 500.00m
  [raid1_rimage_1]  vg   Iwi---r-p    1 linear 500.00m
  [raid1_rmeta_0]   vg   ewi---r--    1 linear   4.00m
  [raid1_rmeta_1]   vg   ewi---r-p    1 linear   4.00m

A new reportable field has been added for writebehind as well.  If
write-behind has not been set or the LV is not RAID1, the field will
be blank.
Example (writebehind is set):
[~]# lvs -a -o name,attr,writebehind vg
  LV            Attr      WBehind
  lv            rwi-a-r--     512
  [lv_rimage_0] iwi-aor-w
  [lv_rimage_1] iwi-aor--
  [lv_rmeta_0]  ewi-aor--
  [lv_rmeta_1]  ewi-aor--

Example (writebehind is not set):
[~]# lvs -a -o name,attr,writebehind vg
  LV            Attr      WBehind
  lv            rwi-a-r--
  [lv_rimage_0] iwi-aor-w
  [lv_rimage_1] iwi-aor--
  [lv_rmeta_0]  ewi-aor--
  [lv_rmeta_1]  ewi-aor--
2013-04-15 13:59:46 -05:00
Jonathan Brassow
faeea37057 RAID: Revert previous commit that allowed identical table loads.
Revert commit 31c24dd9f2ad7b5f7913a18c9f11a00d7b3474a1.  This commit
was used to force a RAID device-mapper table to be loaded into the
kernel despite the fact that it was identical to the one already
loaded.  The effect allowed a RAID array with a transiently failed
device to refresh and reintegrate the failed device.  This operation
is better done in the kernel on a 'resume'.  Since
'lvchange --refresh' already performs a suspend/resume cycle, the
above commit is not needed once the kernel change is made.  Reverting
the commit removes an unnecessary (at least for now) change to the
device-mapper interface.
2013-04-11 15:57:14 -05:00
Jonathan Brassow
38f8f4a958 RAID: Capture new RAID kernel sync_action status fields
I've updated the dm_status_raid structure and dm_get_status_raid()
function to make it handle the new kernel status fields that will
be coming in dm-raid v1.5.0.  It is backwards compatible with the
old status line - initializing the new fields to '0'.  The new
structure is also more amenable to future changes.  It includes a
'reserved' field that is currently initialized to zero but could
be used to hold flags describing new features.  It also now uses
pointers for the character strings instead of attempting to allocate
their space along with the structure (causing the size of the
structure to be variable).  This allows future fields to be appended.

The new fields that are available are:
 - sync_action : shows what the sync thread in the kernel is doing
                 (idle, frozen, resync, recover, check, repair, or
                 reshape)
 - mismatch_count: shows the number of discrepancies which were
                   found or repaired by a "check" or "repair"
                   process, respectively.
2013-04-08 15:04:08 -05:00
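A sketch of backwards-compatible parsing of the dm-raid status line `<raid_type> <#devices> <health_chars> <sync_ratio> [<sync_action> <mismatch_cnt>]`. The struct is hypothetical and uses fixed-size arrays for brevity, whereas the commit describes pointers for the strings; fields an older kernel does not report stay zeroed:

```c
#include <inttypes.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical parsed RAID status, loosely following the fields above. */
struct raid_status {
	char raid_type[16];
	uint32_t dev_count;
	char health[64];
	uint64_t insync_regions, total_regions;
	char sync_action[16];     /* "" when the kernel does not report it */
	uint64_t mismatch_count;  /* 0 when the kernel does not report it */
};

/* Returns 1 for an old (4-field) or new (6-field) status line, else 0. */
static int parse_raid_status(const char *params, struct raid_status *s)
{
	int n;

	memset(s, 0, sizeof(*s));
	n = sscanf(params, "%15s %" SCNu32 " %63s %" SCNu64 "/%" SCNu64 " %15s %" SCNu64,
		   s->raid_type, &s->dev_count, s->health,
		   &s->insync_regions, &s->total_regions,
		   s->sync_action, &s->mismatch_count);

	/* Old kernels stop after the sync ratio; newer ones add two fields. */
	return n == 5 || n == 7;
}
```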
Zdenek Kabelac
3fd0242a0a libdm: validate params for NULL
Validate passed params and report error
instead of dereferencing NULL passed argument.
2013-04-05 14:13:12 +02:00