1
0
mirror of git://sourceware.org/git/lvm2.git synced 2025-01-06 17:18:29 +03:00
Commit Graph

572 Commits

Author SHA1 Message Date
Zdenek Kabelac
1bd4b0059b cleanup: use display_percent
Replace occurence of %.2f with call of display_percent function.
2017-06-24 17:44:42 +02:00
Zdenek Kabelac
1095322901 thin: properly check for status for max sizes metadata
When metadata LV size was over DM_THIN_MAX_METADATA_SIZE sectors,
the info() routine was incorrectly trying to match bigger size,
while we do never pass any bigger device.

Fixing a case, where lvs should be displaying status for metadata
LV with 16GB size.
2017-04-12 21:34:08 +02:00
Heinz Mauelshagen
547bdb63e1 dev_manager: remove reshape message
The dm-raid target doesn't support a "reshape" message.
2017-03-03 22:10:21 +01:00
Heinz Mauelshagen
e2354ea344 lvconvert: add infrastructure for RaidLV reshaping support
In order to support striped raid5/6/10 LV reshaping (change
of LV type, stripesize or number of legs), this patch
introduces infrastructure prerequisites to be used
by raid_manip.c extensions in followup patches.

This base is needed for allocation of out-of-place
reshape space required by the MD raid personalities to
avoid writing over data in-place when reading off the
current RAID layout or number of legs and writing out
the new layout or to a different number of legs
(i.e. restripe)

Changes:
- add members reshape_len to 'struct lv_segment' to store
  out-of-place reshape length per component rimage
- add member data_copies to struct lv_segment
  to support more than 2 raid10 data copies
- make alloc_lv_segment() aware of both reshape_len and data_copies
- adjust all alloc_lv_segment() callers to the new API
- add functions to retrieve the current data offset (needed for
  out-of-place reshaping space allocation) and the devices count
  from the kernel
- make libdm deptree code aware of reshape_len
- add LV flags for disk add/remove reshaping
- support import/export of the new 'struct lv_segment' members
- enhance lv_extend/_lv_reduce to cope with reshape_len
- add seg_is_*/segtype_is_* macros related to reshaping
- add target version check for reshaping
- grow rebuilds/writemostly bitmaps to 246 bit to support kernel maximal
- enhance libdm deptree code to support data_offset (out-of-place reshaping)
  and delta_disk (legs add/remove reshaping) target arguments

Related: rhbz834579
Related: rhbz1191935
Related: rhbz1191978
2017-02-24 05:20:58 +01:00
Zdenek Kabelac
12bbfbe89d comment: update
Fix can -> cannot.
2017-02-05 17:55:37 +01:00
Zdenek Kabelac
fb3f4ed72d cleanup: rename to use track_ prefix
Since we use 'track_' prefix for other deps tracking,
convert skip_external_lv to use same logical meaning.
(just converts  1->0  0->1)
2017-02-05 17:55:37 +01:00
Zdenek Kabelac
1f5dde38a7 cleanup: more use of lvseg_name
Use existing function lvseg_name().
2017-01-03 14:55:16 +01:00
Zdenek Kabelac
8c17233af5 debug: add debug message showing new lv
Make trace easier to follow knowing which LV was added to dtree.
2016-12-18 19:38:51 +01:00
Zdenek Kabelac
034931f68d activate: further _info API refinement
Another cleanup of internal _info() API simplifying code.
Also make sure 'error' on _info() call is properly passed upward
(return 0 is error path).
2016-12-18 19:38:51 +01:00
Zdenek Kabelac
69434c2eca cache: improve activation with -real
When cache volume may be converted from normal to -real layer LV
we need to improve logic for call cache_check.

With this patch, we register call for cache_check only when metadata LV
is not yet present in active table slot (should match initial table
load).
This avoids unwanted checking when cache would become layer device
online.
2016-12-18 19:30:50 +01:00
Zdenek Kabelac
bdfc96cb08 raid: fix activation of tracked image
Activation of raid has brough up also splitted image with tracing
(without taking lock for this).

So when raid is now activate - such image is not put into
table (with _rmeta).  When user needs such device, just active it.
2016-12-18 19:10:38 +01:00
Zdenek Kabelac
3331199cc9 cleanup: simplier code 2016-12-05 17:12:42 +01:00
Zdenek Kabelac
81ef4eb4f8 cleanup: indent and messsages updates 2016-12-05 17:12:42 +01:00
Zdenek Kabelac
114f7e6285 dev_manager: use setup_task_run for mknod
Simplify info run for use only for INFO & STATUS.
Drop handling MKNODES within _info_run() call
and use more advanced _setup_task_run() directly.

This allows to further simplify _info_run().
2016-12-05 17:12:39 +01:00
Zdenek Kabelac
5163b8f697 dev_manager: extend setup_task
Integrate also query for inactive table and
handle dm_task_run() and dm_task_get_info()
(thus switching to setup_task_run)

Add one exception case for DM_DEVICE_TARGET_MSG.

This allows further shortening and simplification of all
other users of this function.
2016-12-05 17:11:49 +01:00
Zdenek Kabelac
ed93f0973a activation: lv_info_with_seg_status unify status selection
Start moving selection of status taken for a LV into a single place.
The logic for showing info & status has been spread over multiple
places and were doing too complex decision going agains each other.

Unify selection of status of origin & cow scanned device.

TODO: in future we want to grab status for LV and layered LV and have
both statuses present for display - i.e. when 'old snapshot'
of thinLV is takes and there is ongoing merge - at some moment
we are not capable to show all needed info.
2016-12-05 17:09:13 +01:00
Zdenek Kabelac
5ba2d58d28 activation: improve error handling for status reading
When lvm2 wants to see a status, it needs to validate,
segment for status reading is matching whan lvm2 expects in
metadata.

Also ensure status failure will not cause '0' from info reading
when actual info was collected properly.
Failure in 'status' reading is considered to be
a 'log_warn()' event only.
2016-12-05 17:05:17 +01:00
Zdenek Kabelac
4a4b22e114 activation: status check switch to warn
When we can't parse status, switch to warning as this is not
considered an errornous case.  LVS is not supposed to return
error status code when  device is not what it's been expected to
be - but it should be WARNING a user there is something unexpected.
2016-12-05 17:04:24 +01:00
Zdenek Kabelac
1a4f13eb6e cleanup: add some dots and use display_lvname
Just some more VG/LV printing.
2016-11-25 15:01:27 +01:00
Zdenek Kabelac
1d58074d9f debug: more stacktrace corrections
Continue previous patch dropping some unneeded stack traces
after printed log_error/warn messages.
2016-11-25 14:58:28 +01:00
Zdenek Kabelac
a7691cdebb cleanup: debug trace and indent change 2016-11-11 16:58:20 +01:00
Zdenek Kabelac
1bdcb01f63 cleanup: add dots to debug messages 2016-11-11 16:58:20 +01:00
Zdenek Kabelac
d8fc4d093e conf: support zero for missing_stripe_filler
Make it easier to replace missing segments with 'zero' returning
target - otherwise user would have to create some extra target
to provide zeros as /dev/zero can't be used (not a block device).

Also break code loop when segment is found and make it an INTERNAL_ERROR
where it's missing.
2016-11-11 16:58:16 +01:00
Zdenek Kabelac
05dc70e22e cache: fixed copy&paste from last commit 2016-09-09 21:26:38 +02:00
Zdenek Kabelac
454b891f6d cache: fix reporting of dirty cache
When cache has zero used blocks it's been wrongly reported as 100.00% dirty.
Fix it and report 0.00.
2016-09-09 20:53:36 +02:00
Zdenek Kabelac
458918b319 thin: fix link validation for unused thin-pool
This patch fixes link validation for used thin-pool.
Udev rules correctly creates symlinks only for unused new thin-pool.
Such thin-pool can be used by foreing apps (like Docker) thus
has  /dev/vg/lv link.
However when thin-pool becomes used by thinLV - this link is no
longer exposed to user - but internal verfication missed this
and caused messages like this to be printed upon 'vgchange -ay':

The link /dev/vg/pool should have been created by udev but it was not
found. Falling back to direct link creation.

And same with 'vgchange -an':
The link /dev/vg/pool should have been removed by udev but it is still
present. Falling back to direct link removal.

This patch ensures only unused thin-pool has this link.
2016-07-01 00:44:46 +02:00
Zdenek Kabelac
f7f395667e lvstatus: enhance seg_status to handle snapshot
Add more code to properly store status for snapshot segment
maintaining lvm2 fiction of COW and snapshot internal volumes.

The key issue here is however not though-through reporting
logic - as there is no single answer for whole line state.
It not counting with layer and we may need few more ioctl to
cover all reporting needs depending upon what is actually
needed.

In reality we need to 'cache' more ioctl status queries for
individual LVs and their segments (so they checked at most once).

The other 'hard' topic for conversion is mirror segment handling.

Also we definitelly need to relocate some logic into segment's methods,
yet it might be complex as we have not clear border between targets.

TODO: define more clearly how are reporting fields defined in case
we 'stack' volumes like   -   cache of stacked  thin LV snapshot origin.
2016-05-27 15:47:24 +02:00
Zdenek Kabelac
a4f8d1165c setup_task: add with_flush
To get better control when flushing is used add extra arg when
setting up dm task.

By default now check dm device status without flush.
(At this moment this should effect only thin and cache volumes).

Also switch dev_manager_thin_pool_status() to use more
readable 'flush' parameter instead of 'no_flush'.
2016-05-27 15:47:24 +02:00
Alasdair G Kergon
b896f7de1e raid0: Standardise meta_areas checks before access. 2016-05-23 22:55:13 +01:00
Alasdair G Kergon
bf8d00985a raid0: Add raid0 segment type.
This remains experimental and quite restrictive so should only be used
for testing at this stage.  (E.g. lvreduce is not supported.)
2016-05-23 16:46:38 +01:00
Zdenek Kabelac
9c083d34af debug: use display_lvname
Add some tracing message
2016-05-19 18:40:14 +02:00
Zdenek Kabelac
99e96f3ce9 coverity: fix error paths
Patch 74e704bb44 missed to update
error path. Since now we just need to 'return_0' as 'dmt is NULL
and thus may not be destroyed.
2016-04-22 01:12:34 +02:00
Zdenek Kabelac
7e5881dd13 snapshot: improve merge
Improve code for snapshot merge for readabilty
and also reduce number of tests needed to decide
if merging can or cannot be started.

(Further improving 9cccf5245a)
2016-04-18 23:05:17 +02:00
Zdenek Kabelac
f5cf6cc9ed debug: fix error message
log_error is replaced with warn since the result
of function is not result in command error.
2016-04-18 12:32:56 +02:00
Zdenek Kabelac
9cccf5245a thin: improve recognizing of merge in progress
Rewrite too condensed condition in more readable form,
wher decision are clearly separated and commented and
also add debug messages for them.
2016-04-18 12:32:52 +02:00
Zdenek Kabelac
e2cd882f2e thin: check runtime table line for merging
To recognize in runtime if we are merging or not
to make consistent decision between suspend and resume
add function to parse thin table line when add
merging thin device to the table.
2016-04-18 11:18:35 +02:00
Alasdair G Kergon
6954de22e2 activate: Improve snapshot merge initiation
A snapshot merge into its origin cannot be initiated while the devices
are in use.  If there is outside interference (such as from udev),
the suspend (preload) and resume stages can reach conflicting decisions
about whether or not to proceed.

Try to make the logic more robust by checking the inactive or live
table during resume.  (This is still not perfect.)
2016-04-15 02:33:38 +01:00
Alasdair G Kergon
150cff9cce activation: Log when pending snap merge postponed. 2016-04-14 23:42:56 +01:00
Zdenek Kabelac
1be74cfd7f cleanup: gcc warn about comparing int with uint 2016-04-12 11:47:51 +02:00
Zdenek Kabelac
5cfa6cb347 cleanup: simplier to read condition
Make more readable what we are looking for and just test for
KERNEL version at one place.
2016-04-08 20:20:16 +02:00
Zdenek Kabelac
74e704bb44 cleanup: reuse _setup_task
Shorten code and use common code from _setup_task.
Reorder naming of major:minor sscanf (as later it's been
also used swapped there was no real bug).
2016-04-08 20:20:16 +02:00
Zdenek Kabelac
a09d65891f dev_manager: device_is_usable does not flush
When scanning if device is being usable as PV,
we call STATUS - but this status should not cause
any flushing.
Skip also open_count information as it's not needed.
2016-04-08 20:20:04 +02:00
Alasdair G Kergon
f40dfb48ad activation: Skip another non-prefixed info ioctl.
_percent() also does a lookup by dm uuid.
Also get kernel version from cmd->kernel_vsn.
2016-04-08 16:27:12 +01:00
Zdenek Kabelac
8b2108e6b1 activation: do not check for devs without LVM-
Devices without "LVM-" uuid prefix have been generated by very old
version of lvm2 2.00 and 2.01.
Since version 2.02 all lvm2 devices are using prefix "LVM-".

However checking for present of ancient non prefixed devices does
take extra IOCTL per every call and for majority of todays user
it will not find anything new.

So use the assumption that users with kernel 3.X and newer are not
really using such old versions of lvm2 (year <2005) and with their
new kernel they are also using new version of lvm2 and skip
checking for them.

This change also makes trace logs more readable.
2016-04-07 22:32:08 +02:00
Zdenek Kabelac
5b2227c2c1 preload: preserve flushing state
When leaving preload routine it should not altet state of flush required
when it's been already set to 1 and drop it to 0.

The API here is unclean, but current usage expects the same
variable pointer is for all preload calls and combines 'flush_required'
across all of them.
2016-04-06 11:31:02 +02:00
Zdenek Kabelac
7c1937f8df suspend: fix suspend with noflush limitation
Commit 844b009584 tried to move
limit for usage of noflush to 'preload' however this was not
correctly processed.

Intead explicitly check for which types we do not want noflush
and also add debug message in this case.
2016-04-06 11:31:02 +02:00
Zdenek Kabelac
d4c03cf7b2 mirror: fix flushing for mirror target
Fix regression caused by commit ba41ee1dc9.
The idea was to use no_flush for changed device only for thin volumes
and thin pools but also to merge this with change made in commit
844b009584.

However the resulting condition has caused misbehavior for the mirror
suspend - as that has been before the ONLY allowed target type
that could have been suspended with noflush.

Result was badly working repair for --type mirror that has been
passing 'flush' to the repaired mirror target whenever preload
wrongly set flush_required.

The origin code has required the flush_required to be set whenever
deivce size is changed.

Now it first detects if device size got smaller
'dm_tree_node_size_changed(root) < 0' - this requires flush.
Otherwise it checks device is not thin volume nor thin pool and its
size has changed (got bigger) and requires flush.

This mean upsize of thin-pool or thin volume will not require flush.
2016-04-06 11:31:02 +02:00
Alasdair G Kergon
60befab773 Revert "thin: display highest mapped sector"
This reverts commit fc7dacaa4c.

Let's put this information into a separate field.  It doesn't meet the
definition of the existing field.
2016-04-01 20:09:38 +01:00
Zdenek Kabelac
5034bb8d18 cleanup: use local var to read struct 2016-03-31 12:21:40 +02:00
Zdenek Kabelac
b10253ab4d cleanup: use TARGET define 2016-03-31 12:21:40 +02:00
Zdenek Kabelac
fc7dacaa4c thin: display highest mapped sector
Use  meta%  to expose highest mapped sector in thinLV.
so showing there 100.00% means thinLV maps latest sector.

Currently using a 'trick' with total_numerator to pass-in
device size when  'seg==NULL'

TODO: Improve device status API per target - current 'percentage'
is not really extensible.
2016-03-31 12:20:43 +02:00
Zdenek Kabelac
8bbec41bd4 thin: no thin-pool flush when reading metadata status
Previous fix missed the fact the we do query for 'percent' with
seg value either set or unset (API overload...)
When 'seg' was unset, we still issue flush with status.
Fix it by cheking segtype by target_type.

As we check for segtype - we could also skip whole percentage
if the 'segtype' is unknown by code directly.

Reported-by: Ming-Hung Tsai <mingnus gmail com
2016-03-31 12:15:47 +02:00
Alasdair G Kergon
1216efdf15 activate: Use macros for target and module names. 2016-03-22 17:46:15 +00:00
Zdenek Kabelac
5c415afd85 cache: check for cache fail during flush
Just WARN if the cache can't be flushed because it's failed.
2016-03-10 18:38:53 +01:00
Zdenek Kabelac
ddfec5b51a coverity: use same arithmetic for both major and minor
Run all arithmetic in the same 'dev_t' type.
2016-02-23 21:40:17 +01:00
Zdenek Kabelac
68955a8102 coverity: ensure non-null pointers are used
Here is too complex for Coverity to guess
those pointers cannot be NULL, but it's
very easy to add little checks here.
2016-02-23 21:40:16 +01:00
Zdenek Kabelac
dbc71dc05e gcc: cleanup some sign warnings
When comparing unsigned with int, the comparision is made
as 'unsigned' type, so make it rather explicit which type
is being compared.
2016-02-23 12:25:25 +01:00
Zdenek Kabelac
293aabe4cd cache: enforce header check
Currently it's been checked for 'zero' header for thin-pool,
but lets use it always for cache as well - since it's relatively 'cheap'
detection of read 'error' problems as thin/cache tools
currently do not work fast enough in this case.
2016-02-23 12:25:25 +01:00
Zdenek Kabelac
f501f083bf thin: fix read size compare
Fix the compare with 'unsigned' sizeof() and error read -1 result.
So the read error is correctly recognized.
2016-02-23 12:22:18 +01:00
Zdenek Kabelac
f31d596c0d thin: report needs_check and fail state
Fix reporting of Fail thin-pool target status
as attr[8] letter 'F'.

Report  'needs_check' status from thin-pool target via
attr field [4] (letter 'c'/'C'), and also via CheckNeeded field.

TODO: think about better name here?
TODO: lots of prop_not_implemented_set
2016-02-18 16:49:34 +01:00
Zdenek Kabelac
fcbef05aae doc: change fsf address
Hmm rpmlint suggest fsf is using a different address these days,
so lets keep it up-to-date
2016-01-21 12:11:37 +01:00
Zdenek Kabelac
8b16efd17c debug: correct stack tracing
Here the 'goto' is correct path, as  !device_is_usable
is traceable with <backtrace>.

Keep the 'stack' for unusable device.
2015-12-04 22:10:30 +01:00
Zdenek Kabelac
86e7894ecc cleanup: use dm_get_status_mirror
Use libdm function to parse mirror status report.
2015-12-01 13:03:16 +01:00
Zdenek Kabelac
6336ef98d4 lib: pass mem pool to check_transient_status
check_transient_status() may need to allocate some memory,
so pass in already existing mem pool.
2015-12-01 13:01:28 +01:00
Zdenek Kabelac
922fccc656 cleanup: using display_lvname
Use for showing vgname/lvname in messages.
No functional change.
2015-11-26 09:27:37 +01:00
Zdenek Kabelac
b7b59ad932 cleanup: remove unused code
Remove long outstand unused code lines, which were already
been obsoleted by other code.

Statuses and snapshot tree creation is already handled differently.

Also drop some 'extra' log_error() and use only stack;
since error has already been reported.
2015-11-26 09:27:37 +01:00
Zdenek Kabelac
528695ec20 cleanup: avoid allocation for vg_name
Since we do not use dev_manager in a way we would have destroyed VG
content while  in-use - we could safely keep just pointer.
So dropping strdup.

Also it seems we actually no longer use vg_name for anything
so it may possibly go away completely unless it would be useful
for debugging...
2015-11-26 09:27:37 +01:00
Zdenek Kabelac
0285066e10 thin: fix previous update of partial tree building
We do want to preserve 'active' thin-pool,
so add this 'fake' layer only when activating.

TODO:  think how to use thin-pool without fake LV layer.
2015-11-24 23:24:11 +01:00
Zdenek Kabelac
94c9453659 thin: work with active thin-pool
When 'lvextend -L+XX vg/thinpool'  do not leave inactive table
loaded for 'wrapping' LV on top of resized thin-pool
(ATM we use linear  LV for this with same size as thin-pool).
2015-11-23 23:41:36 +01:00
Zdenek Kabelac
a45cc0fe14 raid: fix the string compare
Coverity noticed this condition is always false and the error
path could never be visited.

So check for all mismatches of supported messages
and actually mark log_error as internal error.
2015-11-10 21:40:28 +01:00
Zdenek Kabelac
164d7e72bf devmanager: validate target params
Coverity: ensure we do not read through NULL pointers for
target_type and params.
2015-11-09 10:19:20 +01:00
Zdenek Kabelac
80c3fb786c thin: fix error path mem leak
Coverity: when parsing of thin-pool status would have failed,
it could have leaked memory pool and dmt struct.
2015-11-09 10:19:19 +01:00
Zdenek Kabelac
ba41ee1dc9 thin: limit no-flush using only for thin-pool
For this release keep usage of 'noflush' only for thin-volume/pool.

For rest of keep - keep usage of 'noflush' flag purely for
non-resized mirrors.
2015-10-26 23:57:31 +01:00
Zdenek Kabelac
f898cf7539 dev_manager: no flush for extension
Recognize the target only 'extends' and do not enforce
'flush' in this case.  Only the size reduction
still requires flush (so disables usage of no_flush flag).

If some other targets do require flush before suspend,
they have to explicitly ask for it.
2015-10-25 21:09:31 +01:00
Zdenek Kabelac
844b009584 dev_manager: enabled no_flush for suspend
While the activation code tries to evaluate which target
really needs flush with suspend and which may go without flush,
it has stayed effectively disabled by original commit:
33f732c5e9 since here
it only allows to pass non-pvmoving  'mirrors'.

So remove check for mirror LV type and only disable
no_flush for 'pvmove'..

TODO: Looking into history - it also seemed like raid target
would have always required flushing but it's been later
removed without clean explanation.

If some more targets really do need 'no_flush' it should
been handle at their 'level' - since we now stack multiple
targets over itself.
2015-10-25 21:07:37 +01:00
Alasdair G Kergon
39a97d86f0 segtypes: Add and use new segtype macros.
Includes fixing an inverted raid10 segtype check in _raid_add_target_line.
2015-09-24 14:59:07 +01:00
Alasdair G Kergon
214e2cddf6 segtypes: Use SEG_TYPE_NAME_ string constants. 2015-09-22 19:04:12 +01:00
Zdenek Kabelac
ee8200f1c6 cleanup: use just 2 decimal digits 2015-09-03 23:34:37 +02:00
Zdenek Kabelac
872ea3b987 thin: do not flush when quering for thin percent
Since we may easily get blocked when checking for percentage
of thin-pool - do not flush and just show current values.
This avoids holding VG locked when pool is overfilled.
2015-09-03 23:34:36 +02:00
Zdenek Kabelac
a01eb9c451 thin: detect unusable thins
Try to detect thin-pool which my block lvm2 command from furher
processing (i.e. lvextend).

Check if pool is read-only or out-of-space and in this case thins
will skipped from being scanned (so user may miss some PVs located
on thin volumes).
2015-09-03 23:34:36 +02:00
Peter Rajnoha
ac3143c093 config: {thin,cache}_{check,repair}_options are never undefined
Require global/{thin,cache}_{check,repair}_options to be always defined.
If not defined directly by user in the configuration and if there's no
concrete default option to use, make "" (empty string) the default one -
it's then clearly visible in the "lvmconfig --type default" (and
generated lvm.conf) and also it makes its handling in the code more
straightforward so we don't need to handle undefined values.

This means, if there are no default values for these settings defined,
we end up with this generated now:
  {thin,cache}_{check,repair}_options = [ "" ]

So the value is never undefined and if it is, it's an error.

(The cache_repair_options is actually not used in the code at the moment,
but once the code using this setting is in, it will follow the same logic
as used for thin_repair_options.)
2015-07-14 10:13:41 +02:00
Peter Rajnoha
3b6840e099 config: replace find_config_tree_node with find_config_tree_array where appropriate 2015-07-08 13:03:08 +02:00
Zdenek Kabelac
0ac20a8fdb cache: support clear-needs-check
Support newer cache tool which support new option
--clear-needs-check-flag.

Code does same as for thin_check.
2015-07-07 09:57:27 +02:00
Zdenek Kabelac
a900d150e4 thin: move pool messaging from resume to suspend
Existing messaging intarface for thin-pool has a few 'weak' points:

* Message were posted with each 'resume' operation, thus not allowing
activation of thin-pool with the existing state.

* Acceleration skipped suspend step has not worked in cluster,
since clvmd resumes only nodes which are suspended (have proper lock
state).

* Resume may fail and code is not really designed to 'fail' in this
phase (generic rule here is resume DOES NOT fail unless something serious
is wrong and lvm2 tool usually doesn't handle recovery path in this case.)

* Full thin-pool suspend happened, when taken a thin-volume snapshot.

With this patch the new method relocates message passing into suspend
state.

This has a few drawbacks with current API, but overal it performs
better and gives are more posibilities to deal with errors.

Patch introduces a new logic for 'origin-only' suspend of thin-pool and
this also relates to thin-volume when taking snapshot.

When suspend_origin_only operation is invoked on a pool with
queued messages then only those messages are posted to thin-pool and
actual suspend of thin pool and data and metadata volume is skipped.

This makes taking a snapshot of thin-volume lighter operation and
avoids blocking of other unrelated active thin volumes.

Also fail now happens in 'suspend' state where the 'Fail' is more expected
and it is better handled through error paths.

Activation of thin-pool is now not sending any message and leaves upto a tool
to decided later how to finish unfinished double-commit transaction.

Problem which needs some API improvements relates to the lvm2 tree
construction. For the suspend tree we do not add target table line
into the tree, but only a device is inserted into a tree.
Current mechanism to attach messages for thin-pool requires the libdm
to know about thin-pool target, so lvm2 currently takes assumption, node
is really a thin-pool and fills in the table line for this node (which
should be ensured by the PRELOAD phase, but it's a misuse of internal API)
we would possibly need to be able to attach message to 'any' node.

Other thing to notice - current messaging interface in thin-pool
target requires to suspend thin volume origin first and then send
a create message, but this could not have any 'nice' solution on lvm2
side and IMHO we should introduce something like 'create_after_resume'
message.

Patch also changes the moment, where lvm2 transaction id is increased.
Now it happens only after successful finish of kernel transaction id
change. This change was needed to handle properly activation of pool,
which is in the middle of unfinished transaction, and also this corrects
usage of thin-pool by external apps like Docker.
2015-07-03 16:13:14 +02:00
Peter Rajnoha
0a203070f5 cleanup: missing target_type check in device_is_usable filter 2015-06-17 14:27:48 +02:00
Peter Rajnoha
5577f2f4f0 cleanup: || instead of |
More efficient with same result here.
2015-06-17 14:12:58 +02:00
Peter Rajnoha
1e6a926e85 filter: filter-usable: consider snapshot and origin LV as unusable if its component is suspended
Note that this is just a quick fix and it needs more robust fix to
encompass any combination, not just the (old) snapshot one!

This started with this report:
  https://bugzilla.redhat.com/show_bug.cgi?id=1219222

If we have devices/ignore_suspended_devices=1 set based on which we
filter out suspended devices as unusable (or if we ignore suspended
devices by force, e.g. during lvconvert called from dmeventd) and
when we have snapshot and snapshot origin devices in the play, we
need to look at their components unerneath (*-real and *-cow) to
check if they're not suspended. If they are, the snapshot/snapshot
origin is not usable as well and hence it needs to be filtered out
by filter-usable.c code which does suspended device filtering.

Not going into much details here, more details are in the bugzilla
mentioned above. However, this is a quick fix since snapshot
and this exact situation is not the only one. So this is
something that needs to be revisited and fixed properly
with full dm tree and checking the whole stack to state
whether the device at the very top is usable or not.
2015-06-17 13:37:53 +02:00
Zdenek Kabelac
e7eb5b0696 debug: better tracing messages
Enhance traced output.
2015-06-15 14:48:06 +02:00
David Teigland
95da21cc18 config: fix check_options array
The code used it as both a single string, and as
an array of strings in different places.  Fix it
so that it's an array of strings everywhere.
2015-04-23 10:35:34 -05:00
Zdenek Kabelac
d2d3f0d747 cleanup: use macro lv_is_visible() 2015-01-28 13:45:27 +01:00
Zdenek Kabelac
b3a348c03c report: use same info also for lv_attr
Recently the single 'status' code has been used for number of cache
features.

Extend the API a little bit to allow usage also for lv_attr_dup.

As the function itself is used in lvm2api - add a new function:
lv_attr_dup_with_info_and_seg_status() that is able to use
grabbed info & status information.

report_init() is now using directly passed lvdm struct pointer
which holds the infomation whether lv_info() was correctly obtained or
there was some error when trying to read it.

Move 'healt' attribute to status.
TODO convert raid function to use the already known status.
2015-01-20 14:58:41 +01:00
Zdenek Kabelac
0869631d7d lv_status: enable lv_status for thinpool
Support also status for thin pools.
2015-01-14 14:50:08 +01:00
Zdenek Kabelac
d0f26440ee cleanup: properly align code lines
Misaligned indetion in branches.
2015-01-14 14:50:08 +01:00
Peter Rajnoha
c0e17bca90 dev_manager: do not mark snapshot origins as unusable devices just because of possible blocked mirror underneath
At first, all snapshot-origins where marked as unusable unconditionally
here, but we can't cut off whole snapshot-origin use in a stack just
because of this possible mirror state. This whole "device_is_usable"
check was even incorrectly part of persistent filter before commit
a843d0d97c66aae1872c05b0f6cf4bda176aae2 (where filter cleanup was
done).

The persistent filter is used only if obtain_device_list_from_udev=0,
which means that the former check for snapshot-origin here had not even
been hit with default configuration for a few years before commit
a843d0d97c66aae1872c05b0f6cf4bda176aae2 (the check for snapshot-origin and
skipping of this LV was introduced with commit a71d6051ed
back in 2010).

The obtain_device_list_from_udev=1 (and hence not using persistent
filter and hence not hitting this check for snapshot-origins and skipping) has been
in action since commit edcda01a1e (that is 2011).
So for 3 years this condition was not even checked with default configuration,
making it superfluous.

This all changed in 2014 with commit 8a843d0d97
where "filter-usable" is introduced  and since then all snapshot-origins
have been marked as unusable more often than before and making snapshot-origins
practically unusable in a stack.

This patch removes this incorrect check from commit a71d6051ed
which caused snapshot-origins to be unusable more often recently.

If we want to fix this eventually in a correct way, we need to look
down the stack and if snapshot-origin is hit and there's a blocked
mirror underneath, only then mark the device as unusable. But mirrors
in stack are not supported anymore so it's questionable whether it's
worth spending more time on this at all...
2015-01-09 11:24:16 +01:00
Zdenek Kabelac
4dc602f79b dev_manager: fix mknodes
Fix regression introduced with a2c1024f6a

_setup_task(mknodes ? name : NULL...

has been replaced with:

_setup_task(type != MKNODES ? name : NULL....

Use '=='
2014-11-22 09:57:31 +01:00
Zdenek Kabelac
428b9fcd87 cleanup: validate pointers
Mostly on almost impossible to happen paths - but stay safe.
2014-11-13 17:49:42 +01:00
Zdenek Kabelac
c3e2990359 cleanu: drop duplicate const 2014-11-13 13:15:58 +01:00
Zdenek Kabelac
fba86dd42b cache: improve pending_delete
We need to stop guessing deleted names - so rather collect
deleted  UUID into a string list - and then remove them properly
in _clean_tree. Restore origin _clean_tree behaviour them for
currently unconverted removal of snapshots.

Pending delete feature now properly tracks whole subtree of cache
(so i.e. data or metadata as raid volumes).
It properly replaces all related volumes with 'errors' in suspend
preload, then resume them as error and remove collected UUIDs
from root - since they are not longer part of any volume deps.
2014-11-13 11:54:41 +01:00
Peter Rajnoha
359dc6fa76 coverity: commit ba2302346 - report log_sys_error properly
log_sys_error uses errno, hence we need to report the first
failure before reporting another failure that uses errno as well.
2014-11-12 15:16:54 +01:00
Peter Rajnoha
ce8730b508 coverity: fix possible integer overflow
LVM2.2.02.112/lib/metadata/cache_manip.c:73: overflow_before_widen: Potentially overflowing expression "*pool_metadata_extents *vg->extent_size" with type "unsigned int" (32 bits, unsigned) is evaluated using 32-bit arithmetic, and then used in a context that expects an expression of type "uint64_t" (64 bits, unsigned).
LVM2.2.02.112/lib/activate/dev_manager.c:217: overflow_before_widen: Potentially overflowing expression "seg_status->seg->len * extent_size" with type "unsigned int" (32 bits, unsigned) is evaluated using 32-bit arithmetic, and then used in a context that expects an expression of type "uint64_t" (64 bits, unsigned).
LVM2.2.02.112/lib/activate/dev_manager.c:217: overflow_before_widen: Potentially overflowing expression "seg_status->seg->le * extent_size" with type "unsigned int" (32 bits, unsigned) is evaluated using 32-bit arithmetic, and then used in a context that expects an expression of type "uint64_t" (64 bits, unsigned).
2014-11-12 10:03:27 +01:00
Peter Rajnoha
60cc666c94 coverity: fix compiler warning
LVM2.2.02.112/lib/activate/dev_manager.c:196:5: warning: 'dmtask' may be used uninitialized in this function [-Wmaybe-uninitialized]

In _info_run fn:

switch (type) {
	case INFO:
		...
	case STATUS:
		...
	case MKNODES:
		...
}

The "type" is enum and currently only those three types are supported,
but if we added a new type in the future, this would end up with a bug
(if we forgot to add the new "case" in that "switch"). So let's make
sure proper internal error is printed:

	default:
		log_error(INTERNAL_ERROR "_info_run: unhandled info type");
                return 0;
2014-11-12 09:55:12 +01:00
Zdenek Kabelac
57c618b0ed cache: fix clean_tree
Fix 8121074fda - the patch
incorrectly removed also other top-level nodes.

It needs to deactivate purely subnodes of _corig.
2014-11-12 09:40:27 +01:00
Peter Rajnoha
ba23023464 coverity: fix resource leaks
LVM2.2.02.112/tools/toollib.c:1991: leaked_storage: Variable "iter" going out of scope leaks the storage it points to.
LVM2.2.02.112/lib/filters/filter-usable.c:89: leaked_storage: Variable "f" going out of scope leaks the storage it points to.
LVM2.2.02.112/lib/activate/dev_manager.c:1874: leaked_handle: Handle variable "fd" going out of scope leaks the handle.
2014-11-12 09:19:14 +01:00
Peter Rajnoha
9704515c1e dev_manager: only support status for cache segment at the moment
When getting status for LV segment types, we need to be sure
that proper segment is selected for the status ioctl.

When reporting fields that require status ioctl,
the "_choose_lv_segment_for_status_report" fn in tools/reporter.c
must be completed properly to choose the proper segment for all
the LV types (at the moment, it just takes the first LV segment
by default).

This works fine with cache LVs surely. The other segment types
need more auditing. We use this status ioctl only for cache status
fields at the moment only, so restrict it to the cache only.

Once the _choose_lv_segment_for_status_report is completed
properly, release the restriction in _get_segment_status_from_target_params.
2014-11-11 15:02:21 +01:00
Zdenek Kabelac
8121074fda cache: pending_delete fixes 2014-11-11 13:32:41 +01:00
Zdenek Kabelac
9a6e3683a2 cache: never create new table entry for deleted cache 2014-11-11 13:32:41 +01:00
Peter Rajnoha
a2c1024f6a dev_manager: enhance dev_manager_info to acquire LV segment status if requested, add lv_info_with_seg_status fn 2014-11-11 13:04:02 +01:00
Zdenek Kabelac
ca509c9746 dev_manager: workaround to allow top-level _tmeta, _tdata 2014-11-11 00:53:37 +01:00
Zdenek Kabelac
02f49caa35 debug: log tree type is created
Print tree type and use internal_error for unknown type.
2014-11-10 22:05:49 +01:00
Zdenek Kabelac
e5d3f81285 cleanup: indents comments backtraces 2014-11-10 22:05:49 +01:00
Zdenek Kabelac
f5e265a07f cache: use LV_PENDING_DELETE 2014-11-10 22:05:49 +01:00
Zdenek Kabelac
00a45ca491 thin: new pool is activated without overlay
Activate of new/unused/empty thin pool volume skips
the 'overlay' part and directly provides 'visible' thin-pool LV to the user.

Such thin pool still gets 'private' -tpool UUID suffix for easier
udev detection of protected lvm2 devices, and also gets udev flags to
avoid any scan.

Such pool device is 'public' LV with regular /dev/vgname/poolname link,
but it's still 'udev' hidden device for any other use.

To display proper active state we need to do few explicit tests
for this condition.

Before it's used for any lvm2 thin volume, deactivation is
now needed to avoid any 'race' with external usage.
2014-11-04 15:29:22 +01:00
Zdenek Kabelac
ee627884de thin: no validation skip of new thin pools
Allowing 'external' use of thin-pools requires to validate even
so far 'unused' new thin pools.

Later we may have 'smarter' way to resolve which thin-pools are
owned by lvm2 and which are external.
2014-11-04 15:28:00 +01:00
Zdenek Kabelac
1b439a0b8e cleanup: rename function
Make more clear dm_info type.
2014-11-03 14:19:34 +01:00
Zdenek Kabelac
d6c5445bea cleanup: correcting tracing
Use log_error for real error.
2014-11-03 14:19:34 +01:00
Zdenek Kabelac
29bd3cccc8 cache: support activation of empty cache-pool
When the cache pool is unused, lvm2 code will internally
allow to activate such cache-pool.

Cache-pool is activate as metadata LV, so lvm2 could easily
wipe such volume before cache-pool is reused.
2014-11-03 14:19:33 +01:00
Zdenek Kabelac
ab49120465 cache: lv_cache_status
Replace lv_cache_block_info() and lv_cache_policy_info()
with lv_cache_status() which directly returns
dm_status_cache structure together with some calculated
values.

After use  mem pool stored inside lv_status_cache structure
needs to be destroyed.
2014-11-03 14:19:33 +01:00
Zdenek Kabelac
13e6369d7f cleanup: add arg to _setup_task
Add init of  no_open_count into _setup_task().
Report problem as warning (cannot happen anyway).

Also drop some duplicated debug messages - we have already
printed the info about operation so make log a bit shorter.
2014-11-03 14:19:33 +01:00
Peter Rajnoha
00d8ab8492 refactor: make it possible to select what to check exactly when calling device_is_usable fn
Currently, there are 5 things that device_is_usable function checks
(for DM devices only, of course):
  - is device empty?
  - is device blocked? (mirror)
  - is device suspended?
  - is device composed of an error target?
  - is device name/uuid reserved?

If answer to any of these questions is "yes", then the device is not usable.
This patch just adds possibility to choose what to check for exactly - the
device_is_usable function now accepts struct dev_usable_check_params make
this selection possible. This is going to be used by subsequent patches.
2014-09-30 13:11:58 +02:00
Zdenek Kabelac
84cdf85bd2 cleanup: constify activation usage of lv pointer
Let's enforce cheking of write access to LV by compiler.
Activation part does never need to write anything to LV
so keep LV pointer const.
2014-09-24 10:54:47 +02:00
Alasdair G Kergon
979be63f25 mirrors: Fix checks for mirror/raid/pvmove LVs.
Try to enforce consistent macro usage along these lines:

lv_is_mirror - mirror that uses the original dm-raid1 implementation
               (segment type "mirror")
lv_is_mirror_type - also includes internal mirror image and log LVs

lv_is_raid - raid volume that uses the new dm-raid implementation
             (segment type "raid")
lv_is_raid_type - also includes internal raid image / log / metadata LVs

lv_is_mirrored - LV is mirrored using either kernel implementation
                 (excludes non-mirror modes like raid5 etc.)

lv_is_pvmove - internal pvmove volume
2014-09-16 00:13:46 +01:00
Alasdair G Kergon
2360ce3551 cleanup: Use lv_is_ macros.
Use lv_is_* macros throughout the code base, introducing
lv_is_pvmove, lv_is_locked, lv_is_converting and lv_is_merging.

lv_is_mirror_type no longer includes pvmove.
2014-09-15 21:33:53 +01:00
Zdenek Kabelac
25fe716b12 cleanup: indent and stacktrack
Add missing stacktrace on error path
and newline indent.
2014-08-26 14:13:07 +02:00
Alasdair G Kergon
7cff640d9a activation: Fix upgrades using uuid suffixes.
2.02.106 added suffixes to some LV uuids in the kernel.

If any of these LVs is activated with 2.02.105 or earlier,
and then a later version is used, the LVs appear invisible and
activation commands fail.

The code now has to check the kernel for both old and new uuids.
2014-07-30 21:55:11 +01:00
Zdenek Kabelac
c0c1ada88e pool: callback handle cache
Extend the callback functionality to handle also cache pools.

cache_check is now executed on cachepool metadata when
it's activated and deactivated.
2014-07-11 12:57:45 +02:00
Jonathan Brassow
be75076dfc activation: Add "degraded" activation mode
Currently, we have two modes of activation, an unnamed nominal mode
(which I will refer to as "complete") and "partial" mode.  The
"complete" mode requires that a volume group be 'complete' - that
is, no missing PVs.  If there are any missing PVs, no affected LVs
are allowed to activate - even RAID LVs which might be able to
tolerate a failure.  The "partial" mode allows anything to be
activated (or at least attempted).  If a non-redundant LV is
missing a portion of its addressable space due to a device failure,
it will be replaced with an error target.  RAID LVs will either
activate or fail to activate depending on how badly their
redundancy is compromised.

This patch adds a third option, "degraded" mode.  This mode can
be selected via the '--activationmode {complete|degraded|partial}'
option to lvchange/vgchange.  It can also be set in lvm.conf.
The "degraded" activation mode allows RAID LVs with a sufficient
level of redundancy to activate (e.g. a RAID5 LV with one device
failure, a RAID6 with two device failures, or RAID1 with n-1
failures).  RAID LVs with too many device failures are not allowed
to activate - nor are any non-redundant LVs that may have been
affected.  This patch also makes the "degraded" mode the default
activation mode.

The degraded activation mode does not yet work in a cluster.  A
new cluster lock flag (LCK_DEGRADED_MODE) will need to be created
to make that work.  Currently, there is limited space for this
extra flag and I am looking for possible solutions.  One possible
solution is to usurp LCK_CONVERT, as it is not used.  When the
locking_type is 3, the degraded mode flag simply gets dropped and
the old ("complete") behavior is exhibited.
2014-07-09 22:56:11 -05:00
Peter Rajnoha
cfed0d09e8 report: select: refactor: move percent handling code to libdm for reuse 2014-06-17 16:27:21 +02:00
Zdenek Kabelac
2f260c9909 activation: retry cleanup deactivation
Enable 'retry' deactivation also in 'cleanup' phase.
It shouldn't be mostly needed - however udev now produces
more and more completelny non-synchronizable device opens,
so even for orphan devices we can't easily predict where
udevd opens devices.

So it's more preferable here to log error about device being open
and retry clean, but let the command proceed.
2014-06-10 10:51:24 +02:00
Zdenek Kabelac
cb7bba9ffe dev_manager: disable extra udev loop
Disable code which has postprocessed whole tree and reset udev flags.
We need to find out which case was troublesome - since this loop
was just hidding bug in other code parts (most probably preload tree)
2014-05-23 21:36:55 +02:00
Zdenek Kabelac
9eab84aa2b debug: catch invalid request for tree
In general for non-toplevel LVs we shouldn't allow any _tree_action.
For now error on request for cache_pool activation which
doesn't even exist in dm-table.
2014-04-08 11:00:15 +02:00
Zdenek Kabelac
a018c57f0b cache: never activate cache pool
Since cache-pool is purely lvm abstraction layer LV, it never
need any device node, so do not add even 'error' device for it.
2014-04-01 20:17:10 +02:00
Zdenek Kabelac
c0f1eb5f0f dev_manager: check prohibited devices earlier
Reorder detection for internal device - since this test
is much simpler then target analysis, check it sooner.

Replace test for '68' with sizeof & ID_LEN

Add FIXME about device alias problem with is_reserved_lvname,
since this test fails on devices like /dev/dm-X
so we need to convert tests to UUID.
2014-03-12 19:38:34 +01:00
Zdenek Kabelac
4cc5c689b8 thin: add pool uuid suffix for pool volume
Even though we make pool volume as a public visible LV,
we still do not want tools to look at this volume.

While we do not create /dev/vg/lv link, device is still
accessible via /dev/mapper/vg-lv and there is no easy
way to recognize it's private without lvm2 metadata.

Enhance UUID with -pool suffix and directly skip
any LV with a suffix in  device_is_usable() call.

TODO: enhance other targets with this logic.
blkid may probably use same simple logic.
2014-03-12 00:16:27 +01:00
Zdenek Kabelac
6a0d97a65c lvm: change build_dm_uuid API
Pass directly 'lv' into this build routine,
so we can eventually add more private UUID suffixes.
2014-03-12 00:16:20 +01:00
Zdenek Kabelac
4d64e91efd thin: do not check of empty pool with messages
The empty pool is also the pool which has yet queued list of messages
and transaction_id == 1.

Problem is exposed when pool is created inactive.

lvcreate -L10 -T vg/pool -an
lvcreate -V10 -T vg/pool
2014-03-12 00:15:22 +01:00
Zdenek Kabelac
9974136b90 cleanup: indent 2014-02-17 22:25:53 +01:00
Zdenek Kabelac
f0f4248333 activation: drop test r/w vg state for activing LV
VG status read/write is meant to influence only VG metadata.
It's not related to the read/write status of the LV itself.
2014-02-15 11:34:54 +01:00
Jonathan Brassow
96626f64fa cache: Code to allow the create/remove of cache LVs
This patch allows users to create cache LVs with 'lvcreate'.  An origin
or a cache pool LV must be created first.  Then, while supplying the
origin or cache pool to the lvcreate command, the cache can be created.

Ex1:
Here the cache pool is created first, followed by the origin which will
be cached.
~> lvcreate --type cache_pool -L 500M -n cachepool vg /dev/small_n_fast
~> lvcreate --type cache -L 1G -n lv vg/cachepool /dev/large_n_slow

Ex2:
Here the origin is created first, followed by the cache pool - allowing
a cache LV to be created covering the origin.
~> lvcreate -L 1G -n lv vg /dev/large_n_slow
~> lvcreate --type cache -L 500M -n cachepool vg/lv /dev/small_n_fast

The code determines which type of LV was supplied (cache pool or origin)
by checking its type.  It ensures the right argument was given by ensuring
that the origin is larger than the cache pool.

If the user wants to remove just the cache for an LV.  They specify
the LV's associated cache pool when removing:
~> lvremove vg/cachepool

If the user wishes to remove the origin, but leave the cachepool to be
used for another LV, they specify the cache LV.
~> lvremove vg/lv

In order to remove it all, specify both LVs.

This patch also includes tests to create and remove cache pools and
cache LVs.
2014-02-04 16:50:16 -06:00
Jonathan Brassow
75b8ea195c cache: New functions for gathering info on cache devices
Building on the new DM function that parses DM cache status, we
introduce the following LVM level functions to aquire information
about cache devices:
- lv_cache_block_info: retrieves information on the cache's block/chunk usage
- lv_cache_policy_info: retrieves information on the cache's policy
2014-01-28 12:24:51 -06:00
Zdenek Kabelac
c3d82d717c Revert "tree_action: destroy devices from failing activation"
This reverts commit 24639be558.

Ok - seems we could be here a bit too active - and we
may remove devices which are unsuable for reasons we are not
aware of - thus taking down whole device could be way to big hammer.

So we still need some solution to recover from failing preload
and activation - but it needs more tunning.
2013-12-17 15:21:28 +01:00
Zdenek Kabelac
24639be558 tree_action: destroy devices from failing activation
When activation fails - we may leak large tree of partially loaded
devices in the dm table (i.e. failure in snapshot activation)

The best we can do here is try to deactivate whole device and
remove as much inactive table entries as we can.
2013-12-17 14:08:54 +01:00
Zdenek Kabelac
664a695561 thin: merge support for device tree
When thin snapshot merge is requested, tree must detect
if user tries to active such LV while origin or
snapshost is still active.
2013-12-04 14:30:25 +01:00
Zdenek Kabelac
572983d793 thin: read table line with thin device id
Add functions to parse thin table line to
obtain thin device id.
2013-12-04 14:30:25 +01:00
Zdenek Kabelac
6bf6430ae9 cleanup: convert log_error with log_warn
Collapse 2 ifs and replace log_error() with log_warn(), since\
the reported message is not causing tools error.
(and cannot be probably triggered anyway).
2013-11-28 12:48:01 +01:00
Zdenek Kabelac
79991aa769 snapshot: drop find_merging_snapshot
Drop find_merging_snapshot() function. Use find_snapshot()
called after check for lv_is_merging_origin() which
is the commonly used code path - so we avoid duplicated
tests and potential risk of derefering NULL point
in unhandled error path.
2013-11-28 12:42:43 +01:00
Jonathan Brassow
d5896f0afd Mirror: Fix hangs and lock-ups caused by attempting label reads of mirrors
There is a problem with the way mirrors have been designed to handle
failures that is resulting in stuck LVM processes and hung I/O.  When
mirrors encounter a write failure, they block I/O and notify userspace
to reconfigure the mirror to remove failed devices.  This process is
open to a couple races:
1) Any LVM process other than the one that is meant to deal with the
mirror failure can attempt to read the mirror, fail, and block other
LVM commands (including the repair command) from proceeding due to
holding a lock on the volume group.
2) If there are multiple mirrors that suffer a failure in the same
volume group, a repair can block while attempting to read the LVM
label from one mirror while trying to repair the other.

Mitigation of these races has been attempted by disallowing label reading
of mirrors that are either suspended or are indicated as blocking by
the kernel.  While this has closed the window of opportunity for hitting
the above problems considerably, it hasn't closed it completely.  This is
because it is still possible to start an LVM command, read the status of
the mirror as healthy, and then perform the read for the label at the
moment after a the failure is discovered by the kernel.

I can see two solutions to this problem:
1) Allow users to configure whether mirrors can be candidates for LVM
labels (i.e. whether PVs can be created on mirror LVs).  If the user
chooses to allow label scanning of mirror LVs, it will be at the expense
of a possible hang in I/O or LVM processes.
2) Instrument a way to allow asynchronous label reading - allowing
blocked label reads to be ignored while continuing to process the LVM
command.  This would action would allow LVM commands to continue even
though they would have otherwise blocked trying to read a mirror.  They
can then release their lock and allow a repair command to commence.  In
the event of #2 above, the repair command already in progress can continue
and repair the failed mirror.

This patch brings solution #1.  If solution #2 is developed later on, the
configuration option created in #1 can be negated - allowing mirrors to
be scanned for labels by default once again.
2013-10-22 19:14:33 -05:00
Peter Rajnoha
039bdad732 activation: flag temporary LVs internally
Add LV_TEMPORARY flag for LVs with limited existence during command
execution. Such LVs are temporary in way that they need to be activated,
some action done and then removed immediately. Such LVs are just like
any normal LV - the only difference is that they are removed during
LVM command execution. This is also the case for LVs representing
future pool metadata spare LVs which we need to initialize by using
the usual LV before they are declared as pool metadata spare.

We can optimize some other parts like udev to do a better job if
it knows that the LV is temporary and any processing on it is just
useless.

This flag is orthogonal to LV_NOSCAN flag introduced recently
as LV_NOSCAN flag is primarily used to mark an LV for the scanning
to be avoided before the zeroing of the device happens. The LV_TEMPORARY
flag makes a difference between a full-fledged LV visible in the system
and the LV just used as a temporary overlay for some action that needs to
be done on underlying PVs.

For example: lvcreate --thinpool POOL --zero n -L 1G vg

- first, the usual LV is created to do a clean up for pool metadata
  spare. The LV is activated, zeroed, deactivated.

- between "activated" and "zeroed" stage, the LV_NOSCAN flag is used
  to avoid any scanning in udev

- betwen "zeroed" and "deactivated" stage, we need to avoid the WATCH
  udev rule, but since the LV is just a usual LV, we can't make a
  difference. The LV_TEMPORARY internal LV flag helps here. If we
  create the LV with this flag, the DM_UDEV_DISABLE_DISK_RULES
  and DM_UDEV_DISABLE_OTHER_RULES flag are set (just like as it is
  with "invisible" and non-top-level LVs) - udev is directed to
  skip WATCH rule use.

- if the LV_TEMPORARY flag was not used, there would normally be
  a WATCH event generated once the LV is closed after "zeroed"
  stage. This will make problems with immediated deactivation that
  follows.
2013-10-23 14:09:37 +02:00
Peter Rajnoha
ce7489ed22 activation: add support for flagging an LV to skip udev scanning during activation
A common scenario is during new LV creation when we need to wipe the
newly created LV and avoid any udev scanning before this stage otherwise
it could cause the device (the LV) to be claimed by some other subsystem
for which there were stale metadata within LV data.

This patch adds possibility to mark the LV we're just about to wipe with
a flag that gets passed to udev via DM_COOKIE as a subsystem specific
flag - DM_SUBSYSTEM_UDEV_FLAG0 (in this case the subsystem is "LVM")
so LVM udev rules will take care of handling that.
2013-10-08 13:43:14 +02:00
Zdenek Kabelac
85b9c12e92 cleanup: release all memory in error path
Just ensure no memory will stay in pool even in error path.
2013-09-23 11:35:15 +02:00
Alasdair G Kergon
c0f987949b activation: Fix segfault with inactive pvmove LV.
Set flag to avoid recursion back through an inactive pvmove LV when
populating deptree.
2013-08-28 22:56:23 +01:00