1
0
mirror of git://sourceware.org/git/lvm2.git synced 2025-01-06 17:18:29 +03:00
Commit Graph

123 Commits

Author SHA1 Message Date
Zdenek Kabelac
54e43fd3de Use pool for dm_tree allocation
Using the same pool allocation strategy as we use for vg,
so dm_tree structure is part of the pool itself.
2011-10-14 13:34:19 +00:00
Jonathan Earl Brassow
857339758f This patch fixes issues with improper udev flags on sub-LVs.
The current code does not always assign proper udev flags to sub-LVs (e.g.
mirror images and log LVs).  This shows up especially during a splitmirror
operation in which an image is split off from a mirror to form a new LV.

A mirror with a disk log is actually composed of 4 different LVs: the 2
mirror images, the log, and the top-level LV that "glues" them all together.
When a 2-way mirror is split into two linear LVs, two of those LVs must be
removed.  The segments of the image which is not split off to form the new
LV are transferred to the top-level LV.  This is done so that the original
LV can maintain its major/minor, UUID, and name.  The sub-lv from which the
segments were transferred gets an error segment as a transitory process
before it is eventually removed.  (Note that if the error target was not put
in place, a resume_lv would result in two LVs pointing to the same segment!
If the machine crashes before the eventual removal of the sub-LV, the result
would be a residual LV with the same mapping as the original (now linear) LV.)
So, the two LVs that need to be removed are now the log device and the sub-LV
with the error segment.  If udev_flags are not properly set, a resume will
cause the error LV to come up and be scanned by udev.  This causes I/O errors.
Additionally, when udev scans sub-LVs (or former sub-LVs), it can cause races
when we are trying to remove those LVs.  This is especially bad during failure
conditions.

When the mirror is suspended, the top-level along with its sub-LVs are
suspended.  The changes (now 2 linear devices and the yet-to-be-removed log
and error LV) are committed.  When the resume takes place on the original
LV, there are no longer links to the other sub-lvs through the LVM metadata.
The links are implicitly handled by querying the kernel for a list of
dependencies.  This is done in the '_add_dev' function (which is recursively
called for each dependency found) - called through the following chain:
	_add_dev
	dm_tree_add_dev_with_udev_flags
	<*** DM / LVM divide ***>
	_add_dev_to_dtree
	_add_lv_to_dtree
	_create_partial_dtree
	_tree_action
	dev_manager_activate
	_lv_activate_lv
	_lv_resume
	lv_resume_if_active
When udev flags are calculated by '_get_udev_flags', it is done by referencing
the 'logical_volume' structure.  Those flags are then passed down into
'dm_tree_add_dev_with_udev_flags', which in turn passes them to '_add_dev'.
Unfortunately, when '_add_dev' is finding the dependencies, it has no way to
calculate their proper udev_flags.  This is because it is below the DM/LVM
divide - it doesn't have access to the logical_volume structure.  In fact,
'_add_dev' simply reuses the udev_flags given for the initial device!  This
virtually guarentees the udev_flags are wrong for all the dependencies unless
they are reset by some other mechanism.  The current code provides no such
mechanism.  Even if '_add_new_lv_to_dtree' were called on the sub-devices -
which it isn't - entries already in the tree are simply passed over, failing
to reset any udev_flags.  The solution must retain its implicit nature of
discovering dependencies and be able to go back over the dependencies found
to properly set the udev_flags.

My solution simply calls a new function before leaving '_add_new_lv_to_dtree'
that iterates over the dtree nodes to properly reset the udev_flags of any
children.  It is important that this function occur after the '_add_dev' has
done its job of querying the kernel for a list of dependencies.  It is this
list of children that we use to look up their respective LVs and properly
calculate the udev_flags.

This solution has worked for single machine, cluster, and cluster w/ exclusive
activation.
2011-10-06 14:45:40 +00:00
Zdenek Kabelac
9f9b3e1e28 Move defines to header
Make limits for thin data_block_size and device_id part of public API.

FIXME: read them possible from some kernel header file in the future ?
But we may need to support different values for different versions ?
2011-10-06 11:05:56 +00:00
Zdenek Kabelac
1fef12cd31 Name changes
typo zeroeing->zeroing
add size low_water_mark->low_water_mark_size so it's more obvious its sector
related variable.
2011-10-04 16:22:38 +00:00
Zdenek Kabelac
c6d777289b Add intial code to check transaction_id
Fix typy in transaction_id.
Add this as node property, so it could be easily checked on resume.

Code is not yet finished.
2011-10-03 18:34:52 +00:00
Zdenek Kabelac
27596fa624 Move priority check in front
Just a minor code mode - make a test for priority before
more complex uuid checks.
2011-10-03 18:29:48 +00:00
Zdenek Kabelac
f8b4957694 Update error path tracing for _resume_node
dm_task_create & dm_task_set_name produces it's own log_error
Add missing stacks for dm_task_set_cookie, dm_task_run,
dm_task_get_info.
2011-10-03 18:28:25 +00:00
Zdenek Kabelac
2daddac019 Transaction_id is property of thin_pool
Remove Transaction_id from thin target.
Store device_id for thin target.
2011-10-03 18:26:07 +00:00
Zdenek Kabelac
222bbab442 Add supporting function for thinp
New dm_tree_node_add_thin_pool_target() and  dm_tree_node_add_thin_target()
This API is highly experimental and unstable for now.
2011-09-29 08:53:48 +00:00
Zdenek Kabelac
3589d75998 Just add warning about potential problem exteding dm_segtypes
Since raid target is using now dm_segtypes also for search purpose.
2011-09-29 08:50:54 +00:00
Alasdair Kergon
e0948b5825 Introduce revert_lv for better pvmove cleanup.
(One further fix needed to remove the stray pvmove LVs left behind.)
2011-09-27 22:43:40 +00:00
Peter Rajnoha
d1f949465f Add log_error even for general device in use when we can't do the sysfs checks. 2011-09-26 10:17:51 +00:00
Peter Rajnoha
3157f8e8fa Add dm_tree_retry_remove to use retry logic for device removal in a dm_tree. 2011-09-22 17:36:50 +00:00
Peter Rajnoha
638409a573 Replace open_count check with holders/mounted_fs check on lvremove path.
Before, we used to display "Can't remove open logical volume" which was
generic. There 3 possibilities of how a device could be opened:
  - used by another device
  - having a filesystem on that device which is mounted
  - opened directly by an application

With the help of sysfs info, we can distinguish the first two situations.
The third one will be subject to "remove retry" logic - if it's opened
quickly (e.g. a parallel scan from within a udev rule run), this will
finish quickly and we can remove it once it has finished. If it's a
legitimate application that keeps the device opened, we'll do our best
to remove the device, but we will fail finally after a few retries.
2011-09-22 17:33:50 +00:00
Zdenek Kabelac
83ca5e6d5c Remove unused passed parameters 2011-09-07 08:37:48 +00:00
Alasdair Kergon
37086f40a1 spaces->tabs 2011-08-19 17:02:48 +00:00
Alasdair Kergon
6ab3b611c7 restrict dm_tree_node_add_null_area 2011-08-19 16:26:02 +00:00
Jonathan Earl Brassow
4fad401cd2 Add support for m-way to n-way up-convert in RAID1 (no linear to n-way yet)
This patch adds the ability to upconvert a raid1 array - say from 2-way to
3-way.  It does not yet support upconverting linear to n-way.

The 'raid' device-mapper target allows for individual components (images) of
an array to be specified for rebuild.  This mechanism is used when adding
new images to the array so that the new images can be resync'ed while the
rest of the images in the array can remain 'in-sync'.  (There is no
mirror-on-mirror layering required.)
2011-08-18 19:41:21 +00:00
Jonathan Earl Brassow
cc00073da7 Add the ability to split an image from the mirror and track changes.
~> lvconvert --splitmirrors 1 --trackchanges vg/lv
The '--trackchanges' option allows a user the ability to use an image of
a RAID1 array for the purposes of temporary read-only access.  The image
can be merged back into the array at a later time and only the blocks that
have changed in the array since the split will be resync'ed.  This
operation can be thought of as a partial split.  The image is never completely
extracted from the array, in that the array reserves the position the device
occupied and tracks the differences between the array and the split image via
a bitmap.  The image itself is rendered read-only and the name (<LV>_rimage_*)
cannot be changed.  The user can complete the split (permanently splitting the
image from the array) by re-issuing the 'lvconvert' command without the
'--trackchanges' argument and specifying the '--name' argument.
	~> lvconvert --splitmirrors 1 --name my_split vg/lv
Merging the tracked image back into the array is done with the '--merge'
option (included in a follow-on patch).
	~> lvconvert --merge vg/lv_rimage_<n>

The internal mechanics of this are relatively simple.  The 'raid' device-
mapper target allows for the specification of an empty slot in an array
via '- -'.  This is what will be used if a partial activation of an array
is ever required.  (It would also be possible to use 'error' targets in
place of the '- -'.)  If a RAID image is found to be both read-only and
visible, then it is considered separate from the array and '- -' is used
to hold it's position in the array.  So, all that needs to be done to
temporarily split an image from the array /and/ cause the kernel target's
bitmap to track (aka "mark") changes made is to make the specified image
visible and read-only.  To merge the device back into the array, the image
needs to be returned to the read/write state of the top-level LV and made
invisible.
2011-08-18 19:38:26 +00:00
Jonathan Earl Brassow
470f8b1266 Add some log_error msg's and fix potential segfault
Thanks to kabi for spotting these - especially the possibility for
segfault if a loop runs all the way through without finding a match.
2011-08-11 19:17:10 +00:00
Jonathan Earl Brassow
01d49d0e71 Add basic RAID segment type(s) support.
Implementation described in doc/lvm2-raid.txt.

Basic support includes:
- ability to create RAID 1/4/5/6 arrays
- ability to delete RAID arrays
- ability to display RAID arrays
Notable missing features (not included in this patch):
- ability to clean-up/repair failures
- ability to convert RAID segment types
- ability to monitor RAID segment types
2011-08-02 22:07:20 +00:00
Alasdair Kergon
f8b6b122ca Downgrade error message - it isn't strictly an internal error in the
library, and the known cause within lvm2 got fixed.
2011-07-08 19:13:05 +00:00
Zdenek Kabelac
3dd7f795f5 Report internal error when parameters are missing on table load
When some target is passing empty parameters to some dm target,
report this as an internal error to better catch some broken
table construction (some mirror conversions seem to be doing
this for now).
2011-06-30 09:24:58 +00:00
Alasdair Kergon
ed04c9d702 Extend debug log messages to distinguish between the 3 states:
trust udev; verify udev; perform dev node operations directly.
2011-06-27 22:38:53 +00:00
Alasdair Kergon
ea687806b3 Move udev_only logic inside stacked node op code.
(We still need to treat add+readhead+del as a no-op.)
Rename udev_fallback to verify_udev_operations.
Rename --udevfallback to --verifyudev
2011-06-27 21:43:58 +00:00
Alasdair Kergon
ec98c72b04 Return immediately dm_lib_exit() if called more than once.
(Avoiding calling it twice would involve some untangling.)
Decrement the new suspended_counter if removing a suspended device.
2011-06-24 19:33:41 +00:00
Peter Rajnoha
ffd363ef38 Add check for library fallback in _deactivate_node.
This fn calls rm_dev_node directly - an exceptional case. It needs to check
the DM_UDEV_DISABLE_LIBRARY_FALLBACK flag directly (it's called in dm_task_run
normally where it's checked already).
2011-06-22 12:56:02 +00:00
Alasdair Kergon
ccadc42707 Maintain a count of the number of suspended devices in libdevmapper
and use this for the LVM critical section logic.  Also report an error if
code tries to load a table while any device is known to be in the
suspended state.
(If the variety of problems these changes are showing up can't be fixed
before the next release, the error messages can be reduced to debug
level.)
2011-06-13 03:32:45 +00:00
Alasdair Kergon
bb7b8d8780 Fix --mirrorlog mirrored. 2011-06-11 12:55:31 +00:00
Alasdair Kergon
76564e3168 Major pvmove fix to issue ioctls in the correct order when multiple LVs
are affected by the move.  (Currently it's possible for I/O to become
trapped between suspended devices amongst other problems.

The current fix was selected so as to minimise the testing surface.  I
hope eventually to replace it with a cleaner one that extends the
deptree code.

Some lvconvert scenarios still suffer from related problems.
2011-06-11 00:03:06 +00:00
Milan Broz
bb685d4e6c Fix another occurrence of linux kernel version check. 2011-06-09 15:52:59 +00:00
Zdenek Kabelac
fb5983fb8f Remove double braces
Clang gives notice about possible confusion as commonly double bracces are
used when some assignment is done inside them.
2011-03-29 20:19:03 +00:00
Alasdair Kergon
9a45e3dbbb Fix dm_udev_wait calls in dmsetup to occur before readahead display not after.
Include an implicit dm_task_update_nodes() within dm_udev_wait().
2011-03-02 00:29:57 +00:00
Zdenek Kabelac
e382b2b990 Add debug message for open_count failure
Report  open_count problem as debug.

Function using _node_has_closed_parents decides whether
it's error or could be ignored.
2011-02-18 16:13:56 +00:00
Zdenek Kabelac
28b075f78d Remove dead assignment in _mirror_emit_segment_line
Remove unused 'r' assignment.
2010-11-29 12:42:10 +00:00
Zdenek Kabelac
be2679a691 Remove dead assignment in dm_tree_node_add_mirror_target_log
'seg' is never used - remove it.
2010-11-29 11:26:00 +00:00
Zdenek Kabelac
d1626c2433 Do not call dm_task_destroy with NULL 2010-11-23 18:29:06 +00:00
Alasdair Kergon
a65df0e598 Add dm_zalloc and use it and dm_pool_zalloc throughout. 2010-09-30 21:06:50 +00:00
Alasdair Kergon
2d3164a59f Use __attribute__ consistently throughout. 2010-07-09 15:34:40 +00:00
Alasdair Kergon
ed2630dce5 Add printf format attributes to yes_no_prompt & dm_{sn,as}printf and fix a calle 2010-07-02 21:16:50 +00:00
Peter Rajnoha
44a82780b9 Use early udev synchronisation and update of dev nodes for clustered mirrors.
When using clustered mirrors, we need device nodes to be created during
processing of device tree, not at its end like we normally do (we need to
access the nodes in cmirror prematurely). Therefore we use a new flag called
"immediate_dev_node" stored in deptree's load_properties struct to instruct the
device tree processing code to immediately synchronize with udev and flush all
stacked node operations so the nodes are prepared for use.

For now, the immediate_dev_node is used for clustered mirrors during
processing the dm_tree_preload_children code only. We can add more later if
needed.
2010-06-21 08:54:32 +00:00
Zdenek Kabelac
333cf73ceb Fix copy&paste detection of kernel release version.
Add log_error to avoid return_0 without log_error.
2010-05-25 08:40:36 +00:00
Alasdair Kergon
bc653452a7 Replace strncmp kernel version number checks with proper ones 2010-05-24 23:11:34 +00:00
Alasdair Kergon
fa7fa300e3 Choose between clustered log versions based on kernel version.
Add fixmes for broken strcmp.
2010-05-24 17:46:47 +00:00
Zdenek Kabelac
c4655582e6 Update Copyright date for resently modifed files 2010-05-24 09:04:27 +00:00
Zdenek Kabelac
bc1024812a Replicator: check open_count for parents of presuspend_node
For deactivation of Replicator check in advance that all heads
have open_count == 0. For this presuspend_node is used as all
head nodes are linking this control node.
2010-05-21 12:30:35 +00:00
Zdenek Kabelac
451be927a8 Replicator: support deactivate of replicator-dev nodes
Introducing dm_tree_node_set_presuspend_node() for presuspending child
node (i.e. replicator control target) before deactivation of parent node
(i.e. replicator-dev target).

This patch presents no functional change to current dtree - only
replicator target currently sets presuspend node for dev nodes.
2010-05-21 12:27:02 +00:00
Zdenek Kabelac
4839e5d913 Replicator: libdm support
Introducing new API calls:
dm_tree_node_add_replicator_target()
dm_tree_node_add_replicator_dev_target().

Define new typedef dm_replicator_mode_t.
2010-05-21 12:24:15 +00:00
Alasdair Kergon
adc1729212 Only fail if the top-level LV fails to be deactivated - allow deactivation
of its dependencies to fail.
2010-04-07 23:51:34 +00:00
Alasdair Kergon
fdef9679b6 Issue a message if the new type of deactivation failure happens.
If this can happen during 'normal' operations, I need to know.
2010-04-07 21:25:09 +00:00