1
0
mirror of git://sourceware.org/git/lvm2.git synced 2025-01-03 05:18:29 +03:00
Commit Graph

2742 Commits

Author SHA1 Message Date
Jonathan Earl Brassow
d098140177 Add policy based automated repair of RAID logical volumes
The RAID plug-in for dmeventd now calls 'lvconvert --repair' to address failures
of devices in a RAID logical volume.  The action taken can be either to "warn"
or "allocate" a new device from any spares that may be available in the
volume group.  The action is designated by setting 'raid_fault_policy' in
lvm.conf - the default being "warn".
2011-12-06 19:30:15 +00:00
Milan Broz
707c49ab77 Fix FIXME and comment :-) 2011-12-03 11:36:10 +00:00
Milan Broz
a8402e1978 Switch locking bits to match RHEL5 version.
FIXME:
There is a problem with overloaded bit 0x80 in locking flag,
the bit flags array must be extended or changed.
2011-12-03 11:34:35 +00:00
Jonathan Earl Brassow
9711057499 Don't allow two images to be split and tracked from a RAID LV at one time
Also, don't allow a splitmirror operation on a RAID LV that is already tracking
a split, unless the operation is to stop the tracking and complete the split.
Example:
~> lvconvert --splitmirrors 1 --trackchanges vg/lv /dev/sdc1
# Now tracking changes - image can be merged back or split-off for good
~> lvconvert --splitmirrors 1 -n new_name vg/lv /dev/sdc1
# ^ Completes split ^

If a split is performed on a RAID that is tracking an already split image and
PVs are provided, we must ensure that
 1) the already split LV is represented in the PVs
 2) we are careful to split only the tracked image
2011-12-01 00:21:04 +00:00
Jonathan Earl Brassow
a927e401f1 Do not allow users to change the name of RAID sub-LVs or the name of the
RAID LV if it is tracking changes for a split image.
2011-12-01 00:09:34 +00:00
Petr Rockai
485922fe20 Update comment on LCK_*DMEVENTD*. 2011-11-30 17:02:37 +00:00
Petr Rockai
9a4b04139d Fix clvmd to respect DMEVENTD_MONITOR_IGNORE. Fixes a bug where dmeventd
actions caused clvmd to turn off monitoring of the volume causing the action.
2011-11-30 17:00:57 +00:00
Jonathan Earl Brassow
2ba1e8fccc The LV_REBUILD flag is not internal - bad comments in metadata-exported.h updated 2011-11-30 02:20:13 +00:00
Jonathan Earl Brassow
0c506d9a40 Support the ability to replace specific devices in a RAID array.
RAID is not like traditional LVM mirroring.  LVM mirroring required failed
devices to be removed or the logical volume would simply hang.  RAID arrays can
keep on running with failed devices.  In fact, for RAID types other than RAID1,
removing a device would mean substituting an error target or converting to a
lower level RAID (e.g. RAID6 -> RAID5, or RAID4/5 to RAID0).  Therefore, rather
than removing a failed device unconditionally and potentially allocating a
replacement, RAID allows the user to "replace" a device with a new one.  This
approach is a 1-step solution vs the current 2-step solution.

example> lvconvert --replace <dev_to_remove> vg/lv [possible_replacement_PVs]

'--replace' can be specified more than once.

example> lvconvert --replace /dev/sdb1 --replace /dev/sdc1 vg/lv
2011-11-30 02:02:10 +00:00
Peter Rajnoha
910440212b Fix last checkin for replicator. 2011-11-29 12:10:38 +00:00
Alasdair Kergon
8dd6036da4 Add activation/use_linear_target enabled by default. (prajnoha)
LVM metadata knows only of striped segments - not linear ones.
The activation code detects segments with a single stripe and switches
them to use the linear target.

If the new lvm.conf setting is set to 0 (e.g. in a test script), this
'optimisation' is turned off.
2011-11-28 20:37:51 +00:00
Alasdair Kergon
c122e5e7a3 Move y/n prompts to stderr and repeat if response has both 'n' and 'y'.
(Note that in a future release we might make this stricter and insist
on exactly 'y' or 'n'.)
2011-11-23 01:34:38 +00:00
Zdenek Kabelac
647c8edf82 Drop pool memory allocated in lv_has_target_type
Remove FIXMES - there should not be any pool free call since
the memory pool is from device manager, and pool is detroyed
after the operation, so doing extra free here would not help here.

However lv_has_target_type() is using cmd mempool so here the extra
call for dm_pool_free makes sence.
2011-11-18 19:42:03 +00:00
Zdenek Kabelac
749c2dc4ab Remove constant expression check
"result_independent_of_operands: ((dev->dev & 0xfff00UL) >> 8) ==
18446744073709551615UL /* -1 */ is always false regardless of the values
of its operands (logical operand of if)."

'dev->dev' is set in dev-cache.c _insert() and it's not expectable
st_rdev would have '-1'

This code has been introduced with drbd support commit and code never
worked - so eliminated.
2011-11-18 19:36:10 +00:00
Zdenek Kabelac
900f5f8187 Replace dynamic buffer allocations for PATH_MAX
Use static buffer instead of stack allocated buffer.
This reduces stack size usage of lvm tool and the
change is very simple.

Since the whole library is not thread safe - it should not
add any new problems - and if there will be some conversion
it's easy to convert this to use some preallocated buffer.
2011-11-18 19:31:09 +00:00
Zdenek Kabelac
8deeeb07ea Unlock memory for vg_write
For write we do not need to hold memory locked.
This relaxes many conditions and avoid problems when allocating
a lot of memory for writting metadata buffers.
(In case of huge MDA size this would lead to mismatch between
locked and unlocked memory region size).

Add also internal check we are not writing in critical section.
2011-11-18 19:28:00 +00:00
Zdenek Kabelac
37f274ced9 Query before removing inactive snapshots
Removal of an inactive origin removes also all related snapshots.

When we now support 'old' external snapshots with thin volumes,
removal of pool will not only drop all thin volumes, but as
a consequence also all snapshots - which might be seen a bit
unexpected for the user - so add a query to confirm such action.

lvremove -f will skip the prompt.
2011-11-18 19:25:20 +00:00
Zdenek Kabelac
91e4512619 Adjusted mirror region size only for mirrors and raids
Update region_size only for mirror and raid targets.
This fixes warning messages when vg is using small
extent size like 1KiB and no mirror/raid is created,
but the user still got the message:

$> vgcreate -s 1K vg  <pvs>
$> lvcreate -L10K vg
Using reduced mirror region size of 4 sectors
2011-11-15 17:32:12 +00:00
Zdenek Kabelac
5f129d15b1 Thin update prompt message
Enhance message with info about how many thin volumes are going to
be removed with thin pool removal.
2011-11-15 17:29:52 +00:00
Zdenek Kabelac
8542953f74 Reorder AND test condition
Take the easiest condition for checking first since they must
apply all together, check local conditions first before doing
more expensive tests.
2011-11-15 17:27:41 +00:00
Zdenek Kabelac
3de08fc9de Thin clean
Reuse seg pointer already set in _add_lv_to_dtree to have the
value of first_seg(lv) (and is used in other parts of this function).
2011-11-15 17:25:05 +00:00
Zdenek Kabelac
25de8ca372 Thin supports only thin volumes as snapshot origins
It's currently of the scope to properly solve the snapshoting
of internal thin devs so prevent non-toplevel snapshots here.
2011-11-15 17:23:51 +00:00
Zdenek Kabelac
ed2368538a Simplify iteration
Since nothing is removed in dm_list snapshot_segs during the loop,
there is no reason to use _safe iteration, so switch to simplier
dm_list_iterate().
2011-11-15 17:21:02 +00:00
Zdenek Kabelac
8ec016236a Thin fix tpool layer
Since we support snapshots of thin volumes, we could have more layers,
so we have to check whether tpool layer is going to be inserted.

As the _add_segment_to_dtree() is the only place that adds tpool
segment, we may just check pointer (no strcmp for layer).

Switch to use  seg_is_  function instead of lv_is_.
2011-11-15 17:15:03 +00:00
Peter Rajnoha
5680d14ecd Avoid 'mda inconsistency' by properly registering UNLABELLED_PV flag (2.02.86).
When a PV label write is deferred to a vg_write call (as introduced by a patch
in 2.02.86), the PV is flagged with the internal UNLABELLED_PV flag. However,
when calling vg_archive before vg_write, we still have the PV labelled with the
UNLABELLED_PV flag which was not recognised as a proper flag while exporting
VG metadata:

  # vgcreate vg /dev/sda
  No physical volume label read from /dev/sda
  Metadata inconsistency: Not all flags successfully exported.
  Metadata inconsistency: Not all flags successfully exported.
  Writing physical volume data to disk "/dev/sda"
  Physical volume "/dev/sda" successfully created
  Volume group "vg" successfully created
2011-11-15 11:54:15 +00:00
Zdenek Kabelac
dd0c58c69b Add missing stack reporting
also remove unneeded {}
2011-11-12 22:53:23 +00:00
Zdenek Kabelac
3af072cc63 Thin use items iterator and stack reporting 2011-11-12 22:52:18 +00:00
Zdenek Kabelac
651ef6be82 Missing stack printing 2011-11-12 22:51:20 +00:00
Zdenek Kabelac
6744c143a5 Thin remove unused define
Remove DM_THIN_ERROR_DEVICE_ID from API.
Remove API warning.
Drop code that was using DM_THIN_ERROR_DEVICE_ID (already commented)
Remove debug message which slipped in through some previous commit.
2011-11-12 22:44:10 +00:00
Milan Broz
64f1fd749f Fix major number filter structure boundary test. 2011-11-11 16:59:30 +00:00
Milan Broz
a3390bb507 Remove unneeded parameter. 2011-11-11 16:41:37 +00:00
Milan Broz
0abb3d7c11 And now add files for real. 2011-11-11 15:24:48 +00:00
Milan Broz
d1b36fbe7f Fix function name in previous patch. 2011-11-11 15:14:05 +00:00
Milan Broz
07113beea3 Do not scan device if it is part of active multipath.
Add filter which tries to check if scanned device is part
of active multipath.

Firstly, only SCSI major number devices are handled in filter.

Then it checks if device has exactly one holder (in sysfs) and
if it is device-mapper device and DM-UUID is prefixed by "MPATH-".

If so, this device is filtered out.

The whole filter can be switched off by setting
mpath_component_detection in lvm.conf.

https://bugzilla.redhat.com/show_bug.cgi?id=597010

Signed-off-by: Milan Broz <mbroz@redhat.com>
2011-11-11 15:11:08 +00:00
Zdenek Kabelac
7891cead11 Thin send create_snap message
Start creating snapshots for real.
Update test suite to check it happens.
2011-11-10 15:30:59 +00:00
Zdenek Kabelac
6e89eb9a52 Small comment and indent updates 2011-11-10 12:43:05 +00:00
Zdenek Kabelac
f201498f99 Thin test min thin_pool size for at least 1 chunk 2011-11-10 12:42:36 +00:00
Zdenek Kabelac
39fc633957 Thin align volume size on chunk boundary size
If the extent_size is smaller then the chunk_size we may try
to find better aligment (wasting less space).

i.e. using  4KB extent_size and  64KB chunk size will
lead to creation of 64KB aligned thin volume.
2011-11-10 12:42:15 +00:00
Zdenek Kabelac
74e53e8bc0 Thin disable pool create without activation 2011-11-10 12:39:01 +00:00
Alasdair Kergon
3da4ed712e Must not override alloc policy specified by user. 2011-11-07 13:54:54 +00:00
Zdenek Kabelac
65e88e6b3c Thin add error message for double delete
Add few more internal error messages.
2011-11-07 11:04:45 +00:00
Zdenek Kabelac
97d7e5aedb Thin supports snapshots
Full support for thin snapshots.
Create and remove is supported.

TODO: lvconvert support is not yes available.
2011-11-07 11:03:47 +00:00
Zdenek Kabelac
11721819a7 Thin reindent code
Drop indention level
Add extra internal error.
2011-11-07 10:59:07 +00:00
Zdenek Kabelac
87371d48cc Thin revert code for exclusive pool activation
There are no limits on thin-pool activation now.
Revert code that is no longer needed.
2011-11-07 10:58:13 +00:00
Zdenek Kabelac
4079a8f298 Avoid lvextend to overflow
Add extra check to extent_count overflow.
Use internal define MAX_EXTENT_COUNT instead UINT32_MAX.
2011-11-04 22:49:53 +00:00
Zdenek Kabelac
83baa0b778 Thin pool allocation simplified
Support allocation of metadata from the same PV, if the VG
is build only from one PV.

As thinp is not mirror - we do not require 2 PVs
for basic thin usage as user is losing only perfomance.
2011-11-04 22:45:52 +00:00
Zdenek Kabelac
bd15208cd7 Thin add thin_pool_metadata_require_separate_pvs
Allow to set different policy for pool from mirrors.
2011-11-04 22:44:21 +00:00
Zdenek Kabelac
b8cac455bd Thin supports poolmetadatasize setting
Add option to set pool metadatasize.
For passing size parameter reuse region_size.
2011-11-04 22:43:10 +00:00
Alasdair Kergon
13dc67cda7 Add missing lvrename mirrored log recursion in for_each_sub_lv. 2011-11-04 01:31:23 +00:00
Zdenek Kabelac
1cae10a36c Thin keep pool device in the same state
Leave the optimalisation to be done differently and preserve
availability state of the pool device.
2011-11-03 15:58:20 +00:00
Zdenek Kabelac
9aa24bd034 Thin no device is created - so nothing to revert here 2011-11-03 15:46:51 +00:00
Zdenek Kabelac
466a8ebf9d Thin removing unused detach_pool_messages 2011-11-03 14:57:04 +00:00
Zdenek Kabelac
92384bfd0b Thin using update_pool_lv
Replace detach_pool_messages with update_pool_lv.
Move creation code from to 'if' condition into 1.
Ensure creation has finished all previous message operations.
2011-11-03 14:56:20 +00:00
Zdenek Kabelac
73b7bf961b Thin genering update_pool_lv function
Function to trigger pool message passing via resume,
or resize of the pool itself independently on other thins.
2011-11-03 14:53:58 +00:00
Zdenek Kabelac
a0c4e85c48 Add -tpool layer in activation tree
Let's put the overlay device over real thin pool device.
So we can get the proper locking on cluster.
Overwise the pool LV would be activate once implicitely
and in other case explicitely, confusing locking mechanism.
This patch make the activation of pool LV independent on
activation of thin LV since they will both implicitely use
real -thin pool device.
2011-11-03 14:52:09 +00:00
Zdenek Kabelac
2e732e9628 Thin api change for passing message into libdm
Avoid exposing another struct to the libdm user and
use only simple dm_tree_node_add_thin_pool_message with
2 overloaded uint64_t values.
2011-11-03 14:45:01 +00:00
Zdenek Kabelac
dc964ab0d3 Thin uses _tdata instead of _tpool for data LV
Switch to different suffix and keep -tpool reserved for overlay device name.
2011-11-03 14:38:36 +00:00
Zdenek Kabelac
daa39a1f64 Thin clean using delete_id consitently 2011-11-03 14:37:23 +00:00
Zdenek Kabelac
1f5c98270d Thin code cleanup
Use iterate_items for list processing.
2011-11-03 14:36:40 +00:00
Zdenek Kabelac
25de9addb6 Thin fix compile warns
Test for dm_snprintf < 0.
Add header for moved backup.
2011-10-30 22:52:08 +00:00
Zdenek Kabelac
7654abc26f Thin creation without activation
All thins are created with the next activation and VG is updated
without messages. Only some basic commands works.
(i.e. lvcreate -an  -V10 -T mvg/pool)
There can be some combination to confuse this system.

This functionality for snapshots is going to be interesting.
2011-10-30 22:07:38 +00:00
Zdenek Kabelac
f0df05e1dd Cleanup unsuccessfully created thin LV
If something fails during creation of thin LV remove such LV
and deactivate in case it's been already tried to activate
(i.e. thin kernel driver fails for some reason.)
2011-10-30 22:02:18 +00:00
Zdenek Kabelac
96279ac1c0 Make detach_pool_message visible for tools
Move there also vg_write and vg_commit.
2011-10-30 22:01:39 +00:00
Zdenek Kabelac
f8d46bd256 Thin cleanups
Fix/cleanup several error messages.
Remove test for seg_is_thin which could never be true there.
Replace (1<<24) with predefined constant.
2011-10-30 22:00:57 +00:00
Zdenek Kabelac
5cc2f9a257 Avoid creation of /dev/vg/thinpool 2011-10-28 20:34:45 +00:00
Zdenek Kabelac
0968dfcd03 Thin support for stripe
Support stripe options to create thin data pool LV.

TODO: combine chunk size and stripe size.
2011-10-28 20:32:54 +00:00
Zdenek Kabelac
daa10ad0fd Thin pool resize support for data LV
Support for extension of pool data LV.

TODO: figure out thin volume for suspend/resume in cluster.
2011-10-28 20:31:01 +00:00
Zdenek Kabelac
e5b12b305f Thin support for lvrename
Rename pool's metadata lv _tmeta together with pool and _tdata.
2011-10-28 20:29:32 +00:00
Zdenek Kabelac
a1d5aaf725 Thin pool activation change
To ensure we properly handle LV cluster locking - explicitely do
not allow to change the availability of the thin pool that is in use
for some thin LV.

As soon as the thin volume is created the only way to activate pool
is via implicit dependency.

Ignore thinpool open count for lv/vgchange operations.
2011-10-28 20:28:00 +00:00
Zdenek Kabelac
2721175d55 Thin output data_block_size via outsize
Use outsize to get nice size hint.
2011-10-28 20:25:08 +00:00
Zdenek Kabelac
2b71bcd0cb Improve lv_extend stack reporting
and some code cleanup with setting return value.
2011-10-28 20:23:24 +00:00
Zdenek Kabelac
c590a9cdbc Thin error messages clenaup and some indent 2011-10-28 20:19:26 +00:00
Zdenek Kabelac
dd3bb2bac3 Remove thin code from mirror/raid lv_extend 2011-10-28 20:18:32 +00:00
Zdenek Kabelac
2fa836e843 Extend virtual segment instead of adding new one
Before adding a new virtual segment to LV, check first whether
the last segment isn't already of the same type. In this case
extend last segment instead of creating the new one.

Thin volumes should have always only 1 virtual segment, but it
helps also to virtual snapshot or error segtype..
2011-10-28 20:17:55 +00:00
Zdenek Kabelac
bd4b840879 Add last_seg
Implement a function to return the last segment in a LV.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
2011-10-28 20:12:54 +00:00
Zdenek Kabelac
7ad1c43b48 Add find_config_tree_str_allow_empty
Add function to allow read of empty strings as valid arguments.
Add a warning message if string argument has ignored value.
2011-10-28 20:06:49 +00:00
Jonathan Earl Brassow
682309e0b8 Disallow 'mirrored' log for cluster mirrors.
Git commit ID 0864378250 was meant to disallow
'mirrored' logs for cluster mirrors.  However, when add_mirror_log is used
to create the log (as is now the case when using 'lvcreate' or converting only
the log) the check is bypassed.

This patch adds the check to add_mirror_log.
2011-10-25 13:17:04 +00:00
Zdenek Kabelac
eafbdf3029 Don't print char type[8] as a plain string
pvck prints 'extra' character from the label since there is no '\0'
after the struct label entry and just uint64_t follows directly.
So avoid it by limiting 8 chars to be printed.

https://www.redhat.com/archives/lvm-devel/2011-January/msg00109.html

Signed-off-by: Paul Bolle <pebolle tiscali nl>
2011-10-24 10:24:39 +00:00
Zdenek Kabelac
f2c56bc3b6 Drop mempool parameter from read functions
Use implicit vgmem pool.
2011-10-23 16:05:45 +00:00
Zdenek Kabelac
72ff89d279 Always use vg memory pool for allocated lv segment
Remove mem pool parameter from alloc_lv_segment()
Since we should always allocate LV segment from the vg mempool.
2011-10-23 16:02:01 +00:00
Zdenek Kabelac
9e453cab1c Reduce stack size usage in print_log
As the buf2[] and locn[] can't be used at the same time, safe 1 page from
stack memory.
2011-10-22 16:52:00 +00:00
Zdenek Kabelac
aef13649ea Remove old thin code from _lv_insert_empty_sublvs
Since thin is not able to use _lv_insert_empty_sublvs,
remove its appearence from this function.

Start to use extend_pool() function for desired functionality
and modify lv_extend() for this.
2011-10-22 16:48:59 +00:00
Zdenek Kabelac
dc225f58a9 Remove extra empty check
dm_list_splice handles empty list itself, no need to duplicate code.
2011-10-22 16:46:34 +00:00
Zdenek Kabelac
5d8f78a6c0 Consistently use metadata LV as the first in MDA
Cosmetic cleanup.
Mark LV as thin pool before calling attach_pool functions.
2011-10-22 16:45:25 +00:00
Zdenek Kabelac
f4c77bd0e3 Recoded way to insert thin pool into vg
Code in _lv_insert_empty_sublvs  was not able to provide proper
initialization order for thin pool LV.

New function extend_pool() first adds metadata segment to pool LV which
is still visible. Such LV is activate and cleared.

Then new meta LV is created and metadata segments are moved there.
Now the preallocated pool data segment is attached to the pool LV
and layer _tpool is created. Finaly segment is marked as thin_pool.
2011-10-22 16:44:23 +00:00
Zdenek Kabelac
06b8248d63 Make move_lv_segment non-static
This function could be useful for other _manip source files.

Use dm_list manipulation function for provided functionality,
which make the code more readable and avoid touching list
internal details here.
2011-10-22 16:42:10 +00:00
Alasdair Kergon
dbd60cf576 Pass exclusive LV locks to all nodes in the cluster.
This was the intended behaviour, as described in the lvchange man page, so you
have complete control through volume_list in lvm.conf, but the code seems to
have been treating -ae as local-only for a very long time.
2011-10-21 15:49:45 +00:00
Zdenek Kabelac
f0c9160df4 Store transaction_id with created thin lv
So we know the creation history and this should be useful with vgcfgrestore.
2011-10-21 11:38:35 +00:00
Zdenek Kabelac
4d925f5785 Remove double-hack for setting metadata size
Drop the second lv_extend and set 128MB directly in the first hack place.
2011-10-21 09:55:50 +00:00
Zdenek Kabelac
3bc417488d Thin pool now support chunk size as well
Use chunksize option to specify data_block_size for thin pool target.
Drop low_water_mark to zero.
2011-10-21 09:55:07 +00:00
Zdenek Kabelac
22f40c4efe Ensure right activation order
Couple FIXMEs put into the code for parts of the code which may be
improved later, since we might be able to add 'lazy' device creation later.
For now require exclusive activation.
2011-10-20 10:35:14 +00:00
Zdenek Kabelac
79c1f9fcf4 Reindent code
Avoid 1 indent level and use check for empty list only for
add of transaction_id message.
2011-10-20 10:32:29 +00:00
Zdenek Kabelac
7b199dc599 Use const pointers in thin API were appropriate 2011-10-20 10:31:27 +00:00
Zdenek Kabelac
d1a259d867 Print low_water_mark only when it has some value
Do not expose low_water_mark in mda yet, if it has no use.
We do not allow to be set via current lvm tool code.
Usage needs to be clarified first.
2011-10-20 10:30:39 +00:00
Zdenek Kabelac
3f53c059e9 Add _BLOCK_ to define
Use DM_THIN_MIN_DATA_BLOCK_SIZE and
DM_THIN_MAX_DATA_BLOCK_SIZE to make it more obvious, for which
this define is useful in thin API.
2011-10-20 10:28:41 +00:00
Zdenek Kabelac
759b9592ba Update error message
Drop INTERNAL_ERROR from public API functions.
Improve some messages.
2011-10-19 16:42:14 +00:00
Zdenek Kabelac
8de912b677 Simple validation of messages in mda
Check we do not combine multiple messages for same LV target
and switch to use  'delete_id' to make it clear for what this device_id
is being used.
2011-10-19 16:39:09 +00:00
Zdenek Kabelac
3dcce042f6 Drop messages referencing deleted LV
lvremove may remove problematic LV for thin target.
2011-10-19 16:37:30 +00:00
Zdenek Kabelac
97d0f72c92 Just indent changes
Some tabs & spaces.
2011-10-19 16:36:39 +00:00
Zdenek Kabelac
b04e977851 Remove test for thin_pool
Since both functions are called during mda read - we don't have full LV info
at this moment.
2011-10-19 16:32:34 +00:00
Zdenek Kabelac
92cdc25882 Drop messages from lvm app context
(revert)
Thinp target uses activation context.
2011-10-17 14:18:07 +00:00
Zdenek Kabelac
1f7edce804 Indent debug message 2011-10-17 14:17:30 +00:00
Zdenek Kabelac
a25434a3a3 Message support for thin provisiong
lvm part of messaging.

Each message is now stored it's own thin pool section:

message1 {
	create = lv
}

Messages are queued to thin pool dm target when this target
is going to be resumed or used through some dependency.

Currently  'delete' message are purely queued and processed
with next thin pool resume operation (i.e. create_thin).

WARNING - thin provisioning support is developmental code.
2011-10-17 14:17:09 +00:00
Jonathan Earl Brassow
a551de6152 Use a more correct macro for 'seg_is_linear'
It is better to check 'seg->area_count == 1' than '!seg->stripe_size'.
2011-10-14 14:21:32 +00:00
Zdenek Kabelac
7f815706ca Fix lv_info open_count test
When verify_udev_operations was disable, code for stacking fs operation for
lvm links was completely disable - but this code was also used for collecting
information, that a new node is being created.

Add a new flag which is set when a creation of lv symlinks is requested which
should restore old behaviour of lv_info function, that has called fs_sync()
before quere for open count on device.
2011-10-14 13:23:47 +00:00
Zdenek Kabelac
7a6600b148 Use constant for the repeated dlid size specification 2011-10-11 10:02:28 +00:00
Zdenek Kabelac
57f4dfc653 Reduce preallocated stack size
Go with just 64KiB for stack.

Closer inspection should be made, whether we actually need to play with
settings at all.

Since default stack size is 8MB and gets mapped via page locking thus,
it seems there is no big help with preallocation of stack to some value.
2011-10-11 09:13:39 +00:00
Zdenek Kabelac
d4f134b8f6 Check for refresh_filter failure
Properly detect if the filters were refreshed properly.

(May needs few more fixes ??)

Filter refresh may fail because it may be out of free file descriptors
when clvmd gets overloaded.
2011-10-11 09:09:00 +00:00
Zdenek Kabelac
8187aff8b9 Add missing log_error for alloc failure 2011-10-11 09:06:09 +00:00
Zdenek Kabelac
df251f14dc Use shorter way for if() 2011-10-11 09:03:33 +00:00
Zdenek Kabelac
3df790d9fd Skip backtrace after log_error 2011-10-11 09:02:20 +00:00
Zdenek Kabelac
2abe28a8c6 Replace with debug
Since the dm_tree_create already reports reason of error,
use log_debug for this message.
2011-10-11 09:01:38 +00:00
Zdenek Kabelac
de75bc6688 Improve backtrace reporting
Add <backtrace> so the function appears logged for the fail path.
2011-10-11 08:59:42 +00:00
Zdenek Kabelac
4007ac814f Change message severity
Using log_warn to report missing symlinks as warning, since the command
itself returns as successful, we should not produce log_error().
log_warn is better fit here.
2011-10-11 08:57:13 +00:00
Zdenek Kabelac
409bf6e6d8 Skip r assignment
Cosmetic, since r is already 0 for the error path, no need to assign it there,
and r is assigned to 1 after switch command.
Also makes the code more readable.
2011-10-11 08:54:01 +00:00
Zdenek Kabelac
5940327f3a Reindent some thin functions 2011-10-11 08:51:56 +00:00
Jonathan Earl Brassow
f60175c308 Add the ability to convert LVs of "mirror" segtype to "raid1" segtype.
Example:
~> lvconvert --type raid1 vg/mirror_lv

Steps to convert "mirror" to "raid1"
1) Allocate a RAID metadata LV for each mirror image from the same PVs
   on which they are located.
2) Clear the metadata LVs.  This involves writing LVM metadata, so we don't
   change any aspects of the mirror LV before this so that the user can easily
   remove LVs from the failed convert attempt while retaining the original
   mirror.
3) Remove the mirror log, if it exists.
4) Add metadata LVs to mirror LV
5) Rename mirror sub-lvs (s/mimage/rimage/)
6) Change flags and segtype from mirror to raid1
2011-10-07 14:56:01 +00:00
Jonathan Earl Brassow
d3582e0252 Add the ability to convert linear LVs to RAID1
Example:
~> lvconvert --type raid1 -m 1 vg/lv

The following steps are performed to convert linear to RAID1:
1) Allocate a metadata device from the same PV as the linear device
   to provide the metadata/data LV pair required for all RAID components.
2) Allocate the required number of metadata/data LV pairs for the
   remaining additional images.
3) Clear the metadata LVs.  This performs a LVM metadata update.
4) Create the top-level RAID LV and add the component devices.

We want to make any failure easy to unwind.  This is why we don't create the
top-level LV and add the components until the last step.  Should anything
happen before that, the user could simply remove the unnecessary images.  Also,
we want to ensure that the metadata LVs are cleared before forming the array to
prevent stale information from polluting the new array.

A new macro 'seg_is_linear' was added to allow us to distinguish linear LVs
from striped LVs.
2011-10-07 14:52:26 +00:00
Jonathan Earl Brassow
a80192b6a7 Allow 'nosync' extension of mirrors.
This patch allows a mirror to be extended without an initial resync of the
extended portion.  It compliments the existing '--nosync' option to lvcreate.
This action can be done implicitly if the mirror was created with the '--nosync'
option, or explicitly if the '--nosync' option is used when extending the device.

Here are the operational criteria:
1) A mirror created with '--nosync' should extend with 'nosync' implicitly
[EXAMPLE]# lvs vg; lvextend -L +5G vg/lv ; lvs vg
  LV   VG   Attr     LSize Pool Origin Snap%  Move Log     Copy%  Convert
  lv   vg   Mwi-a-m- 5.00g                         lv_mlog 100.00
  Extending 2 mirror images.
  Extending logical volume lv to 10.00 GiB
  Logical volume lv successfully resized
  LV   VG   Attr     LSize  Pool Origin Snap%  Move Log     Copy%  Convert
  lv   vg   Mwi-a-m- 10.00g                         lv_mlog 100.00

2) The 'M' attribute ('M' signifies a mirror created with '--nosync', while 'm'
signifies a mirror created w/o '--nosync') must be preserved when extending a
mirror created with '--nosync'.  See #1 for example of 'M' attribute.

3) A mirror created without '--nosync' should extend with 'nosync' only when
'--nosync' is explicitly used when extending.
[EXAMPLE]# lvs vg; lvextend -L +5G vg/lv; lvs vg
  LV   VG   Attr     LSize  Pool Origin Snap%  Move Log     Copy%  Convert
  lv   vg   mwi-a-m- 20.00m                         lv_mlog 100.00
  Extending 2 mirror images.
  Extending logical volume lv to 5.02 GiB
  Logical volume lv successfully resized
  LV   VG   Attr     LSize Pool Origin Snap%  Move Log     Copy%  Convert
  lv   vg   mwi-a-m- 5.02g                         lv_mlog   0.39
vs.
[EXAMPLE]# lvs vg; lvextend -L +5G vg/lv --nosync; lvs vg
  LV   VG   Attr     LSize  Pool Origin Snap%  Move Log     Copy%  Convert
  lv   vg   mwi-a-m- 20.00m                         lv_mlog 100.00
  Extending 2 mirror images.
  Extending logical volume lv to 5.02 GiB
  Logical volume lv successfully resized
  LV   VG   Attr     LSize Pool Origin Snap%  Move Log     Copy%  Convert
  lv   vg   Mwi-a-m- 5.02g                         lv_mlog 100.00

4) The 'm' attribute must change to 'M' when extending a mirror created without
'--nosync' is extended with the '--nosync' option.  (See #3 examples above.)

5) An inactive mirror's sync percent cannot be determined definitively, so it
must not be allowed to skip resync.  Instead, the extend should ask the user if
they want to extend while performing a resync.
[EXAMPLE]# lvchange -an vg/lv
[EXAMPLE]# lvextend -L +5G vg/lv
  Extending 2 mirror images.
  Extending logical volume lv to 10.00 GiB
  vg/lv is not active.  Unable to get sync percent.
Do full resync of extended portion of vg/lv?  [y/n]: y
  Logical volume lv successfully resized

6) A mirror that is performing recovery (as opposed to an initial sync) - like
after a failure - is not allowed to extend with either an implicit or
explicit nosync option.  [You can simulate this with a 'corelog' mirror because
when it is reactivated, it must be recovered every time.]
[EXAMPLE]# lvcreate -m1 -L 5G -n lv vg --nosync --corelog
  WARNING: New mirror won't be synchronised. Don't read what you didn't write!
  Logical volume "lv" created
[EXAMPLE]# lvs vg
  LV   VG   Attr     LSize Pool Origin Snap%  Move Log Copy%  Convert
  lv   vg   Mwi-a-m- 5.00g                             100.00
[EXAMPLE]# lvchange -an vg/lv; lvchange -ay vg/lv; lvs vg
  LV   VG   Attr     LSize Pool Origin Snap%  Move Log Copy%  Convert
  lv   vg   Mwi-a-m- 5.00g                               0.08
[EXAMPLE]# lvextend -L +5G vg/lv
  Extending 2 mirror images.
  Extending logical volume lv to 10.00 GiB
  vg/lv cannot be extended while it is recovering.

7) If 'no' is selected in #5 or if the condition in #6 is hit, it should not
result in the mirror being resized or the 'm/M' attribute being changed.


NOTE:  A mirror created with '--nosync' behaves differently than one created
without it when performing an extension.  The former cannot be extended when
the mirror is recovering (unless in-active), while the latter can.  This is
a reasonable thing to do since recovery of a mirror doesn't take long (at
least in the case of an on-disk log) and it would cause far more time in
degraded mode if the extension w/o '--nosync' was allowed.  It might be
reasonable to add the ability to force the operation in the future.  This
should /not/ force a nosync extension, but rather force a sync'ed extension.
IOW, the user would be saying, "Yes, yes... I know recovery won't take long
and that I'll be adding significantly to the time spent in degraded mode, but
I need the extra space right now!".
2011-10-06 15:32:26 +00:00
Jonathan Earl Brassow
b19f01212e Fix splitmirror in cluster having different DM/LVM views of storage.
This patch also does some clean-up of the splitmirrors code.

I've attempted to clean-up the splitmirrors code to make it easier to
understand with fewer operations.  I've tried to reduce the number of
metadata operations without compromising the intermediate stages which
are necessary for easy clean-up in the even of failure.

These changes now correctly handle cluster situations - including exclusive
cluster mirrors.  Whereas before, a splitmirror operation would result in
remote nodes having LVM commands report the newly split LV with a proper
name while DM commands would report the old (pre-split) names of the device.
IOW, there was a kernel/userspace mismatch.
2011-10-06 14:55:39 +00:00
Jonathan Earl Brassow
6c0b0e5d9a Revert initial solution to bug 733114 - I/O error message during splitmirror
The original commit comments can be located via this git commit ID:
	7d8e615c0b

There were three possible solutions to the original problem proposed in the
initial check-in.  The one chosen was as follows:
    2) Do like _remove_mirror_images does and suspend the original, then suspend
    the sub-lv (the error target), then resume the sub-lv, and finally resume the
    original LV.  This seems like extra pointless operations to me, but it doesn't
    produce the error message (although, I'm not sure why) and it allows us to
    leave the visible flag in place.
Turns out, the cluster also views the extra suspend/resume operations as
pointless too and ignores them.  So, this solution doesn't work in a cluster.
Further, I've noticed that in addition to the remote cluster nodes still getting
I/O errors from scanning the error target, they also have a different LVM and
DM views of the same LV.  IOW, while the LVM level (gotten from the LVM metadata)
sees the correct name for the newly split LV, device-mapper still maintains the
old names.

Because the original fix failed to completely fix the problem (or work-around it)
and because a better solution must be found to address the additional cluster
issue of device renaming, I am reverting the above mentioned commit.
2011-10-06 14:49:16 +00:00
Jonathan Earl Brassow
83c606ae30 This patch fixes issues with improper udev flags on sub-LVs.
The current code does not always assign proper udev flags to sub-LVs (e.g.
mirror images and log LVs).  This shows up especially during a splitmirror
operation in which an image is split off from a mirror to form a new LV.

A mirror with a disk log is actually composed of 4 different LVs: the 2
mirror images, the log, and the top-level LV that "glues" them all together.
When a 2-way mirror is split into two linear LVs, two of those LVs must be
removed.  The segments of the image which is not split off to form the new
LV are transferred to the top-level LV.  This is done so that the original
LV can maintain its major/minor, UUID, and name.  The sub-lv from which the
segments were transferred gets an error segment as a transitory process
before it is eventually removed.  (Note that if the error target was not put
in place, a resume_lv would result in two LVs pointing to the same segment!
If the machine crashes before the eventual removal of the sub-LV, the result
would be a residual LV with the same mapping as the original (now linear) LV.)
So, the two LVs that need to be removed are now the log device and the sub-LV
with the error segment.  If udev_flags are not properly set, a resume will
cause the error LV to come up and be scanned by udev.  This causes I/O errors.
Additionally, when udev scans sub-LVs (or former sub-LVs), it can cause races
when we are trying to remove those LVs.  This is especially bad during failure
conditions.

When the mirror is suspended, the top-level along with its sub-LVs are
suspended.  The changes (now 2 linear devices and the yet-to-be-removed log
and error LV) are committed.  When the resume takes place on the original
LV, there are no longer links to the other sub-lvs through the LVM metadata.
The links are implicitly handled by querying the kernel for a list of
dependencies.  This is done in the '_add_dev' function (which is recursively
called for each dependency found) - called through the following chain:
	_add_dev
	dm_tree_add_dev_with_udev_flags
	<*** DM / LVM divide ***>
	_add_dev_to_dtree
	_add_lv_to_dtree
	_create_partial_dtree
	_tree_action
	dev_manager_activate
	_lv_activate_lv
	_lv_resume
	lv_resume_if_active
When udev flags are calculated by '_get_udev_flags', it is done by referencing
the 'logical_volume' structure.  Those flags are then passed down into
'dm_tree_add_dev_with_udev_flags', which in turn passes them to '_add_dev'.
Unfortunately, when '_add_dev' is finding the dependencies, it has no way to
calculate their proper udev_flags.  This is because it is below the DM/LVM
divide - it doesn't have access to the logical_volume structure.  In fact,
'_add_dev' simply reuses the udev_flags given for the initial device!  This
virtually guarentees the udev_flags are wrong for all the dependencies unless
they are reset by some other mechanism.  The current code provides no such
mechanism.  Even if '_add_new_lv_to_dtree' were called on the sub-devices -
which it isn't - entries already in the tree are simply passed over, failing
to reset any udev_flags.  The solution must retain its implicit nature of
discovering dependencies and be able to go back over the dependencies found
to properly set the udev_flags.

My solution simply calls a new function before leaving '_add_new_lv_to_dtree'
that iterates over the dtree nodes to properly reset the udev_flags of any
children.  It is important that this function occur after the '_add_dev' has
done its job of querying the kernel for a list of dependencies.  It is this
list of children that we use to look up their respective LVs and properly
calculate the udev_flags.

This solution has worked for single machine, cluster, and cluster w/ exclusive
activation.
2011-10-06 14:45:40 +00:00
Zdenek Kabelac
151ed8d935 Add more validation to config parser
Do not leave it for vgvalidate().
2011-10-06 11:06:36 +00:00
Zdenek Kabelac
565a4bfc49 Move defines to header
Make limits for thin data_block_size and device_id part of public API.

FIXME: read them possible from some kernel header file in the future ?
But we may need to support different values for different versions ?
2011-10-06 11:05:56 +00:00
Zdenek Kabelac
c0b9c64a77 Use capital letters 2011-10-04 12:39:59 +00:00
Zdenek Kabelac
01ef6510b0 Missed rename pool->thin_pool
Fix compilation
2011-10-03 19:10:52 +00:00
Zdenek Kabelac
04a4715cb8 Add code to activate thin target
Code to zero pool metadata lv when pool is created.
Add code to create thin target via message sending.

(Revert is missing)
2011-10-03 18:43:39 +00:00
Zdenek Kabelac
d35a117e4b Add simple function for lookup of some free device_id
Initial simple implementation for finding some free device_id.
2011-10-03 18:39:17 +00:00
Zdenek Kabelac
a00cb3a6b0 Add lvm functions for sending messages.
Functions are currently only needed for thin provissioning.
2011-10-03 18:37:47 +00:00
Zdenek Kabelac
97bde15a9f Display transaction_id for thin_pool 2011-10-03 18:31:03 +00:00
Zdenek Kabelac
1419bf1c98 Transaction_id is property of thin_pool
Remove Transaction_id from thin target.
Store device_id for thin target.
2011-10-03 18:26:07 +00:00
Zdenek Kabelac
87663d5f88 Add preload support for thin and thin_pool 2011-10-03 18:24:47 +00:00
Zdenek Kabelac
38796c3d47 Fix bad error message for thinp validation 2011-09-29 09:03:36 +00:00
Zdenek Kabelac
aebf2d5cdc Add experimental code for activation of thinp targets
No dm messages yes - just a base functionality in the steps of other targets.
For now usable only for debugging and tracing.
2011-09-29 08:56:38 +00:00
Alasdair Kergon
10d0d9c7c4 Introduce revert_lv for better pvmove cleanup.
(One further fix needed to remove the stray pvmove LVs left behind.)
2011-09-27 22:43:40 +00:00
Alasdair Kergon
1c26860d82 Abort if _finish_pvmove suspend_lvs fails instead of cleaning up incompletely.
Change suspend_lvs to call vg_revert internally.
Change vg_revert to void and remove superfluous calls after failed vg_commit.
2011-09-27 17:09:42 +00:00
Alasdair Kergon
d71fd30e5d typo 2011-09-27 12:34:14 +00:00
Alasdair Kergon
7c67d33dd4 correct thin_pool width 2011-09-27 12:33:36 +00:00
Zdenek Kabelac
1d526c8585 Show some Thin related info in lvdisplay 2011-09-26 13:11:02 +00:00
Peter Rajnoha
c3e5b4976d Add log_error even for general device in use when we can't do the sysfs checks. 2011-09-26 10:17:51 +00:00
Zdenek Kabelac
f1ab501a58 Fix log_error() usage
Cosmetic - skip <bactrace> when error has been just printed in raid segtype.
Add missing log_error if allocation would fail for unknown segtype.
2011-09-24 21:19:30 +00:00
Jonathan Earl Brassow
efa3621a59 Add 'Volume Type' lv_attr characters for RAID and RAID_IMAGE.
RAID_META is already handled.
2011-09-23 15:17:54 +00:00
Peter Rajnoha
9fa1d30a1c Add activation/retry_deactivation to lvm.conf to retry deactivation of an LV. 2011-09-22 17:39:56 +00:00
Peter Rajnoha
125712bea0 Replace open_count check with holders/mounted_fs check on lvremove path.
Before, we used to display "Can't remove open logical volume" which was
generic. There 3 possibilities of how a device could be opened:
  - used by another device
  - having a filesystem on that device which is mounted
  - opened directly by an application

With the help of sysfs info, we can distinguish the first two situations.
The third one will be subject to "remove retry" logic - if it's opened
quickly (e.g. a parallel scan from within a udev rule run), this will
finish quickly and we can remove it once it has finished. If it's a
legitimate application that keeps the device opened, we'll do our best
to remove the device, but we will fail finally after a few retries.
2011-09-22 17:33:50 +00:00
Jonathan Earl Brassow
40c85cf1d7 When up-converting a RAID1 array, we need to allocate new larger arrays for
seg->areas and seg->meta_areas.  We also need to copy the memory from the
old arrays to the newly allocated arrays.  The amount of memory to copy was
determined by seg->area_count.  However, seg->area_count was being set to the
higher value after copying the 'seg->areas' information, but before copying
the 'seg->meta_areas' information.  This means we were copying more memory
than necessary for 'seg->meta_areas' - something that could lead to a segfault.
2011-09-22 15:33:21 +00:00
Zdenek Kabelac
ce840163c0 Revert patch
Caller of exec must report log_error when rstatus is passed.
2011-09-19 18:38:43 +00:00
Zdenek Kabelac
4eeff46bf2 Use log_error instead of log_verbose when executed command fails 2011-09-19 14:54:23 +00:00
Jonathan Earl Brassow
4026cb6fd1 fix compiler warning.
Compiler says variable may be used uninitialized.  It can't be, but we
initialize the variable to NULL anyway.  Also, remove the double initialization
of another variable.
2011-09-19 14:28:23 +00:00
Zdenek Kabelac
5f3f06db66 Move debug message
so it does not look like we are executing command in the middle of
critical_section in log trace.
2011-09-19 12:48:02 +00:00
Jonathan Earl Brassow
eb607100ef Fix Bug 738832 - core to disk log conversion fails with internal error
This bug showed up when trying to add a log to a mirror whose images are on
multiple devices.  This is an intra-release regression and no WHATS_NEW
entry will be added.  The error was introduce in the following commit:
	2d8a2f35c7

The solution is to recognise in _alloc_init that if there are no mirrors
or stripes specified, then 'new_extents' should be zero.
2011-09-16 18:39:03 +00:00