1
0
mirror of git://sourceware.org/git/lvm2.git synced 2024-10-28 20:25:52 +03:00
Commit Graph

85 Commits

Author SHA1 Message Date
Zdenek Kabelac
84cdf85bd2 cleanup: constify activation usage of lv pointer
Let's enforce cheking of write access to LV by compiler.
Activation part does never need to write anything to LV
so keep LV pointer const.
2014-09-24 10:54:47 +02:00
Alasdair G Kergon
979be63f25 mirrors: Fix checks for mirror/raid/pvmove LVs.
Try to enforce consistent macro usage along these lines:

lv_is_mirror - mirror that uses the original dm-raid1 implementation
               (segment type "mirror")
lv_is_mirror_type - also includes internal mirror image and log LVs

lv_is_raid - raid volume that uses the new dm-raid implementation
             (segment type "raid")
lv_is_raid_type - also includes internal raid image / log / metadata LVs

lv_is_mirrored - LV is mirrored using either kernel implementation
                 (excludes non-mirror modes like raid5 etc.)

lv_is_pvmove - internal pvmove volume
2014-09-16 00:13:46 +01:00
Zdenek Kabelac
ae08a3a294 cleanup: skip unused assign
Reset of tmp_names is only needed in else{} path.
2014-09-12 13:51:31 +02:00
Zdenek Kabelac
07b3e6cd74 cleanup: avoid strlen() we know max size
Just use max NAME_LEN size buffer and copy the name.
2014-09-12 13:51:31 +02:00
Zdenek Kabelac
ab7977de7b cleanup: simplify _extract_image_components
Reorder test - first check for writable flag and then allocate.
2014-09-12 13:51:31 +02:00
Zdenek Kabelac
6898131091 cleanup: missing error message 2014-09-12 13:51:31 +02:00
Zdenek Kabelac
3e57143abd cleanup: better error messages 2014-09-12 13:51:30 +02:00
Zdenek Kabelac
08914ed7c1 raid: destroy allocation handle on error path
Don't leak ah memory pool on error path.
2014-09-12 13:51:30 +02:00
Zdenek Kabelac
76c3c94bd2 cleanup: update _alloc_image_component function
Return allocated volume directly instead of 1/0.
2014-09-12 13:51:30 +02:00
Zdenek Kabelac
126463ad1f cleanup: plain code reindent
Just simple reindent and brace changes.
2014-09-12 13:51:30 +02:00
Zdenek Kabelac
ad376e9e00 debug: add missing stack trace on error path 2014-09-12 13:51:29 +02:00
Zdenek Kabelac
c10c16cc35 raid: use _generate_raid_name
Use new function to get implicit name validation
(so we do not exit with internal error on metadata validation).
2014-09-12 13:51:29 +02:00
Zdenek Kabelac
2db0312455 raid: add function for name creation
Add name for construction and validation of raid subvolume
name with a given suffix.

TODO: check if reusable for mirrors as well.
2014-09-12 13:51:29 +02:00
Zdenek Kabelac
40b7b107b1 raid: check result of get_segtype_from_string
Error here is rather highly unpexpected for these types, but
stay consistent with rest of the code and don't use unchecked value.
2014-09-12 13:45:50 +02:00
Zdenek Kabelac
08bde75093 raid: add missing archive call
Before starting to update raid metadata, archive existing unmodified one.
2014-09-12 13:45:49 +02:00
Zdenek Kabelac
569184a3bb raid: add missing vg_revert
After failing vg_write() and suspend_lv() there was missing vg_revert() call.
2014-09-12 13:45:14 +02:00
Zdenek Kabelac
dd1fa0e808 raid: add missing backups
Add backup() calls that were missing after successful update
of metadata.
2014-09-12 13:42:57 +02:00
Zdenek Kabelac
c710f02e01 lv_update_and_reload: replace code sequence
Use lv_update_and_reload() and lv_update_and_reload_origin()
to handle write/suspend/commit/resume sequence.

In few places this properly handle vg_revert() after suspend failure,
and also ensures there is metadata backup after successful vg_commit().
2014-09-09 19:20:09 +02:00
Alasdair G Kergon
99e3c13012 raid: Moved degraded activation code to raid_manip.
Adjust some messages & fn names.
2014-07-22 20:50:29 +01:00
Jonathan Brassow
ed3c2537b8 raid: Allow repair to reuse PVs from same image that suffered a PV failure
When repairing RAID LVs that have multiple PVs per image, allow
replacement images to be reallocated from the PVs that have not
failed in the image if there is sufficient space.

This allows for scenarios where a 2-way RAID1 is spread across 4 PVs,
where each image lives on two PVs but doesn't use the entire space
on any of them.  If one PV fails and there is sufficient space on the
remaining PV in the image, the image can be reallocated on just the
remaining PV.
2014-06-25 22:26:06 -05:00
Jonathan Brassow
b35fb0b15a raid/misc: Allow creation of parallel areas by LV vs segment
I've changed build_parallel_areas_from_lv to take a new parameter
that allows the caller to build parallel areas by LV vs by segment.
Previously, the function created a list of parallel areas for each
segment in the given LV.  When it came time for allocation, the
parallel areas were honored on a segment basis.  This was problematic
for RAID because any new RAID image must avoid being placed on any
PVs used by other images in the RAID.  For example, if we have a
linear LV that has half its space on one PV and half on another, we
do not want an up-convert to use either of those PVs.  It should
especially not wind up with the following, where the first portion
of one LV is paired up with the second portion of the other:
------PV1-------  ------PV2-------
[ 2of2 image_1 ]  [ 1of2 image_1 ]
[ 1of2 image_0 ]  [ 2of2 image_0 ]
----------------  ----------------
Previously, it was possible for this to happen.  The change makes
it so that the returned parallel areas list contains one "super"
segment (seg_pvs) with a list of all the PVs from every actual
segment in the given LV and covering the entire logical extent range.

This change allows RAID conversions to function properly when there
are existing images that contain multiple segments that span more
than one PV.
2014-06-25 21:20:41 -05:00
Peter Rajnoha
3208396ce5 coverity: fix issues reported by coverity 2014-06-24 14:58:53 +02:00
Peter Rajnoha
cfed0d09e8 report: select: refactor: move percent handling code to libdm for reuse 2014-06-17 16:27:21 +02:00
Zdenek Kabelac
9240aca369 raid: cleanup error messages
Add log_error messages on error paths.
2014-05-27 17:08:49 +02:00
Jonathan Brassow
6c6468f91d RAID: Improve an error message
When down-converting a RAID1 LV, if the user specifies too few devices,
they will get a confusing message.
Ex:
[root]# lvcreate -m 2 --type raid1 -n raid -L 500M taft
  Logical volume "raid" created

[root]# lvconvert -m 0 taft/raid /dev/sdd1
  Unable to extract enough images to satisfy request
  Failed to extract images from taft/raid

This patch makes the error message a bit clearer by telling the user
the count they are trying to remove and the number of devices they
supplied.

[root@bp-01 lvm2]# lvcreate --type raid1 -m 3 -L 200M -n lv vg
  Logical volume "lv" created

[root@bp-01 lvm2]# lvconvert -m -3 vg/lv /dev/sdb1
  Unable to remove 3 images:  Only 1 device given.
  Failed to extract images from vg/lv

[root@bp-01 lvm2]# lvconvert -m -3 vg/lv /dev/sd[bc]1
  Unable to remove 3 images:  Only 2 devices given.
  Failed to extract images from vg/lv

[root@bp-01 lvm2]# lvconvert -m -3 vg/lv /dev/sd[bcd]1
[root@bp-01 lvm2]# lvs -a -o name,attr,devices vg
  LV   Attr       Devices
  lv   -wi-a----- /dev/sde1(1)

This patch doesn't work in all cases.  The user can specify the right
number of devices, but not a sufficient amount of devices from the LV.
This will produce the old error message:
[root@bp-01 lvm2]# lvconvert -m -3 vg/lv /dev/sd[bcf]1
  Unable to extract enough images to satisfy request
  Failed to extract images from vg/lv
However, I think this error message is sufficient for this case.
2014-04-03 16:57:41 -05:00
Jonathan Brassow
4b6e3b5e5e allocation: Allow approximate allocation when specifying size in percent
Introduce a new parameter called "approx_alloc" that is set when the
desired size of a new LV is specified in percentage terms.  If set,
the allocation code tries to get as much space as it can but does not
fail if can at least get some.

One of the practical implications is that users can now specify 100%FREE
when creating RAID LVs, like this:
~> lvcreate --type raid5 -i 2 -l 100%FREE -n lv vg
2014-02-13 21:10:28 -06:00
Zdenek Kabelac
ef6c5795a0 raid: add temporary activation for raid metadata clear
Use LV_TEMPORARY when activating devices for clearing
raid metadata.
2014-02-04 14:51:05 +01:00
Zdenek Kabelac
8c96afd361 cleanup: use compound literals for wipe_lv
Optimize and cleanup recently introduced new function wipe_lv.
Use compound literals to get nicely initialized wipe_params struct.
Pass in lv as explicit argument for wipe_lv.
Use cmd from lv structure.
Initialize only non-null members so it's easy to see what
is the special arg.
2013-11-28 12:45:52 +01:00
Peter Rajnoha
b6dab4e059 lv_manip: rename set_lv -> wipe_lv and include signature wiping capability
Use common wipe_lv (former set_lv) fn to do zeroing as well as signature
wiping if needed. Provide new struct wipe_lv_params to define the
functionality.

Bind "lvcreate -W/--wipesignatures y" with proper wipe_lv call.

Also, add "yes" and "force" to lvcreate_params so it's possible
to apply them for the prompt: "WARNING: %s detected on %s. Wipe it? [y/n]".
2013-11-27 15:48:15 +01:00
Jonathan Brassow
2691f1d764 RAID: Make RAID single-machine-exclusive capable in a cluster
Creation, deletion, [de]activation, repair, conversion, scrubbing
and changing operations are all now available for RAID LVs in a
cluster - provided that they are activated exclusively.

The code has been changed to ensure that no LV or sub-LV activation
is attempted cluster-wide.  This includes the often overlooked
operations of activating metadata areas for the brief time it takes
to clear them.  Additionally, some 'resume_lv' operations were
replaced with 'activate_lv_excl_local' when sub-LVs were promoted
to top-level LVs for removal, clearing or extraction.  This was
necessary because it forces the appropriate renaming actions the
occur via resume in the single-machine case, but won't happen in
a cluster due to the necessity of acquiring a lock first.

The *raid* tests have been updated to allow testing in a cluster.
For the most part, this meant creating devices with '-aey' if they
were to be converted to RAID.  (RAID requires the converting LV to
be EX because it is a condition of activation for the RAID LV in
a cluster.)
2013-09-10 16:33:22 -05:00
Jonathan Brassow
ca51435153 Misc/RAID: Enable resume_lv to handle some renaming conflicts.
When images and their associated metadata are removed from a RAID1 LV,
the remaining sub-LVs are "shifted" down to fill the gaps.  For
example, if there is a 3-way mirror:
	[0][1][2]
and we remove device#0, the devices will be shifted down
	[1][2]
and renamed.
	[0][1]

This can create a problem for resume_lv (specifically,
dm_tree_activate_children) during the renaming process though.  This
is because it will attempt to rename the higher indexed sub-LVs first
and find that it cannot because there are currently other sub-LVs with
that name.  The solution is to check for a conflicting name before
attempting to rename.  If a conflict is found and that conflicting
sub-LV is also in the process of renaming, we can defer the current
rename until the conflicting sub-LV has renamed and cleared the
conflict.

Now that resume_lv can handle these types of rename conflicts, we can
remove the workaround in RAID that was attempting to resume a RAID1
LV from the bottom-up in order to force a proper rename in assending
order before attempting a resume on the top-level LV.  This "hack"
only worked for single machine use-cases of LVM.  Clearing this up
paves the way for exclusive activation of RAID LVs in a cluster.
2013-09-09 15:07:28 -05:00
Jonathan Brassow
f1e3640df3 Misc: Make get_pv_list_for_lv() available to more than just RAID
The function 'get_pv_list_for_lv' will assemble all the PVs that are
used by the specified LV.  It uses 'for_each_sub_lv' to traverse all
of the sub-lvs which may compose it.
2013-08-23 08:40:13 -05:00
Jonathan Brassow
06ac797f42 Clean-up: Replace 'lv_is_active' with more correct/specific variants
There are places where 'lv_is_active' was being used where it was
more correct to use 'lv_is_active_locally'.  For example, when checking
for the existance of a kernel instance before asking for its status.
Most of the time these would work correctly.  (RAID is only allowed on
non-clustered VGs at the moment, which means that 'lv_is_active' and
'lv_is_active_locally' would give the same result.)  However, it is
more correct to use the proper variant and it helps with future
scenarios where targets might be allowed exclusively (or clustered) in
a cluster VG.
2013-05-16 10:36:56 -05:00
Zdenek Kabelac
dd4fdce16c cleanup: drop unused assignment
Assigned values are unused.
2013-04-21 23:14:04 +02:00
Zdenek Kabelac
5e7eae59da lv_manip: check remove_seg_from_segs_using_this_lv()
Add missing check for result of remove_seg_from_segs_using_this_lv().
Failure is reported as internal error.
2013-04-21 23:10:43 +02:00
Zdenek Kabelac
24f8daa13d raid: test for target_pvs
If target_pvs is NULL do not call lv_is_on_pvs()
2013-04-21 23:07:00 +02:00
Jonathan Brassow
2e0740f7ef RAID: Add writemostly/writebehind support for RAID1
'lvchange' is used to alter a RAID 1 logical volume's write-mostly and
write-behind characteristics.  The '--writemostly' parameter takes a
PV as an argument with an optional trailing character to specify whether
to set ('y'), unset ('n'), or toggle ('t') the value.  If no trailing
character is given, it will set the flag.
Synopsis:
        lvchange [--writemostly <PV>:{t|y|n}] [--writebehind <count>] vg/lv
Example:
        lvchange --writemostly /dev/sdb1:y --writebehind 512 vg/raid1_lv

The last character in the 'lv_attr' field is used to show whether a device
has the WriteMostly flag set.  It is signified with a 'w'.  If the device
has failed, the 'p'artial flag has priority.

Example ("nosync" raid1 with mismatch_cnt and writemostly):
[~]# lvs -a --segment vg
  LV                VG   Attr      #Str Type   SSize
  raid1             vg   Rwi---r-m    2 raid1  500.00m
  [raid1_rimage_0]  vg   Iwi---r--    1 linear 500.00m
  [raid1_rimage_1]  vg   Iwi---r-w    1 linear 500.00m
  [raid1_rmeta_0]   vg   ewi---r--    1 linear   4.00m
  [raid1_rmeta_1]   vg   ewi---r--    1 linear   4.00m

Example (raid1 with mismatch_cnt, writemostly - but failed drive):
[~]# lvs -a --segment vg
  LV                VG   Attr      #Str Type   SSize
  raid1             vg   rwi---r-p    2 raid1  500.00m
  [raid1_rimage_0]  vg   Iwi---r--    1 linear 500.00m
  [raid1_rimage_1]  vg   Iwi---r-p    1 linear 500.00m
  [raid1_rmeta_0]   vg   ewi---r--    1 linear   4.00m
  [raid1_rmeta_1]   vg   ewi---r-p    1 linear   4.00m

A new reportable field has been added for writebehind as well.  If
write-behind has not been set or the LV is not RAID1, the field will
be blank.
Example (writebehind is set):
[~]# lvs -a -o name,attr,writebehind vg
  LV            Attr      WBehind
  lv            rwi-a-r--     512
  [lv_rimage_0] iwi-aor-w
  [lv_rimage_1] iwi-aor--
  [lv_rmeta_0]  ewi-aor--
  [lv_rmeta_1]  ewi-aor--

Example (writebehind is not set):
[~]# lvs -a -o name,attr,writebehind vg
  LV            Attr      WBehind
  lv            rwi-a-r--
  [lv_rimage_0] iwi-aor-w
  [lv_rimage_1] iwi-aor--
  [lv_rmeta_0]  ewi-aor--
  [lv_rmeta_1]  ewi-aor--
2013-04-15 13:59:46 -05:00
Zdenek Kabelac
2e39392daf cleanup: remove unused lvl_idx 2013-04-12 11:26:31 +02:00
Jonathan Brassow
dc2ce71313 clean-up: Remove a FIXME question that has been settled
It is ok for us to use the shorthand 'lv_is_virtual' to detect error
targets in a RAID LV when searching for candidates for device replacement.
2013-02-20 15:03:58 -06:00
Jonathan Brassow
bd0ee420b5 RAID: Allow remove/replace of sub-LVs composed of error segments.
When a device fails, we may wish to replace those segments with an
error segment.  (Like when a 'vgreduce --removemissing' removes a
failed device that happens to be a RAID image/meta.)  We are then left
with images that we will eventually want to remove or replace.

This patch allows us to pull out these virtual "error" sub-LVs.  This
allows a user to 'lvconvert -m -1 vg/lv' to extract the bad sub-LVs.
Sub-LVs with error segments are considered for extraction before other
possible devices so that good devices are not accidentally removed.

This patch also adds the ability to replace RAID images that contain error
segments.  The user will still be unable to run 'lvconvert --replace'
because there is no way to address the 'error' segment (i.e. no PV
that it is associated with).  However, 'lvconvert --repair' can be
used to replace the image's error segment with a new PV.  This is also
the most appropriate way to do it, since the LV will continue to be
reported as 'partial'.
2013-02-20 14:58:56 -06:00
Jonathan Brassow
845852d6b4 RAID: Make 'vgreduce --removemissing' work with RAID LVs
Currently it is impossible to remove a failed PV which has a RAID LV
on it.  This patch fixes the issue by replacing the failed PV with an
'error' segment within the affected sub-LVs.  Once there is no longer
a RAID LV using the PV, it can be removed.

Most often, it is better to replace a failed RAID device with a spare.
(You can use 'lvconvert --repair <vg>/<LV>' to accomplish that.)
However, if there are no spares in the volume group and none will be
added, it is useful to be able to removed the failed device.

Following patches address the ability to perform 'lvconvert' operations
on RAID LVs that contain sub-LVs composed of 'error' segments.
2013-02-20 14:52:46 -06:00
Jonathan Brassow
0e4ffd9d3b clean-up: Rename lvm.conf setting 'mirror_region_size' to 'raid_region_size'
We have been using 'mirror_region_size' in lvm.conf as the default region
size for RAID logical volumes as well as mirror logical volumes.  Since,
"raid" is more inclusive and representative than "mirror", I have changed
the name of this setting.  We must still check for the old setting and warn
the user if we are overriding it with the new setting if both happen to be
present.
2013-02-20 14:40:17 -06:00
Alasdair G Kergon
06abb2dd4c logging: classify log_debug messages
Place most log_debug() messages into a class.
2013-01-07 22:30:29 +00:00
Jonathan Brassow
970dfbcd69 RAID: Limit replacement of devices when array is not in-sync.
If a RAID array is not in-sync, replacing devices should not be allowed
as a general rule.  This is because the contents used to populate the
incoming device may be undefined because the devices being read where
not in-sync.  The kernel enforces this rule unless overridden by not
allowing the creation of an array that is not in-sync and includes a
devices that needs to be rebuilt.

Since we cannot know the sync state of an LV if it is inactive, we must
also enforce the rule that an array must be active to replace devices.

That leaves us with the following conditions:
1) never allow replacement or repair of devices if the LV is in-active
2) never allow replacement if the LV is not in-sync
3) allow repair if the LV is not in-sync, but warn that contents may
   not be recoverable.

In the case where a user is performing the repair on the command line via
'lvconvert --repair', the warning is printed before the user is prompted
if they would like to replace the device(s).  If the repair is automated
(i.e. via dmeventd and policy is "allocate"), then the device is replaced
if possible and the warning is printed.
2012-12-18 14:40:42 -06:00
Jonathan Brassow
fb0cee9a66 RAID: Do not allow --splitmirrors on RAID10 logical volumes.
RAID10 does not have the ability to split off images for independent
use.  So, 'lvconvert --splitmirrors' will not work and must be
disallowed.
2012-11-21 18:39:26 -06:00
Zdenek Kabelac
f260f99d57 cleanup: switch log_error to log_warn
Use log_warn to print non-fatal warning messages.

Use of log_error would confuse checker for testing
whether proper error has been reported for some real error.
2012-10-17 15:41:35 +02:00
Zdenek Kabelac
9ee071705b cleanup: fix compiler warnings
remove unused vars
move var declarations into the front of functions.
fix some sign warnings
2012-10-12 10:25:07 +02:00
Jonathan Brassow
886656e4ac RAID: Fix problems with creating, extending and converting large RAID LVs
MD's bitmaps can handle 2^21 regions at most.  The RAID code has always
used a region_size of 1024 sectors.  That means the size of a RAID LV was
limited to 1TiB.  (The user can adjust the region_size when creating a
RAID LV, which can affect the maximum size.)  Thus, creating, extending or
converting to a RAID LV greater than 1TiB would result in a failure to
load the new device-mapper table.

Again, the size of the RAID LV is not limited by how much space is allocated
for the metadata area, but by the limitations of the MD bitmap.  Therefore,
we must adjust the 'region_size' to ensure that the number of regions does
not exceed the limit.  I've added code to do this when extending a RAID LV
(which covers 'create' and 'extend' operations) and when up-converting -
specifically from linear to RAID1.
2012-09-27 16:51:22 -05:00
Jonathan Brassow
2a6712ddef RAID1: Clear the LV_NOTSYNCED flag when a RAID1 LV is converted to linear
Failing to clear the LV_NOTSYNCED flag when converting a RAID1 LV to
linear can result in the flag being present after an upconvert - even
if the sync is performed when upconverting.
2012-09-14 16:26:53 -05:00
Jonathan Brassow
116bcb3ea4 RAID1: Like mirrors, do not allow adding images to LV created w/ --nosync
Mirrors do not allow upconverting if the LV has been created with --nosync.
We will enforce the same rule for RAID1.  It isn't hugely critical, since
the portions that have been written will be copied over to the new device
identically from either of the existing images.  However, the unwritten
sections may be different, causing the added image to be a hybrid of the
existing images.

Also, we are disallowing the addition of new images to a RAID1 LV that has
not completed the initial sync.  This may be different from mirroring, but
that is due to the fact that the 'mirror' segment type "stacks" when adding
a new image and RAID1 does not.  RAID1 will rebuild a newly added image
"inline" from the existant images, so they should be in-sync.
2012-09-14 16:12:52 -05:00