IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
Use separate files for raid1, raid456, raid10.
They need different target versions to work, so support
more precise test selection.
Optimize duplicate tests of target avalability and skip
unsupported test cases sooner.
In cases where PV appears on a new device without disappearing from an old one
first, the device->pvid pointers could become ambiguous. This could cause the
ambiguous PV to be lost from the cache when a different PV comes up on one of
the ambiguous devices.
This is probably not optimal, but makes the lvmetad case mimic non-lvmetad code
more closely. It also fixes vgremove of a partially corrupt VG with lvmetad, as
_vg_write_raw (and consequently, entire vg_write) currently panics when it
encounters a corrupt MDA. Ideally, we'd be able to explicitly control when it is
safe to ignore them.
%FREE allocation has been broken for RAID. At 100%FREE, there is
still an extent left for certain tests. For now, change the test
to warn rather than completely fail.
Condition was swapped - however since it's been based on 'random'
memory content it's been missed as attribute has not been set.
So now we have quite a few possible results when testing.
We have old status without separate metadata and
we have kernels with fixed snapshot leak bug.
(in-release update)
Code uses target driver version for better estimation of
max size of COW device for snapshot.
The bug can be tested with this script:
VG=vg1
lvremove -f $VG/origin
set -e
lvcreate -L 2143289344b -n origin $VG
lvcreate -n snap -c 8k -L 2304M -s $VG/origin
dd if=/dev/zero of=/dev/$VG/snap bs=1M count=2044 oflag=direct
The bug happens when these two conditions are met
* origin size is divisible by (chunk_size/16) - so that the last
metadata area is filled completely
* the miscalculated snapshot metadata size is divisible by extent size -
so that there is no padding to extent boundary which would otherwise
save us
Signed-off-by:Mikulas Patocka <mpatocka@redhat.com>
While stripe size is twice the physical extent size,
the original code will not reduce stripe size to maximum
(physical extent size).
Signed-off-by: Zhiqing Zhang <zhangzq.fnst@cn.fujitsu.com>
Switch to use ext2 to make it usable on older systems.
Previous test has not been able to catching problem.
Multiple tests are now put in.
FIXME: validate what is doing kernel target when
the header is undeleted and same chunk size is used.
It seems snapshot target successfully resumes and
just complains COW is not big enough:
kernel: dm-8: rw=0, want=40, limit=24
kernel: attempt to access beyond end of device
When chunk size is different it fails instantly.
For checking this with lvm2 and this test case use this patch:
--- a/tools/lvcreate.c
+++ b/tools/lvcreate.c
@@ -769,7 +769,7 @@ static int _read_activation_params(struct
lvcreate_params *lp,
lp->permission = arg_uint_value(cmd, permission_ARG,
LVM_READ | LVM_WRITE);
- if (lp->snapshot) {
+ if (0 && lp->snapshot) {
/* Snapshot has to zero COW header */
lp->zero = 1;
lp->wipe_signatures = 0;
---
and switch to use -c 4 for both snapshots
When read-only snapshot was created, tool was skipping header
initialization of cow device. If it happened device has been
already containing header from some previous snapshot, it's
been 'reused' for a newly created snapshot instead of being cleared.
Skip over LVs that have a cache LV in their tree of LV dependencies
when performing a pvmove.
This means that users cannot move a cache pool or a cache LV's origin -
even when that cache LV is used as part of another LV (e.g. a thin pool).
The new test (pvmove-cache-segtypes.sh) currently builds up various LV
stacks that incorporate cache LVs. pvmove tests are then performed to
ensure that cache related LVs are /not/ moved. Once pvmove is enabled
for cache, those tests will switch to ensuring that the LVs /are/
moved.
Test currently fails with make check_cluster - so uses 'should'
CLVMD[4f435880]: Feb 19 23:27:36 Send local reply
format_text/archiver.c:230 WARNING: This metadata update is NOT backed up
metadata/mirror.c:1105 Failed to initialize log device
metadata/mirror.c:1145 <backtrace>
lvconvert.c:1547 <backtrace>
lvconvert.c:3084 <backtrace>
Remove 'skip' argument passed into the function.
We always used '0' - as this is the only supported
option (-K) and there is no complementary option.
Also add some testing for behaviour of skipping.
There are typically 2 functions for the more advanced segment types that
deal with parameters in lvcreate.c: _get_*_params() and _check_*_params().
(Not all segment types name their functions according to this scheme.)
The former function is responsible for reading parameters before the VG
has been read. The latter is for sanity checking and possibly setting
parameters after the VG has been read.
This patch adds a _check_raid_parameters() function that will determine
if the user has specified 'stripe' or 'mirror' parameters. If not, the
proper number is computed from the list of PVs the user has supplied or
the number that are available in the VG. Now that _check_raid_parameters()
is available, we move the check for proper number of stripes from
_get_* to _check_*.
This gives the user the ability to create RAID LVs as follows:
# 5-device RAID5, 4-data, 1-parity (i.e. implicit '-i 4')
~> lvcreate --type raid5 -L 100G -n lv vg /dev/sd[abcde]1
# 5-device RAID6, 3-data, 2-parity (i.e. implicit '-i 3')
~> lvcreate --type raid6 -L 100G -n lv vg /dev/sd[abcde]1
# If 5 PVs in VG, 4-data, 1-parity RAID5
~> lvcreate --type raid5 -L 100G -n lv vg
Considerations:
This patch only affects RAID. It might also be useful to apply this to
the 'stripe' segment type. LVM RAID may include RAID0 at some point in
the future and the implicit stripes would apply there. It would be odd
to have RAID0 be able to auto-determine the stripe count while 'stripe'
could not.
The only draw-back of this patch that I can see is that there might be
less error checking. Rather than informing the user that they forgot
to supply an argument (e.g. '-i'), the value would be computed and it
may differ from what the user actually wanted. I don't see this as a
problem, because the user can check the device count after creation
and remove the LV if they have made an error.
When an origin exists and the 'lvcreate' command is used to create
a cache pool + cache LV, the table is loaded into the kernel but
never instantiated (suspend/resume was never called). A user running
LVM commands would never know that the kernel did not have the
proper state unless they also ran the dmsetup 'table/status' command.
The solution is to suspend/resume the cache LV to make the loaded
tables become active.
Introduce a new parameter called "approx_alloc" that is set when the
desired size of a new LV is specified in percentage terms. If set,
the allocation code tries to get as much space as it can but does not
fail if can at least get some.
One of the practical implications is that users can now specify 100%FREE
when creating RAID LVs, like this:
~> lvcreate --type raid5 -i 2 -l 100%FREE -n lv vg
Replace some in-test use of lvs commands with their check
and get equivalent.
Advantage is these 'checking' commands are not necessarily always
valiadated via extensive valgrind testing and also the output noice
is significantly reduces since the output of check/get is suppressed.
This patch allows users to create cache LVs with 'lvcreate'. An origin
or a cache pool LV must be created first. Then, while supplying the
origin or cache pool to the lvcreate command, the cache can be created.
Ex1:
Here the cache pool is created first, followed by the origin which will
be cached.
~> lvcreate --type cache_pool -L 500M -n cachepool vg /dev/small_n_fast
~> lvcreate --type cache -L 1G -n lv vg/cachepool /dev/large_n_slow
Ex2:
Here the origin is created first, followed by the cache pool - allowing
a cache LV to be created covering the origin.
~> lvcreate -L 1G -n lv vg /dev/large_n_slow
~> lvcreate --type cache -L 500M -n cachepool vg/lv /dev/small_n_fast
The code determines which type of LV was supplied (cache pool or origin)
by checking its type. It ensures the right argument was given by ensuring
that the origin is larger than the cache pool.
If the user wants to remove just the cache for an LV. They specify
the LV's associated cache pool when removing:
~> lvremove vg/cachepool
If the user wishes to remove the origin, but leave the cachepool to be
used for another LV, they specify the cache LV.
~> lvremove vg/lv
In order to remove it all, specify both LVs.
This patch also includes tests to create and remove cache pools and
cache LVs.
Avoid starting conversion of the LV to the thin pool and thin volume
at the same time. Since this is mostly a user mistake, do not try
to just convert to one of those type, since we cannot assume if the
user wanted LV to become thin volume or thin pool.
Before the fix tool reported pretty strange internal error:
Internal error: Referenced LV lvol1_tdata not listed in VG mvg.
Fixed output:
lvconvert --thinpool lvol0 -T mvg/lvol0
Can't use same LV mvg/lvol0 for thin pool and thin volume.
Since we are currently incapable of providing zeroes for
reextended thin volume area, let's disable extension of
such already reduce thin volumes.
(in-release change)
When lvm2 command forks, it calls reset_locking(),
which as an unwanted side effect unlinked lock file from filesystem.
Patch changes the behavior to just close locked file descriptor
in children - so the lock is being still properly hold in the parent.
Test LVM_LVMETAD_PIDFILE for pid for lvm command.
Fix WHATS_NEW envvar name usage
Fix init order in prepare_lvmetad to respect set vars
and avoid clash with system settings.
Update test to really test the 'is running' message.
Thin kernel target 1.9 still does not support online resize of
thin pool metadata properly - so disable it with expectation
for much higher version - and reenable after fixing kernel.
3.12.0 kernel prevents raid test to be usable,
leaving unremovable devices in table.
This needs to be fixed ASAP, meanwhile disable test to make
test machines at least usable.
Add 'can_use_16T' to detect systems where we could
safely use 16T devices without causing system deadlocks.
16T size leads on those to endless loops in udevd
- it calls blkid which tries cached read from such device
- this ends in endless loop.
Related problems:
https://bugzilla.redhat.com/show_bug.cgi?id=1015028
Initial testing of thin pool's metadata with thin repairing tools.
Try to use tools from configuration settings, but allow them
to be overriden by settings of these variables:
LVM_TEST_THIN_CHECK_CMD,
LVM_TEST_THIN_DUMP_CMD,
LVM_TEST_THIN_REPAIR_CMD
FIXME: test reveals some more important bugs:
pvremove -ff also needs --yes
vgremove -ff doesn not remove metadata when there are no real LVs.
vgreduce is not able to reduce VG with pool without pool's PVs
1) When converting from an x-way mirror/raid1 to a y-way mirror/raid1,
the default behaviour should be to stay the same segment type.
2) When converting from linear to mirror or raid1, the default behaviour
should honor the mirror_segtype_default.
3) When converting and the '--type' argument is specified, the '--type'
argument should be honored.
catch such conditions, but errors in the tests caused the issue to go
unnoticed. The code has been fixed to perform #2 properly, the tests
have been corrected to properly test for #2, and a few other tests
were changed to explicitly specify the '--type mirror' when necessary.
A know issue with kmem_cach is causing failures while testing
RAID 4/5/6 device replacement. Blacklist the offending kernel
so that these tests are not performed there.
Since our current vgcfgbackup/restore doesn't deal
with difference of active volumes between current and
restored set of volumes - run test with inactive LVs.
Rewrite check lv_on and add new lv_tree_on
Move more pvmove test unrelated code out to check & get sections
(so they do not obfuscate trace output unnecesserily)
Use new lv_tree_on()
NOTE: unsure how the snapshot origin should be accounted here.
Split pmove-all-segments into separate tests for raid and thins
(so the test output properly shows what has been skipped in test)
lvchange-raid.sh checks to ensure that the 'p'artial flag takes
precedence over the 'w'ritemostly flag by disabling and reenabling
a device in the array. Most of the time this works fine, but
sometimes the kernel can notice the device failure before it is
reenabled. In that case, the attr flag will not return to 'w', but
to 'r'efresh. This is because 'r'efresh also takes precedence over
the 'w'ritemostly flag. So, we also do a quick check for 'r' and
not just 'w'.
The same corner cases that exist for snapshots on mirrors exist for
any logical volume layered on top of mirror. (One example is when
a mirror image fails and a non-repair LVM command is the first to
detect it via label reading. In this case, the LVM command will hang
and prevent the necessary LVM repair command from running.) When
a better alternative exists, it makes no sense to allow a new target
to stack on mirrors as a new feature. Since, RAID is now capable of
running EX in a cluster and thin is not active-active aware, it makes
sense to pair these two rather than mirror+thinpool.
As further background, here are some additional comments that I made
when addressing a bug related to mirror+thinpool:
(https://bugzilla.redhat.com/show_bug.cgi?id=919604#c9)
I am going to disallow thin* on top of mirror logical volumes.
Users will have to use the "raid1" segment type if they want this.
This bug has come down to a choice between:
1) Disallowing thin-LVs from being used as PVs.
2) Disallowing thinpools on top of mirrors.
The problem is that the code in dev_manager.c:device_is_usable() is unable
to tell whether there is a mirror device lower in the stack from the device
being checked. Pretty much anything layered on top of a mirror will suffer
from this problem. (Snapshots are a good example of this; and option #1
above has been chosen to deal with them. This can also be seen in
dev_manager.c:device_is_usable().) When a mirror failure occurs, the
kernel blocks all I/O to it. If there is an LVM command that comes along
to do the repair (or a different operation that requires label reading), it
would normally avoid the mirror when it sees that it is blocked. However,
if there is a snapshot or a thin-LV that is on a mirror, the above code
will not detect the mirror underneath and will issue label reading I/O.
This causes the command to hang.
Choosing #1 would mean that thin-LVs could never be used as PVs - even if
they are stacked on something other than mirrors.
Choosing #2 means that thinpools can never be placed on mirrors. This is
probably better than we think, since it is preferred that people use the
"raid1" segment type in the first place. However, RAID* cannot currently
be used in a cluster volume group - even in EX-only mode. Thus, a complete
solution for option #2 must include the ability to activate RAID logical
volumes (and perform RAID operations) in a cluster volume group. I've
already begun working on this.
Creation, deletion, [de]activation, repair, conversion, scrubbing
and changing operations are all now available for RAID LVs in a
cluster - provided that they are activated exclusively.
The code has been changed to ensure that no LV or sub-LV activation
is attempted cluster-wide. This includes the often overlooked
operations of activating metadata areas for the brief time it takes
to clear them. Additionally, some 'resume_lv' operations were
replaced with 'activate_lv_excl_local' when sub-LVs were promoted
to top-level LVs for removal, clearing or extraction. This was
necessary because it forces the appropriate renaming actions the
occur via resume in the single-machine case, but won't happen in
a cluster due to the necessity of acquiring a lock first.
The *raid* tests have been updated to allow testing in a cluster.
For the most part, this meant creating devices with '-aey' if they
were to be converted to RAID. (RAID requires the converting LV to
be EX because it is a condition of activation for the RAID LV in
a cluster.)
Simulate crash of the system and restarted pvmove after next VG
activation.
Test is catching regression introduced in 2.02.99 for partial tree
creation changes.
After enable_dev, the following commands were not
consistently seeing the pv on it.
Alasdair explained, "whenever enabling/disabling devs
outside the tools (and you aren't trying to test how
the tools cope with suddenly appearing/disappering
devices) use "vgscan""
Patch includes RAID1,4,5,6,10 tests for:
- setting writemostly/writebehind
* syncaction changes (i.e. scrubbing operations)
- refresh (i.e. reviving devices after transient failures)
- setting recovery rate (sync I/O throttling)
while the RAID LVs are under a thin-pool (both data and metadata)
* not fully tested because I haven't found a way to force bad
blocks to be noticed in the testsuite yet. Works just fine
when dealing with "real" devices.
Test moving linear, mirror, snapshot, RAID1,5,10, thinpool, thin
and thin on RAID. Perform the moves along with a dummy LV and
also without the dummy LV by specifying a logical volume name as
an argument to pvmove.
These test the toollib functions that select
vgs/lvs to process based on command line args:
empty, vg name(s), lv names(s), vg tag(s),
lv tags(s), and combinations of all.
1) Since the min|maxrecoveryrate args are size_kb_ARGs and they
are recorded (and sent to the kernel) in terms of kB/sec/disk,
we must back out the factor multiple done by size_kb_arg. This
is already performed by 'lvcreate' for these arguments.
2) Allow all RAID types, not just RAID1, to change these values.
3) Add min|maxrecoveryrate_ARG to the list of 'update_partial_unsafe'
commands so that lvchange will not complain about needing at
least one of a certain set of arguments and failing.
4) Add tests that check that these values can be set via lvchange
and lvcreate and that 'lvs' reports back the proper results.
In those places where mirrors were being created while assuming
a default segment type of "mirror", we include the '--type mirror'
argument to explicitly set the segment type. This will preserve
the mirror testing that is performed even though the default
mirroring segment type is now "raid1".
Suggest to use _tdata and _tmeta devices for that.
This fixes regression from too relaxed change in
f1d5f6ae81
Without this patch there are some empty LVs created before
mirror code recognizes it cannot continue.
(in release fix)
When the merging of snapshot is finished, we need to clean dm table
intries for snapshot and -cow device. So for merging snapshot
we have to activate_lv plain 'cow' LV and let the table
resolver to its work - shortly deactivation_lv() request
will follow - in cluster this needs LV lock to be held by clvmd.
Also update a test - add small wait - if lvremove is not 'fast enough'
and merging process has not been stopped and $lv1 removed in background.
Ortherwise the following lvcreate occasionally finds name $lv1 still in use.
(in release fix)
We check the version number of dm-raid before testing certain
features to make sure they are present. However, this has
become somewhat complicated by the fact that the version #'s
in the upstream kernel and the REHL6 kernel have been diverging.
This has been a necessity because the upstream kernel has
undergone ABI changes that have necessitated a bump in the
'Y' component of the version #, while the RHEL6 kernel has not.
Thus, we need to know that the ABI has not changed but the
features have been added. So, the current version #'ing stands
as follows:
RHEL6 Upstream Comment
======|==========|========
** Same until version 1.3.1 **
------|----------|--------
N/A | 1.4.0 | Non-functional change.
| | Removes arg from mapping function.
------|----------|--------
1.3.2 | 1.4.1 | RAID10 fix redundancy validation checks.
------|----------|--------
1.3.5 | 1.4.2 | Add RAID10 "far" and "offset" algorithm support.
| | Note this feature came later in RHEL6 as part of
| | a separate update/feature.
------|----------|--------
1.3.3 | 1.5.0 | Add message interface to allow manipulation of
| | the sync_action.
| | New status (STATUSTYPE_INFO) fields: sync_action
| | and mismatch_cnt.
------|----------|--------
1.3.4 | 1.5.1 | Add ability to restore transiently failed devices
| | on resume.
------|----------|--------
1.3.5 | 1.5.2 | 'mismatch_cnt' is zero unless [last_]sync_action
| | is "check".
------|----------|--------
To simplify, writemostly/writebehind, scrubbing, and transient device
failure restoration are all tested based on the same version
requirements: (1.3.5 < V < 1.4.0) || (V > 1.5.2). Since kernel
support for writemostly/writebehind has been around for some time,
this could mean a reduction in the scope of kernels tested for this
feature. I don't view this as much of a problem, since support for
this feature was only recently added to LVM. Thus, the user would
have to be using a very recent LVM version with an older kernel.
The mismatch count reported by a dm-raid kernel target used
to be effectively random, unless it was checked after a
"check" scrubbing action had been performed. Updates to the
kernel now mean that the mismatch count will be 0 unless a
check has been performed and discrepancies had been found.
This has been the intended behaviour all along.
This patch updates the test suite to handle the change.
- lvs -o lv_attr has now 10 indicator bits
- use '--ignoremonitoring' instead of the shortcut '--ig' used before (since
it would be ambiguous with new '--ignoreactivationskip')
Test the different RAID lvchange scenarios under snapshot as well.
This patch also updates calculations for where to write to an
underlying PV when testing various syncactions.
If the user would upconvert a linear LV to a mirror without specifying
the segment type ("--type mirror" vs "--type raid1"), the "mirror"
segment type would be chosen without consulting the 'default_mirror_segtype'
setting in lvm.conf. This is now used as the basis for determining
which should be used if left unspecified.
When vgname has not existed in metadata, it has crashed on double free
in format_instance destroy() - since VG was created, used FID and was
released - which also released FID, so further use was accessing bad
memory.
Fix it for this code path before release_vg() so FID will exists
when _vg_read_file_name() returns NULL.
Assumed size of 4M was too large and the test was failing because
'dd' was failing to perform its write.
Calculate the size we need to write with 'dd' instead, so we
don't overrun the device.
aux updates:
prepare_vg now created clustered VG for cluster tests.
since dm-raid doesn't work in cluster, skip the cluster
test when someone checks for dm-raid target until fixed.
Support vgsplit for VGs with thin pools and thin volumes.
In case the thin data and thin metadata volumes are moved to a new VG,
move there also all related thin volumes and check that external origins
are also present in this new VG.
For non udev path use DM_DEFAULT_NAME_MANGLING_MODE.
Skip this test when using real /dev dir, since udev is not able
to create such device name unless mangled...
This fixes a long standing regression since LVM2 2.02.74 (commit 4efb1d9c,
"Update heuristic used for default and detected data alignment.")
The default PE alignment could be used (via MAX()) even if it was
determined that the device's MD stripe width, or minimal_io_size or
optimal_io_size were not factors of the default PE alignment (either 64K
or the newer default of 1MB, etc). This bug would manifest if the
default PE alignment was larger than the overriding hint that the
device provided (e.g. default of 1MB vs optimal_io_size of 768K).
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Support for exclusive activation of snapshots revealed some problems.
When snapshot is created, COW LV is activated first (for clearing) and
then it's transformed into snapshot's COW LV, but it has left the lock
for such LV active in cluster and this lock could not have been removed
from dlm, unless snapshot has been removed within same dlm session.
If the user tried to remove snapshot after rebooting node, the lock was
missing, and COW LV could not have been detached.
Patch modifes the approach in this way:
Always deactivate COW LV for clustered vg after clearing (so it's
activated again via imlicit snapshot activation rule when snapshot is activated).
When snapshot is removed, activate COW LV as independend LV, so the lock
will exist for such LV, but only when the snapshot is active.
Also add test case for testing snapshot removal after cluster reboot.
'lvchange' is used to alter a RAID 1 logical volume's write-mostly and
write-behind characteristics. The '--writemostly' parameter takes a
PV as an argument with an optional trailing character to specify whether
to set ('y'), unset ('n'), or toggle ('t') the value. If no trailing
character is given, it will set the flag.
Synopsis:
lvchange [--writemostly <PV>:{t|y|n}] [--writebehind <count>] vg/lv
Example:
lvchange --writemostly /dev/sdb1:y --writebehind 512 vg/raid1_lv
The last character in the 'lv_attr' field is used to show whether a device
has the WriteMostly flag set. It is signified with a 'w'. If the device
has failed, the 'p'artial flag has priority.
Example ("nosync" raid1 with mismatch_cnt and writemostly):
[~]# lvs -a --segment vg
LV VG Attr #Str Type SSize
raid1 vg Rwi---r-m 2 raid1 500.00m
[raid1_rimage_0] vg Iwi---r-- 1 linear 500.00m
[raid1_rimage_1] vg Iwi---r-w 1 linear 500.00m
[raid1_rmeta_0] vg ewi---r-- 1 linear 4.00m
[raid1_rmeta_1] vg ewi---r-- 1 linear 4.00m
Example (raid1 with mismatch_cnt, writemostly - but failed drive):
[~]# lvs -a --segment vg
LV VG Attr #Str Type SSize
raid1 vg rwi---r-p 2 raid1 500.00m
[raid1_rimage_0] vg Iwi---r-- 1 linear 500.00m
[raid1_rimage_1] vg Iwi---r-p 1 linear 500.00m
[raid1_rmeta_0] vg ewi---r-- 1 linear 4.00m
[raid1_rmeta_1] vg ewi---r-p 1 linear 4.00m
A new reportable field has been added for writebehind as well. If
write-behind has not been set or the LV is not RAID1, the field will
be blank.
Example (writebehind is set):
[~]# lvs -a -o name,attr,writebehind vg
LV Attr WBehind
lv rwi-a-r-- 512
[lv_rimage_0] iwi-aor-w
[lv_rimage_1] iwi-aor--
[lv_rmeta_0] ewi-aor--
[lv_rmeta_1] ewi-aor--
Example (writebehind is not set):
[~]# lvs -a -o name,attr,writebehind vg
LV Attr WBehind
lv rwi-a-r--
[lv_rimage_0] iwi-aor-w
[lv_rimage_1] iwi-aor--
[lv_rmeta_0] ewi-aor--
[lv_rmeta_1] ewi-aor--
New options to 'lvchange' allow users to scrub their RAID LVs.
Synopsis:
lvchange --syncaction {check|repair} vg/raid_lv
RAID scrubbing is the process of reading all the data and parity blocks in
an array and checking to see whether they are coherent. 'lvchange' can
now initaite the two scrubbing operations: "check" and "repair". "check"
will go over the array and recored the number of discrepancies but not
repair them. "repair" will correct the discrepancies as it finds them.
'lvchange --syncaction repair vg/raid_lv' is not to be confused with
'lvconvert --repair vg/raid_lv'. The former initiates a background
synchronization operation on the array, while the latter is designed to
repair/replace failed devices in a mirror or RAID logical volume.
Additional reporting has been added for 'lvs' to support the new
operations. Two new printable fields (which are not printed by
default) have been added: "syncaction" and "mismatches". These
can be accessed using the '-o' option to 'lvs', like:
lvs -o +syncaction,mismatches vg/lv
"syncaction" will print the current synchronization operation that the
RAID volume is performing. It can be one of the following:
- idle: All sync operations complete (doing nothing)
- resync: Initializing an array or recovering after a machine failure
- recover: Replacing a device in the array
- check: Looking for array inconsistencies
- repair: Looking for and repairing inconsistencies
The "mismatches" field with print the number of descrepancies found during
a check or repair operation.
The 'Cpy%Sync' field already available to 'lvs' will print the progress
of any of the above syncactions, including check and repair.
Finally, the lv_attr field has changed to accomadate the scrubbing operations
as well. The role of the 'p'artial character in the lv_attr report field
as expanded. "Partial" is really an indicator for the health of a
logical volume and it makes sense to extend this include other health
indicators as well, specifically:
'm'ismatches: Indicates that there are discrepancies in a RAID
LV. This character is shown after a scrubbing
operation has detected that portions of the RAID
are not coherent.
'r'efresh : Indicates that a device in a RAID array has suffered
a failure and the kernel regards it as failed -
even though LVM can read the device label and
considers the device to be ok. The LV should be
'r'efreshed to notify the kernel that the device is
now available, or the device should be 'r'eplaced
if it is suspected of failing.
Instead of check for lv_is_active() for thin pool LV,
query the whole pool via new pool_is_active().
Fixes a problem when we cannot change discards settings
for active pool device where the actual layer for pool
device was inactive, but thin volumes using thin pool
have been active.
Calling pvscan --cache with -aay on a PV without an MDA would spuriously fail
with an internal error, because of an incorrect assumption that a parsed VG
structure was always available. This is not true and the autoactivation handler
needs to call vg_read to obtain metadata in cases where the PV had no MDAs to
parse. Therefore, we pass vgid into the handler instead of the (possibly NULL)
VG coming from the PV's MDA.
Commit bf2741376d started to use
lv_is_active() instead of call for lv_info & info.exists so
we cover also cluster activated devices.
For snapshost the conversion was not correct and introduced
regression by blocking creation of snapshot of inactive LV.
Fix it by assigning lv_is_active() directly.
Note: we still have minor issue to fix - to make
lv_is_???? function able to return error states since
lv_info() may fail.
I'm not sure what 'BUG's were being encountered when the restriction
to limit lvconvert-raid.sh tests to kernels > 3.2 was added. I do know
that there were BUG's that could be triggered when testing snapshots and
some of the earliest DM RAID available in the kernel. I've taken out
the 3.2 kernel restriction and added a dm-raid >= 1.2 restriction instead.
This will allow the test to run on patched production kernels.
Reset counter after thin pool resize failure.
If the pool goes above threshold, support unmounting
of all thin volumes if the lvextend fails to avoid
overfilling of the pool.
Commit 3501f17fd0 enables a limited set
of metadata updates for partial LV/VGs when issuing lvchange or vgchange.
These tests verify those changes operate as intended.
Separate original raid test and new raid10 test,
so the old could be tested on platforms without raid10 support.
Replace test-unfriendly `ls /dev/mapper` with dmsetup ls
Revert changes to origin lvcreate-large test and use separate
test scripts for raid - so they can be properly skipped when
kernel doesn't support raid targets.
MD's bitmaps can handle 2^21 regions at most. The RAID code has always
used a region_size of 1024 sectors. That means the size of a RAID LV was
limited to 1TiB. (The user can adjust the region_size when creating a
RAID LV, which can affect the maximum size.) Thus, creating, extending or
converting to a RAID LV greater than 1TiB would result in a failure to
load the new device-mapper table.
Again, the size of the RAID LV is not limited by how much space is allocated
for the metadata area, but by the limitations of the MD bitmap. Therefore,
we must adjust the 'region_size' to ensure that the number of regions does
not exceed the limit. I've added code to do this when extending a RAID LV
(which covers 'create' and 'extend' operations) and when up-converting -
specifically from linear to RAID1.
When reformatting the 'lvchange_resync' code in commit
05131f5853, a '!' should have been removed
from the condition that checks for the LV_NOTSYNCED flag on a corelog
mirror LV. The presence of this '!' caused the LV_NOTSYNCED flag to be
cleared when it wasn't present and left when it was present.
It is not allowed to add images to a 'mirror' or 'raid1' LV if the
LV_NOTSYNCED flag is set. We add some up-convert tests to ensure this
behavior is being enforced and that the LV_NOTSYNCED flag is being
properly cleared by 'lvchange --resync'.
(Not updating WHATS_NEW because this is intrarelease.)
This patch adds support for RAID10. It is not the default at this
stage. The user needs to specify '--type raid10' if they would like
RAID10 instead of stacked mirror over stripe.
Commit 8767435ef8 allowed RAID 4/5/6
LV to be extended properly, but introduced a regression in device
replacement - a critical component of fault tolerance.
When only 1 or 2 drives are being replaced, the 'area_count' needed
can be equal to the parity_count. The 'area_multiple' for RAID 4/5/6
was computed as 'area_count - parity_devs', which could result in
'area_multiple' being 0. This would ultimately lead to a division by
zero error. Therefore, in calc_area_multiple, it is important to take
into account the number of areas that are being requested - just as
we already do in _alloc_init.
Reducing a RAID 4/5/6 LV or extending it with a different number of
stripes is still not implemented. This patch covers the "simple" case
where the LV is extended with the same number of stripes as the orginal.
When mirrors are up-converted, a transient mirror layer is put in so that
only the new devices are sync'ed. That transient layer must carry the tags
of the original mirror LV, otherwise it will fail to activate when activation
is regulated by lvm.conf:activation/volume_list. The conversion would then
fail.
The fix is to do exactly the same thing that is being done for linear ->
mirror converting (lib/metadata/mirror.c:_init_mirror_log()). We copy the
tags temporarily for the new LV and remove them after the activation.
Snapshots of RAID logical volumes are allowed (including "raid1"). However,
snapshots of "mirror" logical volumes has been disallowed due to unsolvable
issues inherent to the design. The fact that mirroring (dm-raid1.c) must
stop all I/O as the result of a failure and wait for userspace intervention
can lead to a circular dependency if userspace is simultaneously waiting for
snapshots (on mirrors) to make an I/O update before proceeding.
Various snapshot on mirror tests have been removed as a result.
Add make help target.
Add LVM_TEST_PARALLEL to support parallel runs of tests
Work around the problem the dmsetup table/info may return error
by using dmtable and dminfo function that will use 'should'.
(Error happens when some concurently running process removes table
entry while dmsetup command resolves table entries inside the loop.)
Actually restart was failing for different reason - so pass in proper
location of dmeventd for restart from lvm command and avoid using
the one from /sbin location.
Update pv create test with "" around path.
Indent
Shell improvements - use internal function for checks
Use PVs in "" (LV and VG cannot have spaces)
Several test very starting 'dmeventd' without annoucing
it via prepade_dmeventd.
Fix some of test actually.
When down-converting a RAID1 device, it is the last device that is extracted
and removed when the user does not specify a particular device. However,
when a device is specified (and it is not the last), the device is removed and
the remaining sub-LVs are "shifted down" to fill the hole. This cause problems
when resuming the LV because if the shifted devices were resumed (and thus
renamed) before the sub-LV being extracted, there would be a name conflict.
The solution is to resume the extracted sub-LVs first so that they can be
properly renamed preventing a possible conflict.
This addresses bug 801967.
Test if thin_check is present in system and disable its use, when its missing.
Add testing for poolmetadatasize.
FIXME: Allocation policy for metadata pool might need some relaxing.
(For now it needs to put all block on one PV.)
Reduce disc excercise for some test and focus on LVM testing by
using smaller extent size.
Reduce number of teardown_devs calls and use vg/lvremove instead.
Don't sleep for seconds on pvmove.
FIXME: shell/lvconvert-mirror-basic.sh seems to need more checking.
Test fails for smalled extent size then 512k.
make devices invisible to lvm, but the behaviour of those is slightly different
than of actual missing devices. Running vgscan after re-enabling the device
triggers a metadata repair which is not done by vgremove -ff. This is not a
regression, merely an odd behaviour that has been around even before lvmetad.
Failure to do so results in "Performing unsafe table load while X device(s) are
known to be suspended" errors. While fixing the problem in this way works and
is consistent with the way the mirror segment type does it, it would be nice
to find a solution that uses the generic suspend/resume calls.
Also included in this check-in are additions to the test suite that perform
conversions on RAID LVs under a snapshot. These tests are disabled for the
time being due to a kernel bug that is yet to be tracked down.