IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
The lvchange-raid[456].sh test checks that mismatches can be detected
properly. It does this by writing garbage to the back half of one of
the legs directly. When performing a "check" or "repair" of mismatches,
MD does a good job going directly to disk and bypassing any buffers that
may prevent it from seeing mismatches. However, in the case of RAID4/5/6
we have the stripe cache to contend with and this is not bypassed. Thus,
mismatches which have /just/ happened to an area that now populates the
stripe cache may be overlooked. This isn't a serious issue, however,
because the stripe cache is short-lived and reasonably small. So, while
there may be a small window of time between the disk changing underneath
the RAID array and when you run a "check"/"repair" - causing a mismatch
to be missed - that would be no worse than if a user had simply run a
"check" a few seconds before the disk changed. IOW, it simply isn't worth
making a fuss over dropping the stripe cache before beginning a "check" or
"repair" (which we actually did attempt to do a while back).
So, to get the test running smoothly, we simply deactivate and reactivate
the LV to force the stripe cache to be dropped and then proceed. We could
just as easily wait a few seconds for the stripe cache to empty also.
During an ongoing reshape, the MD kernel runtime reads stripes relative
to data_offset and starts storing the reshaped stripes (with new raid
layout and/or new stripesize and/or new number of stripes) relative
to new_data_offset. This is to avoid writing over any data in place
which is non-atomic by nature and thus be recoverable without data loss
in the transition. MD uses the term out-of-place reshaping for it.
There's 2 other areas we don't have report capability for:
- number of data stripes vs. total stripes
(e.g. raid6 with 7 stripes toal has 5 data stripes)
- number of (rotating) parity/syndrome chunks
(e.g. raid6 with 7 stripes toal has 2 parity chunks; one
per stripe for P-Syndrome and another one for Q-Syndrome)
Thus, add the following reportable keys:
- reshape_len (in current units)
- reshape_len_le (in logical extents)
- data_offset (in sectors)
- new_data_offset ( " )
- data_stripes
- parity_chunks
Enhance lvchange-raid.sh, lvconvert-raid-reshape-linear_to_striped.sh,
lvconvert-raid-reshape-striped_to_linear.sh, lvconvert-raid-reshape.sh
and lvconvert-raid-takeover.sh to make use of new keys.
Related: rhbz834579
Related: rhbz1191935
Related: rhbz1191978
Unsure if this is feature or bug of syncaction,
but it needs to be present physically on the media
and it ignores content of buffer cache...
(maybe lvchange should implicitely fsync all disks
that are members of raid array before starting test??)
Use separate files for raid1, raid456, raid10.
They need different target versions to work, so support
more precise test selection.
Optimize duplicate tests of target avalability and skip
unsupported test cases sooner.
Replace some in-test use of lvs commands with their check
and get equivalent.
Advantage is these 'checking' commands are not necessarily always
valiadated via extensive valgrind testing and also the output noice
is significantly reduces since the output of check/get is suppressed.
lvchange-raid.sh checks to ensure that the 'p'artial flag takes
precedence over the 'w'ritemostly flag by disabling and reenabling
a device in the array. Most of the time this works fine, but
sometimes the kernel can notice the device failure before it is
reenabled. In that case, the attr flag will not return to 'w', but
to 'r'efresh. This is because 'r'efresh also takes precedence over
the 'w'ritemostly flag. So, we also do a quick check for 'r' and
not just 'w'.
The same corner cases that exist for snapshots on mirrors exist for
any logical volume layered on top of mirror. (One example is when
a mirror image fails and a non-repair LVM command is the first to
detect it via label reading. In this case, the LVM command will hang
and prevent the necessary LVM repair command from running.) When
a better alternative exists, it makes no sense to allow a new target
to stack on mirrors as a new feature. Since, RAID is now capable of
running EX in a cluster and thin is not active-active aware, it makes
sense to pair these two rather than mirror+thinpool.
As further background, here are some additional comments that I made
when addressing a bug related to mirror+thinpool:
(https://bugzilla.redhat.com/show_bug.cgi?id=919604#c9)
I am going to disallow thin* on top of mirror logical volumes.
Users will have to use the "raid1" segment type if they want this.
This bug has come down to a choice between:
1) Disallowing thin-LVs from being used as PVs.
2) Disallowing thinpools on top of mirrors.
The problem is that the code in dev_manager.c:device_is_usable() is unable
to tell whether there is a mirror device lower in the stack from the device
being checked. Pretty much anything layered on top of a mirror will suffer
from this problem. (Snapshots are a good example of this; and option #1
above has been chosen to deal with them. This can also be seen in
dev_manager.c:device_is_usable().) When a mirror failure occurs, the
kernel blocks all I/O to it. If there is an LVM command that comes along
to do the repair (or a different operation that requires label reading), it
would normally avoid the mirror when it sees that it is blocked. However,
if there is a snapshot or a thin-LV that is on a mirror, the above code
will not detect the mirror underneath and will issue label reading I/O.
This causes the command to hang.
Choosing #1 would mean that thin-LVs could never be used as PVs - even if
they are stacked on something other than mirrors.
Choosing #2 means that thinpools can never be placed on mirrors. This is
probably better than we think, since it is preferred that people use the
"raid1" segment type in the first place. However, RAID* cannot currently
be used in a cluster volume group - even in EX-only mode. Thus, a complete
solution for option #2 must include the ability to activate RAID logical
volumes (and perform RAID operations) in a cluster volume group. I've
already begun working on this.
Creation, deletion, [de]activation, repair, conversion, scrubbing
and changing operations are all now available for RAID LVs in a
cluster - provided that they are activated exclusively.
The code has been changed to ensure that no LV or sub-LV activation
is attempted cluster-wide. This includes the often overlooked
operations of activating metadata areas for the brief time it takes
to clear them. Additionally, some 'resume_lv' operations were
replaced with 'activate_lv_excl_local' when sub-LVs were promoted
to top-level LVs for removal, clearing or extraction. This was
necessary because it forces the appropriate renaming actions the
occur via resume in the single-machine case, but won't happen in
a cluster due to the necessity of acquiring a lock first.
The *raid* tests have been updated to allow testing in a cluster.
For the most part, this meant creating devices with '-aey' if they
were to be converted to RAID. (RAID requires the converting LV to
be EX because it is a condition of activation for the RAID LV in
a cluster.)
Patch includes RAID1,4,5,6,10 tests for:
- setting writemostly/writebehind
* syncaction changes (i.e. scrubbing operations)
- refresh (i.e. reviving devices after transient failures)
- setting recovery rate (sync I/O throttling)
while the RAID LVs are under a thin-pool (both data and metadata)
* not fully tested because I haven't found a way to force bad
blocks to be noticed in the testsuite yet. Works just fine
when dealing with "real" devices.
1) Since the min|maxrecoveryrate args are size_kb_ARGs and they
are recorded (and sent to the kernel) in terms of kB/sec/disk,
we must back out the factor multiple done by size_kb_arg. This
is already performed by 'lvcreate' for these arguments.
2) Allow all RAID types, not just RAID1, to change these values.
3) Add min|maxrecoveryrate_ARG to the list of 'update_partial_unsafe'
commands so that lvchange will not complain about needing at
least one of a certain set of arguments and failing.
4) Add tests that check that these values can be set via lvchange
and lvcreate and that 'lvs' reports back the proper results.
We check the version number of dm-raid before testing certain
features to make sure they are present. However, this has
become somewhat complicated by the fact that the version #'s
in the upstream kernel and the REHL6 kernel have been diverging.
This has been a necessity because the upstream kernel has
undergone ABI changes that have necessitated a bump in the
'Y' component of the version #, while the RHEL6 kernel has not.
Thus, we need to know that the ABI has not changed but the
features have been added. So, the current version #'ing stands
as follows:
RHEL6 Upstream Comment
======|==========|========
** Same until version 1.3.1 **
------|----------|--------
N/A | 1.4.0 | Non-functional change.
| | Removes arg from mapping function.
------|----------|--------
1.3.2 | 1.4.1 | RAID10 fix redundancy validation checks.
------|----------|--------
1.3.5 | 1.4.2 | Add RAID10 "far" and "offset" algorithm support.
| | Note this feature came later in RHEL6 as part of
| | a separate update/feature.
------|----------|--------
1.3.3 | 1.5.0 | Add message interface to allow manipulation of
| | the sync_action.
| | New status (STATUSTYPE_INFO) fields: sync_action
| | and mismatch_cnt.
------|----------|--------
1.3.4 | 1.5.1 | Add ability to restore transiently failed devices
| | on resume.
------|----------|--------
1.3.5 | 1.5.2 | 'mismatch_cnt' is zero unless [last_]sync_action
| | is "check".
------|----------|--------
To simplify, writemostly/writebehind, scrubbing, and transient device
failure restoration are all tested based on the same version
requirements: (1.3.5 < V < 1.4.0) || (V > 1.5.2). Since kernel
support for writemostly/writebehind has been around for some time,
this could mean a reduction in the scope of kernels tested for this
feature. I don't view this as much of a problem, since support for
this feature was only recently added to LVM. Thus, the user would
have to be using a very recent LVM version with an older kernel.
The mismatch count reported by a dm-raid kernel target used
to be effectively random, unless it was checked after a
"check" scrubbing action had been performed. Updates to the
kernel now mean that the mismatch count will be 0 unless a
check has been performed and discrepancies had been found.
This has been the intended behaviour all along.
This patch updates the test suite to handle the change.
- lvs -o lv_attr has now 10 indicator bits
- use '--ignoremonitoring' instead of the shortcut '--ig' used before (since
it would be ambiguous with new '--ignoreactivationskip')
Test the different RAID lvchange scenarios under snapshot as well.
This patch also updates calculations for where to write to an
underlying PV when testing various syncactions.
Assumed size of 4M was too large and the test was failing because
'dd' was failing to perform its write.
Calculate the size we need to write with 'dd' instead, so we
don't overrun the device.
aux updates:
prepare_vg now created clustered VG for cluster tests.
since dm-raid doesn't work in cluster, skip the cluster
test when someone checks for dm-raid target until fixed.
'lvchange' is used to alter a RAID 1 logical volume's write-mostly and
write-behind characteristics. The '--writemostly' parameter takes a
PV as an argument with an optional trailing character to specify whether
to set ('y'), unset ('n'), or toggle ('t') the value. If no trailing
character is given, it will set the flag.
Synopsis:
lvchange [--writemostly <PV>:{t|y|n}] [--writebehind <count>] vg/lv
Example:
lvchange --writemostly /dev/sdb1:y --writebehind 512 vg/raid1_lv
The last character in the 'lv_attr' field is used to show whether a device
has the WriteMostly flag set. It is signified with a 'w'. If the device
has failed, the 'p'artial flag has priority.
Example ("nosync" raid1 with mismatch_cnt and writemostly):
[~]# lvs -a --segment vg
LV VG Attr #Str Type SSize
raid1 vg Rwi---r-m 2 raid1 500.00m
[raid1_rimage_0] vg Iwi---r-- 1 linear 500.00m
[raid1_rimage_1] vg Iwi---r-w 1 linear 500.00m
[raid1_rmeta_0] vg ewi---r-- 1 linear 4.00m
[raid1_rmeta_1] vg ewi---r-- 1 linear 4.00m
Example (raid1 with mismatch_cnt, writemostly - but failed drive):
[~]# lvs -a --segment vg
LV VG Attr #Str Type SSize
raid1 vg rwi---r-p 2 raid1 500.00m
[raid1_rimage_0] vg Iwi---r-- 1 linear 500.00m
[raid1_rimage_1] vg Iwi---r-p 1 linear 500.00m
[raid1_rmeta_0] vg ewi---r-- 1 linear 4.00m
[raid1_rmeta_1] vg ewi---r-p 1 linear 4.00m
A new reportable field has been added for writebehind as well. If
write-behind has not been set or the LV is not RAID1, the field will
be blank.
Example (writebehind is set):
[~]# lvs -a -o name,attr,writebehind vg
LV Attr WBehind
lv rwi-a-r-- 512
[lv_rimage_0] iwi-aor-w
[lv_rimage_1] iwi-aor--
[lv_rmeta_0] ewi-aor--
[lv_rmeta_1] ewi-aor--
Example (writebehind is not set):
[~]# lvs -a -o name,attr,writebehind vg
LV Attr WBehind
lv rwi-a-r--
[lv_rimage_0] iwi-aor-w
[lv_rimage_1] iwi-aor--
[lv_rmeta_0] ewi-aor--
[lv_rmeta_1] ewi-aor--
New options to 'lvchange' allow users to scrub their RAID LVs.
Synopsis:
lvchange --syncaction {check|repair} vg/raid_lv
RAID scrubbing is the process of reading all the data and parity blocks in
an array and checking to see whether they are coherent. 'lvchange' can
now initaite the two scrubbing operations: "check" and "repair". "check"
will go over the array and recored the number of discrepancies but not
repair them. "repair" will correct the discrepancies as it finds them.
'lvchange --syncaction repair vg/raid_lv' is not to be confused with
'lvconvert --repair vg/raid_lv'. The former initiates a background
synchronization operation on the array, while the latter is designed to
repair/replace failed devices in a mirror or RAID logical volume.
Additional reporting has been added for 'lvs' to support the new
operations. Two new printable fields (which are not printed by
default) have been added: "syncaction" and "mismatches". These
can be accessed using the '-o' option to 'lvs', like:
lvs -o +syncaction,mismatches vg/lv
"syncaction" will print the current synchronization operation that the
RAID volume is performing. It can be one of the following:
- idle: All sync operations complete (doing nothing)
- resync: Initializing an array or recovering after a machine failure
- recover: Replacing a device in the array
- check: Looking for array inconsistencies
- repair: Looking for and repairing inconsistencies
The "mismatches" field with print the number of descrepancies found during
a check or repair operation.
The 'Cpy%Sync' field already available to 'lvs' will print the progress
of any of the above syncactions, including check and repair.
Finally, the lv_attr field has changed to accomadate the scrubbing operations
as well. The role of the 'p'artial character in the lv_attr report field
as expanded. "Partial" is really an indicator for the health of a
logical volume and it makes sense to extend this include other health
indicators as well, specifically:
'm'ismatches: Indicates that there are discrepancies in a RAID
LV. This character is shown after a scrubbing
operation has detected that portions of the RAID
are not coherent.
'r'efresh : Indicates that a device in a RAID array has suffered
a failure and the kernel regards it as failed -
even though LVM can read the device label and
considers the device to be ok. The LV should be
'r'efreshed to notify the kernel that the device is
now available, or the device should be 'r'eplaced
if it is suspected of failing.