IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
When lock-holding LV differs from actually request locked LV,
we drop origin_only flag as it has no use - it'd be applied
on completely different LV.
Example of problem:
Raid is thin-pool _tdata LV.
Raid run origin_only locking on stacked device.
As lock holder is discovered thinLV.
Whole origin_only operation is then applied only on thinLV
changing the meaning of whole operation.
NOTE: this patch does not change anything for LV that are
already top-level lock holding LVs (i.e. thinLVs, snahoshots/origins).
Disable until we have a proper fix for reshape space allocation,
switching it to begin/end of rimages and activation in the cluster.
Related: rhbz1448116
Related: rhbz1461526
Related: rhbz1448123
Enhance reporting code, so it does not need to do 'extra' ioctl to
get 'status' of normal raid and provide percentage directly.
When we have 'merging' snapshot into raid origin, we still need to get
this secondary number with extra status call - however, since 'raid'
is always a single segment LV - we may skip 'copy_percent' call as
we directly know the percent and also with better precision.
NOTE: for mirror we still base reported number on the percetage of
transferred extents which might get quite imprecisse if big size
of extent is used while volume itself is smaller as reporting jump
steps are much bigger the actual reported number provides.
2nd.NOTE: raid lvs line report already requires quite a few extra status
calls for the same device - but fix will be need slight code improval.
Relative to last comit ddf2a1d656:
adjust the dm-raid target version to 1.12.0 which shows
mandatory kernel MD deadlock fixes related to reshaping
are presant in the kernel.
Related: rhbz1443999
For the test clean-up, I was providing too many devices to the first
command - possibly allowing it to allocate in the wrong place. I was
also not providing a device for the second command - virtually ensuring
the test was not performing correctly at times.
This patch ensures that under normal conditions (i.e. not during repair
operations) that users are prevented from removing devices that would
cause data loss.
When a RAID1 is undergoing its initial sync, it is ok to remove all but
one of the images because they have all existed since creation and
contain all the data written since the array was created. OTOH, if the
RAID1 was created as a result of an up-convert from linear, it is very
important not to let the user remove the primary image (the source of
all the data). They should be allowed to remove any devices they want
and as many as they want as long as one original (primary) device is left
during a "recover" (aka up-convert).
This fixes bug 1461187 and includes the necessary regression tests.
Add the checks necessary to distiguish the state of a RAID when the primary
source for syncing fails during the "recover" process.
It has been possible to hit this condition before (like when converting from
2-way RAID1 to 3-way and having the first two devices die during the "recover"
process). However, this condition is now more likely since we treat linear ->
RAID1 conversions as "recover" now - so it is especially important we cleanly
handle this condition.
Previously, we were treating non-RAID to RAID up-converts as a "resync"
operation. (The most common example being 'linear -> RAID1'.) RAID to
RAID up-converts or rebuilds of specific RAID images are properly treated
as a "recover" operation.
Since we were treating some up-convert operations as "resync", it was
possible to have scenarios where data corruption or data loss were
possibilities if the RAID hadn't been able to sync completely before a
loss of the primary source devices. In order to ensure that the user took
the proper precautions in such scenarios, we required a '--force' option
to be present. Unfortuneately, the force option was rendered useless
because there was no way to distiguish the failure state of a potentially
destructive repair from a nominal one - making the '--force' option a
requirement for any RAID1 repair!
We now treat non-RAID to RAID up-converts properly as "recover" operations.
This eliminates the scenarios that can potentially cause data loss or
data corruption; and this eliminates the need for the '--force' requirement.
This patch removes the requirement to specify '--force' for RAID repairs.
Two of the sync actions performed by the kernel (aka MD runtime) are
"resync" and "recover". The "resync" refers to when an entirely new array
is going through the process of initializing (or resynchronizing after an
unexpected shutdown). The "recover" is the process of initializing a new
member device to the array. So, a brand new array with all new devices
will undergo "resync". An array with replaced or added sub-LVs will undergo
"recover".
These two states are treated very differently when failures happen. If any
device is lost or replaced while "resync", there are no worries. This is
because any writes created from the inception of the array have occurred to
all the devices and can be safely recovered. Even though non-initialized
portions will still be resync'ed with uninitialized data, it is ok. However,
if a pre-existing device is lost (aka, the original linear device in a
linear -> raid1 convert) during a "recover", data loss can be the result.
Thus, writes are errored by the kernel and recovery is halted. The failed
device must be restored or removed. This is the correct behavior.
Unfortunately, we were treating an up-convert from linear as a "resync"
when we should have been treating it as a "recover". This patch
removes the special case for linear upconvert. It allows each new image
sub-LV to be marked with a rebuild flag and treats the array as 'in-sync'.
This has the correct effect of causing the upconvert to be treated as a
"recover" rather than a "resync". There is no need to flag these two states
differently in LVM metadata, because they are already considered differently
by the kernel RAID metadata. (Any activation/deactivation will properly
resume the "recover" process and not a "resync" process.)
We make this behavior change based on the presense of dm-raid target
version 1.9.0+.
On conversion from raid10 to raid0 (takeover), all rmeta
devices and the rimage devices of mirrored stripes are
detached from the raid10 LV. The remaining rimage areas
are being shifted down into the slots of the detached
ones hence requiring renames to show proper _N suffix
sequences (e.g. 0,1,2,3 instead of 0,2,4,6). Only the
top-level raid10 LV has a cluster lock, not the detached
SubLVs thus their deactivation is impossible and e.g the
rename from *_rimage_6 to *_rimage_3 will fail. Fix by
activating exclusively before deactivating and removing.
Resolves: rhbz1448123
Prohibit activation of reshaping RaidLVs on incompatible
lvm2 runtime by storing e.g. 'raid5+RESHAPE' segment type
strings in the lvm2 metadata. Incompatible runtime not
supporting reshaping won't be able to activate those thus
avoiding potential data corruption.
Any new non-reshaping lvconvert command will reset the
segment type string from 'raid5+RESHAPE' to 'raid5'.
See commits
0299a7af1e and
4141409eb0
for segtype flag support.
When old snapshot is merged, lvm2 still can report some data about
merged 'snapshot' - i.e. it occupied space in VG.
This patch fixes regression from commit:
6fd20be629
and resolved RHBZ: 1460161
When a combination of thin-pool chunk size and thin-pool data size
goes beyond addressable limit, such volume creation is directly
prohibited.
Maximum usable thin-pool size is calculated with use of maximal support
metadata size (even when it's created smaller) and given chunk-size.
If the value data size is found to be too big, the command reports
error and operation fails.
Previously thin-pool was created however lots of thin-pool data LV was
not usable and this space in VG has been wasted.
Only support RAID conversions on active LVs.
If we'd accept e.g. upconverting linear -> raid1 on inactive
linear LVs, any LV flags passed to the kernel aren't properly
cleared thus errouneously passing them on every activation.
Add respective check to lv_raid_change_image_count() and
move existing one in lv_raid_convert() for better messages.
Warn about a PV that has the in-use flag set, but appears in
the orphan VG (no VG was found referencing it.)
There are a number of conditions that could lead to this:
. The PV was created with no mdas and is used in a VG with
other PVs (with metadata) that have not yet appeared on
the system. So, no VG metadata is found by lvm which
references the in-use PV with no mdas.
. vgremove could have failed after clearing mdas but
before clearing the in-use flag. In this case, the
in-use flag needs to be manually cleared on the PV.
. The PV may have damanged/unrecognized VG metadata
that lvm could not read.
. The PV may have no mdas, and the PVs with the metadata
may have damaged/unrecognized metadata.
A PV holding VG metadata that lvm can't understand
(e.g. damaged, checksum error, unrecognized flag)
will appear as an in-use orphan, and will be cleared
by this repair code. Disable this repair until the
code can keep track of these problematic PVs, and
distinguish them from actual in-use orphans.
Reject any stripe adding/removing reshape on raid4/5/6/10 because
of related MD kernel deadlock on single core systems until
we get a proper fix in MD.
Related: rhbz1443999
Since lvmetad is using 'MISSING' in status for 'another' purpose,
we need to support ATM also flag get from this place.
Until fixed better - we accept both flags - alhough lvm2 will
only print in flags.
Switch METADATA_FORMAT flag usage to be stored via segtype
instead of 'status' flag which appeared to cause major
incompatibility troubles.
For backward compatiblity segtype flags are still accepted also
via 'status' bits which were used from version 2.02.169 so metadata
saved by this newer lvm2 version should still work nicely, although
new save version will no longer work on this older lvm2 version.
Allow storing LV status bits with segment type name field.
Switching to this since this field has better support for compatibility
with older version of lvm2 - since such unknown segtype will not cause
complete invisiblity of metadata from older lvm2 code - just the
particular LV will become unusable with unknown type of segment.
Commit 5fe07d3574 failed to set raid5 types
properly on conversions from raid6. It always enforced raid6_ls_6
for types raid6/raid6_zr/raid6_nr/raid6_nc, thus requiring 3 conversions
instead of 2 when asking for raid5_{la,rs,ra,n}.
Related: rhbz1439403
Offer possible interim LV types and display their aliases
(e.g. raid5 and raid5_ls) for all conversions between
striped and any raid LVs in case user requests a type
not suitable to direct conversion.
E.g. running "lvconvert --type raid5 LV" on a striped
LV will replace raid5 aka raid5_ls (rotating parity)
with raid5_n (dedicated parity on last image).
User is asked to repeat the lvconvert command to get to the
requested LV type (raid5 aka raid5_ls in this example)
when such replacement occurs.
Resolves: rhbz1439403
_check_reappeared_pv() incorrectly clears the MISSING_PV flags of
PVs with unknown devices.
While one caller avoids passing such PVs into the function, the other
doesn't. Move the check inside the function so it's not forgotten.
Without this patch, if the normal VG reading code tries to repair
inconsistent metadata while there is an unknown PV, it incorrectly
considers the missing PVs no longer to be missing and produces
incorrect 'pvs' output omitting the missing PV, for example.
Easy reproducer:
Create a VG with 3 PVs pv1, pv2, pv3.
Hide pv2.
Run vgreduce --removemissing.
Reinstate the hidden PV pv2 and at the same time hide a different PV
pv3.
Run 'pvs' - incorrect output.
Run 'pvs' again - correct output.
See https://bugzilla.redhat.com/1434054
There are certain situations (not fully understood)
where is_missing_pv() is false, but pv->dev is NULL,
so this adds a check for NULL pv->dev after is_missing_pv()
to avoid a segfault.
lvconvert parameters not causing a conversion (i.e. no type,
number of stripes, stripesize or regionsize changes) will
remove any allocated reshape space in which case the command
returns success. If reshape space does not exist though,
return error.
When metadata LV size was over DM_THIN_MAX_METADATA_SIZE sectors,
the info() routine was incorrectly trying to match bigger size,
while we do never pass any bigger device.
Fixing a case, where lvs should be displaying status for metadata
LV with 16GB size.
Reshape check failed when regionsize changed and current raid type
was provided with no other change requested (stripes or stripesize).
E.g. "lvconvert --type raid6 --regionsize 256K" on a raid6 LV
with != 256K regionsize.
Enable --type in test script.
Remove any newly allocated sub LV (pair) remnants in case
allocation fails due to lag of (parallel) free PV space
and keep initial raid type.
Resolves: rhbz1438013
SIGINT isn't blocked properly after a sigint_allow(),
sigint_restore() cycle leading to illicit interruptable
metadata updates. These can leave corrupted metadata behind.
Issues addressed in this commit:
sigint_allow() fails to set _oldmasked[] members properly due
to an offset by one bug on indexing the members of the array.
It bails out prematurely comparing to MAX_SIGINTS causing nesting
depths to be one less than MAX_SIGINTS. Fix the comparision.
Correct the related comparison flaw in sigint_restore().
Initialize all sig_atomic_t variables consequently.
Resolves: rhbz1440766
Avoid error message
"Logical Volume *_rimage_0 already exists in volume group,,,"
on takeover conversion from a 2-legged raid1 to raid4
(aiming to reshape it adding images).
Resolves: rhbz1439398
Requesting _raid_remove_images() to commit the
metadata missed to reload the origin causing a
kernel takeover error converting a 2-legged raid1
(with previously removed images) to raid5.
Allow the combination of both arguments keeping
the raid level but changing the regionssize
(e.g. "lvconvert --type raid1 --regionsize 1M RaidLV"
on an existing raid1 LV).
Resolves: rhbz1438396
With monolithic kernels we can't actually modprobe
for cache modules as they are already compiled-in
and policy modules do not export version symbol.
Reported issue on list:
https://www.redhat.com/archives/dm-devel/2017-March/msg00061.html
Fix will try to look for explicit kernel symbols first before
calling modprobe.
Removing some unused new lines and changing some incorrect "can't
release until this is fixed" comments. Rename license.txt to make
it clear its merely an included file, not itself a licence.
This patch fixed lvm2 compilation running on x32 arch.
(Using 64bit x86 cpu features but running on 32b address space,
so consuming less mem in VM).
On x32 arch 'time_t' is 64bit while 'long' is 32bit.
This reverts commit 1e4462dbfb
in favour of an enhanced solution avoiding changes in liblvm
completetly by checking the target versions in libdm and emitting
the respective parameter lines.
The libdevmapper interface compares existing table line retrieved from
the kernel to new table line created to decide if it can suppress a reload.
Any difference between input and output of the table line is taken to be a
change thus causing a table reload.
The dm-raid target started to misorder the raid parameters (e.g. 'raid10_copies')
starting with dm-raid target version 1.9.0 up to (excluding) 1.11.0. This causes
runtime failures (limited to raid10 as of tests) and needs to be reversed to allow
e.g. old lvm2 uspace to run properly.
Check for the aforementioned version range and adjust creation of the table line
to the respective (mis)ordered sequence inside and correct order outside the range
(as described for the raid target in the kernels Documentation/device-mapper/dm-raid.txt).
Starting with dm-raid target version 1.9.0 shrinking of mapped devices is supported.
Check for support being present in lvresize and lvreduce.
Related: rhbz1394048
Quering non-thin-pool segment for discard property may lead
to intenal error if the segment had set 'out-of-range' value,
so only thin-pool is allowed, for other it returns NULL.
If SubLVs to be removed still exist after an image removing
conversion (i.e. "lvconvert --yes --force --stripes N "
with N < total stripes) any request to convert to a different
striped/raid* level has to be rejected until after those freed
SubLVs got removed by running the aforementioned lvconvert again.
Add tests to check conversion to striped/raid* gets rejected.
Enhance a test comment.
Related: rhbz1191935
Related: rhbz1366296
Repairing missing devices does not work reliably
with lvmetad, so disable lvmetad before repair.
A standard lvmetad refresh (pvscan --cache) will
enable lvmetad again.
Sending %d as format argument in lvmetad_vg_remove_pending() will cause
segfaults in config_make_nodes_v() when va_arg() casts to int64_t. Also, it is
clearly advertised in the lvm source code that using plain %d is prohibited, so
let's switch to FMTd64.
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
Commit f4b30b0dae was about displaying visible LV size
when reshape space is allocated. Take parity devices
into account when displaying the user visible LV size.
Better support for lvdisplay.
By default info about running (in kernel) cache status is printed.
To get 'segtype' info, user runs: 'lvdisplay -m', example:
--- Logical volume ---
LV Path /dev/vg/lvol0
LV Name lvol0
VG Name vg
LV UUID Y4uWuN-TBGk-duer-aPWl-yBWn-iFFR-RU1gg1
LV Write Access read/write
LV Creation host, time linux, 2017-03-01 20:52:39 +0100
LV Cache pool name lvol2
LV Cache origin name lvol0_corig
LV Status available
# open 0
LV Size 12,00 MiB
Cache used blocks 10,42%
Cache metadata blocks 0,49%
Cache dirty blocks 0,00%
Cache read hits/misses 112 / 34
Cache wrt hits/misses 133 / 0
Cache demotions 0
Cache promotions 20
Current LE 3
Segments 1
Allocation inherit
Read ahead sectors auto
- currently set to 256
Block device 253:0
--- Segments ---
Logical extents 0 to 2:
Type cache
Chunk size 64,00 KiB
Metadata format 1
Mode writethrough
Policy smq
Setting migration_threshold=100000
Report CMFmt column with cache metadata format version.
Report KMFmt column with 'kernel cache metadata format version' for device.
(a value reported from status).
(Update 'CacheMode' to name 'Cache' as primary segtype).
Only cache-pool segtype may store cache_metadata_format.
Only supported values are 0,1,2
Format 2 requires LV status uses LV_METADATA_FORMAT.
Format 0 (unselected) or 1 shall not set this 'incompatible' status.
Cache pool read/writes metadata_format within its segment type..
For CachePoolLV unselected metadata format is NOT stored in metadata.
For CacheLV when metadata format is not present/selected in lvm2 metadata,
it's automatically assumed to be the version 1 (backward compatible).
To ensure older lvm2 will not 'miss-read' metadata with new version 2,
such LV is marked with METADATA_FORMAT status flag (segment is
specifying metadata format). So when cache uses metadata format 2,
it will become inaccesible on older system without such support.
(kernel dm cache < 1.10, lvm2 < 2.02.169).
Add new profilable configation setting to let user select
which metadata format of a created cache pool he wish to use.
By default the 'best' available format is autodetected at runtime,
but user may enforce format 1 or 2 ATM.
Code also detects availability for metadata2 supporting cache target.
In case of troubles user may easily Disable usage of this feature
by placing 'metadata2' into global/cache_disabled_features list.
As now we can properly recognize all paramerters for pool creation,
we may drop PASS_ARG_ defines and rely on '_UNSELECTED' or 0 entries
as being those without user given args.
When setting are not given on command line - 'update' function
fill them from profiles or configuration. For this 'profile' arg
was needed to be passed around and since 'VG' itself is not needed,
it's been all replaced with 'cmd, profile, extents_size' args.
Since cache chunk might be huge and there is no technical need
to enforce rounding and there is actually more 'real' VG space
used then necessary - keep rounding on 'chunk' bounrary only
for thin volumes - where it's the space used anyway.
NB: we support conversion of any-size 'existing' LV into cached LV.
Fix missing reset of '*settings' pointer when no args were given.
Handle cache_chunk settings like all other settings, so it is properly
updated only with non-zero settings and the existing cache-pool
chunk_size is not being reconfigured.
User can specify metadata profile which stores important cache
geometry data for easy configuration.
Fix missing support for getting chunk_size, cache_mode, cache_policy
for a cache/cache pools volumes from configuration or metadata profile.
To more easily recognize unselected state from select '0' state
add new 'THIN_ZERO_UNSELECTED' enum.
Same applies to THIN_DISCARDS_UNSELECTED.
For those we no longer need to use PASS_ARG_ZERO or PASS_ARG_DISCARDS.
Basically code moving operation to have a single place resolving
thin_pool_chunk_size_policy.
Supported are generic & performance profiles.
Function is now shared between thin manipulation code and configuration
_CFG logic to obtain defaults and handle correct reporting upward coding
stack.
In addition to the already supported conversion between 2-legged
raid1 and raid5, raid1 and raid4 can be also converted into each
other with 2 legs (raid4/5 are limited to map a 2-legged raid1).
This patch supports the missing raid4 conversion in the sequence
linear -> 2-legged raid1 -> raid4/5, then restripe to more than one
data stripes for performance and resilience reasons and optionally
convert to striped/raid0.
The other conversion sequence is also possible by converting N-way
striped/raid0 to raid4/5, then restripe to 2 legs followed by a
conversion to raid1 and optionally to linear (loosing all resilience).
On conversion from striped to raid0, data LVs are created
and all segments and their respective areas of the striped
LV are moved across to new segments allocated for the raid0
image LVs. This can cause non-canonical segments to be added
to the image LVs.
Add a call to lv_merge_segments() once all segments have been
added to an image LV to compensate for that. This avoids
unsafe table loads on activation.
Fix comments.
Splitting off an image LV of a 2-legged
raid1 LV causes loss of resilience.
Ask user to avoid uninformed loss of all resilience.
Don't ask for N > 2 legged raid1 LVs.
Adjust tests.
Splitting off an image LV of a 2-legged raid1 LV tracking changes
causes loosing partial resilience for any newly written data set.
Full resilience will be provided again after the split off image LV
got merged back in and the new data set got fully synchronized.
Reason being that the data is only stored on the remaining single
writable image during the split.
Ask user to avoid uninformed loss of such partial resilience.
Don't ask for N > 2 legged raid1 LVs.
In case N images fail (N <= parity chunks) _and_
a "vgreduce --removemissing --force VG" was applied
a following repair of the RaidLV fails:
Unable to remove N images: Only 0 devices given.
Failed to remove the specified images from tb/r.
Failed to replace faulty devices in tb/r.
Fix as of this commit results in correct repair:
Faulty devices in tb/r successfully replaced.
seg->extents_copied has to be defined properly on reducing
the size of a raid LV or conversion from raid5 with 1 stripe
to raid1 will fail.
Related: rhbz834579
Related: rhbz1191935
Related: rhbz1191978
Reshaping a raid5 LV to one stripe aiming to convert it to
raid1 (and optionally to linear) reports the wrong LV size
when still having reshape space allocated.
The lv_extend/_lv_reduce API doesn't cope with resizing RaidLVs
with allocated reshape space and ongoing conversions. Prohibit
resizing during conversions and remove the reshape space before
processing resize. Add missing seg->data_copies initialisation.
Fix typo/comment.
For the time being raid10 is limited to even number of total stripes
as is and 2 data copies. The number of stripes provided on creation
of a raid10(_near) LV with -i/--stripes gets doubled to define
that even total number of stripes (i.e. images).
Apply the same on disk adding conversions (reshapes) with
"lvconvert --stripes RaidLV" (e.g. 2 stripes = 4 images
total converted to 3 stripes = 6 images total).
Related: rhbz834579
Related: rhbz1191935
Related: rhbz1191978