IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
In order to support striped raid5/6/10 LV reshaping (change
of LV type, stripesize or number of legs), this patch
introduces more local infrastructure to raid_manip.c
used by followup patches.
Changes:
- add vg metadata update functions
- add pre and post activation callback functions for
proper sequencing of sub lv activations during reshaping
- move and enhance _lv_update_reload_fns_reset_eliminate_lvs()
to support pre and post activation callbacks
- add _reset_flags_passed_to_kernel() which resets anyxi
rebuild/reshape flags after they have been passed into the kernel
and sets the SubLV remove after reshape flags on legs to be removed
Related: rhbz834579
Related: rhbz1191935
Related: rhbz1191978
In order to support striped raid5/6/10 LV reshaping (change
of LV type, stripesize or number of legs), this patch
introduces more local infrastructure to raid_manip.c
used by followup patches.
Changes:
- add function to support disk adding reshapes
- add function to support disk removing reshapes
- add function to support layout (e.g. raid5ls -> raid5_rs)
or stripesize reshaping
Related: rhbz834579
Related: rhbz1191935
Related: rhbz1191978
In order to support striped raid5/6/10 LV reshaping (change
of LV type, stripesize or number of legs), this patch
introduces more local infrastructure to raid_manip.c
used by followup patches.
Changes:
- add function providing state of a reshaped RaidLV
- add function to adjust the size of a RaidLV was
reshaped to add/remove stripes
Related: rhbz834579
Related: rhbz1191935
Related: rhbz1191978
In order to support striped raid5/6/10 LV reshaping (change
of LV type, stripesize or number of legs), this patch
introduces more local infrastructure to raid_manip.c
used by followup patches.
Changes:
- add lv_raid_data_copies returning raid type specific number;
needed for raid10 with more than 2 data copies
- remove _shift_and_rename_image_components() constraint
to support more than 10 raid legs
- add function to calculate total rimage length used by out-of-place
reshape space allocation
- add out-of-place reshape space alloc/relocate/free functions
- move _data_rimages_count() used by reshape space alloc/realocate functions
Related: rhbz834579
Related: rhbz1191935
Related: rhbz1191978
In order to support striped raid5/6/10 LV reshaping (change
of LV type, stripesize or number of legs), this patch
introduces local infrastructure to raid_manip.c
used by followup patches.
Add functions:
- to check reshaping is supported in target attibute
- to return device health string needed to check
the raid device is ready to reshape
Related: rhbz834579
Related: rhbz1191935
Related: rhbz1191978
In order to support striped raid5/6/10 LV reshaping (change
of LV type, stripesize or number of legs), this patch
introduces infrastructure prerequisites to be used
by raid_manip.c extensions in followup patches.
This base is needed for allocation of out-of-place
reshape space required by the MD raid personalities to
avoid writing over data in-place when reading off the
current RAID layout or number of legs and writing out
the new layout or to a different number of legs
(i.e. restripe)
Changes:
- add members reshape_len to 'struct lv_segment' to store
out-of-place reshape length per component rimage
- add member data_copies to struct lv_segment
to support more than 2 raid10 data copies
- make alloc_lv_segment() aware of both reshape_len and data_copies
- adjust all alloc_lv_segment() callers to the new API
- add functions to retrieve the current data offset (needed for
out-of-place reshaping space allocation) and the devices count
from the kernel
- make libdm deptree code aware of reshape_len
- add LV flags for disk add/remove reshaping
- support import/export of the new 'struct lv_segment' members
- enhance lv_extend/_lv_reduce to cope with reshape_len
- add seg_is_*/segtype_is_* macros related to reshaping
- add target version check for reshaping
- grow rebuilds/writemostly bitmaps to 246 bit to support kernel maximal
- enhance libdm deptree code to support data_offset (out-of-place reshaping)
and delta_disk (legs add/remove reshaping) target arguments
Related: rhbz834579
Related: rhbz1191935
Related: rhbz1191978
The MD kernel raid1 personality does no use any writemostly leg as the primary.
In case a previous linear LV holding data gets upconverted to
raid1 it becomes the primary leg of the new raid1 LV and a full
resynchronization is started to update the new legs.
No writemostly and/or writebehind setting may be allowed during
this initial, full synchronization period of this new raid1 LV
(using the lvchange(8) command), because that would change the
primary (i.e the previous linear LV) thus causing data loss.
lvchange has a bug not preventing this scenario.
Fix rejects setting writemostly and/or writebehind on resychronizing raid1 LVs.
Once we have status in the lvm2 metadata about the linear -> raid upconversion,
we may relax this constraint for other types of resynchronization
(e.g. for user requested "lvchange --resync ").
New lvchange-raid1-writemostly.sh test is added to the test suite.
Resolves: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=855895
Previously when lvremove tried to remove 'active' origin,
it had been asking for every 'snapshot' LV separately
and doing individual single snapshot removals first.
To be faster it now deactivates origin before removal
all connected snapshots.
This avoids multiple reloads of dm table for origin volume
which were unnecessary as origin was meant to be removed as well.
. Define a prototype for every lvm command.
. Match every user command with one definition.
. Generate help text and man pages from them.
The new file command-lines.in defines a prototype for every
unique lvm command. A unique lvm command is a unique
combination of: command name + required option args +
required positional args. Each of these prototypes also
includes the optional option args and optional positional
args that the command will accept, a description, and a
unique string ID for the definition. Any valid command
will match one of the prototypes.
Here's an example of the lvresize command definitions from
command-lines.in, there are three unique lvresize commands:
lvresize --size SizeMB LV
OO: --alloc Alloc, --autobackup Bool, --force,
--nofsck, --nosync, --noudevsync, --reportformat String, --resizefs,
--stripes Number, --stripesize SizeKB, --poolmetadatasize SizeMB
OP: PV ...
ID: lvresize_by_size
DESC: Resize an LV by a specified size.
lvresize LV PV ...
OO: --alloc Alloc, --autobackup Bool, --force,
--nofsck, --nosync, --noudevsync,
--reportformat String, --resizefs, --stripes Number, --stripesize SizeKB
ID: lvresize_by_pv
DESC: Resize an LV by specified PV extents.
FLAGS: SECONDARY_SYNTAX
lvresize --poolmetadatasize SizeMB LV_thinpool
OO: --alloc Alloc, --autobackup Bool, --force,
--nofsck, --nosync, --noudevsync,
--reportformat String, --stripes Number, --stripesize SizeKB
OP: PV ...
ID: lvresize_pool_metadata_by_size
DESC: Resize a pool metadata SubLV by a specified size.
The three commands have separate definitions because they have
different required parameters. Required parameters are specified
on the first line of the definition. Optional options are
listed after OO, and optional positional args are listed after OP.
This data is used to generate corresponding command definition
structures for lvm in command-lines.h. usage/help output is also
auto generated, so it is always in sync with the definitions.
Every user-entered command is compared against the set of
command structures, and matched with one. An error is
reported if an entered command does not have the required
parameters for any definition. The closest match is printed
as a suggestion, and running lvresize --help will display
the usage for each possible lvresize command.
The prototype syntax used for help/man output includes
required --option and positional args on the first line,
and optional --option and positional args enclosed in [ ]
on subsequent lines.
command_name <required_opt_args> <required_pos_args>
[ <optional_opt_args> ]
[ <optional_pos_args> ]
Command definitions that are not to be advertised/suggested
have the flag SECONDARY_SYNTAX. These commands will not be
printed in the normal help output.
Man page prototypes are also generated from the same original
command definitions, and are always in sync with the code
and help text.
Very early in command execution, a matching command definition
is found. lvm then knows the operation being done, and that
the provided args conform to the definition. This will allow
lots of ad hoc checking/validation to be removed throughout
the code.
Each command definition can also be routed to a specific
function to implement it. The function is associated with
an enum value for the command definition (generated from
the ID string.) These per-command-definition implementation
functions have not yet been created, so all commands
currently fall back to the existing per-command-name
implementation functions.
Using per-command-definition functions will allow lots of
code to be removed which tries to figure out what the
command is meant to do. This is currently based on ad hoc
and complicated option analysis. When using the new
functions, what the command is doing is already known
from the associated command definition.
Remove allocate_pvs from raid_manip.c:_region_size_change_request() API
and lv_extend() using it added for temporary test purpose.
Related: rhbz1366296
Add:
- region size checks to raid_manip.c types array and supporting functions
- tests to lvconvert-raid-takeover.sh to check bogus
"lvconvert --type --regionsize N " requests
Related: rhbz1366296
Add:
- conversion support from striped/raid0/raid0_meta to/from raid10;
raid10 goes by the near format (same as used in creation of
raid10 LVs), which groups data copies together with their original
blocks (e.g. 3-way striped, 2 data copies resulting in 112233 in the
first stripe followed by 445566 in the second etc.) and is limited
to even numbers of legs for now
- related tests to lvconvert-raid-takeover.sh
- typo
Related: rhbz1366296
- support shrinking of raid0/1/4/5/6/10 LVs
- enhance lvresize-raid.sh tests: add raid0* and raid10
- fix have_raid4 in aux.sh to allow lv_resize-raid.sh
and other scripts to test raid4
Resolves: rhbz1394048
Commit cfb6ef654d introduced
support to change RAID region size.
Fix:
- don't change region_size until after prompting the user
- use log_print_unless_silent() instead of log_warn()
- avoid superfluous sigint() calls which are already
covered in yes_no_prompt()
- typo
Related: rhbz1392947
Commit cfb6ef654d introduced
support to change RAID region size.
Add:
- missing conditions to support any types to function with
it in lv_raid_convert(); temporary workaround used until
cli validation patches get merged
- tests requesting "-R " to lvconvert-raid-takeover.sh
involving a cleanup of the script
Related: rhbz1392947
Cleanups as of Jons review:
- enhance comment about mandatory raid4 <-> raid5_n activation w/o metadata SubLVs
- remove bogus segment flag setting
- fix to sync related comments on conversions to raid0/striped and amongst raid4/5
- add missing error message for non-synced conversion to raid0/striped
Related: rhbz1366296
Add:
- support to change region size of existing RaidLVs
(all RAID LV types but raid0/raid0_meta)
- lvconvert-raid-regionsize.sh with test variations
for different RAID types and region sizes
Resolves: rhbz1392947
Add:
- support for segment types raid6_{ls,rs,la,ra}_6
(striped raid with dedicated last Q-Syndrome SubLVs)
- conversion support from raid5_{ls,rs,la,ra} to/from raid6_{ls,rs,la,ra}_6
- setting convenient segtypes on conversions from/to raid4/5/6
- related tests to lvconvert-raid-takeover.sh factoring
out _lvcreate,_lvconvert funxtions
Related: rhbz1366296
Add:
- support for segment type raid6_n_6 (striped raid with dedicated last parity/Q-Syndrome SubLVs)
- conversion support from striped/raid0/raid0_meta/raid4 to/from raid6_n_6
- related tests to lvconvert-raid-takeover.sh
Related: rhbz1366296
Add:
- support for segment type raid5_n (striped raid with dedicated last parity SubLVs)
- conversion support from striped/raid0/raid0_meta/raid4 to/from raid5_n
- related tests to lvconvert-raid-takeover.sh
Related: rhbz1366296
With recent commit d6a74025df using
INTERNAL_ERROR while cheking layer LV - it's been noticed mirror
logic currently doesn't do a correct thing during upconversion and
does a full-try instead of checking only allocator capabilities.
This leads to invalid usage of layer.
To keep existing code running before providing a fix, relax
INTERNAL_ERROR just an error and keep the 'code' running.
Once mirror code is fixed, these all check should be switched
to internal errors.
Show proper internal error for failing command when there are some
inconsitencies in sizes of LV and its layer instead of rather
meaningless error code 5.
(Could be hit i.e. if user tried to 'resize' cached LV and then
uncache such LV.)
During rework of resize code this validation check
has been lost (in my resize branch). Upstream
is still not supporting resize of any cache type LV
so needs to be prevented.
When we need to clear dirty cache content of cached LV, there
is table reload which usually is shortly followed by next metadata
change. However udev can't (as of now) process udev event
while device is 'suspended'.
So whenever sequence of 'suspend/resume/suspend' is needed,
we need to wait first for finishing of 'resume' processing before
starting next 'suspend'. Otherwise there is 'race' danger of triggering
unwantend umount by systemd as such event will trigger
SYSTEMD_READY=0 state for a moment for such changed device.
Such race is pretty ugly to trace so we may need to review more
sequencies for missing 'sync'.
(Other option is to enhnace 'udev' rules processing to avoid
such dramatic actions to be happening for suspended devices).
Add to commits 87117c2b25 and 0b8bf73a63 to avoid refreshing two
times altogether, thus avoiding issues related to clustered, remotely
activated RaidLV. Avoid need to repeat "lvchange --refresh RaidLV"
two times as a workaround to refresh a RaidLV. Fix handles removal
of temporary *-missing-* devices created for any missing segments
in RAID SubLVs during activation.
Because the kernel dm-raid target isn't able to handle transiently
failing devices properly we need
"[dm-devel][PATCH] dm raid: fix transient device failure processing"
as well.
test: add lvchange-raid-transient-failures.sh
and enhance lvconvert-raid.sh
Resolves: rhbz1025322
Related: rhbz1265191
Related: rhbz1399844
Related: rhbz1404425
Move individual segment validation to a separate function
executed for 'complete_vg'.
Move some 'extra' validation bits from 'raid' validation to global
segtype validation (so extending existing normal validation)
TODO: still some test are left to be moved.
Reduce some duplication in validation process - there are still
some left thought so still room for improving overal speed.
It could be actually better to use even cache origin in
read-only mode so there could no be some 'acidental'
change being done on this volume.
This however need further tools enhancment - where we would need
to handle whole subtree on 'lvchange -pr/-prw'.
Add this functionality to lvconvert:
'lvconvert --thin cachedLV --thinpool vg/poll'
Converts cachedLV to external origin (which will be read-only).
New thin volume is created in thinpool LV and it's using external
origin as source for unprovisioned chunks.
This conversion happens online (while volume is in use).
Thin LV remains fully writable.
Cached external origin no longer could be written so cache will be used
ONLY for read operations. For this limitation we require cache mode
to be writethrough (as writeback cannot write to read-only volumes).
When thinLV is later removed cached external origin is again
fully usable, just note, LV remain in 'read-only' mode.
When read-write is needed, 'lvchange -prw' has to be used.
Single external origin could be user by multiple thinLV in
multiple differen thin pool.
External origin could be reloaded via more locks.
It's actually even more complex then thin-pool,
as it may be active on more nodes for linear LVs
(and maybe even more types).
External origin is always read-only thus unmodifiable
device so there should not be a problem accesing it
through multiple nodes.
Also for thin-pool check first presence of active thin-pool.
FIXME:
It's not easy to detect on which nodes this device is active
Thus manipulation with such device may require checking every
node and it active state and refresh.
But since such setup is quite complex to prepare and use,
hopefully there are not user trying to 'explore' this usage yet.
To be ready to show status of cache volume, call the status
with layer. Layer is automatically detected in this case when
cache volume is used in 'layered' form (needs -real suffix).
Avoid printing misleading message about single dirty block.
Instead properly detect condition where the 'cleaner' policy
needs to be installed without 'overloading' dirty variable.
Also print warning if we would be clearing read-only volume.
(it really shouldn't happen).
External origin could be activated as stand-alone device.
When the last thin LV is removed, external origin is no longer
the external origin and it's layer property was dropped.
Ensure dm table is correct by reloading external origin
(when it's active).
Activation of raid has brough up also splitted image with tracing
(without taking lock for this).
So when raid is now activate - such image is not put into
table (with _rmeta). When user needs such device, just active it.
Commit 0690392040 revealed a problem
in raid metadata manipulation.
We do two operations in one table reload:
- raid leg/image extraction
- rename remaining raid legs
This should be made in separate steps. Otherwise we do an
uncorrectable table change on error path (leaving tables
for admin and dmsetup).
As a hotfix - restore the previous logic and use a single
new function _lv_update_and_reload_list which activates exclusively
extracted LVs on the list before resuming suspended raid LV.
This restore 'rename' functionality upon resume.
Also still preserve the 'origin_only' logic - although we know
it's not working properly for cluster and LV stacking.
Further fixes are needed.
backup is not 'tested' for success and also it should
actually happen just when command is finished.
We do not target to make backups with each inter-step
metadata change.
RAID is LV property
TODO: only 2 flags are seg->status: PVMOVE & MERGING
At least the second one should be soon elimanted as again
we merge LV not a segment.
This is another place for 'common' use pattern or
reload and activation of deleted devices.
(Moving the exclusive activation to _deactivate_and_remove_lvs()).
TODO: looks like halve of raid function is reloading
just 'origin' - and the other full LV.
Fix order of operation when converting raid1 into old mirror.
Before any later metadata modification are initiated prepare
mirror_log device with all clearing.
Then directly convert raid1 into mirror with mirror_log.
This convertion now properly see as precommitted metadata
new 'mirror' and committed old 'raid' and is able to
preload all LVs.
Drop LV from passed API arg - it's always segment being checked.
Also use_layer is now in full control of lv_info_with_seg_status().
It decides which device needs to be checked to get 'the most info'.
TODO: future version should be able to expose status from
In case any SubLV of a RaidLV transiently fails, it needs
two "lvchange --refresh RaidLV" runs to get it to fully
operational mode again. Reason being, that lvm reloads all
targets for the RaidLV tree but doesn't resume the SubLVs
until after the whole tree has been reloaded in the first
refresh run. Thus the live mapping table of the SubLVs
still point to an "error" mapping and the dm-raid target
can't retrieve any superblock from the MetaLV(s) in processing
the constructor during this preload thus not discovering the
again accessible SubLVs. In the second run, the SubLV targets
map proper (meta)data, hence the constructor discovers those
fine now.
Solve by resuming the SubLVs of the RaidLV before
preloading the respective top-level RaidLV target.
Resolves: rhbz1399844
Avoid code duplication and use exiting commonly used
lv_update_and_reload() function.
There is still one place left where mirror is doing strange
double suspend call - needs there more thinking what's wrong with
that code.
When lvconvert adds a new leg - it's doing it free 'temporary' image
layer - however this temporary 'internal' mirror is also MIRRORED LV.
But the status bit was not properly transfered through layer.
Instead of clearing multiple rmeta device with sequential activation
process and waiting for udev for every _rmeta device separately,
activate all _rmeta devices first and then clear them and deactivate
afterwards.
Also update some tracing messages.
When anyhing goes wrong during clearing process, always try to
deactivate as much _rmeta devices as possible before fail.
(Automatic) repair may not be allowed during the initial sync of an upconverted
linear LV, because the data on the failing, primary leg hasn't been completely
synchronized to the N-1 other legs of the raid1 LV (replacing failed legs during
repair involves discontinuing access to any replaced legs data, thus preventing
data recovery on the primary leg e.g. via dd_rescue).
Even though repair would not cause data loss when adding legs to a fully synced
raid1 LV, we don't have information yet defining this state yet (e.g. a raid1
LV flag telling the fully synchronized status before any legs were added),
hence can't automatically decide to allow to repair.
If nonetheless a repair on a non-synced raid1 LVs is intended, the "--force"
option has to be provided.
Resolves: rhbz1311765
Check for dm-raid target version with non-standard raid4 mapping expecting the dedicated
parity device in the last rather than the first slot and prohibit to create, activate or
convert to such LVs from striped/raid0* or vice-versa in order to avoid data corruption.
Add related tests to lvconvert-raid-takeover.sh
Resolves: rhbz1388962
On conversions between striped/raid0* and raid4, the kernel expects
the dedicated raid4 parity SubLVs in the first segment area rather than
in the last it's been allocated to, thus the data mapping ain't proper.
Enhance lvconvert (lib/metadata/raid_manip.c) to shift the dedicated
parity SubLVs on conversions from striped/raid0* to raid4 and vice-versa.
In case of raid0_meta -> raid4 where the MD raid0 personality already has
stored RAID array device positions in the superblocks, the MetaLVs have to
be cleared so that the kernel doesn't fail validating the array positions
after lvm has shifted them up by one.
Add more tests to lvconvert-raid-takeover.sh including one to check for
mapping flaws by converting a created raid4 with filesystem -> striped
and fsck it.
Whilst on it:
- add missing direct striped -> raid4 conversion to the takeover array
to avoid an intermim conversion from striped -> raid0*
- clean up the takeover array
- allow lvconvert to actually call lv_raid_convert() on all takeover requests
in order to check parameters and display messages provided by takeover
functions rather than just "...not supported" from within lvconvert
- fix a typo
Resolves: rhbz1386148
Works if the pool is inactive.
Activation code doesn't notice a new raid dependency in on-disk metadata
when a thin LV is already active.
https://bugzilla.redhat.com/1365286
The dm-raid target now rejects device rebuild requests during ongoing
resynchronization thus causing 'lvconvert --repair ...' to fail with
a kernel error message. This regresses with respect to failing automatic
repair via the dmeventd RAID plugin in case raid_fault_policy="allocate"
is configured in lvm.conf as well.
Previously allowing such repair request required cancelling the
resynchronization of any still accessible DataLVs, hence reasoning
potential data loss.
Patch allows the resynchronization of still accessible DataLVs to
finish up by rejecting any 'lvconvert --repair ...'.
It enhances the dmeventd RAID plugin to be able to automatically repair
by postponing the repair after synchronization ended.
More tests are added to lvconvert-rebuild-raid.sh to cover single
and multiple DataLV failure cases for the different RAID levels.
- resolves: rhbz1371717
Commit 199697accf rerouted funtion
for priting cache volume origin to lvm2app app function - which
however had a bug. So restore the original functionality
and print correct LV as cache origin LV.
Unconditionally guard there is at least 1/4 of metadata volume
free (<16Mib) or 4MiB - whichever value is smaller.
In case there is not enough free space do not let operation proceed and
recommend thin-pool metadata resize (in case user has not
enabled autoresize, manual 'lvextend --poolmetadatasize' is needed).
In the case there is no active thin volume, report thin pool
as lock holder. This fixed function like lvextend
which either expecte lock holder LV is some active thin
or 'possibly' inactive thin pool.
The existing code doesn't understand that mirror logs should cling to
parallel LVs (like extending them) instead of avoiding them.
As a quick workaround to avoid lvcreate failures, hard-code
--alloc normal for mirror logs even if the rest of the allocation
used a stricter policy.
https://bugzilla.redhat.com/show_bug.cgi?id=1376532
Reinstantiate reporting of metadata percent usage for cache volumes.
Also show the same percentage with hidden cache-pool LV.
This regression was caused by optimization for a single-ioctl in
2.02.155.
Introduce 'hard limit' for max number of cache chunks.
When cache target operates with too many chunks (>10e6).
When user is aware of related possible troubles he
may increase the limit in lvm.conf.
Also verbosely inform user about possible solution.
Code works for both lvcreate and lvconvert.
Lvconvert fully supports change of chunk_size when caching LV
(and validates for compatible settings).
'pvmove -n name pv1 pv2' allows to collocate multiple RAID SubLVs
on pv2 (e.g. results in collocated raidlv_rimage_0 and raidlv_rimage_1),
thus causing loss of resilence and/or performance of the RaidLV.
Fix this pvmove flaw leading to potential data loss in case of PV failure
by preventing any SubLVs from collocation on any PVs of the RaidLV.
Still allow to collocate any DataLVs of a RaidLV with their sibling MetaLVs
and vice-versa though (e.g. raidlv_rmeta_0 on pv1 may still be moved to pv2
already holding raidlv_rimage_0).
Because access to the top-level RaidLV name is needed,
promote local _top_level_lv_name() from raid_manip.c
to global top_level_lv_name().
- resolves rhbz1202497
Adding MetaLVs to given DataLVs (e.g. raid0 -> raid0_meta takeover),
_avoid_pvs_with_other_images_of_lv() was missing code to prohibit
allocation when called with a just allocated MetaLV to prohibit
collaocation of the next allocated MetaLV on the same PV.
- resolves rhbz1366738
Enforce mirror/raid0/1/10/4/5/6 type specific maximum images when
creating LVs or converting them from mirror <-> raid1.
Document those maxima in the lvcreate/lvconvert man pages.
- resolves rhbz1366060
'lvchange --resync LV' or 'lvchange --syncaction repair LV' request the
RAID layout specific parity blocks in raid4/5/6 to be recreated or the
mirrored blocks to be copied again from the master leg/copy for raid1/10,
thus not allowing a rebuild of a particular PV.
Introduce repeatable option '--[raid]rebuild PV' to allow to request
rebuilds of specific PVs in a RaidLV which are known to contain corrupt
data (e.g. rebuild a raid1 master leg).
Add test lvchange-rebuild-raid.sh to test/shell doing rebuild
variations on raid1/10 and 5; add aux function check_status_chars
to support the new test.
- Resolves rhbz1064592
introduced with commit 8f62b7bfe5 rely on complete
defintions of the relations between the LVs of a VG.
Hence only run these checks when the complete_vg flag
is set on calls to check_lv_segments().
lvconvert failed in test lvconvert-thin-raid.sh when
calling check_lv_segments() from _read_segments() without
providing a complete definition.
General RAID and RAID segment type specific checks are added
to merge.c. New static _check_raid_seg() is called on each segment
of a RaidLV (which have just one) from check_lv_segments().
New checks caught some unititialized segment members
which are addressed here as well:
- initialize seg->region_size to 0 in lvcreate.c for raid0/raid0_meta
- initialize list seg->origin_list in lv_manip.c
When volume was lvconvert-ed to a thin-volume with external origin,
then in case thin-pool was in non-zeroing mode
it's been printing WARNING about not zeroing thin volume - but
this is wanted and expected - so nothing to warn about.
So in this particular use case WARNING needs to be suppressed.
Adding parameter support for lvcreate_params.
So now lvconvert creates 'normal thin LV' in read-only mode
(so any read will 'return 0' for a moment)
then deactivate regular thin LV and reacreate in 'final R/RW' mode
thin LV with external origin and activate again.
Before, the automatic update from older to newer version of PV extension
header happened within vg_write call. This may have caused problems under
some circumnstances where there's a code in between vg_write and vg_commit
which may have failed. In such situation, we reverted precommitted metadata
and put back the state to working version of VG metadata.
However, we don't have revert for PV write operation at the moment. So
if we updated PV headers already and we reverted vg_write due to failure
in subsequent code (before vg_commit), we ended up with lost VG metadata
(because old metadata pointers got reset by the PV write operation).
To minimize problematic situations here, we should put vg_write and
vg_commit that is done after PV header rewrites as close to each
other as possible.
This patch moves the automatic PV header rewrite for new extension
header part from vg_write to _vg_read where it's done the same way
as we do any other VG repairs if detected during VG read operation
(under VG write lock).
Any failing stripes in raid0/raid0_meta type LVs cause data loss,
thus replacement via 'lvconvert --replace...' does not make sense.
Patch prohibits replacement on raid0/raid0_meta LVs.
- resolves rhbz1356734
Commit ca878a3426 changed behavior
or resize operation. Later the code has been futher changed
to skip fs resize completely when size of LV is already matching
and finaly at the most recent resize changeset for resize the
check for matching size has been eliminated as well so we ended
with a request call to resize fs to 0 size in some cases.
This commit reoders some test so the prompt happens just once before
resize of possibly 2 related volumes.
Also extra test for having LV already given size is added, and
whole metadata update is skipped for this case as the only
result would be an increment of seqno.
However the filesystem is still resized when requested,
so if the LV has some size and the resize is resolved to
the same size, the filesystem resize is called so in case FS
would not match, the resize will happen.
A livelock occurs on extension in lv_manip when adjusting the region size,
which doesn't apply to any raid0/raid0_meta LVs (these don't have a bitmap).
Fix by prohibiting the region size adjustment on any such LVs.
- resolves rhbz1354604
An unconditional access to the non-existing MetaLV of a raid0 LV in
lv_raid_remove_missing() was causing the segfault.
Only call log_debug() on replacements of existing MetaLVs.
- resolves rhbz1354646
Apply the same idea as vg_update.
Before doing the VG remove on disk, invalidate
the VG in lvmetad. After the VG is removed,
remove the VG in lvmetad. If the command fails
after removing the VG on disk, but before removing
the VG metadata from lvmetad, then a subsequent
command will see the INVALID flag and not use the
stale metadata from lvmetad.
Previously, a command sent lvmetad new VG metadata in vg_commit().
In vg_commit(), devices are suspended, so any memory allocation
done by the command while sending to lvmetad, or by lvmetad while
updating its cache could deadlock if memory reclaim was triggered.
Now lvmetad is updated in unlock_vg(), after devices are resumed.
The new method for updating VG metadata in lvmetad is in two phases:
1. In vg_write(), before devices are suspended, the command sends
lvmetad a short message ("set_vg_info") telling it what the new
VG seqno will be. lvmetad sees that the seqno is newer than
the seqno of its cached VG, so it sets the INVALID flag for the
cached VG. If sending the message to lvmetad fails, the command
fails before the metadata is committed and the change is not made.
If sending the message succeeds, vg_commit() is called.
2. In unlock_vg(), after devices are resumed, the command sends
lvmetad the standard vg_update message with the new metadata.
lvmetad sees that the seqno in the new metadata matches the
seqno it saved from set_vg_info, and knows it has the latest
copy, so it clears the INVALID flag for the cached VG.
If a command fails between 1 and 2 (after committing the VG on disk,
but before sending lvmetad the new metadata), the cached VG retains
the INVALID flag in lvmetad. A subsequent command will read the
cached VG from lvmetad, see the INVALID flag, ignore the cached
copy, read the VG from disk instead, update the lvmetad copy
with the latest copy from disk, (this clears the INVALID flag
in lvmetad), and use the correct VG metadata for the command.
(This INVALID mechanism already existed for use by lvmlockd.)
Add code to support more LVs to be resized through a same code path
using a single lvresize_params struct.
(Now it's used for thin-pool metadata resize,
next user will be snapshot virtual resize).
Update code to adjust percent amount resize for use_policies.
Properly activate inactive thin-pool in case of any pool resize
as the command should not 'deffer' this operation to next activation.
Use common API design and pass just LV pointer to lv_manip.c functions.
Read cmd struct via lv->vg->cmd when needed.
Also do not try to return EINVALID_CMD_LINE error when we
have already openned VG - this error code can only be returned before
locking VG.
A number of places are working on a specific dev when they
call lvmcache_info_from_pvid() to look up an info struct
based on a pvid. In those cases, pass the dev being used
to lvmcache_info_from_pvid(). When a dev is specified,
lvmcache_info_from_pvid() will verify that the cached
info it's using matches the dev being processed before
returning the info. Calling code will not mistakenly
get info for the wrong dev when duplicate devs exist.
This confusion was happening when scanning labels when
duplicate devs existed. label_read for the first dev
would add an info struct to lvmcache for that dev/pvid.
label_read for the second dev would see the pvid in
lvmcache from first dev, and mistakenly conclude that
the label_read from the second dev can be skipped
because it's already been done. By verifying that the
dev for the cached pvid matches the dev being read,
this mismatch is avoided and the label is actually read
from the second duplicate.
Add function to obtain percentage value for cache lv_seg_status.
This API is rather evolving 'middle' step as the ultimate goal
is segment API fuctionality.
But first we need to be clear at reporting level which values
are needed to be reported for which LVs and segments.
lv_refresh_suspend_resume() has escaped with fail ret code
after failing suspend and could have left many volumes in suspend state.
So always unconditionally call resume also when suspend has failed.
Check first the LV is cow before even checking it's a merging COW.
Note: previosly merging_cow was also merging origin, so without
this explicit check it used to return '1' also when passed
LV has been merging origin.
When mirror/raid called copy_percent function to return,
when 100% was supposed to be returned, wrong float 100.0 value
could have been reported back instead of dm_percent_t DM_PERCENT_100.
There is broken API somewhere, since the function here rely on
actively being modifid VG content even when doing 'lvs' operation.
(extents_copies)
This refactors the code for autoactivation. Previously,
as each PV was found, it would be sent to lvmetad, and
the VG would be autoactivated using a non-standard VG
processing function (the "activation_handler") called via
a function pointer from within the lvmetad notification path.
Now, any scanning that the command needs to do (scanning
only the named device args, or scanning all devices when
there are no args), is done first, before any activation
is attempted. During the scans, the VG names are saved.
After scanning is complete, process_each_vg is used to do
autoactivation of the saved VG names. This makes pvscan
activation much more similar to activation done with
vgchange or lvchange.
The separate autoactivate phase also means that if lvmetad
is disabled (either before or during the scan), the command
can continue with the activation step by simply not using
lvmetad and reverting to disk scanning to do the
activation.
Add support for active cache LV.
Handle --cachemode args validation during command line processing.
Rework some lvm2 internal to use lvm2 defined CACHE_MODE enums
indepently on libdm defines and use enum around the code instead
of passing and comparing strings.
When there are duplicate devices for a PV, one device
is preferred and chosen to exist in the VG. The other
devices are not used by lvm, but are displayed by pvs
with a new PV attr "d", indicating that they are
unchosen duplicate PVs.
The "duplicate" reporting field is set to "duplicate"
when the PV is an unchosen duplicate, and that field
is blank for the chosen PV.
Wait to compare and choose alternate duplicate devices until
after all devices are scanned. During scanning, the first
duplicate dev is kept in lvmcache, and others are kept in a
new list (_found_duplicate_devs).
After all devices are scanned, compare all the duplicates
available for a given PVID and decide which is best.
If the dev used in lvmcache is changed, drop the old dev
from lvmcache entirely and rescan the replacement dev.
Previously the VG metadata from the old dev was kept in
lvmcache and only the dev was replaced.
A new config setting devices/allow_changes_with_duplicate_pvs
can be set to 0 which disallows modifying a VG or activating
LVs in it when the VG contains PVs with duplicate devices.
Set to 1 is the old behavior which allowed the VG to be
changed.
The logic for which of two devs is preferred has changed.
The primary goal is to choose a device that is currently
in use if the other isn't, e.g. by an active LV.
. prefer dev with fs mounted if the other doesn't, else
. prefer dev that is dm if the other isn't, else
. prefer dev in subsystem if the other isn't
If neither device is preferred by these rules, then don't
change devices in lvmcache, leaving the one that was found
first.
The previous logic for preferring a device was:
. prefer dev in subsystem if the other isn't, else
. prefer dev without holders if the other has holders, else
. prefer dev that is dm if the other isn't
Support parsing --chunksize option also when converting.
Now user can use cache pool created with i.e. 32K chunksize,
while in caching user can select 512K blocks.
Tool is supposed to validate cache metadata size is big enough
to support such chunk size. Otherwise error is shown.
When creating LV - in some case we change created segment type
(ATM for cache and snapshot) and we then manipulate with
lv segment according to 'lp' segtype.
Fix this by checking for proper type before accessing segment members.
This makes command like:
lvcreate --type cache-pool -L10 vg/cpool
lvcreate -H -L10 --cachesettings migtation_threshold=10000 vg/cpool
to pass since now tool correctly selects default cache policy.
If there's an activation volume_filter, it might not be possible
to activate the rmeta LVs to wipe them. At least inherit any
LV tags from the parent LV while attempting this.
Checking for devices uses is_missing_pv() to check
if there is a device for the PV. is_missing_pv()
is based on the MISSING_PV flag, which does not
always correspond to !pv->dev. When using lvmetad,
a command like:
pvs --config 'devices/filter=["a|/dev/sdb|", "r|.*|"]'
will cause a number of PVs to have NULL pv->dev, but
not the MISSING_PV flag. So, NULL pv->dev needs to
also be checked.
[0] fedora/~ # pvs --config 'devices/filter=["a|/dev/sda|", "r|.*|"]'
WARNING: Device for PV Qcxpcy-XgtP-UD3s-PmG0-qLyE-Z0ho-DYsxoz not found or rejected by a filter.
WARNING: Device for PV Qcxpcy-XgtP-UD3s-PmG0-qLyE-Z0ho-DYsxoz not found or rejected by a filter.
WARNING: Couldn't find device for segment belonging to fedora/root while checking used and assumed devices.
WARNING: Couldn't find device for segment belonging to fedora/swap while checking used and assumed devices.
PV VG Fmt Attr PSize PFree
/dev/sda lvm2 --- 128.00m 128.00m
[unknown] fedora lvm2 a-m 19.49g 0
Probably not worth mentioning "segments" here, just state that devices
for an LV can't be all found during the check - it's less mysterious for
user then:
[0] fedora/~ # pvs --config 'devices/filter=["a|/dev/sda|", "r|.*|"]'
WARNING: Device for PV Qcxpcy-XgtP-UD3s-PmG0-qLyE-Z0ho-DYsxoz not found or rejected by a filter.
WARNING: Device for PV Qcxpcy-XgtP-UD3s-PmG0-qLyE-Z0ho-DYsxoz not found or rejected by a filter.
WARNING: Couldn't find all devices for LV fedora/root while checking used and assumed devices.
WARNING: Couldn't find all devices for LV fedora/swap while checking used and assumed devices.
PV VG Fmt Attr PSize PFree
/dev/sda lvm2 --- 128.00m 128.00m
[unknown] fedora lvm2 a-m 19.49g 0
When checking assumed PVs against real devices used for LVs and if
there's no device assigned for an assumed PV (e.g. due to filters),
do log_warn instead of log_error and continue checking LV segments
and associated assumed PVs further, just like we do log_warn elsewhere
in this situation.
This way user will see the warning for each LV which couldn't be
checked completely against real PVs used. Before, we logged only
the very first occurence of missing device for an LV in a VG and we
returned from the function doing this check for all the LVs in VG
immediately which may be a bit misleading because it didn't tell
user about all the other LVs and whether they could be checked
or not.
For example, we have this setup:
[0] fedora/~ # pvs
PV VG Fmt Attr PSize PFree
/dev/sda lvm2 --- 128.00m 128.00m
/dev/vda2 fedora lvm2 a-- 19.49g 0
[0] fedora/~ # lvs -o+devices
LV VG Attr LSize Devices
root fedora -wi-ao---- 19.00g /dev/vda2(0)
swap fedora -wi-ao---- 500.00m /dev/vda2(4864)
Before this patch (only the very first LV in a VG is logged to have a
problem while checking used and assumed devices):
[0] fedora/~ # pvs --config 'devices/filter=["a|/dev/sda|", "r|.*|"]'
WARNING: Device for PV Qcxpcy-XgtP-UD3s-PmG0-qLyE-Z0ho-DYsxoz not found or rejected by a filter.
WARNING: Device for PV Qcxpcy-XgtP-UD3s-PmG0-qLyE-Z0ho-DYsxoz not found or rejected by a filter.
Couldn't find device for segment belonging to fedora/root while checking used and assumed devices.
PV VG Fmt Attr PSize PFree
/dev/sda lvm2 --- 128.00m 128.00m
[unknown] fedora lvm2 a-m 19.49g 0
With this patch applied (all LVs where we hit problem while checking
used and assumed devices are logged and it's warning, not error):
[0] fedora/~ # pvs --config 'devices/filter=["a|/dev/sda|", "r|.*|"]'
WARNING: Device for PV Qcxpcy-XgtP-UD3s-PmG0-qLyE-Z0ho-DYsxoz not found or rejected by a filter.
WARNING: Device for PV Qcxpcy-XgtP-UD3s-PmG0-qLyE-Z0ho-DYsxoz not found or rejected by a filter.
WARNING: Couldn't find device for segment belonging to fedora/root while checking used and assumed devices.
WARNING: Couldn't find device for segment belonging to fedora/swap while checking used and assumed devices.
PV VG Fmt Attr PSize PFree
/dev/sda lvm2 --- 128.00m 128.00m
[unknown] fedora lvm2 a-m 19.49g 0
vg/snapshotN should not appear anywhere.
No code should be showing this, but it was noticed in some logs last
week and we can deal with it in display_lvname().
The lvmetad connection is created within the
init_connections() path during command startup,
rather than via the old lvmetad_active() check.
The old lvmetad_active() checks are replaced
with lvmetad_used() which is a simple check that
tests if the command is using/connected to lvmetad.
The old lvmetad_set_active(cmd, 0) calls, which
stopped the command from using lvmetad (to revert to
disk scanning), are replaced with lvmetad_make_unused(cmd).
It's possible for an LVM LV to use a device during activation which
then differs from device which LVM assumes based on metadata later on.
For example, such device mismatch can occur if LVM doesn't have
complete view of devices during activation or if filters are
misbehaving or they're incorrectly set during activation.
This patch adds code that can detect this mismatch by creating
VG UUID and LV UUID index while scanning devices for device cache.
The VG UUID index maps VG UUID to a device list. Each device in the
list has a device layered above as a holder which is an LVM LV device
and for which we know the VG UUID (and similarly for LV UUID index).
We can acquire VG and LV UUID by reading /sys/block/<dm_dev_name>/dm/uuid.
So these indices represent the actual state of PV device use in
the system by LVs and then we compare that to what LVM assumes
based on metadata.
For example:
[0] fedora/~ # lsblk /dev/sdq /dev/sdr /dev/sds /dev/sdt
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sdq 65:0 0 104M 0 disk
|-vg-lvol0 253:2 0 200M 0 lvm
`-mpath_dev1 253:3 0 104M 0 mpath
sdr 65:16 0 104M 0 disk
`-mpath_dev1 253:3 0 104M 0 mpath
sds 65:32 0 104M 0 disk
|-vg-lvol0 253:2 0 200M 0 lvm
`-mpath_dev2 253:4 0 104M 0 mpath
sdt 65:48 0 104M 0 disk
`-mpath_dev2 253:4 0 104M 0 mpath
In this case the vg-lvol0 is mapped onto sdq and sds becauset this is
what was available and seen during activation. Then later on, sdr and
sdt appeared and mpath devices were created out of sdq+sdr (mpath_dev1)
and sds+sdt (mpath_dev2). Now, LVM assumes (correctly) that mpath_dev1
and mpath_dev2 are the PVs that should be used, not the mpath
components (sdq/sdr, sds/sdt).
[0] fedora/~ # pvs
Found duplicate PV xSUix1GJ2SK82ACFuKzFLAQi8xMfFxnO: using /dev/mapper/mpath_dev1 not /dev/sdq
Using duplicate PV /dev/mapper/mpath_dev1 from subsystem DM, replacing /dev/sdq
Found duplicate PV MvHyMVabtSqr33AbkUrobq1LjP8oiTRm: using /dev/mapper/mpath_dev2 not /dev/sds
Using duplicate PV /dev/mapper/mpath_dev2 from subsystem DM, ignoring /dev/sds
WARNING: Device mismatch detected for vg/lvol0 which is accessing /dev/sdq, /dev/sds instead of /dev/mapper/mpath_dev1, /dev/mapper/mpath_dev2.
PV VG Fmt Attr PSize PFree
/dev/mapper/mpath_dev1 vg lvm2 a-- 100.00m 0
/dev/mapper/mpath_dev2 vg lvm2 a-- 100.00m 0
Commit b64703401d cause regression
when handling stacked resize of pool metadata volume that would
be a raid LV.
Fix it by properly setting up size also for layer extension.
There's a window between doing VG read and checking PV device size
against real device size. If the device is removed in this window,
the dev cache still holds struct device and pv->dev still references
that and that PV is not marked as missing. However, if we're trying
to get size for such device, the open fails because that device
doesn't exists anymore.
We called existing pv_dev_size in _check_pv_dev_sizes fn. But
pv_dev_size assigned a size of 0 if the dev_get_size it called failed
(because the device is gone).
So call the dev_get_size directly and check for the return code
in _check_pv_dev_sizes and go further only if we really know the
device size. This is to avoid confusing warning messages like:
Device /dev/sdd1 has size of 0 sectors which is smaller than corresponding PV size of 31455207 sectors. Was device resized?
One or more devices used as PVs in VG helter_skelter have changed sizes.
When a command modifies a PV or VG, or changes the
activation state of an LV, it will send a dbus
notification when the command is finished. This
can be enabled/disabled with a config setting.
Historical LV is valid as long as there is at least one live LV among
its ancestors. If we find any invalid (dangling) historical LVs, remove
them automatically.
The vg_strip_outdated_historical_lvs iterates over the list of historical LVs
we have and it shoots down the ones which are outdated.
Configuration hook to set the timeout will be in subsequent patch.