This bug showed up when trying to add a log to a mirror whose images are on
multiple devices. This is an intra-release regression and no WHATS_NEW
entry will be added. The error was introduced in the following commit:
2d8a2f35c7
The solution is to recognise in _alloc_init that if there are no mirrors
or stripes specified, then 'new_extents' should be zero.
to settle udev before calling deactivate_lv.
This is an intra-release regression (no WHATS_NEW entry required). It is
part of the fix for the current WHATS_NEW entry:
Work around resume_lv causing error LV scanning during splitmirror operation.
When the user wants to remove a thin pool, check whether any thin volumes are using it.
If so, ask before removal (or use -ff to skip the question) and remove them first.
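Here is a minimal C sketch of the check described above. The `struct lv` model, `count_thin_users()` and `removal_needs_prompt()` are hypothetical and much simpler than the real LVM2 structures. The sketch assumes only what the text says: a force count of two (-ff) skips the question. It does not show removing the thin volumes themselves.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical LV model: a thin volume points at the pool it uses. */
struct lv {
	const char *name;
	bool is_thin_pool;
	const struct lv *pool;	/* pool used by a thin volume, NULL otherwise */
};

/* Count the LVs in lvs[0..n) that use 'pool' as their thin pool. */
static size_t count_thin_users(const struct lv *pool, const struct lv *lvs, size_t n)
{
	size_t users = 0;

	for (size_t i = 0; i < n; i++)
		if (lvs[i].pool == pool)
			users++;
	return users;
}

/*
 * True if removing 'pool' needs confirmation: it still has thin users
 * and fewer than two -f options were given (force >= 2 means -ff).
 */
static bool removal_needs_prompt(const struct lv *pool, const struct lv *lvs,
				 size_t n, int force)
{
	assert(pool && pool->is_thin_pool);
	return count_thin_users(pool, lvs, n) > 0 && force < 2;
}
```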
When an LV is unlinked, we want vg_validate to catch the problem
that the LV has changed,
i.e. that the LV has been removed and is no longer a thin_pool while still
being referenced by some thin volume.
Revert John's patch, which fixed only one place where ~LVM_WRITE was in use, and
convert the omitted LVM_READ/WRITE flags to 64-bit constants as well.
(The 'status' flags for both LV and VG are 64-bit.)
Changing lv_mirror_count to only count the AREA_LVs made the function
stop working for PVMOVE mirrors. A conditional has been added to fix
that problem. Additionally, when counting the images in a mirror stack,
we don't need to subtract 1 from the count we get back from the
lv_mirror_count call on the temporary mirror layer. (This is because we
are no longer falsely counting the top layer of the temporary mirror.)
lv_mirror_count was not able to handle mirrors of stripes properly. When a
failed device is removed, the MIRRORED status flag is removed from the LV
conditionally based on the results of lv_mirror_count. However, lv_mirror_count
trusted the MIRRORED flag - thinking any such LV must be mirrored. It would
happily assign first_seg(lv)->area_count as the number of mirrors, but when
a mirrored striped LV was reduced to a simple striped LV, area_count would be
the number of /stripes/ not the number of /mirrors/. A result higher than 1
would be returned from lv_mirror_count, the MIRRORED flag would not be cleared,
and the LV would fail to be up-converted properly in lvconvert_mirrors_aux
because of it.
The operation of deactivating the residual error target LV after removing a
mirror layer can cause a "device in-use" conflict with udev. Giving udev a
poke before calling deactivate_lv eliminates the conflict. The stick used
to poke udev is 'sync_local_dev_names'.
The kernel requires a mirror to be at least 1 region large. So,
if our mirror log is itself a mirror, it must be at least
1 region large. This restriction may not be necessary for
non-mirrored logs, but we apply the rule anyway.
(The other option is to make the region size of the log
mirror smaller than the mirror it is acting as a log for,
but that really complicates things. It's much easier to
keep the region_size the same for both.)
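This short sketch shows the rule. It assumes sizes are in 512-byte sectors and that the log uses the same region_size as the mirror it serves. `min_log_size_sectors()` and `sectors_to_extents()` are hypothetical helpers, not LVM2 functions.

```c
#include <assert.h>
#include <stdint.h>

/* Never let the log be smaller than one region (sizes in sectors). */
static uint64_t min_log_size_sectors(uint64_t log_size, uint32_t region_size)
{
	assert(region_size > 0);
	return log_size < region_size ? region_size : log_size;
}

/* Round a size in sectors up to whole extents of pe_size sectors. */
static uint32_t sectors_to_extents(uint64_t sectors, uint32_t pe_size)
{
	assert(pe_size > 0);
	return (uint32_t)((sectors + pe_size - 1) / pe_size);
}
```

For example, take a 512 KiB region (1024 sectors) and 256 KiB extents (512 sectors). A log that would otherwise need only 8 sectors is raised to one region, which is 2 extents.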
WHATS_NEW entry:
Fix log size calculation when only a log is being added to a mirror.
The original fix passed the mirror LV to allocate_extents (rather than
passing NULL) so that _alloc_init could correctly determine the necessary
size of the mirror log. In the previous check-in, I noted:
In order to get a decent value computed, we need to pass in the 'lv' argument
to allocate_extents. This would normally imply a desire for cling/contiguous
allocation to the given LV, but since we are not allocating any parallel
extents and only log extents, it works fine.
However, passing in the LV did have unintended consequences on the placement of
the log. The better solution is to pass in the number of extents that are in
the mirror LV instead of the LV itself. This will not cause the allocator to
reserve that number of extents, because 'stripes' and 'mirrors' are specified
as 0. Thus, 'extents' is used to calculate the size of the log, but won't
affect how much is allocated.
LVM_WRITE is a 32-bit flag. Now that RAID[_IMAGE|_META] are 64-bit,
and'ing a RAID LV's status against LVM_WRITE can reset the higher order
flags.
A similar problem will affect the thinp flags if we are not careful.
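The following C sketch shows the effect. It assumes the flag is defined as an unsigned 32-bit constant (with a `U` suffix) and that `unsigned int` is 32 bits wide. The macro names and bit values are illustrative, not the real LVM2 definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative bit values only; not the real LVM2 definitions. */
#define LVM_WRITE_32  0x00000200U			/* 32-bit unsigned constant */
#define LVM_WRITE_64  UINT64_C(0x0000000000000200)	/* same bit, 64-bit constant */
#define RAID_IMAGE_64 UINT64_C(0x0000000100000000)	/* a flag above bit 31 */

/*
 * ~0x200U is 0xFFFFFDFF; converted to uint64_t it becomes
 * 0x00000000FFFFFDFF, so every flag above bit 31 is cleared too.
 */
static uint64_t clear_write_32(uint64_t status)
{
	return status & ~LVM_WRITE_32;
}

/* With a 64-bit constant only the write bit is cleared. */
static uint64_t clear_write_64(uint64_t status)
{
	return status & ~LVM_WRITE_64;
}
```

The accident depends on the constant being unsigned. With a plain signed `int` constant, `~` produces a negative value that sign-extends to all ones in the upper half. Defining the flags as 64-bit constants removes the question entirely.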
_alloc_init calculates the number of necessary log extents via
'mirror_log_extents'. 'mirror_log_extents' takes 3 arguments: region_size,
pe_size, and size of the mirror LV. Unfortunately, _alloc_init is guessing at
the mirror size by using 'ah->new_extents / ah->area_multiple' - the number of
extents that the mirror images have. However, this is /always/ wrong when
allocating the log separately. Further, the log is always allocated separately
unless we are up-converting the mirror at the same time. It was by luck alone
that a default value of '1' reflects what we want in most cases.
In order to get a decent value computed, we need to pass in the 'lv' argument
to allocate_extents. This would normally imply a desire for cling/contiguous
allocation to the given LV, but since we are not allocating any parallel
extents and only log extents, it works fine.
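To see why a default of '1' is right only by luck, here is a hypothetical estimate of the log size. It counts one bit per region, rounds up to 32-bit words, adds an assumed one-sector header, and converts the result to extents. `log_extents_estimate()` and the header size are assumptions for illustration. This is not the real `mirror_log_extents` implementation.

```c
#include <assert.h>
#include <stdint.h>

#define SECTOR_BYTES       512u	/* bytes per sector */
#define LOG_HEADER_SECTORS 1u	/* assumed header size, for illustration */

/*
 * Hypothetical estimate of the extents needed by a disk log for a mirror
 * of 'mirror_extents' extents.  region_size and pe_size are in sectors.
 */
static uint32_t log_extents_estimate(uint32_t region_size, uint32_t pe_size,
				     uint32_t mirror_extents)
{
	assert(region_size > 0 && pe_size > 0);
	uint64_t mirror_sectors = (uint64_t)mirror_extents * pe_size;
	uint64_t regions = (mirror_sectors + region_size - 1) / region_size;
	uint64_t bitmap_bytes = (regions + 31) / 32 * 4;	/* whole 32-bit words */
	uint64_t log_sectors = (bitmap_bytes + SECTOR_BYTES - 1) / SECTOR_BYTES
			       + LOG_HEADER_SECTORS;

	return (uint32_t)((log_sectors + pe_size - 1) / pe_size);
}
```

Consider tiny 4 KiB extents (8 sectors) and a 4 KiB region. A 1-extent guess yields a 1-extent log, but a 100000-extent mirror needs 4. The guessed mirror size therefore matters as soon as the bitmap outgrows one extent.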
When an image is split from a 2-way mirror, the original mirror is converted to
a linear device. To do this, the top "layer" must be removed. The segments
are transferred from the sub-lv to the top-level LV and the link is severed.
The former sub-lv - having its segments transferred - now contains a temporary
error target.
When the original LV is resumed, the old sub-lv that now contains an error
segment is activated and scanned. This is what causes the I/O error messages.
There are three ways to fix this problem:
1) Do not set the sub-lv which contains the error target as "visible" before
suspending the original LV. This way, when the original is resumed, the sub-lv
device node is not created and it is not scanned - avoiding the error messages.
The problem with this approach is that if the machine crashes after the
resume, it leaves the *hidden* LV in place and the user has a more difficult
time noticing that it needs to be cleaned up. Thus, this type of processing is
frowned upon.
2) Do like _remove_mirror_images does and suspend the original, then suspend
the sub-lv (the error target), then resume the sub-lv, and finally resume the
original LV. These seem like extra, pointless operations to me, but they do not
produce the error message (although I'm not sure why) and they allow us to
leave the visible flag in place.
3) Flag the sub-lv (error target) with a "do not scan" flag. This seems like
the cleanest approach, but I have been unable to find the method for doing
this. LVs get tagged in such a way by _get_udev_flags, but in this case the
resume of the original LV also resumes the error target LV without running it
through _get_udev_flags (likely because they are no longer linked). Could
there be something wrong in resume_lv?
Option #2 was chosen to fix this bug, but it seems like more of a workaround
for now.
A gentle reminder that anyone relying on the output of reporting commands
like lvs in scripts must use -o to guarantee they get the fields they expect.
The default sequence of fields can change from release to release.
Equally, the 'attr' fields can have new values introduced and/or characters
appended to them.
Makes dumpconfig whole-section output wrong in a different way from before,
but we should be able to merge cft_cmdline properly into cmd->cft now and
remove cascade.
There was a bad sequence:
*) Make changes to LV layout to split images (e.g. 4-way -> 2-way/2-way)
1) vg_write, suspend_lv(original_mirror), vg_commit
2) activate_lv(newly_split_lv)
3) resume_lv(original_mirror)
Step #2 is not allowed. However, without it, the resume of the original
mirror will also resume its former sub-LVs - making it impossible to
activate the newly split LV due to the changes in layering, pointers, and
names that had already been made. Additionally, the resume of the original
brings the sub-LVs online with names that differ from the metadata on disk -
also a no-no. Thus, the split must be done in stages such that the active LVs
always reflect what is in the committed LVM metadata.
First, alter the original mirror by releasing the images. The images are made
visible and independent as an intermediate stage. (This way, we can have
consistency between LVM metadata and active LVs.) The second stage collects
the recently split LVs, deactivates them, forms them into a mirror if necessary,
and then activates them. It is a bit of a circuitous method, but it is the only
way to split a mirror from a mirror and obey these general rules:
1) Never [de]activate sub-lvs when the top-level LV is suspended
2) Avoid having active LVs that differ from the description in the LVM metadata
Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
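Rule 1 can be expressed as a simple guard. This is a hypothetical model (`struct lv_state`, `may_change_activation()`, `set_active()`), not LVM2 code. It rejects step #2 of the bad sequence while the original mirror is suspended. It allows activation once the split image has been released and made independent.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified model of an LV's activation state. */
struct lv_state {
	bool active;
	bool suspended;
	const struct lv_state *top;	/* top-level LV, or NULL if this is top-level */
};

/* Rule 1: never [de]activate a sub-LV while its top-level LV is suspended. */
static bool may_change_activation(const struct lv_state *lv)
{
	assert(lv);
	return !(lv->top && lv->top->suspended);
}

/* (De)activate the LV; returns false and changes nothing if forbidden. */
static bool set_active(struct lv_state *lv, bool active)
{
	if (!may_change_activation(lv))
		return false;
	lv->active = active;
	return true;
}
```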
functionality. A number of bugs (copied and pasted all over the code) should
disappear:
- most string lookup based on dm_config_find_node would segfault when
encountering a non-zero integer (the intention there was to print an
error message instead)
- check for required sections in metadata would have been satisfied by
values as well (i.e. not sections)
- encountering a section in place of expected flag value would have
segfaulted (due to assumed but unchecked cn->v != NULL)
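This sketch shows the kind of checks these bugs were missing. It verifies that a node carries a value before dereferencing it, and that the value has the expected type before reading it as a string. The `cfg_*` types and functions below are hypothetical simplifications; the real libdevmapper `dm_config_*` structures and helpers differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical, simplified config tree; not the libdevmapper definitions. */
enum cfg_type { CFG_INT, CFG_STRING };

struct cfg_value {
	enum cfg_type type;
	union { int64_t i; const char *str; } v;
};

struct cfg_node {
	const char *key;
	struct cfg_value *v;		/* NULL when the node is a section */
	struct cfg_node *child;		/* first child when the node is a section */
};

/*
 * Return the string value of 'cn', or 'fail' if the node is missing, is a
 * section, or holds a non-string value.  An error is printed instead of
 * dereferencing a NULL or mistyped value.
 */
static const char *node_str(const struct cfg_node *cn, const char *fail)
{
	if (!cn)
		return fail;
	if (!cn->v) {
		fprintf(stderr, "%s: expected a value, found a section\n", cn->key);
		return fail;
	}
	if (cn->v->type != CFG_STRING) {
		fprintf(stderr, "%s: expected a string value\n", cn->key);
		return fail;
	}
	return cn->v->v.str;
}

/* A required section is satisfied only by a node without a value. */
static int node_is_section(const struct cfg_node *cn)
{
	return cn && !cn->v;
}
```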
leaving behind the LVM-specific parts of the code (convenience wrappers that
handle `struct device` and `struct cmd_context`, basically). A number of
functions have been renamed (in addition to getting a dm_ prefix) -- namely,
all of the config interface now has a dm_config_ prefix.
Memory usage is very high when calling _pv_analyse_mda_raw (e.g. while
executing pvck), which can end with "out of memory".
_pv_analyse_mda_raw scans for metadata in the MDA, iteratively increasing the
size to scan by SECTOR_SIZE until we find a probable config section or we're
at the edge of the metadata area. However, when using a memory pool, each
iteration also requests a bigger mempool chunk that cannot be found among the
existing ones, so a new one is always allocated, consuming more and more memory.
This patch just changes the mempool to direct memory allocation in this
problematic part of the code.
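This sketch shows the scan loop with direct allocation. It assumes an in-memory copy of the metadata area whose size is a multiple of SECTOR_SIZE. It also uses a hypothetical probe that treats a closing brace as a "probable config section". `scan_area()` and `looks_like_config()` are illustrative names. Each window is freed before the next, larger one is allocated, so memory use stays bounded by the current window size.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define SECTOR_SIZE 512u

/* Hypothetical probe; the real test for a config section differs. */
static int looks_like_config(const char *buf, size_t len)
{
	return memchr(buf, '}', len) != NULL;
}

/*
 * Grow the scan window by SECTOR_SIZE until the probe succeeds or the end
 * of the area is reached.  Returns the matching window size, or 0.
 */
static size_t scan_area(const char *area, size_t area_size)
{
	for (size_t size = SECTOR_SIZE; size <= area_size; size += SECTOR_SIZE) {
		char *buf = malloc(size);	/* direct allocation, no mempool */
		if (!buf)
			return 0;
		memcpy(buf, area, size);	/* stands in for reading from disk */
		int found = looks_like_config(buf, size);
		free(buf);			/* released before the next window */
		if (found)
			return size;
	}
	return 0;
}
```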
This patch adds the ability to upconvert a raid1 array - say from 2-way to
3-way. It does not yet support upconverting linear to n-way.
The 'raid' device-mapper target allows for individual components (images) of
an array to be specified for rebuild. This mechanism is used when adding
new images to the array so that the new images can be resync'ed while the
rest of the images in the array can remain 'in-sync'. (There is no
mirror-on-mirror layering required.)
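For reference, the dm-raid table accepts optional `rebuild <idx>` parameters after the mandatory chunk size. `<#raid_params>` counts every word, so each `rebuild <idx>` pair adds two. The hypothetical helper below formats that part of a table line for the images being added (indexes `old_images` through `total_images - 1`). The chunk size value is a placeholder, and this is not necessarily how LVM2 builds its tables.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: write "<#raid_params> <chunk_size> [rebuild <idx>]..."
 * for images old_images .. total_images - 1.
 * Returns the string length, or -1 on error or truncation.
 */
static int raid_params_with_rebuild(char *out, size_t len, unsigned chunk_size,
				    unsigned old_images, unsigned total_images)
{
	assert(old_images <= total_images);
	unsigned nparams = 1 + 2 * (total_images - old_images);
	int n = snprintf(out, len, "%u %u", nparams, chunk_size);

	for (unsigned idx = old_images; idx < total_images; idx++) {
		if (n < 0 || (size_t)n >= len)
			return -1;
		int m = snprintf(out + n, len - (size_t)n, " rebuild %u", idx);
		if (m < 0)
			return -1;
		n += m;
	}
	return (n < 0 || (size_t)n >= len) ? -1 : n;
}
```

For a 2-way to 3-way upconvert, the new image has index 2, which gives `3 0 rebuild 2`.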
~> lvconvert --splitmirrors 1 --trackchanges vg/lv
The '--trackchanges' option gives a user the ability to use an image of
a RAID1 array for the purposes of temporary read-only access. The image
can be merged back into the array at a later time and only the blocks that
have changed in the array since the split will be resync'ed. This
operation can be thought of as a partial split. The image is never completely
extracted from the array, in that the array reserves the position the device
occupied and tracks the differences between the array and the split image via
a bitmap. The image itself is rendered read-only and the name (<LV>_rimage_*)
cannot be changed. The user can complete the split (permanently splitting the
image from the array) by re-issuing the 'lvconvert' command without the
'--trackchanges' argument and specifying the '--name' argument.
~> lvconvert --splitmirrors 1 --name my_split vg/lv
Merging the tracked image back into the array is done with the '--merge'
option (included in a follow-on patch).
~> lvconvert --merge vg/lv_rimage_<n>
The internal mechanics of this are relatively simple. The 'raid' device-
mapper target allows for the specification of an empty slot in an array
via '- -'. This is what will be used if a partial activation of an array
is ever required. (It would also be possible to use 'error' targets in
place of the '- -'.) If a RAID image is found to be both read-only and
visible, then it is considered separate from the array and '- -' is used
to hold its position in the array. So, all that needs to be done to
temporarily split an image from the array /and/ cause the kernel target's
bitmap to track (aka "mark") changes made is to make the specified image
visible and read-only. To merge the device back into the array, the image
needs to be returned to the read/write state of the top-level LV and made
invisible.
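This sketch shows how the device-list part of a table line could be generated under this rule. `struct raid_image` and `raid_dev_list()` are hypothetical, and the device numbers in the test are made up. An image that is both read-only and visible is emitted as `- -`; every other image is emitted as its metadata and data devices.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, simplified RAID image description. */
struct raid_image {
	const char *meta_dev;	/* e.g. "253:2" */
	const char *data_dev;	/* e.g. "253:3" */
	bool read_only;
	bool visible;
};

/*
 * Write "<#raid_devs> <meta0> <data0> ..." into 'out'.  A read-only,
 * visible image is split off, so "- -" holds its slot in the array.
 * Returns the string length, or -1 on error or truncation.
 */
static int raid_dev_list(char *out, size_t len, const struct raid_image *img,
			 unsigned count)
{
	int n = snprintf(out, len, "%u", count);

	for (unsigned i = 0; i < count; i++) {
		if (n < 0 || (size_t)n >= len)
			return -1;
		bool split = img[i].read_only && img[i].visible;
		int m = snprintf(out + n, len - (size_t)n, " %s %s",
				 split ? "-" : img[i].meta_dev,
				 split ? "-" : img[i].data_dev);
		if (m < 0)
			return -1;
		n += m;
	}
	return (n < 0 || (size_t)n >= len) ? -1 : n;
}
```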
Users already have the ability to split an image from an LV of "mirror"
segtype. This patch extends that ability to LVs of "raid1" segtype.
This patch only allows a single image to be split off, however. (The
"mirror" segtype allows an arbitrary number of images to be split off.
e.g. 4-way => 3-way/linear, 2-way/2-way, linear/3-way)
of top-level LV.
We can't activate sub-lv's that are being removed from a RAID1 LV while it
is suspended. However, this is what was being used to have them show up
so we could remove them. 'sync_local_dev_names' is a sufficient and
proper replacement and can be done after the top-level LV is resumed.
1) add new function 'raid_remove_top_layer' which will be useful
to other conversion functions later (also cleans up code)
2) Add error messages if raid_[extract|add]_images fails
3) Add function prototypes to prevent compiler warnings when
compiling with '--with-raid=shared'
Fix a couple more issues that kabi found.
- Add some error messages in failure cases
- s/malloc/zalloc/
- use vg->vgmem for lv names instead of vg->cmd->mem
Add config option to enable crc checking of VG structures.
Currently it's disabled by default.
For the internal test suite, this check is enabled.
Note: If the internal error is detected, a debug build with the
compile option DEBUG_ENFORCE_POOL_LOCKING helps to catch the source
of the problem.
Use the debug pool locking functionality so the command can check
whether the memory in the pool has been modified.
For lv_postoder(), instead of unlocking and locking for every changed
struct status member, do it once when entering and leaving the function
(mprotect would trap each such memory access).
Currently lv_postoder() does not modify any part of the vg structure
other than the status flags of each LV, and those flags are reverted
to their original state on function exit.
Extend the vginfo cache with a cached VG structure, so if the same metadata
are in use, mda decoding is skipped.
This helps operations like activation of all LVs in one VG,
where the same data were decoded repeatedly, giving the same result.
Patch adds 1-to-1 connection between volume_group and lvmcache_vginfo.
Move free_vg() to vg.c, replace free_vg with release_vg,
and make _free_vg internal.
The patch is needed for sharing the VG in the vginfo cache, so the release_vg
function name is a better fit here.
As this flag could not have been set by the current code, remove it.
Note: because of the wrong code logic, this call:
lvmcache_update_vg(correct_vg, correct_vg->status & PRECOMMITTED &
(inconsistent ? INCONSISTENT_VG : 0));
always passed '0'. Now, after the flag removal, it passes the
PRECOMMITTED flag in, which is a functional change in this patch.
To match the original functionality, 0 would always have to be passed.
More testing is needed here.
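The old call always passed '0' because `PRECOMMITTED` and `INCONSISTENT_VG` are different bits. ANDing both masks into the same expression can therefore never leave a bit set. The flag values below are illustrative, not the real LVM2 constants, and `old_flags()`/`new_flags()` are hypothetical wrappers around the two expressions.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative, distinct flag bits; not the real LVM2 values. */
#define PRECOMMITTED    UINT64_C(0x1)
#define INCONSISTENT_VG UINT64_C(0x2)

/* The old argument: two different single-bit masks ANDed together. */
static uint64_t old_flags(uint64_t status, int inconsistent)
{
	return status & PRECOMMITTED & (inconsistent ? INCONSISTENT_VG : 0);
}

/* After removing INCONSISTENT_VG, the argument reduces to this. */
static uint64_t new_flags(uint64_t status)
{
	return status & PRECOMMITTED;
}
```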
The compiler complained that meta_lv could be used uninitialized. (Not true
because it is protected by 'clear_metadata'.) I switched to using 'lv->vg',
as it makes no difference to vg_[write|commit].
(here clvmd crashed in the middle of operation),
lock is not removed from cache - here is one example:
locking/cluster_locking.c:497 Locking VG V_vg_test UN (VG) (0x6)
locking/cluster_locking.c:113 Error writing data to clvmd: Broken pipe
locking/locking.c:399 <backtrace>
locking/locking.c:461 <backtrace>
Internal error: Volume Group vg_test was not unlocked
Code should always remove lock info from lvmcache and update counters
on unlock, even if unlock fails.
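This minimal sketch of the rule uses a hypothetical `struct lock_cache` to stand in for the lvmcache lock bookkeeping. The cache entry and counter are updated after the backend unlock attempt regardless of its result. Only the return value reports the failure.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the lock bookkeeping kept in lvmcache. */
struct lock_cache {
	bool vg_locked;		/* lock info recorded for one VG */
	int locked_count;	/* number of VGs recorded as locked */
};

/*
 * Called after the locking backend's unlock attempt ('backend_ok' is its
 * result).  The cache is cleaned up either way, so a failed unlock (e.g.
 * clvmd went away) cannot leave a stale lock behind.
 */
static bool finish_unlock(struct lock_cache *c, bool backend_ok)
{
	assert(c);
	if (c->vg_locked) {
		c->vg_locked = false;
		c->locked_count--;
	}
	return backend_ok;
}
```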
Today, we use the "suppress_messages" flag (set internally in the init_locking fn based
on 'ignorelockingfailure() && getenv("LVM_SUPPRESS_LOCKING_FAILURE_MESSAGES")').
This way, we can suppress high level messages like "File-based locking
initialisation failed" or "Internal cluster locking initialisation failed".
However, each locking type has its own sequence of initialization steps and these
could log some errors as well. It's quite misleading for the user to see such
errors and warnings if "--sysinit" is used (and, with it, ignorelockingfailure
and the LVM_SUPPRESS_LOCKING_FAILURE_MESSAGES environment variable). Errors and
warnings from these intermediary steps should be suppressed as well if requested.
This patch propagates the "suppress_messages" flag deeper into locking init
functions. I've also added these flags for other locking types for consistency,
though the flag is not actually used by no_locking and readonly_locking.
The last usage was removed in Petr's commit related to the VG mda repair fix,
where the relaxed check started to ignore inconsistencies coming from
PVs that are marked MISSING. Thus, remove the unused variable.
Implementation described in doc/lvm2-raid.txt.
Basic support includes:
- ability to create RAID 1/4/5/6 arrays
- ability to delete RAID arrays
- ability to display RAID arrays
Notable missing features (not included in this patch):
- ability to clean-up/repair failures
- ability to convert RAID segment types
- ability to monitor RAID segment types
This should be set by default! Normally we have "activation/udev_sync = 1"
in lvm.conf (example.conf.in). But if we use lvm2 without any config file
(or without a definition within '--config' option) the DEFAULT_UDEV_SYNC
is used instead. Together with verify_udev_operations=0 (when we rely on
udev fully), this can cause races as the node could be missing when needed.
(See also https://bugzilla.redhat.com/show_bug.cgi?id=723144)
Clvmd detects a modified config file before it takes lv_lock.
If the config file was changed rapidly, the change was ignored when it
happened within the same second. This patch also compares the file size.
A change such as flipping some flag from 0 to 1 would still pass unnoticed,
but this is a quick fix for the failing test suite.
FIXME: Implement inotify solution.
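This sketch shows the comparison using the POSIX `struct stat` fields `st_mtime` and `st_size`. `struct cfg_stamp` and `config_changed()` are hypothetical names. As the text notes, a same-size change within the same second is still missed.

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/stat.h>
#include <sys/types.h>

/* What was recorded the last time the config file was read. */
struct cfg_stamp {
	time_t mtime;
	off_t size;
};

/*
 * Return true if the file looks changed.  st_mtime has one-second
 * granularity, so a rewrite within the same second is caught only if the
 * size also differs.
 */
static bool config_changed(const struct stat *st, const struct cfg_stamp *seen)
{
	assert(st && seen);
	return st->st_mtime != seen->mtime || st->st_size != seen->size;
}
```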