IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
This patch adds the ability to upconvert a raid1 array - say from 2-way to
3-way. It does not yet support upconverting linear to n-way.
The 'raid' device-mapper target allows for individual components (images) of
an array to be specified for rebuild. This mechanism is used when adding
new images to the array so that the new images can be resync'ed while the
rest of the images in the array can remain 'in-sync'. (There is no
mirror-on-mirror layering required.)
~> lvconvert --splitmirrors 1 --trackchanges vg/lv
The '--trackchanges' option allows a user the ability to use an image of
a RAID1 array for the purposes of temporary read-only access. The image
can be merged back into the array at a later time and only the blocks that
have changed in the array since the split will be resync'ed. This
operation can be thought of as a partial split. The image is never completely
extracted from the array, in that the array reserves the position the device
occupied and tracks the differences between the array and the split image via
a bitmap. The image itself is rendered read-only and the name (<LV>_rimage_*)
cannot be changed. The user can complete the split (permanently splitting the
image from the array) by re-issuing the 'lvconvert' command without the
'--trackchanges' argument and specifying the '--name' argument.
~> lvconvert --splitmirrors 1 --name my_split vg/lv
Merging the tracked image back into the array is done with the '--merge'
option (included in a follow-on patch).
~> lvconvert --merge vg/lv_rimage_<n>
The internal mechanics of this are relatively simple. The 'raid' device-
mapper target allows for the specification of an empty slot in an array
via '- -'. This is what will be used if a partial activation of an array
is ever required. (It would also be possible to use 'error' targets in
place of the '- -'.) If a RAID image is found to be both read-only and
visible, then it is considered separate from the array and '- -' is used
to hold it's position in the array. So, all that needs to be done to
temporarily split an image from the array /and/ cause the kernel target's
bitmap to track (aka "mark") changes made is to make the specified image
visible and read-only. To merge the device back into the array, the image
needs to be returned to the read/write state of the top-level LV and made
invisible.
Users already have the ability to split an image from an LV of "mirror"
segtype. This patch extends that ability to LVs of "raid1" segtype.
This patch only allows a single image to be split off, however. (The
"mirror" segtype allows an arbitrary number of images to be split off.
e.g. 4-way => 3-way/linear, 2-way/2-way, linear,3-way)
of top-level LV.
We can't activate sub-lv's that are being removed from a RAID1 LV while it
is suspended. However, this is what was being used to have them show-up
so we could remove them. 'sync_local_dev_names' is a sufficient and
proper replacement and can be done after the top-level LV is resumed.
1) add new function 'raid_remove_top_layer' which will be useful
to other conversion functions later (also cleans up code)
2) Add error messages if raid_[extract|add]_images fails
3) Add function prototypes to prevent compiler warnings when
compiling with '--with-raid=shared'
Fix a couple more issues that kabi found.
- Add some error messages in failure cases
- s/malloc/zalloc/
- use vg->vgmem for lv names instead of vg->cmd->mem
Add config option to enable crc checking of VG structures.
Currently it's disabled by default.
For the internal test-suite this check it is enabled.
Note: In the case the internal error is detected, debug build with
compile option DEBUG_ENFORCE_POOL_LOCKING helps to catch the source
of the problem.
Use debug pool locking functionality. So the command could check,
whether the memory in the pool has not been modified.
For lv_postoder() instead of unlocking and locking for every changed
struct status member do it once when entering and leaving function.
(mprotect would trap each such memory access).
Currently lv_postoder() does not modify other part of vg structure
then status flags of each LV with flags that are reverted back to
its original state after function exit.
Extend vginfo cache with cached VG structure. So if the same metadata
are use, skip mda decoding in the case, the same data are in use.
This helps for operations like activation of all LVs in one VG,
where same data were decoded giving the same output result.
Patch adds 1-to-1 connection between volume_group and lvmcache_vginfo.
Move the free_vg() to vg.c and replace free_vg with release_vg
and make the _free_vg internal.
Patch is needed for sharing VG in vginfo cache so the release_vg function name
is a better fit here.
As this flag could not have been set by the current code - removing it.
Note: because of the wrong code logic this call:
lvmcache_update_vg(correct_vg, correct_vg->status & PRECOMMITTED &
(inconsistent ? INCONSISTENT_VG : 0));
had always passed '0' - now after flag removal it's passing
PRECOMMITTED flag in - this present functinal change in this patch.
To match the original functionality - 0 had to be always passed.
More testing is needed here.
Compiler complaining that meta_lv could be used uninitialized. (Not true
because it is protected by 'clear_metadata'.) I switched to using 'lv->vg',
as it makes no difference to vg_[write|commit].
(here clvmd crashed in the middle of operation),
lock is not removed from cache - here is one example:
locking/cluster_locking.c:497 Locking VG V_vg_test UN (VG) (0x6)
locking/cluster_locking.c:113 Error writing data to clvmd: Broken pipe
locking/locking.c:399 <backtrace>
locking/locking.c:461 <backtrace>
Internal error: Volume Group vg_test was not unlocked
Code should always remove lock info from lvmcache and update counters
on unlock, even if unlock fails.
Today, we use "suppress_messages" flag (set internally in init_locking fn based
on 'ignorelockingfailure() && getenv("LVM_SUPPRESS_LOCKING_FAILURE_MESSAGES")'.
This way, we can suppress high level messages like "File-based locking
initialisation failed" or "Internal cluster locking initialisation failed".
However, each locking has its own sequence of initialization steps and these
could log some errors as well. It's quite misleading for the user to see such
errors and warnings if the "--sysinit" is used (and so the ignorelockingfailure
&& LVM_SUPPRESS_LOCKING_FAILURE_MESSAGES environment variable). Errors and
warnings from these intermediary steps should be suppressed as well if requested.
This patch propagates the "suppress_messages" flag deeper into locking init
functions. I've also added these flags for other locking types for consistency,
though it's not actually used for no_locking and readonly_locking.
Last usage was removed in Petr's commit related to VG mda repair fix
where relaxed check starts to ignore inconsistencies coming from
PVs that are marked MISSING - thus removing unused variable.
Implementation described in doc/lvm2-raid.txt.
Basic support includes:
- ability to create RAID 1/4/5/6 arrays
- ability to delete RAID arrays
- ability to display RAID arrays
Notable missing features (not included in this patch):
- ability to clean-up/repair failures
- ability to convert RAID segment types
- ability to monitor RAID segment types
This should be set by default! Normally we have "activation/udev_sync = 1"
in lvm.conf (example.conf.in). But if we use lvm2 without any config file
(or without a definition within '--config' option) the DEFAULT_UDEV_SYNC
is used instead. Together with verify_udev_operations=0 (when we rely on
udev fully), this can cause races as the node could be missing when needed.
(See also https://bugzilla.redhat.com/show_bug.cgi?id=723144)
Clvmd detects modifed config file before it takes lv_lock.
If the config file is changed rapidly - the change was ignored within
a seocnd ranged. This patch adds also compare of file size.
So change like some flag for 0 to 1 would pass unnoticed - but
it's quick fix for failing test suite.
FIXME: Implement inotify solution.
If MD linear device has set rounding (overload chunk size attribute),
the pvcreate command prints this warning:
/dev/md0 sysfs attr level not in expected format: linear
teardown sequence. (Previously the snapshot was deactivated while its
origin was active and before its removal was committed to disk, so
restarting after a crash at the point would leave corruption.)
When an LVM mirror is up-converted (an additional image added), it creates
a temporary mirror stack. The lower-level mirror in the stack that is
created was not being activated exclusively - violating the exclusive nature
of the original mirror. We now check for exclusive activation of a mirror
before converting it, and if found, we ensure that the temporary mirror
is also exclusively activated.
Mirrors used to be created by first creating a linear device and then adding
the other images plus the log. Now mirrors are created by creating all the
images in one go and then adding the log separately. The new way ran into
the condition that cluster mirrors cannot change the log type (in the case
of creation, from core -> disk) while the mirror is not active. (It isn't
active because it is in the process of being created.) The reason this
condition is in place is because a remote node may have the mirror active, and
we don't want to alter the log underneath it.
What we really needed was a way of checking if the mirror was active remotely
but not locally, and in that case do not allow a change of the log. I've added
this check, and cluster mirrors can now be created again.
We've used udev fallback code till now to check whether udev
created/removed the entries in /dev correctly and if not,
a repair was done (giving a warning messagea about that).
This patch adds a possibility to enable this additional check
and subsequent fallback only when required (debugging purposes
mostly) and trust udev completely.
So let's disable the fallback code by default and add a new
configuration option "activation/udev_fallback".
(The original code for creating the nodes will still be used
in case the device directory that is set in lvm.conf differs
from the one that udev uses and also when activation/udev_rules
is set to 0 - otherwise we would end up with no nodes/symlinks
at all)
It's useful to keep the partial flag cached - so just move the call
for vg_mark_partil_lvs() into import_vg_from_config_tree() so it gets
evaluated before it goes through the lvmcache.
This patch should not present any functional change.
Note: It is rather temporal solution - proper place is probably inside the
'read' call back - but needs some more discussion.
For now using this minor hack.
As the ACTIVATE_EXCL could be set only in clvmd code - there is no
use for this test in lv_add_mirrors() function only called from
tools context.
FIXME: Add cluster test case for this.
To avoid modification of 'read-only' volume group structure
add a new structure to pass local data around the code for LV
activation.
As origin_only is one such flag - replace this parameter with new
struct lv_activate_opts.
More parameters might eventually become part of lv_activate_opts.
transient error), stemming from the following sequence of events:
1) devices fail IO, triggering repair
2) dmeventd starts fixing up the mirror
3) during the downconversion, a new metadata version is written
--> the devices come back online here
4) the mirror device suspend/resume is called to update DM tables
5) during the suspend/resume cycle, *pre*-commit metadata is read;
however, since the failed devices are now back online, we get back
inconsistent set of precommit metadata and the whole operation fails
The patch relaxes the check that fails in step 5 above, namely by ignoring
inconsistencies coming from PVs that are marked MISSING.
and use this for the LVM critical section logic. Also report an error if
code tries to load a table while any device is known to be in the
suspended state.
(If the variety of problems these changes are showing up can't be fixed
before the next release, the error messages can be reduced to debug
level.)
are affected by the move. (Currently it's possible for I/O to become
trapped between suspended devices amongst other problems.
The current fix was selected so as to minimise the testing surface. I
hope eventually to replace it with a cleaner one that extends the
deptree code.
Some lvconvert scenarios still suffer from related problems.
Currently some operation with striped mirrors lead
to corrupted metadata, this patch just add detection of such
situation.
Example:
# lvcreate -i2 -l10 -n lvs vg_test
# lvconvert -m1 vg_test/lvs
# lvreduce -f -l1 vg_test/lvs
Reducing logical volume lvs to 4.00 MiB
Segment extent reduction 9not divisible by #stripes 2
Logical volume lvs successfully resized
# lvremove vg_test/lvs
Segment extent reduction 1not divisible by #stripes 2
LV segment lvs:0-4294967295 is incorrectly listed as being used by LV lvs_mimage_0
Internal error: LV segments corrupted in lvs_mimage_0.
There's a possibility someone will use the '/' in the hostname. Since we
generate a temporary file name (path) including the hostname, any '/' would
be ambiguous.
We can always set such hostname using 'sethostname' from unistd.h. But the
'hostname' command already includes the check and removes the '/' char.
However, some old versions still allow that.
See: https://bugzilla.redhat.com/show_bug.cgi?id=711445.
Since this is only a temporary name and the possibility of this error is
quite negligible, we don't need any complex escape sequence here, just a
simple char replace.
Before, we used vg_write_lock_held call to determnine the way a device is
opened. Unfortunately, this opened many devices in RW mode when it was not
really necessary. With the OPTIONS+="watch" rule used in the udev rules,
this could fire numerous events while closing such devices (and it caused
useless scans from within udev rules in return).
A common bug we hit with this was with the lvremove command which was unable
to remove the LV since it was being opened from within the udev rules. This
patch should minimize such situations (at least with respect to LVM handling
of devices).
Though there's still a possibility someone will open a device 'outside' in
parallel and fire the event based on the watch rule when closing a device
once opened for RW.
allocates these buffers in such way it adds memory page for each such buffer
and size of unlock memory check will mismatch by 1 or 2 pages.
This happens when we print or read lines without '\n' so these buffers are
used. To avoid this extra allocation, use setvbuf to set these bufffers ahead.
Signed-off-by: Zdenek Kabelac <zkabelac@redhat.com>
Reviewed-by: Milan Broz <mbroz@redhat.com>
Reviewed-by: Petr Rockai <prockai@redhat.com>
Also, add a new 'obtain_device_list_from_udev' setting to lvm.conf with which
we can turn this feature on or off if needed.
If set, the cache of block device nodes with all associated symlinks
will be constructed out of the existing udev database content.
This avoids using and opening any inapplicable non-block devices or
subdirectories found in the device directory. This setting is applied
to udev-managed device directory only, other directories will be scanned
fully. LVM2 needs to be compiled with udev support for this setting to
take effect. N.B. Any device node or symlink not managed by udev in
udev directory will be ignored with this setting on.
Avoid using of already released memory when duplicated MDA is found.
As get_pv_from_vg_by_id() may call lvmcache_label_scan() use the local copy
of the vgname and vgid on the stack as vginfo may dissapear and code was
then accessing garbage in memory.
i.e. pvs /dev/loop0
(when /dev/loop0 and /dev/loop1 has same MDA content)
Invalid read of size 1
at 0x523C986: dm_hash_lookup (hash.c:325)
by 0x440C8C: vginfo_from_vgname (lvmcache.c:399)
by 0x4605C0: _create_vg_text_instance (format-text.c:1882)
by 0x46140D: _text_create_text_instance (format-text.c:2243)
by 0x47EB49: _vg_read (metadata.c:2887)
by 0x47FBD8: vg_read_internal (metadata.c:3231)
by 0x477594: get_pv_from_vg_by_id (metadata.c:344)
by 0x45F07A: _get_pv_if_in_vg (format-text.c:1400)
by 0x45F0B9: _populate_pv_fields (format-text.c:1414)
by 0x45F40F: _text_pv_read (format-text.c:1493)
by 0x480431: _pv_read (metadata.c:3500)
by 0x4802B2: pv_read (metadata.c:3462)
Address 0x652ab80 is 0 bytes inside a block of size 4 free'd
at 0x4C2756E: free (vg_replace_malloc.c:366)
by 0x442277: _free_vginfo (lvmcache.c:963)
by 0x44235E: _drop_vginfo (lvmcache.c:992)
by 0x442B23: _lvmcache_update_vgname (lvmcache.c:1165)
by 0x443449: lvmcache_update_vgname_and_id (lvmcache.c:1358)
by 0x443C07: lvmcache_add (lvmcache.c:1492)
by 0x46588C: _text_read (text_label.c:271)
by 0x466A65: label_read (label.c:289)
by 0x4413FC: lvmcache_label_scan (lvmcache.c:635)
by 0x4605AD: _create_vg_text_instance (format-text.c:1881)
by 0x46140D: _text_create_text_instance (format-text.c:2243)
by 0x47EB49: _vg_read (metadata.c:2887)
Add testing script
pv_manip.c to properly account for case when pe_start=0 and the first
physical extent is to be released (currently skip the first extent to
avoid discarding the PV label).
My previous patch fixed incorrect error check for dm_snprintf.
However in this particular case - dm_snprintf has been used differently -
just like strncpy + setting last char with '\0' - so the code had to return
error - because the buffer was to short for whole string.
Patch replaces it with real strncpy.
Also test for alloca() failure is removed - as the program behaviour
is rather undefined in this case - it never returns NULL.
lvseg properties for lvm2app, 'devices' and 'seg_pe_ranges'.
Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
Reviewed-by: Petr Rockai <prockai@redhat.com>
dm_pool_free incorrectly. This check-in fixes that incorrect usage.
I've also added a WHATS_NEW line to reflect the changes I made to allow
lv_extend to operate on 0 length intrinsically layered LVs (i.e mirrors
and RAID). I forgot that in the last commit.
allows us to allocate all images of a mirror (or RAID array) at one
time during create.
The current mirror implementation still requires a separate allocation
for the log, however.
Actually, we can call vg_set_fid(vg, NULL) instead of calling
destroy_instance for all PV structs and a VG struct - it's the same
code we already have in the vg_set_fid.
Also, allow exactly the same fid to be set again for the same PV/VG
Before, this could end up with the fid destroyed because we destroyed
existing fid first and then we used the new one and we didn't care
whether existing one == new one by chance.
Instead of searching linear list of all LVs, PVs - use created hash tables
also for quick mapping between LV.
(Note - for small number of PVs or LVs the overhead of the hash is bigger).
TODO: Use hash tables in volume_group structure directly.
Instead of regenerating config tree and parsing same data again,
check whether export_vg_to_buffer does not produce same string as
the one already cached - in this case keep it, otherwise throw cached
content away.
For the code simplicity calling _free_cached_vgmetadata() with
vgmetadata == NULL as the function handles this itself.
Note: sometimes export_vg_to_buffer() generates almost the same data
with just different time stamp, but for the patch simplicity,
data are reparsed in this case.
This patch currently helps for vgrefresh.
Isn't usually perfomance critical - but log_error is used i.e.for debuging,
this code noticable slows down the processing.
Added 512KB limit to avoid memory exhastions in case of some endless loop.
TODO: use _lvm_errmsg buffer only when lvm2api needs it.
Avoid locking sum testing with valgrind compilation.
Make memory unaccessible in the valgrind for dm_pool_abadon_object.
Valgrind hinting should not be needed in _free_chunk for dm_free.
Could be reached via few of our lvm2 test cases:
==11501== Invalid read of size 8
==11501== at 0x49B2E0: _area_length (import-extents.c:204)
==11501== by 0x49B40C: _read_linear (import-extents.c:222)
==11501== by 0x49B952: _build_segments (import-extents.c:323)
==11501== by 0x49B9A0: _build_all_segments (import-extents.c:334)
==11501== by 0x49BB4C: import_extents (import-extents.c:364)
==11501== by 0x497655: _format1_vg_read (format1.c:217)
==11501== by 0x47E43E: _vg_read (metadata.c:2901)
cut from t-vgcvgbackup-usage.sh
--
pvcreate -M1 $(cat DEVICES)
vgcreate -M1 -c n $vg $(cat DEVICES)
lvcreate -l1 -n $lv1 $vg $dev1
--
Idea of the fix is rather defensive - to allocate one extra element
to 'map' array which is then used in _area_length() - where the
loop checks, whether next map entry is continuous.
By placing there always one extra zero entry -
we fix the read of unallocated memory, and we make sure the data would
not make a continous block.
FIXME: there could be a problem if some special broken lvm1 data would be imported.
As the format1 is currently not really used - leave it for future fix
and use this small hotfix for now.
Invalid primary_vginfo was supposed to move all its lvmcache_infos to
orphan_vginfo - however it has called _drop_vginfo() inside the loop
that released primary_vginfo itself - thus made the loop using released
memory.
Use _vginfo_detach_info() instead and call _drop_vginfo after
th loop is finished.
Valgrind trace it should fix:
Invalid read of size 8
at 0x41E960: _lvmcache_update_vgname (lvmcache.c:1229)
by 0x41EF86: lvmcache_update_vgname_and_id (lvmcache.c:1360)
by 0x441393: _text_read (text_label.c:329)
by 0x442221: label_read (label.c:289)
by 0x41CF92: lvmcache_label_scan (lvmcache.c:635)
by 0x45B303: _vg_read_by_vgid (metadata.c:3342)
by 0x45B4A6: lv_from_lvid (metadata.c:3381)
by 0x41B555: lv_activation_filter (activate.c:1346)
by 0x415868: do_activate_lv (lvm-functions.c:343)
by 0x415E8C: do_lock_lv (lvm-functions.c:532)
by 0x40FD5F: do_command (clvmd-command.c:120)
by 0x413D7B: process_local_command (clvmd.c:1686)
Address 0x63eba10 is 16 bytes inside a block of size 160 free'd
at 0x4C2756E: free (vg_replace_malloc.c:366)
by 0x41DE70: _free_vginfo (lvmcache.c:980)
by 0x41DEDA: _drop_vginfo (lvmcache.c:998)
by 0x41E854: _lvmcache_update_vgname (lvmcache.c:1238)
by 0x41EF86: lvmcache_update_vgname_and_id (lvmcache.c:1360)
by 0x441393: _text_read (text_label.c:329)
by 0x442221: label_read (label.c:289)
by 0x41CF92: lvmcache_label_scan (lvmcache.c:635)
by 0x45B303: _vg_read_by_vgid (metadata.c:3342)
by 0x45B4A6: lv_from_lvid (metadata.c:3381)
by 0x41B555: lv_activation_filter (activate.c:1346)
by 0x415868: do_activate_lv (lvm-functions.c:343)
problematic line:
dm_list_iterate_items_safe(info2, info3, &primary_vginfo->infos)
Fix 2 more functions sending cluster messages to avoid passing uninitilised bytes
and compensate 1 extra byte attached to the message from the clvm_header.args[1]
member variable.
If _move_lv_segments is passed a 'lv_from' that does not yet
have any segments, it will screw things up because the code
that does the segment copy assumes there is at least one
segment. See copy code here:
lv_to->segments = lv_from->segments;
lv_to->segments.n->p = &lv_to->segments;
lv_to->segments.p->n = &lv_to->segments;
If 'segments' is an empty list, the first statement copies over
the values, but the next two reset those values to point to the
other LV's list structure. 'lv_to' now appears to have one
segment, but it is really an ill-set pointer.
other cases, the code could wind up removing wrong number of mirrors. In yet
other cases, we could remove the right number of mirrors, but fail to respect
the removal preferences (i.e. keep an image that was requested to be removed
while removing an image that was requested to be kept). Under some
circumstances, remove_mirror_images could also get stuck in an infinite loop.
This patch should fix all of the above undesirable behaviours.
Signed-off-by: Petr Rockai <prockai@redhat.com>
Reviewed-by: Jonathan Brassow <jbrassow@redhat.com>
LVM doesn't behave correctly if running as non-root user,
there is warning when it detects it.
Despite this, it produces many error messages, saying nothing.
See https://bugzilla.redhat.com/show_bug.cgi?id=620571
This patch fixes two things:
1) Removes eror message from device_is_usable() which has no
information value anyway (real warning is printed inside it).
2) it fixes device-mapper initialization, if we support
core dm module autoload and device node is present, it should
fail early and not try recreate existing and correct node.
(non-root == permission denied here)
N.B. In future code should support user roles, some more
drastic checks in code are probably contraproductive now.
Attach \0 for proper char* display - otherwise somewhat random message could
be displayed in debug more and read of unpredictable read of uninitilized
memory values could happen.
As code uses strncpy(system_id, NAME_LEN) and doesn't set '\0'
Fix it by always allocating NAME_LEN + 1 buffer size and with zalloc
we always get '\0' as the last byte.
This bug may trigger some unexpected behavior of the string operation
code - depends on the pool allocator.
FIXME: refactor this code to alloc_vg.
We have 3 components and traling '\0' so allocate proper room for all of them.
Problem was nicely hidden by allocation from pool and allocation aligment
offset - so to trigger real problem with this one is actually hard.
Return value of readlink limits valid string size.
Characters after returned size present some garbage to printf.
Fix it by placing '\0' on the return size value.
Missing free_vg on error_path in lvmcache_get_vg fn. Call destroy_instance
only if the fid is not part of the vg in backup_read_vg fn (otherwise it's
part of the VG we're returning and we definitely don't want to destroy it!).
This is necessary for proper format instance ref_count support. We iterate
over vg->pvs and vg->removed_pvs list and the ref_count is decremented and
then it is destroyed if not referenced anymore.