IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
When we are stacking LV over device, which has for some reason
increased read_ahead (e.g. MD RAID), the read_ahead hint
for libdevmapper is wrong (it is zero).
If the calculated read_ahead hint is zero, patch uses read_ahead of underlying device
(if first segment is PV) when setting DM_READ_AHEAD_MINIMUM_FLAG.
Because we are using dev-cache, it also store this value to cache for future use
(if several LVs are over one PV, BLKRAGET is called only once for underlying device.)
This should fix all the reamining problems with readahead mismatch reported
for DM over MD configurations (and similar cases).
Current code, when need to ensure that volume is not
active on remote node, it need to try to exclusive
activate volume.
Patch adds simple clvmd command which queries all nodes
for lock for given resource.
The lock type is returned in reply in text.
(But code currently uses CR and EX modes only.)
This means two things:
1) Non-mirrored LVs will be no longer affected by mirror monitoring. (Before,
if you had a LV that went partially missing on a VG where a mirror leg failed,
this LV would be removed automatically by dmeventd... Probably not an
unrecoverable dataloss bug, but still quite unpleasant.)
2) If enough parallel PV space is available at the time of the mirror failure,
the failed devices will be automatically replaced using this spare space. Which
(and whether) free space may be used is still not configurable, but is a
planned feature. Since it is relatively easy to undo the action by converting
the mirror manually, I don't consider this to be a showstopper. In fact, I
think the compromise is much better than what we have now.
pvmove now keep suspended devices if temporary mirror creation fails.
We can try to restore previous state if it is first attempt to activate
pvmove (code basically run the same code as --abort automatically).
Currently code uses pv_dev_name() for hash when getting internal
"pvX" name.
This produce corrupted metadata if PVs are missing, pv->dev
is NULL and all these missing devices returns one name
(using "unknown device" for all missing devices as hash key).
We can temporarily violate max_lv during mirror conversion etc.
(If the operation fails, orphan mirror images are visible to administrator
for manual remove for example. Not that this should ever happen:-)
Force limit only for lvcreate (and vg merge) command.
Patch also adds simple max_lv tests into testsuite
The vg->lv_count parameter now includes always number of visible
logical volumes.
Note that virtual snapshot volume (snapshotX) is never visible,
but it is stored in metadata with visible flag.
link_lv_to_vg and unlink_lv_from_vg are the only functions
for adding/removing logical volume from volume group.
Only these function should manipulate with vg->lvs list.
The snapshot segment (snapshotX) is created twice
during the text metadata segment processing.
This can cause temporary violation of max_lv count.
Simplify the code, snapshot segment is properly initialized
in init_snapshot_seg function now and do not need to be replaced
by vg_add_snapshot call.
The vg_add_snapshot() is now usefull only for adding new
snapshot and it shares the same initialization function.
The snapshot name is always generated, name paramater can be
removed from function call.
As a simplification to the tools and further liblvm, this patch pushes
the setting of NON_BLOCKING lock flag inside the lock_vol() call.
The policy we set is if any existing VGs are currently locked, we
set the NON_BLOCKING flag.
Should be no functional change.
The seg variable is temporary variable for list iterator,
code cannot expect that after iteration it remains NULL
(it contains non-NULL pointer here id list is empty).
Patch fixes first_seg function so it now correctly returns NULL
for empty segment list.
Buildsystem support device-mapper only install,
but generic install tagret includes both dm+lvm2.
For distribution which uses separate install_device-mapper,
there is no way how to install lvm2 only
(so after installing lvm2 for packaging purposes
built system must remove installed device-mapper files).
Fix it by allowing lvm2_install target, similarily like
install_cluster for clvmd.
(install = install_device-mapper + install_lvm2)
The dataalign value must always be aligned according
to MDA area.
The currect code checks if calculated value collides with
MDA area but not if the value is so small that it is
located before MDA starts.
Unfortunatelly there can be also MDA in the end of the device.
The patch adds simple check to avoid this miscalculation.
Patch expects that first MDA always starts on <= pagesize boundary
(this is true for all allowed label sector parameters).
Add lvs origin_size field.
Fix linux configure --enable-debug to exclude -O2.
Still a few rough edges, but hopefully usable now:
lvcreate -s vg1 -L 100M --virtualoriginsize 1T
Run backup of metadata on remote nodes in the
same place like local node - when calling backup().
Introduce backup_locally() which calls only
local backup if needed.
Remote backup is now trigerred by LCK_VG_BACKUP flag
combination (special VG lock).
This lock type will call check_current_backup()
(including backup_locally() call) and updates
metadata on all nodes.
(Patch fixes non-functional remote backup,
current call during VG lock never triggers.)
The backup() call store metadata from memory.
But in cluster backup() call performs
remote nodes metadata backup and it reads data from disk.
For metadata backup consistency,
patch moves all backup() calls after vg_commit.
(Moreover, some tools already do that this way.)
- Rename unlock_all to destroy_lvhash,
this function is called in cluster shutdown
unlocks everything and clean up allocated info space.
- Tidy lv_hash_lock use
.
Except adding free(lvi) in lv_has destructror
there is no functional change.
If user requests report attribute from PVSEG type
and PV is orphan (or all devices is set), the report
is empty.
Try for example (when only orphan PV are present)
#pvs
#pvs -o +devices
# pvs /dev/sdb1
PV VG Fmt Attr PSize PFree
/dev/sdb1 lvm2 -- 46.58G 46.58G
# pvs -o +devices /dev/sdb1
(no output)
The problem is caused by empty pv->segments list.
Fix it by providing fake segment here (similar to fake structures
in _pvsegs_sub_single() calls.
# pvs -a -o devices
Volume group name (null) has invalid characters
Skipping volume group (null)
...
_pvsegs_sub_single creates fake vg, we need to check
that pv is real here.
Since now, all code reading volume group is responsible for releasing
the memory allocated by calling vg_release(vg).
(For simplicity of use, vg_releae can be called for vg == NULL,
the same logic like free(NULL)).
Also providing simple macro for unlocking & releasing in one step,
tools usualy uses this approach.
The global memory pool (cmd->mem) should be used only for global
physical volume operations.
This patch have to be applied with all subsequent patches to complete
memory pool per vg logic.
Using separate memory pool has quite bit memory saving impact when
using large VGs, this is mainly needed when we have to use
preallocated and locked memory (and should not overflow from that
memory space).
The all_pvs list, used in vg_read, should make its own private
copy of pv structures, otherwise (when vg will use its own pool)
it will point to released memory pool.
The same applies for get_pvs() call.
Patch adds pv_list copy helper and adds explicit memory pool
parameter into _copy_pv.
(Please note that all these helper functions cannot guarantee that
vg related fields are valid - proper vg read & lock must be used
if it is requested.)
Currently PV commands, which performs full device scan, repeatly
re-reads PVs and scans for all devices.
This behaviour can lead to OOM for large VG.
This patch allows using internal metadata cache for pvs & pvdisplay,
so the commands scan the PVs only once.
(We have to use VG_GLOBAL otherwise cache is invalidated on every
VG unlock in process_single PV call.)
Patch fixes these problems:
- during the snapshot creation process, it needs create 2 LVs,
one is cow, second becomes snapshot.
If the code fails in vg_add_snapshot, code lvcreate will not remove
LV cow volume.
- if max_lv is set and VG contains snapshot, it can happen that
during the activation lv_count is temporarily increased over the limit
and VG metadata are not properly processed
see https://bugzilla.redhat.com/show_bug.cgi?id=490298
- vgcfgrestore alows restore with max_lv set to lower valuer that actual
LV count. This later leads to situation that max_lv is completely ignored.
- vgck doesn't call vg_validate(). It should at least try:-)
Signed-off-by: Milan Broz <mbroz@redhat.com>
The original liblvm.a has been moved to liblvm-internal.a.
We now use liblvm.a for the new application library and build
it inside liblvm directory.
Change dependencies so tools depend on liblvm application library,
and application library depends on liblvm internal.
(for example when CLVMD_CMD_LOCK_VG for is called during vgscan).
If clvmd calls LV lock, it calls
/* clean the pool for another command */
dm_pool_empty(cmd->mem);
to clean up memory pool after command.
Unfortunately, do_refresh_cache() do not call this
and because during it operation it allocates some memory,
the pool increases.
Also do_refresh_cache should use lvm_lock
(it manipulates with lvm internal data).
The same applies for lvm_backup command.
Signed-off-by: Milan Broz <mbroz@redhat.com>
if rlocn not defined (there is no metadata area).
In most cases it fails in validate_name(),
unfortunately there are situatuions, when
validate_name is ok and later code fails with
checksum error.
Reproducer:
# dd if=/dev/zero of=/dev/loop0
# pvcreate --metadatasize 637k /dev/loop0
Physical volume "/dev/loop0" successfully created
# pvs /dev/loop0
/dev/loop0: Checksum error
PV VG Fmt Attr PSize PFree
/dev/loop0 lvm2 -- 1.00M 1.00M
Signed-off-by: Milan Broz <mbroz@redhat.com>
-
Using argv[] list in exec_cmd() to allow more params for external commands.
Fsadm does not allow checking mounted filesystem.
Fsadm no longer accepts 'any other key' as 'no' answer to y/n.
Fsadm improved handling of command line options.
New structure lvm (used as an alias to cmd_context), new type definition lvm_t
for the lvm handle. Added functions lvm_create, lvm_destroy and
lvm_reload_config using the new handle.
Modified test/api/test.c:
Use new lvm.h header file and lvm_t handle.
Removed lib/lvm2.h
Author: Thomas Woerner <twoerner@redhat.com>
This patch is not fully tested and leaves some related bugs unfixed.
Intended behaviour of the code now:
pe_start in the lvm2 format PV label header is set only by pvcreate (or
vgconvert -M2) and then preserved in *all* operations thereafter.
In some specialist cases, after the PV is added to a VG, the pe_start
field in the VG metadata may hold a different value and if so, it
overrides the other one for as long as the PV is in such a VG.
Currently, the field storing the size of the data area in the PV label
header always holds 0. As it only has meaning in the context of a
volume group, it is calculated whenever the PV is added to a VG (and can
be derived from extent_size and pe_count in the VG metadata).
When reporting explicitly label attributes (pv_uuid for example), we do not
need to read metadata.
This patch separate the label fileds and removes scan_vgs_for_pvs
in process_each_pv() if not needed.
(There should be no user visible change in output.)
We display '0' for these fields now in this case. Ideally these values are
undefined for an orphan PV but today there is no way to specify undefined
with display functions such as _size64_disp().
pvcreate $DEV
vgcreate -s 1k vg_test $DEV
lvcreate -l 1 -n lv1 vg_test
..
/dev/vg_test/lv1: write failed after 1024 of 4096 at 0: No space left on device
Just check for maximum write size in set_lv.
It fails for 1k PE now.
Patch adds log_region_size into allocation habdle struct
and use it in _alloc_parallel_area() for proper log size calculation
instead of hardcoded 1 extent - which can fail.
Reproducer for incorrect log size calculation:
DEV=/dev/sd[bcd]
pvcreate $DEV
vgcreate -s 1k vg_test $DEV
lvcreate -m1 -L 12M -n mirr vg_test
https://bugzilla.redhat.com/show_bug.cgi?id=477040
The log size calculation is mostly copied from kernel code.
Check for major/minor collision is added in _add_dev_to_dtree()
where we already read info by uuid,
so in the case of requesting major/minor it queries device-mapper
by major/minor for device availability.
Fixes https://bugzilla.redhat.com/show_bug.cgi?id=204992
Problem is dm_report_init() may return NULL and subsequent call to
dm_report_set_output_field_name_prefix() doesn't handle NULL value.
Example:
pvs --nameprefixes --rows --unquoted --noheadings -opv_name,fred
Logical Volume Fields
---------------------
lv_uuid - Unique identifier
lv_name - Name. LVs created for internal use are enclosed in brackets.
...
Physical Volume Segment Fields
------------------------------
pvseg_start - Physical Extent number of start of segment.
pvseg_size - Number of extents in segment.
Unrecognised field: fred
Segmentation fault
with the second vgcreate overwriting the first.
Obtain lock before calling vg_create(), which checks for existence of vgname
and fails if it already exists.
Prior to this patch, "lvremove -f vgname" would fail if vgname contained
one or more snapshot LVs. Now this passes, but has a side-effect.
If you issue "lvremove vgname" where vgname contains one or more snaps,
you will get an extra "y/n" prompt to remove the same snapshot.
Example:
$ lvs
LV VG Attr LSize Origin Snap% Move Log Copy% Convert
lvsnap vgtest swi-a- 16.00M lvtest 0.05
lvtest vgtest owi-a- 64.00M
$ lvremove vgtest
Do you really want to remove active logical volume "lvsnap"? [y/n]: n
Logical volume "lvsnap" not removed
Do you really want to remove active logical volume "lvsnap"? [y/n]: n
Logical volume "lvsnap" not removed
Command failed with status code 5.
Fixing this will most likely require modification of the iterator
function, process_each_lvs_in_vg() to iterate over snaps in some
cases (e.g. lvs, vgdisplay -v) but not in others (lvremove).
(Avoids having same mirror table loaded twice concurrently by first
using a 'zero' table to set the size of the device so when mirror
table is preloaded it doesn't have to be activated immediately.)
snapshot DSO unregistered itself when snapshot changed state to invalid.
This can cause a race (and several timeouts), when for example another snapshot
is added and in the middle of operation (suspend/resume) the monitoring thread
unregister itself.
Fix it by keeping the snapshot monitored after invalidation - just reset
threshold to not really print any messages to syslog.