IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
Prevent lvresize from being able to resize internal LVs: mirror legs
(*_mimage_*), mirror log (*_mlog), snapshot placeholder LVs (snapshot*)
and others. Resizing these would leads to unexpected metadata and
sometimes crashes (in case of growing snapshot*).
This option should be configurable, but for now
do not set it at all.
(lvm2app is used in udisks probers and there
cac cause several nasty races when trying to update
lvmcache during rescan.)
If user try to vgcreate or vgextend non-existent VG,
these messages appears:
# vgcreate xxx /dev/xxx
Internal error: Volume Group xxx was not unlocked
Device /dev/xxx not found (or ignored by filtering).
Unable to add physical volume '/dev/xxx' to volume group 'xxx'.
Internal error: Attempt to unlock unlocked VG xxx.
(the same with existing VG and non-existing PV & vgextend)
# vgextend vg_test /dev/xxx
...
It is caused because code tries to "refresh" cache if
md filter is switched on using cache destroy.
But we can change filters and rescan even without this
machinery now, just use refresh_filters
(and reset md filter afterwards).
(Patch also discovers cache alias bug in vgsplit test,
fix it by using better filter line.)
This patch adds a new implementation of locking function instead
of mlockall() that may lock way too much memory (>100MB).
New function instead uses mlock() system call and selectively locks
memory areas from /proc/self/maps trying to avoid locking areas
unused during lock-ed state.
Patch also adds struct cmd_context to all memlock() calls to have
access to configuration.
For backward compatibility functionality of mlockall()
is preserved with "activation/use_mlockall" flag.
As a simple check, locking and unlocking counts the amount of memory
and compares whether values are matching.
For static builds dependency for SELinux libs is not handled by 'ar'.
Till better solution is found, for static builds STATIC_LIBS is used.
Patch updates SELinux detection to use 3rd & 4th parameter for Success/Fail.
Also removes detection of pthread from this check as we know which
version of libdevmapper we are going to link with lvm after merge.
SELinux header check moved to the SELinux test code.
Create new substituted variable PTHREAD_LIBS and link this library
only with tools/libs which really needs it - i.e. dmeventd.
Check for libpthread only for builds with clvmd or dmeventd.
Remove variable LIB_PTHREAD
The kernel's blk_stack_limits() function may flag a device as
'misaligned'. If it does the alignment_offset will be -1.
Update set_pe_align_offset() to accommodate this corner case.
- increase timeout to 30 secs (on Chrissie request)
- source both cluster and clvmd for options (like all the other cluster
init scripts)
- add clustered_vgs and _lvs commodity fns
- move rh_status* fns at the top, so they can be reused
- heavily cleanup start and stop fns from redundant code and unnecessary
loops
- improve output from different operations
- make the init script lsb compliant
- don´t force kill of the daemon, send only a TERM signal and then wait
for it to exit
- Resolves rhbz#533247
lvm2 devices have always UUID set even if imported from lvm1 metadata.
Patch removes name argument from dev_manager_info call and converts
all activation related calls to use query by UUID.
Also it simplifies mknode call (which is the only user on mknodes parameter).
Add a merging snapshot to the deptree, using the "error" target, rather
than avoid adding it entirely. This allows proper cleanup of the -cow
device without having to rename the -cow to use the origin's name as a
prefix.
Move the preloading of the origin LV, after a merge, from
lv_remove_single() to vg_remove_snapshot(). Having vg_remove_snapshot()
preload the origin allows the -cow device to be released so that it can
be removed via deactivate_lv(). lv_remove_single()'s deactivate_lv()
reliably removes the -cow device because the associated snapshot LV,
that is to be removed when a snapshot-merge completes, is always added
to the deptree (and kernel -- via "error" target).
Now when the snapshot LV is removed both the -cow and -real devices
get removed using uuid rather than device name. This paves the way
for us to switch over to info-by-uuid queries.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
There's a tiny period of time when the _mimage device is visible during
downconversion from mirror to linear. Since it is visible, we need to
create the symlinks, otherwise warning messages will be issued about udev
not creating those symlinks. We have to rely on udev flags completely.
When activating a merging origin it is valid, and expected, to not have
a node in the deptree for both the origin and its merging snapshot. The
_cached_info() caller is only concerned with whether a device is open.
If there isn't a node in the tree the associated device is definitely
not open.
depending on if the mirror has a 'core' or 'disk' log. When there
is a disk log, the new leg is added by stacking a new mirror on
top of the old (one leg is the old mirror and the other leg is the newly
added device). When the log is a 'core' log, the new leg is simply added
to the existing mirror and all the devices are re-synced.
The logic that handles collapsing the stacked 'disk' log mirror was
having the effect of causing 'core' logged mirrors to begin resync'ing
for a second time. I have used the 'CONVERTING' flag to indicate that
a mirror is converting by way of stacking. This is no longer set for
up-converting core logs. The final 'collapse' logic can safely be skipped
for 'core' log mirrors - getting rid of the second resync.
Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
where we should not expose internal VG names/uuids (the ones with "#" prefix )through the
interface. Otherwise, we could end up with library users opening internal VGs which will
initiate locking mechanism that won't be cleaned up properly.
"#orphans_{lvm1, lvm2, pool}" names are treated in a special way, they are truncated first
to "orphans" and this is used as a part of the lock name then (e.g. while calling lvm_vg_open()).
When library user calls lvm_vg_close(), the original name "orphans_{lvm1, lvm2, pool}"
is used directly and therefore no unlock occurs.
We should exclude internal VG names and uuids in the lists provided by lvmcache:
lvmcache_get_vgids() and lvmcache_get_vgnames().
Allow the number of logical extents to be expressed (for a snapshot) as
a percentage of the total space in the Origin Logical Volume with the
suffix %ORIGIN.
Update the relevant man pages accordingly. Eliminate inconsistencies
between the man pages and tools/commands.h
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
*_safe. This had the effect of segfaulting the log daemon when
converting a mirror from one log type to another.
Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
When activation of pvmove mirror fails on cluster, some nodes
still possibly succeeded in activation.
- Explicitly deactivate that mirror to be sure
- properly pair suspend/resume calls to not cause memory lock problems in clvmd
Code cannot simply call _finish_pvmove on cluster in this situation, because
changed LVs are suspended twice (causing memory inbalance) and also temporary
mirror is activated when it is not expected (and we know that it failed already).
Patch prepares special function which remove temporary mirror references from
metadata and then resumes changed LVs.
In dev_manager_info 0 means error and 1 info is returned,
not that device exists (that value is part of info struct).
Fix query by uuid only (no name) which returns 0 when device
does not exist.
Support "wait before testing" using '+' in pvmove and lvconvert
interval. Doing so overrides the new default of sleeping after checking
the LV's progress.
Sleeping before checking progress can lead to extraneous polldaemons
being left running. These polldaemons would have otherwise exited had
they checked before sleeping. Checking progress before sleeping helps
workaround the subtly unreliable nature of "finished" state checking
in _percent_run.
Update test/t-mirror-names.sh to use '+' when providing its lvconvert
interval.
more descriptive message if locking fails instead of
"Locking type -1 initialisation failed."
Use read-only locking instead of misleading ignorelocking option
in message.
For mirror repair (and similar tasks) it can happen that full
device rescan is issued from clvmd.
Because code can be in the middle of repair (calling suspend)
clvmd should never try to scan suspended devices
(otherwise it causes deadlock).
Also code must not change ignore_suspended_device flag when
doing refresh_filters (called from lvmcache scan code).
bitmap tracking was switched from the e2fsprogs implementation to
the device-mapper implementation (dm_bitset_t). The latter has a
leading uin32_t field designed to hold the number of bits that are
being tracked. The code was not properly handling this change in
all places. Specifically, when getting the bitmap to/from disk.
Endian adjustments will likely need to be made on the accounting
field as well, since bitmaps are passed between machines on
start-up.
Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
This spurious 'break' has been here since this code was first committed
in June 2005 and stopped the algorithm behaving as described in the
comment above it and rendered the variable 'already_found_one' useless.
Made .update_metadata optional in 'struct poll_functions' definitions;
eliminated _update_lvconvert_mirror() stub.
Tweak a mirror-specific error message in the generic polldaemon code.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
The logic was that lvconvert repair volumes, marking
PV as MISSING and following vgreduce --removemissing
removes these missing devices.
Previously dmeventd mirror DSO removed all LV and PV
from VG by simply relying on
vgreduce --removemissing --force.
Now, there are two subsequent calls:
lvconvert --repair --use-policies
vgreduce --removemissing
So the VG is locked twice, opening space for all races
between other running lvm processes. If the PV reappears
with old metadata on it (so the winner performs autorepair,
if locking VG for update) the situation is even worse.
Patch simply adds removemissing PV functionality into
lvconcert BUT ONLY if running with --repair and --use-policies
and removing only these empty missing PVs which are
involved in repair.
(This combination is expected to run only from dmeventd.)
Version >= 1.8.0 of the DM snapshot target appends metadata sectors used
to a snapshot's status. This patch allows LVM2 to accurately determine
if the snapshot store is empty. Knowing when a snapshot store is empty
is important in the context of snapshot-merge (means merge is complete).
Also update LVM2 to be aware of the possibility for "Merge failed" in
the snapshot-merge target's status.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
the background polldaemon is allowed to start. It can be used
standalone or in conjunction with --refresh or --available y.
Control over when the background polldaemon starts will be particularly
important for snapshot-merge of a root filesystem.
Dracut will be updated to activate all LVs with: --poll n
The lvm2-monitor initscript will start polling with: --poll y
NOTE: Because we currently have no way of knowing if a background
polldaemon is active for a given LV the following limitations exist and
have been deemed acceptable:
1) it is not possible to stop an active polldaemon; so the lvm2-monitor
initscript doesn't stop running polldaemon(s)
2) redundant polldaemon instances will be started for all specified LVs
if vgchange or lvchange are repeatedly used with '--poll y'
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
This patch tries to correctly track changes in lvmcache related to commit/revert.
For vg_commit: if there is cached precommitted metadata, after successfull commit
these metadata must be tracked as committed.
For vg_revert: remote nodes must drop precommitted metadata and its flag in lvmcache.
(N.B. Patch do not touch LV locks here in any way.)
All this machinery is needed to properly solve remote node cache invalidaton which
cause several problems recently observed.
Lock mode is int masked by LCK_TYPE_MASK, always.
Patch also remove uneccessary masking lock flag on sender side,
if masking is needed, it is don on client side already.
- Add drop_precommitted flag to force drop precommitted metadata
- add lvmcache_commit_metadata() which upgrades precommitted metadata in cache
No functional change in this patch - just preparation for following change.
And decode flags in humar readable form in client.
And clean some trailing whitespaces.
No functional change in this patch (only debugging messages changed).
The use_precommitted flag indicates, that we want to use precommitted metadata
(used in suspend call to preload table with precommitted data).
But if there are no such data, committed metadata are read but the cache
still contains that precommitted flag.
(The problem is that later possible drop_metadata call will not invalidate
device in cache.)
The wrong precommitted state is stored in on remote nodes during normal
suspend/resume cycle _without_ vg_write/commit.
Use the PRECOMMITTED status flag here instead (which is always set if using
precommited metadata here).
If renaming snapshot with virtual origin, the origin is renamed too.
But the code must resume LVs in reverse order to properly
pair memlock (in cluster locking).
(The resume of snapshot resumes origin too and later resume
is ignored otherwise.)
When PV device reappears with old metadata, it is
always updated to new version byt atutomatic metadata
repair.
Remove missing flag if device is empty.
If device contains allocated extents, issue warning that
user must remove volumes and re-add this PV before
manipulating with this volume.
This partially solves bug 547842 when one PV (log) is failed,
dmeventd removes that device and later this device reappears and
is wrongly added into VG marked missing.
The memlock_inc() fix is wrong, memlock count is not
propagated to long living process (clvmd) and just
it underflow there.
Also suspend is needed to pre-load precommited metadata
on other nodes (remapping to error taget in this case).
With explicit suspend we generate lock request and code
can update memlock count.
(Infinitely "locked" memory caused that fs_unlock() was not
called properly and on cluster nodes remains
old links in /dev/mapper for not active devices.)
(N.B. failing of suspend call here is not handled as fatal
error - the LV is going to be removed later anyway.)
The new recovery code first tries to repair LV and then removes failed PV
from VG. It means that during operation there can be VG with PV missing,
and vg_read code handles it like not consistent VG.
We already allows returning "inconsistent" commited metadata,
for mirror repair we need this for precommited too.
(The suspend call prepares precommited metadata to inactive table on
other cluster nodes.)
"Inconsistent" here means - correct metadata, just with some metadata areas
not found (obviously on missing or failed PVs).
The LV locks make sense only for clustered LVs.
Properly check cluster flag and never issue cluster lock here.
There are several places in code, where it is already checked, this
patch add this check to all needed calls.
In previous code the lock behaviour was inconsistent,
for example, the pre/post callback can take lock even for local volume,
but deactivate call do not released this lock and it remains held forever.
The local LV lock request now just let run the underlying activation code
on local node, the same process like in local locking.
(Again, this is important for new mirror repair calls, here for local
mirrors but with cluster locking enabled.)
This is unnoticed regression from commit 31672ff60e
The pre/post callback need to convert lock always, local node
is going to modify metadata in this case, it it fails conversion,
the call is ignored.
Also it fixes bug when the lock is not yet held, we cannot set LKF_CONVERT
in this case, it will fail because this lock do not exist.
Note that the automatic conversion is still disabled in activate
call, so the original fix (reactivation of exlusive LV) should
be still in place.
(Code already not fail if unlocking not locked resource.)
This is needed in pre/post lock_lv call, where we can
request the same lock on local node becuase of suspend call.
- do_command and lock_vg expect flags (no change here)
Bug fixes:
- lock_vg should check for NONBLOCK on lock_cmd, flags have this bit masked-out
- do_pre/post_command expect do not mask flag at all, this causes that
the code inside is never run! (see following patches, these functions
expect plain command without flags)
If there is problem deactivate LV and
_init_mirror_log is called with remove_on_failure = 1,
remove the newly created log LV from metadata.
(This can happen if there is active device with the same name
but different UUID.)
The main reason for this "workaround" patch is to
- do not keep _mlog volume in metadata, so user can repeat the action
- print better error message describing the real problem
# lvcreate -m 2 -n lv1 -l 1 --nosync vg_bar
WARNING: New mirror won't be synchronised. Don't read what you didn't write!
/dev/vg_bar/lv1_mlog: not found: device not cleared
Aborting. Failed to wipe mirror log.
Error locking on node bar-01: Input/output error
Unable to deactivate mirror log LV. Manual intervention required.
Failed to create mirror log.
# lvcreate -m 2 -n lv1 -l 1 --nosync vg_bar
WARNING: New mirror won't be synchronised. Don't read what you didn't write!
Aborting. Unable to deactivate mirror log.
Failed to initialise mirror log.
pvmove suspends all moved LVs + pvmoveX mirrored LV itself.
This suspends even underlying pvmoveX and following explicit
suspend call is just noop.
But in resume the pvmoveX volume is no longer underlying
device for moved LVs, so it performs full resume with memlock
decrease.
Code must call memlock_inc() if suspend is requested, volume
is already suspended and error is not requested.
These are no longer used by anyone. The dm_list defines are all in
libdevmapper.h and libdm/datastruct/list.c contains any function definitions.
There is some code in "old-tests" that still use this but this code is not
being maintained.
Thanks to Zdenek for spotting this.
The physical_volume, volume_group, logical_volume and lv_segment
structures' 'status' member is now uint64_t.
The alignment of these structures was also audited to remove holes. The
movement of some members in 'volume_group' and 'lv_segment' eliminates
holes. The 'physical_volume' structure still has one 4-byte hole after
'pe_size'; the other structures no longer have any holes. Each
structures' size has not changed.
If the vg_read() returned error, no lock was taken,
so always call vg_release().
Otherwise this can happen because of missing FAILED_*:
# vgchange -a y x --ignorelockingfailure
Volume group "x" not found
Internal error: Attempt to unlock unlocked VG x
The sysfs filter initialise hash of available devices using
scan of /sys/block. We need to refresh even this hash
when performing full scan otherwise the newly appeared
device could be rejected, because there is no entry
in sysfs filter.
This easily could happen when attaching new device
to cluster node. (Only force refresh of context
in clvmd -R works here now).
Unfortunately consequences of this are much worse,
missing device part on that node is replaced with missing segment
(even when no partial arg is selected) and this directly
lead to data corruption.
See https://bugzilla.redhat.com/show_bug.cgi?id=538515
Simply fix it by refreshing device filters in lvmcache
before performing the full device scan.
(on one node a storage connection failed):
# vgchange -a y vg_bar ; echo $?
Error locking on node bar-02: Refusing activation of partial LV lv1. Use --partial to override.
1 logical volume(s) in volume group "vg_bar" now active
0
So activation fails on one node, error is correctly printed but
status code is wrong.
This patch fixes the top level (vgchange) to return proper code
(and print # of activated LVs).
(lvchange returns error properly here.)
(This affects only cluster locking because only cluster
locking module set LCK_PRE_MEMLOCK.)
With currect code you get
# vgchange -a n
Internal error: _memlock_count has dropped below 0.
when using cluster locking.
It is caused by _unlock_memory calls here
if ((flags & (LCK_SCOPE_MASK | LCK_TYPE_MASK)) == LCK_LV_RESUME)
memlock_dec();
Unfortunately it is also (wrongly) called in immediate unlock
(when LCK_HOLD is not set) from lock_vol
(LCK_UNLOCK is misinterpreted as LCK_LV_RESUME).
Avoid this by comparing original flags and provide memlock
code type of operation (suspend/resume).
All hidden (not visible) volumes should be activated through
other visible volumes.
(There are already exceptions like snapshot, mirror log and image,
which should be cleaned one day...)
This solves problems for future types of hidden volumes,
which can have special meaning and must not be activated implicitly
(e.g. key store volume).
- fix missing unlocking of VG
lvcreate -l 100%PVS -n lv1 vg_test
Please specify physical volume(s) with %PVS
Internal error: Volume Group vg_test was not unlocked
- if no PVS specified, use all available
Fix segfault if %PVS in lvresize without PVs list.
The DRBD uses underlying device so code should prefer top
device if duplicate is found.
Patch also introduce
dev_subsystem_part_major and dev_subsytem_name
functions to easily handle all these replication susbystems
and not hardcode md_major call.
See https://bugzilla.redhat.com/show_bug.cgi?id=530881
for full problem description.
Option --all is only partially documented currently, so document in all
commands. Also make {pv|vg|lv}{display|s} man pages consistent with help
output. Remove ununsed 'disk_ARG' parameter. Leave --trustcache out of
the man page output. Update --units argument to show all possible units.
Per discussion on lvm-devel mailing list and part of debian patch set,
don't set defaults for owner and group, since nobody seems to use them, and
still allow override.
Going forward, we would like to allow users to specify the total
number of metadatacopies in a VG rather than on a per-PV basis. In
order to facilitate that, introduce --pvmetadatacopes to replace
--metadatacopies everywhere. We still allow --metadatacopies for
pv commands, but require --pvmetadatacopies for vg commands.
Eventually we will introduce --vgmetadatacopies. Once we do that,
we should either deprecate --metadatacopies or make it a synonym based
on the command (pvmetadatacopies for pv commands, and vgmetadatacopies
for vg commands). The latter option would likely just require a simple
'strncpy' check against cmd->command->name to qualify the merge_synonym
call.
Update nightly tests to cover the pvmetadatacopies synonym.
Note that this patch is the result of an eariler review comment for
the implicit pvcreate patches. Should apply cleanly on top of the
implicit pvcreate patches (I applied after patch 10/10 in that series).
NOTE: This patch will require --pvmetadatacopies for vgconvert as
--metadatacopies is no longer accepted.
is granted at one mode and an attempt to convert it wthout the LCK_CONVERT
flag set then it will return errno=EBUSY.
This fixes a pretty bad bug in which an LV could be activated exclusively on
one node and lvchange -ay on another would convert it to shared!
It might break some things in other areas, but I doubt it.
lv_deactivate now returns always success, because tree deactivation
functions (see dm_tree_deactivate_children) always returns success.
Because code should return failure in lv_deactivate at least,
fix it by checking for device existence after real deactivation call.
(After discussion this was prefered solution to dm tree function rewrite
which affects snapshots and mirrors.)
Add configure --enable-units-compat to set si_unit_consistency off by default.
Use standard output units for 'PE Size' and 'Stripe size' in pv/lvdisplay.
Clean up VG_RESIZEABLE flag by creating vg_is_resizeable().
Update comment - we no longer have ALLOW_RESIZEABLE.
Also use vg_is_exported() in one place missed by earlier patch.
Should be no functional change.
Remove the checks for vg_read_error() in most of the tools callback
functions and instead make the check in _process_one_vg() more general.
In all but vgcfgbackup, we do not want to proceed if we get any error
from vg_read(). In vgcfgbackup's case, we may proceed if the backup
is to proceed with inconsistent VGs. This is a special case though,
and we mark it with the READ_ALLOW_INCONSISTENT flag passed to
process_each_vg (and subsequently to _process_one_vg).
NOTE: More cleanup is needed in the vg_read_error() path cases.
This patch is a start.
- add DM_UDEV_RULES_VSN to provide a variable to be checked for in the other
rules (e.g. to check that DM rules are actually installed, we can alternate
functionality in the other rules based on this information, also we have
versioning support for the rules)
- set proper sbin path for dmsetup and blkid, /sbin first, then /usr/sbin.
This is necessary for anaconda to work properly.
- add 'last_rule' for cryptsetup's temporary devices (symlinks in /dev/mapper
only)
Now that we've refactored the internal library functions that do the
vg_remove, we can handle the deferred commit of a lvm_vg_remove() inside
lvm_vg_write(). This makes the VG create/remove API more consistent in
terms of disk commits - they now both require an lvm_vg_write() to commit
the create or remove to disk.
Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
Author: Dave Wysochanski <dwysocha@redhat.com>
Add a new constraint that vgname locks must be obtained in
alphabetical order. At this point, we have test coverage for
the 3 commands affected - vgsplit, vgmerge, and vgrename.
Tests have been updated to cover these commands.
Going forward any command or library call that must obtain
more than one vgname lock must do so in alphabetical order.
Future patches will update lvm2app to enforce this ordering.
Author: Dave Wysochanski <dwysocha@redhat.com>
interface it should be using, it can still be overriden with -I.
If corosync isn't running or there is no information then the usual
checking will happen.
This code only builds if corosync is available.
# pvcreate -u udwxr7-BoKY-EeKM-r033-xK6o-4og7-F13sGi /dev/sdc
uuid udwxr7BoKYEeKMr033xK6o4og7F13sGi|��� already in use on "/dev/sdb1"
is now
# pvcreate -u udwxr7-BoKY-EeKM-r033-xK6o-4og7-F13sGi /dev/sdc
uuid udwxr7-BoKY-EeKM-r033-xK6o-4og7-F13sGi already in use on "/dev/sdb1"
Eliminate busy loop during pvcreate of a "normal" partition.
_md_sysfs_attribute_snprintf() would busy loop if the device it was
given was not a blkext-based MD partition.
Rather than being cute with a busy-loop prone 'goto check_md_major' in
_md_sysfs_attribute_snprintf(): explicitly check if the provided device
is a blkext-based partition (blkext_major()); and then check that the
get_primary_dev() determined parent is an MD device (md_major()).
The changes to remove LCK_NONBLOCK from the LVM locks broke clvmd because the
code was clearly wrong but working anyway! The constant was being masked rather
than the variable that was supposed to match against it.
Adds 'data_alignment_detection' config option to the devices section of
lvm.conf. If your kernel provides topology information in sysfs (linux
>= 2.6.31) for the Physical Volume, the start of data area will be
aligned on a multiple of the ’minimum_io_size’ or ’optimal_io_size’
exposed in sysfs.
minimum_io_size is used if optimal_io_size is undefined (0). If both
md_chunk_alignment and data_alignment_detection are enabled the result
of data_alignment_detection is used.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
If the pvcreate --dataalignmentoffset option is not specified the start
of a PV's aligned data area will be shifted by the associated
'alignment_offset' exposed in sysfs (unless
devices/data_alignment_offset_detection is disabled in lvm.conf).
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Documented which use-cases force the reinstatement of the nuanced
handling of pe_start. As soon as orphan PVs are eliminated much of this
will no longer be a concern ('preserve_pe_start' can be reenabled in
.pv_setup).
Added defensive 'if (pv->pe_align)' check in _text_pv_write()'s pe_start
loop.
If pv_setup was given a non-zero pe_start it would short-circuit
establishing a default pv->pe_align. pv->pe_align=0 would result
in a divide by zero in _mda_setup(). 'vgconvert -M2 $vgname' hit this.
.pv_write still properly preserves pe_start if it was supplied.
Adds pe_align_offset to 'struct physical_volume'; is initialized with
set_pe_align_offset(). After pe_start is established pe_align_offset is
added to it.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Without this fix rounding the end of the first mda to a pe_align
boundary could silently exceed the disk_size.
Final 'if (start1 + mda_size1 > disk_size)' block serves as a safety
net.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Document existing pe_start policy.
Fix issue in _text_pv_setup() where existing pe_start case could have
the pv->pe_start set to pv->pe_align even though pe_start shouldn't ever
change.
vgconvert and pvcreate have a facility to preserve the existing start
of the on-disk data extents, known as pe_start.
They indicate this by passing the existing value to the pvsetup function
which must preserve it.
This patch avoids one particular case where the value could get
changed incorrectly now that the alignment settings are configurable.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
We must hold the VG_ORPHAN lock until we commit to disk. Otherwise,
we risk a race condition on vgcreate / vgextend. Reverts the following
commit:
commit 72a41480ba
Author: Dave Wysochanski <dwysocha@redhat.com>
Date: Fri Jul 10 20:09:21 2009 +0000
Move orphan lock obtain/release inside vg_extend().
With this change we now have vgcreate/vgextend liblvm functions.
Note that this changes the lock order of the following functions as the
orphan lock is now obtained first. With our policy of non-blocking
second locks, this should not be a problem.
Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
When converting to the new liblvm functions, the vgcreate code path
changed to create a new vg, then set values. As a result of this
change, and the fact that we give a user a message if they try to
set the same value of a VG attribute (extent_size, alloc_policy, etc),
you'll see these 2 extraneous "is already" messages with vgcreate:
tools/lvm vgcreate vg2 /dev/loop2
Physical extent size of VG vg2 is already 4.00 MB
Volume group allocation policy is already normal
Volume group "vg2" successfully created
Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
Author: Dave Wysochanski <dwysocha@redhat.com>
liblvm unit test case uses the following APIs:
- lvm_create, lvm_destroy
- lvm_vg_create, lvm_vg_extend, lvm_vg_set_extent_size, lvm_vg_write,
lvm_vg_remove, lvm_vg_close
Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
Acked-by: Alasdair G Kergon <agk@redhat.com>
The checks for RESIZEABLE_VG should now be inside the various functions that
have to do such operations.
Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
Acked-by: Alasdair G Kergon <agk@redhat.com>
If there is syntax error in metadata, it now prints messages
like:
Couldn't read 'start_extent' for segment 'extent_count'.
Couldn't read all logical volumes for volume group vg_test.
The segment specification is wrong and confusing.
Patch fixes it by introducing "parent" member in config_node which
points to parent section and config_parent_name function, which
provides pointer to node section name.
Also it adds several LV references where possible.
Sun May 3 11:40:51 CEST 2009 Petr Rockai <me@mornfall.net>
* Convert the straight instances of vg_lock_and_read to new vg_read(_for_update).
Author: Petr Rockai <prockai@redhat.com>
Committer: Dave Wysochanski <dwysocha@redhat.com>
Is an application uses query and set major:minor
to device, it should not fallback to default major by default.
Add new function whoich allows that (and use it in lvm2).
E.g.
# vgscan
Parse error at byte 2360 (line 54): expected a value
Failed to load config file /etc/lvm/lvm.conf
You have a memory leak (not released memory pool):
[0x818c788] library (12 bytes)
...
Several commands calls process_each_vg() and in provided
callback it explicitly recovers VG if inconsistent.
(vgchange, vgconvert, vgscan)
It means that old VG is released and reread but the function
above (process_one_vg) tries to unlock and release old VG.
Patch moves the repair logic into _process_one_vg() function.
It always tries to read vg (even inconsistent) and then decides
what to do according new defined parameter.
Also patch unifies inconsistent error messages.
The only slight change if for vgremove command, where
it now tries to repair VG before it removes if force arg is given.
(It works similar way before, just the order of operation changed).
When mirror convert polling is started (mainly as backgound process,
in lvchange -a y or in lvconvert itself) it tries to read VG
and LV identified by its name.
Unfortunatelly, the VG can have already different LV under the same name,
and various more or less funny things can happen (note that
_finish_lvconvert_mirror suspends the volume for example).
(the typical example is our testing script which continuously recreates
LVs under the same name in the same VG.)
This patch adds optional uuid parameter which helps to properly
select the monitoring object. For lvconvert polling it is set to LV UUID
and both _get_lvconvert_vg and _get_lvconvert_lv uses it to read proper VG/LV.
(In the pvmove case it is NULL, here we poll for physical volume name).
During vgreduce is failed mirror image replaced with error segment,
this segmant type has always area_count == 0.
Current code expects that there is at least one area with device,
patch fixes it by additional check (fixes segfault during vgreduce).
Also do not calculate readahead in every lv_info call, we only need
to cache PV readahead before activation calls which locks memory.
Use lvconvert --repair in dmeventd mirror DSO.
for now.
It replaces bad behaviour in one set of circumstances with bad behaviour
in a different set. We think the behaviour needs to be more configurable.
When we are stacking LV over device, which has for some reason
increased read_ahead (e.g. MD RAID), the read_ahead hint
for libdevmapper is wrong (it is zero).
If the calculated read_ahead hint is zero, patch uses read_ahead of underlying device
(if first segment is PV) when setting DM_READ_AHEAD_MINIMUM_FLAG.
Because we are using dev-cache, it also store this value to cache for future use
(if several LVs are over one PV, BLKRAGET is called only once for underlying device.)
This should fix all the reamining problems with readahead mismatch reported
for DM over MD configurations (and similar cases).
Current code, when need to ensure that volume is not
active on remote node, it need to try to exclusive
activate volume.
Patch adds simple clvmd command which queries all nodes
for lock for given resource.
The lock type is returned in reply in text.
(But code currently uses CR and EX modes only.)
This means two things:
1) Non-mirrored LVs will be no longer affected by mirror monitoring. (Before,
if you had a LV that went partially missing on a VG where a mirror leg failed,
this LV would be removed automatically by dmeventd... Probably not an
unrecoverable dataloss bug, but still quite unpleasant.)
2) If enough parallel PV space is available at the time of the mirror failure,
the failed devices will be automatically replaced using this spare space. Which
(and whether) free space may be used is still not configurable, but is a
planned feature. Since it is relatively easy to undo the action by converting
the mirror manually, I don't consider this to be a showstopper. In fact, I
think the compromise is much better than what we have now.
pvmove now keep suspended devices if temporary mirror creation fails.
We can try to restore previous state if it is first attempt to activate
pvmove (code basically run the same code as --abort automatically).
Currently code uses pv_dev_name() for hash when getting internal
"pvX" name.
This produce corrupted metadata if PVs are missing, pv->dev
is NULL and all these missing devices returns one name
(using "unknown device" for all missing devices as hash key).
We can temporarily violate max_lv during mirror conversion etc.
(If the operation fails, orphan mirror images are visible to administrator
for manual remove for example. Not that this should ever happen:-)
Force limit only for lvcreate (and vg merge) command.
Patch also adds simple max_lv tests into testsuite
The vg->lv_count parameter now includes always number of visible
logical volumes.
Note that virtual snapshot volume (snapshotX) is never visible,
but it is stored in metadata with visible flag.
link_lv_to_vg and unlink_lv_from_vg are the only functions
for adding/removing logical volume from volume group.
Only these function should manipulate with vg->lvs list.
The snapshot segment (snapshotX) is created twice
during the text metadata segment processing.
This can cause temporary violation of max_lv count.
Simplify the code, snapshot segment is properly initialized
in init_snapshot_seg function now and do not need to be replaced
by vg_add_snapshot call.
The vg_add_snapshot() is now usefull only for adding new
snapshot and it shares the same initialization function.
The snapshot name is always generated, name paramater can be
removed from function call.
As a simplification to the tools and further liblvm, this patch pushes
the setting of NON_BLOCKING lock flag inside the lock_vol() call.
The policy we set is if any existing VGs are currently locked, we
set the NON_BLOCKING flag.
Should be no functional change.
The seg variable is temporary variable for list iterator,
code cannot expect that after iteration it remains NULL
(it contains non-NULL pointer here id list is empty).
Patch fixes first_seg function so it now correctly returns NULL
for empty segment list.
Buildsystem support device-mapper only install,
but generic install tagret includes both dm+lvm2.
For distribution which uses separate install_device-mapper,
there is no way how to install lvm2 only
(so after installing lvm2 for packaging purposes
built system must remove installed device-mapper files).
Fix it by allowing lvm2_install target, similarily like
install_cluster for clvmd.
(install = install_device-mapper + install_lvm2)
The dataalign value must always be aligned according
to MDA area.
The currect code checks if calculated value collides with
MDA area but not if the value is so small that it is
located before MDA starts.
Unfortunatelly there can be also MDA in the end of the device.
The patch adds simple check to avoid this miscalculation.
Patch expects that first MDA always starts on <= pagesize boundary
(this is true for all allowed label sector parameters).
Add lvs origin_size field.
Fix linux configure --enable-debug to exclude -O2.
Still a few rough edges, but hopefully usable now:
lvcreate -s vg1 -L 100M --virtualoriginsize 1T
Run backup of metadata on remote nodes in the
same place like local node - when calling backup().
Introduce backup_locally() which calls only
local backup if needed.
Remote backup is now trigerred by LCK_VG_BACKUP flag
combination (special VG lock).
This lock type will call check_current_backup()
(including backup_locally() call) and updates
metadata on all nodes.
(Patch fixes non-functional remote backup,
current call during VG lock never triggers.)
The backup() call store metadata from memory.
But in cluster backup() call performs
remote nodes metadata backup and it reads data from disk.
For metadata backup consistency,
patch moves all backup() calls after vg_commit.
(Moreover, some tools already do that this way.)
- Rename unlock_all to destroy_lvhash,
this function is called in cluster shutdown
unlocks everything and clean up allocated info space.
- Tidy lv_hash_lock use
.
Except adding free(lvi) in lv_has destructror
there is no functional change.
If user requests report attribute from PVSEG type
and PV is orphan (or all devices is set), the report
is empty.
Try for example (when only orphan PV are present)
#pvs
#pvs -o +devices
# pvs /dev/sdb1
PV VG Fmt Attr PSize PFree
/dev/sdb1 lvm2 -- 46.58G 46.58G
# pvs -o +devices /dev/sdb1
(no output)
The problem is caused by empty pv->segments list.
Fix it by providing fake segment here (similar to fake structures
in _pvsegs_sub_single() calls.
# pvs -a -o devices
Volume group name (null) has invalid characters
Skipping volume group (null)
...
_pvsegs_sub_single creates fake vg, we need to check
that pv is real here.
Since now, all code reading volume group is responsible for releasing
the memory allocated by calling vg_release(vg).
(For simplicity of use, vg_releae can be called for vg == NULL,
the same logic like free(NULL)).
Also providing simple macro for unlocking & releasing in one step,
tools usualy uses this approach.
The global memory pool (cmd->mem) should be used only for global
physical volume operations.
This patch have to be applied with all subsequent patches to complete
memory pool per vg logic.
Using separate memory pool has quite bit memory saving impact when
using large VGs, this is mainly needed when we have to use
preallocated and locked memory (and should not overflow from that
memory space).
The all_pvs list, used in vg_read, should make its own private
copy of pv structures, otherwise (when vg will use its own pool)
it will point to released memory pool.
The same applies for get_pvs() call.
Patch adds pv_list copy helper and adds explicit memory pool
parameter into _copy_pv.
(Please note that all these helper functions cannot guarantee that
vg related fields are valid - proper vg read & lock must be used
if it is requested.)
Currently PV commands, which performs full device scan, repeatly
re-reads PVs and scans for all devices.
This behaviour can lead to OOM for large VG.
This patch allows using internal metadata cache for pvs & pvdisplay,
so the commands scan the PVs only once.
(We have to use VG_GLOBAL otherwise cache is invalidated on every
VG unlock in process_single PV call.)
Patch fixes these problems:
- during the snapshot creation process, it needs create 2 LVs,
one is cow, second becomes snapshot.
If the code fails in vg_add_snapshot, code lvcreate will not remove
LV cow volume.
- if max_lv is set and VG contains snapshot, it can happen that
during the activation lv_count is temporarily increased over the limit
and VG metadata are not properly processed
see https://bugzilla.redhat.com/show_bug.cgi?id=490298
- vgcfgrestore alows restore with max_lv set to lower valuer that actual
LV count. This later leads to situation that max_lv is completely ignored.
- vgck doesn't call vg_validate(). It should at least try:-)
Signed-off-by: Milan Broz <mbroz@redhat.com>
The original liblvm.a has been moved to liblvm-internal.a.
We now use liblvm.a for the new application library and build
it inside liblvm directory.
Change dependencies so tools depend on liblvm application library,
and application library depends on liblvm internal.
(for example when CLVMD_CMD_LOCK_VG for is called during vgscan).
If clvmd calls LV lock, it calls
/* clean the pool for another command */
dm_pool_empty(cmd->mem);
to clean up memory pool after command.
Unfortunately, do_refresh_cache() do not call this
and because during it operation it allocates some memory,
the pool increases.
Also do_refresh_cache should use lvm_lock
(it manipulates with lvm internal data).
The same applies for lvm_backup command.
Signed-off-by: Milan Broz <mbroz@redhat.com>
if rlocn not defined (there is no metadata area).
In most cases it fails in validate_name(),
unfortunately there are situatuions, when
validate_name is ok and later code fails with
checksum error.
Reproducer:
# dd if=/dev/zero of=/dev/loop0
# pvcreate --metadatasize 637k /dev/loop0
Physical volume "/dev/loop0" successfully created
# pvs /dev/loop0
/dev/loop0: Checksum error
PV VG Fmt Attr PSize PFree
/dev/loop0 lvm2 -- 1.00M 1.00M
Signed-off-by: Milan Broz <mbroz@redhat.com>
-
Using argv[] list in exec_cmd() to allow more params for external commands.
Fsadm does not allow checking mounted filesystem.
Fsadm no longer accepts 'any other key' as 'no' answer to y/n.
Fsadm improved handling of command line options.
New structure lvm (used as an alias to cmd_context), new type definition lvm_t
for the lvm handle. Added functions lvm_create, lvm_destroy and
lvm_reload_config using the new handle.
Modified test/api/test.c:
Use new lvm.h header file and lvm_t handle.
Removed lib/lvm2.h
Author: Thomas Woerner <twoerner@redhat.com>
This patch is not fully tested and leaves some related bugs unfixed.
Intended behaviour of the code now:
pe_start in the lvm2 format PV label header is set only by pvcreate (or
vgconvert -M2) and then preserved in *all* operations thereafter.
In some specialist cases, after the PV is added to a VG, the pe_start
field in the VG metadata may hold a different value and if so, it
overrides the other one for as long as the PV is in such a VG.
Currently, the field storing the size of the data area in the PV label
header always holds 0. As it only has meaning in the context of a
volume group, it is calculated whenever the PV is added to a VG (and can
be derived from extent_size and pe_count in the VG metadata).
When reporting explicitly label attributes (pv_uuid for example), we do not
need to read metadata.
This patch separate the label fileds and removes scan_vgs_for_pvs
in process_each_pv() if not needed.
(There should be no user visible change in output.)