Prevent lvresize from being able to resize internal LVs: mirror legs
(*_mimage_*), mirror log (*_mlog), snapshot placeholder LVs (snapshot*)
and others. Resizing these would lead to unexpected metadata and
sometimes crashes (when growing snapshot*).
This option should be configurable, but for now
do not set it at all.
(lvm2app is used in udisks probers, where it
can cause several nasty races when trying to update
lvmcache during rescan.)
If the user tries to vgcreate or vgextend a non-existent VG,
these messages appear:
# vgcreate xxx /dev/xxx
Internal error: Volume Group xxx was not unlocked
Device /dev/xxx not found (or ignored by filtering).
Unable to add physical volume '/dev/xxx' to volume group 'xxx'.
Internal error: Attempt to unlock unlocked VG xxx.
(the same with existing VG and non-existing PV & vgextend)
# vgextend vg_test /dev/xxx
...
This happens because the code tries to "refresh" the cache, using
cache destroy, if the md filter is switched on.
But we can now change filters and rescan without this
machinery: just use refresh_filters
(and reset the md filter afterwards).
(The patch also uncovers a cache alias bug in the vgsplit test;
fix it by using a better filter line.)
This patch adds a new implementation of the memory locking function to replace
mlockall(), which may lock far too much memory (>100MB).
The new function instead uses the mlock() system call and selectively locks
memory areas listed in /proc/self/maps, trying to avoid locking areas
unused during the locked state.
The patch also adds struct cmd_context to all memlock() calls so they have
access to the configuration.
For backward compatibility, the mlockall() functionality
is preserved with the "activation/use_mlockall" flag.
As a simple check, locking and unlocking each count the amount of memory
and the two values are compared to see whether they match.
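The selective approach described above can be sketched in Python. This is only an illustration under stated assumptions, not the patch's code: the ignore list (`IGNORED_SUBSTRINGS`), the helper names and the sample maps text are all hypothetical, and the sketch only selects and sizes areas rather than calling mlock().

```python
from typing import List, NamedTuple


class Area(NamedTuple):
    start: int
    end: int
    perms: str
    path: str


def parse_maps(text: str) -> List[Area]:
    """Parse text in the /proc/self/maps format into memory areas."""
    areas = []
    for line in text.splitlines():
        fields = line.split(None, 5)
        if len(fields) < 5:
            continue  # skip malformed or empty lines
        start_s, end_s = fields[0].split("-")
        path = fields[5].strip() if len(fields) > 5 else ""
        areas.append(Area(int(start_s, 16), int(end_s, 16), fields[1], path))
    return areas


# Hypothetical ignore list; the real patch may use different rules.
IGNORED_SUBSTRINGS = ("locale", "[vdso]", "[vsyscall]")


def areas_to_lock(areas: List[Area]) -> List[Area]:
    """Keep readable areas whose path is not on the ignore list."""
    selected = []
    for area in areas:
        if "r" not in area.perms:
            continue  # unreadable mapping, nothing useful to lock
        if any(s in area.path for s in IGNORED_SUBSTRINGS):
            continue
        selected.append(area)
    return selected


def total_size(areas: List[Area]) -> int:
    """Sum of area sizes in bytes; usable for the lock/unlock count check."""
    return sum(a.end - a.start for a in areas)


# Made-up sample in the maps format, for illustration only.
SAMPLE_MAPS = """\
00400000-00452000 r-xp 00000000 08:02 173521 /sbin/lvm
00651000-00652000 rw-p 00051000 08:02 173521 /sbin/lvm
7f0a1c000000-7f0a1c021000 ---p 00000000 00:00 0
7f0a20000000-7f0a20600000 r--p 00000000 08:02 5000 /usr/lib/locale/locale-archive
7ffd5c1fe000-7ffd5c200000 r-xp 00000000 00:00 0 [vdso]
"""

selected = areas_to_lock(parse_maps(SAMPLE_MAPS))
locked_bytes = total_size(selected)    # counted while "locking"
unlocked_bytes = total_size(selected)  # counted again while "unlocking"
print(locked_bytes == unlocked_bytes)
```

Counting the same way on both paths is what makes the simple consistency check possible: a mismatch suggests the set of mapped areas changed between lock and unlock.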
For static builds, the dependency on SELinux libs is not handled by 'ar'.
Until a better solution is found, STATIC_LIBS is used for static builds.
Patch updates SELinux detection to use 3rd & 4th parameter for Success/Fail.
Also removes detection of pthread from this check as we know which
version of libdevmapper we are going to link with lvm after merge.
SELinux header check moved to the SELinux test code.
Create new substituted variable PTHREAD_LIBS and link this library
only with tools/libs which really need it - i.e. dmeventd.
Check for libpthread only for builds with clvmd or dmeventd.
Remove variable LIB_PTHREAD.
The kernel's blk_stack_limits() function may flag a device as
'misaligned'. If it does, the alignment_offset will be -1.
Update set_pe_align_offset() to accommodate this corner case.
- increase timeout to 30 secs (at Chrissie's request)
- source both cluster and clvmd for options (like all the other cluster
init scripts)
- add clustered_vgs and _lvs commodity fns
- move rh_status* fns at the top, so they can be reused
- heavily clean up start and stop fns, removing redundant code and unnecessary
loops
- improve output from different operations
- make the init script lsb compliant
- don't force kill of the daemon, send only a TERM signal and then wait
for it to exit
- Resolves rhbz#533247
lvm2 devices always have a UUID set, even if imported from lvm1 metadata.
The patch removes the name argument from the dev_manager_info call and converts
all activation-related calls to query by UUID.
It also simplifies the mknode call (which is the only user of the mknodes parameter).
Add a merging snapshot to the deptree, using the "error" target, rather
than avoid adding it entirely. This allows proper cleanup of the -cow
device without having to rename the -cow to use the origin's name as a
prefix.
Move the preloading of the origin LV, after a merge, from
lv_remove_single() to vg_remove_snapshot(). Having vg_remove_snapshot()
preload the origin allows the -cow device to be released so that it can
be removed via deactivate_lv(). lv_remove_single()'s deactivate_lv()
reliably removes the -cow device because the associated snapshot LV,
that is to be removed when a snapshot-merge completes, is always added
to the deptree (and kernel -- via "error" target).
Now when the snapshot LV is removed both the -cow and -real devices
get removed using uuid rather than device name. This paves the way
for us to switch over to info-by-uuid queries.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
There's a tiny period of time when the _mimage device is visible during
downconversion from mirror to linear. Since it is visible, we need to
create the symlinks, otherwise warning messages will be issued about udev
not creating those symlinks. We have to rely on udev flags completely.
When activating a merging origin it is valid, and expected, to not have
a node in the deptree for both the origin and its merging snapshot. The
_cached_info() caller is only concerned with whether a device is open.
If there isn't a node in the tree the associated device is definitely
not open.
depending on if the mirror has a 'core' or 'disk' log. When there
is a disk log, the new leg is added by stacking a new mirror on
top of the old (one leg is the old mirror and the other leg is the newly
added device). When the log is a 'core' log, the new leg is simply added
to the existing mirror and all the devices are re-synced.
The logic that handles collapsing the stacked 'disk' log mirror was
having the effect of causing 'core' logged mirrors to begin resync'ing
for a second time. I have used the 'CONVERTING' flag to indicate that
a mirror is converting by way of stacking. This is no longer set for
up-converting core logs. The final 'collapse' logic can safely be skipped
for 'core' log mirrors - getting rid of the second resync.
Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
where we should not expose internal VG names/uuids (the ones with "#" prefix) through the
interface. Otherwise, we could end up with library users opening internal VGs which will
initiate locking mechanism that won't be cleaned up properly.
"#orphans_{lvm1, lvm2, pool}" names are treated in a special way, they are truncated first
to "orphans" and this is used as a part of the lock name then (e.g. while calling lvm_vg_open()).
When a library user calls lvm_vg_close(), the original name "orphans_{lvm1, lvm2, pool}"
is used directly and therefore no unlock occurs.
We should exclude internal VG names and uuids in the lists provided by lvmcache:
lvmcache_get_vgids() and lvmcache_get_vgnames().
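A minimal sketch of the exclusion described above, assuming that "internal" simply means a name starting with "#". The helper names are hypothetical and do not come from lvmcache.

```python
from typing import Iterable, List


def is_internal_vg_name(name: str) -> bool:
    """Assumption: internal VG names are exactly those prefixed with '#'."""
    return name.startswith("#")


def visible_vg_names(names: Iterable[str]) -> List[str]:
    """Return only the names that are safe to hand to library users."""
    return [n for n in names if not is_internal_vg_name(n)]


cached = ["vg_test", "#orphans_lvm1", "#orphans_lvm2", "#global", "vg0"]
print(visible_vg_names(cached))
```

Filtering at the point where the lists are produced means no library caller can obtain an internal name to pass to lvm_vg_open() in the first place.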
Allow the number of logical extents to be expressed (for a snapshot) as
a percentage of the total space in the Origin Logical Volume with the
suffix %ORIGIN.
Update the relevant man pages accordingly. Eliminate inconsistencies
between the man pages and tools/commands.h
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
*_safe. This had the effect of segfaulting the log daemon when
converting a mirror from one log type to another.
Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
When activation of the pvmove mirror fails on a cluster, some nodes
may still have succeeded in activating it.
- Explicitly deactivate that mirror to be sure
- properly pair suspend/resume calls to not cause memory lock problems in clvmd
The code cannot simply call _finish_pvmove on a cluster in this situation, because
the changed LVs are suspended twice (causing a memory imbalance) and the temporary
mirror is also activated when it is not expected (and we know that it failed already).
The patch prepares a special function which removes temporary mirror references from
the metadata and then resumes the changed LVs.
In dev_manager_info, a return value of 0 means error and 1 means info was returned,
not that the device exists (that value is part of the info struct).
Fix the query by uuid only (no name), which returns 0 when the device
does not exist.
Support "wait before testing" using '+' in pvmove and lvconvert
interval. Doing so overrides the new default of sleeping after checking
the LV's progress.
Sleeping before checking progress can lead to extraneous polldaemons
being left running. These polldaemons would have otherwise exited had
they checked before sleeping. Checking progress before sleeping helps
work around the subtly unreliable nature of "finished" state checking
in _percent_run.
Update test/t-mirror-names.sh to use '+' when providing its lvconvert
interval.
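The two polling orders can be sketched as below. This is an illustration under assumptions, not the polldaemon code: `parse_interval` and `poll` are hypothetical names, and the sleep function is injected so the ordering can be observed without actually waiting.

```python
from typing import Callable, Tuple


def parse_interval(arg: str) -> Tuple[int, bool]:
    """Return (seconds, wait_before_testing) for an interval such as '15' or '+15'."""
    wait_first = arg.startswith("+")
    seconds = int(arg[1:] if wait_first else arg)
    return seconds, wait_first


def poll(check_finished: Callable[[], bool], interval_arg: str,
         sleep: Callable[[int], None]) -> int:
    """Poll until check_finished() is true; return the number of checks made."""
    seconds, wait_first = parse_interval(interval_arg)
    checks = 0
    while True:
        if wait_first:
            sleep(seconds)  # '+' prefix: sleep before testing
        checks += 1
        if check_finished():
            return checks
        if not wait_first:
            sleep(seconds)  # default: sleep only after an unfinished check


sleeps = []
poll(lambda: True, "15", sleeps.append)
print(sleeps)  # no sleep happens when the first check already succeeds
```

With the default order, a poller whose work is already finished exits at once instead of lingering for a full interval.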
more descriptive message if locking fails instead of
"Locking type -1 initialisation failed."
Use read-only locking instead of misleading ignorelocking option
in message.
During mirror repair (and similar tasks), a full
device rescan can be issued from clvmd.
Because the code can be in the middle of a repair (calling suspend),
clvmd should never try to scan suspended devices
(otherwise it causes a deadlock).
Also, the code must not change the ignore_suspended_device flag when
doing refresh_filters (called from the lvmcache scan code).
bitmap tracking was switched from the e2fsprogs implementation to
the device-mapper implementation (dm_bitset_t). The latter has a
leading uint32_t field designed to hold the number of bits that are
being tracked. The code was not properly handling this change in
all places. Specifically, when getting the bitmap to/from disk.
Endian adjustments will likely need to be made on the accounting
field as well, since bitmaps are passed between machines on
start-up.
Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
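To make the layout issue concrete, here is a rough sketch of a bitset whose first 32-bit word is the bit count, serialized in a fixed byte order. It assumes that only the bitmap words (not the leading count) go to disk and that little-endian is the chosen order; neither detail is stated in the commit message, and the function names are hypothetical.

```python
import struct
from typing import List


def bitset_to_disk(words: List[int]) -> bytes:
    """words[0] is the bit count; the remaining words are the bitmap itself.

    Assumption: only the bitmap words are written, in little-endian order.
    """
    data = words[1:]
    return struct.pack("<%dI" % len(data), *data)


def bitset_from_disk(num_bits: int, raw: bytes) -> List[int]:
    """Rebuild the in-memory form by restoring the leading count word."""
    n_words = (num_bits + 31) // 32
    data = struct.unpack("<%dI" % n_words, raw[: 4 * n_words])
    return [num_bits, *data]


in_memory = [40, 0xFFFFFFFF, 0x000000FF]  # 40 tracked bits, all set
raw = bitset_to_disk(in_memory)
print(bitset_from_disk(40, raw) == in_memory)
```

Treating the whole array as raw bitmap data would shift every bit by 32 positions, which is the kind of mismatch the message describes.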
This spurious 'break' has been here since this code was first committed
in June 2005 and stopped the algorithm behaving as described in the
comment above it and rendered the variable 'already_found_one' useless.
Made .update_metadata optional in 'struct poll_functions' definitions;
eliminated _update_lvconvert_mirror() stub.
Tweak a mirror-specific error message in the generic polldaemon code.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
The logic was that lvconvert repairs volumes, marking
the PV as MISSING, and a following vgreduce --removemissing
removes these missing devices.
Previously dmeventd mirror DSO removed all LV and PV
from VG by simply relying on
vgreduce --removemissing --force.
Now, there are two subsequent calls:
lvconvert --repair --use-policies
vgreduce --removemissing
So the VG is locked twice, opening space for all kinds of races
with other running lvm processes. If the PV reappears
with old metadata on it (so the winner performs autorepair,
if locking the VG for update) the situation is even worse.
The patch simply adds removemissing PV functionality into
lvconvert BUT ONLY if running with --repair and --use-policies,
and removes only those empty missing PVs which are
involved in the repair.
(This combination is expected to run only from dmeventd.)
Version >= 1.8.0 of the DM snapshot target appends metadata sectors used
to a snapshot's status. This patch allows LVM2 to accurately determine
if the snapshot store is empty. Knowing when a snapshot store is empty
is important in the context of snapshot-merge (means merge is complete).
Also update LVM2 to be aware of the possibility for "Merge failed" in
the snapshot-merge target's status.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
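The emptiness test can be sketched as follows. It assumes the status parameters look like `<used>/<total> <metadata>` (with the third field absent on older targets) or are one of the literal strings "Invalid" and "Merge failed"; the function name and return labels are hypothetical.

```python
def snapshot_store_state(status: str) -> str:
    """Classify the parameter part of a snapshot status line."""
    status = status.strip()
    if status in ("Invalid", "Merge failed"):
        return status.lower().replace(" ", "_")
    fields = status.split()
    used, _total = (int(x) for x in fields[0].split("/"))
    if len(fields) < 2:
        # Older target: no metadata sector count, so emptiness is unknown.
        return "unknown"
    metadata = int(fields[1])
    # The store is empty when everything allocated is metadata.
    return "empty" if used == metadata else "in_use"


for line in ("16/204800 16", "4112/204800 32", "16/204800", "Merge failed"):
    print(line, "->", snapshot_store_state(line))
```

Without the metadata count, a nonzero "used" value cannot be told apart from an empty store that only holds its own metadata, which is why the extra field matters for detecting a completed merge.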
the background polldaemon is allowed to start. It can be used
standalone or in conjunction with --refresh or --available y.
Control over when the background polldaemon starts will be particularly
important for snapshot-merge of a root filesystem.
Dracut will be updated to activate all LVs with: --poll n
The lvm2-monitor initscript will start polling with: --poll y
NOTE: Because we currently have no way of knowing if a background
polldaemon is active for a given LV the following limitations exist and
have been deemed acceptable:
1) it is not possible to stop an active polldaemon; so the lvm2-monitor
initscript doesn't stop running polldaemon(s)
2) redundant polldaemon instances will be started for all specified LVs
if vgchange or lvchange are repeatedly used with '--poll y'
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
This patch tries to correctly track changes in lvmcache related to commit/revert.
For vg_commit: if there is cached precommitted metadata, after a successful commit
this metadata must be tracked as committed.
For vg_revert: remote nodes must drop precommitted metadata and its flag in lvmcache.
(N.B. The patch does not touch LV locks here in any way.)
All this machinery is needed to properly solve remote node cache invalidation, which
caused several recently observed problems.
Lock mode is int masked by LCK_TYPE_MASK, always.
The patch also removes unnecessary masking of the lock flag on the sender side;
if masking is needed, it is already done on the client side.
- Add drop_precommitted flag to force drop precommitted metadata
- add lvmcache_commit_metadata() which upgrades precommitted metadata in cache
No functional change in this patch - just preparation for following change.
And decode flags in human-readable form in client.
And clean up some trailing whitespace.
No functional change in this patch (only debugging messages changed).
The use_precommitted flag indicates that we want to use precommitted metadata
(used in the suspend call to preload the table with precommitted data).
But if there is no such data, committed metadata is read but the cache
still contains that precommitted flag.
(The problem is that a possible later drop_metadata call will not invalidate
the device in the cache.)
The wrong precommitted state is stored on remote nodes during a normal
suspend/resume cycle _without_ vg_write/commit.
Use the PRECOMMITTED status flag here instead (which is always set if using
precommitted metadata here).
If renaming snapshot with virtual origin, the origin is renamed too.
But the code must resume LVs in reverse order to properly
pair memlock (in cluster locking).
(The resume of snapshot resumes origin too and later resume
is ignored otherwise.)
When a PV device reappears with old metadata, it is
always updated to the new version by automatic metadata
repair.
Remove the missing flag if the device is empty.
If the device contains allocated extents, issue a warning that
the user must remove volumes and re-add this PV before
manipulating this volume.
This partially solves bug 547842, where one PV (log) fails,
dmeventd removes that device, and later the device reappears and
is wrongly added into the VG marked missing.
The memlock_inc() fix is wrong: the memlock count is not
propagated to the long-lived process (clvmd) and it
just underflows there.
Also, suspend is needed to pre-load precommitted metadata
on other nodes (remapping to the error target in this case).
With an explicit suspend we generate a lock request and the code
can update the memlock count.
(Infinitely "locked" memory meant that fs_unlock() was not
called properly, and old links remained in /dev/mapper
on cluster nodes for inactive devices.)
(N.B. failure of the suspend call here is not handled as a fatal
error - the LV is going to be removed later anyway.)
The new recovery code first tries to repair the LV and then removes the failed PV
from the VG. This means that during the operation there can be a VG with a PV missing,
and the vg_read code handles it as an inconsistent VG.
We already allow returning "inconsistent" committed metadata;
for mirror repair we need this for precommitted metadata too.
(The suspend call prepares precommitted metadata for the inactive table on
other cluster nodes.)
"Inconsistent" here means correct metadata, just with some metadata areas
not found (obviously on missing or failed PVs).
The LV locks make sense only for clustered LVs.
Properly check the cluster flag and never issue a cluster lock here.
There are several places in the code where it is already checked; this
patch adds this check to all calls that need it.
In the previous code the lock behaviour was inconsistent:
for example, the pre/post callback could take a lock even for a local volume,
but the deactivate call did not release this lock and it remained held forever.
The local LV lock request now just lets the underlying activation code run
on the local node, the same process as in local locking.
(Again, this is important for the new mirror repair calls, here for local
mirrors but with cluster locking enabled.)
This is an unnoticed regression from commit 31672ff60e
The pre/post callback always needs to convert the lock; the local node
is going to modify metadata in this case. If the conversion fails,
the call is ignored.
It also fixes a bug when the lock is not yet held: we cannot set LKF_CONVERT
in this case, as it will fail because the lock does not exist.
Note that the automatic conversion is still disabled in the activate
call, so the original fix (reactivation of exclusive LV) should
still be in place.
(The code already does not fail when unlocking a resource that is not locked.)
This is needed in the pre/post lock_lv call, where we can
request the same lock on the local node because of the suspend call.
- do_command and lock_vg expect flags (no change here)
Bug fixes:
- lock_vg should check for NONBLOCK on lock_cmd, flags have this bit masked-out
- do_pre/post_command do not mask the flags at all, so
the code inside is never run! (see following patches, these functions
expect plain command without flags)
If there is a problem deactivating the LV and
_init_mirror_log is called with remove_on_failure = 1,
remove the newly created log LV from metadata.
(This can happen if there is an active device with the same name
but a different UUID.)
The main reasons for this "workaround" patch are to
- not keep the _mlog volume in metadata, so the user can repeat the action
- print a better error message describing the real problem
# lvcreate -m 2 -n lv1 -l 1 --nosync vg_bar
WARNING: New mirror won't be synchronised. Don't read what you didn't write!
/dev/vg_bar/lv1_mlog: not found: device not cleared
Aborting. Failed to wipe mirror log.
Error locking on node bar-01: Input/output error
Unable to deactivate mirror log LV. Manual intervention required.
Failed to create mirror log.
# lvcreate -m 2 -n lv1 -l 1 --nosync vg_bar
WARNING: New mirror won't be synchronised. Don't read what you didn't write!
Aborting. Unable to deactivate mirror log.
Failed to initialise mirror log.
pvmove suspends all moved LVs + the pvmoveX mirrored LV itself.
This suspends even the underlying pvmoveX, and the following explicit
suspend call is just a noop.
But on resume the pvmoveX volume is no longer the underlying
device for the moved LVs, so it performs a full resume with a memlock
decrease.
The code must call memlock_inc() if suspend is requested, the volume
is already suspended and error is not requested.
These are no longer used by anyone. The dm_list defines are all in
libdevmapper.h and libdm/datastruct/list.c contains any function definitions.
There is some code in "old-tests" that still uses this but this code is not
being maintained.
Thanks to Zdenek for spotting this.
The physical_volume, volume_group, logical_volume and lv_segment
structures' 'status' member is now uint64_t.
The alignment of these structures was also audited to remove holes. The
movement of some members in 'volume_group' and 'lv_segment' eliminates
holes. The 'physical_volume' structure still has one 4-byte hole after
'pe_size'; the other structures no longer have any holes. The size of each
structure has not changed.
If the vg_read() returned error, no lock was taken,
so always call vg_release().
Otherwise this can happen because of missing FAILED_*:
# vgchange -a y x --ignorelockingfailure
Volume group "x" not found
Internal error: Attempt to unlock unlocked VG x
The sysfs filter initialises a hash of available devices using
a scan of /sys/block. We need to refresh this hash too
when performing a full scan, otherwise a newly appeared
device could be rejected because there is no entry for it
in the sysfs filter.
This could easily happen when attaching a new device
to a cluster node. (Only a forced refresh of the context
in clvmd -R works here now).
Unfortunately the consequences of this are much worse:
the missing device part on that node is replaced with a missing segment
(even when no partial arg is selected) and this directly
leads to data corruption.
See https://bugzilla.redhat.com/show_bug.cgi?id=538515
Simply fix it by refreshing device filters in lvmcache
before performing the full device scan.
(on one node a storage connection failed):
# vgchange -a y vg_bar ; echo $?
Error locking on node bar-02: Refusing activation of partial LV lv1. Use --partial to override.
1 logical volume(s) in volume group "vg_bar" now active
0
So activation fails on one node, error is correctly printed but
status code is wrong.
This patch fixes the top level (vgchange) to return proper code
(and print # of activated LVs).
(lvchange returns error properly here.)
(This affects only cluster locking because only cluster
locking module set LCK_PRE_MEMLOCK.)
With current code you get
# vgchange -a n
Internal error: _memlock_count has dropped below 0.
when using cluster locking.
It is caused by the _unlock_memory call here
	if ((flags & (LCK_SCOPE_MASK | LCK_TYPE_MASK)) == LCK_LV_RESUME)
		memlock_dec();
Unfortunately it is also (wrongly) called in an immediate unlock
(when LCK_HOLD is not set) from lock_vol
(LCK_UNLOCK is misinterpreted as LCK_LV_RESUME).
Avoid this by comparing the original flags and providing the memlock
code with the type of operation (suspend/resume).
All hidden (not visible) volumes should be activated through
other visible volumes.
(There are already exceptions like snapshot, mirror log and image,
which should be cleaned one day...)
This solves problems for future types of hidden volumes,
which can have special meaning and must not be activated implicitly
(e.g. key store volume).
- fix missing unlocking of VG
lvcreate -l 100%PVS -n lv1 vg_test
Please specify physical volume(s) with %PVS
Internal error: Volume Group vg_test was not unlocked
- if no PVS specified, use all available
Fix segfault if %PVS in lvresize without PVs list.
DRBD uses an underlying device, so the code should prefer the top
device if a duplicate is found.
The patch also introduces the
dev_subsystem_part_major and dev_subsystem_name
functions to easily handle all these replication subsystems
and not hardcode the md_major call.
See https://bugzilla.redhat.com/show_bug.cgi?id=530881
for full problem description.
Option --all is only partially documented currently, so document in all
commands. Also make {pv|vg|lv}{display|s} man pages consistent with help
output. Remove unused 'disk_ARG' parameter. Leave --trustcache out of
the man page output. Update --units argument to show all possible units.
Per discussion on lvm-devel mailing list and part of debian patch set,
don't set defaults for owner and group, since nobody seems to use them, and
still allow override.
Going forward, we would like to allow users to specify the total
number of metadatacopies in a VG rather than on a per-PV basis. In
order to facilitate that, introduce --pvmetadatacopies to replace
--metadatacopies everywhere. We still allow --metadatacopies for
pv commands, but require --pvmetadatacopies for vg commands.
Eventually we will introduce --vgmetadatacopies. Once we do that,
we should either deprecate --metadatacopies or make it a synonym based
on the command (pvmetadatacopies for pv commands, and vgmetadatacopies
for vg commands). The latter option would likely just require a simple
'strncmp' check against cmd->command->name to qualify the merge_synonym
call.
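The command-based synonym idea could look roughly like this. It is a sketch of the proposal only: `resolve_metadatacopies` is a hypothetical helper, and --vgmetadatacopies is the option the message says has not been introduced yet.

```python
def resolve_metadatacopies(command_name: str, option: str) -> str:
    """Map --metadatacopies to a pv- or vg-specific option by command prefix."""
    if option != "--metadatacopies":
        return option
    if command_name.startswith("pv"):
        return "--pvmetadatacopies"
    if command_name.startswith("vg"):
        return "--vgmetadatacopies"  # future option, per the message above
    return option  # other commands: leave the option untouched


print(resolve_metadatacopies("pvcreate", "--metadatacopies"))
print(resolve_metadatacopies("vgcreate", "--metadatacopies"))
```

A prefix comparison on the command name is the Python counterpart of the simple string check against cmd->command->name that the message suggests.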
Update nightly tests to cover the pvmetadatacopies synonym.
Note that this patch is the result of an earlier review comment for
the implicit pvcreate patches. Should apply cleanly on top of the
implicit pvcreate patches (I applied after patch 10/10 in that series).
NOTE: This patch will require --pvmetadatacopies for vgconvert as
--metadatacopies is no longer accepted.
is granted at one mode and an attempt is made to convert it without the LCK_CONVERT
flag set, then it will return errno=EBUSY.
This fixes a pretty bad bug in which an LV could be activated exclusively on
one node and lvchange -ay on another would convert it to shared!
It might break some things in other areas, but I doubt it.
lv_deactivate now always returns success, because the tree deactivation
functions (see dm_tree_deactivate_children) always return success.
Because the code should return failure in lv_deactivate at least,
fix it by checking for device existence after the real deactivation call.
(After discussion this was the preferred solution over rewriting the dm tree function,
which affects snapshots and mirrors.)
Add configure --enable-units-compat to set si_unit_consistency off by default.
Use standard output units for 'PE Size' and 'Stripe size' in pv/lvdisplay.
Clean up VG_RESIZEABLE flag by creating vg_is_resizeable().
Update comment - we no longer have ALLOW_RESIZEABLE.
Also use vg_is_exported() in one place missed by earlier patch.
Should be no functional change.
Remove the checks for vg_read_error() in most of the tools callback
functions and instead make the check in _process_one_vg() more general.
In all but vgcfgbackup, we do not want to proceed if we get any error
from vg_read(). In vgcfgbackup's case, we may proceed if the backup
is to proceed with inconsistent VGs. This is a special case though,
and we mark it with the READ_ALLOW_INCONSISTENT flag passed to
process_each_vg (and subsequently to _process_one_vg).
NOTE: More cleanup is needed in the vg_read_error() path cases.
This patch is a start.
- add DM_UDEV_RULES_VSN to provide a variable to be checked for in the other
rules (e.g. to check that DM rules are actually installed, we can alternate
functionality in the other rules based on this information, also we have
versioning support for the rules)
- set proper sbin path for dmsetup and blkid, /sbin first, then /usr/sbin.
This is necessary for anaconda to work properly.
- add 'last_rule' for cryptsetup's temporary devices (symlinks in /dev/mapper
only)
Now that we've refactored the internal library functions that do the
vg_remove, we can handle the deferred commit of a lvm_vg_remove() inside
lvm_vg_write(). This makes the VG create/remove API more consistent in
terms of disk commits - they now both require an lvm_vg_write() to commit
the create or remove to disk.
Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
Author: Dave Wysochanski <dwysocha@redhat.com>
Add a new constraint that vgname locks must be obtained in
alphabetical order. At this point, we have test coverage for
the 3 commands affected - vgsplit, vgmerge, and vgrename.
Tests have been updated to cover these commands.
Going forward any command or library call that must obtain
more than one vgname lock must do so in alphabetical order.
Future patches will update lvm2app to enforce this ordering.
Author: Dave Wysochanski <dwysocha@redhat.com>
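The ordering constraint can be illustrated with a short sketch. It assumes "alphabetical" means plain string comparison (as strcmp would give for ASCII names); both helper names are hypothetical and not part of lvm2app.

```python
from typing import Iterable, List


def lock_order(vg_names: Iterable[str]) -> List[str]:
    """Order in which several vgname locks should be taken."""
    return sorted(set(vg_names))


def check_lock_order(already_locked: Iterable[str], new_name: str) -> bool:
    """True if taking new_name after the held locks keeps alphabetical order."""
    return all(held < new_name for held in already_locked)


# e.g. a command operating on two VGs takes both locks in sorted order,
# regardless of the order given on the command line
print(lock_order(["vg_b", "vg_a"]))
print(check_lock_order(["vg_b"], "vg_a"))
```

If every process takes its locks in the same global order, two commands working on the same pair of VGs cannot each hold one lock while waiting for the other.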
interface it should be using, it can still be overridden with -I.
If corosync isn't running or there is no information then the usual
checking will happen.
This code only builds if corosync is available.
# pvcreate -u udwxr7-BoKY-EeKM-r033-xK6o-4og7-F13sGi /dev/sdc
uuid udwxr7BoKYEeKMr033xK6o4og7F13sGi|��� already in use on "/dev/sdb1"
is now
# pvcreate -u udwxr7-BoKY-EeKM-r033-xK6o-4og7-F13sGi /dev/sdc
uuid udwxr7-BoKY-EeKM-r033-xK6o-4og7-F13sGi already in use on "/dev/sdb1"
Eliminate busy loop during pvcreate of a "normal" partition.
_md_sysfs_attribute_snprintf() would busy loop if the device it was
given was not a blkext-based MD partition.
Rather than being cute with a busy-loop prone 'goto check_md_major' in
_md_sysfs_attribute_snprintf(): explicitly check if the provided device
is a blkext-based partition (blkext_major()); and then check that the
get_primary_dev() determined parent is an MD device (md_major()).
The changes to remove LCK_NONBLOCK from the LVM locks broke clvmd because the
code was clearly wrong but working anyway! The constant was being masked rather
than the variable that was supposed to match against it.
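The class of bug described above can be reproduced in a few lines. The flag values and function names here are hypothetical stand-ins, not the real LVM definitions; only the pattern (mask applied to the wrong operand) is the point.

```python
LCK_NONBLOCK = 0x10  # hypothetical bit value, for illustration only
LCK_EXCL = 0x05      # hypothetical lock type value, for illustration only


def matches_buggy(flags: int, wanted: int) -> bool:
    """Bug pattern: the constant is masked, the incoming variable is not."""
    return flags == (wanted & ~LCK_NONBLOCK)


def matches_fixed(flags: int, wanted: int) -> bool:
    """Fix: strip the NONBLOCK bit from the variable before comparing."""
    return (flags & ~LCK_NONBLOCK) == wanted


request = LCK_EXCL | LCK_NONBLOCK
print(matches_buggy(LCK_EXCL, LCK_EXCL))  # still "works" when the bit is absent
print(matches_buggy(request, LCK_EXCL))   # fails once the bit is present
print(matches_fixed(request, LCK_EXCL))
```

The buggy form gives the right answer as long as the incoming value never carries the extra bit, which is how code like this can be wrong yet appear to work.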