With a mirror_log_fault_policy of 'remove' and a mirror_image_fault_policy
of 'allocate', the log type of the mirror volume is converted from
'disk' or 'mirrored' to 'core' when all but one of the mirror legs in a
mirror volume have failed.
Keep new_log_count as the number of valid log devices by using the
log_count variable as a temporary in the first phase of error recovery
in _lvconvert_mirrors_repair().
Signed-off-by: Takahiro Yasui <takahiro.yasui@hds.com>
Reviewed-by: Petr Rockai <prockai@redhat.com>
mirrors, we must also check that the log daemon (cmirrord) is running.
The log module can be auto-loaded, but the daemon cannot be
"auto-started". Failing to check for the daemon produces cryptic
messages that customers have a hard time deciphering. (The system
messages do report that the log daemon is not running, but people
don't seem to find this message easily.)
Here are examples of what is printed when the module is available,
but the log daemon has not been started.
[root@bp-01 LVM2]# lvcreate -m1 -l1 -n lv vg
Shared cluster mirrors are not available.
[root@bp-01 LVM2]# lvcreate -m1 -l1 -n lv vg -v
Setting logging type to disk
Finding volume group "vg"
Archiving volume group "vg" metadata (seqno 3).
Creating logical volume lv
Executing: /sbin/modprobe dm-log-userspace
Cluster mirror log daemon is not running
Shared cluster mirrors are not available.
Creating volume group backup "/etc/lvm/backup/vg" (seqno 4).
The main problem with these bugs was that the newly split
off LV was not being suspended properly. This meant that
the memlock count was not being balanced, the DM devices
were not being renamed, and some DM devices which should
have been removed were not.
I've also renamed some of the variables and added comments
to make things clearer as to what is going on. (I can break
this patch in two if it means easier review.)
Switch dmeventd to use dm_create_lockfile and drop duplicate code.
Allow clvmd pidfile to be configurable.
Switch cmirrord and clvmd to use dm_create_lockfile.
This should reduce confusion when leftover settings that people have
forgotten about cause problems. These messages
should give them a hint of what's really going on.
A previous check-in added logic to handle the case where both images
of a mirrored log failed. It solved the problem by simply removing
the log entirely - leaving the parent mirror with a 'core' log. This
worked for most cases. However, if there was a small delay between
the failures of the two mirrored log devices, the mirror would hang,
LVM would hang, and no additional LVM commands could be issued.
When the first leg of the log fails, it signals the need for repair.
Before 'lvconvert --repair' is run by dmeventd, the second leg fails.
'lvconvert' would see both devices as failed and try to remove the
log entirely. When it came time to suspend the parent mirror to
update the configuration, the suspend would hang because it couldn't
get any I/O through the mirrored log, which was plugged waiting for
corrective action. The solution is to replace the log with an error
target to clear any pending writes before removing it. This allows
the parent mirror to suspend and make the proper changes.
When using a vgmetadatacopies value other than "unmanaged" (0), prompt
the user if the usage of --metadataignore would change the value of
vgmetadatacopies. The main 2 cases are:
1) pvchange --metadataignore
2) vgextend --metadataignore
We leave the prompt check in the tools, and do not change anything
if the user says 'n'.
Examples:
vgextend --metadataignore y vgtest /dev/loop0
Setting metadataignore will override preferred number of copies of VG vgtest metadata.
Are you sure? [y/n]: y
No physical volume label read from /dev/loop0
Physical volume "/dev/loop0" successfully created
Volume group "vgtest" successfully extended
pvchange --metadataignore y /dev/loop3
Setting metadataignore on /dev/loop3 will override preferred number of copies of VG vgtest metadata.
Are you sure? [y/n]: y
WARNING: Changing preferred number of copies of VG vgtest metadata from 3 to 2
Physical volume "/dev/loop3" changed
1 physical volume changed / 0 physical volumes not changed
Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
- If a PV contained empty mdas, the auto-recovery code was not kicking in.
- The 'inconsistent' state was getting lost when metadata was cached so
recovery didn't kick in. But leave the behaviour alone when using
precommitted metadata because of a warning in a confusing FIXME.
In my testing, pvs and vgs didn't repair inconsistent metadata like they
used to do. (How many other tools fail similarly now?)
And there should be no need to cache inconsistent metadata because it is
supposed to get repaired under the protection of a write lock immediately it is
discovered.
This code is in need of a redesign based on first principles.
I still see bugs in this code and this commit is risky.
Rather than attempting to remove all the images of a mirrored
log volume via remove_mirror_images, simply remove the log
if all its devices have failed.
Taka was the first to report that there is still an outstanding
issue with handling this case. I've managed to reproduce it
only very rarely, and am still working on identifying the problem.
Failing to handle the problem rarely is better than not handling
the scenario at all, so I'm checking this in.
Moreover, in the current mirror handling, when activate is called
on a removed but still suspended detached log, this counter drops below zero
and confuses the debug log.
The same region size is used for both the mirror volume and the mirrored
log volume, but when the physical extent size is smaller than the region size,
the size of a mirror leg of the mirrored log is smaller than the region size
and the lvcreate command fails.
This patch adjusts the region size of the mirrored log to the smaller of
the region size and the physical extent size.
[This patch ensures that the region_size of the mirrored log does not
exceed the size of the mirrored log itself, which would violate the
kernel constraint: (region_size <= ti->len).]
Signed-off-by: Takahiro Yasui <takahiro.yasui@hds.com>
Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
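The region-size adjustment described above can be sketched as follows. This is a minimal Python illustration, not LVM2 code: the function name adjusted_log_region_size is hypothetical, and both sizes are assumed to be expressed in the same unit (for example 512-byte sectors).

```python
def adjusted_log_region_size(region_size: int, extent_size: int) -> int:
    """Pick a region size for a mirrored log (hypothetical helper).

    Both arguments must use the same unit. The result is the smaller of
    the two, so the region size does not exceed a one-extent log leg.
    """
    if region_size <= 0 or extent_size <= 0:
        raise ValueError("sizes must be positive")
    return min(region_size, extent_size)
```

When the extent size is the smaller value, it wins; otherwise the region size of the parent mirror is kept unchanged.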
Preload the libc.mo file for localized lvm before taking the memory lock. This
prevents disk access on some error paths in libdm that print localized
errno messages while still in the memory-locked state.
- s/Active clustred VG/clustered VG/ (only LV can be active)
- print only active LVs (not all) in status command
(In the lvdisplay form /dev/vg/lv.)
For now, still use awk (already used in clustered_vgs).
https://bugzilla.redhat.com/show_bug.cgi?id=598495
to 3-way mirror. When conversion operations are performed on
these types of mirrors, log options can be confused/ignored.
In the case of a converting 3-way mirror, we have a top-level
2-way corelog mirror whose legs are 1) a 2-way disk-log mirror
and 2) a linear device. If we wish to convert this 3-way mirror
to a 2-way mirror, the linear device is removed and the extra
top layer is eliminated. If we also wished to convert the disk
log to a core log in the same step, ambiguity creeps in. It is
somewhat obvious what the user wants - a 2-way mirror with a
corelog. However, looking at the top level mirror before
compression, it seems that the mirror already has a core log.
This is why the operation seemed to fail.
This patch simply re-evaluates what mirrored_seg points to after
a compression and then considers the log argument.
This is a fix for bug 599898.
Because execve stops the command loop,
we never receive a response (only a socket close) for clvmd -S,
so waiting for a response here makes no sense.
But if the calling process (clvmd -S) exits too early, the connection
is closed from the client side, clvmd takes this as an error and
never runs the restart code.
Ugly hack(TM).
When using clustered mirrors, we need device nodes to be created during
processing of device tree, not at its end like we normally do (we need to
access the nodes in cmirror prematurely). Therefore we use a new flag called
"immediate_dev_node" stored in deptree's load_properties struct to instruct the
device tree processing code to immediately synchronize with udev and flush all
stacked node operations so the nodes are prepared for use.
For now, the immediate_dev_node is used for clustered mirrors during
processing the dm_tree_preload_children code only. We can add more later if
needed.
linux/kdev_t.h even though it wasn't needed. Strangely, it seems
to be causing problems on various architectures (i686) in the
function daemons/cmirrord/functions.c:disk_status_info()->sprintf.
I'm not sure why this is a problem since none of the macros in
kdev_t.h are used in that code, but it certainly doesn't hurt to
drop an unnecessary header, and doing so seems to fix the problem.
The code mixes up the internal DLM and LVM definitions of lock
modes and flags.
OpenAIS and singlenode locking do not depend on DLM, but
the code currently cannot be compiled without libdlm.h!
LCK_* flags are an LVM abstraction, used throughout the code.
Only the low-level backends (clvmd-cman etc.) should use DLM definitions,
and that code should do all the needed conversions.
Because two DLM flags are used in generic code
(NOQUEUE, CONVERT), we define them in the same way as the lock modes.
(So all the needed binary-compatible flags are in one place, in locking.h.)
(Further code cleaning is still needed, though :-)
- allocate environment dynamically (still missing some limit?)
- try to recover, if destroy failed (do not destroy lvm here) and free memory
- check strdup() return codes
- report failure to log
- do not print NULL in exclusive lock loop
Activate only the first replicator-dev LV, which activates all the other
related LVs from the Replicator. If an error occurs during this activation,
it is not retried for the other heads (giving a less confusing error log).
length(array) is specific to GNU awk and doesn't work in mawk.
Use the return value of the "split" function to get the array size; this is
supported in both gawk and mawk.
This patch fixes the following errors during "make install" when mawk is
installed as the default awk.
mawk: scripts/relpath.awk: line 25: illegal reference to array from
mawk: scripts/relpath.awk: line 25: illegal reference to array to
mawk: scripts/relpath.awk: line 27: illegal reference to array from
mawk: scripts/relpath.awk: line 32: illegal reference to array to
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Adding function _add_partial_replicator_to_dtree() to create
partial tree for Replicator target.
Using dm_tree_node_set_presuspend_node() for Replicator.
As with _process_one_vg(), we need a similar retry loop for
process_each_lv_in_vg(). This patch retries processing of
failed LVs with reopened VGs.
The patch does not add any extra repeated invocations if no
missing VG is found during LV processing.
The patch modifies the behavior of _process_one_vg().
In the first pass, vg_read() collects a sorted list of
additional VGs for the replicator during lock_vol().
If any other VG is needed by the replicator and it is not yet opened,
the next loop iteration is taken with all the collected VGs.
The vg->cmd_missing_vgs flag detects missing VGs.
Introduce struct cmd_vg to store information about needed
volume group name, vgid, flags and the pointer to opened VG.
Keep VGs list in alphabetical order for locking order.
Introduce functions:
cmd_vg_add() add new cmd_vg entry.
cmd_vg_lookup() search cmd_vgs for vg_name.
cmd_vg_read() open VGs in cmd_vgs list.
cmd_vg_release() close VGs in reversed order.
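The list handling described above can be sketched as follows. This is a Python illustration only, not the C implementation: the CmdVg class and the open_order and release_order helpers are hypothetical, and returning the existing entry when a name is added twice is an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CmdVg:
    """Illustrative stand-in for struct cmd_vg."""
    vg_name: str
    vgid: Optional[str] = None
    flags: int = 0
    vg: Optional[object] = None  # stands in for the pointer to the opened VG


def cmd_vg_lookup(cmd_vgs: List[CmdVg], vg_name: str) -> Optional[CmdVg]:
    """Search the list for an entry with the given VG name."""
    for entry in cmd_vgs:
        if entry.vg_name == vg_name:
            return entry
    return None


def cmd_vg_add(cmd_vgs: List[CmdVg], vg_name: str,
               vgid: Optional[str] = None, flags: int = 0) -> CmdVg:
    """Insert a new entry, keeping the list in alphabetical order."""
    existing = cmd_vg_lookup(cmd_vgs, vg_name)
    if existing is not None:  # assumption: duplicates reuse the entry
        return existing
    entry = CmdVg(vg_name, vgid, flags)
    pos = 0
    while pos < len(cmd_vgs) and cmd_vgs[pos].vg_name < vg_name:
        pos += 1
    cmd_vgs.insert(pos, entry)
    return entry


def open_order(cmd_vgs: List[CmdVg]) -> List[str]:
    """Order in which VGs would be opened (alphabetical)."""
    return [entry.vg_name for entry in cmd_vgs]


def release_order(cmd_vgs: List[CmdVg]) -> List[str]:
    """Order in which VGs would be closed (reverse of opening)."""
    return [entry.vg_name for entry in reversed(cmd_vgs)]
```

Keeping the list sorted at insertion time means the open path can walk it front to back and the release path back to front without any further sorting.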
Adding configure.in support for Replicators.
Adding basic lib lvm support for Replicators.
Adding flags REPLICATOR and REPLICATOR_LOG.
Adding segments SEG_REPLICATOR and SEG_REPLICATOR_DEV.
Adding basic methods for handling replicator metadata.
For deactivation of Replicator check in advance that all heads
have open_count == 0. For this presuspend_node is used as all
head nodes are linking this control node.
Introducing dm_tree_node_set_presuspend_node() for presuspending child
node (i.e. replicator control target) before deactivation of parent node
(i.e. replicator-dev target).
This patch presents no functional change to current dtree - only
replicator target currently sets presuspend node for dev nodes.
Patch adds failed_lvnames to the list of parameters for process_each_lv_in_vg().
If the list is not NULL it will be filled with LV names of failing LVs
during function execution.
An application can later iterate again over only the failed LVs.
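The optional failure list can be sketched like this. It is a Python illustration under stated assumptions: process_each_lv and its callback signature are hypothetical and far simpler than process_each_lv_in_vg(); only the idea of collecting failed names for a later retry is taken from the text.

```python
from typing import Callable, Iterable, List, Optional


def process_each_lv(lv_names: Iterable[str],
                    process_single: Callable[[str], bool],
                    failed_lvnames: Optional[List[str]] = None) -> bool:
    """Run process_single on every LV name.

    If failed_lvnames is not None, the names of LVs whose processing
    failed are appended to it. Returns True only if all LVs succeeded.
    """
    ok = True
    for name in lv_names:
        if not process_single(name):
            ok = False
            if failed_lvnames is not None:
                failed_lvnames.append(name)
    return ok
```

A caller can pass the collected list back in as lv_names for a second pass, so only the LVs that failed are processed again.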
are active mirrors or snapshots.
We don't have the mechanisms in place to change the device-mapper
tables for those targets that have behavioral differences between
cluster and single machine instances. Allowing users to change
the attribute but not changing the target's behavior can lead to
data corruption.
The following bugs are fixed/avoided by this patch:
235123 - vgchange -c [ny] do not change target types when necessary
289331 - RFE: switching from cluster domain to local domain needs to deactivate volume somehow
289541 - when changing from local to cluster, volumes can not appear to be deactivated
This should avoid various races between dmeventd instances on multiple nodes
in a cluster, where one node is already repairing a device while another
runs a full scan and locks the device.
the device cache file is dumped both in vgscan and clvmd process.
Unfortunately, when clvmd calls lvmcache_label_scan,
it properly destroys the persistent filter, but during
persistent_filter_dump it merges the old cache content back!
As a result, a change in filters is not properly propagated
into the device cache after vgscan on a cluster.
(Only new devices are added.)
https://bugzilla.redhat.com/show_bug.cgi?id=591861
Use Requires.private: instead of Libs.private:
Use UDEV_PC and SELINUX_PC for Requires.private:
It looks like Requires.private is preferred over Libs.private.
However, the pkg-config documentation is really poor here. Here is
the short outcome:
The difference between Libs.private: and Requires.private: is that in the
latter we specify libselinux instead of -lselinux -lsepol.
For a query like 'pkg-config --libs --static devmapper', we leave it
to the selinux and udev .pc files to resolve the proper libs instead of
hardcoding them into our .pc file, which might give an incorrect answer
- i.e. the dependencies of the libselinux package might change and we would
return a wrong list of linked libraries.
http://bugs.freedesktop.org/show_bug.cgi?id=4738
http://err.no/personal/blog/tech/2008-03-25-18-07_pkg-config,_sonames_and_Requires.private
A shortcut for --ignorelockingfailure, --ignoremonitoring, --poll n options
and LVM_SUPPRESS_LOCKING_FAILURE_MESSAGES environment variable used all at
once in initialisation scripts (e.g. rc.sysinit or initrd).
being able to remove more images from a mirror than the
number of PVs directly specified for removal.
The effort to fix bug 581611 corrected a bug that was unnoticed
at the time. The loop in _remove_mirror_images that looks over
the specified PVs was allowing devices that were previously
counted and moved to the end of the list to be double-counted.
This resulted in the number of devices needed for removal always
being satisfied - even if the user did not specify enough PVs
for removal to satisfy the request. When 581611 was fixed, this
double-counting no longer took place and the result was to remove
only the minimum of the number of PVs specified or the number
that was asked to be removed.
By simply always setting 'new_area_count' (as used to be done
only in the else statement), we return to the previous behavior.
Indeed, this is exactly what the double-counting was allowing
to happen before the fix of 581611.
Allow lv_remove_with_dependencies() to know the top-level LV that was
requested to be removed (otherwise it recurses and we lose context).
A merging snapshot cannot be removed directly but the associated origin
can be. Disallow removal of a merging snapshot unless the associated
origin is also being removed.
There's no need for foreign udev rules to touch LVM reserved devices
(snapshot, pvmove, _mlog, _mimage, _vorigin) even if they happen to
be visible. The same applies for /dev/disk content - no need to create
any content for these devices (and so no need to run any "blkid" etc.).
This also prevents setting any inotify "watch" from udev rules on such
devices that is a source of race conditions (the rules need to honor
DM_UDEV_DISABLE_OTHER_RULES_FLAG for this to work though).
This version number change reflects the memory handling change
for string-based pv/vg/lv attributes.
In addition, when adding support for tags, I forgot to increase
the version number.
Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
We should write metadata into next position in the ring buffer while calling
vgrename and vgcfgrestore. At this code level (_vg_write_raw), we were not able
to determine if this is a rename or not. If yes, then accompanying VG structure
passed here has a new name set, not the old one.
When looking for a location where to put metadata next, we were given a NULL
value because of failed VG name comparison (in _find_vg_rlocn) between the
name in existing metadata and metadata we're just about to write.
This resets the position in the ring buffer, overwriting any existing metadata
(and also incorrectly updates the cache to "orphan" afterwards).
This patch just adds old_name item in struct volume_group that we can check and use
if necessary and detect renames at lower layers as well.
The same applies for vgcfgrestore, but here we're using a special value of
old_name, an empty string, to disable the check with existing metadata totally.
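The name check described above can be sketched as follows. This is a simplified Python illustration, not the code in _find_vg_rlocn: both function names are hypothetical, and using None to mean "old_name not set" is an assumption of the sketch.

```python
from typing import Optional


def expected_on_disk_name(vg_name: str,
                          old_name: Optional[str]) -> Optional[str]:
    """Name the existing on-disk metadata should carry.

    Returns None when the comparison is disabled.
    """
    if old_name is None:
        return vg_name   # ordinary write: the name is unchanged
    if old_name == "":
        return None      # special value: skip the check entirely
    return old_name      # rename: the disk still holds the old name


def can_reuse_ring_position(existing_name: str, vg_name: str,
                            old_name: Optional[str] = None) -> bool:
    """True if the existing metadata matches, so the next write can go
    to the next position in the ring buffer instead of resetting it."""
    expected = expected_on_disk_name(vg_name, old_name)
    return expected is None or existing_name == expected
```

Without old_name, a rename compares the new name against the old on-disk name and fails, which corresponds to the NULL result and the reset of the ring buffer position described in the text.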
Internally, we used DM names instead of UUIDs while processing event
handlers. This caused problems when trying to vgrename a VG with active LVs,
because the names change and the devices could then no longer be found.
The patch also contains a little bit of refactoring, moving "build_dlid" code
found in dev_manager.c to "build_dm_uuid", now in lvm-string.c (so we have
build_dm_uuid and build_dm_name at one place).
Patch is inspired by Debian's extra patch.
- removes the OWNER & GROUP make vars; they are part of the INSTALL command.
- adds INSTALL_PROGRAM for executables, uses $(INSTALL)
- adds INSTALL_DATA for non-executable data, uses $(INSTALL)
- adds INSTALL_WDATA for writable non-executable data, uses $(INSTALL)
- adds configure option --enable-write_install - to support
installation of writable files used by distributions
- replaces usage of ifeq @LIB_SUFFIX@ with $(LIB_SUFFIX)
- installs .a files from static builds without executable flag
- installs .a files to $(usrlibdir) instead of $(libdir)
- installs all static binaries to $(staticdir)
- create .so links for devel package in $(usrlibdir) instead of
$(libdir)
- makes .so and .so.LIB_VERSION files within builddir
- removes VERSIONED_SHLIB and creates versioned LIB_SHARED automagically
- install LIB_SHARED via install_lib_shared target
- install plugins via install_lib_shared_plugin target
- prints whole 'install' command during installation instead of less
informative "Installing $(something) $(somewhere)"
- install multiple man pages with one INSTALL command
- use DISTCLEAN_TARGETS instead of creating multiple distclean targets
Using VPATH causes trouble when used within $(builddir).
Not only source files are found through VPATH,
but targets as well. (make --debug=v)
Thus if a user builds the code in $(srcdir) and also in some $(builddir),
the results get mangled, as some generated files (e.g. .export.sym)
are 'reused' from $(srcdir) instead of $(builddir).
This patch switches to vpath, where we can explicitly name the
suffixes that should be looked up via vpath. We must take care
not to generate files with these suffixes:
.c, .in, .po, .exported_symbols
When moving parts of striped LVs, pvmove wouldn't care about leaving you with
two stripes on the same disk. Now --alloc anywhere is needed for that.
(Tried and gave up on two alternative approaches before the one committed here.)
to check for presence of this module and avoid using --frames
option for genhtml in this case.
Fix arg list for AC_PATH_PROG for lcov and genhtml.
(detecting empty LCOV and GENHTML string in Makefiles).
Because we now have a strict rule for lock ordering:
- VG locks must be taken in alphabetical order
- ORPHAN locks must be the last
vgs_locked() is no longer needed.
This fixes a problem with orphan locking, e.g.
vgremove VG1 | vgremove VG2
lock(VG1) | lock(VG2)
lock(ORPHAN) | lock(ORPHAN) -> fail, non-blocking
https://bugzilla.redhat.com/show_bug.cgi?id=578413
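The ordering rule above can be sketched as follows. This is a Python illustration only: the ORPHAN marker string and both function names are hypothetical and do not reflect how LVM2 names or takes its locks.

```python
from typing import Iterable, List

ORPHAN = "ORPHAN"  # hypothetical marker for the orphan lock


def lock_order(names: Iterable[str]) -> List[str]:
    """Return lock names in the order they should be taken:
    VG locks alphabetically, then the orphan lock last."""
    names = list(names)
    vg_names = sorted(set(n for n in names if n != ORPHAN))
    return vg_names + ([ORPHAN] if ORPHAN in names else [])


def may_lock_next(held: List[str], requested: str) -> bool:
    """True if taking `requested` after the locks in `held`
    (listed in acquisition order) respects the ordering rule."""
    if not held:
        return True
    last = held[-1]
    if last == ORPHAN:
        return False   # nothing may follow the orphan lock
    if requested == ORPHAN:
        return True    # the orphan lock is always last
    return requested > last
```

With every process following the same order, two commands such as the vgremove pair in the example both reach the orphan lock as their final step.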
(More similar places in code.)
Physical segments were still allocated from global
command context mempool.
This leads to very high memory usage when
activating large VG (vgchange).
(Memory usage was about 2G when >3000LVs).
Fix it by properly using vg->vgmem private pool,
so all the memory is released early.
New memory pool parameter is needed here for pv_split_segment
function.
Also fix the same problem in some minor allocations
(vg description, lv segment split).
In addition to previous patch, we really do not need
to search for segment which was just allocated in
split request.
Make pv_split_segment function return newly allocated
(split) segment also.
(So after this patch, there is only one user
of slow find_peg_by_pe).
The function find_peg_by_pe is incredibly inefficient
for PVs with many segments.
In a shiny future there should be a binary (or interval) tree
instead of a sorted linked list (volunteers?).
Anyway, for now, we can use a dirty trick to optimise this case:
- Allocations are usually applied from the beginning
of the PV (we have no allocation policy which allocates areas
"backwards")
- The only user of find_peg_by_pe is the pv_split_segment()
call. In *most* cases it needs to split the *last* PV segment.
So if we search the sorted PV segment list backwards, we
hit the requested segment immediately.
This patch applies this tiny change.
(and saves >30% of processing time when >3000 LV segments are on one PV!)
To discourage other code from using this inefficient function,
it is moved to pv_manip.c and made static for now :-)
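The backwards search can be sketched as follows. This is a Python illustration rather than the C code: the PvSegment class is hypothetical, and the early exit relies on the assumption that the segments are sorted by start extent and do not overlap.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PvSegment:
    """Illustrative PV segment: first physical extent and length."""
    pe: int
    length: int


def find_peg_by_pe(segments: List[PvSegment], pe: int) -> Optional[PvSegment]:
    """Find the segment containing extent `pe`, scanning from the end.

    `segments` must be sorted by `pe` and non-overlapping.
    """
    for peg in reversed(segments):
        if peg.pe <= pe < peg.pe + peg.length:
            return peg
        if peg.pe + peg.length <= pe:
            # Every earlier segment ends even sooner, so stop here.
            break
    return None
```

When the requested extent lies in the last segment, which the text says is the common case, the first loop iteration already returns it.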
The vg_validate call is a candidate for optimisation; it is very
inefficient and slow.
Anyway, we should call it only before writing data to disk.
The call in lvmcache was just a temporary validation;
we really do not need to revalidate cached metadata
every time.
(Actually, I added that there just to prove that the cache works
properly and forgot to remove it.)
The patch removes it from lvmcache completely. It could only catch
an internal bug in the export function (and such a bug must
be detected in any earlier vg_write call anyway).
_read_vg already uses a hash for PVs to optimise
reading of large VGs and avoid repeated PV list traversal.
Use the same approach to speed up parsing of VGs with many LVs.
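The idea of replacing repeated list traversal with a hash lookup can be sketched as follows. This is a generic Python illustration, not the _read_vg code: the dictionary-based LV records and both function names are hypothetical.

```python
from typing import Dict, List, Optional


def find_lv_linear(lvs: List[dict], name: str) -> Optional[dict]:
    """Look up an LV by walking the whole list (slow for many LVs)."""
    for lv in lvs:
        if lv["name"] == name:
            return lv
    return None


def build_lv_hash(lvs: List[dict]) -> Dict[str, dict]:
    """Build a name -> LV table once, so each later lookup is a
    single hash access instead of a list traversal."""
    return {lv["name"]: lv for lv in lvs}
```

The table is built once per VG, and each name reference met while parsing is then resolved with a single lookup such as `build_lv_hash(lvs).get(name)` instead of a scan over all LVs.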
If dmeventd runs with the -d flag, it doesn't fork into the background.
The command kill(getppid(), SIGTERM) attempts to kill the parent dmeventd
process, however, if there is no parent, it kills whatever process spawned
dmeventd. In case of debugging with gdb, the parent is gdb, thus
kill(getppid(), SIGTERM) kills the debugger.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
clvmd does not propagate DMEVENTD_MONITOR_IGNORE.
Update get_activation_monitoring_mode() to check if the VG that the
LV is being activated in is clustered. If so, skip it.
Any get_activation_monitoring_mode() error will cause the associated LV
(or VG) to be skipped during activation. Both vgchange_single() and
lvchange_single(), which call get_activation_monitoring_mode(), are
called by their respective process_each_..() method.
in clvmd, dmeventd, man, tests.
Don't include dependency files for cflow and cscope.out targets.
Improve dependency tracking for dmeventd and liblvm2cmd sources.
This check-in enables the 'mirrored' log type. It can be specified
by using the '--mirrorlog' option as follows:
#> lvcreate -m1 --mirrorlog mirrored -L 5G -n lv vg
I've also included a couple updates to the testsuite. These updates
include tests for the new log type, and some fixes to some of the
*lvconvert* tests.
clvmd's do_lock_lv() already properly controls dmeventd monitoring based
on LCK_DMEVENTD_MONITOR_MODE in lock_flags -- though one small fix was
needed for this to work: _lock_for_cluster() must treat
dmeventd_monitor_mode()'s return as a tri-state value.
Also cleanup do_lock_lv() to:
- explicitly init_dmeventd_monitor() based on LCK_DMEVENTD_MONITOR_MODE
- no longer reset init_dmeventd_monitor() to default at the end of
do_lock_lv() -- it is unnecessary
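Why a tri-state return value cannot be tested like a boolean can be sketched as follows. This is a Python illustration only: the numeric value -1 for DMEVENTD_MONITOR_IGNORE and both function names are assumptions, not taken from the commit.

```python
DMEVENTD_MONITOR_IGNORE = -1  # assumption: the "ignore" state is negative


def describe_monitor_mode(mode: int) -> str:
    """Interpret the monitoring mode as three distinct states."""
    if mode == DMEVENTD_MONITOR_IGNORE:
        return "ignore"
    return "monitor" if mode else "unmonitor"


def naive_is_monitoring(mode: int) -> bool:
    """Boolean-style test that wrongly folds "ignore" into "monitor",
    because any non-zero value is truthy."""
    return bool(mode)
```

Under these assumptions the naive test reports monitoring as enabled for the ignore state, which is the kind of mistake a tri-state check avoids.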