IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
Adding function _add_partial_replicator_to_dtree() to create
partial tree for Replicator target.
Using dm_tree_node_set_presuspend_node() for Replicator.
Introduce struct cmd_vg to store information about needed
volume group name, vgid, flags and the pointer to opened VG.
Keep VGs list in alphabetical order for locking order.
Introduce functions:
cmd_vg_add() add new cmd_vg entry.
cmd_vg_lookup() search cmd_vgs for vg_name.
cmd_vg_read() open VGs in cmd_vgs list.
cmd_vg_release() close VGs in reversed order.
Add pointer to linked list of opened VGs. List temporarily keeps
the information about needed or locked and opened VGs for replicator target.
Also add cmd_missing_vgs flag information for quick check and
also for possible continuos process_each_lv() usage where we need
to detect whether failure has been caused by missing VG or
some other reason.
Adding configure.in support for Replicators.
Adding basic lib lvm support for Replicators.
Adding flags REPLICATOR and REPLICATOR_LOG.
Adding segments SEG_REPLICATOR and SEG_REPLICATOR_DEV.
Adding basic methods for handling replicator metadata.
Some commands start with a pvname, but we'd like to force users to
start with a vg handle to obtain a pv handle. Our best option seems
to be providing a way to look up the vgname from the pvname, and then
require them to use vg_read/vg_open.
In addition to the pvname lookup function, this patch also provides a
lookup by pvid. The lookup by pvid can be used in conjunction with
lvmcache_get_pvids to process all pvs in the system.
The pvid find function first calls lvmcache_vgname_from_pvid, which may
cause the label to be read if it is not in the cache. If the vgname is
returned is an orphan, we then check to see if there are metadata areas,
and if not, we scan every PV on the system by calling scan_vgs_for_pvs().
In most cases we should not need to do this, and by using the info->mdas
count, we avoid calling pv_read() as prior code did. So this patch is a
bit cleaner and should allow us to refactor more of the pv code.
Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
are active mirrors or snapshots.
We don't have the mechanisms in place to change the device-mapper
tables for those targets that have behavioral differences between
cluster and single machine instances. Allowing users to change
the attribute but not changing the target's behavior can lead to
data corruption.
The following bugs are fixed/avoided by this patch:
235123 - vgchange -c [ny] do not change target types when necessary
289331 - RFE: switching from cluster domain to local domain needs to deactivate volume somehow
289541 - when changing from local to cluster, volumes can not appear to be deactivated
This should avoid various races between dmeventd on multiple nodes
in cluster where one node already repairing device and another
run full scan and locks the device.
the device cache file is dumped both in vgscan and clvmd process.
Unfortunately, clvmd calls lvmcache_label_scan,
it properly destroys persistent filter, but during
persistent_filter_dump it merges old cache content back!
This causes that change in filters is not properly propagated
into device cache after vgscan on cluster.
(Only new devices are added.)
https://bugzilla.redhat.com/show_bug.cgi?id=591861
A shortcut for --ignorelockingfailure, --ignoremonitoring, --poll n options
and LVM_SUPPRESS_LOCKING_FAILURE_MESSAGES environment variable used all at
once in initialisation scripts (e.g. rc.sysinit or initrd).
Target install_dm_plugin installs files to libdir/device-mapper.
Target install_lvm2_plugin installs files to libdir/lvm2.
Both targets creates relative links to libdir to keep the code
compatible with current dlopen handling.
Once we will be able to read plugins from subdir, links
could be removed.
being able to remove more images from a mirror than the
number of PVs directly specified for removal.
The effort to fix bug 581611 corrected a bug that was unnoticed
at the time. The loop in _remove_mirror_images that looks over
the specified PVs was allowing devices that were previously
counted and moved to the end of the list to be double-counted.
This resulted in the number of devices needed for removal always
being satisfied - even if the user did not specify enough PVs
for removal to satisfy the request. When 581611 was fixed, this
double-counting no longer took place and the result was to remove
only the minimum of the number of PVs specified or the number
that was asked to be removed.
By simply always setting 'new_area_count' (as used to be done
only in the else statement), we return to the previous behavior.
Indeed, this is exactly what the double-counting was allowing
to happen before the fix of 581611.
Allow lv_remove_with_dependencies() to know the top-level LV that was
requested to be removed (otherwise it recurses and we lose context).
A merging snapshot cannot be removed directly but the associated origin
can be. Disallow removal of a merging snapshot unless the associated
origin is also being removed.
There's no need for foreign udev rules to touch LVM reserved devices
(snapshot, pvmove, _mlog, _mimage, _vorigin) even if they happen to
be visible. The same applies for /dev/disk content - no need to create
any content for these devices (and so no need to run any "blkid" etc.).
This also prevents setting any inotify "watch" from udev rules on such
devices that is a source of race conditions (the rules need to honor
DM_UDEV_DISABLE_OTHER_RULES_FLAG for this to work though).
We should write metadata into next position in the ring buffer while calling
vgrename and vgcfgrestore. At this code level (_vg_write_raw), we were not able
to determine if this is a rename or not. If yes, then accompanying VG structure
passed here has a new name set, not the old one.
When looking for a location where to put metadata next, we were given a NULL
value because of failed VG name comparison (in _find_vg_rlocn) between the
name in existing metadata and metadata we're just about to write.
This resets the position in the ring buffer, overwriting any existing metadata
(and also incorrectly updates the cache to "orphan" afterwards).
This patch just adds old_name item in struct volume_group that we can check and use
if necessary and detect renames at lower layers as well.
The same applies for vgcfgrestore, but here we're using a special value of
old_name, an empty string, to disable the check with existing metadata totally.
Internally, we used DM names instead of UUIDs while processing event
handlers. This caused problems while trying to vgrename a VG with active LVs
where the names are being changed and so the devices were not found then.
The patch also contains a little bit of refactoring, moving "build_dlid" code
found in dev_manager.c to "build_dm_uuid", now in lvm-string.c (so we have
build_dm_uuid and build_dm_name at one place).
lvm2app needs a link back to the vg in order to use the vg handle for
memory allocations as well as other things. This patch adds the field
to struct physical_volume, and sets pv->vg when reading a vg from disk or
extending a vg by using the helper function previously added,
add_pvl_to_vgs(). Moves and renames are handled with separate code
inside move_pv() and vgmerge(). Add pv->vg check to vg_validate().
A NULL value in pv->vg signifies membership in the orphan VG.
Note though in the case of pv_read() on a device with metadatacopies == 0,
more devices may need to be read for an authoritative answer.
Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
Now that we have library functions to add/delete a pv from the vg->pvs
list, call them from everywhere.
Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
Add a delete function to manage the vg->pvs list.
NOTE: It may be possible to do further cleanup to these add/del functions
by passing a 'pv' as input instead of 'pv_list'. The pv_list is used for
functions which do allocations (lvcreate) while other places in the code
just manage a list of 'pv' (e.g. import functions, vgextend, etc).
Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
Move the increment of vg->pv_count next to the place where we add to
vg->pvs. It looks safe to do this since the only caller of import_pool_vg()
calls import_pool_pvs() immediately afterward, and there is no way
import_pool_vg() can fail (always returns 1). However, if there's a
memory allocation failure inside import_pool_pvs(), we will end up with
a different count in vg->pv_count that with the original code. In any
case, vg->pv_count should be as close to dm_list_size(&vg->pvs) as
possible, as is the case everywhere else in the code.
Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
The dm_list * parameter is unnecessary since we are passing in 'vg'
and the only caller of import_pool_pvs() passes '&vg->pvs' in the
dm_list * parameter. Just use vg->pvs directly in the function.
Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
Patch is inspired by Debian's extra patch.
- removes OWNER & GROUP make vars they are parts of INSTALL command.
- adds INSTALL_PROGRAM for executable, uses $(INSTALL)
- adds INSTALL_DATA for non-executable data, uses ($INSTALL)
- adds INSTALL_WDATA for writable non-executable data, uses ($INSTALL)
- adds configure option --enable-write_install - to support
installatin of writable files used by distribution
- replaces usage of ifeq @LIB_SUFFIX@ with $(LIB_SUFFIX)
- installs .a files from static builds without executable flag
- installs .a files to $(usrlibdir) instead of $(libdir)
- installs all static binaries to $(staticdir)
- create .so links for devel package in $(usrlibdir) instead of
$(libdir)
- makes .so and .so.LIB_VERSION files within builddir
- removes VERSIONED_SHLIB and created versioned LIB_SHARED automagicaly
- install LIB_SHARED via install_lib_shared target
- install plugins via install_lib_shared_plugin target
- prints whole 'install' command during installation instead of less
informative "Installing $(something) $(somewhere)"
- install multiple man pages with one INSTALL command
- use DISTCLEAN_TARGETS instead of creating multiple distclean targets
Usage of VPATH makes troubles when used within $(builddir).
Not only source files are being found through VPATH,
but targets as well. (make --debug=v)
Thus if user builds the code in $(srcdir) and also in some $(builddir)
he gets mangled results as some generated files (i.e. .export.sym)
are 'reused' from $(srcdir) instead of $(builddir).
This patch switches to use vpath were we could explicitly name
suffixes that should be looked via vpath - we must take care,
we do not generate files with these suffixes:
.c, .in, .po, .exported_symbols
A user specifying duplicate paths on the cmdline of vgcreate will
get a message similar to the following:
vgcreate vgtest2 /dev/loop3 /dev/loop5
Found duplicate PV jk1lXsKzwyOKlXq6bhaFFKMQQ06oPgu8: using /dev/loop5 not /dev/loop3
Found duplicate PV jk1lXsKzwyOKlXq6bhaFFKMQQ06oPgu8: using /dev/loop3 not /dev/loop5
Internal error: Duplicate PV id jk1lXs-Kzwy-OKlX-q6bh-aFFK-MQQ0-6oPgu8 detected for /dev/loop3 in vgtest2.
This is caught by vg_validate(), but it would be good to find
this condition earlier in the vgcreate code. add_pv_to_vg()
currently checks by pvname, but does not look for duplcate pvids.
This patch adds the check for duplicate pvids and results in new
error output as follows:
vgcreate vgtest2 /dev/loop3 /dev/loop5
Found duplicate PV jk1lXsKzwyOKlXq6bhaFFKMQQ06oPgu8: using /dev/loop5 not /dev/loop3
Found duplicate PV jk1lXsKzwyOKlXq6bhaFFKMQQ06oPgu8: using /dev/loop3 not /dev/loop5
Physical volume '/dev/loop5 (jk1lXs-Kzwy-OKlX-q6bh-aFFK-MQQ0-6oPgu8)' listed more than once.
Unable to add physical volume '/dev/loop5' to volume group 'vgtest2'.
Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
When moving parts of striped LVs, pvmove wouldn't care about leaving you with
two stripes on the same disk. Now --alloc anywhere is needed for that.
(Tried and gave up on two alternative approaches before the one committed here.)
Small refactor of main places in the code where a pv is added to a
vg into a small function which adds the pv to the list and updates
the vg counts.
Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
Refactor adding to the vg->pvs list and incrementing the count, which
will allow further refactoring. Should be no functional change.
Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
Simple refactor to mov code that updates the vg extent counts from a
single pv's counts close to the code that adds a pv to vg->pvs and
updates vg->pv_count. No functional change.
Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
In add_pv_to_vg(), we should only add the pv to vg->pvs after all
internal checks have passed. The check for vg->extent_count exeeding
maximum was after we added the pv to the list, so this function could
return a state of vg->pvs that did not reflect other parameters such
as vg->pv_count.
Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
Because we have now strong rule for lock ordering:
- VG locks must be taken in alphabetical order
- ORPHAN locks must be the last
vgs_locked() is now not needed.
This fixes problem with orphan locking, e.g.
vgremove VG1 | vgremove VG2
lock(VG1) | lock(VG2)
lock(ORPHAN) | lock(ORPHAN) -> fail, non-blocking
https://bugzilla.redhat.com/show_bug.cgi?id=578413
(More similar places in code.)
Physical segments were still allocated from global
command context mempool.
This leads to very high memory usage when
activating large VG (vgchange).
(Memory usage was about 2G when >3000LVs).
Fix it by properly using vg->vgmem private pool,
so all the memory is released early.
New memory pool parameter is needed here for pv_split_segment
function.
Also fix the same problem in some minor allocations
(vg description, lv segment split).
In addition to previous patch, we really do not need
to search for segment which was just allocated in
split request.
Make pv_split_segment function return newly allocated
(split) segment also.
(So after this patch, there is only one user
of slow find_peg_by_pe).
The function find_peg_by_pe is incredibly inefficient
for Pvs with many segments.
In shiny future there should be binary (or interval) tree
instead of sorted linked list (volunteers?).
Anyway, for now, we can use dirty trick here to optimise this case:
- Allocations are usually applied from the beginning
of PV (we have no alloocation policy which allocates areas
"backwards")
- The only user of find_peg_by_pe is pv_split_segment()
call. In *most* cases it need to split *last* PV segment.
So if we search sorted pv segment list backwards, we
hit the requested segment immediatelly.
This patch applies this tiny change.
(and saves >30% of processing time when >3000LVs segments are on one PV!)
To discourage using this inefficient function from other code,
it is moved to pv_manip.c and used static for now:-)
vg_validate call is an adept to optimisation, it is very
ineeficient and slow.
Anyway, we should call it only before writing data to disk.
The call in lvmcache was just temporary validation,
we realy do not need to revalidate cached metadata
every time.
(Actually, I added that there just to prove that cache works
properly and forgot to remove it.)
Patch removes it from lvmcache completely, this can hit only
internal bug in export function (and this bug must
be detected in any vg_write call anyway before).
The _read_vg uses already hash for PVs to optimise
reading of large VGs and avoiding repeated PV list traversing.
Use the same aproach to speed up parsing VG with many LVs.
Code moves initilization of stats values to _memlock_maps().
For dmeventd we need to use mlockall() - so avoid reading config value
and go with _use_mlockall code path.
Patch assumes dmeventd uses C locales!
Patch needs the call or memlock_inc_daemon() before memlock_inc()
(which is our common use case).
Some minor code cleanup patch for _un/_lock_mem_if_needed().
in clvmd, dmevend, man, tests.
Don't include dependency files for clow and cscope.out targets
Improve dependency tracking for dmeventd and liblvm2cmd sources.
to obtain sources. Create make.tmpl target for
simplier generation of cflow files with the help of
CFLOW_LIST, CFLOW_LIST_TARGET, CFLOW_TARGET.
Still cflow usage is not perfect.
Move daemons/ and lib/ subtargets to their Makefiles so we don't get
double cleanup error during execution of distclean target.
Instead of duplicating clean target inside distclean target,
just use it as a subtarget and avoid add duplicating code.
This check-in enables the 'mirrored' log type. It can be specified
by using the '--mirrorlog' option as follows:
#> lvcreate -m1 --mirrorlog mirrored -L 5G -n lv vg
I've also included a couple updates to the testsuite. These updates
include tests for the new log type, and some fixes to some of the
*lvconvert* tests.
clvmd's do_lock_lv() already properly controls dmeventd monitoring based
on LCK_DMEVENTD_MONITOR_MODE in lock_flags -- though one small fix was
needed for this to work: _lock_for_cluster() must treat
dmeventd_monitor_mode()'s return as a tri-state value.
Also cleanup do_lock_lv() to:
- explicitly init_dmeventd_monitor() based on LCK_DMEVENTD_MONITOR_MODE
- no longer reset init_dmeventd_monitor() to default at the end of
do_lock_lv() -- it is unnecessary
This is the next preparatory step towards better --alloc anywhere
support and is not intended to break anything that currently works so
please report any problems - segfaults, bogus data in the new debug
messages, or if the code now chooses bizarre allocation layouts.
. Add "monitoring" option to "activation" section of lvm.conf
. Have clvmd consult the lvm.conf "activation/monitoring" too.
. Introduce toollib.c:get_activation_monitoring_mode().
. Error out when both --monitor and --ignoremonitoring are provided.
. Add --monitor and --ignoremonitoring support to lvcreate. Update
lvcreate man page accordingly.
. Clarify that '--monitor' controls the start and stop of monitoring in
the {vg,lv}change man pages.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
When we pv_read() a device that has an orphan vgname, we might need to scan
the system to be sure this is true. However, if the PV has mdas, there's
no way possible for it to have an orphan vgname unless it is a true orphan.
Some areas of the code were optimized to take advantage of this fact, while
others were not (we would still do the expensive scan if a device had mdas
but had an orphan VG).
This patch unifies the code so that every place we are operating on such
a PV, we skip the expensive scan if there are mdas.
Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
Acked-by: Petr Rockai <prockai@redhat.com>
Acked-by: Alasdair G Kergon <agk@redhat.com>
If user try to vgcreate or vgextend non-existent VG,
these messages appears:
# vgcreate xxx /dev/xxx
Internal error: Volume Group xxx was not unlocked
Device /dev/xxx not found (or ignored by filtering).
Unable to add physical volume '/dev/xxx' to volume group 'xxx'.
Internal error: Attempt to unlock unlocked VG xxx.
(the same with existing VG and non-existing PV & vgextend)
# vgextend vg_test /dev/xxx
...
It is caused because code tries to "refresh" cache if
md filter is switched on using cache destroy.
But we can change filters and rescan even without this
machinery now, just use refresh_filters
(and reset md filter afterwards).
(Patch also discovers cache alias bug in vgsplit test,
fix it by using better filter line.)
(VDSO on 32bit is VSyscall on 64bit)
It seems it could be locked on 64bit kernels running 32bit binaries,
but it makes troubles on real 32bit machines where mlock() returns
error when trying to lock such map area. (0xffffe000)
Behavior of mlockall() seems to be similar.
This patch adds a new implementation of locking function instead
of mlockall() that may lock way too much memory (>100MB).
New function instead uses mlock() system call and selectively locks
memory areas from /proc/self/maps trying to avoid locking areas
unused during lock-ed state.
Patch also adds struct cmd_context to all memlock() calls to have
access to configuration.
For backward compatibility functionality of mlockall()
is preserved with "activation/use_mlockall" flag.
As a simple check, locking and unlocking counts the amount of memory
and compares whether values are matching.
Modify linking of readline library. Create new substituted varible
READLINE_LIBS - readline library is linked ONLY with tools that really use
it - i.e. lvm. (Static lvm does not use readlin).
Previous behaviour put this library into the variable LIBS and thus
linked it with all created object files of lvm project (i.e. plugins...).
READLINE detection is simplified.
Termcap library is linked in only if readline library doesn't have its own
dependency (i.e. old distributions).
The kernel's blk_stack_limits() function may flag a device as
'misaligned'. If it does the alignment_offset will be -1.
Update set_pe_align_offset() to accommodate this corner case.
lvm2 devices have always UUID set even if imported from lvm1 metadata.
Patch removes name argument from dev_manager_info call and converts
all activation related calls to use query by UUID.
Also it simplifies mknode call (which is the only user on mknodes parameter).
We need to allocate memory for the tag and copy the tag value before we
add it to the list of tags. We could put this inside lvm2app since the
tools keep their memory around until vg_write/vg_commit is called, but
we put it inside the internal library to minimize code in lvm2app.
We need to copy the tag passed in by the caller to ensure the lifetime of
the memory until the {vg|lv} handle is released.
Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
Similar refactoring to vgchange - pull out common parts and put into
library function for reuse. Should be no functional change.
Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
Pull out common code to be called from tools as well as lvm2app.
Leave archive() at tool level so we can use from vgcreate
as well as vgchange. Should be no functional change.
- add stack macro in vgchange
Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
Add a merging snapshot to the deptree, using the "error" target, rather
than avoid adding it entirely. This allows proper cleanup of the -cow
device without having to rename the -cow to use the origin's name as a
prefix.
Move the preloading of the origin LV, after a merge, from
lv_remove_single() to vg_remove_snapshot(). Having vg_remove_snapshot()
preload the origin allows the -cow device to be released so that it can
be removed via deactivate_lv(). lv_remove_single()'s deactivate_lv()
reliably removes the -cow device because the associated snapshot LV,
that is to be removed when a snapshot-merge completes, is always added
to the deptree (and kernel -- via "error" target).
Now when the snapshot LV is removed both the -cow and -real devices
get removed using uuid rather than device name. This paves the way
for us to switch over to info-by-uuid queries.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Internally we store sizes in sectors, but lvm2app exports sizes
in bytes. We could get fancier and allow units configuration but
this fix should do for now.
Fixes rhbz561422.
Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
We unfortunately don't yet _know_, in dev_manager_snapshot_percent(), if
a snapshot-merge target is active (activation is deferred if dev is
open); so we can't short-circuit origin devices based purely on existing
LVM LV attributes.
Set 'fail_if_percent_unsupported' in dev_manager_snapshot_percent() for
a merging origin LV, otherwise passing unsupported LV types to _percent
will lead to a default successful return with percent_range as
PERCENT_100.
For a merging origin, PERCENT_100 will result in a polldaemon that runs
infinitely (because completion is PERCENT_0).
When activating a merging origin it is valid, and expected, to not have
a node in the deptree for both the origin and its merging snapshot. The
_cached_info() caller is only concerned with whether a device is open.
If there isn't a node in the tree the associated device is definitely
not open.
where we should not expose internal VG names/uuids (the ones with "#" prefix )through the
interface. Otherwise, we could end up with library users opening internal VGs which will
initiate locking mechanism that won't be cleaned up properly.
"#orphans_{lvm1, lvm2, pool}" names are treated in a special way, they are truncated first
to "orphans" and this is used as a part of the lock name then (e.g. while calling lvm_vg_open()).
When library user calls lvm_vg_close(), the original name "orphans_{lvm1, lvm2, pool}"
is used directly and therefore no unlock occurs.
We should exclude internal VG names and uuids in the lists provided by lvmcache:
lvmcache_get_vgids() and lvmcache_get_vgnames().
In dev_manager_info 0 means error and 1 info is returned,
not that device exists (that value is part of info struct).
Fix query by uuid only (no name) which returns 0 when device
does not exist.
more descriptive message if locking fails instead of
"Locking type -1 initialisation failed."
Use read-only locking instead of misleading ignorelocking option
in message.
All this seems to do is provide a memory leak so remove it.
The only caller of _alloc_pv() later explicitly sets
pv->vg_name = fmt->orphan_vg_name so clearly this allocation
should be removed. I also saw no where in the code where
strncpy was used to assign pv->vg_name - only direct assignments
and strdup's.
merge completes. This narrows the scope of this "hack" (which still
needs a proper fix within the deptree).
This stops dmeventd from trying to access snapshot devices that were
already removed.
For mirror repair (and similar tasks) it can happen that full
device rescan is issued from clvmd.
Because code can be in the middle of repair (calling suspend)
clvmd should never try to scan suspended devices
(otherwise it causes deadlock).
Also code must not change ignore_suspended_device flag when
doing refresh_filters (called from lvmcache scan code).
'const'. Be consistent with its use (and dev_manager_snapshot_percent()).
Pass 'lv' from dev_manager_snapshot_percent() to _percent() to
_percent_run(). _percent_run() always dereferenced 'lv' (when
initializing segh) even though it may have been NULL (as was the case
until now for dev_manager_snapshot_percent()).
If a "snapshot-origin" LV (snapshot-merge whose merge was deferred
becuase it was open) was passed to _percent_run() it would always return
100%.
Update _percent_run() to NOT return PERCENT_100 et. al. if
->target_percent() wasn't ever called and supplied 'lv' is a merging
origin. A default return of 100% does not work for snapshot-merge.
Also tweak a related lvconvert log_error() to include "Aborting merge."
Eliminate 'merging_snapshot' from 'struct logical_volume' and just use
'snapshot' for origin lv's reference to the merging snapshot; also set
MERGING in the origin lv's status.
If either the origin or snapshot that is to be merged is open the merge
will not start; only the merge metadata will be written. The merge will
start on the next activation of the origin (or via lvchange --refresh)
IFF both the origin and snapshot are closed.
Merge on activate is particularly important if we want to merge over a
mounted filesystem that cannot be unmounted (until next boot) --- for
example root.
snapshots are suspended, new origin is created, snapshots are resumed, new
origin is resumed. So it allocates memory while suspended.
To fix it, move vg_commit after suspend_lv, so that the suspend code will
treat it as precommitted vg and will preload new origin prior to suspend.
NOTE: agk doesn't like this "hack"; need to revisit and fix
This is useful for when the snapshot is still active and merging hasn't
started yet; it shows a merge is pending. Once merging starts the
merging snapshot will be hidden but can still be displayed with 'lvs -a'
Report snapshot origin with merging snapshot as 'O' instead of 'o':
Before merge starts this shows that a merge is pending. While merging
the snapshot will be hidden, 'O' enables a user to see that there is a
snapshot merging.
"snapshot-merge" target based on whether the LV is a merging snapshot.
When activating a snapshot-merge target do not attempt to monitor the
LV for events; the polldaemon will monitor the snapshot as it is
merged.
Allow "snapshot-merge" target's usage to be parsed via standard
"snapshot" methods.
NOTE: follow on fixes to the _percent_run change are still needed
Introduces new libdevmapper function dm_tree_node_add_snapshot_merge_target
Verifies that the kernel (dm-snapshot) provides the 'snapshot-merge'
target.
Activate origin LV as snapshot-merge target. Using snapshot-origin
target would be pointless because the origin contains volatile data
while a merge is in progress.
Because snapshot-merge target is activated in place of the
snapshot-origin target it must be resumed after all other snapshots
(just like snapshot-origin does) --- otherwise small window for data
corruption would exist.
Ideally the merging snapshot would not be activated at all but if it is
to be activated (because snapshot was already active) it _must_ be done
after the snapshot-merge. This insures that DM's snapshot-merge target
will perform exception handover in the proper order (new->resume before
old->resume). DM's snapshot-merge does support handover if the reverse
sequence is used (old->resume before new->resume) but DM will fail to
resume the old snapshot; leaving it suspended.
To insure the proper activation sequence dm_tree_activate_children() was
updated to accommodate an additional 'activation_priority' level. All
regular snapshots are 0, snapshot-merge is 1, and merging snapshot is 2.
Make 'merging_snapshot' pointer that points from the origin to the
segment that represents the merging snapshot.
Import/export 'merging_store' metadata.
Do not allow creating snapshots while another snapshot is merging.
Snapshot created in this state would certainly contain invalid data.
NOTE: patches at the end of this series will remove 'merging_snapshot'
and will introduce helpful wrappers and cleanups.
This spurious 'break' has been here since this code was first committed
in June 2005 and stopped the algorithm behaving as described in the
comment above it and rendered the variable 'already_found_one' useless.
1. Found bug in 'redundant log' implementation that caused
problems when converting a linear that spanned multiple
devices to a mirror (wasn't checking for NULL value of
provided parameter in _alloc_parallel_area)
2. Testsuite was failing to perform tests when 'not' modifier
was used. This allowed a couple issues to slip through.
Added a 'not_sh' modifier that negates tests performed by
functions defined in the shell source file.
3. Was initializing a variable to far down, which cause
previously set value to be overridden. (This was the
result of the collision of the "redundant log" and
lvconvert fix patches.)
Upon successful fork(), _become_daemon() must assert that the locks that
are currently held belong to the parent, not the child. All of the
child's internal state saying 'this process holds a lock' has to be
reset.
A proper lvmcache_locking_reset() should follow later.
It is pretty much the same as reducing the number of
mirror legs, but we just don't delete them afterwards.
The following command line interface is enforced:
prompt> lvconvert --splitmirror <n> -n <name> <VG>/<LV>
where 'n' is the number of images to split off, and
where 'name' is the name of the newly split off logical volume.
If more than one leg is split off, a new mirror will be the
result. The newly split off mirror will have a 'core' log.
Example:
[root@bp-01 LVM2]# !lvs
lvs -a -o name,copy_percent,devices
LV Copy% Devices
lv 100.00 lv_mimage_0(0),lv_mimage_1(0),lv_mimage_2(0),lv_mimage_3(0)
[lv_mimage_0] /dev/sdb1(0)
[lv_mimage_1] /dev/sdc1(0)
[lv_mimage_2] /dev/sdd1(0)
[lv_mimage_3] /dev/sde1(0)
[lv_mlog] /dev/sdi1(0)
[root@bp-01 LVM2]# lvconvert --splitmirrors 2 --name split vg/lv /dev/sd[ce]1
Logical volume lv converted.
[root@bp-01 LVM2]# !lvs
lvs -a -o name,copy_percent,devices
LV Copy% Devices
lv 100.00 lv_mimage_0(0),lv_mimage_2(0)
[lv_mimage_0] /dev/sdb1(0)
[lv_mimage_2] /dev/sdd1(0)
[lv_mlog] /dev/sdi1(0)
split 100.00 split_mimage_0(0),split_mimage_1(0)
[split_mimage_0] /dev/sde1(0)
[split_mimage_1] /dev/sdc1(0)
It can be seen that '--splitmirror <n>' is exactly the same
as '--mirrors -<n>' (note the minus sign), except there is the
additional notion to keep the image being detached from the
mirror instead of just throwing it away.
Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
Version >= 1.8.0 of the DM snapshot target appends metadata sectors used
to a snapshot's status. This patch allows LVM2 to accurately determine
if the snapshot store is empty. Knowing when a snapshot store is empty
is important in the context of snapshot-merge (means merge is complete).
Also update LVM2 to be aware of the possibility for "Merge failed" in
the snapshot-merge target's status.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
the background polldaemon is allowed to start. It can be used
standalone or in conjunction with --refresh or --available y.
Control over when the background polldaemon starts will be particularly
important for snapshot-merge of a root filesystem.
Dracut will be updated to activate all LVs with: --poll n
The lvm2-monitor initscript will start polling with: --poll y
NOTE: Because we currently have no way of knowing if a background
polldaemon is active for a given LV the following limitations exist and
have been deemed acceptable:
1) it is not possible to stop an active polldaemon; so the lvm2-monitor
initscript doesn't stop running polldaemon(s)
2) redundant polldaemon instances will be started for all specified LVs
if vgchange or lvchange are repeatedly used with '--poll y'
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
This patch tries to correctly track changes in lvmcache related to commit/revert.
For vg_commit: if there is cached precommitted metadata, after successfull commit
these metadata must be tracked as committed.
For vg_revert: remote nodes must drop precommitted metadata and its flag in lvmcache.
(N.B. Patch do not touch LV locks here in any way.)
All this machinery is needed to properly solve remote node cache invalidaton which
cause several problems recently observed.
Lock mode is int masked by LCK_TYPE_MASK, always.
Patch also remove uneccessary masking lock flag on sender side,
if masking is needed, it is don on client side already.
- Add drop_precommitted flag to force drop precommitted metadata
- add lvmcache_commit_metadata() which upgrades precommitted metadata in cache
No functional change in this patch - just preparation for following change.
And decode flags in humar readable form in client.
And clean some trailing whitespaces.
No functional change in this patch (only debugging messages changed).
The use_precommitted flag indicates, that we want to use precommitted metadata
(used in suspend call to preload table with precommitted data).
But if there are no such data, committed metadata are read but the cache
still contains that precommitted flag.
(The problem is that later possible drop_metadata call will not invalidate
device in cache.)
The wrong precommitted state is stored in on remote nodes during normal
suspend/resume cycle _without_ vg_write/commit.
Use the PRECOMMITTED status flag here instead (which is always set if using
precommited metadata here).
If renaming snapshot with virtual origin, the origin is renamed too.
But the code must resume LVs in reverse order to properly
pair memlock (in cluster locking).
(The resume of snapshot resumes origin too and later resume
is ignored otherwise.)
When PV device reappears with old metadata, it is
always updated to new version byt atutomatic metadata
repair.
Remove missing flag if device is empty.
If device contains allocated extents, issue warning that
user must remove volumes and re-add this PV before
manipulating with this volume.
This partially solves bug 547842 when one PV (log) is failed,
dmeventd removes that device and later this device reappears and
is wrongly added into VG marked missing.
The memlock_inc() fix is wrong, memlock count is not
propagated to long living process (clvmd) and just
it underflow there.
Also suspend is needed to pre-load precommited metadata
on other nodes (remapping to error taget in this case).
With explicit suspend we generate lock request and code
can update memlock count.
(Infinitely "locked" memory caused that fs_unlock() was not
called properly and on cluster nodes remains
old links in /dev/mapper for not active devices.)
(N.B. failing of suspend call here is not handled as fatal
error - the LV is going to be removed later anyway.)
The new recovery code first tries to repair LV and then removes failed PV
from VG. It means that during operation there can be VG with PV missing,
and vg_read code handles it like not consistent VG.
We already allows returning "inconsistent" commited metadata,
for mirror repair we need this for precommited too.
(The suspend call prepares precommited metadata to inactive table on
other cluster nodes.)
"Inconsistent" here means - correct metadata, just with some metadata areas
not found (obviously on missing or failed PVs).
Patch should not cause any problems, only real change is
removing LCK_LOCAL bit from lock type flag, it is never used there.
(LCK_LOCAL is part arg[1] bits anyway.)
If there is problem deactivate LV and
_init_mirror_log is called with remove_on_failure = 1,
remove the newly created log LV from metadata.
(This can happen if there is active device with the same name
but different UUID.)
The main reason for this "workaround" patch is to
- do not keep _mlog volume in metadata, so user can repeat the action
- print better error message describing the real problem
# lvcreate -m 2 -n lv1 -l 1 --nosync vg_bar
WARNING: New mirror won't be synchronised. Don't read what you didn't write!
/dev/vg_bar/lv1_mlog: not found: device not cleared
Aborting. Failed to wipe mirror log.
Error locking on node bar-01: Input/output error
Unable to deactivate mirror log LV. Manual intervention required.
Failed to create mirror log.
# lvcreate -m 2 -n lv1 -l 1 --nosync vg_bar
WARNING: New mirror won't be synchronised. Don't read what you didn't write!
Aborting. Unable to deactivate mirror log.
Failed to initialise mirror log.
At this point they probably do not matter but going forward they
may - depends on future patches for replicator, etc. I think
these probably got missed because they were 'flags' so I changed
the name to 'status' to be consistent. So the on-disk
things 'flags' and the in structure 'status' (bits).
NOTE: WHATS_NEW already has entry for this in current release.
Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
Acked-by: Mike Snitzer <snitzer@redhat.com>
pvmove suspends all moved LVs + pvmoveX mirrored LV itself.
This suspends even underlying pvmoveX and following explicit
suspend call is just noop.
But in resume the pvmoveX volume is no longer underlying
device for moved LVs, so it performs full resume with memlock
decrease.
Code must call memlock_inc() if suspend is requested, volume
is already suspended and error is not requested.
These are no longer used by anyone. The dm_list defines are all in
libdevmapper.h and libdm/datastruct/list.c contains any function definitions.
There is some code in "old-tests" that still use this but this code is not
being maintained.
Thanks to Zdenek for spotting this.
The physical_volume, volume_group, logical_volume and lv_segment
structures' 'status' member is now uint64_t.
The alignment of these structures was also audited to remove holes. The
movement of some members in 'volume_group' and 'lv_segment' eliminates
holes. The 'physical_volume' structure still has one 4-byte hole after
'pe_size'; the other structures no longer have any holes. Each
structures' size has not changed.
The sysfs filter initialise hash of available devices using
scan of /sys/block. We need to refresh even this hash
when performing full scan otherwise the newly appeared
device could be rejected, because there is no entry
in sysfs filter.
This easily could happen when attaching new device
to cluster node. (Only force refresh of context
in clvmd -R works here now).
Unfortunately consequences of this are much worse,
missing device part on that node is replaced with missing segment
(even when no partial arg is selected) and this directly
lead to data corruption.
See https://bugzilla.redhat.com/show_bug.cgi?id=538515
Simply fix it by refreshing device filters in lvmcache
before performing the full device scan.
(This affects only cluster locking because only cluster
locking module set LCK_PRE_MEMLOCK.)
With currect code you get
# vgchange -a n
Internal error: _memlock_count has dropped below 0.
when using cluster locking.
It is caused by _unlock_memory calls here
if ((flags & (LCK_SCOPE_MASK | LCK_TYPE_MASK)) == LCK_LV_RESUME)
memlock_dec();
Unfortunately it is also (wrongly) called in immediate unlock
(when LCK_HOLD is not set) from lock_vol
(LCK_UNLOCK is misinterpreted as LCK_LV_RESUME).
Avoid this by comparing original flags and provide memlock
code type of operation (suspend/resume).
indented metadata lines.
Macro outnl() is using exported out_newline() instead of direct
call f->fn(), that required the visibility of the internal
struct formatter.
Rename fill_default_pvcreate_params to pvcreate_params_set_defaults.
Rename pvcreate_validate_restore_params to pvcreate_restore_params_validate.
Rename pvcreate_validate_params to pvcreate_params_validate.
Similar to other vg_set_* functions, we create a vg_set_clustered() function
which does a few checks and sets a flag. This is where we check for
any limitations of clusters.
The DRBD uses underlying device so code should prefer top
device if duplicate is found.
Patch also introduce
dev_subsystem_part_major and dev_subsytem_name
functions to easily handle all these replication susbystems
and not hardcode md_major call.
See https://bugzilla.redhat.com/show_bug.cgi?id=530881
for full problem description.
- we have these levels when the udev rules are processed:
10-dm.rules --> [11-dm-<subsystem>.rules] --> [12-dm-permissions.rules] -->
13-dm-disk.rules --> [...all the other foreign rules...] --> 95-dm-notify.rules
- each level can be disabled now by
DM_UDEV_DISABLE_{DM, SUBSYSTEM, DISK, OTHER}_RULES_FLAG
- add DM_UDEV_DISABLE_DM_RULES_FLAG to disable 10-dm.rules
- add DM_UDEV_DISABLE_OTHER_RULES_FLAG to disable all the other (non-dm) rules.
We cutoff these rules by using the 'last_rule', so this one should really be
used with great care and in well-founded situations. We use this for lvm's
hidden and layer devices now.
- add a parameter for add_dev_node, rm_dev_node and rename_dev_node so it's
possible to switch on/off udev checks
- use DM_UDEV_DISABLE_DM_RULES_FLAG and DM_UDEV_DISABLE_SUBSYSTEM_RULES_FLAG
if there's no cookie set and we have resume, remove and rename ioctl.
This could happen when someone uses the libdevmapper that is compiled with
udev_sync but the software does not make use of it. This way we can switch
off the rules and fallback to libdevmapper node creation so there's no
udev/libdevmapper race.
[root@xxxx-01 ~]# lvconvert -m 1 --corelog VG/cmirror
Unable to convert the log of inactive cluster mirror cmirror
I've tried to clean-up the message a little more, so the name
of the mirror stands out more while preserving the sense that
it's not a problem with the specific device, but the fact that
it is inactive that is causing the problem.
New msg:
Unable to convert the log of an inactive cluster mirror, cmirror
Split pvcreate_validate_params into recovery and non-recovery parameters.
This is necessary so we can call the non-recovery validate function from
vgextend / vgcreate. Note in the pvcreate tool case, we must call the
recovery validation function first (see treatment of pe_start and --zero),
and that we add a call to fill_default_pvcreate_params before the validation
functions.
We need defaults for pvcreate_params at a higher level - this will
allow us to use a common function from the tools to take defaults,
then fill in any non-defaults from the commandline.
Future patches will refactor vgcreate/vgextend to call this function
if one or more pvcreate parameters are given on the commandline.
Another refactoring for implicit pvcreate support. We need to get
the pvcreate parameters somehow to the vg_extend routine. Options
seemed to be:
1. Attach the parameters to struct volume_group. I personally
did not like this idea in most cases, though one could make an
agrument why it might be ok at least for some of the parameters
(e.g. metadatacopies).
2. Pass them in to the extend routine. This second route seemed
to be the best approach given the constraints.
Future patches will parse the command line and fill in the actual
values for the pvcreate_single call.
Should be no functional change.
Should be no functional change. If this parameter is set to NULL, just fail
the extend if the device is not already a PV. If non-NULL, try pvcreate_single
before failing. Note that pvcreate_single() handles the log_error in case
of failure so we just return 0 if pvcreate_single() fails.
lv_deactivate now returns always success, because tree deactivation
functions (see dm_tree_deactivate_children) always returns success.
Because code should return failure in lv_deactivate at least,
fix it by checking for device existence after real deactivation call.
(After discussion this was prefered solution to dm tree function rewrite
which affects snapshots and mirrors.)
Add configure --enable-units-compat to set si_unit_consistency off by default.
Use standard output units for 'PE Size' and 'Stripe size' in pv/lvdisplay.
Clean up VG_RESIZEABLE flag by creating vg_is_resizeable().
Update comment - we no longer have ALLOW_RESIZEABLE.
Also use vg_is_exported() in one place missed by earlier patch.
Should be no functional change.
This patch is all just cleanup and no other patch depends on it.
Replace explicit dereference and check with vg_is_exported().
Update a few copyrights and remove unnecessary whitespace.
Should be no functional change.
Of the vgs field vg_attr, a few of the most likely to be used attributes
are clustered, exported, and partial. This patch adds the following 3
functions:
uint64_t lvm_vg_is_clustered(const vg_t vg)
uint64_t lvm_vg_is_exported(const vg_t vg)
uint64_t lvm_vg_is_partial(const vg_t vg)
Now that we've split vg_remove_single into two routines, in the first routine
that only manipulates memory, we move the PVs from the vg->pvs list to the
vg->removed_pvs list. Then later, we iterate through this list to write the
removed PVs to disk, which removes them from the volume group and places them
into the internal ORPHAN VG.
Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
Author: Dave Wysochanski <dwysocha@redhat.com>
Split vg_remove_single into vg_remove_check (mandatory checks before
vgremove) and vg_remove (do actual remove by committing to disk).
In liblvm, we'd like to provide an consistent API that allows multiple
changes in memory, then let lvm_vg_write() control the commit to disk. In
some cases (for example, lvresize calls fsadm) this may not be possible.
However, since we are using an object model and dividing things into small
operations, the most logical model seems to be the lvm_vg_write model, and
handling the special cases as they arrive. So as best as possible
we move towards this end.
A possible optimization would be to consolidate vg_remove (committing)
code with vgreduce code. A second possible optimization is making vgreduce
of the last device equivalent to vgremove. Today, lvm_vg_reduce fails if
vgreduce is called with the last device, but from an object model perspective
we could view this as equivalent to vgremove and allow it. My gut feel is
we do not want to do this though.
Author: Dave Wysochanski <dwysocha@redhat.com>
Later patches should consolidate the vgremove / vgreduce functions but for
now let's clarify what vg_remove actually does by changing the name.
Author: Dave Wysochanski <dwysocha@redhat.com>
Add a new constraint that vgname locks must be obtained in
alphabetical order. At this point, we have test coverage for
the 3 commands affected - vgsplit, vgmerge, and vgrename.
Tests have been updated to cover these commands.
Going forward any command or library call that must obtain
more than one vgname lock must do so in alphabetical order.
Future patches will update lvm2app to enforce this ordering.
Author: Dave Wysochanski <dwysocha@redhat.com>
# pvcreate -u udwxr7-BoKY-EeKM-r033-xK6o-4og7-F13sGi /dev/sdc
uuid udwxr7BoKYEeKMr033xK6o4og7F13sGi|��� already in use on "/dev/sdb1"
is now
# pvcreate -u udwxr7-BoKY-EeKM-r033-xK6o-4og7-F13sGi /dev/sdc
uuid udwxr7-BoKY-EeKM-r033-xK6o-4og7-F13sGi already in use on "/dev/sdb1"
Eliminate busy loop during pvcreate of a "normal" partition.
_md_sysfs_attribute_snprintf() would busy loop if the device it was
given was not a blkext-based MD partition.
Rather than being cute with a busy-loop prone 'goto check_md_major' in
_md_sysfs_attribute_snprintf(): explicitly check if the provided device
is a blkext-based partition (blkext_major()); and then check that the
get_primary_dev() determined parent is an MD device (md_major()).
The device-mapper mirror CTR table has been changing over time. This has
now been corrected to handle the old and new methods for invoking the
'block_on_errors' and 'cluster' features. (The code that does this was
accidentally committed in the previous check-in. This check-in finishes
the job.)
Rename private _primary_dev() to a public get_primary_dev() and reuse it
to allow retrieval of the MD sysfs attributes (raid level, etc) for MD
partitions.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Improve lib/device/device.c:_primary_dev()'s ability to look up the
primary device associated with all partitions; including blkext
(e.g. partitions directly on MD). The same will also work for obscure
sysfs paths; e.g.: paths with mangled names like the HP cciss driver
uses: /sys/block/cciss!c0d0/cciss!c0d0p1/
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Adds 'data_alignment_detection' config option to the devices section of
lvm.conf. If your kernel provides topology information in sysfs (linux
>= 2.6.31) for the Physical Volume, the start of data area will be
aligned on a multiple of the ’minimum_io_size’ or ’optimal_io_size’
exposed in sysfs.
minimum_io_size is used if optimal_io_size is undefined (0). If both
md_chunk_alignment and data_alignment_detection are enabled the result
of data_alignment_detection is used.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
If the pvcreate --dataalignmentoffset option is not specified the start
of a PV's aligned data area will be shifted by the associated
'alignment_offset' exposed in sysfs (unless
devices/data_alignment_offset_detection is disabled in lvm.conf).
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Documented which use-cases force the reinstatement of the nuanced
handling of pe_start. As soon as orphan PVs are eliminated much of this
will no longer be a concern ('preserve_pe_start' can be reenabled in
.pv_setup).
Added defensive 'if (pv->pe_align)' check in _text_pv_write()'s pe_start
loop.
If pv_setup was given a non-zero pe_start it would short-circuit
establishing a default pv->pe_align. pv->pe_align=0 would result
in a divide by zero in _mda_setup(). 'vgconvert -M2 $vgname' hit this.
.pv_write still properly preserves pe_start if it was supplied.
Adds pe_align_offset to 'struct physical_volume'; is initialized with
set_pe_align_offset(). After pe_start is established pe_align_offset is
added to it.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
areas.
This preserved pe_start would quickly be readjusted to follow the first
mda anyway. An example use-case that hit this code path is: running
pvcreate on an already existing PV _without_ a preceeding pvremove.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Without this fix rounding the end of the first mda to a pe_align
boundary could silently exceed the disk_size.
Final 'if (start1 + mda_size1 > disk_size)' block serves as a safety
net.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Document existing pe_start policy.
Fix issue in _text_pv_setup() where existing pe_start case could have
the pv->pe_start set to pv->pe_align even though pe_start shouldn't ever
change.
vgconvert and pvcreate have a facility to preserve the existing start
of the on-disk data extents, known as pe_start.
They indicate this by passing the existing value to the pvsetup function
which must preserve it.
This patch avoids one particular case where the value could get
changed incorrectly now that the alignment settings are configurable.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
For now, a simple way to enforce the read/write semantics is to just save the
open mode of the VG. If the caller uses lvm_vg_create, the mode is write.
The caller using lvm_vg_open can use either read or write to open the VG.
Once we have this, we enforce the permissions on each API call and don't allow
a caller to modify a VG that has not been opened properly.
This may be better combined with the locking mode, but I view that as future
cleanup, past this initial release. The intial release should enforce the
basic object semantics though, as described in the lvm.h file.
Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
Adding the ability to get the seqno is important for an application to
determine if something has changed in a VG. Otherwise, the only way to
know is to open the VG with write permission and hold the handle.
This function behaves a little bit different than vg_reduce_single, because
it allowes to remove even the latest pv. This has been done to be consistent
to lvm_vg_create, which creates an empty vg.
removed_pvs has been added to the volume_group struct. vg_reduce adds remove
pvs to this list to be able to commit the changes for the pvs in lvm_vg_comm
in liblvm2app.
Initialize removed_pvs list in format-specific volume_group constructors.
Ideally, we should have a base constructor here that initializes the general
non-format specific members of struct volume_group. But until then, there
are multiple places to initialize these members. Maybe a better patch would
be a base constructor patch for struct volume_group. That is more work
though.
Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
Signed-off-by: Thomas Woerner <twoerner@redhat.com>
Author: Dave Wysochanski <dwysocha@redhat.com>
The two liblvm functions that return a list of vgnames and vguuids use
cmd->mem to allocate the list. Make it clear to the caller that this
memory will be freed when the LVM handle is freed.
Clean up and clarify the return value of the functions. In the
case of a memory allocation error, add a couple log_errnos to the internal
code, and make it clear that memory allocation returns a NULL pointer.
If there are no VGs in the system, the list returned is an empty list.
Make a note of the fact that currently we return hidden VG names, how
these can be detected (always start with "#"), and that they should
not be used.
Author: Dave Wysochanski <dwysocha@redhat.com>
For liblvm 'get' functions, we should share code with the reporting functions.
This means we need common code to return the values for the fields.
In this patch we refactor a few of the fields needed in liblvm.
Unfortunately, for the simple fields that do derefernces of structure
members (for example, vg_extent_count), we cannot call the common function
from the reporting infrastructure without more refactoring. The reason is
that the dereference of the simple fields is done deep inside the reporting
code (to get the generic "data" pointer), and the display function is a
generic 'size32' function. We can fix these issues later with more
refactoring.
Should be no functional change and the testsuite should cover any possible
regressions. The only fields in the report affected by this patch are:
vg_size, vg_free, and pv_mda_count.
Author: Dave Wysochanski <dwysocha@redhat.com>
After some refactorings, we can now move the bulk of _lvcreate into the
internal library, and we can call from liblvm. In the future, we should
refactor lv_create_single further, probably by segtype, to reduce the
size of struct lvcreate_params. For now this is a reasonable refactor
and allows us to re-use the function from liblvm.
Author: Dave Wysochanski <dwysocha@redhat.com>
The implicit pvcreate require either moving the ORPHAN_VG lock outside
pvcreate_single or somehow having the function know or detect whether
the ORPHAN_VG lock is already held.
Author: Dave Wysochanski <dwysocha@redhat.com>
Passing NULL for pvcreate parameters gives you default parameters for
pvcreate_single.
Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
Author: Dave Wysochanski <dwysocha@redhat.com>
In preparation for implicit pvcreate during vgcreate / vgextend,
move bulk of pvcreate logic inside library.
Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
Author: Dave Wysochanski <dwysocha@redhat.com>
We must hold the VG_ORPHAN lock until we commit to disk. Otherwise,
we risk a race condition on vgcreate / vgextend. Reverts the following
commit:
commit 72a41480ba
Author: Dave Wysochanski <dwysocha@redhat.com>
Date: Fri Jul 10 20:09:21 2009 +0000
Move orphan lock obtain/release inside vg_extend().
With this change we now have vgcreate/vgextend liblvm functions.
Note that this changes the lock order of the following functions as the
orphan lock is now obtained first. With our policy of non-blocking
second locks, this should not be a problem.
Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
These messages are unnecessary in the set functions. We check for this
condition and print a message in the vgchange tool but not the library
functions.
Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
When converting to the new liblvm functions, the vgcreate code path
changed to create a new vg, then set values. As a result of this
change, and the fact that we give a user a message if they try to
set the same value of a VG attribute (extent_size, alloc_policy, etc),
you'll see these 2 extraneous "is already" messages with vgcreate:
tools/lvm vgcreate vg2 /dev/loop2
Physical extent size of VG vg2 is already 4.00 MB
Volume group allocation policy is already normal
Volume group "vg2" successfully created
Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
Author: Dave Wysochanski <dwysocha@redhat.com>
Since we are using errno values, we should use '0' as a default value
which indicates a non-error, rather than defining some made-up default
value that is not defined in errno. If we need to deviate from errno
values, this will most likely indicate a flaw in our design.
Add prototypes for lvm_errno and lvm_errmsg inside lvm.h and provide
a basic description of their function. This fixes a couple compile
warnings.
Author: Dave Wysochanski <dwysocha@redhat.com>
We provide a lock type that behaves like no_locking, but is not
clustered. Moreover, it also forbids any write locks. This magically (and
consistently) prevents use of clustered VGs, or changing local VGs with
--ignorelockingfailure. As a bonus, we can remove the special hacks in a few
places. Of course, people looking for trouble can always set their locking_type
to 0 to override.
Define the 5 main liblvm objects to be the pv, vg, lv, lvseg, and pvseg.
We need handles defined to all these objects in order for liblvm to be
equivalent to the reporting commands pvs, vgs, and lvs.
- move vg_t, lv_t, and pv_t from metadata-exported.h into lvm.h
- move lv_segment and pv_segment forward declarations into lvm.h
- add lvseg_t and pvseg_t to lvm.h
NOTE: We currently have an inconsistency in handle definitions.
lvm_t is defined as a pointer, while these other handles are just
structures. We should pick one scheme and be consistent - perhaps
define all handles as pointers (this is what I've seen elsewhere).
Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
Acked-by: Alasdair G Kergon <agk@redhat.com>
The checks for RESIZEABLE_VG should now be inside the various functions that
have to do such operations.
Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
Acked-by: Alasdair G Kergon <agk@redhat.com>
Remove READ_REQUIRE_RESIZEABLE flag from vgsplit similar to the removal from
vgextend. Move the check inside the functions that actually move pvs from
one vg structure to another. Should be no functional change.
Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
Acked-by: Alasdair G Kergon <agk@redhat.com>
In the future we may export these functions or something like them in liblvm
For now this helps in cleaning up the checks for RESIZEABLE since we can
use the internal library function vg_bad_status_bits.
Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
Acked-by: Alasdair G Kergon <agk@redhat.com>
Move the check for the RESIZEABLE flag inside the vg_extend function.
When we consolidated the vg locking, reading, and status flag checking,
we tied the check for the RESIZEABLE flag to the vg_read() call. The problem
with this is you cannot know what other APIs the application my or may not
call after a vg_read() call. Thus the READ_REQUIRE_RESIZEABLE flag is not
really ideal - ideally we should be checking for this flag on a specific
operation, not inside the vg_read() call. This patch moves one check inside
the library.
Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
Acked-by: Alasdair G Kergon <agk@redhat.com>
xlate64 produces unsigned long long type, but PRIu64 is defined
to accept argument unsigned long type (on 64-bit machines).
On existing machines, both types have the same size, so it works,
but it is still wrong and produces a warning.
Fix it by using a cast to uint64_t --- according to the standard,
PRIu64 argument matches type uint64_t.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
With this change we now have vgcreate/vgextend liblvm functions.
Note that this changes the lock order of the following functions as the
orphan lock is now obtained first. With our policy of non-blocking
second locks, this should not be a problem.
Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
Move the vg orphan lock inside vg_remove_single, now a complete liblvm
function. Note that this changes the order of the locks - originally
VG_ORPHAN was obtained first, then the vgname lock. With the current
policy of non-blocking second locks, this could mean we get a failure
obtaining the orphan lock. In the case of a vg with lvs being removed,
this could result in the lvs being removed but not the vg. Such a
scenario could have happened prior though with a different failure.
Other tools were examined for side-effects, and no major problems
were noted.
Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
Move check for active LVs outside of library function. The vgremove
liblvm function function will fail if there are active LVs. It will
be the application's responsibility to check this condition and remove
the LVs individually before calling vgremove. Note also that we've
duplicated the EXPORTED_VG check in vgremove_single (tools) and
vg_remove_single (library). Duplication seemed the only option here
since we don't want to do the automatic removal of LVs (in the tools)
if the vg is exported, and we still need to protect the library call
from removal if the vg is exported.
We still need to deal with the ORPHAN lock but vg_remove_single is now
very close to our liblvm function.
TODO: Refactor lvremove in a similar way.
Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
If there is syntax error in metadata, it now prints messages
like:
Couldn't read 'start_extent' for segment 'extent_count'.
Couldn't read all logical volumes for volume group vg_test.
The segment specification is wrong and confusing.
Patch fixes it by introducing "parent" member in config_node which
points to parent section and config_parent_name function, which
provides pointer to node section name.
Also it adds several LV references where possible.
vg_t *vg_create(struct cmd_context *cmd, const char *vg_name);
This is the first step towards the API called to create a VG.
Call vg_lock_newname() inside this function. Use _vg_make_handle()
where possible.
Now we have 2 ways to construct a volume group:
1) vg_read: Used when constructing an existing VG from disks
2) vg_create: Used when constructing a new VG
Both of these interfaces obtain a lock, and return a vg_t *.
The usage of _vg_make_handle() inside vg_create() doesn't fit
perfectly but it's ok for now. Needs some cleanup though and I've
noted "FIXME" in the code.
Add the new vg_create() plus vg 'set' functions for non-default
VG parameters in the following tools:
- vgcreate: Fairly straightforward refactoring. We just moved
vg_lock_newname inside vg_create so we check the return via
vg_read_error.
- vgsplit: The refactoring here is a bit more tricky. Originally
we called vg_lock_newname and depending on the error code, we either
read the existing vg or created the new one. Now vg_create()
calls vg_lock_newname, so we first try to create the VG. If this
fails with FAILED_EXIST, we can then do the vg_read. If the
create succeeds, we check the input parameters and set any new
values on the VG.
TODO in future patches:
1. The VG_ORPHAN lock needs some thought. We may want to treat
this as any other VG, and require the application to obtain a handle
and pass it to other API calls (for example, vg_extend). Or,
we may find that hiding the VG_ORPHAN lock inside other APIs is
the way to go. I thought of placing the VG_ORPHAN lock inside
vg_create() and tying it to the vg handle, but was not certain
this was the right approach.
2. Cleanup error paths. Integrate vg_read_error() with vg_create and
vg_read* error codes and/or the new error APIs.
Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
NOTE: vg_set_alloc_policy() returns success if you try to set a value that
is already stored. The behavior of vgchange is the same though - it fails.
There is a fixme noted in the code about this inconsistency, which should
be resolved if possible.
Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
In liblvm, we will reserve the word 'change' to mean an API that
both sets one or more values, and commits to disk. This will be
consistent with the LVM commandline. The existing vg_change_pesize()
function does not commit to disk, but just changes the extent_size
and ensures all internal structures are updated. This logic should
be contained in a function that sets the value.
Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
It would be nice to have one function that does all the validation
and setting of the VG's pesize. However, currently some checks
are in the higher-level function _vgchange_pesize(), and some
checks are in the lower function vg_change_pesize().
This patch moves most of the higher-level checks inside
vg_change_pesize. In one case a failure return code is
changed from ECMD_FAILED to EINVALID_CMD_LINE.
Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
Remove unneeded LOCK_NONBLOCKING from vg_read() API and tools that
use it. We no longer need this flag anywhere since we now automatically
set LCK_NONBLOCK inside lock_vol() if vgs_locked().
For further details, see:
commit d52b3fd3fe
Author: Dave Wysochanski <dwysocha@redhat.com>
Date: Wed May 13 13:02:52 2009 +0000
Remove NON_BLOCKING lock flag from tools and set a policy to auto-set.
As a simplification to the tools and further liblvm, this patch pushes
the setting of NON_BLOCKING lock flag inside the lock_vol() call.
The policy we set is if any existing VGs are currently locked, we
set the NON_BLOCKING flag.
At some point it may make sense to add this flag back if we get an
RFE from a liblvm user, but for now let's keep it as simple as
possible.
Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
Remove READ_CHECK_EXISTENCE and vg_might_exist().
This flag and API is no longer used now that we have a separate
API to check for existence.
Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
Remove unneeded LOCK_KEEP from vg_read() interface.
Update comment to clarify cases where _vg_lock_and_read() may return
with an error but the lock held. Would be nice to make the vg_read()
interface consistent with regards to lock held and error behavior.
Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
Update units_to_bytes() to support (S)ectors: 500 bytes.
- 500 byte (S)ectors is of questionable value but it adds to consistency
if a user happens to use --units S. This seems better than an error.
Updated test/t-covercmd.sh to test --units [hS]
Document the units that can be displayed via --units uniformly.
- (p)etabytes and (e)xabytes were missing in pvs, vgs and lvs man pages.
Made lvreduce man page "... in units of megabytes." consistent (with the
lvextend and lvresize man pages).
Sun May 3 13:06:14 CEST 2009 Petr Rockai <me@mornfall.net>
* Don't segfault in vg_release when vg->cmd is NULL.
Author: Petr Rockai <prockai@redhat.com>
Committer: Dave Wysochanski <dwysocha@redhat.com>
Sun May 3 12:32:30 CEST 2009 Petr Rockai <me@mornfall.net>
* Rework the toollib interface (process_each_*) on top of new vg_read.
Rebased 6/26/09 by Dave W.
- Add skipping message to process_each_lv
- Remove inconsistent_t.
Is an application uses query and set major:minor
to device, it should not fallback to default major by default.
Add new function whoich allows that (and use it in lvm2).
E.g.
# vgscan
Parse error at byte 2360 (line 54): expected a value
Failed to load config file /etc/lvm/lvm.conf
You have a memory leak (not released memory pool):
[0x818c788] library (12 bytes)
...
status flag after reading the volume group. Thus, no need to set the flag
in vg_read() or clear it later before calling _vg_bad_status_bits().
Also, add back in the !lockingfailed() as part of the CLUSTERED flag check.
It's unclear why it was removed when the check was moved from
_vg_bad_status_bits() to inside _vg_lock_and_read().
There was an open question about the last check in the 'if' stmt for
lockingfailed() with a previous patch submitted. However, I would
defer that right now as it is a separate item and this patch should
be no functional change by including the !lockingfailed().
Petr acked this patch on 5/26 I just forgot to check it in.
Acked-by: Petr Rockai <prockai@redhat.com>
Various tools need to check for existence of a VG before doing something
(vgsplit, vgrename, vgcreate). Currently we don't have an interface to
check for existence, but the existence check is part of the vg_read* call(s).
This patch is an attempt to pull out some of that functionality into a
separate function, and hopefully simplify our vg_read interface, and
move those patches along.
vg_lock_newname() is only concerned about checking whether a vg exists in
the system. Unfortunately, we cannot just scan the system, but we must first
obtain a lock. Since we are reserving a vgname, we take a WRITE lock on
the vgname. Once obtained, we scan the system to ensure the name does
not exist. The return codes and behavior is in the function header.
You might think of this function as similar to an open() call with
O_CREAT and O_EXCL flags (returns failure with -EEXIST if file already
exists).
NOTE: I think including the word "lock" in the function name is important,
as it clearly states the function obtains a lock and makes the code more
readable, especially when it comes to cleanup / unlocking. The ultimate
function name is somewhat open for debate though so later we may rename.
Because preload of table for snapshot can produce snapshot
metadata (in kernel cow header) read.
Code should suspend origin first to avoid possible deadlock
when preloading (thus calling snapshot in-kernel constructor)
for origin with suspended cow device.
(fixes previous commit)
Several commands calls process_each_vg() and in provided
callback it explicitly recovers VG if inconsistent.
(vgchange, vgconvert, vgscan)
It means that old VG is released and reread but the function
above (process_one_vg) tries to unlock and release old VG.
Patch moves the repair logic into _process_one_vg() function.
It always tries to read vg (even inconsistent) and then decides
what to do according new defined parameter.
Also patch unifies inconsistent error messages.
The only slight change if for vgremove command, where
it now tries to repair VG before it removes if force arg is given.
(It works similar way before, just the order of operation changed).
When some parts of lvm are built as shared libraries (for example with
--with-snapshots=shared), the 'make' command does not build these parts.
The shared parts are built with 'make install' command.
This bug can be seen if you go to 'lib' subdirectory and type 'make'.
If you type 'make', the shared libraries are not built, if you type
'make all', the shared libraries are built.
The reason for the bug is the line $(SUBDIRS): $(LIB_STATIC)
If make is executed without any arguments, it makes the first target
in the Makefile. If the first target is '$(SUBDIRS): $(LIB_STATIC)',
it only builds static libraries.
This patch moves '$(SUBDIRS): $(LIB_STATIC)' after
include $(top_srcdir)/make.tmpl. make.tmpl contains the 'all' target
as its first target, so 'make' will be equivalent to 'make all' and
shared libraries will be build with 'make' command.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
When mirror convert polling is started (mainly as backgound process,
in lvchange -a y or in lvconvert itself) it tries to read VG
and LV identified by its name.
Unfortunatelly, the VG can have already different LV under the same name,
and various more or less funny things can happen (note that
_finish_lvconvert_mirror suspends the volume for example).
(the typical example is our testing script which continuously recreates
LVs under the same name in the same VG.)
This patch adds optional uuid parameter which helps to properly
select the monitoring object. For lvconvert polling it is set to LV UUID
and both _get_lvconvert_vg and _get_lvconvert_lv uses it to read proper VG/LV.
(In the pvmove case it is NULL, here we poll for physical volume name).
If there is no free area for log, code should break the loop.
(Otherwise it uses uninitializes areas later.)
Easily reproducible using lvconvert --repair
- kill device with log
- run lvconvert --repair vg/lv (with no PV usable for log)
During vgreduce is failed mirror image replaced with error segment,
this segmant type has always area_count == 0.
Current code expects that there is at least one area with device,
patch fixes it by additional check (fixes segfault during vgreduce).
Also do not calculate readahead in every lv_info call, we only need
to cache PV readahead before activation calls which locks memory.
When we are stacking LV over device, which has for some reason
increased read_ahead (e.g. MD RAID), the read_ahead hint
for libdevmapper is wrong (it is zero).
If the calculated read_ahead hint is zero, patch uses read_ahead of underlying device
(if first segment is PV) when setting DM_READ_AHEAD_MINIMUM_FLAG.
Because we are using dev-cache, it also store this value to cache for future use
(if several LVs are over one PV, BLKRAGET is called only once for underlying device.)
This should fix all the reamining problems with readahead mismatch reported
for DM over MD configurations (and similar cases).
Current code, when need to ensure that volume is not
active on remote node, it need to try to exclusive
activate volume.
Patch adds simple clvmd command which queries all nodes
for lock for given resource.
The lock type is returned in reply in text.
(But code currently uses CR and EX modes only.)
Currently code uses pv_dev_name() for hash when getting internal
"pvX" name.
This produce corrupted metadata if PVs are missing, pv->dev
is NULL and all these missing devices returns one name
(using "unknown device" for all missing devices as hash key).
We can temporarily violate max_lv during mirror conversion etc.
(If the operation fails, orphan mirror images are visible to administrator
for manual remove for example. Not that this should ever happen:-)
Force limit only for lvcreate (and vg merge) command.
Patch also adds simple max_lv tests into testsuite
The vg->lv_count parameter now includes always number of visible
logical volumes.
Note that virtual snapshot volume (snapshotX) is never visible,
but it is stored in metadata with visible flag.
link_lv_to_vg and unlink_lv_from_vg are the only functions
for adding/removing logical volume from volume group.
Only these function should manipulate with vg->lvs list.
Later patch initializes lv->vg after the LV structure is prepared,
so pass through cmd context and do not use vg->cmd here.
Also move LV id calculation (which uses lv->vg too).
Also properly free memory pool if operation fails.