IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
- allocate environment dynamically (still missing some limit?)
- try to recover, if destroy failed (do not destroy lvm here) and free memory
- check strdup() return codes
- report failure to log
- do not print NULL in exclusive lock loop
A kernel patch is on its way for 2.6.35 adding support for dm-mod module
autoload. Udev v155 and higher is able to read static node information given
in modules.devname (extracted by depmod before) and will create such nodes
at its start. The first access to such node will load the module automatically
(directly in kernel) before the actual read/write operation is processed.
Activate only the first replicator-dev LV, that activates all other
related LVs from Replicator. In case of error during this activation,
it will not retry again for other heads (less confusing error log).
length(array) is specific to GNU awk and doesn't work in mawk.
Use a return value of "split" function to indicate array size, this is
supported in both gawk and mawk.
This patch fixes the following errors during "make install" when mawk is
installed as a default awk.
mawk: scripts/relpath.awk: line 25: illegal reference to array from
mawk: scripts/relpath.awk: line 25: illegal reference to array to
mawk: scripts/relpath.awk: line 27: illegal reference to array from
mawk: scripts/relpath.awk: line 32: illegal reference to array to
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Adding function _add_partial_replicator_to_dtree() to create
partial tree for Replicator target.
Using dm_tree_node_set_presuspend_node() for Replicator.
As for _process_one_vg() we need similar retry loop for
process_each_lv_in_vg(). This patch retries to process
failed LVs with reopened VGs.
Patch does not add any extra repeated invocations if there is not
found any missing VG during LV processing.
Patch modifes behavior of _process_one_vg().
In the first pass vg_read() collectis for replicator sorted list of
additional VGs during lock_vol().
If any other VG is needed by the replicator and it is not yet opened
then next iteration loop is taken with all collected VGs.
Flag vg->cmd_missing_vgs detects missing VGs.
Introduce struct cmd_vg to store information about needed
volume group name, vgid, flags and the pointer to opened VG.
Keep VGs list in alphabetical order for locking order.
Introduce functions:
cmd_vg_add() add new cmd_vg entry.
cmd_vg_lookup() search cmd_vgs for vg_name.
cmd_vg_read() open VGs in cmd_vgs list.
cmd_vg_release() close VGs in reversed order.
Add pointer to linked list of opened VGs. List temporarily keeps
the information about needed or locked and opened VGs for replicator target.
Also add cmd_missing_vgs flag information for quick check and
also for possible continuos process_each_lv() usage where we need
to detect whether failure has been caused by missing VG or
some other reason.
Adding configure.in support for Replicators.
Adding basic lib lvm support for Replicators.
Adding flags REPLICATOR and REPLICATOR_LOG.
Adding segments SEG_REPLICATOR and SEG_REPLICATOR_DEV.
Adding basic methods for handling replicator metadata.
For deactivation of Replicator check in advance that all heads
have open_count == 0. For this presuspend_node is used as all
head nodes are linking this control node.
Introducing dm_tree_node_set_presuspend_node() for presuspending child
node (i.e. replicator control target) before deactivation of parent node
(i.e. replicator-dev target).
This patch presents no functional change to current dtree - only
replicator target currently sets presuspend node for dev nodes.
Patch adds failed_lvnames to the list of parameters for process_each_lv_in_vg().
If the list is not NULL it will be filled with LV names of failing LVs
during function execution.
Application could later reiterate only on failed LVs.
lvm2app forces applications to start with a volume group name,
open the volume group, then operate on individual pvs. In some
cases the application may want to start with a device name rather
than the volume group name. Today, if an application wants to
do this, it must iterate through all the volume groups to find
the volume group that the specific device is attached to.
These new interfaces allow the application to avoid such overhead.
Bump the lvm2app version number to 3.
Earlier patches added some infrastructure to lookup a vgname from
a pvname. We now can cleanup some of the pvchange and other code
by requiring callers that want to modify some pv property:
1) lookup the vgname by the pvname
2) use the vgname to obtain a vg handle
3) get the pv handle from the vg handle
This should work going forward and be a much cleaner interface,
as we move away from pvs as standalone objects.
Some commands start with a pvname, but we'd like to force users to
start with a vg handle to obtain a pv handle. Our best option seems
to be providing a way to look up the vgname from the pvname, and then
require them to use vg_read/vg_open.
In addition to the pvname lookup function, this patch also provides a
lookup by pvid. The lookup by pvid can be used in conjunction with
lvmcache_get_pvids to process all pvs in the system.
The pvid find function first calls lvmcache_vgname_from_pvid, which may
cause the label to be read if it is not in the cache. If the vgname is
returned is an orphan, we then check to see if there are metadata areas,
and if not, we scan every PV on the system by calling scan_vgs_for_pvs().
In most cases we should not need to do this, and by using the info->mdas
count, we avoid calling pv_read() as prior code did. So this patch is a
bit cleaner and should allow us to refactor more of the pv code.
Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
are active mirrors or snapshots.
We don't have the mechanisms in place to change the device-mapper
tables for those targets that have behavioral differences between
cluster and single machine instances. Allowing users to change
the attribute but not changing the target's behavior can lead to
data corruption.
The following bugs are fixed/avoided by this patch:
235123 - vgchange -c [ny] do not change target types when necessary
289331 - RFE: switching from cluster domain to local domain needs to deactivate volume somehow
289541 - when changing from local to cluster, volumes can not appear to be deactivated
This should avoid various races between dmeventd on multiple nodes
in cluster where one node already repairing device and another
run full scan and locks the device.
the device cache file is dumped both in vgscan and clvmd process.
Unfortunately, clvmd calls lvmcache_label_scan,
it properly destroys persistent filter, but during
persistent_filter_dump it merges old cache content back!
This causes that change in filters is not properly propagated
into device cache after vgscan on cluster.
(Only new devices are added.)
https://bugzilla.redhat.com/show_bug.cgi?id=591861
to a file using log/file, with log/overwrite set and dump this file in
STACKTRACE. The overall effect is that only the command that ran last before
the failure has been triggered will get its debug output logged. This is
similar to how we treat coredumps.
lvconvert testing is now in its own test, t-lvconvert-mirror-basic ... it
doesn't do anything fancy but it does run lvconvert through a lot of
combinations.
I have also merged the remaining t-mirror-lvconvert tests into
t-lvconvert-mirror and abolished the former. The latter will be split again
later into more thematic divisions. (The previous split was rather arbitrary,
may I even say random...)
Use Requires.private: instead of Libs.private:
Use UDEV_PC and SELINUX_PC for Require.private:
It looks like usage of Requires.private is prefered from Libs.private.
However pkg-config documentation is really poor here. But here is
short outcome:
There is a difference in Libs.private: and Requires.private: where
we specify libselinux instead of -lselinux -lsepol.
We leave resolving of query like 'pkg-config --libs --static devmapper'
on taking proper selinux and udev libs to their .pc files instead of
hardcoding them into our .pc file which is might give incorrect answer.
- i.e. dependency of libselinux package might change and we may return
wrong list of linked libraries.
http://bugs.freedesktop.org/show_bug.cgi?id=4738http://err.no/personal/blog/tech/2008-03-25-18-07_pkg-config,_sonames_and_Requires.private
A shortcut for --ignorelockingfailure, --ignoremonitoring, --poll n options
and LVM_SUPPRESS_LOCKING_FAILURE_MESSAGES environment variable used all at
once in initialisation scripts (e.g. rc.sysinit or initrd).
Target install_dm_plugin installs files to libdir/device-mapper.
Target install_lvm2_plugin installs files to libdir/lvm2.
Both targets creates relative links to libdir to keep the code
compatible with current dlopen handling.
Once we will be able to read plugins from subdir, links
could be removed.
(with table provided).
This remove ioctl generates udev events like any other hence it needs to be
synchronized properly as well. Also, add dm task type in debug log when
setting a cookie (for better debugging).
fails, the test will carry on but will issue a warning. The harness detects
such warnings from tests and marks tests that passed with warnings with a
special status.
We can use it even in read-only environment where a try to initialise
file-based locking fails (not to mention other processing related with
lvm2 init). Simply, we want to output the version only, nothing else.
And this should always work.
is active and while it is in-active.
+for i in $(seq 0 4); do
+ for j in $(seq 0 4); do
+ for k in core disk mirrored; do
+ for l in core disk mirrored; do
The testing code still needs some improvement. I'd like to add
the ability to test specifying the PVs to be added/removed during
a convert. It will also be important to test partial PV
specification during down converts (i.e. request to remove more
mirror images than we have provided PVs for).
This rule appeared in udev v152 and it helps us to support spurious events
where we didn't have any flags set (events originated in udevadm trigger
or the watch rule). These flags are important to direct the rule application.
Now, with the help of this rule, we can regenerate old udev db content.
To implement this correctly, we need to flag all proper DM udev events with
DM_UDEV_PRIMARY_SOURCE_FLAG. That happens automatically for all ioctls
generating events originated in libdevmapper.
being able to remove more images from a mirror than the
number of PVs directly specified for removal.
The effort to fix bug 581611 corrected a bug that was unnoticed
at the time. The loop in _remove_mirror_images that looks over
the specified PVs was allowing devices that were previously
counted and moved to the end of the list to be double-counted.
This resulted in the number of devices needed for removal always
being satisfied - even if the user did not specify enough PVs
for removal to satisfy the request. When 581611 was fixed, this
double-counting no longer took place and the result was to remove
only the minimum of the number of PVs specified or the number
that was asked to be removed.
By simply always setting 'new_area_count' (as used to be done
only in the else statement), we return to the previous behavior.
Indeed, this is exactly what the double-counting was allowing
to happen before the fix of 581611.
Allow lv_remove_with_dependencies() to know the top-level LV that was
requested to be removed (otherwise it recurses and we lose context).
A merging snapshot cannot be removed directly but the associated origin
can be. Disallow removal of a merging snapshot unless the associated
origin is also being removed.
(process_each_lv_in_vg) provides it.
Also, removing this check from _lvs_single now allows displaying hidden
LVs that are specifically named on the command line.
There's no need for foreign udev rules to touch LVM reserved devices
(snapshot, pvmove, _mlog, _mimage, _vorigin) even if they happen to
be visible. The same applies for /dev/disk content - no need to create
any content for these devices (and so no need to run any "blkid" etc.).
This also prevents setting any inotify "watch" from udev rules on such
devices that is a source of race conditions (the rules need to honor
DM_UDEV_DISABLE_OTHER_RULES_FLAG for this to work though).
1) Test that the primary mirror image cannot be removed while
the mirror set is sync'ing.
2) Test that you cannot start a second mirror up-convert while
one is already in progress.
The trouble is that if the sync/conversion finishes before the
tests occur, the tests will fail by why of success where there
should have been failure. This means the sync/conversion must
happen very quickly, but this is possible because the test
mirrors we are creating are so small.
In order to decrease the likelyhood of these test failing (or
more correctly, failing to test the right thing), I've increase
the size of the mirrors. It will still be remotely possible that
the tests will fail (by way of failing to test the right thing).
If this continues to happen, more involved mechanisms will need
to be put in place. (Perhaps these will still be created, but
this change should be a remedy until that time.)
These two old-test/regex utils are usable for testing output of regex
processing at core level. For getting them usable configure need to create
Makefile. This is currently disable by default.
Reintroduce split teardown (teardown() calls teardown_devs()) because
t-topology-support.sh only needs the teardown_devs() subset of the full
teardown() between each iteration of the topology tests -- in particular
the $TESTDIR must not get removed between each topology test iteration.
prepare_loop() must return if prepare_scsi_debug_dev() already
established $LOOP.
Also fix (and simplify) the unsafe scsi-debug device discovery in
prepare_scsi_debug_dev().
Ensure we can create devices for use in tests before running the real
tests. This may in various cases, and may involve machine configuration
rather than a failure in some specific test.
For example, if a test is run with LVM_TEST_DIR=/tmp, selinux is enabled,
and the default security context is set to "<<none>>" for /tmp, all the
tests will fail, unable to create devices, since dmsetup will fail, a
result of machpathcon() returning an error code.
Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
This version number change reflects the memory handling change
for string-based pv/vg/lv string based attributes.
In addition, when adding support for tags, I forgot to increase
the version number.
Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
Everywhere else in the API the caller can rely on lvm2app taking care of
memory allocation and free, so make the 'name' and 'uuid' properties of a
vg/lv/pv use the vg handle to allocate memory.
Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
We should write metadata into next position in the ring buffer while calling
vgrename and vgcfgrestore. At this code level (_vg_write_raw), we were not able
to determine if this is a rename or not. If yes, then accompanying VG structure
passed here has a new name set, not the old one.
When looking for a location where to put metadata next, we were given a NULL
value because of failed VG name comparison (in _find_vg_rlocn) between the
name in existing metadata and metadata we're just about to write.
This resets the position in the ring buffer, overwriting any existing metadata
(and also incorrectly updates the cache to "orphan" afterwards).
This patch just adds old_name item in struct volume_group that we can check and use
if necessary and detect renames at lower layers as well.
The same applies for vgcfgrestore, but here we're using a special value of
old_name, an empty string, to disable the check with existing metadata totally.
Internally, we used DM names instead of UUIDs while processing event
handlers. This caused problems while trying to vgrename a VG with active LVs
where the names are being changed and so the devices were not found then.
The patch also contains a little bit of refactoring, moving "build_dlid" code
found in dev_manager.c to "build_dm_uuid", now in lvm-string.c (so we have
build_dm_uuid and build_dm_name at one place).
lvm2app needs a link back to the vg in order to use the vg handle for
memory allocations as well as other things. This patch adds the field
to struct physical_volume, and sets pv->vg when reading a vg from disk or
extending a vg by using the helper function previously added,
add_pvl_to_vgs(). Moves and renames are handled with separate code
inside move_pv() and vgmerge(). Add pv->vg check to vg_validate().
A NULL value in pv->vg signifies membership in the orphan VG.
Note though in the case of pv_read() on a device with metadatacopies == 0,
more devices may need to be read for an authoritative answer.
Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
Now that we have library functions to add/delete a pv from the vg->pvs
list, call them from everywhere.
Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
Add a delete function to manage the vg->pvs list.
NOTE: It may be possible to do further cleanup to these add/del functions
by passing a 'pv' as input instead of 'pv_list'. The pv_list is used for
functions which do allocations (lvcreate) while other places in the code
just manage a list of 'pv' (e.g. import functions, vgextend, etc).
Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
Move the increment of vg->pv_count next to the place where we add to
vg->pvs. It looks safe to do this since the only caller of import_pool_vg()
calls import_pool_pvs() immediately afterward, and there is no way
import_pool_vg() can fail (always returns 1). However, if there's a
memory allocation failure inside import_pool_pvs(), we will end up with
a different count in vg->pv_count that with the original code. In any
case, vg->pv_count should be as close to dm_list_size(&vg->pvs) as
possible, as is the case everywhere else in the code.
Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
The dm_list * parameter is unnecessary since we are passing in 'vg'
and the only caller of import_pool_pvs() passes '&vg->pvs' in the
dm_list * parameter. Just use vg->pvs directly in the function.
Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
{core|disk|mirrored}. However, under the options section they are
described as {core|disk} - even though the 'mirrored' argument is
described.
s/'{core|disk}'/'{core|disk|mirrored}'/
Fix unwanted modification of $(top_builddir)/make.tmpl.
Using dependency rules to install rules for udev.
There is minor problem, with concurent usage of builddir
and srcdir could lead to missuse of 10-dm.rules which
could be found in VPATH from different builddir.
However current solution uses intermediate target so
the generated 10-dm.rules exists only for short period of time
during make install execution.
Patch is inspired by Debian's extra patch.
- removes OWNER & GROUP make vars they are parts of INSTALL command.
- adds INSTALL_PROGRAM for executable, uses $(INSTALL)
- adds INSTALL_DATA for non-executable data, uses ($INSTALL)
- adds INSTALL_WDATA for writable non-executable data, uses ($INSTALL)
- adds configure option --enable-write_install - to support
installatin of writable files used by distribution
- replaces usage of ifeq @LIB_SUFFIX@ with $(LIB_SUFFIX)
- installs .a files from static builds without executable flag
- installs .a files to $(usrlibdir) instead of $(libdir)
- installs all static binaries to $(staticdir)
- create .so links for devel package in $(usrlibdir) instead of
$(libdir)
- makes .so and .so.LIB_VERSION files within builddir
- removes VERSIONED_SHLIB and created versioned LIB_SHARED automagicaly
- install LIB_SHARED via install_lib_shared target
- install plugins via install_lib_shared_plugin target
- prints whole 'install' command during installation instead of less
informative "Installing $(something) $(somewhere)"
- install multiple man pages with one INSTALL command
- use DISTCLEAN_TARGETS instead of creating multiple distclean targets
Usage of VPATH makes troubles when used within $(builddir).
Not only source files are being found through VPATH,
but targets as well. (make --debug=v)
Thus if user builds the code in $(srcdir) and also in some $(builddir)
he gets mangled results as some generated files (i.e. .export.sym)
are 'reused' from $(srcdir) instead of $(builddir).
This patch switches to use vpath were we could explicitly name
suffixes that should be looked via vpath - we must take care,
we do not generate files with these suffixes:
.c, .in, .po, .exported_symbols
A user specifying duplicate paths on the cmdline of vgcreate will
get a message similar to the following:
vgcreate vgtest2 /dev/loop3 /dev/loop5
Found duplicate PV jk1lXsKzwyOKlXq6bhaFFKMQQ06oPgu8: using /dev/loop5 not /dev/loop3
Found duplicate PV jk1lXsKzwyOKlXq6bhaFFKMQQ06oPgu8: using /dev/loop3 not /dev/loop5
Internal error: Duplicate PV id jk1lXs-Kzwy-OKlX-q6bh-aFFK-MQQ0-6oPgu8 detected for /dev/loop3 in vgtest2.
This is caught by vg_validate(), but it would be good to find
this condition earlier in the vgcreate code. add_pv_to_vg()
currently checks by pvname, but does not look for duplcate pvids.
This patch adds the check for duplicate pvids and results in new
error output as follows:
vgcreate vgtest2 /dev/loop3 /dev/loop5
Found duplicate PV jk1lXsKzwyOKlXq6bhaFFKMQQ06oPgu8: using /dev/loop5 not /dev/loop3
Found duplicate PV jk1lXsKzwyOKlXq6bhaFFKMQQ06oPgu8: using /dev/loop3 not /dev/loop5
Physical volume '/dev/loop5 (jk1lXs-Kzwy-OKlX-q6bh-aFFK-MQQ0-6oPgu8)' listed more than once.
Unable to add physical volume '/dev/loop5' to volume group 'vgtest2'.
Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
When moving parts of striped LVs, pvmove wouldn't care about leaving you with
two stripes on the same disk. Now --alloc anywhere is needed for that.
(Tried and gave up on two alternative approaches before the one committed here.)
Small refactor of main places in the code where a pv is added to a
vg into a small function which adds the pv to the list and updates
the vg counts.
Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
Refactor adding to the vg->pvs list and incrementing the count, which
will allow further refactoring. Should be no functional change.
Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
Simple refactor to mov code that updates the vg extent counts from a
single pv's counts close to the code that adds a pv to vg->pvs and
updates vg->pv_count. No functional change.
Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
In add_pv_to_vg(), we should only add the pv to vg->pvs after all
internal checks have passed. The check for vg->extent_count exeeding
maximum was after we added the pv to the list, so this function could
return a state of vg->pvs that did not reflect other parameters such
as vg->pv_count.
Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
to check for presence of this module and avoid using --frames
option for genhtml in this case.
Fix arg list for AC_PATH_PROG for lcov and genhtml.
(detecting empty LCOV and GENHTML string in Makefiles).
Patch fixes generation of coverage files for dmeventd and adds support for clvmd.
Path names are stripped, so the the html looks better.
Frames 'previews' is enabled for generated pages.
Using top_srcdir was wrong here - though we still can't easily use builddir.
Requiers using shell variables before execution of binaries build outside
of srcdir.
Because we have now strong rule for lock ordering:
- VG locks must be taken in alphabetical order
- ORPHAN locks must be the last
vgs_locked() is now not needed.
This fixes problem with orphan locking, e.g.
vgremove VG1 | vgremove VG2
lock(VG1) | lock(VG2)
lock(ORPHAN) | lock(ORPHAN) -> fail, non-blocking
https://bugzilla.redhat.com/show_bug.cgi?id=578413
(More similar places in code.)
Physical segments were still allocated from global
command context mempool.
This leads to very high memory usage when
activating large VG (vgchange).
(Memory usage was about 2G when >3000LVs).
Fix it by properly using vg->vgmem private pool,
so all the memory is released early.
New memory pool parameter is needed here for pv_split_segment
function.
Also fix the same problem in some minor allocations
(vg description, lv segment split).
In addition to previous patch, we really do not need
to search for segment which was just allocated in
split request.
Make pv_split_segment function return newly allocated
(split) segment also.
(So after this patch, there is only one user
of slow find_peg_by_pe).
The function find_peg_by_pe is incredibly inefficient
for Pvs with many segments.
In shiny future there should be binary (or interval) tree
instead of sorted linked list (volunteers?).
Anyway, for now, we can use dirty trick here to optimise this case:
- Allocations are usually applied from the beginning
of PV (we have no alloocation policy which allocates areas
"backwards")
- The only user of find_peg_by_pe is pv_split_segment()
call. In *most* cases it need to split *last* PV segment.
So if we search sorted pv segment list backwards, we
hit the requested segment immediatelly.
This patch applies this tiny change.
(and saves >30% of processing time when >3000LVs segments are on one PV!)
To discourage using this inefficient function from other code,
it is moved to pv_manip.c and used static for now:-)
vg_validate call is an adept to optimisation, it is very
ineeficient and slow.
Anyway, we should call it only before writing data to disk.
The call in lvmcache was just temporary validation,
we realy do not need to revalidate cached metadata
every time.
(Actually, I added that there just to prove that cache works
properly and forgot to remove it.)
Patch removes it from lvmcache completely, this can hit only
internal bug in export function (and this bug must
be detected in any vg_write call anyway before).
The _read_vg uses already hash for PVs to optimise
reading of large VGs and avoiding repeated PV list traversing.
Use the same aproach to speed up parsing VG with many LVs.
If dmeventd runs with -d flag, it doesn't fork into backgroud.
The command kill(getppid(), SIGTERM) attempts to kill the parent dmeventd
process, however, if there is no parent, it kills whatever process spawned
dmeventd. In case of debugging with gdb, the parent is gdb, thus
kill(getppid(), SIGTERM) kills the debugger.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
we are running the repair manually. If we don't ignore, then dmeventd
and the manually run repair can collide. (We should still get clean
results in such a case, but it makes it harder to validate the test
results.)
Code moves initilization of stats values to _memlock_maps().
For dmeventd we need to use mlockall() - so avoid reading config value
and go with _use_mlockall code path.
Patch assumes dmeventd uses C locales!
Patch needs the call or memlock_inc_daemon() before memlock_inc()
(which is our common use case).
Some minor code cleanup patch for _un/_lock_mem_if_needed().
If the error path of _register_for_event() calls _free_thread_status()
_lib_put() call is missing.
To make thing simpler move this _lib_put() into common error path code.
As the header file <sys/mman.h> was not included in dmeventd.c
thus missed definition of MCL_CURRENT so this patch only makes
it obvious we were not locking memory here.
This patch has no functional change.
Later part of this patch set handles mlockall() via memlock_inc_daemon().
clvmd does not propagate DMEVENTD_MONITOR_IGNORE.
Update get_activation_monitoring_mode() to check if the VG that the
LV is being activated in is clustered. If so, skip it.
Any get_activation_monitoring_mode() error will cause the associated LV
(or VG) to be skipped during activation. Both vgchange_single() and
lvchange_single(), which call get_activation_monitoring_mode(), are
called by their respective process_each_..() method.
in clvmd, dmevend, man, tests.
Don't include dependency files for clow and cscope.out targets
Improve dependency tracking for dmeventd and liblvm2cmd sources.
to obtain sources. Create make.tmpl target for
simplier generation of cflow files with the help of
CFLOW_LIST, CFLOW_LIST_TARGET, CFLOW_TARGET.
Still cflow usage is not perfect.
Move daemons/ and lib/ subtargets to their Makefiles so we don't get
double cleanup error during execution of distclean target.
Instead of duplicating clean target inside distclean target,
just use it as a subtarget and avoid add duplicating code.
This check-in enables the 'mirrored' log type. It can be specified
by using the '--mirrorlog' option as follows:
#> lvcreate -m1 --mirrorlog mirrored -L 5G -n lv vg
I've also included a couple updates to the testsuite. These updates
include tests for the new log type, and some fixes to some of the
*lvconvert* tests.
clvmd's do_lock_lv() already properly controls dmeventd monitoring based
on LCK_DMEVENTD_MONITOR_MODE in lock_flags -- though one small fix was
needed for this to work: _lock_for_cluster() must treat
dmeventd_monitor_mode()'s return as a tri-state value.
Also cleanup do_lock_lv() to:
- explicitly init_dmeventd_monitor() based on LCK_DMEVENTD_MONITOR_MODE
- no longer reset init_dmeventd_monitor() to default at the end of
do_lock_lv() -- it is unnecessary
This is the next preparatory step towards better --alloc anywhere
support and is not intended to break anything that currently works so
please report any problems - segfaults, bogus data in the new debug
messages, or if the code now chooses bizarre allocation layouts.
. Add "monitoring" option to "activation" section of lvm.conf
. Have clvmd consult the lvm.conf "activation/monitoring" too.
. Introduce toollib.c:get_activation_monitoring_mode().
. Error out when both --monitor and --ignoremonitoring are provided.
. Add --monitor and --ignoremonitoring support to lvcreate. Update
lvcreate man page accordingly.
. Clarify that '--monitor' controls the start and stop of monitoring in
the {vg,lv}change man pages.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
This prevents some confusion when libudev was not found so udev_sync was disabled
automatically. Configure was successful though giving only a tiny warning.
Also, if "dmsetup udevcreatecookie" is used, never return 0x000000 as a result if
udev is not running and keep the output blank.
We need to know whether we should wait for any uevent or not when
using udev_sync. A kernel patch was posted recently that changed the
way uevents are sent on dm device resume - it is sent only if the
device has been suspended before. There's also a new DM_UEVENT_GENERATED_FLAG
in the ioctl to notify userspace whether the event was generated.
If the uevent was not generated (e.g. the situation where the device is
*not* suspended and we call a resume), we just call dm_udev_complete
explicitly from within libdevmapper itself to prevent infinite waiting
while trying to synchronise with udev processing.
Prevent lvresize from being able to resize internal LVs: mirror legs
(*_mimage_*), mirror log (*_mlog), snapshot placeholder LVs (snapshot*)
and others. Resizing these would leads to unexpected metadata and
sometimes crashes (in case of growing snapshot*).
When we pv_read() a device that has an orphan vgname, we might need to scan
the system to be sure this is true. However, if the PV has mdas, there's
no way possible for it to have an orphan vgname unless it is a true orphan.
Some areas of the code were optimized to take advantage of this fact, while
others were not (we would still do the expensive scan if a device had mdas
but had an orphan VG).
This patch unifies the code so that every place we are operating on such
a PV, we skip the expensive scan if there are mdas.
Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
Acked-by: Petr Rockai <prockai@redhat.com>
Acked-by: Alasdair G Kergon <agk@redhat.com>
This option should be configurable, but for now
do not set it at all.
(lvm2app is used in udisks probers and there
cac cause several nasty races when trying to update
lvmcache during rescan.)
If user try to vgcreate or vgextend non-existent VG,
these messages appears:
# vgcreate xxx /dev/xxx
Internal error: Volume Group xxx was not unlocked
Device /dev/xxx not found (or ignored by filtering).
Unable to add physical volume '/dev/xxx' to volume group 'xxx'.
Internal error: Attempt to unlock unlocked VG xxx.
(the same with existing VG and non-existing PV & vgextend)
# vgextend vg_test /dev/xxx
...
It is caused because code tries to "refresh" cache if
md filter is switched on using cache destroy.
But we can change filters and rescan even without this
machinery now, just use refresh_filters
(and reset md filter afterwards).
(Patch also discovers cache alias bug in vgsplit test,
fix it by using better filter line.)
(VDSO on 32bit is VSyscall on 64bit)
It seems it could be locked on 64bit kernels running 32bit binaries,
but it makes troubles on real 32bit machines where mlock() returns
error when trying to lock such map area. (0xffffe000)
Behavior of mlockall() seems to be similar.
This patch adds a new implementation of locking function instead
of mlockall() that may lock way too much memory (>100MB).
New function instead uses mlock() system call and selectively locks
memory areas from /proc/self/maps trying to avoid locking areas
unused during lock-ed state.
Patch also adds struct cmd_context to all memlock() calls to have
access to configuration.
For backward compatibility functionality of mlockall()
is preserved with "activation/use_mlockall" flag.
As a simple check, locking and unlocking counts the amount of memory
and compares whether values are matching.
For static builds dependency for SELinux libs is not handled by 'ar'.
Till better solution is found, for static builds STATIC_LIBS is used.
Patch updates SELinux detection to use 3rd & 4th parameter for Success/Fail.
Also removes detection of pthread from this check as we know which
version of libdevmapper we are going to link with lvm after merge.
SELinux header check moved to the SELinux test code.
Create new substituted variable PTHREAD_LIBS and link this library
only with tools/libs which really needs it - i.e. dmeventd.
Check for libpthread only for builds with clvmd or dmeventd.
Remove variable LIB_PTHREAD
Modify linking of readline library. Create new substituted varible
READLINE_LIBS - readline library is linked ONLY with tools that really use
it - i.e. lvm. (Static lvm does not use readlin).
Previous behaviour put this library into the variable LIBS and thus
linked it with all created object files of lvm project (i.e. plugins...).
READLINE detection is simplified.
Termcap library is linked in only if readline library doesn't have its own
dependency (i.e. old distributions).
The kernel's blk_stack_limits() function may flag a device as
'misaligned'. If it does the alignment_offset will be -1.
Update set_pe_align_offset() to accommodate this corner case.
- increase timeout to 30 secs (on Chrissie request)
- source both cluster and clvmd for options (like all the other cluster
init scripts)
- add clustered_vgs and _lvs commodity fns
- move rh_status* fns at the top, so they can be reused
- heavily cleanup start and stop fns from redundant code and unnecessary
loops
- improve output from different operations
- make the init script lsb compliant
- don´t force kill of the daemon, send only a TERM signal and then wait
for it to exit
- Resolves rhbz#533247
lvm2 devices have always UUID set even if imported from lvm1 metadata.
Patch removes name argument from dev_manager_info call and converts
all activation related calls to use query by UUID.
Also it simplifies mknode call (which is the only user on mknodes parameter).
Fix add/remove tag function headers.
Fix a lot of little problems with doxygen comments.
Clarify the basic objects and their handles, and place functions with their
appropriate handles/objects.
All this cleanup moves automatic documentation of lvm2app much closer to being
useful as official documentation. In the future I will add some examples
and plan to build the examples as part of the unit tests.
Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
Add lvm2app functions to manage LV tags.
For lvm_lv_get_tags(), we return a list of tags, similar to other
functions that return lists. An empty list is returned if there
are no tags. NULL is returned if there is a problem obtaining
the list of tags.
Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
Add lvm2app functions to manage VG tags.
For lvm_vg_get_tags(), we return a list of tags, similar to other
functions that return lists. An empty list is returned if there
are no VG tags. NULL is returned if there is a problem obtaining
the list of tags.
Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
Add a supporting function to copy a list of internal tags to lvm2app list.
We need to put this here because of the lvm_str_list_t type which we export
in lvm2app.h. If we didn't export this type, we could put this in the
internal library and use struct str_list.
Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
We need to allocate memory for the tag and copy the tag value before we
add it to the list of tags. We could put this inside lvm2app since the
tools keep their memory around until vg_write/vg_commit is called, but
we put it inside the internal library to minimize code in lvm2app.
We need to copy the tag passed in by the caller to ensure the lifetime of
the memory until the {vg|lv} handle is released.
Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
Similar refactoring to vgchange - pull out common parts and put into
library function for reuse. Should be no functional change.
Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
Pull out common code to be called from tools as well as lvm2app.
Leave archive() at tool level so we can use from vgcreate
as well as vgchange. Should be no functional change.
- add stack macro in vgchange
Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
Add a merging snapshot to the deptree, using the "error" target, rather
than avoid adding it entirely. This allows proper cleanup of the -cow
device without having to rename the -cow to use the origin's name as a
prefix.
Move the preloading of the origin LV, after a merge, from
lv_remove_single() to vg_remove_snapshot(). Having vg_remove_snapshot()
preload the origin allows the -cow device to be released so that it can
be removed via deactivate_lv(). lv_remove_single()'s deactivate_lv()
reliably removes the -cow device because the associated snapshot LV,
that is to be removed when a snapshot-merge completes, is always added
to the deptree (and kernel -- via "error" target).
Now when the snapshot LV is removed both the -cow and -real devices
get removed using uuid rather than device name. This paves the way
for us to switch over to info-by-uuid queries.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
There's a tiny period of time when the _mimage device is visible during
downconversion from mirror to linear. Since it is visible, we need to
create the symlinks, otherwise warning messages will be issued about udev
not creating those symlinks. We have to rely on udev flags completely.
- add DM_UDEV_DISABLE_LIBRARY_FALLBACK udev flag to rely on udev only
- export dm_udev_create_cookie function to create new cookies on demand
- add --udevcookie, udevcreatecookie and udevreleasecookie for dmsetup
(to support "udev transactions" where one cookie value can be used for
several dmsetup calls)
- don't use DM_UDEV_DISABLE_CHECKING env. var. anymore and set the state
automatically (based on udev and libdevmapper dev path comparison)
Internally we store sizes in sectors, but lvm2app exports sizes
in bytes. We could get fancier and allow units configuration but
this fix should do for now.
Fixes rhbz561422.
Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
We unfortunately don't yet _know_, in dev_manager_snapshot_percent(), if
a snapshot-merge target is active (activation is deferred if dev is
open); so we can't short-circuit origin devices based purely on existing
LVM LV attributes.
Set 'fail_if_percent_unsupported' in dev_manager_snapshot_percent() for
a merging origin LV, otherwise passing unsupported LV types to _percent
will lead to a default successful return with percent_range as
PERCENT_100.
For a merging origin, PERCENT_100 will result in a polldaemon that runs
infinitely (because completion is PERCENT_0).
When activating a merging origin it is valid, and expected, to not have
a node in the deptree for both the origin and its merging snapshot. The
_cached_info() caller is only concerned with whether a device is open.
If there isn't a node in the tree the associated device is definitely
not open.
This change was deferred to help ease the review of previous refactoring
related to using process_each_lv() for lvconvert's merge support. Not
that doing so _really_ helped but...
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Switch lvconvert's --merge code over to using process_each_lv(). Doing
so adds support for a single 'lvconvert --merge' to start merging
multiple LVs (which includes @tag expansion).
Add 'lvconvert --merge @tag' testing to test/t-snapshot-merge.sh
Adjust man/lvconvert.8.in to reflect these expanded capabilities.
The lvconvert.c implementation requires rereading the VG each iteration
of process_each_lv(). Otherwise a stale VG instance associated with
the LV passed to lvconvert_single_merge() would result in stale VG
metadata being written back out to disk. This overwrote new metadata
that was written when a previous snapshot LV finished merging (via
lvconvert_poll). This is only an issue when merging multiple LVs that
share the same VG (a single VG is typical for most LVM configurations on
system disks).
In the end this new support is very useful for performing a "system
rollback" that requires multiple snapshot LVs be merged to their
respective origin LV.
The yum-utils 'fs-snapshot' plugin tags all snapshot LVs that it creates
with a common 'snapshot_tag' that is unique to the yum transaction.
Rolling back a yum transaction, that created LVM snapshots with the tag
'yum_20100129133223', is as simple as:
lvconvert --merge @yum_20100129133223
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
refactoring.
Document the need to cleanup the "name" args passed around polldaemon,
lvconvert and pvmove. It is quite a mess.
Annotate the unused nature of the existing poll_fns->get_copy_vg
methods' 'uuid' arg.
depending on if the mirror has a 'core' or 'disk' log. When there
is a disk log, the new leg is added by stacking a new mirror on
top of the old (one leg is the old mirror and the other leg is the newly
added device). When the log is a 'core' log, the new leg is simply added
to the existing mirror and all the devices are re-synced.
The logic that handles collapsing the stacked 'disk' log mirror was
having the effect of causing 'core' logged mirrors to begin resync'ing
for a second time. I have used the 'CONVERTING' flag to indicate that
a mirror is converting by way of stacking. This is no longer set for
up-converting core logs. The final 'collapse' logic can safely be skipped
for 'core' log mirrors - getting rid of the second resync.
Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
where we should not expose internal VG names/uuids (the ones with "#" prefix )through the
interface. Otherwise, we could end up with library users opening internal VGs which will
initiate locking mechanism that won't be cleaned up properly.
"#orphans_{lvm1, lvm2, pool}" names are treated in a special way, they are truncated first
to "orphans" and this is used as a part of the lock name then (e.g. while calling lvm_vg_open()).
When library user calls lvm_vg_close(), the original name "orphans_{lvm1, lvm2, pool}"
is used directly and therefore no unlock occurs.
We should exclude internal VG names and uuids in the lists provided by lvmcache:
lvmcache_get_vgids() and lvmcache_get_vgnames().
Allow the number of logical extents to be expressed (for a snapshot) as
a percentage of the total space in the Origin Logical Volume with the
suffix %ORIGIN.
Update the relevant man pages accordingly. Eliminate inconsistencies
between the man pages and tools/commands.h
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
*_safe. This had the effect of segfaulting the log daemon when
converting a mirror from one log type to another.
Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
When activation of pvmove mirror fails on cluster, some nodes
still possibly succeeded in activation.
- Explicitly deactivate that mirror to be sure
- properly pair suspend/resume calls to not cause memory lock problems in clvmd
Code cannot simply call _finish_pvmove on cluster in this situation, because
changed LVs are suspended twice (causing memory inbalance) and also temporary
mirror is activated when it is not expected (and we know that it failed already).
Patch prepares special function which remove temporary mirror references from
metadata and then resumes changed LVs.
In dev_manager_info 0 means error and 1 info is returned,
not that device exists (that value is part of info struct).
Fix query by uuid only (no name) which returns 0 when device
does not exist.
Support "wait before testing" using '+' in pvmove and lvconvert
interval. Doing so overrides the new default of sleeping after checking
the LV's progress.
Sleeping before checking progress can lead to extraneous polldaemons
being left running. These polldaemons would have otherwise exited had
they checked before sleeping. Checking progress before sleeping helps
workaround the subtly unreliable nature of "finished" state checking
in _percent_run.
Update test/t-mirror-names.sh to use '+' when providing its lvconvert
interval.
more descriptive message if locking fails instead of
"Locking type -1 initialisation failed."
Use read-only locking instead of misleading ignorelocking option
in message.
All this seems to do is provide a memory leak so remove it.
The only caller of _alloc_pv() later explicitly sets
pv->vg_name = fmt->orphan_vg_name so clearly this allocation
should be removed. I also saw no where in the code where
strncpy was used to assign pv->vg_name - only direct assignments
and strdup's.
DSO is currently not dl_close-ing pluing during it is unregister handling,
so clear structure and related counter, so there are no memory problems.
Futher fixes are needed.
merge completes. This narrows the scope of this "hack" (which still
needs a proper fix within the deptree).
This stops dmeventd from trying to access snapshot devices that were
already removed.
LVM2 test (rather than using the traditional loop device).
prepare_scsi_debug_dev currently assumes exclussive access to the
scsi_debug module. Any script that tries to use prepare_scsi_debug_dev
when scsi_debug is unavailable or already loaded into the kernel will be
skipped.
t-topology-support.sh shows how prepare_scsi_debug_dev function can be
used repeatedly (within a script) to test LVM2 ontop of a ramdisk-based
SCSI device w/ arbitrary scsi_debug features.
isn't already available (in /proc/mdstat).
switch to requiring 2.6.33 for the alignment_offset tests; 2.6.{31,32}
alignment_offset values aren't reliable. 2.6.33 _should_ have mkp's
alignment_offset fixes but so far it doesn't (as of 2.6.33-rc4).
For mirror repair (and similar tasks) it can happen that full
device rescan is issued from clvmd.
Because code can be in the middle of repair (calling suspend)
clvmd should never try to scan suspended devices
(otherwise it causes deadlock).
Also code must not change ignore_suspended_device flag when
doing refresh_filters (called from lvmcache scan code).
'const'. Be consistent with its use (and dev_manager_snapshot_percent()).
Pass 'lv' from dev_manager_snapshot_percent() to _percent() to
_percent_run(). _percent_run() always dereferenced 'lv' (when
initializing segh) even though it may have been NULL (as was the case
until now for dev_manager_snapshot_percent()).
If a "snapshot-origin" LV (snapshot-merge whose merge was deferred
becuase it was open) was passed to _percent_run() it would always return
100%.
Update _percent_run() to NOT return PERCENT_100 et. al. if
->target_percent() wasn't ever called and supplied 'lv' is a merging
origin. A default return of 100% does not work for snapshot-merge.
Also tweak a related lvconvert log_error() to include "Aborting merge."
bitmap tracking was switched from the e2fsprogs implementation to
the device-mapper implementation (dm_bitset_t). The latter has a
leading uin32_t field designed to hold the number of bits that are
being tracked. The code was not properly handling this change in
all places. Specifically, when getting the bitmap to/from disk.
Endian adjustments will likely need to be made on the accounting
field as well, since bitmaps are passed between machines on
start-up.
Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
that were necessary to be passed on to userspace.
The cluster mirror table (log portion only) used to look like this:
clustered-disk <parm_count> <disk> <region_size> <uuid> \
[[no]sync] [block_on_error]
Now it looks like this:
userspace <parm_count> <uuid> clustered-disk <disk> <region_size> \
[[no]sync]
So, there is one extra argument in the latter case - this was
unaccounted for.
Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
Eliminate 'merging_snapshot' from 'struct logical_volume' and just use
'snapshot' for origin lv's reference to the merging snapshot; also set
MERGING in the origin lv's status.
If either the origin or snapshot that is to be merged is open the merge
will not start; only the merge metadata will be written. The merge will
start on the next activation of the origin (or via lvchange --refresh)
IFF both the origin and snapshot are closed.
Merge on activate is particularly important if we want to merge over a
mounted filesystem that cannot be unmounted (until next boot) --- for
example root.
snapshots are suspended, new origin is created, snapshots are resumed, new
origin is resumed. So it allocates memory while suspended.
To fix it, move vg_commit after suspend_lv, so that the suspend code will
treat it as precommitted vg and will preload new origin prior to suspend.
NOTE: agk doesn't like this "hack"; need to revisit and fix
This is useful for when the snapshot is still active and merging hasn't
started yet; it shows a merge is pending. Once merging starts the
merging snapshot will be hidden but can still be displayed with 'lvs -a'
Report snapshot origin with merging snapshot as 'O' instead of 'o':
Before merge starts this shows that a merge is pending. While merging
the snapshot will be hidden, 'O' enables a user to see that there is a
snapshot merging.
"snapshot-merge" target based on whether the LV is a merging snapshot.
When activating a snapshot-merge target do not attempt to monitor the
LV for events; the polldaemon will monitor the snapshot as it is
merged.
Allow "snapshot-merge" target's usage to be parsed via standard
"snapshot" methods.
NOTE: follow on fixes to the _percent_run change are still needed
Introduces new libdevmapper function dm_tree_node_add_snapshot_merge_target
Verifies that the kernel (dm-snapshot) provides the 'snapshot-merge'
target.
Activate origin LV as snapshot-merge target. Using snapshot-origin
target would be pointless because the origin contains volatile data
while a merge is in progress.
Because snapshot-merge target is activated in place of the
snapshot-origin target it must be resumed after all other snapshots
(just like snapshot-origin does) --- otherwise small window for data
corruption would exist.
Ideally the merging snapshot would not be activated at all but if it is
to be activated (because snapshot was already active) it _must_ be done
after the snapshot-merge. This insures that DM's snapshot-merge target
will perform exception handover in the proper order (new->resume before
old->resume). DM's snapshot-merge does support handover if the reverse
sequence is used (old->resume before new->resume) but DM will fail to
resume the old snapshot; leaving it suspended.
To insure the proper activation sequence dm_tree_activate_children() was
updated to accommodate an additional 'activation_priority' level. All
regular snapshots are 0, snapshot-merge is 1, and merging snapshot is 2.
Make 'merging_snapshot' pointer that points from the origin to the
segment that represents the merging snapshot.
Import/export 'merging_store' metadata.
Do not allow creating snapshots while another snapshot is merging.
Snapshot created in this state would certainly contain invalid data.
NOTE: patches at the end of this series will remove 'merging_snapshot'
and will introduce helpful wrappers and cleanups.
This spurious 'break' has been here since this code was first committed
in June 2005 and stopped the algorithm behaving as described in the
comment above it and rendered the variable 'already_found_one' useless.
1. Found bug in 'redundant log' implementation that caused
problems when converting a linear that spanned multiple
devices to a mirror (wasn't checking for NULL value of
provided parameter in _alloc_parallel_area)
2. Testsuite was failing to perform tests when 'not' modifier
was used. This allowed a couple issues to slip through.
Added a 'not_sh' modifier that negates tests performed by
functions defined in the shell source file.
3. Was initializing a variable to far down, which cause
previously set value to be overridden. (This was the
result of the collision of the "redundant log" and
lvconvert fix patches.)
successfully created it must _exit() once it completes.
Update _become_daemon() to differentiate between a failed fork() and a
successful fork().
Added lvm_return_code() to lvmcmdline.[ch]
Upon successful fork(), _become_daemon() must assert that the locks that
are currently held belong to the parent, not the child. All of the
child's internal state saying 'this process holds a lock' has to be
reset.
A proper lvmcache_locking_reset() should follow later.
date: 2010/01/07 20:42:55; author: jbrassow; state: Exp; lines: +11 -0
The patch fixes some lvconvert issues (WRT mirror <-> mirror).
1) 'exisiting_mirrors' and 'lp->mirrors' where taken to be in 'n-1'
notation (i.e a 2-way mirror is '1' and a linear is '0'), but the
variables were in 'n' notation.
2) After adding the redundant mirror log support, I was calculating
log_count by looking at the mirror log LV, but didn't take into
account the fact that there could be no mirror log!
Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
Sometimes it is really needed to switch off udev checking and the warnings we show when
we detect that udev has not done its job right - the messages like "Udev should have done
this and that. Falling back to direct node creation/removal. " etc.
This would be especially handy while setting DM_DEV_DIR env var that could be set to a
different location than standard /dev (udev can't create nodes/symlinks out of that one
directory that is configured into udevd). The exact same situation happens while we're
running our tests.
It is pretty much the same as reducing the number of
mirror legs, but we just don't delete them afterwards.
The following command line interface is enforced:
prompt> lvconvert --splitmirror <n> -n <name> <VG>/<LV>
where 'n' is the number of images to split off, and
where 'name' is the name of the newly split off logical volume.
If more than one leg is split off, a new mirror will be the
result. The newly split off mirror will have a 'core' log.
Example:
[root@bp-01 LVM2]# !lvs
lvs -a -o name,copy_percent,devices
LV Copy% Devices
lv 100.00 lv_mimage_0(0),lv_mimage_1(0),lv_mimage_2(0),lv_mimage_3(0)
[lv_mimage_0] /dev/sdb1(0)
[lv_mimage_1] /dev/sdc1(0)
[lv_mimage_2] /dev/sdd1(0)
[lv_mimage_3] /dev/sde1(0)
[lv_mlog] /dev/sdi1(0)
[root@bp-01 LVM2]# lvconvert --splitmirrors 2 --name split vg/lv /dev/sd[ce]1
Logical volume lv converted.
[root@bp-01 LVM2]# !lvs
lvs -a -o name,copy_percent,devices
LV Copy% Devices
lv 100.00 lv_mimage_0(0),lv_mimage_2(0)
[lv_mimage_0] /dev/sdb1(0)
[lv_mimage_2] /dev/sdd1(0)
[lv_mlog] /dev/sdi1(0)
split 100.00 split_mimage_0(0),split_mimage_1(0)
[split_mimage_0] /dev/sde1(0)
[split_mimage_1] /dev/sdc1(0)
It can be seen that '--splitmirror <n>' is exactly the same
as '--mirrors -<n>' (note the minus sign), except there is the
additional notion to keep the image being detached from the
mirror instead of just throwing it away.
Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
Made .update_metadata optional in 'struct poll_functions' definitions;
eliminated _update_lvconvert_mirror() stub.
Tweak a mirror-specific error message in the generic polldaemon code.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
The default log option for a mirror is 'disk'. If the log
type is not explicitly stated on the command line when
converting from an X-way mirror to a Y-way mirror, 'disk'
is chosen. So, if you have a 'core' log mirror and you
convert, your result will contain a 'disk' log.
This patch remembers what the old log type was. If the
user is merely trying to switch the number of mirror
images, the log type is now kept the same.
There is one historical behaviour I left in place...
If you have a 2-way, core-log mirror and you use lvconvert to
specify you want a 2-way mirror - without specifying the
log type - you will get a 2-way, disk-log mirror.
Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
Informal-IRC-ACK-by: agk
The logic was that lvconvert repair volumes, marking
PV as MISSING and following vgreduce --removemissing
removes these missing devices.
Previously dmeventd mirror DSO removed all LV and PV
from VG by simply relying on
vgreduce --removemissing --force.
Now, there are two subsequent calls:
lvconvert --repair --use-policies
vgreduce --removemissing
So the VG is locked twice, opening space for all races
between other running lvm processes. If the PV reappears
with old metadata on it (so the winner performs autorepair,
if locking VG for update) the situation is even worse.
Patch simply adds removemissing PV functionality into
lvconcert BUT ONLY if running with --repair and --use-policies
and removing only these empty missing PVs which are
involved in repair.
(This combination is expected to run only from dmeventd.)
Version >= 1.8.0 of the DM snapshot target appends metadata sectors used
to a snapshot's status. This patch allows LVM2 to accurately determine
if the snapshot store is empty. Knowing when a snapshot store is empty
is important in the context of snapshot-merge (means merge is complete).
Also update LVM2 to be aware of the possibility for "Merge failed" in
the snapshot-merge target's status.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
dm_tree_activate_children() callers.
Otherwise resume_lv and its variants can fail silently.
Catching these failures is especially important now that dm targets like
crypt and snapshot-merge can fail in .preresume
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
the background polldaemon is allowed to start. It can be used
standalone or in conjunction with --refresh or --available y.
Control over when the background polldaemon starts will be particularly
important for snapshot-merge of a root filesystem.
Dracut will be updated to activate all LVs with: --poll n
The lvm2-monitor initscript will start polling with: --poll y
NOTE: Because we currently have no way of knowing if a background
polldaemon is active for a given LV the following limitations exist and
have been deemed acceptable:
1) it is not possible to stop an active polldaemon; so the lvm2-monitor
initscript doesn't stop running polldaemon(s)
2) redundant polldaemon instances will be started for all specified LVs
if vgchange or lvchange are repeatedly used with '--poll y'
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
This patch tries to correctly track changes in lvmcache related to commit/revert.
For vg_commit: if there is cached precommitted metadata, after successfull commit
these metadata must be tracked as committed.
For vg_revert: remote nodes must drop precommitted metadata and its flag in lvmcache.
(N.B. Patch do not touch LV locks here in any way.)
All this machinery is needed to properly solve remote node cache invalidaton which
cause several problems recently observed.
Lock mode is int masked by LCK_TYPE_MASK, always.
Patch also remove uneccessary masking lock flag on sender side,
if masking is needed, it is don on client side already.
- Add drop_precommitted flag to force drop precommitted metadata
- add lvmcache_commit_metadata() which upgrades precommitted metadata in cache
No functional change in this patch - just preparation for following change.
And decode flags in humar readable form in client.
And clean some trailing whitespaces.
No functional change in this patch (only debugging messages changed).
The use_precommitted flag indicates, that we want to use precommitted metadata
(used in suspend call to preload table with precommitted data).
But if there are no such data, committed metadata are read but the cache
still contains that precommitted flag.
(The problem is that later possible drop_metadata call will not invalidate
device in cache.)
The wrong precommitted state is stored in on remote nodes during normal
suspend/resume cycle _without_ vg_write/commit.
Use the PRECOMMITTED status flag here instead (which is always set if using
precommited metadata here).
If renaming snapshot with virtual origin, the origin is renamed too.
But the code must resume LVs in reverse order to properly
pair memlock (in cluster locking).
(The resume of snapshot resumes origin too and later resume
is ignored otherwise.)
(So the tests can run under cluster locking and do not require
cluster mirror or snapshots.)
Add vgscan before block device readahead change
(flush long running process - clvmd - dev cache.)
When PV device reappears with old metadata, it is
always updated to new version byt atutomatic metadata
repair.
Remove missing flag if device is empty.
If device contains allocated extents, issue warning that
user must remove volumes and re-add this PV before
manipulating with this volume.
This partially solves bug 547842 when one PV (log) is failed,
dmeventd removes that device and later this device reappears and
is wrongly added into VG marked missing.
The memlock_inc() fix is wrong, memlock count is not
propagated to long living process (clvmd) and just
it underflow there.
Also suspend is needed to pre-load precommited metadata
on other nodes (remapping to error taget in this case).
With explicit suspend we generate lock request and code
can update memlock count.
(Infinitely "locked" memory caused that fs_unlock() was not
called properly and on cluster nodes remains
old links in /dev/mapper for not active devices.)
(N.B. failing of suspend call here is not handled as fatal
error - the LV is going to be removed later anyway.)
The new recovery code first tries to repair LV and then removes failed PV
from VG. It means that during operation there can be VG with PV missing,
and vg_read code handles it like not consistent VG.
We already allows returning "inconsistent" commited metadata,
for mirror repair we need this for precommited too.
(The suspend call prepares precommited metadata to inactive table on
other cluster nodes.)
"Inconsistent" here means - correct metadata, just with some metadata areas
not found (obviously on missing or failed PVs).
The LV locks make sense only for clustered LVs.
Properly check cluster flag and never issue cluster lock here.
There are several places in code, where it is already checked, this
patch add this check to all needed calls.
In previous code the lock behaviour was inconsistent,
for example, the pre/post callback can take lock even for local volume,
but deactivate call do not released this lock and it remains held forever.
The local LV lock request now just let run the underlying activation code
on local node, the same process like in local locking.
(Again, this is important for new mirror repair calls, here for local
mirrors but with cluster locking enabled.)
This is unnoticed regression from commit 31672ff60e
The pre/post callback need to convert lock always, local node
is going to modify metadata in this case, it it fails conversion,
the call is ignored.
Also it fixes bug when the lock is not yet held, we cannot set LKF_CONVERT
in this case, it will fail because this lock do not exist.
Note that the automatic conversion is still disabled in activate
call, so the original fix (reactivation of exlusive LV) should
be still in place.
(Code already not fail if unlocking not locked resource.)
This is needed in pre/post lock_lv call, where we can
request the same lock on local node becuase of suspend call.
- do_command and lock_vg expect flags (no change here)
Bug fixes:
- lock_vg should check for NONBLOCK on lock_cmd, flags have this bit masked-out
- do_pre/post_command expect do not mask flag at all, this causes that
the code inside is never run! (see following patches, these functions
expect plain command without flags)
Patch should not cause any problems, only real change is
removing LCK_LOCAL bit from lock type flag, it is never used there.
(LCK_LOCAL is part arg[1] bits anyway.)
If there is problem deactivate LV and
_init_mirror_log is called with remove_on_failure = 1,
remove the newly created log LV from metadata.
(This can happen if there is active device with the same name
but different UUID.)
The main reason for this "workaround" patch is to
- do not keep _mlog volume in metadata, so user can repeat the action
- print better error message describing the real problem
# lvcreate -m 2 -n lv1 -l 1 --nosync vg_bar
WARNING: New mirror won't be synchronised. Don't read what you didn't write!
/dev/vg_bar/lv1_mlog: not found: device not cleared
Aborting. Failed to wipe mirror log.
Error locking on node bar-01: Input/output error
Unable to deactivate mirror log LV. Manual intervention required.
Failed to create mirror log.
# lvcreate -m 2 -n lv1 -l 1 --nosync vg_bar
WARNING: New mirror won't be synchronised. Don't read what you didn't write!
Aborting. Unable to deactivate mirror log.
Failed to initialise mirror log.
I see "Deactivated" message when I activate and "Activated" message when
I deactivate. The code uses "activate" as boolean but it can be any one
of the enum values from CHANGE_AY, CHANGE_AN, CHANGE_AE, etc.
Signed-off-by: Malahal Naineni <malahal@us.ibm.com>
Acked-by: Dave Wysochanski <dwysocha@redhat.com>
There's a new change udev event generated since kernel 2.6.32 that
notifies userspace about a change in read-only attribute for block
devices (with DISK_RO=1 environment variable set).
We need to detect this and disable the rule application so the
meaning of this change event is not interchanged with the regular
change event used while resuming/renaming DM devices.
If there's anybody awaiting this notification in foreign rules,
he can still check for this env var and do the appropriate actions
separately.
At this point they probably do not matter but going forward they
may - depends on future patches for replicator, etc. I think
these probably got missed because they were 'flags' so I changed
the name to 'status' to be consistent. So the on-disk
things 'flags' and the in structure 'status' (bits).
NOTE: WHATS_NEW already has entry for this in current release.
Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
Acked-by: Mike Snitzer <snitzer@redhat.com>
pvmove suspends all moved LVs + pvmoveX mirrored LV itself.
This suspends even underlying pvmoveX and following explicit
suspend call is just noop.
But in resume the pvmoveX volume is no longer underlying
device for moved LVs, so it performs full resume with memlock
decrease.
Code must call memlock_inc() if suspend is requested, volume
is already suspended and error is not requested.
These are no longer used by anyone. The dm_list defines are all in
libdevmapper.h and libdm/datastruct/list.c contains any function definitions.
There is some code in "old-tests" that still use this but this code is not
being maintained.
Thanks to Zdenek for spotting this.
The physical_volume, volume_group, logical_volume and lv_segment
structures' 'status' member is now uint64_t.
The alignment of these structures was also audited to remove holes. The
movement of some members in 'volume_group' and 'lv_segment' eliminates
holes. The 'physical_volume' structure still has one 4-byte hole after
'pe_size'; the other structures no longer have any holes. Each
structures' size has not changed.
If the vg_read() returned error, no lock was taken,
so always call vg_release().
Otherwise this can happen because of missing FAILED_*:
# vgchange -a y x --ignorelockingfailure
Volume group "x" not found
Internal error: Attempt to unlock unlocked VG x
The sysfs filter initialise hash of available devices using
scan of /sys/block. We need to refresh even this hash
when performing full scan otherwise the newly appeared
device could be rejected, because there is no entry
in sysfs filter.
This easily could happen when attaching new device
to cluster node. (Only force refresh of context
in clvmd -R works here now).
Unfortunately consequences of this are much worse,
missing device part on that node is replaced with missing segment
(even when no partial arg is selected) and this directly
lead to data corruption.
See https://bugzilla.redhat.com/show_bug.cgi?id=538515
Simply fix it by refreshing device filters in lvmcache
before performing the full device scan.
(on one node a storage connection failed):
# vgchange -a y vg_bar ; echo $?
Error locking on node bar-02: Refusing activation of partial LV lv1. Use --partial to override.
1 logical volume(s) in volume group "vg_bar" now active
0
So activation fails on one node, error is correctly printed but
status code is wrong.
This patch fixes the top level (vgchange) to return proper code
(and print # of activated LVs).
(lvchange returns error properly here.)
(This affects only cluster locking because only cluster
locking module set LCK_PRE_MEMLOCK.)
With currect code you get
# vgchange -a n
Internal error: _memlock_count has dropped below 0.
when using cluster locking.
It is caused by _unlock_memory calls here
if ((flags & (LCK_SCOPE_MASK | LCK_TYPE_MASK)) == LCK_LV_RESUME)
memlock_dec();
Unfortunately it is also (wrongly) called in immediate unlock
(when LCK_HOLD is not set) from lock_vol
(LCK_UNLOCK is misinterpreted as LCK_LV_RESUME).
Avoid this by comparing original flags and provide memlock
code type of operation (suspend/resume).
All hidden (not visible) volumes should be activated through
other visible volumes.
(There are already exceptions like snapshot, mirror log and image,
which should be cleaned one day...)
This solves problems for future types of hidden volumes,
which can have special meaning and must not be activated implicitly
(e.g. key store volume).
This provides better support for environments where udev rules are installed
but udev_sync is not compiled in (however, using udev_sync is highly
recommended). It also provides consistent and expected functionality even
when '--noudevsync' option is used.
There is still requirement for kernel >= 2.6.31 for the flags to work though
(it uses DM cookies to pass the flags into the kernel and set them in udev
event environment that we can read in udev rules).
'last_rule' option has been removed from udev (version >= 147).
From now on, we require foreign rules to check and honor
ENV{DM_UDEV_DISABLE_OTHER_RULES_FLAG} instead. Foreign
rules should be skipped totally when this flag is set.
- fix missing unlocking of VG
lvcreate -l 100%PVS -n lv1 vg_test
Please specify physical volume(s) with %PVS
Internal error: Volume Group vg_test was not unlocked
- if no PVS specified, use all available
Fix segfault if %PVS in lvresize without PVs list.
indented metadata lines.
Macro outnl() is using exported out_newline() instead of direct
call f->fn(), that required the visibility of the internal
struct formatter.
Rename fill_default_pvcreate_params to pvcreate_params_set_defaults.
Rename pvcreate_validate_restore_params to pvcreate_restore_params_validate.
Rename pvcreate_validate_params to pvcreate_params_validate.
- add copyright notice for 10-dm.rules.in,
- set DM_UDEV_DISABLE_{DISK, OTHER}_RULES_FLAG in 11-dm-lvm.rules directly
for inappropriate and non-top-level subdevices in case we use older kernels
where DM_COOKIE is not used (and therefore there are no flags passed from
the LVM process itself). This applies for older kernels (version < 2.6.31),
- remove unnecessary filters in 95-dm-notify.rules - the DM_COOKIE env var
itself is set for change/remove udev events and for DM devices only so
there's no need to double-check this.
Similar to other vg_set_* functions, we create a vg_set_clustered() function
which does a few checks and sets a flag. This is where we check for
any limitations of clusters.
The DRBD uses underlying device so code should prefer top
device if duplicate is found.
Patch also introduce
dev_subsystem_part_major and dev_subsytem_name
functions to easily handle all these replication susbystems
and not hardcode md_major call.
See https://bugzilla.redhat.com/show_bug.cgi?id=530881
for full problem description.
Option --all is only partially documented currently, so document in all
commands. Also make {pv|vg|lv}{display|s} man pages consistent with help
output. Remove ununsed 'disk_ARG' parameter. Leave --trustcache out of
the man page output. Update --units argument to show all possible units.
- we have these levels when the udev rules are processed:
10-dm.rules --> [11-dm-<subsystem>.rules] --> [12-dm-permissions.rules] -->
13-dm-disk.rules --> [...all the other foreign rules...] --> 95-dm-notify.rules
- each level can be disabled now by
DM_UDEV_DISABLE_{DM, SUBSYSTEM, DISK, OTHER}_RULES_FLAG
- add DM_UDEV_DISABLE_DM_RULES_FLAG to disable 10-dm.rules
- add DM_UDEV_DISABLE_OTHER_RULES_FLAG to disable all the other (non-dm) rules.
We cutoff these rules by using the 'last_rule', so this one should really be
used with great care and in well-founded situations. We use this for lvm's
hidden and layer devices now.
- add a parameter for add_dev_node, rm_dev_node and rename_dev_node so it's
possible to switch on/off udev checks
- use DM_UDEV_DISABLE_DM_RULES_FLAG and DM_UDEV_DISABLE_SUBSYSTEM_RULES_FLAG
if there's no cookie set and we have resume, remove and rename ioctl.
This could happen when someone uses the libdevmapper that is compiled with
udev_sync but the software does not make use of it. This way we can switch
off the rules and fallback to libdevmapper node creation so there's no
udev/libdevmapper race.
- remove default permissions set in 95-dm-notify.rules (and add a hint in 12-dm-permissions.rules to set it by the user directly)
- add multipath DM_ACTION=="PATH_FAILED" filter
- remove unnecessary filters in the headers of the rules (we can simply use DM_UDEV_RULES_VSN instead)
- fix symlink priorities in /dev/disk/ (snapshot volumes have low priority for FS UUID symlinks so it will not overwrite symlinks for the origin)
[root@xxxx-01 ~]# lvconvert -m 1 --corelog VG/cmirror
Unable to convert the log of inactive cluster mirror cmirror
I've tried to clean-up the message a little more, so the name
of the mirror stands out more while preserving the sense that
it's not a problem with the specific device, but the fact that
it is inactive that is causing the problem.
New msg:
Unable to convert the log of an inactive cluster mirror, cmirror
Per discussion on lvm-devel mailing list and part of debian patch set,
don't set defaults for owner and group, since nobody seems to use them, and
still allow override.
Going forward, we would like to allow users to specify the total
number of metadatacopies in a VG rather than on a per-PV basis. In
order to facilitate that, introduce --pvmetadatacopes to replace
--metadatacopies everywhere. We still allow --metadatacopies for
pv commands, but require --pvmetadatacopies for vg commands.
Eventually we will introduce --vgmetadatacopies. Once we do that,
we should either deprecate --metadatacopies or make it a synonym based
on the command (pvmetadatacopies for pv commands, and vgmetadatacopies
for vg commands). The latter option would likely just require a simple
'strncpy' check against cmd->command->name to qualify the merge_synonym
call.
Update nightly tests to cover the pvmetadatacopies synonym.
Note that this patch is the result of an eariler review comment for
the implicit pvcreate patches. Should apply cleanly on top of the
implicit pvcreate patches (I applied after patch 10/10 in that series).
NOTE: This patch will require --pvmetadatacopies for vgconvert as
--metadatacopies is no longer accepted.
Going forward, we would like to allow users to specify the total
number of metadatacopies in a VG rather than on a per-PV basis. In
order to facilitate that, introduce --pvmetadatacopes to replace
--metadatacopies everywhere. We still allow --metadatacopies for
pv commands, but require --pvmetadatacopies for vg commands.
Eventually we will introduce --vgmetadatacopies. Once we do that,
we should either deprecate --metadatacopies or make it a synonym based
on the command (pvmetadatacopies for pv commands, and vgmetadatacopies
for vg commands). The latter option would likely just require a simple
'strncpy' check against cmd->command->name to qualify the merge_synonym
call.
Update nightly tests to cover the pvmetadatacopies synonym.
Note that this patch is the result of an eariler review comment for
the implicit pvcreate patches. Should apply cleanly on top of the
implicit pvcreate patches (I applied after patch 10/10 in that series).
NOTE: This patch will require --pvmetadatacopies for vgconvert as
--metadatacopies is no longer accepted.
vgcreate and vgextend now implicitly call pvcreate on devices not
currently initialized for LVM use. Previously these commands would
fail with an error, so clarify the new behavior in the man page.
Adds implicit pvcreate support when calling vgcreate or vgextend with
device paths that are not yet PVs. This changes the behavior of vgcreate
and vgextend from failing with an error message to implicitly pvcreating.
Split pvcreate_validate_params into recovery and non-recovery parameters.
This is necessary so we can call the non-recovery validate function from
vgextend / vgcreate. Note in the pvcreate tool case, we must call the
recovery validation function first (see treatment of pe_start and --zero),
and that we add a call to fill_default_pvcreate_params before the validation
functions.
We need defaults for pvcreate_params at a higher level - this will
allow us to use a common function from the tools to take defaults,
then fill in any non-defaults from the commandline.
Future patches will refactor vgcreate/vgextend to call this function
if one or more pvcreate parameters are given on the commandline.
Another refactoring for implicit pvcreate support. We need to get
the pvcreate parameters somehow to the vg_extend routine. Options
seemed to be:
1. Attach the parameters to struct volume_group. I personally
did not like this idea in most cases, though one could make an
agrument why it might be ok at least for some of the parameters
(e.g. metadatacopies).
2. Pass them in to the extend routine. This second route seemed
to be the best approach given the constraints.
Future patches will parse the command line and fill in the actual
values for the pvcreate_single call.
Should be no functional change.
Should be no functional change. If this parameter is set to NULL, just fail
the extend if the device is not already a PV. If non-NULL, try pvcreate_single
before failing. Note that pvcreate_single() handles the log_error in case
of failure so we just return 0 if pvcreate_single() fails.
is granted at one mode and an attempt to convert it wthout the LCK_CONVERT
flag set then it will return errno=EBUSY.
This fixes a pretty bad bug in which an LV could be activated exclusively on
one node and lvchange -ay on another would convert it to shared!
It might break some things in other areas, but I doubt it.
Now uppercase letters imply Si units, so use lowercase everywhere.
We could stay with uppercase, but then we'd have to deal with rounding, etc.
Also, some output / error messages change slightly (instead of "GB" we're
now saying "GiB").
One test enhancement might be to add some new tests for the units changes
but for now let's just get the test back to passing.
lv_deactivate now returns always success, because tree deactivation
functions (see dm_tree_deactivate_children) always returns success.
Because code should return failure in lv_deactivate at least,
fix it by checking for device existence after real deactivation call.
(After discussion this was prefered solution to dm tree function rewrite
which affects snapshots and mirrors.)
Add configure --enable-units-compat to set si_unit_consistency off by default.
Use standard output units for 'PE Size' and 'Stripe size' in pv/lvdisplay.
confuses me, so I've added a comment at the top of the function to
remind me of this.
I also found that 'mirror_emit_segment_line' was returning 0 (return_0)
on failure /and/ success. It now returns 1 for success and 0 for failure -
just like '_emit_areas_line' and its calling function, '_emit_segment_line'.
Clean up VG_RESIZEABLE flag by creating vg_is_resizeable().
Update comment - we no longer have ALLOW_RESIZEABLE.
Also use vg_is_exported() in one place missed by earlier patch.
Should be no functional change.
Remove the checks for vg_read_error() in most of the tools callback
functions and instead make the check in _process_one_vg() more general.
In all but vgcfgbackup, we do not want to proceed if we get any error
from vg_read(). In vgcfgbackup's case, we may proceed if the backup
is to proceed with inconsistent VGs. This is a special case though,
and we mark it with the READ_ALLOW_INCONSISTENT flag passed to
process_each_vg (and subsequently to _process_one_vg).
NOTE: More cleanup is needed in the vg_read_error() path cases.
This patch is a start.
This patch is all just cleanup and no other patch depends on it.
Replace explicit dereference and check with vg_is_exported().
Update a few copyrights and remove unnecessary whitespace.
Should be no functional change.
Of the vgs field vg_attr, a few of the most likely to be used attributes
are clustered, exported, and partial. This patch adds the following 3
functions:
uint64_t lvm_vg_is_clustered(const vg_t vg)
uint64_t lvm_vg_is_exported(const vg_t vg)
uint64_t lvm_vg_is_partial(const vg_t vg)
We don't have to call dm_cookie_supported with dm_udev_get_sync_support
this way. Also, it's necessary for the current code to work properly on
systems without cookie support (older kernels).
- add DM_UDEV_RULES_VSN to provide a variable to be checked for in the other
rules (e.g. to check that DM rules are actually installed, we can alternate
functionality in the other rules based on this information, also we have
versioning support for the rules)
- set proper sbin path for dmsetup and blkid, /sbin first, then /usr/sbin.
This is necessary for anaconda to work properly.
- add 'last_rule' for cryptsetup's temporary devices (symlinks in /dev/mapper
only)
The test/api directory TARGET line will be reserved for non-interactive
unit tests. Building the interactive test can still be done with "make test"
from the test/api dir.
When clogd was renamed to cmirrord, somehow git got the remove of the old
files but not the add of the new files. This patch adds the new files.
Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
The test/api directory TARGET line will be reserved for non-interactive
unit tests. Building the interactive test can still be done with "make test"
from the test/api dir.
Now that we've refactored the internal library functions that do the
vg_remove, we can handle the deferred commit of a lvm_vg_remove() inside
lvm_vg_write(). This makes the VG create/remove API more consistent in
terms of disk commits - they now both require an lvm_vg_write() to commit
the create or remove to disk.
Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
Author: Dave Wysochanski <dwysocha@redhat.com>
Now that we've split vg_remove_single into two routines, in the first routine
that only manipulates memory, we move the PVs from the vg->pvs list to the
vg->removed_pvs list. Then later, we iterate through this list to write the
removed PVs to disk, which removes them from the volume group and places them
into the internal ORPHAN VG.
Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
Author: Dave Wysochanski <dwysocha@redhat.com>
Split vg_remove_single into vg_remove_check (mandatory checks before
vgremove) and vg_remove (do actual remove by committing to disk).
In liblvm, we'd like to provide an consistent API that allows multiple
changes in memory, then let lvm_vg_write() control the commit to disk. In
some cases (for example, lvresize calls fsadm) this may not be possible.
However, since we are using an object model and dividing things into small
operations, the most logical model seems to be the lvm_vg_write model, and
handling the special cases as they arrive. So as best as possible
we move towards this end.
A possible optimization would be to consolidate vg_remove (committing)
code with vgreduce code. A second possible optimization is making vgreduce
of the last device equivalent to vgremove. Today, lvm_vg_reduce fails if
vgreduce is called with the last device, but from an object model perspective
we could view this as equivalent to vgremove and allow it. My gut feel is
we do not want to do this though.
Author: Dave Wysochanski <dwysocha@redhat.com>
Later patches should consolidate the vgremove / vgreduce functions but for
now let's clarify what vg_remove actually does by changing the name.
Author: Dave Wysochanski <dwysocha@redhat.com>
Add a new constraint that vgname locks must be obtained in
alphabetical order. At this point, we have test coverage for
the 3 commands affected - vgsplit, vgmerge, and vgrename.
Tests have been updated to cover these commands.
Going forward any command or library call that must obtain
more than one vgname lock must do so in alphabetical order.
Future patches will update lvm2app to enforce this ordering.
Author: Dave Wysochanski <dwysocha@redhat.com>
These functions are really identical but for clarity I made them separate
functions in this patch.
Should be no functional change.
Author: Dave Wysochanski <dwysocha@redhat.com>
If the destination vgname comes before the source vgname, we must open the
destination first because of the locking rules. Thus, do a strcmp and set
the flag based on the comparison.
Author: Dave Wysochanski <dwysocha@redhat.com>
Slight functional change. If we open the destination first, we cannot
know the 'fmt'. In this case we use the default metadata type unless
the user has specified -M on the cmdline. If not, in most cases this
is fine since we use the LVM2 default metadata type. However, if the
user is specifying a non-default metadata type (e.g. lvm1) and the order
of the names is such that we have to open the destination (vg_to) first,
we have a problem. So in this case, we require the use of -M and vgsplit
will fail with an error if not. I've updated the man page to recommend
the usage of -M in this case.
Author: Dave Wysochanski <dwysocha@redhat.com>
Move the creating/opening of the destination vg into its own function so later
we can reorder the source / destination vg opening based on the alphabetical
lock order rule.
Should be no functional change but code is a bit tricky.
Author: Dave Wysochanski <dwysocha@redhat.com>
Introduce 'lock_vg_from_first' flag to retain which vg was locked first.
Should be no functional change.
Author: Dave Wysochanski <dwysocha@redhat.com>
This will aid in future refactorings and allow for us to reorder the source
and destination vg based on alphabetical names.
Should be no functional change.
Author: Dave Wysochanski <dwysocha@redhat.com>
interface it should be using, it can still be overriden with -I.
If corosync isn't running or there is no information then the usual
checking will happen.
This code only builds if corosync is available.
# pvcreate -u udwxr7-BoKY-EeKM-r033-xK6o-4og7-F13sGi /dev/sdc
uuid udwxr7BoKYEeKMr033xK6o4og7F13sGi|��� already in use on "/dev/sdb1"
is now
# pvcreate -u udwxr7-BoKY-EeKM-r033-xK6o-4og7-F13sGi /dev/sdc
uuid udwxr7-BoKY-EeKM-r033-xK6o-4og7-F13sGi already in use on "/dev/sdb1"
Eliminate busy loop during pvcreate of a "normal" partition.
_md_sysfs_attribute_snprintf() would busy loop if the device it was
given was not a blkext-based MD partition.
Rather than being cute with a busy-loop prone 'goto check_md_major' in
_md_sysfs_attribute_snprintf(): explicitly check if the provided device
is a blkext-based partition (blkext_major()); and then check that the
get_primary_dev() determined parent is an MD device (md_major()).
The device-mapper mirror CTR table has been changing over time. This has
now been corrected to handle the old and new methods for invoking the
'block_on_errors' and 'cluster' features. (The code that does this was
accidentally committed in the previous check-in. This check-in finishes
the job.)
This check-in includes the touch-ups, make file changes, copyrights,
and other necessities to include the cluster log daemon into the
build system.
[autoconf still needs to be run to generate the 'configure' and
'Makefile' files.]
Eliminate dependency on outside library, since the same functionality
exists in our tree.
[It is important that the bitops work in the same way, as the bitmaps
must remain backwards compatible. I haven't tested every architecture,
but the x86* archs work. My test involved using the old ext2fsprogs
bitops, memcpy'ing the bits over to the LVM bitset array and ensuring
that only the bits set via the old methods were set.]
This patch update pv_t handle to be consistent with lvm_t - define as a pointer
to internal struct physical_volume.
Author: Dave Wysochanski <dwysocha@redhat.com>
This patch update lv_t handle to be consistent with lvm_t - define as a pointer
to internal struct logical_volume.
Author: Dave Wysochanski <dwysocha@redhat.com>
This patch update vg_t handle to be consistent with lvm_t - define as a pointer
to internal struct volume_group.
Author: Dave Wysochanski <dwysocha@redhat.com>
Full changes
- Fix vgextend error path when lock_vol(VG_ORPHANS) fails
- Move lock_vol(VG_ORPHANS) before archive(vg) - safe & simpler error paths
- Remove legacy comment/code that no longer applies
Found in review - Milan Broz <mbroz@redhat.com>
Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
The changes to remove LCK_NONBLOCK from the LVM locks broke clvmd because the
code was clearly wrong but working anyway! The constant was being masked rather
than the variable that was supposed to match against it.
Rename private _primary_dev() to a public get_primary_dev() and reuse it
to allow retrieval of the MD sysfs attributes (raid level, etc) for MD
partitions.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Improve lib/device/device.c:_primary_dev()'s ability to look up the
primary device associated with all partitions; including blkext
(e.g. partitions directly on MD). The same will also work for obscure
sysfs paths; e.g.: paths with mangled names like the HP cciss driver
uses: /sys/block/cciss!c0d0/cciss!c0d0p1/
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Adds 'data_alignment_detection' config option to the devices section of
lvm.conf. If your kernel provides topology information in sysfs (linux
>= 2.6.31) for the Physical Volume, the start of data area will be
aligned on a multiple of the ’minimum_io_size’ or ’optimal_io_size’
exposed in sysfs.
minimum_io_size is used if optimal_io_size is undefined (0). If both
md_chunk_alignment and data_alignment_detection are enabled the result
of data_alignment_detection is used.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
If the pvcreate --dataalignmentoffset option is not specified the start
of a PV's aligned data area will be shifted by the associated
'alignment_offset' exposed in sysfs (unless
devices/data_alignment_offset_detection is disabled in lvm.conf).
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Documented which use-cases force the reinstatement of the nuanced
handling of pe_start. As soon as orphan PVs are eliminated much of this
will no longer be a concern ('preserve_pe_start' can be reenabled in
.pv_setup).
Added defensive 'if (pv->pe_align)' check in _text_pv_write()'s pe_start
loop.
If pv_setup was given a non-zero pe_start it would short-circuit
establishing a default pv->pe_align. pv->pe_align=0 would result
in a divide by zero in _mda_setup(). 'vgconvert -M2 $vgname' hit this.
.pv_write still properly preserves pe_start if it was supplied.
Adds pe_align_offset to 'struct physical_volume'; is initialized with
set_pe_align_offset(). After pe_start is established pe_align_offset is
added to it.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
areas.
This preserved pe_start would quickly be readjusted to follow the first
mda anyway. An example use-case that hit this code path is: running
pvcreate on an already existing PV _without_ a preceeding pvremove.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Without this fix rounding the end of the first mda to a pe_align
boundary could silently exceed the disk_size.
Final 'if (start1 + mda_size1 > disk_size)' block serves as a safety
net.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Document existing pe_start policy.
Fix issue in _text_pv_setup() where existing pe_start case could have
the pv->pe_start set to pv->pe_align even though pe_start shouldn't ever
change.
vgconvert and pvcreate have a facility to preserve the existing start
of the on-disk data extents, known as pe_start.
They indicate this by passing the existing value to the pvsetup function
which must preserve it.
This patch avoids one particular case where the value could get
changed incorrectly now that the alignment settings are configurable.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
A patch to the kernel, adding the 'luid' field to dm_ulog_request,
will allow us to properly identify log instances. We will now
be able to definitively identify which logs are to be removed/
suspended/resumed. This replaces the old faulty behavior of
assuming the logs were the same if they had the same UUID and
incrementing/decrementing a reference count.
For now, a simple way to enforce the read/write semantics is to just save the
open mode of the VG. If the caller uses lvm_vg_create, the mode is write.
The caller using lvm_vg_open can use either read or write to open the VG.
Once we have this, we enforce the permissions on each API call and don't allow
a caller to modify a VG that has not been opened properly.
This may be better combined with the locking mode, but I view that as future
cleanup, past this initial release. The intial release should enforce the
basic object semantics though, as described in the lvm.h file.
Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
Adding the ability to get the seqno is important for an application to
determine if something has changed in a VG. Otherwise, the only way to
know is to open the VG with write permission and hold the handle.
This addresses a a large amount of Alasdair's review. Subsequent patches
will address remaining issues.
Addressed:
// FIXME Mention that's also required on error.
// FIXME Be consistent in terminology. It's called "system_dir" then last sentence says "system directory setting". Is it referring to "system_dir" there or something else?
// FIXME Mention it frees all resources and cannot be used subsequently?
// FIXME What does "any system configuration" mean?
// FIXME Expand on that explanation a bit, now that we know what the other fns look like.
// FIXME Not sure about that - it needs to scan sometimes. "will not" or "might not" ?
// FIXME: That's a FIXME in the code!!!
// FIXME What does "copied" mean in this context???
// FIXME Say what struct the returned struct dm_list is a list of...
// FIXME "This API" ? This function creates an object in memory?
// FIXME This function commits the Volume Group object referenced by the VG handle to disk?
// FIXME Where is "Name" defined? Absolute pathname?
Outstanding:
// FIXME Version function first? No structs or handles needed for that.
// FIXME Sort out this alignment. "Set an" directly below "system_dir" looks awful. Indent differently? More blank lines?
// FIXME Check how doxygen processes this. E.g. "return: LVM handle. You must use lvm_error() to check there were no errors and confirm that the handle is valid for passing to other functions."
// FIXME Find a better name. lvm_init.
// FIXME Consider renaming according to the new name for lvm_create.
// FIXME Please can we use dm_malloc throughout?
Allowing the caller to override the LVM configuration with an API will
enable them to use things such as device filters.
While very flexible, there is some danger to this API in that it will
make it harder to debug setups that have a changing config and deduce
what might have happened. At some point we may want to limit the scope
of this API but for now it is as open as the --config option to lvm commands.
Update exported symbols. When I renamed lvm_reload_config to lvm_config_reload
I forgot to rename so I renamed that one here.
This I believe is the last liblvm API for now. ;-)
Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
These lower-priority interfaces are not currently implemented in liblvm
but are on the TODO list in the near term.
Author: Thomas Woerner <twoerner@redhat.com>
Acked-by: Dave Wysochanski <dwysocha@redhat.com>
Extend lvm_vg_write to remove pvs removed in lvm_vg_reduce. The lvm
volume_group internal structure removed_pvs is used for that. The list is
empty afterwards.
Add new test for vg->pvs emptyness to lvm_vg_write to prevent to write empty
vgs. This is needed because of lvm_vg_reduce and lv_vg_create, which creates
empty vgs.
Signed-off-by: Thomas Woerner <twoerner@redhat.com>
Acked-by: Dave Wysochanski <dwysocha@redhat.com>
Author: Dave Wysochanski <dwysocha@redhat.com>
This function behaves a little bit different than vg_reduce_single, because
it allowes to remove even the latest pv. This has been done to be consistent
to lvm_vg_create, which creates an empty vg.
removed_pvs has been added to the volume_group struct. vg_reduce adds remove
pvs to this list to be able to commit the changes for the pvs in lvm_vg_comm
in liblvm2app.
Initialize removed_pvs list in format-specific volume_group constructors.
Ideally, we should have a base constructor here that initializes the general
non-format specific members of struct volume_group. But until then, there
are multiple places to initialize these members. Maybe a better patch would
be a base constructor patch for struct volume_group. That is more work
though.
Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
Signed-off-by: Thomas Woerner <twoerner@redhat.com>
Author: Dave Wysochanski <dwysocha@redhat.com>
The two liblvm functions that return a list of vgnames and vguuids use
cmd->mem to allocate the list. Make it clear to the caller that this
memory will be freed when the LVM handle is freed.
Clean up and clarify the return value of the functions. In the
case of a memory allocation error, add a couple log_errnos to the internal
code, and make it clear that memory allocation returns a NULL pointer.
If there are no VGs in the system, the list returned is an empty list.
Make a note of the fact that currently we return hidden VG names, how
these can be detected (always start with "#"), and that they should
not be used.
Author: Dave Wysochanski <dwysocha@redhat.com>
The general naming scheme for most liblvm APIs is:
lvm_<object>_<action>
As there are likely to be other things to do on the lvm 'config' object
(i.e. lvm_config_set_device_filter), we should use consistent naming.
Author: Dave Wysochanski <dwysocha@redhat.com>
Limited implementation but other types of activation should probably have
separate calls. We also currently do not handle pvmoves or lvconverts.
Author: Dave Wysochanski <dwysocha@redhat.com>
For now, liblvm will return -1 (fail) / 0 (success) or
NULL (fail) / non-NULL (success). Upon failure, lvm_errno and
lvm_errmsg should be used to determine the precise error.
Author: Dave Wysochanski <dwysocha@redhat.com>
For now, we use the following scheme.
For APIs that return an int, success is 0, fail is -1.
APIs that return handles, success is non-NULL, fail is NULL.
At this early stage, liblvm error handling mechanism is subject to change,
but for now we go with this simple scheme consistent with system
programming.
Author: Dave Wysochanski <dwysocha@redhat.com>
Hard to divide into smaller patches because of the moving around and
commenting, so just put in the new file diffs. We will review and can
update as necessary.
Author: Dave Wysochanski <dwysocha@redhat.com>
Add a very simple version of lvm_vg_remove_lv.
Since we currently can only create linear LVs, this simple remove function
is adequate. We must refactor lvremove_single a bit.
Author: Dave Wysochanski <dwysocha@redhat.com>
Add the most straightforward 'get' functions required for anaconda.
These are the ones that return simple uint64_t values.
The other more complex ones involve the lv_attr bits. These will
come in a separate patch series since each lv_attr bit will be returned
in a separate API instred of returning the string and requiring the
user to parse it.
Author: Dave Wysochanski <dwysocha@redhat.com>
For liblvm 'get' functions, we should share code with the reporting functions.
This means we need common code to return the values for the fields.
In this patch we refactor a few of the fields needed in liblvm.
Unfortunately, for the simple fields that do derefernces of structure
members (for example, vg_extent_count), we cannot call the common function
from the reporting infrastructure without more refactoring. The reason is
that the dereference of the simple fields is done deep inside the reporting
code (to get the generic "data" pointer), and the display function is a
generic 'size32' function. We can fix these issues later with more
refactoring.
Should be no functional change and the testsuite should cover any possible
regressions. The only fields in the report affected by this patch are:
vg_size, vg_free, and pv_mda_count.
Author: Dave Wysochanski <dwysocha@redhat.com>
After some refactorings, we can now move the bulk of _lvcreate into the
internal library, and we can call from liblvm. In the future, we should
refactor lv_create_single further, probably by segtype, to reduce the
size of struct lvcreate_params. For now this is a reasonable refactor
and allows us to re-use the function from liblvm.
Author: Dave Wysochanski <dwysocha@redhat.com>
The main _lvcreate function should deal with extents - the 'size' parameter
is just an intermediate step.
Should be no functional change.
Author: Dave Wysochanski <dwysocha@redhat.com>
Create a new structure, lvcreate_cmdline_params, to store parameters only
relevant for the cmdline, not the library call to lvcreate.
Should be no functional change.
Author: Dave Wysochanski <dwysocha@redhat.com>
Move extents calculation adjustments into their own local functions
right after we read the vg. This calculation really is not part of
the LV create function but is rather an adjustment to the parameters
based on what is given on the cmdline. So we move it outside the main
_lvcreate.
Should be no functional change.
Author: Dave Wysochanski <dwysocha@redhat.com>
A couple simple refactorings of _lvcreate - should be no functional change.
Move tags_ARG parsing into _lvcreate_params. Also use lp->voriginsize
instread of arg_count(). These refactorings make it easier to move the
bulk of _lvcreate into the library.
Author: Dave Wysochanski <dwysocha@redhat.com>
Although the tools do not currently do this, we update lvm_vg_extend
to do an implicit pvcreate on an uninitialized device. The tools will
soon be refactored to do this as well, but more work is needed in the
tools. For now we update lvm_vg_extend since this is the behavior
required by liblvm.
With this change, the simple liblvm unit test, test/api/vgtest.c
should pass whether or not the device is initialized.
Author: Dave Wysochanski <dwysocha@redhat.com>
The implicit pvcreate require either moving the ORPHAN_VG lock outside
pvcreate_single or somehow having the function know or detect whether
the ORPHAN_VG lock is already held.
Author: Dave Wysochanski <dwysocha@redhat.com>
Passing NULL for pvcreate parameters gives you default parameters for
pvcreate_single.
Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
Author: Dave Wysochanski <dwysocha@redhat.com>
In preparation for implicit pvcreate during vgcreate / vgextend,
move bulk of pvcreate logic inside library.
Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
Author: Dave Wysochanski <dwysocha@redhat.com>
We must hold the VG_ORPHAN lock until we commit to disk. Otherwise,
we risk a race condition on vgcreate / vgextend. Reverts the following
commit:
commit 72a41480ba
Author: Dave Wysochanski <dwysocha@redhat.com>
Date: Fri Jul 10 20:09:21 2009 +0000
Move orphan lock obtain/release inside vg_extend().
With this change we now have vgcreate/vgextend liblvm functions.
Note that this changes the lock order of the following functions as the
orphan lock is now obtained first. With our policy of non-blocking
second locks, this should not be a problem.
Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
The lvm_list_vg_{names|ids} functions do not do a scan so we provide
a liblvm function that does a scan.
Author: Dave Wysochanski <dwysocha@redhat.com>
These functions provide the capability of enumerating all vgnames and
vgids in the system. They do not do a scan of the system.
Author: Dave Wysochanski <dwysocha@redhat.com>
Caller must free the memory of the uuid / name returned.
This may not be the best memory management policy since it may lead to
memory leaks if the caller has code like this:
if (!lvm_vg_get_name(vg))
Maybe we don't care - if we do we can use pools tied to handles later
or some other scheme.
Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
Acked-by: Thomas Woerner <twoerner@redhat.com>
- Use vgmem pool to allocate a list of lvm_*_list structs
- Allocate a new list each call (list may have changed since last call)
- Add to liblvm's exported symbols
Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
Acked-by: Thomas Woerner <twoerner@redhat.com>
- pv_t, vg_t, lv_t
- include libdevmapper.h: needed for struct dm_list
These list structures will be needed in later APIs to return a list of
handles to one object, given another object. For example, lvm_vg_list_lvs()
will return a list of LV handles (lv_t's) given a VG handle (vg_t). We
need a structure to do this so we define the LV structure, as well as the
other structures at this point.
Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
Some of the error interface is still TBD. Rather than exporting a lot
of codes, etc, just use a simple pass / fail. The allows our unit test
to not segfault if trying to create a VG that already exists.
lvm_vg_open() calls internal vg_read() function which is the entry point
for reading an existing VG. In addition to the mode, we include a 'flags'
parameter for future extensions.
Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
Reverts some of my 'cleanup' from last night. For now we will use pass/fail
on API calls (either 'int' return or NULL/non-NULL handle), then use
lvm_errno() to get more specific errors.
Author: Dave Wysochanski <dwysocha@redhat.com>
made to compensate for the changes in the kernel-side component
that recently went upstream. (Things like: renamed structures,
removal of structure fields, and changes to arguments passed
between userspace and kernel.)
These messages are unnecessary in the set functions. We check for this
condition and print a message in the vgchange tool but not the library
functions.
Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
When converting to the new liblvm functions, the vgcreate code path
changed to create a new vg, then set values. As a result of this
change, and the fact that we give a user a message if they try to
set the same value of a VG attribute (extent_size, alloc_policy, etc),
you'll see these 2 extraneous "is already" messages with vgcreate:
tools/lvm vgcreate vg2 /dev/loop2
Physical extent size of VG vg2 is already 4.00 MB
Volume group allocation policy is already normal
Volume group "vg2" successfully created
Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
Author: Dave Wysochanski <dwysocha@redhat.com>
Since we are using errno values, we should use '0' as a default value
which indicates a non-error, rather than defining some made-up default
value that is not defined in errno. If we need to deviate from errno
values, this will most likely indicate a flaw in our design.
Add prototypes for lvm_errno and lvm_errmsg inside lvm.h and provide
a basic description of their function. This fixes a couple compile
warnings.
Author: Dave Wysochanski <dwysocha@redhat.com>
We provide a lock type that behaves like no_locking, but is not
clustered. Moreover, it also forbids any write locks. This magically (and
consistently) prevents use of clustered VGs, or changing local VGs with
--ignorelockingfailure. As a bonus, we can remove the special hacks in a few
places. Of course, people looking for trouble can always set their locking_type
to 0 to override.
In _process_one_vg, we should never proceed if the VG read fails with certain
conditions. If we cannot allocate or construct the volume_group structure,
we should not proceed - this is true regardless of the tool calling the
iterator. In other cases, when the volume group structure is constructed but
there is some error (PVs missing, metadata corrupted, etc), some tools may
want to process the VG while others may not.
Author: Dave Wysochanski <dwysocha@redhat.com>
In vg_backup_single, we should error out if we vg_read_error(vg) and the
error code we received was anything other than FAILED_INCONSISTENT.
Original code contained an error because C operator precedence.
Note - this was part of the vg_read() so no WHATS_NEW entry neceesary.
Author: Dave Wysochanski <dwysocha@redhat.com>
liblvm unit test case uses the following APIs:
- lvm_create, lvm_destroy
- lvm_vg_create, lvm_vg_extend, lvm_vg_set_extent_size, lvm_vg_write,
lvm_vg_remove, lvm_vg_close
Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
Acked-by: Alasdair G Kergon <agk@redhat.com>
Add the following VG APIs to liblvm/.exported_symbols:
-lvm_reload_config
-lvm_vg_create
-lvm_vg_extend
-lvm_vg_set_extent_size
-lvm_vg_write
-lvm_vg_close
-lvm_vg_remove
Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
Acked-by: Alasdair G Kergon <agk@redhat.com>
Add some liblvm APIs for VGs. Most of these APIs simply call into the internal
liblvm library. Ideally we should call the liblvm functions directly from
the tools. However, until we convert more of the code to liblvm functions,
things like the cmd_context will get in the way. For now just implement the
liblvm functions as wrappers around the internal functions, with a little
error checking and return code handling. We put all these vg APIs into a
new file, lvm_vg.c
The following APIs are implemented:
lvm_vg_create, lvm_vg_extend, lvm_vg_set_extent_size, lvm_vg_write,
lvm_vg_remove, lvm_vg_close.
Still TODO:
- cleanup error handling by using lvm_errno() and related APIs
- cleanup naming / clarify which functions commit to disk vs not
- implement more 'set' functions
- decide on 'set' / 'change' nomenclature
Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
Acked-by: Alasdair G Kergon <agk@redhat.com>
This needs initialized to non-NULL before using the archive() call.
Normally this is set to the cmdline string when lvm is called from a tool.
We could think about using it in another way, as a potential audit trail
of liblvm calls, or just leave it set to the default "liblvm", similar to
what clvmd does. For now, just set it to "liblvm".
Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
Acked-by: Alasdair G Kergon <agk@redhat.com>
Define the 5 main liblvm objects to be the pv, vg, lv, lvseg, and pvseg.
We need handles defined to all these objects in order for liblvm to be
equivalent to the reporting commands pvs, vgs, and lvs.
- move vg_t, lv_t, and pv_t from metadata-exported.h into lvm.h
- move lv_segment and pv_segment forward declarations into lvm.h
- add lvseg_t and pvseg_t to lvm.h
NOTE: We currently have an inconsistency in handle definitions.
lvm_t is defined as a pointer, while these other handles are just
structures. We should pick one scheme and be consistent - perhaps
define all handles as pointers (this is what I've seen elsewhere).
Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
Acked-by: Alasdair G Kergon <agk@redhat.com>
The checks for RESIZEABLE_VG should now be inside the various functions that
have to do such operations.
Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
Acked-by: Alasdair G Kergon <agk@redhat.com>
Remove READ_REQUIRE_RESIZEABLE flag from vgsplit similar to the removal from
vgextend. Move the check inside the functions that actually move pvs from
one vg structure to another. Should be no functional change.
Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
Acked-by: Alasdair G Kergon <agk@redhat.com>
In the future we may export these functions or something like them in liblvm
For now this helps in cleaning up the checks for RESIZEABLE since we can
use the internal library function vg_bad_status_bits.
Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
Acked-by: Alasdair G Kergon <agk@redhat.com>
Move the check for the RESIZEABLE flag inside the vg_extend function.
When we consolidated the vg locking, reading, and status flag checking,
we tied the check for the RESIZEABLE flag to the vg_read() call. The problem
with this is you cannot know what other APIs the application my or may not
call after a vg_read() call. Thus the READ_REQUIRE_RESIZEABLE flag is not
really ideal - ideally we should be checking for this flag on a specific
operation, not inside the vg_read() call. This patch moves one check inside
the library.
Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
Acked-by: Alasdair G Kergon <agk@redhat.com>
xlate64 produces unsigned long long type, but PRIu64 is defined
to accept argument unsigned long type (on 64-bit machines).
On existing machines, both types have the same size, so it works,
but it is still wrong and produces a warning.
Fix it by using a cast to uint64_t --- according to the standard,
PRIu64 argument matches type uint64_t.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Orphan lock is now obtained second and released first, and all tools
are consistent in this regard.
Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
With this change we now have vgcreate/vgextend liblvm functions.
Note that this changes the lock order of the following functions as the
orphan lock is now obtained first. With our policy of non-blocking
second locks, this should not be a problem.
Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
Move the vg orphan lock inside vg_remove_single, now a complete liblvm
function. Note that this changes the order of the locks - originally
VG_ORPHAN was obtained first, then the vgname lock. With the current
policy of non-blocking second locks, this could mean we get a failure
obtaining the orphan lock. In the case of a vg with lvs being removed,
this could result in the lvs being removed but not the vg. Such a
scenario could have happened prior though with a different failure.
Other tools were examined for side-effects, and no major problems
were noted.
Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
Move check for active LVs outside of library function. The vgremove
liblvm function function will fail if there are active LVs. It will
be the application's responsibility to check this condition and remove
the LVs individually before calling vgremove. Note also that we've
duplicated the EXPORTED_VG check in vgremove_single (tools) and
vg_remove_single (library). Duplication seemed the only option here
since we don't want to do the automatic removal of LVs (in the tools)
if the vg is exported, and we still need to protect the library call
from removal if the vg is exported.
We still need to deal with the ORPHAN lock but vg_remove_single is now
very close to our liblvm function.
TODO: Refactor lvremove in a similar way.
Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
If there is syntax error in metadata, it now prints messages
like:
Couldn't read 'start_extent' for segment 'extent_count'.
Couldn't read all logical volumes for volume group vg_test.
The segment specification is wrong and confusing.
Patch fixes it by introducing "parent" member in config_node which
points to parent section and config_parent_name function, which
provides pointer to node section name.
Also it adds several LV references where possible.
vg_t *vg_create(struct cmd_context *cmd, const char *vg_name);
This is the first step towards the API called to create a VG.
Call vg_lock_newname() inside this function. Use _vg_make_handle()
where possible.
Now we have 2 ways to construct a volume group:
1) vg_read: Used when constructing an existing VG from disks
2) vg_create: Used when constructing a new VG
Both of these interfaces obtain a lock, and return a vg_t *.
The usage of _vg_make_handle() inside vg_create() doesn't fit
perfectly but it's ok for now. Needs some cleanup though and I've
noted "FIXME" in the code.
Add the new vg_create() plus vg 'set' functions for non-default
VG parameters in the following tools:
- vgcreate: Fairly straightforward refactoring. We just moved
vg_lock_newname inside vg_create so we check the return via
vg_read_error.
- vgsplit: The refactoring here is a bit more tricky. Originally
we called vg_lock_newname and depending on the error code, we either
read the existing vg or created the new one. Now vg_create()
calls vg_lock_newname, so we first try to create the VG. If this
fails with FAILED_EXIST, we can then do the vg_read. If the
create succeeds, we check the input parameters and set any new
values on the VG.
TODO in future patches:
1. The VG_ORPHAN lock needs some thought. We may want to treat
this as any other VG, and require the application to obtain a handle
and pass it to other API calls (for example, vg_extend). Or,
we may find that hiding the VG_ORPHAN lock inside other APIs is
the way to go. I thought of placing the VG_ORPHAN lock inside
vg_create() and tying it to the vg handle, but was not certain
this was the right approach.
2. Cleanup error paths. Integrate vg_read_error() with vg_create and
vg_read* error codes and/or the new error APIs.
Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
NOTE: vg_set_alloc_policy() returns success if you try to set a value that
is already stored. The behavior of vgchange is the same though - it fails.
There is a fixme noted in the code about this inconsistency, which should
be resolved if possible.
Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
In liblvm, we will reserve the word 'change' to mean an API that
both sets one or more values, and commits to disk. This will be
consistent with the LVM commandline. The existing vg_change_pesize()
function does not commit to disk, but just changes the extent_size
and ensures all internal structures are updated. This logic should
be contained in a function that sets the value.
Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
It would be nice to have one function that does all the validation
and setting of the VG's pesize. However, currently some checks
are in the higher-level function _vgchange_pesize(), and some
checks are in the lower function vg_change_pesize().
This patch moves most of the higher-level checks inside
vg_change_pesize. In one case a failure return code is
changed from ECMD_FAILED to EINVALID_CMD_LINE.
Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
Remove unneeded LOCK_NONBLOCKING from vg_read() API and tools that
use it. We no longer need this flag anywhere since we now automatically
set LCK_NONBLOCK inside lock_vol() if vgs_locked().
For further details, see:
commit d52b3fd3fe
Author: Dave Wysochanski <dwysocha@redhat.com>
Date: Wed May 13 13:02:52 2009 +0000
Remove NON_BLOCKING lock flag from tools and set a policy to auto-set.
As a simplification to the tools and further liblvm, this patch pushes
the setting of NON_BLOCKING lock flag inside the lock_vol() call.
The policy we set is if any existing VGs are currently locked, we
set the NON_BLOCKING flag.
At some point it may make sense to add this flag back if we get an
RFE from a liblvm user, but for now let's keep it as simple as
possible.
Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
Remove READ_CHECK_EXISTENCE and vg_might_exist().
This flag and API is no longer used now that we have a separate
API to check for existence.
Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
Remove unneeded LOCK_KEEP from vg_read() interface.
Update comment to clarify cases where _vg_lock_and_read() may return
with an error but the lock held. Would be nice to make the vg_read()
interface consistent with regards to lock held and error behavior.
Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
Remove LOCK_KEEP and READ_CHECK_EXISTENCE from vgsplit.
These flags are no longer necessary. We now check for existence
in a differnet function, and it is not necessary to keep the lock.
Removing these flags simplifies the new vg_read() interface.
After this patch, we can fully remove LOCK_KEEP.
READ_CHECK_EXISTENCE needs a bit more work before full removal.
Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
Update units_to_bytes() to support (S)ectors: 500 bytes.
- 500 byte (S)ectors is of questionable value but it adds to consistency
if a user happens to use --units S. This seems better than an error.
Updated test/t-covercmd.sh to test --units [hS]
Document the units that can be displayed via --units uniformly.
- (p)etabytes and (e)xabytes were missing in pvs, vgs and lvs man pages.
Made lvreduce man page "... in units of megabytes." consistent (with the
lvextend and lvresize man pages).
Fix vg_read() error paths to properly release upon vg_read_error().
Note that in the iterator paths (process_each_*()), we release
inside the iterator so no individual cleanup is needed. However there
are a number of other places we missed the cleanup. Proper cleanup
when vg_read_error() is true should be calling vg_release(vg), since
there should be no locks held if we get an error (except in certain
special cases, which IMO we should work to remove from the code).
Unfortunately the testsuite is unable to detect these types of memory
leaks. Most of them can be easily seen if you try an operation
(e.g. lvcreate) with a volume group that does not exist. Error
message looks like this:
Volume group "vg2" not found
You have a memory leak (not released memory pool):
[0x1975eb8]
You have a memory leak (not released memory pool):
[0x1975eb8]
Author: Dave Wysochanski <dwysocha@redhat.com>
Sun May 3 12:54:28 CEST 2009 Petr Rockai <me@mornfall.net>
* Convert vgrename to vg_read_for_update.
Rebased 6/26/2009 - Dave W.
Author: Petr Rockai <prockai@redhat.com>
Committer: Dave Wysochanski <dwysocha@redhat.com>
Sun May 3 12:32:30 CEST 2009 Petr Rockai <me@mornfall.net>
* Rework the toollib interface (process_each_*) on top of new vg_read.
Rebased 6/26/09 by Dave W.
- Add skipping message to process_each_lv
- Remove inconsistent_t.
Sun May 3 11:40:51 CEST 2009 Petr Rockai <me@mornfall.net>
* Convert the straight instances of vg_lock_and_read to new vg_read(_for_update).
Rebased 6/26/09 by Dave W.
Sun May 3 11:40:51 CEST 2009 Petr Rockai <me@mornfall.net>
* Convert the straight instances of vg_lock_and_read to new vg_read(_for_update).
Author: Petr Rockai <prockai@redhat.com>
Committer: Dave Wysochanski <dwysocha@redhat.com>
Is an application uses query and set major:minor
to device, it should not fallback to default major by default.
Add new function whoich allows that (and use it in lvm2).
- validate the specified device is a PV and that it is in a VG
- automatically enable DEBUG (-d) if >= 4 -v instances were supplied
- preserve TMP_LVM_SYSTEM_DIR if it contains an lvm.conf and -d was
specified
- fix handling of special-case where PV is listed as "unknown device"
- more descriptive error when a PV is missing ("unknown device")
- unset LVM_SYSTEM_DIR if it was not originally set
- skip final vgscan if no changes were made
http://bugzilla.redhat.com/show_bug.cgi?id=500861
- Update list of fields/columns for each command (a few missing).
- Update list order to match "-o help" output (easier to verify field list)
- Add "{pv|vg|lv}_all" description.
- Move "-o help" sentence above column/field list.
New sample man page for lvs (pvs and vgs are similar):
-o, --options
Comma-separated ordered list of columns. Precede the list with ’+’ to append to the
default selection of columns instead of replacing it.
Use -o lv_all to select all logical volume columns, and -o seg_all to select all logical
segment columns.
Use -o help to view the full list of columns available.
Column names include: lv_uuid, lv_name, lv_attr, lv_major, lv_minor, lv_read_ahead,
lv_kernel_major, lv_kernel_minor, lv_kernel_read_ahead, lv_size, seg_count, origin, ori-
gin_size, snap_percent, copy_percent, move_pv, convert_lv, lv_tags, mirror_log, modules,
segtype, stripes, stripesize, regionsize, chunksize, seg_start, seg_start_pe, seg_size,
seg_tags, seg_pe_ranges, devices.
With --segments, any "seg_" prefixes are optional; otherwise any "lv_" prefixes are
optional. Columns mentioned in vgs (8) can also be chosen.
The lv_attr bits are:
E.g.
# vgscan
Parse error at byte 2360 (line 54): expected a value
Failed to load config file /etc/lvm/lvm.conf
You have a memory leak (not released memory pool):
[0x818c788] library (12 bytes)
...
status flag after reading the volume group. Thus, no need to set the flag
in vg_read() or clear it later before calling _vg_bad_status_bits().
Also, add back in the !lockingfailed() as part of the CLUSTERED flag check.
It's unclear why it was removed when the check was moved from
_vg_bad_status_bits() to inside _vg_lock_and_read().
There was an open question about the last check in the 'if' stmt for
lockingfailed() with a previous patch submitted. However, I would
defer that right now as it is a separate item and this patch should
be no functional change by including the !lockingfailed().
Petr acked this patch on 5/26 I just forgot to check it in.
Acked-by: Petr Rockai <prockai@redhat.com>
- it can support multiple segments, but note that
to work properly, correct IV (initialization vector)
offset parameter must be set properly.
Because most usage of IV start offset is when we join
several crypto segments together (so iv_offset is the segment
start offset), DM_CRYPT_IV_DEFAULT is defined to simplify
the process.
Function accepts the string in cipher agrument (already
including chainmode and iv type; chainmode and iv parameters are NULL
in this case) or user can provide split parameters which will
join into dm-crypt cipher specification "cipher-chainmode-iv".
All these parameters must be supplied in correct dm-crypt format.
Various tools need to check for existence of a VG before doing something
(vgsplit, vgrename, vgcreate). Currently we don't have an interface to
check for existence, but the existence check is part of the vg_read* call(s).
This patch is an attempt to pull out some of that functionality into a
separate function, and hopefully simplify our vg_read interface, and
move those patches along.
vg_lock_newname() is only concerned about checking whether a vg exists in
the system. Unfortunately, we cannot just scan the system, but we must first
obtain a lock. Since we are reserving a vgname, we take a WRITE lock on
the vgname. Once obtained, we scan the system to ensure the name does
not exist. The return codes and behavior is in the function header.
You might think of this function as similar to an open() call with
O_CREAT and O_EXCL flags (returns failure with -EEXIST if file already
exists).
NOTE: I think including the word "lock" in the function name is important,
as it clearly states the function obtains a lock and makes the code more
readable, especially when it comes to cleanup / unlocking. The ultimate
function name is somewhat open for debate though so later we may rename.
- PPC uses 64k page, some caculations are wrong in tests
- file utility is buggy on PPC and cannot detect swap, use blkid instead
- read ahead is quietly rounded down according to page size
(for now use some common divisor value in test)
Because preload of table for snapshot can produce snapshot
metadata (in kernel cow header) read.
Code should suspend origin first to avoid possible deadlock
when preloading (thus calling snapshot in-kernel constructor)
for origin with suspended cow device.
(fixes previous commit)
Several commands calls process_each_vg() and in provided
callback it explicitly recovers VG if inconsistent.
(vgchange, vgconvert, vgscan)
It means that old VG is released and reread but the function
above (process_one_vg) tries to unlock and release old VG.
Patch moves the repair logic into _process_one_vg() function.
It always tries to read vg (even inconsistent) and then decides
what to do according new defined parameter.
Also patch unifies inconsistent error messages.
The only slight change if for vgremove command, where
it now tries to repair VG before it removes if force arg is given.
(It works similar way before, just the order of operation changed).
When some parts of lvm are built as shared libraries (for example with
--with-snapshots=shared), the 'make' command does not build these parts.
The shared parts are built with 'make install' command.
This bug can be seen if you go to 'lib' subdirectory and type 'make'.
If you type 'make', the shared libraries are not built, if you type
'make all', the shared libraries are built.
The reason for the bug is the line $(SUBDIRS): $(LIB_STATIC)
If make is executed without any arguments, it makes the first target
in the Makefile. If the first target is '$(SUBDIRS): $(LIB_STATIC)',
it only builds static libraries.
This patch moves '$(SUBDIRS): $(LIB_STATIC)' after
include $(top_srcdir)/make.tmpl. make.tmpl contains the 'all' target
as its first target, so 'make' will be equivalent to 'make all' and
shared libraries will be build with 'make' command.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
When mirror convert polling is started (mainly as backgound process,
in lvchange -a y or in lvconvert itself) it tries to read VG
and LV identified by its name.
Unfortunatelly, the VG can have already different LV under the same name,
and various more or less funny things can happen (note that
_finish_lvconvert_mirror suspends the volume for example).
(the typical example is our testing script which continuously recreates
LVs under the same name in the same VG.)
This patch adds optional uuid parameter which helps to properly
select the monitoring object. For lvconvert polling it is set to LV UUID
and both _get_lvconvert_vg and _get_lvconvert_lv uses it to read proper VG/LV.
(In the pvmove case it is NULL, here we poll for physical volume name).
If there is no free area for log, code should break the loop.
(Otherwise it uses uninitializes areas later.)
Easily reproducible using lvconvert --repair
- kill device with log
- run lvconvert --repair vg/lv (with no PV usable for log)
During vgreduce is failed mirror image replaced with error segment,
this segmant type has always area_count == 0.
Current code expects that there is at least one area with device,
patch fixes it by additional check (fixes segfault during vgreduce).
Also do not calculate readahead in every lv_info call, we only need
to cache PV readahead before activation calls which locks memory.
Use lvconvert --repair in dmeventd mirror DSO.
for now.
It replaces bad behaviour in one set of circumstances with bad behaviour
in a different set. We think the behaviour needs to be more configurable.
When we are stacking LV over device, which has for some reason
increased read_ahead (e.g. MD RAID), the read_ahead hint
for libdevmapper is wrong (it is zero).
If the calculated read_ahead hint is zero, patch uses read_ahead of underlying device
(if first segment is PV) when setting DM_READ_AHEAD_MINIMUM_FLAG.
Because we are using dev-cache, it also store this value to cache for future use
(if several LVs are over one PV, BLKRAGET is called only once for underlying device.)
This should fix all the reamining problems with readahead mismatch reported
for DM over MD configurations (and similar cases).
Current code, when need to ensure that volume is not
active on remote node, it need to try to exclusive
activate volume.
Patch adds simple clvmd command which queries all nodes
for lock for given resource.
The lock type is returned in reply in text.
(But code currently uses CR and EX modes only.)
This means two things:
1) Non-mirrored LVs will be no longer affected by mirror monitoring. (Before,
if you had a LV that went partially missing on a VG where a mirror leg failed,
this LV would be removed automatically by dmeventd... Probably not an
unrecoverable dataloss bug, but still quite unpleasant.)
2) If enough parallel PV space is available at the time of the mirror failure,
the failed devices will be automatically replaced using this spare space. Which
(and whether) free space may be used is still not configurable, but is a
planned feature. Since it is relatively easy to undo the action by converting
the mirror manually, I don't consider this to be a showstopper. In fact, I
think the compromise is much better than what we have now.
pvmove now keep suspended devices if temporary mirror creation fails.
We can try to restore previous state if it is first attempt to activate
pvmove (code basically run the same code as --abort automatically).
Currently code uses pv_dev_name() for hash when getting internal
"pvX" name.
This produce corrupted metadata if PVs are missing, pv->dev
is NULL and all these missing devices returns one name
(using "unknown device" for all missing devices as hash key).
We can temporarily violate max_lv during mirror conversion etc.
(If the operation fails, orphan mirror images are visible to administrator
for manual remove for example. Not that this should ever happen:-)
Force limit only for lvcreate (and vg merge) command.
Patch also adds simple max_lv tests into testsuite
The vg->lv_count parameter now includes always number of visible
logical volumes.
Note that virtual snapshot volume (snapshotX) is never visible,
but it is stored in metadata with visible flag.
link_lv_to_vg and unlink_lv_from_vg are the only functions
for adding/removing logical volume from volume group.
Only these function should manipulate with vg->lvs list.
Later patch initializes lv->vg after the LV structure is prepared,
so pass through cmd context and do not use vg->cmd here.
Also move LV id calculation (which uses lv->vg too).
Also properly free memory pool if operation fails.
The snapshot segment (snapshotX) is created twice
during the text metadata segment processing.
This can cause temporary violation of max_lv count.
Simplify the code, snapshot segment is properly initialized
in init_snapshot_seg function now and do not need to be replaced
by vg_add_snapshot call.
The vg_add_snapshot() is now usefull only for adding new
snapshot and it shares the same initialization function.
The snapshot name is always generated, name paramater can be
removed from function call.
As a simplification to the tools and further liblvm, this patch pushes
the setting of NON_BLOCKING lock flag inside the lock_vol() call.
The policy we set is if any existing VGs are currently locked, we
set the NON_BLOCKING flag.
Should be no functional change.
The seg variable is temporary variable for list iterator,
code cannot expect that after iteration it remains NULL
(it contains non-NULL pointer here id list is empty).
Patch fixes first_seg function so it now correctly returns NULL
for empty segment list.
Buildsystem support device-mapper only install,
but generic install tagret includes both dm+lvm2.
For distribution which uses separate install_device-mapper,
there is no way how to install lvm2 only
(so after installing lvm2 for packaging purposes
built system must remove installed device-mapper files).
Fix it by allowing lvm2_install target, similarily like
install_cluster for clvmd.
(install = install_device-mapper + install_lvm2)
The dataalign value must always be aligned according
to MDA area.
The currect code checks if calculated value collides with
MDA area but not if the value is so small that it is
located before MDA starts.
Unfortunatelly there can be also MDA in the end of the device.
The patch adds simple check to avoid this miscalculation.
Patch expects that first MDA always starts on <= pagesize boundary
(this is true for all allowed label sector parameters).
Add lvs origin_size field.
Fix linux configure --enable-debug to exclude -O2.
Still a few rough edges, but hopefully usable now:
lvcreate -s vg1 -L 100M --virtualoriginsize 1T
Run backup of metadata on remote nodes in the
same place like local node - when calling backup().
Introduce backup_locally() which calls only
local backup if needed.
Remote backup is now trigerred by LCK_VG_BACKUP flag
combination (special VG lock).
This lock type will call check_current_backup()
(including backup_locally() call) and updates
metadata on all nodes.
(Patch fixes non-functional remote backup,
current call during VG lock never triggers.)
The backup() call store metadata from memory.
But in cluster backup() call performs
remote nodes metadata backup and it reads data from disk.
For metadata backup consistency,
patch moves all backup() calls after vg_commit.
(Moreover, some tools already do that this way.)
- Rename unlock_all to destroy_lvhash,
this function is called in cluster shutdown
unlocks everything and clean up allocated info space.
- Tidy lv_hash_lock use
.
Except adding free(lvi) in lv_has destructror
there is no functional change.
If user requests report attribute from PVSEG type
and PV is orphan (or all devices is set), the report
is empty.
Try for example (when only orphan PV are present)
#pvs
#pvs -o +devices
# pvs /dev/sdb1
PV VG Fmt Attr PSize PFree
/dev/sdb1 lvm2 -- 46.58G 46.58G
# pvs -o +devices /dev/sdb1
(no output)
The problem is caused by empty pv->segments list.
Fix it by providing fake segment here (similar to fake structures
in _pvsegs_sub_single() calls.
# pvs -a -o devices
Volume group name (null) has invalid characters
Skipping volume group (null)
...
_pvsegs_sub_single creates fake vg, we need to check
that pv is real here.
Since now, all code reading volume group is responsible for releasing
the memory allocated by calling vg_release(vg).
(For simplicity of use, vg_releae can be called for vg == NULL,
the same logic like free(NULL)).
Also providing simple macro for unlocking & releasing in one step,
tools usualy uses this approach.
The global memory pool (cmd->mem) should be used only for global
physical volume operations.
This patch have to be applied with all subsequent patches to complete
memory pool per vg logic.
Using separate memory pool has quite bit memory saving impact when
using large VGs, this is mainly needed when we have to use
preallocated and locked memory (and should not overflow from that
memory space).
The all_pvs list, used in vg_read, should make its own private
copy of pv structures, otherwise (when vg will use its own pool)
it will point to released memory pool.
The same applies for get_pvs() call.
Patch adds pv_list copy helper and adds explicit memory pool
parameter into _copy_pv.
(Please note that all these helper functions cannot guarantee that
vg related fields are valid - proper vg read & lock must be used
if it is requested.)
Currently PV commands, which performs full device scan, repeatly
re-reads PVs and scans for all devices.
This behaviour can lead to OOM for large VG.
This patch allows using internal metadata cache for pvs & pvdisplay,
so the commands scan the PVs only once.
(We have to use VG_GLOBAL otherwise cache is invalidated on every
VG unlock in process_single PV call.)
If the vg in process_each_segment_in_pv is NULL, the pv struct
can be incomplete (for example lv_segs are not copied in get_pvs()
call).
We need use the new pv from just read-in volume group.
(The same code is in pvdisplay already.)
In libdm/Makefile.in, we need to cleanup the symlink properly.
Adding to CLEAN_TARGETS seemed like the simplest way to do this
in the current build framework. We could redo dependencies for
VERSIONED_SHLIB, but for now just add to CLEAN_TARGETS.
For scripts/Makefile.in, we should be adding to DISTCLEAN_TARGETS.
The generic rule in make.tmpl.in takes care of the cleanup.
Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
Author: Takahiro Yasui <tyasui@redhat.com>
By gnu coding stds, 'distclean' should remove all files generated
by ./configure in addition to what 'clean' does.
Author: Takahiro Yasui <tyasui@redhat.com>
Patch fixes these problems:
- during the snapshot creation process, it needs create 2 LVs,
one is cow, second becomes snapshot.
If the code fails in vg_add_snapshot, code lvcreate will not remove
LV cow volume.
- if max_lv is set and VG contains snapshot, it can happen that
during the activation lv_count is temporarily increased over the limit
and VG metadata are not properly processed
see https://bugzilla.redhat.com/show_bug.cgi?id=490298
- vgcfgrestore alows restore with max_lv set to lower valuer that actual
LV count. This later leads to situation that max_lv is completely ignored.
- vgck doesn't call vg_validate(). It should at least try:-)
Signed-off-by: Milan Broz <mbroz@redhat.com>
We would like to declare our handles pv_t, vg_t, and lv_t in
the external library header lvm.h. However, these are already
defined in metadata-exported.h for the use of some of the
in-progress liblvm APIs. Thus, we cannot both define
them in lvm.h and include metadata-exported.h in the external
library C files. We could use preprocessor tricks (#ifndef)
but for now we just avoid the include.
The original liblvm.a has been moved to liblvm-internal.a.
We now use liblvm.a for the new application library and build
it inside liblvm directory.
Change dependencies so tools depend on liblvm application library,
and application library depends on liblvm internal.
(for example when CLVMD_CMD_LOCK_VG for is called during vgscan).
If clvmd calls LV lock, it calls
/* clean the pool for another command */
dm_pool_empty(cmd->mem);
to clean up memory pool after command.
Unfortunately, do_refresh_cache() do not call this
and because during it operation it allocates some memory,
the pool increases.
Also do_refresh_cache should use lvm_lock
(it manipulates with lvm internal data).
The same applies for lvm_backup command.
Signed-off-by: Milan Broz <mbroz@redhat.com>
if rlocn not defined (there is no metadata area).
In most cases it fails in validate_name(),
unfortunately there are situatuions, when
validate_name is ok and later code fails with
checksum error.
Reproducer:
# dd if=/dev/zero of=/dev/loop0
# pvcreate --metadatasize 637k /dev/loop0
Physical volume "/dev/loop0" successfully created
# pvs /dev/loop0
/dev/loop0: Checksum error
PV VG Fmt Attr PSize PFree
/dev/loop0 lvm2 -- 1.00M 1.00M
Signed-off-by: Milan Broz <mbroz@redhat.com>
-
Using argv[] list in exec_cmd() to allow more params for external commands.
Fsadm does not allow checking mounted filesystem.
Fsadm no longer accepts 'any other key' as 'no' answer to y/n.
Fsadm improved handling of command line options.
New structure lvm (used as an alias to cmd_context), new type definition lvm_t
for the lvm handle. Added functions lvm_create, lvm_destroy and
lvm_reload_config using the new handle.
Modified test/api/test.c:
Use new lvm.h header file and lvm_t handle.
Removed lib/lvm2.h
Author: Thomas Woerner <twoerner@redhat.com>
This patch is not fully tested and leaves some related bugs unfixed.
Intended behaviour of the code now:
pe_start in the lvm2 format PV label header is set only by pvcreate (or
vgconvert -M2) and then preserved in *all* operations thereafter.
In some specialist cases, after the PV is added to a VG, the pe_start
field in the VG metadata may hold a different value and if so, it
overrides the other one for as long as the PV is in such a VG.
Currently, the field storing the size of the data area in the PV label
header always holds 0. As it only has meaning in the context of a
volume group, it is calculated whenever the PV is added to a VG (and can
be derived from extent_size and pe_count in the VG metadata).
When reporting explicitly label attributes (pv_uuid for example), we do not
need to read metadata.
This patch separate the label fileds and removes scan_vgs_for_pvs
in process_each_pv() if not needed.
(There should be no user visible change in output.)
In libdm, we only ever use 'fields', while the tools use 'options' and
'fields' interchangeably.
Ideally it would be good to use 'fields' consistently everywhere.
However, 'options' most likely comes from the tool commandline '-o' and
'--options' which cannot be changed.
We display '0' for these fields now in this case. Ideally these values are
undefined for an orphan PV but today there is no way to specify undefined
with display functions such as _size64_disp().
"test" never prints anything. Therefore, "return $(test ...)" is equivalent to
just "return;" which means success in sh (same as return 0). We can however,
thanks to set -e, use "test foo = bar" as an assertion.
PS: test a == b is invalid syntax. It is either = or -eq: = is textual and -eq
is numeric comparison.
For example in LVM2, "pv_all" gives all PV fields.
"seg_all" gives all LV segment fields.
"all" gives all fields of the final report type. I think this is more
useful than just adding the current prefix.
So "lvs -o seg_all" gives all the LV segment fields, whilst
"lvs --segments -o all" adds in LV and VG fields too.
"lvs -o all -O vg_name" has report type LVS+VGS so includes all LV and all
VG fields.
Reports the size of the smallest metadata area in a PV or a VG.
Useful to confirm pvcreate --metadatasize or pvmetadatasize setting in
/etc/lvm/lvm.conf file.
NOTE: Actual value in these fields will most always differ from that
given in pvcreate options due to rounding and alignment effects.
There is a rudimentary make file in place so people can build by hand
from 'LVM2/daemons/clogd'. It is not hooked into the main build system
yet. I am checking this in to provide people better access to the
source code.
There is still work to be done to make better use of existing code in
the LVM repository. (list.h could be removed in favor of existing list
implementations, for example. Logging might also be removed in favor
of what is already in the tree.)
I will probably defer updating WHATS_NEW_DM until this code is linked
into the main build system (unless otherwise instructed).
Checks added for DM device names to allow only names < DM_NAME_LEN,
otherwise a part of lengthy name would be silently ignored and could
cause confusion while using dmsetup. Also, the name should not contain
'/' character, if it is used in context of creating a new device
or renaming the existing one (because we do not consider full path
to devices, they do not exist in filesystem yet) and appropriate error
messages are shown.
pvcreate $DEV
vgcreate -s 1k vg_test $DEV
lvcreate -l 1 -n lv1 vg_test
..
/dev/vg_test/lv1: write failed after 1024 of 4096 at 0: No space left on device
Just check for maximum write size in set_lv.
It fails for 1k PE now.
Patch adds log_region_size into allocation habdle struct
and use it in _alloc_parallel_area() for proper log size calculation
instead of hardcoded 1 extent - which can fail.
Reproducer for incorrect log size calculation:
DEV=/dev/sd[bcd]
pvcreate $DEV
vgcreate -s 1k vg_test $DEV
lvcreate -m1 -L 12M -n mirr vg_test
https://bugzilla.redhat.com/show_bug.cgi?id=477040
The log size calculation is mostly copied from kernel code.
Check for major/minor collision is added in _add_dev_to_dtree()
where we already read info by uuid,
so in the case of requesting major/minor it queries device-mapper
by major/minor for device availability.
Fixes https://bugzilla.redhat.com/show_bug.cgi?id=204992
Very simple / crude method of removing 'is_static' from initialization.
Why should we require an application tell us whether it is linked
statically or dynamically to libLVM? If the application is linked
statically, but libraries exist and dlopen() calls succeed, why
do we care if it's statically linked?
This allows us to remove one argument from create_toolcontext() and
moves it closer to a generic library init function.
In the arg_*() functions, we just use _the_args() directly.
For now we leave the first parameter to these
arg_*() functions (struct cmd_context *) because
of the number of files involved in removing the
parameter.
In preparation for removing cmd->args.
IMO, it makes more sense to put these accessor functions
in the same location as the static array _the_args.
Next patch will update arg_* functions to use _the_args[]
directly and remove cmd->args.
Problem is dm_report_init() may return NULL and subsequent call to
dm_report_set_output_field_name_prefix() doesn't handle NULL value.
Example:
pvs --nameprefixes --rows --unquoted --noheadings -opv_name,fred
Logical Volume Fields
---------------------
lv_uuid - Unique identifier
lv_name - Name. LVs created for internal use are enclosed in brackets.
...
Physical Volume Segment Fields
------------------------------
pvseg_start - Physical Extent number of start of segment.
pvseg_size - Number of extents in segment.
Unrecognised field: fred
Segmentation fault
Move init_full_scan_done(0) and init_mirror_in_sync(0) from init_lvm()
after call to create_toolcontext() to _init_globals(), called from bottom
of create_toolcontext(). No functional change.
Author: Dave Wysochanski <dwysocha@redhat.com>
Acked-by: James Cameron <james.cameron@hp.com>
Acked-by: Alasdair G Kergon <agk@redhat.com>
init_formats() sets up the command formats, and currently sets cmd->fmt_backup
but does not set cmd->fmt to a default value. This seems incorrect so we
set it to cmd->default_settings.fmt before returning.
The call we remove here may set cmd->fmt based on a command line setting.
But it is safe to remove this, because the only caller of init_lvm() that
cares about the cmdline override is the cmdline tools (clvmd does not care),
called from lvm2_main(). After lvm2_main() calls init_lvm(), it later calls
lvm_run_command(). In lvm_run_command(), we have a call to _apply_settings(),
which has the identical assignment of cmd->fmt that this patch removes.
This is very obvious - _init_logging() makes the identical init_msg_prefix()
and init_cmd_name() calls with cmd->default_settings so these calls are
clearly redundant after calling create_toolcontext().
Very similar argument to removal of init_debug() and other calls.
create_toolcontext() calls _process_config() which sets
cmd->default_settings.activation, then calls
set_activation(cmd->default_settings.activation). Later, create_toolcontext()
sets cmd->current_settings = cmd->default_settings. So these calls
set_activation(cmd->current_settings.activation) are redundant.
Identical argument to previous patch which removed archive_enable() calls.
We add a new parameter to backup_init() which sets the enable value based
on the cmd->default_settings.backup value. This value was used to set
cmd->current_settings.backup, used in the removed backup_enable() call.
_init_backup() calls archive_init(), which originally set 'enabled' to
a hardcoded '1' value. This seems incorrect based on my read of other
areas of the code so here we add a 'enabled' paramter to archive_init().
We pass in cmd->default_settings.archive, which is obtained from the
config tree. Later in create_toolcontext, cmd->current_settings is
set to cmd->default_settings. The archive_enable() call we remove
here was using cmd->current_settings to set the 'archive' enable
value. The final value of cmd->archive_params->enabled should thus
be equivalent to the original code.
This one we actually need to move. _init_logging() is called from
create_toolcontext(), which makes this call:
/* Test mode */
cmd->default_settings.test =
find_config_tree_int(cmd, "global/test", 0);
But it does not call init_test(). So we need an init_test() somewhere.
The most logical place is to put it inside _init_logging(), since this
is where the config value is read and default_settings are set. Placing
the init_test() call here matches what is done with other variables and
seems to make sense.
This variable is set at the top of create_toolcontext() to 0.
Nothing later in create_toolcontext() changes the value.
In init_lvm(), nothing between create_toolcontext() call and this assignment
changes the value. Thus, the assignment is redundant.
The rationale for removing init_verbose() call is very similar to removing
init_debug() call. create_toolcontext() calls _init_logging() which
makes these calls:
/* Verbose level for tty output */
cmd->default_settings.verbose =
find_config_tree_int(cmd, "log/verbose", DEFAULT_VERBOSE);
init_verbose(cmd->default_settings.verbose + VERBOSE_BASE_LEVEL);
And being that create_toolcontext() copies default_settings into
current_settings at the bottom, the init_verbose() call we are removing:
init_verbose(cmd->current_settings.verbose + VERBOSE_BASE_LEVEL);
is redundant.
We can safely remove because create_toolcontext() calls _init_logging(),
which makes these calls:
/* Debug level for log file output */
cmd->default_settings.debug =
find_config_tree_int(cmd, "log/level", DEFAULT_LOGLEVEL);
init_debug(cmd->default_settings.debug);
Then at the bottom of create_toolcontext() we do this:
cmd->current_settings = cmd->default_settings;
So the call we are removing from init_lvm() functions (clvmd and lvmcmdline):
init_debug(cmd->current_settings.debug);
Just sets the value of debug based on 'cmd->current_settings.debug'.
Since cmd->current_settings is equivalent to cmd->default_settings, and
init_debug() was called with cmd->default_settings, the call we remove is
redundant.
for losetup, break out of the loop when successful setup of loop device,
and only look at 7 loop devices (default loop module setting)
for blockdev, use old option if new one is not available
with the second vgcreate overwriting the first.
Obtain lock before calling vg_create(), which checks for existence of vgname
and fails if it already exists.
Prior to this patch, "lvremove -f vgname" would fail if vgname contained
one or more snapshot LVs. Now this passes, but has a side-effect.
If you issue "lvremove vgname" where vgname contains one or more snaps,
you will get an extra "y/n" prompt to remove the same snapshot.
Example:
$ lvs
LV VG Attr LSize Origin Snap% Move Log Copy% Convert
lvsnap vgtest swi-a- 16.00M lvtest 0.05
lvtest vgtest owi-a- 64.00M
$ lvremove vgtest
Do you really want to remove active logical volume "lvsnap"? [y/n]: n
Logical volume "lvsnap" not removed
Do you really want to remove active logical volume "lvsnap"? [y/n]: n
Logical volume "lvsnap" not removed
Command failed with status code 5.
Fixing this will most likely require modification of the iterator
function, process_each_lvs_in_vg() to iterate over snaps in some
cases (e.g. lvs, vgdisplay -v) but not in others (lvremove).
(Avoids having same mirror table loaded twice concurrently by first
using a 'zero' table to set the size of the device so when mirror
table is preloaded it doesn't have to be activated immediately.)
snapshot DSO unregistered itself when snapshot changed state to invalid.
This can cause a race (and several timeouts), when for example another snapshot
is added and in the middle of operation (suspend/resume) the monitoring thread
unregister itself.
Fix it by keeping the snapshot monitored after invalidation - just reset
threshold to not really print any messages to syslog.
If the PV has two metadata areas, second one is located at the end of the device.
Do not allow resize of PV or second metadata area can be overwritten.
(The check was active only for orphan PVs.)
Fixes problem when after downconvert to lvm1 VG is broken:
# lvcreate -n lv1 -l 4 vg_test
Invalid LV in extent map (PV /dev/sdb1, PE 0, LV 0, LE 0)
...
- add lvcreate rejects repeated invocation test
- fix pvs metadata test for partial failure test
- add pvchange reject --addtag to lvm1 pv test
(All fixes by Jaroslav Stava)
Original code would create "*.so" symbolic links if there were no actual
files ending in "so". The second iteration would then cause an error
in the test logs.
Function _text_pv_write doesn't use memory pool but static buffer,
call dm_pool_free in error path in _raw_write_mda_header is wrong.
Move pool free only to path where is the memory pool used.
Do not override the default action of AC_CHECK_LIB([readline],...
(i.e., leave the ACTION-IF-FOUND parameter blank) so that the
subsequent check for rl_completion_matches can use -lreadline.
Also, replace AC_CHECK_FUNC+AC_DEFINE with an equivalent AC_CHECK_FUNCS call.
Failure to check for label_write() return code caused the following test
to indicate it passed when it really failed:
pvcreate rejects labelsector > 1000000000000
The "status" field is treated as it ever has been, unknown flags there are
treated as fatal metadata errors. However, in the "flags" field, any unknown
flags will be ignored and silently dropped. This improves
backward-compatibility possibilities. (Any versions without support for this
new "flag" field will drop the field altogether, which is same as ignoring all
the flags there.)
failed to link against liblvm2cmd.
Dmeventd DSOs *require* lvm2cmd to be linked in.
For the future:
1) AC_SUBST does not create Makefile variables, only @foo@-style substitutions
2) When using `test', whitespace around `=' is essential:
test a=b is true, as is test a=a
The warning is bogus and is only seen on certain versions of gcc.
However using the enum does seem to clarify the intent of the code - only
3 possible md minor superblock versions.
Related compiler warning:
device/dev-md.c:53: warning: 'sb_offset' may be used uninitialized in this function
* configure.in (LVM2CMD_LIB): Define if --enable-cmdlib.
* dmeventd/mirror/Makefile.in (CLDFLAGS): Use $(LVM2CMD_LIB) rather
than hard-coding -llvm2cmd.
* dmeventd/snapshot/Makefile.in (CLDFLAGS): Likewise.
* configure.in: Define READLINE_SUPPORT not when processing
--enable-readline or --disable-readline, but rather only after
determining that readline support is desired and the readline
library is available/usable.
Related compiler warning:
log/log.c:242: warning: declaration of 'error_message_produced' shadows a global declaration
../include/log.h:98: warning: shadowed declaration is here
The new error checking code caught some commands that were returning '0' as
an exit status for success. This is incorrect and resulted in a benign error
message displayed (see below). As of today, all commands should return a
value defined in lib/commands/errors.h (1-5). This results in an exit code of
0 on success, or > 0 on failure (as stated in the lvm.8 man page).
Before change:
1. Make sure no PVs are on the system
2. Run 'pvs'
Command failed with status code 0.
After change:
<no output>
Specific test case:
1. pvcreate /dev/loop1; vgcreate vg1 /dev/loop1; lvcreate -L 64M -n lv1 vg1
2. vgremove vg1 (will prompt user)
3. CTRL-C
Code will exit with:
Do you really want to remove volume group "vg2" containing 2 logical volumes? [y/n]:
Volume group "vg2" not removed
Command failed with status code 5.
Internal error: Volume Group vg2 was not unlocked
Device '/dev/loop1' has been left open.
After change:
Do you really want to remove volume group "vg2" containing 2 logical volumes? [y/n]:
Volume group "vg2" not removed
Command failed with status code 5.
which is not used since the switch away from async version saLck
. num_nodes should equal to member_list_entries, i.e.
joined_list_entires is 0 when a node leaves the group.
Thanks to Xinwei Hu for the patch.
It does 2 things.
1. The cpg_deliver_callback make a compare between target_nodeid and our_nodeid.
It turns out openais set target_nodeid to 0 sometimes. for broadcasting ? I change the behavior so that lvm will process_remote also on target_nodeid == 0
2. The joined_list passed to cpg_confchg_callback doesn't include the already exist nodes in the group, which leads to an incomplete node_hash. I simply add all other nodes in member_list to node_hash also.
Thanks to Xinwei Hu for this patch.
Handles non-clustered as well as clustered. For clustered,
the best we can do is try exclusive local activation. If this
succeeds, we know it is not active elsewhere in the cluster.
Otherwise, we assume it is active elsewhere.
This bug has been around for a long time as far as I can tell.
Without this fix, a vgsplit would unconditionally move the
'hidden/internal' snapshot LVs, and result in corrupted metadata
in the following case:
vg1: contains lv1, lv1snap, both on pvset1
vg1: contains lv2, on pvset2
"vgsplit vg1 vg2 pvset2"
would result in "snapshot0" hidden LV being moved to vg2, and
the origin and cow being left in vg1. The tools detect the
corruption in vg2, but not in vg1.
When vg_lock_and_read() calls were added, they were done so incorrectly for
the destination VG (vg_to). This resulted in the VG lock not obtained when
a new VG was the destination (vg_lock_and_read() would fail in the vg_read()
clause, which would then release the lock before returning NULL), and could
result in corrupted destination VG.
The fix was to put back the original lock_vol() and vg_read() calls for 'vg_to'.
The failure of vg_read() indicates "vg does not exist", and we key off that
to determine whether we are dealing with a new or existing VG as the
destination.
The first two error messages were also the result of the incorrect
vg_lock_and_read() calls:
Volume group "new" not found
cluster request failed: Invalid argument
New volume group "new" successfully split from "vg"
Fixes https://bugzilla.redhat.com/show_bug.cgi?id=438249
BEFORE:
tools/lvm lvresize -l +4 vg22/lv1linear
Volume group "vg22" not found
Volume group vg22 doesn't exist
AFTER:
tools/lvm lvresize -l +4 vg22/lv1linear
Volume group "vg22" not found
- Add validation on pv_count, lv_count, and snap_count after split
NOTE: Some of these counts are misleading. If you compare "lvs" output
with these counts you will be left scratching your head what a "logical volume"
really is. ;-)
Fix unfilled paramater passed to fsadm from lvresize
Update fsadm to call lvresize if the partition size differs (with option -l)
Fix fsadm to support vg/lv name (like the rest of lv-tools)
Fix 'make check' to use DMDIR to check DM_DEV_DIR support in dmsetup.
Add basic test cases for mirrored LV.
Add basic test cases for lvconvert mirror.
Add basic test cases for pvmove.
Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Add new vgsplit and vgmerge tests.
Dave Wysochanski <dwysocha@redhat.com>
* test/t-000-basic.sh: Invoke initial test of lvm with its "version"
argument, so that the behavior of the tool doesn't depend on whether
readline was enabled at configure time.
Author: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Committer: Jim Meyering <meyering@redhat.com>
Fix missing VG unlocks in some pvchange error paths.
Add some missing validation of VG names.
Rename validate_vg_name() to validate_new_vg_name().
Change orphan lock to VG_ORPHANS.
Change format1 to use ORPHAN as orphan VG name.
This makes the tests more reproducible and helps isolate
them from any existing LVM set-up.
* test/Makefile.in (abs_builddir): Define.
(init.sh): Emit definition of abs_builddir.
* test/lvm-utils.sh (unsafe_losetup_): Keep only the portable,
iterative approach.
(dmsetup_has_dm_devdir_support_): New function.
(init_root_dir_): New function.
Invoke init_root_dir_ for all but the first test.
* test/test-lib.sh (this_test_): Adapt to test-name change.
Invoke lvm-utils.sh much later (after tmpdir creation), and
only if the current test is not being skipped.
Remove useless abs_top_srcdir definition.
Rename t0->test_dir_rand_.
* test/t-lvcreate-pvtags.sh: Skip this test if the available
version of dmsetup is not new enough.
Use global, $G_dev_, rather than hard-coded "/dev".
* test/t-lvcreate-usage.sh: Make --verbose output more useful.
Author: Jim Meyering <jim@meyering.net>
Committer: Jim Meyering <meyering@redhat.com>
* dmsetup/dmsetup.c (DEV_PATH): Remove definition.
(parse_loop_device_name): Add parameter: dev_dir.
Declare the "dev" parameter to be "const".
Use dev_dir, not DEV_PATH. Handle the case in which dev_dir
does not end in a "/".
(_get_abspath): Declare "path" parameter "const", to match.
(_process_losetup_switches): Add parameter: dev_dir.
Pass dev_dir to parse_loop_device_name.
(_process_switches): Add parameter: dev_dir.
Pass dev_dir to _process_losetup_switches.
(main): Set dev_dir from the DM_DEV_DIR envvar, else to "/dev".
Call dm_set_dev_dir.
* lib/libdm-common.c (dm_set_dev_dir): Rewrite to be careful
about boundary conditions, now that dev_dir may be tainted.
* man/dmsetup.8: Mention $DM_DEV_DIR.
Author: Jim Meyering <meyering@redhat.com>
DMSETUP=: to disable dmsetup checks (but let the script run
nevertheless); warn the user if this is the case
b) put the non-root and dmsetup warnings both at start and end of
output
Print just one line:
Use `COMMAND --help' for more information.
after "real" diagnostic(s), rather than all of the usage lines.
Otherwise, the 30-40+ lines of --help output could obscure the real diagnostic.
Author: Jim Meyering <jim@meyering.net>
* test/Makefile.in (srcdir, top_srcdir): Use @srcdir@, etc.
(top_builddir, abs_srcdir, abs_top_builddir, abs_top_srcdir): Likewise.
(so_name): Remove definition.
(.bin-dir-stamp): No longer create symlink in $(DMDIR) tree.
Prompted by suggestions from Alasdair Kergon.
* test/t1000-lvcreate-usage.sh (cleanup_): Redirect to a file,
rather than to /dev/null.
Change wording of some test titles.
Suggestions from Alasdair Kergon.
* test/Makefile.in: Add a copyright notice.
* test/lvm-utils.sh: Likewise.
* test/mkdtemp: Likewise.
* test/t0000-basic.sh: Likewise.
* test/t1000-lvcreate-usage.sh: Likewise.
* test/t3000-lvcreate-pvtags.sh: Likewise.
* test/t4000-pv-range-overflow.sh: Likewise.
* test/test-lib.sh: Likewise.
Author: Jim Meyering <jim@meyering.net>
* test/t1000-lvcreate-usage.sh: New tests.
* test/Makefile.in (T): Add it.
Derived from test cases by Dave Wysochanski.
Author: Jim Meyering <jim@meyering.net>
* test/Makefile.in (so_name): Use @DMDIR@.
(.bin-dir-stamp): Create symlink only if @DMDIR@ is nonempty.
(lvm-wrapper): Emit LD_LIBRARY_PATH setting only if @DMDIR@ is nonempty.
Based on a patch from Jun'ichi Nomura.
Author: Jim Meyering <jim@meyering.net>
* configure.in: Convert a relative dmdir directory name to the required
absolute form, e.g. in ./configure --with-dmdir=../device-mapper
Suggestion from Jun'ichi Nomura.
* configure: Regenerate.
Author: Jim Meyering <jim@meyering.net>
* Makefile.in (check): New target.
* configure.in (AC_CONFIG_FILES): Add test/Makefile.
* configure: Regenerate.
* test/.gitignore: New file.
* test/Makefile.in: New file.
* test/lvm-utils.sh: New script.
* test/mkdtemp (die, rand_bytes, mkdtemp): New script.
* test/t0000-basic.sh: New tests.
* test/t3000-lvcreate-pvtags.sh: New, failing test.
Derived from a script by Jun'ichi Nomura.
* test/t4000-pv-range-overflow.sh: New test.
* test/test-lib.sh: Testing framework, based on the one from git.
Author: Jim Meyering <jim@meyering.net>
* tools/toollib.c (xstrtouint32): New function.
(_parse_pes): Use xstrtouint32; don't cast strtoul's unsigned
long to uint32_t. Detect overflow.
Author: Jim Meyering <jim@meyering.net>
* lib/device/dev-io.c (dev_open_flags):
Use log_sys_error after failed stat to report strerror(errno).
Use a slightly different diagnostic to report mismatched device number.
(MIRROR_NOTSYNCED) is added to the LVM metadata. This flag is
not cleared when converting to linear. Subsequently, if you
up-convert the linear to a mirror, the flag remains - even though
an up-convert will always force a complete resync.
Move yes_no_prompt() into library (display.c, display.h).
Fixup includes as a result of movement of prior two functions.
Fixup force_t enum to be more descriptive.
log type. Previously, we had a '--corelog' argument that
would change the default type from 'disk' to 'core'. I
think that creates too much confusion - especially when
doing conversions on mirrors.
The new argument '--log' takes either "disk" or "core"
as a parameter. This could be expanded in the future
for additional logging types as well.
Examples:
# Creating a 2-way mirror
$> lvcreate -m1 ... # implicitly use default disk logging
$> lvcreate -m1 --log disk ... # explicit disk logging
$> lvcreate -m1 --log core ... # specify core logging
$> lvcreate -m1 --corelog ... # old way still works
# Conversion examples
$> lvconvert --log core ... # convert to core logging
$> lvconvert --log disk ... # convert to disk logging
$> lvconvert -mX --corelog ... # old way still works
$> lvconvert -mX ... # old way of converting to disk logging still works
Changes are reflected in the man pages.
* lib/misc/lvm-file.c (lvm_fclose): New function.
* lib/misc/lvm-file.h (lvm_fclose): Declare it.
* lib/config/config.c (write_config_file): Use the new function to detect
and diagnose unlikely write failure.
* lib/filters/filter-persistent.c (persistent_filter_dump): Likewise.
* lib/format_text/archive.c (archive_vg): Likewise.
* lib/format_text/format-text.c (_vg_write_file): Likewise.
* lib/log/log.c (fin_log): Similar, but use dm_fclose directly.
Include "\n" at end of each fprintf format string.
* dmeventd/dmeventd.c (_set_oom_adj): When writing to /proc/self/oom_adj,
detect failure even if it's hidden behind ferror. [Using dm_fclose's
extra ferror test here is probably not needed, since the amount written
is nowhere near BUFSIZ, but use it regardless, for consistency. ]
* lib/fs/libdevmapper.c (do_suspend): Detect fclose failure when
writing to suspend.
* lib/misc/lvm-file.h (is_same_inode): Define.
* lib/filters/filter-persistent.c (persistent_filter_dump): Use is_same_inode
in place of a direct st_ino-only comparison.
* lib/locking/file_locking.c (_release_lock, _lock_file): Likewise.
config/config.c:493: warning: format '%d' expects type 'int', but argument 5 has type 'long int'
Modified original patch from Jim Meyering <jim@meyering.net>
There are two fixes other than improving variable names and updating code
layout etc.
The loop counter is incremented by area_len instead of area_len * stripes;
the 3rd _check_stripe parameter is no longer multiplied by number of stripes.
Use loop to iterate through the now-ordered policy list in _allocate().
Check for failure to allocate just the mirror log.
Introduce calc_area_multiple().
Support mirror log allocation when there is only one PV: area_count now 0.
(See lvm-devel list archives for further details.)
e.g. lvcreate -l 100%FREE to create an LV using all available space.
lvextend -l 50%LV to increase an LV by 50% of its existing size.
lvcreate -l 20%VG to create an LV using 20% of the total VG size.
Include mirror log (untested) in _for_each_pv() processing.
Use MIRROR_LOG_SIZE constant.
Remove struct seg_pvs from _for_each_pv() for generalisation.
Avoid adding duplicates to list of parallel PVs to avoid.
event-driven model. Without changes to the way the cache gets updated, the
option is currently unreliable without a global lock to prevent any lvm2
commands from running concurrently.
Add --config for overriding most config file settings from cmdline.
Quote arguments when printing command line.
Remove linefeed from 'initialising logging' message.
Add 'Completed' debug message.
Don't attempt library exit after reloading config files.
Always compile with libdevmapper, even if device-mapper is disabled.
Fix some memory leaks in error paths found by coverity.
Use C99 struct initialisers.
Move DEFS into configure.h.
Clean-ups to remove miscellaneous compiler warnings.
[Some activation-related features will stop working for a while now.
Some types of activation are getting split into two steps, with the
first step using the precommitted metadata.]
The daemon side of this is mostly the same as the patch I sent out. To select
a timeout period different than the default and to get the timeout period,
I added two library calls, dm_set_event_timeout() and dm_get_event_timeout().
If people are against them, the other option is to tack extra arguments onto
dm_regiser_for_event() and dm_get_registered_device(). I also added a
-t option to dmevent, so people can try out timeouts.
log types. This means the threaded_syslog type is no longer valid. A new
fxn multilog_async is available to toggle between the two modes. If an
app is compiled without pthreads and tries to use async logging, no logging
will occur while async is enabled.
dmeventd has been modified to use the new code
I'm not positive I like the way the async_logger code calls the log fxn,
but it works for now. Suggestions for other ways to do it would be helpful
- don't echo after an 'action' call - action does the echo itself
- use vgdisplay/vgs to determine which VGs are marked clustered and only
deactivate those VGs (unless the LVM_VGS var is set in
/etc/sysconfig/cluster)
- multilog_add_type, multilog_del_type, multilog_custom, and
multilog_init_verbose all have different arguments.
- Primary change is that caller only passes in config info, and the
lib keeps track of state internally. No more exporting of
'struct log_data'.
- Custom callers now only get the custom data pointer passed into their
log fxn (that is set with multilog_custom)
- Added basic README that describes libmultilog
# dmevent -l
Also, changed the behaviour of dm_get_registered_device(), so that it doesn't
change the pointer you passed in without freeing the memory on a non-next call,
and doesn't free your pointer without setting it to NULL on a failed next call.
multilog_add_type()/multilog_del_type cycles correctly.
o fixed segfault in multilog_add_type()
o fixed test-multilog.c
o cleaned up libmultilog (list macros, indentation, braces, comments)
- add event_nr to thread_status struct and set appropriately so that the
thread actually waits for an event
- essentially make error_detected return true. Let the DSOs determine
how to interpret the status info
o more tweaks to libmultilog calls - the api isn't set in stone yet, so
don't get too comfortable.
o not sure the dmeventd in device-mapper/dmeventd works - i've been using
the one in lib/event/
o currently both daemons are set to log only to syslog
o changed
int dm_get_next_registered_device(char **dso_name, char **device,
enum event_type *events);
to
int dm_get_registered_device(char **dso_name, char **device,
enum event_type *events, int next)
so that the daemon is able to retrive the next one of the list without
running into locking issues.
o changed dmevent.c to use dm_get_registered_device()
o couldn't test this yet because of the comms issues
(daemon exits in do_process_request())
Improve reporting of node-specific locking errors so you'll get
somthing a little more helpfiul than "host is down" - it will now tell
you /which/ host it thinks is down.
./configure --with-clvmd
wil do this by default. Or you can choose which you want with
./configure --with-clvmd=gulm or
./configure --with-clvmd=cman
When clvmd with both included is run, it will automatically detect the cluster
manager in use.
Additional verbosity level -vvvv includes line numbers and backtraces.
Verbose messages now go to stderr not stdout.
Close any stray file descriptors before starting.
Refine partitionable checks for certain device types.
Allow devices/types to override built-ins.
and further reduce the number of ioctl calls made.
o Metadata area struct change.
o Make config file accessible to activation functions & get stripe_filler
from it.
o Allow kernel to return snapshot status as a fraction or a percentage.
checks clearer (incl. variable renaming); using a flag to indicate when
output data doesn't fit into supplied buffer instead of returning an error etc.
Clear many compiler warnings (i386) & associated bugs - hopefully without
introducing too many new bugs:-) (Same exercise required for other archs.)
Default compilation has optimisation - or else use ./configure --enable-debug
to back this out so you can do that commit, let me know. Also, if there's
an issue with the error message that's displayed, just change it in tools.h.
This causes a "device-mapper driver/module not loaded?" error message to
be displayed for the commands that require dm-mod, if the tools can't get
the driver version. It's not done for commands that don't require dm-mod.
This should clear up some problems people have had attempting to use lvm2
without rtfm'ing.
allocation policy. This can currently take one of three values:
typedef enum {
ALLOC_NEXT_FREE,
ALLOC_STRICT,
ALLOC_CONTIGUOUS
} alloc_policy_t;
Notice that 'SIMPLE' has turned into the slightly more meaningful NEXT_FREE.
ii) Put code into display.[hc] for converting one of these enums to a
text representation and back again.
ii) Updated the text format so this also has the alloc_policy field.
o Various other kernel side tidy-ups.
o Version number changes so we have the option of adding new ioctl commands
in future without affecting the use of existing ones should you later
revert to an older kernel but not revert the userspace library/tools.
o Better separation of kernel/userspace elements in the build process to
prepare for independent distribution of the kernel driver.
preparsed status info, shove it all into a string, and then parse it
again to get the info back out (which is what i was doing before)
o basically that's it...i like this *much* better than the previous
method and i think it makes the _status fxn more flexible if we need
to use it to get other info out.
o Not sure if the code in dev_manager is really optimal, but it works..
will look at adjusting it a bit now.
o I *think* it works right when one snapshot if full but others aren't,
but I haven't really been able to test it because the full snapshot
somehow resets itself and weird things start happening to the system...
o There is still a bit missing
+ all are missing the {PV,VG,LV} # - that is not applicable in LVM2
+ pvdisplay doesn't show how many LVs are contained on it
+ much of the snapshot information isn't available for lvdisplay
o Look at the code for other potiential FIXMEs :)
Lots of changes/very little testing so far => there'll be bugs!
Use 'vgcreate -M text' to create a volume group with its metadata stored
in text files. Text format metadata changes should be reasonably atomic,
with a (basic) automatic recovery mechanism if the system crashes while a
change is in progress.
Add a metadata section to lvm.conf to specify multiple directories if
you want (recommended) to keep multiple copies of the metadata (eg on
different filesystems).
e.g. metadata {
dirs = ["/etc/lvm/metadata1","/usr/local/lvm/metadata2"]
}
Plenty of refinements still in the pipeline.
Patrick, can you see if this fixes your cluster syncing problem please ?
If so I'll make it so it only syncs if you have actually written to the
device.
from lock_vol() - otherwise it now attempts to acquire the lock and then
immediately releases it.
o Extend the id field in struct logical_volume to hold VG uuid + LV uuid
for format1. This unique lvid can be used directly when calling lock_vol().
o Add the VG uuid to vgcache to make VG uuid lookups possible. (Another
step towards using them instead of VG names internally.)
o #defines for common lock flag combinations
o Try out hyphens instead of colons in device-mapper names - does this
make messages containing filenames easier to read?
o rewrote activate.c to use dev-manager, I'm sure these two will merge
at some point.
o Rename is broken ATM
o dev-manager puts the calls through to fs.c for layers that have the
'visible' flag set.
o Use first unused lv_number when creating new LV
o Use lv_number for refs to snapshots
o Update persistent minor logic after the lvcreate restructure
o Reset all parameters before use in lvcreate.
and add severe warning if it's used to make a device seem bigger than
it really is. This is not an option people should be using as it
breaks metadata integrity.
o Use uint64_t throughout (rather than unsigned long long)
o Convert a few messages that contain pathnames into the more common form:
pathname: message
I'm taking a different route from LVM1 here in that snapshots are a
seperate entity from the logical volumes, I think of them as an
application of an LV (or two lvs rather). As such there is a list of
snapshots held against the vg, and there is *not* a SNAPSHOT, or
SHAPSHOT_ORG flag in lv->status.
all since it only supports vg_write. It has been replaced with:
int archive_vg(struct volume_group *vg,
const char *dir,
const char *desc,
uint32_t retain_days,
uint32_t min_archive);
which is now called directly by tools/archive.c
o Text format now has a description and time field at the top level.
o archiving and backup set the description appropriately. eg,
for an archive:
description = "Created *before* executing 'lvextend test_vg/lvol0 -l +1'."
creation_time = 1013166332
for a backup:
description = "Created *after* executing 'lvextend test_vg/lvol0 -l +1'."
creation_time = 1013166332
This is preparing the way for a simple vgcfgundo command.
slightly different from the current LVM1 method.
lvcreate --persistent y --minor 10 (to specify when created)
lvchange --persistent n (to turn off)
lvchange --persistent y --minor 11 (to change)
--persistent uses a new LV status flag stored on disk
minor number is stored on disk the same way as LVM1 does
(but major number stored is 0; any LVM1 major/minor setting gets lost)
lvchange -ay --minor 12 (to activate using minor 12, regardless of the
on-disk setting, which doesn't get changed)
--minor == -m
--persistent == -M
missing from a VG. (Linear targets use the device-mapper 'error' target
which returns ioerror; striped targets use '/dev/ioerror' for now - which must
already exist e.g. as a sufficiently large block device version of /dev/zero).
by allocating the data block with an additional dbg_malloc.
o Added an assertion to check that no one is requesting alternate
alignment for memory allocated from pool. I can't see us needing this
for LVM2.
Otherwise LVM1 decides the PV structure is corrupt.
But do we need to keep both pv->pe_size and vg->extent_size
in internal metadata or can we generate pvd->pe_size when writing out
a PV that belongs to a VG?
This should be a rare occurrence so the aim is to recover if it's
straightforward to do so, otherwise just to abort the operation.
If people *knowingly* change device names, they should always run vgscan
afterwards.
A few bytes of memory gets leaked inside a pool each time an alias
has to be discarded - it's not worth restructuring the code to reuse it.
More of LVM2 needs updating to pass device objects (or uuids) about
instead of pathnames so that resolution of pathname->object only happens
once per operation.
dev_cache_get() should now always return the *current* device at the path given
dev_name_confirmed() replaces dev_name() whenever it's important to
know that name for the device is still current (ie when opening it).
If the cache doesn't know a current name, the function fails.
dev_open() guarantees that the file descriptor returned is for the dev_t
of the device structure it was passed.
o When opening device, return error if its cached name is incorrect (eg if
it's changed since the cache was generated). This prevents use until
the cache is rebuilt (eg with vgscan). Doesn't catch every case.
struct pv_list {
struct list list;
struct physical_volume pv;
};
to
struct pv_list {
struct list list;
struct physical_volume *pv;
};
o New function in toollib 'create_pv_list', which creates a list of pv's
from a given command line array of pv's.
o Changed lvcreate/extend to use this (fixes lvextend [pv list] bug).
Kernel driver has a version number (stored in kernel/VERSION).
The first two components of this (0.94) give the version number of the
ioctl interface. This number must be changed whenever a change is
made to the ioctl interface that breaks backwards compatibility.
The library has a version number (stored in VERSION) which is
used for linking.
The first and/or second component of this must be changed whenever
a change is made to the library API that breaks backwards
compatibility.
onto a new device). uuid specified must not already exist on the system.
o More message tidying.
o When checking for label, only read PV metadata.
o Add ataraid. [Needs moving into config/defaults files.]
o updated vgcfgrestore args
o change _check_for_open_devices only to check devices present in the hash
table instead of using dev_iter which triggers a full scan even when only
displaying command line help
Supply offset to start of variable data area (so struct size can change
without breaking backward compatibility)
Add command that just returns the driver version
o roll vgcache back to agk's implementation, we'll revisit this as part
of the cluster integration.
o change the extra_info field in a label to be a void *
is active in the device-mapper.
o Many operations can be carried out regardless of whether the VG is
active or not.
o vgscan does not activate anything - use vgchange.
o Change lvrename to support renaming of active LVs.
o Remove '//' appearing in some pathnames.
o Dummy lv_check_segments() for compilation.
I've split the old autobackup function into two seperate areas:
'archiving' is performed *before* a vg configuration is changed. This
produces a numbered backup in /etc/lvm/archive.
A 'backup' is performed *after* a vg change. So the directory /etc/lvm/backup
will hold the a copy of the current configuration.
Current version of LVM2 instead relies on /usr/include/libdevmapper.h
which gets installed by the device mapper package.
(Should this location now be configurable?)
at the top of the file.
o Changed completion_matches -> rl_completion_matches, and added some consts.
This will probably break things on pre readline 4.2 systems.
o There is now a _default_debug, and _default_verbose level, when
using lvm interactively -vv and -dd switches just effect the current
command.
o Added a --quiet switch which sets both verbose and debug to zero.
o Introduced the LVM_SYSTEM_DIR variable.
This makes more sense because the persistent cache, and backup directories
are config specific.
eg, I use /etc/lvm for running my real LV's
but I have another directory /dev/lvm_loops that contains a config
that allows only loopback devices, I use this for testing.
o You must list long args with no short option (eg. --version) at the
front of the args.h file.
o If an argument has no short option, set the short option in args.h to '\0'
o The index into the 'the_args' var is now stored as the option value
for getopt, iff there is no short opt.
A substantial speed-up - particularly in readline mode.
If the hints turn out to be wrong, the relevant parts get thrown away.
vgscan destroys it totally. In both cases it then rebuilds itself as
required.
- The iterator can find labels by string and also appropriate version number (==,
<= or any) if you want.
- Add labels_match() call that compares the two labels and returns an error if
they do not match.
- Write labels in sector 1 & last rather than 2 & last as per Joe.
o Changed pv_map.c to maintain the list of free areas in size order, which
is more helpful to the allocators. If you want to allocate a bit of an
area call consume_area(area, size), this will adjust the area if there's
some space left and shuffle it to the correct place in the list.
Not tested.
old dmfs-lv.c thats gone.
o Dropped out support for multiple tables in line with ioctl interface
o Some reordering to better support the userland library
o Updated to 2.4.16
I'm fairly happy with the way that this is working now, so the next job is
to start the integration with the ioctl interface so there is a single
common dm.[ch] and selectable interfaces (fs or ioctl).
Further improvements can be made even now, but I hope to wait until we've
got this going and integrated and the libdm parts working as well before
investigating other avenues.
a seperate chunk of memory from dbg_malloc for each pool_alloc. This
will allow the bounds checking code in dbg_malloc to do it's stuff.
o The normal implementation moved to pool-fast.c
o pool.c now just contains a #ifdef and includes the appropriate .c file.
Alasdair, could you make sure that gcc -MM get's passed all the
CFLAGS please, otherwise the dependencies get calculated incorrectly.
logical volumes. It includes:
format1 changes.
metadata.h changes.
lv_manip.c changed (striped allocation still not done though).
activate.c changes.
Nothing has been near a compiler as yet.
Alasdair can you look at changing display.c to use to output the mappings
in a more segment oriented format please ?
I haven't put the span list into struct physical_volume to represent allocated
extents. I think the burden of maintaining it for things like lv_extend may
out weigh it's uses.
o Changed disk-rep to use these
o if NDEBUG is not defined the dev_cache will check for open devices on
teardown.
I was hoping this would speed things up. But I'm still getting:
reti:/home/joe/sistina/LVM2/tools# time ./lvm vgchange -a n
Volume group vg0 successfully changed
real 0m5.751s
user 0m0.060s
sys 0m0.070s
even though I have only 1 device with the vg on it passing the filters.
N.B. This means that you have to take very great care in the event that
you want to access the dcache tree from in kernel
o Added extra field to allow out of memory conditions to result in the
correct error code. (This hasn't received a lot of testing...)
I've ditched the final project (which would have cleared my whole list)
since its got other complications which I don't have time to fix right
now. Still as Meatloaf says, two out of three ain't bad!
o Full signed arguments to lvreduce/lvextend
o Consistent lv_number/pe map use
o Populate pv->pe_allocated
o Fixes for allocation/writing of multiple LVs
o Created dmfs.h as a private header for the filesystem code
o Using seq_file.[ch] written by Al Viro as a generic mechanism for /proc
style files which have one record per line. We use a slight modification
here, so if you are using a recent -ac kernel you'll need to replace the
existing seq_file.[ch] with the ones here and do a bit of editing to make
it work. I'll submit the changes to Al Viro shortly as they are very
small and I think make sense generally.
o Using fail_writepage()
o Init code for filesystem now all in dmfs-super.c
o Some common code reduction amoung the dmfs-*.c files
o Auto allocation of major device number (default). You can specify a
particular major by using a module argument. If built in then you don't
get this option at the moment but it could be added if required.
o Hotplug support
o General tidying
o Updated projects.txt file
o Patches updated to 2.4.14
Please add to/edit this file as you think of new ideas or discover bugs. The
items in it are in no particular order. They are also only ideas and hence may
never get implemented depending on whether they turn out to be good ideas or
not.
filters in order.
eg,
f = composite_filter_create(2, regex_filter, persistent_filter);
ownership of the filters passes, they will be destroyed when f's
destroy method is called.
devices {
# first match is final, eg. /dev/ide/cdrom
# get's rejected due to the first pattern
filter=["r/cdrom/", # don't touch the music !
"a/hd[a-d][0-9]+/",
"a/ide/",
"a/sd/",
"a/md/",
"a|loop/[0-9]+|", # accept devfs style loop back
"r/loop/", # and reject old style
"a/dasd/",
"a/dac960/",
"a/nbd/",
"a/ida/",
"a/cciss/",
"a/ubd/",
"r/.*/"] # reject all others
}
Alasdair this is ready to roll into the tools now.
and builds a *very* efficient engine that will tell you which regex a string
matches with only a single pass through the string. To be used in the config
file when specifying devices.
o Anchor's aren't supported yet (^ and $) but that won't take long.
o Also when we get some realistic config files we may want to consider adding an
extra level of indirection to the dfa state in order to compress the table.
It all depends on how large typical tables get.
Things to note:
o Changes to the dm-*.c files have been kept as small as possible during
the development of the new fs interface and there are a few places where
the new code does odd things to give the original code what it wants. These
places will gradually go away during the next few days once we are sure the
new code is sound.
o I've spent most of my testing time looking at the parser since thats where
a lot of the changes are, I've not checked the actual I/O very much, but
then that code hasn't changed at all.
o The print operation in the target type operations is there to help in
debugging and will go away eventually
o There are some other printk's which will also go away once we are sure that
things are working correctly.
o I've tagged the old code with PRE_DMFS if you want to use that until this is
stable.
o There are no kernel patches for this yet (will fix after lunch... :-)
o Makefile needs some changes
o need to EXPORT_SYMBOL(deny_write_access); in ksyms.c
How to use the new interface ?
mount -t dmfs dmfs /mnt/dm
cd /mnt/dm
mkdir fish fish/tank
cd fish/tank
cat ~/my.table > table
cd ..
ln -s tank ACTIVE
Creates a logical volume called fish and activates a table called tank, if
there is a problem doing the link, look in /mnt/dm/fish/tank/errors to see
what is wrong.
If you see any odd things happening, let me know right away as I'm sure there'll
be one or two things that slipped through my testing.
o Error file routines (initial idea)
o Various fixes for bugs
o Tidy a few things
o Added a bit of debugging code ready for when this gets tested
o get_exclusive_write_access() function which will get moved into namei.c
I hope (and rewritten accordingly), should this become the final version
used.
Still a few more areas need thinking about, but in general much closer now I
think. Last area to sort out before testing is the symlink code which is
pretty close now... just a few more checks needed and the actual calls to
the core code.
o Fixed to work with highmem
o Added dmfs private inode struct for lv and table directories
o Fixed a number of errors/typos
o Status file read returns 0 so we can leave this until we've actually got
something to report in this now.
o New locking on tables.... still some issues to be worked out here but
closer now I think.
o Now use mapping of table directory to hold pages rather than mapping of
table file inode. Need to write a note to myself to fix issues with the
file length at the same time....
Well thats enough for tonight I think. The error file will be part of
tomorrows work.
o Redo write logic for table file
o Relax rules for symlink content by removing the rewriting function
Well I probably won't get a chance to work on this tomorrow, so this is my
changeset to date.
non-obvious, its time to simplify :-)
o Moving towards a simpler and more obviously correct interface
o Removed some fs operations in directories representing volumes
o Changed some file names
o Made things cleaner
more changes to follow...
o Changed dm_remove() to accept a struct mapped_device argument rather than
a name
o We no longer have to look up devices by name, the dcache handles that
nicely for us
o Fixed a bug where we were freeing a structure before we'd finished with
it.
o The name field in struct mapped_device is now only used in a very few
places in dm.c and will be replaced in future with a back reference to
the dentry rather than keeping the name in two places.
o dmfs-dir.c becomes dmfs-lv.c
o dmfs-file.c becomes dmfs-table.c
o A few tweeks and updates
The main reason for the slow progress on these files (which are not yet used
by the device mapper) is that we are working out what this interface should
look like as we go along.
Once this has evolved a bit further and in a state where it can be used we'll
announce it on the lists for further comment.
o Only one list of block devices for all tables
o Locking to ensure that block devices only get opened once
o Block device handling is now in dm-blkdev.c
o We open block devices when we create the tables and hold them open until
the table is destroyed (this prevents the module for the device being
unloaded after table parsing and before the table is used)
o We compute the hardsect size when the table is created rather than when
someone requests it.
Still to fix/change:
o Probably want to hash the device lists in dm-blkdev.c and also remove refs
to struct dm_bdev outside this file.
o Need to ensure that hardsect_size doesn't change when new tables are
swapped in (maybe this ought to be a per volume parameter and the tables
will only parse if they match the value for the volume?).
Things are changing fast here, so if you want a stable version of thic code
try checking out yesterdays.
o Moved the linear target into its own module (not really because it needs to
be there, but because its useful to have a simple example so people can see
what we are doing)
Btw, this needs testing properly.
to up_write() and down_write() etc so that you can see what kind of a lock
it is (otherwise it could be anything.. semaphore, spinlock, spinlock_bh,
spinlock_irq, br_lock, etc.)
Mount the dm-fs filesystem on /device-mapper (will fix later). mkdir
to create a device, inside that directory every file you create is a table
file. If there are errors <table>.err will appear automagically. Mv a table
file to ACTIVE to activeate the device. I'm not happy with mv being the
binding command, symlink would be better.
devices based on /proc/devices
+ The dev_mgr structure now has a 256 element char array that is
initially all 0s
+ When a match is found, the array element corresponding to the major
number of the match is set to a non-zero value
+ to check for a match, all one has to do is check that the array
element at the major number in question is non-zero.
o I'm wondering if we should do this with bitwise operators instead? Does
anyone expect the major numbers to grow larger than 8-bits?
o I don't like it, but I'm committing it so I can go back and laugh at
myself later
o I have a (hopefully) better idea that i'll try to commit yet today.
o I'm working on caching the /proc/devices entries now, and should have
that in by the end of today or early tomorrow.
o There will be much cleanup involved with that...
Some files were not shown because too many files have changed in this diff
Show More
Reference in New Issue
Block a user
Blocking a user prevents them from interacting with repositories, such as opening or commenting on pull requests or issues. Learn more about blocking a user.