Because of the way mdas are handled internally, where a PV in a VG
has mdas on both the info->mdas and vg->fid->metadata_areas lists, we
need a location-independent copy constructor for struct
metadata_area. Break up the existing format-text-specific copy
constructor into a format-independent piece and a format-dependent
piece.
This function is necessary to properly implement pv_set_mda_ignored().
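Roughly, the split could look like this (a minimal sketch; the names
mda_copy and mda_locn_copy and the pool-less allocation are
illustrative, not the actual LVM2 code):

#include <stdlib.h>
#include <string.h>

struct metadata_area_ops {
	/* format-dependent piece: deep-copies the location-specific state */
	void *(*mda_locn_copy)(const void *metadata_locn);
};

struct metadata_area {
	struct metadata_area_ops *ops;
	void *metadata_locn;		/* format-specific location data */
};

/* format-independent piece: copy the generic fields, then delegate */
struct metadata_area *mda_copy(const struct metadata_area *mda)
{
	struct metadata_area *copy = malloc(sizeof(*copy));

	if (!copy)
		return NULL;
	memcpy(copy, mda, sizeof(*copy));	/* shallow, location-free part */
	if (mda->ops->mda_locn_copy &&
	    !(copy->metadata_locn = mda->ops->mda_locn_copy(mda->metadata_locn))) {
		free(copy);
		return NULL;
	}
	return copy;
}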
Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
Reviewed-by: Alasdair G Kergon <agk@redhat.com>
A metadata_area is defined independently of its location. One downside
is that there is no obvious mapping from a PV to an mda. For a PV in
a VG, we need a way to start with a PV and end up with an MDA, if we
are to manage mdas starting with a device/pv. This function provides
a way to go down the list of mdas on a VG and identify which ones
match a particular PV.
I'm not entirely happy with this approach, but it does fit into the
existing structures in a reasonable way.
An alternative solution might be to refactor the VG - PV interface such
that mdas are a list tied to a PV. However, this seemed a bit tricky since
a PV does not come into existence until after the list of mdas is
constructed (see _vg_read() - we create a 'fid' and attach mdas to it,
then we go through them and attach pvs).
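A sketch of the mapping (structure members simplified to a bare list
and a device pointer; the real mda keeps its device inside the
format-specific data):

struct device;

struct metadata_area {
	struct metadata_area *next;	/* stand-in for the real list node */
	struct device *dev;		/* device backing this mda */
};

struct physical_volume {
	struct device *dev;
};

/* return the first mda in 'mdas' that lives on 'pv', or NULL */
struct metadata_area *pv_matching_mda(struct metadata_area *mdas,
				      const struct physical_volume *pv)
{
	struct metadata_area *mda;

	for (mda = mdas; mda; mda = mda->next)
		if (mda->dev == pv->dev)
			return mda;
	return NULL;
}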
Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
Reviewed-by: Alasdair G Kergon <agk@redhat.com>
First we add a 'flags' field to the location-independent
metadata_area structure, and an MDA_IGNORE flag. The
mda_is_ignored and mda_set_ignored functions are added to
manage the flag. Adding the flag and functions gives a
library interface to ignore metadata areas independent of
the underlying location (disk, file, etc). The location
specific read/write functions must then handle the specifics
of what this flag means to the location.
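A minimal sketch of the flag and its accessors (bit value and field
layout assumed):

#include <stdint.h>

#define MDA_IGNORE 0x00000001U

struct metadata_area {
	uint32_t flags;			/* new location-independent field */
	/* ... location-specific members elided ... */
};

unsigned mda_is_ignored(const struct metadata_area *mda)
{
	return (mda->flags & MDA_IGNORE) ? 1 : 0;
}

void mda_set_ignored(struct metadata_area *mda, unsigned ignored)
{
	if (ignored)
		mda->flags |= MDA_IGNORE;
	else
		mda->flags &= ~MDA_IGNORE;
}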
Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
Reviewed-by: Alasdair G Kergon <agk@redhat.com>
Some commands start with a pvname, but we'd like to force users to
start with a vg handle to obtain a pv handle. Our best option seems
to be providing a way to look up the vgname from the pvname, and then
require them to use vg_read/vg_open.
In addition to the pvname lookup function, this patch also provides a
lookup by pvid. The lookup by pvid can be used in conjunction with
lvmcache_get_pvids to process all pvs in the system.
The pvid find function first calls lvmcache_vgname_from_pvid, which may
cause the label to be read if it is not in the cache. If the vgname
returned is an orphan, we then check to see if there are metadata areas,
and if not, we scan every PV on the system by calling scan_vgs_for_pvs().
In most cases we should not need to do this, and by using the info->mdas
count, we avoid calling pv_read() as prior code did. So this patch is a
bit cleaner and should allow us to refactor more of the pv code.
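The lookup flow is roughly this (a sketch against internal helpers;
lvmcache_vgname_from_pvid and scan_vgs_for_pvs are named above,
info_from_pvid and is_orphan_vg are existing lvmcache interfaces, and
the wrapper name is hypothetical):

const char *find_vgname_from_pvid(struct cmd_context *cmd, const char *pvid)
{
	const char *vgname = lvmcache_vgname_from_pvid(cmd, pvid);
	struct lvmcache_info *info;

	if (vgname && is_orphan_vg(vgname)) {
		/* A PV with mdas cannot belong to another VG without us
		 * knowing, so only scan the system when it has none. */
		info = info_from_pvid(pvid, 0);
		if (info && dm_list_empty(&info->mdas)) {
			if (!scan_vgs_for_pvs(cmd))
				return NULL;
			vgname = lvmcache_vgname_from_pvid(cmd, pvid);
		}
	}
	return vgname;
}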
Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
Disallow changing the clustered attribute of a VG while there
are active mirrors or snapshots.
We don't have the mechanisms in place to change the device-mapper
tables for those targets that have behavioral differences between
cluster and single machine instances. Allowing users to change
the attribute but not changing the target's behavior can lead to
data corruption.
The following bugs are fixed/avoided by this patch:
235123 - vgchange -c [ny] do not change target types when necessary
289331 - RFE: switching from cluster domain to local domain needs to deactivate volume somehow
289541 - when changing from local to cluster, volumes can not appear to be deactivated
Allow lv_remove_with_dependencies() to know the top-level LV that was
requested to be removed (otherwise it recurses and we lose context).
A merging snapshot cannot be removed directly but the associated origin
can be. Disallow removal of a merging snapshot unless the associated
origin is also being removed.
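The rule reduces to a check like this (a sketch; lv_is_merging_cow and
origin_from_cow are LVM2's existing snapshot helpers, the wrapper is
illustrative):

/* 'requested' is the top-level LV the user asked to remove */
static int may_remove_lv(struct logical_volume *lv,
			 struct logical_volume *requested)
{
	if (!lv_is_merging_cow(lv))
		return 1;
	/* a merging snapshot may only disappear with its origin */
	return origin_from_cow(lv) == requested;
}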
We should write metadata into the next position in the ring buffer when
calling vgrename and vgcfgrestore. At this code level (_vg_write_raw), we
were not able to determine whether this is a rename; if it is, the
accompanying VG structure passed here has the new name set, not the old
one.
When looking for the location to put the metadata next, we were given a
NULL value because of a failed VG name comparison (in _find_vg_rlocn)
between the name in the existing metadata and the metadata we're just
about to write.
This resets the position in the ring buffer, overwriting any existing metadata
(and also incorrectly updates the cache to "orphan" afterwards).
This patch adds an old_name field to struct volume_group that we can
check and use if necessary to detect renames at lower layers as well.
The same applies to vgcfgrestore, but there we use a special value of
old_name, an empty string, to disable the check against existing
metadata entirely.
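The comparison then becomes something like this (a standalone sketch of
the decision, not the literal _vg_write_raw code):

#include <string.h>

/* Decide whether the name in the existing on-disk metadata matches the
 * VG being written, treating a rename (old_name set) or a vgcfgrestore
 * (old_name == "") as a match, so the ring-buffer position is kept. */
int vg_name_matches_on_disk(const char *name_on_disk,
			    const char *vg_name, const char *old_name)
{
	if (!strcmp(name_on_disk, vg_name))
		return 1;			/* ordinary write */
	if (old_name && !*old_name)
		return 1;			/* vgcfgrestore: check disabled */
	if (old_name && !strcmp(name_on_disk, old_name))
		return 1;			/* vgrename in progress */
	return 0;				/* genuinely different VG */
}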
lvm2app needs a link back to the vg in order to use the vg handle for
memory allocations as well as other things. This patch adds the field
to struct physical_volume, and sets pv->vg when reading a vg from disk or
extending a vg by using the helper function previously added,
add_pvl_to_vgs(). Moves and renames are handled with separate code
inside move_pv() and vgmerge(). Add pv->vg check to vg_validate().
A NULL value in pv->vg signifies membership in the orphan VG.
Note, though, that in the case of pv_read() on a device with
metadatacopies == 0, more devices may need to be read for an
authoritative answer.
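The invariant vg_validate() gains is essentially this (a sketch with
the list simplified to a plain linked list):

struct volume_group;

struct physical_volume {
	struct volume_group *vg;	/* NULL => member of the orphan VG */
	struct physical_volume *next;
};

struct volume_group {
	struct physical_volume *pvs;
};

/* every PV on a real VG's list must point back at that VG */
int vg_validate_pv_links(const struct volume_group *vg)
{
	const struct physical_volume *pv;

	for (pv = vg->pvs; pv; pv = pv->next)
		if (pv->vg != vg)
			return 0;	/* broken back-pointer */
	return 1;
}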
Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
Now that we have library functions to add/delete a pv from the vg->pvs
list, call them from everywhere.
Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
Add a delete function to manage the vg->pvs list.
NOTE: It may be possible to do further cleanup to these add/del functions
by passing a 'pv' as input instead of 'pv_list'. The pv_list is used for
functions which do allocations (lvcreate) while other places in the code
just manage a list of 'pv' (e.g. import functions, vgextend, etc).
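The delete helper amounts to this (a sketch following LVM2's dm_list
idioms, assuming struct pv_list carries a dm_list node):

static void del_pvl_from_vgs(struct volume_group *vg, struct pv_list *pvl)
{
	dm_list_del(&pvl->list);	/* unlink from vg->pvs */
	vg->pv_count--;			/* keep the count in step */
}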
Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
A user specifying duplicate paths on the cmdline of vgcreate will
get a message similar to the following:
vgcreate vgtest2 /dev/loop3 /dev/loop5
Found duplicate PV jk1lXsKzwyOKlXq6bhaFFKMQQ06oPgu8: using /dev/loop5 not /dev/loop3
Found duplicate PV jk1lXsKzwyOKlXq6bhaFFKMQQ06oPgu8: using /dev/loop3 not /dev/loop5
Internal error: Duplicate PV id jk1lXs-Kzwy-OKlX-q6bh-aFFK-MQQ0-6oPgu8 detected for /dev/loop3 in vgtest2.
This is caught by vg_validate(), but it would be good to find
this condition earlier in the vgcreate code. add_pv_to_vg()
currently checks by pvname, but does not look for duplicate pvids.
This patch adds the check for duplicate pvids and results in new
error output as follows:
vgcreate vgtest2 /dev/loop3 /dev/loop5
Found duplicate PV jk1lXsKzwyOKlXq6bhaFFKMQQ06oPgu8: using /dev/loop5 not /dev/loop3
Found duplicate PV jk1lXsKzwyOKlXq6bhaFFKMQQ06oPgu8: using /dev/loop3 not /dev/loop5
Physical volume '/dev/loop5 (jk1lXs-Kzwy-OKlX-q6bh-aFFK-MQQ0-6oPgu8)' listed more than once.
Unable to add physical volume '/dev/loop5' to volume group 'vgtest2'.
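The added check is essentially a pvid walk before the insertion (a
standalone sketch; the real code iterates vg->pvs with the usual list
iterators and compares struct id values):

#include <string.h>

#define ID_LEN 32

struct id { unsigned char uuid[ID_LEN]; };

/* return 1 if new_id already occurs among the VG's pvids */
int pvid_seen(const struct id *new_id, const struct id *ids,
	      unsigned pv_count)
{
	unsigned i;

	for (i = 0; i < pv_count; i++)
		if (!memcmp(new_id, &ids[i], sizeof(*new_id)))
			return 1;	/* duplicate: refuse to add */
	return 0;
}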
Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
Small refactor of main places in the code where a pv is added to a
vg into a small function which adds the pv to the list and updates
the vg counts.
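The helper is about this much (a sketch following the dm_list idioms;
the pv->vg assignment is the follow-up patch described above):

static void add_pvl_to_vgs(struct volume_group *vg, struct pv_list *pvl)
{
	dm_list_add(&vg->pvs, &pvl->list);	/* append to the PV list */
	pvl->pv->vg = vg;			/* back-pointer (see above) */
	vg->pv_count++;				/* keep the count in step */
}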
Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
In add_pv_to_vg(), we should only add the pv to vg->pvs after all
internal checks have passed. The check for vg->extent_count exceeding
the maximum came after we added the pv to the list, so this function
could return with vg->pvs in a state that did not reflect other fields
such as vg->pv_count.
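In other words, validate first and mutate last (a fragment sketch;
MAX_EXTENT_COUNT and new_extents stand in for whatever the real check
uses):

	if (vg->extent_count + new_extents > MAX_EXTENT_COUNT) {
		log_error("Maximum extent count exceeded for VG %s.",
			  vg->name);
		return 0;		/* nothing added, vg untouched */
	}
	add_pvl_to_vgs(vg, pvl);	/* only now touch vg->pvs/pv_count */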
Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
The function find_peg_by_pe is incredibly inefficient
for PVs with many segments.
In the shiny future there should be a binary (or interval) tree
instead of a sorted linked list (volunteers?).
Anyway, for now, we can use a dirty trick here to optimise this case:
- Allocations are usually applied from the beginning
of the PV (we have no allocation policy which allocates areas
"backwards")
- The only user of find_peg_by_pe is the pv_split_segment()
call. In *most* cases it needs to split the *last* PV segment.
So if we search the sorted pv segment list backwards, we
hit the requested segment immediately.
This patch applies this tiny change
(and saves >30% of processing time when >3000 LV segments are on one PV!).
To discourage using this inefficient function from other code,
it is moved to pv_manip.c and made static for now :-)
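The backwards walk looks like this (close to the real shape;
dm_list_iterate_back_items is libdevmapper's reverse iterator, and
pe/len are the segment's starting extent and length):

static struct pv_segment *find_peg_by_pe(const struct physical_volume *pv,
					 uint32_t pe)
{
	struct pv_segment *peg;

	/* tail-first: a split of the last segment hits on the first step */
	dm_list_iterate_back_items(peg, &pv->segments)
		if (pe >= peg->pe && pe < peg->pe + peg->len)
			return peg;

	return NULL;
}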
When we pv_read() a device that has an orphan vgname, we might need to scan
the system to be sure this is true. However, if the PV has mdas, there's
no way possible for it to have an orphan vgname unless it is a true orphan.
Some areas of the code were optimized to take advantage of this fact, while
others were not (we would still do the expensive scan if a device had mdas
but had an orphan VG).
This patch unifies the code so that every place we are operating on such
a PV, we skip the expensive scan if there are mdas.
Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
Acked-by: Petr Rockai <prockai@redhat.com>
Acked-by: Alasdair G Kergon <agk@redhat.com>
If a user tries to vgcreate or vgextend with a non-existent device,
messages like these appear:
# vgcreate xxx /dev/xxx
Internal error: Volume Group xxx was not unlocked
Device /dev/xxx not found (or ignored by filtering).
Unable to add physical volume '/dev/xxx' to volume group 'xxx'.
Internal error: Attempt to unlock unlocked VG xxx.
(the same happens with vgextend on an existing VG and a non-existent PV)
# vgextend vg_test /dev/xxx
...
This is caused by the code trying to "refresh" the cache
when the md filter is switched on, by destroying the cache.
But we can change filters and rescan even without this
machinery now; just use refresh_filters
(and reset the md filter afterwards).
(The patch also uncovers a cache alias bug in the vgsplit test;
fix it by using a better filter line.)
The kernel's blk_stack_limits() function may flag a device as
'misaligned'. If it does the alignment_offset will be -1.
Update set_pe_align_offset() to accommodate this corner case.
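The corner case handling reduces to this (a sketch; the sysfs attribute
is real, the helper name is illustrative):

/* /sys/block/<dev>/alignment_offset reads "-1" when the kernel's
 * blk_stack_limits() flagged the device as misaligned */
unsigned long pe_align_offset_from_sysfs(long alignment_offset)
{
	if (alignment_offset < 0)
		return 0;	/* misaligned: do not use -1 as an offset */
	return (unsigned long) alignment_offset;
}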
We need to allocate memory for the tag and copy the tag value before we
add it to the list of tags. We could put this inside lvm2app since the
tools keep their memory around until vg_write/vg_commit is called, but
we put it inside the internal library to minimize code in lvm2app.
We need to copy the tag passed in by the caller to ensure the lifetime of
the memory until the {vg|lv} handle is released.
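The copy amounts to a fragment like this (dm_pool_strdup and
str_list_add are existing libdevmapper/LVM2 helpers; surrounding code
elided):

	const char *tag_copy;

	if (!(tag_copy = dm_pool_strdup(vg->vgmem, tag)))
		return_0;	/* copy into vg-owned memory first */
	if (!str_list_add(vg->vgmem, &vg->tags, tag_copy))
		return_0;	/* only then queue it on the tag list */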
Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
Similar refactoring to vgchange - pull out common parts and put into
library function for reuse. Should be no functional change.
Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
Pull out common code to be called from tools as well as lvm2app.
Leave archive() at tool level so we can use from vgcreate
as well as vgchange. Should be no functional change.
- add stack macro in vgchange
Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
We should not expose internal VG names/uuids (the ones with a "#" prefix)
through the interface. Otherwise, we could end up with library users
opening internal VGs, which initiates a locking mechanism that won't be
cleaned up properly.
The "#orphans_{lvm1,lvm2,pool}" names are treated in a special way: they
are truncated to "orphans" first, and that is used as part of the lock
name (e.g. while calling lvm_vg_open()). When a library user calls
lvm_vg_close(), the original "#orphans_{lvm1,lvm2,pool}" name is used
directly, and therefore no unlock occurs.
We should exclude internal VG names and uuids in the lists provided by lvmcache:
lvmcache_get_vgids() and lvmcache_get_vgnames().
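The test for an internal name is trivial (a sketch; only the '#' prefix
convention above is assumed):

/* internal VGs carry a '#' prefix, e.g. "#orphans_lvm2" */
int is_internal_vg_name(const char *vgname)
{
	return vgname && vgname[0] == '#';
}

/* ...and the list-building loops simply skip such names/uuids. */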
All this seems to do is provide a memory leak, so remove it.
The only caller of _alloc_pv() later explicitly sets
pv->vg_name = fmt->orphan_vg_name, so clearly this allocation
should be removed. I also saw nowhere in the code where
strncpy was used to assign pv->vg_name - only direct assignments
and strdup's.
This patch tries to correctly track changes in lvmcache related to
commit/revert.
For vg_commit: if there is cached precommitted metadata, after a
successful commit this metadata must be tracked as committed.
For vg_revert: remote nodes must drop precommitted metadata and its flag
in lvmcache.
(N.B. The patch does not touch LV locks here in any way.)
All this machinery is needed to properly solve the remote node cache
invalidation which caused several recently observed problems.
The use_precommitted flag indicates that we want to use precommitted
metadata (used in the suspend call to preload the table with
precommitted data).
But if there is no such data, committed metadata is read, yet the cache
still contains the precommitted flag.
(The problem is that a later drop_metadata call will then not invalidate
the device in the cache.)
The wrong precommitted state is stored on remote nodes during a normal
suspend/resume cycle _without_ vg_write/commit.
Use the PRECOMMITTED status flag here instead (which is always set if
precommitted metadata is being used).
When a PV device reappears with old metadata, it is
always updated to the new version by automatic metadata
repair.
Remove the missing flag if the device is empty.
If the device contains allocated extents, issue a warning that the
user must remove the volumes and re-add this PV before
manipulating this volume.
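The repair decision is roughly this (a fragment sketch; MISSING_PV,
pv->status and pv->pe_alloc_count are the fields in question, the
warning text is illustrative):

	if (!pv->pe_alloc_count) {
		pv->status &= ~MISSING_PV;	/* empty PV: safe to re-add */
	} else
		log_warn("WARNING: PV %s has allocated extents: remove its "
			 "volumes and re-add it before further use.",
			 pv_dev_name(pv));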
This partially solves bug 547842, where one PV (the log) failed,
dmeventd removed that device, and later the device reappeared and
was wrongly added back into the VG while marked missing.
The new recovery code first tries to repair the LV and then removes the
failed PV from the VG. It means that during the operation there can be a
VG with a PV missing, and the vg_read code handles it as an inconsistent
VG.
We already allow returning "inconsistent" committed metadata;
for mirror repair we need this for precommitted metadata too.
(The suspend call preloads precommitted metadata into the inactive table
on other cluster nodes.)
"Inconsistent" here means correct metadata, just with some metadata
areas not found (obviously on missing or failed PVs).