IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
This patch adds the get and partially implemented set function.
The 'set' function should probably ignore or un-ignore metadata areas
based on new values.
Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
Add a field to struct volume_group to later implement metadata
balancing:
- mda_copies: target # of non-ignored mdas in the VG; default 0 (do
not control pv 'ignore mdas' bit.
This patch just adds the parameter to the structures with the default
values but does not modify any commands. Should be no functional change.
Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
Arrange mdas so mdas that are to be ignored come first. This is an
optimization that ensures consistency on disk for the longest period of time.
This was noted by agk in review of the v4 patchset of pvchange-based mda
balance.
Note the following example for an explanation of the background:
Assume the initial state on disk is as follows:
PV0 (v1, non-ignored)
PV1 (v1, non-ignored)
PV2 (v1, non-ignored)
PV3 (v1, non-ignored)
If we did not sort the list, we would have a commit sequence something like
this:
PV0 (v2, non-ignored)
PV1 (v2, ignored)
PV2 (v2, ignored)
PV3 (v2, non-ignored)
After the commit of PV0's mdas, we'd have an on-disk state like this:
PV0 (v2, non-ignored)
PV1 (v1, non-ignored)
PV2 (v1, non-ignored)
PV3 (v1, non-ignored)
This is an inconsistent state of the disk. If the machine fails, the next
time it was brought back up, the auto-correct mechanism in vg_read would
update the metadata on PV1-PV3. However, if possible we try to avoid
inconsistent on-disk states. Clearly, because we did not sort, we have
a greater chance of on-disk inconsistency - from the time the commit of
PV0 is complete until the time PV3 is complete.
We could improve the amount of time the on-disk state is consistent by simply
sorting the commit order as follows:
PV1 (v2, ignored)
PV2 (v2, ignored)
PV0 (v2, non-ignored)
PV3 (v2, non-ignored)
Thus, after the first PV is committed (in this case PV1), on-disk we would
have:
PV0 (v1, non-ignored)
PV1 (v2, ignored)
PV2 (v1, non-ignored)
PV3 (v1, non-ignored)
This is clearly a consistent state. PV1 will be read but the mda will be
ignored. All other PVs contain v1 metadata, and no auto-correct will be
required. In fact, if we commit all PVs with ignored mdas first, we'll
only have an inconsistent state when we start writing non-ignored PVs,
and thus the chances we'll get an inconsistent state on disk is much
less with the sorted method.
Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
When we are constructing the vg, we may need to adjust the list of
metadata_areas if there are ignored mdas. At label read time, we
do not read the metadata of ignored mdas, and as a result, they do
not get placed on vg->fid->metadata_areas inside _text_create_text_instance
since lvmcache does not have these areas attached to vginfo->infos.
However, when we're checking the pvids inside _vg_read, after having
read another metadata area from another PV, we do have the opportunity
to update the metadata_area and metadata_areas_ignored lists based
on the read metadata_area. We need accurate mda lists for the reporting
functions that count the ignored mdas, as well as general correctness
of mda balancing.
Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
With the addition of ignored mdas, we replace all checks for an empty
mda list with a new function to look for either an empty mda list or
ignored mdas.
Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
Add a helper function to consolidate checking for an empty mdas list
or ignored mdas. Ignored mdas should behave almost identically to
an empty mda list - the metadata areas should not be read or written
to. This function will make it easier to implement metadata balancing
and easier to track pvs with an empty mda list or ignored mdas.
Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
Define a new pvs field, pv_mda_used_count, and a new vgs field,
vg_mda_used_count to match the existing pv_mda_count and vg_mda_count.
These new fields count the number of mdas that have the 'ignored' bit
clear (they are in use on the PV / VG). Also define various supporting
functions to implement the counting as well as setting the ignored
flag and determining if an mda is ignored. These high level functions
call into the lower level location independent mda ignore functions
defined by earlier patches.
Note that counting ignored mdas in a vg requires traversing both lists
and checking for the ignored bit on the mda. The count of 'ignored'
mdas then is defined by having the bit set, not by which list the mda
is on. The list does determine whether LVM actually does read/write to
the mda, though we must count the bits in order to return accurate numbers
for the various counts. Also, pv_mda_set_ignored must search both vg
lists for ignored mda. If the state changes and needs to be committed
to disk, the ignored mda will be on the non-ignored list.
Note also in pv_mda_set_ignored(), we must properly manage the mda lists.
If we change the ignored state of an mda, we must change any mdas on
vg->fid->metadata_areas that correspond to this pv. Also, we may
need to allocate a copy of the mda, as is done when fid->metadata_areas
is populated from _vg_read(), if we are un-ignoring an ignored mda.
Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
Add a second mda list, metadata_areas_ignored to fid, and a couple
functions, fid_add_mda() and fid_add_mdas() to help manage the lists.
These functions are needed to properly count the ignored mdas and
manage the lists attached to the 'fid' and ultimately the 'vg'.
Ensure metadata_areas_ignored is initialized in other formats, even
if the list is never used.
Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
Because of the way mdas are handled internally, where a PV in a VG
has mdas on both info->mdas and vg->fid->metadata_areas list, we
need a location independent copy constructor for struct
metadata_area. Break up the existing format-text specific copy
constructor into a format independent piece and a format dependent
piece.
This function is necessary to properly implement pv_set_mda_ignored().
Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
Reviewed-by: Alasdair G Kergon <agk@redhat.com>
A metadata_area is defined independent of the location. One downside
is that there is no obvious mapping from a pv to an mda. For a PV in
a VG, we need a way to start with a PV and end up with an MDA, if we
are to manage mdas starting with a device/pv. This function provides
us a way to go down the list of PVs on a VG, and identify which ones
match a particular PV.
I'm not entirely happy with this approach, but it does fit into the
existing structures in a reasonable way.
An alternative solution might be to refactor the VG - PV interface such
that mdas are a list tied to a PV. However, this seemed a bit tricky since
a PV does not come into existence until after the list of mdas is
constructed (see _vg_read() - we create a 'fid' and attach mdas to it,
then we go through them and attach pvs).
Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
Reviewed-by: Alasdair G Kergon <agk@redhat.com>
First we add a 'flags' field to the location independent
metadata_area structure, and a MDA_IGNORE flag. The
mda_is_ignored and mda_set_ignored functions are added to
manage the flag. Adding the flag and functions gives a
library interface to ignore metadata areas independent of
the underlying location (disk, file, etc). The location
specific read/write functions must then handle the specifics
of what this flag means to the location.
Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
Reviewed-by: Alasdair G Kergon <agk@redhat.com>
Some commands start with a pvname, but we'd like to force users to
start with a vg handle to obtain a pv handle. Our best option seems
to be providing a way to look up the vgname from the pvname, and then
require them to use vg_read/vg_open.
In addition to the pvname lookup function, this patch also provides a
lookup by pvid. The lookup by pvid can be used in conjunction with
lvmcache_get_pvids to process all pvs in the system.
The pvid find function first calls lvmcache_vgname_from_pvid, which may
cause the label to be read if it is not in the cache. If the vgname is
returned is an orphan, we then check to see if there are metadata areas,
and if not, we scan every PV on the system by calling scan_vgs_for_pvs().
In most cases we should not need to do this, and by using the info->mdas
count, we avoid calling pv_read() as prior code did. So this patch is a
bit cleaner and should allow us to refactor more of the pv code.
Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
are active mirrors or snapshots.
We don't have the mechanisms in place to change the device-mapper
tables for those targets that have behavioral differences between
cluster and single machine instances. Allowing users to change
the attribute but not changing the target's behavior can lead to
data corruption.
The following bugs are fixed/avoided by this patch:
235123 - vgchange -c [ny] do not change target types when necessary
289331 - RFE: switching from cluster domain to local domain needs to deactivate volume somehow
289541 - when changing from local to cluster, volumes can not appear to be deactivated
Allow lv_remove_with_dependencies() to know the top-level LV that was
requested to be removed (otherwise it recurses and we lose context).
A merging snapshot cannot be removed directly but the associated origin
can be. Disallow removal of a merging snapshot unless the associated
origin is also being removed.
We should write metadata into next position in the ring buffer while calling
vgrename and vgcfgrestore. At this code level (_vg_write_raw), we were not able
to determine if this is a rename or not. If yes, then accompanying VG structure
passed here has a new name set, not the old one.
When looking for a location where to put metadata next, we were given a NULL
value because of failed VG name comparison (in _find_vg_rlocn) between the
name in existing metadata and metadata we're just about to write.
This resets the position in the ring buffer, overwriting any existing metadata
(and also incorrectly updates the cache to "orphan" afterwards).
This patch just adds old_name item in struct volume_group that we can check and use
if necessary and detect renames at lower layers as well.
The same applies for vgcfgrestore, but here we're using a special value of
old_name, an empty string, to disable the check with existing metadata totally.
lvm2app needs a link back to the vg in order to use the vg handle for
memory allocations as well as other things. This patch adds the field
to struct physical_volume, and sets pv->vg when reading a vg from disk or
extending a vg by using the helper function previously added,
add_pvl_to_vgs(). Moves and renames are handled with separate code
inside move_pv() and vgmerge(). Add pv->vg check to vg_validate().
A NULL value in pv->vg signifies membership in the orphan VG.
Note though in the case of pv_read() on a device with metadatacopies == 0,
more devices may need to be read for an authoritative answer.
Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
Now that we have library functions to add/delete a pv from the vg->pvs
list, call them from everywhere.
Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
Add a delete function to manage the vg->pvs list.
NOTE: It may be possible to do further cleanup to these add/del functions
by passing a 'pv' as input instead of 'pv_list'. The pv_list is used for
functions which do allocations (lvcreate) while other places in the code
just manage a list of 'pv' (e.g. import functions, vgextend, etc).
Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
A user specifying duplicate paths on the cmdline of vgcreate will
get a message similar to the following:
vgcreate vgtest2 /dev/loop3 /dev/loop5
Found duplicate PV jk1lXsKzwyOKlXq6bhaFFKMQQ06oPgu8: using /dev/loop5 not /dev/loop3
Found duplicate PV jk1lXsKzwyOKlXq6bhaFFKMQQ06oPgu8: using /dev/loop3 not /dev/loop5
Internal error: Duplicate PV id jk1lXs-Kzwy-OKlX-q6bh-aFFK-MQQ0-6oPgu8 detected for /dev/loop3 in vgtest2.
This is caught by vg_validate(), but it would be good to find
this condition earlier in the vgcreate code. add_pv_to_vg()
currently checks by pvname, but does not look for duplcate pvids.
This patch adds the check for duplicate pvids and results in new
error output as follows:
vgcreate vgtest2 /dev/loop3 /dev/loop5
Found duplicate PV jk1lXsKzwyOKlXq6bhaFFKMQQ06oPgu8: using /dev/loop5 not /dev/loop3
Found duplicate PV jk1lXsKzwyOKlXq6bhaFFKMQQ06oPgu8: using /dev/loop3 not /dev/loop5
Physical volume '/dev/loop5 (jk1lXs-Kzwy-OKlX-q6bh-aFFK-MQQ0-6oPgu8)' listed more than once.
Unable to add physical volume '/dev/loop5' to volume group 'vgtest2'.
Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
Small refactor of main places in the code where a pv is added to a
vg into a small function which adds the pv to the list and updates
the vg counts.
Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
In add_pv_to_vg(), we should only add the pv to vg->pvs after all
internal checks have passed. The check for vg->extent_count exeeding
maximum was after we added the pv to the list, so this function could
return a state of vg->pvs that did not reflect other parameters such
as vg->pv_count.
Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
The function find_peg_by_pe is incredibly inefficient
for Pvs with many segments.
In shiny future there should be binary (or interval) tree
instead of sorted linked list (volunteers?).
Anyway, for now, we can use dirty trick here to optimise this case:
- Allocations are usually applied from the beginning
of PV (we have no alloocation policy which allocates areas
"backwards")
- The only user of find_peg_by_pe is pv_split_segment()
call. In *most* cases it need to split *last* PV segment.
So if we search sorted pv segment list backwards, we
hit the requested segment immediatelly.
This patch applies this tiny change.
(and saves >30% of processing time when >3000LVs segments are on one PV!)
To discourage using this inefficient function from other code,
it is moved to pv_manip.c and used static for now:-)
When we pv_read() a device that has an orphan vgname, we might need to scan
the system to be sure this is true. However, if the PV has mdas, there's
no way possible for it to have an orphan vgname unless it is a true orphan.
Some areas of the code were optimized to take advantage of this fact, while
others were not (we would still do the expensive scan if a device had mdas
but had an orphan VG).
This patch unifies the code so that every place we are operating on such
a PV, we skip the expensive scan if there are mdas.
Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
Acked-by: Petr Rockai <prockai@redhat.com>
Acked-by: Alasdair G Kergon <agk@redhat.com>
If user try to vgcreate or vgextend non-existent VG,
these messages appears:
# vgcreate xxx /dev/xxx
Internal error: Volume Group xxx was not unlocked
Device /dev/xxx not found (or ignored by filtering).
Unable to add physical volume '/dev/xxx' to volume group 'xxx'.
Internal error: Attempt to unlock unlocked VG xxx.
(the same with existing VG and non-existing PV & vgextend)
# vgextend vg_test /dev/xxx
...
It is caused because code tries to "refresh" cache if
md filter is switched on using cache destroy.
But we can change filters and rescan even without this
machinery now, just use refresh_filters
(and reset md filter afterwards).
(Patch also discovers cache alias bug in vgsplit test,
fix it by using better filter line.)
The kernel's blk_stack_limits() function may flag a device as
'misaligned'. If it does the alignment_offset will be -1.
Update set_pe_align_offset() to accommodate this corner case.
We need to allocate memory for the tag and copy the tag value before we
add it to the list of tags. We could put this inside lvm2app since the
tools keep their memory around until vg_write/vg_commit is called, but
we put it inside the internal library to minimize code in lvm2app.
We need to copy the tag passed in by the caller to ensure the lifetime of
the memory until the {vg|lv} handle is released.
Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
Similar refactoring to vgchange - pull out common parts and put into
library function for reuse. Should be no functional change.
Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
Pull out common code to be called from tools as well as lvm2app.
Leave archive() at tool level so we can use from vgcreate
as well as vgchange. Should be no functional change.
- add stack macro in vgchange
Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
where we should not expose internal VG names/uuids (the ones with "#" prefix )through the
interface. Otherwise, we could end up with library users opening internal VGs which will
initiate locking mechanism that won't be cleaned up properly.
"#orphans_{lvm1, lvm2, pool}" names are treated in a special way, they are truncated first
to "orphans" and this is used as a part of the lock name then (e.g. while calling lvm_vg_open()).
When library user calls lvm_vg_close(), the original name "orphans_{lvm1, lvm2, pool}"
is used directly and therefore no unlock occurs.
We should exclude internal VG names and uuids in the lists provided by lvmcache:
lvmcache_get_vgids() and lvmcache_get_vgnames().
All this seems to do is provide a memory leak so remove it.
The only caller of _alloc_pv() later explicitly sets
pv->vg_name = fmt->orphan_vg_name so clearly this allocation
should be removed. I also saw no where in the code where
strncpy was used to assign pv->vg_name - only direct assignments
and strdup's.