IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
- If a PV contained empty mdas, the auto-recovery code was not kicking in.
- The 'inconsistent' state was getting lost when metadata was cached so
recovery didn't kick in. But leave the behaviour alone when using
precommitted metadata because of a warning in a confusing FIXME.
In my testing, pvs and vgs didn't repair inconsistent metadata like they
used to do. (How many other tools fail similarly now?)
And there should be no need to cache inconsistent metadata because it is
supposed to get repaired under the protection of a write lock immediately it is
discovered.
This code is in need of a redesign based on first principles.
I still see bugs in this code and this commit is risky.
Print device name when setting or clearing metadata ignore bit.
Example:
label/label.c:160 /dev/loop2: lvm2 label detected
cache/lvmcache.c:1136 lvmcache: /dev/loop2: now in VG #orphans_lvm2 (#orphans_lvm2)
metadata/metadata.c:4142 Setting mda ignored flag for metadata_locn /dev/loop2.
format_text/text_label.c:318 Skipping mda with ignored flag on device /dev/loop2 at offset 4096
Logging isn't ideal, especially for mda_set_ignore. Ideally we'd
like to display the device name and offset in this case but this
requires a bit more work and a per-format 'mda_description' function
pointer definition (we don't have access to mda_context in
metadata.c).
Add a helper function to consolidate checking for an empty mdas list
or ignored mdas. Ignored mdas should behave almost identically to
an empty mda list - the metadata areas should not be read or written
to. This function will make it easier to implement metadata balancing
and easier to track pvs with an empty mda list or ignored mdas.
Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
Add a second mda list, metadata_areas_ignored to fid, and a couple
functions, fid_add_mda() and fid_add_mdas() to help manage the lists.
These functions are needed to properly count the ignored mdas and
manage the lists attached to the 'fid' and ultimately the 'vg'.
Ensure metadata_areas_ignored is initialized in other formats, even
if the list is never used.
Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
Because of the way mdas are handled internally, where a PV in a VG
has mdas on both info->mdas and vg->fid->metadata_areas list, we
need a location independent copy constructor for struct
metadata_area. Break up the existing format-text specific copy
constructor into a format independent piece and a format dependent
piece.
This function is necessary to properly implement pv_set_mda_ignored().
Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
Reviewed-by: Alasdair G Kergon <agk@redhat.com>
A metadata_area is defined independent of the location. One downside
is that there is no obvious mapping from a pv to an mda. For a PV in
a VG, we need a way to start with a PV and end up with an MDA, if we
are to manage mdas starting with a device/pv. This function provides
us a way to go down the list of PVs on a VG, and identify which ones
match a particular PV.
I'm not entirely happy with this approach, but it does fit into the
existing structures in a reasonable way.
An alternative solution might be to refactor the VG - PV interface such
that mdas are a list tied to a PV. However, this seemed a bit tricky since
a PV does not come into existence until after the list of mdas is
constructed (see _vg_read() - we create a 'fid' and attach mdas to it,
then we go through them and attach pvs).
Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
Reviewed-by: Alasdair G Kergon <agk@redhat.com>
First we add a 'flags' field to the location independent
metadata_area structure, and a MDA_IGNORE flag. The
mda_is_ignored and mda_set_ignored functions are added to
manage the flag. Adding the flag and functions gives a
library interface to ignore metadata areas independent of
the underlying location (disk, file, etc). The location
specific read/write functions must then handle the specifics
of what this flag means to the location.
Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
Reviewed-by: Alasdair G Kergon <agk@redhat.com>
Add a delete function to manage the vg->pvs list.
NOTE: It may be possible to do further cleanup to these add/del functions
by passing a 'pv' as input instead of 'pv_list'. The pv_list is used for
functions which do allocations (lvcreate) while other places in the code
just manage a list of 'pv' (e.g. import functions, vgextend, etc).
Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
Small refactor of main places in the code where a pv is added to a
vg into a small function which adds the pv to the list and updates
the vg counts.
Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
The function find_peg_by_pe is incredibly inefficient
for Pvs with many segments.
In shiny future there should be binary (or interval) tree
instead of sorted linked list (volunteers?).
Anyway, for now, we can use dirty trick here to optimise this case:
- Allocations are usually applied from the beginning
of PV (we have no alloocation policy which allocates areas
"backwards")
- The only user of find_peg_by_pe is pv_split_segment()
call. In *most* cases it need to split *last* PV segment.
So if we search sorted pv segment list backwards, we
hit the requested segment immediatelly.
This patch applies this tiny change.
(and saves >30% of processing time when >3000LVs segments are on one PV!)
To discourage using this inefficient function from other code,
it is moved to pv_manip.c and used static for now:-)
Internally we store sizes in sectors, but lvm2app exports sizes
in bytes. We could get fancier and allow units configuration but
this fix should do for now.
Fixes rhbz561422.
Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
Adds pe_align_offset to 'struct physical_volume'; is initialized with
set_pe_align_offset(). After pe_start is established pe_align_offset is
added to it.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
During vgreduce is failed mirror image replaced with error segment,
this segmant type has always area_count == 0.
Current code expects that there is at least one area with device,
patch fixes it by additional check (fixes segfault during vgreduce).
Also do not calculate readahead in every lv_info call, we only need
to cache PV readahead before activation calls which locks memory.
When we are stacking LV over device, which has for some reason
increased read_ahead (e.g. MD RAID), the read_ahead hint
for libdevmapper is wrong (it is zero).
If the calculated read_ahead hint is zero, patch uses read_ahead of underlying device
(if first segment is PV) when setting DM_READ_AHEAD_MINIMUM_FLAG.
Because we are using dev-cache, it also store this value to cache for future use
(if several LVs are over one PV, BLKRAGET is called only once for underlying device.)
This should fix all the reamining problems with readahead mismatch reported
for DM over MD configurations (and similar cases).
Add lvs origin_size field.
Fix linux configure --enable-debug to exclude -O2.
Still a few rough edges, but hopefully usable now:
lvcreate -s vg1 -L 100M --virtualoriginsize 1T
Since now, all code reading volume group is responsible for releasing
the memory allocated by calling vg_release(vg).
(For simplicity of use, vg_releae can be called for vg == NULL,
the same logic like free(NULL)).
Also providing simple macro for unlocking & releasing in one step,
tools usualy uses this approach.
The global memory pool (cmd->mem) should be used only for global
physical volume operations.
This patch have to be applied with all subsequent patches to complete
memory pool per vg logic.
Using separate memory pool has quite bit memory saving impact when
using large VGs, this is mainly needed when we have to use
preallocated and locked memory (and should not overflow from that
memory space).
This patch is not fully tested and leaves some related bugs unfixed.
Intended behaviour of the code now:
pe_start in the lvm2 format PV label header is set only by pvcreate (or
vgconvert -M2) and then preserved in *all* operations thereafter.
In some specialist cases, after the PV is added to a VG, the pe_start
field in the VG metadata may hold a different value and if so, it
overrides the other one for as long as the PV is in such a VG.
Currently, the field storing the size of the data area in the PV label
header always holds 0. As it only has meaning in the context of a
volume group, it is calculated whenever the PV is added to a VG (and can
be derived from extent_size and pe_count in the VG metadata).
Reports the size of the smallest metadata area in a PV or a VG.
Useful to confirm pvcreate --metadatasize or pvmetadatasize setting in
/etc/lvm/lvm.conf file.
NOTE: Actual value in these fields will most always differ from that
given in pvcreate options due to rounding and alignment effects.
It fails for 1k PE now.
Patch adds log_region_size into allocation habdle struct
and use it in _alloc_parallel_area() for proper log size calculation
instead of hardcoded 1 extent - which can fail.
Reproducer for incorrect log size calculation:
DEV=/dev/sd[bcd]
pvcreate $DEV
vgcreate -s 1k vg_test $DEV
lvcreate -m1 -L 12M -n mirr vg_test
https://bugzilla.redhat.com/show_bug.cgi?id=477040
The log size calculation is mostly copied from kernel code.
Use loop to iterate through the now-ordered policy list in _allocate().
Check for failure to allocate just the mirror log.
Introduce calc_area_multiple().
Support mirror log allocation when there is only one PV: area_count now 0.
(See lvm-devel list archives for further details.)
Include mirror log (untested) in _for_each_pv() processing.
Use MIRROR_LOG_SIZE constant.
Remove struct seg_pvs from _for_each_pv() for generalisation.
Avoid adding duplicates to list of parallel PVs to avoid.
Fix some memory leaks in error paths found by coverity.
Use C99 struct initialisers.
Move DEFS into configure.h.
Clean-ups to remove miscellaneous compiler warnings.
[Some activation-related features will stop working for a while now.
Some types of activation are getting split into two steps, with the
first step using the precommitted metadata.]
Clear many compiler warnings (i386) & associated bugs - hopefully without
introducing too many new bugs:-) (Same exercise required for other archs.)
Default compilation has optimisation - or else use ./configure --enable-debug
allocation policy. This can currently take one of three values:
typedef enum {
ALLOC_NEXT_FREE,
ALLOC_STRICT,
ALLOC_CONTIGUOUS
} alloc_policy_t;
Notice that 'SIMPLE' has turned into the slightly more meaningful NEXT_FREE.
ii) Put code into display.[hc] for converting one of these enums to a
text representation and back again.
ii) Updated the text format so this also has the alloc_policy field.
Lots of changes/very little testing so far => there'll be bugs!
Use 'vgcreate -M text' to create a volume group with its metadata stored
in text files. Text format metadata changes should be reasonably atomic,
with a (basic) automatic recovery mechanism if the system crashes while a
change is in progress.
Add a metadata section to lvm.conf to specify multiple directories if
you want (recommended) to keep multiple copies of the metadata (eg on
different filesystems).
e.g. metadata {
dirs = ["/etc/lvm/metadata1","/usr/local/lvm/metadata2"]
}
Plenty of refinements still in the pipeline.
from lock_vol() - otherwise it now attempts to acquire the lock and then
immediately releases it.
o Extend the id field in struct logical_volume to hold VG uuid + LV uuid
for format1. This unique lvid can be used directly when calling lock_vol().
o Add the VG uuid to vgcache to make VG uuid lookups possible. (Another
step towards using them instead of VG names internally.)
I'm taking a different route from LVM1 here in that snapshots are a
seperate entity from the logical volumes, I think of them as an
application of an LV (or two lvs rather). As such there is a list of
snapshots held against the vg, and there is *not* a SNAPSHOT, or
SHAPSHOT_ORG flag in lv->status.
o Text format now has a description and time field at the top level.
o archiving and backup set the description appropriately. eg,
for an archive:
description = "Created *before* executing 'lvextend test_vg/lvol0 -l +1'."
creation_time = 1013166332
for a backup:
description = "Created *after* executing 'lvextend test_vg/lvol0 -l +1'."
creation_time = 1013166332
This is preparing the way for a simple vgcfgundo command.
slightly different from the current LVM1 method.
lvcreate --persistent y --minor 10 (to specify when created)
lvchange --persistent n (to turn off)
lvchange --persistent y --minor 11 (to change)
--persistent uses a new LV status flag stored on disk
minor number is stored on disk the same way as LVM1 does
(but major number stored is 0; any LVM1 major/minor setting gets lost)
lvchange -ay --minor 12 (to activate using minor 12, regardless of the
on-disk setting, which doesn't get changed)
--minor == -m
--persistent == -M