There's an intermittent failure with vgcfgbackup that seems to have been
introduced with the metadataignore / vgmetadatacopies patchset.
Intermittent failures are often the result of uninitialized data,
so this patch calls zalloc in a few places where it might matter.
This patch adds the ability to read/write the vg->mda_copies value
from/to the vg metadata.
If we read the VG metadata and this field does not exist, we set
mda_copies to the default value of 0. Later in the code, we use
this special '0' value to indicate that metadata balancing is disabled.
This should preserve existing LVM behavior and ensure metadata balancing
can be turned off should the need arise.
Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
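The defaulting rule can be shown with a minimal C sketch of the read side. It assumes the parsed VG metadata is available as a flat array of name/value pairs. The struct kv type, the helper functions and the 'metadata_copies' key name are hypothetical stand-ins, not the LVM2 config-tree API. The sketch only shows that an absent field falls back to 0 and that 0 switches balancing off.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical flattened view of parsed VG metadata: name/value pairs. */
struct kv {
	const char *key;
	uint32_t value;
};

/* Default used when the field is absent; 0 means "balancing disabled". */
#define DEFAULT_MDA_COPIES 0u

/* Look up a key; returns true and stores the value if it is present. */
static bool vg_lookup_u32(const struct kv *kvs, size_t n,
			  const char *key, uint32_t *out)
{
	for (size_t i = 0; i < n; i++) {
		if (strcmp(kvs[i].key, key) == 0) {
			*out = kvs[i].value;
			return true;
		}
	}
	return false;
}

/* Read vg->mda_copies; metadata written before the field existed
 * simply gets the default. ('metadata_copies' is an assumed key name.) */
static uint32_t read_mda_copies(const struct kv *kvs, size_t n)
{
	uint32_t copies;

	if (!vg_lookup_u32(kvs, n, "metadata_copies", &copies))
		copies = DEFAULT_MDA_COPIES;
	return copies;
}

/* 0 preserves the pre-existing behaviour: no balancing at all. */
static bool mda_balancing_enabled(uint32_t mda_copies)
{
	return mda_copies != 0;
}
```

Old metadata therefore behaves exactly as before, and a user can disable balancing by storing 0 explicitly.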
When we are constructing the vg, we may need to adjust the list of
metadata_areas if there are ignored mdas. At label read time, we
do not read the metadata of ignored mdas, and as a result, they do
not get placed on vg->fid->metadata_areas inside _text_create_text_instance
since lvmcache does not have these areas attached to vginfo->infos.
However, when we're checking the pvids inside _vg_read, after having
read another metadata area from another PV, we do have the opportunity
to update the metadata_area and metadata_areas_ignored lists based
on the read metadata_area. We need accurate mda lists for the reporting
functions that count the ignored mdas, as well as for the general
correctness of mda balancing.
Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
We implement ignoring an mda at label_read time by checking for
the ignore bit and then skipping the reading of the vgname and
other information in the metadata. This has an effect similar
to finding a PV with no mdas: it will look like an orphan in the
cache until we scan the rest of the system and find a PV with
metadata, and the mda will not be on the vg->fid->metadata_areas
list, so no reads/writes will be done to the metadata area.
Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
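The label-read behaviour can be modelled with a small C sketch. It assumes a heavily simplified in-memory view of an on-disk mda, with a boolean ignore bit, the VG name it would yield and a counter standing in for metadata I/O. All names are hypothetical. An ignored mda yields no VG name and costs no read, so a PV whose mdas are all ignored looks like an orphan.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified view of one on-disk metadata area. */
struct disk_mda {
	bool ignored;          /* ignore bit from the mda header */
	const char *vgname;    /* what parsing the metadata would yield */
	unsigned reads;        /* counts simulated metadata reads (I/O) */
};

/* Label-read step for one mda: return the VG name, or NULL.
 * An ignored mda is treated like a PV with no mdas: nothing is read. */
static const char *label_read_vgname(struct disk_mda *mda)
{
	if (mda->ignored)
		return NULL;          /* skip parsing, no metadata I/O */
	mda->reads++;             /* stands in for reading the metadata */
	return mda->vgname;
}

/* Scan all mdas of one PV; the first non-ignored one supplies the name.
 * NULL means the PV looks like an orphan as far as the cache knows. */
static const char *pv_vgname(struct disk_mda *mdas, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		const char *name = label_read_vgname(&mdas[i]);

		if (name)
			return name;
	}
	return NULL;
}
```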
Add a second mda list, metadata_areas_ignored, to fid, and a couple of
functions, fid_add_mda() and fid_add_mdas(), to help manage the lists.
These functions are needed to properly count the ignored mdas and
manage the lists attached to the 'fid' and ultimately the 'vg'.
Ensure metadata_areas_ignored is initialized in other formats, even
if the list is never used.
Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
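The two-list bookkeeping can be sketched in C as follows. It uses a plain singly linked list instead of LVM2's dm_list and assumes MDA_IGNORE is bit 0. fid_add_mdas() takes an array here only for brevity, which is not its real signature. The point is that the ignore flag alone decides which list an mda lands on, so counting ignored mdas is just counting one list.

```c
#include <stdbool.h>
#include <stddef.h>

#define MDA_IGNORE 0x00000001u  /* flag value assumed for this sketch */

/* Simplified metadata area with an intrusive 'next' link. */
struct mda {
	unsigned flags;
	struct mda *next;
};

/* Simplified format instance holding both lists. */
struct fid {
	struct mda *metadata_areas;
	struct mda *metadata_areas_ignored;
};

static bool mda_is_ignored(const struct mda *mda)
{
	return (mda->flags & MDA_IGNORE) != 0;
}

/* Put one mda on whichever list matches its ignore state. */
static void fid_add_mda(struct fid *fid, struct mda *mda)
{
	struct mda **head = mda_is_ignored(mda) ? &fid->metadata_areas_ignored
						: &fid->metadata_areas;

	mda->next = *head;
	*head = mda;
}

/* Add several mdas (array form chosen only to keep the sketch short). */
static void fid_add_mdas(struct fid *fid, struct mda *mdas, size_t n)
{
	for (size_t i = 0; i < n; i++)
		fid_add_mda(fid, &mdas[i]);
}

static size_t count_mdas(const struct mda *head)
{
	size_t n = 0;

	for (; head; head = head->next)
		n++;
	return n;
}
```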
Because of the way mdas are handled internally, where a PV in a VG
has mdas on both the info->mdas and vg->fid->metadata_areas lists, we
need a location-independent copy constructor for struct
metadata_area. Break up the existing format-text-specific copy
constructor into a format-independent piece and a format-dependent
piece.
This function is necessary to properly implement pv_set_mda_ignored().
Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
Reviewed-by: Alasdair G Kergon <agk@redhat.com>
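The split between the format-independent and format-dependent pieces can be sketched in C. It assumes the format-specific part is reached through an ops table with a copy hook. The copy_private/free_private hooks and struct raw_locn_priv are hypothetical names, not the actual LVM2 members. The generic copy duplicates the common fields and delegates only the private location data.

```c
#include <stdlib.h>
#include <string.h>

/* Format-specific operations; only the hooks needed here are shown. */
struct metadata_area_ops {
	void *(*copy_private)(const void *priv);  /* NULL on failure */
	void (*free_private)(void *priv);
};

/* Location-independent part shared by all formats. */
struct metadata_area {
	const struct metadata_area_ops *ops;
	unsigned status;
	void *metadata_locn;          /* format-private location data */
};

/* Format-independent copy: duplicate the common fields, then let the
 * format duplicate its own private data. */
static struct metadata_area *mda_copy(const struct metadata_area *mda)
{
	struct metadata_area *copy = malloc(sizeof(*copy));

	if (!copy)
		return NULL;
	*copy = *mda;                 /* ops and status: shallow copy is fine */
	copy->metadata_locn = NULL;
	if (mda->metadata_locn && mda->ops->copy_private) {
		copy->metadata_locn = mda->ops->copy_private(mda->metadata_locn);
		if (!copy->metadata_locn) {
			free(copy);
			return NULL;
		}
	}
	return copy;
}

/* Example format-dependent piece: a raw on-disk location. */
struct raw_locn_priv {
	unsigned long long start, size;
};

static void *raw_copy_private(const void *priv)
{
	struct raw_locn_priv *p = malloc(sizeof(*p));

	if (p)
		memcpy(p, priv, sizeof(*p));
	return p;
}

static void raw_free_private(void *priv)
{
	free(priv);
}

static const struct metadata_area_ops raw_ops = {
	raw_copy_private, raw_free_private
};
```

Because the generic part never looks inside metadata_locn, the same mda_copy() serves an mda whether it sits on info->mdas or on vg->fid->metadata_areas.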
A metadata_area is defined independently of the location. One downside
is that there is no obvious mapping from a pv to an mda. For a PV in
a VG, we need a way to start with a PV and end up with an MDA, if we
are to manage mdas starting with a device/pv. This function provides
us a way to go down the list of PVs on a VG, and identify which ones
match a particular PV.
I'm not entirely happy with this approach, but it does fit into the
existing structures in a reasonable way.
An alternative solution might be to refactor the VG - PV interface such
that mdas are a list tied to a PV. However, this seemed a bit tricky since
a PV does not come into existence until after the list of mdas is
constructed (see _vg_read() - we create a 'fid' and attach mdas to it,
then we go through them and attach pvs).
Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
Reviewed-by: Alasdair G Kergon <agk@redhat.com>
We'd like to pass in mda_header to vgname_from_mda(). In order to
do this, we need to call raw_read_mda_header() from _text_read() in
text_label.c, which is called from the label_read() path, peers into
the metadata and updates the vginfo cache. We should check the
disable bit here and, if it is set, not peer into the vg metadata,
thus reducing the I/O to disk.
In the process, move vgname_from_mda() to layout.h, since the function
is only called from format_text code, and we need the mda_header
definition from the private layout.h.
Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
This refactoring moves the device open/close up one level to the caller of
_vg_read_raw_area(). There should be no functional change, and it should
facilitate future changes related to metadata balancing.
Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
First we add a 'flags' field to the location-independent
metadata_area structure, and an MDA_IGNORE flag. The
mda_is_ignored and mda_set_ignored functions are added to
manage the flag. Adding the flag and functions gives a
library interface to ignore metadata areas independent of
the underlying location (disk, file, etc). The location
specific read/write functions must then handle the specifics
of what this flag means to the location.
Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
Reviewed-by: Alasdair G Kergon <agk@redhat.com>
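A minimal C sketch of the flag and its two accessors follows. It assumes MDA_IGNORE is bit 0 of a 32-bit flags word, which is a guess at the value. raw_write_mda() is a hypothetical location-specific writer, included only to show how such code might honour the flag.

```c
#include <stdint.h>

#define MDA_IGNORE 0x00000001u   /* bit value assumed for this sketch */

struct metadata_area {
	uint32_t flags;          /* location-independent flags */
	void *metadata_locn;     /* location-specific data (disk, file, ...) */
};

static unsigned mda_is_ignored(const struct metadata_area *mda)
{
	return (mda->flags & MDA_IGNORE) ? 1 : 0;
}

static void mda_set_ignored(struct metadata_area *mda, unsigned ignored)
{
	if (ignored)
		mda->flags |= MDA_IGNORE;
	else
		mda->flags &= ~MDA_IGNORE;
}

/* Hypothetical location-specific writer: an ignored area is skipped,
 * which is not an error. 'writes' counts the writes actually done. */
static int raw_write_mda(struct metadata_area *mda, unsigned *writes)
{
	if (mda_is_ignored(mda))
		return 1;
	(*writes)++;
	return 1;
}
```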
Adding a flag to the 'rlocn' structure in the mda header of the
text format allows us to flip a bit to ignore an area on disk that
stores the metadata via the text format specific mda_header.
This patch defines the flag and access functions to manage the flag.
Other patches will manage the ignore on a format-independent basis,
by using a flag in the metadata_area structure.
Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
Future patches will make use of a specific flag in the on-disk 'raw_locn'
structure to enable/disable metadata areas, and facilitate metadata
balancing.
Note that 'filler' is always set to '0' (see add_mda() - memset),
so reusing this area as a flags field, where a non-zero bit enables a
feature, is safe: metadata written by older code reads as "no flags set".
Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
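The on-disk side of the last two patches can be sketched in C: the 32-bit 'filler' slot of raw_locn becomes 'flags', and an ignore bit is defined with access functions. The field layout and the RAW_LOCN_IGNORED name and value are assumptions made for this sketch. Endianness conversion, which real on-disk code must do, is omitted. The static assertion checks that the reuse does not change the structure's size.

```c
#include <stdint.h>
#include <string.h>

/* Sketch of the on-disk raw_locn: the former 'filler' slot is 'flags'. */
struct raw_locn {
	uint64_t offset;     /* where the metadata text starts */
	uint64_t size;       /* length of the metadata text */
	uint32_t checksum;
	uint32_t flags;      /* was 'filler'; older code always wrote 0 */
} __attribute__((packed));

_Static_assert(sizeof(struct raw_locn) == 24, "on-disk size must not change");

#define RAW_LOCN_IGNORED 0x00000001u   /* assumed bit value */

static int rlocn_is_ignored(const struct raw_locn *rlocn)
{
	return (rlocn->flags & RAW_LOCN_IGNORED) != 0;
}

static void rlocn_set_ignored(struct raw_locn *rlocn, int ignored)
{
	if (ignored)
		rlocn->flags |= RAW_LOCN_IGNORED;
	else
		rlocn->flags &= ~RAW_LOCN_IGNORED;
}
```

A header zeroed the way older code did it reads back as "not ignored", which is exactly the compatibility argument made above.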
Adding configure.in support for Replicators.
Adding basic lib lvm support for Replicators.
Adding flags REPLICATOR and REPLICATOR_LOG.
Adding segments SEG_REPLICATOR and SEG_REPLICATOR_DEV.
Adding basic methods for handling replicator metadata.
We should write metadata into the next position in the ring buffer when calling
vgrename and vgcfgrestore. At this code level (_vg_write_raw), we were not able
to determine whether this is a rename. If it is, the accompanying VG structure
passed here already has the new name set, not the old one.
When looking for the location to put the metadata next, we got a NULL
value because the VG name comparison (in _find_vg_rlocn) between the
name in the existing metadata and the metadata we are about to write failed.
This resets the position in the ring buffer, overwriting any existing metadata
(and also incorrectly updates the cache to "orphan" afterwards).
This patch adds an old_name item to struct volume_group that we can check and use
when necessary, so renames can be detected at lower layers as well.
The same applies to vgcfgrestore, but here we use a special value of
old_name, an empty string, to disable the check against existing metadata entirely.
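The three cases for the name check can be sketched in C: a normal write, a rename and a restore. This assumes old_name is NULL for a normal write, holds the previous name during vgrename, and is "" during vgcfgrestore. The reading that "check disabled" means "treat the existing metadata as ours and keep using the ring buffer" is an interpretation. rlocn_name_matches() is a hypothetical helper, not the body of _find_vg_rlocn.

```c
#include <stdbool.h>
#include <string.h>

/* Simplified VG: only the two name fields matter here. */
struct volume_group {
	const char *name;       /* name being written */
	const char *old_name;   /* NULL, previous name, or "" (restore) */
};

/* Does the metadata already on disk (named on_disk_name) belong to this
 * VG, so that the next write may continue in the ring buffer? */
static bool rlocn_name_matches(const struct volume_group *vg,
			       const char *on_disk_name)
{
	if (vg->old_name && !*vg->old_name)
		return true;                     /* vgcfgrestore: no check */
	if (vg->old_name)                        /* vgrename: disk has old name */
		return strcmp(on_disk_name, vg->old_name) == 0;
	return strcmp(on_disk_name, vg->name) == 0;
}
```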
When moving parts of striped LVs, pvmove wouldn't care about leaving you with
two stripes on the same disk. Now --alloc anywhere is needed for that.
(Tried and gave up on two alternative approaches before the one committed here.)
Refactor the main places in the code where a pv is added to a
vg into a small function that adds the pv to the list and updates
the vg counts.
Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
Simple refactor to move the code that updates the vg extent counts from a
single pv's counts next to the code that adds a pv to vg->pvs and
updates vg->pv_count. No functional change.
Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
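Together, the last two refactors amount to one helper that links a PV and keeps every derived counter in step. Here is a C sketch with simplified, hypothetical structures and a hypothetical function name; the field names only approximate the real ones. Callers can then no longer forget one of the counter updates.

```c
#include <stdint.h>

struct physical_volume {
	uint32_t pe_count;        /* total extents on this PV */
	uint32_t pe_alloc_count;  /* extents already allocated */
	struct physical_volume *next;
};

struct volume_group {
	struct physical_volume *pvs;
	uint32_t pv_count;
	uint32_t extent_count;
	uint32_t free_count;
};

/* Single place that adds a PV to the VG and updates all counts. */
static void add_pv_to_vg(struct volume_group *vg, struct physical_volume *pv)
{
	pv->next = vg->pvs;
	vg->pvs = pv;
	vg->pv_count++;
	vg->extent_count += pv->pe_count;
	vg->free_count += pv->pe_count - pv->pe_alloc_count;
}
```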
Physical segments were still allocated from the global
command context mempool.
This leads to very high memory usage when
activating a large VG (vgchange).
(Memory usage was about 2G with >3000 LVs.)
Fix it by properly using the vg->vgmem private pool,
so all the memory is released early.
A new memory pool parameter is needed for the pv_split_segment
function.
Also fix the same problem in some minor allocations
(vg description, lv segment split).
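Why a per-VG pool releases memory early is easiest to see with a toy chained arena standing in for the libdevmapper pool that vg->vgmem refers to. Everything below is hypothetical. Allocation only bumps a pointer, and one destroy call frees every chunk. A helper such as pv_split_segment would receive the VG's pool as a parameter and allocate from it, instead of from a command-wide pool that lives until the command exits.

```c
#include <stddef.h>
#include <stdlib.h>

/* One chunk of the arena; data[] is max_align_t so blocks stay aligned. */
struct pool_chunk {
	struct pool_chunk *prev;
	size_t used, size;
	max_align_t data[];
};

struct pool {
	struct pool_chunk *chunk;   /* most recent chunk */
	size_t chunk_size;          /* minimum size of a new chunk */
};

static void *pool_alloc(struct pool *p, size_t n)
{
	const size_t align = _Alignof(max_align_t);
	struct pool_chunk *c = p->chunk;
	void *ptr;

	n = (n + align - 1) & ~(align - 1);   /* keep every block aligned */
	if (!c || c->size - c->used < n) {
		size_t sz = n > p->chunk_size ? n : p->chunk_size;

		if (!(c = malloc(sizeof(*c) + sz)))
			return NULL;
		c->prev = p->chunk;
		c->used = 0;
		c->size = sz;
		p->chunk = c;
	}
	ptr = (unsigned char *)c->data + c->used;
	c->used += n;
	return ptr;
}

/* Releasing the VG frees everything allocated for it in one go. */
static void pool_destroy(struct pool *p)
{
	while (p->chunk) {
		struct pool_chunk *prev = p->chunk->prev;

		free(p->chunk);
		p->chunk = prev;
	}
}
```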
_read_vg already uses a hash for PVs to optimise
reading of large VGs and to avoid repeated PV list traversal.
Use the same approach to speed up parsing VGs with many LVs.
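The speed-up comes from replacing a list walk per name reference with a hash lookup. Here is a C sketch using a small hand-rolled chained hash table keyed by LV name. The table size, the djb2 hash and all names are choices made for this sketch, not necessarily what _read_vg uses. With many LVs, each lookup costs O(1) expected time instead of O(number of LVs).

```c
#include <stddef.h>
#include <string.h>

#define LV_HASH_BUCKETS 256u   /* arbitrary size for the sketch */

struct logical_volume {
	const char *name;
	struct logical_volume *hash_next;   /* bucket chain */
};

struct lv_hash {
	struct logical_volume *bucket[LV_HASH_BUCKETS];
};

/* djb2 string hash; any reasonable string hash would do. */
static unsigned long str_hash(const char *s)
{
	unsigned long h = 5381;

	while (*s)
		h = h * 33 + (unsigned char)*s++;
	return h;
}

static void lv_hash_insert(struct lv_hash *t, struct logical_volume *lv)
{
	unsigned long b = str_hash(lv->name) % LV_HASH_BUCKETS;

	lv->hash_next = t->bucket[b];
	t->bucket[b] = lv;
}

/* Find an LV by name without walking the whole LV list. */
static struct logical_volume *lv_hash_lookup(const struct lv_hash *t,
					     const char *name)
{
	struct logical_volume *lv;

	for (lv = t->bucket[str_hash(name) % LV_HASH_BUCKETS]; lv;
	     lv = lv->hash_next)
		if (!strcmp(lv->name, name))
			return lv;
	return NULL;
}
```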
Eliminate 'merging_snapshot' from 'struct logical_volume' and just use
'snapshot' for origin lv's reference to the merging snapshot; also set
MERGING in the origin lv's status.
Make 'merging_snapshot' a pointer that points from the origin to the
segment that represents the merging snapshot.
Import/export 'merging_store' metadata.
Do not allow creating snapshots while another snapshot is merging.
A snapshot created in this state would certainly contain invalid data.
NOTE: patches at the end of this series will remove 'merging_snapshot'
and will introduce helpful wrappers and cleanups.
At this point they probably do not matter, but going forward they
may, depending on future patches for replicator, etc. I think
these were probably missed because they were 'flags', so I changed
the name to 'status' to be consistent. So the on-disk
things are 'flags' and the in-structure ones are 'status' (bits).
NOTE: WHATS_NEW already has an entry for this in the current release.
Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
Acked-by: Mike Snitzer <snitzer@redhat.com>
The physical_volume, volume_group, logical_volume and lv_segment
structures' 'status' member is now uint64_t.
The alignment of these structures was also audited to remove holes. Moving
some members in 'volume_group' and 'lv_segment' eliminates holes there. The
'physical_volume' structure still has one 4-byte hole after 'pe_size'; the
other structures no longer have any holes. The size of each structure has
not changed.
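What a "hole" is, and how reordering members removes it, can be shown with two hypothetical layouts; these are not the real LVM2 structures. On ABIs where uint64_t is 8-byte aligned, such as x86_64, the first layout pads 4 bytes after pe_size and 4 more at the end. Putting the widest member first lets the two 32-bit members share one 8-byte slot.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout with holes on 8-byte-aligned-uint64_t ABIs. */
struct with_holes {
	uint32_t pe_size;     /* 4 bytes, then padding before status */
	uint64_t status;
	uint32_t pe_count;    /* 4 bytes, then tail padding */
};

/* Same members, widest first: no padding needed. */
struct reordered {
	uint64_t status;
	uint32_t pe_size;
	uint32_t pe_count;
};

/* Padding bytes = structure size minus the sum of member sizes. */
#define PADDING(type, members) (sizeof(type) - (members))

static const size_t holes_before =
	PADDING(struct with_holes, 2 * sizeof(uint32_t) + sizeof(uint64_t));
static const size_t holes_after =
	PADDING(struct reordered, 2 * sizeof(uint32_t) + sizeof(uint64_t));
```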
indented metadata lines.
The outnl() macro now uses the exported out_newline() instead of calling
f->fn() directly, which required the internal struct formatter to be
visible.
This patch is purely cleanup, and no other patch depends on it.
Replace explicit dereference and check with vg_is_exported().
Update a few copyrights and remove unnecessary whitespace.
Should be no functional change.
If the pvcreate --dataalignmentoffset option is not specified, the start
of a PV's aligned data area will be shifted by the associated
'alignment_offset' exposed in sysfs (unless
devices/data_alignment_offset_detection is disabled in lvm.conf).
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Documented which use-cases force reinstating the nuanced
handling of pe_start. As soon as orphan PVs are eliminated, much of this
will no longer be a concern ('preserve_pe_start' can be re-enabled in
.pv_setup).
Added defensive 'if (pv->pe_align)' check in _text_pv_write()'s pe_start
loop.
If pv_setup was given a non-zero pe_start, it would short-circuit
establishing a default pv->pe_align. pv->pe_align=0 would then result
in a divide by zero in _mda_setup(). 'vgconvert -M2 $vgname' hit this.
.pv_write still properly preserves pe_start if it was supplied.
Adds pe_align_offset to 'struct physical_volume'; it is initialized with
set_pe_align_offset(). After pe_start is established, pe_align_offset is
added to it.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
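The pe_start arithmetic from these patches can be sketched in C: round up to pe_align, treat a zero pe_align as "no alignment" instead of dividing by zero, then add pe_align_offset. Using the end of the first mda as the starting point is an assumption, and both helper names are hypothetical. Units are whatever the caller uses, for example sectors.

```c
#include <stdint.h>

/* Round value up to a multiple of align; align == 0 means "no alignment"
 * rather than a division by zero. */
static uint64_t round_up(uint64_t value, uint64_t align)
{
	if (!align)
		return value;
	return (value + align - 1) / align * align;
}

/* Hypothetical helper: place the data area after the first mda, aligned
 * to pe_align, then shift it by the device's alignment offset. */
static uint64_t calc_pe_start(uint64_t mda_end, uint64_t pe_align,
			      uint64_t pe_align_offset)
{
	return round_up(mda_end, pe_align) + pe_align_offset;
}
```

For example, with the first mda ending at 100, pe_align 64 and an alignment offset of 3, the data area would start at 131.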
areas.
This preserved pe_start would quickly be readjusted to follow the first
mda anyway. An example use-case that hit this code path is running
pvcreate on an already existing PV _without_ a preceding pvremove.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Without this fix, rounding the end of the first mda to a pe_align
boundary could silently exceed the disk_size.
The final 'if (start1 + mda_size1 > disk_size)' block serves as a safety
net.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
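One way the guarded rounding could look is sketched below in C, with sizes in sectors. The first mda is extended to the next pe_align boundary only if that still fits on the disk, and the final check plays the safety-net role described above. size_first_mda() is a hypothetical helper, not the actual _mda_setup() code.

```c
#include <stdint.h>

/* Try to end the first mda on a pe_align boundary, but only if the
 * rounded area still fits on the disk. Returns 1 if the (possibly
 * rounded) area fits, 0 otherwise. */
static int size_first_mda(uint64_t start1, uint64_t *mda_size1,
			  uint64_t pe_align, uint64_t disk_size)
{
	uint64_t end = start1 + *mda_size1;

	if (pe_align && end % pe_align) {
		uint64_t rounded = end + (pe_align - end % pe_align);

		if (rounded <= disk_size)        /* round only if it fits */
			*mda_size1 = rounded - start1;
	}

	/* Safety net: the area itself must still fit on the disk. */
	if (start1 + *mda_size1 > disk_size)
		return 0;
	return 1;
}
```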
Document existing pe_start policy.
Fix an issue in _text_pv_setup() where, in the existing pe_start case,
pv->pe_start could be set to pv->pe_align even though pe_start should
never change.
vgconvert and pvcreate have a facility to preserve the existing start
of the on-disk data extents, known as pe_start.
They indicate this by passing the existing value to the pvsetup function,
which must preserve it.
This patch avoids one particular case where the value could get
changed incorrectly now that the alignment settings are configurable.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
This function behaves a little differently from vg_reduce_single, because
it allows removing even the last pv. This was done to be consistent
with lvm_vg_create, which creates an empty vg.
removed_pvs has been added to the volume_group struct. vg_reduce adds removed
pvs to this list so that the changes for those pvs can be committed in lvm_vg_comm
in liblvm2app.
Initialize the removed_pvs list in the format-specific volume_group constructors.
Ideally, we should have a base constructor that initializes the general,
non-format-specific members of struct volume_group. Until then, there
are multiple places to initialize these members. A better patch might
be a base constructor for struct volume_group, but that is more work.
Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
Signed-off-by: Thomas Woerner <twoerner@redhat.com>
Author: Dave Wysochanski <dwysocha@redhat.com>
Currently the code uses pv_dev_name() as the hash key when getting the internal
"pvX" name.
This produces corrupted metadata if PVs are missing: pv->dev
is NULL and all the missing devices return the same name
("unknown device" is used as the hash key for every missing device).
link_lv_to_vg and unlink_lv_from_vg are the only functions
for adding/removing a logical volume to/from a volume group.
Only these functions should manipulate the vg->lvs list.
The snapshot segment (snapshotX) is created twice
during the text metadata segment processing.
This can cause temporary violation of max_lv count.
Simplify the code: the snapshot segment is now properly initialized
in the init_snapshot_seg function and does not need to be replaced
by a vg_add_snapshot call.
vg_add_snapshot() is now useful only for adding a new
snapshot, and it shares the same initialization function.
The snapshot name is always generated, so the name parameter can be
removed from the function call.
The dataalign value must always be aligned according
to the MDA area.
The current code checks whether the calculated value collides with the
MDA area, but not whether the value is so small that it is
located before the MDA starts.
Unfortunately, there can also be an MDA at the end of the device.
The patch adds a simple check to avoid this miscalculation.
The patch expects that the first MDA always starts on a <= pagesize boundary
(this is true for all allowed label sector parameters).
Add lvs origin_size field.
Fix linux configure --enable-debug to exclude -O2.
Still a few rough edges, but hopefully usable now:
lvcreate -s vg1 -L 100M --virtualoriginsize 1T
Run the backup of metadata on remote nodes in the
same place as on the local node: when calling backup().
Introduce backup_locally(), which calls only the
local backup if needed.
Remote backup is now triggered by the LCK_VG_BACKUP flag
combination (special VG lock).
This lock type calls check_current_backup()
(including the backup_locally() call) and updates
metadata on all nodes.
(The patch fixes the non-functional remote backup;
the current call during the VG lock never triggers.)
From now on, all code reading a volume group is responsible for releasing
the allocated memory by calling vg_release(vg).
(For simplicity of use, vg_release can be called with vg == NULL,
the same logic as free(NULL).)
Also provide a simple macro for unlocking & releasing in one step,
since tools usually use this approach.
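The NULL-safe release and the combined macro can be sketched in C. The volume_group here is a stand-in holding a malloc'd name, whereas the real vg_release frees the VG's own memory. unlock_vg() and unlock_and_release() are hypothetical names, not the LVM2 locking API. The do { } while (0) wrapper lets the macro be used like a single statement, for example after an if without braces.

```c
#include <stdlib.h>

/* Stand-in VG with a heap-allocated field to release. */
struct volume_group {
	char *name;
	int locked;
};

/* NULL-safe release, in the spirit of free(NULL). */
static void vg_release(struct volume_group *vg)
{
	if (!vg)
		return;
	free(vg->name);
	free(vg);
}

/* Hypothetical unlock; stands in for the real lock release call. */
static void unlock_vg(struct volume_group *vg)
{
	if (vg)
		vg->locked = 0;
}

/* Unlock and release in one step, as tools usually do both. */
#define unlock_and_release(vg) \
	do { unlock_vg(vg); vg_release(vg); } while (0)
```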
The global memory pool (cmd->mem) should be used only for global
physical volume operations.
This patch has to be applied with all subsequent patches to complete
the memory-pool-per-vg logic.
Using a separate memory pool saves quite a bit of memory when
using large VGs; this is mainly needed when we have to use
preallocated and locked memory (and must not overflow that
memory space).
if rlocn not defined (there is no metadata area).
In most cases it fails in validate_name(); unfortunately, there are
situations where validate_name() succeeds and later code fails with a
checksum error.
Reproducer:
# dd if=/dev/zero of=/dev/loop0
# pvcreate --metadatasize 637k /dev/loop0
Physical volume "/dev/loop0" successfully created
# pvs /dev/loop0
/dev/loop0: Checksum error
  PV         VG   Fmt  Attr PSize PFree
  /dev/loop0      lvm2 --   1.00M 1.00M
Signed-off-by: Milan Broz <mbroz@redhat.com>
-
This patch is not fully tested and leaves some related bugs unfixed.
Intended behaviour of the code now:
pe_start in the lvm2 format PV label header is set only by pvcreate (or
vgconvert -M2) and then preserved in *all* operations thereafter.
In some specialist cases, after the PV is added to a VG, the pe_start
field in the VG metadata may hold a different value and if so, it
overrides the other one for as long as the PV is in such a VG.
Currently, the field storing the size of the data area in the PV label
header always holds 0. As it only has meaning in the context of a
volume group, it is calculated whenever the PV is added to a VG (and can
be derived from extent_size and pe_count in the VG metadata).
Reports the size of the smallest metadata area in a PV or a VG.
Useful to confirm the pvcreate --metadatasize or pvmetadatasize setting in
the /etc/lvm/lvm.conf file.
NOTE: The actual value in these fields will almost always differ from that
given in the pvcreate options due to rounding and alignment effects.
Identical argument to the previous patch, which removed archive_enable() calls.
We add a new parameter to backup_init() that sets the enable value based
on the cmd->default_settings.backup value. This value was used to set
cmd->current_settings.backup, which was used in the removed backup_enable() call.
_init_backup() calls archive_init(), which originally set 'enabled' to
a hardcoded '1' value. This seems incorrect based on my reading of other
areas of the code, so here we add an 'enabled' parameter to archive_init().
We pass in cmd->default_settings.archive, which is obtained from the
config tree. Later in create_toolcontext, cmd->current_settings is
set to cmd->default_settings. The archive_enable() call we remove
here was using cmd->current_settings to set the 'archive' enable
value. The final value of cmd->archive_params->enabled should thus
be equivalent to the original code.
The _text_pv_write function doesn't use a memory pool but a static buffer,
so calling dm_pool_free in the error path of _raw_write_mda_header is wrong.
Move the pool free to the path where the memory pool is actually used.
Failure to check the label_write() return code caused the following test
to report a pass when it actually failed:
pvcreate rejects labelsector > 1000000000000
The "status" field is treated as it always has been: unknown flags there are
treated as fatal metadata errors. However, in the "flags" field, any unknown
flags will be ignored and silently dropped. This improves
backward-compatibility possibilities. (Any version without support for this
new "flags" field will drop the field altogether, which is the same as ignoring all
the flags there.)
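The asymmetry between the two fields can be sketched in C as one parser with a strictness switch. The flag table, its bit values and parse_flags() are illustrative assumptions, not LVM2's actual flag definitions. Strict mode rejects unknown names, as for "status". Lenient mode skips them, as for "flags", so metadata written by a newer version with extra flags still loads.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Illustrative flag table; names and bit values are assumptions. */
struct flag_def {
	const char *name;
	uint64_t bit;
};

static const struct flag_def known_flags[] = {
	{ "READ", 0x1 },
	{ "WRITE", 0x2 },
	{ "VISIBLE", 0x4 },
};

#define N_KNOWN (sizeof(known_flags) / sizeof(known_flags[0]))

/* Parse flag names into a bitmask. strict != 0 ("status"): an unknown
 * name is a fatal error. strict == 0 ("flags"): it is silently dropped.
 * Returns 1 on success, 0 on error. */
static int parse_flags(const char *const *names, size_t n, int strict,
		       uint64_t *out)
{
	uint64_t mask = 0;

	for (size_t i = 0; i < n; i++) {
		size_t j;

		for (j = 0; j < N_KNOWN; j++)
			if (!strcmp(names[i], known_flags[j].name))
				break;
		if (j < N_KNOWN)
			mask |= known_flags[j].bit;
		else if (strict)
			return 0;   /* unknown "status" flag: reject metadata */
		/* otherwise: unknown "flags" entry, ignore it */
	}
	*out = mask;
	return 1;
}
```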
* lib/misc/lvm-file.c (lvm_fclose): New function.
* lib/misc/lvm-file.h (lvm_fclose): Declare it.
* lib/config/config.c (write_config_file): Use the new function to detect
and diagnose unlikely write failure.
* lib/filters/filter-persistent.c (persistent_filter_dump): Likewise.
* lib/format_text/archive.c (archive_vg): Likewise.
* lib/format_text/format-text.c (_vg_write_file): Likewise.
* lib/log/log.c (fin_log): Similar, but use dm_fclose directly.
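Write failures on buffered streams may only surface at close time, which is what a checked close detects. The C sketch below shows the idea. checked_fclose() is a hypothetical name, and the real lvm_fclose() signature is not shown in this entry. The stream's error indicator is read before fclose(), because the FILE must not be touched after it is closed. The diagnostic format string ends in "\n", as the next entry asks.

```c
#include <stdio.h>

/* Close a stream and report any write error, including one that only
 * shows up when fclose() flushes buffered data.
 * Returns 0 on success, EOF on failure (after printing a diagnostic). */
static int checked_fclose(FILE *fp, const char *filename)
{
	int had_error = ferror(fp);   /* must be read before fclose() */

	if (fclose(fp) == EOF || had_error) {
		fprintf(stderr, "%s: write failed\n", filename);
		return EOF;
	}
	return 0;
}
```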
Include "\n" at end of each fprintf format string.
Add --config for overriding most config file settings from cmdline.
Quote arguments when printing command line.
Remove linefeed from 'initialising logging' message.
Add 'Completed' debug message.
Don't attempt library exit after reloading config files.
Always compile with libdevmapper, even if device-mapper is disabled.
Fix some memory leaks in error paths found by coverity.
Use C99 struct initialisers.
Move DEFS into configure.h.
Clean-ups to remove miscellaneous compiler warnings.
[Some activation-related features will stop working for a while now.
Some types of activation are getting split into two steps, with the
first step using the precommitted metadata.]
Clear many compiler warnings (i386) & associated bugs - hopefully without
introducing too many new bugs:-) (Same exercise required for other archs.)
Default compilation has optimisation - or else use ./configure --enable-debug
allocation policy. This can currently take one of three values:
typedef enum {
    ALLOC_NEXT_FREE,
    ALLOC_STRICT,
    ALLOC_CONTIGUOUS
} alloc_policy_t;
Notice that 'SIMPLE' has turned into the slightly more meaningful NEXT_FREE.
ii) Put code into display.[hc] for converting one of these enums to a
text representation and back again.
iii) Updated the text format so it also has the alloc_policy field.
Lots of changes/very little testing so far => there'll be bugs!
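The enum-to-text conversion can be sketched in C, with a single table driving both directions so they cannot drift apart. The helper names and the text spellings ("next free", "strict", "contiguous") are assumptions for illustration, not the strings display.[hc] actually uses.

```c
#include <stddef.h>
#include <string.h>

typedef enum {
	ALLOC_NEXT_FREE,
	ALLOC_STRICT,
	ALLOC_CONTIGUOUS
} alloc_policy_t;

/* One table for both directions; the spellings are assumptions. */
static const struct {
	alloc_policy_t policy;
	const char *str;
} alloc_names[] = {
	{ ALLOC_NEXT_FREE, "next free" },
	{ ALLOC_STRICT, "strict" },
	{ ALLOC_CONTIGUOUS, "contiguous" },
};

#define N_ALLOC_NAMES (sizeof(alloc_names) / sizeof(alloc_names[0]))

/* Returns the text form, or NULL for an out-of-range value. */
static const char *alloc_policy_to_str(alloc_policy_t p)
{
	for (size_t i = 0; i < N_ALLOC_NAMES; i++)
		if (alloc_names[i].policy == p)
			return alloc_names[i].str;
	return NULL;
}

/* Returns 1 and sets *p on success, 0 for an unknown string. */
static int alloc_policy_from_str(const char *s, alloc_policy_t *p)
{
	for (size_t i = 0; i < N_ALLOC_NAMES; i++)
		if (!strcmp(alloc_names[i].str, s)) {
			*p = alloc_names[i].policy;
			return 1;
		}
	return 0;
}
```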
Use 'vgcreate -M text' to create a volume group with its metadata stored
in text files. Text format metadata changes should be reasonably atomic,
with a (basic) automatic recovery mechanism if the system crashes while a
change is in progress.
Add a metadata section to lvm.conf to specify multiple directories if
you want (recommended) to keep multiple copies of the metadata (eg on
different filesystems).
e.g.
    metadata {
        dirs = ["/etc/lvm/metadata1","/usr/local/lvm/metadata2"]
    }
Plenty of refinements still in the pipeline.
from lock_vol() - otherwise it now attempts to acquire the lock and then
immediately releases it.
o Extend the id field in struct logical_volume to hold VG uuid + LV uuid
for format1. This unique lvid can be used directly when calling lock_vol().
o Add the VG uuid to vgcache to make VG uuid lookups possible. (Another
step towards using them instead of VG names internally.)
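The extended lvid can be sketched in C as the VG uuid followed by the LV uuid, which can also be read as one NUL-terminated string. ID_LEN = 32 and the layout of union lvid below are assumptions for this sketch, as is the lvid_build() helper. Because every LV id embeds its VG's uuid, the combined value is unique across VGs and can be passed to locking as-is.

```c
#include <string.h>

#define ID_LEN 32   /* assumed length of one uuid, in characters */

struct id {
	char uuid[ID_LEN];
};

/* LV id carrying the VG uuid too; s[] views the same bytes as a string. */
union lvid {
	struct id id[2];              /* id[0] = VG uuid, id[1] = LV uuid */
	char s[2 * ID_LEN + 1];
};

/* Build the combined id and NUL-terminate its string form. */
static void lvid_build(union lvid *lvid, const struct id *vgid,
		       const struct id *lvuuid)
{
	memcpy(&lvid->id[0], vgid, sizeof(*vgid));
	memcpy(&lvid->id[1], lvuuid, sizeof(*lvuuid));
	lvid->s[2 * ID_LEN] = '\0';
}
```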