IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
The last commit related to this was incomplete:
"Implement lock-override options without locking type"
This is further reworking and reduction of the locking.[ch]
layer which handled all clustering, but is now only used
for file locking. The "locking types" that this layer
implemented were removed previously, leaving only the
standard file locking. (Some cluster-related artifacts
remain to be cleared out later.)
Command options to override or modify locking behavior
are reimplemented here without using the locking types.
Also, deprecated locking_type values are recognized,
and implemented as if one of the equivalent override
options was set.
Options that override file locking are:
. --nolocking disables all file locking.
. --readonly grants read lock requests without actually
taking a file lock, and refuses write lock requests.
. --ignorelockingfailure tries to set up file locks and
uses them normally if possible. When not possible, it
behaves like --readonly, but allows activation.
. --sysinit is the same as ignorelockingfailure.
. global/metadata_read_only acquires actual read file
locks, and refuses write lock requests.
(Some of these options could probably be deprecated
because they were added as workarounds to various
locking_type behaviors that are now deprecated.)
The locking_type setting now has one valid value: 1 which
refers to standard file locking. Configs that contain
deprecated values are recognized and still work in
largely the same way:
. 0 disabled all locking, now implemented like --nolocking
is set. Allow the nolocking option in all commands.
. 1 is the normal file locking setting and is unchanged.
. 2 was for external locking which was not used, and
reverts to normal file locking.
. 3 was for cluster/clvm. This reverts to normal file
locking, and prints messages about lvmlockd.
. 4 was equivalent to readonly, now implemented like
--readonly is set.
. 5 disabled all locking, now implemented like
--nolocking is set.
The device-mapper directory now holds a copy of libdm source. At
the moment this code is identical to libdm. Over time code will
migrate out to appropriate places (see doc/refactoring.txt).
The libdm directory still exists, and contains the source for the
libdevmapper shared library, which we will continue to ship (though
not neccessarily update).
All code using libdm should now use the version in device-mapper.
As we start refactoring the code to break dependencies (see doc/refactoring.txt),
I want us to use full paths in the includes (eg, #include "base/data-struct/list.h").
This makes it more obvious when we're breaking abstraction boundaries, eg, including a file in
metadata/ from base/
There are likely more bits of code that can be removed,
e.g. lvm1/pool-specific bits of code that were identified
using FMT flags.
The vgconvert command can likely be reduced further.
The lvm1-specific config settings should probably have
some other fields set for proper deprecation.
For reporting commands (pvs,vgs,lvs,pvdisplay,vgdisplay,lvdisplay)
we do not need to repeat the label scan of devices in vg_read if
they all had matching metadata in the initial label scan. The
data read by label scan can just be reused for the vg_read.
This cuts the amount of device i/o in half, from two reads of
each device to one. We have to be careful to avoid repairing
the VG if we've skipped rescanning. (The VG repair code is very
poor, and will be redone soon.)
Create a new dev->bcache_fd that the scanning code owns
and is in charge of opening/closing. This prevents other
parts of lvm code (which do various open/close) from
interfering with the bcache fd. A number of dev_open
and dev_close are removed from the reading path since
the read path now uses the bcache.
With that in place, open(O_EXCL) for pvcreate/pvremove
can then be fixed. That wouldn't work previously because
of other open fds.
The copy of VG metadata stored in lvmcache was not being used
in general. It pretended to be a generic VG metadata cache,
but was not being used except for clvmd activation. There
it was used to avoid reading from disk while devices were
suspended, i.e. in resume.
This removes the code that attempted to make this look
like a generic metadata cache, and replaces with with
something narrowly targetted to what it's actually used for.
This is a way of passing the VG from suspend to resume in
clvmd. Since in the case of clvmd one caller can't simply
pass the same VG to both suspend and resume, suspend needs
to stash the VG somewhere that resume can grab it from.
(resume doesn't want to read it from disk since devices
are suspended.) The lvmcache vginfo struct is used as a
convenient place to stash the VG to pass it from suspend
to resume, even though it isn't related to the lvmcache
or vginfo. These suspended_vg* vginfo fields should
not be used or touched anywhere else, they are only to
be used for passing the VG data from suspend to resume
in clvmd. The VG data being passed between suspend and
resume is never modified, and will only exist in the
brief period between suspend and resume in clvmd.
suspend has both old (current) and new (precommitted)
copies of the VG metadata. It stashes both of these in
the vginfo prior to suspending devices. When vg_commit
is successful, it sets a flag in vginfo as before,
signaling the transition from old to new metadata.
resume grabs the VG stashed by suspend. If the vg_commit
happened, it grabs the new VG, and if the vg_commit didn't
happen it grabs the old VG. The VG is then used to resume
LVs.
This isolates clvmd-specific code and usage from the
normal lvm vg_read code, making the code simpler and
the behavior easier to verify.
Sequence of operations:
- lv_suspend() has both vg_old and vg_new
and stashes a copy of each onto the vginfo:
lvmcache_save_suspended_vg(vg_old);
lvmcache_save_suspended_vg(vg_new);
- vg_commit() happens, which causes all clvmd
instances to call lvmcache_commit_metadata(vg).
A flag is set in the vginfo indicating the
transition from the old to new VG:
vginfo->suspended_vg_committed = 1;
- lv_resume() needs either vg_old or vg_new
to use in resuming LVs. It doesn't want to
read the VG from disk since devices are
suspended, so it gets the VG stashed by
lv_suspend:
vg = lvmcache_get_suspended_vg(vgid);
If the vg_commit did not happen, suspended_vg_committed
will not be set, and in this case, lvmcache_get_suspended_vg()
will return the old VG instead of the new VG, and it will
resume LVs based on the old metadata.
New label_scan function populates bcache for each device
on the system.
The two read paths are updated to get data from bcache.
The bcache is not yet used for writing. bcache blocks
for a device are invalidated when the device is written.
Callers that read larger amounts of data now get a pointer to read-only
data directly without copying it through an intermediate buffer. This
data is owned by the device layer so the callers no longer free it.
If it obtains the data, it passes it into the supplied callback function
and returns 1. Otherwise the callback receives failed = 1.
Updated config_file_read_fd to use this and similarly return the data
via a callback fn of its own.
Rename dev_read() to dev_read_buf() - the function that reads data
into a supplied buffer.
Introduce a new dev_read() that allocates the buffer it returns and
switch the important users over to this. No caller may change the
returned data. (For now, callers are responsible for freeing it after
use, but later the device layer will take full ownership.)
dev_read_buf() should only be used for tiny buffers or unimportant code
(such as the old disk formats).
The creation of wrapped around metadata - where the start of metadata is
written up to the end of the buffer and the remainder follows back at
the start of the buffer - is now restricted to cases where writing the
metadata in one piece wouldn't fit. This shouldn't happen in 'normal'
usage so let's begin treating the code for this as a special case that
can be ignored when optimising 'normal' cases.
Mark the first metadata area on each text format PV as MDA_PRIMARY.
Pass this information down to the device layer so that when
there are two metadata areas on a block device, we can easily
distinguish two independent streams of I/O.
Introduce enum dev_io_reason to categorise block device I/O
in debug messages so it's obvious what it is for.
DEV_IO_SIGNATURES /* Scanning device signatures */
DEV_IO_LABEL /* LVM PV disk label */
DEV_IO_MDA_HEADER /* Text format metadata area header */
DEV_IO_MDA_CONTENT /* Text format metadata area content */
DEV_IO_FMT1 /* Original LVM1 metadata format */
DEV_IO_POOL /* Pool metadata format */
DEV_IO_LV /* Content written to an LV */
DEV_IO_LOG /* Logging messages */
vgsplit shares the vg_rename code so that must only set the PV_MOVED_VG
flag introduced in commit 486ed10848
("vgmerge: Fix intermediate metadata corruption") on PVs that moved.
API for strtod() or strtoul() needs reset of errno, before it's being
called. So add missing resets in missing places and some also some
errno validation for out-of-range numbers.
Add new profilable configation setting to let user select
which metadata format of a created cache pool he wish to use.
By default the 'best' available format is autodetected at runtime,
but user may enforce format 1 or 2 ATM.
Code also detects availability for metadata2 supporting cache target.
In case of troubles user may easily Disable usage of this feature
by placing 'metadata2' into global/cache_disabled_features list.
User can specify metadata profile which stores important cache
geometry data for easy configuration.
Fix missing support for getting chunk_size, cache_mode, cache_policy
for a cache/cache pools volumes from configuration or metadata profile.
Basically code moving operation to have a single place resolving
thin_pool_chunk_size_policy.
Supported are generic & performance profiles.
Function is now shared between thin manipulation code and configuration
_CFG logic to obtain defaults and handle correct reporting upward coding
stack.
Because of contraints in renaming shifted rimage/rmeta LV names
the current RaidLV limit is a maximum of 10 SubLV pairs.
With the previous introduction of reshaping infratructure that
constriant got removed.
Kernel supports 253 since dm-raid target 1.9.0, older kernels 64.
Raise the maximum number of RaidLV rimage/rmeta pairs to 64.
If we want to raise past 64, we have to introdce a check for
the kernel supporting it in lvcreate/lvconvert.
Related: rhbz834579
Related: rhbz1191935
Related: rhbz1191978
In order to support striped raid5/6/10 LV reshaping (change
of LV type, stripesize or number of legs), this patch
introduces the changes to call the reshaping infratructure
from lv_raid_convert().
Changes:
- add reshaping calls from lv_raid_convert()
- add command definitons for reshaping to tools/command-lines.in
- fix raid_rimage_extents()
- add 2 new test scripts lvconvert-raid-reshape-linear_to_striped.sh
and lvconvert-raid-reshape-striped_to_linear.sh to test
the linear <-> striped multi-step conversions
- add lvconvert-raid-reshape.sh reshaping tests
- enhance lvconvert-raid-takeover.sh with new raid10 tests
Related: rhbz834579
Related: rhbz1191935
Related: rhbz1191978
When showing sizes with 'H|human' units we do use standard rounding.
This however is confusing users from time to time,
when the printed number uses some biger units i.e. GiB and there is just
tiny fraction of space missing.
So here is some real-life example with new 'r' unit.
$lvs
LV VG Attr LSize Pool Origin
lvol0 vg -wi-a----- 1.99g
lvol1 vg -wi-a----- <2.00g
lvol2 vg -wi-a----- <2.01g
Meaning is - lvol1 has 'slightly' less then 2.00g - from sign '<' user
can be aware the LV doesn't have full 2.00GiB in size so he
will be less surpriced allocation of 2G volume will not succeed.
$ vgs
VG #PV #LV #SN Attr VSize VFree
vg 2 2 0 wz--n- <6,00g <2,01g
For uses needing 'old' undecorated human unit simply will continue
to use 'H|h' units.
The new R|r may further change when we would recongnize some
other way how to improve readability.
Make it easier to replace missing segments with 'zero' returning
target - otherwise user would have to create some extra target
to provide zeros as /dev/zero can't be used (not a block device).
Also break code loop when segment is found and make it an INTERNAL_ERROR
where it's missing.
Introduce 'hard limit' for max number of cache chunks.
When cache target operates with too many chunks (>10e6).
When user is aware of related possible troubles he
may increase the limit in lvm.conf.
Also verbosely inform user about possible solution.
Code works for both lvcreate and lvconvert.
Lvconvert fully supports change of chunk_size when caching LV
(and validates for compatible settings).
Commit e947c362dd introduced
config_settings.h file for central place to store all definitions for
config options. By mistake, it used report/colums_as_rows instead
of report/columns_as_rows (missing "n" in "columns").
Enforce mirror/raid0/1/10/4/5/6 type specific maximum images when
creating LVs or converting them from mirror <-> raid1.
Document those maxima in the lvcreate/lvconvert man pages.
- resolves rhbz1366060
Some settings are not suitable for override in interactive/shell
mode because such settings may confuse the code and it may end
up with unexpected behaviour. This is because of the fact that
once we're in the interactive/shell mode, we have already applied
some settings for the shell itself and we can't override them
further because we're already using those settings to drive the
interactive/shell mode. Such settings would get ignored silently
or, in worse case, they would mess up the existing configuration.
Commit 3928c96a37 introduced
new defaults for raid number of stripes, which may cause
backwards compatibility issues with customer scripts.
Adding configurable option 'raid_stripe_all_devices' defaulting
to '0' (i.e. off = new behaviour) to select the old behaviour
of using all PVs in the VG or those provided on the command line.
In case any scripts rely on the old behaviour, just set
'raid_strip_all_devices = 1'.
- resolves rhbz1354650
lvm fullreport executes 5 subreports (vg, pv, lv, pvseg, seg) per each VG
(and so taking one VG lock each time) within one command which makes it
easier to produce full report about LVM entities.
Since all 5 subreports for a VG are done under a VG lock, the output is
more consistent mainly in cases where LVM entities may be changed in
parallel.
New report/output_format configuration sets the output format used
for all LVM commands globally. Currently, there are 2 formats
recognized:
- basic (the classical basic output with columns and rows, used by default)
- json (output is in json format)
This is a preparation for new CMDLOG report type which is going to be
used for reporting LVM command log.
The new report type introduces several new fields (log_seq_num, log_type,
log_context, log_object_type, log_object_group, log_object_id, object_name,
log_message, log_errno, log_ret_code) as well as new configuration settings
to set this report type (report/command_log_sort and report/command_log_cols
lvm.conf settings).
This patch also introduces internal report_cmdlog helper function
which is a wrapper over dm_report_object to report command log via
CMDLOG report type and which is going to be used throughout the code
to report the log items.
Wait to compare and choose alternate duplicate devices until
after all devices are scanned. During scanning, the first
duplicate dev is kept in lvmcache, and others are kept in a
new list (_found_duplicate_devs).
After all devices are scanned, compare all the duplicates
available for a given PVID and decide which is best.
If the dev used in lvmcache is changed, drop the old dev
from lvmcache entirely and rescan the replacement dev.
Previously the VG metadata from the old dev was kept in
lvmcache and only the dev was replaced.
A new config setting devices/allow_changes_with_duplicate_pvs
can be set to 0 which disallows modifying a VG or activating
LVs in it when the VG contains PVs with duplicate devices.
Set to 1 is the old behavior which allowed the VG to be
changed.
The logic for which of two devs is preferred has changed.
The primary goal is to choose a device that is currently
in use if the other isn't, e.g. by an active LV.
. prefer dev with fs mounted if the other doesn't, else
. prefer dev that is dm if the other isn't, else
. prefer dev in subsystem if the other isn't
If neither device is preferred by these rules, then don't
change devices in lvmcache, leaving the one that was found
first.
The previous logic for preferring a device was:
. prefer dev in subsystem if the other isn't, else
. prefer dev without holders if the other has holders, else
. prefer dev that is dm if the other isn't
Move checking the lvmetad state, and the possible rescan,
out of lvmetad_send() to the start of the command.
Previously, the token mismatch and rescan would occur
within lvmetad_send() for some other request. Now,
the token mismatch is detected earlier, so the
rescan can be done before the main command is in
progress. Rescanning deep within the processing of
another command will disturb the lvmcache state of
that other command.
A rescan already exists at the start of the command
for the case where foreign VGs are going to be read.
This same rescan is now also performed when there is
an lvmetad token mismatch (from a changed global_filter).
The commands pvscan/vgscan/lvscan/vgimport are excluded
from this preemptive checking/rescanning for lvmetad
because they want to do rescanning themselves explicitly.
If rescanning devices fails, then lvmetad has not been
correctly repopulated and should not be used, so make
the command revert to not using lvmetad.
When a command modifies a PV or VG, or changes the
activation state of an LV, it will send a dbus
notification when the command is finished. This
can be enabled/disabled with a config setting.
The metadata/record_lvs_history is global switch which enables or
disables recording historical LVs in metadata.
If both metadata/record_lvs_history=1 lvm.conf option and
--nohistory command switch is used at the same time, the
--nohistory prevails.
'verbose' was marked as a boolean option while it
takes integer args - so it has limited usage to 0 or 1,
but we supported 0-4 at least.
Fix it by switching to corrent int type.
(Hopefully noone was trying to use this variable as true/yes/false/no
way - as the would be unsupported/undocumented).
Normally, we generate and provide lvm.conf file where use_blkid_wiping
is set based on whether support for this is compiled in or not. This was
generated properly based on configure.
However, if lvm.conf is not used at all (someone deletes it) or the value
in lvm.conf is commented out (user edited it), we still need to use
proper default value that is based on DEFAULT_USE_BLKID_WIPING taken
from configure script - we used hardcoded value of "1" in this case
by mistake.
Just for convenience to display all new configuration settings
introduced since given version (before, there was only --atversion
to display settings introduced in concrete version).
For example:
$ lvmconfig --type new --sinceversion 2.2.120
allocation {
# cache_mode="writethrough"
# cache_settings {
# }
}
global {
use_lvmlockd=0
# lvmlockd_lock_retries=3
# sanlock_lv_extend=256
use_lvmpolld=1
}
activation {
}
# report {
# compact_output_cols=""
# time_format="%Y-%m-%d %T %z"
# }
local {
# host_id=0
}
Here Coverity cannot see the pointer cannot be NULL in this
code path - opened coverity case #00531860.
We could make a model to avoid seeing related reports,
but then we loose coverage for modeled function.
So decided to add minor hint for this case.
The new report/compact_output_cols setting has exactly the same effect
as report/compact_output setting. The difference is that with the new
setting it's possible to define which cols should be compacted exactly
in contrast to all cols in case of report/compact_output.
In case both compact_output and compact_output_cols is enabled/set,
the compact_output prevails.
For example:
$ lvmconfig --type full report/compact_output report/compact_output_cols
compact_output=0
compact_output_cols=""
$ lvs vg
LV VG Attr LSize Pool Origin Data% Meta% Move Log Cpy%Sync Convert
lvol0 vg -wi-a----- 4.00m
---
$ lvmconfig --type full report/compact_output report/compact_output_cols
compact_output=0
compact_output_cols="data_percent,metadata_percent,pool_lv,move_pv,origin"
$ lvs vg
LV VG Attr LSize Log Cpy%Sync Convert
lvol0 vg -wi-a----- 4.00m
---
$ lvmconfig --type full report/compact_output report/compact_output_cols
compact_output=1
compact_output_cols="data_percent,metadata_percent,pool_lv,move_pv,origin"
$ lvs vg
LV VG Attr LSize
lvol0 vg -wi-a----- 4.00m