IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
If we know that a PV belongs to some VG and we're missing metadata
(because we have only those PV(s) from VG present in the system that
don't have metadata areas), we should skip such PV when processing
under system ID.
This is because we know that the PV belongs to some VG, but we
really can't decide whether it matches system ID unless the VG
metadata is present again.
If we know that the PV is orphan, meaning there's at least one MDA on
that PV which does not reference any VG and at the same time there's
PV_EXT_USED flag set, we're certainly in an inconsistent state and we
need to fix this.
For example, such situation can happen during vgremove/vgreduce if we
removed/reduced the VG, but we haven't written PV headers yet because
vgremove stopped abruptly for whatever reason just before writing new
PV headers with updated state, including PV extension flags (and so the
PV_EXT_USED flag).
However, in case the PV has no MDAs at all, we can't double-check
whether the PV_EXT_USED is correct or not - if that PV is marked
as used, it's either:
- really used (but other disks with MDAs are missing)
- or the error state as described above is hit
User needs to overwrite the PV header directly if it's really clear
the PV having no MDAs does not belong to any VG and at the same time
it's still marked as being in use (pvcreate -ff <dev_name> will fix this).
For example - /dev/sda here has 1 MDA, orphan and is incorrectly marked
with PV_EXT_USED flag:
$ pvs --binary -o+pv_in_use
WARNING: Found inconsistent standalone Physical Volumes.
WARNING: Repairing flag incorrectly marking Physical Volume /dev/sda as used.
PV VG Fmt Attr PSize PFree InUse
/dev/sda lvm2 --- 128.00m 128.00m 0
This is a hotfix for a bug introduced in
6d7dc87cb3.
The bug description: First we allocate memory for
processing handle (at an address 1) then we
allocate some memory on the same pool for later use
in pvmove_poll function inside the process_each_pv
function (at an address 2). After we jump out of
process_each_pv we called destroy_processing_handle.
As a result of destroying the handle memory pool could
deallocate all memory at address 1 or higher. The
pvmove_poll function tried to copy a memory allocated
at address 2 that could be returned to the system.
If it was so it led to segfault.
We need to rethink proper fix but in the same time
cmd->mem pool is recreated per each lvm command so
this should not cause problems even when we run
multiple commands in lvm shell.
A valgrind snapshot of the corruption:
Invalid read of size 1
at 0x4C29F92: strlen (mc_replace_strmem.c:403)
by 0x5495F2E: dm_pool_strdup (pool.c:51)
by 0x1592A7: _create_id (pvmove.c:774)
by 0x159409: pvmove_poll (pvmove.c:796)
by 0x1599E3: pvmove (pvmove.c:931)
by 0x15105B: lvm_run_command (lvmcmdline.c:1655)
by 0x1523C3: lvm2_main (lvmcmdline.c:2121)
by 0x1754F3: main (lvm.c:22)
Address 0xf15df8a is 138 bytes inside a block of size 8,192 free'd
at 0x4C28430: free (vg_replace_malloc.c:446)
by 0x5494E73: dm_free_wrapper (dbg_malloc.c:357)
by 0x5495DE2: _free_chunk (pool-fast.c:318)
by 0x549561C: dm_pool_free (pool-fast.c:151)
by 0x164451: destroy_processing_handle (toollib.c:1837)
by 0x1598C1: pvmove (pvmove.c:903)
by 0x15105B: lvm_run_command (lvmcmdline.c:1655)
by 0x1523C3: lvm2_main (lvmcmdline.c:2121)
by 0x1754F3: main (lvm.c:22)
Fix regression caused by c9f021de0b.
This commit actually transfered real-action (e.g. device removal)
into the next loop which has however missed to check for break.
So add check for break also there.
When creating a list in 'context of command' - use proper mempool.
vg->vgmem is mempool related to VG metadata - and can be eventually
locked read-only when VG struct is shared.
The extent size must fits all blocks in 4294967295 sectors
(in 512b units) this is 1/2 KiB less then 2TiB.
So while previous statement 'suggested' 2TiB is still acceptable value,
make it clear it's not.
As now we support any multiples of 128KB as extent size -
values like 2047G will still 'flow-in' otherwise the largest power-of-2
supported value is 1TiB.
With 1TiB user needs 8388608 extents for 8EiB device.
(FYI such device is already unusable with todays glibc-2.22.90-27)
4GiB extent size is currently the smallest extent size which allows
a user to create 8EiB devices (with 2GiB it's less then 8EiB).
TODO: lvm2 may possibly print amount of 'lost/unused space' on a PV,
since using such ridiculously sized extent size may result in huge
space being left unaccessible.
Add a comment in _process_pvs_in_vg() to document the
place where there have been problems with processing
PVs twice.
For a while we had a hacky workaround here where we'd
skip processing a PV if its device wasn't found in
all_devices (and !is_missing_pv since we want to
process PVs with missing devices.). That workaround
was removed in commit 5cd4d46f because it was no
longer needed.
The workaround had originally been needed to prevent
a device from being processed twice when the PV had
no MDAs -- it would be processed once in its real VG
and then the workaround would prevent it from being
processed a second time in the orphan VG.
Wrongly appearing as an orphan likely happened because
lvmcache would consider the no-MDA PV an orphan unless
the real VG holding that PV was also in lvmcache.
This issue is also mentioned in pvchange where holding
the global lock allows VGs to remain in lvmcache so
PVs with 0 mdas are not considered orphans.
The workaround in _process_pvs_in_vg() was originally
intended for reporting commands, not for pvchange.
But, it was accidentally helping pvchange also because
the method described by the pvchange global lock
comment had been subverted by commit 80f4b4b8.
Commit 80f4b4b8 was found to be unnecessary, and was
reverted in commit e710bac0. This restored the
intended global lock lvmcache effect to pvchange, and
it no longer relied on the workaround in toollib.
The problem addressed by this workaround no longer
seems to exist, so remove it. PVs with no mdas
no longer appear in both their actual VG and in
the orphan VG.
Use process_each_vg() to lock and read the old VG,
and then call the main vgrename code.
When real VG names are used (not a UUID in place of the
old name), the command still pre-locks the new name
(when strcmp wants it locked first), before calling
process_each_vg on the old name.
In the case where the old name is replaced with a UUID,
process_each_vg now translates that UUID into the real
VG name, which it locks and reads. In this case, we
cannot do pre-locking to maintain lock ordering because
the old name is unknown. So, in this case the strcmp
based lock ordering is suppressed and the old name is
always locked first. This opens a remote chance for
lock ordering conflict between racing vgrenames between
two names where one or both commands use the UUID.
Also always clear the internal lvmcache after rescanning, and
reinstate a test for --trustcache so that 'pvs --trustcache'
(for example) avoids rescanning.
Before commit c1f246fedf,
_get_all_devices() did a full device scan before
get_vgnameids() was called. The full scan in
_get_all_devices() is from calling dev_iter_create(f, 1).
The '1' arg forces a full scan.
By doing a full scan in _get_all_devices(), new devices
were added to dev-cache before get_vgnameids() began
scanning labels. So, labels would be read from new devices.
(e.g. by the first 'pvs' command after the new device appeared.)
After that commit, _get_all_devices() was called
after get_vgnameids() was finished scanning labels.
So, new devices would be missed while scanning labels.
When _get_all_devices() saw the new devices (after
labels were scanned), those devices were added to
the .cache file. This meant that the second 'pvs'
command would see the devices because they would be
in .cache.
Now, the full device scan is factored out of
_get_all_devices() and called by itself at the
start of the command so that new devices will
be known before get_vgnameids() scans labels.
In general, --select should be used to specify a VG by UUID,
but vgrename already allows a uuid to be substituted for
the name, so continue to allow it in that case.
If the VG arg from the command line does not match the
name of any known VGs, then check if the arg looks like
a UUID. If it's a valid UUID, then compare it to the
UUID of known VGs. If it matches the UUID of a known VG,
then process that VG.
Pass the single vgname as a new process_each_vg arg
instead of setting a cmd flag to tell process_each_vg
to take only the first vgname arg from argv.
Other commands with different argv formats will be
able to use it this way.
If two different VGs with the same name exist on the system,
a command that just specifies that ambiguous name will fail
with a new error:
$ vgs -o name,uuid
...
foo qyUS65-vn32-TuKs-a8yF-wfeQ-7DkF-Fds0uf
foo vfhKCP-mpc7-KLLL-Uh08-4xPG-zLNR-4cnxJX
$ lvs foo
Multiple VGs found with the same name: foo
Use the --select option with VG UUID (vg_uuid).
$ vgremove foo
Multiple VGs found with the same name: foo
Use the --select option with VG UUID (vg_uuid).
$ lvs -S vg_uuid=qyUS65-vn32-TuKs-a8yF-wfeQ-7DkF-Fds0uf
lv1 foo ...
This is implemented for process_each_vg/lv, and works
with or without lvmetad. It does not work for commands
that do not use process_each.
This change includes one exception to the behavior shown
above. If one of the VGs is foreign, and the other is not,
then the command assumes that the intended VG is the local
one and uses it.
This makes process_each_vg/lv always use the list of
vgnames on the system. When specific VGs are named on
the command line, the corresponding entries from
vgnameids_on_system are moved to vgnameids_to_process.
Previously, when specific VGs were named on the command
line, the vgnameids_on_system list was not created, and
vgnameids_to_process was created from the arg_vgnames
list (which is only names, without vgids).
Now, vgnameids_on_system is always created, and entries
are moved from that list to vgnameids_to_process -- either
some (when arg_vgnames specifies only some), or all (when
the command is processing all VGs, or needs to look at
all VGs for checking tags/selection).
This change adds one new lvmetad lookup (vg_list) to a
command that specifies VG names. It adds no new work
for other commands, e.g. non-lvmetad commands, or
commands that look at all VGs.
When using lvmetad, 'lvs foo' previously sent one
request to lvmetad: 'vg_lookup foo'.
Now, 'lvs foo' sends two requests to lvmetad:
'vg_list' and 'vg_lookup foo <uuid>'.
(The lookup can now always include the uuid in the request
because the initial vg_list contains name/vgid pairs.)
The recent addition to check for PVs that were
missed during the first iteration of processing
was unintentionally catching duplicate PVs because
duplicates were not removed from the all_devices
list when the primary dev was processed.
Also change a message from warn back to verbose.
If a VG is removed between the time that 'vgs'
or 'lvs' (with no args) creates the list of VGs
and the time that it reads the VG to process it,
then ignore the removed VG; don't report an error
that it could not be found, since it wasn't named
by the command.
PVs could be missing from the 'pvs' output if
their VG was removed at the same time that the
'pvs' command was run. To fix this:
1. If a VG is not found when processed, don't
silently skip the PVs in it, as is done when
the "skip" variable is set.
2. Repeat the VG search if some PVs are not
found on the first search through all VGs.
The second search uses a specific list of
PVs that were missed the first time.
testing:
/dev/sdb is a PV
/dev/sdd is a PV
/dev/sdg is not a PV
each test begins with:
vgcreate test /dev/sdb /dev/sdd
variations to test:
vgremove -f test & pvs
vgremove -f test & pvs -a
vgremove -f test & pvs /dev/sdb /dev/sdd
vgremove -f test & pvs /dev/sdg
vgremove -f test & pvs /dev/sdb /dev/sdg
The pvs command should always display /dev/sdb
and /dev/sdd, either as a part of VG test or not.
The pvs command should always print an error
indicating that /dev/sdg could not be found.
Commit 1a74171ca5 added
a check to ignore a VG that was FAILED_INCONSISTENT
if the command doesn't care if the VG is not found.
Remove that check because that case is never reached
by the current code.
The ONE_VGNAME_ARG was being passed and tested as
vg_read() flag but it's a cmd struct flag.
(It affects command arg processing in toollib,
not vg_read behavior. Flags related to command
processing are generally cmd struct flags, while
vg_read arg flags are generally related to vg_read
behavior.)
Running "vgremove -f VG & pvs" results in the pvs
command reporting that the VG is not found or is
inconsistent. If the VG is gone or being removed,
the pvs command should just skip it and not print
errors about it.
"Not found" is because the pvs command created the
list of VGs to process, including VG, then vgremove
removed the VG, then the pvs command came to to read
the VG to process it and did not find it.
An "inconsistent" error could be reported if vgremove
had only partially completed removing VG when pvs did
vg_read on the VG to process it, causing pvs to find
the VG in a partially-removed state.
This fix adds a flag that pvs uses to ignore a VG
that can't be read or is inconsistent.
If 'vgcreate --shared' finds both sanlock and dlm are running,
print a more accurate error message:
"Found multiple lock managers, select one with --lock-type."
When neither is running, we still print:
"Failed to detect a running lock manager to select lock type."
Using --lock-type sanlock|dlm implies --shared.
Using --shared selects lock type sanlock|dlm
(by choosing the one that's running.)
Using both --shared and --lock-type sanlock|dlm should
also be allowed (--shared is just redundant information.)
The unlock call will fail in expected and normal cases,
and should not cause the command to fail. (An actual
unlock in the lock manager should never fail.)
Add new profilable configurables:
allocation/cache_policy
allocation/cache_settings
and mark allocation/cache_pool_chunk_size as profilable as well.
Obsolete allocation/cache_pool_cachemode and
introduce new allocation/cache_mode instead.
Rename DEFAULT_CACHE_POOL_POLICY to DEFAULT_CACHE_POLICY.
When lvm is built without lvmlockd support, vgcreate using a
shared lock type would succeed and create a local VG (the
--shared option was effectively ignored). Make it fail.
Fix the same issue when using vgchange to change a VG to a
shared lock type.
Make the error messages consistent.
Keep policy name separate from policy settings and avoid
to mangling and demangling this string from same config tree.
Ensure policy_name is always defined.
There are two different failure conditions detected in
access_vg_lock_type() that should have different error
messages. This adds another failure flag so the two
cases can be distinguished to avoid printing a misleading
error message.
This prevents 'lvremove vgname' from attempting to remove the
hidden sanlock LV. Only vgremove should remove the hidden
sanlock LV holding the sanlock locks.
... Using uninitialized value "lockd_state" when calling "lockd_vg"
(even though lockd_vg assigns 0 to the lockd_state, but it looks at
previous state of lockd_state just before that so we need to have
that properly initialized!)
libdm/libdm-report.c:2934: uninit_use_in_call: Using uninitialized value "tm". Field "tm.tm_gmtoff" is uninitialized when calling "_get_final_time".
daemons/lvmlockd/lvmlockctl.c:273: uninit_use_in_call: Using uninitialized element of array "r_name" when calling "format_info_r_action". (just added FIXME as this looks unfinished?)
In process_each_{vg,lv,pv} when no vgname args are given,
the first step is to get a list of all vgid/vgname on the
system. This is exactly what lvmetad returns from a
vg_list request. The current code is doing a vg_lookup
on each VG after the vg_list and populating lvmcache with
the info for each VG. These preliminary vg_lookup's are
unnecessary, because they will be done again when the
processing functions call vg_read. This patch eliminates
the initial round of vg_lookup's, which can roughly cut in
half the number of lvmetad requests and save a lot of extra work.
Routines responsible for polling of in-progress pvmove, snapshot merge
or mirror conversion each used custom lookup functions to find vg and
lv involved in polling.
Especially pvmove used pvname to lookup pvmove in-progress. The future
lvmpolld will poll each operation by vg/lv name (internally by lvid).
Also there're plans to make pvmove able to move non-overlaping ranges
of extents instead of single PVs as of now. This would also require
to identify the opertion in different manner.
The poll_operation_id structure together with daemon_parms structure they
identify unambiguously the polling task.
With use_lvmetad=0, duplicate PVs /dev/loop0 and /dev/loop1,
where in this example, /dev/loop1 is the cached device
referenced by pv->dev, the command 'pvs /dev/loop0' reports:
Failed to find physical volume "/dev/loop0".
This is because the duplicate PV detection by pvid is
not working because _get_all_devices() is not setting
any dev->pvid for any entries. This is because the
pvid information has not yet been saved in lvmcache.
This is fixed by calling _get_vgnameids_on_system()
before _get_all_devices(), which has the effect of
caching the necessary pvid information.
With this fix, running pvs /dev/loop0, or pvs /dev/loop1,
produces no error and one line of output for the PV (the
device printed is the one cached in pv->dev, in this
example /dev/loop1.)
Running 'pvs /dev/loop0 /dev/loop1' produces no error
and two lines of output, with each device displayed
on one of the lines.
Running 'pvs -a' shows two PVs, one with loop0 and one
with loop1, and both shown as a member of the same VG.
Running 'pvs' shows only one of the duplicate PVs,
and that shows the device cached in pv->dev (loop1).
The above output is what the duplicate handling code
was previously designed to output in commits:
b64da4d8b5 toollib: search for duplicate PVs only when needed
3a7c47af0e toollib: pvs -a should display VG name for each duplicate PV
57d74a45a0 toollib: override the PV device with duplicates
c1f246fedf toollib: handle duplicate pvs in process_in_pv
As a further step after this, we may choose to change
some of those.
For all of these commands, a warning is printed about
the existence of the duplicate PVs:
Found duplicate PV ...: using /dev/loop1 not /dev/loop0
sharing connection between parent command and background
processes spawned from parent could lead to occasional failures
due to unexpected corruption in daemon responses sent to either child
or a parent.
lvmetad issued warning about duplicate config values in request.
LVM commands occasionaly failed w/ internal error after receving
corrupted response.
lvmetad connection is renewed when needed after explicit disconnect
in child
spawning a background polling from within the lv_change_activate
fn went to two problems:
1) vgchange should not spawn any background polling until after
the whole activation process for a VG is finished. Otherwise
it could lead to a duplicite request for spawning background
polling. This statement was alredy true with one exception of
mirror up-conversion polling (fixed by this commit).
2) due to current conditions in lv_change_activate lvchange cmd
couldn't start background polling for pvmove LVs if such LV was
about to get activated by the command in the same time.
This commit however doesn't alter the lvchange cmd so that it works same as
vgchange with regard to not to spawn duplicate background pollings per
unique LV.
Do not keep dangling LVs if they're removed from the vg->lvs list and
move them to vg->removed_lvs instead (this is actually similar to already
existing vg->removed_pvs list, just it's for LVs now).
Once we have this vg->removed_lvs list indexed so it's possible to
do lookups for LVs quickly, we can remove the LV_REMOVED flag as
that one won't be needed anymore - instead of checking the flag,
we can directly check the vg->removed_lvs list if the LV is present
there or not and to say if the LV is removed or not then. For now,
we don't have this index, but it may be implemented in the future.
This avoids a problem in which we're using selection on LV list - we
need to do the selection on initial state and not on any intermediary
state as we process LVs one by one - some of the relations among LVs
can be gone during this processing.
For example, processing one LV can cause the other LVs to lose the
relation to this LV and hence they're not selectable anymore with
the original selection criteria as it would be if we did selection
on inital state. A perfect example is with thin snapshots:
$ lvs -o lv_name,origin,layout,role vg
LV Origin Layout Role
lvol1 thin,sparse public,origin,thinorigin,multithinorigin
lvol2 lvol1 thin,sparse public,snapshot,thinsnapshot
lvol3 lvol1 thin,sparse public,snapshot,thinsnapshot
pool thin,pool private
$ lvremove -ff -S 'lv_name=lvol1 || origin=lvol1'
Logical volume "lvol1" successfully removed
The lvremove command above was supposed to remove lvol1 as well as
all its snapshots which have origin=lvol1. It failed to do so, because
once we removed the origin lvol1, the lvol2 and lvol3 which were
snapshots before are not snapshots anymore - the relations change
as we're processing these LVs one by one.
If we do the selection first and then execute any concrete actions on
these LVs (which is what this patch does), the behaviour is correct
then - the selection is done on the *initial state*:
$ lvremove -ff -S 'lv_name=lvol1 || origin=lvol1'
Logical volume "lvol1" successfully removed
Logical volume "lvol2" successfully removed
Logical volume "lvol3" successfully removed
Similarly for all the other situations in which relations among
LVs are being changed by processing the LVs one by one.
This patch also introduces LV_REMOVED internal LV status flag
to mark removed LVs so they're not processed further when we
iterate over collected list of LVs to be processed.
Previously, when we iterated directly over vg->lvs list to
process the LVs, we relied on the fact that once the LV is removed,
it is also removed from the vg->lvs list we're iterating over.
But that was incorrect as we shouldn't remove LVs from the list
during one iteration while we're iterating over that exact list
(dm_list_iterate_items safe can handle only one removal at
one iteration anyway, so it can't be used here).
In log messages refer to it as system ID (not System ID).
Do not put quotes around the system_id string when printing.
On the command line use systemid.
In code, metadata, and config files use system_id.
In lvmsystemid refer to the concept/entity as system_id.
Commands that can never use foreign VGs begin with
cmd->error_foreign_vgs = 1. This tells the vg_read
lib layer to print an error as soon as a foreign VG
is read.
The toollib process_each layer also prints an error if a
foreign VG is read, but is more selective about it. It
won't print an error if the command did not explicitly
name the foreign VG. We want to silently ignore foreign VGs
unless a command attempts to use one explicitly.
So, foreign VG errors are printed from two different layers:
vg_read (lower layer) and process_each (upper layer).
Commands that use toollib process_each, only want errors from
the process_each layer, not from both layers. So, process_each
disables the lower layer vg_read error message by setting
error_foreign_vgs = 0.
Commands that do not use toollib process_each, want errors
from the vg_read layer, otherwise they would get no error
message. The original cmd->error_foreign_vgs setting
enables this error.
(Commands that are allowed to operate on foreign VGs always
begin with cmd->error_foreign_vgs = 0, and all the commands
in this group use toollib process_each with the selective
error reporting.)
When checking whether the system ID permits access to a VG, check for
each permitted situation first, and only then issue the appropriate
error message. Always issue a message for now. (We'll try to
suppress some of those later when the VG concerned wasn't explicitly
requested.)
Add more messages to try to ensure every return code is checked and
every error path (and only an error path) contains a log_error().
Add self-correction to vgchange -c to deal with situations where
the cluster state and system ID state are out-of-sync (e.g. if
old tools were used).
(This reverts patch #d95c6154)
Filter complete device list through full_filter unconditionally when
we're getting the list of *all* devices even in case we're interested
only in fraction of those devices - the PVs, not the other devices
which are not PVs yet (e.g. pvs vs. pvs -a).
We need to do this full filtering whenever we're handling *complete*
list of devices, we need to be safe here, mainly if there are any
future changes and we'd forgot to change to use proper filtering then.
Also properly preventing duplicates if there are any block subsystem
components used (mpath, MD ...).
Thing here is that (under use_lvmetad=1), cmd->filter can be used
only if we're sure that the list of devices we're filtering contains
only PVs. We have to use cmd->full_filter otherwise (like it is in
case of _get_all_devices fn which acquires complete list of devices,
no matter if it is a PV or not).
Of course, cmd->full_filter is more extensive than cmd->filter
which is only a subset of full_filter.
We could optimize this in a way that if we're interested in PVs only
during process_each_pv processing (e.g. using pvs in contrast to pvs -a),
we'd get the list of PV devices directly from lvmetad from the
lvmcache_seed_infos_from_lvmetad fn call which currently updates
lvmcache only. We'd add an additional output arg for this fn to get
the list of PV devices directly in addition, without a need to iterate
over all devices which include non-PVs which we're not interested in
anyway, hence we could use only cmd->filter, not the cmd->full_filter.
So the code would look something like this:
static int _get_all_devices(....)
{
struct device_id_list *dil;
if (interested_in_pvs_only)
lvmcache_seed_infos_from_lvmetad(cmd, &dil); /* new "dil" arg */
/* the "dil" list would be filtered through cmd->filter inside lvmcache_seed_infos_from_lvmetad */
else {
lvmcache_seed_infos_from_lvmetad(cmd, NULL);
dev_iter_create(cmd->full_filter)
while (dev = dev_iter_get ...) {
dm_list_add(all_devices, &dil->list);
}
}
}
It's cleaner this way - do not mix static and dynamic
(init_processing_handle) initializers. Use the dynamic one everywhere.
This makes it easier to manage the code - there are no "exceptions"
then and we don't need to take care about two ways of initializing the
same thing - just use one common initializer throughout and it's clear.
Also, add more comments, mainly in the report_for_selection fn explaining
what is being done and why with respect to the processing_handle and
selection_handle.
We still need to get the list as the calls underneath process_each_pv
rely on this list. But still keep the change related to the filters -
if we're processing all devices, we need to use cmd->full_filter.
If we're processing only PVs, we can use cmd->filter only to save
some time which would be spent in filtering code.
When lvmetad is used and at the same time we're getting list of all
PV-capable devices, we can't use cmd->filter (which is used to filter
out lvmetad responses - so we're sure that the devices are PVs already).
To get the list of PV-capable devices, we're bypassing lvmetad (since
lvmetad only caches PVs, not all the other devices which are not PVs).
For this reason, we have to use the "full_filter" filter chain (just
like we do when we're running without lvmetad).
Example scenario:
- sdo and sdp components of MD device md0
- sdq, sdr and sds components of mpatha multipath device
- mpatha multipath device partitioned
- vda device partitioned
=> sdo,sdp,sdr,sds, mpatha and vda should be filtered!
$ lsblk -o NAME,TYPE
NAME TYPE
sdn disk
sdo disk
`-md0 raid0
sdp disk
`-md0 raid0
sdq disk
`-mpatha mpath
`-mpatha1 part
sdr disk
`-mpatha mpath
`-mpatha1 part
sds disk
`-mpatha mpath
`-mpatha1 part
vda disk
|-vda1 part
`-vda2 part
|-fedora-swap lvm
`-fedora-root lvm
Before this patch:
==================
use_lvmetad=0 (correct behaviour!)
$ pvs -a
PV VG Fmt Attr PSize PFree
/dev/fedora/root --- 0 0
/dev/fedora/swap --- 0 0
/dev/mapper/mpatha1 --- 0 0
/dev/md0 --- 0 0
/dev/sdn --- 0 0
/dev/vda1 --- 0 0
/dev/vda2 fedora lvm2 a-- 9.51g 0
use_lvmetad=1 (incorrect behaviour - sdo,sdp,sdq,sdr,sds and mpatha not filtered!)
$ pvs -a
PV VG Fmt Attr PSize PFree
/dev/fedora/root --- 0 0
/dev/fedora/swap --- 0 0
/dev/mapper/mpatha --- 0 0
/dev/mapper/mpatha1 --- 0 0
/dev/md0 --- 0 0
/dev/sdn --- 0 0
/dev/sdo --- 0 0
/dev/sdp --- 0 0
/dev/sdq --- 0 0
/dev/sdr --- 0 0
/dev/sds --- 0 0
/dev/vda --- 0 0
/dev/vda1 --- 0 0
/dev/vda2 fedora lvm2 a-- 9.51g 0
With this patch applied:
========================
use_lvmetad=1
$ pvs -a
PV VG Fmt Attr PSize PFree
/dev/fedora/root --- 0 0
/dev/fedora/swap --- 0 0
/dev/mapper/mpatha1 --- 0 0
/dev/md0 --- 0 0
/dev/sdn --- 0 0
/dev/vda1 --- 0 0
/dev/vda2 fedora lvm2 a-- 9.51g 0
List of all devices is only needed if we want to process devices
which are not PVs (e.g. pvs -a). But if this is not the case, it's
useless to get the list of all devices and then discard it without
any use, which is exactly what happened in process_each_pv where
the code was never reached and the list was unused if we were
processing just PVs, not all PV-capable devices:
int process_each_pv(...)
{
...
process_all_devices = process_all_pvs &&
(cmd->command->flags & ENABLE_ALL_DEVS) &&
arg_count(cmd, all_ARG);
...
/*
* If the caller wants to process all devices (not just PVs), then all PVs
* from all VGs are processed first, removing them from all_devices. Then
* any devs remaining in all_devices are processed.
*/
_get_all_devices(cmd, &all_devices);
...
ret = _process_pvs_in_vgs(...);
...
if (!process_all_devices)
goto out;
ret = _process_device_list(cmd, &all_devices, handle, process_single_pv);
...
}
This patch adds missing check for "process_all_devices" and it gets the
list of all (including non-PV) devices only if needed:
This is a followup patch for previous patchset that enables selection in
process_each_* fns to fix an issue where field prefixes are not
automatically used for fields in selection criteria.
Use initial report type that matches the intention of each process_each_* functions:
- _process_pvs_in_vg - PVS
- process_each_vg - VGS
- process_each_lv and process_each_lv_in_vg - LVS
This is not normally needed for the selection handle init, BUT we would
miss the field prefix matching, e.g.
lvchange -ay -S 'name=lvol0'
The "name" above would not work if we didn't initialize reporting with
the LVS type at its start. If we pass proper init type, reporting code
can deduce the prefix automatically ("lv_name" in this case).
This report type is then changed further based on what selection criteria we
have. When doing pure selection, not report output, the final report type
is purely based on combination of this initial report type and report types
of the fields used in selection criteria.
The report_for_selection does the actual "reporting for selection only".
The selection status will be saved in struct selection_handle's "selected"
variable.
This applies to:
- process_each_lv_in_vg - the VG is selected only if at least one of its LVs is selected
- process_each_segment_in_lv - the LV is selected only if at least one of its LV segments is selected
- process_each_pv_in_vg - the VG is selected only if at least one of its PVs is selected
- process_each_segment_in_pv - the PV is selected only if at least one of its PV segments is selected
So this patch causes the selection result to be properly propagated up to callers.
Call _init_processing_handle, _init_selection_handle and
_destroy_processing_handle in process_each_* and related functions to
set up and destroy handles used while processing items.
The init_processing_handle, init_selection_handle and
destroy_processing_handle are helper functions that allocate and
initialize the handles used when processing items in process_each_*
and related functions.
The "struct processing_handle" contains handles to drive the selection/matching
so pass it to the _select_match_* functions which are entry points to the
selection mechanism used in process_each_* and related functions.
This is revised and edited version of former Dave Teigland's patch which
provided starting point for all the select support in process_each_* fns.
This patch replaces "void *handle" with "struct processing_handle *handle"
in process_each_*, process_single_* and related functions.
The struct processing_handle consists of two handles inside now:
- the "struct selection_handle *selection_handle" used for
applying selection criteria while processing process_each_*,
process_single_* and related functions (patches using this
logic will follow)
- the "void* custom_handle" (this is actually the original handle
used before this patch - a pointer to custom data passed into
process_each_*, process_single_* and related functions).
A full search for duplicate PVs in the case of pvs -a
is only necessary when duplicates have previously been
detected in lvmcache. Use a global variable from lvmcache
to indicate that duplicate PVs exist, so we can skip the
search for duplicates when none exist.
Previously, 'pvs -a' displayed the VG name for only the device
associated with the cached PV (pv->dev), and other duplicate
devices would have a blank VG name. This commit displays the
VG name for each of the duplicate devices. The cost of doing
this is not small: for each PV processed, the list of all
devices must be searched for duplicates.
When multiple duplicate devices are specified on the
command line, the PV is processed once for each of them,
but pv->dev is the device used each time.
This overrides the PV device to reflect the duplicate
device that was specified on the command line. This is
done by hacking the lvmcache to replace pv->dev with the
device of the duplicate being processed. (It would be
preferable to override pv->dev without munging the content
of the cache, and without sprinkling special cases throughout
the code.)
This override only applies when multiple duplicate devices are
specified on the command line. When only a single duplicate
device of pv->dev is specified, the priority is to display the
cached pv->dev, so pv->dev is not overridden by the named
duplicate device.
In the examples below, loop3 is the cached device referenced
by pv->dev, and is given priority for processing. Only after
loop3 is processed/displayed, will other duplicate devices
loop0/loop1 appear (when requested on the command line.)
With two duplicate devices, loop0 and loop3:
# pvs
Found duplicate PV XhLbpVo0hmuwrMQLjfxuAvPFUFZqD4vr: using /dev/loop3 not /dev/loop0
PV VG Fmt Attr PSize PFree
/dev/loop3 loopa lvm2 a-- 12.00m 12.00m
# pvs /dev/loop3
Found duplicate PV XhLbpVo0hmuwrMQLjfxuAvPFUFZqD4vr: using /dev/loop3 not /dev/loop0
PV VG Fmt Attr PSize PFree
/dev/loop3 loopa lvm2 a-- 12.00m 12.00m
# pvs /dev/loop0
Found duplicate PV XhLbpVo0hmuwrMQLjfxuAvPFUFZqD4vr: using /dev/loop3 not /dev/loop0
PV VG Fmt Attr PSize PFree
/dev/loop3 loopa lvm2 a-- 12.00m 12.00m
# pvs -o+dev_size /dev/loop0 /dev/loop3
Found duplicate PV XhLbpVo0hmuwrMQLjfxuAvPFUFZqD4vr: using /dev/loop3 not /dev/loop0
PV VG Fmt Attr PSize PFree DevSize
/dev/loop0 loopa lvm2 a-- 12.00m 12.00m 16.00m
/dev/loop3 loopa lvm2 a-- 12.00m 12.00m 32.00m
With three duplicate devices, loop0, loop1, loop3:
# pvs -o+dev_size
Found duplicate PV XhLbpVo0hmuwrMQLjfxuAvPFUFZqD4vr: using /dev/loop1 not /dev/loop0
Found duplicate PV XhLbpVo0hmuwrMQLjfxuAvPFUFZqD4vr: using /dev/loop3 not /dev/loop1
PV VG Fmt Attr PSize PFree DevSize
/dev/loop3 loopa lvm2 a-- 12.00m 12.00m 32.00m
# pvs -o+dev_size /dev/loop3
Found duplicate PV XhLbpVo0hmuwrMQLjfxuAvPFUFZqD4vr: using /dev/loop1 not /dev/loop0
Found duplicate PV XhLbpVo0hmuwrMQLjfxuAvPFUFZqD4vr: using /dev/loop3 not /dev/loop1
PV VG Fmt Attr PSize PFree DevSize
/dev/loop3 loopa lvm2 a-- 12.00m 12.00m 32.00m
# pvs -o+dev_size /dev/loop0
Found duplicate PV XhLbpVo0hmuwrMQLjfxuAvPFUFZqD4vr: using /dev/loop1 not /dev/loop0
Found duplicate PV XhLbpVo0hmuwrMQLjfxuAvPFUFZqD4vr: using /dev/loop3 not /dev/loop1
PV VG Fmt Attr PSize PFree DevSize
/dev/loop3 loopa lvm2 a-- 12.00m 12.00m 32.00m
# pvs -o+dev_size /dev/loop1
Found duplicate PV XhLbpVo0hmuwrMQLjfxuAvPFUFZqD4vr: using /dev/loop1 not /dev/loop0
Found duplicate PV XhLbpVo0hmuwrMQLjfxuAvPFUFZqD4vr: using /dev/loop3 not /dev/loop1
PV VG Fmt Attr PSize PFree DevSize
/dev/loop3 loopa lvm2 a-- 12.00m 12.00m 32.00m
# pvs -o+dev_size /dev/loop3 /dev/loop0
Found duplicate PV XhLbpVo0hmuwrMQLjfxuAvPFUFZqD4vr: using /dev/loop1 not /dev/loop0
Found duplicate PV XhLbpVo0hmuwrMQLjfxuAvPFUFZqD4vr: using /dev/loop3 not /dev/loop1
PV VG Fmt Attr PSize PFree DevSize
/dev/loop0 loopa lvm2 a-- 12.00m 12.00m 16.00m
/dev/loop3 loopa lvm2 a-- 12.00m 12.00m 32.00m
# pvs -o+dev_size /dev/loop3 /dev/loop1
Found duplicate PV XhLbpVo0hmuwrMQLjfxuAvPFUFZqD4vr: using /dev/loop1 not /dev/loop0
Found duplicate PV XhLbpVo0hmuwrMQLjfxuAvPFUFZqD4vr: using /dev/loop3 not /dev/loop1
PV VG Fmt Attr PSize PFree DevSize
/dev/loop1 loopa lvm2 a-- 12.00m 12.00m 32.00m
/dev/loop3 loopa lvm2 a-- 12.00m 12.00m 32.00m
# pvs -o+dev_size /dev/loop0 /dev/loop1
Found duplicate PV XhLbpVo0hmuwrMQLjfxuAvPFUFZqD4vr: using /dev/loop1 not /dev/loop0
Found duplicate PV XhLbpVo0hmuwrMQLjfxuAvPFUFZqD4vr: using /dev/loop3 not /dev/loop1
PV VG Fmt Attr PSize PFree DevSize
/dev/loop1 loopa lvm2 a-- 12.00m 12.00m 32.00m
/dev/loop3 loopa lvm2 a-- 12.00m 12.00m 32.00m
# pvs -o+dev_size /dev/loop0 /dev/loop1 /dev/loop3
Found duplicate PV XhLbpVo0hmuwrMQLjfxuAvPFUFZqD4vr: using /dev/loop1 not /dev/loop0
Found duplicate PV XhLbpVo0hmuwrMQLjfxuAvPFUFZqD4vr: using /dev/loop3 not /dev/loop1
PV VG Fmt Attr PSize PFree DevSize
/dev/loop0 loopa lvm2 a-- 12.00m 12.00m 16.00m
/dev/loop1 loopa lvm2 a-- 12.00m 12.00m 32.00m
/dev/loop3 loopa lvm2 a-- 12.00m 12.00m 32.00m
Processes a PV once for each time a device with its PV ID
exists on the command line.
This fixes a regression in the case where:
. devices /dev/sdA and /dev/sdB where clones (same PV ID)
. the cached VG references /dev/sdA
. before the regression, the command: pvs /dev/sdB
would display the cached device clone /dev/sdA
. after the regression, pvs /dev/sdB would display nothing,
causing vgimportclone /dev/sdB to fail.
. with this fix, pvs /dev/sdB displays /dev/sdA
Also, pvs /dev/sdA /dev/sdB will report two lines, one for each
device on the command line, but /dev/sdA is displayed for each.
This only works without lvmetad.
When processing PVs specified on the command line, the arg
name was being matched against pv_dev_name, which will not
always work:
- The PV specified on the command line could be an alias,
e.g. /dev/disk/by-id/...
- The PV specified on the command line could be any random
path to the device, e.g. /dev/../dev/sdb
To fix this, first resolve the named PV args to struct device's,
then iterate through the devices for processing.
The call to dm_config_destroy can derefence result->mem
while result is still NULL:
struct dm_config_tree *get_cachepolicy_params(struct cmd_context *cmd)
{
...
int ok = 0;
...
if (!(result = dm_config_flatten(current)))
goto_out;
...
ok = 1;
out:
if (!ok) {
dm_config_destroy(result)
...
}
...
}
ignore_vg now returns 0 for the FAILED_CLUSTERED case,
so all the ignore_vg 1 cases will return vg's with an
empty vg->pvs, so we do not need to iterate through
vg->pvs to remove the entries from the devices list.
Clean up whitespace problems in that area from the
previous commit.
- Fix problems with recent changes related to skipping in:
. _process_vgnameid_list
. _process_pvs_in_vgs
- Undo unnecessary changes to the code structure and readability.
- Preserve valid but minor changes:
. testing FAILED bit values in ignore_vg
. using "skip" value from ignore_vg instead of "ret" value
. applying the sigint check to the start of all loops
. setting stack backtrace when ECMD_PROCESSED is not returned,
i.e. apply the following pattern:
ret = process_foo();
if (ret != ECMD_PROCESSED)
stack;
if (ret > ret_max)
ret_max = ret;
Extend/fix d8923457b8 commit.
'skip'-ed VG is not holding any lock - so don't unlock such VG.
At the same time simplify the code around and relase VG at a single
place and unlock only not skiped and not ignored VGs.
Rework ignore_vg() API so it properly handles
multiple kind of vg_read_error() states.
Skip processing only otherwise valid VG.
Always return ECMD_FAILED when break is detected.
Check sigint_caught() in front of dm iterator loop.
Add stack for _process failing ret codes.
LVM2.2.02.112/tools/toollib.c:1991: leaked_storage: Variable "iter" going out of scope leaks the storage it points to.
LVM2.2.02.112/lib/filters/filter-usable.c:89: leaked_storage: Variable "f" going out of scope leaks the storage it points to.
LVM2.2.02.112/lib/activate/dev_manager.c:1874: leaked_handle: Handle variable "fd" going out of scope leaks the handle.
Let's use this function for more activations in the code.
'needs_exlusive' will enforce exlusive type for any given LV.
We may want to activate LV in exlusive mode, even when we know
the LV (as is) supports non-exlusive activation as well.
lvcreate -ay -> exclusive & local
lvcreate -aay -> exclusive & local
lvcreate -aly -> exclusive & local
lvcreate -aey -> exclusive (might be on any node).
Tool will use internal activation of unused cache pool to
clear metadata area before next use of cache-pool.
So allow to deactivation unused pool in case some error
case happend and we were not able to deactivation pool
right after metadata wipe.
Because of the recent change to process_each_pv(), the vg is always
provided when the pv is in a vg. is_pv(pv) means the pv is in a vg,
which means that the vg arg will not be NULL, which means the removed
block of code is not needed.
When we are given an existing LV name - it needs to be allowed
to pass in even restricted name as the LV could have existed
long before we introduced some new restriction on prefix/suffix.i
Fix the regression on name limits and drop restriction to be applied
on any existing LVs - only the new created LV names have to be
complient with current name restrictions.
FIXME: we are currently using restricted names incorrectly in few
other places - device_is_usable() skips restricted names,
and udev flags are also incorrectly set for restricted names
so these LVs are not getting links properly.
The warnings arg was used to enable logging of warnings
when reading a PV. This arg is turned into a set of flags
with the WARN_PV_READ flag matching the existing behavior.
A new flag WARN_INCONSISTENT is added that will cause
vg_read_internal() to log the "VG is not consistent"
warning so the various callers do not need to log
this warning themselves.
A new vg_read flag READ_WARN_INCONSISTENT is used from
reporting to enable the WARN_INCONSISTENT flag in
vg_read_internal.
[Committed by agk with cosmetic changes and tweaks.]
Process PVs by iterating through VGs, then iterating through
devices if the command needs to process non-PV devices.
The process_single function can always use the VG and PV args.
[Committed by agk with cosmetic changes and tweaks.]
Copy the same form as the new process_each_vg.
Replace unused struct cmd_vg and cmd_vg_read() replicator
code with struct vg and vg_read() directly.
The failed_lvnames arg is no longer used since the
cmd_vg replicator wrapper was removed.
[Committed by agk with cosmetic changes and tweaks.]
Split VG argument collection from processing.
This allows the two different loops through VGs to
be replaced by a single loop.
Replace unused struct cmd_vg and cmd_vg_read() replicator
code with struct vg and vg_read() directly.
[Committed by agk with cosmetic changes and tweaks.]
There are actually three filter chains if lvmetad is used:
- cmd->lvmetad_filter used when when scanning devices for lvmetad
- cmd->filter used when processing lvmetad responses
- cmd->full_fiilter (which is just cmd->lvmetad_filter + cmd->filter chained together) used
for remaining situations
This patch adds the third one - "cmd->full_filter" - currently this is
used if device processing does not fall into any of the groups before,
for example, devices which does not have the PV label yet and we're just
creating a new one or we're processing the devices where the list of the
devices (PVs) is not returned by lvmetad initially.
Currently, the cmd->full_filter is used exactly in these functions:
- lvmcache_label_scan
- _pvcreate_check
- pvcreate_vol
- lvmdiskscan
- pvscan
- _process_each_label
If lvmetad is used, then simply cmd->full_filter == cmd->filter because
cmd->lvmetad_filter is NULL in this case.
Split internals of extract_vgname into _extract_vgname.
This common code will be used for other similar function.
Reuse skip_dev_dir() instead of less mature coded to skip
device dir.
Instead of duplicating full vg/lv name - allocate string
only vg portion of lv name.
Use lv_is_* macros throughout the code base, introducing
lv_is_pvmove, lv_is_locked, lv_is_converting and lv_is_merging.
lv_is_mirror_type no longer includes pvmove.
When ignoring 'listed' volume, print info message.
(So the final command error message is a bit less confusing,
i.e. when user tries to deactive virtual origin:
> lvchange -an vg/lvol2_vorigin
Ignoring virtual origin logical volume vg/lvol2_vorigin.
One or more specified logical volume(s) not found.
Fix get_pool_params to only read params.
Add poolmetadataspare option to get_pool_params.
Move all profile code into update_pool_params.
Move recalculate code into pool_manip.c
Major update of lvconvert code to handle cache and thin.
related targets.
Code tries to unify handling of cache and thin pools.
Better supports lvm2 syntax:
lvconvert --type cache --cachepool vg/pool vg/cache
lvconvert --type thin --thinpool vg/pool vg/extorg
lvconvert --type cache-pool vg/pool
lvconvert --type thin-pool vg/pool
as well as:
lvconvert --cache --cachepool vg/pool vg/cache
lvconvert --thin --thinpool vg/pool vg/extorg
lvconvert --cachepool vg/pool
lvconvert --thinpool vg/pool
While catching much more command line errors.
(Yet couple paths still needs more tests)
Detects as much cmdline errors prior opening VG.
Uses single lvconvert_name_params to convert LV names.
Detects as much incompatibilies in VG prior prompting.
Uses single prompt to confirm whole conversion.
TODO: still the code needs fixes...
Currently, we have two modes of activation, an unnamed nominal mode
(which I will refer to as "complete") and "partial" mode. The
"complete" mode requires that a volume group be 'complete' - that
is, no missing PVs. If there are any missing PVs, no affected LVs
are allowed to activate - even RAID LVs which might be able to
tolerate a failure. The "partial" mode allows anything to be
activated (or at least attempted). If a non-redundant LV is
missing a portion of its addressable space due to a device failure,
it will be replaced with an error target. RAID LVs will either
activate or fail to activate depending on how badly their
redundancy is compromised.
This patch adds a third option, "degraded" mode. This mode can
be selected via the '--activationmode {complete|degraded|partial}'
option to lvchange/vgchange. It can also be set in lvm.conf.
The "degraded" activation mode allows RAID LVs with a sufficient
level of redundancy to activate (e.g. a RAID5 LV with one device
failure, a RAID6 with two device failures, or RAID1 with n-1
failures). RAID LVs with too many device failures are not allowed
to activate - nor are any non-redundant LVs that may have been
affected. This patch also makes the "degraded" mode the default
activation mode.
The degraded activation mode does not yet work in a cluster. A
new cluster lock flag (LCK_DEGRADED_MODE) will need to be created
to make that work. Currently, there is limited space for this
extra flag and I am looking for possible solutions. One possible
solution is to usurp LCK_CONVERT, as it is not used. When the
locking_type is 3, the degraded mode flag simply gets dropped and
the old ("complete") behavior is exhibited.
The list of strings is used quite frequently and we'd like to reuse
this simple structure for report selection support too. Make it part
of libdevmapper for general reuse throughout the code.
This also simplifies the LVM code a bit since we don't need to
include and manage lvm-types.h anymore (the string list was the
only structure defined there).
There were two bugs before when using pvcreate --restorefile together
with data alignment and its offset specified:
- the --dataalignment was always ignored due to missing braces in the
code when validating the divisibility of supplied --dataalignment
argument with pe_start which we're just restoring:
if (pp->rp.pe_start % pp->data_alignment)
log_warn("WARNING: Ignoring data alignment %" PRIu64
" incompatible with --restorefile value (%"
PRIu64").", pp->data_alignment, pp->rp.pe_start);
pp->data_alignment = 0
The pp->data_alignment should be zeroed only if the pe_start is not
divisible with data_alignment.
- the check for compatibility of restored pe_start was incorrect too
since it did not properly count with the dataalignmentoffset that
could be supplied together with dataalignment
The proper formula is:
X * dataalignment + dataalignmentoffset == pe_start
So it should be:
if ((pp->rp.pe_start % pp->data_alignment) != pp->data_alignment_offset) {
...ignore supplied dataalignment and dataalignment offset...
}
Since commit f12ee43f2e call destroy,
it start to check all VGs are unlocked. However when we become_daemon,
we simply reset locking (since lock is still kept by parent process).
So implement a simple 'reset' flag.
This patch allows users to convert existing logical volumes into
cache pool LVs. Since cache pool LVs consist of data and metadata
sub-LVs, there is also the '--poolmetadata' (similar to thin_pool)
which allows for the specification of the metadata device.
Boolean algebra changes for process_each_lv_in_vg().
1st.
Drop process_lv variable since it's not needed.
2nd.
process_lv was always initilized to 0 - so the condition was always true.
It the condition (!tags_supplied && !lvargs_supplied) evaluates as "true",
process_all is already set to 1, so skip vg tags evaluation.
3rd.
Move check for matching lv name in the front of lv tags check
since this check can't be skipped for lvargs_matched counter.
If this filter evaluates to true, skip lv tags evaluation.
For merging thin snapshot we have to do couple extra
checks before we allow this operation.
We pretend thin-snapshot and thin-origin
are tied together and we have to properly
maintain locking.
Add a PV create which takes a paramters object that
has get/set method to configure PV creation.
Current get/set operations include:
- size
- pvmetadatacopies
- pvmetadatasize
- data_alignment
- data_alignment_offset
- zero
Reference: https://bugzilla.redhat.com/show_bug.cgi?id=880395
Signed-off-by: Tony Asleson <tasleson@redhat.com>
Only reading a single PV works correctly only in very limited circumstances.
Moreover, we can't rely on the MDA available on the PV either, since it may be
out of date in some circumstances (until now, we believed that PVs that have an
empty MDA are always orphans, but this is not 100% reliable either).
Accept --ignoreskippedcluster with pvs, vgs, lvs, pvdisplay, vgdisplay,
lvdisplay, vgchange and lvchange to avoid the 'Skipping clustered
VG' errors when requesting information about a clustered VG
without using clustered locking and still exit with success.
The messages can still be seen with -v.
The traditional style used for optional editable definitions
/* #define X /* */
produces a bogus warning from gcc -Wall.
Rather than suppressing this with -Wno-comment, switch over to
the // comment style.
Udev daemon has recently introduced a limit on the number of udev
processes (there was no limit before). This causes a problem
when calling pvscan --cache -aay in lvmetad udev rules which
is supposed to activate the volumes. This activation is itself
synced with udev and so it waits for the activation to complete
before the pvscan finishes. The event processing can't continue
until this pvscan call is finished.
But if we're at the limit with the udev process count, we can't
instatiate any more udev processes, all such events are queued
and so we can't process the lvm activation event for which the
pvscan is waiting.
Then we're in a deadlock since the udev process with the
pvscan --cache -aay call waits for the lvm activation udev
processing to complete, but that will never happen as there's
this limit hit with the number of udev processes.
The process with pvscan --cache -aay actually times out eventually
(3min or 30sec, depends on the version of udev).
This patch makes it possible to run the pvscan --cache -aay
in the background so the udev processing can continue and hence
we can avoid the deadlock mentioned above.
When creating a new thin pool and there's no profile requested
via "lvcreate --profile ...", inherit any VG profile if it's attached.
Currently this applies to these settings:
allocation/thin_pool_chunk_size
allocation/thin_pool_discards
allocation/thin_pool_zero
Fix and improve handling on sigint.
Always check for signal presence *before* calling of command,
so it will not call the command when break was hit.
If the command has been finished succesfully there is
no problem to mark the command ok and not report interrupt at all.
Fix cuple related stack; reports and assignments.
These settins are customizable by profiles:
allocation/thin_pool_zero
allocation/thin_pool_discards
allocation/thin_pool_chunk_size
activation/thin_pool_autoextend_threshold
activation/thin_pool_autoextend_percent
Previously, we have relied on UUIDs alone, and on lvmcache to make getting a
"new copy" of VG metadata fast. If the code which triggers the activation has
the correct VG metadata at hand (the version which is currently on disk), it can
now hand it to the activation code directly.
Move common code for changing activation state from
vgchange and lvchange to one function.
Fix the order of checks - so we always implicitelly
activate snapshots and thin volumes in exclusive mode,
and we do not allow local deactivation for them.
Keep the flag whether given thin pool argument has been given on command
line or it's been 'estimated'
Call of update_pool_params() must not change cmdline given args and
needs to know this info.
Since there is a need to move this update function into /lib, we cannot
use arg_count().
FIXME: we need some generic mechanism here.
For example, the old call and reference:
find_config_tree_str(cmd, "devices/dir", DEFAULT_DEV_DIR)
...now becomes:
find_config_tree_str(cmd, devices_dir_CFG)
So we're referring to the named configuration ID instead
of passing the configuration path and the default value
is taken from central config definition in config_settings.h
automatically.
To create an Embedding Area during PV creation (pvcreate or as part of
the vgconvert operation), we need to define the Embedding Area size.
The Embedding Area start will be calculated automatically by the tools.
This patch adds --embeddingareasize argument to pvcreate and vgconvert.
Extract restorable PV creation parameters from struct pvcreate_params into
a separate struct pvcreate_restorable_params for clarity and also for better
maintainability when adding any new items later.
The motivation to grab the global lock is to avoid a scan and metadata parsing
for each PV, but the cost of obtaining metadata is _mostly_ mitigated by having
lvmetad around. Not taking the global lock improves throughput when multiple pvs
or related commands are running in parallel, like in RHEV.
Remove no longer needed warning for unsuppoted discards
for non-power-2 lvcreate commands.
(Missed from the patch for the same update in lvchange made
by commit dde5a6c52b)
Configurable settings for thin pool create
if they are not specified on command line.
New supported lvm.conf options are:
allocation/thin_pool_chunk_size
allocation/thin_pool_discards
allocation/thin_pool_zero
Move common functions for lvcreate and lvconvert.
get_pool_params() - read thin pool args.
update_pool_params() - updates/validates some thin args.
It is getting complicated and even few more things will be
implemented, so to avoid reimplementing things differently
in lvcreate and lvconvert code has been splitted
into 2 common functions that allow some future extension.
Use log_warn to print non-fatal warning messages.
Use of log_error would confuse checker for testing
whether proper error has been reported for some real error.
Accept -q as the short form of --quiet.
Suppress non-essential standard output if -q is given twice.
Treat log/silent in lvm.conf as equivalent to -qq.
Review all log_print messages and change some to
log_print_unless_silent.
When silent, the following commands still produce output:
dumpconfig, lvdisplay, lvmdiskscan, lvs, pvck, pvdisplay,
pvs, version, vgcfgrestore -l, vgdisplay, vgs.
[Needs checking.]
Non-essential messages are shifted from log level 4 to log level 5
for syslog and lvm2_log_fn purposes.
In process_each_pv() if we haven't yet scanned and the PV appears
to be an orphan, we must scan the other PVs looking for mdas that
reference it to find out what VG it is in.
1. If the PV has no mdas, we must scan.
2. If the PV has an mda that is not ignored we do not need to scan.
3. If the PV has an mda that is ignored, we do need to scan.
This patch fixes case 3.
> pvs -o +mda_count,vg_mda_count /dev/loop[0123]
PV VG Fmt Attr PSize PFree #PMda #VMda
/dev/loop0 vg3 lvm2 a- 96.00m 96.00m 0 1
/dev/loop1 vg3 lvm2 a- 96.00m 96.00m 1 1
/dev/loop2 vg2 lvm2 a- 96.00m 96.00m 1 2
/dev/loop3 vg2 lvm2 a- 28.00m 28.00m 1 2
Before:
> pvs /dev/loop2 /dev/loop3 /dev/loop0 /dev/loop1 --unbuffered
PV VG Fmt Attr PSize PFree
/dev/loop2 lvm2 a-- 100.00m 100.00m
/dev/loop3 vg2 lvm2 a-- 28.00m 28.00m
/dev/loop0 lvm2 a-- 100.00m 100.00m
/dev/loop1 vg3 lvm2 a-- 96.00m 96.00m
After:
> pvs /dev/loop2 /dev/loop3 /dev/loop0 /dev/loop1 --unbuffered
PV VG Fmt Attr PSize PFree
/dev/loop2 vg2 lvm2 a-- 96.00m 96.00m
/dev/loop3 vg2 lvm2 a-- 28.00m 28.00m
/dev/loop0 vg3 lvm2 a-- 96.00m 96.00m
/dev/loop1 vg3 lvm2 a-- 96.00m 96.00m
We're refererring to 'activation' all over the code and we're talking
about 'LVs being activated' all the time so let's use 'activation/activate'
everywhere for clarity and consistency (still providing the old
'available' keyword as a synonym for backward compatibility with
existing environments).
Read lvm.conf setting for monitoring for each command. So we should not
activate monitoring if the default compilation is set to monitor during
lvconvert commnads.
Patch also removes check for clustered VG and allows to disable monitoring
for clustered VG with the assumption, the problem with monitoring and dmeventd
flag passing for INGNORE is already fixed.
We want to keep this logic -
when LV is extend - extend the LV by at least given amount,
when LV is reduced - reduce the LV by at most given amount.
So for this the rounding needs to be used.
Current logic which seems to satisfy give rule is to round up all
extent values for LV resize upward except for values with '-' sign
that are round downward.
This patch also fixes the problem when lvextend --use-polices tried
to extend LV the by i.e. 20% - but the resulting 20% were smaller
the extent size thus before this patch no extension happened.
Remove DM_THIN_ERROR_DEVICE_ID from API.
Remove API warning.
Drop code that was using DM_THIN_ERROR_DEVICE_ID (already commented)
Remove debug message which slipped in through some previous commit.
Since activation of pool is now independent on thin activation,
user may do whatever he needs - thought preferable thin should stay alive,
but it it will be found inactivate, update_pool will bring the pool up.
To ensure we properly handle LV cluster locking - explicitely do
not allow to change the availability of the thin pool that is in use
for some thin LV.
As soon as the thin volume is created the only way to activate pool
is via implicit dependency.
Ignore thinpool open count for lv/vgchange operations.
leaving behind the LVM-specific parts of the code (convenience wrappers that
handle `struct device` and `struct cmd_context`, basically). A number of
functions have been renamed (in addition to getting a dm_ prefix) -- namely,
all of the config interface now has a dm_config_ prefix.
Move the free_vg() to vg.c and replace free_vg with release_vg
and make the _free_vg internal.
Patch is needed for sharing VG in vginfo cache so the release_vg function name
is a better fit here.
Since format instances will use own memory pool, it's necessary to properly
deallocate it. For now, only fid is deallocated. The PV structure itself
still uses cmd mempool mostly, but anytime we'd like to add a mempool
in the struct physical_volume, we can just rename this fn to free_pv and
add the code (like we have free_vg fn for VGs).
While STRIPE_SIZE_LIMIT * 2 is basically UINT_MAX, 32bit integer
value can already overflow durin arg size parsing.
(This really happens in test where --stripesize 4294967291 is used,
in s390x uint overflow and this test is ineffective.)
Fixing some const warnings - with API change in:
int vg_extend(struct volume_group *vg, int pv_count, const char *const *pv_names,
Change is needed - as lvm2api expects const behaviour here.
So vg_extend() is doing local strdup for unescaping.
skip_dev_dir return const char* from const char* vg_name.
Rest of the patch is cleanup of related warnings.
Also using dm_report_filed_string() API change to simplify
casting in _string_disp and _lvname_disp.
Patch adds extra check for lv_name not being NULL.
Test avoids unneeded strlen call for this case.
Otherwise there is no functional change as test would fail on
size_t comparation even for NULL lv_name (thus there is no risk
of NULL dereference when taking 'true' if branch.
Reported by clang as: Argument with 'nonnull' attribute passed null
Reuse the result of the last strchr() call - make sure, 'st' point is not
null for the next strchr() call.
This is not only undocumented but is is also in violation with --help
documentation.
Using --yes without --force is useful in pvcreate when it detects
old signature.
Allow metadataignore flag to be passed in to pvcreate.
Ideally, more refactoring of the mda allocation / initialization
is warranted, but for now, we just add another parameter to 'add_mda'
to take an existing mda ignored flag. We need to do this or pv_write
loses the state of the mda 'ignored' flag before copying and writing
to disk.
Allow parsing of --vgmetadatacopies for vgcreate. Accept
--metadatacopies as a synonym for --vgmetadatacopies.
Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
Add a field to struct volume_group to later implement metadata
balancing:
- mda_copies: target # of non-ignored mdas in the VG; default 0 (do
not control pv 'ignore mdas' bit.
This patch just adds the parameter to the structures with the default
values but does not modify any commands. Should be no functional change.
Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
As for _process_one_vg() we need similar retry loop for
process_each_lv_in_vg(). This patch retries to process
failed LVs with reopened VGs.
Patch does not add any extra repeated invocations if there is not
found any missing VG during LV processing.
Patch modifes behavior of _process_one_vg().
In the first pass vg_read() collectis for replicator sorted list of
additional VGs during lock_vol().
If any other VG is needed by the replicator and it is not yet opened
then next iteration loop is taken with all collected VGs.
Flag vg->cmd_missing_vgs detects missing VGs.
Patch adds failed_lvnames to the list of parameters for process_each_lv_in_vg().
If the list is not NULL it will be filled with LV names of failing LVs
during function execution.
Application could later reiterate only on failed LVs.