IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
Use the new pvcreate_each_device() function from
toollib, previously added for pvcreate, in place
of the old pvcreate_vol().
This also requires shifting the location where the
lock is acquired for the new VG name. The lock for
the new VG is supposed to be acquired before pvcreate.
This means splitting the vg_lock_newname() out of
vg_create(), and calling vg_lock_newname() directly
before pvcreate, and then calling the remainder of
vg_create() after pvcreate.
The new function vg_lock_and_create() now does
vg_lock_newname() + vg_create(), like the previous
version of vg_create().
The lock on the new VG name is released before the
pvcreate and reacquired after the pvcreate because
pvcreate needs to reset lvmcache, which doesn't work
when locks are held. An exception could likely be
made for the new VG name lock, which would allow
vgcreate to hold the new VG name lock across the
pvcreate step.
This is common code for handling PV create/remove
that can be shared by pvcreate/vgcreate/vgextend/pvremove.
This does not change any commands to use the new code.
- Pull out the hidden equivalent of process_each_pv
into an actual top level process_each_pv.
- Pull the prompts to the top level, and do not
run any prompts while locks are held.
The orphan lock is reacquired after any prompts are
done, and the devices being created are checked for
any change made while the lock was not held.
Previously, pvcreate_vol() was the shared function for
creating a PV for pvcreate, vgcreate, vgextend.
Now, it will be toollib function pvcreate_each_device().
pvcreate_vol() was called effectively as a helper, from
within vgcreate and vgextend code paths.
pvcreate_each_device() will be called at the same level
as other process_each functions.
One of the main problems with pvcreate_vol() is that
it included a hidden equivalent of process_each_pv for
each device being created:
pvcreate_vol() -> _pvcreate_check() ->
find_pv_by_name() -> get_pvs() ->
get_pvs_internal() -> _get_pvs() -> get_vgids() ->
/* equivalent to process_each_pv */
dm_list_iterate_items(vgids)
vg = vg_read_internal()
dm_list_iterate_items(&vg->pvs)
pvcreate_each_device() reorganizes the code so that
each-VG-each-PV loop is done once, and uses the standard
process_each_pv function at the top level of the function.
This uses the vg->pv_write_list in place of the
vg->pvs_to_write list, and eliminates the use of
pvcreate_params. The label remove and zeroing
steps are shifted out of vg_write() to the higher
level like pvcreate will do.
If we know that a PV belongs to some VG and we're missing metadata
(because we have only those PV(s) from VG present in the system that
don't have metadata areas), we should skip such PV when processing
under system ID.
This is because we know that the PV belongs to some VG, but we
really can't decide whether it matches system ID unless the VG
metadata is present again.
The backup_restore_vg is used directly for restoring the VG from backup.
It's also used to do the VG conversions from one metadata format to
another which means vgconvert calls backup_restore_vg too.
When restoring VG from backup, we need to rewrite/write PV headers as
PVs may have been orphans before and now they're becoming part of some
VG - we need to write the PV_EXT_USED flag at least.
When using the backup_restore_vg for vgconvert, we need to write
completely new PV header in different format.
Avoid the special "pv_write" call and handling that was used before
this patch in vgconvert (vgconvert_single function to be more precise)
and reuse existing internal interface to register PV header for writing
(or rewriting) via vg->pvs_to_write list instead like we do it elsewhere
in the code.
This patch also resolves a problem in which PV headers with target
format were written in the vgconvert_single fn as orphans and VG
metadata were added later on - this was a tiny hack actually.
We can't do this now - we need to write the PV as belonging
to a VG because otherwise the PV_EXT_USED flag won't be written
properly (if the PV header is written as orphan, the PV_EXT_USED
is set to 0, of course, even though metadata are attached later).
So this patch removes this tiny inconsistency which was passing
just fine before because we didn't have any relation to the VG
in PV header before. Now we have the PV_EXT_USED flag which says
the "PV is used in some VG".
If we know that the PV is orphan, meaning there's at least one MDA on
that PV which does not reference any VG and at the same time there's
PV_EXT_USED flag set, we're certainly in an inconsistent state and we
need to fix this.
For example, such situation can happen during vgremove/vgreduce if we
removed/reduced the VG, but we haven't written PV headers yet because
vgremove stopped abruptly for whatever reason just before writing new
PV headers with updated state, including PV extension flags (and so the
PV_EXT_USED flag).
However, in case the PV has no MDAs at all, we can't double-check
whether the PV_EXT_USED is correct or not - if that PV is marked
as used, it's either:
- really used (but other disks with MDAs are missing)
- or the error state as described above is hit
User needs to overwrite the PV header directly if it's really clear
the PV having no MDAs does not belong to any VG and at the same time
it's still marked as being in use (pvcreate -ff <dev_name> will fix this).
For example - /dev/sda here has 1 MDA, orphan and is incorrectly marked
with PV_EXT_USED flag:
$ pvs --binary -o+pv_in_use
WARNING: Found inconsistent standalone Physical Volumes.
WARNING: Repairing flag incorrectly marking Physical Volume /dev/sda as used.
PV VG Fmt Attr PSize PFree InUse
/dev/sda lvm2 --- 128.00m 128.00m 0
Make sure we won't use a PV that is already marked as used. Normally,
VG metadata would stop us from doing that, but we can run into a
situation where such metadata is missing because PVs with MDAs
are missing and the PVs left are the ones with 0 MDAs.
(/dev/sda in this example has 0 MDAs and it belongs to a VG,
but other PVs with MDA are missing)
$ pvs -o pv_name,pv_mda_count /dev/sda
PV #PMda
/dev/sda 0
$ pvcreate /dev/sda
PV '/dev/sda' is marked as belonging to a VG but its metadata is missing.
Can't initialize PV '/dev/sda' without -ff.
$ pvchange -u /dev/sda
PV '/dev/sda' is marked as belonging to a VG but its metadata is missing.
Can't change PV '/dev/sda' without -ff.
Physical volume /dev/sda not changed
0 physical volumes changed / 1 physical volume not changed
$ pvremove /dev/sda
PV '/dev/sda' is marked as belonging to a VG but its metadata is missing.
(If you are certain you need pvremove, then confirm by using --force twice.)
$ vgcreate vg /dev/sda
Physical volume '/dev/sda' is marked as belonging to a VG but its metadata is missing.
Unable to add physical volume '/dev/sda' to volume group 'vg'.
This is a hotfix for a bug introduced in
6d7dc87cb3.
The bug description: First we allocate memory for
processing handle (at an address 1) then we
allocate some memory on the same pool for later use
in pvmove_poll function inside the process_each_pv
function (at an address 2). After we jump out of
process_each_pv we called destroy_processing_handle.
As a result of destroying the handle memory pool could
deallocate all memory at address 1 or higher. The
pvmove_poll function tried to copy a memory allocated
at address 2 that could be returned to the system.
If it was so it led to segfault.
We need to rethink proper fix but in the same time
cmd->mem pool is recreated per each lvm command so
this should not cause problems even when we run
multiple commands in lvm shell.
A valgrind snapshot of the corruption:
Invalid read of size 1
at 0x4C29F92: strlen (mc_replace_strmem.c:403)
by 0x5495F2E: dm_pool_strdup (pool.c:51)
by 0x1592A7: _create_id (pvmove.c:774)
by 0x159409: pvmove_poll (pvmove.c:796)
by 0x1599E3: pvmove (pvmove.c:931)
by 0x15105B: lvm_run_command (lvmcmdline.c:1655)
by 0x1523C3: lvm2_main (lvmcmdline.c:2121)
by 0x1754F3: main (lvm.c:22)
Address 0xf15df8a is 138 bytes inside a block of size 8,192 free'd
at 0x4C28430: free (vg_replace_malloc.c:446)
by 0x5494E73: dm_free_wrapper (dbg_malloc.c:357)
by 0x5495DE2: _free_chunk (pool-fast.c:318)
by 0x549561C: dm_pool_free (pool-fast.c:151)
by 0x164451: destroy_processing_handle (toollib.c:1837)
by 0x1598C1: pvmove (pvmove.c:903)
by 0x15105B: lvm_run_command (lvmcmdline.c:1655)
by 0x1523C3: lvm2_main (lvmcmdline.c:2121)
by 0x1754F3: main (lvm.c:22)
Fix regression caused by c9f021de0b.
This commit actually transfered real-action (e.g. device removal)
into the next loop which has however missed to check for break.
So add check for break also there.
When creating a list in 'context of command' - use proper mempool.
vg->vgmem is mempool related to VG metadata - and can be eventually
locked read-only when VG struct is shared.
The extent size must fits all blocks in 4294967295 sectors
(in 512b units) this is 1/2 KiB less then 2TiB.
So while previous statement 'suggested' 2TiB is still acceptable value,
make it clear it's not.
As now we support any multiples of 128KB as extent size -
values like 2047G will still 'flow-in' otherwise the largest power-of-2
supported value is 1TiB.
With 1TiB user needs 8388608 extents for 8EiB device.
(FYI such device is already unusable with todays glibc-2.22.90-27)
4GiB extent size is currently the smallest extent size which allows
a user to create 8EiB devices (with 2GiB it's less then 8EiB).
TODO: lvm2 may possibly print amount of 'lost/unused space' on a PV,
since using such ridiculously sized extent size may result in huge
space being left unaccessible.
Add a comment in _process_pvs_in_vg() to document the
place where there have been problems with processing
PVs twice.
For a while we had a hacky workaround here where we'd
skip processing a PV if its device wasn't found in
all_devices (and !is_missing_pv since we want to
process PVs with missing devices.). That workaround
was removed in commit 5cd4d46f because it was no
longer needed.
The workaround had originally been needed to prevent
a device from being processed twice when the PV had
no MDAs -- it would be processed once in its real VG
and then the workaround would prevent it from being
processed a second time in the orphan VG.
Wrongly appearing as an orphan likely happened because
lvmcache would consider the no-MDA PV an orphan unless
the real VG holding that PV was also in lvmcache.
This issue is also mentioned in pvchange where holding
the global lock allows VGs to remain in lvmcache so
PVs with 0 mdas are not considered orphans.
The workaround in _process_pvs_in_vg() was originally
intended for reporting commands, not for pvchange.
But, it was accidentally helping pvchange also because
the method described by the pvchange global lock
comment had been subverted by commit 80f4b4b8.
Commit 80f4b4b8 was found to be unnecessary, and was
reverted in commit e710bac0. This restored the
intended global lock lvmcache effect to pvchange, and
it no longer relied on the workaround in toollib.
Previously, pvmove used the function find_pv_in_vg() which did the
equivalent of process_each_pv() by doing:
find_pv_by_name() -> get_pvs() ->
get_pvs_internal() -> _get_pvs() -> get_vgids() ->
/* equivalent to process_each_pv */
dm_list_iterate_items(vgids)
vg = vg_read_internal()
dm_list_iterate_items(&vg->pvs)
With the found 'pv', it would do vg_read() on pv_vg_name(pv),
and then do the actual pvmove processing.
This commit simplifies by using process_each_pv() and putting
the actual pvmove processing into the "single" function.
This eliminates both find_pv_by_name() and the vg_read().
The processing code that followed vg_read remains the same.
The return code for the pvmove command is not based on the
process_each_pv return code, but is based on the success/fail
conditions in the existing code.
When an orphan PV is changed/resized, the
lvmlockd global lock is converted from sh
to ex. If the command is changing two
orphan PVs, the conversion to ex should
be done only once.
The problem addressed by this workaround no longer
seems to exist, so remove it. PVs with no mdas
no longer appear in both their actual VG and in
the orphan VG.
Use process_each_vg() to lock and read the old VG,
and then call the main vgrename code.
When real VG names are used (not a UUID in place of the
old name), the command still pre-locks the new name
(when strcmp wants it locked first), before calling
process_each_vg on the old name.
In the case where the old name is replaced with a UUID,
process_each_vg now translates that UUID into the real
VG name, which it locks and reads. In this case, we
cannot do pre-locking to maintain lock ordering because
the old name is unknown. So, in this case the strcmp
based lock ordering is suppressed and the old name is
always locked first. This opens a remote chance for
lock ordering conflict between racing vgrenames between
two names where one or both commands use the UUID.
Also always clear the internal lvmcache after rescanning, and
reinstate a test for --trustcache so that 'pvs --trustcache'
(for example) avoids rescanning.
Before commit c1f246fedf,
_get_all_devices() did a full device scan before
get_vgnameids() was called. The full scan in
_get_all_devices() is from calling dev_iter_create(f, 1).
The '1' arg forces a full scan.
By doing a full scan in _get_all_devices(), new devices
were added to dev-cache before get_vgnameids() began
scanning labels. So, labels would be read from new devices.
(e.g. by the first 'pvs' command after the new device appeared.)
After that commit, _get_all_devices() was called
after get_vgnameids() was finished scanning labels.
So, new devices would be missed while scanning labels.
When _get_all_devices() saw the new devices (after
labels were scanned), those devices were added to
the .cache file. This meant that the second 'pvs'
command would see the devices because they would be
in .cache.
Now, the full device scan is factored out of
_get_all_devices() and called by itself at the
start of the command so that new devices will
be known before get_vgnameids() scans labels.
In general, --select should be used to specify a VG by UUID,
but vgrename already allows a uuid to be substituted for
the name, so continue to allow it in that case.
If the VG arg from the command line does not match the
name of any known VGs, then check if the arg looks like
a UUID. If it's a valid UUID, then compare it to the
UUID of known VGs. If it matches the UUID of a known VG,
then process that VG.
Pass the single vgname as a new process_each_vg arg
instead of setting a cmd flag to tell process_each_vg
to take only the first vgname arg from argv.
Other commands with different argv formats will be
able to use it this way.
If two different VGs with the same name exist on the system,
a command that just specifies that ambiguous name will fail
with a new error:
$ vgs -o name,uuid
...
foo qyUS65-vn32-TuKs-a8yF-wfeQ-7DkF-Fds0uf
foo vfhKCP-mpc7-KLLL-Uh08-4xPG-zLNR-4cnxJX
$ lvs foo
Multiple VGs found with the same name: foo
Use the --select option with VG UUID (vg_uuid).
$ vgremove foo
Multiple VGs found with the same name: foo
Use the --select option with VG UUID (vg_uuid).
$ lvs -S vg_uuid=qyUS65-vn32-TuKs-a8yF-wfeQ-7DkF-Fds0uf
lv1 foo ...
This is implemented for process_each_vg/lv, and works
with or without lvmetad. It does not work for commands
that do not use process_each.
This change includes one exception to the behavior shown
above. If one of the VGs is foreign, and the other is not,
then the command assumes that the intended VG is the local
one and uses it.
This makes process_each_vg/lv always use the list of
vgnames on the system. When specific VGs are named on
the command line, the corresponding entries from
vgnameids_on_system are moved to vgnameids_to_process.
Previously, when specific VGs were named on the command
line, the vgnameids_on_system list was not created, and
vgnameids_to_process was created from the arg_vgnames
list (which is only names, without vgids).
Now, vgnameids_on_system is always created, and entries
are moved from that list to vgnameids_to_process -- either
some (when arg_vgnames specifies only some), or all (when
the command is processing all VGs, or needs to look at
all VGs for checking tags/selection).
This change adds one new lvmetad lookup (vg_list) to a
command that specifies VG names. It adds no new work
for other commands, e.g. non-lvmetad commands, or
commands that look at all VGs.
When using lvmetad, 'lvs foo' previously sent one
request to lvmetad: 'vg_lookup foo'.
Now, 'lvs foo' sends two requests to lvmetad:
'vg_list' and 'vg_lookup foo <uuid>'.
(The lookup can now always include the uuid in the request
because the initial vg_list contains name/vgid pairs.)
Just for convenience to display all new configuration settings
introduced since given version (before, there was only --atversion
to display settings introduced in concrete version).
For example:
$ lvmconfig --type new --sinceversion 2.2.120
allocation {
# cache_mode="writethrough"
# cache_settings {
# }
}
global {
use_lvmlockd=0
# lvmlockd_lock_retries=3
# sanlock_lv_extend=256
use_lvmpolld=1
}
activation {
}
# report {
# compact_output_cols=""
# time_format="%Y-%m-%d %T %z"
# }
local {
# host_id=0
}
Avoid internal error message where thin pool repair code tries to
fix cache pool - was catched later in code stack, so rather
catch this early and make the repair function exlusive
to thin pools.
So far we have no code for repairing cache pools
(other then the automatic during activation/deactivation).
Add missing display_lvname in _lvconvert_merge_thin_snapshot().
Also when we detect missing origin, report Internal error,
which would likely be the primary fault here
(and avoid dereft of NULL origin as noticed by Coverity).
When the first arg is a UUID and vgrename translates
that UUID to a current VG name, the old and new VG
names are not being checked for equality. If they
are equal, it produces an internal error rather than
a proper error.
Coverity here is not fully-in-picture - but please it
with validation of pointer which currently cannot be null,
since we always return at least empty string.
This option could never have been printed in lvm2 metadata, so it could
be safely removed as it could have been set only as 0.
These configurable setting is supported via metadata profile.
The recent addition to check for PVs that were
missed during the first iteration of processing
was unintentionally catching duplicate PVs because
duplicates were not removed from the all_devices
list when the primary dev was processed.
Also change a message from warn back to verbose.
If a VG is removed between the time that 'vgs'
or 'lvs' (with no args) creates the list of VGs
and the time that it reads the VG to process it,
then ignore the removed VG; don't report an error
that it could not be found, since it wasn't named
by the command.
PVs could be missing from the 'pvs' output if
their VG was removed at the same time that the
'pvs' command was run. To fix this:
1. If a VG is not found when processed, don't
silently skip the PVs in it, as is done when
the "skip" variable is set.
2. Repeat the VG search if some PVs are not
found on the first search through all VGs.
The second search uses a specific list of
PVs that were missed the first time.
testing:
/dev/sdb is a PV
/dev/sdd is a PV
/dev/sdg is not a PV
each test begins with:
vgcreate test /dev/sdb /dev/sdd
variations to test:
vgremove -f test & pvs
vgremove -f test & pvs -a
vgremove -f test & pvs /dev/sdb /dev/sdd
vgremove -f test & pvs /dev/sdg
vgremove -f test & pvs /dev/sdb /dev/sdg
The pvs command should always display /dev/sdb
and /dev/sdd, either as a part of VG test or not.
The pvs command should always print an error
indicating that /dev/sdg could not be found.
Commit 1a74171ca5 added
a check to ignore a VG that was FAILED_INCONSISTENT
if the command doesn't care if the VG is not found.
Remove that check because that case is never reached
by the current code.
The ONE_VGNAME_ARG was being passed and tested as
vg_read() flag but it's a cmd struct flag.
(It affects command arg processing in toollib,
not vg_read behavior. Flags related to command
processing are generally cmd struct flags, while
vg_read arg flags are generally related to vg_read
behavior.)
Running "vgremove -f VG & pvs" results in the pvs
command reporting that the VG is not found or is
inconsistent. If the VG is gone or being removed,
the pvs command should just skip it and not print
errors about it.
"Not found" is because the pvs command created the
list of VGs to process, including VG, then vgremove
removed the VG, then the pvs command came to to read
the VG to process it and did not find it.
An "inconsistent" error could be reported if vgremove
had only partially completed removing VG when pvs did
vg_read on the VG to process it, causing pvs to find
the VG in a partially-removed state.
This fix adds a flag that pvs uses to ignore a VG
that can't be read or is inconsistent.
Make lvm2_disable_dmeventd_monitoring() more explicit.
As memlock_inc_daemon() is also used by clvmd, which
does changes dmeventd and suspend ignore state at
some stages - make updates of these 2 variable
tied to the call of lvm2_disable_dmeventd_monitoring().
Once this call is made dmeventd monitoring
and suspended devices are ignored.
TODO: all lvm-global settings should really be moved
to command context.
CONVERTING status flag is a tricky one. It's not set when converting
a non-mirror LV type to the mirror type, i.e.: linear -> two leg mirror.
Also the conversion itself is instant and doesn't require to be polled.
When mirror reaches sync state there's no final update on VG metadata
for lvmpolld to be made thereby report_progress in fact doesn't report
percentage of mirror being converted but percentage of mirror
being in sync. Perhaps we should reword the lvconvert output here.
On the other hand CONVERTING is set while we upconvert the mirror
from i.e. two leg mirror to four leg mirror. In such case the operation
is required to be polled so that lvmpolld can cleanup temporary
conversion log when the conversion is over.
Ignore CONVERTING lv_type for the moment and match LVs only by uuids
during 'mirror conversion'/'waiting for a sync to finish'.
The old code made two loops through the PVs: in the first
loop it found the max PV and VG name lengths, and in the
second loop it printed each PV using the name lengths as
field widths for aligning columns.
The new code uses process_each_pv() which makes one loop
through the PVs. In the *first* call to pvscan_single(),
the max name lengths are found by looping through the
lvmcache entries which have been populated by the generic
process_each code prior to calling any _single functions.
Subsequent calls to pvscan_single() reuse the max lengths
that were found by the first call.
The new report/compact_output_cols setting has exactly the same effect
as report/compact_output setting. The difference is that with the new
setting it's possible to define which cols should be compacted exactly
in contrast to all cols in case of report/compact_output.
In case both compact_output and compact_output_cols is enabled/set,
the compact_output prevails.
For example:
$ lvmconfig --type full report/compact_output report/compact_output_cols
compact_output=0
compact_output_cols=""
$ lvs vg
LV VG Attr LSize Pool Origin Data% Meta% Move Log Cpy%Sync Convert
lvol0 vg -wi-a----- 4.00m
---
$ lvmconfig --type full report/compact_output report/compact_output_cols
compact_output=0
compact_output_cols="data_percent,metadata_percent,pool_lv,move_pv,origin"
$ lvs vg
LV VG Attr LSize Log Cpy%Sync Convert
lvol0 vg -wi-a----- 4.00m
---
$ lvmconfig --type full report/compact_output report/compact_output_cols
compact_output=1
compact_output_cols="data_percent,metadata_percent,pool_lv,move_pv,origin"
$ lvs vg
LV VG Attr LSize
lvol0 vg -wi-a----- 4.00m
If 'vgcreate --shared' finds both sanlock and dlm are running,
print a more accurate error message:
"Found multiple lock managers, select one with --lock-type."
When neither is running, we still print:
"Failed to detect a running lock manager to select lock type."
Using --lock-type sanlock|dlm implies --shared.
Using --shared selects lock type sanlock|dlm
(by choosing the one that's running.)
Using both --shared and --lock-type sanlock|dlm should
also be allowed (--shared is just redundant information.)
When user specifies '--force' with remove/remove_all/wipe_table
use '--noflush --nolockfs' resume flags, so the operation
will not block when device underneath is blocked.
Since we may want to swap names when LVs are complex types, we cannot
avoid doing full renames on both LV stacks.
Temporarily use 'pvmove_tmeta' as unused name to prevent validation troubles.
ATM allocation can't handle stripping and cache pool allocation.
It's not yet even clear what should be actually result.
Until resolved, disable this option (it's been coredumping
inside allocation anyway).
Certain stacks of cached LVs may have unexpected consequences.
So add a warning function called when LV is cached to detect
such caces and WARN user about them - the best we could do ATM.