In order to support striped raid5/6/10 LV reshaping (change
of LV type, stripesize or number of legs), this patch
introduces infrastructure prerequisites to be used
by raid_manip.c extensions in follow-up patches.
This base is needed for allocation of the out-of-place
reshape space required by the MD raid personalities to
avoid writing over data in place when reading off the
current RAID layout or number of legs and writing out
the new layout or to a different number of legs
(i.e. restripe).
Changes:
- add member reshape_len to 'struct lv_segment' to store
out-of-place reshape length per component rimage
- add member data_copies to struct lv_segment
to support more than 2 raid10 data copies
- make alloc_lv_segment() aware of both reshape_len and data_copies
- adjust all alloc_lv_segment() callers to the new API
- add functions to retrieve the current data offset (needed for
out-of-place reshaping space allocation) and the devices count
from the kernel
- make libdm deptree code aware of reshape_len
- add LV flags for disk add/remove reshaping
- support import/export of the new 'struct lv_segment' members
- enhance lv_extend/_lv_reduce to cope with reshape_len
- add seg_is_*/segtype_is_* macros related to reshaping
- add target version check for reshaping
- grow rebuilds/writemostly bitmaps to 246 bits to support the kernel maximum
- enhance libdm deptree code to support data_offset (out-of-place reshaping)
and delta_disk (legs add/remove reshaping) target arguments
Related: rhbz834579
Related: rhbz1191935
Related: rhbz1191978
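The bitmap change above can be pictured with a minimal sketch of a
multi-word bitmap. All names and sizes here (sketch_bitmap, four 64-bit
words) are assumptions made for illustration and are not taken from the
patch; the sketch only shows how a per-leg flag can be set and tested once
the number of legs no longer fits into a single machine word.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical sizes: four 64-bit words give 256 bits, enough for 246 legs. */
#define SKETCH_BITMAP_WORDS 4
#define SKETCH_BITS_PER_WORD 64
#define SKETCH_BITMAP_BITS (SKETCH_BITMAP_WORDS * SKETCH_BITS_PER_WORD)

struct sketch_bitmap {
	uint64_t word[SKETCH_BITMAP_WORDS];
};

/* Reset all bits to 0. */
static void sketch_bitmap_clear(struct sketch_bitmap *bm)
{
	assert(bm);
	memset(bm, 0, sizeof(*bm));
}

/* Flag leg 'idx' (e.g. as needing a rebuild); returns 0 if idx is out of range. */
static int sketch_bitmap_set(struct sketch_bitmap *bm, unsigned idx)
{
	if (idx >= SKETCH_BITMAP_BITS)
		return 0;

	bm->word[idx / SKETCH_BITS_PER_WORD] |=
		UINT64_C(1) << (idx % SKETCH_BITS_PER_WORD);

	return 1;
}

/* Returns 1 if leg 'idx' is flagged, 0 if not flagged or out of range. */
static int sketch_bitmap_test(const struct sketch_bitmap *bm, unsigned idx)
{
	if (idx >= SKETCH_BITMAP_BITS)
		return 0;

	return (int) ((bm->word[idx / SKETCH_BITS_PER_WORD] >>
		       (idx % SKETCH_BITS_PER_WORD)) & 1);
}
```

A caller would clear the bitmap once and then set one bit per affected leg;
indices beyond the bitmap size are rejected rather than silently wrapped.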
When a command calls backup() more than once (which is actually not
wanted), this warning message is shown repeatedly:
"WARNING: This metadata update is NOT backed up."
Print the message just once instead, to confuse the user less.
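One common way to print a warning only once is a static flag, sketched
below. The function and variable names are hypothetical and the actual
patch may implement this differently; only the message text comes from
the description above.

```c
#include <stdio.h>

/* Hypothetical flag: becomes 1 after the warning was printed once. */
static int sketch_backup_warned;

/* Returns 1 if this call printed the warning, 0 if it was suppressed. */
static int sketch_warn_not_backed_up(FILE *out)
{
	if (sketch_backup_warned)
		return 0;

	sketch_backup_warned = 1;
	fputs("WARNING: This metadata update is NOT backed up.\n", out);

	return 1;
}
```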
Previously, a command sent lvmetad new VG metadata in vg_commit().
In vg_commit(), devices are suspended, so any memory allocation
done by the command while sending to lvmetad, or by lvmetad while
updating its cache could deadlock if memory reclaim was triggered.
Now lvmetad is updated in unlock_vg(), after devices are resumed.
The new method for updating VG metadata in lvmetad is in two phases:
1. In vg_write(), before devices are suspended, the command sends
lvmetad a short message ("set_vg_info") telling it what the new
VG seqno will be. lvmetad sees that the seqno is newer than
the seqno of its cached VG, so it sets the INVALID flag for the
cached VG. If sending the message to lvmetad fails, the command
fails before the metadata is committed and the change is not made.
If sending the message succeeds, vg_commit() is called.
2. In unlock_vg(), after devices are resumed, the command sends
lvmetad the standard vg_update message with the new metadata.
lvmetad sees that the seqno in the new metadata matches the
seqno it saved from set_vg_info, and knows it has the latest
copy, so it clears the INVALID flag for the cached VG.
If a command fails between 1 and 2 (after committing the VG on disk,
but before sending lvmetad the new metadata), the cached VG retains
the INVALID flag in lvmetad. A subsequent command will read the
cached VG from lvmetad, see the INVALID flag, ignore the cached
copy, read the VG from disk instead, update the lvmetad copy
with the latest copy from disk (this clears the INVALID flag
in lvmetad), and use the correct VG metadata for the command.
(This INVALID mechanism already existed for use by lvmlockd.)
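The two phases can be modelled with a small state sketch. This is not the
lvmetad code: the struct, its fields and the function names are
assumptions chosen to mirror the description above, and only the paths
the text describes are covered (announce the seqno, deliver the matching
metadata, check whether the cached copy may be used).

```c
#include <stdint.h>

/* Hypothetical model of one cached VG inside the daemon. */
struct sketch_cached_vg {
	uint32_t seqno;         /* seqno of the cached metadata */
	uint32_t pending_seqno; /* seqno announced in phase 1 */
	int invalid;            /* models the INVALID flag */
};

/* Phase 1 ("set_vg_info"): sent before devices are suspended. */
static void sketch_set_vg_info(struct sketch_cached_vg *c, uint32_t new_seqno)
{
	if (new_seqno > c->seqno) {
		c->pending_seqno = new_seqno;
		c->invalid = 1;
	}
}

/* Phase 2 ("vg_update"): sent after devices are resumed. */
static void sketch_vg_update(struct sketch_cached_vg *c, uint32_t seqno)
{
	c->seqno = seqno;

	/* The flag is cleared only when the announced seqno arrives. */
	if (seqno == c->pending_seqno)
		c->invalid = 0;
}

/* Returns 1 if a reader may use the cached copy, 0 if it must read from disk. */
static int sketch_cache_usable(const struct sketch_cached_vg *c)
{
	return !c->invalid;
}
```

If the command dies between the two calls, the flag stays set, so the next
reader falls back to disk and repopulates the cache with its own update.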
Previously, vgcfgrestore would attempt to vg_remove the
existing VG from lvmetad and then vg_update to add the
restored VG. But, if there was a failure in the command
or with vg_update, the lvmetad cache would be left incorrect.
Now, disable lvmetad before the restore begins, and then
rescan to populate lvmetad from disk after restore has
written the new VG to disk.
A number of places are working on a specific dev when they
call lvmcache_info_from_pvid() to look up an info struct
based on a pvid. In those cases, pass the dev being used
to lvmcache_info_from_pvid(). When a dev is specified,
lvmcache_info_from_pvid() will verify that the cached
info it's using matches the dev being processed before
returning the info. Calling code will not mistakenly
get info for the wrong dev when duplicate devs exist.
This confusion was happening when scanning labels when
duplicate devs existed. label_read for the first dev
would add an info struct to lvmcache for that dev/pvid.
label_read for the second dev would see the pvid in
lvmcache from first dev, and mistakenly conclude that
the label_read from the second dev can be skipped
because it's already been done. By verifying that the
dev for the cached pvid matches the dev being read,
this mismatch is avoided and the label is actually read
from the second duplicate.
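The dev check can be illustrated with a simplified lookup. The types and
the function below are hypothetical stand-ins (a flat array instead of the
real lvmcache structures); the sketch only shows the rule that a cached
info is returned for a pvid when no dev is given, or when the given dev is
the one the info was cached for.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the device and cached info structs. */
struct sketch_dev {
	int id;
};

struct sketch_info {
	const char *pvid;
	const struct sketch_dev *dev;
};

/*
 * Look up the cached info for 'pvid'.
 * If 'dev' is non-NULL, the info is returned only when it was cached
 * for that same device; a mismatch (duplicate PV) yields NULL.
 */
static const struct sketch_info *
sketch_info_from_pvid(const struct sketch_info *cache, size_t count,
		      const char *pvid, const struct sketch_dev *dev)
{
	size_t i;

	for (i = 0; i < count; i++) {
		if (strcmp(cache[i].pvid, pvid))
			continue;

		if (dev && cache[i].dev != dev)
			return NULL;

		return &cache[i];
	}

	return NULL;
}
```

With this rule, a label scan of the second duplicate gets NULL back and
therefore does not conclude that its label has already been read.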
The lvmetad connection is created within the
init_connections() path during command startup,
rather than via the old lvmetad_active() check.
The old lvmetad_active() checks are replaced
with lvmetad_used() which is a simple check that
tests if the command is using/connected to lvmetad.
The old lvmetad_set_active(cmd, 0) calls, which
stopped the command from using lvmetad (to revert to
disk scanning), are replaced with lvmetad_make_unused(cmd).
The code in the _print_historical_lv function works with a temporary
"descendants_buffer" that is allocated and freed within this
function.
When printing text out, we used the "outf" macro, which calls
"out_text", checks its return value and, on failure,
calls "return_0" automatically. But since we
use the temporary buffer, if any of the out_text calls
fails, we need to deallocate this buffer properly - that's
what "goto_out" is for, otherwise we'd be leaking memory.
So add new "outfgo" helper macro which does the same as "outf",
but it calls "goto_out" instead of "return_0" so we can jump
to a cleanup hook at the end.
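The difference between the two macro styles can be sketched as follows.
The macros and helpers below are simplified, hypothetical stand-ins
(prefixed "sketch_") and not the definitions from the LVM2 tree; they only
show why a "goto out" variant is needed once a function owns a temporary
buffer.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for out_text(): returns 1 on success, 0 on failure. */
static int sketch_out_text(FILE *fp, const char *text)
{
	if (!fp)
		return 0;

	return fputs(text, fp) >= 0;
}

/* "outf" style: leave the enclosing function immediately on failure. */
#define sketch_outf(fp, text) \
	do { if (!sketch_out_text((fp), (text))) return 0; } while (0)

/* "outfgo" style: jump to the enclosing function's "out" label instead. */
#define sketch_outfgo(fp, text) \
	do { if (!sketch_out_text((fp), (text))) goto out; } while (0)

static int sketch_free_count; /* counts buffer releases, for demonstration */

/* Prints a small section using a temporary buffer; returns 1 on success. */
static int sketch_print_with_buffer(FILE *fp)
{
	int r = 0;
	char *buf = malloc(32);

	if (!buf)
		return 0;

	strcpy(buf, "descendants = []\n");

	sketch_outfgo(fp, "lvol2 {\n");
	sketch_outfgo(fp, buf);
	sketch_outfgo(fp, "}\n");

	r = 1;
out:
	/* Reached on success and on failure, so the buffer never leaks. */
	free(buf);
	sketch_free_count++;

	return r;
}
```

Using sketch_outf in sketch_print_with_buffer would return before free()
on the first failed write, which is exactly the leak described above.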
Also export historical LVs when exporting LVM2 metadata.
This is a list of all historical LVs, stored in the
"historical_logical_volumes" metadata section with all
the properties exported for each historical LV.
For example, we have this thin snapshot sequence:
  lvol1 --> lvol2 --> lvol3
                  \
                   --> lvol4
We end up with these metadata:
logical_volumes {
    ...
    (lvol1, lvol3 and lvol4 listed here as usual - no change here)
    ...
}
historical_logical_volumes {
    lvol2 {
        id = "S0Dw1U-v5sF-LwAb-W9SI-pNOF-Madd-5dxSv5"
        creation_time = 1456919613 # 2016-03-02 12:53:33 +0100
        removal_time = 1456919620 # 2016-03-02 12:53:40 +0100
        origin = "lvol1"
        descendants = ["lvol3", "lvol4"]
    }
}
By removing lvol1 further, we end up with:
historical_logical_volumes {
    lvol2 {
        id = "S0Dw1U-v5sF-LwAb-W9SI-pNOF-Madd-5dxSv5"
        creation_time = 1456919613 # 2016-03-02 12:53:33 +0100
        removal_time = 1456919620 # 2016-03-02 12:53:40 +0100
        origin = "-lvol1"
        descendants = ["lvol3", "lvol4"]
    }
    lvol1 {
        id = "me0mes-aYnK-nRfT-vNlV-UiR1-GP7r-ojbROr"
        creation_time = 1456919608 # 2016-03-02 12:53:28 +0100
        removal_time = 1456919767 # 2016-03-02 12:56:07 +0100
    }
}
This uses the vg->pv_write_list in place of the
vg->pvs_to_write list, and eliminates the use of
pvcreate_params. The label remove and zeroing
steps are shifted out of vg_write() to the higher
level like pvcreate will do.
backup_restore_vg is used directly for restoring a VG from backup.
It's also used to do VG conversions from one metadata format to
another, which means vgconvert calls backup_restore_vg too.
When restoring a VG from backup, we need to write or rewrite PV headers,
as PVs may have been orphans before and are now becoming part of some
VG - we need to write at least the PV_EXT_USED flag.
When using backup_restore_vg for vgconvert, we need to write
a completely new PV header in a different format.
Avoid the special "pv_write" call and handling that was used before
this patch in vgconvert (in the vgconvert_single function, to be more
precise) and instead reuse the existing internal interface to register
a PV header for writing (or rewriting) via the vg->pvs_to_write list,
as we do elsewhere in the code.
This patch also resolves a problem in which PV headers with target
format were written in the vgconvert_single fn as orphans and VG
metadata were added later on - this was a tiny hack actually.
We can't do this now - we need to write the PV as belonging
to a VG because otherwise the PV_EXT_USED flag won't be written
properly (if the PV header is written as orphan, the PV_EXT_USED
is set to 0, of course, even though metadata are attached later).
So this patch removes this tiny inconsistency which was passing
just fine before because we didn't have any relation to the VG
in PV header before. Now we have the PV_EXT_USED flag which says
the "PV is used in some VG".
Use process_each_vg() to lock and read the old VG,
and then call the main vgrename code.
When real VG names are used (not a UUID in place of the
old name), the command still pre-locks the new name
(when strcmp wants it locked first), before calling
process_each_vg on the old name.
In the case where the old name is replaced with a UUID,
process_each_vg now translates that UUID into the real
VG name, which it locks and reads. In this case, we
cannot do pre-locking to maintain lock ordering because
the old name is unknown. So, in this case the strcmp
based lock ordering is suppressed and the old name is
always locked first. This opens a remote chance for
lock ordering conflict between racing vgrenames between
two names where one or both commands use the UUID.
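The locking decision described above can be sketched as a small helper.
The function name is hypothetical, and the direction of the comparison
(the name that sorts lower is locked first) is an assumption, since the
text only says that strcmp decides the order.

```c
#include <string.h>

/*
 * Decide whether the new VG name should be pre-locked before the old one.
 * Returns 1 to pre-lock the new name, 0 to lock the old name first.
 *
 * Assumption: when both real names are known, the name that sorts
 * lower under strcmp() is locked first.
 */
static int sketch_lock_new_first(const char *old_name, const char *new_name,
				 int old_is_uuid)
{
	/* The real old name is unknown, so ordering cannot be applied. */
	if (old_is_uuid)
		return 0;

	return strcmp(new_name, old_name) < 0;
}
```

In the UUID case the helper always answers 0, which corresponds to the
suppressed ordering and the remote chance of a conflict mentioned above.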
It's getting a bit more complex here.
The basic idea is that check_current_backup() should not
log an error when a user is using a read-only filesystem,
so e.g. vgscan will not report any error when it tries
to take a missing backup.
We still have cases where an error could be reported though,
e.g. when the backup would be a symbolic link, but these
are rather misconfigurations and unexpected cases.
We have two modes of 'archive()' usage:
1. compulsory - a failure stops the command and the user may try
the '-An' option to run the command.
2. non-compulsory - some failures in archiving are ignorable (e.g. a
read-only filesystem where the archive dir is located).
These 2 cases need to be handled properly, i.e. the non-compulsory
logging should not interfere with the production of error log messages.
So more work is needed here.
When checking minimum mda size, make sure the mda_size after alignment
and calculation is more than 0 - if there's no place for an MDA at the
end of the disk, the _text_pv_add_metadata_area does not try to add it
there and it returns (because we already have the MDA at the start of
the disk at least).