IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
To have better control were the config tree could be modified use more
const pointers and very carefully downcast them back to non-const
(for config tree merge).
Set cmd->independent_metadata_areas if metadata/dirs or disk_areas in use.
- Identify and record this state.
Don't skip full scan when independent mdas are present even if memlock is set.
- Clusters and OOM aren't supported, so no problem doing the proper scans.
Avoid revalidating the label cache immediately after scanning.
- A simple optimisation.
Support scanning for a single VG in independent mdas.
- Not used by the fix but I left it in anyway as later patches might use it.
This reset of vgmem pointer causes access of already released memory.
(_vg_make_handle allocates vg from vgmem pool itself - which is a bit tricky)
Interestingly this memory fault was missed by our test suite.
Set vg to NULL after releasing it as the following memlock() test may
lead to goto for the second call of vg_release() with the already
released vg pointer.
Problem:
When both legs of a mirrored log fail, neither the log nor the parent
mirror can proceed. The repair code must be careful to replace the
log with an error target before operating on the parent - otherwise,
the parent can get stuck trying to suspend because it can't push through
any writes. The steps to replace the log device with an error target
were incomplete and resulted in the replacement not happening at all!
The code originally had all the necessary logic to complete the
replacement task, but was pulled out in a effort to clean-up that
section of code, while fixing another bug:
<offending commit msg>
In addition, I added following three changes.
- Removed tmp_orphan_lvs handling procedure
It seems that _delete_lv() can handle detached_log_lv properly
without adding mirror legs in mirrored log to tmp_orphan_lvs.
Therefore, I removed the procedure.
- Removed vg_write()/vg_commit()
Metadata is saved by vg_write()/vg_commit() just after detached_log_lv
is handled. Therefore, I removed vg_write()/vg_commit().
</offending commit msg>
http://sources.redhat.com/cgi-bin/cvsweb.cgi/LVM2/lib/metadata/mirror.c?cvsroot=lvm2&f=h#rev1.130
I've reverted the "clean-up" changes associated with that fix, but not what
that commit was actually fixing.
Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
Reviewed-by: Petr Rockai <prockai@redhat.com>
page.
Add ->target_name to segtype_handler to allow a more specific target
name to be returned based on the state of the segment.
Result of trying to merge a snapshot using a kernel that doesn't have
the snapshot-merge target:
Before:
# lvconvert --merge vg/snap
Can't expand LV lv: snapshot target support missing from kernel?
Failed to suspend origin lv
After:
# lvconvert --merge vg/snap
Can't process LV lv: snapshot-merge target support missing from kernel?
Failed to suspend origin lv
Unable to merge LV "snap" into it's origin.
Use _even_rand() function instead of floor() in _bitset_with_random_bits().
floor() function is missing in dietlibc (on architectures other than x86).
Moreover using floor() to clip rand results does not assure even result
distribution. _even_rand() uses integer arithmetic only and is designed to
return evenly distributed results.
> Looks OK to me. It took a while to decipher what is the exact meaning of
> the loop in _even_rand (to a non-pseudorandomness-expert) but I am
> fairly comfortable with it now. If I understand this correctly, it
> rejects numbers that come from an "incomplete" slice of the RAND_MAX
> space (considering the number space [0, RAND_MAX] is divided into some
> "max"-sized slices and at most a single smaller slice, between [n*max,
> RAND_MAX] for suitable n -- numbers from this last slice are discarded
> because they could distort the distribution in favour of smaller
> numbers).
Signed-off-by: Przemyslaw Iskra <sparky <at> pld-linux.org>
Reviewed-by: Petr Rockai <prockai <at> redhat.com>
In other LVM memory structures such as volume_group, the field
used to store flags is called "status", and on-disk fields are called
'flags', so rename the one inside metadata_area to be consistent.
Not only is it more consistent with existing code but is cleaner
to say "the status of this mda is ignored".
Background for this patch - prajnoha pinged me on IRC this morning
about a fix he was working on related to metadataignore when
metadata/dirs was set. I was reviewing my patches from this year
and realized the 'flags' field was probably not the best choice
when I originally did the metadataignore patches.
Add supporting functions for vg_name, vg_fmt, vg_system_id.
Append "_dup" to end of supporting functions to make clear the strings
are dup'd and to avoid namespace conflict with vg_name.
Add supporting functions for pv_uuid, vg_uuid, and lv_uuid.
Call new function id_format_and_copy. Use 'const' where appropriate.
Add "_dup" suffix to indicate memory is being allocated.
Call {pv|vg|lv}_uuid_dup from lvm2app uuid functions.
This patch addresses code review request to simplify creation of 'attr'
strings. The simplification is done in this separate patch to more
easily review and ensure the simplification is done without error.
Move the creating of the 'attr' strings into a common function so
they can be called from the 'disp' functions as well as the new
'get' property functions.
Add "_dup" suffix to indicate memory is allocated.
Refactor pvstatus_disp to take pv argument and call pv_attr_dup().
This patch is similar to the other patches for pv and vg
functionality, and separates lv functionality into separate
files, concentrating on reporting fields and simple functions.
The metadata.[ch] files are very large. This patch makes a first
attempt at separating out pv functions and data, particularly
related to the reporting fields calculations.
More code could be moved here but for now I'm stopping at reporting
functions 'get' / 'set' functions.
The metadata.[ch] files are very large. This patch makes a first
attempt at separating out vg functions and data, particularly
related to the reporting fields calculations.
introduced in commit b16b4d92a7
"Improve various log messages."
fixes a lot of
../include/metadata.h:148: warning: type qualifiers ignored on function return type
Add "devices/default_data_alignment" to lvm.conf to control the internal
default that LVM2 uses: 0==64k, 1==1MB, 2==2MB, etc.
If --dataalignment (or lvm.conf's "devices/data_alignment") is specified
then it is always used to align the start of the data area. This means
the md_chunk_alignment and data_alignment_detection are disabled if set.
(Same now applies to pvcreate --dataalignmentoffset, the specified value
will be used instead of the result from data_alignment_offset_detection)
set_pe_align() still looks to use the determined default alignment
(based on lvm.conf's default_data_alignment) if the default is a
multiple of the MD or topology detected values.
In all top vg read functions only LCK_VG_READ/WRITE can be used.
All other vg lock definitions are low-level backend machinery.
Moreover, LCK_WRITE cannot be tested through bitmask.
This patch fixes these mistakes.
For _recover_vg() we do not need lock_flags, it can be only
two of above and we always upgrading to LCK_VG_WRITE lock there.
(N.B. that code is racy)
There is no functional change in code (despite wrong masking
it produces correct bits:-)
One shiny day we should use libblkid here. But now using LUKS is
very common together with LVM and pvcreate destroys LUKS completely.
So for user's convenience, try to detect LUKS signature and allow abort.
pvcreate detects MD and swap signature.
The logic hidden there is not only documented but it is also
user unfriendly. Who invented this logic should run pvcreate
on its own critical MD device to see why;-)
This patch
- creates one function instead of duplication code
- asks if user want to overwrite signature
- allows aborting (!)
(Please note that writing LVM signatute without wiping old
is wrong, it confuses blkid, MD will not work anyway and
swap and LUKS is broken too.)
Ignore snapshots when performing mirror recovery beneath an origin.
Pass LCK_ORIGIN_ONLY flag around cluster.
Add suspend_lv_origin and resume_lv_origin using LCK_ORIGIN_ONLY.
DM devices were not handled properly on nodes in a cluster that were not
where the splitmirrors command was issued. This was happening because
suspend_lv/resume_lv were being used in a place where activate_lv should
have been used.
When the suspend/resume are issued on (effectively) new LVs, their
'resource' (UUID) is not located in the lv_hash. Thus, both operations
turn into no-ops. You can see this from the output of clvmd from one
of the remote nodes:
<snip>
do_suspend_lv, lock not already held
<snip>
do_resume_lv, lock not already held
'activate_lv' enjoins the other nodes in the cluster to process the lock
and activate the new LV. clvmd output from remote node as follows:
do_lock_lv: resource 'zMseY7CBuO3Ty09vXlplPAHzD0Y0CovjrTdv0R1VcwggMwPdYhutHErRcwm5Nd2S', cmd = 0x19 LCK_LV_ACTIVATE (READ|LV|NONBLOCK), flags = 0x84 (DMEVENTD_MONITOR ), memlock = 1
sync_lock: 'zMseY7CBuO3Ty09vXlplPAHzD0Y0CovjrTdv0R1VcwggMwPdYhutHErRcwm5Nd2S' mode:1 flags=1
sync_lock: returning lkid 27b0001
Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
Reviewed-by: Petr Rockai <prockai@redhat.com>
The new standard in the storage industry is to default alignment of data
areas to 1MB. fdisk, parted, and mdadm have all been updated to this
default.
Update LVM to align the PV's data area start (pe_start) to 1MB. This
provides a more useful default than the previous default of 64K (which
generally ended up being a 192K pe_start once the first metadata area
was created).
Before this patch:
# pvs -o name,vg_mda_size,pe_start
PV VMdaSize 1st PE
/dev/sdd 188.00k 192.00k
After this patch:
# pvs -o name,vg_mda_size,pe_start
PV VMdaSize 1st PE
/dev/sdd 1020.00k 1.00m
The heuristic for setting the default alignment for LVM data areas is:
- If the default value (1MB) is a multiple of the detected alignment
then just use the default.
- Otherwise, use the detected value.
In practice this means we'll almost always use 1MB -- that is unless:
- the alignment was explicitly specified with --dataalignment
- or MD's full stripe width, or the {minimum,optimal}_io_size exceeds
1MB
- or the specified/detected value is not a power-of-2
An incorrect fix on July 13, 2010 for an annoyance has caused a regression.
The offending check-in was part of the 2.02.71 release of LVM. That
check-in caused any PVs specified on the command line to be ignored when
performing a mirror split.
This patch reverses the aforementioned check-in (solving the regressions)
and posits a new solution to the list reversal problem. The original
problem was that we would always take the lowest mimage LVs from a mirror
when performing a split, but what we really want is to take the highest
mimage LVs. This patch accomplishes that by working through the list in
reverse order - choosing the higher numbered mimages first. (This also
reduces the amount of processing necessary.)
Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
Reviewed-by: Takahiro Yasui <takahiro.yasui@hds.com>
all but one mirror leg.
<patch header>
To handle a double failure of a mirrored log, Jon's two patches are
commited, however, lvconvert command can't still handle an error
when mirror leg and mirrored log got failure at the same time.
[Patch]: Handle both devices of a mirrored log failing (bug 607347)
posted: https://www.redhat.com/archives/lvm-devel/2010-July/msg00009.html
commit: https://www.redhat.com/archives/lvm-devel/2010-July/msg00027.html
[Patch]: Handle both devices of a mirrored log failing (bug 607347) -
additional fix
posted: https://www.redhat.com/archives/lvm-devel/2010-July/msg00093.html
commit: https://www.redhat.com/archives/lvm-devel/2010-July/msg00101.html
In the second patch, the target type of mirrored log is replaced with
error target when remove_log is set to 1, but this procedure should be
also used in other cases such as the number of mirror leg is 1. This
patch relocates the procedure to the main path.
In addition, I added following three changes.
- Removed tmp_orphan_lvs handling procedure
It seems that _delete_lv() can handle detached_log_lv properly
without adding mirror legs in mirrored log to tmp_orphan_lvs.
Therefore, I removed the procedure.
- Removed vg_write()/vg_commit()
Metadata is saved by vg_write()/vg_commit() just after detached_log_lv
is handled. Therefore, I removed vg_write()/vg_commit().
- With Jon's second patch, we think that we don't have to call
remove_mirror_log() in _lv_update_mirrored_log() because will be
handled remove_mirror_images() in _lvconvert_mirrors_repaire().
</patch header>
Signed-off-by: Takahiro Yasui <takahiro.yasui@hds.com>
Reviewed-by: Petr Rockai <prockai@redhat.com>
Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
The cluster log daemon (cmirrord) is not multi-threaded and
can handle only one request at a time. When a log is stacked
on top of a mirror (which itself contains a 'core' log), it
creates a situation that cannot be solved without threading.
When the top level mirror issues a "resume", the log daemon
attempts to read from the log device to retrieve the log
state. However, the log is a mirror which, before issuing
the read, attempts to determine the 'sync' status of the
region of the mirror which is to be read. This sync status
request cannot be completed by the daemon because it is
blocked on a read I/O to the very mirror requesting the
sync status.
mirrors, we must also check that the log daemon (cmirrord) is running.
The log module can be auto-loaded, but the daemon cannot be
"auto-started". Failing to check for the daemon produces cryptic
messages that customers have a hard time deciphering. (The system
messages do report that the log daemon is not running, but people
don't seem to find this message easily.)
Here are examples of what is printed when the module is available,
but the log daemon has not been started.
[root@bp-01 LVM2]# lvcreate -m1 -l1 -n lv vg
Shared cluster mirrors are not available.
[root@bp-01 LVM2]# lvcreate -m1 -l1 -n lv vg -v
Setting logging type to disk
Finding volume group "vg"
Archiving volume group "vg" metadata (seqno 3).
Creating logical volume lv
Executing: /sbin/modprobe dm-log-userspace
Cluster mirror log daemon is not running
Shared cluster mirrors are not available.
Creating volume group backup "/etc/lvm/backup/vg" (seqno 4).
The main problem with these bugs was that the newly split
off LV was not being suspended properly. This meant that
the memlock count was not being balanced, the DM devices
were not being renamed, and some DM devices which should
have been removed were not.
I've also renamed some of the variables and added comments
to make things clearer as to what is going on. (I can break
this patch in two if it means easier review.)
even though there was no log. A simple run through the in-tree test
suite would have caught this. :(
- if (lv_is_mirrored(detached_log_lv) &&
+ if (detached_log_lv && lv_is_mirrored(detached_log_lv) &&
Also, made some cosmetic changes suggested by kabi after my last check-in
(e.g. s/return 0/return_0/ and adding an error message).
A previous check-in added logic to handle the case where both images
of a mirrored log failed. It solved the problem by simply removing
the log entirely - leaving the parent mirror with a 'core' log. This
worked for most cases. However, if there was a small delay between
the failures of the two mirrored log devices, the mirror would hang,
LVM would hang, and no additional LVM commands could be issued.
When the first leg of the log fails, it signals the need for repair.
Before 'lvconvert --repair' is run by dmeventd, the second leg fails.
'lvconvert' would see both devices as failed and try to remove the
log entirely. When it came time to suspend the parent mirror to
update the configuration, the suspend would hang because it couldn't
get any I/O through the mirrored log, which was plugged waiting for
corrective action. The solution is to replace the log with an error
target to clear any pending writes before removing it. This allows
the parent mirror to suspend and make the proper changes.
Pass metadataignore through PV creation / setup paths.
As a result of this cleanup, we can remove the unnecessary setting
of mda_ignore bits inside pvcreate_single(), after call to pv_create.
For now, just set metadataignore to '0' in some places. This is
equivalent to the prior functionality, although the 0 is given
by the caller not hardcoded in _mda_setup() call.
Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
- If a PV contained empty mdas, the auto-recovery code was not kicking in.
- The 'inconsistent' state was getting lost when metadata was cached so
recovery didn't kick in. But leave the behaviour alone when using
precommitted metadata because of a warning in a confusing FIXME.
In my testing, pvs and vgs didn't repair inconsistent metadata like they
used to do. (How many other tools fail similarly now?)
And there should be no need to cache inconsistent metadata because it is
supposed to get repaired under the protection of a write lock immediately it is
discovered.
This code is in need of a redesign based on first principles.
I still see bugs in this code and this commit is risky.
Allow metadataignore flag to be passed in to pvcreate.
Ideally, more refactoring of the mda allocation / initialization
is warranted, but for now, we just add another parameter to 'add_mda'
to take an existing mda ignored flag. We need to do this or pv_write
loses the state of the mda 'ignored' flag before copying and writing
to disk.
Print device name when setting or clearing metadata ignore bit.
Example:
label/label.c:160 /dev/loop2: lvm2 label detected
cache/lvmcache.c:1136 lvmcache: /dev/loop2: now in VG #orphans_lvm2 (#orphans_lvm2)
metadata/metadata.c:4142 Setting mda ignored flag for metadata_locn /dev/loop2.
format_text/text_label.c:318 Skipping mda with ignored flag on device /dev/loop2 at offset 4096
Logging isn't ideal, especially for mda_set_ignore. Ideally we'd
like to display the device name and offset in this case but this
requires a bit more work and a per-format 'mda_description' function
pointer definition (we don't have access to mda_context in
metadata.c).