1
0
mirror of git://sourceware.org/git/lvm2.git synced 2025-01-14 23:24:55 +03:00

766 Commits

Author SHA1 Message Date
Dave Wysochanski
c0789ae1b4 Remove irrelevant comments relating to vg_mda_copies. 2010-07-30 16:47:27 +00:00
Jonathan Earl Brassow
56fe3c6176 It's not enough to check for the kernel module in the case of cluster
mirrors, we must also check that the log daemon (cmirrord) is running.
The log module can be auto-loaded, but the daemon cannot be
"auto-started".  Failing to check for the daemon produces cryptic
messages that customers have a hard time deciphering.  (The system
messages do report that the log daemon is not running, but people
don't seem to find this message easily.)

Here are examples of what is printed when the module is available,
but the log daemon has not been started.

[root@bp-01 LVM2]# lvcreate -m1 -l1 -n lv vg
  Shared cluster mirrors are not available.

[root@bp-01 LVM2]# lvcreate -m1 -l1 -n lv vg -v
    Setting logging type to disk
    Finding volume group "vg"
    Archiving volume group "vg" metadata (seqno 3).
    Creating logical volume lv
    Executing: /sbin/modprobe dm-log-userspace
    Cluster mirror log daemon is not running
  Shared cluster mirrors are not available.
    Creating volume group backup "/etc/lvm/backup/vg" (seqno 4).
2010-07-21 13:40:21 +00:00
Jonathan Earl Brassow
8d983d6f2d Fix for bug 614164: No check for existing name when splitting mirror
The user could use the same name as an existing LV when specifying a
name for an LV split off from a mirror.  This causes all sorts of
issues.
2010-07-13 22:24:39 +00:00
Jonathan Earl Brassow
659f47f76a Fix for bugs: 612248 & 612291 Split mirror issues
The main problem with these bugs was that the newly split
off LV was not being suspended properly.  This meant that
the memlock count was not being balanced, the DM devices
were not being renamed, and some DM devices which should
have been removed were not.

I've also renamed some of the variables and added comments
to make things clearer as to what is going on.  (I can break
this patch in two if it means easier review.)
2010-07-13 21:48:16 +00:00
Jonathan Earl Brassow
ddedf42d21 Failed to test for the case where a log was requested to be removed
even though there was no log.  A simple run through the in-tree test
suite would have caught this.  :(

-               if (lv_is_mirrored(detached_log_lv) &&
+               if (detached_log_lv && lv_is_mirrored(detached_log_lv) &&

Also, made some cosmetic changes suggested by kabi after my last check-in
(e.g. s/return 0/return_0/ and adding an error message).
2010-07-09 17:57:51 +00:00
Dave Wysochanski
b3a13b17cb Add log_error when strdup fails in {vg|lv}_change_tag().
Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
2010-07-09 16:57:44 +00:00
Alasdair Kergon
2d3164a59f Use __attribute__ consistently throughout. 2010-07-09 15:34:40 +00:00
Alasdair Kergon
d8d4a1d694 Remove superfluous fn prototypes. 2010-07-09 15:21:10 +00:00
Jonathan Earl Brassow
0b18937cbe Finish fix for bug 607347: failing both redundant mirror log legs...
A previous check-in added logic to handle the case where both images
of a mirrored log failed.  It solved the problem by simply removing
the log entirely - leaving the parent mirror with a 'core' log.  This
worked for most cases.  However, if there was a small delay between
the failures of the two mirrored log devices, the mirror would hang,
LVM would hang, and no additional LVM commands could be issued.

When the first leg of the log fails, it signals the need for repair.
Before 'lvconvert --repair' is run by dmeventd, the second leg fails.
'lvconvert' would see both devices as failed and try to remove the
log entirely.  When it came time to suspend the parent mirror to
update the configuration, the suspend would hang because it couldn't
get any I/O through the mirrored log, which was plugged waiting for
corrective action.  The solution is to replace the log with an error
target to clear any pending writes before removing it.  This allows
the parent mirror to suspend and make the proper changes.
2010-07-09 15:08:12 +00:00
Dave Wysochanski
264091a239 Pass metadataignore to pv_create, pv_setup, _mda_setup, and add_mda.
Pass metadataignore through PV creation / setup paths.
As a result of this cleanup, we can remove the unnecessary setting
of mda_ignore bits inside pvcreate_single(), after call to pv_create.
For now, just set metadataignore to '0' in some places.  This is
equivalent to the prior functionality, although the 0 is given
by the caller not hardcoded in _mda_setup() call.

Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
2010-07-08 18:24:29 +00:00
Dave Wysochanski
d324fd94df Init mda->list in mda_copy.
This patch should be no functional change as all callers initialize
mda->list.
2010-07-08 17:41:46 +00:00
Dave Wysochanski
94cb2db723 Add warning to vgextend and pvchange if metadataignore given on cmdline.
Warn the user then change the value of vg_mda_copies.

Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
2010-07-07 18:59:45 +00:00
Alasdair Kergon
23aa1c6524 Adjust auto-metadata repair and caching logic to try to cope with empty mdas.
- If a PV contained empty mdas, the auto-recovery code was not kicking in.
- The 'inconsistent' state was getting lost when metadata was cached so
  recovery didn't kick in.  But leave the behaviour alone when using
  precommitted metadata because of a warning in a confusing FIXME.

In my testing, pvs and vgs didn't repair inconsistent metadata like they
used to do.  (How many other tools fail similarly now?)

And there should be no need to cache inconsistent metadata because it is
supposed to get repaired under the protection of a write lock immediately it is
discovered.

This code is in need of a redesign based on first principles.
I still see bugs in this code and this commit is risky.
2010-07-07 02:53:16 +00:00
Alasdair Kergon
c1589415f0 fix code in 2nd mda unignore loop to match 1st loop 2010-07-06 20:09:38 +00:00
Alasdair Kergon
389cbc3686 s/flags/mda/ 2010-07-06 17:29:50 +00:00
Alasdair Kergon
2508b4000b shorten mesg 2010-07-06 17:27:32 +00:00
Alasdair Kergon
e5164acc9c fix jumbled args in 'Adjusting' message 2010-07-06 17:26:08 +00:00
Alasdair Kergon
38c6e8faf6 Randomly select which mdas to use or ignore.
Add some missing standard configure.in checks.
2010-07-05 22:23:15 +00:00
Alasdair Kergon
ed2630dce5 Add printf format attributes to yes_no_prompt & dm_{sn,as}printf and fix a calle 2010-07-02 21:16:50 +00:00
Alasdair Kergon
17cffe2bf9 improve vgmetadatacopies unmanaged message 2010-06-30 20:03:52 +00:00
Dave Wysochanski
d1b068a68a Check for missing_pv in vg_remove loop.
If a pv is missing, we should just skip it rather than checking the
device size and failing the vgremove.
2010-06-30 19:55:43 +00:00
Alasdair Kergon
a5053f6ada more mda ignore cleanups 2010-06-30 19:28:35 +00:00
Dave Wysochanski
261a3e7e78 Refactor vg_remove_check to place pv removal into separate function. 2010-06-30 18:03:52 +00:00
Alasdair Kergon
36e28d4b4b more metadataignore message/code cleanup 2010-06-30 17:13:05 +00:00
Alasdair Kergon
51c4e1ea4b revert that 2010-06-30 14:54:29 +00:00
Alasdair Kergon
c8f03c38fe suppress useless compiler warning 2010-06-30 14:52:29 +00:00
Dave Wysochanski
f0c79d95f5 Only attempt to guarantee 1 mda ignored if there's at least one mda in the vg. 2010-06-30 14:48:07 +00:00
Alasdair Kergon
d13b495217 Only attempt to guarantee 1 mda ignored if there's at least one mda in the vg. 2010-06-30 14:27:40 +00:00
Alasdair Kergon
b16b4d92a7 Improve various log messages. 2010-06-30 13:51:11 +00:00
Dave Wysochanski
fdcf749779 Add --metadataignore to pvcreate.
Allow metadataignore flag to be passed in to pvcreate.
Ideally, more refactoring of the mda allocation / initialization
is warranted, but for now, we just add another parameter to 'add_mda'
to take an existing mda ignored flag.  We need to do this or pv_write
loses the state of the mda 'ignored' flag before copying and writing
to disk.
2010-06-30 12:17:24 +00:00
Dave Wysochanski
ed38f47f23 Improve logging for setting --vgmetadatacopies.
Example of logging:
metadata/metadata.c:1127     Setting mda_copies = 3 on vg vgtest
metadata/pv_manip.c:296         /dev/loop2 0:      0     25: NULL(0:0)
metadata/pv_manip.c:296         /dev/loop3 0:      0     25: NULL(0:0)
metadata/pv_manip.c:296         /dev/loop4 0:      0     25: NULL(0:0)
metadata/metadata.c:1072     Adjusting ignored mdas on vg vgtest, vg_mda_used_count=5, vg_mda_copies=3
metadata/metadata.c:1015     Setting ignore flag for 2 mdas on vg vgtest
metadata/metadata.c:4151     Setting mda ignored flag for metadata_locn /dev/loop2.
metadata/metadata.c:4151     Setting mda ignored flag for metadata_locn /dev/loop3.
2010-06-29 22:41:28 +00:00
Dave Wysochanski
3b0585be3b Improve logging for metadata ignore by printing device name.
Print device name when setting or clearing metadata ignore bit.
Example:
label/label.c:160       /dev/loop2: lvm2 label detected
cache/lvmcache.c:1136         lvmcache: /dev/loop2: now in VG #orphans_lvm2 (#orphans_lvm2)
metadata/metadata.c:4142     Setting mda ignored flag for metadata_locn /dev/loop2.
format_text/text_label.c:318     Skipping mda with ignored flag on device /dev/loop2 at offset 4096
2010-06-29 22:37:32 +00:00
Dave Wysochanski
ebe1a8080c Add some log_verbose debug statements related to metadataignore.
Logging isn't ideal, especially for mda_set_ignore.  Ideally we'd
like to display the device name and offset in this case but this
requires a bit more work and a per-format 'mda_description' function
pointer definition (we don't have access to mda_context in
metadata.c).
2010-06-29 22:25:58 +00:00
Dave Wysochanski
593bb6470b Move code into pv_change_metadataignore library function.
In preparation to call this from both pvcreate as well as pvchange,
move the guts of metadataignore into a library function.

Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
2010-06-29 21:32:44 +00:00
Dave Wysochanski
996360df6b Allow 'all' and 'unmanaged' values for --vgmetadatacopies.
Allowing an 'all' and 'unmanaged' value is more intuitive, and
provides a simple way for users to get back to original LVM behavior
of metadata written to all PVs in the volume group.

If the user requests "--vgmetadatacopies unmanaged", this instructs
LVM not to manage the ignore bits to achieve a specific number of
metadata copies in the volume group.  The user is free to use
"pvchange --metadataignore" to control the mdas on a per-PV basis.
If the user requests "--vgmetadatacopies all", this instructs LVM
to do 2 things: 1) clear all ignore bits, and 2) set the "unmanaged"
policy going forward.

Internally, we use the special MAX_UINT32 value to indicate 'all'.
This 'just' works since it's the largest value possible for the
field and so all 'ignore' bits on all mdas in the VG will get
cleared inside _vg_metadata_balance().  However, after we've
called the _vg_metadata_balance function, we check for the special
'all' value, and if set, we write the "unmanaged" value into the
metadata.  As such, the 'all' value is never written to disk.

Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
2010-06-28 20:40:01 +00:00
Dave Wysochanski
e52586db1c Update check in vg_split_mdas to account for ignored mdas list.
The check in vg_split_mdas will trigger an error if the 'from' vg
list is empty.  However, this might be ok in some instances now
that we have ignored mdas.  Relax this check so an error is triggered
only in the case where there's truly no more mdas in the 'from'
vg.

One example of where this makes a difference is with vgreduce.
If we try to vgreduce a PV with un-ignored mdas, this should trigger
the balancing function to un-ignore mdas on another PV in the VG.
However, we don't get to vg_write() before we fail because this
list size check fails, and we see an error message indicating:
"Cannot remove final metadata area ..."

Another example is with vgsplit into a new VG, where the PVs
being moved contain all ignored mdas.  We must move the mdas on
fid->metadata_areas_ignored from 'vg_from' to 'vg_to'.

Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
2010-06-28 20:38:56 +00:00
Dave Wysochanski
6cb9642200 Ensure fid mda lists are populated correctly during vgextend.
The vgextend path calls add_pv_to_vg().  Inside add_pv_to_vg(),
we must ensure we pass the correct mdas list into pv_setup(), as
copies of mdas are placed on the vg->fid list.  If we don't place
the mdas on the correct vg->fid list, the various counts may be
incorrect and the metadata balance algorithm will not work when
called from vg_write() path.

Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
2010-06-28 20:38:39 +00:00
Dave Wysochanski
074c1880ff Implement _vg_adjust_ignored_mdas and call from vg_write() path.
Compare the value of the newly added vg_mda_copies field
(--vgmetadatacopies parameter) with the current count of
in-use mdas and ignoring or unignoring mdas as necessary to
get to the target count.  Also, as a safety check before
returning, ensure we have at least one mda enabled.

Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
2010-06-28 20:37:54 +00:00
Dave Wysochanski
12910a5a29 Add vg get/set methods for VG metadata copies.
This patch adds the get and partially implemented set function.
The 'set' function should probably ignore or un-ignore metadata areas
based on new values.

Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
2010-06-28 20:36:56 +00:00
Dave Wysochanski
eb93c134a7 Add mda_copies to VG structures and initialization.
Add a field to struct volume_group to later implement metadata
balancing:
- mda_copies: target # of non-ignored mdas in the VG; default 0 (do
not control pv 'ignore mdas' bit.

This patch just adds the parameter to the structures with the default
values but does not modify any commands.  Should be no functional change.

Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
2010-06-28 20:36:37 +00:00
Dave Wysochanski
43526902ae Before committing each mda, arrange mdas so ignored mdas get committed first.
Arrange mdas so mdas that are to be ignored come first.  This is an
optimization that ensures consistency on disk for the longest period of time.
This was noted by agk in review of the v4 patchset of pvchange-based mda
balance.

Note the following example for an explanation of the background:
Assume the initial state on disk is as follows:
PV0 (v1, non-ignored)
PV1 (v1, non-ignored)
PV2 (v1, non-ignored)
PV3 (v1, non-ignored)

If we did not sort the list, we would have a commit sequence something like
this:
PV0 (v2, non-ignored)
PV1 (v2, ignored)
PV2 (v2, ignored)
PV3 (v2, non-ignored)

After the commit of PV0's mdas, we'd have an on-disk state like this:
PV0 (v2, non-ignored)
PV1 (v1, non-ignored)
PV2 (v1, non-ignored)
PV3 (v1, non-ignored)

This is an inconsistent state of the disk. If the machine fails, the next
time it was brought back up, the auto-correct mechanism in vg_read would
update the metadata on PV1-PV3.  However, if possible we try to avoid
inconsistent on-disk states.  Clearly, because we did not sort, we have
a greater chance of on-disk inconsistency - from the time the commit of
PV0 is complete until the time PV3 is complete.

We could improve the amount of time the on-disk state is consistent by simply
sorting the commit order as follows:
PV1 (v2, ignored)
PV2 (v2, ignored)
PV0 (v2, non-ignored)
PV3 (v2, non-ignored)

Thus, after the first PV is committed (in this case PV1), on-disk we would
have:
PV0 (v1, non-ignored)
PV1 (v2, ignored)
PV2 (v1, non-ignored)
PV3 (v1, non-ignored)

This is clearly a consistent state.  PV1 will be read but the mda will be
ignored.  All other PVs contain v1 metadata, and no auto-correct will be
required.  In fact, if we commit all PVs with ignored mdas first, we'll
only have an inconsistent state when we start writing non-ignored PVs,
and thus the chances we'll get an inconsistent state on disk is much
less with the sorted method.

Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
2010-06-28 20:35:49 +00:00
Dave Wysochanski
b6ec0f9b92 Refactor vg_commit() to add _vg_commit_mdas().
Factor out calling mda->ops->vg_commit() for each mda.
No functional change.

Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
2010-06-28 20:35:33 +00:00
Dave Wysochanski
4e7f1a3a9d Update _vg_read and _text_create_text_instance to use fid_add_mda[s].
When we are constructing the vg, we may need to adjust the list of
metadata_areas if there are ignored mdas.  At label read time, we
do not read the metadata of ignored mdas, and as a result, they do
not get placed on vg->fid->metadata_areas inside _text_create_text_instance
since lvmcache does not have these areas attached to vginfo->infos.
However, when we're checking the pvids inside _vg_read, after having
read another metadata area from another PV, we do have the opportunity
to update the metadata_area and metadata_areas_ignored lists based
on the read metadata_area.  We need accurate mda lists for the reporting
functions that count the ignored mdas, as well as general correctness
of mda balancing.

Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
2010-06-28 20:35:17 +00:00
Dave Wysochanski
50e879ef56 Use mdas_empty_or_ignored() in place of checks for empty mda list.
With the addition of ignored mdas, we replace all checks for an empty
mda list with a new function to look for either an empty mda list or
ignored mdas.

Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
2010-06-28 20:34:58 +00:00
Dave Wysochanski
06468b9049 Add mdas_empty_or_ignored() helper function.
Add a helper function to consolidate checking for an empty mdas list
or ignored mdas.  Ignored mdas should behave almost identically to
an empty mda list - the metadata areas should not be read or written
to.  This function will make it easier to implement metadata balancing
and easier to track pvs with an empty mda list or ignored mdas.

Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
2010-06-28 20:34:40 +00:00
Dave Wysochanski
bca7c09e6c Define new functions and vgs/pvs fields related to mda ignore.
Define a new pvs field, pv_mda_used_count, and a new vgs field,
vg_mda_used_count to match the existing pv_mda_count and vg_mda_count.
These new fields count the number of mdas that have the 'ignored' bit
clear (they are in use on the PV / VG).  Also define various supporting
functions to implement the counting as well as setting the ignored
flag and determining if an mda is ignored.  These high level functions
call into the lower level location independent mda ignore functions
defined by earlier patches.

Note that counting ignored mdas in a vg requires traversing both lists
and checking for the ignored bit on the mda.  The count of 'ignored'
mdas then is defined by having the bit set, not by which list the mda
is on.  The list does determine whether LVM actually does read/write to
the mda, though we must count the bits in order to return accurate numbers
for the various counts.  Also, pv_mda_set_ignored must search both vg
lists for ignored mda.  If the state changes and needs to be committed
to disk, the ignored mda will be on the non-ignored list.

Note also in pv_mda_set_ignored(), we must properly manage the mda lists.
If we change the ignored state of an mda, we must change any mdas on
vg->fid->metadata_areas that correspond to this pv.  Also, we may
need to allocate a copy of the mda, as is done when fid->metadata_areas
is populated from _vg_read(), if we are un-ignoring an ignored mda.

Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
2010-06-28 20:33:44 +00:00
Dave Wysochanski
2e163858fc Add metadata_areas_ignored list and functions to manage ignored mdas.
Add a second mda list, metadata_areas_ignored to fid, and a couple
functions, fid_add_mda() and fid_add_mdas() to help manage the lists.

These functions are needed to properly count the ignored mdas and
manage the lists attached to the 'fid' and ultimately the 'vg'.

Ensure metadata_areas_ignored is initialized in other formats, even
if the list is never used.

Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
2010-06-28 20:33:22 +00:00
Dave Wysochanski
0ad2a4c72f Rename fid->metadata_areas to fid->metadata_areas_in_use.
Rename the metadata_areas list to an 'in_use' list to prepare for
future 'ignored' list.
2010-06-28 20:32:44 +00:00
Dave Wysochanski
d4f0f090c0 Add mda location specific mda_copy constructor.
Because of the way mdas are handled internally, where a PV in a VG
has mdas on both info->mdas and vg->fid->metadata_areas list, we
need a location independent copy constructor for struct
metadata_area.  Break up the existing format-text specific copy
constructor into a format independent piece and a format dependent
piece.

This function is necessary to properly implement pv_set_mda_ignored().

Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
Reviewed-by: Alasdair G Kergon <agk@redhat.com>
2010-06-28 20:31:59 +00:00
Dave Wysochanski
e56f421490 Add mda_locns_match() internal library function for mapping pv/device to VG mda.
A metadata_area is defined independent of the location.  One downside
is that there is no obvious mapping from a pv to an mda.  For a PV in
a VG, we need a way to start with a PV and end up with an MDA, if we
are to manage mdas starting with a device/pv.  This function provides
us a way to go down the list of PVs on a VG, and identify which ones
match a particular PV.

I'm not entirely happy with this approach, but it does fit into the
existing structures in a reasonable way.

An alternative solution might be to refactor the VG - PV interface such
that mdas are a list tied to a PV.  However, this seemed a bit tricky since
a PV does not come into existence until after the list of mdas is
constructed (see _vg_read() - we create a 'fid' and attach mdas to it,
then we go through them and attach pvs).

Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
Reviewed-by: Alasdair G Kergon <agk@redhat.com>
2010-06-28 20:31:38 +00:00