1
0
mirror of git://sourceware.org/git/lvm2.git synced 2025-01-05 13:18:20 +03:00
Commit Graph

780 Commits

Author SHA1 Message Date
Peter Rajnoha
bad35c6554 Add escape sequence for ':' and '@' found in device names used as PVs. 2010-09-23 12:02:33 +00:00
Milan Broz
c7af31dbd7 Fix return type qualifier to avoid compiler warning.
introduced in commit b16b4d92a7
"Improve various log messages."

fixes a lot of
../include/metadata.h:148: warning: type qualifiers ignored on function return type
2010-08-26 12:08:19 +00:00
Mike Snitzer
4efb1d9cbb Update heuristic used for default and detected data alignment.
Add "devices/default_data_alignment" to lvm.conf to control the internal
default that LVM2 uses: 0==64k, 1==1MB, 2==2MB, etc.

If --dataalignment (or lvm.conf's "devices/data_alignment") is specified
then it is always used to align the start of the data area.  This means
the md_chunk_alignment and data_alignment_detection are disabled if set.

(Same now applies to pvcreate --dataalignmentoffset, the specified value
will be used instead of the result from data_alignment_offset_detection)

set_pe_align() still looks to use the determined default alignment
(based on lvm.conf's default_data_alignment) if the default is a
multiple of the MD or topology detected values.
2010-08-20 20:59:05 +00:00
Dave Wysochanski
69d67dc2ca Add vg_mda_size and vg_mda_free functions.
Add supporting functions to get vg_mda_size and vg_mda_free fields.
Should be no functional change.
2010-08-20 12:43:49 +00:00
Milan Broz
586b56b18c Fix wrong use of LCK_WRITE
In all top vg read functions only LCK_VG_READ/WRITE can be used.
All other vg lock definitions are low-level backend machinery.

Moreover, LCK_WRITE cannot be tested through bitmask.
This patch fixes these mistakes.

For _recover_vg() we do not need lock_flags, it can be only
two of above and we always upgrading to LCK_VG_WRITE lock there.
(N.B. that code is racy)

There is no functional change in code (despite wrong masking
it produces correct bits:-)
2010-08-19 23:26:31 +00:00
Milan Broz
727f7bfa49 Detect LUKS signature in pvcreate
One shiny day we should use libblkid here. But now using LUKS is
very common together with LVM and pvcreate destroys LUKS completely.

So for user's convenience, try to detect LUKS signature and allow abort.
2010-08-19 23:08:18 +00:00
Milan Broz
2d5e2b52ca Change the pvcreate swap/md logic
pvcreate detects MD and swap signature.

The logic hidden there is not only documented but it is also
user unfriendly. Who invented this logic should run pvcreate
on its own critical MD device to see why;-)

This patch
 - creates one function instead of duplication code
 - asks if user want to overwrite signature
 - allows aborting (!)
 (Please note that writing LVM signatute without wiping old
 is wrong, it confuses blkid, MD will not work anyway and
 swap and LUKS is broken too.)
2010-08-19 23:03:34 +00:00
Alasdair Kergon
22149572e8 Use 'SINGLENODE' instead of 'dead' in clvmd singlenode messages.
Ignore snapshots when performing mirror recovery beneath an origin.
Pass LCK_ORIGIN_ONLY flag around cluster.
Add suspend_lv_origin and resume_lv_origin using LCK_ORIGIN_ONLY.
2010-08-17 19:25:05 +00:00
Alasdair Kergon
2d6fcbf67d Allow internal suspend and resume of origin without its snapshots. 2010-08-17 16:25:32 +00:00
Jonathan Earl Brassow
d0191bf9f4 Fix for bug 612291: dm devices of split off mirror images are not removed
DM devices were not handled properly on nodes in a cluster that were not
where the splitmirrors command was issued.  This was happening because
suspend_lv/resume_lv were being used in a place where activate_lv should
have been used.

When the suspend/resume are issued on (effectively) new LVs, their
'resource' (UUID) is not located in the lv_hash.  Thus, both operations
turn into no-ops.  You can see this from the output of clvmd from one
of the remote nodes:
<snip>
do_suspend_lv, lock not already held
<snip>
do_resume_lv, lock not already held

'activate_lv' enjoins the other nodes in the cluster to process the lock
and activate the new LV.  clvmd output from remote node as follows:
do_lock_lv: resource 'zMseY7CBuO3Ty09vXlplPAHzD0Y0CovjrTdv0R1VcwggMwPdYhutHErRcwm5Nd2S', cmd = 0x19 LCK_LV_ACTIVATE (READ|LV|NONBLOCK), flags = 0x84 (DMEVENTD_MONITOR ), memlock = 1
sync_lock: 'zMseY7CBuO3Ty09vXlplPAHzD0Y0CovjrTdv0R1VcwggMwPdYhutHErRcwm5Nd2S' mode:1 flags=1
sync_lock: returning lkid 27b0001

Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
Reviewed-by: Petr Rockai <prockai@redhat.com>
2010-08-16 18:02:14 +00:00
Mike Snitzer
b123a82d73 Change default alignment of pe_start to 1MB.
The new standard in the storage industry is to default alignment of data
areas to 1MB.  fdisk, parted, and mdadm have all been updated to this
default.

Update LVM to align the PV's data area start (pe_start) to 1MB.  This
provides a more useful default than the previous default of 64K (which
generally ended up being a 192K pe_start once the first metadata area
was created).

Before this patch:
# pvs -o name,vg_mda_size,pe_start
  PV         VMdaSize  1st PE
  /dev/sdd     188.00k 192.00k

After this patch:
# pvs -o name,vg_mda_size,pe_start
  PV         VMdaSize  1st PE
  /dev/sdd    1020.00k   1.00m

The heuristic for setting the default alignment for LVM data areas is:
- If the default value (1MB) is a multiple of the detected alignment
  then just use the default.
- Otherwise, use the detected value.

In practice this means we'll almost always use 1MB -- that is unless:
- the alignment was explicitly specified with --dataalignment
- or MD's full stripe width, or the {minimum,optimal}_io_size exceeds
  1MB
- or the specified/detected value is not a power-of-2
2010-08-12 04:11:48 +00:00
Jonathan Earl Brassow
8d2d4f1fa0 Fix for bug 619221 - log device splitting regression
An incorrect fix on July 13, 2010 for an annoyance has caused a regression.
The offending check-in was part of the 2.02.71 release of LVM.  That
check-in caused any PVs specified on the command line to be ignored when
performing a mirror split.

This patch reverses the aforementioned check-in (solving the regressions)
and posits a new solution to the list reversal problem.  The original
problem was that we would always take the lowest mimage LVs from a mirror
when performing a split, but what we really want is to take the highest
mimage LVs.  This patch accomplishes that by working through the list in
reverse order - choosing the higher numbered mimages first.  (This also
reduces the amount of processing necessary.)

Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
Reviewed-by: Takahiro Yasui <takahiro.yasui@hds.com>
2010-08-06 15:38:32 +00:00
Jonathan Earl Brassow
cbd41292a4 Taka's fix for handling failure of all mirrored log devices and
all but one mirror leg.

<patch header>
To handle a double failure of a mirrored log, Jon's two patches are
commited, however, lvconvert command can't still handle an error
when mirror leg and mirrored log got failure at the same time.

  [Patch]: Handle both devices of a mirrored log failing (bug 607347)
  posted: https://www.redhat.com/archives/lvm-devel/2010-July/msg00009.html
  commit: https://www.redhat.com/archives/lvm-devel/2010-July/msg00027.html

  [Patch]: Handle both devices of a mirrored log failing (bug 607347) -
           additional fix
  posted: https://www.redhat.com/archives/lvm-devel/2010-July/msg00093.html
  commit: https://www.redhat.com/archives/lvm-devel/2010-July/msg00101.html

In the second patch, the target type of mirrored log is replaced with
error target when remove_log is set to 1, but this procedure should be
also used in other cases such as the number of mirror leg is 1. This
patch relocates the procedure to the main path.

In addition, I added following three changes.

- Removed tmp_orphan_lvs handling procedure
  It seems that _delete_lv() can handle detached_log_lv properly
  without adding mirror legs in mirrored log to tmp_orphan_lvs.
  Therefore, I removed the procedure.

- Removed vg_write()/vg_commit()
  Metadata is saved by vg_write()/vg_commit() just after detached_log_lv
  is handled. Therefore, I removed vg_write()/vg_commit().

- With Jon's second patch, we think that we don't have to call
  remove_mirror_log() in _lv_update_mirrored_log() because will be
  handled remove_mirror_images() in _lvconvert_mirrors_repaire().
</patch header>

Signed-off-by: Takahiro Yasui <takahiro.yasui@hds.com>
Reviewed-by: Petr Rockai <prockai@redhat.com>
Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
2010-08-02 21:07:40 +00:00
Jonathan Earl Brassow
efaaf3146d Disallow mirrored logs in cluster mirrors.
The cluster log daemon (cmirrord) is not multi-threaded and
can handle only one request at a time.  When a log is stacked
on top of a mirror (which itself contains a 'core' log), it
creates a situation that cannot be solved without threading.

When the top level mirror issues a "resume", the log daemon
attempts to read from the log device to retrieve the log
state.  However, the log is a mirror which, before issuing
the read, attempts to determine the 'sync' status of the
region of the mirror which is to be read.  This sync status
request cannot be completed by the daemon because it is
blocked on a read I/O to the very mirror requesting the
sync status.
2010-08-02 19:03:45 +00:00
Dave Wysochanski
936541ec56 Remove irrelevant comments relating to vg_mda_copies. 2010-07-30 16:47:27 +00:00
Jonathan Earl Brassow
405c4a45d8 It's not enough to check for the kernel module in the case of cluster
mirrors, we must also check that the log daemon (cmirrord) is running.
The log module can be auto-loaded, but the daemon cannot be
"auto-started".  Failing to check for the daemon produces cryptic
messages that customers have a hard time deciphering.  (The system
messages do report that the log daemon is not running, but people
don't seem to find this message easily.)

Here are examples of what is printed when the module is available,
but the log daemon has not been started.

[root@bp-01 LVM2]# lvcreate -m1 -l1 -n lv vg
  Shared cluster mirrors are not available.

[root@bp-01 LVM2]# lvcreate -m1 -l1 -n lv vg -v
    Setting logging type to disk
    Finding volume group "vg"
    Archiving volume group "vg" metadata (seqno 3).
    Creating logical volume lv
    Executing: /sbin/modprobe dm-log-userspace
    Cluster mirror log daemon is not running
  Shared cluster mirrors are not available.
    Creating volume group backup "/etc/lvm/backup/vg" (seqno 4).
2010-07-21 13:40:21 +00:00
Jonathan Earl Brassow
60f425d1b3 Fix for bug 614164: No check for existing name when splitting mirror
The user could use the same name as an existing LV when specifying a
name for an LV split off from a mirror.  This causes all sorts of
issues.
2010-07-13 22:24:39 +00:00
Jonathan Earl Brassow
c42b084793 Fix for bugs: 612248 & 612291 Split mirror issues
The main problem with these bugs was that the newly split
off LV was not being suspended properly.  This meant that
the memlock count was not being balanced, the DM devices
were not being renamed, and some DM devices which should
have been removed were not.

I've also renamed some of the variables and added comments
to make things clearer as to what is going on.  (I can break
this patch in two if it means easier review.)
2010-07-13 21:48:16 +00:00
Jonathan Earl Brassow
a93fb6299f Failed to test for the case where a log was requested to be removed
even though there was no log.  A simple run through the in-tree test
suite would have caught this.  :(

-               if (lv_is_mirrored(detached_log_lv) &&
+               if (detached_log_lv && lv_is_mirrored(detached_log_lv) &&

Also, made some cosmetic changes suggested by kabi after my last check-in
(e.g. s/return 0/return_0/ and adding an error message).
2010-07-09 17:57:51 +00:00
Dave Wysochanski
f77fb62b2a Add log_error when strdup fails in {vg|lv}_change_tag().
Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
2010-07-09 16:57:44 +00:00
Alasdair Kergon
08f1ddea6c Use __attribute__ consistently throughout. 2010-07-09 15:34:40 +00:00
Alasdair Kergon
80e569104b Remove superfluous fn prototypes. 2010-07-09 15:21:10 +00:00
Jonathan Earl Brassow
aa5734f2a3 Finish fix for bug 607347: failing both redundant mirror log legs...
A previous check-in added logic to handle the case where both images
of a mirrored log failed.  It solved the problem by simply removing
the log entirely - leaving the parent mirror with a 'core' log.  This
worked for most cases.  However, if there was a small delay between
the failures of the two mirrored log devices, the mirror would hang,
LVM would hang, and no additional LVM commands could be issued.

When the first leg of the log fails, it signals the need for repair.
Before 'lvconvert --repair' is run by dmeventd, the second leg fails.
'lvconvert' would see both devices as failed and try to remove the
log entirely.  When it came time to suspend the parent mirror to
update the configuration, the suspend would hang because it couldn't
get any I/O through the mirrored log, which was plugged waiting for
corrective action.  The solution is to replace the log with an error
target to clear any pending writes before removing it.  This allows
the parent mirror to suspend and make the proper changes.
2010-07-09 15:08:12 +00:00
Dave Wysochanski
a5fb2bbff3 Pass metadataignore to pv_create, pv_setup, _mda_setup, and add_mda.
Pass metadataignore through PV creation / setup paths.
As a result of this cleanup, we can remove the unnecessary setting
of mda_ignore bits inside pvcreate_single(), after call to pv_create.
For now, just set metadataignore to '0' in some places.  This is
equivalent to the prior functionality, although the 0 is given
by the caller not hardcoded in _mda_setup() call.

Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
2010-07-08 18:24:29 +00:00
Dave Wysochanski
dce204cec5 Init mda->list in mda_copy.
This patch should be no functional change as all callers initialize
mda->list.
2010-07-08 17:41:46 +00:00
Dave Wysochanski
7041b476ac Add warning to vgextend and pvchange if metadataignore given on cmdline.
Warn the user then change the value of vg_mda_copies.

Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
2010-07-07 18:59:45 +00:00
Alasdair Kergon
7f7af46862 Adjust auto-metadata repair and caching logic to try to cope with empty mdas.
- If a PV contained empty mdas, the auto-recovery code was not kicking in.
- The 'inconsistent' state was getting lost when metadata was cached so
  recovery didn't kick in.  But leave the behaviour alone when using
  precommitted metadata because of a warning in a confusing FIXME.

In my testing, pvs and vgs didn't repair inconsistent metadata like they
used to do.  (How many other tools fail similarly now?)

And there should be no need to cache inconsistent metadata because it is
supposed to get repaired under the protection of a write lock immediately it is
discovered.

This code is in need of a redesign based on first principles.
I still see bugs in this code and this commit is risky.
2010-07-07 02:53:16 +00:00
Alasdair Kergon
6c8655ce9b fix code in 2nd mda unignore loop to match 1st loop 2010-07-06 20:09:38 +00:00
Alasdair Kergon
68f4e0c734 s/flags/mda/ 2010-07-06 17:29:50 +00:00
Alasdair Kergon
0db1bbc3c3 shorten mesg 2010-07-06 17:27:32 +00:00
Alasdair Kergon
643f234119 fix jumbled args in 'Adjusting' message 2010-07-06 17:26:08 +00:00
Alasdair Kergon
d911ec67a9 Randomly select which mdas to use or ignore.
Add some missing standard configure.in checks.
2010-07-05 22:23:15 +00:00
Alasdair Kergon
db3c1ac1c8 Add printf format attributes to yes_no_prompt & dm_{sn,as}printf and fix a calle 2010-07-02 21:16:50 +00:00
Alasdair Kergon
12eadbabdd improve vgmetadatacopies unmanaged message 2010-06-30 20:03:52 +00:00
Dave Wysochanski
3b9d1b1a96 Check for missing_pv in vg_remove loop.
If a pv is missing, we should just skip it rather than checking the
device size and failing the vgremove.
2010-06-30 19:55:43 +00:00
Alasdair Kergon
d8886386bd more mda ignore cleanups 2010-06-30 19:28:35 +00:00
Dave Wysochanski
40b4d1c3ae Refactor vg_remove_check to place pv removal into separate function. 2010-06-30 18:03:52 +00:00
Alasdair Kergon
23177eda88 more metadataignore message/code cleanup 2010-06-30 17:13:05 +00:00
Alasdair Kergon
efe75fd705 revert that 2010-06-30 14:54:29 +00:00
Alasdair Kergon
a6c4427188 suppress useless compiler warning 2010-06-30 14:52:29 +00:00
Dave Wysochanski
ef7b409966 Only attempt to guarantee 1 mda ignored if there's at least one mda in the vg. 2010-06-30 14:48:07 +00:00
Alasdair Kergon
67b91d0848 Only attempt to guarantee 1 mda ignored if there's at least one mda in the vg. 2010-06-30 14:27:40 +00:00
Alasdair Kergon
647c64c796 Improve various log messages. 2010-06-30 13:51:11 +00:00
Dave Wysochanski
a5bf70018b Add --metadataignore to pvcreate.
Allow metadataignore flag to be passed in to pvcreate.
Ideally, more refactoring of the mda allocation / initialization
is warranted, but for now, we just add another parameter to 'add_mda'
to take an existing mda ignored flag.  We need to do this or pv_write
loses the state of the mda 'ignored' flag before copying and writing
to disk.
2010-06-30 12:17:24 +00:00
Dave Wysochanski
6af5155529 Improve logging for setting --vgmetadatacopies.
Example of logging:
metadata/metadata.c:1127     Setting mda_copies = 3 on vg vgtest
metadata/pv_manip.c:296         /dev/loop2 0:      0     25: NULL(0:0)
metadata/pv_manip.c:296         /dev/loop3 0:      0     25: NULL(0:0)
metadata/pv_manip.c:296         /dev/loop4 0:      0     25: NULL(0:0)
metadata/metadata.c:1072     Adjusting ignored mdas on vg vgtest, vg_mda_used_count=5, vg_mda_copies=3
metadata/metadata.c:1015     Setting ignore flag for 2 mdas on vg vgtest
metadata/metadata.c:4151     Setting mda ignored flag for metadata_locn /dev/loop2.
metadata/metadata.c:4151     Setting mda ignored flag for metadata_locn /dev/loop3.
2010-06-29 22:41:28 +00:00
Dave Wysochanski
d37dd5b2d3 Improve logging for metadata ignore by printing device name.
Print device name when setting or clearing metadata ignore bit.
Example:
label/label.c:160       /dev/loop2: lvm2 label detected
cache/lvmcache.c:1136         lvmcache: /dev/loop2: now in VG #orphans_lvm2 (#orphans_lvm2)
metadata/metadata.c:4142     Setting mda ignored flag for metadata_locn /dev/loop2.
format_text/text_label.c:318     Skipping mda with ignored flag on device /dev/loop2 at offset 4096
2010-06-29 22:37:32 +00:00
Dave Wysochanski
710c9373bf Add some log_verbose debug statements related to metadataignore.
Logging isn't ideal, especially for mda_set_ignore.  Ideally we'd
like to display the device name and offset in this case but this
requires a bit more work and a per-format 'mda_description' function
pointer definition (we don't have access to mda_context in
metadata.c).
2010-06-29 22:25:58 +00:00
Dave Wysochanski
a375ced300 Move code into pv_change_metadataignore library function.
In preparation to call this from both pvcreate as well as pvchange,
move the guts of metadataignore into a library function.

Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
2010-06-29 21:32:44 +00:00
Dave Wysochanski
a9d8bf269a Allow 'all' and 'unmanaged' values for --vgmetadatacopies.
Allowing an 'all' and 'unmanaged' value is more intuitive, and
provides a simple way for users to get back to original LVM behavior
of metadata written to all PVs in the volume group.

If the user requests "--vgmetadatacopies unmanaged", this instructs
LVM not to manage the ignore bits to achieve a specific number of
metadata copies in the volume group.  The user is free to use
"pvchange --metadataignore" to control the mdas on a per-PV basis.
If the user requests "--vgmetadatacopies all", this instructs LVM
to do 2 things: 1) clear all ignore bits, and 2) set the "unmanaged"
policy going forward.

Internally, we use the special MAX_UINT32 value to indicate 'all'.
This 'just' works since it's the largest value possible for the
field and so all 'ignore' bits on all mdas in the VG will get
cleared inside _vg_metadata_balance().  However, after we've
called the _vg_metadata_balance function, we check for the special
'all' value, and if set, we write the "unmanaged" value into the
metadata.  As such, the 'all' value is never written to disk.

Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
2010-06-28 20:40:01 +00:00
Dave Wysochanski
a09a8efb66 Update check in vg_split_mdas to account for ignored mdas list.
The check in vg_split_mdas will trigger an error if the 'from' vg
list is empty.  However, this might be ok in some instances now
that we have ignored mdas.  Relax this check so an error is triggered
only in the case where there's truly no more mdas in the 'from'
vg.

One example of where this makes a difference is with vgreduce.
If we try to vgreduce a PV with un-ignored mdas, this should trigger
the balancing function to un-ignore mdas on another PV in the VG.
However, we don't get to vg_write() before we fail because this
list size check fails, and we see an error message indicating:
"Cannot remove final metadata area ..."

Another example is with vgsplit into a new VG, where the PVs
being moved contain all ignored mdas.  We must move the mdas on
fid->metadata_areas_ignored from 'vg_from' to 'vg_to'.

Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
2010-06-28 20:38:56 +00:00