mirror of git://sourceware.org/git/lvm2.git synced 2024-10-28 03:27:58 +03:00
Commit Graph

625 Commits

Author SHA1 Message Date
Alasdair G Kergon
7cff640d9a activation: Fix upgrades using uuid suffixes.
2.02.106 added suffixes to some LV uuids in the kernel.

If any of these LVs is activated with 2.02.105 or earlier,
and then a later version is used, the LVs appear invisible and
activation commands fail.

The code now has to check the kernel for both old and new uuids.
2014-07-30 21:55:11 +01:00
Alasdair G Kergon
52217f6ebd raid: Fix partial activation logic for non-raid. 2014-07-23 16:13:12 +01:00
Alasdair G Kergon
99e3c13012 raid: Moved degraded activation code to raid_manip.
Adjust some messages & fn names.
2014-07-22 20:50:29 +01:00
Alasdair G Kergon
513fd029a6 config: Adjust description of activation_mode. 2014-07-21 15:50:47 +01:00
Zdenek Kabelac
c0c1ada88e pool: callback handle cache
Extend the callback functionality to also handle cache pools.

cache_check is now executed on cachepool metadata when
it's activated and deactivated.
2014-07-11 12:57:45 +02:00
Jonathan Brassow
be75076dfc activation: Add "degraded" activation mode
Currently, we have two modes of activation, an unnamed nominal mode
(which I will refer to as "complete") and "partial" mode.  The
"complete" mode requires that a volume group be 'complete' - that
is, no missing PVs.  If there are any missing PVs, no affected LVs
are allowed to activate - even RAID LVs which might be able to
tolerate a failure.  The "partial" mode allows anything to be
activated (or at least attempted).  If a non-redundant LV is
missing a portion of its addressable space due to a device failure,
it will be replaced with an error target.  RAID LVs will either
activate or fail to activate depending on how badly their
redundancy is compromised.

This patch adds a third option, "degraded" mode.  This mode can
be selected via the '--activationmode {complete|degraded|partial}'
option to lvchange/vgchange.  It can also be set in lvm.conf.
The "degraded" activation mode allows RAID LVs with a sufficient
level of redundancy to activate (e.g. a RAID5 LV with one device
failure, a RAID6 with two device failures, or RAID1 with n-1
failures).  RAID LVs with too many device failures are not allowed
to activate - nor are any non-redundant LVs that may have been
affected.  This patch also makes the "degraded" mode the default
activation mode.

The degraded activation mode does not yet work in a cluster.  A
new cluster lock flag (LCK_DEGRADED_MODE) will need to be created
to make that work.  Currently, there is limited space for this
extra flag and I am looking for possible solutions.  One possible
solution is to usurp LCK_CONVERT, as it is not used.  When the
locking_type is 3, the degraded mode flag simply gets dropped and
the old ("complete") behavior is exhibited.
2014-07-09 22:56:11 -05:00
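A usage sketch for the new option (the VG/LV names here are hypothetical):

~> vgchange -ay --activationmode degraded vg
~> lvchange -ay --activationmode degraded vg/raid5lv

The same default can be set in lvm.conf via the activation_mode key:

activation {
    activation_mode = "degraded"
}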
Peter Rajnoha
cfed0d09e8 report: select: refactor: move percent handling code to libdm for reuse 2014-06-17 16:27:21 +02:00
Zdenek Kabelac
2f260c9909 activation: retry cleanup deactivation
Enable 'retry' deactivation also in the 'cleanup' phase.
It mostly shouldn't be needed - however udev now produces
more and more completely non-synchronizable device opens,
so even for orphan devices we can't easily predict when
udevd opens devices.

So it's preferable here to log an error about the device being open
and retry the cleanup, but let the command proceed.
2014-06-10 10:51:24 +02:00
Zdenek Kabelac
2adaef8272 revert: restore original timeout
Accidentally committed - but it has also shown
that on heavily loaded systems (like our test machines can be)
a slightly bigger timeout, which waits longer for udev rule
processing, does help and avoids the occasional refusal of deactivation
because the device is still open.
(i.e. lvcreate...; lvchange -an...)

Unsure how we could synchronize for this now. On a very slow (or loaded)
system a 5 second timeout is simply not enough.

TODO: introduce at least an lvm.conf configurable setting to
allow longer 'retry' loops.
2014-05-28 15:33:41 +02:00
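Until such a setting exists, the retry loop itself is governed by the existing retry_deactivation key in lvm.conf (a sketch; the loop length itself is not configurable here):

activation {
    # Retry deactivation of LVs that are still briefly held open (e.g. by udev).
    retry_deactivation = 1
}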
Zdenek Kabelac
ae43d1afa2 activate: cleanup lv_check_not_in_use
Reindent lv_check_not_in_use to simplify the internal loop code.
Also always return '0/1' (drop -1) - since we only
check for failure (0) and we don't really know
why lv_info() failed.
2014-05-27 17:08:49 +02:00
Zdenek Kabelac
cb7bba9ffe dev_manager: disable extra udev loop
Disable the code which postprocessed the whole tree and reset udev flags.
We need to find out which case was troublesome - since this loop
was just hiding bugs in other code parts (most probably the preload tree).
2014-05-23 21:36:55 +02:00
Zdenek Kabelac
675fcfe9b7 devmapper: fix compilation without devmapper
Fix compilation when configured with the --disable-devmapper option.
2014-04-30 10:26:29 +02:00
Alasdair G Kergon
b5f8f452ac tools: Add --readonly support.
Offer lock-free access to display virtual machine or clustered VG metadata
while it might be in use.
2014-04-18 02:46:34 +01:00
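For example, with the reporting tools (a sketch; the VG name is hypothetical):

~> vgs --readonly vg
~> lvs --readonly vg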
Zdenek Kabelac
9eab84aa2b debug: catch invalid request for tree
In general, for non-toplevel LVs we shouldn't allow any _tree_action.
For now, error on a request for cache_pool activation, which
doesn't even exist in the dm table.
2014-04-08 11:00:15 +02:00
Zdenek Kabelac
96cf9dc017 raid: use internal variables for array alloc
Don't use the passed pointer when allocating the policies array.
(If policy_argc were NULL, this would have caused a
NULL dereference.)
2014-04-08 11:00:13 +02:00
Zdenek Kabelac
e2ea3cd7ba cleanup: cache use const char policy
Policy should be a const char pointer.
2014-04-01 20:54:09 +02:00
Zdenek Kabelac
a018c57f0b cache: never activate cache pool
Since a cache-pool is a purely lvm abstraction-layer LV, it never
needs any device node, so do not add even an 'error' device for it.
2014-04-01 20:17:10 +02:00
Zdenek Kabelac
356fdda46d lv_manip: drop cmd pointer from for_each_sub_lv
Drop the unused cmd pointer passed to this function.

TODO:

We have two similar functions (though not identical)

lv_manip.c: for_each_sub_lv()
metadata.c: _lv_each_dependency()

They do not always match - we should probably converge
on a single function.
2014-03-27 13:10:13 +01:00
Zdenek Kabelac
4a6f05e420 cleanup: use trigraph 2014-03-25 11:22:58 +01:00
Zdenek Kabelac
0ca16c6946 activate: report release with critical section
This function is typically called on cmd context refresh or destroy.
In the non-clustered case we have already unlocked all messages;
however, when e.g. 'clvmd' gets a break signal it may
still have a couple of messages queued.

For now just report an error.
2014-03-21 22:29:22 +01:00
Zdenek Kabelac
c0f1eb5f0f dev_manager: check prohibited devices earlier
Reorder detection of internal devices - since this test
is much simpler than target analysis, check it sooner.

Replace the test for '68' with sizeof & ID_LEN.

Add a FIXME about the device alias problem with is_reserved_lvname,
since this test fails on devices like /dev/dm-X,
so we need to convert the tests to use UUIDs.
2014-03-12 19:38:34 +01:00
Zdenek Kabelac
4cc5c689b8 thin: add pool uuid suffix for pool volume
Even though we make the pool volume a publicly visible LV,
we still do not want tools to look at this volume.

While we do not create the /dev/vg/lv link, the device is still
accessible via /dev/mapper/vg-lv and there is no easy
way to recognize it is private without lvm2 metadata.

Enhance the UUID with a -pool suffix and directly skip
any LV with such a suffix in the device_is_usable() call.

TODO: enhance other targets with this logic.
blkid could probably use the same simple logic.
2014-03-12 00:16:27 +01:00
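The suffix is visible in the kernel's DM uuid (a sketch; the names and uuid shown are placeholders):

~> dmsetup info -c -o name,uuid vg-pool
Name     UUID
vg-pool  LVM-<vg_uuid><lv_uuid>-pool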
Zdenek Kabelac
6a0d97a65c lvm: change build_dm_uuid API
Pass 'lv' directly into this build routine,
so we can eventually add more private UUID suffixes.
2014-03-12 00:16:20 +01:00
Zdenek Kabelac
4d64e91efd thin: do not check for empty pool with messages
An empty pool is also a pool which still has a queued list of messages
and transaction_id == 1.

The problem is exposed when the pool is created inactive.

lvcreate -L10 -T vg/pool -an
lvcreate -V10 -T vg/pool
2014-03-12 00:15:22 +01:00
Zdenek Kabelac
07ba047116 cleanup: relocate segment flags
Move flags for segments to the segtype header where they seem more closely
related, as the features are related to the segtype and not to activation.

Use unsigned #define - since it's more common in lvm2 source code
for bit flags.
2014-02-27 14:46:11 +01:00
Zdenek Kabelac
40e6176d25 snapshots: fix incorrect calculation of cow size
The code uses the target driver version for a better estimate of
the maximum size of the COW device for a snapshot.

The bug can be tested with this script:
VG=vg1
lvremove -f $VG/origin
set -e
lvcreate -L 2143289344b -n origin $VG
lvcreate -n snap -c 8k -L 2304M -s $VG/origin
dd if=/dev/zero of=/dev/$VG/snap bs=1M count=2044 oflag=direct

The bug happens when these two conditions are met:
* origin size is divisible by (chunk_size/16) - so that the last
  metadata area is filled completely
* the miscalculated snapshot metadata size is divisible by extent size -
  so that there is no padding to extent boundary which would otherwise
  save us

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
2014-02-26 14:25:09 +01:00
Zdenek Kabelac
95fe823eba raid: use feature attributes for raid10
Test raid10 availability as a target feature (instead of doing
it in all the places where raid10 should be checked).

TODO: activation needs runtime validation - so metadata with raid10
is skipped from activation in a user-friendly way in lvm2.
2014-02-24 21:10:13 +01:00
Zdenek Kabelac
9974136b90 cleanup: indent 2014-02-17 22:25:53 +01:00
Zdenek Kabelac
f0f4248333 activation: drop test of r/w vg state for activating LV
VG status read/write is meant to influence only VG metadata.
It's not related to the read/write status of the LV itself.
2014-02-15 11:34:54 +01:00
Jonathan Brassow
96626f64fa cache: Code to allow the create/remove of cache LVs
This patch allows users to create cache LVs with 'lvcreate'.  An origin
or a cache pool LV must be created first.  Then, while supplying the
origin or cache pool to the lvcreate command, the cache can be created.

Ex1:
Here the cache pool is created first, followed by the origin which will
be cached.
~> lvcreate --type cache_pool -L 500M -n cachepool vg /dev/small_n_fast
~> lvcreate --type cache -L 1G -n lv vg/cachepool /dev/large_n_slow

Ex2:
Here the origin is created first, followed by the cache pool - allowing
a cache LV to be created covering the origin.
~> lvcreate -L 1G -n lv vg /dev/large_n_slow
~> lvcreate --type cache -L 500M -n cachepool vg/lv /dev/small_n_fast

The code determines which type of LV was supplied (cache pool or origin)
by checking its type.  It ensures the right argument was given by checking
that the origin is larger than the cache pool.

If the user wants to remove just the cache for an LV, they specify
the LV's associated cache pool when removing:
~> lvremove vg/cachepool

If the user wishes to remove the origin, but leave the cachepool to be
used for another LV, they specify the cache LV.
~> lvremove vg/lv

In order to remove it all, specify both LVs.

This patch also includes tests to create and remove cache pools and
cache LVs.
2014-02-04 16:50:16 -06:00
Jonathan Brassow
75b8ea195c cache: New functions for gathering info on cache devices
Building on the new DM function that parses DM cache status, we
introduce the following LVM-level functions to acquire information
about cache devices:
- lv_cache_block_info: retrieves information on the cache's block/chunk usage
- lv_cache_policy_info: retrieves information on the cache's policy
2014-01-28 12:24:51 -06:00
Zdenek Kabelac
902b343e0e thin: validate resize of thin LV with ext. origin
When a thin volume is using an external origin, the current thin target
is not able to supply the 'extended' size with empty pages.

lvm2 detects the version and, in this case, disables extension of the LV
past the external origin size.

The thin LV can however still be reduced and extended freely below
this size.
2014-01-23 14:20:34 +01:00
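A sketch of the resulting behaviour (names are hypothetical; assumes a thin snapshot of an inactive external origin created with lvcreate -s):

~> lvcreate -s vg/extorigin --thinpool vg/pool -n thinlv
~> lvextend -L+1G vg/thinlv    # refused past the size of extorigin on affected targets
~> lvreduce -L512M vg/thinlv   # resizing below that size still works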
Zdenek Kabelac
c3d82d717c Revert "tree_action: destroy devices from failing activation"
This reverts commit 24639be558.

Ok - it seems we were a bit too active here - we
may remove devices which are unusable for reasons we are not
aware of - thus taking down the whole device could be way too big a hammer.

So we still need some solution to recover from failing preload
and activation - but it needs more tuning.
2013-12-17 15:21:28 +01:00
Zdenek Kabelac
24639be558 tree_action: destroy devices from failing activation
When activation fails we may leak a large tree of partially loaded
devices in the dm table (e.g. a failure in snapshot activation).

The best we can do here is to try to deactivate the whole device and
remove as many inactive table entries as we can.
2013-12-17 14:08:54 +01:00
Zdenek Kabelac
1200b7e7c2 thin: deactivation of merging thin snapshot
Before trying to deactivate a merging thin snapshot
(which is invisible), check that it is not in use.
2013-12-04 14:30:26 +01:00
Zdenek Kabelac
664a695561 thin: merge support for device tree
When a thin snapshot merge is requested, the tree must detect
whether the user tries to activate such an LV while the origin or
the snapshot is still active.
2013-12-04 14:30:25 +01:00
Zdenek Kabelac
572983d793 thin: read table line with thin device id
Add functions to parse the thin table line to
obtain the thin device id.
2013-12-04 14:30:25 +01:00
Zdenek Kabelac
84b3852ee5 snapshots: use lv_check_not_in_use
Switch from a simple 'open_count' test on an opened snapshot
to a more 'skilled' lv_check_not_in_use().
2013-12-04 14:30:24 +01:00
Zdenek Kabelac
6bf6430ae9 cleanup: convert log_error with log_warn
Collapse 2 ifs and replace log_error() with log_warn(), since
the reported message does not cause a tool error
(and probably cannot be triggered anyway).
2013-11-28 12:48:01 +01:00
Zdenek Kabelac
79991aa769 snapshot: drop find_merging_snapshot
Drop the find_merging_snapshot() function. Use find_snapshot()
called after a check for lv_is_merging_origin(), which
is the commonly used code path - so we avoid duplicated
tests and the potential risk of dereferencing a NULL pointer
in an unhandled error path.
2013-11-28 12:42:43 +01:00
Zdenek Kabelac
069fa6c49d activate: modify read_only when dev_manager exists
Change opts only when dm has been successfully created,
so on the error path we leave the structure unmodified.
2013-11-22 20:58:13 +01:00
Alasdair G Kergon
527db4645f gcc: replace #ifdef linux with __linux__ 2013-11-13 13:56:29 +00:00
Zdenek Kabelac
c3e674ad30 activation: _lv_activate is ok when filtered.
If the volume_list filters out a volume from activation,
it is still a success result for this function.
Change the error message back to verbose level.

Detect whether the volume is active locally before zeroing,
so we report the error a bit later for cases where the volume
could not be activated because it doesn't pass through the volume
list (but the user could still create the volume with
zeroing disabled).
2013-11-01 13:02:36 +01:00
Jonathan Brassow
d5896f0afd Mirror: Fix hangs and lock-ups caused by attempting label reads of mirrors
There is a problem with the way mirrors have been designed to handle
failures that is resulting in stuck LVM processes and hung I/O.  When
mirrors encounter a write failure, they block I/O and notify userspace
to reconfigure the mirror to remove failed devices.  This process is
open to a couple races:
1) Any LVM process other than the one that is meant to deal with the
mirror failure can attempt to read the mirror, fail, and block other
LVM commands (including the repair command) from proceeding due to
holding a lock on the volume group.
2) If there are multiple mirrors that suffer a failure in the same
volume group, a repair can block while attempting to read the LVM
label from one mirror while trying to repair the other.

Mitigation of these races has been attempted by disallowing label reading
of mirrors that are either suspended or are indicated as blocking by
the kernel.  While this has closed the window of opportunity for hitting
the above problems considerably, it hasn't closed it completely.  This is
because it is still possible to start an LVM command, read the status of
the mirror as healthy, and then perform the read for the label at the
moment just after the failure is discovered by the kernel.

I can see two solutions to this problem:
1) Allow users to configure whether mirrors can be candidates for LVM
labels (i.e. whether PVs can be created on mirror LVs).  If the user
chooses to allow label scanning of mirror LVs, it will be at the expense
of a possible hang in I/O or LVM processes.
2) Instrument a way to allow asynchronous label reading - allowing
blocked label reads to be ignored while continuing to process the LVM
command.  This would allow LVM commands to continue even
though they would have otherwise blocked trying to read a mirror.  They
can then release their lock and allow a repair command to commence.  In
the event of #2 above, the repair command already in progress can continue
and repair the failed mirror.

This patch brings solution #1.  If solution #2 is developed later on, the
configuration option created in #1 can be negated - allowing mirrors to
be scanned for labels by default once again.
2013-10-22 19:14:33 -05:00
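A sketch of the resulting lvm.conf knob (assuming the option added here is devices/ignore_lvm_mirrors):

devices {
    # 1 = never scan mirror LVs for PV labels (avoids the hangs described above)
    # 0 = allow PVs on mirror LVs, at the risk of blocked label reads
    ignore_lvm_mirrors = 1
}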
Peter Rajnoha
039bdad732 activation: flag temporary LVs internally
Add an LV_TEMPORARY flag for LVs with limited existence during command
execution. Such LVs are temporary in that they need to be activated,
have some action done, and then be removed immediately. Such LVs are just like
any normal LV - the only difference is that they are removed during
LVM command execution. This is also the case for LVs representing
future pool metadata spare LVs, which we need to initialize by using
the usual LV before they are declared as pool metadata spare.

We can optimize some other parts like udev to do a better job if
it knows that the LV is temporary and any processing on it is just
useless.

This flag is orthogonal to the LV_NOSCAN flag introduced recently,
as the LV_NOSCAN flag is primarily used to mark an LV so that scanning
is avoided before the zeroing of the device happens. The LV_TEMPORARY
flag distinguishes between a full-fledged LV visible in the system
and an LV just used as a temporary overlay for some action that needs to
be done on the underlying PVs.

For example: lvcreate --thinpool POOL --zero n -L 1G vg

- first, the usual LV is created to do a cleanup for the pool metadata
  spare. The LV is activated, zeroed, deactivated.

- between "activated" and "zeroed" stage, the LV_NOSCAN flag is used
  to avoid any scanning in udev

- betwen "zeroed" and "deactivated" stage, we need to avoid the WATCH
  udev rule, but since the LV is just a usual LV, we can't make a
  difference. The LV_TEMPORARY internal LV flag helps here. If we
  create the LV with this flag, the DM_UDEV_DISABLE_DISK_RULES
  and DM_UDEV_DISABLE_OTHER_RULES flag are set (just like as it is
  with "invisible" and non-top-level LVs) - udev is directed to
  skip WATCH rule use.

- if the LV_TEMPORARY flag were not used, there would normally be
  a WATCH event generated once the LV is closed after the "zeroed"
  stage. This would cause problems with the immediate deactivation that
  follows.
2013-10-23 14:09:37 +02:00
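The udev flags carried in the cookie can be decoded for debugging with dmsetup (the cookie value here is hypothetical):

~> dmsetup udevflags 8389733   # prints the DM_UDEV_DISABLE_* flags encoded in a DM_COOKIE value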
Peter Rajnoha
48df36b8c5 activation: check for open count with a timeout before removal/deactivation of an LV
This patch reinstates the lv_info call to check the open count of
the LV we're removing/deactivating - this was changed with commit 125712b
some time ago, and since then we have relied on the ioctl retry logic deeper
in libdm while calling the actual 'remove' ioctl.

However, there are still some situations in which it's required to
check the open count before we do any 'remove' actions - this mainly
applies to LVs which consist of several sub-LVs, as is the case for
virtual snapshot devices.

Commit 1146691 fixed the issue with the ordering of actions during
virtual snapshot removal while the snapshot is still open. But
the check for the open status of the snapshot is still prone to
marking the snapshot as in use, with an immediate exit, even though
this could be only a temporary asynchronous open - most notably
because of udev and its WATCH rule with the accompanying scans
for the event, which are asynchronous. The situation where this crops
up most often is when we're closing an LV that was open for read-write
and then calling lvremove immediately.

This patch reinstates the original lv_info call for the open status
of the LV in the lv_check_not_in_use fn that gets called before
we do any LV removal/deactivation. In addition to the original logic,
this patch adds its own retry loop with a delay (25 x 0.2 seconds,
i.e. up to 5 seconds) besides the existing ioctl retry loop.
2013-10-15 12:44:42 +02:00
Jonathan Brassow
d97583cfd3 RAID: Better error message when attempting scrubbing op on thinpool LV
Component LVs of a thinpool can be RAID LVs.  Users who attempt a
scrubbing operation directly on a thinpool will be prompted to
specify the sub-LV they wish the operation to be performed on.  If
neither of the sub-LVs is RAID, then a message telling them that
the operation can only be performed on a RAID LV will be given.
2013-10-14 15:14:16 -05:00
Zdenek Kabelac
1146691afc snapshot: deactivate virtual snapshot first
Since the virtual snapshot has no reason to stay alive once we
detach the related snapshot, deactivate the whole thing before
snapshot removal - otherwise the code would get tricky to
support in a cluster.

The correct full solution would require transactions
for libdm operations.

Also enable the check for the snapshot being open prior to
the origin deactivation, otherwise we could easily end up
with the origin deactivated but the snapshot still kept
active, desynchronizing the locking state in a cluster.
2013-10-14 00:25:15 +02:00
Peter Rajnoha
ce7489ed22 activation: add support for flagging an LV to skip udev scanning during activation
A common scenario is during new LV creation, when we need to wipe the
newly created LV and avoid any udev scanning before this stage, otherwise
the device (the LV) could be claimed by some other subsystem
for which there was stale metadata within the LV data.

This patch adds the possibility to mark the LV we're just about to wipe with
a flag that gets passed to udev via DM_COOKIE as a subsystem-specific
flag - DM_SUBSYSTEM_UDEV_FLAG0 (in this case the subsystem is "LVM") -
so LVM udev rules will take care of handling it.
2013-10-08 13:43:14 +02:00
Peter Rajnoha
b4637bd298 fix: make it possible to compile with --disable-devmapper again
Some code has been added recently which makes it impossible to compile
when "configure --disable-devmapper" is used. This patch just shuffles
the code around so it's under the proper #ifdef DEVMAPPER_SUPPORT.
2013-09-27 13:58:55 +02:00