1
0
mirror of git://sourceware.org/git/lvm2.git synced 2024-12-22 17:35:59 +03:00
Commit Graph

392 Commits

Author SHA1 Message Date
Zdenek Kabelac
a018c57f0b cache: never activate cache pool
Since cache-pool is purely lvm abstraction layer LV, it never
need any device node, so do not add even 'error' device for it.
2014-04-01 20:17:10 +02:00
Zdenek Kabelac
c0f1eb5f0f dev_manager: check prohibited devices earlier
Reorder detection for internal device - since this test
is much simpler then target analysis, check it sooner.

Replace test for '68' with sizeof & ID_LEN

Add FIXME about device alias problem with is_reserved_lvname,
since this test fails on devices like /dev/dm-X
so we need to convert tests to UUID.
2014-03-12 19:38:34 +01:00
Zdenek Kabelac
4cc5c689b8 thin: add pool uuid suffix for pool volume
Even though we make pool volume as a public visible LV,
we still do not want tools to look at this volume.

While we do not create /dev/vg/lv link, device is still
accessible via /dev/mapper/vg-lv and there is no easy
way to recognize it's private without lvm2 metadata.

Enhance UUID with -pool suffix and directly skip
any LV with a suffix in  device_is_usable() call.

TODO: enhance other targets with this logic.
blkid may probably use same simple logic.
2014-03-12 00:16:27 +01:00
Zdenek Kabelac
6a0d97a65c lvm: change build_dm_uuid API
Pass directly 'lv' into this build routine,
so we can eventually add more private UUID suffixes.
2014-03-12 00:16:20 +01:00
Zdenek Kabelac
4d64e91efd thin: do not check of empty pool with messages
The empty pool is also the pool which has yet queued list of messages
and transaction_id == 1.

Problem is exposed when pool is created inactive.

lvcreate -L10 -T vg/pool -an
lvcreate -V10 -T vg/pool
2014-03-12 00:15:22 +01:00
Zdenek Kabelac
9974136b90 cleanup: indent 2014-02-17 22:25:53 +01:00
Zdenek Kabelac
f0f4248333 activation: drop test r/w vg state for activing LV
VG status read/write is meant to influence only VG metadata.
It's not related to the read/write status of the LV itself.
2014-02-15 11:34:54 +01:00
Jonathan Brassow
96626f64fa cache: Code to allow the create/remove of cache LVs
This patch allows users to create cache LVs with 'lvcreate'.  An origin
or a cache pool LV must be created first.  Then, while supplying the
origin or cache pool to the lvcreate command, the cache can be created.

Ex1:
Here the cache pool is created first, followed by the origin which will
be cached.
~> lvcreate --type cache_pool -L 500M -n cachepool vg /dev/small_n_fast
~> lvcreate --type cache -L 1G -n lv vg/cachepool /dev/large_n_slow

Ex2:
Here the origin is created first, followed by the cache pool - allowing
a cache LV to be created covering the origin.
~> lvcreate -L 1G -n lv vg /dev/large_n_slow
~> lvcreate --type cache -L 500M -n cachepool vg/lv /dev/small_n_fast

The code determines which type of LV was supplied (cache pool or origin)
by checking its type.  It ensures the right argument was given by ensuring
that the origin is larger than the cache pool.

If the user wants to remove just the cache for an LV.  They specify
the LV's associated cache pool when removing:
~> lvremove vg/cachepool

If the user wishes to remove the origin, but leave the cachepool to be
used for another LV, they specify the cache LV.
~> lvremove vg/lv

In order to remove it all, specify both LVs.

This patch also includes tests to create and remove cache pools and
cache LVs.
2014-02-04 16:50:16 -06:00
Jonathan Brassow
75b8ea195c cache: New functions for gathering info on cache devices
Building on the new DM function that parses DM cache status, we
introduce the following LVM level functions to aquire information
about cache devices:
- lv_cache_block_info: retrieves information on the cache's block/chunk usage
- lv_cache_policy_info: retrieves information on the cache's policy
2014-01-28 12:24:51 -06:00
Zdenek Kabelac
c3d82d717c Revert "tree_action: destroy devices from failing activation"
This reverts commit 24639be558.

Ok - seems we could be here a bit too active - and we
may remove devices which are unsuable for reasons we are not
aware of - thus taking down whole device could be way to big hammer.

So we still need some solution to recover from failing preload
and activation - but it needs more tunning.
2013-12-17 15:21:28 +01:00
Zdenek Kabelac
24639be558 tree_action: destroy devices from failing activation
When activation fails - we may leak large tree of partially loaded
devices in the dm table (i.e. failure in snapshot activation)

The best we can do here is try to deactivate whole device and
remove as much inactive table entries as we can.
2013-12-17 14:08:54 +01:00
Zdenek Kabelac
664a695561 thin: merge support for device tree
When thin snapshot merge is requested, tree must detect
if user tries to active such LV while origin or
snapshost is still active.
2013-12-04 14:30:25 +01:00
Zdenek Kabelac
572983d793 thin: read table line with thin device id
Add functions to parse thin table line to
obtain thin device id.
2013-12-04 14:30:25 +01:00
Zdenek Kabelac
6bf6430ae9 cleanup: convert log_error with log_warn
Collapse 2 ifs and replace log_error() with log_warn(), since\
the reported message is not causing tools error.
(and cannot be probably triggered anyway).
2013-11-28 12:48:01 +01:00
Zdenek Kabelac
79991aa769 snapshot: drop find_merging_snapshot
Drop find_merging_snapshot() function. Use find_snapshot()
called after check for lv_is_merging_origin() which
is the commonly used code path - so we avoid duplicated
tests and potential risk of derefering NULL point
in unhandled error path.
2013-11-28 12:42:43 +01:00
Jonathan Brassow
d5896f0afd Mirror: Fix hangs and lock-ups caused by attempting label reads of mirrors
There is a problem with the way mirrors have been designed to handle
failures that is resulting in stuck LVM processes and hung I/O.  When
mirrors encounter a write failure, they block I/O and notify userspace
to reconfigure the mirror to remove failed devices.  This process is
open to a couple races:
1) Any LVM process other than the one that is meant to deal with the
mirror failure can attempt to read the mirror, fail, and block other
LVM commands (including the repair command) from proceeding due to
holding a lock on the volume group.
2) If there are multiple mirrors that suffer a failure in the same
volume group, a repair can block while attempting to read the LVM
label from one mirror while trying to repair the other.

Mitigation of these races has been attempted by disallowing label reading
of mirrors that are either suspended or are indicated as blocking by
the kernel.  While this has closed the window of opportunity for hitting
the above problems considerably, it hasn't closed it completely.  This is
because it is still possible to start an LVM command, read the status of
the mirror as healthy, and then perform the read for the label at the
moment after a the failure is discovered by the kernel.

I can see two solutions to this problem:
1) Allow users to configure whether mirrors can be candidates for LVM
labels (i.e. whether PVs can be created on mirror LVs).  If the user
chooses to allow label scanning of mirror LVs, it will be at the expense
of a possible hang in I/O or LVM processes.
2) Instrument a way to allow asynchronous label reading - allowing
blocked label reads to be ignored while continuing to process the LVM
command.  This would action would allow LVM commands to continue even
though they would have otherwise blocked trying to read a mirror.  They
can then release their lock and allow a repair command to commence.  In
the event of #2 above, the repair command already in progress can continue
and repair the failed mirror.

This patch brings solution #1.  If solution #2 is developed later on, the
configuration option created in #1 can be negated - allowing mirrors to
be scanned for labels by default once again.
2013-10-22 19:14:33 -05:00
Peter Rajnoha
039bdad732 activation: flag temporary LVs internally
Add LV_TEMPORARY flag for LVs with limited existence during command
execution. Such LVs are temporary in way that they need to be activated,
some action done and then removed immediately. Such LVs are just like
any normal LV - the only difference is that they are removed during
LVM command execution. This is also the case for LVs representing
future pool metadata spare LVs which we need to initialize by using
the usual LV before they are declared as pool metadata spare.

We can optimize some other parts like udev to do a better job if
it knows that the LV is temporary and any processing on it is just
useless.

This flag is orthogonal to LV_NOSCAN flag introduced recently
as LV_NOSCAN flag is primarily used to mark an LV for the scanning
to be avoided before the zeroing of the device happens. The LV_TEMPORARY
flag makes a difference between a full-fledged LV visible in the system
and the LV just used as a temporary overlay for some action that needs to
be done on underlying PVs.

For example: lvcreate --thinpool POOL --zero n -L 1G vg

- first, the usual LV is created to do a clean up for pool metadata
  spare. The LV is activated, zeroed, deactivated.

- between "activated" and "zeroed" stage, the LV_NOSCAN flag is used
  to avoid any scanning in udev

- betwen "zeroed" and "deactivated" stage, we need to avoid the WATCH
  udev rule, but since the LV is just a usual LV, we can't make a
  difference. The LV_TEMPORARY internal LV flag helps here. If we
  create the LV with this flag, the DM_UDEV_DISABLE_DISK_RULES
  and DM_UDEV_DISABLE_OTHER_RULES flag are set (just like as it is
  with "invisible" and non-top-level LVs) - udev is directed to
  skip WATCH rule use.

- if the LV_TEMPORARY flag was not used, there would normally be
  a WATCH event generated once the LV is closed after "zeroed"
  stage. This will make problems with immediated deactivation that
  follows.
2013-10-23 14:09:37 +02:00
Peter Rajnoha
ce7489ed22 activation: add support for flagging an LV to skip udev scanning during activation
A common scenario is during new LV creation when we need to wipe the
newly created LV and avoid any udev scanning before this stage otherwise
it could cause the device (the LV) to be claimed by some other subsystem
for which there were stale metadata within LV data.

This patch adds possibility to mark the LV we're just about to wipe with
a flag that gets passed to udev via DM_COOKIE as a subsystem specific
flag - DM_SUBSYSTEM_UDEV_FLAG0 (in this case the subsystem is "LVM")
so LVM udev rules will take care of handling that.
2013-10-08 13:43:14 +02:00
Zdenek Kabelac
85b9c12e92 cleanup: release all memory in error path
Just ensure no memory will stay in pool even in error path.
2013-09-23 11:35:15 +02:00
Alasdair G Kergon
c0f987949b activation: Fix segfault with inactive pvmove LV.
Set flag to avoid recursion back through an inactive pvmove LV when
populating deptree.
2013-08-28 22:56:23 +01:00
Jonathan Brassow
c95f17ea64 Mirror: Fix issue preventing PV creation on mirror LVs
Commit b248ba0a39 attempted to
prevent mirror devices which had a failed device in their
mirrored log from being usable/readable by LVM.  This was to
protect against circular dependancies where one LVM command
could be blocked trying to read one of these affected mirrors
while the LVM command to fix/unblock that mirror was stuck
behind the currently running command.

The above commit went wrong when it used 'device_is_usable()' to
recurse on the mirrored log device to check if it was suspended
or blocked.  The 'device_is_usable' function also contains a check
for reserved names - like *_mlog, etc.  This last check always
triggered when checking a mirror's log simply because of the name,
not because it was suspended or blocked - a false positive.

The solution is to create a new function like 'device_is_usable',
but without the check for reserved names.  Using this new function
(device_is_suspended_or_blocked), we can check the status of a
mirror's log device properly.
2013-08-07 17:42:26 -05:00
Zdenek Kabelac
aed4e9c703 coverity: pointer validation
Check for metadata_lv and make sure we have got proper thin pool segment.

Check we are working with merging snapshot when adding merging target.
2013-07-22 12:41:21 +02:00
Zdenek Kabelac
fd31cc9dfc cleanup: stack and remove braces
Add stack trace for error path.
Remove unneeded braces.
2013-07-18 18:16:17 +02:00
Zdenek Kabelac
9a06094824 thin: improve external origin tree creation
When tree for thin LVs was using external_lv, there has been
far less optimal solution, that has tried to add certain
existing dependencie only when new node was added.
However this has lead to way to complex tree construction since
many repeated checks have been made during such tree build.

This patch move this detection to the proper _partial_tree generation
code and uses for it new  'activation' flag, which is set when
tree for ACTIVATION or PRELOAD is generated.

It increases performance when thins with external origins are used.

(in release update)
2013-07-15 16:00:06 +02:00
Zdenek Kabelac
57be501aa3 dev_manager: lower memory usage
Created dlid for test is not needed afterward, so lower a memory
usage of this call is repeatedly used for building some large tree.

TODO: create function to use given buffer on stack as much cheaper.
2013-07-15 15:59:20 +02:00
Zdenek Kabelac
0443c42e3b thin: add sub volumes as whole volumes
Do not use origin_only when add log_lv and metadata as a subvolume.
The stacked volume needs to access whole volume in this case.
2013-07-15 15:58:07 +02:00
Zdenek Kabelac
97d36d5750 thin: check and use layered origin lv
Code needs to check if the layer origin device is suspended,
It's valid to create  thinvolume snapshot of thinvolume which is also
used as an old-style snapshot. In this case we need to check -real
is suspended.

When adding origin_only - add only layer thin volume.
(in case it's also old-snapshot add only -real device)
2013-07-15 15:51:39 +02:00
Zdenek Kabelac
55d90b6420 cleanup: update commented-out code part
Just make it in-sync with latest proposal.
2013-07-15 15:40:46 +02:00
Mike Snitzer
f9e0adcce5 snapshot: Rename snapshot segment returning methods from find_*_cow to find_*_snapshot
find_cow -> find_snapshot, find_merging_cow -> find_merging_snapshot.
Will thin snapshot code to reuse these methods without confusion.
2013-07-02 16:26:03 -04:00
Peter Rajnoha
d6a91da4be config: add profile arg to find_config_tree_bool 2013-07-02 15:19:09 +02:00
Peter Rajnoha
8ac4fcf8ff config: add profile arg to find_config_tree_str_allow_empty 2013-07-02 15:19:09 +02:00
Peter Rajnoha
eeb7b0f7fa config: add profile arg to find_config_tree_node 2013-07-02 15:19:09 +02:00
Zdenek Kabelac
17a3ddf89e cleanup: drop unused headers
Drop heades which do not provide any used symbols.
2013-06-16 00:07:32 +02:00
Alasdair G Kergon
c2dc21d89f text: miscellaneous comments & message tweaks 2013-06-15 01:28:54 +01:00
Alasdair G Kergon
2fbe1e6e00 rephrasing: miscellaneous changes
Miscellaneous changes to messages, man pages, comments and WHATS_NEW.
2013-05-15 01:50:42 +01:00
Peter Rajnoha
4407133113 toolcontext: check dm version lazily for udev_fallback setting
Setting the cmd->default_settings.udev_fallback also requires DM
driver version check. However, this caused useless mapper/control
access with ioctl if not needed actually. For example if we're not
using activation code, we don't need to know the udev_fallback as
there's no node and symlink processing.

For example, this premature mapper/control access caused problems
when using lvm2app even when no activation happens - there are
situations in which we don't need to use mapper/control, but still
need some of the lvm2app functionality. This is also the case for
lvm2-activation systemd generator which just needs to look at the
lvm2 configuration, but it shouldn't touch mapper/control.
2013-05-13 11:53:53 +02:00
Zdenek Kabelac
dd4fdce16c cleanup: drop unused assignment
Assigned values are unused.
2013-04-21 23:14:04 +02:00
Jonathan Brassow
c363c74a25 CLEAN-UP: Better string checking to avoid substring matches
Commit 9fd7ac7d03 introduced a way a
method of avoiding reading from mirrors with a device failure.  If
a device was found to be dead, the mapping table was checked for
'handle_errors' or 'block_on_error'.  These strings were checked for
in the table string via 'strstr', which could also match on strings
like, 'no_handle_errors' or 'no_block_on_error'.  No such strings
exist, but we don't want to have problems in the future if they do.
So, we check for ' <string>{'\0'|' '}'.
2013-04-12 11:30:04 -05:00
Jonathan Brassow
ff64e3500f RAID: Add scrubbing support for RAID LVs
New options to 'lvchange' allow users to scrub their RAID LVs.
Synopsis:
	lvchange --syncaction {check|repair} vg/raid_lv

RAID scrubbing is the process of reading all the data and parity blocks in
an array and checking to see whether they are coherent.  'lvchange' can
now initaite the two scrubbing operations: "check" and "repair".  "check"
will go over the array and recored the number of discrepancies but not
repair them.  "repair" will correct the discrepancies as it finds them.

'lvchange --syncaction repair vg/raid_lv' is not to be confused with
'lvconvert --repair vg/raid_lv'.  The former initiates a background
synchronization operation on the array, while the latter is designed to
repair/replace failed devices in a mirror or RAID logical volume.

Additional reporting has been added for 'lvs' to support the new
operations.  Two new printable fields (which are not printed by
default) have been added: "syncaction" and "mismatches".  These
can be accessed using the '-o' option to 'lvs', like:
	lvs -o +syncaction,mismatches vg/lv
"syncaction" will print the current synchronization operation that the
RAID volume is performing.  It can be one of the following:
        - idle:   All sync operations complete (doing nothing)
        - resync: Initializing an array or recovering after a machine failure
        - recover: Replacing a device in the array
        - check: Looking for array inconsistencies
        - repair: Looking for and repairing inconsistencies
The "mismatches" field with print the number of descrepancies found during
a check or repair operation.

The 'Cpy%Sync' field already available to 'lvs' will print the progress
of any of the above syncactions, including check and repair.

Finally, the lv_attr field has changed to accomadate the scrubbing operations
as well.  The role of the 'p'artial character in the lv_attr report field
as expanded.  "Partial" is really an indicator for the health of a
logical volume and it makes sense to extend this include other health
indicators as well, specifically:
        'm'ismatches:  Indicates that there are discrepancies in a RAID
                       LV.  This character is shown after a scrubbing
                       operation has detected that portions of the RAID
                       are not coherent.
        'r'efresh   :  Indicates that a device in a RAID array has suffered
                       a failure and the kernel regards it as failed -
                       even though LVM can read the device label and
                       considers the device to be ok.  The LV should be
                       'r'efreshed to notify the kernel that the device is
                       now available, or the device should be 'r'eplaced
                       if it is suspected of failing.
2013-04-11 15:33:59 -05:00
Jonathan Brassow
38f8f4a958 RAID: Capture new RAID kernel sync_action status fields
I've updated the dm_status_raid structure and dm_get_status_raid()
function to make it handle the new kernel status fields that will
be coming in dm-raid v1.5.0.  It is backwards compatible with the
old status line - initializing the new fields to '0'.  The new
structure is also more amenable to future changes.  It includes a
'reserved' field that is currently initialized to zero but could
be used to hold flags describing new features.  It also now uses
pointers for the character strings instead of attempting to allocate
their space along with the structure (causing the size of the
structure to be variable).  This allows future fields to be appended.

The new fields that are available are:
 - sync_action : shows what the sync thread in the kernel is doing
                 (idle, frozen, resync, recover, check, repair, or
                 reshape)
 - mismatch_count: shows the number of discrepancies which were
                   found or repaired by a "check" or "repair"
                   process, respectively.
2013-04-08 15:04:08 -05:00
Zdenek Kabelac
b9fe52e811 cleanup: move comment 2013-03-13 15:13:50 +01:00
Zdenek Kabelac
293a06c39a cleanup: indent 2013-03-13 15:13:42 +01:00
Peter Rajnoha
386886f71c config: refer to config nodes using assigned IDs
For example, the old call and reference:

  find_config_tree_str(cmd, "devices/dir", DEFAULT_DEV_DIR)

...now becomes:

  find_config_tree_str(cmd, devices_dir_CFG)

So we're referring to the named configuration ID instead
of passing the configuration path and the default value
is taken from central config definition in config_settings.h
automatically.
2013-03-06 10:14:33 +01:00
Zdenek Kabelac
71f4934500 activation: fix pvmove partial tree creation
Do not try to add LV again into the partial tree, if it's been
already added. Otherwise we may end in endless loop.
2013-02-23 12:09:12 +01:00
Zdenek Kabelac
87331dc419 thin: add support for external origin
Add internal support for thin volume's external origin.
2013-02-23 10:36:58 +01:00
Zdenek Kabelac
3679bb1cd9 activation: simplify activation code
Reorder activation code to look similar for preload tree and
activation tree.

Its also give much better suppport for device stacking,
since now we also support activation of snapshot which might
be then used for other devices.
2013-02-23 10:30:03 +01:00
Zdenek Kabelac
0631d233d8 activation: add _add_layer_target_to_dtree
Add function for creation of simple linear mapping over layer device.
2013-02-23 10:29:08 +01:00
Zdenek Kabelac
520cc9a7f8 thin: replace _thin_layer with lv_layer()
Use consitently lv_layer function internally for thin pool layer name.
2013-02-23 10:28:04 +01:00
Zdenek Kabelac
78b23f3595 activation: extend _cached_info
Add layer string to support check of layered devices.
2013-02-23 10:28:01 +01:00
Jonathan Brassow
f5cd9c3563 clean-up: Another functiont that can use 'lv_layer'
lib/activate/dev_manager.c:dev_manager_raid_status() can also use
the new 'lv_layer' function.
2013-02-04 17:10:16 -06:00
Zdenek Kabelac
a4870c79ca thin: use noflush for obtaining transaction_id
Do not flush thin pool data, when reading transation_id status.
2013-02-04 19:05:56 +01:00
Zdenek Kabelac
8ed0b6f312 thin: replace is_active with send_messages
Since is_active is only used for thinp
replace struct member with more meaningful
send_messages flag
2013-02-04 19:01:10 +01:00
Zdenek Kabelac
4af4241ba4 use lv_layer 2013-02-04 19:01:10 +01:00
Zdenek Kabelac
9f433e6ee3 cleanup: postpone lv_is_thin_volume check
Code move to make it easier to follow and
call _add_dev_to_dtree() in the separate if() branch
for thin volumes.
2013-02-04 19:00:19 +01:00
Jonathan Brassow
c8242e5cf4 RAID: Add RAID status accessibility functions
Similar to the way thin* accesses its kernel status, we add a method
for RAID to grab the various values in its status output without the
higher levels (LVM) having to understand how to parse the output.
Added functions include:
        - lib/activate/dev_manager.c:dev_manager_raid_status()
          Pulls the status line from the kernel

        - libdm/libdm-deptree.c:dm_get_status_raid()
          Parses status line and puts components into dm_status_raid struct

        - lib/activate/activate.c:lv_raid_dev_health()
          Accesses dm_status_raid to deliver raid dev_health string

The new structure and functions can provide a more unified way to access
status information.  ('lv_raid_percent' could switch to using these
functions, for example.)
2013-02-01 11:31:47 -06:00
Alasdair G Kergon
06abb2dd4c logging: classify log_debug messages
Place most log_debug() messages into a class.
2013-01-07 22:30:29 +00:00
Zdenek Kabelac
ec49f07b0d mirrors: fix leak in device_is_usable mirror check
Function _ignore_blocked_mirror_devices was not release
allocated strings images_health and log_health.

In error paths it was also not releasing dm_task structure.

Swaped return code of _ignore_blocked_mirror_devices and
use 1 as success.

In _parse_mirror_status use log_error if memory allocation
fails and few more errors so they are no going unnoticed
as debug messages.

On error path always clear return values and free strings.

For dev_create_file  use cache mem pool to avoid memleak.
2012-12-11 11:15:22 +01:00
Jonathan Brassow
b248ba0a39 mirror: Avoid reading mirrors with failed devices in mirrored log
Commit 9fd7ac7d03 did not handle mirrors
that contained mirrored logs.  This is because the status line of the
mirror does not give an indication of the health of the mirrored log,
as you can see here:
        [root@bp-01 lvm2]# dmsetup status vg-lv vg-lv_mlog
        vg-lv: 0 409600 mirror 2 253:6 253:7 400/400 1 AA 3 disk 253:5 A
        vg-lv_mlog: 0 8192 mirror 2 253:3 253:4 7/8 1 AD 1 core
Thus, the possibility for LVM commands to hang still persists when mirror
have mirrored logs.  I discovered this while performing some testing that
does polling with 'pvs' while doing I/O and killing devices.  The 'pvs'
managed to get between the mirrored log device failure and the attempt
by dmeventd to repair it.  The result was a very nasty block in LVM
commands that is very difficult to remove - even for someone who knows
what is going on.  Thus, it is absolutely essential that the log of a
mirror be recursively checked for mirror devices which may be failed
as well.

Despite what the code comment says in the aforementioned commit...
+ * _mirrored_transient_status().  FIXME: It is unable to handle mirrors
+ * with mirrored logs because it does not have a way to get the status of
+ * the mirror that forms the log, which could be blocked.
... it is possible to get the status of the log because the log device
major/minor is given to us by the status output of the top-level mirror.
We can use that to query the log device for any DM status and see if it
is a mirror that needs to be bypassed.  This patch does just that and is
now able to avoid reading from mirrors that have failed devices in a
mirrored log.
2012-10-25 00:42:45 -05:00
Jonathan Brassow
9fd7ac7d03 mirror: Avoid reading from mirrors that have failed devices
Addresses: rhbz855398 (Allow VGs to be built on cluster mirrors),
           and other issues.

The LVM code attempts to avoid reading labels from devices that are
suspended to try to avoid situations that may cause the commands to
block indefinitely.  When scanning devices, 'ignore_suspended_devices'
can be set so the code (lib/activate/dev_manager.c:device_is_usable())
checks any DM devices it finds and avoids them if they are suspended.

The mirror target has an additional mechanism that can cause I/O to
be blocked.  If a device in a mirror fails, all I/O will be blocked
by the kernel until a new table (a linear target or a mirror with
replacement devices) is loaded.  The mirror indicates that this condition
has happened by marking a 'D' for the faulty device in its status
output.  This condition must also be checked by 'device_is_usable()' to
avoid the possibility of blocking LVM commands indefinitely due to an
attempt to read the blocked mirror for labels.

Until now, mirrors were avoided if the 'ignore_suspended_devices'
condition was set.  This check seemed to suggest, "if we are concerned
about suspended devices, then let's ignore mirrors altogether just
in case".  This is insufficient and doesn't solve any problems.  All
devices that are suspended are already avoided if
'ignore_suspended_devices' is set; and if a mirror is blocking because
of an error condition, it will block the LVM command regardless of the
setting of that variable.

Rather than avoiding mirrors whenever 'ignore_suspended_devices' is
set, this patch causes mirrors to be avoided whenever they are blocking
due to an error.  (As mentioned above, the case where a DM device is
suspended is already covered.)  This solves a number of issues that weren't
handled before.  For example, pvcreate (or any command that does a
pv_read or vg_read, which eventually call device_is_usable()) will be
protected from blocked mirrors regardless of how
'ignore_suspended_devices' is set.  Additionally, a mirror that is
neither suspended nor blocking is /allowed/ to be read regardless
of how 'ignore_suspended_devices' is set.  (The latter point being the
source of the fix for rhbz855398.)
2012-10-23 23:10:33 -05:00
Zdenek Kabelac
cf8e1a0093 thin: origin only suspend
Skip tree creating when used with origin_only flag.
2012-10-03 15:05:55 +02:00
Zdenek Kabelac
eb08f86521 cleanup: initilize percent to INVALID
Always initialize percent to INVALID value, in case target
would have forget to setup this value somehow.
2012-08-23 14:38:48 +02:00
Zdenek Kabelac
fd417db274 check: add internal errors for unexpected paths
Adding couple INTERNAL_ERROR reports for unwanted parameters:

Ensure the 'top' metadata node cannot be NULL for lvmetad.

Make obvious vginfo2 cannot be NULL.

Report internal error if handler and vg is undefined.

Check for handle in poll_vg().

Ensure seg is not NULL in dev_manager_transient().

Report missing read_ahead for _lv_read_ahead_single().

Check for report handler in dm_report_object().

Check missing VG in _vgreduce_single().
2012-08-23 14:37:52 +02:00
Zdenek Kabelac
286cd2006b cleanup: drop unneeded included header files
This headers were not resolving anything used for compiled .c files.
Remove unused util.c file.
2012-08-23 14:37:20 +02:00
Alasdair Kergon
56d49cbf13 Re-enable partial activation of non-thin LVs until it can be fixed. (2.02.90)
- The test should be checking the LV as a whole, not just individual segments.
2012-05-16 12:50:14 +00:00
Alasdair Kergon
067184f32d Handle replacement of an active device that goes missing with an error device.
(E.g. lvchange --refresh --partial on striped LV if a PV disappeared.)
2012-04-24 00:51:26 +00:00
Jonathan Earl Brassow
c62f9f0b2f Unlike 'mirror' segtype, 'raid1' should perform flush on suspend.
The 'mirror' segtype and 'raid1' segtype both set the 'MIRRORED' flag.
However, due to differences in the way these device-mapper targets behave
'mirror' must be suspended with the 'noflush' option and 'raid1' does not
have to be.

This patch ensures that when the 'MIRRORED' flag is checked to see if
'noflush' is needed that it does not also set it for 'raid1' by mistake.
2012-04-20 14:17:44 +00:00
Zdenek Kabelac
e866931169 Improve thin_check option passing
Update a way we handle option passing - so we now support path and options
with space inside.
Fix dm name usage for thin pools with '-' in name.
Use new lvm.conf option thin_check_options to pass in options as string array.
2012-03-14 17:12:05 +00:00
Zdenek Kabelac
aeaec150c0 Some more missing supposedly 64bit operations.
Avoid use 32bit math for extent_size.
2012-03-05 15:05:24 +00:00
Zdenek Kabelac
975b5b42d2 Improve warning
Use thin_dump --repair suggestion in log error message
and use just warning on  deactivation path without repair info
(since node has been deactivated).

Also check whether there is not 16 args for thin_check configured.
2012-03-05 14:15:50 +00:00
Zdenek Kabelac
6c7a6c07ee Add support for thin check
Use libdm callback to execute thin_check before activation
thin pool and after deactivation as well.

Supporting thin_check_executable which may pass in extra options for
the tool.
2012-03-02 21:49:43 +00:00
Zdenek Kabelac
fbf6b89a84 Using enum types for enums
alloc_policy_t, dm_string_mangling_t, percent_range_t, sign_t
2012-02-28 14:24:57 +00:00
Jonathan Earl Brassow
a30832cedd Fix bug that caused RAID devices to be unable to activate if sub-LV was missing.
Commit 02f6f4902f introduced a bug that caused
RAID devices to fail to activate if the device for a single sub-LV failed.
The special case of LVM mirror was handled, but not LVM RAID.
EXAMPLE:
[root@bp-01 ~]# devices vg
  LV            Copy%  Devices
  lv            100.00 lv_rimage_0(0),lv_rimage_1(0)
  [lv_rimage_0]        /dev/sde1(1)
  [lv_rimage_1]        /dev/sdh1(1)
  [lv_rmeta_0]         /dev/sde1(0)
  [lv_rmeta_1]         /dev/sdh1(0)
[root@bp-01 ~]# vgchange -an vg
  0 logical volume(s) in volume group "vg" now active
[root@bp-01 ~]# off.sh sdh
Turning off sdh
[root@bp-01 ~]# vgchange -ay vg --partial
  Partial mode. Incomplete logical volumes will be processed.
  Couldn't find device with uuid fbI0YO-GX7x-firU-Vy5o-vzwx-vAKZ-feRxfF.
  Cannot activate vg/lv_rimage_1: all segments missing.
  0 logical volume(s) in volume group "vg" now active

AFTER this patch:
[root@bp-01 ~]# vgchange -ay vg --partial
  Partial mode. Incomplete logical volumes will be processed.
  Couldn't find device with uuid fbI0YO-GX7x-firU-Vy5o-vzwx-vAKZ-feRxfF.
  1 logical volume(s) in volume group "vg" now active
[root@bp-01 ~]# devices vg
  Couldn't find device with uuid fbI0YO-GX7x-firU-Vy5o-vzwx-vAKZ-feRxfF.
  LV            Copy%  Devices
  lv            100.00 lv_rimage_0(0),lv_rimage_1(0)
  [lv_rimage_0]        /dev/sde1(1)
  [lv_rimage_1]        unknown device(1)
  [lv_rmeta_0]         /dev/sde1(0)
  [lv_rmeta_1]         unknown device(0)
[root@bp-01 ~]# dmsetup table vg-lv; dmsetup status vg-lv
0 1024000 raid raid1 3 0 region_size 1024 2 253:2 253:3 - -
0 1024000 raid raid1 2 AD 1024000/1024000

No WHATSNEW update necessary because this is an intrarelease fix.

 brassow
2012-02-13 17:59:21 +00:00
Zdenek Kabelac
ab852ffe66 Disable partial activation for thin LVs and LVs with all missing segments
Count number of error and existing areas and if there is no existing area
for the LV avoid its activation.

Always disable partial activatio for thin volumes.

For mirrors currently put in hack to let it pass with a special name
since current mirror code needs to activate such LV during some operations.
2012-02-01 13:47:27 +00:00
Zdenek Kabelac
15fd61e492 Fix data% reporting
For reading % of mapped size of thin volume use as origin for
old style snapshot '-real' device needs to be queried.
Fix log_error report given for lvs -a in this case.
2012-01-28 20:12:26 +00:00
Alasdair Kergon
c3f0ed04a6 Make commented out code more obvious 2012-01-25 11:10:06 +00:00
Zdenek Kabelac
efc8ca105d Thin add support for origin_only suspend of thin volumes
Pass in the origin_only flag also for thin volumes - but curently the flag
is not used to its best.

FIXME: achieve the state where only  thin volume snapshot origin is
suspended without its childrens -  let's explore whether this may
happen automatically inside libdm (might be generic for other targets).
So the code would not need to annotate the node for this.
2012-01-25 09:10:13 +00:00
Zdenek Kabelac
78c3b21bfa Thin add messages only for activation tree
Extend lv_activate_opts with bool flag to know for which purpose
dtree is created - and add message only for activation tree
(since that's the only place that may send them).

Extend validation check for thin snapshot creation and test whether
active snapshot origin is suspended before its snapshot is created
(useful in recover scenarios) -  in this case also detect, whether
transaction has been already completed and avoid such suspend check
failure in that case.
2012-01-25 09:06:43 +00:00
Zdenek Kabelac
bdba904d7c Thin add lv_thin_pool_transaction_id
Easy function to get transaction_id status value.
2012-01-25 08:48:42 +00:00
Jonathan Earl Brassow
d5617bccab Fix the way RAID meta LVs are added to the dependency tree.
Similar to the "mirror" segment type's log device, _add_dev_to_dtree should
be called and not _add_lv_to_dtree when adding metadata sub-LVs to the deptree.
Since _add_lv_to_dtree was being called, 'origin_only' could be set if a
snapshot sits on top of the RAID device.  This would cause the actual device
that needed to be added to be skipped in favor of the non-existant device,
"<foo>-real".
2012-01-23 20:56:42 +00:00
Mike Snitzer
23e34c729b Differentiate between snapshot status of "Invalid" and "Merge failed". 2012-01-20 22:02:04 +00:00
Mike Snitzer
861c624acb Lookup snapshot usage percent of origin when a snapshot is merging. 2012-01-20 21:56:01 +00:00
Zdenek Kabelac
76ee08995e Thin add function to read thin volume percent
This value returns percentage of 'mapped' size compared with total LV size.
(Without passed seg pointer it return highest mapped size - but it's
not used yet.)
2012-01-19 15:27:54 +00:00
Zdenek Kabelac
6336898318 Thin updated support for thin pool percent
Support to check also for metadata percent
(By checking whether seg pointer is set)
2012-01-19 15:25:37 +00:00
Zdenek Kabelac
d8106dfee2 Thin rename seg var pool_metadata_lv to metadata_lv
Better fits the code.
2012-01-19 15:23:50 +00:00
Zdenek Kabelac
64e353daec Thin rename local static
Use '_' for local const char.
2012-01-19 15:19:18 +00:00
Alasdair Kergon
a18dcfb533 Add activation/read_only_volume_list to override LV permission in metadata. 2012-01-12 01:51:56 +00:00
Zdenek Kabelac
c0fcaacb8d Thin add dev_manager_thin_pool_percent
dev manager function to read percent info from thin pool.
2011-12-21 13:09:33 +00:00
Zdenek Kabelac
d3b4a0f322 Check lv pointer for NULL before derefence. 2011-12-21 12:59:22 +00:00
Zdenek Kabelac
0d59090eaf Thin move layer suffix into local static const 2011-12-21 12:55:22 +00:00
Alasdair Kergon
8dd6036da4 Add activation/use_linear_target enabled by default. (prajnoha)
LVM metadata knows only of striped segments - not linear ones.
The activation code detects segments with a single stripe and switches
them to use the linear target.

If the new lvm.conf setting is set to 0 (e.g. in a test script), this
'optimisation' is turned off.
2011-11-28 20:37:51 +00:00
Zdenek Kabelac
647c8edf82 Drop pool memory allocated in lv_has_target_type
Remove FIXMES - there should not be any pool free call since
the memory pool is from device manager, and pool is detroyed
after the operation, so doing extra free here would not help here.

However lv_has_target_type() is using cmd mempool so here the extra
call for dm_pool_free makes sence.
2011-11-18 19:42:03 +00:00
Zdenek Kabelac
3de08fc9de Thin clean
Reuse seg pointer already set in _add_lv_to_dtree to have the
value of first_seg(lv) (and is used in other parts of this function).
2011-11-15 17:25:05 +00:00
Zdenek Kabelac
ed2368538a Simplify iteration
Since nothing is removed in dm_list snapshot_segs during the loop,
there is no reason to use _safe iteration, so switch to simplier
dm_list_iterate().
2011-11-15 17:21:02 +00:00
Zdenek Kabelac
8ec016236a Thin fix tpool layer
Since we support snapshots of thin volumes, we could have more layers,
so we have to check whether tpool layer is going to be inserted.

As the _add_segment_to_dtree() is the only place that adds tpool
segment, we may just check pointer (no strcmp for layer).

Switch to use  seg_is_  function instead of lv_is_.
2011-11-15 17:15:03 +00:00
Zdenek Kabelac
a0c4e85c48 Add -tpool layer in activation tree
Let's put the overlay device over real thin pool device.
So we can get the proper locking on cluster.
Overwise the pool LV would be activate once implicitely
and in other case explicitely, confusing locking mechanism.
This patch make the activation of pool LV independent on
activation of thin LV since they will both implicitely use
real -thin pool device.
2011-11-03 14:52:09 +00:00
Zdenek Kabelac
5cc2f9a257 Avoid creation of /dev/vg/thinpool 2011-10-28 20:34:45 +00:00
Zdenek Kabelac
92cdc25882 Drop messages from lvm app context
(revert)
Thinp target uses activation context.
2011-10-17 14:18:07 +00:00
Zdenek Kabelac
7f815706ca Fix lv_info open_count test
When verify_udev_operations was disable, code for stacking fs operation for
lvm links was completely disable - but this code was also used for collecting
information, that a new node is being created.

Add a new flag which is set when a creation of lv symlinks is requested which
should restore old behaviour of lv_info function, that has called fs_sync()
before quere for open count on device.
2011-10-14 13:23:47 +00:00
Zdenek Kabelac
7a6600b148 Use constant for the repeated dlid size specification 2011-10-11 10:02:28 +00:00
Zdenek Kabelac
df251f14dc Use shorter way for if() 2011-10-11 09:03:33 +00:00