1
0
mirror of git://sourceware.org/git/lvm2.git synced 2025-01-04 09:18:36 +03:00
Commit Graph

2843 Commits

Author SHA1 Message Date
Peter Rajnoha
3fee661028 udev+systemd: refine lvm2-pvscan@.service to better track device existence
When using ENV{SYSTEMD_WANTS}=lvm2-pvscan@... to instantiate a service
for lvmetad scan when the new PV appears in the system, the service
is started and executed. However, to track device removal, we need
to bind it (the "BindsTo" systemd directive) to a certain .device
systemd unit.

In default systemd setup, the device is tracked by it's name and
sysfs path (there's normally a sysfs path .device systemd unit for
a device and then the device name .device unit as an alias for it).
Neither of these two is useful for lvmetad update as we need to bind
it to device's <major>:<minor> pair.

The /dev/block/<major>:<minor> is the essential symlink under /dev
that exists for each block device (created by default udev rules
provided by udev directly). So let's use this as an alias for
the device's .device unit as well by means of "ENV{SYSTEMD_ALIAS}"
declaration within udev rules which systemd understands (this will
create a new alias "dev-block-<major>:<minor>.device".

Then we can easily bind the "dev-block-<major>:<minor>" device
systemd unit with instantiated lvm2-pvscan@<major>:<minor>.service.
So once the device is removed from the systemd, the
lvm-pvscan@<major>:<minor>.service executes it's ExecStop action
(which in turn notifies lvmetad about the device being gone).

This completes the udev-systemd-lvmetad interaction then.
2013-10-22 14:22:40 +02:00
Peter Rajnoha
0a48137d39 pvscan: use major:minor as short form of --major and --minor arg for pvscan --cache
Before, pvscan recognized either:
  pvscan --cache --major <major> --minor <minor>
or
  pvscan --cache <DevicePath>

When the device is gone and we need to notify lvmetad about device
removal, only --major/--minor works as we can't translate DevicePath
into major/minor pair anymore. The device does not exist in the system
and we don't keep DevicePath index in lvmetad cache to make the
translation internally into original major/minor pair. It would be
useless to keep this index just for this one exact case.

There's nothing bad about using "--major <major> --minor <minor>",
but it makes our life a bit harder when trying to make an
interconnection with systemd units, mainly with instantiated services
where only one and only one arg can be passed (which is encoded in the
service name).

This patch tries to make this easier by adding support for recognizing
the "<major>:<minor>" as a shortcut for the longer form
"--major <major> --minor <minor>". The rule here is simple: if the argument
starts with "/", it's a DevicePath, otherwise it's a <major>:<minor> pair.
2013-10-22 13:52:18 +02:00
Mike Snitzer
65456a4a29 vgimportclone: remove 2>/dev/null from three lvm commands
There is no point eating stderr for these commands.  In fact the
redirect causes confusion and hurts dubugging.

Also reword an error message if the pvs command fails so as not be
certain that a device is not a PV.  Coupled with removing the stderr
redirect this will improve the user experience in the face of errors.
2013-10-21 18:04:14 -04:00
Peter Rajnoha
546db1c4be udev+systemd: make pvscan --cache -aay run as systemd background job from udev
The new lvm2-pvscan@.service is responsible for on-demand execution
of "pvscan --cache --activate ay" which causes lvmetad to be
updated and LVM activation done if the VG is complete.

Also, use udev-systemd mechanism to instantiate the job as the
lvm2-pvscan@$devnode.service on each newly appeared PV in the system.
This prevents the background job to be killed (that would happen
if it was directly forked from udev rule - this behaviour is seen
in recent versions of udev with the help of systemd that can track
detached processes - the detached process would still be in the same
cgroup).

To enable this official udev-systemd protocol for instantiating
background jobs, use new --enable-udev-systemd-background-jobs
configure switch (it's disabled by default). This option is highly
recommended wherever systemd is used!
2013-10-18 11:38:49 +02:00
Zdenek Kabelac
1b7631101b thin: fix lvconvert for active pool.
Prohibit conversion of pool device with active thin volumes.
Properly restore active states only for active thin pool volume.
Use new LV_NOSCAN when converting volume into thin pool's metadata.
2013-10-16 10:53:01 +02:00
Peter Rajnoha
48df36b8c5 activation: check for open count with a timeout before removal/deactivation of an LV
This patch reinstates the lv_info call to check for open count of
the LV we're removing/deactivating - this was changed with commit 125712b
some time ago and we relied on the ioctl retry logic deeper in the libdm
while calling the exact 'remove' ioctl.

However, there are still some situations in which it's still required to
check for open count before we do any 'remove' actions - this mainly
applies to LVs which consist of several sub LVs, like it is for
virtual snapshot devices.

The commit 1146691 fixed the issue with ordering of actions during
virtual snapshot removal while the snapshot is still open. But
the check for the open status of the snapshot is still prone to
marking the snapshot as in use with an immediate exit even though
this could be a temporary asynchronous open only, most notably
because of udev and its WATCH udev rule with accompanying scans
for the event which is asynchronous. The situation where this crops
up most often is when we're closing the LV that was open for read-write
and then calling lvremove immediately.

This patch reinstates the original lv_info call for the open status
of the LV in the lv_check_not_in_use fn that gets called before
we do any LV removal/deactivation. In addition to original logic,
this patch adds its own retry loop with a delay (25x0.2 seconds)
besides the existing ioctl retry loop.
2013-10-15 12:44:42 +02:00
Jonathan Brassow
f58b26b633 RAID: Report RAID images split with tracking as out-of-sync ("I").
Split image should have an out-of-sync attr ('I') - always.  Even if
the RAID LV has not been written to since the LV was split off, it is
still not part of the group that makes up the RAID and is therefore
"out-of-sync".
2013-10-14 10:48:44 -05:00
Zdenek Kabelac
851bba258c snapshot: rework parsing of snapshot metadata
Add better parsing code for snapshot metadata, which describe
properly errors found for snapshot segment.
2013-10-14 00:26:58 +02:00
Zdenek Kabelac
1146691afc snapshot: deactivate virtual snapshot first
Since the virtual snapshot has no reason to stay alive once we
detach related snapshot - deactivate whole thing in front of
snapshot removal - otherwice the code would get tricky for
support in cluster.

The correct full solution would require to have transactions
for libdm operations.

Also enable to the check for snapshot being opened prior
the origin deactivation, otherwise we could easily end
with the origin being deactivate, but snapshot still kept
active, desynchronizing locking state in cluster.
2013-10-14 00:25:15 +02:00
Zdenek Kabelac
ac961087b0 snapshot: disable merging for virtual snaps
Merging into virtual origin is not supposed to work.
2013-10-12 00:15:55 +02:00
Zdenek Kabelac
81504ba70c snapshot: move virtsnap code from tool to lib
Move code for removal dependency from tool's remove.c
into lib's manipulation code.

Same code then works with lvm2app.
2013-10-12 00:14:52 +02:00
Peter Rajnoha
304159c99a cleanup: WHATS_NEW + compiler warning about discarding const 2013-10-10 09:09:16 +02:00
Alasdair G Kergon
7bed6d1263 filters: Add NVM Express (nvme). 2013-10-09 20:08:07 +01:00
Peter Rajnoha
1b91847beb WHATS_NEW: commit 0decd75 2013-10-09 15:59:19 +02:00
Peter Rajnoha
863be9d9c6 WHATS_NEW: commit d888a05 and 808a5d9 2013-10-09 12:11:12 +02:00
Peter Rajnoha
2f5ddfbade udev: add support for "NOSCAN" flag
Recognize DM_SUBSYSTEM_UDEV_FLAG0 which for LVM is the "LVM_NOSCAN"
flag that causes the scanning to be skipped (mainly blkid) and
also directs all the foreign rules to be skipped as well.

Important thing here is that the "watch" udev rules is still set
as well as the /dev/disk/by-id content created (which does not
require any scanning to be done). Also, the flag is dropped on
any subsequent event and scanning done...
2013-10-08 13:43:14 +02:00
Peter Rajnoha
ce7489ed22 activation: add support for flagging an LV to skip udev scanning during activation
A common scenario is during new LV creation when we need to wipe the
newly created LV and avoid any udev scanning before this stage otherwise
it could cause the device (the LV) to be claimed by some other subsystem
for which there were stale metadata within LV data.

This patch adds possibility to mark the LV we're just about to wipe with
a flag that gets passed to udev via DM_COOKIE as a subsystem specific
flag - DM_SUBSYSTEM_UDEV_FLAG0 (in this case the subsystem is "LVM")
so LVM udev rules will take care of handling that.
2013-10-08 13:43:14 +02:00
Zdenek Kabelac
92bafade60 thin: fix lvconvert in external origin conversion
Patch 562ad293fd introduced code regression
when LV was converted to a thin LV with external origin and at the same time,
conversion of LV to a thin pool has been requested.
(RHBZ: #997704)

data_lv needs to be assigned after test for external conversion find pool.
2013-10-08 13:41:06 +02:00
Zdenek Kabelac
30746f31dd vgrename: run fullscan
For vgrename run full scan so the command is able to properly
detect name collision.
2013-10-08 13:39:11 +02:00
Alasdair G Kergon
4806f38d70 lvchange: improve discards when pool active error
Existing message deemed misleading:
  Cannot change discards state for active pool volume

https://bugzilla.redhat.com/show_bug.cgi?id=994315
2013-10-07 23:50:09 +01:00
Alasdair G Kergon
761b524519 post-release 2013-10-04 14:41:32 +01:00
Alasdair G Kergon
04d9a52684 release 2.02.103
52 files changed, 598 insertions(+), 264 deletions(-)
2013-10-04 14:32:23 +01:00
Peter Rajnoha
a7ff7aee4f WHATS_NEW: renamed thin_pool_chunk_size_calculation -> policy 2013-10-04 12:36:32 +02:00
Alasdair G Kergon
baf95bbff7 cmdline: Add --ignoreskippedcluster.
Accept --ignoreskippedcluster with pvs, vgs, lvs, pvdisplay, vgdisplay,
lvdisplay, vgchange and lvchange to avoid the 'Skipping clustered
VG' errors when requesting information about a clustered VG
without using clustered locking and still exit with success.

The messages can still be seen with -v.
2013-10-01 21:20:10 +01:00
Peter Rajnoha
e4c7236c07 udev: fix 3min udev timeout so that it is applied for all LVM volumes
The timeout should be set before any volume skipping.
2013-09-27 15:37:16 +02:00
Jonathan Brassow
acdc731e83 RAID: Fix _sufficient_pes_free calculation for RAID
lib/metadata/lv_manip.c:_sufficient_pes_free() was calculating the
required space for RAID allocations incorrectly due to double
accounting.  This resulted in failure to allocate when available
space was tight.

When RAID data and metadata areas are allocated together, the total
amount is stored in ah->new_extents and ah->alloc_and_split_meta is
set.  '_sufficient_pes_free' was adding the necessary metadata extents
to ah->new_extents without ever checking ah->alloc_and_split_meta.
This often led to double accounting of the metadata extents.  This
patch checks 'ah->alloc_and_split_meta' to perform proper calculations
for RAID.

This error is only present in the function that checks for the needed
space, not in the functions that do the actual allocation.
2013-09-26 11:30:07 -05:00
Jonathan Brassow
d6516d2f79 WHATS_NEW: description for previous commit
commit 098896fb29 failed to include
description of what was fixed.

"Conversion from linear to mirror or RAID1 now honors
 mirror_segtype_default."
2013-09-25 22:35:52 -05:00
Peter Rajnoha
dd796d6a94 profile: add thin-performance.profile
Define a "performance" profile for thin pools which is exactly:
  - allocation/thin_pool_zero = 0
  - thin_pool_chunk_size_calculation = "performance"
2013-09-25 16:07:35 +02:00
Peter Rajnoha
8bf425005c conf: add allocation/thin_pool_chunk_size_calculation
Add allocation/thin_pool_chunk_size_calculation lvm.conf
option to select a method for calculating thin pool chunk
sizes and define two possible values - "default" and "performance".
2013-09-25 16:06:38 +02:00
Jonathan Brassow
5ded7314ae RAID: Fix broken allocation policies for parity RAID types
A previous commit (b6bfddcd0a) which
was designed to prevent segfaults during lvextend when trying to
extend striped logical volumes forgot to include calculations for
RAID4/5/6 parity devices.  This was causing the 'contiguous' and
'cling_by_tags' allocation policies to fail for RAID 4/5/6.

The solution is to remember that while we can compare
	ah->area_count == prev_lvseg->area_count
for non-RAID, we should compare
	(ah->area_count + ah->parity_count) == prev_lvseg->area_count
for a general solution.
2013-09-24 21:32:10 -05:00
Peter Rajnoha
6553f86818 lvmconf: use_lvmetad=0 on --enable-cluster, reset to default on --disable-cluster
lvmetad is not yet supported in clustered environment so
disable it automatically if using lvmconf --enable-cluster
and reset it to default value if using lvmconf --disable-cluster.

Also, add a few comments in lvm.conf about locking_type vs. use_lvmetad
if setting it for clustered environment.
2013-09-24 14:03:42 +02:00
Peter Rajnoha
f050278a35 tools: don't install separate command symlink for lvm devtypes 2013-09-24 09:35:20 +02:00
Alasdair G Kergon
11dc6a03c4 lvs: Add seg_size_pe field.
Requested
https://www.redhat.com/archives/linux-lvm/2013-July/msg00112.html
2013-09-23 21:50:14 +01:00
Alasdair G Kergon
7233e584ad pvmove: Accept PE ranges as start+length. 2013-09-23 19:50:34 +01:00
Alasdair G Kergon
bbcc120e5a pvmove: clean exit on failed pvmove restart
At present, before the pvmove command can be used to restart pvmove
polling, the LVs concerned need to be activated e.g. with lvchange -ay.
2013-09-23 19:46:28 +01:00
Alasdair G Kergon
229e0752f1 post-release 2013-09-23 15:55:11 +01:00
Alasdair G Kergon
c8057aec36 release 2.02.102
18 files changed, 137 insertions(+), 203 deletions(-)
2013-09-23 15:43:37 +01:00
Christine Caulfield
431eda63cc clvmd: Fix node up/down handing in corosync module
The corosync cluster interface for clvmd did not correctly
deal with node up/down events so that when a node was removed
from the cluster clvmd would prevent remote operations
from happening, as it thought the node was up but not
running clvmd.

This patch fixes that code by simplifying the case to node
being  up or down - which was the original intention
and is supported by pacemaker and CPG in the higher layers.

Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
2013-09-23 13:23:00 +01:00
Zdenek Kabelac
ebf66ac316 Makefile: add missing deps
Add missing deps for device-mapper build of scripts dir.
Cleanup multiple SUBDIR lines together.
2013-09-23 12:13:51 +02:00
Zdenek Kabelac
3b604e5c8e lvinfo: allow to use lv_info with NULL info
When NULL info struct is passed in - function is usable
as a quick query for  lv_is_active_locally() - with a bonus
we may query for layered device.

So it could be seen as a more efficient lv_is_active_locally().
2013-09-23 12:13:06 +02:00
Alasdair G Kergon
bd75844024 release 2.02.101
112 files changed, 4131 insertions(+), 1312 deletions(-)
2013-09-20 13:56:29 +01:00
Alasdair G Kergon
6e912d949b tools: Avoid overflow in _get_int_arg.
Use strtoull instead of strtol so that argument size is not cut
to 31 bytes on machines with 32-bit long.

(Mikulas)
2013-09-18 01:16:48 +01:00
Alasdair G Kergon
a3a5f58c21 reporting: Add devtypes command.
Add internal devtypes reporting command to display built-in recognised
block device types.  (The output does not include any additional
types added by a configuration file.)

> lvm devtypes -o help
  Device Types Fields
  -------------------
    devtype_all            - All fields in this section.
    devtype_name           - Name of Device Type exactly as it appears in /proc/devices.
    devtype_max_partitions - Maximum number of partitions. (How many device minor numbers get reserved for each device.)
    devtype_description    - Description of Device Type.

> lvm devtypes
  DevType       MaxParts Description
  aoe                 16 ATA over Ethernet
  ataraid             16 ATA Raid
  bcache               1 bcache block device cache
  blkext               1 Extended device partitions
...
2013-09-18 01:09:15 +01:00
Jonathan Brassow
d1bcb21e02 WHATS_NEW: Better description for commit 82228ac
More correct description of changes made to disallow thin+mirror.
2013-09-16 15:37:48 -05:00
Alasdair G Kergon
36c5bb40a2 Makefiles: Fix CC variable override.
The CC override in commit f42b2d4bbf
caused the built-in value to be used instead of the configured value
when it wasn't being overridden.

The behaviour is explained here:
	http://stackoverflow.com/questions/18007326/how-to-change-default-values-of-variables-like-cc-in-makefile
2013-09-16 19:57:14 +01:00
Alasdair G Kergon
97ba18f4cb filters: Add bcache.
N.B. Using bcache devices as PVs is still experimental.
Problems should be reported to the appropriate mailing lists.
2013-09-16 16:56:55 +01:00
Peter Rajnoha
9742c5192e systemd: run lvm2-activation-net.service after lvm2-activation.service
The lvm2-activation-net.service was ordered only with respect to iscsi
and fcoe service before. In addition to that, we also need ordering
with respect to lvm2-activation.service to prevent parallel vgchange -aay
runs which may cause some problems during activation.
See also https://bugs.gentoo.org/show_bug.cgi?id=480066.

With this patch, the ordering is firmly set to:
lvm2-activation-early.service -> lvm2-activation.service -> lvm2-activation-net.service

Thanks to Alexander Tsoy for the original patch (modified a bit here):
https://www.redhat.com/archives/lvm-devel/2013-September/msg00049.html
2013-09-16 11:47:09 +02:00
Peter Rajnoha
e2268aeb92 WHATS_NEW: some missing lines 2013-09-12 14:16:54 +02:00
Zdenek Kabelac
2a6abcb80a tests: singlenode updates
Add more 'realistic' simulation of dlm locking.
Previous version was not capable to maintain multiple locks.
Current version doesn't handle multiqueues for locks,
so the ordering is different.
2013-09-12 10:40:39 +02:00
Jonathan Brassow
82228acfc9 Mirror/Thin: Disallow thinpools on mirror logical volumes
The same corner cases that exist for snapshots on mirrors exist for
any logical volume layered on top of mirror.  (One example is when
a mirror image fails and a non-repair LVM command is the first to
detect it via label reading.  In this case, the LVM command will hang
and prevent the necessary LVM repair command from running.)  When
a better alternative exists, it makes no sense to allow a new target
to stack on mirrors as a new feature.  Since, RAID is now capable of
running EX in a cluster and thin is not active-active aware, it makes
sense to pair these two rather than mirror+thinpool.

As further background, here are some additional comments that I made
when addressing a bug related to mirror+thinpool:
(https://bugzilla.redhat.com/show_bug.cgi?id=919604#c9)
I am going to disallow thin* on top of mirror logical volumes.
Users will have to use the "raid1" segment type if they want this.

This bug has come down to a choice between:
1) Disallowing thin-LVs from being used as PVs.
2) Disallowing thinpools on top of mirrors.

The problem is that the code in dev_manager.c:device_is_usable() is unable
to tell whether there is a mirror device lower in the stack from the device
being checked.  Pretty much anything layered on top of a mirror will suffer
from this problem.  (Snapshots are a good example of this; and option #1
above has been chosen to deal with them.  This can also be seen in
dev_manager.c:device_is_usable().)  When a mirror failure occurs, the
kernel blocks all I/O to it.  If there is an LVM command that comes along
to do the repair (or a different operation that requires label reading), it
would normally avoid the mirror when it sees that it is blocked.  However,
if there is a snapshot or a thin-LV that is on a mirror, the above code
will not detect the mirror underneath and will issue label reading I/O.
This causes the command to hang.

Choosing #1 would mean that thin-LVs could never be used as PVs - even if
they are stacked on something other than mirrors.

Choosing #2 means that thinpools can never be placed on mirrors.  This is
probably better than we think, since it is preferred that people use the
"raid1" segment type in the first place.  However, RAID* cannot currently
be used in a cluster volume group - even in EX-only mode.  Thus, a complete
solution for option #2 must include the ability to activate RAID logical
volumes (and perform RAID operations) in a cluster volume group.  I've
already begun working on this.
2013-09-11 15:58:44 -05:00
Peter Rajnoha
1f4bc637b4 WHATS_NEW: one more for commit 8d1d835 2013-09-11 13:16:36 +02:00
Peter Rajnoha
fe227d14a5 WHATS_NEW: for commit 8d1d8350 and 72a9d4f 2013-09-11 13:01:01 +02:00
Jonathan Brassow
2691f1d764 RAID: Make RAID single-machine-exclusive capable in a cluster
Creation, deletion, [de]activation, repair, conversion, scrubbing
and changing operations are all now available for RAID LVs in a
cluster - provided that they are activated exclusively.

The code has been changed to ensure that no LV or sub-LV activation
is attempted cluster-wide.  This includes the often overlooked
operations of activating metadata areas for the brief time it takes
to clear them.  Additionally, some 'resume_lv' operations were
replaced with 'activate_lv_excl_local' when sub-LVs were promoted
to top-level LVs for removal, clearing or extraction.  This was
necessary because it forces the appropriate renaming actions the
occur via resume in the single-machine case, but won't happen in
a cluster due to the necessity of acquiring a lock first.

The *raid* tests have been updated to allow testing in a cluster.
For the most part, this meant creating devices with '-aey' if they
were to be converted to RAID.  (RAID requires the converting LV to
be EX because it is a condition of activation for the RAID LV in
a cluster.)
2013-09-10 16:33:22 -05:00
Zdenek Kabelac
f5832d8c49 deactivate: drop readahead calc in deactivation
Skip readahead when device will be deactivated.
2013-09-07 09:13:20 +02:00
Zdenek Kabelac
0670bfeb59 thin: validation catch multiseg thin pool/volumes
Multisegment thin pools and volumes are not supported.
Catch such error code path early.
2013-09-07 03:32:07 +02:00
Zdenek Kabelac
655296609e thin: fix monitoring of thin pool volume
Properly skip unmonitoring of thin pool volume in deactivation code
path. Code makes sure if there is just any thin pool user
it stays monitored with all its resources.
2013-09-07 03:31:04 +02:00
Zdenek Kabelac
4c001a7854 thin: fix resize of stacked thin pool volume
When the pool is created from non-linear target the more complex rules
have to be used and stacking needs to properly decode args for _tdata
LV. Also proper allocation policies are being used according to those
set in lvm2 metadata for data and metadata LVs.

Also properly check for active pool and extra code to active it
temporarily.

With this fix it's now possible to use:

lvcreate -L20 -m2 -n pool vg  --alloc anywhere
lvcreate -L10 -m2 -n poolm vg --alloc anywhere
lvconvert --thinpool vg/pool --poolmetadata vg/poolm

lvresize -L+10 vg/pool
2013-09-07 03:24:48 +02:00
Alasdair G Kergon
96880102a3 logging: Write Completed message before resetting. 2013-09-06 01:47:41 +01:00
Jonathan Brassow
cc66dedc0e pvmove: Skip pvmove of RAID, thin, snapshot, origin, and mirror LVs in cluster
pvmove of the above types should only have been enabled in single machine
mode.
2013-09-03 13:17:01 -05:00
Peter Rajnoha
3b51f298bb reinstate: commit 82d83a01ce
It now works as supposed. The source of the problem is fixed
by previous commit d2d6a9da52e04f28e1916bcea3f9fda356b6df29.
2013-09-03 16:49:21 +02:00
Peter Rajnoha
008c33a21b tools: add -b/--background for pvscan --cache -aay
Udev daemon has recently introduced a limit on the number of udev
processes (there was no limit before). This causes a problem
when calling pvscan --cache -aay in lvmetad udev rules which
is supposed to activate the volumes. This activation is itself
synced with udev and so it waits for the activation to complete
before the pvscan finishes. The event processing can't continue
until this pvscan call is finished.

But if we're at the limit with the udev process count, we can't
instatiate any more udev processes, all such events are queued
and so we can't process the lvm activation event for which the
pvscan is waiting.

Then we're in a deadlock since the udev process with the
pvscan --cache -aay call waits for the lvm activation udev
processing to complete, but that will never happen as there's
this limit hit with the number of udev processes.

The process with pvscan --cache -aay actually times out eventually
(3min or 30sec, depends on the version of udev).

This patch makes it possible to run the pvscan --cache -aay
in the background so the udev processing can continue and hence
we can avoid the deadlock mentioned above.
2013-09-03 16:49:21 +02:00
Peter Rajnoha
44c1a02c18 revert: commit 82d83a01ce
The commit 82d83a01ce
"autoactivation: refresh existing VG before autoactivation"
causes problems (dangling udev_sync cookies, slow processing
of the pvscan --cache --major --minor call from udev rules)
when the autoactivation handler is run in parallel on
several PVs that belong to the same VG. Revert this patch
until the exact source of the problem is found and then
properly fixed and handled.
2013-09-02 13:53:27 +02:00
Alasdair G Kergon
78647da1c6 toolcontext: Only reopen stdin if readable.
Don't fail when running lvm commands under versions of nohup that set
up stdin as O_WRONLY!
2013-08-28 23:55:14 +01:00
Alasdair G Kergon
c0f987949b activation: Fix segfault with inactive pvmove LV.
Set flag to avoid recursion back through an inactive pvmove LV when
populating deptree.
2013-08-28 22:56:23 +01:00
Peter Rajnoha
0acd7173d1 systemd: lvm2-activation-generator: remove default dir if args not specified and require all args to be given
Remove default "/tmp" as destination directory if no args
specified for lvm2-activation-generator. Require all the
args to be specified directly for proper functionality.
2013-08-28 16:06:51 +02:00
Jonathan Brassow
2ef48b91ed pvmove: Allow moving snapshot/origin. Disallow converting and merging LVs
The patch allows the user to also pvmove snapshots and origin logical
volumes.  This means pvmove should be able to move all segment types.
I have, however, disallowed moving converting or merging logical volumes.
2013-08-26 16:36:30 -05:00
Jonathan Brassow
caa77b33f2 pvmove: Fix inability to specify LV name when moving RAID, mirror, or thin LV
Top-level LVs (like RAID, mirror or thin) are ignored when determining which
portions of an LV to pvmove.  If the user specified the name of an LV to
move and it was one of the above types, it would be skipped.  The code would
never move on to check whether its sub-LVs needed moving because their names
did not match what the user specified.

The solution is to check whether a sub-LVs is part of the LV whose name was
specified by the user - not just if there was a name match.
2013-08-26 14:12:31 -05:00
Peter Rajnoha
d34ab5e0d3 WHATS_NEW: for 4d3b5724e0 2013-08-26 15:52:15 +02:00
Zdenek Kabelac
6b416f837f thin: support lvchange for data and metadata
Support lvchange operation on stacked thin pool data and metadata
volumes.
2013-08-26 14:55:22 +02:00
Jonathan Brassow
c59167ec13 pvmove: Add support for RAID, mirror, and thin
This patch allows pvmove to operate on RAID, mirror and thin LVs.
The key component is the ability to avoid moving a RAID or mirror
sub-LV onto a PV that already has another RAID sub-LV on it.
(e.g. Avoid placing both images of a RAID1 LV on the same PV.)

Top-level LVs are processed to determine which PVs to avoid for
the sake of redundancy, while bottom-level LVs are processed
to determine which segments/extents to move.

This approach does have some drawbacks.  By eliminating whole PVs
from the allocation list, we might miss the opportunity to perform
pvmove in some senarios.  For example, if we have 3 devices and
a linear uses half of the first, a RAID1 uses half of the first and
half of the second, and a linear uses half of the third (FIGURE 1);
we should be able to pvmove the first device (FIGURE 2).
	FIGURE 1:
        [ linear ] [ -RAID- ] [ linear ]
        [ -RAID- ] [        ] [        ]

	FIGURE 2:
        [  moved ] [ -RAID- ] [ linear ]
        [  moved ] [ linear ] [ -RAID- ]
However, the approach we are using would eliminate the second
device from consideration and would leave us with too little space
for allocation.  In these situations, the user does have the ability
to specify LVs and move them one at a time.
2013-08-23 08:57:16 -05:00
Peter Rajnoha
99fe3b88d2 systemd: lvm2-activation-generator: report only error otherwise be silent
Do not print success status for lvm2-activation-generator:

  "LVM: Activation generator successfully completed."
  "LVM: Logical Volume autoactivation enabled." (if use_lvmetad=1)

Though this information is quite useful during boot, it may
be confusing for users if it happens anytime later and it
actually happens if systemd reloads. This is usually on package
update to update the systemd state and load any new units that are
newly installed in the system. The systemd reload is global and
so any existing generators are rerun at that moment too.
2013-08-22 08:27:51 +02:00
Peter Rajnoha
c8daa15270 filter-mpath: remove superfluous error message about mpath major not equal to dm major
This is a regression caused by commit 3bd9048854.
The error message added with that commit "mpath major %d is not dm major %d" is
superfluous.

When scanning for mpath components, we're looking for a parent device.
But this parent device is not necessarily an mpath device (so the dm device)
if it exists - it can be any other device layered on top (e.g. an MD RAID device).
2013-08-21 14:07:01 +02:00
Jonathan Brassow
f0be9ac904 cmirrord: Prevent secondary checkpoints from corrupting bitmaps
The bug addressed by this patch manifested itself during testing
by showing a mirror that never became 'in-sync' after creation.
The bug is isolated to distributions that do not have support
for openAIS checkpointing (i.e. > RHEL6, > F16).

When a node joins a group that is managing a mirror log, the other
machines in the group send it a checkpoint representing the current
state of the bitmap.  More than one machine can send a checkpoint,
but only the initial one should be imported.  Once the bitmap state
has been imported from the initial checkpoint, operations (such
as resync, mark, and clear operations) can begin.  When subsequent
checkpoints are allowed to be imported, it has the effect of erasing
all the log operations between the initial checkpoint and the ones
that follow.

When cmirrord was updated to handle the absence of openAIS
checkpointing (commit 62e38da133),
the new import_checkpoint() function failed to honor the 'no_read'
parameter.  This parameter was designed to avoid reading all but
the initial checkpoint.  Honoring this parameter has solved the
issue of corrupting bitmap data with secondary checkpoints.
2013-08-20 13:21:09 -05:00
Peter Rajnoha
cac49725c9 udev: fix lvmetad rules to not ignore loop device configuration
If loop device is first configured on systems where /dev/loop-control
is used to dynamically create the loop device itself, there's an
ADD+CHANGE even generated. But next time the existing /dev/loop[0-9]*
is reused, there's only a CHANGE event since the device representing
it is already present in kernel (so no ADD event in this case).

We can't ignore this CHANGE event for loop devices! This is a regression
caused by 756bcabbfe. We already had
a similar problem with MD devices which was fixed by
2ac217d408 (but that one was
only an intra-release fix).
2013-08-16 15:45:00 +02:00
Michael Stapelberg
8cbbe851a8 systemd: use LVM_PATH instead of hardcoded value in activation generator 2013-08-15 09:59:19 +02:00
Peter Rajnoha
82d83a01ce autoactivation: refresh existing VG before autoactivation
When autoactivating a VG, there could be an existing VG with exactly
the same PV UUIDs. The PVs could be reappeared after previous
loss/disconnect (for example disconnecting and reconnecting iscsi).

Since there's no "autodeactivation" yet, the mappings for the LVs
from the VG were left in the system even if the device was disconnected.
These mappings also hold the major:minor of the underlying device.
So if the device reappears, it is assigned a different major:minor
pair (...and kernel name). We need to cope with this during
autoactivation so any existing mappings are corrected for any changes.
The VG refresh does that (the vgchange --refresh functionality) -
call this before VG autoactivation.

(If the VG does not exist yet, the VG refresh is NOP)
2013-08-14 14:04:58 +02:00
Peter Rajnoha
fcbb34bdcc WHATS_NEW: for 0da72743ca 2013-08-14 10:18:02 +02:00
Alasdair G Kergon
80bcdb93ff filters: check for mpath before opening devs
Split out the partitioned device filter that needs to open the device
and move the multipath filter in front of it.

When a device is multipathed, sending I/O to the underlying paths may
cause problems, the most obvious being I/O errors visible to lvm if a
path is down.

Revert the incorrect <backtrace> messages added when a device doesn't
pass a filter.

Log each filter initialisation to show sequence.

Avoid duplicate 'Using $device' debug messages.
2013-08-13 23:26:58 +01:00
Alasdair G Kergon
1a1d3a10ff vgchange: require confirmation with -c and no VGs
Too many people have been running 'vgchange -cy' by mistake
so add a confirmation prompt.  Use --yes to bypass this.
2013-08-13 18:20:11 +01:00
Peter Rajnoha
fd7cac15bc WHATS_NEW: be more precise 2013-08-13 18:25:54 +02:00
Peter Rajnoha
e166c00ac6 WHATS_NEW: one more for a85439 2013-08-13 18:16:05 +02:00
Peter Rajnoha
268b370e24 blkdeactivate: add support for bind mounts
Recent version of util-linux/umount (v2.23+) provides
umount --all-targets that can unmount all the mount targets of
the same device (the bind mounts). Use this if available when
calling the umount blkdeactivate.

Otherwise, for older versions of util-linux, use findmnt
(that is also a part of the util-linux) to iterate over all
mount targets of the same device - this is the manual way.
2013-08-13 17:51:40 +02:00
Peter Rajnoha
a854398764 blkdeactivate: change the way blkdeactivate reports status
The blkdeactivate now suppresses error messages from external
tools that are called. Instead, only a summary message "done"
or "skipped" is issued by blkdeactivate as any error in calling
the external tool (e.g. unmounting or deactivating a device) causes
the device to be skipped and the blkdeactivate continues with the
next device in the tree.

Add new -e/--errors switch to display any error messages from
external tools.

Also, suppress any output given by the external tools and add
new -v/--verbose switch to display it including the verbose
output of the tools called (this will enable error reporting
as well).

Also add blkdeactivate -vv for even more debug (the script's debug).
2013-08-13 17:51:23 +02:00
Alasdair G Kergon
32148369d1 post-release 2013-08-13 11:54:48 +01:00
Alasdair G Kergon
297907899c release 2.02.100
84 files changed, 1540 insertions(+), 442 deletions(-)

Mostly bug fixes this time.

Also note:
  md raid replaces dm mirroring as the default implementation.
  Can call out to thin_repair to fix thin metadata.
  Improved clvmd error detection/debugging information.
2013-08-13 11:29:21 +01:00
Jonathan Brassow
abc89422af Mirror: Fix inability to remove VG's cluster flag if it contains a mirror
According to bug 995193, if a volume group
	1) contains a mirror
	2) is clustered
	3) 'locking_type' = 0 is used
then it is not possible to remove the 'c'luster flag from the VG.  This
is due to the way _lv_is_active behaves.

We shouldn't allow the cluster flag to be flipped unless the mirrors in
the cluster are not active.  This is because different kernel modules
are used depending on whether a mirror is cluster or not.  When we
attempt to see if the mirror is active, we first check locally.  If it
is not, then we attempt to check for remotely active instances if the VG
is clustered.  Since the no_lock locking type is LCK_CLUSTERED, but does
not implement 'query_resource', remote_lock_held will always return an
error in this case.  An error from remove_lock_held is treated as though
the lock _is_ held (i.e. the LV is active remotely).  This blocks the
cluster flag from changing.

The solution is to implement 'query_resource' for the no_lock type.  It
will report a message and return 1.  This will allow _lv_is_active to
function properly.  The LV would be considered not active remotely and
the VG can change its flag.
2013-08-12 13:56:47 -05:00
Alasdair G Kergon
28760275e6 logging: tidy log_sys_error when string empty 2013-08-12 18:40:41 +01:00
Jonathan Brassow
cba228f856 WHATSNEW: typo 2013-08-09 17:17:53 -05:00
Jonathan Brassow
8615234c0f RAID: Fix bug making lvchange unable to change recovery rate for RAID
1) Since the min|maxrecoveryrate args are size_kb_ARGs and they
   are recorded (and sent to the kernel) in terms of kB/sec/disk,
   we must back out the factor multiple done by size_kb_arg.  This
   is already performed by 'lvcreate' for these arguments.
2) Allow all RAID types, not just RAID1, to change these values.
3) Add min|maxrecoveryrate_ARG to the list of 'update_partial_unsafe'
   commands so that lvchange will not complain about needing at
   least one of a certain set of arguments and failing.
4) Add tests that check that these values can be set via lvchange
   and lvcreate and that 'lvs' reports back the proper results.
2013-08-09 17:09:47 -05:00
Zdenek Kabelac
e583ff3d2c thin: thin pool can't be external origin
Avoid trying to convert thin-pool to external origin.
2013-08-09 23:04:30 +02:00
Peter Rajnoha
2f61478436 workaround: gcc v4.8 on 32 bit param. passing bug when -02 opimization used
gcc -O2 v4.8 on 32 bit architecture is causing a bug in parameter
passing. It does not happen with -01 nor -O0.

The problematic part of the code was strlen use in config.c in
the config_def_check fn and the call for _config_def_check_tree in it:

<snip>
  rplen = strlen(rp);
  if (!_config_def_check_tree(handle, vp, vp + strlen(vp), rp, rp + rplen, CFG_PATH_MAX_LEN - rplen, cn, cmd->cft_def_hash)) ...
</snip>

If compiled with -O0 (correct):

Breakpoint 1, config_def_check (cmd=0x819b050, handle=0x81a04f8) at config/config.c:775
(gdb) p	vp
$1 = 0x8189ee0 <_cfg_path> "config"
(gdb) p	strlen(vp)
$2 = 6
(gdb)
_config_def_check_tree (handle=0x81a04f8, vp=0x8189ee0 <_cfg_path>
"config", pvp=0x8189ee6 <_cfg_path+6> "", rp=0xbfffe1e8 "config",
prp=0xbfffe1ee "", buf_size=58, root=0x81a2568, ht=0x81a65
48) at config/config.c:680
(gdb) p	vp
$4 = 0x8189ee0 <_cfg_path> "config"
(gdb) p	pvp
$5 = 0x8189ee6 <_cfg_path+6> ""

If compiled with -O2 (incorrect):

Breakpoint 1, config_def_check (cmd=cmd@entry=0x8183050, handle=0x81884f8) at config/config.c:775
(gdb) p	vp
$1 = 0x8172fc0 <_cfg_path> "config"
(gdb) p strlen(vp)
$2 = 6
(gdb) p	vp + strlen(vp)
$3 = 0x8172fc6 <_cfg_path+6> ""
(gdb)
_config_def_check_tree (handle=handle@entry=0x81884f8, pvp=0x8172fc7
<_cfg_path+7> "host_list", rp=rp@entry=0xbffff190 "config",
prp=prp@entry=0xbffff196 "", buf_size=buf_size@entry=58, ht=0x
818e548, root=0x818a568, vp=0x8172fc0 <_cfg_path> "config") at
config/config.c:674
(gdb) p	pvp
$4 = 0x8172fc7 <_cfg_path+7> "host_list"

The difference is in passing the "pvp" arg for _config_def_check_tree.
While in the correct case, the value of _cfg_path+6 is passed
(the result of vp + strlen(vp) - see the snippet of the code above),
in the incorrect case, this value is increased by 1 to _cfg_path+7,
hence totally malforming the string that is being processed.

This ends up with incorrect validation check and incorrect warning
messages are issued like:

 "Configuration setting "config/checks" has invalid type. Found integer, expected section."

To workaround this issue, remove the "static" qualifier from the
"static char _cfg_path[CFG_PATH_MAX_LEN]". This causes the optimalizer
to be less aggressive (also shuffling the arg list for
_config_def_check_tree call helps).
2013-08-09 13:24:50 +02:00
Peter Rajnoha
8d3347f70b WHATS_NEW: entry for 19baf84290 2013-08-08 10:04:53 +02:00
Jonathan Brassow
68c2d352ec WHATS_NEW: update WHATS_NEW for previous commit 2013-08-07 17:51:21 -05:00
Jonathan Brassow
b15278c3dc Mirror/RAID1: When up|down-converting default to segtype of current LV
If there is no RAID support in the kernel but the default mirror
segtype is "raid1", converting legacy mirrors can be problematic.
For example, changing the log type or converting a mirror to a linear
LV does not require the RAID modules to be present.  However, because
lp->segtype is set to be RAID1 by the configuration file, the command
fails.

We should only be setting lp->segtype when converting mirrors if it is
going to change (e.g. to linear or between mirror types).
2013-08-07 16:01:45 -05:00
Jonathan Brassow
7e1083c985 RAID: Make "raid1" the default mirror segment type 2013-08-06 14:13:55 -05:00
Zdenek Kabelac
003f08c164 clogd: fix descriptor leak when daemonzing 2013-08-06 16:21:51 +02:00
Zdenek Kabelac
7b1315411f clmvd: fix decriptor leak on restart
Do not leave descriptor used for dup2() openned.
2013-08-06 16:20:36 +02:00
Zdenek Kabelac
f6dd5a294b exec: pipe open
Function replaces popen() system and avoids shell execution
and argument parsing (no surprices).
2013-08-06 16:18:43 +02:00
Peter Rajnoha
61e7dc833c WHATS_NEW: previous commit 2013-08-06 14:03:43 +02:00
Peter Rajnoha
e195b5227e thin: apply VG profile if creating a new thin pool
When creating a new thin pool and there's no profile requested
via "lvcreate --profile ...", inherit any VG profile if it's attached.

Currently this applies to these settings:
  allocation/thin_pool_chunk_size
  allocation/thin_pool_discards
  allocation/thin_pool_zero
2013-08-06 11:42:40 +02:00