IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
We need to call sync_local_dev_names directly as pvscan uses
VG_GLOBAL lock and this one *does not* cause the synchronization
(sync_dev_names) to be called on unlock (VG_GLOBAL is not a real VG):
define unlock_vg(cmd, vol)
do { \
if (is_real_vg(vol)) \
sync_dev_names(cmd); \
(void) lock_vol(cmd, vol, LCK_VG_UNLOCK); \
} while (0)
Without this fix, we end up without udev synchronization for the
pvscan --cache (mainly for -aay that causes the VGs/LVs to be
autoactivated) and also udev synchronization cookies are then left
in the system since they're not managed properly (code before sets
up udev sync cookies, but we have to call dm_udev_wait at least once
after that to do the wait and cleanup).
Before, the pvscan --cache -aay was called on each ADD and CHANGE
uevent (for a device that is not a device-mapper device) and each CHANGE
event (for a PV that is a device-mapper device).
This causes troubles with autoactivation in some cases as CHANGE event
may originate from using the OPTION+="watch" udev rule that is defined
in 60-persistent-storage.rules (part of the rules provided by udev
directly) and it's used for all block devices
(except fd*|mtd*|nbd*|gnbd*|btibm*|dm-*|md* devices). For example, the
following sequence incorrectly activates the rest of LVs in a VG if one
of the LVs in the VG is being removed:
[root@rhel6-a ~]# pvcreate /dev/sda
Physical volume "/dev/sda" successfully created
[root@rhel6-a ~]# vgcreate vg /dev/sda
Volume group "vg" successfully created
[root@rhel6-a ~]# lvcreate -l1 vg
Logical volume "lvol0" created
[root@rhel6-a ~]# lvcreate -l1 vg
Logical volume "lvol1" created
[root@rhel6-a ~]# vgchange -an vg
0 logical volume(s) in volume group "vg" now active
[root@rhel6-a ~]# lvs
LV VG Attr LSize Pool Origin Data% Move Log
Cpy%Sync Convert
lvol0 vg -wi------ 4.00m
lvol1 vg -wi------ 4.00m
[root@rhel6-a ~]# lvremove -ff vg/lvol1
Logical volume "lvol1" successfully removed
[root@rhel6-a ~]# lvs
LV VG Attr LSize Pool Origin Data% Move Log
Cpy%Sync Convert
lvol0 vg -wi-a---- 4.00m
...so the vg was deactivated, then lvol1 removed, and we end up with
lvol1 removed (which is ok) BUT with lvol0 activated (which is wrong)!!!
This is because after lvol1 removal, we need to write metadata to the
underlying device /dev/sda and that causes the CHANGE event to be
generated (because of the WATCH udev rule set on this device) and this
causes the pvscan --cache -aay to be reevaluated.
We have to limit this and call pvscan --cache -aay to autoactivate
VGs/LVs only in these cases:
--> if the *PV is not a dm device*, scan only after proper device
addition (ADD event) and not with any other changes (CHANGE event)
--> if the *PV is a dm device*, scan only after proper mapping
activation (CHANGE event + the underlying PV in a state "just
activated")
If a RAID array is not in-sync, replacing devices should not be allowed
as a general rule. This is because the contents used to populate the
incoming device may be undefined because the devices being read where
not in-sync. The kernel enforces this rule unless overridden by not
allowing the creation of an array that is not in-sync and includes a
devices that needs to be rebuilt.
Since we cannot know the sync state of an LV if it is inactive, we must
also enforce the rule that an array must be active to replace devices.
That leaves us with the following conditions:
1) never allow replacement or repair of devices if the LV is in-active
2) never allow replacement if the LV is not in-sync
3) allow repair if the LV is not in-sync, but warn that contents may
not be recoverable.
In the case where a user is performing the repair on the command line via
'lvconvert --repair', the warning is printed before the user is prompted
if they would like to replace the device(s). If the repair is automated
(i.e. via dmeventd and policy is "allocate"), then the device is replaced
if possible and the warning is printed.
If the lvmcache_info_from_pvid() fails to find valid
info, invoke the lookup by dev, and only in this case
call lvmcache_info_from_pvid() again.
Also check for the result of info and return
error directly, so the NULL is not passed
to lvmcache_get_label().
If we fail to get memory for mutex, hash the mutex
or fail somewhere along pthread function calls
return allocated resources back and unlock vg_lock_map mutex.
Detect failure of dm_pool_strdup() and print error in fail path.
Save one extra strchr call - since we already know the distance
for the '=' character.
Drop stack trace from return after log_error().
When the abort_on_internal_errors is enabled, we aborted prior
the syslog logging output.
Since such fatal error gets level _LOG_FATAL it should
not be blocked by debug_level() check so lets move it further,
to get abort error logged also via syslog.
Tabify
Remove use of asize, unneeded.
Don't initialize lvobj->parent_vgobj to NULL, the object ctor already
zeroed everything on alloc.
Redo call to lvm_lv_snapshot to use the liblvm snapshot implementation
we went with.
Add {}s to silence warning in lv_dealloc.
Rename snapshot function for consistency.
Update WHATS_NEW.
Signed-off-by: Andy Grover <agrover@redhat.com>
Function _ignore_blocked_mirror_devices was not release
allocated strings images_health and log_health.
In error paths it was also not releasing dm_task structure.
Swaped return code of _ignore_blocked_mirror_devices and
use 1 as success.
In _parse_mirror_status use log_error if memory allocation
fails and few more errors so they are no going unnoticed
as debug messages.
On error path always clear return values and free strings.
For dev_create_file use cache mem pool to avoid memleak.
Attempting pvmove on RAID LVs replaces the kernel RAID target with
a temporary pvmove target, ultimately destroying the RAID LV. pvmove
must be prevented on RAID LVs for now.
Use 'lvconvert --replace old_pv vg/lv new_pv' if you want to move
an image of the RAID LV.
In case we don't want to activate, autoactivate or have the
VG/LV read-only. Primarily targeted for the auto_activation_volume_list,
but it makes no harm for other settings (the part of the code
that reads these three settings is shared, but there's no
reason to separate it only for this change).
Rework thin feature detection to support runtime
section to allow to disable them selectively.
New lvm.conf option is born: global/thin_disabled_features
Support swapping of metadata device if the thin pool already
exists. This way it's easy to i.e. resize metadata or their
repair operation.
User may create some empty LV, replace existing metadata
or dump and restore them into bigger LV.
Setting this environment variable will cause a full fallback
to old direct node and symlink management in libdevmapper and lvm2.
It means:
- disabling udev synchronization
(--noudevsync in dmsetup and --noudevsync + activation/udev_sync=0
lvm2 config)
- disabling dm and any subsystem related udev rules
(--noudevrules in dmsetup and activation/udev_rules=0 lvm2 config)
- management of nodes/symlinks under /dev directly by libdevmapper/lvm2
(--verifyudev in dmsetup and activation/verify_udev_operations=1
lvm2 config)
- not obtaining any device list from udev database
(devices/obtain_device_list_from_udev=0 lvm2 config)
Note: we could set all of these before - there's no functional change!
However the DM_DISABLE_UDEV environment variable is a nice shortcut
to make it easier for libdevmapper users so that one can switch off all
of the udev management off at one go directly on the command line,
without a need to modify any source or add any extra switches.
If udev synchronization is disabled by means of --noudevsync
option, we should disable just the synchronization and nothing else.
The udev fallback (verifying udev operations and fixing the
nodes/symlinks if found incorrect) is orthogonal and controlled
by a separate activation/verify_udev_operations configuration option.
Allow restoring metadata with thin pool volumes.
No validation is done for this case within vgcfgrestore tool -
thus incorrect metadata may lead to destruction of pool content.
Configurable settings for thin pool create
if they are not specified on command line.
New supported lvm.conf options are:
allocation/thin_pool_chunk_size
allocation/thin_pool_discards
allocation/thin_pool_zero
Check if target supports discards for chunk sizes,
that are not power of 2 (just multiple of 64K),
and enable it in case it's supported by thin kernel target.
Similar to the way the 'mirror', 'raid1' and 'raid10' segment types set
the number of mirrors to 2 ('-m 1') if the argument is not specified,
here we set the number of stripes to 2 if not given on the command line
when creating a RAID10 LV.
Move common functions for lvcreate and lvconvert.
get_pool_params() - read thin pool args.
update_pool_params() - updates/validates some thin args.
It is getting complicated and even few more things will be
implemented, so to avoid reimplementing things differently
in lvcreate and lvconvert code has been splitted
into 2 common functions that allow some future extension.
This patch is intended to fix bug 825323 - FS turns read-only during a double
fault of a mirror leg and mirrored log's leg at the same time. It only
affects a 2-way mirror with a mirrored log. 3+-way mirrors and mirrors
without a mirrored log are not affected.
The problem resulted from the fact that the top level mirror was not
using 'noflush' when suspending before its "down-convert". When a
mirror image fails, the bios are queue until a suspend is recieved. If
it is a 'noflush' suspend, the bios can be safely requeued in the DM
core. If 'noflush' is not used, the bios must be pushed through the
target and if a device is failed for a mirror, that means issuing an
error. When an error is received by a file system, it results in it
turning read-only (depending on the FS).
Part of the problem was is due to the nature of the stacking involved in
using a mirror as a mirror's log. When an image in each fail, the top
level mirror stalls because it is waiting for a log flush. The other
stalls waiting for corrective action. When the repair command is issued,
the entire stacked arrangement is collapsed to a linear LV. The log
flush then fails (somewhat uncleanly) and the top-level mirror is suspended
without 'noflush' because it is a linear device.
This patch allows the log to be repaired first, which in turn allows the
top-level mirror's log flush to complete cleanly. The top-level mirror
is then secondarily reduced to a linear device - at which time this mirror
is suspended properly with 'noflush'.
Don't use lvmetad in lvm2-monitor.service ExecStop to avoid a systemd issue.
- a systemd design issue while processing dependencies
with socket-based activation that ends up with a hang
- https://bugzilla.redhat.com/show_bug.cgi?id=843587
(also tracker bug https://bugzilla.redhat.com/show_bug.cgi?id=871527)
- not using lvmetad in this case is just a workaround, once the bug
above is resolved, we should enable the lvmetad in that specific case
Remove dependency on fedora-storage-init.service in lvm2 systemd units.
- fedora-storage-init.service and fedora-storage-init-late.service is
going to be separated into respective units that belong to each block
device subsystem:
- mpath + mdraid activated via udev solely
- dmraid with its own dmraid-activation.service unit
- lvm2 with the lvm2-activation-generator to generate the
activation units runtime if lvmetad disabled
(global/use_lvmetad=0 set in lvm.conf) and activation done
via udev+lvmetad if lvmetad enabled (global/use_lvmetad=1 set
in lvm.conf)
Depend on lvm2-lvmetad.socket in lvm2-monitor.service systemd unit.
- as lvm2-monitor uses lvmetad if lvmetad is enabled
Commit 9fd7ac7d03 did not handle mirrors
that contained mirrored logs. This is because the status line of the
mirror does not give an indication of the health of the mirrored log,
as you can see here:
[root@bp-01 lvm2]# dmsetup status vg-lv vg-lv_mlog
vg-lv: 0 409600 mirror 2 253:6 253:7 400/400 1 AA 3 disk 253:5 A
vg-lv_mlog: 0 8192 mirror 2 253:3 253:4 7/8 1 AD 1 core
Thus, the possibility for LVM commands to hang still persists when mirror
have mirrored logs. I discovered this while performing some testing that
does polling with 'pvs' while doing I/O and killing devices. The 'pvs'
managed to get between the mirrored log device failure and the attempt
by dmeventd to repair it. The result was a very nasty block in LVM
commands that is very difficult to remove - even for someone who knows
what is going on. Thus, it is absolutely essential that the log of a
mirror be recursively checked for mirror devices which may be failed
as well.
Despite what the code comment says in the aforementioned commit...
+ * _mirrored_transient_status(). FIXME: It is unable to handle mirrors
+ * with mirrored logs because it does not have a way to get the status of
+ * the mirror that forms the log, which could be blocked.
... it is possible to get the status of the log because the log device
major/minor is given to us by the status output of the top-level mirror.
We can use that to query the log device for any DM status and see if it
is a mirror that needs to be bypassed. This patch does just that and is
now able to avoid reading from mirrors that have failed devices in a
mirrored log.
Addresses: rhbz855398 (Allow VGs to be built on cluster mirrors),
and other issues.
The LVM code attempts to avoid reading labels from devices that are
suspended to try to avoid situations that may cause the commands to
block indefinitely. When scanning devices, 'ignore_suspended_devices'
can be set so the code (lib/activate/dev_manager.c:device_is_usable())
checks any DM devices it finds and avoids them if they are suspended.
The mirror target has an additional mechanism that can cause I/O to
be blocked. If a device in a mirror fails, all I/O will be blocked
by the kernel until a new table (a linear target or a mirror with
replacement devices) is loaded. The mirror indicates that this condition
has happened by marking a 'D' for the faulty device in its status
output. This condition must also be checked by 'device_is_usable()' to
avoid the possibility of blocking LVM commands indefinitely due to an
attempt to read the blocked mirror for labels.
Until now, mirrors were avoided if the 'ignore_suspended_devices'
condition was set. This check seemed to suggest, "if we are concerned
about suspended devices, then let's ignore mirrors altogether just
in case". This is insufficient and doesn't solve any problems. All
devices that are suspended are already avoided if
'ignore_suspended_devices' is set; and if a mirror is blocking because
of an error condition, it will block the LVM command regardless of the
setting of that variable.
Rather than avoiding mirrors whenever 'ignore_suspended_devices' is
set, this patch causes mirrors to be avoided whenever they are blocking
due to an error. (As mentioned above, the case where a DM device is
suspended is already covered.) This solves a number of issues that weren't
handled before. For example, pvcreate (or any command that does a
pv_read or vg_read, which eventually call device_is_usable()) will be
protected from blocked mirrors regardless of how
'ignore_suspended_devices' is set. Additionally, a mirror that is
neither suspended nor blocking is /allowed/ to be read regardless
of how 'ignore_suspended_devices' is set. (The latter point being the
source of the fix for rhbz855398.)
clvmd -d option parsing was not working properly.
clvmd -d 2 (with space) has been ignored because of
'::' used in getopt string, and as failsafe it's been used '1'.
Later this debug_arg has been ignored and debug_opt was used
instead which happend to have value '1'.
Submitted-by: Robert Milasan <rmilasan at suse.com>
Reported-by: Robert Milasan <rmilasan at suse.com>