IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
Let's put the overlay device over real thin pool device.
So we can get the proper locking on cluster.
Overwise the pool LV would be activate once implicitely
and in other case explicitely, confusing locking mechanism.
This patch make the activation of pool LV independent on
activation of thin LV since they will both implicitely use
real -thin pool device.
To ensure we properly handle LV cluster locking - explicitely do
not allow to change the availability of the thin pool that is in use
for some thin LV.
As soon as the thin volume is created the only way to activate pool
is via implicit dependency.
Ignore thinpool open count for lv/vgchange operations.
When verify_udev_operations was disable, code for stacking fs operation for
lvm links was completely disable - but this code was also used for collecting
information, that a new node is being created.
Add a new flag which is set when a creation of lv symlinks is requested which
should restore old behaviour of lv_info function, that has called fs_sync()
before quere for open count on device.
Using log_warn to report missing symlinks as warning, since the command
itself returns as successful, we should not produce log_error().
log_warn is better fit here.
Cosmetic, since r is already 0 for the error path, no need to assign it there,
and r is assigned to 1 after switch command.
Also makes the code more readable.
This patch also does some clean-up of the splitmirrors code.
I've attempted to clean-up the splitmirrors code to make it easier to
understand with fewer operations. I've tried to reduce the number of
metadata operations without compromising the intermediate stages which
are necessary for easy clean-up in the even of failure.
These changes now correctly handle cluster situations - including exclusive
cluster mirrors. Whereas before, a splitmirror operation would result in
remote nodes having LVM commands report the newly split LV with a proper
name while DM commands would report the old (pre-split) names of the device.
IOW, there was a kernel/userspace mismatch.
The current code does not always assign proper udev flags to sub-LVs (e.g.
mirror images and log LVs). This shows up especially during a splitmirror
operation in which an image is split off from a mirror to form a new LV.
A mirror with a disk log is actually composed of 4 different LVs: the 2
mirror images, the log, and the top-level LV that "glues" them all together.
When a 2-way mirror is split into two linear LVs, two of those LVs must be
removed. The segments of the image which is not split off to form the new
LV are transferred to the top-level LV. This is done so that the original
LV can maintain its major/minor, UUID, and name. The sub-lv from which the
segments were transferred gets an error segment as a transitory process
before it is eventually removed. (Note that if the error target was not put
in place, a resume_lv would result in two LVs pointing to the same segment!
If the machine crashes before the eventual removal of the sub-LV, the result
would be a residual LV with the same mapping as the original (now linear) LV.)
So, the two LVs that need to be removed are now the log device and the sub-LV
with the error segment. If udev_flags are not properly set, a resume will
cause the error LV to come up and be scanned by udev. This causes I/O errors.
Additionally, when udev scans sub-LVs (or former sub-LVs), it can cause races
when we are trying to remove those LVs. This is especially bad during failure
conditions.
When the mirror is suspended, the top-level along with its sub-LVs are
suspended. The changes (now 2 linear devices and the yet-to-be-removed log
and error LV) are committed. When the resume takes place on the original
LV, there are no longer links to the other sub-lvs through the LVM metadata.
The links are implicitly handled by querying the kernel for a list of
dependencies. This is done in the '_add_dev' function (which is recursively
called for each dependency found) - called through the following chain:
_add_dev
dm_tree_add_dev_with_udev_flags
<*** DM / LVM divide ***>
_add_dev_to_dtree
_add_lv_to_dtree
_create_partial_dtree
_tree_action
dev_manager_activate
_lv_activate_lv
_lv_resume
lv_resume_if_active
When udev flags are calculated by '_get_udev_flags', it is done by referencing
the 'logical_volume' structure. Those flags are then passed down into
'dm_tree_add_dev_with_udev_flags', which in turn passes them to '_add_dev'.
Unfortunately, when '_add_dev' is finding the dependencies, it has no way to
calculate their proper udev_flags. This is because it is below the DM/LVM
divide - it doesn't have access to the logical_volume structure. In fact,
'_add_dev' simply reuses the udev_flags given for the initial device! This
virtually guarentees the udev_flags are wrong for all the dependencies unless
they are reset by some other mechanism. The current code provides no such
mechanism. Even if '_add_new_lv_to_dtree' were called on the sub-devices -
which it isn't - entries already in the tree are simply passed over, failing
to reset any udev_flags. The solution must retain its implicit nature of
discovering dependencies and be able to go back over the dependencies found
to properly set the udev_flags.
My solution simply calls a new function before leaving '_add_new_lv_to_dtree'
that iterates over the dtree nodes to properly reset the udev_flags of any
children. It is important that this function occur after the '_add_dev' has
done its job of querying the kernel for a list of dependencies. It is this
list of children that we use to look up their respective LVs and properly
calculate the udev_flags.
This solution has worked for single machine, cluster, and cluster w/ exclusive
activation.
Before, we used to display "Can't remove open logical volume" which was
generic. There 3 possibilities of how a device could be opened:
- used by another device
- having a filesystem on that device which is mounted
- opened directly by an application
With the help of sysfs info, we can distinguish the first two situations.
The third one will be subject to "remove retry" logic - if it's opened
quickly (e.g. a parallel scan from within a udev rule run), this will
finish quickly and we can remove it once it has finished. If it's a
legitimate application that keeps the device opened, we'll do our best
to remove the device, but we will fail finally after a few retries.
leaving behind the LVM-specific parts of the code (convenience wrappers that
handle `struct device` and `struct cmd_context`, basically). A number of
functions have been renamed (in addition to getting a dm_ prefix) -- namely,
all of the config interface now has a dm_config_ prefix.
~> lvconvert --splitmirrors 1 --trackchanges vg/lv
The '--trackchanges' option allows a user the ability to use an image of
a RAID1 array for the purposes of temporary read-only access. The image
can be merged back into the array at a later time and only the blocks that
have changed in the array since the split will be resync'ed. This
operation can be thought of as a partial split. The image is never completely
extracted from the array, in that the array reserves the position the device
occupied and tracks the differences between the array and the split image via
a bitmap. The image itself is rendered read-only and the name (<LV>_rimage_*)
cannot be changed. The user can complete the split (permanently splitting the
image from the array) by re-issuing the 'lvconvert' command without the
'--trackchanges' argument and specifying the '--name' argument.
~> lvconvert --splitmirrors 1 --name my_split vg/lv
Merging the tracked image back into the array is done with the '--merge'
option (included in a follow-on patch).
~> lvconvert --merge vg/lv_rimage_<n>
The internal mechanics of this are relatively simple. The 'raid' device-
mapper target allows for the specification of an empty slot in an array
via '- -'. This is what will be used if a partial activation of an array
is ever required. (It would also be possible to use 'error' targets in
place of the '- -'.) If a RAID image is found to be both read-only and
visible, then it is considered separate from the array and '- -' is used
to hold it's position in the array. So, all that needs to be done to
temporarily split an image from the array /and/ cause the kernel target's
bitmap to track (aka "mark") changes made is to make the specified image
visible and read-only. To merge the device back into the array, the image
needs to be returned to the read/write state of the top-level LV and made
invisible.
Move the free_vg() to vg.c and replace free_vg with release_vg
and make the _free_vg internal.
Patch is needed for sharing VG in vginfo cache so the release_vg function name
is a better fit here.
Implementation described in doc/lvm2-raid.txt.
Basic support includes:
- ability to create RAID 1/4/5/6 arrays
- ability to delete RAID arrays
- ability to display RAID arrays
Notable missing features (not included in this patch):
- ability to clean-up/repair failures
- ability to convert RAID segment types
- ability to monitor RAID segment types
teardown sequence. (Previously the snapshot was deactivated while its
origin was active and before its removal was committed to disk, so
restarting after a crash at the point would leave corruption.)
Mirrors used to be created by first creating a linear device and then adding
the other images plus the log. Now mirrors are created by creating all the
images in one go and then adding the log separately. The new way ran into
the condition that cluster mirrors cannot change the log type (in the case
of creation, from core -> disk) while the mirror is not active. (It isn't
active because it is in the process of being created.) The reason this
condition is in place is because a remote node may have the mirror active, and
we don't want to alter the log underneath it.
What we really needed was a way of checking if the mirror was active remotely
but not locally, and in that case do not allow a change of the log. I've added
this check, and cluster mirrors can now be created again.
We've used udev fallback code till now to check whether udev
created/removed the entries in /dev correctly and if not,
a repair was done (giving a warning messagea about that).
This patch adds a possibility to enable this additional check
and subsequent fallback only when required (debugging purposes
mostly) and trust udev completely.
So let's disable the fallback code by default and add a new
configuration option "activation/udev_fallback".
(The original code for creating the nodes will still be used
in case the device directory that is set in lvm.conf differs
from the one that udev uses and also when activation/udev_rules
is set to 0 - otherwise we would end up with no nodes/symlinks
at all)
To avoid modification of 'read-only' volume group structure
add a new structure to pass local data around the code for LV
activation.
As origin_only is one such flag - replace this parameter with new
struct lv_activate_opts.
More parameters might eventually become part of lv_activate_opts.
are affected by the move. (Currently it's possible for I/O to become
trapped between suspended devices amongst other problems.
The current fix was selected so as to minimise the testing surface. I
hope eventually to replace it with a cleaner one that extends the
deptree code.
Some lvconvert scenarios still suffer from related problems.
LVM doesn't behave correctly if running as non-root user,
there is warning when it detects it.
Despite this, it produces many error messages, saying nothing.
See https://bugzilla.redhat.com/show_bug.cgi?id=620571
This patch fixes two things:
1) Removes eror message from device_is_usable() which has no
information value anyway (real warning is printed inside it).
2) it fixes device-mapper initialization, if we support
core dm module autoload and device node is present, it should
fail early and not try recreate existing and correct node.
(non-root == permission denied here)
N.B. In future code should support user roles, some more
drastic checks in code are probably contraproductive now.