IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
This is rebased and edited version of the original design and
patch proposed by Jun'ichi Nomura:
http://www.redhat.com/archives/dm-devel/2007-April/msg00025.html
This patch defines operators and structures that will be used
to store the report selection against which the actual values
reported will be checked.
Selection operators
-------------------
Comparison operators:
=~ - Matching regular expression.
!~ - Not matching regular expression.
= - Equal to.
!= - Not equal to.
>= - Greater than or equal to.
> - Greater than
<= - Less than or equal to.
< - Less than.
Logical and grouping operators:
&& - All fields must match
, - All fields must match
|| - At least one field must match
# - At least one field must match
! - Logical negation
( - Left parenthesis
) - Right parenthesis
This makes it easier to check against the fields (following patches for
report selection) and check whether size units are allowed or not
with the field value.
As part of better error handling, remove DM devices that have been
sucessfully created but failed to load a table. This can happen
when pvmove'ing in a cluster and the cluster mirror daemon is not
running on a remote node - the mapping table failing to load as a
result. In this case, any revert would work on other nodes running
cmirrord because the DM devices on those nodes did succeed in loading.
However, because no table was able to load on the non-cmirrord nodes,
there is no table present that points to what needs to be reverted.
This causes the empty DM device to remain on the system without being
present in any LVM representation.
This patch should only be considered a partial fix to the overall
problem. This is because only the device which failed to load a
table is removed. Any LVs that may have been loaded as requirements
to the DM device that failed to load may be left in place. Complete
clean-up will require tracking those devices which have been created
as dependencies and removing them along with the device that failed
to load a table.
Share DM_REPORT_FIELD_RESERVED_NAME_{HELP,HELP_ALT} between libdm and
any libdm user to handle reserved field names, in this case the virtual
field name to show help instead of failing on unrecognized field.
The libdm user also needs to check the field name so it can fire
proper code in this case (cleanup, exit etc.).
If there ever would be a second call to dm_lib_init()
and envvar would be improperly set, some last set value
would be used while it should reset to default mangling mode.
When the node enters dtree with implicit dependency, it
automatically has udev flags from parent node
and could not be changed later when the node has been
entered again via i.e lvm's preload tracking.
Resolve this by tracking whether the node has been
created by implicit dependency tracking or has been
entered explicitely. Implicit node could be later
upgraded by an explicit _add_dev() with proper udev_flags.
For implicit devices add special udev flags to avoid
any scan and udev rule processing if we resume such device.
Patch allows easier removing of orphan nodes.
We need to use "--verifyudev" for dmsetup mangle command used in
the name-mangling test since without the --verifyudev, we'd end up
with the failed rename.
Also, add direct check for the dev nodes - node with old name must
be gone and node with new name must be present. Before, we checked
just the output of the command.
One bug popped up here when renaming with udev and libdevmapper
fallback checking the udev when target mangle mode is "none"
(fixme added in the libdevmapper's node rename code).
Reuse _node_send_messages for just checking
for valid transaction_id with preload.
This allows earlier detection of incosistent thin pool.
Code does the same thing, except for sending messages.
Improve testing of transation_id to not allow other difference
then either kernel TID is equal or is lower by oned and there
are queued messages for transaction.
Mark messages as submitted if the transaction_id is already matching.
Do not try to deactivate node on failure here and leave it on
proper error path of the caller.
Deactivation of top level node has to happen,
before traversing subtree.
Swap list logic and rather append new nodes to the head
and then use normal iteration.
(in-release update)
Avoid introducing libdm structure allocated in library user.
Use direct call with all currently supported args.
When new arg is added, new function will cover it.
I am reverting the commit below - removing the new 'dm_config_get_int'
function and simply calling 'dm_config_get_uint32' while casting the
'int *' pointer parameter.
Commit being reverted:
commit 94377dfd5e
Author: Jonathan Brassow <jbrassow@redhat.com>
Date: Mon Jan 27 05:26:19 2014 -0600
Misc: New function for reading lvm config file fields
Introduce 'dm_config_get_int', which will be used by the upcoming
cachepool segment type.
This patch defines a structure for holding all of the device-mapper
cache target's status information. The associated function provides
an easy way for higher levels (LVM) to consume the information.
This patch finishes the device-mapper interface for the cache and
cachepool segment types (i.e. the cache target).
This patch adds the cache segment type - the second of two necessary
to create cache logical volumes. This segment type references the
cachepool (the small fast device) and the origin (the large slow device);
linking them to create the cache device. The cache device is the
hierarchical device-mapper device that the user ulitmately makes use
of.
The cache segment sources the information necessary to construct the
device-mapper cache target from the origin and cachepool segments to
which it links.
This patch adds the new cachepool segment type - the first of two
necessary to eventually create 'cache' logical volumes. In addition
to the new segment type, updates to makefiles, configure files, the
lv_segment struct, and some necessary libdevmapper flags.
The cachepool is the LV and corresponding segment type that will hold
all information pertinent to the cache itself - it's size, cachemode,
cache policy, core arguments (like migration_threshold), etc.
Revert activated volumes if callback fails.
This is currently used only for thin_check failure support.
When thin_check detects failure in thin metadata device, it deactivate
volumes in reversed order that have been preloaded for thin pool activation.
After this change lvm command will not leave active pool subvolumes
in dm table.
Pass dnode pointer instead of rather unknown child pointer.
The pointer is currently unused and passing child pointer
is quite undefined, while dnode has at least some usability.
Add internal error warning when string value is used
as sort value for numerical field.
Using log_warn since the function itself does not return error,
so we do not confuse log_error() checker.
On modern systems udev manages nodes in /dev/mapper directory.
It creates, deletes and renames the nodes according to the
state of the kernel driver.
When the dmsetup is compiled without udev support (--enable-udev_sync)
and runs on the system with running udevd it tries to manage nodes in
/dev/mapper too, so it can race with udev.
dmsetup checks if the node was created/deleted/renamed with the stat
syscall, and skips the operation if it was. However, if udev
creates/deletes/renames the node after the stat syscall and before the
mknod/unlink/rename syscall, dmsetup reports an error.
Since in the system everything happened as expected, skip reporting
error for such case.
These races can be easily provoked by inserting sleep at appropriate
places.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
This file may be included by other programs, so it should be compliant
with the C standard.
* use __linux__ instead of linux - __linux__ is always defined, linux is
not defined when gcc runs in standard-compliant mode (with -std=c89 or
-std=c99) because the C standard doesn't allow polluting namespace
with arbitrary defines.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Just like we have symbolic names assigned to general DM udev flags
(DM_UDEV_* flags), we have the same for any subsystem flags now
(DM_SUBSYSTEM_UDEV_FLAG*), making it easier to use.
When images and their associated metadata are removed from a RAID1 LV,
the remaining sub-LVs are "shifted" down to fill the gaps. For
example, if there is a 3-way mirror:
[0][1][2]
and we remove device#0, the devices will be shifted down
[1][2]
and renamed.
[0][1]
This can create a problem for resume_lv (specifically,
dm_tree_activate_children) during the renaming process though. This
is because it will attempt to rename the higher indexed sub-LVs first
and find that it cannot because there are currently other sub-LVs with
that name. The solution is to check for a conflicting name before
attempting to rename. If a conflict is found and that conflicting
sub-LV is also in the process of renaming, we can defer the current
rename until the conflicting sub-LV has renamed and cleared the
conflict.
Now that resume_lv can handle these types of rename conflicts, we can
remove the workaround in RAID that was attempting to resume a RAID1
LV from the bottom-up in order to force a proper rename in assending
order before attempting a resume on the top-level LV. This "hack"
only worked for single machine use-cases of LVM. Clearing this up
paves the way for exclusive activation of RAID LVs in a cluster.
Do not allow passing '' names to kernel.
This test was missing also in kernel, so it has allowed
to create device with '' name. This then confused dmsetup tool,
since such name is unexpected and unsupported. To remove
such name from table, user has to use -j -m to specify which device
should be removed.
This patch fixes the posibility to run this operation:
dmsetup rename existingdev ''
after this operation commands like 'dmsetup table' are failing.
This patch prohibits to use such name.
Recent kernels allow messages to respond with a string.
Add dm_task_get_message_response() to libdevmapper to perform some
basic sanity checks and return this.
Have 'dmsetup message' display any response.
DM statistics will make extensive use of this.
(From Mikulas.)
libdm-common.c:883:42: warning: pointer/integer type mismatch in conditional expression
define log_sys_error(x, y) log_err("%s%s%s failed: %s", y, *y ? ": " : "", x, strerror(errno))
So the "y" which was 'path ? : "SELinux context reset"' from
previous commit did not quite fit the other "? :" in the log_sys_macro.
- null_fd resource leak on error path in _reopen_fd_null fn
- dead code in verify_message in clvmd code
- dead code in _init_filter_components in toolcontext code
- null dereference in dm_prepare_selinux_context on error path if
setfscreatecon fails while resetting SELinux context
Support tests with abort when libdm encounters internal
error - i.e. for dmsetup tool.
Code execution will be aborted when
env var DM_ABORT_ON_INTERNAL_ERRORS is set to 1
When resuming a node needed by a higher layer of the tree,
if the resume fails, only remove it if the node did not
originally have a live table.
Ref. 97f8454ecc
Clear send_messages flag when they have been delivered successfully.
There is no need to validate it for all other activations of the same
node in the dm_tree.
Also add extra debug message which shows the reason for skipping
sending of messages because the transaction_id has already the matching
value.
Show 'at' pointer address with pool name.
It's useful for debugging to be able to locate pointer address in the
debug trace log. It's only available when compiled with extra debug
compilation flag DEBUG_POOL in make.tmpl.
This patch adds the ability to set the minimum and maximum I/O rate for
sync operations in RAID LVs. The options are available for 'lvcreate' and
'lvchange' and are as follows:
--minrecoveryrate <Rate> [bBsSkKmMgG]
--maxrecoveryrate <Rate> [bBsSkKmMgG]
The rate is specified in size/sec/device. If a suffix is not given,
kiB/sec/device is assumed. Setting the rate to 0 removes the preference.
This patch may not be fully correct. It tries to solve
the imbalanced suspend counter.
The problem starts when some LV is created and fails in resume path.
(i.e. resuming to large PV (enforced) over small loop devices)
This fails in _resume_node() after dm_task_run(). And while
existing device with empty table is left in inactive table,
further calls are reporting this device is in suspend state.
When later the lvm2 tries to rollback created device and deactivate it,
it will end with internal error, when we try to decrement
never incremented suspend counter.
As an 'easy fix' for now update suspend counter only for live nodes.
TODO: explore better fix.
Since we use get_status also in dmeventd, which may use one pool
for a single device, in case it would be repeatedly returning error,
it may not be freeing the pool and would cause slow but steady growth.
To stay safe in the error path release any allocated memory.
To detect mounted device, use also /proc/self/mountinfo
as so far the check was only able to detect ext4 mounted filesystem.
TODO:
Once proper testing for this feature is added, it may appear,
mountinfo check is enough and covers all cases and sysfs check
could be removed.
'lvchange' is used to alter a RAID 1 logical volume's write-mostly and
write-behind characteristics. The '--writemostly' parameter takes a
PV as an argument with an optional trailing character to specify whether
to set ('y'), unset ('n'), or toggle ('t') the value. If no trailing
character is given, it will set the flag.
Synopsis:
lvchange [--writemostly <PV>:{t|y|n}] [--writebehind <count>] vg/lv
Example:
lvchange --writemostly /dev/sdb1:y --writebehind 512 vg/raid1_lv
The last character in the 'lv_attr' field is used to show whether a device
has the WriteMostly flag set. It is signified with a 'w'. If the device
has failed, the 'p'artial flag has priority.
Example ("nosync" raid1 with mismatch_cnt and writemostly):
[~]# lvs -a --segment vg
LV VG Attr #Str Type SSize
raid1 vg Rwi---r-m 2 raid1 500.00m
[raid1_rimage_0] vg Iwi---r-- 1 linear 500.00m
[raid1_rimage_1] vg Iwi---r-w 1 linear 500.00m
[raid1_rmeta_0] vg ewi---r-- 1 linear 4.00m
[raid1_rmeta_1] vg ewi---r-- 1 linear 4.00m
Example (raid1 with mismatch_cnt, writemostly - but failed drive):
[~]# lvs -a --segment vg
LV VG Attr #Str Type SSize
raid1 vg rwi---r-p 2 raid1 500.00m
[raid1_rimage_0] vg Iwi---r-- 1 linear 500.00m
[raid1_rimage_1] vg Iwi---r-p 1 linear 500.00m
[raid1_rmeta_0] vg ewi---r-- 1 linear 4.00m
[raid1_rmeta_1] vg ewi---r-p 1 linear 4.00m
A new reportable field has been added for writebehind as well. If
write-behind has not been set or the LV is not RAID1, the field will
be blank.
Example (writebehind is set):
[~]# lvs -a -o name,attr,writebehind vg
LV Attr WBehind
lv rwi-a-r-- 512
[lv_rimage_0] iwi-aor-w
[lv_rimage_1] iwi-aor--
[lv_rmeta_0] ewi-aor--
[lv_rmeta_1] ewi-aor--
Example (writebehind is not set):
[~]# lvs -a -o name,attr,writebehind vg
LV Attr WBehind
lv rwi-a-r--
[lv_rimage_0] iwi-aor-w
[lv_rimage_1] iwi-aor--
[lv_rmeta_0] ewi-aor--
[lv_rmeta_1] ewi-aor--
Revert commit 31c24dd9f2. This commit
was used to force a RAID device-mapper table to be loaded into the
kernel despite the fact that it was identical to the one already
loaded. The effect allowed a RAID array with a transiently failed
device to refresh and reintegrate the failed device. This operation
is better done in the kernel on a 'resume'. Since,
'lvchange --refresh' already performs a suspend/resume cycle, the
above commit is not needed once the kernel change is made. Reverting
the commit removes an unnecessary (at least for now) change to the
device-mapper interface.
I've updated the dm_status_raid structure and dm_get_status_raid()
function to make it handle the new kernel status fields that will
be coming in dm-raid v1.5.0. It is backwards compatible with the
old status line - initializing the new fields to '0'. The new
structure is also more amenable to future changes. It includes a
'reserved' field that is currently initialized to zero but could
be used to hold flags describing new features. It also now uses
pointers for the character strings instead of attempting to allocate
their space along with the structure (causing the size of the
structure to be variable). This allows future fields to be appended.
The new fields that are available are:
- sync_action : shows what the sync thread in the kernel is doing
(idle, frozen, resync, recover, check, repair, or
reshape)
- mismatch_count: shows the number of discrepancies which were
found or repaired by a "check" or "repair"
process, respectively.
Previous commit included changes to WHATSNEW, but the code changes
were missing. Here is the description from the previous commit:
commit bbc6378b73
Author: Jonathan Brassow <jbrassow@redhat.com>
Date: Thu Feb 21 11:31:36 2013 -0600
RAID: Make 'lvchange --refresh' restore transiently failed RAID PVs
A new function (dm_tree_node_force_identical_table_reload) was added to
avoid the suppression of identical table reloads. This allows RAID LVs
to reload the on-disk superblock information that contains which devices
have failed and the bitmaps. If the failed device has returned, this has
the effect of restoring the device and initiating recovery. Without this
patch, the user had to completely deactivate their RAID LV and re-activate
it in order to restore the failed device. Now they simply need to
suspend and resume (which is done by 'lvchange --refresh').
The identical table suppression is only avoided if the LV is not PARTAIL
(i.e. all of it's devices can be seen and read by LVM) and the kernel
status of the array contains failed devices. In other words, the function
will only be called in the case where we may have success in restoring
a failed device in the array.
There's a possibility to interconnect the dm_config_node with an
ID, which in our case is used to reference the configuration
definition ID from config_settings.h. So simply interconnecting
struct dm_config_node with struct cfg_def_item.
This patch also adds support for enhanced config node output besides
existing "output line by line". This patch adds a possibility to
register a callback that gets called *before* the config node is
processed line by line (for example to include any headers on output)
and *after* the config node is processed line by line (to include any
footers on output). Also, it adds the config node reference itself
as the callback arg in addition to have a possibility to extract more
information from the config node itself if needed when processing the
output callback (e.g. the key name, the id, or whether this is a
section or a value etc...).
If the config node from lvm.conf/--config tree is recognized and valid,
it's always coupled with the config node definition ID from
config_settings.h:
struct dm_config_node {
int id;
const char *key;
struct dm_config_node *parent, *sib, *child;
struct dm_config_value *v;
}
For example if the dm_config_node *cn holds "devices/dev" configuration,
then the cn->id holds "devices_dev_CFG" ID from config_settings.h, -1 if
not found in config_settings.h and 0 if matching has not yet been done.
To support the enhanced config node output, a new structure has been
defined in libdevmapper to register it:
struct dm_config_node_out_spec {
dm_config_node_out_fn prefix_fn; /* called before processing config node lines */
dm_config_node_out_fn line_fn; /* called for each config node line */
dm_config_node_out_fn suffix_fn; /* called after processing config node lines */
};
Where dm_config_node_out_fn is:
typedef int (*dm_config_node_out_fn)(const struct dm_config_node *cn, const char *line, void *baton);
(so in comparison to existing callbacks for config node output, it has
an extra dm_config_node *cn arg in addition)
This patch also adds these functions to libdevmapper:
- dm_config_write_node_out
- dm_config_write_one_node_out
...which have exactly the same functionality as their counterparts
without the "out" suffix. The "*_out" functions adds the extra hooks
for enhanced config output (prefix_fn and suffix_fn mentioned above).
One can still use the old interface for config node output, this is
just an enhancement for those who'd like to modify the output more
extensively.
Export this functionality from libdevmapper just for
convenience and general use when reading boolean values
which could be defined either in a numeric way with 0/1
or by using strings with "true"/"false", "yes"/"no",
"on"/"off", "y"/"n".
When a section was empty in a configuration tree (no children - this is
allowed) and we were looking for a config node inside that section, the
_find_config_node function incorrectly returned the section itself if
the node inside that section was not found.
For example the configuration below:
The config:
abc {
}
And a function call to get the "def" node inside "abc" section:
_find_config_node(..., "abc/def")
...returned the "abc" node instead of NULL ("def" not found).
This in turn caused segfaults in the code using lookups in such
a configuration tree as we (correctly) expected that the node
returned was always the one we were looking for or NULL if not
found. But if incorrect node was returned instead, we processed
that as if this was the node we were looking for and so we
processed its value as well. But sections don't have values => segfault.
On glibc, those are erroneously (namespace pollution) pulled in via
other headers. this doesn't work with conformant libcs (musl libc in
this case), we simply need to include all needed headers.
Signed-Off-By: John Spencer <maillist-lvm@barfooze.de>