IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
Not releasing objects back to the pool is fine for short-lived
pools since the memory will be freed when dm_pool_destroy() is
called.
Any pool that may be long-lived needs to be more careful to free
objects back to the pool to avoid leaking memory that will not be
reclaimed until the pool is destroyed at process exit time.
The report pool currently leaks each headings line and some row
data.
Although dm_report_output() tries to free the first allocated row
this may end up freeing a later row due to sorting of the row list
while reporting. Store a pointer to the first allocated row from
_do_report_obect() instead and free this at the end of
_output_as_columns(), _output_as_rows(), and dm_report_clear().
Also make sure to call dm_pool_free() for the headings line built
in _report_headings().
When dmstats is introduced it will maintain dm_report objects for
the whole lifetime of the process: without these changes a stats
report could leak around 600k in 10m (exact rate depends on field
selection and data values):
top - 12:11:32 up 4 days, 3:16, 15 users, load average: 0.01, 0.12, 0.14
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
6473 root 20 0 130196 3124 2792 S 0.0 0.0 0:00.00 dmstats
top - 12:22:04 up 4 days, 3:26, 15 users, load average: 0.06, 0.11, 0.13
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
6498 root 20 0 130836 3712 2752 S 0.0 0.0 0:00.60 dmstats
With this patch no increase in RSS is seen:
top - 13:54:58 up 4 days, 4:59, 15 users, load average: 0.12, 0.14, 0.14
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
13962 root 20 0 130196 2996 2688 S 0.0 0.0 0:00.00 dmstats
top - 14:04:31 up 4 days, 5:09, 15 users, load average: 1.02, 0.67, 0.36
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
13962 root 20 0 130196 2996 2688 S 0.3 0.0 0:00.32 dmstats
This also affects report output for repeating reports in the
DM_REPORT_OUTPUT_COLUMNS_AS_ROWS case; row state is not fully cleared for
the next iteration leading to progressive growth of the heading width:
vg_hex-lv_home:vg_hex-lv_swap:vg_hex-lv_root:luks-79733921-3f68-4c92-9eb7-d0aca4c6ba3e:vg_hex-lv_images
253:253:253:253:253
2:0:1:4:3
L--w:L--w:L--w:L--w:L--w
1:2:1:1:1
3:1:1:1:2
0:0:0:0:0
LVM-9t8ITqLZa6AuuyVoz5Olp1KwF9ZDBfOiv08BCGvF4WsJSqWUDUt7qtf2hEmjtVvo:LVM-9t8ITqLZa6AuuyVoz5Olp1KwF9ZDBfOiKf7XIiwdAYOJfaGhQe9fu26cTEICGgFS:LVM-9t8ITqLZa6AuuyVoz5Olp1KwF9ZDBfOiEZj7ZXbmrWDuGhd7vvi88VF0NdTMG8iA:CRYPT-LUKS1-797339213f684c929eb7d0aca4c6ba3e-luks-79733921-3f68-4c92-9eb7-d0aca4c6ba3e:LVM-9t8ITqLZa6AuuyVoz5Olp1KwF9ZDBfOi2rKredlBPnw2X7v1BiCuEpFo6gaE7BRw
:::::vg_hex-lv_home:vg_hex-lv_swap:vg_hex-lv_root:luks-79733921-3f68-4c92-9eb7-d0aca4c6ba3e:vg_hex-lv_images
:::::253:253:253:253:253
:::::2:0:1:4:3
:::::L--w:L--w:L--w:L--w:L--w
:::::1:2:1:1:1
:::::3:1:1:1:2
:::::0:0:0:0:0
:::::LVM-9t8ITqLZa6AuuyVoz5Olp1KwF9ZDBfOiv08BCGvF4WsJSqWUDUt7qtf2hEmjtVvo:LVM-9t8ITqLZa6AuuyVoz5Olp1KwF9ZDBfOiKf7XIiwdAYOJfaGhQe9fu26cTEICGgFS:LVM-9t8ITqLZa6AuuyVoz5Olp1KwF9ZDBfOiEZj7ZXbmrWDuGhd7vvi88VF0NdTMG8iA:CRYPT-LUKS1-797339213f684c929eb7d0aca4c6ba3e-luks-79733921-3f68-4c92-9eb7-d0aca4c6ba3e:LVM-9t8ITqLZa6AuuyVoz5Olp1KwF9ZDBfOi2rKredlBPnw2X7v1BiCuEpFo6gaE7BRw
Whenver reporting field name is registered with libdevmapper and if
the field name contains any number of underscores ('_'), libdm
can now automatically recognize any of its variant without any
underscores used.
For example:
..for underscores in prefixes:
pvs -o pv_name
pvs -o name
pvs -o pvname (newly recognized besides pvname)
..for underscores in the name:
lvs -o cache_mode
lvs -o cachemode
..or even multiple underscores:
pvs -o pv___na___me
It's all variant of the same field name.
Moved out from lib/display and a little documentation added.
It's tuned to LVM's requirements historically and its behaviour
might not always be what you would expect.
Move the DEBUG_MEM decision inside libdevmapper.so instead of exposing
it in libdevmapper.h which causes failures if the binary and library
were compiled with opposite debugging settings.
... Using uninitialized value "lockd_state" when calling "lockd_vg"
(even though lockd_vg assigns 0 to the lockd_state, but it looks at
previous state of lockd_state just before that so we need to have
that properly initialized!)
libdm/libdm-report.c:2934: uninit_use_in_call: Using uninitialized value "tm". Field "tm.tm_gmtoff" is uninitialized when calling "_get_final_time".
daemons/lvmlockd/lvmlockctl.c:273: uninit_use_in_call: Using uninitialized element of array "r_name" when calling "format_info_r_action". (just added FIXME as this looks unfinished?)
Existing messaging intarface for thin-pool has a few 'weak' points:
* Message were posted with each 'resume' operation, thus not allowing
activation of thin-pool with the existing state.
* Acceleration skipped suspend step has not worked in cluster,
since clvmd resumes only nodes which are suspended (have proper lock
state).
* Resume may fail and code is not really designed to 'fail' in this
phase (generic rule here is resume DOES NOT fail unless something serious
is wrong and lvm2 tool usually doesn't handle recovery path in this case.)
* Full thin-pool suspend happened, when taken a thin-volume snapshot.
With this patch the new method relocates message passing into suspend
state.
This has a few drawbacks with current API, but overal it performs
better and gives are more posibilities to deal with errors.
Patch introduces a new logic for 'origin-only' suspend of thin-pool and
this also relates to thin-volume when taking snapshot.
When suspend_origin_only operation is invoked on a pool with
queued messages then only those messages are posted to thin-pool and
actual suspend of thin pool and data and metadata volume is skipped.
This makes taking a snapshot of thin-volume lighter operation and
avoids blocking of other unrelated active thin volumes.
Also fail now happens in 'suspend' state where the 'Fail' is more expected
and it is better handled through error paths.
Activation of thin-pool is now not sending any message and leaves upto a tool
to decided later how to finish unfinished double-commit transaction.
Problem which needs some API improvements relates to the lvm2 tree
construction. For the suspend tree we do not add target table line
into the tree, but only a device is inserted into a tree.
Current mechanism to attach messages for thin-pool requires the libdm
to know about thin-pool target, so lvm2 currently takes assumption, node
is really a thin-pool and fills in the table line for this node (which
should be ensured by the PRELOAD phase, but it's a misuse of internal API)
we would possibly need to be able to attach message to 'any' node.
Other thing to notice - current messaging interface in thin-pool
target requires to suspend thin volume origin first and then send
a create message, but this could not have any 'nice' solution on lvm2
side and IMHO we should introduce something like 'create_after_resume'
message.
Patch also changes the moment, where lvm2 transaction id is increased.
Now it happens only after successful finish of kernel transaction id
change. This change was needed to handle properly activation of pool,
which is in the middle of unfinished transaction, and also this corrects
usage of thin-pool by external apps like Docker.
Add support for sending message in suspend tree for thin-pools.
When this operation is requested whole subtree suspend is then skipped.
This is experimantal support for new lvm2 code for sending message
in suspend phase where 'thin-pool origin-only suspend' will send
messages instead of really suspending thin-pool tree.
When suspening thin volume origin-only - only thin volume is suspended,
then messages are posted and thin-pool suspend is skipped.
Recognize date and time specification within selection criteria
that is formulated in a more free-form way besides to the original
basic YYYY-MM-DD HH:MM format that libdevmapper supports.
Currently, this free-form format is recognized for lv_time field.
Users are able to use expressions from this set:
- weekday names ("Sunday" - "Saturday" or abbreviated as "Sun" - "Sat")
- labels for points in time ("noon", "midnight")
- labels for a day relative to current day ("today", "yesterday")
- points back in time with relative offset from today (N is a number)
( "N" "seconds"/"minutes"/"hours"/"days"/"weeks"/"years" "ago")
( "N" "secs"/"mins"/"hrs" ... "ago")
( "N" "s"/"m"/"h" ... "ago")
- time specification either in hh:mm:ss format or with AM/PM suffixes
- month names ("January" - "December" or abbreviated as "Jan" - "Dec")
For example:
$ date
Fri Jul 3 10:11:13 CEST 2015
$ lvmconfig --type full report/time_format
time_format="%a %Y-%m-%d %T %z %Z [%s]"
$ lvs
LV VG Time
lvol0 vg Fri 2014-08-22 21:25:41 +0200 CEST [1408735541]
lvol2 vg Sun 2015-04-26 14:52:20 +0200 CEST [1430052740]
root fedora Wed 2015-05-27 08:09:21 +0200 CEST [1432706961]
swap fedora Wed 2015-05-27 08:09:21 +0200 CEST [1432706961]
lvol1 vg Tue 2015-06-30 03:25:43 +0200 CEST [1435627543]
lvol3 vg Tue 2015-06-30 14:52:23 +0200 CEST [1435668743]
lvol6 vg Wed 2015-07-01 13:35:56 +0200 CEST [1435750556]
lvol4 vg Thu 2015-07-02 12:12:02 +0200 CEST [1435831922]
lvol5 vg Thu 2015-07-02 14:30:32 +0200 CEST [1435840232]
$ lvs -S 'time=yesterday'
LV VG Time
lvol4 vg Thu 2015-07-02 12:12:02 +0200 CEST [1435831922]
lvol5 vg Thu 2015-07-02 14:30:32 +0200 CEST [1435840232]
$ lvs -S 'time since "June 30"'
LV VG Time
lvol1 vg Tue 2015-06-30 03:25:43 +0200 CEST [1435627543]
lvol3 vg Tue 2015-06-30 14:52:23 +0200 CEST [1435668743]
lvol6 vg Wed 2015-07-01 13:35:56 +0200 CEST [1435750556]
lvol4 vg Thu 2015-07-02 12:12:02 +0200 CEST [1435831922]
lvol5 vg Thu 2015-07-02 14:30:32 +0200 CEST [1435840232]
$ lvs -S 'time since "noon June 30"'
LV VG Time
lvol3 vg Tue 2015-06-30 14:52:23 +0200 CEST [1435668743]
lvol6 vg Wed 2015-07-01 13:35:56 +0200 CEST [1435750556]
lvol4 vg Thu 2015-07-02 12:12:02 +0200 CEST [1435831922]
lvol5 vg Thu 2015-07-02 14:30:32 +0200 CEST [1435840232]
$ lvs -S 'time since "2 July 9AM"'
LV VG Time
lvol4 vg Thu 2015-07-02 12:12:02 +0200 CEST [1435831922]
lvol5 vg Thu 2015-07-02 14:30:32 +0200 CEST [1435840232]
$ lvs -S 'time since "2 July 1PM"'
LV VG Time
lvol5 vg Thu 2015-07-02 14:30:32 +0200 CEST [1435840232]
...and so on.
Wire the dm_report_reserved_handler instance call in reporting/selection
infrastructure to handle reserved value actions (currently only
DM_REPORT_RESERVED_PARSE_FUZZY_NAME and DM_REPORT_RESERVED_GET_DYNAMIC_VALUE
actions).
With fuzzy names we mean the names for which it's hard or even impossible
to enumerate all possible variations of the name - the name needs to
be evaluated. An example of fuzzy name is a name which has a base
(substring) which matches and it can contain arbitrary variations
around this base. We can cover human language better with fuzzy
names as people may use several different names (or sentences) to
denote the same thing.
With dynamic values we mean the values which are not constants
and they need to be evaluated in runtime. An example of dynamic
value is a value which depends on current system state (e.g. time,
current configuration or any other state which may change and it
needs runtime evaluation).
There's a handler that can be registered with reporting/selection
using dm_report_reserved_handler instance. This is a central point
in which the computation/evaluation happens when processing reserved
values. Currently, there are two actions declared:
DM_REPORT_RESERVED_PARSE_FUZZY_NAME
(translates fuzzy name into canonical name)
DM_REPORT_RESERVED_GET_DYNAMIC_VALUE
(gets value for canonical name)
The handler is then registered as value in struct
dm_report_reserved_value (see explaining comments besided
the struct dm_report_reserved_value in libdevmapper.h).
Also, this patch provides support for simple caching of values
used during report/selection via dm_report_value_cache_{set,get}.
This is supposed to be used mainly in the dm_report_reserved_handler
instances to save values among calls so all the handler calls work
with the same base value used in computation/evaluation and/or
possibly to save resources if the evaluation is more time-consuming.
The cache is attached to the dm_report handle and so the cache is
dropped one dm_report is dropped.
Generic numbers and time values share some operators so make sure
we have the flags correctly adjusted based on expected type if
we're using reserved values.
_node_name() prepares into dm_tree internal buffer device
name and it (major:minor) for easy usage for debug messages.
To avoid any allocation a small buffer in struct dm_tree is preallocated
to store this message.
This patch adds support for time values used in reporting fields.
The raw values are always stored as number of seconds since epoch.
The support that comes with this patch is the basic one which allows
only for recognition of strictly formatted date and time in selection
criteria (the format follows a subset of formats defined by ISO 8601):
date time timezone
date:
YYYY-MM-DD (or shortly YYYYMMDD)
YYYY-MM (shortly YYYYMM), auto DD=1
YYYY, auto MM=01 and DD=01
time:
hh:mm:ss (or shortly hhmmss)
hh:mm (or shortly hhmm), auto ss=0
hh (or shortly hh), auto mm=0, auto ss=0
timezone (always with + or - sign):
+hh:mm or -hh:mm (or shortly +hhmm or -hhmm)
+hh or -hh
Or directly the time (number of seconds) since Epoch (1970-01-01 00:00:00 UTC)
when the number value is prefixed by "@":
@number_of_seconds_since_epoch
This patch also adds aliases for comparison operators
used together with time values which are more intuitive
to use:
since (as alias for >=)
after (as alias for >)
until (as alias for <=)
before (as alias for <)
For example:
$ lvmconfig --type full report/time_format
time_format="%Y-%m-%d %T %z %Z [%s]"
$ lvs -o name,time vg
LV Time
lvol0 2015-06-28 21:25:41 +0200 CEST [1435519541]
lvol1 2015-06-30 03:25:43 +0200 CEST [1435627543]
lvol2 2015-04-26 14:52:20 +0200 CEST [1430052740]
lvol3 2015-06-30 14:52:23 +0200 CEST [1435668743]
$ lvs vg -o name,time -S 'time since "2015-04-26 15:00" && time until "2015-06-30"'
LV Time
lvol0 2015-06-28 21:25:41 +0200 CEST [1435519541]
lvol1 2015-06-30 03:25:43 +0200 CEST [1435627543]
lvol3 2015-06-30 14:52:23 +0200 CEST [1435668743]
$ lvs vg -o name,time -S 'time since "2015-04-26 15:00" && time until "2015-06-30 6:00"'
LV Time
lvol0 2015-06-28 21:25:41 +0200 CEST [1435519541]
lvol1 2015-06-30 03:25:43 +0200 CEST [1435627543]
$ lvs vg -o name,time -S 'time since @1435519541'
LV Time
lvol0 2015-06-28 21:25:41 +0200 CEST [1435519541]
lvol1 2015-06-30 03:25:43 +0200 CEST [1435627543]
lvol3 2015-06-30 14:52:23 +0200 CEST [1435668743]
This is basic time recognition support that is directly a part of
libdevmapper. Recognition of more free-form expressions will be a
part of subsequent patches.
This patch allows for registration and recognition of reserved
values which are ranges, so they're composed of two values actually
to denote the lower and upper bound for the range (stored as an array
with exactly two items to define the boundaries).
Also, this patch allows for flagging reserved values as named-only
which means that such values are not strictly reserved. The strictly
reserved values are reserved values as used before this patch.
Distinction between strictly-reserved and named-only values
is clearly visible with comparisons. Normally, strictly reserved
value is not accounted for if we do "greater than" or "lower than"
comparisons, for example:
1 2 3 ....
|
abc
- we have "abc" as reserved value for field with value "2"
- the value reported for the field is "abc" (or "2", it doesn't matter here)
- the selection we're processing is -S 'field < abc'
- the result of the selection gives nothing as "abc" is strictly
reserved value (bound to "2") and there's no order defined for
it and it would only match if we directly compared the value
(so -S 'field = abc' would match)
With named-only values, the "abc" is named-only value for "2",
so selection -S 'field < abc" is the same as using -S 'field < 2'.
The "abc" is just an alias for some value so the value or its
assigned name can be used equally in selection criteria.
There are two basic groups of formatting flags (32 bits):
- common ones applicable for all config value types (lower 16 bits)
- type-related formatting flags (higher 16 bits)
With this patch, we initially support four new flags that
modify the the way the config value is displayed:
Common flags:
=============
DM_CONFIG_VALUE_FMT_COMMON_ARRAY - causes array config values
to be enclosed in "[ ]" even if there's only one item
(previously, there was no way to recognize an array with one
item and scalar value, hence array values with one member
were always displayed without "[ ]" which libdm accepted
when reading, but it may have been misleading for users)
DM_CONFIG_VALUE_FMT_COMMON_EXTRA_SPACE - causes extra spaces to
be inserted in "key = value" (or key = [ value, value, ... ] in
case of arrays), compared to "key=value" seen on output before.
This makes the output more readable for users.
Type-related flags:
===================
DM_CONFIG_VALUE_FMT_INT_OCTAL - prints integers in octal form with
"0" as a prefix (libdm's config reading code can handle this via
strtol just fine so it's properly recognized as number in octal
form already if there's "0" used as prefix)
DM_CONFIG_VALUE_FMT_STRING_NO_QUOTES - makes it possible to print
strings without enclosing " "
This patch also adds dm_config_value_set_format_flags and
dm_config_value_get_format_flags functions to set and get
these formatting flags.
There are reports of unexplained ioctl failures when using dmeventd.
An explanation might be that the wrong value of errno is being used.
Change libdevmapper to store an errno set by from dm ioctl() directly
and provide it to the caller through a new dm_task_get_errno() function.
[Replaced f9510548667754d9209b232348ccd2d806c0f1d8]
More exact clean of library exported symbols files.
Also use $(firstword) test to check for empty string
so 'make clean' has now cleaner condensed look.
Clean also created include links.
Introduce new implmentation of dm_task_get_info() function
with support for reading internal_suspend.
.
This time it is done in a 'versioned' way.
We keep the old fashion dm_task_get_info(Base) to implement
the old behavior of 1.02.95 libdm code.
libdm version 1.02.96 introduced 'macro' wrapper
dm_task_get_info_with_deferred_remove with new implementation
of dm_task_get_info() - we cannot do anything else then to
provide compatible version of this symbol.
Now in version 1.02.97 we add new versioned implementation of
dm_task_get_info(DM_1_02_97) symbol.
This has the effect that i.e. rpm build will finaly resolve proper
dependency on a new symbol - so it will be no longer possible,
to build a new binary and use old library
(rpm -q --provides will show libdevmapper.so.1.02(DM_1_02_97)(64bit))
Also the history is now tracked. If a new function is added (or
reimplemented), it needs to be placed in proper file,
so it could be exported with right versioning symbol.
File .exported_symbols.Base should and any existing older DM
should be treated as read-only after a release.
Also - only libdm has been currently enhanced with versioned .Base
file, as soon as other libs (liblvm, libdevmapper-event) needs changes
they should also get their exported symbol files - meanwhile
make.tmpl handles both cases.
Scenario:
$ vgs -o+vg_mda_copies
VG #PV #LV #SN Attr VSize VFree #VMdaCps
fedora 1 2 0 wz--n- 9.51g 0 unmanaged
vg 16 9 0 wz--n- 1.94g 1.83g 2
$ lvs -o+read_ahead vg/lvol6 vg/lvol7
LV VG Attr LSize Pool Origin Data% Rahead
lvol6 vg Vwi-a-tz-- 1.00g pool lvol5 0.00 auto
lvol7 vg Vwi---tz-k 1.00g pool lvol6 256.00k
Before this patch:
$vgs -o vg_name,vg_mda_copies -S 'vg_mda_copies < unmanaged'
VG #VMdaCps
vg 2
Problem:
Reserved values can be only used with exact match = or !=, not <,<=,>,>=.
In the example above, the "unamanaged" is internally represented as
18446744073709551615, but this should be ignored while not comparing
field directly with "unmanaged" reserved name with = or !=. Users
should not be aware of this internal mapping of the reserved value
name to its internal value and hence it doesn't make sense for such
reserved value to take place in results of <,<=,> and >=.
There's no order defined for reserved values!!! It's a special
*reserved* value that is taken out of the usual value range
of that type.
This is very similar to what we have already fixed with
2f7f6932dc, but it's the other way round
now - we're using reserved value name in selection criteria now
(in the patch 2f7f693, we had concrete value and we compared it
with the reserved value). So this patch completes patch 2f7f693.
This patch also fixes this problem:
$ lvs -o+read_ahead vg/lvol6 vg/lvol7 -S 'read_ahead > 32k'
LV VG Attr LSize Pool Origin Data% Rahead
lvol6 vg Vwi-a-tz-- 1.00g pool lvol5 0.00 auto
lvol7 vg Vwi---tz-k 1.00g pool lvol6 256.00k
Problem:
In the example above, the internal reserved value "auto" is in the
range of selection "> 32k" - it shouldn't match as well. Here the
"auto" is internally represented as MAX_DBL and of course, numerically,
MAX_DBL > 256k. But for users, the reserved value should be uncomparable
to any number so the mapping of the reserved value name to its interna
value is transparent to users. Again, there's no order defined for
reserved values and hence it should never match if using <,<=,>,>=
operators.
This is actually exactly the same problem as already described in
2f7f6932dc, but that patch failed for
size field types because of incorrect internal representation used.
With this patch applied, both problematic scenarios mentioned
above are fixed now:
$ vgs -o vg_name,vg_mda_copies -S 'vg_mda_copies < unmanaged'
(blank)
$ lvs -o+read_ahead vg/lvol6 vg/lvol7 -S 'read_ahead > 32k'
LV VG Attr LSize Pool Origin Rahead
lvol7 vg Vwi---tz-k 1.00g pool lvol6 256.00k
Dop unused value assignments.
Unknown is detected via other combination
(!linear && !striped).
Also change the log_error() message into a warning,
since the function is not really returning error,
but still keep the INTERNAL_ERROR.
Ret value is always set later.
The new dm_report_object_is_selected fn makes it possible to opt whether the
object reported should be displayed on output or not. Also, in addition to
that, it makes it possible to save the result of selection (either 0 or 1).
So dm_report_object_is_selected is simply more general form of object
reporting fn - combinations now allow for:
dm_report_object_is_selected(rh, object, 1, NULL):
This is exactly the original dm_report_object fn and it's fully equal
to it.
dm_report_object_is_selected(rh, object, 0, selected):
Do not display the result on output, but save info whether the object
is selected or not in 'selected' variable.
dm_report_object_is_selected(rh, object, 1, selected):
Display the result on output (if it passes selection criteria) and save
whether the object is selected or not in 'selected' variable.
dm_report_object(rh, object, 0, NULL):
This combination is not allowed - it will end up with internal error.
We're either interested in selection status or we want to display the
result on output or both, but never nothing of the two.
Support error_if_no_space feature for thin pools.
Report more info about thinpool status:
(out_of_data (D), metadata_read_only (M), failed (F) also as health
attribute.)
API for seg reporting is breaking internal lvm coding - it cannot
use vgmem mem pool for allocation of reported value.
So use separate pool instead of 'vgmem' for non vg related allocations
Add consts for many function params - but still many other are left
for now as non-const - needs deeper level of change even on libdm side.
We only checked global per-report-type reserved values for compatibility
with selection code. This patch also adds a check for per-report-field
reserved values. This avoids problems where unsupported report type is
used as reserved value which could cause hard to debug problems
otherwise. So this additional check stops from registering unsupported
and unhandled per-field reserved values.
Registerting such unsupported reserved value is a programmatic error,
so report internal error in this case to stop us from making a mistake
here in the future or even today where STR_LIST fields can't have
reserved values yet.
Under certain circumstances, the selection code can segfault:
$ vgs --select 'pv_name=~/dev/sda' --unbuffered vg0
VG #PV #LV #SN Attr VSize VFree
vg0 6 3 0 wz--n- 744.00m 588.00m
Segmentation fault (core dumped)
The problem here is the use of --ubuffered together with regex used in
selection criteria. If the report output is not buffered, each row is
discarded as soon as it is reported. The bug is in the use of report
handle's memory - in the example above, what happens is:
1) report handle is initialized together with its memory pool
2) selection tree is initialized from selection criteria string
(using the report handle's memory pool!)
2a) this also means the regex is initialized from report handle's mem pool
3) the object (row) is reported
3a) any memory needed for output is intialized out of report handle's mem pool
3b) selection criteria matching is executed - if the regex is checked the
very first time (for the very first row reported), some more memory
allocation happens as regex allocates internal structures "on-demand",
it's allocating from report handle's mem pool (see also step 2a)
4) the report output is executed
5) the object (row) is discarded, meaning discarding all the mem pool
memory used since step 3.
Now, with step 5) we have discarded the regex internal structures from step 3b.
When we execute reporting for another object (row), we're using the same
selection criteria (step 3b), but tihs is second time we're using the regex
and as such, it's already initialized completely. But the regex is missing the
internal structures now as they got discarded in step 5) from previous
object (row) reporting (because we're using "unbuffered" reporting).
To resolve this issue and to prevent any similar future issues where each
object/row memory is discarded after output (the unbuffered reporting) while
selection tree is global for all the object/rows, use separate memory pool
for report's selection.
This patch replaces "struct selection_node *selection_root" in struct
dm_report with new struct selection which contains both "selection_root"
and "mem" for separate mem pool used for selection.
We can change struct dm_report this way as it is not exposed via libdevmapper.
(This patch will have even more meaning for upcoming patches where selection
is used even for non-reporting commands where "internal" reporting and
selection criteria matching happens and where the internal reporting is
not buffered.)
Add new dm_report_compact_fields function to cause report outout
(dm_report_output) to ignore fields which don't have any value set
in any of the rows reported. This provides support for compact report
output where only fields which have something to report are displayed.
The dm_report_set_output_selection was not implemented in the end -
we have dm_report_init_with_selection instead. This is just a remnant
from development code that got into libdevmapper.h by mistake.
The order of the resulting tree is based on the first appearance of
sections. With no section repeats, the sections stay as listed in the
config file. Sections using the brace syntax 'section { key = value }' are
treated the same way: 'section { x = 1 } section { y = 2 }' is the same as
'section/x = 1 section/y = 2' is the same as 'section { x = 1 y = 2 }'
Do not use 'any' policy name as a value in config tree - so we stick
with 'policy_settings' and extra 'policy_name' for libdm params.
Update lvm2 API as well.
Example of supported metadata:
policy = "mq"
policy_settings {
migration_threshold = 2048
sequential_threshold = 512
random_threshold = 4
read_promote_adjustment = 10
}
Support new PASSTHROUGH 'feature' flag.
Add dm_config_node to pass in policy args.
Really use origin_uuid instead of using extra call
to pass seg_areas.
Switch to 64bit feature flag bit set so there is
enough space in future for new bits...
When transaction_id is set 0 for thin-pool, libdm avoids validation
of thin-pool, unless there are real messages to be send to thin-pool.
This relaxes strict policy which always required to know
in front transaction_id for the kernel target.
It now allows to activate thin-pool with any transaction_id
(when transaction_id is passed in)
It is now upto application to validate transaction_id from life
thin-pool volume with transaction_id within it's own metadata.
Some values are reserved for special purpose like 'undefined', 'unmanaged' etc.
When using >, <, >= and < comparison operators where the range is considered,
do not include reserved values as proper values in this range which
would otherwise result in not so obvious criteria match (as the reserved value is
actually transparent for the user). It's incorrect.
Example scenario:
$ vgs -o vg_name,vg_mda_copies vg1 vg2
VG #VMdaCps
vg1 1
vg2 unmanaged
The "unmanaged" is actually mapped onto reserved value
18446744073709551615 (2^64 - 1) internally.
Such reseved value is already caught on selection criteria input
properly:
$ vgs -o name,vg_mda_copies vg1 vg2 -S 'vg_mda_copies=18446744073709551615'
Numeric value 18446744073709551615 found in selection is reserved.
However, we still need to fix situaton where the reserved value may be
included in resulting range:
Before this patch:
$ vgs -o vg_name,vg_mda_copies vg1 vg2 -S 'vg_mda_copies >= 1'
VG #VMdaCps
vg1 1
vg2 unmanaged
With this patch applied:
$ vgs -o vg_name,vg_mda_copies vg1 vg2 -S 'vg_mda_copies >= 1'
VG #VMdaCps
vg1 1
From the examples above, we can see that without this patch applied,
the vg_mda_copies >= 1 also matched the reserved value 18446744073709551615
(which is represented by the "unamanged" string on report). When
applying the operators, such values must be skipped! They're meant to
be matched only against their string representation only, e.g.:
$ vgs -o name,vg_mda_copies vg1 vg2 -S 'vg_mda_copies=unmanaged'
VG #VMdaCps
vg2 unmanaged
...or any synonyms:
$ vgs -o name,vg_mda_copies vg1 vg2 -S 'vg_mda_copies=undefined'
VG #VMdaCps
vg2 unmanaged
This is probably better approach than 3880ca5eca.
If dm module is not loaded during dm_is_dm_major call, there are no
lines for dm in /proc/devices, of course. Normally, dm_is_dm_major
is called to check existing devices, hence if module is not loaded,
we can expect there's no DM device present at the same time so we
can directly return 0 here (meaning the major number being inspected
is not dm device's one).
See also https://bugzilla.redhat.com/show_bug.cgi?id=1059711.
For dm_is_dm_major to determine whether the major number given as
an argument belongs to a DM device, libdm code needs to know what
the actual DM major is to do the comparison.
It may happen that the dm-mod module is not loaded during this
call and so for the completness let's try our best before we start
giving various errors - we can still make use of dm-mod autoloading,
though only since kernels 2.6.36 where this feature was introduced.
Commit 94786a3bbf introduced
another bug - since sscanf needs extra 1 byte for \0.
Since there is no easy way to do a macro evaluation for (PATH_MAX-1)
and string concatation of this number to get resulting (%4095s) - let's
go with easiest path and restore extra byte for 0.
Other option would be to prepare sscanf parsing string in runtime.
But lets resolve it when we look at PATH_MAX handling later...
Add extra safety detection for thin pool transaction id
and query pool status after confirmed message.
In case there is a missmatch, immeditelly abort further
processing.
Avoid playing with +1.
PATH_MAX code needs probably more thinking anyway, since
there is no MAX path in Linux - user may easily create path
with 64kB chars - so 4kB buffer is surelly not enough for
such dirs.
Note:
http://insanecoding.blogspot.cz/2007/11/pathmax-simply-isnt.html
This patch adds a new flag --deferred to dmsetup remove. If this flag is
specified and the device is open, it is scheduled to be deleted on
close.
struct dm_info is extended.
The existing dm_task_get_info() is converted into a wrapper around the
new version dm_task_get_info_with_deferred_remove() so existing binaries
can still use the old smaller structure.
Recompiled code will pick up the new larger structure.
From: Mikulas Patocka <mpatocka@redhat.com>
Using "[ ]" operator together with "&&" (or ",") inside causes the
string list to be matched if and only if all the items given match
the value reported and the number of items also match. This is
strict list matching and the original behaviour we already have.
In contrast to that, the new "{ }" operator together with "&&" inside
causes the string list to be matched if and only if all the items given
match the value reported but the number of items don't need to match.
So we can provide a subset in selection criteria and if the subset
is found, it matches.
For example:
$ lvs -o name,tags
LV LV Tags
lvol0 a
lvol1 a,b
lvol2 b,c,x
lvol3 a,b,y
$ lvs -o name,tags -S 'tags=[a,b]'
LV LV Tags
lvol1 a,b
$ lvs -o name,tags -S 'tags={a,b}'
LV LV Tags
lvol1 a,b
lvol3 a,b,y
So in the example above the a,b is subset of a,b,y and therefore
it also matches.
Clearly, when using "||" (or "#") inside, the { } and [ ] is the
same:
$ lvs -o name,tags -S 'tags=[a#b]'
LV LV Tags
lvol0 a
lvol1 a,b
lvol2 b,c,x
lvol3 a,b,y
$ lvs -o name,tags -S 'tags={a#b}'
LV LV Tags
lvol0 a
lvol1 a,b
lvol2 b,c,x
lvol3 a,b,y
Also in addition to the above feature, fix list with single value
matching when using [ ]:
Before this patch:
$ lvs -o name,tags -S 'tags=[a]'
LV LV Tags
lvol0 a
lvol1 a,b
lvol3 a,b,y
With this patch applied:
$ lvs -o name,tags -S 'tags=[a]'
LV LV Tags
lvol0 a
In case neither [] or {} is used, assume {} (the behaviour is not
changed here):
$ lvs -o name,tags -S 'tags=a'
LV LV Tags
lvol0 a
lvol1 a,b
lvol3 a,b,y
So in new terms 'tags=a' is equal to 'tags={a}'.
2.02.106 added suffixes to some LV uuids in the kernel.
If any of these LVs is activated with 2.02.105 or earlier,
and then a later version is used, the LVs appear invisible and
activation commands fail.
The code now has to check the kernel for both old and new uuids.
Change the help heading from 'Common Fields' to 'Special Fields' for
the fields: selected, help, ?
Remove the code that does 'all' processing with these special fields as
each of them changes the behaviour of the command in an undesirable way.
'lvs -o all,selected' was of course just printing help.
(via internal expansion to 'lv_all,common_all')
and if we ignored the help fields, then '-o common_all' would still
pull in 'selected' and change the way rows were output.
In contrast to per-type reserved values that are applied for all fields
of that type, per-field reserved values are only applied for concrete
field only.
Also add 'struct dm_report_field_reserved_value' to libdm for per-field
reserved value definition. This is defined by field number (an index
in the 'fields' array which is given for the dm_report_init_with_selection
function during report initialization) and the value to use for any
of the specified reserved names.
A field where it has no meaning to do any type of comparison is the
implicit "help" or "?" field. The error given was a bit cryptic
before this patch, the FLD_UNCOMPARABLE flag makes it easier to identify
this situation anywhere in the code and provide much better error message.
This flag can be applied to other fields that may appear in the future -
mostly usable for implicit fields as they always have special purpose
(so we're not exporting it in libdevmapper for now - usual reporting
fields don't need this).
Before this patch:
$ vgs -S help=1
dm_report_object: no data assigned to field help
dm_report_object: no data assigned to field help
(...which is true actually, but let's provide something better...)
With this patch applied:
$vgs -S help=1
Selection field is uncomparable: help.
Selection syntax error at 'help=1'.
$vgs -S '(name=vg && help=1) || vg_size > 1g'
Selection field is uncomparable: help.
Selection syntax error at 'help=1) || vg_size > 1g'.
It's better to have implicit fields at the very end of the output
so users can see them without scrolling back if the list of fields
is long (the "help" is also an implicit field now so it should be
easily visible).
We have "help" and "?" defined as implicit fields now. As such, we
don't need to export these names in libdevmapper (as it was introduced
by commit 7c86131233 within this release).
If anyone uses these field names by mistake, the libdevmapper code can
error out correctly if it detects that the set of explicit field names
(the ones supplied by "fields" arg in dm_report_init/dm_report_init_with_selection)
contains any of the implicit field names (the ones defined internally
by libdevmapper itself).
Making "help" and "?" implicit also simplifies code since the
dm_report_init caller (lvm/dmsetup) doesn't need to check on
dm_report_init return whether "help" or "?" was hit while parsing
fields/sort keys in libdevmapper.
The libdevmapper now sets internal "RH_ALREADY_REPORTED" flag
after it reports the "help" or "?" implicit field. Then libdevmapper
itself checks for this flag in dm_report_object and if found,
the actual reporting is skipped (because the "help" implicit field
was reported instead of the actual report).
Fix gcc warnings:
libdm-report.c:1952:5: warning: "end_op_flag_hit" may be used uninitialized in this function [-Wmaybe-uninitialized]
libdm-report.c:2232:28: warning: "custom" may be used uninitialized in this function [-Wmaybe-uninitialized]
And snap_percent is not 0% in dm < 1.10.0 so
don't test comparison with 0% here.
Implicit fields are fields that are registered with the report
and reported internally by libdevmapper itself (compared to explicit
fields that are registered by the layer above libdevmapper - e.g. LVM,
dmsetup...).
The "selected" field is the implicit field (for now the only one)
that reports the result of the selection. Since the selection itself
is the property of the libdevmapper, the upper layer using dm_report_init
can't register this field itself and it must be done directly at
libdevmapper layer.
The "selected" field is internally registered as part of the "common"
report type with id 0x80000000 (the last bit in uin32_t) which is then
reserved (the explicit report types are then checked if they do not
contain this id and if yes, we error out).
This way, the "selected" field is recognized by all libdevmapper users
that initialize the reporting with "dm_report_init_with_selection".
If reporting is initialized with the classical "dm_report_init",
there's no functional change (so the "selected" field is not defined
and it's not recognized).
Make dm_report_init_with_selection to accept an argument with an
array of reserved values where each element contains a triple:
{dm report field type, reserved value, array of strings representing this value}
When the selection is parsed, we always check whether a string
representation of some reserved value is not hit and if it is,
we use the reserved value assigned for this string instead of
trying to parse it as a value of certain field type.
This makes it possible to define selections like:
... --select lv_major=undefined (or -1 or unknown or undef or whatever string representations are registered for this reserved value in the future)
... --select lv_read_ahead=auto
... --select vg_mda_copies=unmanaged
With this, each time the field value of certain type is hit
and when we compare it with the selection, we use the proper
value for comparison.
For now, register these reserved values that are used at the moment
(also more descriptive names are used for the values):
const uint64_t _reserved_number_undef_64 = UINT64_MAX;
const uint64_t _reserved_number_unmanaged_64 = UINT64_MAX - 1;
const uint64_t _reserved_size_auto_64 = UINT64_MAX;
{
{DM_REPORT_FIELD_TYPE_NUMBER, _reserved_number_undef_64, {"-1", "undefined", "undef", "unknown", NULL}},
{DM_REPORT_FIELD_TYPE_NUMBER, _reserved_number_unmanaged_64, {"unmanaged", NULL}},
{DM_REPORT_FIELD_TYPE_SIZE, _reserved_size_auto_64, {"auto", NULL}},
NULL
}
Same reserved value of different field types do not collide.
All arrays are null-terminated.
The list of reserved values is automatically displayed within
selection help output:
Selection operands
------------------
...
Reserved values
---------------
-1, undefined, undef, unknown - Reserved value for undefined numeric value. [number]
unmanaged - Reserved value for unmanaged number of metadata copies in VG. [number]
auto - Reserved value for size that is automatically calculated. [size]
Selection operators
-------------------
...
When the field list is displayed as help for constructing selection
criteria, show also the field value type. This is useful for users
to know what set of operators are allowed for the type - the subsequent
"Selection operands" section in the help output summarize all known
types that can be used in selection.
The "<lvm command> -S/--select help" shows help (including list of fields to match against):
...field list here including the field type name...
Selection operands
------------------
field - Reporting field.
number - Non-negative integer value.
size - Floating point value with units specified.
string - Characters quoted by ' or " or unquoted.
string list - Strings enclosed by [ ] and elements delimited by either
"all items must match" or "at least one item must match" operator.
regular expression - Characters quoted by ' or " or unquoted.
Selection operators
-------------------
Comparison operators:
=~ - Matching regular expression.
!~ - Not matching regular expression.
= - Equal to.
!= - Not equal to.
>= - Greater than or equal to.
> - Greater than
<= - Less than or equal to.
< - Less than.
Logical and grouping operators:
&& - All fields must match
, - All fields must match
|| - At least one field must match
# - At least one field must match
! - Logical negation
( - Left parenthesis
) - Right parenthesis
[ - List start
] - List end
Selection list items are enclosed in '[' and ']' (if there's only
one item, the '[' and ']' can be omitted). Each element of the list
is a string (either quoted or unquoted, like the usual string operand
used in selection) and each element is delimited either by conjunction
(meaining "match all") or disjunction operator (meaning "match any").
For example, if "," is the conjuction operator and "/" is the
disjunction operator then:
lv_tags=[a,b,c]
...will match all fields where tags contain *all* a, b and c.
lv_tags=[a/b/c]
...will match all fields where tags contain *any* of a, b, or c.
Mixing operators within the list is not supported:
lv_tags=[a,b/c]
...will give an error.
The order in which items are defined in the selection do not matter.
This patch enhances the selection parsing functionality to recognize
such lists.
The {pv,vg,lv,seg}_tags and lv_modules fields are reported as string
lists using the new dm_report_field_string_list - so we just pass
the list to the fn that takes care of reporting and item sorting itself.
Add a separate dm_report_field_string_list fn to libdevmapper to
support reporting string lists. Before, the code used libdevmappers's
dm_report_field_string fn which required formatting the list to a
single string. This functionality is now moved to libdevmapper
and the code that needs to report the string list just needs
to pass the list itself and libdevmapper will take care of this.
This also enhances code reuse.
The dm_report_field_string_list also accepts an argument to define
custom delimiter to use. If not defined, a default "," (comma) is
used as item delimiter in the string list reported.
The dm_report_field_string_list automatically sorts the items in
the list before formatting it to a final string. It also encodes
the position and length within the final string where each element
can be found. This can be used to support checking against each
list item reported since since when formatted as a single string
for the actual report, we would lose this information otherwise
(we don't want to copy each item, the position and length within
the final string is enough for us to get the original items back).
When such lists are checked against the selection tree, we can check
each item individually this way and we can support operators like
"match any" and "match all".
The list of strings is used quite frequently and we'd like to reuse
this simple structure for report selection support too. Make it part
of libdevmapper for general reuse throughout the code.
This also simplifies the LVM code a bit since we don't need to
include and manage lvm-types.h anymore (the string list was the
only structure defined there).
This is rebased and edited version of the original design and
patch proposed by Jun'ichi Nomura:
http://www.redhat.com/archives/dm-devel/2007-April/msg00025.html
The dm_report_init_with_selection is the same as dm_report_init
but it contains an additional argument to set the selection
in the form of a string that contains field names to check against and
selection operators. The selection string is parsend and a selection
tree is composed for use in the checks against individual fields when
the report is processed. The parsed selection tree is stored in dm_report
structure as "selection_root".
This is rebased and edited version of the original design and
patch proposed by Jun'ichi Nomura:
http://www.redhat.com/archives/dm-devel/2007-April/msg00025.html
Add support for parsing numbers, strings (quoted or unquoted), regexes
and operators amogst these operands in selection condition supplied.
This is rebased and edited version of the original design and
patch proposed by Jun'ichi Nomura:
http://www.redhat.com/archives/dm-devel/2007-April/msg00025.html
This patch defines operators and structures that will be used
to store the report selection against which the actual values
reported will be checked.
Selection operators
-------------------
Comparison operators:
=~ - Matching regular expression.
!~ - Not matching regular expression.
= - Equal to.
!= - Not equal to.
>= - Greater than or equal to.
> - Greater than
<= - Less than or equal to.
< - Less than.
Logical and grouping operators:
&& - All fields must match
, - All fields must match
|| - At least one field must match
# - At least one field must match
! - Logical negation
( - Left parenthesis
) - Right parenthesis
This makes it easier to check against the fields (following patches for
report selection) and check whether size units are allowed or not
with the field value.
As part of better error handling, remove DM devices that have been
sucessfully created but failed to load a table. This can happen
when pvmove'ing in a cluster and the cluster mirror daemon is not
running on a remote node - the mapping table failing to load as a
result. In this case, any revert would work on other nodes running
cmirrord because the DM devices on those nodes did succeed in loading.
However, because no table was able to load on the non-cmirrord nodes,
there is no table present that points to what needs to be reverted.
This causes the empty DM device to remain on the system without being
present in any LVM representation.
This patch should only be considered a partial fix to the overall
problem. This is because only the device which failed to load a
table is removed. Any LVs that may have been loaded as requirements
to the DM device that failed to load may be left in place. Complete
clean-up will require tracking those devices which have been created
as dependencies and removing them along with the device that failed
to load a table.
Share DM_REPORT_FIELD_RESERVED_NAME_{HELP,HELP_ALT} between libdm and
any libdm user to handle reserved field names, in this case the virtual
field name to show help instead of failing on unrecognized field.
The libdm user also needs to check the field name so it can fire
proper code in this case (cleanup, exit etc.).
If there ever would be a second call to dm_lib_init()
and envvar would be improperly set, some last set value
would be used while it should reset to default mangling mode.
When the node enters dtree with implicit dependency, it
automatically has udev flags from parent node
and could not be changed later when the node has been
entered again via i.e lvm's preload tracking.
Resolve this by tracking whether the node has been
created by implicit dependency tracking or has been
entered explicitely. Implicit node could be later
upgraded by an explicit _add_dev() with proper udev_flags.
For implicit devices add special udev flags to avoid
any scan and udev rule processing if we resume such device.
Patch allows easier removing of orphan nodes.
We need to use "--verifyudev" for dmsetup mangle command used in
the name-mangling test since without the --verifyudev, we'd end up
with the failed rename.
Also, add direct check for the dev nodes - node with old name must
be gone and node with new name must be present. Before, we checked
just the output of the command.
One bug popped up here when renaming with udev and libdevmapper
fallback checking the udev when target mangle mode is "none"
(fixme added in the libdevmapper's node rename code).
Reuse _node_send_messages for just checking
for valid transaction_id with preload.
This allows earlier detection of incosistent thin pool.
Code does the same thing, except for sending messages.
Improve testing of transation_id to not allow other difference
then either kernel TID is equal or is lower by oned and there
are queued messages for transaction.
Mark messages as submitted if the transaction_id is already matching.
Do not try to deactivate node on failure here and leave it on
proper error path of the caller.
Deactivation of top level node has to happen,
before traversing subtree.
Swap list logic and rather append new nodes to the head
and then use normal iteration.
(in-release update)
Avoid introducing libdm structure allocated in library user.
Use direct call with all currently supported args.
When new arg is added, new function will cover it.
I am reverting the commit below - removing the new 'dm_config_get_int'
function and simply calling 'dm_config_get_uint32' while casting the
'int *' pointer parameter.
Commit being reverted:
commit 94377dfd5e
Author: Jonathan Brassow <jbrassow@redhat.com>
Date: Mon Jan 27 05:26:19 2014 -0600
Misc: New function for reading lvm config file fields
Introduce 'dm_config_get_int', which will be used by the upcoming
cachepool segment type.
This patch defines a structure for holding all of the device-mapper
cache target's status information. The associated function provides
an easy way for higher levels (LVM) to consume the information.
This patch finishes the device-mapper interface for the cache and
cachepool segment types (i.e. the cache target).
This patch adds the cache segment type - the second of two necessary
to create cache logical volumes. This segment type references the
cachepool (the small fast device) and the origin (the large slow device);
linking them to create the cache device. The cache device is the
hierarchical device-mapper device that the user ulitmately makes use
of.
The cache segment sources the information necessary to construct the
device-mapper cache target from the origin and cachepool segments to
which it links.
This patch adds the new cachepool segment type - the first of two
necessary to eventually create 'cache' logical volumes. In addition
to the new segment type, updates to makefiles, configure files, the
lv_segment struct, and some necessary libdevmapper flags.
The cachepool is the LV and corresponding segment type that will hold
all information pertinent to the cache itself - it's size, cachemode,
cache policy, core arguments (like migration_threshold), etc.
Revert activated volumes if callback fails.
This is currently used only for thin_check failure support.
When thin_check detects failure in thin metadata device, it deactivate
volumes in reversed order that have been preloaded for thin pool activation.
After this change lvm command will not leave active pool subvolumes
in dm table.
Pass dnode pointer instead of rather unknown child pointer.
The pointer is currently unused and passing child pointer
is quite undefined, while dnode has at least some usability.
Add internal error warning when string value is used
as sort value for numerical field.
Using log_warn since the function itself does not return error,
so we do not confuse log_error() checker.
On modern systems udev manages nodes in /dev/mapper directory.
It creates, deletes and renames the nodes according to the
state of the kernel driver.
When the dmsetup is compiled without udev support (--enable-udev_sync)
and runs on the system with running udevd it tries to manage nodes in
/dev/mapper too, so it can race with udev.
dmsetup checks if the node was created/deleted/renamed with the stat
syscall, and skips the operation if it was. However, if udev
creates/deletes/renames the node after the stat syscall and before the
mknod/unlink/rename syscall, dmsetup reports an error.
Since in the system everything happened as expected, skip reporting
error for such case.
These races can be easily provoked by inserting sleep at appropriate
places.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
This file may be included by other programs, so it should be compliant
with the C standard.
* use __linux__ instead of linux - __linux__ is always defined, linux is
not defined when gcc runs in standard-compliant mode (with -std=c89 or
-std=c99) because the C standard doesn't allow polluting namespace
with arbitrary defines.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Just like we have symbolic names assigned to general DM udev flags
(DM_UDEV_* flags), we have the same for any subsystem flags now
(DM_SUBSYSTEM_UDEV_FLAG*), making it easier to use.
When images and their associated metadata are removed from a RAID1 LV,
the remaining sub-LVs are "shifted" down to fill the gaps. For
example, if there is a 3-way mirror:
[0][1][2]
and we remove device#0, the devices will be shifted down
[1][2]
and renamed.
[0][1]
This can create a problem for resume_lv (specifically,
dm_tree_activate_children) during the renaming process though. This
is because it will attempt to rename the higher indexed sub-LVs first
and find that it cannot because there are currently other sub-LVs with
that name. The solution is to check for a conflicting name before
attempting to rename. If a conflict is found and that conflicting
sub-LV is also in the process of renaming, we can defer the current
rename until the conflicting sub-LV has renamed and cleared the
conflict.
Now that resume_lv can handle these types of rename conflicts, we can
remove the workaround in RAID that was attempting to resume a RAID1
LV from the bottom-up in order to force a proper rename in assending
order before attempting a resume on the top-level LV. This "hack"
only worked for single machine use-cases of LVM. Clearing this up
paves the way for exclusive activation of RAID LVs in a cluster.
Do not allow passing '' names to kernel.
This test was missing also in kernel, so it has allowed
to create device with '' name. This then confused dmsetup tool,
since such name is unexpected and unsupported. To remove
such name from table, user has to use -j -m to specify which device
should be removed.
This patch fixes the posibility to run this operation:
dmsetup rename existingdev ''
after this operation commands like 'dmsetup table' are failing.
This patch prohibits to use such name.
Recent kernels allow messages to respond with a string.
Add dm_task_get_message_response() to libdevmapper to perform some
basic sanity checks and return this.
Have 'dmsetup message' display any response.
DM statistics will make extensive use of this.
(From Mikulas.)
libdm-common.c:883:42: warning: pointer/integer type mismatch in conditional expression
define log_sys_error(x, y) log_err("%s%s%s failed: %s", y, *y ? ": " : "", x, strerror(errno))
So the "y" which was 'path ? : "SELinux context reset"' from
previous commit did not quite fit the other "? :" in the log_sys_macro.
- null_fd resource leak on error path in _reopen_fd_null fn
- dead code in verify_message in clvmd code
- dead code in _init_filter_components in toolcontext code
- null dereference in dm_prepare_selinux_context on error path if
setfscreatecon fails while resetting SELinux context
Support tests with abort when libdm encounters internal
error - i.e. for dmsetup tool.
Code execution will be aborted when
env var DM_ABORT_ON_INTERNAL_ERRORS is set to 1
When resuming a node needed by a higher layer of the tree,
if the resume fails, only remove it if the node did not
originally have a live table.
Ref. 97f8454ecc
Clear send_messages flag when they have been delivered successfully.
There is no need to validate it for all other activations of the same
node in the dm_tree.
Also add extra debug message which shows the reason for skipping
sending of messages because the transaction_id has already the matching
value.
Show 'at' pointer address with pool name.
It's useful for debugging to be able to locate pointer address in the
debug trace log. It's only available when compiled with extra debug
compilation flag DEBUG_POOL in make.tmpl.
This patch adds the ability to set the minimum and maximum I/O rate for
sync operations in RAID LVs. The options are available for 'lvcreate' and
'lvchange' and are as follows:
--minrecoveryrate <Rate> [bBsSkKmMgG]
--maxrecoveryrate <Rate> [bBsSkKmMgG]
The rate is specified in size/sec/device. If a suffix is not given,
kiB/sec/device is assumed. Setting the rate to 0 removes the preference.
This patch may not be fully correct. It tries to solve
the imbalanced suspend counter.
The problem starts when some LV is created and fails in resume path.
(i.e. resuming to large PV (enforced) over small loop devices)
This fails in _resume_node() after dm_task_run(). And while
existing device with empty table is left in inactive table,
further calls are reporting this device is in suspend state.
When later the lvm2 tries to rollback created device and deactivate it,
it will end with internal error, when we try to decrement
never incremented suspend counter.
As an 'easy fix' for now update suspend counter only for live nodes.
TODO: explore better fix.
Since we use get_status also in dmeventd, which may use one pool
for a single device, in case it would be repeatedly returning error,
it may not be freeing the pool and would cause slow but steady growth.
To stay safe in the error path release any allocated memory.
To detect mounted device, use also /proc/self/mountinfo
as so far the check was only able to detect ext4 mounted filesystem.
TODO:
Once proper testing for this feature is added, it may appear,
mountinfo check is enough and covers all cases and sysfs check
could be removed.
'lvchange' is used to alter a RAID 1 logical volume's write-mostly and
write-behind characteristics. The '--writemostly' parameter takes a
PV as an argument with an optional trailing character to specify whether
to set ('y'), unset ('n'), or toggle ('t') the value. If no trailing
character is given, it will set the flag.
Synopsis:
lvchange [--writemostly <PV>:{t|y|n}] [--writebehind <count>] vg/lv
Example:
lvchange --writemostly /dev/sdb1:y --writebehind 512 vg/raid1_lv
The last character in the 'lv_attr' field is used to show whether a device
has the WriteMostly flag set. It is signified with a 'w'. If the device
has failed, the 'p'artial flag has priority.
Example ("nosync" raid1 with mismatch_cnt and writemostly):
[~]# lvs -a --segment vg
LV VG Attr #Str Type SSize
raid1 vg Rwi---r-m 2 raid1 500.00m
[raid1_rimage_0] vg Iwi---r-- 1 linear 500.00m
[raid1_rimage_1] vg Iwi---r-w 1 linear 500.00m
[raid1_rmeta_0] vg ewi---r-- 1 linear 4.00m
[raid1_rmeta_1] vg ewi---r-- 1 linear 4.00m
Example (raid1 with mismatch_cnt, writemostly - but failed drive):
[~]# lvs -a --segment vg
LV VG Attr #Str Type SSize
raid1 vg rwi---r-p 2 raid1 500.00m
[raid1_rimage_0] vg Iwi---r-- 1 linear 500.00m
[raid1_rimage_1] vg Iwi---r-p 1 linear 500.00m
[raid1_rmeta_0] vg ewi---r-- 1 linear 4.00m
[raid1_rmeta_1] vg ewi---r-p 1 linear 4.00m
A new reportable field has been added for writebehind as well. If
write-behind has not been set or the LV is not RAID1, the field will
be blank.
Example (writebehind is set):
[~]# lvs -a -o name,attr,writebehind vg
LV Attr WBehind
lv rwi-a-r-- 512
[lv_rimage_0] iwi-aor-w
[lv_rimage_1] iwi-aor--
[lv_rmeta_0] ewi-aor--
[lv_rmeta_1] ewi-aor--
Example (writebehind is not set):
[~]# lvs -a -o name,attr,writebehind vg
LV Attr WBehind
lv rwi-a-r--
[lv_rimage_0] iwi-aor-w
[lv_rimage_1] iwi-aor--
[lv_rmeta_0] ewi-aor--
[lv_rmeta_1] ewi-aor--
Revert commit 31c24dd9f2. This commit
was used to force a RAID device-mapper table to be loaded into the
kernel despite the fact that it was identical to the one already
loaded. The effect allowed a RAID array with a transiently failed
device to refresh and reintegrate the failed device. This operation
is better done in the kernel on a 'resume'. Since,
'lvchange --refresh' already performs a suspend/resume cycle, the
above commit is not needed once the kernel change is made. Reverting
the commit removes an unnecessary (at least for now) change to the
device-mapper interface.
I've updated the dm_status_raid structure and dm_get_status_raid()
function to make it handle the new kernel status fields that will
be coming in dm-raid v1.5.0. It is backwards compatible with the
old status line - initializing the new fields to '0'. The new
structure is also more amenable to future changes. It includes a
'reserved' field that is currently initialized to zero but could
be used to hold flags describing new features. It also now uses
pointers for the character strings instead of attempting to allocate
their space along with the structure (causing the size of the
structure to be variable). This allows future fields to be appended.
The new fields that are available are:
- sync_action : shows what the sync thread in the kernel is doing
(idle, frozen, resync, recover, check, repair, or
reshape)
- mismatch_count: shows the number of discrepancies which were
found or repaired by a "check" or "repair"
process, respectively.
Previous commit included changes to WHATSNEW, but the code changes
were missing. Here is the description from the previous commit:
commit bbc6378b73
Author: Jonathan Brassow <jbrassow@redhat.com>
Date: Thu Feb 21 11:31:36 2013 -0600
RAID: Make 'lvchange --refresh' restore transiently failed RAID PVs
A new function (dm_tree_node_force_identical_table_reload) was added to
avoid the suppression of identical table reloads. This allows RAID LVs
to reload the on-disk superblock information that contains which devices
have failed and the bitmaps. If the failed device has returned, this has
the effect of restoring the device and initiating recovery. Without this
patch, the user had to completely deactivate their RAID LV and re-activate
it in order to restore the failed device. Now they simply need to
suspend and resume (which is done by 'lvchange --refresh').
The identical table suppression is only avoided if the LV is not PARTAIL
(i.e. all of it's devices can be seen and read by LVM) and the kernel
status of the array contains failed devices. In other words, the function
will only be called in the case where we may have success in restoring
a failed device in the array.
There's a possibility to interconnect the dm_config_node with an
ID, which in our case is used to reference the configuration
definition ID from config_settings.h. So simply interconnecting
struct dm_config_node with struct cfg_def_item.
This patch also adds support for enhanced config node output besides
existing "output line by line". This patch adds a possibility to
register a callback that gets called *before* the config node is
processed line by line (for example to include any headers on output)
and *after* the config node is processed line by line (to include any
footers on output). Also, it adds the config node reference itself
as the callback arg in addition to have a possibility to extract more
information from the config node itself if needed when processing the
output callback (e.g. the key name, the id, or whether this is a
section or a value etc...).
If the config node from lvm.conf/--config tree is recognized and valid,
it's always coupled with the config node definition ID from
config_settings.h:
struct dm_config_node {
int id;
const char *key;
struct dm_config_node *parent, *sib, *child;
struct dm_config_value *v;
}
For example if the dm_config_node *cn holds "devices/dev" configuration,
then the cn->id holds "devices_dev_CFG" ID from config_settings.h, -1 if
not found in config_settings.h and 0 if matching has not yet been done.
To support the enhanced config node output, a new structure has been
defined in libdevmapper to register it:
struct dm_config_node_out_spec {
dm_config_node_out_fn prefix_fn; /* called before processing config node lines */
dm_config_node_out_fn line_fn; /* called for each config node line */
dm_config_node_out_fn suffix_fn; /* called after processing config node lines */
};
Where dm_config_node_out_fn is:
typedef int (*dm_config_node_out_fn)(const struct dm_config_node *cn, const char *line, void *baton);
(so in comparison to existing callbacks for config node output, it has
an extra dm_config_node *cn arg in addition)
This patch also adds these functions to libdevmapper:
- dm_config_write_node_out
- dm_config_write_one_node_out
...which have exactly the same functionality as their counterparts
without the "out" suffix. The "*_out" functions adds the extra hooks
for enhanced config output (prefix_fn and suffix_fn mentioned above).
One can still use the old interface for config node output, this is
just an enhancement for those who'd like to modify the output more
extensively.
Export this functionality from libdevmapper just for
convenience and general use when reading boolean values
which could be defined either in a numeric way with 0/1
or by using strings with "true"/"false", "yes"/"no",
"on"/"off", "y"/"n".
When a section was empty in a configuration tree (no children - this is
allowed) and we were looking for a config node inside that section, the
_find_config_node function incorrectly returned the section itself if
the node inside that section was not found.
For example the configuration below:
The config:
abc {
}
And a function call to get the "def" node inside "abc" section:
_find_config_node(..., "abc/def")
...returned the "abc" node instead of NULL ("def" not found).
This in turn caused segfaults in the code using lookups in such
a configuration tree as we (correctly) expected that the node
returned was always the one we were looking for or NULL if not
found. But if incorrect node was returned instead, we processed
that as if this was the node we were looking for and so we
processed its value as well. But sections don't have values => segfault.
On glibc, those are erroneously (namespace pollution) pulled in via
other headers. this doesn't work with conformant libcs (musl libc in
this case), we simply need to include all needed headers.
Signed-Off-By: John Spencer <maillist-lvm@barfooze.de>
This patch fixes problem reported here:
https://www.redhat.com/archives/dm-devel/2013-January/msg00311.html
Fixing it by separating function for duplicating string token.
---
When /etc/lvm/lvm.conf is truncated at the first '"' of a line, all LVM
utilities crash with a segfault.
The segfault only seems to occur if the last character is the first '"'
(double quote) of a line. If you truncate it at any other point, lvm
detects the error and report parse error
lvm.conf ends like this.
$hexdump -C lvm.conf
....
69 72 20 3d 20 22 2f 64 65 76 22 0a 0a 0a 20 20 |ir = "/dev"... |
20 20 23 20 41 6e 20 61 72 72 61 79 20 6f 66 20 | # An array of |
64 69 72 65 63 74 6f 72 69 65 73 20 74 68 61 74 |directories that|
20 63 6f 6e 74 61 69 6e 20 74 68 65 20 64 65 76 | contain the dev|
69 63 65 20 6e 6f 64 65 73 20 79 6f 75 20 77 69 |ice nodes you wi|
73 68 0a 20 20 20 20 23 20 74 6f 20 75 73 65 20 |sh. # to use |
77 69 74 68 20 4c 56 4d 32 2e 0a 20 20 20 20 73 |with LVM2.. s|
63 61 6e 20 3d 20 5b 20 22 2f 78 22 2c 0a 20 20 |can = [ "/x",. |
20 20 20 20 20 20 20 20 20 20 20 22 | "|
...
Reported-by: dongmao zhang <dmzhang suse com>
Similar to the way thin* accesses its kernel status, we add a method
for RAID to grab the various values in its status output without the
higher levels (LVM) having to understand how to parse the output.
Added functions include:
- lib/activate/dev_manager.c:dev_manager_raid_status()
Pulls the status line from the kernel
- libdm/libdm-deptree.c:dm_get_status_raid()
Parses status line and puts components into dm_status_raid struct
- lib/activate/activate.c:lv_raid_dev_health()
Accesses dm_status_raid to deliver raid dev_health string
The new structure and functions can provide a more unified way to access
status information. ('lv_raid_percent' could switch to using these
functions, for example.)
Add log/debug_classes to lvm.conf to allow debug messages to be
classified and filtered at runtime.
The dm_errno field is only used by log_error(), so I've redefined it
for log_debug() messages to hold the message class.
By default, all existing messages appear, but we can add categories that
generate high volumes of data, such as logging all traffic to/from
lvmetad.
If the resume of preloaded node fails, do not leave such
node in the table - since it may not be easy to detach such
node later when the node is i.e. internal.
i.e. failing activation of the thin pool with mismatching
chunk size may leave -tpool device in the table, which
could have been then removed only by dmsetup command.
$ export DM_DISABLE_UDEV=1
$ dmsetup create test --table "0 1 zero"
Udev is running and DM_DISABLE_UDEV environment variable is set. Bypassing udev, device-mapper library will manage device nodes in device directory.
$ lvchange -ay vg/lvol0
Udev is running and DM_DISABLE_UDEV environment variable is set. Bypassing udev, LVM will manage logical volume symlinks in device directory.
Udev is running and DM_DISABLE_UDEV environment variable is set. Bypassing udev, LVM will obtain device list by scanning device directory.
Udev is running and DM_DISABLE_UDEV environment variable is set. Bypassing udev, device-mapper library will manage device nodes in device directory.
Setting this environment variable will cause a full fallback
to old direct node and symlink management in libdevmapper and lvm2.
It means:
- disabling udev synchronization
(--noudevsync in dmsetup and --noudevsync + activation/udev_sync=0
lvm2 config)
- disabling dm and any subsystem related udev rules
(--noudevrules in dmsetup and activation/udev_rules=0 lvm2 config)
- management of nodes/symlinks under /dev directly by libdevmapper/lvm2
(--verifyudev in dmsetup and activation/verify_udev_operations=1
lvm2 config)
- not obtaining any device list from udev database
(devices/obtain_device_list_from_udev=0 lvm2 config)
Note: we could set all of these before - there's no functional change!
However the DM_DISABLE_UDEV environment variable is a nice shortcut
to make it easier for libdevmapper users so that one can switch off all
of the udev management off at one go directly on the command line,
without a need to modify any source or add any extra switches.
cookie_set variable found in the struct dm_task should be always
set to 1 after dm_task_set_cookie_call, even if udev_sync is disabled
as the cookie itself carries synchronization informations *as well as*
extra flags to control other aspects of udev support.
For example, one could disable the synchronization itself, but still
direct the libdm code to disable library fallback via
DM_UDEV_DISABLE_LIBRARY_FALLBACK flag. These extra flags still need
to be carried out!
A concrete example:
$ dmsetup create test --table "0 1 zero" --noudevsync
This disables synchronization with udev. As the --verifyudev option is
not used, we don't want to do any corrections. In other words, we
need DM_UDEV_DISABLE_LIBRARY_FALLBACK flag to be used. However,
with --noudevsync this was not the case - the flag was ignored!
This patch fixes the case when noudevsync is used but there are still
some extra flags passed within the cookie flag part. The synchronization
part of the cookie stays zero (which is ok as dm_udev_wait call on such a
cookie is simply a NOOP).
Use log_warn to print non-fatal warning messages.
Use of log_error would confuse checker for testing
whether proper error has been reported for some real error.
On each ioctl return, the device UUID is decoded from \xNN format.
If the UUID of the device being *removed* is malformed (e.g. it
hasn't been corrected before), just remove it without any error
as the UUID is not needed anymore - the device is gone anyway.
Otherwise a misleading error message would be issued just after
the removal:
# dmsetup remove test
The UUID "a b" should be mangled but it contains blacklisted characters.
Command failed