IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
Variable props.send_messages has 3 states and was not used properly
here. Activation in this moment does not need to verify thin-pool status
as that has been already checked on preload.
So only if there are some real messages (value 2) call function
for sending them.
Replicator never really existed in upstream kernel and its support
got deprecated.
Also its support never got finished so no code is supposed to be
using it anyway.
Libdm symbols are remaining, just the implementation will always
return failure - so any user of:
dm_tree_node_add_replicator_dev_target()
dm_tree_node_add_replicator_target().
will now always recieve error message.
Separate handling of error code from _info_by_dev.
This error can only happeng when we are running out of memory.
In such case there is urgent need to stop any futher proceeding
of command and run to error ASAP.
Propagate delayed resume at least for preload case in a simple way.
Currently PVMOVE depends on internal logic where 'mirror' with
corelog is 'possible' PVMOVE. In such case resume of 'created'
node is 'delayed'.
This is mostly an ugly internal hack - but for the moment being when we
add propagation for preload - it does work reasonable.
TODO: provide standard API and avoid this internal 'guessing'.
We always preferred and recommended socket activation for our services
so remove the Install section in related .service units which are unused
in this case and keep only the Install section in associated .socket
units.
Signed-off-by: Bastian Blank <waldi@debian.org>
When dmeventd receives SIGTERM/INT/HUP/QUIT it validates if exit is possible.
If there was any device still monitored, such exit request used to
be ignored/refused. This 'usually' worked reasonably well, however if there
is very short time period between last device is unmonitored and signal
reception - there was possibility such EXIT was ignored, as dmeventd has
not yet got into idle state even commands like 'vgchange -an' has already
finished.
This patch changes logic towards scheduling EXIT to the nearest
point when there is no monitored device.
EXIT is never forgotten.
NOTE: if there is only a single monitored device and someone sends
SIGTERM and later someone uses i.e. 'lvchange --refresh' after
unmonitoring dmeventd will exit and new instance needs to be
started.
There's nothing special about /boot other than it's used during boot.
But when blkdeactivate is called either on all devices or including a
device where the /boot is on top, we should also include this mount
point when doing unmount before deactivation of supported devices.
The new blkdeactivate -r|mdraidoption wait causes blkdeactivate to wait
for any resync/recovery/reshape that is currently in progress before
deactivating the device.
If this option is used, blkdeactivate calls mdadm -W|--wait before
mdadm -S|--stop.
Revert dc50f2f4a0.
We're canonicalizing/escaping the names here and we're reusing the
variable name so the code doesn't need to use extra variables and
further assignments that may confuse us. Let's keep the code simple.
The
local name=(...$name)
is not the same as
local name
name=(...$name)
(I know various code-checking tools fuss about this and recommend
the 2nd way, but let's ignore those tools' nitpicking here please.)
There was a typo in blkdeactivate --dmoption/--lvmoption/mpathoption,
it had missing "s" at the end and it was not recognized properly, only
short names for the options (-d/-l/-m).
The following commands now pass the device list through a
--select|-S filter before processing:
suspend resume clear wipe_table remove deps status table
Add the new concise format to dmsetup create, either as a single
command-line parameter or from stdin.
Based on patches submitted by
Enric Balletbo i Serra <enric.balletbo@collabora.com>.
Create a new table output format that concisely shows multiple devices
on one line.
dmsetup table --concise [device...]
<dev_name>,<uuid>,<flags>[,<table>]*[;<dev_name>,<uuid>,<flags>[,<table>]*]*
Table lines are separated by commas.
Devices are separated by semi-colons.
Flags is currently 'ro' or 'rw' (and might be extended in a
yet-to-be-defined way in future).
Any comma, semi-colon or backslash within a field is quoted by a
preceding backslash.
The format can later be supplied as input to dmsetup or even to the
booting kernel as an alternative way to set up devices.
Based on patches submitted by
Enric Balletbo i Serra <enric.balletbo@collabora.com>.
The blkid we call in 13-dm-disk.rules also returns identifiers for
partitions based on which the /dev/disk/by-part{uuid,label} and
gpt-auto-root symlinks should be created in the same manner as we
already create symlinks for filesystem labels and uuids.
This is because we handle blkid calls and symlink creation under
/dev/disk ourselves in our 13-dm-disk.rules for device-mapper devices
for us to have more control over this process.
See also https://lists.freedesktop.org/archives/systemd-devel/2017-July/039220.html
and original report http://tracker.ceph.com/issues/19489 for
the exact case where these symlinks were missing.
Dmeventd reuses 'dm_task' struct for some STATUS operation, but due to
missing reinitization of dm_task target list, it has caused misprocesing
of recieved events as the parsed target has been simply added to the
list of existing status and cause multiple actions being called for
single event.
Add function to adjust printing of percent values in better way.
Rounding here is going along following rules:
0% & 100% are always clearly reported with .0 decimal points.
Values slightly above 0% we make sure a nearest bigger
non zero value with given precission is printed
(i.e. 0.01 for %.2f will be shown)
For values closely approaching 100% we again detect and adjust value
that is less then 100 when printed.
(i.e. 99.99 for %.2f will be shown).
For other values we are leaving them with standard rounding mechanism
since we care mainly about corner values 0 & 100 which need to be
printed precisely.
When we want to report primary leg failure, check for intial 'a',
since otherwice 'Aa idle' is normally visible.
Also reset array of bit flags marking dead devices, once
plugin detects raid is in sync.
Functionality of ignore suspend devices is already granted by:
lvm2_disable_dmeventd_monitoring() -> init_run_by_dmeventd() ->
init_ignore_suspended_devices().
In fact plugins should never use --config because it has
some unpleasant technical issues.
Current existing kernels reports status sometimes in weird form.
Instead of showing what is the exact progress, we need to estimate
this in-sync state from several surrounding states.
Main reason here is to never report 100% sync state for a raid device
which will be undergoing i.e. recovery.
It's not an error to attempt to update regions from an fd that has
been truncated (or otherwise no longer has any allocated extents):
in this case, the call should remove all regions corresponding to
the group, and return an empty region table.
With recent updates for thin pool monitoring in version 169
we lost multiple WARNINGs to be printed in syslog, when
pool crossed 80%, 85%, 90%, 95%, 100%.
Restore this logic as we want to keep user informed more
then just once when 80% boundary is passed.
Removing some unused new lines and changing some incorrect "can't
release until this is fixed" comments. Rename license.txt to make
it clear its merely an included file, not itself a licence.
Older library version was not detecting unknown 'feature' bits
and could let start target without needed option.
New versioned symbol now checks for supported feature bits.
_Base version keeps accepting only previously known features and
mask/ignores unknown bits.
NB: if the older binary passed in 'random' bits, it will not get
metadata2 by chance. New linked binary get new validation function.
Library user is required to not pass 'trash' for unsupported bits,
as such calls will be rejected.
Dm cache target version 1.10 introduces new cache metadata format
(upstream kernel >=4.11).
New format is enable by passing new target feature flag metadata2.
Interace side on libdm uses DM_CACHE_FEATURE_METADATA2.
This feature bit is now also recognized on status
and set in 'feature_flags' field of dm_status_cache structure.
Code also adds check for 'highest' supported feature flag bit.
So it rejects properly any 'unknown' feature bit set by application.
Better code to enforce writethrough caching for cleaner policy.
Only check for cleaner when DM_CACHE_FEATURE_PASSTHROUGH or
DM_CACHE_FEATURE_WRITEBACK is set.
Some archs can use even 64K pages and then lvm2 runs into trouble if
the stack is 'too small' to fit extra page capturing stack overwrite.
So when lvm2 limits stack - add extra mem page - be it 4K or 64K.
Relates to ppc64le bug: https://bugzilla.redhat.com/1387279
When we preload device with smaller size, we avoid its resume,
so later suspend/resume of full device tree my process all
existing in flight bios.
Also update comment and avoid using confusing opposite meaning.
Kernel 4.10 (dm-crypt v1.15.0) and later supports loading device
tables with crypt segment having key in kernel keyring retention
service.
dmsetup hid key section of tables output. With this patch dmsetup
no longer hides key section if it detects kernel key description
instead of hex byte representation of key itself.
When showing sizes with 'H|human' units we do use standard rounding.
This however is confusing users from time to time,
when the printed number uses some biger units i.e. GiB and there is just
tiny fraction of space missing.
So here is some real-life example with new 'r' unit.
$lvs
LV VG Attr LSize Pool Origin
lvol0 vg -wi-a----- 1.99g
lvol1 vg -wi-a----- <2.00g
lvol2 vg -wi-a----- <2.01g
Meaning is - lvol1 has 'slightly' less then 2.00g - from sign '<' user
can be aware the LV doesn't have full 2.00GiB in size so he
will be less surpriced allocation of 2G volume will not succeed.
$ vgs
VG #PV #LV #SN Attr VSize VFree
vg 2 2 0 wz--n- <6,00g <2,01g
For uses needing 'old' undecorated human unit simply will continue
to use 'H|h' units.
The new R|r may further change when we would recongnize some
other way how to improve readability.
When thin-pool processes event and 'lvextend --use-policies' fails
rather capture up-to-date new info as the fullness percentage may
have jumped noticable. This way we could use 'more' correct numbers
when checking for thresholds.
The system is likely in some very inconsisten state.
Do not try to make it even more problematic with trying
to invoke tools like thin_check via callback.
Handle files that contain multiple logical extents in a single
physical extent properly:
- In FIEMAP terms a logical extent is a contiguous range of
sectors in the file's address space.
- One or more physically adjacent logical extents comprise a
physical extent: these are the disk areas that will be mapped
to regions.
- An extent boundary occurs when the start sector of extent
n+1 is not equal to (n.start + n.length).
This requires that we accumulate the length values of extents
returned by FIEMAP until a discontinuity is found (since each
struct fiemap_extent returned by FIEMAP only represents a single
logical extent, which may be contiguous with other logical
extents on-disk).
This avoids creating large numbers of regions for physically
adjacent (logical) extents and fixes the earlier behaviour which
would only map the first logical extent of the physical extent,
leaving gaps in the region table for these files.
When mapping regions to a file descriptor, a temporary table of
extent descriptors is built using the dm_pool object building
interface.
Previously this use borrowed the dms->mem region and counter
table pool (since nothing can interleave with the allocation
while the caller is still in dm_stats_create_regions_from_fd()).
This turns out to be problematic for error recovery. When a
region creation operation fails partway through file mapping,
we need to roll back the set of already created regions and
this requires a listed handle: the dm_stats_list() will then
allocate from the same pool as the extents; we either have
to throw away valid list data, or leak the extent table, to
return the handle in a valid state.
Avoid this problem by creating a new, temporary mem pool in
_stats_create_file_regions() to hold the extent data, and
discarding it on exit from the function.
Instead of compiling 2 log call for 2 different logging functions,
and runtime decide which version to use - use only 'newer' function
and when user sets his own OLD dm_log logging translate it runtime
for old arg list set.
The positive part is - we get shorter generated library,
on the negative part this translation means, we always have evaluate
all args and print the message into local on stack buffer, before
we can pass this buffer to the users' logging function with proper
expected parameters (and such function may later decide to discard
logging based on message level so whole printing was unnecessary).
Ensure different logging function for dmeventd.c logging
and dm and lvm library.
We can recognize we want to show every log_info() and
log_notice() message from dmeventd.c code while not
exposing those from libdm/libdevmapper-event
Also switch to use log with errno - it's not changing
anything and doesn't bring any more features yet to dmeventd
logging but we just properly pass dm_errno_or_class properly
through the whole code stack for possible future use
(i.e. support of class logging for dmeventd).
Reword the logging logic and try to restore previous logging
behavior for 'standalone' running daemon while preserving
debuggable feautures it has gained.
So actual rules:
dmeventd without any '-d' option will syslog all messages
from dmeventd.c it dmeventd plugins.
log_notice()==log_verbose()
log_info()==log_very_verbose()
But to show also log_debug() used has to give '-ddd'.
When user specified '-d, -dd, -ddd, -dddd' it
will also enable tracing of messages from libdm & lib
executed code - which is mainly useful for testing
i.e.: 'dmeventd -fldddd'
Introduce macros:
log_level(), log_stderr(), log_once(), log_bypass_report()
For easier and more consisten way how to 'decoder' bits
of info from passed 'level'.
This patch fixes potential problem when 'level' of message
might not have always masked right bits.
If a device disappears after obtaining the list of devices but before
processing it as a member of that list, dmsetup exits with a failure code.
Most commands still produce what output they can in these circumstances,
but 'ls --tree' and 'info -c' with fields depending on device dependencies
didn't. Change this.
Integrate back _unblock_sigalrm() and check for error code of
pthread_sigmask() function so we do not use uninitialized
sigmask_t on error path (Coverity).
The dm_stats_delete_region() call removes a region from the bound
device, and, if the region is grouped, from the group leader
group descriptor stored in aux_data.
To do this requires a listed handle: previous versions of the
library do not since no dependencies exist between regions without
grouping.
This leads to strange behaviour when a command built against an old
version of the library is used with one supporting groups. Deleting
a region with dmstats succeeds, but logs errors:
# dmstats list
Name RgID RgSta RgSiz #Areas ArSize ProgID
vg_hex-root 0 0 1.00g 1 1.00g dmstats
vg_hex-root 1 1.00g 1.00g 1 1.00g dmstats
vg_hex-root 2 2.00g 1.00g 1 1.00g dmstats
# dmstats delete --regionid 2 vg_hex/root
Region ID 2 does not exist
Could not delete statistics region.
Command failed
# dmstats list
Name RgID RgSta RgSiz #Areas ArSize ProgID
vg_hex-root 0 0 1.00g 1 1.00g dmstats
vg_hex-root 1 1.00g 1.00g 1 1.00g dmstats
This happens because the call to dm_stats_delete_region() is inside
a dm_stats_walk_*() iterator: upon entry to the call, the iterator
is at its end conditions and about to terminate. Due to the call to
dm_stats_list() inside the function, it returns with an iterator at
the beginning of a walk and performs a further iteration before
exiting. This final loop makes a further attempt to delete the
(already deleted) region, leading to the confusing error messages.
The current dmsetup.c handles DR_STATS and DR_STATS_META reports
separately in _display_info_cols(), meaning that the stats walk
functions are never called for these report types.
Versions before v2.02.159 have a loop using dm_stats_walk_do() and
dm_stats_walk_while(), that executes once for non-stats reports,
and once per region, or area, for DR_STATS/DR_STATS_META reports.
This older behaviour relies on the documented behaviour that the
walk functions will accept a NULL pointer as the struct dm_stats*
argument.
This was broken by commit f1f2df7b: the NULL test on dms and
dms->regions were incorrectly moved from the dm_stats_walk_end()
wrapper to the internal '_stats_walk_end()' helper.
Since the pointer is dereferenced in between these points, using
an older dmsetup with current libdm results in a segfault when
running a non-stats report:
# dmsetup info -c vg00/lvol0
Segmentation fault (core dumped)
Restore the NULL checks to the wrapper function as intended.
blkdeactivate -m disablequeueing causes "multipathd disablequeueing maps"
call inside blkdeactivate script before deactivating devices. This
avoids a situation where blkdeactivate may wait for paths to appear if
multipath is set to queueing and there's a stack of other devices and/or
mount points on top of such multipath device.
See also https://bugzilla.redhat.com/show_bug.cgi?id=1344381.
Add support do dm_stats_walk*() to walk over the set of
available groups using the cursor embedded in the dm_stats
handle, and to obtain the type of the object at the current
stats cursor location. A set of flags is introduced to
control which objects are visited:
DM_STATS_WALK_AREA
DM_STATS_WALK_REGION
DM_STATS_WALK_GROUP
DM_STATS_WALK_ALL
A final flag suppresses visits to regions that contain only a
single area - since the aggregate of such a region is idential
to the area it contains this allows these duplicates to be
filtered out:
DM_STATS_WALK_SKIP_SINGLE_AREA
If flags are not initialised before beginning a walk the default
set matches the behaviour of previous versions of the library.
Also accept group identifiers as immediate arguments to the
counter, metric, and property functions by adding control
flags to the region and area identifiers passed in.
Region and area properties are mapped to their equivalents for
the group (for example: group size is reported as the sum of
all regions contained in the group). Counter and metric values
are aggregated for the region or group.
Add a grouping facility to the libdm-stats library that allows the
user to bind several regions together as a group. Groups may be
used to aggregate data from several regions for reporting, or to
select and sort among large sets of regions.
A textual descriptor ("group tag") is associated with each group
and is stored in the first group member's aux_data field. The
tag contains the group member list and an optional alias for the
group, allowing the user to assign meaningful names to groups of
regions.
These descriptors are parsed in @stats_list message responses and
populate the resulting region and area tables with the group
structure.
Groups with overlapping regions are permitted but since this will
result in some events being counted more than once a warning is
printed in this case.
Nested and overlapping groups are not currently supported and
attempting to create these configurations results in error.
Add a new enum based interface for accessing counter and metric
values that uses a single function for each:
uint64_t dm_stats_get_counter(const struct dm_stats *dms,
dm_stats_counter_t counter
uint64_t region_id, uint64_t area_id);
int dm_stats_get_metric(const struct dm_stats *dms, int metric,
uint64_t region_id, uint64_t area_id,
double *value);
This simplifies the implementation of value aggregation for
groups of regions. The named function interface now calls the
enum interface internally so that all new functionality is
available regardless of the method used to retrieve values.
Add a function to parse a list of integer values and ranges into
a dm_bitset representation. Individual values signify that that bit
is set in the resulting mask and ranges are given as a pair of
start and end values, M-N, such that M and N are the first and
last members of the range (inclusive).
The implementation is based on the kernel's __bitmap_parselist()
that is used for cpumasks and other set configuration passed in
string form from user space.
Run umount code only when either thin data or metadata are
above 95% - so if there are resize failures with 60%.
system fill keep running.
Also umount will only be tried with lvm2 LVs.
Foreign users are ATM unsuppored.
The DM_REPORT_OUTPUT_MULTIPLE_TIMES instructs reporting code to
keep rows even after dm_report_output call - the rows are not
destroyed in this case which makes it possible to call dm_report_output
multiple times.
log_print is used during cmd line processing to log the result of the
operation (e.g. "Volume group vg successfully changed" and similar).
We don't want output from log_print to be interleaved with current
reports from group where log is reported as well. Also, the information
printed by log_print belongs to the log report too, so it should be
rerouted to log report if it's set.
Since the code in libdm-report which is responsible for doing the report
output uses log_print too, we need to use a different kind of log_print
which bypasses any log report currently used for logging (...simply,
we can't call log_print to output the log report itself which in turn
would again reroute to report - the report would never get on output
this way).
This patch introduces DM_REPORT_GROUP_JSON report group type. When using
this group type and when pushing a report to such a group, these flags
are automatically unset:
DM_REPORT_OUTPUT_ALIGNED
DM_REPORT_OUTPUT_HEADINGS
DM_REPORT_OUTPUT_COLUMNS_AS_ROWS
...and this flag is set:
DM_REPORT_OUTPUT_BUFFERED
The whole group is encapsulated in { } for the outermost JSON object
and then each report is reported on output as array of objects where
each object is the row from report:
{
"report_name1": [
{field1="value", field2="value",...},
{field1="value", field2="value",...}
...
],
"report_name2": [
{field1="value", field2="value",...},
{field1="value", field2="value",...}
...
]
...
}
This patch introduces DM_REPORT_GROUP_BASIC report group type. This
type has exactly the classical output format as we know from before
introduction of report groups. However, in addition to that, it allows
to put several reports into a group - this is the very basic grouping
scheme that doesn't change the output format itself:
Report: report1_name
Header1 Header2 ...
value value ...
value value ...
... ... ...
Report: report2_name
Header1 Header2 ...
value value ...
value value ...
... ... ...
There's no change in output for this report group type - with this type,
we only make sure there's always only one report in a group at a time,
not more.
This patch introduces DM report group (represented by dm_report_group
structure) that is used to group several reports to make a whole. As a
whole, all the reports in the group follow the same settings and/or
formatting used on output and it controls that the output is properly
ordered (e.g. the output from different reports is not interleaved
which would break readability and/or syntax of target output format
used for the whole group).
To support this feature, there are 4 new functions:
- dm_report_group_create
- dm_report_group_push
- dm_report_group_pop
- dm_report_group_destroy
From the naming used (dm_report_group_push/pop), it's clear the reports
are pushed onto a stack. The rule then is that only the report on top
of the stack can be reported (that means calling dm_report_output).
This way we make sure that the output is not interleaved and provides
determinism and control over the output.
Different formats may allow or disallow some of the existing report
flags controlling output itself (DM_REPORT_OUTPUT_*) to be set or not so
once the report is pushed to a group, the grouping code makes sure that
all the reports have compatible flags set and then these flags are
restored once each report is popped from the report group stack.
We also allow to push/pop non-report item in which case such an item
creates a structure (e.g. to put several reports together with any
opening and/or closing lines needed on output which pose as extra
formatting structure besides formatting the reports).
The dm_report_group_push function accepts an argument to pass any
format-specific data needed (e.g. handle, name, structures passed
along while working with reports...).
We can call dm_report_output directly anytime we need (with the only
restriction that we can call dm_report_output only for the report that
is currently on top of the group's stack). Or we don't need to call
dm_report_output explicitly in which case all the reports in a stack are
reported on output automatically once we call dm_report_group_destroy.
This reverts commit 8fd886f735.
This was a deliberate omission because logging token-by-token metadata
parsing greatly increases the amount of logging for hardly any benefit.
In general, only LVM config file settings need to be logged, and in
places where it's considered important to log particular elements of
metadata that should be done using specific log_* lines.
This area can be revisited.
When dm_tree_find_node_by_uuid() fails to find passed uuid,
report in lof_debug the complete original uuid,
not the one stripped of LVM- prefix.
TODO: inspect manipulation with LVM- prefix here.