IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
When dmeventd receives SIGTERM/INT/HUP/QUIT it validates if exit is possible.
If there was any device still monitored, such exit request used to
be ignored/refused. This 'usually' worked reasonably well, however if there
is very short time period between last device is unmonitored and signal
reception - there was possibility such EXIT was ignored, as dmeventd has
not yet got into idle state even commands like 'vgchange -an' has already
finished.
This patch changes logic towards scheduling EXIT to the nearest
point when there is no monitored device.
EXIT is never forgotten.
NOTE: if there is only a single monitored device and someone sends
SIGTERM and later someone uses i.e. 'lvchange --refresh' after
unmonitoring dmeventd will exit and new instance needs to be
started.
Avoid adding -g more then once for debug builds.
Avoid enabling DEBUG_MEM when we build multithreaded tools.
Link executables with -fPIE -pie and --export-dynamic LDFLAGS
Introduce PROGS_FLAGS to add option to pass flags for external libs.
Link lvm2 internally library only when really used.
Link DAEMON_LIBS with daemons.
Pass VALGRIND_CFLAGS internally
Set shell failure mode on couple places.
When we want to report primary leg failure, check for intial 'a',
since otherwice 'Aa idle' is normally visible.
Also reset array of bit flags marking dead devices, once
plugin detects raid is in sync.
Functionality of ignore suspend devices is already granted by:
lvm2_disable_dmeventd_monitoring() -> init_run_by_dmeventd() ->
init_ignore_suspended_devices().
In fact plugins should never use --config because it has
some unpleasant technical issues.
Add the checks necessary to distiguish the state of a RAID when the primary
source for syncing fails during the "recover" process.
It has been possible to hit this condition before (like when converting from
2-way RAID1 to 3-way and having the first two devices die during the "recover"
process). However, this condition is now more likely since we treat linear ->
RAID1 conversions as "recover" now - so it is especially important we cleanly
handle this condition.
With recent updates for thin pool monitoring in version 169
we lost multiple WARNINGs to be printed in syslog, when
pool crossed 80%, 85%, 90%, 95%, 100%.
Restore this logic as we want to keep user informed more
then just once when 80% boundary is passed.
Previous commit 506d88a2ec introduced disabling lvmetad on repairs.
Avoid calling lvscan and use of any --config options altogether
in the mirror and raid DSOs.
Related: rhbz1380521
Commit 07ded8059c assumed that the mirror is blocked which is not the case.
It is accessible, degraded and in need of repair because some of its legs
(partially) failed. Any auto-repair via dmeventd fails though because
of lvmetad not providing proper data about the failed PV(s). That's why
this workaround got introduced in commit 76f6951c3e until we get to
the lvmetad interaction core issue.
Mind any mirror auto-repair failure is caused by such lvmetad interaction
problems not yet solved so disabling lvmetad works as a resort as elaborated
on in the related bz.
Reintroducing the interim solution.
Resolves: rhbz1380521
Effectively revert whole 76f6951c3e.
We need to figure out some other solution.
At this moment usage of --config with 'repair' of blocked mirror
is 'freezing' combination.
Automatic dmeventd repair of mirrors with active lvmetad configured
(mirror_image_fault_policy = "allocate") fails because the lvscan
run before the repair in the mirror DSO does not update the
lvmetad cache properly thus "lvconvert --repair ..." fails.
Need to scan the mirror LV before and after the repair
to have proper cache content after the repair finished.
The cache can't be relied on or the repair will fail.
Resolves: rhbz1380521
Some archs can use even 64K pages and then lvm2 runs into trouble if
the stack is 'too small' to fit extra page capturing stack overwrite.
So when lvm2 limits stack - add extra mem page - be it 4K or 64K.
Relates to ppc64le bug: https://bugzilla.redhat.com/1387279
For more advanced support we need to ensure better logic for calling
external much more advanced script for maintanance of thin-pool.
So this new code ensures:
When thin-pool data or metadata is bigger then 50%,
then with each 5% increment, action is called.
This is independent from autoextend_threshold.
This action always happens when thin-pool is over threshold,
(so no action when it's exactly i.e. 60%).
The only exception is 100% full thin-pool - which invokes 'last'
action.
Since thin-pool occupancy may change also downward, code needs
to also handle possibly reduction of occupancy of thin-pool.
So when usage drop from 90% to 50%, thin-pool will start to call
again action when it will pass 55% threshold.
This give external commands lot of option i.e. to call 'fstrim'
before actual resize is needed.
Default internal logic will stop trying to do any 'rescue' action
when executed command fails.
This will be now fully in hands of external script if such
behaviour is needed.
Instead of stopping monitoring after couple failing retries,
keep monitoring forever, just make larger delays between command
retries (ATM upto ~42 minutes).
So syslog is not spammed too often, yet commands have a chance to
be retried and succeed eventually...
When dmeventd configured command does not start with 'lvm ' prefix,
it's going to be an 'external' command.
In this case we split command by spaces to argv strings.
When thin-pool processes event and 'lvextend --use-policies' fails
rather capture up-to-date new info as the fullness percentage may
have jumped noticable. This way we could use 'more' correct numbers
when checking for thresholds.
Translate log_info() into log_very_verbose() which is macro
supposed to be used by our code.
log_info() is internal macro with eventually some 'symbolic' meaning
in syslogging daemons.
Ensure different logging function for dmeventd.c logging
and dm and lvm library.
We can recognize we want to show every log_info() and
log_notice() message from dmeventd.c code while not
exposing those from libdm/libdevmapper-event
Also switch to use log with errno - it's not changing
anything and doesn't bring any more features yet to dmeventd
logging but we just properly pass dm_errno_or_class properly
through the whole code stack for possible future use
(i.e. support of class logging for dmeventd).
Reword the logging logic and try to restore previous logging
behavior for 'standalone' running daemon while preserving
debuggable feautures it has gained.
So actual rules:
dmeventd without any '-d' option will syslog all messages
from dmeventd.c it dmeventd plugins.
log_notice()==log_verbose()
log_info()==log_very_verbose()
But to show also log_debug() used has to give '-ddd'.
When user specified '-d, -dd, -ddd, -dddd' it
will also enable tracing of messages from libdm & lib
executed code - which is mainly useful for testing
i.e.: 'dmeventd -fldddd'
Introduce macros:
log_level(), log_stderr(), log_once(), log_bypass_report()
For easier and more consisten way how to 'decoder' bits
of info from passed 'level'.
This patch fixes potential problem when 'level' of message
might not have always masked right bits.
Integrate back _unblock_sigalrm() and check for error code of
pthread_sigmask() function so we do not use uninitialized
sigmask_t on error path (Coverity).
The dm-raid target now rejects device rebuild requests during ongoing
resynchronization thus causing 'lvconvert --repair ...' to fail with
a kernel error message. This regresses with respect to failing automatic
repair via the dmeventd RAID plugin in case raid_fault_policy="allocate"
is configured in lvm.conf as well.
Previously allowing such repair request required cancelling the
resynchronization of any still accessible DataLVs, hence reasoning
potential data loss.
Patch allows the resynchronization of still accessible DataLVs to
finish up by rejecting any 'lvconvert --repair ...'.
It enhances the dmeventd RAID plugin to be able to automatically repair
by postponing the repair after synchronization ended.
More tests are added to lvconvert-rebuild-raid.sh to cover single
and multiple DataLV failure cases for the different RAID levels.
- resolves: rhbz1371717
Run umount code only when either thin data or metadata are
above 95% - so if there are resize failures with 60%.
system fill keep running.
Also umount will only be tried with lvm2 LVs.
Foreign users are ATM unsuppored.