IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
In HA cluster, we have "clvm" resource agent to manage clvmd daemon.
The agent invokes clvmd like: "clvmd -T90 -d0", which always prints
a scaring error message:
"""
local socket: connect failed: No such file or directory
"""
When specifed with "-d" option, clvmd tries to check if an instance
of the clvmd daemon is already running through a testing connection.
The connect() will fail with this ENOENT error in such case, so supress
the error message in such case.
TODO: add missing error reaction code - since ofter log_error, program
is not supposed to continue running (log_error() is for reporting
stopping problems).
Signed-off-by: Eric Ren <zren@suse.com>
ATM we have several instances of daemonizing code.
Each has its 'special' logic so not completely easy
to unify them all into a single routine.
Start to unify them and use one strategy for rediricting
all input/outpus to /dev/null - use 'dup2' function for this
and open /dev/null before fork to make sure it's available.
Avoid reading already released memory and do a continue directly.
Invalid read of size 1
at 0x1201B0: main_loop (clvmd.c:931)
by 0x11F640: main (clvmd.c:666)
Address 0x72ddef0 is 32 bytes inside a block of size 224 free'd
at 0x4C30D18: free (vg_replace_malloc.c:530)
by 0x54D6FD1: dm_free_wrapper (dbg_malloc.c:357)
by 0x122E6E: process_work_item (clvmd.c:2034)
by 0x123003: lvm_thread_fn (clvmd.c:2085)
by 0x590A3A8: start_thread (pthread_create.c:465)
by 0x5C3C7FE: clone (in /usr/lib64/libc-2.25.90.so)
Block was alloc'd at
at 0x4C2FB6B: malloc (vg_replace_malloc.c:299)
by 0x54D6EF1: dm_malloc_aux (dbg_malloc.c:286)
by 0x54D6F1C: dm_zalloc_aux (dbg_malloc.c:291)
by 0x54D6F96: dm_zalloc_wrapper (dbg_malloc.c:345)
by 0x11F89C: local_rendezvous_callback (clvmd.c:731)
by 0x1203D2: main_loop (clvmd.c:964)
by 0x11F640: main (clvmd.c:666)
Initialize mutex upfront any debugging and fix this report:
Mutex reinitialization: mutex 0x485d20, recursion count 0, owner 1.
at 0x4C38480: pthread_mutex_init_intercept (drd_pthread_intercepts.c:821)
by 0x4C38480: pthread_mutex_init (drd_pthread_intercepts.c:830)
by 0x11F359: main (clvmd.c:562)
mutex 0x485d20 was first observed at:
at 0x4C38F63: pthread_mutex_lock_intercept (drd_pthread_intercepts.c:885)
by 0x4C38F63: pthread_mutex_lock (drd_pthread_intercepts.c:898)
by 0x11E920: debuglog (clvmd.c:254)
by 0x11F1D8: main (clvmd.c:527)
Centralise editing of the client list into _add_client() and
_del_client(). Introduce _local_client_count to track the size of the
list for debugging purposes. Simplify and standardise the various ways
the list gets walked.
While processing one element of the list in main_loop(),
cleanup_zombie() may be called and remove a different element, so make
sure main_loop() refreshes its list state on return. Prior to this
patch, the list edits for clients disappearing could race against the
list edits for new clients connecting and corrupt the list and cause a
variety of segfaults.
An easy way to trigger such failures was by repeatedly running shell
commands such as:
lvs &; lvs &; lvs &;...;killall -9 lvs; lvs &; lvs &;...
Situations that occasionally lead to the failures can be spotted by
looking for 'EOF' with 'inprogress=1' in the clvmd debug logs.
Some archs can use even 64K pages and then lvm2 runs into trouble if
the stack is 'too small' to fit extra page capturing stack overwrite.
So when lvm2 limits stack - add extra mem page - be it 4K or 64K.
Relates to ppc64le bug: https://bugzilla.redhat.com/1387279
Simply running concurrent copies of 'pvscan | true' is enough to make
clvmd freeze: pvscan exits on the EPIPE without first releasing the
global lock.
clvmd notices the client disappear but because the cleanup code that
releases the locks is triggered from within some processing after the
next select() returns, and that processing can 'break' after doing just
one action, it sometimes never releases the locks to other clients.
Move the cleanup code before the select.
Check all fds after select().
Improve some debug messages and warn in the unlikely event that
select() capacity could soon be exceeded.
- closer to the recommendation of man-pages (7) if possible
- Add crossrefs
- Sort options and crossrefs
- Fix default timeout (60 secs) of -t
- Documents -I[auto]
Signed-off-by: Stéphane Aulery <saulery@free.fr>
LVM2.2.02.112/daemons/clvmd/clvmd.c:1131: warning[arrayIndexOutOfBoundsCond]: Array 'row[8]' accessed at index 8, which is out of bounds. Otherwise condition 'j==8' is redundant.
This code:
int i,j = 0;
...
for (i = 0; i < len; ++i) {
...
if ((j == 8) || (i + 1 == len)) {
for (;j < 8; ++j) {
...
}
...
j = 0;
}
}
Indeed - j is 0 at the beginning, then iterating till j < 8,
then always zeroed at the end of the outer loop - so "j" never
reaching value of 8 - the j == 8 condition is redundant.
Prior adding new reply to the list, check
if the reply thread is not already finished.
In that case discard adding message
(which would otherwise be leaked).
Use mutex to access localsock values, so check
num_replies when the thread is not yet finished.
Check for threadid prior the mutex taking
(though this check is probably not really needed)
Added complexity with extra reply mutex is not worth the troubles.
The only place which may slightly benefit from this mutex is timeout
and since this is rather error case - let's convert it to
localsock.mutex and keep it simple.
Move the pthread mutex and condition creation and destroy
to correct place right after client memory is allocatedd
or is going to be released.
In the original place it's been in race with lvm thread
which could have still unlock mutex while it's been already
destroyed.
When TEST_MODE flag is passed around the cluster,
it's been use in thread unprotected way, so it may have
influenced behaviour of other running parallel lvm commands
(activation/deactivation/suspend/resume).
Fix it by set/query function only under lvm mutex.
For hold_un/lock function calls check lock_flags bits directly.
Operate with lvm_thread_exit while holding lvm_thread_mutex.
Don't leave unfinished work in the lvm thread queue
and always finish all queued tasks before exit,
so no cmd struct is left in the list.
(in-release fix)
We have to close cluster in some predicatable way,
otherwise we may access released memory from different
threads.
So move closedown till the point we know all thread
are closed. New messages from cluster are discarded.
When multiple threads act on the same 'quit' variable
the order of exit becomes unpredictable.
So let the main_loop() finish first and then clean up
all queued lvm jobs.
Do not add any new work, when lvm_thread_exit is set.
Properly clean 'client' structure only for LOCAL_SOCK type.
(Fixes bug from commit 460c19df62)
(in release fix)
Also cleanup-up associated pthreads by using cleanup_zombie() function.
Since this function may change the list, restart scanning always from
the list header.
Note: couple following patches are necessary to make this working properly.