glusterd: gluster v status is showing wrong status for glustershd

When we restart the bricks, connect and disconnect events happen
for glustershd. glusterd uses two threads to handle the disconnect and
connect events from glustershd. When we restart the bricks we get
both a disconnect and a connect event, so both threads
compete for the big lock.
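The competition can be modelled with two threads that each take one lock before touching a shared online flag. This is a minimal sketch under stated assumptions. A pthread mutex stands in for conf->big_lock, which is a synclock_t in glusterd. The names struct svc_model, handle_connect, handle_disconnect and deliver_restart_events are hypothetical. The lock guarantees the handlers never overlap, but it does not decide which one runs first.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical model of the service state. glusterd's real big lock is a
 * synclock_t; a pthread mutex is used here only to model mutual exclusion. */
struct svc_model {
        pthread_mutex_t big_lock;
        bool online;
        int events_handled;      /* number of handlers that have run */
        bool last_was_connect;   /* which handler ran last */
};

/* Connect handler: marks the service online while holding the lock. */
static void *handle_connect(void *arg)
{
        struct svc_model *svc = arg;

        pthread_mutex_lock(&svc->big_lock);
        svc->online = true;
        svc->last_was_connect = true;
        svc->events_handled++;
        pthread_mutex_unlock(&svc->big_lock);
        return NULL;
}

/* Disconnect handler: marks the service offline while holding the lock. */
static void *handle_disconnect(void *arg)
{
        struct svc_model *svc = arg;

        pthread_mutex_lock(&svc->big_lock);
        svc->online = false;
        svc->last_was_connect = false;
        svc->events_handled++;
        pthread_mutex_unlock(&svc->big_lock);
        return NULL;
}

/* Deliver both events concurrently, as on a brick restart, and return the
 * final online flag. Whichever handler wins the lock last decides it. */
static bool deliver_restart_events(struct svc_model *svc)
{
        pthread_t t_disc, t_conn;

        pthread_mutex_init(&svc->big_lock, NULL);
        svc->online = false;
        svc->events_handled = 0;
        svc->last_was_connect = false;

        pthread_create(&t_disc, NULL, handle_disconnect, svc);
        pthread_create(&t_conn, NULL, handle_connect, svc);
        pthread_join(t_disc, NULL);
        pthread_join(t_conn, NULL);

        pthread_mutex_destroy(&svc->big_lock);
        return svc->online;
}
```

Both handlers always run exactly once. The final flag simply reflects whichever thread acquired the lock last.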

We want the disconnect event to finish before the connect event. But if
the connect thread gets the big lock first, it sets svc->online to
true, and then the disconnect thread sets svc->online to false.
As a result, glusterd considers glustershd disconnected even though it
is running, and gluster v status shows the wrong status.
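The effect of the two possible orderings can be replayed deterministically. The sketch below is illustrative only. The enum svc_event and the function replay are hypothetical and are not glusterd's notify API. Because each handler overwrites the flag, only the event applied last determines the reported state.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical event codes for this model only. */
enum svc_event { EV_DISCONNECT, EV_CONNECT };

/* Apply events in the order the handlers acquired the big lock and return
 * the resulting online flag. Each event overwrites the previous value. */
static bool replay(const enum svc_event *order, size_t n)
{
        bool online = false;

        for (size_t i = 0; i < n; i++)
                online = (order[i] == EV_CONNECT);
        return online;
}
```

With the intended order {EV_DISCONNECT, EV_CONNECT}, the flag ends up true. With the racy order {EV_CONNECT, EV_DISCONNECT}, it ends up false while glustershd is actually running, which is the wrong status described above.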

After killing shd, glusterd sleeps for 1 second. To avoid the problem,
glusterd now releases the big lock before the sleep and reacquires it
afterwards. This gives the disconnect thread a chance to handle
glusterd_svc_common_rpc_notify before the other thread completes the
connect event.
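The unlock/sleep/relock pattern can be sketched with the same pthread stand-in. A pthread mutex models conf->big_lock, and synclock_unlock/synclock_lock map to pthread_mutex_unlock/pthread_mutex_lock. The names disconnect_handler and stop_with_lock_released are hypothetical. Thread creation stands in for the disconnect event that killing shd would trigger. While the lock is held, the handler is blocked. Releasing the lock for the duration of the sleep lets it run. This widens the window rather than strictly guaranteeing order, just as in the actual fix.

```c
#include <pthread.h>
#include <stdbool.h>
#include <unistd.h>

/* Hypothetical model: a pthread mutex stands in for conf->big_lock. */
static pthread_mutex_t big_lock = PTHREAD_MUTEX_INITIALIZER;
static bool disconnect_handled = false;

/* Queued disconnect handler: blocks until the big lock is free. */
static void *disconnect_handler(void *arg)
{
        (void)arg;
        pthread_mutex_lock(&big_lock);
        disconnect_handled = true;       /* e.g. svc->online = false */
        pthread_mutex_unlock(&big_lock);
        return NULL;
}

/* Models the stop path: the lock is held when the daemon is killed, then
 * dropped around the 1-second wait. Returns whether the handler ran
 * while the lock was released. */
static bool stop_with_lock_released(void)
{
        pthread_t t;
        bool seen;

        pthread_mutex_lock(&big_lock);
        disconnect_handled = false;
        /* kill (pid, sig) would happen here; the resulting disconnect
         * event is modelled by starting the handler thread. */
        pthread_create(&t, NULL, disconnect_handler, NULL);

        pthread_mutex_unlock(&big_lock); /* synclock_unlock (&conf->big_lock) */
        sleep(1);
        pthread_mutex_lock(&big_lock);   /* synclock_lock (&conf->big_lock) */

        seen = disconnect_handled;
        pthread_mutex_unlock(&big_lock);
        pthread_join(t, NULL);
        return seen;
}
```

If the sleep were done while still holding the lock, the handler could not run until after the stop path finished. Releasing the lock is what lets the disconnect be processed first.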

Change-Id: Ie82e823fdfc936feb7c0ae10599297b050ee9986
fixes: bz#1585391
Signed-off-by: Sanju Rakonde <srakonde@redhat.com>
Sanju Rakonde 2018-06-02 16:36:22 +05:30 committed by Atin Mukherjee
parent 07ede8d443
commit fe71ee74fd


@ -12,6 +12,7 @@
#include <limits.h>
#include <signal.h>
#include "glusterd.h"
#include "common-utils.h"
#include "xlator.h"
#include "logging.h"
@ -69,13 +70,17 @@ glusterd_proc_stop (glusterd_proc_t *proc, int sig, int flags)
/* NB: Copy-paste code from glusterd_service_stop, the source may be
* removed once all daemon management use proc */
int32_t ret = -1;
pid_t pid = -1;
xlator_t *this = NULL;
glusterd_conf_t *conf = NULL;
this = THIS;
GF_ASSERT (this);
conf = this->private;
GF_ASSERT (conf);
if (!gf_is_service_running (proc->pidfile, &pid)) {
ret = 0;
gf_msg (this->name, GF_LOG_INFO, 0,
@ -104,7 +109,9 @@ glusterd_proc_stop (glusterd_proc_t *proc, int sig, int flags)
if (flags != PROC_STOP_FORCE)
goto out;
synclock_unlock (&conf->big_lock);
sleep (1);
synclock_lock (&conf->big_lock);
if (gf_is_service_running (proc->pidfile, &pid)) {
ret = kill (pid, SIGKILL);
if (ret) {