Fix spurious brick disconnects

Spurious disconnects were caused by a race condition inside
rpc_transport_ref()/rpc_transport_unref() that allowed the refcount
to drop to zero while the transport was still in use. The race
condition was made possible by an uninitialized mutex, produced
when socket_server_event_handler() copies the transport.
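
The failure mode can be sketched in isolation. The following is a minimal
illustration, not the GlusterFS code: struct transport, transport_ref(),
transport_unref() and transport_clone() are hypothetical stand-ins for the
real rpc_transport_t and its helpers. It shows a refcount guarded by a
per-object mutex, and a clone step that gives the new object its own
initialized lock before anything can take a reference on it. Note that
pthread_mutex_init() reports failure by returning a nonzero error number
rather than -1 with errno set, so the sketch checks ret != 0 and passes
ret to strerror().

```c
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a transport; not the real rpc_transport_t. */
struct transport {
    pthread_mutex_t lock;      /* guards refcount */
    int             refcount;
    const char     *name;
};

/* Take a reference. Only safe if t->lock has been initialized. */
static struct transport *transport_ref(struct transport *t)
{
    pthread_mutex_lock(&t->lock);
    t->refcount++;
    pthread_mutex_unlock(&t->lock);
    return t;
}

/* Drop a reference; frees the object at zero. Returns the remaining count. */
static int transport_unref(struct transport *t)
{
    int refs;

    pthread_mutex_lock(&t->lock);
    refs = --t->refcount;
    pthread_mutex_unlock(&t->lock);

    if (refs == 0) {
        pthread_mutex_destroy(&t->lock);
        free(t);
    }
    return refs;
}

/* Build a new transport from a listener, with its own initialized lock. */
static struct transport *transport_clone(const struct transport *listener)
{
    struct transport *t = calloc(1, sizeof(*t));
    int ret;

    if (!t)
        return NULL;

    /* Initialize the lock before the object is visible to other threads.
     * Without this, lock/unlock on t->lock gives no mutual exclusion
     * guarantee and concurrent ref/unref calls can corrupt the count. */
    ret = pthread_mutex_init(&t->lock, NULL);
    if (ret != 0) {
        fprintf(stderr, "pthread_mutex_init() failed: %s\n", strerror(ret));
        free(t);
        return NULL;
    }

    t->name = listener->name;
    t->refcount = 1;
    return t;
}
```

In this sketch the caller owns the single reference returned by
transport_clone() and releases it with transport_unref().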

BUG: 764655
Change-Id: I34fe097a0ac21b0dbf58f5eed84880e3fd9814f2
Signed-off-by: Emmanuel Dreyfus <manu@netbsd.org>
Reviewed-on: http://review.gluster.org/4900
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Krishnan Parthasarathi <kparthas@redhat.com>
Reviewed-by: Anand Avati <avati@redhat.com>
Emmanuel Dreyfus 2013-04-29 17:05:03 +02:00 committed by Anand Avati
parent fc8aa43d46
commit ddad856d37

@@ -2474,6 +2474,15 @@ socket_server_event_handler (int fd, int idx, void *data,
                         if (!new_trans)
                                 goto unlock;
+                        ret = pthread_mutex_init(&new_trans->lock, NULL);
+                        if (ret == -1) {
+                                gf_log (this->name, GF_LOG_WARNING,
+                                        "pthread_mutex_init() failed: %s",
+                                        strerror (errno));
+                                close (new_sock);
+                                goto unlock;
+                        }
                         new_trans->name = gf_strdup (this->name);
                         memcpy (&new_trans->peerinfo.sockaddr, &new_sockaddr,