glusterfs/tests/bugs/bug-874498.t

#!/bin/bash
. $(dirname $0)/../include.rc
. $(dirname $0)/../volume.rc
. $(dirname $0)/../afr.rc
cleanup;
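# Bug 874498: with one brick of a replica-2 volume killed, writes from the
# mount should leave per-gfid entries in the surviving brick's
# .glusterfs/indices/xattrop index, and a self-heal after the brick is brought
# back should drain the index (down to the stale backing file discussed below).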
TEST glusterd
TEST pidof glusterd
TEST $CLI volume info;
TEST $CLI volume create $V0 replica 2 $H0:$B0/brick1 $H0:$B0/brick2;
TEST $CLI volume start $V0;
TEST glusterfs --volfile-server=$H0 --volfile-id=$V0 $M0;
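# Kill brick1 using the pid file glusterd keeps under /var/lib/glusterd, so
# that the writes below succeed only on brick2.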
B0_hyphenated=`echo $B0 | tr '/' '-'`
kill -9 `cat /var/lib/glusterd/vols/$V0/run/$H0$B0_hyphenated-brick1.pid`;
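# With brick1 down, these writes land only on brick2 and are marked as pending
# heal in its index.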
echo "GLUSTER FILE SYSTEM" > $M0/FILE1
echo "GLUSTER FILE SYSTEM" > $M0/FILE2
FILEN=$B0"/brick2"
XATTROP=$FILEN/.glusterfs/indices/xattrop
function get_gfid()
{
        path_of_file=$1
        gfid_value=`getfattr -d -m . $path_of_file -e hex 2>/dev/null | grep trusted.gfid | cut --complement -c -15 | sed 's/\([a-f0-9]\{8\}\)\([a-f0-9]\{4\}\)\([a-f0-9]\{4\}\)\([a-f0-9]\{4\}\)/\1-\2-\3-\4-/'`
        echo $gfid_value
}
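# For illustration (hypothetical gfid, not from a real run): getfattr prints
# "trusted.gfid=0x" followed by 32 hex digits; the cut strips that 15-character
# prefix and the sed inserts dashes so the result matches the names used in the
# xattrop index, e.g.
#   trusted.gfid=0x8f55f2a3c25e4b6f9a1d3e7c5b2a4d61 -> 8f55f2a3-c25e-4b6f-9a1d-3e7c5b2a4d61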
GFID_ROOT=`get_gfid $B0/brick2`
GFID_FILE1=`get_gfid $B0/brick2/FILE1`
GFID_FILE2=`get_gfid $B0/brick2/FILE2`
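# The index on brick2 should now hold entries for the root directory and both
# files; count how many of those expected gfids are actually present.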
count=0
for i in `ls $XATTROP`
do
        if [ "$i" == "$GFID_ROOT" ] || [ "$i" == "$GFID_FILE1" ] || [ "$i" == "$GFID_FILE2" ]
        then
                count=$(( count + 1 ))
        fi
done
EXPECT "3" echo $count
TEST $CLI volume start $V0 force
EXPECT_WITHIN $PROCESS_UP_TIMEOUT "Y" glustershd_up_status
EXPECT_WITHIN $CHILD_UP_TIMEOUT "1" afr_child_up_status_in_shd $V0 0
EXPECT_WITHIN $CHILD_UP_TIMEOUT "1" afr_child_up_status_in_shd $V0 1
# Note from the commit that last touched the heal step below ("tests: fix
# further issues with bug-874498.t", Anand Avati <avati@redhat.com>,
# 2013-04-09 17:22:01 -07:00, Change-Id I5deb79652ef449b7e88684311e804a8a2aa4725d,
# BUG 874498, http://review.gluster.org/4798, reviewed by Jeff Darcy, tested by
# the Gluster Build System):
#
# The failure of bug-874498.t seems to be a "bug" in glustershd. It shows up
# when both subvolumes of a replica are "local" to glustershd; in that case
# glustershd is sensitive to the order in which the subvols come up.
#
# The core of the issue: without patch #4784, the self-heal daemon finishes
# processing the index and no entries are left inside the xattrop index a few
# seconds after "volume start force". With the patch, however, the stale
# "backing file" (against which the index performs link()) is left behind. The
# likely reason is that an INDEX-based crawl is not happening against the
# subvol when the patch is applied.
#
# Before patch #4784, the subvols came up in this order:
#
#   [2013-04-09 22:55:35.117679] I [client-handshake.c:1456:client_setvolume_cbk]
#   0-patchy-client-0: Connected to 10.3.129.13:49156, attached to remote volume '/d/backends/brick1'.
#   ...
#   [2013-04-09 22:55:35.118399] I [client-handshake.c:1456:client_setvolume_cbk]
#   0-patchy-client-1: Connected to 10.3.129.13:49157, attached to remote volume '/d/backends/brick2'.
#
# With the patch, the order is reversed:
#
#   [2013-04-09 22:53:34.945370] I [client-handshake.c:1456:client_setvolume_cbk]
#   0-patchy-client-1: Connected to 10.3.129.13:49153, attached to remote volume '/d/backends/brick2'.
#   ...
#   [2013-04-09 22:53:34.950966] I [client-handshake.c:1456:client_setvolume_cbk]
#   0-patchy-client-0: Connected to 10.3.129.13:49152, attached to remote volume '/d/backends/brick1'.
#
# The index in brick2 holds the list of files/gfids to heal. When brick1 is the
# first subvol detected as up, an INDEX-based crawl clears all the index
# entries in brick2; but when brick2 comes up first, the backing file is left
# stale. A "gluster volume heal full" also leaves stale backing files behind,
# because the full crawl walks the namespace and never encounters the backing
# file, so it is never cleared.
#
# The interim (possibly permanent) fix is to have the script issue a regular
# self-heal command rather than a "full" one. The failure itself is
# non-critical: the data files are all healed, and only the backing file is
# left behind. The stale backing file gets cleared by the next index-based
# heal, whether triggered manually or after 10 minutes.
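# A hedged illustration (not part of the original test; the path is assembled
# from the variables above): one way to watch the backing file described in
# the note is to list the index directory with inode numbers after the heal:
#
#   ls -li $B0/brick2/.glusterfs/indices/xattrop
#
# Entries still pending heal show up as extra hard links of the "xattrop-"
# prefixed base file; once healing finishes, only that base file should remain.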
TEST $CLI volume heal $V0
## After the heal, expect 1 entry left in the .glusterfs/indices/xattrop
## directory: the stale xattrop backing file described in the note above.
EXPECT_WITHIN $HEAL_TIMEOUT '1' count_sh_entries $FILEN;
TEST $CLI volume stop $V0;
TEST $CLI volume delete $V0;
cleanup;