11517 Commits

Author SHA1 Message Date
Sanoj Unnikrishnan
04ede2e163 Quota: heal directory on newly added bricks when quota limit is reached
Problem: if a lookup is done on a newly added brick for a path on which limit
has been reached, the lookup fails to heal the directory tree due to quota.

Solution: Tag the lookup as an internal fop and ignore it in quota.
Since marking a fop as internal does not usually give enough contextual
information, new flags are introduced to pass the contextual info.

Adding dict_check_flag and dict_set_flag to aid flag operations.
A flag is a single bit in a bit array (currently limited to 256 bits).
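
A minimal sketch of the flag idea described above, written in Python for
brevity rather than in the C of the actual patch. The helper names
new_flags, set_flag and check_flag are hypothetical stand-ins for
dict_set_flag and dict_check_flag; only the "one bit per flag, 256 bits in
total" layout comes from the message.

```python
FLAG_BITS = 256  # limit stated in the commit message


def new_flags():
    """Return an empty bit array with room for FLAG_BITS flags (32 bytes)."""
    return bytearray(FLAG_BITS // 8)


def set_flag(flags, bit):
    """Set a single flag bit in place (hypothetical analogue of dict_set_flag)."""
    if not 0 <= bit < FLAG_BITS:
        raise ValueError("flag index out of range")
    flags[bit // 8] |= 1 << (bit % 8)


def check_flag(flags, bit):
    """Return True if the flag bit is set (hypothetical analogue of dict_check_flag)."""
    if not 0 <= bit < FLAG_BITS:
        raise ValueError("flag index out of range")
    return bool(flags[bit // 8] & (1 << (bit % 8)))
```

A caller would set one bit to tag a lookup with its context, and a consumer
such as quota would test that bit before deciding to ignore the fop.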

Change-Id: Ifb6a68bcaffedd425dd0f01f7db24edd5394c095
fixes: bz#1505355
BUG: 1505355
Signed-off-by: Sanoj Unnikrishnan <sunnikri@redhat.com>
2018-03-28 04:07:12 +00:00
Poornima G
7d95a6ff71 quick-read: Provide statistics to the monitor
Updates: #425

Change-Id: Iea5198821f4eabc46bc63529afa4a92d4b4c2be0
Signed-off-by: Poornima G <pgurusid@redhat.com>
2018-03-28 03:35:55 +00:00
Sanju Rakonde
ab8d18945b glusterd: changing the op-version of volume stop mgmt v3
Change-Id: Iefc5a00d36436b23181871fa365f27b8d90cff0a
fixes: bz#1560441
Signed-off-by: Sanju Rakonde <srakonde@redhat.com>
2018-03-27 11:03:30 +05:30
Vishal Pandey
25fc639120 glfs heal binary change to accommodate socket file arguments
Change-Id: I755d6552decd015aec7859ad2cf99c76c8bee9dc
fixes: bz#1558380
BUG: 1558380
Signed-off-by: Vishal Pandey <vpandey@redhat.com>
2018-03-27 02:24:11 +00:00
Sanju Rakonde
9c047f4cae glusterd: Implementing volume stop in mgmt v3
Change-Id: I8f9c594cf56331d54eb4884335699744685ef20d
fixes: bz#1560441
Signed-off-by: Sanju Rakonde <srakonde@redhat.com>
2018-03-26 20:10:25 +05:30
Atin Mukherjee
a611679428 tests: fix nl-cache.t failure
commit fef9293 changed network.inode-lru-limit from 50000 to 200000 in
nl-cache group profile but the test wasn't changed to reflect it
accordingly.

Change-Id: Ibb5fb0a387f160f6b726246b161a9a7b33135755
fixes: bz#1560589
Signed-off-by: Atin Mukherjee <amukherj@redhat.com>
2018-03-26 19:28:25 +05:30
Amar Tumballi
de0e7efead .testignore: remove the group profile files
Change-Id: I6619827f1bf6fe9bd974537af6169164b19a0aa5
fixes: bz#1560393
Signed-off-by: Amar Tumballi <amarts@redhat.com>
2018-03-26 18:31:21 +05:30
Niklas Hambüchen
6611aa8203 glusterfind: Log remote stderr on node_cmd error. Fixes #1559130
The problem of lost stderr was introduced in
commit feea851fad4f89b48bfe89fe3b75250cc7bd6501.

Change-Id: Ic98f9bc9682ae3bd9c3ebea3855667fc8ba2843d
BUG: 1559130
Signed-off-by: Niklas Hambüchen <mail@nh2.me>
2018-03-26 12:48:16 +00:00
Susant Palai
52c63dafd2 md-cache: fix ./tests/basic/md-cache/bug-1418249.t
The inode table size is currently set to 200000. Hence the test case, which
was expecting the old value of 50000, needs to change.

Change-Id: I8e44b1d0a2da1e8100bebd25f48bb36e2897b4f8
fixes: bz#1560393
Signed-off-by: Susant Palai <spalai@redhat.com>
2018-03-26 09:47:01 +05:30
Poornima G
fef929342e extras/group: Change the server inode table size when upcall is on
By default the server inode table size is 16K. When upcall is enabled,
there are going to be too many forgets sent on inodes, as the brick can
hold only 16K inodes in memory, so we increased this to 50K. This is
still less than the client inode table size. We have seen performance
improvement when the server inode table size is set to 200000 (almost the
client inode table size). Hence changing the value to 200000.

Increasing this increases the memory consumption by <1MB.

BUG: 1559235
Change-Id: I931db965cd34bf33094328541bd5a633b3357805
Signed-off-by: Poornima G <pgurusid@redhat.com>
2018-03-24 05:38:47 +00:00
Poornima G
e043938e28 nl-cache: Provide statistics to the monitor
Updates: #429

Change-Id: Ic2e64422055f1838d5d453643c739ef1e9319cfe
Signed-off-by: Poornima G <pgurusid@redhat.com>
2018-03-24 05:38:26 +00:00
Poornima G
bf671adddf md-cache: Provide statistics to the monitor
Updates: #427

Change-Id: Ib1f45016ac75d7bc2755db0dd4b68ce1d95d26c3
Signed-off-by: Poornima G <pgurusid@redhat.com>
2018-03-24 05:38:00 +00:00
Sanoj Unnikrishnan
bc04046f0e features/quota: Add new fields to translator options for GD2
alert-time, soft timeout, hard timeout, default soft limit
and deem-statfs will be settable through the volume set command,
hence they are marked as settable.
Other options are used only via quota commands.

Updates #302

Change-Id: I02d258cc3aa7fe58ccbadd59441cce64cfd9ba6e
Signed-off-by: Sanoj Unnikrishnan <sunnikri@redhat.com>
2018-03-24 05:14:48 +00:00
James Le Cuirot
d978ff0e3a build: Fix misleading TIRPC result in configure summary
Requesting ipv6-default even if you explicitly disable libtirpc will
then implicitly enable libtirpc because that is required. That is fine
but the configure summary should not then show TIRPC as disabled when
it is not.

The result has also been made clearer by stating that TIRPC is
"missing" when it has been tried but not found.

BUG: 1553938
Change-Id: I945bd6859aaf3defa682b0d05ee34a9827b9c45f
Signed-off-by: James Le Cuirot <chewi@gentoo.org>
2018-03-24 05:14:05 +00:00
James Le Cuirot
d121b97f9a build: Fix configure --without-ipv6-default behaviour
The current behaviour disables ipv6-default when no switch is given at
all but otherwise checks if libtirpc was requested, regardless of
whether you have given --with-ipv6-default or --without-ipv6-default.

I believe the intention was to enable when libtirpc is requested by
default but otherwise respect the switch given.

This is important because ipv6-default breaks Gluster for systems that
have IPv6 disabled.
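
The intended decision can be sketched as a three-state function. This is a
Python illustration of one reading of the message, not the configure.ac
logic itself; the function name and the use of None for "no switch given"
are assumptions.

```python
def ipv6_default_enabled(switch, tirpc_requested):
    """Decide whether ipv6-default ends up enabled.

    switch: None when neither switch was given, True for
            --with-ipv6-default, False for --without-ipv6-default.
    tirpc_requested: whether libtirpc was requested.
    """
    if switch is None:
        # No explicit switch: follow the libtirpc request by default.
        return tirpc_requested
    # An explicit switch is always respected.
    return switch
```

With this shape, --without-ipv6-default disables the feature even when
libtirpc was requested, which is the case the old behaviour got wrong.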

BUG: 1553926
Change-Id: I76b91ae2699574b2e5b777453732bb5cbd79bbca
Signed-off-by: James Le Cuirot <chewi@gentoo.org>
2018-03-24 05:14:05 +00:00
Niklas Hambüchen
0a08afbb9a libgfchangelog: Correct the log message
Provide correct error message for changelog end time check
Updated error message to print "wrong result for end".

Original patch by Keith Schincke <kschinck@redhat.com>
from https://review.gluster.org/#/c/8121/

Change-Id: Ia3458cbac7784bfc71c05da67391a3f8259f18f0
BUG: 1559126
Signed-off-by: Niklas Hambüchen <mail@nh2.me>
2018-03-24 05:11:02 +00:00
Niklas Hambüchen
0056feaa21 python: Remove all uses of find_library. Fixes #1450593
`find_library()` doesn't consider LD_LIBRARY_PATH on Python < 3.6.
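
One way to avoid find_library() is to hand the soname straight to
ctypes.CDLL and let the dynamic loader resolve it, which on Linux takes
LD_LIBRARY_PATH into account. The sketch below assumes that approach; the
helper name and the injectable loader parameter are illustrative and not
taken from the patch.

```python
import ctypes


def load_shared_library(soname, loader=ctypes.CDLL):
    """Load a shared library by soname without calling find_library().

    The name is passed unchanged to the loader, so path resolution is left
    to the platform's dynamic loader. use_errno=True keeps errno available
    through ctypes.get_errno() after foreign calls.
    """
    return loader(soname, use_errno=True)
```

A call such as load_shared_library("libexample.so.0") would then work with a
library found only through LD_LIBRARY_PATH; the soname here is a placeholder.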

Change-Id: Iee26085cb5d14061001f19f032c2664d69a378a8
BUG: 1450593
Signed-off-by: Niklas Hambüchen <mail@nh2.me>
2018-03-24 05:10:31 +00:00
Niklas Hambüchen
aaa4e373f3 glusterfind: Show C function when raising ChangelogException
Fixes: #432

Change-Id: I9ab031e098aff717e619d9deb6410281b96de14a
Signed-off-by: Niklas Hambüchen <mail@nh2.me>
2018-03-24 05:09:37 +00:00
Amar Tumballi
a69e8a664a rfc.sh: provide a unified way to update bugs or github issues ID
Change-Id: Ie78d87b31512da6201ae26f3d391fa3f8e5b68d1
fixes: bz#1545891
Signed-off-by: Amar Tumballi <amarts@redhat.com>
2018-03-22 15:01:38 +00:00
Csaba Henk
5b46d55660 client: make fuse direct I/O strategies explicit
So far the --direct-io-mode option has been presented
as being Boolean-valued. That is, however, not exact,
as a third behavior is chosen if the option is not
specified.

We now accept the "auto" value as an explicit choice
for the default heuristics, and indicate in the
descriptions of the option (which occur in commandline
help and in the glusterfs / mount.glusterfs man pages)
that auto is the default.

The default heuristics were briefly described in the
commandline help. We are getting rid of that, because:
- it's not the right place to provide such details;
- there is no guarantee of keeping the current heuristics
  so it might go out of sync with reality;
- that is already the case to some degree, because the
  description did not take into account that the default
  heuristics varies between platforms (on Mac, it's just
  "off"), and that xlators can also prescribe direct I/O
  for the file of their choice (see change
  I3fe3312cd96baa4eecfe1247ab7255b4f455f049).

Change-Id: Ia83479c0c67fe66b7fc2e0e8db5b7792d9f44b28
Signed-off-by: Csaba Henk <csaba@redhat.com>
2018-03-22 04:27:11 +00:00
Milind Changire
286871f550 rpcsvc: enable ownthread feature for glusterfs4_0_fop_prog
Ownthread feature needs enabling for glusterfs4_0_fop_prog

Change-Id: Idce63eb094ae0fdfcddbd52d0dee25aa0e074926
BUG: 1559075
Signed-off-by: Milind Changire <mchangir@redhat.com>
2018-03-22 02:49:34 +00:00
Niklas Hambüchen
ba87963b76 socket: Improve error logging when loading SSL files fails
* Say which file had the problem
* Dump openssl error stack

Fixes gluster/glusterfs#431.

Change-Id: I66e9a0ae7758e9d7d8a5f19cc8ff898f01f2b491
Signed-off-by: Niklas Hambüchen <mail@nh2.me>
2018-03-21 18:44:53 +01:00
Xavi Hernandez
b5f307fa5e cluster/ec: fix SHD crash for null gfid's
When the self-heal daemon is doing a full sweep it uses readdirp to
get extra stat information from each file. This information is
obtained in two steps by the posix xlator: first the directory is
read to get the entries and then each entry is stated to get additional
info. Between these two steps, it's possible that the file is removed
by the user, so we'll get an error, leaving stat info empty.

EC's heal daemon was using the gfid blindly, causing an assert failure
when protocol/client was trying to encode the gfid.

To fix the problem a check has been added. If we detect a null gfid, we
simply ignore it and continue healing.
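
The check amounts to filtering out entries whose gfid is all zeroes before
they are handed to the healer. A Python sketch of that filter follows; the
entry layout as (name, gfid) pairs and the function name are assumptions
made for illustration, not the EC data structures.

```python
NULL_GFID = bytes(16)  # a gfid is 16 bytes; all zeroes means "no stat info"


def entries_to_heal(entries):
    """Yield (name, gfid) pairs, skipping entries with a null gfid.

    A null gfid is what the sweep sees when the file disappeared between
    reading the directory and stat'ing the entry.
    """
    for name, gfid in entries:
        if gfid == NULL_GFID:
            continue  # ignore it and continue healing the rest
        yield name, gfid
```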

Change-Id: I2e4acdcecd0b6951055e50d1c37d686a2186a228
BUG: 1558016
Signed-off-by: Xavi Hernandez <xhernandez@redhat.com>
2018-03-21 16:27:36 +00:00
Pranith Kumar K
448dec703d cluster/afr: Switch to active-fd-count for open-fd checks
BUG: 1557932
Change-Id: I3783e41b3812267bc10c0d05d062a31396ce135b
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
2018-03-21 10:36:32 +05:30
Pranith Kumar K
2da6650dfa storage/posix: Add active-fd-count option in gluster
Problem:
When dd runs on a sharded replicate volume, all the writes on shards happen
through anon-fd. When the writes don't come quickly enough, the old anon-fd is
closed and a new fd is created to serve the new writes. open-fd-count is
decremented only after the fd is closed as part of fd_destroy(). So even when
one fd is on the way to being closed, a new fd will be created, and during this
short period it appears as though there are multiple fds opened on the file.
AFR thinks another application opened the same file and switches off
eager-lock, leading to extra latency.

Fix:
Have a different option called active-fd whose life cycle starts at
fd_bind() and ends just before fd_destroy()
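
A toy model of the two counters makes the window visible. It is a Python
sketch under stated assumptions: both counters are bumped at fd_bind(), the
active count drops just before destruction starts, and the open count drops
only once destruction has finished. The class and method names are
hypothetical.

```python
class InodeFdCounters:
    """Toy model of open-fd-count versus active-fd-count on one inode."""

    def __init__(self):
        self.open_fd_count = 0
        self.active_fd_count = 0

    def fd_bind(self):
        # A new fd starts serving I/O: both counters go up.
        self.open_fd_count += 1
        self.active_fd_count += 1

    def before_fd_destroy(self):
        # The fd no longer serves I/O, but teardown has not completed yet.
        self.active_fd_count -= 1

    def after_fd_destroy(self):
        # Teardown finished: only now does the open count drop.
        self.open_fd_count -= 1
```

If an old anon-fd is between before_fd_destroy() and after_fd_destroy()
while a new one is bound, the open count reads 2 but the active count
reads 1, so a check based on the active count would not see a second user.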

BUG: 1557932
Change-Id: I2e221f6030feeedf29fbb3bd6554673b8a5b9c94
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
2018-03-21 10:36:31 +05:30
Ashish Pandey
ade6262cb6 cluster/ec: Add test cases for stripe-cache option
Change-Id: I1508a336a7a927b389a19815ef57001cdf29b109
BUG: 1558074
Signed-off-by: Ashish Pandey <aspandey@redhat.com>
2018-03-20 19:07:15 +00:00
Pranith Kumar K
2a326ad32e features/shard: Do list_del_init() while list memory is valid
Problem:
shard_post_lookup_fsync_handler() goes over the list of inode-ctx that need to
be fsynced and in cbk it removes each of the inode-ctx from the list. When the
first member of the list is removed, it tries to modify the list head's memory
with the latest next/prev, and when this happens there is no guarantee that the
list head, which is in the stack memory of shard_post_lookup_fsync_handler(),
is still valid.

Fix:
Do list_del_init() in the loop before winding fsync.

BUG: 1557876
Change-Id: If429d3634219e1a435bd0da0ed985c646c59c2ca
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
2018-03-20 08:57:37 +00:00
Sunny Kumar
ccd7825334 georep : Pause/Resume of geo-replication with wrong user
Performing pause/resume on geo-replication with the wrong user
(a user other than the one used for setup) always returned success.
This further led to snapshot creation failure, as snapshot creation
detected an active geo-replication session.

Change-Id: I6e96e8dd3e861348b057475387f0093cb903ae88
BUG: 1550936
Signed-off-by: Sunny Kumar <sunkumar@redhat.com>
2018-03-20 07:27:16 +00:00
Mohit Agrawal
cf06dd5440 glusterd: TLS verification fails while using intermediate CA
Problem: TLS verification fails while using an intermediate CA
         if mgmt SSL is enabled.

Solution: There are two main causes of the TLS verification failure:
          1) ssl_api is not called to set cert_depth.
          2) The current code does not allow setting the certificate depth
             while mgmt SSL is enabled.
          After applying this patch, to set the certificate depth the user
          needs to set the option transport.socket.ssl-cert-depth <depth>
          in /var/lib/glusterd/secure-access instead of setting it in
          /etc/glusterfs/glusterd.vol. When secure_mgmt is set in ctx,
          we check the value of cert-depth and save it in ctx. If the user
          does not provide a value for cert-depth, the default value of 1
          is used.
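
The lookup-with-default behaviour can be sketched as a small parser over the
file contents. This Python sketch only mirrors the option line and the
default of 1 given above; the function name and the exact parsing rules are
assumptions, not the glusterd code.

```python
DEFAULT_CERT_DEPTH = 1
OPTION_NAME = "transport.socket.ssl-cert-depth"


def read_cert_depth(text):
    """Return the cert depth from lines of the form
    'option transport.socket.ssl-cert-depth <depth>', or the default of 1
    when no such line is present."""
    for line in text.splitlines():
        words = line.split()
        if len(words) == 3 and words[0] == "option" and words[1] == OPTION_NAME:
            return int(words[2])
    return DEFAULT_CERT_DEPTH
```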

BUG: 1555154
Change-Id: I89e9a9e1026e37efb5c20f9ec62b1989ef644f35
Signed-off-by: Mohit Agrawal <moagrawa@redhat.com>
2018-03-19 19:00:03 +00:00
Sven Fischer
de52876407 cleanup: xlator_t structure's 'client_latency' variable is not used
- Removed unused struct member and its one time usage.
  - cleaned up wrong white space

member 'client_latency' was not used otherwise since it was added by

commit 07cc8679cdf3b29680f4f105d0222da168d8bfc1
Author: Kevin Vigor <kvigor@fb.com>
Date:   Tue Mar 21 08:23:25 2017 -0700

    Halo Replication feature for AFR translator

Change-Id: Ibb0ea828d4090bbe8897f6af326b317884162a00
BUG: 1495153
Signed-off-by: Sven Fischer <sven@fischer-abc.de>
2018-03-19 03:30:31 +00:00
Gaurav Yadav
97233b3f69 glusterd: glusterd crash in gd_mgmt_v3_unlock_timer_cbk
The same pointer was cleaned up twice inside gd_mgmt_v3_unlock_timer_cbk,
causing glusterd to crash.

Change-Id: I9147241d995780619474047b1010317a89b9965a
BUG: 1550339
2018-03-15 10:30:56 +05:30
Pranith Kumar K
346714305f cluster/afr: Make AFR eager-locking similar to EC
Problem:
1) Afr's eager-lock only works for data transactions.
2) When there are conflicting writes, write with conflicting region initiates
unlock of eager-lock leading to extra pre-ops and post-ops on the file. When
eager-lock goes off, it leads to extra fsyncs for random-write workload in afr.

Solution (that is modeled after EC):
In EC, when there is a conflicting write, it waits for the current write to
complete before it winds the conflicted write. This leads to better utilization
of network and disk, because we will not be doing extra xattrops and FSYNCs and
inodelk/unlock. Moved fd based counters to inode based counters.

I tried to model the solution based on EC's locking, but it is not similar to
AFR because we had to keep backward compatibility.

Lifecycle of lock:
==================
First transaction is added to inode->owners list and an inodelk will be sent on
the wire. All the next transactions will be put in inode->waiters list until
the first transaction completes inodelk and [f]xattrop completely. Once
[f]xattrop also completes, all the requests in the inode->waiters list are
checked for conflicts with any of the existing locks in the inode->owners
list; those that do not conflict are added to inode->owners list and resume
their transactions. When these transactions complete the fop phase, they are
moved to inode->post_op list and the transactions that were paused because of
conflicts are resumed. Post-op and unlock will not be issued on the wire until
the last transaction on that inode. The last transaction, when it has to
perform post-op, can choose to sleep for the delayed-post-op-secs value. During
that time, if any other transaction comes, it wakes up the sleeping transaction
and takes over ownership of the lock, and the cycle continues. If
delayed-post-op-secs expires, the timer thread wakes up the sleeping
transaction, which sets lock->release to true and starts doing post-op and
then unlock. During this time, if any other transactions come, they are put
in inode->frozen list. Once the previous unlock completes, the frozen list is
moved to the waiters list, the first element of the waiters list is moved to
the owners list, the lock is attempted and the cycle continues. This is the
general idea. There is logic at the time of delaying, at the time of a new
transaction, and in the flush fop to wake up existing sleeping transactions or
to choose whether to delay a transaction, which is subject to change based
on future enhancements.
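
The owners/waiters part of this lifecycle can be sketched in a few lines of
Python. This is a heavily simplified model: transactions are plain
(start, end) byte ranges, any overlap counts as a conflict, and the
inodelk/[f]xattrop phase, the post_op list, the delayed post-op timer and
the frozen list are all left out. The class and method names are
hypothetical.

```python
def conflicts(a, b):
    """Two half-open (start, end) byte ranges conflict if they overlap."""
    return a[0] < b[1] and b[0] < a[1]


class InodeLock:
    """Simplified owners/waiters bookkeeping for one inode."""

    def __init__(self):
        self.owners = []
        self.waiters = []

    def submit(self, txn):
        """Admit txn as an owner unless it conflicts with a current owner."""
        if any(conflicts(txn, owner) for owner in self.owners):
            self.waiters.append(txn)
            return False
        self.owners.append(txn)
        return True

    def complete(self, txn):
        """Finish an owner and resume waiters that no longer conflict."""
        self.owners.remove(txn)
        resumed, still_waiting = [], []
        for waiter in self.waiters:
            if any(conflicts(waiter, owner) for owner in self.owners):
                still_waiting.append(waiter)
            else:
                self.owners.append(waiter)
                resumed.append(waiter)
        self.waiters = still_waiting
        return resumed
```

Two non-overlapping writes proceed together, while an overlapping write
waits until the owner it conflicts with completes.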

Fixes: #418
BUG: 1549606
Change-Id: I88b570bbcf332a27c82d2767dfa82472f60055dc
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
2018-03-14 13:32:35 +00:00
Ashish Pandey
f32f85c4e6 cluster/ec: Change default read policy to gfid-hash
Problem:
Whenever we read data from a file over NFS, NFS reads
more data than requested and caches it. Based on the
stat information, it checks whether the cached/pre-read
data is still valid.

Consider a 4 + 2 EC volume with all the bricks on
different nodes.

In EC, with the round-robin read policy, reads are sent to
different sets of data bricks. This balances the
read fops across all the bricks and avoids overloading
the same set of bricks.

Due to small differences in clock speed, it is possible
that we get minor differences in atime, mtime or ctime
between bricks. That might cause a different stat to be
returned to NFS, based on which NFS will discard
cached/pre-read data that has not actually changed and
could be used.

Solution:
Change the read policy for EC to gfid-hash. That will force
all the reads for a file to go to the same set of bricks.
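
A sketch of the idea in Python, assuming the 4 + 2 layout from the example:
the starting brick is derived from a hash of the gfid, so repeated reads of
one file always land on the same bricks. zlib.crc32 is a stand-in hash
chosen for the illustration; the hash EC actually uses is not stated here,
and the function name is hypothetical.

```python
import zlib


def read_bricks(gfid, total_bricks=6, data_bricks=4):
    """Pick the bricks to read from for a file, based on its gfid.

    The same gfid always yields the same starting brick, hence the same
    set of bricks and a consistent stat for that file.
    """
    start = zlib.crc32(gfid) % total_bricks
    return [(start + i) % total_bricks for i in range(data_bricks)]
```

Different files still spread over different bricks, so load balancing is
kept across files while each single file stays on one set.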

Change-Id: I825441cc519e94bf3dc3aa0bd4cb7c6ae6392c84
BUG: 1554743
Signed-off-by: Ashish Pandey <aspandey@redhat.com>
2018-03-14 11:52:05 +05:30
Varsha Rao
a4e34af38d tests/basic/namespace: Fix the namespace test failure
In the jenkins regression test brick multiplexing is enabled by
is_brick_mx_enabled function and not by setting cluster.brick-multiplex
option. Hence check the count of bricks and its logs, this fixes the
failure.

Change-Id: Ibb2ed8fbffd3765f283da741689304a5579d447c
BUG: 1555167
Signed-off-by: Varsha Rao <varao@redhat.com>
2018-03-14 10:53:20 +05:30
Xavi Hernandez
7f81067f45 cluster/ec: avoid delays in self-heal
Self-heal creates a thread per brick to sweep the index looking for
files that need to be healed. These threads are started before the
volume comes online, so nothing is done but waiting for the next
sweep. This happens once per minute.

When a replace brick command is executed, the new graph is loaded and
all index sweeper threads started. When all bricks have reported, a
getxattr request is sent to the root directory of the volume. This
causes a heal on it (because the new brick doesn't have good data),
and marks its contents as pending to be healed. This is done by the
index sweeper thread on the next round, one minute later.

This patch solves this problem by waking all index sweeper threads
after a successful check on the root directory.

Additionally, the index sweep thread scans the index directory
sequentially, but it might happen that after healing a directory entry
more index entries are created but skipped by the current directory
scan. This causes the remaining entries to be processed on the next
round, one minute later. The same can happen in the next round, so
the heal runs in bursts and takes a long time to finish, especially
on volumes with many directory levels.

This patch solves this problem by immediately restarting the index
sweep if a directory has been healed.
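
The restart rule can be sketched as a loop that keeps sweeping until a full
pass heals no directory. This Python sketch uses hypothetical callbacks:
scan_index() returns the current index entries and heal(entry) returns True
when the entry was a directory that got healed.

```python
def sweep_until_stable(scan_index, heal):
    """Sweep the index repeatedly, restarting at once whenever a pass healed
    a directory, instead of waiting for the next timed round.

    Returns the number of passes that were needed.
    """
    passes = 0
    while True:
        passes += 1
        healed_directory = False
        for entry in scan_index():
            if heal(entry):
                healed_directory = True
        if not healed_directory:
            return passes
```

Healing a directory may add new index entries for its children; the
immediate extra pass picks those up without the one-minute delay.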

Change-Id: I58d9ab6ef17b30f704dc322e1d3d53b904e5f30e
BUG: 1547662
Signed-off-by: Xavi Hernandez <jahernan@redhat.com>
2018-03-14 03:12:27 +00:00
Raghavendra G
fe52fc33d0 tests/bug-1110262.t: fix a race condition
This test does:

1. mount a volume
2. kill a brick in the volume
3. mkdir (/somedir)

In my local tests and in [1], I see that mkdir in step 3 fails because
there is no dht-layout on root directory.

The reason I think is by the time first lookup on "/" hit dht, a brick
was killed as per step 2. This means layout was not healed for "/" and
since this is a new volume, no layout is present on it. Note that the
first lookup done on "/" by fuse-bridge is not synchronized with
parent process of daemonized glusterfs mount completing. IOW, by the
time glusterfs cmd executed there is no guarantee that lookup on "/"
is complete. So, if step 2 races ahead of fuse_first_lookup on "/", we
end up with an invalid dht-layout on "/" resulting in failures.

Doing an operation like ls makes sure that lookup on "/" is completed
before we kill a brick.

Change-Id: Ie0c4e442c4c629fad6f7ae850437e3d63fe4bea9
Signed-off-by: Raghavendra G <rgowdapp@redhat.com>
BUG: 1543279
2018-03-13 13:01:49 +00:00
Sven Fischer
c64fa14965 run-tests.sh: added dependency check for netstat
Because bug-924726.t depends on netstat, tests failed before. This got resolved
by adding respective check to run-tests.sh.
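
The same kind of dependency check, expressed in Python for illustration
(run-tests.sh itself is a shell script, and this is not its code). The
function name is hypothetical; shutil.which is the standard library lookup
for an executable on PATH.

```python
import shutil


def missing_commands(commands):
    """Return the commands from the list that cannot be found on PATH."""
    return [cmd for cmd in commands if shutil.which(cmd) is None]
```

A test runner could call missing_commands(["netstat"]) up front and refuse
to start, with a clear message, if the result is not empty.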

Enabled respective test again.

Change-Id: I70c9bff03379ed9ee8cd95842c3501dfb50b8e86
BUG: 1312830
Signed-off-by: Sven Fischer <sven@fischer-abc.de>
2018-03-12 23:19:24 +01:00
N Balachandran
a96c7e748f cluster/dht: Skipped files are not treated as errors
For skipped files, use a return value of 1 to prevent
error messages being logged.

Change-Id: I18de31ac1a64d4460e88dea7826c3ba03c895861
BUG: 1553598
Signed-off-by: N Balachandran <nbalacha@redhat.com>
2018-03-12 10:15:52 +00:00
Milind Changire
0c3d984287 rpcsvc: correct event-thread scaling
Problem:
Auto thread count derived from the number of attaches and detaches
was reset to 1 when server_reconfigure() was called.

Solution:
Avoid auto-thread-count reset to 1.

Change-Id: Ic00e86adb81ba3c828e354a6ccb638209ae58b3e
BUG: 1547888
Signed-off-by: Milind Changire <mchangir@redhat.com>
2018-03-12 08:48:42 +00:00
ShyamsundarR
ece3f0f669 protocol: Fix 4.0 client, parsing older iatt in dict
In a mixed mode cluster involving 4.0 and older 3.x bricks, if
clients are newer, then the iatt encoded in the dictionary can be
of the older iatt format, which a newer client will map incorrectly
to the newer structure.

This causes failures in FOPs that depend on this iatt for some
functionality (seen in mkdir operations failing as EIO, when DHT
hits its internal setxattr call).

The fix provided is to convert the iatt in the dict, based on which
RPC version is used to communicate with the server.

IOW, this is the reverse of change in commit "b966c7790e"

Tested using a mixed mode cluster (i.e., bricks in 3.12 and 4.0 versions)
and a mixed set of clients, 3.12 and 4.0 clients.

There is no regression test provided, as this needs a mixed mode cluster
to test and validate.

Change-Id: I454e54651ca836b9f7c28f45f51d5956106aefa9
BUG: 1554053
Signed-off-by: ShyamsundarR <srangana@redhat.com>
2018-03-10 23:12:48 -05:00
ShyamsundarR
b966c7790e protocol: Added iatt conversion to older format
Added iatt conversion to an older format, when dealing with
older RPC versions. This enables iatt structure conformance
when dealing with older clients.

This helps fix rolling upgrade from 3.x versions to 4.0 version
of gluster by sending the right iatt in the dictionary when DHT
requests the same.

Change-Id: Ieaf925f81f8c7798a8fba1e90a59fa9dec82856c
BUG: 1544699
Signed-off-by: ShyamsundarR <srangana@redhat.com>
2018-03-10 18:08:53 +00:00
Xavi Hernandez
157e55fe43 protocol/client: fix memory corruption
There was an issue when some accesses to saved_fds list were
protected by the wrong mutex (lock instead of fd_lock).

Additionally, the retrieval of fdctx from fd's context and any
checks done on it have also been protected by fd_lock to avoid
fdctx to become outdated just after retrieving it.

Change-Id: If2910508bcb7d1ff23debb30291391f00903a6fe
BUG: 1553129
Signed-off-by: Xavi Hernandez <xhernandez@redhat.com>
2018-03-09 23:31:29 +01:00
Amar Tumballi
940f870f47 core: provide infra to make any xlator pass-through
updates: #304

Change-Id: If6a13d2e56b195390a386d720103a882e077f66c
Signed-off-by: Amar Tumballi <amarts@redhat.com>
2018-03-09 18:32:56 +00:00
Amar Tumballi
b2613c9eed tests: don't kill the process directly with KILL signal
Instead, send SIGTERM (the default, 15) first, and send SIGKILL
only at the end. If SIGKILL is sent directly, tools such as
valgrind and lcov are not able to process their information
properly.
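
The "SIGTERM first, SIGKILL last" pattern can be sketched with Python's
subprocess module. This is an illustration of the pattern, not the test
framework's shell code; the function name and the five-second grace period
are assumptions.

```python
import subprocess


def stop_process(proc, grace_seconds=5.0):
    """Ask a process to exit with SIGTERM and fall back to SIGKILL.

    proc is a subprocess.Popen object. On POSIX, terminate() sends SIGTERM,
    which gives the process a chance to flush its data; kill() sends
    SIGKILL and is used only if the grace period runs out.
    """
    proc.terminate()
    try:
        proc.wait(timeout=grace_seconds)
    except subprocess.TimeoutExpired:
        proc.kill()
        proc.wait()
    return proc.returncode
```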

BUG: 1549000
Change-Id: I664de12ee7dbf47eb98b8141004cd51f6006b314
Signed-off-by: Amar Tumballi <amarts@redhat.com>
2018-03-08 11:15:01 +01:00
Atin Mukherjee
2a1adc5c93 hooks: fix workdir in S13create-subdir-mounts.sh
Change-Id: Id3eff498091ad9fa4651e93b66903426e76776d6
BUG: 1549915
Signed-off-by: Atin Mukherjee <amukherj@redhat.com>
2018-03-07 07:39:28 +00:00
Ravishankar N
bd2c45fe31 glusterd: volume get fixes for client-io-threads & quorum-type
1. If a replica volume created on glusterfs-3.8 was upgraded to
glusterfs-3.12, `gluster vol get volname client-io-threads` displayed
'on' even though it wasn't and the xlator wasn't loaded on
the client-graph. This was due to removing certain checks in
glusterd_get_default_val_for_volopt as a part of commit
47604fad4c2a3951077e41e0c007ceb979bb2c24. Fix it.

2. Also, as a part of op-version bump-up, client-io-threads was being
loaded on the clients during volfile regeneration. Prevent it.

3. AFR assumes quorum-type to be auto in newly created replica 3 (odd
replica in general) volumes but `gluster vol get quorum-type` displays
'none'. Fix it.

Change-Id: I19e586361ed1065c70fb378533d3b4dac1095df9
BUG: 1545056
Signed-off-by: Ravishankar N <ravishankar@redhat.com>
2018-03-07 04:48:09 +00:00
Amar Tumballi
685d4409f9 hooks: add a script to stat the subdirs in add-brick
The subdirectories are expected to be present for a subdir
mount to be successful. If not, the client_handshake()
itself fails to succeed. When a volume is about to get
mounted for the first time, this is easier to handle, as if the
directory is not present in one brick, then it's mostly
not present in any other brick. In case of add-brick,
the directory is not present in new brick, and there is
no chance of healing it from the subdirectory mount, as
in those clients, the subdir itself will be 'root' ('/')
of the filesystem. Hence we need a volume mount to heal
the directory before connections can succeed.

This patch does take care of that by healing the directories
which are expected to be mounted as subdirectories from the
volume level mount point.

Change-Id: I2c2ac7b7567fe209aaa720006d09b68584d0dd14
BUG: 1549915
Signed-off-by: Amar Tumballi <amarts@redhat.com>
2018-03-06 14:41:44 +00:00
Pranith Kumar K
51d3490798 cluster/afr: Remove unused code paths
Removed
1) afr-v1 self-heal locks related code which is not used anymore
2) transaction has some data types that are not needed, so removed them
3) Never used lock tracing available in afr as gluster's network tracing does
the job. So removed that as well.
4) Changelog is always enabled and afr is always used with locks, so
__changelog_enabled, afr_lock_server_count etc functions can be deleted.
5) transaction.fop/done/resume always call the same functions, so no need
to have these variables.

BUG: 1549606
Change-Id: I370c146fec2892d40e674d232a5d7256e003c7f1
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
2018-03-06 08:49:31 +00:00
Gaurav Yadav
4511b45bf4 glusterd : memory leak in mgmt_v3 lock functionality
In order to take care of the stale lock issue, a timer was introduced
in mgmt_v3 lock. This timer was not freeing its memory, which
introduced this leak.

With this fix, memory cleanup in locking is handled properly.

Change-Id: I2e1ce3ebba3520f7660321f3d97554080e4e22f4
BUG: 1550339
Signed-off-by: Gaurav Yadav <gyadav@redhat.com>
2018-03-06 08:05:24 +00:00
Pranith Kumar K
9be043159a cluster/afr: Remove compound-fops usage in afr
We are not seeing much improvement with this change. So removing the
feature so that it doesn't need to be maintained anymore.

Fixes: #414
Change-Id: Ic7969b151544daf2547bd262a9fa03f575626411
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
2018-03-06 07:55:25 +00:00