11686 Commits

Author SHA1 Message Date
Pranith Kumar K
346714305f cluster/afr: Make AFR eager-locking similar to EC
Problem:
1) AFR's eager-lock only works for data transactions.
2) When there are conflicting writes, the write with the conflicting region
initiates an unlock of the eager-lock, leading to extra pre-ops and post-ops on
the file. When eager-lock goes off, it leads to extra fsyncs for random-write
workloads in AFR.

Solution (that is modeled after EC):
In EC, when there is a conflicting write, it waits for the current write to
complete before it winds the conflicted write. This leads to better utilization
of network and disk, because we will not be doing extra xattrops and FSYNCs and
inodelk/unlock. Moved fd based counters to inode based counters.

I tried to model the solution on EC's locking, but it is not identical to EC's
because AFR had to keep backward compatibility.

Lifecycle of lock:
==================
The first transaction is added to the inode->owners list and an inodelk is sent
on the wire. All subsequent transactions are put in the inode->waiters list
until the first transaction completes both inodelk and [f]xattrop. Once
[f]xattrop completes, each request in the inode->waiters list is checked for
conflicts with the existing locks in the inode->owners list; those that do not
conflict are added to the inode->owners list and resume their transaction. When
these transactions complete the fop phase they are moved to the inode->post_op
list, and the transactions that were paused because of conflicts are resumed.
Post-op and unlock are not issued on the wire until the last transaction on
that inode. When the last transaction has to perform post-op, it can choose to
sleep for delayed-post-op-secs. If any other transaction arrives during that
time, it wakes up the sleeping transaction and takes over ownership of the
lock, and the cycle continues. If delayed-post-op-secs expires, the timer
thread wakes up the sleeping transaction, which sets lock->release to true and
starts post-op followed by unlock. Any transactions that arrive during this
time are put in the inode->frozen list. Once the previous unlock completes, the
frozen list is moved to the waiters list, the first element of the waiters list
is moved to the owners list, the lock is attempted, and the cycle continues.
This is the general idea. There is additional logic, at the time of delaying,
on arrival of a new transaction, or in the flush fop, to wake up existing
sleeping transactions or to choose whether to delay a transaction; it is
subject to change based on future enhancements.
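
The owners/waiters part of this lifecycle can be sketched roughly as follows.
This is only an illustration under simplifying assumptions, not the AFR code:
the names Txn, InodeLock, add, lock_done and fop_done are hypothetical, byte
ranges stand in for lock regions, and the post_op and frozen lists and the
delayed post-op timer are left out.

```python
from dataclasses import dataclass, field


@dataclass
class Txn:
    name: str
    start: int  # first byte offset touched
    end: int    # last byte offset touched (inclusive)


def conflicts(a, b):
    """Two byte ranges conflict when they overlap."""
    return a.start <= b.end and b.start <= a.end


@dataclass
class InodeLock:
    acquired: bool = False                       # inodelk + [f]xattrop finished
    owners: list = field(default_factory=list)   # transactions allowed to run
    waiters: list = field(default_factory=list)  # transactions parked for now

    def add(self, txn):
        if not self.acquired and not self.owners:
            self.owners.append(txn)  # first transaction: it takes the lock
            return
        self.waiters.append(txn)
        if self.acquired:
            self._wake_waiters()

    def lock_done(self):
        """Called once the first transaction holds the lock."""
        self.acquired = True
        self._wake_waiters()

    def fop_done(self, txn):
        """A finished fop no longer blocks overlapping waiters."""
        self.owners.remove(txn)
        self._wake_waiters()

    def _wake_waiters(self):
        still_waiting = []
        for txn in self.waiters:
            if any(conflicts(txn, owner) for owner in self.owners):
                still_waiting.append(txn)
            else:
                self.owners.append(txn)
        self.waiters = still_waiting
```

In this sketch a conflicting write simply stays in waiters until the owner it
overlaps finishes its fop, instead of forcing an unlock.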

Fixes: #418
BUG: 1549606
Change-Id: I88b570bbcf332a27c82d2767dfa82472f60055dc
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
2018-03-14 13:32:35 +00:00
Ashish Pandey
f32f85c4e6 cluster/ec: Change default read policy to gfid-hash
Problem:
Whenever we read data from a file over NFS, NFS reads
more data than requested and caches it. Based on the
stat information, it decides whether the cached/pre-read
data is still valid.

Consider a 4 + 2 EC volume where all the bricks are on
different nodes.

In EC, with the round-robin read policy, successive reads are
sent to different sets of data bricks. This balances the
read fops across all the bricks and avoids overloading
the same set of bricks.

Due to small differences in clock speed, it is possible
to get minor differences in atime, mtime or ctime
between bricks. That can cause a different stat to be
returned to NFS, based on which NFS will discard
cached/pre-read data that has not actually changed and
could still be used.

Solution:
Change the default read policy for EC to gfid-hash. That
forces all reads of a file to go to the same set of bricks.
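
The difference between the two policies can be sketched as below. This is a
hedged illustration only: the function names are hypothetical, and crc32 is
just a stand-in for whatever hash EC actually applies to the gfid.

```python
import itertools
import zlib


def round_robin_policy(bricks, k):
    """Yield a window of k bricks that shifts by one on every read."""
    n = len(bricks)
    for start in itertools.count():
        yield [bricks[(start + i) % n] for i in range(k)]


def gfid_hash_policy(bricks, k, gfid):
    """Derive the window start from the gfid, so it is fixed per file."""
    n = len(bricks)
    start = zlib.crc32(gfid.encode()) % n  # stand-in hash (assumption)
    return [bricks[(start + i) % n] for i in range(k)]
```

With round-robin, two consecutive reads of the same file can be answered by
different bricks, and hence carry slightly different timestamps; with the
gfid-based choice the same bricks answer every time.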

Change-Id: I825441cc519e94bf3dc3aa0bd4cb7c6ae6392c84
BUG: 1554743
Signed-off-by: Ashish Pandey <aspandey@redhat.com>
2018-03-14 11:52:05 +05:30
Varsha Rao
a4e34af38d tests/basic/namespace: Fix the namespace test failure
In the Jenkins regression tests, brick multiplexing is enabled via the
is_brick_mx_enabled function and not by setting the cluster.brick-multiplex
option. Hence check the count of bricks and their logs; this fixes the
failure.

Change-Id: Ibb2ed8fbffd3765f283da741689304a5579d447c
BUG: 1555167
Signed-off-by: Varsha Rao <varao@redhat.com>
2018-03-14 10:53:20 +05:30
Xavi Hernandez
7f81067f45 cluster/ec: avoid delays in self-heal
Self-heal creates a thread per brick to sweep the index looking for
files that need to be healed. These threads are started before the
volume comes online, so they do nothing but wait for the next
sweep, which happens once per minute.

When a replace brick command is executed, the new graph is loaded and
all index sweeper threads are started. When all bricks have reported, a
getxattr request is sent to the root directory of the volume. This
causes a heal on it (because the new brick doesn't have good data),
and marks its contents as pending to be healed. That pending heal is
only picked up by the index sweeper thread on the next round, one
minute later.

This patch solves this problem by waking all index sweeper threads
after a successful check on the root directory.

Additionally, the index sweep thread scans the index directory
sequentially, but it might happen that after healing a directory entry
more index entries are created but skipped by the current directory
scan. This causes the remaining entries to be processed on the next
round, one minute later. The same can happen in the next round, so
the heal runs in bursts and takes a long time to finish, especially
on volumes with many directory levels.

This patch solves this problem by immediately restarting the index
sweep if a directory has been healed.
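
A toy model of the restart logic, assuming a hypothetical list_index()
callback that returns the current index entries and a heal() callback that
reports whether the healed entry was a directory. It is a sketch of the idea,
not the EC self-heal daemon code.

```python
def sweep(list_index, heal):
    """Rescan immediately while any directory was healed in the last pass.

    Returns the number of passes that were needed.
    """
    passes = 0
    while True:
        passes += 1
        healed_dir = False
        for entry in list_index():
            if heal(entry):
                healed_dir = True
        if not healed_dir:
            return passes  # nothing new can have appeared; wait for next round


# Toy volume: healing a directory adds its children to the index.
tree = {"/": ["/a"], "/a": ["/a/f"]}
index = ["/"]


def list_index():
    return list(index)  # snapshot, like one sequential directory scan


def heal(entry):
    index.remove(entry)
    children = tree.get(entry)
    if children is None:
        return False  # a plain file: no new index entries
    index.extend(children)
    return True


passes = sweep(list_index, heal)
```

Without the immediate restart, each extra pass in this model would cost a
one-minute wait.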

Change-Id: I58d9ab6ef17b30f704dc322e1d3d53b904e5f30e
BUG: 1547662
Signed-off-by: Xavi Hernandez <jahernan@redhat.com>
2018-03-14 03:12:27 +00:00
Raghavendra G
fe52fc33d0 tests/bug-1110262.t: fix a race condition
This test does:

1. mount a volume
2. kill a brick in the volume
3. mkdir (/somedir)

In my local tests and in [1], I see that mkdir in step 3 fails because
there is no dht-layout on root directory.

I think the reason is that by the time the first lookup on "/" hit dht, a
brick had already been killed as per step 2. This means the layout was not
healed for "/", and since this is a new volume, no layout is present on it.
Note that the first lookup done on "/" by fuse-bridge is not synchronized
with the completion of the parent process of the daemonized glusterfs mount.
IOW, by the time the glusterfs command has executed there is no guarantee
that the lookup on "/" is complete. So, if step 2 races ahead of
fuse_first_lookup on "/", we end up with an invalid dht-layout on "/",
resulting in failures.

Doing an operation like ls makes sure that the lookup on "/" is completed
before we kill a brick.

Change-Id: Ie0c4e442c4c629fad6f7ae850437e3d63fe4bea9
Signed-off-by: Raghavendra G <rgowdapp@redhat.com>
BUG: 1543279
2018-03-13 13:01:49 +00:00
Sven Fischer
c64fa14965 run-tests.sh: added dependency check for netstat
bug-924726.t depends on netstat, so the test failed when netstat was not
available. This is resolved by adding a check for netstat to run-tests.sh.

Re-enabled the test.

Change-Id: I70c9bff03379ed9ee8cd95842c3501dfb50b8e86
BUG: 1312830
Signed-off-by: Sven Fischer <sven@fischer-abc.de>
2018-03-12 23:19:24 +01:00
N Balachandran
a96c7e748f cluster/dht: Skipped files are not treated as errors
For skipped files, use a return value of 1 to prevent
error messages being logged.

Change-Id: I18de31ac1a64d4460e88dea7826c3ba03c895861
BUG: 1553598
Signed-off-by: N Balachandran <nbalacha@redhat.com>
2018-03-12 10:15:52 +00:00
Milind Changire
0c3d984287 rpcsvc: correct event-thread scaling
Problem:
The auto thread count derived from the number of attaches and detaches
was reset to 1 when server_reconfigure() was called.

Solution:
Avoid auto-thread-count reset to 1.

Change-Id: Ic00e86adb81ba3c828e354a6ccb638209ae58b3e
BUG: 1547888
Signed-off-by: Milind Changire <mchangir@redhat.com>
2018-03-12 08:48:42 +00:00
ShyamsundarR
ece3f0f669 protocol: Fix 4.0 client, parsing older iatt in dict
In a mixed mode cluster involving 4.0 and older 3.x bricks, if
clients are newer, then the iatt encoded in the dictionary can be
of the older iatt format, which a newer client will map incorrectly
to the newer structure.

This causes failures in FOPs that depend on this iatt for some
functionality (seen in mkdir operations failing as EIO, when DHT
hits its internal setxattr call).

The fix provided is to convert the iatt in the dict, based on which
RPC version is used to communicate with the server.

IOW, this is the reverse of change in commit "b966c7790e"

Tested using a mixed mode cluster (i.e. bricks in 3.12 and 4.0 versions)
and a mixed set of clients, 3.12 and 4.0 clients.

There is no regression test provided, as this needs a mixed mode cluster
to test and validate.

Change-Id: I454e54651ca836b9f7c28f45f51d5956106aefa9
BUG: 1554053
Signed-off-by: ShyamsundarR <srangana@redhat.com>
2018-03-10 23:12:48 -05:00
ShyamsundarR
b966c7790e protocol: Added iatt conversion to older format
Added iatt conversion to an older format, when dealing with
older RPC versions. This enables iatt structure conformance
when dealing with older clients.

This helps fix rolling upgrade from 3.x versions to 4.0 version
of gluster by sending the right iatt in the dictionary when DHT
requests the same.

Change-Id: Ieaf925f81f8c7798a8fba1e90a59fa9dec82856c
BUG: 1544699
Signed-off-by: ShyamsundarR <srangana@redhat.com>
2018-03-10 18:08:53 +00:00
Xavi Hernandez
157e55fe43 protocol/client: fix memory corruption
Some accesses to the saved_fds list were protected by the wrong
mutex (lock instead of fd_lock).

Additionally, the retrieval of fdctx from the fd's context and any
checks done on it are now also protected by fd_lock, to prevent
fdctx from becoming outdated just after retrieving it.
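
The pattern described here (look up the context and check it under the same
mutex that guards the list) might look like this in miniature. Only fd_lock
and saved_fds are names from the message; the class, the method and the
"released" flag are hypothetical.

```python
import threading


class ClientConf:
    def __init__(self):
        self.fd_lock = threading.Lock()  # guards saved_fds and the fd contexts
        self.saved_fds = {}

    def release_fd(self, fd):
        # Retrieval, check and update happen in one critical section, so
        # the context cannot go stale between the lookup and its use.
        with self.fd_lock:
            fdctx = self.saved_fds.get(fd)
            if fdctx is None or fdctx["released"]:
                return False
            fdctx["released"] = True
            del self.saved_fds[fd]
            return True
```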

Change-Id: If2910508bcb7d1ff23debb30291391f00903a6fe
BUG: 1553129
Signed-off-by: Xavi Hernandez <xhernandez@redhat.com>
2018-03-09 23:31:29 +01:00
Amar Tumballi
940f870f47 core: provide infra to make any xlator pass-through
updates: #304

Change-Id: If6a13d2e56b195390a386d720103a882e077f66c
Signed-off-by: Amar Tumballi <amarts@redhat.com>
2018-03-09 18:32:56 +00:00
Amar Tumballi
b2613c9eed tests: don't kill the process directly with KILL signal
Instead send SIGTERM (the default, 15) first, and send SIGKILL only
at the end. If SIGKILL is sent directly, tools like valgrind and
lcov are not able to process their information properly.
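
The same "polite first, forceful last" sequence, sketched with Python's
subprocess module rather than the shell helpers the tests actually use; the
function name and the grace period are assumptions.

```python
import subprocess
import sys


def stop_gracefully(proc, grace_seconds=5.0):
    """Send SIGTERM, and fall back to SIGKILL only if the process lingers."""
    proc.terminate()  # SIGTERM on POSIX
    try:
        return proc.wait(timeout=grace_seconds)
    except subprocess.TimeoutExpired:
        proc.kill()  # SIGKILL on POSIX
        return proc.wait()
```

A process that handles SIGTERM gets the chance to flush coverage or
memory-checker data before it exits.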

BUG: 1549000
Change-Id: I664de12ee7dbf47eb98b8141004cd51f6006b314
Signed-off-by: Amar Tumballi <amarts@redhat.com>
2018-03-08 11:15:01 +01:00
Atin Mukherjee
2a1adc5c93 hooks: fix workdir in S13create-subdir-mounts.sh
Change-Id: Id3eff498091ad9fa4651e93b66903426e76776d6
BUG: 1549915
Signed-off-by: Atin Mukherjee <amukherj@redhat.com>
2018-03-07 07:39:28 +00:00
Ravishankar N
bd2c45fe31 glusterd: volume get fixes for client-io-threads & quorum-type
1. If a replica volume created on glusterfs-3.8 was upgraded to
glusterfs-3.12, `gluster vol get volname client-io-threads` displayed
'on' even though it wasn't and the xlator wasn't loaded on
the client-graph. This was due to removing certain checks in
glusterd_get_default_val_for_volopt as a part of commit
47604fad4c2a3951077e41e0c007ceb979bb2c24. Fix it.

2. Also, as a part of op-version bump-up, client-io-threads was being
loaded on the clients during volfile regeneration. Prevent it.

3. AFR assumes quorum-type to be auto in newly created replica 3 (odd
replica in general) volumes but `gluster vol get quorum-type` displays
'none'. Fix it.

Change-Id: I19e586361ed1065c70fb378533d3b4dac1095df9
BUG: 1545056
Signed-off-by: Ravishankar N <ravishankar@redhat.com>
2018-03-07 04:48:09 +00:00
Amar Tumballi
685d4409f9 hooks: add a script to stat the subdirs in add-brick
The subdirectories are expected to be present for a subdir
mount to be successful. If not, client_handshake()
itself fails. When a volume is about to get
mounted for the first time, this is easier to handle: if the
directory is not present in one brick, then it is most likely
not present in any other brick. In the case of add-brick,
the directory is not present in the new brick, and there is
no chance of healing it from the subdirectory mount, as
in those clients the subdir itself will be the 'root' ('/')
of the filesystem. Hence we need a volume mount to heal
the directory before connections can succeed.

This patch takes care of that by healing the directories
which are expected to be mounted as subdirectories from the
volume level mount point.

Change-Id: I2c2ac7b7567fe209aaa720006d09b68584d0dd14
BUG: 1549915
Signed-off-by: Amar Tumballi <amarts@redhat.com>
2018-03-06 14:41:44 +00:00
Pranith Kumar K
51d3490798 cluster/afr: Remove unused code paths
Removed
1) afr-v1 self-heal locks related code which is not used anymore
2) transaction has some data types that are not needed, so removed them
3) Never used lock tracing available in afr as gluster's network tracing does
the job. So removed that as well.
4) Changelog is always enabled and afr is always used with locks, so
__changelog_enabled, afr_lock_server_count etc functions can be deleted.
5) transaction.fop/done/resume always call the same functions, so no need
to have these variables.

BUG: 1549606
Change-Id: I370c146fec2892d40e674d232a5d7256e003c7f1
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
2018-03-06 08:49:31 +00:00
Gaurav Yadav
4511b45bf4 glusterd : memory leak in mgmt_v3 lock functionality
In order to take care of the stale lock issue, a timer was introduced
in mgmt_v3 lock. This timer was not freeing its memory, which
introduced this leak.

With this fix, memory cleanup in locking is handled properly.

Change-Id: I2e1ce3ebba3520f7660321f3d97554080e4e22f4
BUG: 1550339
Signed-off-by: Gaurav Yadav <gyadav@redhat.com>
2018-03-06 08:05:24 +00:00
Pranith Kumar K
9be043159a cluster/afr: Remove compound-fops usage in afr
We are not seeing much improvement with this change. So removing the
feature so that it doesn't need to be maintained anymore.

Fixes: #414
Change-Id: Ic7969b151544daf2547bd262a9fa03f575626411
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
2018-03-06 07:55:25 +00:00
Poornima G
e8446ef312 nl-cache: Fix coverity issue RESOURCE_LEAK
Change-Id: Ic552f31853e1886b8c76d45c8c66251f1fd6f97f
Signed-off-by: Poornima G <pgurusid@redhat.com>
2018-03-06 05:04:06 +00:00
Poornima G
5689a3c2d6 nl-cache: Fix coverity issue RETURN_LOCAL
Change-Id: Ic6fbd34aad2a5ae5e27d833300bcd1284cb98c24
Signed-off-by: Poornima G <pgurusid@redhat.com>
2018-03-06 05:02:51 +00:00
Csaba Henk
7f9c56dd38 fuse: enable proper "fgetattr"-like semantics
GETATTR FUSE message can carry a file handle
reference in which case it serves as a hint
for the FUSE server that the stat data is
preferably acquired in context of the given
filehandle (which we call '"fgetattr"-like
semantics').

So far FUSE ignored the file handle provided
with GETATTR and grabbed a file handle
heuristically. This caused confusion in the
caching layers, which has been tracked down
as one of the causes of the referenced BUG.

As for the BUG, this is just a partial fix.

BUG: 1512691
Change-Id: I67eebbf5407ca725ed111fbda4181ead10d03f6d
Signed-off-by: Csaba Henk <csaba@redhat.com>
2018-03-06 03:45:00 +00:00
Kaleb S. KEITHLEY
2bb17551a5 build: address linkage issues
We have the following undefined symbol errors from protocol/server.so:

  glusterfs_mgmt_pmap_signout
  glusterfs_autoscale_threads

See https://review.gluster.org/19225 (bz#1532238)
and https://review.gluster.org/19657 (bz#1550895)

(why are there two different bzs for the same bug?)

IMO this is a cleaner solution. I.e. moving the above two functions
to libgfrpc (.../rpc/rpc-lib/...)

I would also, for (foolish) consistency's sake, like to see
glusterfs_mgmt_pmap_signin() moved from glusterfsd to libgfrpc as
well.

This works on f28/rawhide, with its new, more restrictive run-time
link semantics. The smoke and regression tests on earlier fedora and
centos will confirm that it works on those platforms too.

Change-Id: I9cfbd1cc15e7ebd9fc31b56ac791287fa2c584de
BUG: 1550895
Signed-off-by: Kaleb S. KEITHLEY <kkeithle@redhat.com>
2018-03-05 09:25:17 -05:00
Krutika Dhananjay
2347debbaf features/shard: Upon FSYNC from upper layers, wind fsync on all changed shards
Change-Id: Ib74354f57a18569762ad45a51f182822a2537421
BUG: 1468483
Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com>
2018-03-05 10:23:03 +05:30
Poornima G
259448385c quick-read: Fix coverity issue CHECKED_RETURN
Change-Id: I989e8fe28c86f67b7e54692c01ae3ed6e729aa16
Signed-off-by: Poornima G <pgurusid@redhat.com>
2018-03-05 01:38:06 +00:00
Poornima G
220a687f9f upcall: Fix coverity issues NEGATIVE_RETURNS
Change-Id: I7d2e733192127ff4ae00ba718562b031f45b72b9
Signed-off-by: Poornima G <pgurusid@redhat.com>
2018-03-05 01:37:40 +00:00
Poornima G
f5fc2a3188 io-cache: Fix coverity issue NEGATIVE_RETURNS
Change-Id: I811225ad20e3bd9f05820212e6a843f05d96b246
Signed-off-by: Poornima G <pgurusid@redhat.com>
2018-03-05 01:37:27 +00:00
Kaleb S. KEITHLEY
0ec482b5d5 build: fix typo, spelling mistake
transistional -> transitional

Change-Id: I1eb7e063288384458c305afea6d6c46a358701ff
Signed-off-by: Kaleb S. KEITHLEY <kkeithle@redhat.com>
2018-03-02 19:40:37 +00:00
Poornima G
1369f313d1 libglusterfs: Fix coverity issue FORWARD_NULL
Change-Id: I1402046edb232ca9d23346db82a0cfd041c91e70
Signed-off-by: Poornima G <pgurusid@redhat.com>
2018-03-02 19:38:45 +00:00
Anoop C S
fecb0fc748 protocol/server: Insert dummy clnt-lk-version to avoid upgrade failure
This is required as we check for 'clnt-lk-version' in SETVOLUME callback
with older clients in place against newer servers. Change is similar to
what we have done via https://review.gluster.org/#/c/19560/.

Change-Id: If333c20cf9503f40687ec926c44c7e50222c05b5
BUG: 1544699
Signed-off-by: Anoop C S <anoopcs@redhat.com>
2018-03-02 18:54:33 +00:00
Kaushal M
fc35d400cb libglusterfs: Fix volume_options_t struct
The volume_options_t struct was modified and a new member was introduced
in the middle of the struct. This caused GD2 to crash when it tried to
read the volume options. The new member has been moved to the end of the
struct to correct this.

And a note has been added to notify developers on how to modify this
struct, and the xlator_api_t struct.
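
Why the position of a new member matters can be seen with a small ctypes
experiment. The structs below are hypothetical stand-ins (the real
volume_options_t fields are not listed in this message); they only show how
member offsets move.

```python
import ctypes


class OptionsOld(ctypes.Structure):
    """Layout that an already-built consumer was compiled against."""
    _fields_ = [
        ("key", ctypes.c_char_p),
        ("opt_type", ctypes.c_int),
        ("min_value", ctypes.c_double),
    ]


class OptionsMiddle(ctypes.Structure):
    """New member inserted in the middle: later members shift."""
    _fields_ = [
        ("key", ctypes.c_char_p),
        ("level", ctypes.c_int),  # new member
        ("opt_type", ctypes.c_int),
        ("min_value", ctypes.c_double),
    ]


class OptionsEnd(ctypes.Structure):
    """New member appended at the end: existing offsets are unchanged."""
    _fields_ = [
        ("key", ctypes.c_char_p),
        ("opt_type", ctypes.c_int),
        ("min_value", ctypes.c_double),
        ("level", ctypes.c_int),  # new member
    ]


middle_shifts = OptionsMiddle.opt_type.offset != OptionsOld.opt_type.offset
end_is_stable = (
    OptionsEnd.opt_type.offset == OptionsOld.opt_type.offset
    and OptionsEnd.min_value.offset == OptionsOld.min_value.offset
)
```

A reader that still uses the old layout finds the old members where it
expects them only in the appended variant.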

Updates: gluster/glusterfs#302

Change-Id: I2e9899ec10516be29c7e9d574da53be8ec17a99e
Signed-off-by: Kaushal M <kaushal@redhat.com>
2018-03-02 09:00:14 +00:00
Poornima G
7e7fd3595e md-cache: Fix coverity issue FORWARD_NULL
Change-Id: I6ace846c412d898c0bc024b5d2081b11a223372f
Signed-off-by: Poornima G <pgurusid@redhat.com>
2018-03-02 07:25:02 +00:00
karthik-us
11b3bbd649 cluster/afr: Make afr_fsync a transaction
Change-Id: I713401feb96393f668efb074f2d5b870d19e6fda
BUG: 1548361
Signed-off-by: karthik-us <ksubrahm@redhat.com>
2018-03-02 05:28:00 +00:00
Krutika Dhananjay
a42137eee3 features/shard: Fix shard inode refcount when it's part of priv->lru_list.
For as long as a shard's inode is in priv->lru_list, it should have a non-zero
ref-count. This patch achieves it by taking a ref on the inode when it
is added to lru list. When it's time for the inode to be evicted
from the lru list, a corresponding unref is done.
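
A minimal model of the ref/unref pairing, assuming a hypothetical Inode
class with a plain counter and an LRU keyed by name. It is not the shard
translator's data structure, only the invariant it describes.

```python
from collections import OrderedDict


class Inode:
    def __init__(self, name):
        self.name = name
        self.refcount = 0

    def ref(self):
        self.refcount += 1
        return self

    def unref(self):
        self.refcount -= 1


class ShardLru:
    def __init__(self, limit):
        self.limit = limit
        self.entries = OrderedDict()  # oldest entry first

    def add(self, inode):
        """Insert inode, returning the evicted inode if there was one."""
        if inode.name in self.entries:
            self.entries.move_to_end(inode.name)  # already holds its ref
            return None
        evicted = None
        if len(self.entries) >= self.limit:
            _, evicted = self.entries.popitem(last=False)
            evicted.unref()  # drop the ref taken when it was added
        self.entries[inode.name] = inode.ref()  # the list owns one ref
        return evicted
```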

Change-Id: I289ffb41e7be5df7489c989bc1bbf53377433c86
BUG: 1468483
Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com>
2018-03-02 05:26:00 +00:00
Pranith Kumar K
e7b79c5959 cluster/afr: Fix dict-leak in pre-op
At the time of pre-op, pre_op_xdata is populated with the xattrs we get from the
disk, and at the time of post-op it gets overwritten without unreffing the
previous value stored, leading to a leak.
This is a regression we missed in
https://review.gluster.org/#/q/ba149bac92d169ae2256dbc75202dc9e5d06538e

BUG: 1550078
Change-Id: I0456f9ad6f77ce6248b747964a037193af3a3da7
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
2018-02-28 18:53:59 +05:30
Varsha Rao
e2766c3263 performance/io-threads: Add option to disable client disconnect feature
> Add options to disable new features
> Commit ID: c071992e8d
> https://review.gluster.org/#/c/18291/
> By Michael Goulet <mgoulet@fb.com>

This patch is required to forward port io-threads namespace patch.
Updates: #401

Change-Id: Ice477fdf4b8934f9fac0b4a2f6c93db97429a586
Signed-off-by: Varsha Rao <varao@redhat.com>
2018-02-28 01:45:51 +00:00
Varsha Rao
07372c3729 tests/basic/namespace: Check if brick multiplex is enabled
This patch fixes the namespace test failure when brick multiplexing is enabled
by changing the log file name in that case, as only one log file is generated
for all bricks.

Change-Id: Ide941946e5e1b2676e7139e1b5bf6b93b93c0815
Signed-off-by: Varsha Rao <varao@redhat.com>
2018-02-27 18:31:38 +05:30
Poornima G
5196bfa6e6 io-cache: Fix coverity issue
Coverity issue : FORWARD_NULL
fd is assigned within a condition, but the fd is used even outside
the condition.

Change-Id: I6548d605d8a8acc6a25f1657f9fb75586d513042
Signed-off-by: Poornima G <pgurusid@redhat.com>
2018-02-27 11:38:30 +00:00
Kaleb S. KEITHLEY
bb4343fb1a libglusterfs: move compat RPC/XDR #defines to eliminate warnings
Building with libtirpc (versus legacy glibc rpc) results in many
warnings about xdr macros that are redefined in libtirpc headers
because of the way compat.h and glusterfs.h are usually #included.

And these xdr macros in libglusterfs/src/compat.h - which were copied
from legacy glibc's rpc headers - are different than the same-name macros
in libtirpc. I haven't checked to see that any of the macros are
expanded (incorrectly) between the definition in compat.h and the
redefinition in tirpc/rpc/xdr.h; the risk seems pretty minimal. Regardless
it seems better, from a truth-and-beauty perspective to not have the
old, incorrect definitions in the first place.

Not to mention that any file that #includes compat.h and not glusterfs.h
does not need these xdr macro definitions at all. They're really only
needed when using really old glibc rpc, which would only be evident if
including glusterfs.h and/or glusterfs-fops.h. (Which by the way, nothing
currently #includes glusterfs-fops.h by itself. And maybe nothing ever
should?)

Change-Id: Ic11e4407d6ab7c498a8745a99379cbf4788a24e8
Signed-off-by: Kaleb S. KEITHLEY <kkeithle@redhat.com>
2018-02-27 10:55:10 +00:00
N Balachandran
745e522c3a options: framework for options levels
Framework in order to classify options.

Updates gluster/glusterfs#302

Change-Id: I3dd6ae27bd0eb8e0065ffca75838c801e4f3ac91
Signed-off-by: N Balachandran <nbalacha@redhat.com>
2018-02-27 10:05:57 +00:00
Mohit Agrawal
7c3cc48505 glusterfsd: Memleak in glusterfsd process while brick mux is on
Problem: When a volume is stopped while brick multiplex is enabled,
         memory is not cleaned up for all server side xlators.

Solution: To clean up memory for all server side xlators, call fini
          in glusterfs_handle_terminate after sending the
          GF_EVENT_CLEANUP notification to the top xlator.

BUG: 1544090
Signed-off-by: Mohit Agrawal <moagrawa@redhat.com>

Note: All test cases were run in a separate build (https://review.gluster.org/19574)
      with the same patch after forcibly enabling brick mux; all test cases
      passed.

Change-Id: Ia10dc7f2605aa50f2b90b3fe4eb380ba9299e2fc
2018-02-27 07:11:15 +00:00
Varsha Rao
430bff7dc3 performance/io-threads: nuke everything from a client when it disconnects
> io-threads: nuke everything from a client when it disconnects
> Commit ID: 4d8268d760
> https://review.gluster.org/#/c/18254/
> By Jeff Darcy <jdarcy@fb.com>

This patch is required to forward port io-threads namespace patch.
Updates: #401

Change-Id: I13d3a74862eea3d01e8dbc8736987c3dae6e8b2a
Signed-off-by: Varsha Rao <varao@redhat.com>
2018-02-27 03:45:30 +00:00
Krutika Dhananjay
8e21ea3e4f features/shard: Leverage block_num info in inode-ctx in read callback
... instead of adding this information in fd_ctx in call path and
retrieving it again in the callback.

Change-Id: Ibbddbbe85baadb7e24aacf5ec8a1250d493d7800
BUG: 1468483
Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com>
2018-02-27 01:54:36 +00:00
Krutika Dhananjay
15afb4cf9f features/shard: Pass the correct block-num to store in inode ctx
Change-Id: Icf3a5d0598a081adb7d234a60bd15250a5ce1532
BUG: 1468483
Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com>
2018-02-27 01:53:16 +00:00
Poornima G
4a8255f772 write-behind: Make aggregate size configurable
Currently the aggregate size defaults to 128K (the page size).
From a performance perspective, a small number of large writes is faster
than a large number of small writes, especially in EC volumes. But identifying
the right aggregate size depends on multiple factors like the memcpy overhead,
network overhead etc. On a local machine, combining 128K writes into 1M writes
for EC volumes yielded a 30% improvement.
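
The effect of the aggregate size can be illustrated with a small sketch that
merges contiguous writes up to a configurable limit. The function name and
the (offset, data) representation are assumptions; write-behind itself works
on iovecs inside the translator.

```python
def aggregate(writes, aggregate_size):
    """Merge contiguous (offset, data) writes into chunks of at most
    aggregate_size bytes."""
    merged = []
    for offset, data in writes:
        if merged:
            last_offset, last_data = merged[-1]
            contiguous = last_offset + len(last_data) == offset
            if contiguous and len(last_data) + len(data) <= aggregate_size:
                # This concatenation copies bytes: the memcpy cost that
                # grows with the aggregate size.
                merged[-1] = (last_offset, last_data + data)
                continue
        merged.append((offset, data))
    return merged


KIB = 1024
small_writes = [(i * 128 * KIB, b"x" * (128 * KIB)) for i in range(8)]
at_128k = aggregate(small_writes, 128 * KIB)   # nothing can be combined
at_1m = aggregate(small_writes, 1024 * KIB)    # everything fits in one write
```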

As a part of this patch, aggregate size is just made configurable and page_size
is modified accordingly.

Raghavendra Gowdappa had suggested that, while aggregating writes, we should get
rid of the memcpy of large writes and instead add the pointer to the existing
vector; that will be done as a part of another patch. Also, in EC volumes, the
vectors are merged into one vector, so even if we save the memcpy in
write-behind, EC would anyway do a memcpy to merge the vectors into one.

Updates: #364

Change-Id: Ib67294b8577bea14dde1c84cd271012ecea99f09
Signed-off-by: Poornima G <pgurusid@redhat.com>
2018-02-26 17:28:26 +00:00
Milind Changire
7d641313f4 rpcsvc: scale rpcsvc_request_handler threads
Scale rpcsvc_request_handler threads to match the scaling of event
handler threads.

Please refer to https://bugzilla.redhat.com/show_bug.cgi?id=1467614#c51
for a discussion about why we need multi-threaded rpcsvc request
handlers.

Change-Id: Ib6838fb8b928e15602a3d36fd66b7ba08999430b
Signed-off-by: Milind Changire <mchangir@redhat.com>
2018-02-26 15:14:38 +05:30
Poornima G
a1e59bc8fd md-cache: Modify options to be gd2 compatible
Change-Id: I79d51fee8ec5d2d237de7dd21c2d28c18cfd7ce8
Signed-off-by: Poornima G <pgurusid@redhat.com>
2018-02-26 02:29:58 +00:00
Poornima G
55d804a62e nl-cache: Change the options to be gd2 compatible
Change-Id: Ib9d233df41b85c845643e3e6eb2d680e01859a43
Signed-off-by: Poornima G <pgurusid@redhat.com>
2018-02-26 02:29:08 +00:00
Raghavendra G
32f5bc7950 cluster/dht: store the 'reaction' on failures per lock
Currently it's passed in dht_blocking_inode(entry)lk, which would be a
global value for all the locks passed in the argument. This would
be a limitation for cases where we want to ignore failures on only a few
locks and fail for others.
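
Storing the reaction on each lock, rather than passing one value for the
whole call, could be pictured like this. All names here (Reaction, Lock,
take_locks, try_lock) are hypothetical and only illustrate the idea.

```python
from dataclasses import dataclass
from enum import Enum


class Reaction(Enum):
    FAIL = "fail"      # a failure on this lock fails the whole request
    IGNORE = "ignore"  # a failure on this lock is tolerated


@dataclass
class Lock:
    subvol: str
    reaction: Reaction


def take_locks(locks, try_lock):
    """Try every lock in order; return (ok, locks actually held)."""
    held = []
    for lock in locks:
        if try_lock(lock):
            held.append(lock)
        elif lock.reaction is Reaction.FAIL:
            return False, held  # caller is expected to release `held`
    return True, held
```

With a single global reaction, the mixed case (one lock that must succeed and
one that may fail) cannot be expressed.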

Change-Id: I02cfbcaafb593ad8140c0e5af725c866b630fb6b
BUG: 1543279
Signed-off-by: Raghavendra G <rgowdapp@redhat.com>
2018-02-23 03:19:58 +00:00
Varsha Rao
cfbc524239 performance/io-threads: Add threads to priority based stagnant queues
> performance/io-threads: Add watchdog to cover up a possible thread leak
> Commit ID: 8b6804f75c
> https://review.gluster.org/#/c/18239/
> By Shreyas Siravara <sshreyas@fb.com>

This patch is required to forward port io-threads namespace patch.
Updates: #401

Change-Id: Id057c34a2abb9fc6dfb4afcd5c7bbbfe5693bbb8
Signed-off-by: Varsha Rao <varao@redhat.com>
2018-02-22 17:25:13 +00:00