572 Commits

Author SHA1 Message Date
Nandaja Varma
26cbd3bdf5 rpc-lib: Fixing the coverity issues
Coverity CIDs:
1210973
1124887
1124888
1124682
1124849
1124503

Change-Id: I012f6cf9d14753f572ab94aae6d442d1ef8df79a
BUG: 789278
Signed-off-by: Nandaja Varma <nandaja.varma@gmail.com>
Reviewed-on: http://review.gluster.org/9600
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
2015-04-10 11:29:42 +00:00
Krishnan Parthasarathi
d448fd187d rpc: fix deadlock when unref is inside conn->lock
In the ping-timer implementation, the timer event takes a ref on the rpc
object. This ref needs to be removed after every timeout event.
The ping-timer mechanism could be holding the last ref, e.g. when a peer
is detached and its rpc object was unref'd. In this case, the ping-timer
mechanism would try to acquire conn->mutex to perform the 'last' unref
while already being inside the critical section. This results in a
deadlock.
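
For illustration only, a sketch of the deadlock pattern (conn_t, conn_unref
and ping_timer_expired below are hypothetical names, not the actual gluster
structures):

    #include <pthread.h>
    #include <stdlib.h>

    typedef struct conn {
            pthread_mutex_t lock;
            int             ref;
    } conn_t;

    static void conn_unref (conn_t *conn)
    {
            int last;

            pthread_mutex_lock (&conn->lock);   /* blocks forever if the caller
                                                   already holds conn->lock */
            last = (--conn->ref == 0);
            pthread_mutex_unlock (&conn->lock);
            if (last)
                    free (conn);                /* destroy on the last unref */
    }

    static void ping_timer_expired (conn_t *conn)
    {
            pthread_mutex_lock (&conn->lock);   /* critical section entered */
            /* ... ping-timer bookkeeping ... */
            conn_unref (conn);                  /* deadlocks on a non-recursive mutex */
            pthread_mutex_unlock (&conn->lock);
    }

The fix is to note that an unref is due while inside the critical section and
perform it only after the lock has been released.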

Change-Id: I74f80dd08c9348bd320a1c6d12fc8cd544fa4aea
BUG: 1206134
Signed-off-by: Krishnan Parthasarathi <kparthas@redhat.com>
Reviewed-on: http://review.gluster.org/9613
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
2015-04-10 03:03:48 +00:00
Jeff Darcy
8830e90fa1 socket: use OpenSSL multi-threading interfaces
OpenSSL isn't thread-safe unless you register these locking and thread
ID functions.  Most often the crashes would occur around
X509_verify_cert, even though it's insane that the certificate parsing
functions wouldn't be thread-safe.  The bug for this was filed over
two years ago, but it didn't seem like a high priority because the bug
didn't bite anyone until it caused a spurious regression-test failure.
Ironically, that was on a test for a *different* spurious
regression-test failure, which I guess is just deserts[1] for leaving
this on the to-do list so long.

[1] Yes, it really is "deserts" in that phrase - not as in very dry
places, but from late Latin "deservire" meaning to serve well or
zealously.  Aren't commit messages educational?
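
For reference, registering the pre-1.1.0 OpenSSL threading callbacks looks
roughly like this (a minimal sketch; the array and function names are
illustrative, not the exact code added to socket.c):

    #include <openssl/crypto.h>
    #include <pthread.h>
    #include <stdlib.h>

    static pthread_mutex_t *ssl_locks;

    static void ssl_locking_cb (int mode, int n, const char *file, int line)
    {
            if (mode & CRYPTO_LOCK)
                    pthread_mutex_lock (&ssl_locks[n]);
            else
                    pthread_mutex_unlock (&ssl_locks[n]);
    }

    static unsigned long ssl_thread_id_cb (void)
    {
            return (unsigned long) pthread_self ();
    }

    static void ssl_setup_locking (void)
    {
            int i, num = CRYPTO_num_locks ();

            ssl_locks = calloc (num, sizeof (*ssl_locks));
            for (i = 0; i < num; i++)
                    pthread_mutex_init (&ssl_locks[i], NULL);
            CRYPTO_set_id_callback (ssl_thread_id_cb);
            CRYPTO_set_locking_callback (ssl_locking_cb);
    }

Without these registrations, concurrent TLS handshakes can corrupt OpenSSL's
internal state, which matches the crashes seen around X509_verify_cert.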

Change-Id: I2a6c0e9b361abf54efa10ffbbbe071404f82b0d9
BUG: 906763
Signed-off-by: Jeff Darcy <jdarcy@redhat.com>
Reviewed-on: http://review.gluster.org/10075
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
2015-04-09 09:55:37 +00:00
Niels de Vos
dc128c6bb0 build: add more files to .gitignore
Change-Id: Icef0d7f443f7caf3aa386d3a6978f98cf3a5a4af
BUG: 1198849
Signed-off-by: Niels de Vos <ndevos@redhat.com>
Reviewed-on: http://review.gluster.org/10132
Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
2015-04-06 06:00:39 -07:00
arao
cf5bf1863d rpc: Fixing dereferencing after null check
CID: 1124607
The pointer variable is checked for NULL and
logged accordingly.

Change-Id: Ied0d7f7ff33da22198eca65f14816b943cae5541
BUG: 789278
Signed-off-by: arao <arao@redhat.com>
Reviewed-on: http://review.gluster.org/9674
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Niels de Vos <ndevos@redhat.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
2015-04-02 06:37:53 -07:00
Jeff Darcy
0934432c51 socket: use TLS 1.2 instead of 1.0
Change-Id: I96e9b37e4855f5e12b2dbecf1f0b0887b21ad5ad
Signed-off-by: Jeff Darcy <jdarcy@redhat.com>
Reviewed-on: http://review.gluster.org/9949
Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
2015-03-27 11:29:33 -07:00
Dan Lambright
6f71bc02df glusterd: CLI commands to create and manage tiered volumes.
A tiered volume is a normal volume with some number of new bricks
representing "hot" storage. The "hot" bricks can be attached or
detached dynamically to a normal volume. When this happens, a new graph
is constructed. The root of the new graph is an instance of the tier
translator. One subvolume of the tier translator leads to the old volume,
and another leads to the new hot bricks.

attach-tier <VOLNAME> [<replica> <COUNT>] <NEW-BRICK> ... [force]

volume detach-tier <VOLNAME> [replica <COUNT>] <BRICK>
... <start|stop|status|commit|force>

gluster volume rebalance <volume> tier start

gluster volume rebalance <volume> tier stop

gluster volume rebalance <volume> tier status

The "tier start" CLI command starts a server side daemon. The daemon
initiates file level migration based on caching policies. The daemon's
status can be monitored and stopped.

Note development on the "tier status" command is incomplete. It will be
added in a subsequent patch.

When the "hot" storage is detached, the tier translator is removed
from the graph and the tiered volume reverts to its original state as
described in the volume's info file.

For more background and design see the feature page [1].

[1]
http://www.gluster.org/community/documentation/index.php/Features/data-classification

Change-Id: Ic8042ce37327b850b9e199236e5be3dae95d2472
BUG: 1194753
Signed-off-by: Dan Lambright <dlambrig@redhat.com>
Reviewed-on: http://review.gluster.org/9753
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
Tested-by: Vijay Bellur <vbellur@redhat.com>
2015-03-19 06:32:28 -07:00
Gaurav Kumar Garg
d236b01a8b cli/glusterd: cli command implementation for bitrot features
CLI command for bitrot features.

volume bitrot <volname> enable|disable

The above command will enable/disable the bitrot feature for a particular volume.

BUG: 1170075
Change-Id: Ie84002ef7f479a285688fdae99c7afa3e91b8b99
Signed-off-by: Gaurav Kumar Garg     <ggarg@redhat.com>
Signed-off-by: Anand nekkunti        <anekkunt@redhat.com>
Signed-off-by: Dominic P Geevarghese <dgeevarg@redhat.com>
Reviewed-on: http://review.gluster.org/9866
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
2015-03-18 18:34:07 -07:00
vmallika
3e18f09397 features/quota : Introducing inode quota
==========================================================================
                             Inode quota
==========================================================================
= Currently, the only way to retrieve the number of files/objects in a   =
= directory or volume is to do a crawl of the entire directory/volume.   =
= This is expensive and is not scalable.                                 =
=                                                                        =
= The proposed mechanism will provide an easier alternative to determine =
= the count of files/objects in a directory or volume.                   =
=                                                                        =
= The new mechanism proposes to store count of objects/files as part of  =
= an extended attribute of a directory. Each directory's extended        =
= attribute value will indicate the number of files/objects present      =
= in a tree with the directory being considered as the root of the tree. =
=                                                                        =
= The count value can be accessed by performing a getxattr().            =
= Cluster translators like afr, dht and stripe will perform aggregation  =
= of count values from various bricks when getxattr() happens on the key =
= associated with file/object count.                                     =

A new interface is introduced:
------------------------------
        limit-objects  : limit the number of inodes at directory level
        list-objects   : list the directories where the limit is set
        remove-objects : remove the limit from the directory

==========================================================================

CLI COMMAND:
gluster volume quota <volname> limit-objects <path> <number> [<percent>]

* <number> is a hard-limit on the number of objects for path "<path>".
  If the hard-limit is exceeded, creation of files/directories is no longer
  permitted.

* <percent> is a soft-limit on the number of objects created for path "<path>".
  If the soft-limit is exceeded, a warning is issued for each creation.

CLI COMMAND:
gluster volume quota <volname> remove-objects [path]

==========================================================================

CLI COMMAND:
gluster volume quota <volname> list-objects [path] ...

Sample output:
------------------
  Path    Hard-limit  Soft-limit  Used  Available  Soft-limit exceeded?  Hard-limit exceeded?
  ---------------------------------------------------------------------------------------------
  /dir    10          80%         10    0          Yes                   Yes

==========================================================================

[root@snapshot-28 dir]# ls
a  b  file11  file12  file13  file14  file15  file16  file17
[root@snapshot-28 dir]# touch a1
touch: cannot touch `a1': Disk quota exceeded
* Nine files are created in directory "dir", and the directory itself is
  included in the count too. Hence the limit "10" is reached and further
  file creation fails.
==========================================================================

Note: We have also done some re-factoring in the cli for volume name
validation. A new function, cli_validate_volname, has been created.

==========================================================================

Change-Id: I1823497de4f790a2a20ebb1770293472ea33ee2b
BUG: 1190108
Signed-off-by: Sachin Pandit <spandit@redhat.com>
Signed-off-by: vmallika <vmallika@redhat.com>
Reviewed-on: http://review.gluster.org/9769
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
2015-03-18 18:24:12 -07:00
Venky Shankar
4737584fff features/changelog: RPC'fy {libgf}changelog
This patch introduces RPC based communication between the changelog
translator and libgfchangelog. It replaces the old pathetic stream
based interaction that existed earlier (due to time constraints :-/).

Changelog, upon initialization, starts an RPC server (rpcsvc) allowing
clients to invoke a probe API as a bootup mechanism to request
event notifications. During probe, clients can choose an event
filter specifying the type(s) of events they are interested in. As
of now there is no way to change the event notification set once
the probe RPC call is made, but that is easy enough to implement.

The actual event notification is done on a separate RPC session.
The client (libgfchangelog) itself starts an RPC server which the
changelog translator "connects back" to during probe. Notifications
are dispatched by a bunch of threads from the server (translator),
and the client optionally orders them if ordered notifications
are required. FOPs fill in their respective event details in a
buffer (rot-buffs, to be particular) and a bunch of threads
(consumers) swap the buffers out of rotation and dispatch them
via RPC. To avoid writer starvation, the number of dispatcher
threads is one less than the number of buffer lists in rot-buffs.

libgfchangelog becomes purely callback based -- upon event
notification from the server (and after re-ordering if required)
it invokes a callback routine specified by the consumer(s).

A major part of the patch is also aimed at providing backward
compatibility for geo-replication, which was one of the main
consumers of the stream based API. Also, this patch does not
"turn on" event notifications for all fops, just the bunch
currently required. Another pain point is that the
server does not filter events before dispatching them to the
clients. That load is taken up by the client itself (although
it's done at the library layer rather than making it hard on
the callback implementor). This needs improvement, and care
needs to be taken not to load the server up with expensive
filtering mechanisms.

Change-Id: Ibf60a432b68f2dfa60c6f9add2bcfd37a9c41395
BUG: 1170075
Signed-off-by: Venky Shankar <vshankar@redhat.com>
Reviewed-on: http://review.gluster.org/9708
Reviewed-by: Jeff Darcy <jdarcy@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
2015-03-18 18:22:36 -07:00
Meghana Madhusudhan
38ccaaf9d1 CLI : GLobal option for NFS-Ganesha
A new global CLI option has been introduced for NFS-Ganesha.
gluster features.ganesha enable/disable.
This option is persistent and shall be inherited
by new volumes created after this option is set.

gluster features.ganesha enable

It carries out the following functions:
1. Disables gluster-nfs across the cluster
2. Starts NFS-Ganesha server on a subset of nodes and exports  '/'.
3. Creates the HA cluster for NFS-Ganesha.
4. Writes the option into the global config file.

gluster features.ganesha disable

1. Stops NFS-Ganesha server.
2. Tears down the HA cluster for NFS-Ganesha

With this change the older volume set
options with keys "nfs-ganesha.host"
and "nfs-ganesha.enable" will no longer
be supported. This commit only has the
CLI related changes. Another patch will
be submitted to support this feature entirely.

Change-Id: Ie4b66a16c23b33b795738654b9a68f8e2c34efe3
BUG: 1188184
Signed-off-by: Meghana Madhusudhan <mmadhusu@redhat.com>
Reviewed-on: http://review.gluster.org/9538
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Niels de Vos <ndevos@redhat.com>
2015-03-18 04:33:13 -07:00
Mohammed Rafi KC
260a694384 Snapshot/clone: clone of a snapshot that will act as a regular volume
snapshot clone will allow us to create a clone of a snapshot.
The newly created clone volume will be a regular volume with read/write
permissions.

CLI command
snapshot clone <clonename> <snapname>

Change-Id: Icadb993fa42fff787a330f8f49452da54e9db7de
BUG: 1199894
Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com>
Reviewed-on: http://review.gluster.org/9750
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Rajesh Joseph <rjoseph@redhat.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
2015-03-18 00:07:55 -07:00
Soumya Koduri
2a4561ef08 gfapi: APIs to store and process upcall notifications received
In case of any upcall cbk events received by the protocol/client,
gfapi will be notified, and it queues them up in a list (<gfapi_cbk_upcall>).

Applications are responsible for providing APIs to process & notify them in
case of any such queued upcall events.

A new API has been added which will be used by Ganesha to repeatedly poll for
any such notified upcall event (<glfs_h_poll_upcall>).

A new test-file has been added to test the cache_invalidation upcall events.

Below link has a writeup which explains the code changes done -
        URL: https://soumyakoduri.wordpress.com/2015/02/25/glusterfs-understanding-upcall-infrastructure-and-cache-invalidation-support/

Change-Id: Iafc6880000c865fd4da22d0cfc388ec135b5a1c5
BUG: 1200262
Signed-off-by: Soumya Koduri <skoduri@redhat.com>
Reviewed-on: http://review.gluster.org/9536
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com>
2015-03-17 14:01:21 -07:00
Jeff Darcy
0d2bed70fa every/where: add GF_FOP_IPC for inter-translator communication
Several features - e.g. encryption, erasure codes, or NSR - involve
multiple cooperating translators which sometimes need a "private" means
of communication amongst themselves.  Historically we've used virtual or
synthetic xattrs, but that's not very elegant and clutters up the
getxattr/setxattr path which must also handle real xattr requests.  This
new fop should address that.

The only argument is an int32_t "op" which should be recognized by the
target translator.  It is recommended that translators using these
feature follow some convention regarding the ops that they define, to
avoid conflicts.  Using a hash of the target translator's type string as
a base for a series of ops would probably be a good start.  Any other
information can be passed in both directions using xdata.

The default behavior for this fop, as with any other, is to pass through
to FIRST_CHILD.  That makes use of this fop "transparent" to other
translators that were written before it existed, but it also means that
it only really works with pass-through translators.  If a routing
translator (such as DHT) or a fan-out translator (such as AFR) is
involved, the IPC might not reach its intended destination unless those
translators are modified to forward IPC fops along all paths.

If an IPC gets all the way to storage/posix it is considered an error,
much like an uncaught exception.  We don't actually *do* anything in
that case, but we do log it and send back an EOPNOTSUPP error.  This makes
the "unrecognized opcode" condition distinguishable from the "no IPC
support" condition (which would yield an RPC error instead) so clients
can probe for the presence of a handler for their own favorite opcode
and either use that or use old-school xattrs depending on the result.
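
As a rough illustration only (the exact fop and callback prototypes should be
taken from xlator.h/defaults.c; MY_MAGIC_OP and my_ipc are hypothetical), a
translator that claims one opcode and passes everything else through might
look like this:

    /* Hypothetical sketch; depends on the glusterfs xlator headers. */
    #define MY_MAGIC_OP 0x4d59   /* convention: derive from the xlator type string */

    int32_t
    my_ipc (call_frame_t *frame, xlator_t *this, int32_t op, dict_t *xdata)
    {
            if (op == MY_MAGIC_OP) {
                    /* handle the private request, replying via xdata */
                    STACK_UNWIND_STRICT (ipc, frame, 0, 0, xdata);
                    return 0;
            }
            /* unrecognized op: behave like the default and pass through */
            STACK_WIND_TAIL (frame, FIRST_CHILD (this),
                             FIRST_CHILD (this)->fops->ipc, op, xdata);
            return 0;
    }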

BUG: 1158628
Signed-off-by: Venky Shankar <vshankar@redhat.com>
Signed-off-by: Jeff Darcy <jdarcy@redhat.com>
Change-Id: I84af1b17babe5b30ec03ecf027ae37d09b873968
Reviewed-on: http://review.gluster.org/8812
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
2015-03-17 07:02:15 -07:00
Niels de Vos
6b37049902 socket: use TCP_USER_TIMEOUT to detect client failures quicker
Use the network.ping-timeout to set the TCP_USER_TIMEOUT socket option
(see 'man 7 tcp'). The option sets the transport.tcp-user-timeout option
that is handled in the rpc/socket layer on the protocol/server side.
This socket option makes detecting unclean disconnected clients more
reliable.
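
At the socket level this amounts to something like the following (a minimal
sketch; sock and the timeout value are placeholders, the real code derives the
value from the configured options):

    #include <netinet/tcp.h>
    #include <sys/socket.h>

    /* Abort the connection if transmitted data stays unacknowledged for
     * longer than this many milliseconds (Linux >= 2.6.37). */
    unsigned int timeout_ms = 42 * 1000;

    setsockopt (sock, IPPROTO_TCP, TCP_USER_TIMEOUT,
                &timeout_ms, sizeof (timeout_ms));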

When the socket gets closed, any locks that the client held are
released. This makes it possible to reduce the fail-over time for
applications that run on systems that became unreachable due to
a network partition or a general client-side system error (kernel panic,
hang, ...).

It is not trivial to create a test-case for this at the moment. We need
a client that disconnects uncleanly and another client that tries to take
over the lock from the disconnected client.

URL: http://supercolony.gluster.org/pipermail/gluster-devel/2014-May/040755.html
Change-Id: I5e5f540a49abfb5f398291f1818583a63a5f4bb4
BUG: 1129787
Signed-off-by: Niels de Vos <ndevos@redhat.com>
Reviewed-on: http://review.gluster.org/8065
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: soumya k <skoduri@redhat.com>
Reviewed-by: Santosh Pradhan <santosh.pradhan@gmail.com>
Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com>
2015-03-17 05:10:17 -07:00
Soumya Koduri
2b97b57cd8 Upcall: New xlator to store various states and send cbk events
A framework on the server side to handle certain states of the files
accessed and send notifications to the connected clients.

It is a generic and extensible framework, used to maintain states in
the glusterfsd process for each of the files accessed
(including info about the clients doing the fops) and send
notifications to the respective glusterfs clients in case of
any change in that state.

This patch handles "Inode Update/Invalidation" upcall event.

Feature page:
        URL: http://www.gluster.org/community/documentation/index.php/Features/Upcall-infrastructure

Below link has a writeup which explains the code changes done -
        URL: https://soumyakoduri.wordpress.com/2015/02/25/glusterfs-understanding-upcall-infrastructure-and-cache-invalidation-support/

Change-Id: Ie3d724be9a3419fcf18901a753e8ec2df2ac802f
BUG: 1200262
Signed-off-by: Soumya Koduri <skoduri@redhat.com>
Reviewed-on: http://review.gluster.org/9535
Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
2015-03-17 05:08:07 -07:00
Mohammed Rafi KC
b3f63120e8 rdma:changing list iteration to safe mode
Change-Id: I2299378f02a5577a8bf2874664ba79e92c3811b5
BUG: 1201621
Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com>
Reviewed-on: http://review.gluster.org/9872
Reviewed-by: Krishnan Parthasarathi <kparthas@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Raghavendra Talur <rtalur@redhat.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
2015-03-15 09:07:04 -07:00
Mohammed Rafi KC
219512c501 rdma: Free resources related to iobuf in fini
If the rdma transport is destroyed for any reason,
then rdma.so will be unloaded, but we are not setting the
iobuf registration function to NULL. After this, if
an iobuf request comes, we will try to call a
function which is no longer loaded.

Change-Id: I3293f9974e16d8e865131785ee697ea02be8cdfc
BUG: 1187456
Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com>
Reviewed-on: http://review.gluster.org/9697
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Raghavendra Talur <rtalur@redhat.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
2015-03-10 19:28:29 -07:00
Mohammed Rafi KC
1a4e7362af rdma:enhance logging when a connection error occur
Change-Id: I6146307949a3d852d3af5f8b273004ad6b27451b
BUG: 1196584
Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com>
Reviewed-on: http://review.gluster.org/9756
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Humble Devassy Chirammal <humble.devassy@gmail.com>
Reviewed-by: Raghavendra Talur <rtalur@redhat.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
2015-03-10 19:22:34 -07:00
Humble Devassy Chirammal
a38faffd2c rdma: return proper data type.
Change-Id: I9bb0898af96cfcfaba0f0c976a7808bc6ea08e6a
Signed-off-by: Humble Devassy Chirammal <hchiramm@redhat.com>
Reviewed-on: http://review.gluster.org/9838
Reviewed-by: mohammed rafi  kc <rkavunga@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
Tested-by: Raghavendra G <rgowdapp@redhat.com>
2015-03-09 21:36:28 -07:00
Pranith Kumar K
a70231c78a cluster/ec: Add self-heal-daemon command handlers
This patch introduces the changes required in ec xlator to handle
index/full heal.

Index healer threads:
The ec xlator starts an index healer thread per local brick. This thread keeps
waking up every minute to check if there are any files to be healed based on
the indices kept in the index directory. Whenever a child_up event comes, this
index healer thread also wakes up, crawls the indices and triggers heals.
When the self-heal daemon is disabled on this particular volume, the healer
thread keeps waiting until it is enabled again to perform heals.

Full healer threads:
The ec xlator starts a full healer thread for the local subvolume provided by
glusterd to perform a full crawl on the directory hierarchy and carry out heals.
Once the crawl completes, the thread exits if no more full heals are issued.

Changed xl-op prefix GF_AFR_OP to GF_SHD_OP to make it more generic.

Change-Id: Idf9b2735d779a6253717be064173dfde6f8f824b
BUG: 1177601
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
Reviewed-on: http://review.gluster.org/9787
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Ravishankar N <ravishankar@redhat.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
2015-03-09 15:36:31 -07:00
Humble Devassy Chirammal
854383198b rdma: 'list','wr' and 'new' memory has to be verified.
Change-Id: I29a8825107b8f4cefe4f4c59296e98fe675ee943
BUG: 1199053
Signed-off-by: Humble Devassy Chirammal <hchiramm@redhat.com>
Reviewed-on: http://review.gluster.org/9811
Reviewed-by: mohammed rafi  kc <rkavunga@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
Tested-by: Raghavendra G <rgowdapp@redhat.com>
2015-03-09 03:23:55 -07:00
Mohammed Rafi KC
e08aea2fd6 rdma:setting wrong remote memory.
When we send more than one work request in a single call,
the remote addr is always set to the first address of
the vector.

Change-Id: I55aea7bd6542abe22916719a139f7c8f73334d26
BUG: 1197548
Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com>
Reviewed-on: http://review.gluster.org/9794
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
Tested-by: Raghavendra G <rgowdapp@redhat.com>
2015-03-04 22:08:05 -08:00
Mark Lipscombe
33214ef836 rdma: segfault trying to call ibv_dealloc_pd on a null pointer
if ibv_alloc_pd failed

If creating an ib protection domain fails, during the cleanup
a segfault will occur because trav->pd is null.
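
A minimal sketch of the implied guard (illustrative only; trav is the
transport-private structure mentioned above):

    /* ibv_dealloc_pd() must not be handed a NULL protection domain. */
    if (trav->pd != NULL) {
            ibv_dealloc_pd (trav->pd);
            trav->pd = NULL;
    }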

Bug: 1197260
Change-Id: I21b867c204c4049496b1bf11ec47e4139610266a
Signed-off-by: Mark Lipscombe <mlipscombe@gmail.com>
Reviewed-on: http://review.gluster.org/9774
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
Tested-by: Vijay Bellur <vbellur@redhat.com>
2015-03-03 04:46:34 -08:00
Shyam
c48cbccfaf epoll: Fix broken RPC throttling due to MT epoll
The RPC throttle, which kicks in by setting the poll-in event on a
socket to false, is broken with the MT epoll commit. This is due
to the event handler of poll-in attempting to read as much out of
the socket as it can until it receives an EAGAIN, which may never
happen, and hence we would be processing far more RPCs than we want to.

This is being fixed by changing the epoll from ET to LT, and
reading request by request, so that we honor the throttle.

The downside is that we do not drain the socket, but go back to
epoll_wait before reading the next request. However, when the throttle
kicks in we need to do that anyway, so a busy connection would degrade
to LT behaviour anyway to maintain the throttle. As a result this change
should not cause much deviation in performance for busy
connections.
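
For background, the distinction being relied on, in plain epoll terms (a
generic sketch, not the socket.c code; epfd and fd are placeholders):

    #include <sys/epoll.h>

    /* Level-triggered (default): epoll_wait() keeps reporting the fd while
     * unread data remains, so the handler can read one request and return. */
    struct epoll_event ev = { .events = EPOLLIN, .data.fd = fd };
    epoll_ctl (epfd, EPOLL_CTL_ADD, fd, &ev);

    /* Edge-triggered alternative: the fd is reported only on new arrivals, so
     * the handler must drain the socket until EAGAIN -- defeating the throttle. */
    struct epoll_event ev_et = { .events = EPOLLIN | EPOLLET, .data.fd = fd };
    epoll_ctl (epfd, EPOLL_CTL_ADD, fd, &ev_et);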

Change-Id: I522d284d2d0f40e1812ab4c1a453c8aec666464c
BUG: 1192114
Signed-off-by: Shyam <srangana@redhat.com>
Reviewed-on: http://review.gluster.org/9726
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Krishnan Parthasarathi <kparthas@redhat.com>
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
Tested-by: Raghavendra G <rgowdapp@redhat.com>
2015-03-01 22:50:07 -08:00
Soumya Koduri
56488efe3c rpcsvc: New rpc routines defined to send callback requests
Change-Id: I7f95682faada16308314bfbf84298b02d1198efa
BUG: 1188184
Signed-off-by: Poornima G <pgurusid@redhat.com>
Signed-off-by: Soumya Koduri <skoduri@redhat.com>
Reviewed-on: http://review.gluster.org/9534
Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Niels de Vos <ndevos@redhat.com>
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
Tested-by: Raghavendra G <rgowdapp@redhat.com>
2015-03-01 21:30:43 -08:00
Krishnan Parthasarathi
b117d4d84b socket: allow only one epoll thread to read msg fragments
The __socket_read_reply function releases sock priv->lock briefly to
notify higher layers of a message's xid. This could result in other
epoll threads that are processing events on this socket reading further
fragments of the same message. This may lead to incorrect fragment
processing and result in a crash.

Change-Id: I915665b2e54ca16f2ad65970e51bf76c65d954a4
BUG: 1197118
Signed-off-by: Krishnan Parthasarathi <kparthas@redhat.com>
Signed-off-by: Shyam <srangana@redhat.com>
Reviewed-on: http://review.gluster.org/9742
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
Tested-by: Raghavendra G <rgowdapp@redhat.com>
2015-02-27 21:16:11 -08:00
Mark Lipscombe
0e3fd04e93 rdma: Fix failure to call rdma_bind_addr if unable to bind privileged port.
When unable to bind a privileged port, rdma_bind_addr is not called.

This patch fixes that.

Change-Id: I175884a5d6a08b93dc62653ee0a6622bfc06e618
Bug: 1195907
Signed-off-by: Mark Lipscombe <mlipscombe@gmail.com>
Reviewed-on: http://review.gluster.org/9737
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: mohammed rafi  kc <rkavunga@redhat.com>
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
Tested-by: Raghavendra G <rgowdapp@redhat.com>
2015-02-26 09:12:09 -08:00
Atin Mukherjee
9d842f9656 glusterd: nfs,shd,quotad,snapd daemons refactoring
This patch ports nfs, shd, quotad & snapd with the approach suggested in
http://www.gluster.org/pipermail/gluster-devel/2014-December/043180.html

Change-Id: I4ea5b38793f87fc85cc9d2cf873727351dedffd2
BUG: 1191486
Signed-off-by: Atin Mukherjee <amukherj@redhat.com>
Signed-off-by:  Krishnan Parthasarathi <kparthas@redhat.com>
Reviewed-on: http://review.gluster.org/9428
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Nekkunti <anekkunt@redhat.com>
2015-02-20 04:04:08 -08:00
Atin Mukherjee
bdb5ca2339 rdma: free rdma priv data if init fails
Change-Id: I57b38c8783666e806836dacf3f74cf9f6876070a
BUG: 1164079
Signed-off-by: Atin Mukherjee <amukherj@redhat.com>
Reviewed-on: http://review.gluster.org/9687
Reviewed-by: mohammed rafi  kc <rkavunga@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
Tested-by: Raghavendra G <rgowdapp@redhat.com>
2015-02-19 01:49:55 -08:00
Mohammed Rafi KC
abcb2017b0 rdma: pre-register iobuf_pool with rdma devices.
Registering buffers with the rdma device is a time consuming
operation, so performing the registration in the code path will
decrease performance.
Using pre-registered memory will give better performance,
i.e., register the iobuf_pool during rdma initialization. For
a dynamically created arena, we can register it with all the
devices.
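
A hedged sketch of what pre-registering an arena with a device comes down to
in libibverbs terms (illustrative names, not the transport code itself):

    #include <infiniband/verbs.h>

    /* Register a whole arena with the device's protection domain up front so
     * the I/O path can reuse mr->lkey/rkey instead of registering per request. */
    struct ibv_mr *
    register_arena (struct ibv_pd *pd, void *base, size_t len)
    {
            return ibv_reg_mr (pd, base, len,
                               IBV_ACCESS_LOCAL_WRITE |
                               IBV_ACCESS_REMOTE_READ |
                               IBV_ACCESS_REMOTE_WRITE);
    }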

Change-Id: Ic79183e2efd014c43faf5911fdb6d5cfbcee64ca
BUG: 1187456
Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com>
Reviewed-on: http://review.gluster.org/9506
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
Tested-by: Raghavendra G <rgowdapp@redhat.com>
2015-02-17 20:09:54 -08:00
Mohammed Rafi KC
eebc3c0669 rdma: reduce log level from E to W
The glusterd process, when trying to initialize the default vol file, will
always throw an error if there is no rdma device. Change the
log levels and log messages to something more appropriate.

Change-Id: I75b919581c6738446dd2d5bddb7b7658a91efcf4
BUG: 1188232
Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com>
Reviewed-on: http://review.gluster.org/9559
Reviewed-by: Raghavendra Talur <rtalur@redhat.com>
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
Tested-by: Raghavendra G <rgowdapp@redhat.com>
2015-02-17 06:56:04 -08:00
Mohammed Rafi KC
55ce0ef667 rdma:read multiple wr from cq and ack them in one call
We are reading one work completion at a time
from the cq, though we can read multiple work completions
from the cq, and we can also acknowledge them in
one call. Both will give better performance
because fewer mutual exclusion locks are taken.
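
In libibverbs terms the pattern is roughly the following (a generic sketch;
channel, the batch size and handle_completion() are illustrative):

    #include <infiniband/verbs.h>

    struct ibv_cq *ev_cq;
    void          *ev_ctx;
    struct ibv_wc  wc[16];
    unsigned int   unacked = 0;
    int            n, i;

    /* One completion-channel event wakes us up ... */
    if (ibv_get_cq_event (channel, &ev_cq, &ev_ctx) == 0)
            unacked++;
    ibv_req_notify_cq (ev_cq, 0);

    /* ... we drain up to 16 completions per ibv_poll_cq() call ... */
    while ((n = ibv_poll_cq (ev_cq, 16, wc)) > 0) {
            for (i = 0; i < n; i++)
                    handle_completion (&wc[i]);
    }

    /* ... and acknowledge all collected events with a single call. */
    ibv_ack_cq_events (ev_cq, unacked);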

Change-Id: Ib5664cab25c87db7f575d482eee4dcd2b5005c04
BUG: 1164079
Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com>
Reviewed-on: http://review.gluster.org/9329
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
Tested-by: Raghavendra G <rgowdapp@redhat.com>
2015-02-17 06:29:23 -08:00
Mohammed Rafi KC
b8ed8da853 rdma: post multiple work request in a single call.
ibv_post_send allows sending multiple work requests
in a single call by posting them as a linked list.

So if the payload count is > 1, we can perform the data
operation in a single call to ibv_post_send.
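
Schematically (a generic verbs sketch; the sge variables and wr_id values are
illustrative and assumed to be set up elsewhere):

    #include <infiniband/verbs.h>
    #include <string.h>

    /* Chain two send work requests through .next and post both at once. */
    struct ibv_send_wr wr[2], *bad_wr = NULL;

    memset (wr, 0, sizeof (wr));
    wr[0].wr_id   = 1;
    wr[0].sg_list = &sge0;
    wr[0].num_sge = 1;
    wr[0].opcode  = IBV_WR_SEND;
    wr[0].next    = &wr[1];

    wr[1].wr_id   = 2;
    wr[1].sg_list = &sge1;
    wr[1].num_sge = 1;
    wr[1].opcode  = IBV_WR_SEND;
    wr[1].next    = NULL;

    if (ibv_post_send (qp, &wr[0], &bad_wr) != 0) {
            /* bad_wr points at the first request that failed to post */
    }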

Change-Id: Ib2e485cbbe6887919109e73e17d4fab595d5e65e
BUG: 1164079
Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com>
Reviewed-on: http://review.gluster.org/9327
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
Tested-by: Raghavendra G <rgowdapp@redhat.com>
2015-02-12 05:03:50 -08:00
Mohammed Rafi KC
60361d84a0 rdma : aggregate a vectored read as one
A vectored read with payload count > 1 will make two read
requests, and a single contiguous memory region is allocated to hold them.
So after completing the read request, instead of sending the reads as a
vector we will aggregate them into one.

Change-Id: I15e7d7bddc1a62d5097a39392575f47cfff3d3a8
BUG: 1164079
Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com>
Reviewed-on: http://review.gluster.org/9321
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
Tested-by: Raghavendra G <rgowdapp@redhat.com>
2015-02-12 04:44:24 -08:00
Vijaikumar M
c61074400a epoll: edge triggered and multi-threaded epoll
- edge triggered (oneshot) polling with epoll (see the sketch after this list)
- pick one event to avoid multiple events getting picked up by the same
  thread, and so get better distribution of events across multiple threads
- wire support for multiple poll threads to epoll_wait in parallel
- evdata to store absolute index and not hint for epoll
- store index and gen of slot instead of fd and index hint
- perform fd close asynchronously inside event.c for multithread safety
- poll is still single threaded
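
A minimal sketch of the oneshot pattern from the first item (plain epoll
usage, not the event-epoll.c code; epfd, fd and slot_index are placeholders):

    #include <sys/epoll.h>

    /* EPOLLONESHOT ensures only one thread picks up an event for this fd;
     * the evdata carries an absolute slot index rather than the fd itself. */
    struct epoll_event ev = { .events = EPOLLIN | EPOLLONESHOT,
                              .data.u64 = slot_index };
    epoll_ctl (epfd, EPOLL_CTL_ADD, fd, &ev);

    /* After the handler finishes, the fd must be explicitly re-armed. */
    ev.events = EPOLLIN | EPOLLONESHOT;
    epoll_ctl (epfd, EPOLL_CTL_MOD, fd, &ev);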

Change-Id: I536851dda0ab224c5d5a1b130a571397c9cace8f
BUG: 1104462
Signed-off-by: Anand Avati <avati@redhat.com>
Signed-off-by: Vijaikumar M <vmallika@redhat.com>
Signed-off-by: Jeff Darcy <jdarcy@redhat.com>
Signed-off-by: Shyam <srangana@redhat.com>
Reviewed-on: http://review.gluster.org/3842
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
2015-02-07 13:17:30 -08:00
Krishnan Parthasarathi
b3b4f9d81a rpc: fix ref leak in ping timer
Change-Id: I4ddc371d01ec763706a168a215410015ee2a3787
Signed-off-by: Krishnan Parthasarathi <kparthas@redhat.com>
Reviewed-on: http://review.gluster.org/9578
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
Tested-by: Raghavendra G <rgowdapp@redhat.com>
2015-02-04 22:12:09 -08:00
Aravinda VK
52765ad94f geo-rep: Adding Slave user field to georep status
A new column, "SLAVE USER", is introduced in the Status output;
the slave user is not "root" in a non-root Geo-replication setup.

An additional tag, <slave_user>, is added to the XML output.

BUG: 1180459
Change-Id: Ia48a5a8eb892ce883b9ec114be7bb2d46eff8535
Signed-off-by: Aravinda VK <avishwan@redhat.com>
Reviewed-on: http://review.gluster.org/9409
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Kotresh HR <khiremat@redhat.com>
Reviewed-by: Avra Sengupta <asengupt@redhat.com>
Reviewed-by: Venky Shankar <vshankar@redhat.com>
Tested-by: Venky Shankar <vshankar@redhat.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
Tested-by: Vijay Bellur <vbellur@redhat.com>
2015-02-02 12:19:07 -08:00
Jeff Darcy
0b9a6a63b5 socket: fix segfaults when TLS management connections fail
Change-Id: I1fd085b04ad1ee68c982d3736b322c19dd12e071
BUG: 1160900
Signed-off-by: Jeff Darcy <jdarcy@redhat.com>
Reviewed-on: http://review.gluster.org/9059
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Harshavardhana <harsha@harshavardhana.net>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
2015-01-27 06:03:58 -08:00
Pranith Kumar K
7510d8edf4 mgmt/glusterd: Implement Volume heal enable/disable
For volumes with replicate or disperse xlators, the self-heal daemon should do
the healing. This patch provides enable/disable functionality for the xlators to
be part of the self-heal daemon. Replicate already had this functionality with
'gluster volume set cluster.self-heal-daemon on/off', but this patch makes it
uniform for both types of volumes. Internally it still does 'volume set' based
on the volume type.

Change-Id: Ie0f3799b74c2afef9ac658ef3d50dce3e8072b29
BUG: 1177601
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
Reviewed-on: http://review.gluster.org/9358
Reviewed-by: Krishnan Parthasarathi <kparthas@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Xavier Hernandez <xhernandez@datalab.es>
Tested-by: Krishnan Parthasarathi <kparthas@redhat.com>
2015-01-20 02:24:24 -08:00
Ravishankar N
8beaf169e3 cluster/afr: split-brain resolution CLI
Extend the AFR heal command to include automated split-brain resolution.

This patch [3/3] is the final patch for afr automated split-brain resolution
implementation.

"gluster volume heal <VOLNAME> [full | statistics [heal-count [replica
<HOSTNAME:BRICKNAME>]] |info [healed | heal-failed | split-brain]| split-brain
{bigger-file <FILE> |source-brick <HOSTNAME:BRICKNAME> [<FILE>]}]"

The new additions being:
1.gluster volume heal <VOLNAME> split-brain bigger-file <FILE>
Locates the replica containing the FILE, selects bigger-file as source and
completes heal.

2.gluster volume heal <VOLNAME> split-brain source-brick <HOSTNAME:BRICKNAME>
<FILE>
Selects <FILE> present in <HOSTNAME:BRICKNAME> as source and completes heal.

3.gluster volume heal <VOLNAME> split-brain <HOSTNAME:BRICKNAME>
Selects all split-brained files in <HOSTNAME:BRICKNAME> as source and completes
heal.

Note: <FILE> can be either the full file name as seen from the root of the
volume (or) the gfid-string representation of the file, which sometimes gets
displayed in the heal info command's output.

Entry/gfid split-brain resolution is not supported.

Example can be found in the test case.

Change-Id: I4649733922d406f14f28ee9033a5cb627b9538b3
BUG: 1136769
Signed-off-by: Ravishankar N <ravishankar@redhat.com>
Reviewed-on: http://review.gluster.org/9377
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
Tested-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
2015-01-15 01:28:37 -08:00
Krishnan Parthasarathi
c4561b6fd9 rpc: initialise transport's list on creation
Initialising the transport's list, meant to hold clients connected to
it, on the first connection event is prone to race, especially with the
introduction of multi-threaded event layer.

BUG: 1181203
Change-Id: I6a20686a2012c1f49a279cc9cd55a03b8c7615fc
Signed-off-by: Krishnan Parthasarathi <kparthas@redhat.com>
Reviewed-on: http://review.gluster.org/9413
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
Tested-by: Raghavendra G <rgowdapp@redhat.com>
2015-01-15 00:06:34 -08:00
Harshavardhana
12c15f47fe build: FreeBSD 11-Current causes libtool to fail with '-shared'
Thanks to Markiyan Kushnir <markiyan.kushnir@gmail.com> for
reporting this.

Change-Id: I7f637295c7c2d54c33a4c16e29daf0b518874911
BUG: 1111774
Signed-off-by: Harshavardhana <harsha@harshavardhana.net>
Reviewed-on: http://review.gluster.org/9251
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Niels de Vos <ndevos@redhat.com>
2014-12-12 01:21:49 -08:00
Krishnan Parthasarathi
757394c1d7 rpc/rpcsvc: add peername to log messages
This would allow users/developers to associate rpc layer log messages
to the corresponding connection.

Change-Id: I040f79248dced7174a4364d9f995612ed3540dd4
Signed-off-by: Krishnan Parthasarathi <kparthas@redhat.com>
Reviewed-on: http://review.gluster.org/8535
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Rajesh Joseph <rjoseph@redhat.com>
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
Tested-by: Raghavendra G <rgowdapp@redhat.com>
2014-12-10 22:35:03 -08:00
Mohammed Rafi KC
92a2932201 rdma:vectored write fails for rdma.
An rdma write with a payload count greater than one
will fail due to insufficient memory to hold the
buffers in the rpc transport layer. It was expecting
only one vector in the payload, so it is only able
to decode the first iovec from the payload, and the
rest will be discarded.

Thanks to Raghavendra Gowdappa for fixing the
same.

Change-Id: I82a649a34abe6320d6216c8ce73e69d9b5e99326
BUG: 1171142
Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com>
Reviewed-on: http://review.gluster.org/9247
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
Tested-by: Raghavendra G <rgowdapp@redhat.com>
2014-12-07 20:37:43 -08:00
Mohammed Rafi KC
4a3c36ba00 rdma:client process will hang if server is started to send
the request before completing connection establishment

In rdma, the client and server exchange their available
buffers during the handshake to post incoming messages.
Initially the available buffer count is set to one for the first message
of the handshake; when the first message is received, the buffer quota
is set to the proper value. So if the server starts to send a message
before receiving that first message, the buffer reserved for the
handshake will be used up, and then the handshake will fail because
of a lack of buffers. So we should block the server from sending
messages before the connection is properly established.

Change-Id: I68ef44998f5df805265d3f42a5df7c31cb57f136
BUG: 1158746
Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com>
Reviewed-on: http://review.gluster.org/9003
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
Tested-by: Raghavendra G <rgowdapp@redhat.com>
2014-11-18 02:38:33 -08:00
Mohammed Rafi KC
85e92d160b rdma: client connection establishment takes more time
For an rdma-only type volume, client connection establishment
with the server takes more than three seconds. This is because
a tcp,rdma type volume has 2 ports, one for tcp and
one for rdma; the tcp port is stored with the brickname and the rdma
port is stored as "brickname.rdma" during pmap_signin.
During the handshake, when trying to get the brick port
for rdma clients, since we are not aware of the server
transport type, we will append '.rdma' to the brick name.
So for a tcp,rdma volume there will be an entry with
'.rdma', but this will fail for an rdma-only type volume.
So we will try again, this time without appending '.rdma',
using a flag variable need_different_port, and it will succeed,
but the reconnection happens only after 3 seconds.
In this patch, for an rdma-only type volume
we will append '.rdma' during the pmap_signin. So during the
handshake we will get the correct port on the first try itself.
Since we don't need to retry, we can remove the
need_different_port flag variable.

Change-Id: Ie8e3a7f532d4104829dbe995e99b35e95571466c
BUG: 1153569
Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com>
Reviewed-on: http://review.gluster.org/8934
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Krishnan Parthasarathi <kparthas@redhat.com>
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
Tested-by: Raghavendra G <rgowdapp@redhat.com>
2014-11-18 00:50:13 -08:00
Mohammed Rafi KC
43800dedb5 rdma:rdma fuse mount hangs for tcp,rdma volumes if brick is down.
When we try to mount a tcp,rdma volume over the rdma
transport using the FUSE protocol, the mount will
hang if the brick is down. When we kill the brick process,
a signal is received in the glusterfsd process and
it calls pmap_signout only with the port listening on tcp.
In the case of tcp,rdma there will be two ports,
and the port which is listening for rdma is not
signed out.
So the mount process will try to connect to a port
which is not open and it will keep trying to connect.
This patch calls pmap_signout for the rdma port also,
so when the mount tries to get the brick port, it will fail.

Change-Id: I23676f65f96eb90b69b76478f7a21412a6aba70f
BUG: 1143886
Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com>
Reviewed-on: http://review.gluster.org/8762
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Krishnan Parthasarathi <kparthas@redhat.com>
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
Tested-by: Raghavendra G <rgowdapp@redhat.com>
2014-11-17 23:30:55 -08:00
Mohammed Rafi KC
f645655c65 rdma: glusterd crash if rdma_disconnect is called as soon as connect a request.
We are initializing the connection on the server side immediately after
rdma_accept is called, but we are delaying adding the transport
to the listener list until the RDMA_CM_EVENT_ESTABLISHED event is received.
If disconnect is called before getting this event, glusterd will
try to remove from the list a transport which was never added. So if
the list is empty, it causes a glusterd crash. In this patch we
will call the function to initialize the connection as soon as
rdma_accept is called.

Change-Id: I019480297a85349ede3101ee9c7c1596dc5c73e2
Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com>
BUG: 1164079
Reviewed-on: http://review.gluster.org/8925
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
Tested-by: Raghavendra G <rgowdapp@redhat.com>
2014-11-14 04:03:59 -08:00
Emmanuel Dreyfus
77a6917a65 Build fix: xdrgen
As discovered in https://review.gluster.org/8762, BSD systems fail to
run xdrgen during the glusterfs build. This seems to be caused by a
difference between BSD make and GNU make with implicit targets. The
former seems to use an absolute path here, which means we should not
prepend it with the current directory path; otherwise we have the
directory path twice and the files cannot be found by make.

This is a second attempt after I178123bf6f3d9e963ff5b78839d498f530c05a97
which was broken and reverted in I3c8966288f66d0eafa2e94490e3b64a057b4f2c0

BUG: 1157839
Change-Id: I797c536c319a156b71a42c82cbaf80bbf17b7234
Signed-off-by: Emmanuel Dreyfus <manu@netbsd.org>
Reviewed-on: http://review.gluster.org/9046
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
2014-11-13 22:16:17 -08:00