887 Commits

Author SHA1 Message Date
Raghavendra G
b1186532c7 transport/socket: log shutdown msg occasionally
Change-Id: If3fc0884e7e2f45de2d278b98693b7a473220a5e
Signed-off-by: Raghavendra G <rgowdapp@redhat.com>
Fixes: bz#1679904
(cherry picked from commit ec1b84300fe267dd12c1e42e7e91905db935f1e2)
2019-04-16 13:38:58 +00:00
Kaleb S. KEITHLEY
27a96f1f34 rpclib: slow floating point math and libm
In release-6 rpc/rpc-lib (libgfrpc) added the function
get_rightmost_set_bit() which calls log2(3), a call that takes
a floating point parameter and returns a floating point.

It's used thusly:
    right_most_unset_bit = get_rightmost_set_bit(...);

(So is it really the right-most unset bit, or the right-most set bit?)

It's unclear to me whether this is in the data path or not. If it is,
it's rather scary to think about integer-to-float and float-to-integer
conversions and slow calls to libm functions in the data path.

gcc and clang have __builtin_ctz() which returns the same result as
get_rightmost_set_bit(), and does it substantially faster. Approx
20M iterations of get_rightmost_set_bit() took ~33sec of wall clock
time on my devel machine, while 20M iterations of __builtin_ctz()
took < 9sec; get_rightmost_set_bit() is 3x slower than __builtin_ctz().

And as a side benefit, we can again eliminate the need to link libgfrpc
with libm.
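
The equivalence claimed above can be checked with a small sketch like the one below. The function names are hypothetical, and the shift loop is only a libm-free reference that stands in for the original helper, whose exact implementation is not shown in this log. Note that `__builtin_ctz(0)` is undefined, so zero is handled explicitly.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical portable reference: shift until the lowest set bit
 * reaches position 0. Returns -1 for n == 0. */
static int rightmost_set_bit_loop(uint32_t n)
{
    int pos = 0;

    if (n == 0)
        return -1;
    while ((n & 1u) == 0) {
        n >>= 1;
        pos++;
    }
    return pos;
}

/* Same result via the gcc/clang builtin, with no floating point
 * and no libm. __builtin_ctz(0) is undefined, so guard it. */
static int rightmost_set_bit_ctz(uint32_t n)
{
    return n ? __builtin_ctz(n) : -1;
}
```

Both functions return the zero-based index of the lowest set bit, for example 3 for the value 8.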

Change-Id: If9e7e80874577c52223f8125b385fc930de20699
fixes: bz#1692957
Signed-off-by: Kaleb S. KEITHLEY <kkeithle@redhat.com>
2019-04-16 10:48:06 +00:00
Amar Tumballi
9c441360ac dict: handle STR_OLD data type in xdr conversions
Currently, dict conversion on the wire for the 3.x protocol happens using
`dict_unserialize()`, which sets the data type to STR_OLD. But the
new protocol doesn't send that type over the wire, as it's not considered
a valid format in new processes.

But since we deal with both the old and the new protocol during a rolling
upgrade, handling STR_OLD in the xdr conversions allows us to get all the
information properly with the new protocol.

Credits: Krutika Dhananjay

Fixes: bz#1686364
Change-Id: I165c0021fb195b399790b9cf14a7416ae75ec84f
Signed-off-by: Amar Tumballi <amarts@redhat.com>
2019-03-08 14:08:40 +00:00
Milind Changire
4cb1d6d94a socket: socket event handlers now return void
Problem:
Returning any value from socket event handlers to the event sub-system
doesn't make sense since event sub-system cannot handle socket
sub-system errors.

Solution:
Change return type of all socket event handlers to 'void'

mainline:
> Change-Id: I70dc2c57f12b7ea2fae41120f71aa0d7fe0b2b6f
> Fixes: bz#1651246
> Signed-off-by: Milind Changire <mchangir@redhat.com>
> Reviewed-on: https://review.gluster.org/c/glusterfs/+/22221

Change-Id: I70dc2c57f12b7ea2fae41120f71aa0d7fe0b2b6f
Fixes: bz#1683900
Signed-off-by: Milind Changire <mchangir@redhat.com>
(cherry picked from commit 776ba851c6ee6c265253d44cf1d6e4e3d4a21772)
2019-03-02 11:54:24 +00:00
Yaniv Kaul
c7d1aee76d Multiple files: reduce work while under lock.
Mostly, unlock before logging.
In some cases, moved different code that was not needed
to be under lock (for example, taking time, or malloc'ing)
to be executed before taking the lock.

Note: logging might be slightly less accurate in order, since it may
not be done now under the lock, so order of logs is racy. I think
it's a reasonable compromise.
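
The "unlock before logging" pattern can be sketched as follows. This is a minimal illustration, not code from the patch: `struct conn` and `conn_account()` are hypothetical names, and formatting into a buffer stands in for the real logging call.

```c
#include <assert.h>
#include <pthread.h>
#include <stdio.h>

/* Hypothetical shared state; not a GlusterFS structure. */
struct conn {
    pthread_mutex_t lock;
    int connected;
    unsigned long bytes_sent;
};

/* Update state under the lock, copy what the log line needs,
 * and format the message only after unlocking. */
static int conn_account(struct conn *c, unsigned long n, char *msg, size_t len)
{
    int connected;
    unsigned long total;

    pthread_mutex_lock(&c->lock);
    c->bytes_sent += n;
    connected = c->connected;
    total = c->bytes_sent;
    pthread_mutex_unlock(&c->lock);

    /* Slow work (formatting, I/O) happens outside the critical section,
     * so two threads may emit their messages in either order. */
    return snprintf(msg, len, "connected=%d total=%lu", connected, total);
}
```

The values in the message are still consistent with each other because they were copied under the lock; only the relative order of log lines across threads is no longer guaranteed.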

Compile-tested only!
updates: bz#1193929
Signed-off-by: Yaniv Kaul <ykaul@redhat.com>

Change-Id: I2438710016afc9f4f62a176ef1a0d3ed793b4f89
2019-01-29 09:27:22 +00:00
Mohit Agrawal
04f84756e1 core: heketi-cli is throwing error "target is busy"
Problem: When deleting a block hosting volume through heketi-cli,
         it throws the error "target is busy". The cli throws this
         error because the brick is not detached successfully, and
         the brick is not detached due to a race condition in cleaning
         up the xprt associated with the detached brick.

Solution: To avoid the xprt-specific race condition, introduce an atomic
          flag on rpc_transport
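
How an atomic flag can close this kind of race is sketched below with C11 atomics. This is an assumption about the general technique, not the patch itself: `struct xprt`, its fields, and `xprt_cleanup_once()` are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical transport; only the flag matters for this sketch. */
struct xprt {
    atomic_flag cleanup_started;
    int cleanups_run;
};

/* Returns true for exactly one caller, even if several threads race here. */
static bool xprt_cleanup_once(struct xprt *x)
{
    if (atomic_flag_test_and_set(&x->cleanup_started))
        return false; /* another caller already claimed the cleanup */
    x->cleanups_run++; /* the real cleanup work would go here */
    return true;
}
```

Because `atomic_flag_test_and_set()` reads and sets the flag in one indivisible step, two paths that both try to clean up the same transport cannot both win.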

Change-Id: Id4ff1fe8375a63be71fb3343f455190a1b8bb6d4
fixes: bz#1668190
Signed-off-by: Mohit Agrawal <moagrawal@redhat.com>
2019-01-24 06:54:41 +00:00
Amar Tumballi
990d6a99d4 core: move logs which are only developer relevant to DEBUG level
Earlier, we had changed the log level to DEBUG only in the release branch.
But considering that 90%+ of our deployments happen in the same environment,
we can look at these specific logs on an as-needed basis. With this change,
the master branch will be easier to debug, with fewer logs.

Change-Id: I4157a7ec7d5ec9c2948b2bbc1e4cb8317f28d6b8
Updates: bz#1666833
Signed-off-by: Amar Tumballi <amarts@redhat.com>
2019-01-23 16:05:52 +00:00
Milind Changire
b6c417785e rpc: use address-family option from vol file
This patch helps enable IPv6 connections in the cluster.
The default address-family is IPv4 without using this option explicitly.

When address-family is set to "inet6" in the /etc/glusterfs/glusterd.vol
file, the mount command-line also needs to have
-o xlator-option="transport.address-family=inet6" added to it.

This option also gets added to the brick command-line.
Snapshot and gfapi use-cases should also use this option to pass in the
inet6 address-family.

Change-Id: I97db91021af27bacb6d7578e33ea4817f66d7270
fixes: bz#1635863
Signed-off-by: Milind Changire <mchangir@redhat.com>
2019-01-22 13:47:19 +00:00
Zhang Huan
cd57145546 socket: don't pass return value from protocol handler to event handler
The event handler handles socket-level errors only, while the protocol
handler handles protocol-level errors. If the protocol handler decides to
disconnect on an error, it should call disconnect itself instead of
returning an error to the event handler.

Change-Id: I9375be98cc52cb969085333f3c7229a91207d1bd
updates: bz#1666143
Signed-off-by: Zhang Huan <zhanghuan@open-fs.com>
2019-01-22 06:59:37 +00:00
Zhang Huan
4d9935a4db socket: fix issue when socket read return with EAGAIN
When a socket read returns EAGAIN, a positive value (the remaining
vector count) is returned. This return value is passed all the way
back to the event handler, which then complains.

[2018-12-29 08:02:25.603199] T [socket.c:1640:__socket_read_simple_payload] 0-test-client-0-extra.0: partial read on non-blocking socket.
[2018-12-29 08:02:25.603201] T [rpc-clnt.c:654:rpc_clnt_reply_init] 0-test-client-2-extra.1: received rpc message (RPC XID: 0xfa6 Program: GlusterFS 4.x v1, ProgVers: 400, Proc: 12) from rpc-transport (test-client-2-extra.1)
[2018-12-29 08:02:25.603207] T [socket.c:3129:socket_event_handler] 0-test-client-0-extra.0: (sock:32) socket_event_poll_in returned 1

Formerly, socket_proto_state_machine used the return value of socket_readv
to check whether the message had been read in full. With this commit, it
checks whether all the bytes indicated in the header have been read in.
This way, only 0 and -1 are returned from socket_proto_state_machine(),
indicating whether there is an error in the underlying socket.
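
The idea of judging completion by byte count, while returning only 0 or -1 to the caller, can be sketched like this. It is a simplified model under stated assumptions: `struct msg_reader`, `reader_poll_in()` and `reader_done()` are hypothetical, and a plain `read(2)` stands in for the vectored socket read.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical per-message read state; expected must not exceed sizeof(buf). */
struct msg_reader {
    char buf[64];
    size_t expected; /* total bytes announced by the header */
    size_t have;     /* bytes read so far */
};

/* Returns 0 while the descriptor is healthy (message complete or not yet),
 * -1 on EOF or a real error. A partial read is never reported as failure. */
static int reader_poll_in(int fd, struct msg_reader *r)
{
    while (r->have < r->expected) {
        ssize_t n = read(fd, r->buf + r->have, r->expected - r->have);

        if (n > 0) {
            r->have += (size_t)n;
        } else if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK)) {
            return 0; /* nothing more for now; wait for the next event */
        } else if (n < 0 && errno == EINTR) {
            continue;
        } else {
            return -1; /* EOF or socket-level error */
        }
    }
    return 0;
}

/* Completion is decided by byte count, not by the return value above. */
static int reader_done(const struct msg_reader *r)
{
    return r->have == r->expected;
}
```

With this split, the event handler only ever sees 0 or -1, and the question "is the message complete?" is answered separately.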

Change-Id: I8be0d178b049f0720d738a03aec41c4b375d2972
updates: bz#1666143
Signed-off-by: Zhang Huan <zhanghuan@open-fs.com>
2019-01-22 06:52:30 +00:00
Zhang Huan
0301a66bda socket: fix issue when socket write return with EAGAIN
When a socket write returns EAGAIN, the remaining vector count
is returned all the way back to the event handler. This makes the follow-up
pollin event skip handling and the dispatch loop complain about a failure,
even though a temporary write failure is not an error.

[2018-12-29 07:31:41.772310] E [MSGID: 101191] [event-epoll.c:674:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch handler

Change-Id: Idf03d120b5f7619eda19720a583cbcc3e7da2504
updates: bz#1666143
Signed-off-by: Zhang Huan <zhanghuan@open-fs.com>
2019-01-17 08:29:48 +00:00
Zhang Huan
22778ca889 socket: fix counting of socket total_bytes_read and total_bytes_write
Change-Id: If35d0dbae963facf00ab6bcf07c6e4d1706ed982
updates: bz#1666143
Signed-off-by: Zhang Huan <zhanghuan@open-fs.com>
2019-01-17 08:29:32 +00:00
Amar Tumballi
37653efdc7 Revert "iobuf: Get rid of pre allocated iobuf_pool and use per thread mem pool"
This reverts commit b87c397091.

There seems to be some performance regression with the patch and hence recommended to have it reverted.

Updates: #325
Change-Id: Id85d6203173a44fad6cf51d39b3e96f37afcec09
2019-01-08 11:16:03 +00:00
Kinglong Mee
8b4822d457 rpc-clnt: reduce transport connect log for EINPROGRESS
quotad and ganesha.nfsd print many log messages like this:

[rpc-clnt.c:1739:rpc_clnt_submit ] 0-<VOLUME_NAME>-quota: error returned while attempting to connect to host: (null), port 0

Change-Id: Ic0c815400619e4a87a772a51b19822920228c1ef
Updates: bz#1596787
Signed-off-by: Kinglong Mee <mijinlong@open-fs.com>
2019-01-07 03:19:34 +00:00
Pranith Kumar K
4a15ea1fd3 rpcsvc: Don't expect dictionary values to be available
When reconfigure happens, string values from one dictionary
are set directly in another dictionary. This can lead to
invalid memory access when the first dictionary is freed.
So use dict_set_dynstr_with_alloc instead of dict_set_str.
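
The difference between borrowing a string and storing a private copy can be sketched with a toy container. Everything here is hypothetical (`struct toy_dict` and the `toy_*` helpers are not the GlusterFS dict API); the sketch only illustrates why copying avoids the dangling pointer.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy single-slot "dictionary"; not the GlusterFS dict_t. */
struct toy_dict {
    char *value;
    int owns_value;
};

/* Borrow the caller's pointer: dangles once the source is freed. */
static void toy_set_borrowed(struct toy_dict *d, char *v)
{
    d->value = v;
    d->owns_value = 0;
}

/* Store a private copy: valid regardless of the source's lifetime. */
static int toy_set_copy(struct toy_dict *d, const char *v)
{
    size_t len = strlen(v) + 1;
    char *copy = malloc(len);

    if (!copy)
        return -1;
    memcpy(copy, v, len);
    d->value = copy;
    d->owns_value = 1;
    return 0;
}

static void toy_clear(struct toy_dict *d)
{
    if (d->owns_value)
        free(d->value);
    d->value = NULL;
    d->owns_value = 0;
}
```

After `toy_set_copy()`, freeing the source string leaves the stored value intact, which is the property the fix relies on.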

updates bz#1650403
Change-Id: Id53236467521cfdeb07e7178d87ba6cf88d17003
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
2019-01-07 03:17:49 +00:00
Sheetal Pamecha
8d38c5b733 rpc/rpc-lib: fix coverity issue
Defect: Code can never be reached because the
condition queue_index > 1024 can never be true.

CID: 1398471 Logically dead code
updates: bz#789278

Change-Id: I367cda7e734f6d774900a58d8664cffcab69126f
Signed-off-by: Sheetal Pamecha <sheetal.pamecha08@gmail.com>
2018-12-28 10:40:49 +00:00
Sunny Kumar
de1fb17ac3 rpc : fix coverity in rpc/rpc-lib/src/rpcsvc.c
This patch fixes newly introduced coverity.

CID: 1398472: Dereference before null check.
updates: bz#789278

Change-Id: Ie9b13084097de8f24b138acd7608c3e15b3bba9c
Signed-off-by: Sunny Kumar <sunkumar@redhat.com>
2018-12-28 10:31:53 +00:00
Krutika Dhananjay
a1e7acc93a socket: Remove redundant in_lock in incoming message handling
A given epoll thread can handle only one incoming (POLLIN) request.
And until the socket is rearmed for listening, it is guaranteed that
there won't be any new incoming requests. As a result, the priv->in_lock
which guards the socket proto state machine seems redundant.

This patch removes priv->in_lock.

Change-Id: I26b6ddd852aba8c10385833b85ffd2e53e46cb8c
updates: bz#1467614
Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com>
2018-12-26 05:15:08 +00:00
Mohit Agrawal
0c7425d431 rpc: Use adaptive mutex in rpcsvc_program_register
Adaptive mutexes are used to protect critical/shared data items that
are held for short periods. They provide a balance between spin locks
and traditional mutexes. We observed some improvement after using an
adaptive mutex in rpcsvc_program_register.

Change-Id: I7905744b32516ac4e4ca3c83c2e8e5e306093add
fixes: bz#1660701
2018-12-20 16:44:26 +00:00
Rinku Kothiya
e3ec41af9a rdma: fix possible buffer overflow
Used snprintf instead of sprintf, and logged a warning message if the
source string is bigger than the destination.

clang warning: ‘%s’ directive writing up to 1024 bytes into a region
of size 108.
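
A minimal sketch of the approach described above, assuming a 108-byte destination as in the warning. `copy_path()` and `PATH_BUF_LEN` are hypothetical names, and `fprintf(stderr, ...)` stands in for the project's logging call.

```c
#include <assert.h>
#include <stdio.h>

#define PATH_BUF_LEN 108 /* matches the destination size in the warning */

/* Returns 0 if src fit, 1 if it was truncated, -1 on an output error.
 * Assumes len > 0; dst is then always NUL-terminated. */
static int copy_path(char *dst, size_t len, const char *src)
{
    int n = snprintf(dst, len, "%s", src);

    if (n < 0)
        return -1;
    if ((size_t)n >= len) {
        /* snprintf reports the length it wanted to write, so a value
         * of len or more means the output was cut short. */
        fprintf(stderr, "warning: path truncated to %zu bytes\n", len - 1);
        return 1;
    }
    return 0;
}
```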

updates: bz#1622665

Change-Id: Ia5e7c53d35d8178dd2c75708698599fe8bded5de
Signed-off-by: Rinku Kothiya <rkothiya@redhat.com>
2018-12-19 11:24:57 +00:00
Poornima G
b87c397091 iobuf: Get rid of pre allocated iobuf_pool and use per thread mem pool
The current implementation of iobuf_pool has two problems:
- prealloc of 12.5MB memory, this limits the scale factor of the gluster
  processes due to RAM requirements
- lock contention, as the current implementation has one global
  iobuf_pool lock. Credits for debugging and addressing the same goes to
  Krutika Dhananjay <kdhananj@redhat.com>. Issue: #410

Hence changing the iobuf implementation to use per thread mem pool.
This may theoretically appear to cause a perf dip as there is no preallocation.
But per thread mem pool will not have significant perf impact as the last
allocated memory is kept alive for subsequent allocs, for some time.
The worst case would be if iobufs requested are of random sizes each time.
The best case is, if we get iobuf request of the same size. From the perf
tests, this patch did not seem to cause any perf decrease.

Note that, with this patch, the rdma performance is going to degrade
drastically. In one of the previous patchsets we had fixes to not
degrade rdma perf, but rdma is not supported and also not tested [1].
Hence the decision was to not have code in rdma that is not tested
and not supported.

[1] https://lists.gluster.org/pipermail/gluster-users.old/2018-July/034400.html

Updates: #325
Change-Id: Ic2ef3bd498f9250dea25f25ba0c01fde19584b27
Signed-off-by: Poornima G <pgurusid@redhat.com>
2018-12-18 09:35:24 +00:00
ShyamsundarR
bfe2b5e153 clang: Fix various missing checks for empty list
When using list_for_each_entry(_safe) functions, care needs
to be taken that the lists passed in are not empty, as these
functions are not empty-list safe.

clang scan reported various points where this pattern
could be caught, and this patch fixes the same.

Additionally the following changes are present in this patch,
- Added an explicit op_ret setting in error case in the
macro MAKE_INODE_HANDLE to address another clang issue reported
- Minor refactoring of some functions in quota code, to address
possible allocation failures in certain functions (which in turn
cause possible empty lists to be passed around)

Change-Id: I1e761a8d218708f714effb56fa643df2a3ea2cc7
Updates: bz#1622665
Signed-off-by: ShyamsundarR <srangana@redhat.com>
2018-12-14 04:33:15 +00:00
Mohit Agrawal
fb917bf10b [geo-rep]: Worker still ACTIVE after killing bricks
Problem: In the changelog xlator, after destroying the listener it calls
         unlink to delete the changelog socket file, but the socket file
         reference is not cleaned up from process memory

Solution: 1) To clean up the reference completely from process memory,
             serialize transport cleanup for changelog and then
             unlink the socket file
          2) Brick xlator will notify GF_EVENT_PARENT_DOWN to the next
             xlator only after cleaning up all xprts

Test: To test the same, run the steps below
      1) Set up some volume and enable brick mux
      2) Kill any one brick with gf_attach
      3) Check the changelog socket specific to the killed brick
         in lsof; it should be cleaned up completely

fixes: bz#1600145

Change-Id: Iba06cbf77d8a87b34a60fce50f6d8c0d427fa491
Signed-off-by: Mohit Agrawal <moagrawal@redhat.com>
2018-12-13 04:46:50 +00:00
Mohit Agrawal
607bbd935f rpc: Resolve memory leak in mgmt_pmap_signout_cbk
Problem: When submitting a signout request to mgmt,
         rpc_clnt_mgmt_pmap_signout creates a frame, but the
         frame is not destroyed in the cbk

Solution: Clean up the frame in mgmt_pmap_signout_cbk to avoid the leak

Change-Id: I9961cacb2e02c8023c4c99e22e299b8729c2b09f
fixes: bz#1658045
Signed-off-by: Mohit Agrawal <moagrawal@redhat.com>
2018-12-12 22:58:49 +00:00
Raghavendra Bhat
7dadea15c5 copy_file_range support in GlusterFS
* libglusterfs changes to add new fop

    * Fuse changes:
      - Changes in fuse bridge xlator to receive and send responses

    * posix changes to perform the op on the backend filesystem

    * protocol and rpc changes for sending and receiving the fop

    * gfapi changes for performing the fop

    * tools: glfs-copy-file-range tool for testing copy_file_range fop

      - Although copy_file_range support has been added to the upstream
        fuse kernel module, no release has been made yet of a kernel
        which contains the support. It is expected to come in the
        upcoming release of linux-4.20.

        So, as of now, executing the copy_file_range fop on a fuse-based
        filesystem results in the fuse kernel module sending read on the
        source fd and write on the destination fd.

        Therefore a small gfapi-based tool has been written to be able
        to test the copy_file_range fop. This tool is similar (in functionality)
        to the example program given in the copy_file_range man page.

        So, running regular copy_file_range on a fuse mount point and
        running the gfapi-based glfs-copy-file-range tool gives some idea
        of how fast copy_file_range (or reflink) can be.

        On the local machine this was the result obtained.

	    mount -t glusterfs workstation:new /mnt/glusterfs
	    [root@workstation ~]# cd /mnt/glusterfs/
	    [root@workstation glusterfs]# ls
	    file
	    [root@workstation glusterfs]# cd
	    [root@workstation ~]# time /tmp/a.out /mnt/glusterfs/file /mnt/glusterfs/new
	    real  0m6.495s
	    user  0m0.000s
	    sys   0m1.439s
	    [root@workstation ~]# time glfs-copy-file-range $(hostname) new /tmp/glfs.log /file /rrr
	    OPEN_SRC: opening /file is success
	    OPEN_DST: opening /rrr is success
	    FSTAT_SRC: fstat on /rrr is success
	    copy_file_range successful

        real  0m0.309s
        user  0m0.039s
        sys   0m0.017s

        This tool needs following arguments
         1) hostname
         2) volume name
         3) log file path
         4) source file path (relative to the gluster volume root)
         5) destination file path (relative to the gluster volume root)

        "glfs-copy-file-range <hostname> <volume> <log file path> <source> <destination>"

      - Added a testcase as well to run glfs-copy-file-range tool

    * io-stats changes to capture the fop for profiling

    * NOTE:

      - Added conditional check to see whether the copy_file_range syscall
        is available or not. If not, then return ENOSYS.

      - Added conditional check for kernel minor version in fuse_kernel.h
        and fuse-bridge while referring to copy_file_range. And the kernel
        minor version is kept as it is. i.e. 24. Increment it in future
        when there is a kernel release which contains the support for
        copy_file_range fop in fuse kernel module.

    * The document which contains a writeup on this enhancement can be found at
      https://docs.google.com/document/d/1BSILbXr_knynNwxSyyu503JoTz5QFM_4suNIh2WwrSc/edit

Change-Id: I280069c814dd21ce6ec3be00a884fc24ab692367
updates: #536
Signed-off-by: Raghavendra Bhat <raghavendra@redhat.com>
2018-12-12 15:56:55 +00:00
ShyamsundarR
20ef211cfa libglusterfs: Move devel headers under glusterfs directory
libglusterfs devel package headers are referenced in code using
include semantics for a program's own headers. While this works, it can
be better, especially when dealing with out-of-tree xlator builds or
out-of-tree devel package usage in general.

Towards this, the following changes are done,
- moved all devel headers under a glusterfs directory
- Included these headers using system header notation <> in all
code outside of libglusterfs
- Included these headers using own program notation "" within
libglusterfs

This change although big, is just moving around the headers and
making it correct when including these headers from other sources.

This helps us correctly include libglusterfs includes without
namespace conflicts.

Change-Id: Id2a98854e671a7ee5d73be44da5ba1a74252423b
Updates: bz#1193929
Signed-off-by: ShyamsundarR <srangana@redhat.com>
2018-12-05 21:47:04 +00:00
Xie Changlong
ad446dabb8 protocol/server: support server.all-squash
We still use gnfs on our side, so do a little work to support
server.all-squash. Just like server.root-squash, it's also a
volume wide option. Also see bz#1285126

$ gluster volume set <VOLNAME> server.all-squash on

Note: If you enable server.root-squash and server.all-squash
at the same time, only server.all-squash takes effect. Please refer
to the following table

+---------------+-----------------+---------------------------+
|               |all_squash       | no_all_squash             |
+-------------------------------------------------------------+
|               |                 |anonuid/anongid for root   |
|root_squash    |anonuid/anongid  |useruid/usergid for no-root|
+-------------------------------------------------------------+
|no_root_squash |anonuid/anongid  |useruid/usergid            |
+-------------------------------------------------------------+

Updates bz#1285126
Signed-off-by: Xie Changlong <xiechanglong@cmss.chinamobile.com>
Signed-off-by: Xue Chuanyu <xuechuanyu@cmss.chinamobile.com>
Change-Id: Iea043318fe6e9a75fa92b396737985062a26b47e
2018-12-05 21:45:49 +00:00
Sheetal Pamecha
65dc176e7c rpc-transport/socket: NULL pointer dereferencing clang fix
Problem: res->ai_addr could be NULL

Added a check to address this issue

Change-Id: Iac88a8d6dc1f009836554448afbc228df93decd6
Updates: bz#1622665
Signed-off-by: Sheetal Pamecha <sheetal.pamecha08@gmail.com>
2018-12-05 09:14:27 +00:00
Raghavendra Bhat
9fc6cf898b rpc: check if fini is there before calling it
The rpc_transport_t structure is allocated and filled in the
rpc_transport_load function. If filling the fields of the rpc
structure fails, then in the failure handling the structure is
freed by rpc_transport_cleanup. There, it unconditionally calls
fini. But, if the failure handling was invoked because of any
failure in between the allocation of rpc_transport_t and filling
the transport->fini (including the failure to fill fini ()), then
rpc_transport_cleanup can lead to a segfault.
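
The guard described above amounts to checking the function pointer before calling it. The sketch below uses a hypothetical, trimmed-down structure (`struct toy_transport`), not the real rpc_transport_t.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, trimmed-down transport object. */
struct toy_transport {
    void (*fini)(struct toy_transport *);
    int *fini_calls; /* lets the caller observe whether fini ran */
};

static void toy_fini(struct toy_transport *t)
{
    if (t->fini_calls)
        (*t->fini_calls)++;
}

/* Safe on a partially initialised object: fini may still be NULL
 * if loading failed before the pointer was filled in. */
static void toy_transport_cleanup(struct toy_transport *t)
{
    if (!t)
        return;
    if (t->fini)
        t->fini(t);
    free(t);
}
```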

Change-Id: I8be9b84cd6b19933c559c9736198a6e440373f68
fixes: bz#1654917
Signed-off-by: Raghavendra Bhat <raghavendra@redhat.com>
2018-12-04 02:27:03 +00:00
Mohit Agrawal
46c15ea8fa server: Resolve memory leak path in server_init
Problem: 1) server_init does not clean up allocated resources
            when it fails, before returning an error
         2) dict leak at the time of graph destroying

Solution: 1) Free resources in case server_init fails
          2) Take dict_ref of graph xlator before destroying
             the graph to avoid leak

Change-Id: I9e31e156b9ed6bebe622745a8be0e470774e3d15
fixes: bz#1654917
Signed-off-by: Mohit Agrawal <moagrawal@redhat.com>
2018-12-03 11:34:35 +00:00
Yaniv Kaul
b56bf714c1 rpc *.h files: align structs
Make an effort to slightly better align the structures.

Change-Id: I6f80a451f2ffbf15adfb986cedc24c2799787b49
updates: bz#1193929
Signed-off-by: Yaniv Kaul <ykaul@redhat.com>
2018-12-03 05:53:34 +00:00
Yaniv Kaul
98a672f504 Multiple xlator .h files: remove unused private gf_* memory types.
It seems there were quite a few unused enums (that in turn
cause unneeded memory allocation) in some xlators.
I've removed them, hopefully not causing any damage.

Compile-tested only!

updates: bz#1193929
Signed-off-by: Yaniv Kaul <ykaul@redhat.com>

Change-Id: I8252bd763dc1506e2d922496d896cd2fc0886ea7
2018-11-30 11:51:18 +00:00
Raghavendra Gowdappa
95e380eca1 rpcsvc: provide each request handler thread its own queue
A single global per program queue is contended by all request handler
threads and event threads. This can lead to high contention. So,
reduce the contention by providing each request handler thread its own
private queue.

Thanks to "Manoj Pillai"<mpillai@redhat.com> for the idea of pairing a
single queue with a fixed request-handler-thread and event-thread,
which brought down the performance regression due to overhead of
queuing significantly.

Thanks to "Xavi Hernandez"<xhernandez@redhat.com> for discussion on
how to communicate the event-thread death to request-handler-thread.

Thanks to "Karan Sandha"<ksandha@redhat.com> for voluntarily running
the perf benchmarks to qualify that performance regression introduced
by ping-timer-fixes is fixed with this patch and patiently running
many iterations of regression tests while RCAing the issue.

Thanks to "Milind Changire"<mchangir@redhat.com> for patiently running
the many iterations of perf benchmarking tests while RCAing the
regression caused by ping-timer-expiry fixes.

Change-Id: I578c3fc67713f4234bd3abbec5d3fbba19059ea5
Fixes: bz#1644629
Signed-off-by: Raghavendra Gowdappa <rgowdapp@redhat.com>
2018-11-29 01:19:12 +00:00
Raghavendra Gowdappa
18b6d7ce7d libglusterfs: rename macros roof and floor to not conflict with math.h
Change-Id: I666eeb63ebd000711b3f793b948d4e0c04b1a242
Signed-off-by: Raghavendra Gowdappa <rgowdapp@redhat.com>
Updates: bz#1644629
2018-11-28 15:11:59 +00:00
Xavi Hernandez
a0fdc9202c core: create a constant for default network timeout
A new constant named GF_NETWORK_TIMEOUT has been defined and all
references to the hard-coded timeout of 42 seconds have been
replaced with this constant.

Change-Id: Id30f5ce4f1230f9288d9e300538624bcf1a6da27
fixes: bz#1652852
Signed-off-by: Xavi Hernandez <xhernandez@redhat.com>
2018-11-23 11:05:02 +01:00
Milind Changire
da81c9938e rpc: stop log flooding about ENODATA
Problem:
Logs are being flooded with ENODATA errors.
This log was introduced via https://review.gluster.org/c/glusterfs/+/21481

Solution:
Add a flag to remember that ENODATA error was logged for a
socket/transport
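
A "log once per socket" flag of the kind described in the solution could look like the sketch below. `struct toy_sock` and `log_sock_error()` are hypothetical, and `fprintf(stderr, ...)` stands in for the project's logging call.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical per-socket state. */
struct toy_sock {
    bool enodata_logged;
};

/* Returns true if a message was emitted, false if it was suppressed. */
static bool log_sock_error(struct toy_sock *s, int err)
{
    if (err == ENODATA) {
        if (s->enodata_logged)
            return false; /* already reported for this socket */
        s->enodata_logged = true;
    }
    fprintf(stderr, "socket error: %d\n", err);
    return true;
}
```

Other errors are still logged every time; only repeated ENODATA reports on the same socket are dropped.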

Change-Id: I54c10b87e46c2592339cc8b966333b8d08331750
fixes: bz#1650389
Signed-off-by: Milind Changire <mchangir@redhat.com>
2018-11-23 04:52:09 +00:00
Kaleb S. KEITHLE
4a4ba1f2eb core: fix strncpy warnings
Since gcc-8.2.x (fedora-28 or so) gcc has been emitting warnings
about buggy use of strncpy.

Most uses that gcc warns about in our sources are exactly backwards;
the 'limit' or len is the strlen/size of the _source param_, giving
exactly zero protection against overruns. (Which was, after all, one
of the points of using strncpy in the first place.)

IOW, many warnings are about uses that look approximately like this:
    ...
    char dest[8];
    char src[] = "this is a string longer than eight chars";
    ...
    strncpy (dest, src, sizeof(src)); /* boom */
    ...

The len/limit should be sizeof(dest).

Note: the above example has a definite over-run. In our source the
overrun is typically only theoretical (but possibly exploitable.)

Also strncpy doesn't null-terminate on truncation; snprintf does; prefer
snprintf over strncpy.
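
For contrast with the buggy call above, here is a sketch of two bounded copies that use the destination size as the limit. The helper names are hypothetical. The strncpy variant has to terminate the buffer by hand, while snprintf does so itself.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* strncpy bounded by the destination; it does not NUL-terminate on
 * truncation, so the last byte is set explicitly. Assumes dest_len > 0. */
static void copy_strncpy(char *dest, size_t dest_len, const char *src)
{
    strncpy(dest, src, dest_len - 1);
    dest[dest_len - 1] = '\0';
}

/* snprintf bounded by the destination; always NUL-terminates
 * when dest_len > 0. */
static void copy_snprintf(char *dest, size_t dest_len, const char *src)
{
    snprintf(dest, dest_len, "%s", src);
}
```

Called with `sizeof(dest)` for an 8-byte buffer, both helpers leave the first seven characters of the source followed by a terminator.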

Mildly surprising that coverity doesn't warn/isn't warning about this.

Change-Id: I022d5c6346a751e181ad44d9a099531c1172626e
updates: bz#1193929
Signed-off-by: Kaleb S. KEITHLE <kkeithle@redhat.com>
2018-11-15 05:06:59 +00:00
Yaniv Kaul
8a5adc8116 rpc/rpc-lib/src/rpc-clnt.c: unlock sooner, if we fail to connect.
Previously, we did not go to unlock the mutex if we failed
to connect. This patch fixes it.

Compile-tested only!

updates: bz#1193929
Signed-off-by: Yaniv Kaul <ykaul@redhat.com>

Change-Id: I0fcca066a2601dba6bc3e9eb8b3c9fc757ffe4db
2018-11-15 05:04:13 +00:00
Prashanth Pai
54a5f0a078 glusterfsd: Make each multiplexed brick sign in
NOTE: This change will be consumed by brick mux implementation of
glusterd2 only. No corresponding change in glusterd1 has been made.

When a multiplexed brick process is shutting down, it sends sign out
requests to glusterd for all bricks that it contains. However, sign in
request is only sent for a single brick. Consequently, glusterd has to
use some tricky means to repopulate pmap registry with information of
multiplexed bricks during glusterd restart.

This change makes each multiplexed brick send a sign in request to
glusterd2 which ensures that glusterd2 can easily repopulate pmap
registry with port information.

As a bonus, sign in request will now also contain PID of the brick
sending the request so that glusterd2 can rely on this instead of
having to read/manage brick pidfiles.

Change-Id: I409501515bd9a28ee7a960faca080e97cabe5858
updates: bz#1193929
Signed-off-by: Prashanth Pai <ppai@redhat.com>
2018-11-12 04:09:51 +00:00
Yaniv Kaul
e134ef2493 rpc-clnt*: several code changes to reduce conn lock times
Assorted code refactoring to reduce lock contention.
Also, took the opportunity to reorder structs more properly.
Removed dead code.

Hopefully, no functional changes.
Compile-tested only!

updates: bz#1193929
Signed-off-by: Yaniv Kaul <ykaul@redhat.com>

Change-Id: I5de6124ad071fd5e2c31832364d602b5f6d6fe28
2018-11-12 03:25:02 +00:00
Yaniv Kaul
cac2dba48b libglusterfs multiple files: remove dead initialization
Per newer GCC releases and clang-scan, some trivial
dead initializations (values that were set but never
read) were removed.

Compile-tested only!

updates: bz#1193929
Signed-off-by: Yaniv Kaul <ykaul@redhat.com>

Change-Id: Ia9959b2ff87d2e9cb46864e68ffe7dccb984db34
2018-11-11 16:06:29 +00:00
Amar Tumballi
74e8328d3f all: fix the format string exceptions
Currently, there are a few places where a user-controlled string
(like a filename, program parameter, etc.) can be passed as 'fmt' to
printf(), which can lead to a segfault if the user's string contains
'%s' or '%d'.

While fixing it, it makes sense to add an explicit check for such issues
across the codebase, by making the format call properly.
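
The safe pattern is to keep the format a literal and pass user data only as an argument. The helper below is a hypothetical illustration, not code from the patch.

```c
#include <assert.h>
#include <stdio.h>

/* Unsafe pattern (do not use): snprintf(out, len, user);
 * any '%s' or '%d' inside user would be interpreted as a directive
 * and read arguments that were never passed. */

/* Safe pattern: the format is a literal, user data is only an argument,
 * so '%' characters in it are copied verbatim. */
static int format_user_string(char *out, size_t len, const char *user)
{
    return snprintf(out, len, "%s", user);
}
```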

Fixes: CVE-2018-14661

Fixes: bz#1644763
Change-Id: Ib547293f2d9eb618594cbff0df3b9c800e88bde4
Signed-off-by: Amar Tumballi <amarts@redhat.com>
2018-11-05 18:50:59 +00:00
Shwetha Acharya
0c835893fd rpc-transport/socket: NULL pointer dereferencing clang fix
Problem: ctx and res can be NULL.

Solution: introduced a VALIDATE_OR_GOTO statement, hence removed
the null check for ctx; added a check for res.

Updates: bz#1622665

Change-Id: Ifee4c73e260530ab44c0a34c5ff5568f38f92c94
Signed-off-by: Shwetha Acharya <sacharya@redhat.com>
2018-10-30 05:08:04 +00:00
Milind Changire
196a2258ac transport: log socket closures more verbose
Problem:
Intentional and unintentional socket closures cannot be identified

Solution:
Log intentional socket closures with at least INFO log level

Change-Id: Ic02c882b16ab2193e57f8c3e6c3a82c4fe0f6875
fixes: bz#1642800
Signed-off-by: Milind Changire <mchangir@redhat.com>
2018-10-26 04:07:01 +00:00
Harpreet Lalwani
874ce6ef6e rpc/rpc-lib: Uninitialized argument value of a function
trav->saved_at.tv_sec is not initialized.

Calling "list_empty" function before initializing "trav".

Updates: bz#1622665

Change-Id: Ib5c2703a07a9c56ccd115001aca500f7a23c4a2e
Signed-off-by: Harpreet Lalwani <hlalwani@redhat.com>
2018-10-23 08:03:54 +00:00
Bhumika Goyal
c5f5ce2a9b rdma: coverity fixes
Fixes CID: 1382442 1382415 1382379 1382355

Change-Id: Ia712e37cb5a6db452d3178386394f87f83b85d38
updates: bz#789278
Signed-off-by: Bhumika Goyal <bgoyal@redhat.com>
2018-10-21 05:51:41 +00:00
Krishnan Parthasarathi
62faf7d37b socket: use accept4/paccept for nonblocking socket
This reduces the no. of syscalls on Linux systems from 2, accept(2) and
fcntl(2) for setting O_NONBLOCK, to a single accept4(2).  On NetBSD, we
have paccept(2) that does the same, if we leave signal masking aside.

Added sys_accept, which takes an extra flags argument compared to accept(2).
This opportunistically uses accept4/paccept as available.  It
falls back to accept(2) and fcntl(2) otherwise.

While at this, the patch sets the FD_CLOEXEC flag on the accepted socket fd.
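
The two code paths can be sketched as below. This is not the sys_accept() from the patch: `accept_nonblock_cloexec()` and `set_nonblock_cloexec()` are hypothetical names, the NetBSD paccept(2) branch is omitted, and the `__linux__` check is a simplifying assumption about where accept4(2) is available.

```c
#define _GNU_SOURCE /* for accept4() on glibc */
#include <assert.h>
#include <fcntl.h>
#include <sys/socket.h>
#include <unistd.h>

/* Fallback path: extra fcntl() calls after the descriptor exists. */
static int set_nonblock_cloexec(int fd)
{
    int fl = fcntl(fd, F_GETFL);

    if (fl < 0 || fcntl(fd, F_SETFL, fl | O_NONBLOCK) < 0)
        return -1;
    fl = fcntl(fd, F_GETFD);
    if (fl < 0 || fcntl(fd, F_SETFD, fl | FD_CLOEXEC) < 0)
        return -1;
    return 0;
}

/* Hypothetical wrapper: one syscall where accept4() exists,
 * accept() plus fcntl() elsewhere. */
static int accept_nonblock_cloexec(int lfd, struct sockaddr *addr,
                                   socklen_t *len)
{
#if defined(__linux__)
    return accept4(lfd, addr, len, SOCK_NONBLOCK | SOCK_CLOEXEC);
#else
    int fd = accept(lfd, addr, len);

    if (fd >= 0 && set_nonblock_cloexec(fd) < 0) {
        close(fd);
        return -1;
    }
    return fd;
#endif
}
```

Besides saving syscalls, setting the flags atomically at accept time avoids a window in which a concurrent fork/exec could inherit the descriptor before FD_CLOEXEC is applied.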

BUG: 1236272
Change-Id: I41e43fd3e36d6dabb07e578a1cea7f45b7b4e37f
fixes: bz#1236272
Signed-off-by: Krishnan Parthasarathi <kparthas@redhat.com>
2018-10-12 02:11:38 +00:00
Krishnan Parthasarathi
23e96fd93c socket: set FD_CLOEXEC on all sockets
For more information, see http://udrepper.livejournal.com/20407.html

BUG: 1236272
Change-Id: I25a645c10bdbe733a81d53cb714eb036251f8129
fixes: bz#1236272
Signed-off-by: Krishnan Parthasarathi <kparthas@redhat.com>
2018-10-11 05:15:28 +00:00
Kinglong Mee
0f2113cb0c socket: clear return value if error is going to be handled in event thread
Change-Id: Ibce94f282b0aafaa1ca60ab927a469b70595e81f
updates: bz#1626313
Signed-off-by: Zhang Huan <zhanghuan@open-fs.com>
2018-10-10 05:51:57 +00:00
Milind Changire
3108fb24e7 rpc: coverity fixes
CID: [1] 1394646 Unchecked return value from library
CID: [2] 1394633 Unused value
CID:     1382443 Sleeping while holding a lock [This is intentional]

[1] https://scan6.coverity.com/reports.htm#v40014/p10714/fileInstanceId=86159112&defectInstanceId=26360786&mergedDefectId=1394646
[2] https://scan6.coverity.com/reports.htm#v40014/p10714/fileInstanceId=86159365&defectInstanceId=26360919&mergedDefectId=1394633

Change-Id: I03086f7a9672c9f50a2bc44cdbce0006c887357b
updates: bz#789278
Signed-off-by: Milind Changire <mchangir@redhat.com>
2018-10-09 02:25:51 +00:00