1601 Commits

Author SHA1 Message Date
Kaleb S. KEITHLEY
a14475fa83 core: assorted typos and spelling mistakes from Debian lintian
Plus minor readability improvements.

Reported-by: pmatthaei@debian.org

Change-Id: I5393819a2fc9f240a19811143bb57b127df717cf
BUG: 1466785
Signed-off-by: Kaleb S. KEITHLEY <kkeithle@redhat.com>
Reviewed-on: https://review.gluster.org/17660
Smoke: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Shyamsundar Ranganathan <srangana@redhat.com>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
2017-07-03 12:47:13 +00:00
Jeff Darcy
5a67bc07db libglusterfs: add mem_pools_fini
This also makes mem_pools_init and mem_pools_fini re-callable, so GFAPI
can go through infinite init/fini cycles if they want to.  Not saying
that's a good idea, but at least it's safe.

Change-Id: I617913410bcff54568b802cb653f48bdd533bd65
Signed-off-by: Jeff Darcy <jdarcy@fb.com>
Reviewed-on: https://review.gluster.org/17662
Smoke: Gluster Build System <jenkins@build.gluster.org>
Tested-by: Jeff Darcy <jeff@pl.atyp.us>
Reviewed-by: Niels de Vos <ndevos@redhat.com>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Jeff Darcy <jeff@pl.atyp.us>
2017-07-01 02:56:14 +00:00
Jeff Darcy
0be8e038d9 multiple: fix struct/typedef inconsistencies
The most common pattern, both in our code and elsewhere, is this:

   struct _xyz {
      ...
   };
   typedef struct _xyz xyz_t;

These exceptions - especially call_frame/call_stack - have been slowing
down code navigation for years.  By converging on a single pattern,
navigating from xyz_t in code to the actual definition of struct _xyz
(i.e. without having to visit the typedef first) might even be
automatable.

Change-Id: I0e5dd1f51f98e000173c62ef4ddc5b21d9ec44ed
Signed-off-by: Jeff Darcy <jdarcy@fb.com>
Reviewed-on: https://review.gluster.org/17650
Smoke: Gluster Build System <jenkins@build.gluster.org>
Tested-by: Jeff Darcy <jeff@pl.atyp.us>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Amar Tumballi <amarts@redhat.com>
Reviewed-by: Niels de Vos <ndevos@redhat.com>
2017-06-30 10:31:02 +00:00
Kotresh HR
292b4e42fd contrib/xxhash: Add xxhash library
xxhash is a faster non-cryptographic hash.
https://github.com/Cyan4973/xxHash

Release Taken: "xxHash v0.6.2"
--------------

Files added:
  contrib/xxhash/xxhash.c
  contrib/xxhash/xxhash.h
  contrib/xxhash/xxhsum.c

Modifications to source:
------------------------
Following functions and data types got 'GF_' prefix
as below to avoid any form of name collisions in future.

    ---- Functions ----
    GF_XXH_versionNumber
    GF_XXH32
    GF_XXH32_createState
    GF_XXH32_freeState
    GF_XXH32_copyState
    GF_XXH32_reset
    GF_XXH32_update
    GF_XXH32_digest
    GF_XXH32_canonicalFromHash
    GF_XXH32_hashFromCanonical
    GF_XXH64
    GF_XXH64_createState
    GF_XXH64_freeState
    GF_XXH64_copyState
    GF_XXH64_reset
    GF_XXH64_update
    GF_XXH64_digest
    GF_XXH64_canonicalFromHash
    GF_XXH64_hashFromCanonical

    ---- Data Types ----
    GF_XXH_errorcode
    GF_XXH32_state_t*
    GF_XXH32_canonical_t*
    GF_XXH32_hash_t
    GF_XXH64_state_t*
    GF_XXH64_canonical_t*
    GF_XXH64_hash_t

It is linked with libglusterfs.so. A wrapper
funtion is also added for the easy usage in
common-utils.c.

xxhash can be used for the all the usecases where
a faster non-cryptographic hash is required.
gfid to path infra would be using this for now.

NOTE:
----
The gluster coding guidelines check is ignored
as maintaining it further would be difficult.

Updates: #253
Change-Id: Ib143f90d91d4ee99864a10246d5983e92900173b
Signed-off-by: Kotresh HR <khiremat@redhat.com>
Reviewed-on: https://review.gluster.org/17641
Smoke: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Amar Tumballi <amarts@redhat.com>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
Reviewed-by: Niels de Vos <ndevos@redhat.com>
2017-06-30 08:16:57 +00:00
Aravinda VK
c0f1db29a2 logging: Support for Structured logging format
Wrapper for `gf_log` and `gf_msg` to add support for
structured logging format.

Two new wrappers available `gf_slog` and `gf_smsg`

Example 1: All static details

    gf_slog ("cli", GF_LOG_INFO, "Volume Set",
             "name=gv1",
             "option=changelog.changelog",
             "value=on",
             NULL);

    gf_smsg ("cli", GF_LOG_INFO, 0, MSGID_VOLUME_SET,
             "Volume Set",
             "name=gv1",
             "option=changelog.changelog",
             "value=on",
             NULL);

Example 2: Using Format chars in key values

    gf_slog ("cli", GF_LOG_INFO, "Volume Set",
             "name=%s", volume_name,
             "option=%s", option_name,
             "value=%s", option_value,
             NULL);

    gf_smsg ("cli", GF_LOG_INFO, 0, MSGID_VOLUME_SET,
             "Volume Set",
             "name=%s", volume_name,
             "option=%s", option_name,
             "value=%s", option_value,
             NULL);

Formats as,

    <EVENT><TAB><KEY1=VALUE1><TAB><KEY2=VALUE2>...

Example:

    Volume Set	name=gv1	option=changelog.changelog	value=on

Updates: #240
Change-Id: I871727be16a39f681d41f363daa0029b8066fb52
Signed-off-by: Aravinda VK <avishwan@redhat.com>
Reviewed-on: https://review.gluster.org/17543
Reviewed-by: MOHIT AGRAWAL <moagrawa@redhat.com>
Smoke: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Amar Tumballi <amarts@redhat.com>
Tested-by: Amar Tumballi <amarts@redhat.com>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
2017-06-29 11:55:39 +00:00
Jeff Darcy
e9c5b61861 libglusterfs: fix disable-mempool
Change-Id: I55f707ae1e7c3ad7fc0545f7aa657584cead58f9
BUG: 1465214
Signed-off-by: Jeff Darcy <jdarcy@fb.com>
Reviewed-on: https://review.gluster.org/17636
Smoke: Gluster Build System <jenkins@build.gluster.org>
Tested-by: Jeff Darcy <jeff@pl.atyp.us>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Ji-Hyeon Gim
Reviewed-by: Amar Tumballi <amarts@redhat.com>
2017-06-28 23:27:15 +00:00
Ji-Hyeon Gim
68cb9356d1 libglusterfs: build failed with GF_DISABLE_MEMPOOL
When we build GlusterFS with GF_DISBLE_MEMPOOL, it is failed due to macro
condition in mem-pool.c:mem_get().

Change-Id: I03fe804f93d761ea3bfdc3b20f0253a03350a68f
BUG: 1465214
Signed-off-by: Ji-Hyeon Gim <potatogim@potatogim.net>
Reviewed-on: https://review.gluster.org/17633
Smoke: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
Reviewed-by: jiffin tony Thottan <jthottan@redhat.com>
Tested-by: Ji-Hyeon Gim
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Niels de Vos <ndevos@redhat.com>
2017-06-27 12:48:54 +00:00
Susant Palai
4700c5be55 cluster/rebalance: Use GF_XATTR_LIST_NODE_UUIDS_KEY to figure out local subvols.
Afr has introduced a new key GF_XATTR_LIST_NODE_UUIDS_KEY,
through which rebalance will figure out its local subvolumes.(Reference
bugid=1463250)

key: GF_XATTR_NODE_UUID_KEY will continue to serve it's old
purpose of returning the first afr chiild.

test: prove tests/basic/distribute/rebal-all-nodes-migrate.t

Change-Id: I4d602feda2a05b29d2210c712a07a4ac6b8bc112
BUG: 1463648
Signed-off-by: Susant Palai <spalai@redhat.com>
Signed-off-by: N Balachandran <nbalacha@redhat.com>
Reviewed-on: https://review.gluster.org/17595
Smoke: Gluster Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Amar Tumballi <amarts@redhat.com>
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
Reviewed-by: Shyamsundar Ranganathan <srangana@redhat.com>
2017-06-26 10:47:00 +00:00
Sunil Kumar Acharya
0c0bc42ddf cluster/ec: Node uuid xattr support update for EC
Problem:
The change in EC to return list of node uuids for
GF_XATTR_NODE_UUID_KEY was causing problems with
geo-rep.

Fix:
This patch will allow to get the single node uuid
as it was doing before with the key
"GF_XATTR_NODE_UUID_KEY", and will also allow to get
the list of node uuids by using a new key
"GF_XATTR_LIST_NODE_UUIDS_KEY". This will solve
the problem with geo-rep and any other features which
were depending on this.

BUG: 1462790
Change-Id: I2d9214a9658d4a41a3d6de08600884d2bda5f3eb
Signed-off-by: Sunil Kumar Acharya <sheggodu@redhat.com>
Reviewed-on: https://review.gluster.org/17594
Smoke: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Xavier Hernandez <xhernandez@datalab.es>
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
2017-06-23 02:56:35 +00:00
Jiffin Tony Thottan
a052b41324 gfapi : Resolve "." and ".." only for named lookups
The patch https://review.gluster.org/#/c/17177 resolves "." and ".."
to corrosponding inodes and names before sending the request to the
backend server. But this will only work if inode and its parent is
linked properly. Incase of nameless lookup(applications like ganesha)
the inode of parent can be NULL(only gfid is send). So this patch will
resolve "." and ".." only if proper parent is available

Change-Id: I4c50258b0d896dabf000a547ab180b57df308a0b
BUG: 1460514
Signed-off-by: Jiffin Tony Thottan <jthottan@redhat.com>
Reviewed-on: https://review.gluster.org/17502
Smoke: Gluster Build System <jenkins@build.gluster.org>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
Reviewed-by: Poornima G <pgurusid@redhat.com>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: soumya k <skoduri@redhat.com>
Reviewed-by: Jeff Darcy <jeff@pl.atyp.us>
2017-06-20 12:32:20 +00:00
karthik-us
475ec9928e cluster/afr: Returning single and list of node uuids from AFR
Problem:
The change in afr to return list of node uuids was causing problems
with geo-rep.

Fix:
This patch will allow to get the single node uuid as it was doing
before with the key "GF_XATTR_NODE_UUID_KEY", and will also allow
to get the list of node uuids by using a new key
"GF_XATTR_LIST_NODE_UUIDS_KEY". This will solve the problem with
geo-rep and any other feature which were depending on this.

Change-Id: I09885dac6dfca127be94b708470c8c2941356f9a
BUG: 1462790
Signed-off-by: karthik-us <ksubrahm@redhat.com>
Reviewed-on: https://review.gluster.org/17576
Smoke: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Ravishankar N <ravishankar@redhat.com>
Reviewed-by: Kotresh HR <khiremat@redhat.com>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Jeff Darcy <jeff@pl.atyp.us>
2017-06-20 12:32:08 +00:00
Krutika Dhananjay
b9fb7ea63d debug/io-stats: Provide option to select stats output format
... as opposed to hardcoding it to "json" always.

Change-Id: I5e79473a514373145ad764f24bb6219a6983a4c6
BUG: 1458197
Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com>
Reviewed-on: https://review.gluster.org/17451
Smoke: Gluster Build System <jenkins@build.gluster.org>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
Reviewed-by: Niels de Vos <ndevos@redhat.com>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Shyamsundar Ranganathan <srangana@redhat.com>
2017-06-15 19:18:05 +00:00
Mohammed Rafi KC
0a20e56d07 protocol/server: make listen backlog value as configurable
problem:

When we call listen from protocol/server, we are giving a
hard coded valie of 10 if it is not manually given.
With multiplexing, especially when glusterd restarts all
clients may try to connect to the server at a time.
Which will result in overflowing the queue, and kernel
will complain about the errors.

Solution:

This patch will introduce a volume set command to make backlog
value as a configurable. This patch also changes the default
values for backlog from 10 to 128. This changes is only applicable
for sockets listening from protocol.

Example:

gluster volume set <volname> transport.listen-backlog 1024

Note: 1 Brick has to be restarted to get this value in effect
      2 This changes won't be reflected in glusterd, or other
        xlators which calls listen. If you need, you have to
        add this option to the volfile.

Change-Id: I0c5a2bbf28b5db612f9979e7560e05dd82b41477
BUG: 1456405
Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com>
Reviewed-on: https://review.gluster.org/17411
Smoke: Gluster Build System <jenkins@build.gluster.org>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
Reviewed-by: Raghavendra Talur <rtalur@redhat.com>
Reviewed-by: Atin Mukherjee <amukherj@redhat.com>
Reviewed-by: Niels de Vos <ndevos@redhat.com>
Reviewed-by: Jeff Darcy <jeff@pl.atyp.us>
2017-06-08 15:32:30 +00:00
Amar Tumballi
d7105ba165 core: add more information on dictionary usage
when you take the 'statedump', it shows the output like below

-----
[dict]
max-number-of-dict-pairs=13
total-pairs-used=41613
total-dict-used=12629
average-pairs-per-dict=3
------

Updates #220

Change-Id: I71a7eda3a3cd23edf4483234f22f983923bbb081
Signed-off-by: Amar Tumballi <amarts@redhat.com>
Reviewed-on: https://review.gluster.org/4035
Smoke: Gluster Build System <jenkins@build.gluster.org>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Jeff Darcy <jeff@pl.atyp.us>
2017-06-05 12:44:28 +00:00
Amar Tumballi
244daeb77b dict: add a simple hash comparision of keys before strcmp for performance
Updates #220

Change-Id: I03b1d2fac2dfcdd21bdf4e4fff19d49425699931
Signed-off-by: Amar Tumballi <amarts@redhat.com>
Reviewed-on: https://review.gluster.org/6450
Smoke: Gluster Build System <jenkins@build.gluster.org>
Tested-by: Jeff Darcy <jeff@pl.atyp.us>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Jeff Darcy <jeff@pl.atyp.us>
2017-06-01 11:26:33 +00:00
Mohit Agrawal
dba55ae364 glusterfs: Not able to mount running volume after enable brick mux and stopped any volume
Problem: After enabled brick mux if any volume has down and then try ot run mount
         with running volume , mount command is hung.

Solution: After enable brick mux server has shared one data structure server_conf
          for all associated subvolumes.After down any subvolume in some
          ungraceful manner (remove brick directory) posix xlator sends
          GF_EVENT_CHILD_DOWN event to parent xlatros and server notify
          updates the child_up to false in server_conf.When client is trying
          to communicate with server through mount it checks conf->child_up
          and it is FALSE so it throws message "translator are not yet ready".
          From this patch updated structure server_conf to save child_up status
          for xlator wise. Another improtant correction from this patch is
          cleanup threads from server side xlators after stop the volume.

BUG: 1453977
Change-Id: Ic54da3f01881b7c9429ce92cc569236eb1d43e0d
Signed-off-by: Mohit Agrawal <moagrawa@redhat.com>
Reviewed-on: https://review.gluster.org/17356
Smoke: Gluster Build System <jenkins@build.gluster.org>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
Reviewed-by: Raghavendra Talur <rtalur@redhat.com>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Jeff Darcy <jeff@pl.atyp.us>
2017-05-31 20:43:53 +00:00
Ji-Hyeon Gim
6544347166 libglusterfs: updates old comment for 'arena_size'
the comment (libglusterfs/src/iobuf.h:88-90) for 'arena_size' field
in 'struct iobuf_arena' is not valid anymore. According to line 190 in
__iobuf_arena_alloc() no longer follows that equation.

Change-Id: I68558164b309123cf19093e2da89bc156df294fd
BUG: 1455831
Signed-off-by: Ji-Hyeon Gim <potatogim@gluesys.com>
Reviewed-on: https://review.gluster.org/17393
Smoke: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Amar Tumballi <amarts@redhat.com>
Tested-by: Shyamsundar Ranganathan <srangana@redhat.com>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
2017-05-28 13:30:55 +00:00
Gaurav Yadav
23930326e0 libglusterfs : Fix crash in glusterd while peer probing
glusterd crashes when port is being set explcitly to a
range which is outside greater than short data type range.
Eg. sysctl net.ipv4.ip_local_reserved_ports="49152-49156"
In above case glusterd crashes while parsing the port.

With this fix glusterd will be able to handle port range
between INT_MIN to INT_MAX

Change-Id: I7c75ee67937b0e3384502973d96b1c36c89e0fe1
BUG: 1454418
Signed-off-by: Gaurav Yadav <gyadav@redhat.com>
Reviewed-on: https://review.gluster.org/17359
Smoke: Gluster Build System <jenkins@build.gluster.org>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Samikshan Bairagya <samikshan@gmail.com>
Reviewed-by: Atin Mukherjee <amukherj@redhat.com>
Reviewed-by: Niels de Vos <ndevos@redhat.com>
Reviewed-by: Jeff Darcy <jeff@pl.atyp.us>
2017-05-26 12:08:36 +00:00
Csaba Henk
98db583e9b libglusterfs: extract some functionality to functions
- code in run.c to close all file descriptors,
  except for specified ones is extracted to

    int close_fds_except (int *fdv, size_t count);

- tokenizing and editing a string that consists
  of comma-separated tokens (as done eg. in
  mount_param_to_flag() of contrib/fuse/mount.c
  is abstacted into the following API:

    char *token_iter_init (char *str, char sep, token_iter_t *tit);
    gf_boolean_t next_token (char **tokenp, token_iter_t *tit);
    void drop_token (char *token, token_iter_t *tit);

Updates #153

Change-Id: I7cb5bda38f680f08882e2a7ef84f9142ffaa54eb
Signed-off-by: Csaba Henk <csaba@redhat.com>
Reviewed-on: https://review.gluster.org/17229
Smoke: Gluster Build System <jenkins@build.gluster.org>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Amar Tumballi <amarts@redhat.com>
2017-05-23 13:21:20 +00:00
Sunil Kumar Acharya
acf8cc3a22 cluster/ec: Implement FALLOCATE FOP for EC
FALLOCATE file operations is not implemented in the
existing EC code. This change set implements it
for EC.

BUG: 1448293
Change-Id: Id9ed914db984c327c16878a5b2304a0ea461b623
Signed-off-by: Sunil Kumar Acharya <sheggodu@redhat.com>
Reviewed-on: https://review.gluster.org/15200
Smoke: Gluster Build System <jenkins@build.gluster.org>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
2017-05-23 07:13:06 +00:00
Poornima G
284cd8851b nl-cache: In case of nameless operations do not cache
Issue:
In nameless lookup/other fops, parent inode will be NULL, when we try
to add the cache to the NULL inode, it causes a crash.

Hence handle the scenario of nameless fops, and do not cache/serve
the nameless fops.

Change-Id: I3b90f882ac89e6aaf3419db89e6f890797f37700
BUG: 1451588
Signed-off-by: Poornima G <pgurusid@redhat.com>
Reviewed-on: https://review.gluster.org/17316
Smoke: Gluster Build System <jenkins@build.gluster.org>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
2017-05-22 12:39:59 +00:00
Atin Mukherjee
86ad032949 glusterfsd: send PARENT_UP on brick attach
With brick multiplexing being enabled, if a brick is instance attached to a
process then a PARENT_UP event is needed so that it reaches right till
posix layer and then from posix CHILD_UP event is sent back to all the
children.

Change-Id: Ic341086adb3bbbde0342af518e1b273dd2f669b9
BUG: 1447389
Signed-off-by: Atin Mukherjee <amukherj@redhat.com>
Reviewed-on: https://review.gluster.org/17225
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
Smoke: Gluster Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Jeff Darcy <jeff@pl.atyp.us>
2017-05-14 21:10:29 +00:00
Mohammed Rafi KC
269e2ccf45 gfapi: fix handling of dot and double dot in path
This patch is to handle "." and ".." in file path. Which means
this special dentry names will be resolved before sending fops
on the path.

Change-Id: I5e92f6d1ad1412bf432eb2488e53fb7731edb013
BUG: 1447266
Signed-off-by: Mohammed Rafi KC <rkavunga@redhat.com>
Reviewed-on: https://review.gluster.org/17177
Smoke: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Niels de Vos <ndevos@redhat.com>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
Reviewed-by: Jeff Darcy <jeff@pl.atyp.us>
2017-05-14 13:13:59 +00:00
Raghavendra G
cea8b70250 event/epoll: Add back socket for polling of events immediately after
reading the entire rpc message from the wire

Currently socket is added back for future events after higher layers
(rpc, xlators etc) have processed the message. If message processing
involves signficant delay (as in writev replies processed by Erasure
Coding), performance takes hit. Hence this patch modifies
transport/socket to add back the socket for polling of events
immediately after reading the entire rpc message, but before
notification to higher layers.

credits: Thanks to "Kotresh Hiremath Ravishankar"
         <khiremat@redhat.com> for assitance in fixing a regression in
         bitrot caused by this patch.

Change-Id: I04b6b9d0b51a1cfb86ecac3c3d87a5f388cf5800
BUG: 1448364
Signed-off-by: Raghavendra G <rgowdapp@redhat.com>
Reviewed-on: https://review.gluster.org/15036
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
Smoke: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Amar Tumballi <amarts@redhat.com>
2017-05-12 05:26:42 +00:00
Zhou Zhengping
333474e0d6 libglusterfs: fix race condition in client_ctx_set
follow procedures:
	1.thread1 client_ctx_get return NULL
	2.thread 2 client_ctx_set ctx1 ok
	3.thread1 client_ctx_set ctx2 ok

	thread1 use ctx1, thread2 use ctx2 and ctx1 will leak

Change-Id: I990b02905edd1b3179323ada56888f852d20f538
BUG: 1449232
Signed-off-by: Zhou Zhengping <johnzzpcrystal@gmail.com>
Reviewed-on: https://review.gluster.org/17219
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
Smoke: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Jeff Darcy <jeff@pl.atyp.us>
2017-05-12 04:47:44 +00:00
Kinglong Mee
4b03314e14 libglusterfs: include sys/time.h avoid compiling error on MacOSX
Change-Id: I3d30bacc3d5d085220dd85a3919207deef8bd1dd
Signed-off-by: Kinglong Mee <mijinlong@open-fs.com>
Reviewed-on: https://review.gluster.org/17114
Smoke: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Amar Tumballi <amarts@redhat.com>
Reviewed-by: Prashanth Pai <ppai@redhat.com>
Tested-by: Prashanth Pai <ppai@redhat.com>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
2017-05-09 15:20:25 +00:00
Zhou Zhengping
0f552c509f libglusterfs/graph: fix potential endless loop in options validate work
Change-Id: Icb71ded6051afe44e07480e0499d2a39f05fac71
BUG: 1447826
Signed-off-by: Zhou Zhengping <johnzzpcrystal@gmail.com>
Reviewed-on: https://review.gluster.org/17171
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Amar Tumballi <amarts@redhat.com>
Smoke: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Niels de Vos <ndevos@redhat.com>
2017-05-09 12:57:05 +00:00
Mohit Agrawal
21c7f7bacc glusterd: socketfile & pidfile related fixes for brick multiplexing feature
Problem: While brick-muliplexing is on after restarting glusterd, CLI is
         not showing pid of all brick processes in all volumes.

Solution: While brick-mux is on all local brick process communicated through one
          UNIX socket but as per current code (glusterd_brick_start) it is trying
          to communicate with separate UNIX socket for each volume which is populated
          based on brick-name and vol-name.Because of multiplexing design only one
          UNIX socket is opened so it is throwing poller error and not able to
          fetch correct status of brick process through cli process.
          To resolve the problem write a new function glusterd_set_socket_filepath_for_mux
          that will call by glusterd_brick_start to validate about the existence of socketpath.
          To avoid the continuous EPOLLERR erros in  logs update socket_connect code.

Test:     To reproduce the issue followed below steps
          1) Create two distributed volumes(dist1 and dist2)
          2) Set cluster.brick-multiplex is on
          3) kill glusterd
          4) run command gluster v status
          After apply the patch it shows correct pid for all volumes

BUG: 1444596
Change-Id: I5d10af69dea0d0ca19511f43870f34295a54a4d2
Signed-off-by: Mohit Agrawal <moagrawa@redhat.com>
Reviewed-on: https://review.gluster.org/17101
Smoke: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Prashanth Pai <ppai@redhat.com>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Atin Mukherjee <amukherj@redhat.com>
2017-05-09 01:30:01 +00:00
Csaba Henk
0d8923d6d7 libglusterfs: stop special casing "cache-size" in size_t validation
The original situation was as follows:

The function that validates xlator options indicating a size,
xlator_option_validate_sizet(), handles the case when the name
of the option is "cache-size" in a special way.

- Xlator options (things of type volume_option_t) has a
  min and max attribute of type double.
- An xlator option is endowed with a gluster specific type (not
  C type). An instance of an xlator option goes through a validation
  process by a type specific validator function (which are collected
  in option.c).
- Validators of numeric types - size being one of them - make use the
  min and max attributes to perform a range check, except in one case:
  if an option is defined with min = max = 0, then this option will be
  exempt of range checking. (Note: the volume_option_t definition
  features the following comments along the min, max fields:

   double                  min;  /* 0 means no range */
   double                  max;  /* 0 means no range */

  which is slightly misleading as it lets one to conclude that
  zeroing min or max buys exemption from low or high boundary check,
  which is not true -- only *both* being zero buys exemption.)
- Besides this, the validator for options of size type,
  xlator_option_validate_sizet() special cases options
  named "cache-size" so that only min is enforced. (The only consequence
  of a value exceeding max is that glusterd logs a warning about it, but
  the cli user who makes such a setting gets no feedback on it.)
- This was introduced because a hard coded limit is not useful for
  io-cache and quick-read. They rather use a runtime calculated
  upper limit. (See changes
  I7dd4d8c53051b89a293696abf1ee8dc237e39a20
  I9c744b5ace10604d5a814e6218ca0d83c796db80
  about the last two points.)
- As an unintended consequence, the upper limit check of
  cache-size of write-behind, for which a conventional hard coded limit
  is specified, is defeated.

What we do about it:

- Remove the special casing clause for cache-size in
  xlator_option_validate_sizet. Thus the general range
  check policy (as described above) will apply to
  cache-size too.
- To implement a lower bound only check by the validator
  for cache-size of io-cache and quick-read, change the
  max attribute of these options to INFINITY.

The only behavioral difference is the omission of the warnings
about cache-size of io-cache and quick-read exceeding the former max
values. (They were rather heuristic anyway.)

BUG: 1445609
Change-Id: I0bd8bd391fa7d926f76e214a2178833fe4673b4a
Signed-off-by: Csaba Henk <csaba@redhat.com>
Reviewed-on: https://review.gluster.org/17125
Smoke: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Amar Tumballi <amarts@redhat.com>
Tested-by: Raghavendra G <rgowdapp@redhat.com>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
2017-05-08 13:37:54 +00:00
Kaushal M
de80a9153e rpc: Remove accidental IPV6 changes
They snuck in with the HALO patch (07cc8679c)

Change-Id: I8ced6cbb0b49554fc9d348c453d4d5da00f981f6
BUG: 1447953
Signed-off-by: Kaushal M <kaushal@redhat.com>
Reviewed-on: https://review.gluster.org/17174
Smoke: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Niels de Vos <ndevos@redhat.com>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Amar Tumballi <amarts@redhat.com>
2017-05-05 11:39:28 +00:00
Zhou Zhengping
9f930646ee fd: We should use dict_set_uint64 to store fd->pid
Change-Id: I136372b9929d3ecf243649b6945571a67bfd80eb
BUG: 1447828
Signed-off-by: Zhou Zhengping <johnzzpcrystal@gmail.com>
Reviewed-on: https://review.gluster.org/17172
Smoke: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Amar Tumballi <amarts@redhat.com>
Tested-by: Amar Tumballi <amarts@redhat.com>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
2017-05-04 15:42:08 +00:00
Manikandan Selvaganesh
6484558c75 SELinux : implementation of SELinux translator
The patch implement a part of SELinux translator to support setting
SELinux contexts on files in a glusterfs volume.

URL: https://github.com/gluster/glusterfs-specs/blob/master/accepted/SELinux-client-support.md

Change-Id: Id8916bd8e064ccf74ba86225ead95f86dc5a1a25
BUG: 1318100
Fixes : #55
Signed-off-by: Manikandan Selvaganesh <mselvaga@redhat.com>
Signed-off-by: Jiffin Tony Thottan <jthottan@redhat.com>
Signed-off-by: Niels de Vos <ndevos@redhat.com>
Reviewed-on: https://review.gluster.org/13762
Smoke: Gluster Build System <jenkins@build.gluster.org>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Manikandan Selvaganesh <manikandancs333@gmail.com>
Reviewed-by: Atin Mukherjee <amukherj@redhat.com>
2017-05-03 09:34:11 +00:00
Kevin Vigor
07cc8679cd Halo Replication feature for AFR translator
Summary:
Halo Geo-replication is a feature which allows Gluster or NFS clients to write
locally to their region (as defined by a latency "halo" or threshold if you
like), and have their writes asynchronously propagate from their origin to the
rest of the cluster.  Clients can also write synchronously to the cluster
simply by specifying a halo-latency which is very large (e.g. 10seconds) which
will include all bricks.

In other words, it allows clients to decide at mount time if they desire
synchronous or asynchronous IO into a cluster and the cluster can support both
of these modes to any number of clients simultaneously.

There are a few new volume options due to this feature:
  halo-shd-latency:  The threshold below which self-heal daemons will
  consider children (bricks) connected.

  halo-nfsd-latency: The threshold below which NFS daemons will consider
  children (bricks) connected.

  halo-latency: The threshold below which all other clients will
  consider children (bricks) connected.

  halo-min-replicas: The minimum number of replicas which are to
  be enforced regardless of latency specified in the above 3 options.
  If the number of children falls below this threshold the next
  best (chosen by latency) shall be swapped in.

New FUSE mount options:
  halo-latency & halo-min-replicas: As descripted above.

This feature combined with multi-threaded SHD support (D1271745) results in
some pretty cool geo-replication possibilities.

Operational Notes:
- Global consistency is gaurenteed for synchronous clients, this is provided by
  the existing entry-locking mechanism.
- Asynchronous clients on the other hand and merely consistent to their region.
  Writes & deletes will be protected via entry-locks as usual preventing
  concurrent writes into files which are undergoing replication.  Read operations
  on the other hand should never block.
- Writes are allowed from _any_ region and propagated from the origin to all
  other regions.  The take away from this is care should be taken to ensure
  multiple writers do not write the same files resulting in a gfid split-brain
  which will require resolution via split-brain policies (majority, mtime &
  size).  Recommended method for preventing this is using the nfs-auth feature to
  define which region for each share has RW permissions, tiers not in the origin
  region should have RO perms.

TODO:
- Synchronous clients (including the SHD) should choose clients from their own
  region as preferred sources for reads.  Most of the plumbing is in place for
  this via the child_latency array.
- Better GFID split brain handling & better dent type split brain handling
  (i.e. create a trash can and move the offending files into it).
- Tagging in addition to latency as a means of defining which children you wish
  to synchronously write to

Test Plan:
- The usual suspects, clang, gcc w/ address sanitizer & valgrind
- Prove tests

Reviewers: jackl, dph, cjh, meyering

Reviewed By: meyering

Subscribers: ethanr

Differential Revision: https://phabricator.fb.com/D1272053

Tasks: 4117827

Change-Id: I694a9ab429722da538da171ec528406e77b5e6d1
BUG: 1428061
Signed-off-by: Kevin Vigor <kvigor@fb.com>
Reviewed-on: http://review.gluster.org/16099
Reviewed-on: https://review.gluster.org/16177
Tested-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
Smoke: Gluster Build System <jenkins@build.gluster.org>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
2017-05-02 10:23:53 +00:00
Niels de Vos
73fcf3a874 core: make the per glusterfs_ctx_t timer-wheel refcounted
xlators can use a 'global' timer-wheel for scheduling events. This
timer-wheel is managed per glusterfs_ctx_t, but does not need to be
allocated for every graph. When an xlator wants to use the timer-wheel,
it will be instanciated on demand, and provided to xlators that request
it later on.

By adding a reference counter to the glusterfs_ctx_t for the
timer-wheel, the threads and structures can be cleaned up when the last
xlator does not have a need for it anymore. In general, the xlators
request the timer-wheel in init(), and they should return it in fini().

Because the timer-wheel is managed per glusterfs_ctx_t, the functions
can be added to ctx.c and do not need to live in their very minimal
tw.[ch] files.

Change-Id: I19d225b39aaa272d9005ba7adc3104c3764f1572
BUG: 1442788
Reported-by: Poornima G <pgurusid@redhat.com>
Signed-off-by: Niels de Vos <ndevos@redhat.com>
Reviewed-on: https://review.gluster.org/17068
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Smoke: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Amar Tumballi <amarts@redhat.com>
Reviewed-by: Zhou Zhengping <johnzzpcrystal@gmail.com>
Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com>
2017-05-01 09:30:14 +00:00
Susant Palai
8b2ef50762 dht: send lookup on old name inside rename with bname and pargfid
Inside rename, a lookup is done on the source name to make sure that
the file is there. But we used to do a gfid based lookup and hence,
even if the source name was renamed to a new name from some other client,
lookup will be successful as server3_3_lookup will fetch the new path
based on the gfid.

So even if the source file does not exist any more rename will carry on,
and as server3_3_link(destination is hashed to a different brick other
than source cached scenario) also does gfid based resolve, it wont
detect that the source name does not exist and hardlink creation will be
successful (since gfid based resolve will get the new dentry).

To solve this problem, do a name based lookup inside rename. So that
rename will fail right away if the source does not exist.

Change-Id: Ieba8bdd6675088dbf18de90ed4622df043d163bd
BUG: 1412135
Signed-off-by: Susant Palai <spalai@redhat.com>
Reviewed-on: https://review.gluster.org/16375
Smoke: Gluster Build System <jenkins@build.gluster.org>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
Reviewed-by: N Balachandran <nbalacha@redhat.com>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
2017-04-29 14:28:38 +00:00
Susant Palai
bff6b7b1d7 cluster/dht: rebalance perf enhancement
Problem: Throttle settings "normal" and "aggressive" for rebalance
did not have performance difference.

normal mode spawns $(no. of cores - 4)/2 threads and aggressive
spawns $(no. of cores - 4) threads. Though aggressive mode has twice
the number of threads compared to that of normal mode, there was no
performance gain when switched to aggressive mode from normal mode.

RCA:
During the course of debugging the above problem, we tried assigning
migration job to migration threads spawned by rebalance, rather than
synctasks(as there is more overhead associated to manage the task
queue and threads). This gave us a significant improvement over rebalance
under synctasks. This patch does not really gurantee that there will be a
clear performance difference between normal and aggressive mode, but this
patch certainly maximized the disk utilization for 1GBfiles run.

Results:

Test enviroment:
Gluster Config:
Number of Bricks: 2 (one brick per disk(RAID-6 12 disk))
Bricks:
Brick1: server1:/brick/test1/1
Brick2: server2:/brick/test1/1
Options Reconfigured:
performance.readdir-ahead: on
server.event-threads: 4
client.event-threads: 4

1000 files with 1GB each were created/renamed such that all files will have
server1 as cached and server2 as hashed, so that all files will be migrated.

Test machines had 24 cores each.

Results  with/without synctask based migration:
-----------------------------------------------

mode                    normal(10threads)          aggressive(20threads)

timetaken               0:55:30 (hⓂ️s)            0:56:3 (hⓂ️s)
withsynctask

timetaken
with migrator           0:38:3 (hⓂ️s)             0:23:41 (hⓂ️s)
threads

From above table it can be seen that, there is a clear 2x perf gain between
rebalance with synctask vs rebalance with migrator threads.

Additionally this patch modifies the code so that caller will have the exact error
number returned by dht_migrate_file(earlier the errno meaning was overloaded). This
will help avoiding scenarios where migration failure due to ENOENT, can result in
rebalance abort/failure.

Change-Id: I8904e2fb147419d4a51c1267be11a08ffd52168e
BUG: 1420166
Signed-off-by: Susant Palai <spalai@redhat.com>
Reviewed-on: https://review.gluster.org/16427
Smoke: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: N Balachandran <nbalacha@redhat.com>
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
2017-04-29 14:28:19 +00:00
Amar Tumballi
c6b635d5b2 libglusterfs/stack.h: reduce duplication of code
* Use STACK_UNWIND_STRICT everywhere.
* Provide STACK_WIND_COMMON as both STACK_WIND_COOKIE
  and STACK_WIND differ by just 1 line and 1 option.

Updates gluster/glusterfs#137

Change-Id: Ifbb6b9c4702b02f4a02834824f509fd10c78f0ce
Signed-off-by: Amar Tumballi <amarts@redhat.com>
Reviewed-on: https://review.gluster.org/16915
Smoke: Gluster Build System <jenkins@build.gluster.org>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Jeff Darcy <jeff@pl.atyp.us>
2017-04-27 14:54:40 +00:00
Kotresh HR
4076b73b2f feature/dht: Directory synchronization
Design doc: https://review.gluster.org/16876

Directory creation is now synchronized with blocking inodelk of the
parent on the hashed subvolume followed by the entrylk on the hashed
subvolume between dht_mkdir, dht_rmdir, dht_rename_dir and lookup
selfheal mkdir.

To maintain internal consistency of directories across all subvols of
dht, we need locks. Specifically we are interested in:

 1. Consistency of layout of a directory. Only one writer should modify
    the layout at a time. A writer (layout setting during directory heal
    as part of lookup) shouldn't modify the layout while there are
    readers (all other fops like create, mkdir etc., which consume
    layout) and readers shouldn't read the layout while a writer is in
    progress. Readers can read the layout simultaneously. Writer takes
    a WRITE inodelk on the directory (whose layout is being modified)
    across ALL subvols. Reader takes a READ inodelk on the directory
    (whose layout is being read) on ANY subvol.

 2. Consistency of directory namespace across subvols. The path and
    associated gfid should be same on all subvols. A gfid should not be
    associated with more than one path on any subvol. All fops that can
    change directory names (mkdir, rmdir, renamedir, directory creation
    phase in lookup-heal) takes an entrylk on hashed subvol of the
    directory.

 NOTE1: In point 2 above, since dht takes entrylk on hashed subvol of a
        directory, the transaction itself is a consumer of layout on
        parent directory. So, the transaction is a reader of parent
        layout and does an inodelk on parent directory just like any
        other layout reader. So a mkdir (dir/subdir) would:

     > Acquire a READ inodelk on "dir" on any subvol.
     > Acquire an entrylk (dir, "subdir") on hashed subvol of "subdir".
     > creates directory on hashed subvol and possibly on non-hashed subvols.
     > UNLOCK (entrylk)
     > UNLOCK (inodelk)

 NOTE2: mkdir fop while setting the layout of the directory being created
        is considered as a reader, but NOT a writer. The reason is for
        a fop which can consume the layout of a directory to come either
        of the following conditions has to be true:

     > mkdir syscall from application has to complete. In this case no
       need of synchronization.
     > A lookup issued on the directory racing with mkdir has to complete.
       Since layout setting by a lookup is considered as a writer, only
       one of either mkdir or lookup will set the layout.

Code re-organization:
   All the lock related routines are moved to "dht-lock.c" file.
   New wrapper function is introduced to take blocking inodelk
   followed by entrylk 'dht_protect_namespace'

Updates #191
Change-Id: I01569094dfbe1852de6f586475be79c1ba965a31
Signed-off-by: Kotresh HR <khiremat@redhat.com>
BUG: 1443373
Reviewed-on: https://review.gluster.org/15472
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
Smoke: Gluster Build System <jenkins@build.gluster.org>
2017-04-26 09:00:34 +00:00
Niels de Vos
1538c98f5e libglusterfs: accept random volname in glusterfs_graph_prepare()
When the call to glfs_new("volname") passes a name for the volume and it
does not match the name of the subvolume in the graph, glfs_init() will
fail. This is easily reproducible by a gfapi program that loads the
volume from a .vol file, and not from a GlusterD server.

Change-Id: I33e77fbee7d12eaefe7c384fad6aecfa3582ea5a
BUG: 1425623
Signed-off-by: Niels de Vos <ndevos@redhat.com>
Reviewed-on: https://review.gluster.org/16796
Smoke: Gluster Build System <jenkins@build.gluster.org>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com>
Reviewed-by: Prashanth Pai <ppai@redhat.com>
Reviewed-by: Shyamsundar Ranganathan <srangana@redhat.com>
2017-04-26 01:18:56 +00:00
Jungyeon Yoon
6d021a2810 remove redundant function definition
function definition is duplicated

erase second sys_ftruncate() definition

No test cases

Change-Id: I3eead1380b527b8b0e480f45b39e0c4bc9b2b92a
Signed-off-by: Jungyeon Yoon <jungyeon.yoon@gmail.com>
Reviewed-on: https://review.gluster.org/17106
Smoke: Gluster Build System <jenkins@build.gluster.org>
Tested-by: Prashanth Pai <ppai@redhat.com>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Jeff Darcy <jeff@pl.atyp.us>
2017-04-25 15:00:04 +00:00
Poornima G
561766e45a Implement negative lookup cache
Before creating any file negative lookups(1 in Fuse, 4 in SMB etc.)
are sent to verify if the file already exists. By serving these
lookups from the cache when possible, increases the create
performance by multiple folds in SMB access and some percentage
in Fuse/NFS access.

Feature page: https://review.gluster.org/#/c/16436

Updates #82
Change-Id: Ib1c0e7ac7a386f943d84f6398c27f9a03665b2a4
BUG: 1442569
Signed-off-by: Poornima G <pgurusid@redhat.com>
Reviewed-on: https://review.gluster.org/16952
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
Tested-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
Smoke: Gluster Build System <jenkins@build.gluster.org>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
2017-04-20 00:18:52 -04:00
Poornima G
94196dee1f dht: Add readdir-ahead in rebalance graph if parallel-readdir is on
Issue:
The value of linkto xattr is generally the name of the dht's
next subvol, this requires that the next subvol of dht is not
changed for the life time of the volume. But with parallel
readdir enabled, the readdir-ahead loaded below dht, is optional.
The linkto xattr for first subvol, when:
- parallel readdir is enabled : "<volname>-readdir-head-0"
- plain distribute volume : "<volname>-client-0"
- distribute replicate volume : "<volname>-afr-0"

The value of linkto xattr is "<volname>-readdir-head-0" when
parallel readdir is enabled, and is "<volname>-client-0" if
its disabled. But the dht_lookup takes care of healing if it
cannot identify which linkto subvol, the xattr points to.

In dht_lookup_cbk, if linkto xattr is found to be "<volname>-client-0"
and parallel readdir is enabled, then it cannot understand the
value "<volname>-client-0" as it expects "<volname>-readdir-head-0".
In that case, dht_lookup_everywhere is issued and then the linkto file
is unlinked and recreated with the right linkto xattr. The issue is
when parallel readdir is enabled, mount point accesses the file
that is currently being migrated. Since rebalance process doesn't
have parallel-readdir feature, it expects "<volname>-client-0"
where as mount expects "<volname>-readdir-head-0". Thus at some point
either the mount or rebalance will fail.

Solution:
Enable parallel-readdir for rebalance as well and then do not
allow enabling/disabling parallel-readdir if rebalance is in
progress.

Change-Id: I241ab966bdd850e667f7768840540546f5289483
BUG: 1436090
Signed-off-by: Poornima G <pgurusid@redhat.com>
Reviewed-on: https://review.gluster.org/17056
Smoke: Gluster Build System <jenkins@build.gluster.org>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Atin Mukherjee <amukherj@redhat.com>
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
2017-04-18 02:16:11 -04:00
Milind Changire
6b8df081b4 rpc: add options to manage socket keepalive lifespan
Problem:
Default values for handling socket timeouts for brick responses are
insufficient for aggressive applications such as databases.

Solution:
Add 1:1 gluster options for keepalive, keepalive-idle,
keepalive-interval and keepalive-timeout as per the socket level options
available as per tcp(7) man page.

Default values for options are NOT agressive and continue to be values
which result in default timeout when only the keep alive option is
turned on.

These options are Linux specific and will not be applicable to the
*BSDs.

Change-Id: I2a08ecd949ca8ceb3e090d336ad634341e2dbf14
BUG: 1426059
Signed-off-by: Milind Changire <mchangir@redhat.com>
Reviewed-on: https://review.gluster.org/16731
Smoke: Gluster Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
2017-04-12 05:44:01 -04:00
Niels de Vos
ec0e1176d4 mem-pool: use gf_atomic_t for atomic counters
Reduce the usage of __sync_fetch_and_add() builtins in mem-pool. The new
gf_atomic_t type can be used instead, so that the architecture and
compiler specific builtins are hidden from the mem-pool implementation.

BUG: 1437037
Change-Id: Icbeeb187dd2b835b35f32f54f821ceddfc7b2638
Signed-off-by: Niels de Vos <ndevos@redhat.com>
Reviewed-on: https://review.gluster.org/17012
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Smoke: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com>
Reviewed-by: Amar Tumballi <amarts@redhat.com>
2017-04-10 11:05:56 -04:00
Niels de Vos
ef36ac0d1b xlator: do not call dlclose() when debugging
Valgrind can not show the symbols if a .so after calling dlclose(). The
unhelpful ??? in the output gets resolved properly with this change:

  ==25170== 344 bytes in 1 blocks are definitely lost in loss record 233 of 324
  ==25170==    at 0x4C29975: calloc (vg_replace_malloc.c:711)
  ==25170==    by 0x52C7C0B: __gf_calloc (mem-pool.c:117)
  ==25170==    by 0x12B0638A: ???
  ==25170==    by 0x528FCE6: __xlator_init (xlator.c:472)
  ==25170==    by 0x528FE16: xlator_init (xlator.c:498)
  ==25170==    by 0x52DA8D6: glusterfs_graph_init (graph.c:321)
  ==25170==    by 0x52DB587: glusterfs_graph_activate (graph.c:695)
  ==25170==    by 0x5046407: glfs_process_volfp (glfs-mgmt.c:79)
  ==25170==    by 0x5043B9E: glfs_volumes_init (glfs.c:281)
  ==25170==    by 0x5044FEC: glfs_init_common (glfs.c:986)
  ==25170==    by 0x50451A7: glfs_init@@GFAPI_3.4.0 (glfs.c:1031)

By not calling dlclose(), the dynamically loaded .so is still available
upon program exit, and Valgrind is able to resolve the symbols. This
will add an additional leak, so dlclose() is called for normal builds,
but skipped when configuring with "./configure --enable-valgrind" or
passing the "run-with-valgrind" xlator option.

URL: http://valgrind.org/docs/manual/faq.html#faq.unhelpful
Change-Id: I2044e21b1b8fcce32ad1a817fdd795218f967731
BUG: 1425623
Signed-off-by: Niels de Vos <ndevos@redhat.com>
Reviewed-on: https://review.gluster.org/16809
Smoke: Gluster Build System <jenkins@build.gluster.org>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Samikshan Bairagya <samikshan@gmail.com>
Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com>
2017-04-07 13:17:12 -04:00
Niels de Vos
93e3c9abce libglusterfs: provide standardized atomic operations
The current macros ATOMIC_INCREMENT() and ATOMIC_DECREMENT() expect a
lock as first argument. There are at least two issues with this
approach:

  1. this lock is unused on architectures that have atomic operations
  2. some structures use a single lock for multiple variables

By defining a gf_atomic_t type, the unused lock can be removed, saving a
few bytes on modern architectures.

Because the gf_atomic_t type locates the lock for the variable (in case
of older architectures), each variable is protected the same on all
architectures. This makes the behaviour across all architectures more
equal (per variable locking, by a gf_lock_t or compiler optimization).

BUG: 1437037
Change-Id: Ic164892b06ea676e6a9566f8a98b7faf0efe76d6
Signed-off-by: Niels de Vos <ndevos@redhat.com>
Reviewed-on: https://review.gluster.org/16963
Smoke: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Xavier Hernandez <xhernandez@datalab.es>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Amar Tumballi <amarts@redhat.com>
Reviewed-by: Jeff Darcy <jeff@pl.atyp.us>
2017-04-05 09:14:26 -04:00
Kaleb S. KEITHLEY
1063efeb22 build: clang has __builtin_popcount() and __builtin_ffs()
Note: Even though gcc(1) will automatically treat ffs() and popcount()
as built-in, calling them explicitly as __builtin in the source helps
make it easier to find them. (And if no __builtin_ffs use the one in
libc.)

Change-Id: Ib74d9b221ff03a01df5ad05907024da1a83a7a88
BUG: 1438772
Signed-off-by: Kaleb S. KEITHLEY <kkeithle@redhat.com>
Reviewed-on: https://review.gluster.org/16993
Smoke: Gluster Build System <jenkins@build.gluster.org>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Jeff Darcy <jeff@pl.atyp.us>
Reviewed-by: Xavier Hernandez <xhernandez@datalab.es>
2017-04-05 02:12:53 -04:00
Manikandan Selvaganesh
4c3aa910e7 libglusterfs: add dict_rename_key()
The dict_rename_key() function will be used for converting the
"security.selinux" xattr to "trusted.gluster.selinux" in the upcoming
SELinux xlator.

BUG: 1318100
Change-Id: Ic5d0b9127e2c360d355f02e200a820597e83fa2c
Signed-off-by: Manikandan Selvaganesh <mselvaga@redhat.com>
Signed-off-by: Jiffin Tony Thottan <jthottan@redhat.com>
[ndevos: split from change Id8916bd8e064ccf74ba86225ead95f86dc5a1a25]
Reviewed-on: https://review.gluster.org/16616
Reviewed-by: Niels de Vos <ndevos@redhat.com>
Tested-by: Niels de Vos <ndevos@redhat.com>
Smoke: Gluster Build System <jenkins@build.gluster.org>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Jeff Darcy <jeff@pl.atyp.us>
2017-03-31 09:07:00 -04:00
Ravishankar N
0f98f5c807 syncop: don't wake task in synctask_wake unless really needed
Problem:

In EC and AFR, we launch synctasks during self-heal.

(i) These tasks usually stackwind a FOP to all its children and call
synctask_yield() which does a swapcontext to synctask_switchto() and puts the
task in syncenv's waitq by calling __wait(task). This happends as long as the
FOP ckbs from all children haven't been received.

(ii) For each FOP cbk, we call synctask_wake() which again does a swapcontext
to synctask_switchto() which now puts the task in syncenv's runq by calling
__run(task). When the task runs and the conext switches back to the FOP path,
it puts the task in waitq because we haven't heard from all children as
explained in (i).

Thus we are unnecessarily using the swapcontext syscalls to just toggle
the task back and forth between the waitq and runq.

Fix:
Store the stackwind count in new variable 'syncbarrier->waitfor' before
winding the fop. In each cbk when we call synctask_wake(),  perform an actual
wake only if the cbk count == stackwind count.

Change-Id: Id62d3b6ffed5a8c50f8b79267fb34e9470ba5ed5
BUG: 1434274
Signed-off-by: Ravishankar N <ravishankar@redhat.com>
Signed-off-by: Ashish Pandey <aspandey@redhat.com>
Reviewed-on: https://review.gluster.org/16931
Smoke: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Niels de Vos <ndevos@redhat.com>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
2017-03-28 18:34:48 -04:00
Pranith Kumar K
df189a8644 syncop: Fix args for makecontext
Passing 64bit arguments to makecontext is not portable and manpage says the following:

<snip>
On  architectures where int and pointer types are the same size (e.g., x86-32, where
both types are 32 bits), you may be able to get away with passing pointers as  argu‐
ments  to makecontext() following argc.  However, doing this is not guaranteed to be
portable, is undefined according to the standards, and won't work  on  architectures
where pointers are larger than ints.  Nevertheless, starting with version 2.8, glibc
makes some changes to makecontext(), to permit this  on  some  64-bit  architectures
(e.g., x86-64).
</snip>

Since we do not depend on the arguments, it is better to change makecontext to not
take any arguments.

BUG: 1434274
Change-Id: Ic46c9e9faaeb2f78e4efde353ef861466515b1ec
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
Reviewed-on: https://review.gluster.org/16951
Smoke: Gluster Build System <jenkins@build.gluster.org>
Reviewed-by: Ravishankar N <ravishankar@redhat.com>
CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
2017-03-28 01:55:41 -04:00