1081 Commits

Author SHA1 Message Date
Luis Pabon
c817c21403 build: GlusterFS Unit Test Framework
This patch will allow for developers to create unit tests for
their code.  Documentation has been added to the patch and
is available here:

doc/hacker-guide/en-US/markdown/unittest.md

Also, unit tests are run when RPM is created.

BUG: 1067059
Change-Id: I95cf8bb0354d4ca4ed4476a0f2385436a17d2369
Signed-off-by: Vijay Bellur <vbellur@redhat.com>
Signed-off-by: Luis Pabon <lpabon@redhat.com>
Reviewed-on: http://review.gluster.org/7145
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Rajesh Joseph <rjoseph@redhat.com>
Reviewed-by: Justin Clift <justin@gluster.org>
Tested-by: Justin Clift <justin@gluster.org>
2014-03-06 04:10:46 -08:00
Anand Avati
0cab34b3a5 core: add @xdata parameter to syncop_[f]removexattr()
To be used in afr metadata self-heal

Change-Id: I8dac4b19d61e331702427eeb5b606aab3d20b328
BUG: 1021686
Signed-off-by: Anand Avati <avati@redhat.com>
Reviewed-on: http://review.gluster.org/6941
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Raghavendra Bhat <raghavendra@redhat.com>
2014-02-13 11:17:05 -08:00
Poornima
43f71899c4 cluster/stripe: Fix the possible resource leaks.
Change-Id: Ic6fbc8c843f80edd7458d15229eb72a5609973a5
BUG: 789278
Signed-off-by: Poornima <pgurusid@redhat.com>
Reviewed-on: http://review.gluster.org/6986
Reviewed-by: Amar Tumballi <amarts@gmail.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
2014-02-12 21:49:43 -08:00
Poornima
db3b2149ee cluster/stripe: Fix the resource leaks.
Change-Id: Ieb1fe112686f4932a6272a0117c1373e736d5b4e
BUG: 789278
Signed-off-by: Poornima <pgurusid@redhat.com>
Reviewed-on: http://review.gluster.org/6951
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
2014-02-12 16:36:16 -08:00
Venkatesh Somyajulu
408d50a64b dht: Modified dht-statedump to print all subvolume_status
Change-Id: I1aae33472bd15fc2bd7a170544f2994534fdf246
BUG: 1058204
Signed-off-by: Venkatesh Somyajulu <vsomyaju@redhat.com>
Reviewed-on: http://review.gluster.org/6800
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
Reviewed-by: Shyamsundar Ranganathan <srangana@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
2014-02-11 02:45:37 -08:00
Christopher R. Hertel
c1c3ab9e4c cluster/afr: goto statements may cause exit before memory is freed.
Memory is allocated for pump_priv and for pump_priv->resume_path, but if
an error is detected the references to that memory go out of scope and
the memory is never freed.

This patch assures that the memory is freed on error.

Patchset 2: These are Kaleb's recommended changes which, compared to my
            original fix, are more comprehensive and provide a more
            complete resolution to the memory leakage bugs in this
            function.  The bug reported by Coverity was limited to a
            single memory allocation.

BUG: 789278
CID: 1124737

Change-Id: Ie239e3b5d28d97308bf948efec6a92f107bc648b
Signed-off-by: Christopher R. Hertel <crh@redhat.com>
Reviewed-on: http://review.gluster.org/6929
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
2014-02-10 08:38:57 -08:00
Susant Palai
14792bd894 cluster/dht: If hashed_subvol is NULL, do not fail
Problem: With the current implementation we are allowing unlink
of a file if hashed subvol is down and cached subvol is up.
For the above op to work we should have the info of hashed_subvol.
But incase we do remount of the volume we will have a zeroed layout
for the disconnected subvol(start=0, stop=0, err=ENOTCONN) which will
result  into hashed_subvol being NULL and failing unlink op.

Solution: Dont fail if hashed_subvol is NULL. Check cached subvol
and unlink in cached subvol. The linkto file in the hashed subvol
can be remove later.

Change-Id: Ic1982c15c8942a1adcb47ed0017d2d5ace5c9241
BUG: 983416
Signed-off-by: Susant Palai <spalai@redhat.com>
Reviewed-on: http://review.gluster.org/6851
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
Reviewed-by: Anand Avati <avati@redhat.com>
2014-02-08 11:29:18 -08:00
Poornima
f32e00692e cluster/afr: Fix memory leak.
Change-Id: I811d104684905a5a9a794cde8e925bd1a97f6546
BUG: 789278
Signed-off-by: Poornima <pgurusid@redhat.com>
Reviewed-on: http://review.gluster.org/6906
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
2014-02-08 08:21:21 -08:00
ShyamsundarR
0c721c4c04 cluster/dht: Set restrictive open flags for files under rebalance
Files that are being rebalanced are created in the new volume
and access path needs to open these files to write changing
data in parallel to both the old and new locations. While opening
the file in the new location, we need to restrict the open flags
to not use truncate or create and fail if exist flags, to prevent
open failures or inadvertently truncate the file under rebalance.

Change-Id: I12130e0377adc393f1925c45585200ad991fd0d5
BUG: 1058569
Signed-off-by: ShyamsundarR <srangana@redhat.com>
Reviewed-on: http://review.gluster.org/6830
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
Reviewed-by: Krutika Dhananjay <kdhananj@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Raghavendra Bhat <raghavendra@redhat.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
2014-02-04 09:56:57 -08:00
Poornima G
120235d6f5 cluster/dht: Fix layout sorting
The layout was not being sorted in the ascending order leading
to the wrong detection of holes/overlaps.

From looking at the previous git commits it appears that the initial version itself had the err comparison code.
Deductions from the current dht_layout_sort():
1. The zero'ed out layouts should be in the from of list, if needed
2. The layout should be sorted in the ascending order of layout error value.
3. The layout should be sorted in the ascending order of the layout 'start'.
But In some cases, with the err comparison code its not sorted in the ascending order. Example: If the input is as below for dht_layout_sort(), the sorting doesn't happen in ascending order.
Input:
0-1 err:0    2-3 err:0   6-7 err:0    0-0 err:20   4-5 err:0
          
With the current sort, Output:
4-5 err:0    0-0 err:0    0-1 err:0    2-3 err:0    6-7 err:0
Expected: 0-0 err:20 0-1 err:0 2-3 err:0 4-5 err:0 6-7 err:0
Looking at dht_layout_anomalies() it appears that, it doesn't require the layout to be sorted based on error value.
The other solution was to replace line 468 with:
 if ((layout->list[i].err || layout->list[j].err) && (layout->list[i].start > layout->list[j].start))
Since dht_layout_anomalies() didn't expect the layout to be sorted based on the error, removed the err comparison.

Change-Id: I1215f6cd53efc7dba01c0958ba6cc7609dab6ff5
BUG: 1056406
Signed-off-by: Poornima G <pgurusid@redhat.com>
Reviewed-on: http://review.gluster.org/6757
Reviewed-by: Anand Avati <avati@redhat.com>
Tested-by: Anand Avati <avati@redhat.com>
2014-02-03 17:24:13 -08:00
Christopher R. Hertel
d2b0a016e7 cluster/dht: Abandoned memory if a call fails
If the call to dict_set_dynstr() fails, the memory indicated by
xattr_buf will not have been stored in the dictionary, so it must be
freed.

Patch set 2: Added a missed call to GF_FREE().  Fixed a formatting
             consistency issue.

Patch set 3: Cleaned a minor style nit.

BUG: 789278
CID: 1124786

Change-Id: Id1f85bd2cbfac0b8727a3f6901f0a50ba921817d
Signed-off-by: Christopher R. Hertel <crh@redhat.com>
Reviewed-on: http://review.gluster.org/6826
Reviewed-by: Shyamsundar Ranganathan <srangana@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
2014-02-03 17:22:15 -08:00
Vijaykumar M
3023a50c14 dht: do not remove linkfile if file exist in cached sub volume
Currently with rmdir, if a directory contains only the linkfiles
we remove all the linkfiles and this is causing the problem when the cached
sub volume is down and end-up with duplicate files showing on the mount point.

Solution: Before removing a linkfile check if the
files exists in cached subvolume.

Change-Id: Iedffd0d9298ec8bb95d5ce27c341c9ade81f0d3c
BUG: 1042725
Signed-off-by: Vijaykumar M <vmallika@redhat.com>
Reviewed-on: http://review.gluster.org/6500
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
2014-02-02 23:10:38 -08:00
Vijay Bellur
e435b4b059 cluster/afr: Change default_value for option self-heal-daemon
Change-Id: Ic3c8e179a63e82a4e416aea620796f8bb3236c7c
BUG: 1052759
Signed-off-by: Vijay Bellur <vbellur@redhat.com>
Reviewed-on: http://review.gluster.org/6706
Reviewed-by: Kaushal M <kaushal@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
2014-01-27 07:47:06 -08:00
Raghavendra G
c0c1210ffd cluster/dht: set op_errno correctly during migration.
Change-Id: I65acedf92c1003975a584a2ac54527e9a2a1e52f
BUG: 1010241
Signed-off-by: Raghavendra G <rgowdapp@redhat.com>
Reviewed-on: http://review.gluster.org/6219
Reviewed-by: Shyamsundar Ranganathan <srangana@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
2014-01-25 00:03:26 -08:00
Christopher R. Hertel
4ac61e7354 cluster/dht: goto statements may cause loop exit before memory is freed.
Memory is allocated at the top of the while loop via a call to
gf_strdup(), but there are several goto calls that exit the loop, and
the memory is not freed before each of those calls to goto.  This fix
moves the final call to GF_FREE() higher in the loop so that the memory
is correctly freed.

Two variables, dup_str and str_tmp1, point to portions of the allocated
memory.  Neither are used past the final call to GF_FREE( dup_str ).

BUG: 789278
CID: 1124780

Change-Id: Id24b80cdbfd8b8855c80fffec63d7fce98cbed4a
Signed-off-by: Christopher R. Hertel <crh@redhat.com>
Reviewed-on: http://review.gluster.org/6771
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
2014-01-24 01:36:31 -08:00
Christopher R. Hertel
3340a896a1 cluster/stripe: Remove redundant code blocks
This appears to have been a cut&paste error.  The same set of 12 lines
was repeated three times, causing a pointer to allocated memory to be
overwritten twice resulting in a memory leak.

This patch removes the redundant code.

BUG: 789278
CID: 1128915

Change-Id: I3e4a3703b389c00e2a4e99e0a7368c5a3dda74d0
Signed-off-by: Christopher R. Hertel <crh@redhat.com>
Reviewed-on: http://review.gluster.org/6769
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
2014-01-24 01:29:45 -08:00
Varun Shastry
6c9fe8fa88 cluster/dht: Set quota limit key in dht_selfheal of dirs.
Also fixed check in dht_is_subvol_in_layout to check if the
layouts are zero'ed out.

Change-Id: I4bf8ebf66d3ef1946309b6c9aac9e79bf8a6d495
BUG: 969461
Signed-off-by: shishir gowda <sgowda@redhat.com>
Signed-off-by: Varun Shastry <vshastry@redhat.com>
Reviewed-on: http://review.gluster.org/6392
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
2014-01-22 21:39:57 -08:00
Susant Palai
79683794c2 quota: filter glusterfs quota xattrs
Change-Id: I86ebe02735ee88598640240aa888e02b48ecc06c
BUG: 1040423
Signed-off-by: Susant Palai <spalai@redhat.com>
Reviewed-on: http://review.gluster.org/6490
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Raghavendra G <rgowdapp@redhat.com>
2014-01-22 20:43:04 -08:00
Pranith Kumar K
8d55c25f15 syncop: Change return value of syncop
Problem:
We found a day-1 bug when syncop_xxx() infra is used inside a synctask with
compilation optimization (CFLAGS -O2).

Detailed explanation of the Root cause:
We found the bug in 'gf_defrag_migrate_data' in rebalance operation:

Lets look at interesting parts of the function:

int
gf_defrag_migrate_data (xlator_t *this, gf_defrag_info_t *defrag, loc_t *loc,
                        dict_t *migrate_data)
{
.....
code section - [ Loop ]
        while ((ret = syncop_readdirp (this, fd, 131072, offset, NULL,
                                       &entries)) != 0) {
.....
code section - [ ERRNO-1 ] (errno of readdirp is stored in readdir_operrno by a
thread)
                /* Need to keep track of ENOENT errno, that means, there is no
                   need to send more readdirp() */
                readdir_operrno = errno;
.....
code section - [ SYNCOP-1 ] (syncop_getxattr is called by a thread)
                        ret = syncop_getxattr (this, &entry_loc, &dict,
                                               GF_XATTR_LINKINFO_KEY);
code section - [ ERRNO-2]   (checking for failures of syncop_getxattr(). This
may not always be executed in same thread which executed [SYNCOP-1])
                        if (ret < 0) {
                                if (errno != ENODATA) {
                                        loglevel = GF_LOG_ERROR;
                                        defrag->total_failures += 1;
.....
}

the function above could be executed by thread(t1) till [SYNCOP-1] and code
from [ERRNO-2] can be executed by a different thread(t2) because of the way
syncop-infra schedules the tasks.

when the code is compiled with -O2 optimization this is the assembly code that
is generated:
 [ERRNO-1]
1165                        readdir_operrno = errno; <<---- errno gets expanded
as *(__errno_location())
   0x00007fd149d48b60 <+496>:        callq  0x7fd149d410c0 <address@hidden>
   0x00007fd149d48b72 <+514>:        mov    %rax,0x50(%rsp) <<------ Address
returned by __errno_location() is stored in a special location in stack for
later use.
   0x00007fd149d48b77 <+519>:        mov    (%rax),%eax
   0x00007fd149d48b79 <+521>:        mov    %eax,0x78(%rsp)
....
 [ERRNO-2]
1281                                        if (errno != ENODATA) {
   0x00007fd149d492ae <+2366>:        mov    0x50(%rsp),%rax <<-----  Because
it already stored the address returned by __errno_location(), it just
dereferences the address to get the errno value. BUT THIS CODE NEED NOT BE
EXECUTED BY SAME THREAD!!!
   0x00007fd149d492b3 <+2371>:        mov    $0x9,%ebp
   0x00007fd149d492b8 <+2376>:        mov    (%rax),%edi
   0x00007fd149d492ba <+2378>:        cmp    $0x3d,%edi

The problem is that __errno_location() value of t1 and t2 are different. So
[ERRNO-2] ends up reading errno of t1 instead of errno of t2 even though t2 is
executing [ERRNO-2] code section.

When code is compiled without any optimization for [ERRNO-2]:
1281                                        if (errno != ENODATA) {
   0x00007fd58e7a326f <+2237>:        callq  0x7fd58e797300
<address@hidden><<--- As it is calling __errno_location() again it gets the
location from t2 so it works as intended.
   0x00007fd58e7a3274 <+2242>:        mov    (%rax),%eax
   0x00007fd58e7a3276 <+2244>:        cmp    $0x3d,%eax
   0x00007fd58e7a3279 <+2247>:        je     0x7fd58e7a32a1
<gf_defrag_migrate_data+2287>

Fix:
Make syncop_xxx() return (-errno) value as the return value in
case of errors and all the functions which make syncop_xxx() will need to use
(-ret) to figure out the reason for failure in case of syncop_xxx() failures.

Change-Id: I314d20dabe55d3e62ff66f3b4adb1cac2eaebb57
BUG: 1040356
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
Reviewed-on: http://review.gluster.org/6475
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
2014-01-19 23:05:15 -08:00
Vijaykumar M
dd1f4a4805 dht: Ignore directory with missing xattrs, which have err == 0, and start == stop
From the history (Patch: http://review.gluster.org/4668/)

When subvols-per-directory is < available subvols, then there are layouts
which are not populated. This leads to incorrect identification of holes or
overlaps. We need to ignore layouts, which have err == 0, and start == stop.
In the current scenario (start == stop == 0).

Additionally, in layout-merge, treat missing xattrs as err = 0. In case of
missing layouts, anomalies will reset them.

For any other valid subvoles, err != 0 in case of layouts being zeroed out.
Also reverted back dht_selfheal_dir_xattr, which does layout calculation only
on subvols which have errors.

Change-Id: Idb72a869f1a6f103046bb7e6fe0019f6ac853fd4
BUG: 1047331
Signed-off-by: Vijaykumar M <vmallika@redhat.com>
Reviewed-on: http://review.gluster.org/6618
Reviewed-by: Krishnan Parthasarathi <kparthas@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Shyamsundar Ranganathan <srangana@redhat.com>
Reviewed-by: Anand Avati <avati@redhat.com>
2014-01-15 17:20:09 -08:00
Venkatesh Somyajulu
ce86c13234 cluster/afr: Unable to self heal symbolic links
Problem:
Under the entry self heal, readlink is done at the
source and sink. When readlink is done at the sink,
because link is not present at the sink, afr expects
ENOENT. AFR translator takes decisions for new link
creation based on ENOENT but server translator is modified
to return ESTALE because of which afr xlator is not able
to heal.

Fix:
The check for inode absence at server includes ESTALE as
well.

Change-Id: I319e4cb4156a243afee79365b7b7a5a7823e9a24
BUG: 1046624
Signed-off-by: Venkatesh Somyajulu <vsomyaju@redhat.com>
Reviewed-on: http://review.gluster.org/6599
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
2014-01-13 20:42:05 -08:00
Varun Shastry
f7fcbc0ffe mgmt/glusterd: Improve the description in volume set help output
Change-Id: I785648970f53033a69922c23110b5eea9e47feb3
BUG: 1046030
Signed-off-by: Varun Shastry <vshastry@redhat.com>
Reviewed-on: http://review.gluster.org/6573
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
2014-01-12 23:46:15 -08:00
Pranith Kumar K
88816bf4b2 cluster/afr: Stop index crawl on pending full crawl
Full crawl is executed when index self-heal is useless, like
disk replacement. So if there are on-going index crawls, they
should be stopped inorder to start full self-heals.

Change-Id: I9a1545f1ec4ad9999dc08523ce859e4fa152e214
BUG: 1049355
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
Reviewed-on: http://review.gluster.org/6659
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Ravishankar N <ravishankar@redhat.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
2014-01-08 09:25:06 -08:00
Pranith Kumar K
c0767852b3 cluster/afr: Don't accept heal commands until graph is up
Change-Id: Icca6c23b6a5965f462db8b65af3eb2e141c7cd39
BUG: 1049355
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
Reviewed-on: http://review.gluster.org/6658
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Ravishankar N <ravishankar@redhat.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
2014-01-08 04:20:02 -08:00
Ravishankar N
f9698036fc cluster/afr: avoid race due to afr_is_transaction_running()
Problem:
------------------------------------------
afr_lookup_perform_self_heal() {
        if(afr_is_transaction_running())
                goto out
        else
                afr_launch_self_heal();
}
------------------------------------------

When 2 clients simultaneously access a file in split-brain, one of them
acquires the inode lock and proceeds with afr_launch_self_heal (which
eventually fails and sets "sh-failed" in the callback.)

The second client meanwhile bails out of afr_lookup_perform_self_heal()
because afr_is_transaction_running() returns true due to the lock obtained by
client-1. Consequetly in client-2, "sh-failed" does not get set in the dict,
causing quick-read translator to *not* invalidate the inode, thereby serving
data randomly from one of the bricks.

Fix:
If a possible split-brain is detected on lookup, forcefully traverse the
afr_launch_self_heal() code path in afr_lookup_perform_self_heal().

Change-Id: I316f9f282543533fd3c958e4b63ecada42c2a14f
BUG: 870565
Signed-off-by: Ravishankar N <ravishankar@redhat.com>
Reviewed-on: http://review.gluster.org/6578
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Varun Shastry <vshastry@redhat.com>
2013-12-24 02:24:20 -08:00
James Shubin
879be83614 Fix typos, and spacing issues.
Change-Id: I459ba4e87e9bc4f1c373f7abe8701bfa8450253c
BUG: 1045690
Signed-off-by: James Shubin <james@shubin.ca>
Reviewed-on: http://review.gluster.org/6556
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
2013-12-23 10:12:41 -08:00
Venkatesh Somyajulu
3991b0d0e9 cluster/afr: For entry self heal, mark all source bricks
Problem:
Whenever a new brick is added into a replicate volume, all
source bricks are not marked as source. Only one of them is
marked as source. Here marked as source refers to adding
extended attribute at the backend of a file corresponding to
the newly added brick. As well as source bricks should point
to the newly added brick so that heal can be triggered.

Fix:
All source bricks will now point to newly added bricks and heal
can be triggered based on the extended attributes.

Change-Id: I318e1f779a380c16c448a2d05c0140d8e4647fd4
BUG: 1037501
Signed-off-by: Venkatesh Somyajulu <vsomyaju@redhat.com>
Reviewed-on: http://review.gluster.org/6540
Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
2013-12-19 09:34:00 -08:00
Pranith Kumar K
9031a90613 cluster/afr: Add foreground self-heal launch capability through lookup
Also renamed allow-sh-for-running-transaction -> attempt-self-heal

Change-Id: I134cc79e663b532e625ffc342c59e49e71644ab3
BUG: 1039544
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
Reviewed-on: http://review.gluster.org/6463
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: venkatesh somyajulu <vsomyaju@redhat.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
2013-12-16 21:37:30 -08:00
Vijaykumar M
a9623ada6f pathinfo: Provide user namespace access.
Locality can be now queried by unprivileged users with
key "glusterfs.pathinfo".

Setting both "glusterfs.pathinfo" and "trusted.glusterfs.pathinfo"
on disk is prevented with this patch.

Original Author: Vijay Bellur <vbellur@redhat.com>

Change-Id: I4f7a0db8ad59165c4aeda04b23173255157a8b79
Signed-off-by: Vijaykumar M <vmallika@redhat.com>
Reviewed-on: http://review.gluster.org/5101
Reviewed-by: Krishnan Parthasarathi <kparthas@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
2013-12-16 06:54:26 -08:00
Anand Avati
ea89a25b0b dht: handle ESTALE/ENOENT in dht_access
Had misssed out dht_access in the previous round of cleanup

Change-Id: Ib255b9ad13ca62a8bc2eea225c46632aff8e820f
BUG: 1032894
Signed-off-by: Anand Avati <avati@redhat.com>
Reviewed-on: http://review.gluster.org/6496
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Amar Tumballi <amarts@gmail.com>
2013-12-13 02:19:18 -08:00
Ajeet Jha
b198e072cd glusterd/geo-rep: more glusterd and cli fixes for geo-rep.
-> handle option validation cases in reset case.
    -> Creating valid conf path when glusterd restarts.
    -> Reading the gsyncd worker thread status and displaying it.
    -> Displaying status-detail per worker.
    -> Fetch checkpoint info in geo-rep status.
    -> use-tarssh value validation added.

misc: misc geo-rep fixes based on cluster, logrotate etc..
    -> cluster/dht: fix 'stime' getxattr getting overwritten.
    -> cluster/afr: return max of 'stime' values in subvol.
    -> geo-rep-logrotate: Sending SIGHUP to geo-rep auxiliary.
    -> cluster/dht: fix convoluted logic while aggregating.
    -> cluster/*: fix 'stime' min/max fetch logic.

Change-Id: I811acea0bbd6194797a3e55d89295d1ea021ac85
BUG: 1036552
Signed-off-by: Ajeet Jha <ajha@redhat.com>
Reviewed-on: http://review.gluster.org/6405
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Amar Tumballi <amarts@gmail.com>
Reviewed-by: Anand Avati <avati@redhat.com>
2013-12-12 00:16:25 -08:00
Pranith Kumar K
493008a299 cluster/dht: Make sure gf_defrag_migrate_data is not optimized
Problem:
Whenever there syncop_xxx() is used inside a synctask and gcc
optimizes it when compiled with -O2 there is a problem where
'errno' would not work as expected.

Fix:
Until http://review.gluster.com/6475 is reviewed and merged
we are making sure the function is not going to be optimized.

Change-Id: I504c18c8a7789f0c776a56f0aa60db3618b21601
BUG: 1040356
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
Reviewed-on: http://review.gluster.org/6481
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Shyamsundar Ranganathan <srangana@redhat.com>
Reviewed-by: Anand Avati <avati@redhat.com>
2013-12-11 22:23:00 -08:00
Kaushal M
6a163b2214 dht: Set status to FAILED when rebalance stops due to brick going down
Change-Id: I98da41342127b1690d887a5bc025e4c9dd504894
BUG: 1038452
Signed-off-by: Kaushal M <kaushal@redhat.com>
Reviewed-on: http://review.gluster.org/6435
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Shishir Gowda <gowda.shishir@gmail.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
2013-12-10 22:49:36 -08:00
Poornima
8a0f744c46 cluster/afr: handle NULL check before strlen/strcmp in fgetxattr
xattr name can legally be NULL. Handle that case without crashing.

Change-Id: Ie214cb05ccd52565dc247a9234ad83ae799d3866
BUG: 1036879
Signed-off-by: Poornima <pgurusid@redhat.com>
Reviewed-on: http://review.gluster.org/6412
Reviewed-by: Krishnan Parthasarathi <kparthas@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
2013-12-03 15:17:12 -08:00
Vijay Bellur
cb78328952 cluster/afr: Fix description string for option 'self-heal-daemon'
Change-Id: I267b935a16a6fdc72a4e791f681289e6868baee6
BUG: 1010834
Reviewed-on: http://review.gluster.org/6385
Reviewed-by: Ravishankar N <ravishankar@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
2013-12-02 21:48:15 -08:00
Anand Avati
be380f35a2 cluster/dht: handle NULL check before strlen/strcmp in fgetxattr
@key can legally be NULL. Handle that case without crashing.

Change-Id: Iaae293caa7eeb24afc9cd2580799173e2ce00911
BUG: 1036879
Signed-off-by: Anand Avati <avati@redhat.com>
Reviewed-on: http://review.gluster.org/6395
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Krishnan Parthasarathi <kparthas@redhat.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
2013-12-02 21:47:07 -08:00
Pranith Kumar K
2f218e1335 cluster/dht: Handle Link-info getxattr failure in rebalance
When getxattr fails with errno other than ENODATA fail rebalance
on that file. Log the reason for error.

Change-Id: Ia519870b88e6e6dd464d1c0415411aa999f80bc9
BUG: 1032927
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
Reviewed-on: http://review.gluster.org/6341
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Shishir Gowda <sgowda@redhat.com>
Reviewed-by: Shyamsundar Ranganathan <srangana@redhat.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
2013-11-26 10:37:41 -08:00
Anand Avati
9f793d70ba cluster/dht: set layout in inode ctx even if linkfile fails
Creating linkfile could have failed, but we dont care about linkfile
for setting layout in the inode ctx (could be EEXIST etc.)

So ignore @inode in cbk and pick it up from local->loc.inode

Change-Id: I2952799d7ae0d3441b84b2ca2981afd75d7576e2
BUG: 1032859
Signed-off-by: Anand Avati <avati@redhat.com>
Reviewed-on: http://review.gluster.org/6319
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
2013-11-26 10:30:33 -08:00
Anand Avati
d1879d04e3 core: fix errno for non-existent GFID
When clients refer to a GFID which does not exist, the errno to
be returned in ESTALE (and not ENOENT). Even though ENOENT might
look "proper" most of the time, as the application eventually expects
ENOENT even if a parent directory does not exist, not returning
ESTALE results in resolvers (FUSE and GFAPI) to not retry resolution
in uncached mode. This can result in spurious ENOENTs during
concurrent path modification operations.

Change-Id: I7a06ea6d6a191739f2e9c6e333a1969615e05936
BUG: 1032894
Signed-off-by: Anand Avati <avati@redhat.com>
Reviewed-on: http://review.gluster.org/6318
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Amar Tumballi <amarts@gmail.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
2013-11-26 10:29:23 -08:00
Raghavendra G
33e0df30cb cluster/dht: instruct marker whenever it shouldn't do accounting
This is needed for two reasons:

* since dht-linkfiles are internal, they shouldn't be accounted.
* hardlink handling in marker is broken. link/unlink of hardlinks
  present in same directory can break marker accounting. Hence, if src
  and dst are in same directory in case of rename, dht - if it breaks
  rename into link/unlink operations - should instruct marker to not to
  do accounting.

Change-Id: I9c9f7384569f75a2792f6450ee7a5279bf751ae7
BUG: 1022995
Signed-off-by: Raghavendra G <rgowdapp@redhat.com>
Reviewed-on: http://review.gluster.org/6203
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
2013-11-26 10:27:21 -08:00
Raghavendra G
1a2f51144f core: add dht_is_linkfile helper procedure.
components other than distribute (like marker to exclude linkfiles
from being accounted) also need awareness of what constitutes a
linkfile. Hence its good to separate out this functionality into
core.

Change-Id: Ib944eeacc991bb1de464c9e73ee409fc7a689ff1
BUG: 1022995
Signed-off-by: Raghavendra G <rgowdapp@redhat.com>
Reviewed-on: http://review.gluster.org/6152
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
2013-11-26 10:26:44 -08:00
Raghavendra G
ab3ab1978a features/quota: Improvements to quota
* Two stages of quota enforcement is done:

  Soft and hard quota Upon reaching soft quota limit on the directory
  it logs/alerts in the quota daemon log (ie DEFAULT_LOG_DIR/quotad.log)
  and no more writes allowed after hard
  quota limit. After reaching the soft-limit the daemon alerts the
  user/admin repeatively for every 'alert-time', which is
  configurable.

* Quota enforcer is moved to server-side.

  It  takes care of enforcing quota. Since enforcer doesn't have the
  cluster view, it relies on another service called
  quota-aggregator. Aggregator, on query can return the size of a
  directory based on the cluster view.

  Enforcer is always loaded in the server graph and is by passed if
  the feature is not enabled.

  Options specific to enforcer:

  server-quota - Specifies whether the feature is on/off. It is used
  to by pass the quota if turned off.

  deem-statfs - If set to on, it takes quota limits into consideration
  while estimating fs size. (df command). The algorithm followed is,
  i.   Adjust statvfs based on limit configured on root.
  ii.  If limit is set on the inode passed, use size/limits on that inode to
       populate statvfs. Otherwise, use size/limits configured on root.
  iii. Upon statvfs, update the ctx->size on the inode.
  iv.  Don't let DHT aggregate, instead take the maximum of the usages from the
       subvols of the DHT, since each of it contains the complete information.

  Enforcer also makes use of gfid-to-path conversion functionality to
  work correctly when a client like nfs predominently relies on
  nameless lookups.

* Quota Aggregator acts as a thin client to provide cluster view

  Its a lightweight *gluster client* process with no mount point,
  started upon enabling quota or restarting the volume. This is a
  single process run on each brick, which can answer queries on all
  volumes in the cluster. Its volfile stored in
  GLUSTERD_DEFAULT_WORKING_DIR/quotad/quotad.vol.

Credits:
Raghavendra Bhat        <rabhat@redhat.com>
Varun Shastry           <vshastry@redhat.com>
Shishir Gowda           <sgowda@redhat.com>
Kruthika Dhananjay      <kdhananj@redhat.com>
Brian Foster            <bfoster@redhat.com>
Krishnan Parthasarathi  <kparthas@redhat.com>

Change-Id: Id1cb25b414951da34c665a55f77385d482e0f9de
BUG: 969461
Signed-off-by: Raghavendra G <rgowdapp@redhat.com>
Reviewed-on: http://review.gluster.org/5952
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
2013-11-26 10:24:02 -08:00
Pranith Kumar K
1d554b179f cluster/afr: Provide HA for pathinfo getxattr
Problem:
afr_[f]getxattr_pathinfo_cbks fail the fop even when it succeeded on
one of the bricks. This can happen if the last response to pathinfo
[f]getxattr is a failure.

Fix:
Remember if any of the [f]getxattr_pathinfos are successful and send
that as the op_ret/op_errno value to the xlators above.

Note:
Winding fop to a client xlator that is not connected to server produces
an error log. Preventing that by not even winding fop when client xlator
is DOWN.

Change-Id: I846e8c47423ffcfa2eabffe8924534781a36841a
BUG: 1032927
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
Reviewed-on: http://review.gluster.org/6332
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
2013-11-26 00:34:14 -08:00
M. Mohan Kumar
1ef8a597db Fixes for ZF reported by coverity
BUG: 1028673
Change-Id: I7c75738cca22c81c5629d579ef5bea24000e622e
Signed-off-by: M. Mohan Kumar <mohan@in.ibm.com>
Reviewed-on: http://review.gluster.org/6291
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
2013-11-19 09:57:37 -08:00
Emmanuel Dreyfus
f9443a3f14 Have #include <signal.h> for kill(2)
BUG: 764655
Change-Id: I4d18c9a6c00cb4696645fcb437398562f00b9d24
Signed-off-by: Emmanuel Dreyfus <manu@netbsd.org>
Reviewed-on: http://review.gluster.org/6284
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
Tested-by: Gluster Build System <jenkins@build.gluster.com>
2013-11-18 01:36:45 -08:00
Bharata B Rao
884a668a9c zerofill: Change the type of len argument of glfs_zerofill() to off_t
glfs_zerofill() can be potentially called to zero-out entire file and
hence allow for bigger value of length parameter.

Change-Id: I75f1d11af298915049a3f3a7cb3890a2d72fca63
BUG: 1028673
Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Reviewed-on: http://review.gluster.org/6266
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: M. Mohan Kumar <mohan@in.ibm.com>
Tested-by: M. Mohan Kumar <mohan@in.ibm.com>
Reviewed-by: Anand Avati <avati@redhat.com>
2013-11-14 23:29:48 -08:00
Amar Tumballi
98e796e501 cluster/dht - rebalance: handle the rebalance @ inode level (!fd level)
* migrate all the fd's on an inode to newer subvol after rebalance
* use the migration in progress flag in inode, so all the operations
  on the inode can make use of it

Change-Id: Ib807a46e927a1062688fc15119c916797c52a350
BUG: 1013456
Signed-off-by: Amar Tumballi <amarts@redhat.com>
Reviewed-on: http://review.gluster.org/5891
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
2013-11-13 11:45:18 -08:00
M. Mohan Kumar
6ec9c4599e bd: Add BD support to other xlators
Make changes to distributed xlator to work with BD xlator. Unlike files,
a block device can't be removed when its opened. So some part of the
code were moved down to avoid this situation. Also before truncating a
BD file its BD_XATTR should be set otherwise truncate will result in
truncating posix file. So file is created with needed BD_XATTR and
truncate is invoked. Also enables BD xlator in stripe volume type.

Change-Id: If127516e261fac5fc5b137e7fe33e100bc92acc0
BUG: 1028672
Signed-off-by: M. Mohan Kumar <mohan@in.ibm.com>
Reviewed-on: http://review.gluster.org/5235
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Anand Avati <avati@redhat.com>
2013-11-13 11:38:55 -08:00
M. Mohan Kumar
c8fef37c5d glusterfs: zerofill support
Add support for a new ZEROFILL fop. Zerofill writes zeroes to a file in
the specified range. This fop will be useful when a whole file needs to
be initialized with zero (could be useful for zero filled VM disk image
provisioning or  during scrubbing of VM disk images).

Client/application can issue this FOP for zeroing out. Gluster server
will zero out required range of bytes ie server offloaded zeroing. In
the absence of this fop,  client/application has to repetitively issue
write (zero) fop to the server, which is very inefficient method because
of the overheads involved in RPC calls  and acknowledgements.

WRITESAME is a  SCSI T10 command that takes a block of data as input and
writes the same data to other blocks and this write is handled
completely within the storage and hence is known as offload . Linux ,now
has support for SCSI WRITESAME command which is exposed to the user in
the form of BLKZEROOUT ioctl.  BD Xlator can exploit BLKZEROOUT ioctl to
implement this fop. Thus zeroing out operations can be completely
offloaded to the storage device , making it highly efficient.

The fop takes two arguments offset and size. It zeroes out 'size' number
of bytes in an opened file starting from 'offset' position.

This patch adds zerofill support to the following areas:
	- libglusterfs
	- io-stats
	- performance/md-cache,open-behind
	- quota
	- cluster/afr,dht,stripe
	- rpc/xdr
	- protocol/client,server
	- io-threads
	- marker
	- storage/posix
	- libgfapi

Client applications can exloit this fop by using glfs_zerofill introduced in
libgfapi.FUSE support to this fop has not been added as there is no system call
for this fop.

Changes from previous version 3:
* Removed redundant memory failure log messages

Changes from previous version 2:
* Rebased and fixed build error

Changes from previous version 1:
* Rebased for latest master

TODO :
     * Add zerofill support to trace xlator
     * Expose zerofill capability as part of gluster volume info

Here is a performance comparison of server offloaded zeofill vs zeroing
out using repeated writes.

[root@llmvm02 remote]# time ./offloaded aakash-test log 20

real	3m34.155s
user	0m0.018s
sys	0m0.040s
[root@llmvm02 remote]# time ./manually aakash-test log 20

real	4m23.043s
user	0m2.197s
sys	0m14.457s
[root@llmvm02 remote]# time ./offloaded aakash-test log 25;

real	4m28.363s
user	0m0.021s
sys	0m0.025s
[root@llmvm02 remote]# time ./manually aakash-test log 25

real	5m34.278s
user	0m2.957s
sys	0m18.808s

The argument log is a file which we want to set for logging purpose and
the third argument is size in GB .

As we can see there is a performance improvement of around 20% with this
fop.

Change-Id: I081159f5f7edde0ddb78169fb4c21c776ec91a18
BUG: 1028673
Signed-off-by: Aakash Lal Das <aakash@linux.vnet.ibm.com>
Signed-off-by: M. Mohan Kumar <mohan@in.ibm.com>
Reviewed-on: http://review.gluster.org/5327
Tested-by: Gluster Build System <jenkins@build.gluster.com>
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
2013-11-10 21:25:49 -08:00
Pranith Kumar K
5033d450ca cluster/afr: Remove 'max' from the log
This patch avoids giving more info to the user about the
internal heuristic employed in afr, for quota sizes.

Change-Id: Ice3a164399f09b6967500ec0c17dc340e7ae9aba
BUG: 1016683
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
Reviewed-on: http://review.gluster.org/6098
Reviewed-by: Vijay Bellur <vbellur@redhat.com>
Tested-by: Vijay Bellur <vbellur@redhat.com>
2013-10-17 02:26:26 -07:00