Problem:
1) AFR's eager-lock only works for data transactions.
2) When there are conflicting writes, the write with the conflicting region
initiates unlock of the eager-lock, leading to extra pre-ops and post-ops on the
file. When eager-lock goes off, it leads to extra fsyncs for random-write
workloads in AFR.
Solution (that is modeled after EC):
In EC, when there is a conflicting write, it waits for the current write to
complete before it winds the conflicted write. This leads to better utilization
of network and disk, because we will not be doing extra xattrops and FSYNCs and
inodelk/unlock. Moved fd based counters to inode based counters.
I tried to model the solution based on EC's locking, but it is not similar to
AFR because we had to keep backward compatibility.
Lifecycle of lock:
==================
The first transaction is added to the inode->owners list and an inodelk is sent
on the wire. All subsequent transactions are put in the inode->waiters list
until the first transaction completes both inodelk and [f]xattrop. Once
[f]xattrop also completes, every request in the inode->waiters list is checked
for conflicts with the existing locks in the inode->owners list; requests that
do not conflict are added to the inode->owners list and resume their
transaction. When these transactions complete the fop phase, they are moved to
the inode->post_op list and the transactions that were paused because of
conflicts are resumed. Post-op and unlock are not issued on the wire until the
last transaction on that inode. When the last transaction has to perform
post-op, it can choose to sleep for the delayed-post-op-secs value. If any
other transaction arrives during that time, it wakes up the sleeping
transaction and takes over ownership of the lock, and the cycle continues. If
delayed-post-op-secs expires, the timer thread wakes up the sleeping
transaction, which sets lock->release to true and starts the post-op followed
by the unlock. Any transactions that arrive during this time are put in the
inode->frozen list. Once the previous unlock completes, the frozen list is
moved to the waiters list, the first element of the waiters list is moved to
the owners list, the lock is attempted, and the cycle continues. This is the
general idea. There is additional logic, at the time of delaying, on arrival
of a new transaction, or in the flush fop, to wake up existing sleeping
transactions or to choose whether to delay a transaction; it is subject to
change with future enhancements.
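
The owners/waiters/post_op part of the lifecycle above can be sketched roughly
as follows. This is only an illustrative model in Python, not the AFR code: the
class and method names (Txn, InodeLock, submit, lock_acquired, fop_done) are
hypothetical, and the delayed post-op timer and the frozen list are left out.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Txn:
    """A write transaction on the byte range [start, start + length)."""
    name: str
    start: int
    length: int

    def conflicts(self, other: "Txn") -> bool:
        # Two half-open ranges overlap when each starts before the other ends.
        return (self.start < other.start + other.length
                and other.start < self.start + self.length)


@dataclass
class InodeLock:
    """Hypothetical per-inode lock state with the lists named in the text."""
    owners: List[Txn] = field(default_factory=list)
    waiters: List[Txn] = field(default_factory=list)
    post_op: List[Txn] = field(default_factory=list)
    acquired: bool = False  # True once inodelk and [f]xattrop have completed

    def submit(self, txn: Txn) -> List[Txn]:
        """Queue a transaction; return the transactions that may run now."""
        if not self.owners and not self.acquired:
            # First transaction: becomes an owner and would send the inodelk.
            self.owners.append(txn)
            return []
        self.waiters.append(txn)
        return self._wake_waiters() if self.acquired else []

    def lock_acquired(self) -> List[Txn]:
        """Called when the first owner has finished inodelk and [f]xattrop."""
        self.acquired = True
        return self._wake_waiters()

    def fop_done(self, txn: Txn) -> List[Txn]:
        """Move a finished owner to post_op and resume paused waiters."""
        self.owners.remove(txn)
        self.post_op.append(txn)
        return self._wake_waiters()

    def _wake_waiters(self) -> List[Txn]:
        resumed, still_waiting = [], []
        for txn in self.waiters:
            if any(txn.conflicts(owner) for owner in self.owners):
                still_waiting.append(txn)
            else:
                self.owners.append(txn)
                resumed.append(txn)
        self.waiters = still_waiting
        return resumed
```

In this model a waiter that overlaps a current owner stays paused until that
owner reaches fop_done(), while non-overlapping waiters run concurrently under
the same lock.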
Fixes: #418
BUG: 1549606
Change-Id: I88b570bbcf332a27c82d2767dfa82472f60055dc
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
Problem:
Whenever we read data from a file over NFS, NFS reads
more data than requested and caches it. Based on the
stat information, it decides whether the cached/pre-read
data is still valid.
Consider a 4 + 2 EC volume where all the bricks are on
different nodes.
In EC, with the round-robin read policy, reads are sent to
different sets of data bricks. This balances the
read fops across all the bricks and avoids heating up
(overloading) the same set of bricks.
Due to small differences in clock speed, it is possible
to get minor differences in atime, mtime or ctime
across bricks. That can cause a different stat to be
returned to NFS, based on which NFS will discard
cached/pre-read data that has not actually changed and
could still be used.
Solution:
Change the read policy for EC to gfid-hash. That forces
all reads of a file to go to the same set of bricks.
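
The difference between the two policies can be illustrated with a small
sketch. It assumes a 4 + 2 layout (6 bricks, 4 read per request) and uses
CRC32 of the gfid purely as a stand-in hash; the hash that EC actually uses is
not described here, and the function and class names are hypothetical.

```python
import uuid
import zlib


def gfid_hash_bricks(gfid: uuid.UUID, nodes: int = 6, fragments: int = 4):
    """Pick the same brick indices every time for a given gfid."""
    # Stand-in hash (assumption): CRC32 over the 16 gfid bytes.
    start = zlib.crc32(gfid.bytes) % nodes
    return [(start + i) % nodes for i in range(fragments)]


class RoundRobin:
    """Rotate the starting brick on every read, regardless of the file."""

    def __init__(self, nodes: int = 6, fragments: int = 4):
        self.nodes = nodes
        self.fragments = fragments
        self._next = 0

    def bricks(self):
        start = self._next
        self._next = (self._next + 1) % self.nodes
        return [(start + i) % self.nodes for i in range(self.fragments)]
```

With round-robin, two consecutive reads of the same file are answered by
different brick sets and may therefore carry slightly different timestamps;
with the gfid-based choice the brick set, and so the stat, stays the same.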
Change-Id: I825441cc519e94bf3dc3aa0bd4cb7c6ae6392c84
BUG: 1554743
Signed-off-by: Ashish Pandey <aspandey@redhat.com>
In the Jenkins regression test, brick multiplexing is enabled by the
is_brick_mx_enabled function and not by setting the cluster.brick-multiplex
option. Hence check the count of bricks and its logs; this fixes the
failure.
Change-Id: Ibb2ed8fbffd3765f283da741689304a5579d447c
BUG: 1555167
Signed-off-by: Varsha Rao <varao@redhat.com>
Self-heal creates a thread per brick to sweep the index looking for
files that need to be healed. These threads are started before the
volume comes online, so nothing is done but waiting for the next
sweep. This happens once per minute.
When a replace brick command is executed, the new graph is loaded and
all index sweeper threads started. When all bricks have reported, a
getxattr request is sent to the root directory of the volume. This
causes a heal on it (because the new brick doesn't have good data),
and marks its contents as pending to be healed. This is done by the
index sweeper thread on the next round, one minute later.
This patch solves this problem by waking all index sweeper threads
after a successful check on the root directory.
Additionally, the index sweep thread scans the index directory
sequentially, but it might happen that after healing a directory entry
more index entries are created but skipped by the current directory
scan. This causes the remaining entries to be processed on the next
round, one minute later. The same can happen in the next round, so
the heal runs in bursts and takes a long time to finish, especially
on volumes with many directory levels.
This patch solves this problem by immediately restarting the index
sweep if a directory has been healed.
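
A rough model of the second change, as a sketch only: the index is a plain set,
heal() is a hypothetical callback that returns the index entries created by
healing an entry, and each pass scans a snapshot so that newly created entries
are skipped by the current scan, as described above.

```python
def sweep_until_clean(index, heal, max_passes=100):
    """Rescan immediately while a pass healed something that created entries.

    Returns the number of passes that were needed.
    """
    passes = 0
    while passes < max_passes:
        passes += 1
        healed_dir = False
        # sorted() takes a snapshot: entries added during the scan are not
        # visited until the next pass.
        for entry in sorted(index):
            index.discard(entry)
            new_entries = heal(entry)
            if new_entries:
                index.update(new_entries)
                healed_dir = True
        if not healed_dir:
            break  # nothing new appeared; wait for the next regular sweep
    return passes
```

For a directory chain of depth N this needs N back-to-back passes instead of N
sweeps spaced one minute apart.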
Change-Id: I58d9ab6ef17b30f704dc322e1d3d53b904e5f30e
BUG: 1547662
Signed-off-by: Xavi Hernandez <jahernan@redhat.com>
This test does:
1. mount a volume
2. kill a brick in the volume
3. mkdir (/somedir)
In my local tests and in [1], I see that mkdir in step 3 fails because
there is no dht-layout on root directory.
The reason, I think, is that by the time the first lookup on "/" hit dht, a
brick had already been killed as per step 2. This means the layout was not
healed for "/" and, since this is a new volume, no layout is present on it.
Note that the first lookup done on "/" by fuse-bridge is not synchronized with
the parent process of the daemonized glusterfs mount completing. IOW, by the
time the glusterfs cmd has executed there is no guarantee that the lookup on "/"
is complete. So, if step 2 races ahead of fuse_first_lookup on "/", we
end up with an invalid dht-layout on "/", resulting in failures.
Doing an operation like ls makes sure that the lookup on "/" is completed
before we kill a brick.
Change-Id: Ie0c4e442c4c629fad6f7ae850437e3d63fe4bea9
Signed-off-by: Raghavendra G <rgowdapp@redhat.com>
BUG: 1543279
Because bug-924726.t depends on netstat, tests failed before. This got resolved
by adding respective check to run-tests.sh.
Enabled respective test again.
Change-Id: I70c9bff03379ed9ee8cd95842c3501dfb50b8e86
BUG: 1312830
Signed-off-by: Sven Fischer <sven@fischer-abc.de>
For skipped files, use a return value of 1 to prevent
error messages being logged.
Change-Id: I18de31ac1a64d4460e88dea7826c3ba03c895861
BUG: 1553598
Signed-off-by: N Balachandran <nbalacha@redhat.com>
Problem:
Auto thread count derived from the number of attaches and detaches
was reset to 1 when server_reconfigure() was called.
Solution:
Avoid auto-thread-count reset to 1.
Change-Id: Ic00e86adb81ba3c828e354a6ccb638209ae58b3e
BUG: 1547888
Signed-off-by: Milind Changire <mchangir@redhat.com>
In a mixed mode cluster involving 4.0 and older 3.x bricks, if
clients are newer, then the iatt encoded in the dictionary can be
of the older iatt format, which a newer client will map incorrectly
to the newer structure.
This causes failures in FOPs that depend on this iatt for some
functionality (seen in mkdir operations failing as EIO, when DHT
hits its internal setxattr call).
The fix provided is to convert the iatt in the dict, based on which
RPC version is used to communicate with the server.
IOW, this is the reverse of change in commit "b966c7790e"
Tested using a mixed mode cluster (i.e. bricks in 3.12 and 4.0 versions)
and a mixed set of clients (3.12 and 4.0).
There is no regression test provided, as this needs a mixed mode cluster
to test and validate.
Change-Id: I454e54651ca836b9f7c28f45f51d5956106aefa9
BUG: 1554053
Signed-off-by: ShyamsundarR <srangana@redhat.com>
Added iatt conversion to an older format, when dealing with
older RPC versions. This enables iatt structure conformance
when dealing with older clients.
This helps fix rolling upgrade from 3.x versions to 4.0 version
of gluster by sending the right iatt in the dictionary when DHT
requests the same.
Change-Id: Ieaf925f81f8c7798a8fba1e90a59fa9dec82856c
BUG: 1544699
Signed-off-by: ShyamsundarR <srangana@redhat.com>
There was an issue where some accesses to the saved_fds list were
protected by the wrong mutex (lock instead of fd_lock).
Additionally, the retrieval of fdctx from the fd's context and any
checks done on it are now also protected by fd_lock, to prevent
fdctx from becoming outdated just after retrieving it.
Change-Id: If2910508bcb7d1ff23debb30291391f00903a6fe
BUG: 1553129
Signed-off-by: Xavi Hernandez <xhernandez@redhat.com>
Instead, send SIGTERM (default, 15) first, and send SIGKILL only at
the end. If SIGKILL is sent directly, many test tools such as
valgrind and lcov are not able to process their information
properly.
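
The general pattern looks roughly like this in Python. It is a generic sketch
using the standard subprocess module, not the test framework's own code; the
function name and the grace period are assumptions.

```python
import subprocess


def stop_gracefully(proc: subprocess.Popen, grace_secs: float = 5.0) -> int:
    """Send SIGTERM first; fall back to SIGKILL only if the process lingers."""
    if proc.poll() is not None:
        return proc.returncode  # already exited
    proc.terminate()  # SIGTERM on POSIX: the process can flush and clean up
    try:
        return proc.wait(timeout=grace_secs)
    except subprocess.TimeoutExpired:
        proc.kill()  # SIGKILL on POSIX: cannot be caught or handled
        return proc.wait()
```

A process that handles SIGTERM gets the chance to write out its reports before
exiting, which it never gets when SIGKILL is sent directly.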
BUG: 1549000
Change-Id: I664de12ee7dbf47eb98b8141004cd51f6006b314
Signed-off-by: Amar Tumballi <amarts@redhat.com>
1. If a replica volume created on glusterfs-3.8 was upgraded to
glusterfs-3.12, `gluster vol get volname client-io-threads` displayed
'on' even though it wasn't and the xlator wasn't loaded on
the client-graph. This was due to removing certain checks in
glusterd_get_default_val_for_volopt as a part of commit
47604fad4c2a3951077e41e0c007ceb979bb2c24. Fix it.
2. Also, as a part of op-version bump-up, client-io-threads was being
loaded on the clients during volfile regeneration. Prevent it.
3. AFR assumes quorum-type to be auto in newly created replica 3 (odd
replica in general) volumes but `gluster vol get quorum-type` displays
'none'. Fix it.
Change-Id: I19e586361ed1065c70fb378533d3b4dac1095df9
BUG: 1545056
Signed-off-by: Ravishankar N <ravishankar@redhat.com>
The subdirectories are expected to be present for a subdir
mount to be successful. If not, the client_handshake()
itself fails. When a volume is about to get
mounted for the first time, this is easier to handle: if the
directory is not present in one brick, then it's mostly
not present in any other brick. In case of add-brick,
the directory is not present in new brick, and there is
no chance of healing it from the subdirectory mount, as
in those clients, the subdir itself will be 'root' ('/')
of the filesystem. Hence we need a volume mount to heal
the directory before connections can succeed.
This patch does take care of that by healing the directories
which are expected to be mounted as subdirectories from the
volume level mount point.
Change-Id: I2c2ac7b7567fe209aaa720006d09b68584d0dd14
BUG: 1549915
Signed-off-by: Amar Tumballi <amarts@redhat.com>
Removed
1) afr-v1 self-heal locks related code which is not used anymore
2) transaction has some data types that are not needed, so removed them
3) Never used lock tracing available in afr as gluster's network tracing does
the job. So removed that as well.
4) Changelog is always enabled and afr is always used with locks, so
__changelog_enabled, afr_lock_server_count etc functions can be deleted.
5) transaction.fop/done/resume always call the same functions, so no need
to have these variables.
BUG: 1549606
Change-Id: I370c146fec2892d40e674d232a5d7256e003c7f1
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
In order to take care of the stale lock issue, a timer was introduced
in the mgmt_v3 lock. This timer was not freeing its memory, which
introduced this leak.
With this fix, memory cleanup in locking is handled properly.
Change-Id: I2e1ce3ebba3520f7660321f3d97554080e4e22f4
BUG: 1550339
Signed-off-by: Gaurav Yadav <gyadav@redhat.com>
We are not seeing much improvement with this change. So removing the
feature so that it doesn't need to be maintained anymore.
Fixes: #414
Change-Id: Ic7969b151544daf2547bd262a9fa03f575626411
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
GETATTR FUSE message can carry a file handle
reference in which case it serves as a hint
for the FUSE server that the stat data is
preferably acquired in context of the given
filehandle (which we call '"fgetattr"-like
semantics').
So far FUSE ignored the filehandle provided
with GETATTR and grabbed a file handle
heuristically. This caused confusion in the
caching layers, which has been tracked down
as one of the reasons of the referred BUG.
As of the BUG, this is just a partial fix.
BUG: 1512691
Change-Id: I67eebbf5407ca725ed111fbda4181ead10d03f6d
Signed-off-by: Csaba Henk <csaba@redhat.com>
We have the following undefined symbol error from protocol/server.so:
glusterfs_mgmt_pmap_signout
glusterfs_autoscale_threads
See https://review.gluster.org/19225 (bz#1532238)
and https://review.gluster.org/19657 (bz#1550895)
(why are there two different bzs for the same bug?)
IMO this is a cleaner solution. I.e. moving the above two functions
to libgfrpc (.../rpc/rpc-lib/...)
I would also, for (foolish) consistency sake, like to see
glusterfs_mgmt_pmap_signin() moved from glusterfsd to libgfrpc as
well.
This works on f28/rawhide, with its new, more restrictive run-time
link semantics. The smoke and regression tests on earlier fedora and
centos will confirm that it works on those platforms too.
Change-Id: I9cfbd1cc15e7ebd9fc31b56ac791287fa2c584de
BUG: 1550895
Signed-off-by: Kaleb S. KEITHLEY <kkeithle@redhat.com>
This is required as we check for 'clnt-lk-version' in SETVOLUME callback
with older clients in place against newer servers. Change is similar to
what we have done via https://review.gluster.org/#/c/19560/.
Change-Id: If333c20cf9503f40687ec926c44c7e50222c05b5
BUG: 1544699
Signed-off-by: Anoop C S <anoopcs@redhat.com>
The volume_options_t struct was modified and a new member was introduced
in the middle of the struct. This caused GD2 to crash when it tried to
read the volume options. The new member has been moved to the end of the
struct to correct this.
And a note has been added to notify developers on how to modify this
struct, and the xlator_api_t struct.
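
Why the position of a new member matters to a consumer built against the old
layout can be shown with ctypes. The field names below are hypothetical and do
not reproduce volume_options_t; the sketch only demonstrates how member offsets
move.

```python
import ctypes


class OptOld(ctypes.Structure):
    """Layout that an already-built consumer expects (hypothetical fields)."""
    _fields_ = [("key", ctypes.c_char_p),
                ("opt_type", ctypes.c_int),
                ("min_val", ctypes.c_double)]


class OptInsertedInMiddle(ctypes.Structure):
    """New member added in the middle: later members shift."""
    _fields_ = [("key", ctypes.c_char_p),
                ("category", ctypes.c_char_p),  # new member
                ("opt_type", ctypes.c_int),
                ("min_val", ctypes.c_double)]


class OptAppended(ctypes.Structure):
    """New member added at the end: existing offsets are unchanged."""
    _fields_ = [("key", ctypes.c_char_p),
                ("opt_type", ctypes.c_int),
                ("min_val", ctypes.c_double),
                ("category", ctypes.c_char_p)]  # new member


def offsets(struct_cls):
    """Return {field name: byte offset} for a ctypes Structure class."""
    return {name: getattr(struct_cls, name).offset
            for name, _ in struct_cls._fields_}
```

A reader that still uses the old offsets finds every old member where it
expects it in the appended layout, but reads the wrong bytes for opt_type and
min_val in the inserted layout.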
Updates: gluster/glusterfs#302
Change-Id: I2e9899ec10516be29c7e9d574da53be8ec17a99e
Signed-off-by: Kaushal M <kaushal@redhat.com>
For as long as a shard's inode is in priv->lru_list, it should have a non-zero
ref-count. This patch achieves it by taking a ref on the inode when it
is added to lru list. When it's time for the inode to be evicted
from the lru list, a corresponding unref is done.
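
The invariant can be sketched with a toy LRU in Python. The Inode and ShardLRU
classes are hypothetical stand-ins for the shard translator's structures and
only model the ref/unref pairing.

```python
from collections import OrderedDict


class Inode:
    def __init__(self, gfid):
        self.gfid = gfid
        self.refs = 0

    def ref(self):
        self.refs += 1
        return self

    def unref(self):
        assert self.refs > 0, "unref without a matching ref"
        self.refs -= 1


class ShardLRU:
    """Every inode held in the list carries one ref taken by the list."""

    def __init__(self, limit):
        self.limit = limit
        self.entries = OrderedDict()  # gfid -> Inode, oldest first

    def add(self, inode):
        """Insert or refresh an inode; return the evicted inode, if any."""
        if inode.gfid in self.entries:
            self.entries.move_to_end(inode.gfid)  # already holds its ref
            return None
        evicted = None
        if len(self.entries) >= self.limit:
            _, evicted = self.entries.popitem(last=False)
            evicted.unref()  # drop the ref taken when it was added
        self.entries[inode.gfid] = inode.ref()  # ref taken on insertion
        return evicted
```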
Change-Id: I289ffb41e7be5df7489c989bc1bbf53377433c86
BUG: 1468483
Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com>
At the time of pre-op, pre_op_xdata is populated with the xattrs we get from the
disk, and at the time of post-op it gets overwritten without unreffing the
previously stored value, leading to a leak.
This is a regression we missed in
https://review.gluster.org/#/q/ba149bac92d169ae2256dbc75202dc9e5d06538e
BUG: 1550078
Change-Id: I0456f9ad6f77ce6248b747964a037193af3a3da7
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
> Add options to disable new features
> Commit ID: c071992e8d
> https://review.gluster.org/#/c/18291/
> By Michael Goulet <mgoulet@fb.com>
This patch is required to forward port io-threads namespace patch.
Updates: #401
Change-Id: Ice477fdf4b8934f9fac0b4a2f6c93db97429a586
Signed-off-by: Varsha Rao <varao@redhat.com>
This patch fixes the namespace test failure when brick multiplexing is enabled
by changing the log file name in that case, as only one log file is generated
for all bricks.
Change-Id: Ide941946e5e1b2676e7139e1b5bf6b93b93c0815
Signed-off-by: Varsha Rao <varao@redhat.com>
Coverity issue : FORWARD_NULL
fd is assigned within a condition, but the fd is used even outside
the condition.
Change-Id: I6548d605d8a8acc6a25f1657f9fb75586d513042
Signed-off-by: Poornima G <pgurusid@redhat.com>
Building with libtirpc (versus legacy glibc rpc) results in many
warnings about xdr macros that are redefined in libtirpc headers
because of the way compat.h and glusterfs.h are usually #included.
And these xdr macros in libglusterfs/src/compat.h - which were copied
from legacy glibc's rpc headers - are different than the same-name macros
in libtirpc. I haven't checked to see that any of the macros are
expanded (incorrectly) between the definition in compat.h and the
redefinition in tirpc/rpc/xdr.h; the risk seems pretty minimal. Regardless
it seems better, from a truth-and-beauty perspective to not have the
old, incorrect definitions in the first place.
Not to mention that any file that #includes compat.h and not glusterfs.h
does not need these xdr macro definitions at all. They're really only
needed when using really old glibc rpc, which would only be evident if
including glusterfs.h and/or glusterfs-fops.h. (Which by the way, nothing
currently #includes glusterfs-fops.h by itself. And maybe nothing ever
should?)
Change-Id: Ic11e4407d6ab7c498a8745a99379cbf4788a24e8
Signed-off-by: Kaleb S. KEITHLEY <kkeithle@redhat.com>
Framework in order to classify options.
Updates gluster/glusterfs#302
Change-Id: I3dd6ae27bd0eb8e0065ffca75838c801e4f3ac91
Signed-off-by: N Balachandran <nbalacha@redhat.com>
Problem: When a volume is stopped while brick multiplexing is
enabled, memory is not cleaned up in all server-side xlators.
Solution: To clean up memory for all server-side xlators, call fini
in glusterfs_handle_terminate after sending the GF_EVENT_CLEANUP
notification to the top xlator.
BUG: 1544090
Signed-off-by: Mohit Agrawal <moagrawa@redhat.com>
Note: All test cases were run in a separate build (https://review.gluster.org/19574)
with the same patch after forcefully enabling brick mux; all test cases
passed.
Change-Id: Ia10dc7f2605aa50f2b90b3fe4eb380ba9299e2fc
> io-threads: nuke everything from a client when it disconnects
> Commit ID: 4d8268d760
> https://review.gluster.org/#/c/18254/
> By Jeff Darcy <jdarcy@fb.com>
This patch is required to forward port io-threads namespace patch.
Updates: #401
Change-Id: I13d3a74862eea3d01e8dbc8736987c3dae6e8b2a
Signed-off-by: Varsha Rao <varao@redhat.com>
... instead of adding this information in fd_ctx in call path and
retrieving it again in the callback.
Change-Id: Ibbddbbe85baadb7e24aacf5ec8a1250d493d7800
BUG: 1468483
Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com>
Currently the aggregate size is 128K by default (page size).
From a performance perspective, a small number of large writes is faster
than a large number of small writes, especially in EC volumes. But identifying
the right aggregate size depends on multiple factors such as memcpy overhead,
network overhead etc. On a local machine, combining 128K writes into 1M writes
for EC volumes yielded a 30% improvement.
As a part of this patch, the aggregate size is just made configurable and
page_size is modified accordingly.
Raghavendra Gowdappa had suggested that, while aggregating writes, we should get
rid of the memcpy of large writes and instead add the pointer to the existing
vector; that will be done as a part of another patch. Also, in EC volumes, the
vectors are merged into one vector, so even if we save the memcpy in
write_behind, EC would anyway do a memcpy for merging the vectors into one.
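
The effect of the aggregate size can be illustrated with a toy coalescing
function. This is a sketch and not write-behind's implementation: the function
name is hypothetical, and it merges only adjacent writes by copying bytes
(the Python analogue of the memcpy mentioned above).

```python
def aggregate_writes(writes, aggregate_size):
    """Coalesce contiguous (offset, data) writes up to aggregate_size bytes."""
    merged = []
    for offset, data in writes:
        if merged:
            last_offset, last_data = merged[-1]
            contiguous = last_offset + len(last_data) == offset
            fits = len(last_data) + len(data) <= aggregate_size
            if contiguous and fits:
                # Concatenation copies the payload, like a memcpy would.
                merged[-1] = (last_offset, last_data + data)
                continue
        merged.append((offset, data))
    return merged
```

With an aggregate size of 1M, eight contiguous 128K writes go out as a single
write; with 128K they go out unchanged as eight writes.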
Updates: #364
Change-Id: Ib67294b8577bea14dde1c84cd271012ecea99f09
Signed-off-by: Poornima G <pgurusid@redhat.com>
Scale rpcsvc_request_handler threads to match the scaling of event
handler threads.
Please refer to https://bugzilla.redhat.com/show_bug.cgi?id=1467614#c51
for a discussion about why we need multi-threaded rpcsvc request
handlers.
Change-Id: Ib6838fb8b928e15602a3d36fd66b7ba08999430b
Signed-off-by: Milind Changire <mchangir@redhat.com>
Currently it's passed in dht_blocking_inode(entry)lk, which makes it a
global value for all the locks passed in the argument. This is
a limitation for cases where we want to ignore failures on only a few
locks and fail for others.
Change-Id: I02cfbcaafb593ad8140c0e5af725c866b630fb6b
BUG: 1543279
Signed-off-by: Raghavendra G <rgowdapp@redhat.com>
> performance/io-threads: Add watchdog to cover up a possible thread leak
> Commit ID: 8b6804f75c
> https://review.gluster.org/#/c/18239/
> By Shreyas Siravara <sshreyas@fb.com>
This patch is required to forward port io-threads namespace patch.
Updates: #401
Change-Id: Id057c34a2abb9fc6dfb4afcd5c7bbbfe5693bbb8
Signed-off-by: Varsha Rao <varao@redhat.com>