Set the levels for DHT options based on
https://review.gluster.org/#/c/19466/
Change-Id: I51b31a706a0b9517404e83224c89de145fd5d7e1
updates: #430
Signed-off-by: N Balachandran <nbalacha@redhat.com>
Usage: Use 'reader-thread-count=<NUM>' as command line option to
set the thread count at the time of mounting the volume.
The next task is to make these threads auto-scale based on the load,
instead of having the user remount the volume every time to change the
thread count.
Updates #412
Change-Id: I94aa1505e5ae6a133683d473e0e4e0edd139b76b
Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com>
With lookup-optimize enabled, gf_defrag_settle_hash in rebalance
sometimes flips the on-disk layout on the volume root after the
migration of all files in the directory.
This is sometimes seen when attempting to fix the layout of a
directory multiple times before calling gf_defrag_settle_hash.
dht_fix_layout_of_directory generates a new layout in memory but
updates it in the inode ctx before it is set on disk. The layout
may be different the second time around due to
dht_selfheal_layout_maximize_overlap. If the layout is then not
written to the disk, the inode now contains the wrong layout.
gf_defrag_settle_hash does not check the correctness of the layout
in the inode before updating the commit-hash and writing it to the
disk thus changing the layout of the directory.
Change-Id: Ie1407d92982518f2a0c40ec70ad370b34a87b4d4
updates: bz#1557435
Signed-off-by: N Balachandran <nbalacha@redhat.com>
This reverts commit a60fc2ddc03134fb23c5ed5c0bcb195e1649416b.
This commit was causing multiple tests to time out when brick
multiplexing is enabled. With further debugging, it was found that even
though the volume stop transaction is converted into mgmt_v3 to allow
the remote nodes to follow the synctask framework to process the command,
there are other callers of glusterd_brick_stop () which are not synctask
based.
Change-Id: I7aee687abc6bfeaa70c7447031f55ed4ccd64693
updates: bz#1545048
Updates: #363
This new value (3) will try to wind read requests to the AFR child
with the fewest pending requests in its queue.
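For illustration, the selection could be sketched like this
(hypothetical helper; the real AFR code reads the pending counts from
its private per-child state):

    #include <limits.h>

    /* Pick the child with the fewest pending requests. child_up[]
     * and pending[] are hypothetical inputs standing in for AFR's
     * internal per-child state.                                    */
    static int
    afr_pick_least_loaded (const int *child_up, const long *pending, int count)
    {
            long min  = LONG_MAX;
            int  best = -1;
            int  i;

            for (i = 0; i < count; i++) {
                    if (!child_up[i])
                            continue;
                    if (pending[i] < min) {
                            min  = pending[i];
                            best = i;
                    }
            }
            return best;   /* -1 means no readable child */
    }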
Change-Id: If6bda2aac9bf7aec3fc39622f78659313c4b6508
Signed-off-by: Ravishankar N <ravishankar@redhat.com>
The xattr trusted.glusterfs.list-node-uuids was only sent to a single
subvolume. This was returning null uuids from the other subvolumes as
if they were down.
This fix forces that xattr to be requested from all subvolumes.
Change-Id: If62eb39a6857258923ba625e153d4ad79018ea2f
fixes: bz#1561406
Signed-off-by: Xavi Hernandez <xhernandez@redhat.com>
Looking at the man page shows that the "Snapshot command" section
title wasn't aligned with the other section titles.
Change-Id: I24bdb2e3728e03862fee57710cfe34b0607fe09a
BUG: 1507230
Signed-off-by: Michael Scherer <misc@redhat.com>
Make the log message describe the actual test.
Change-Id: I1ea7300a6b186032a65236492d6d2a6eef0ab983
fixes: bz#1560441
Signed-off-by: Kaleb S. KEITHLEY <kkeithle@redhat.com>
When a saved frame is to be force-unwound, there is no need to pass an
empty iovector that points to no data.
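A hedged sketch of the idea (the callback type and names here are
hypothetical, not the actual rpc-clnt signatures):

    #include <stddef.h>
    #include <sys/uio.h>

    typedef int (*cbk_fn_t) (void *req, struct iovec *iov, int count);

    static int
    force_unwind (cbk_fn_t cbk, void *req)
    {
            /* no payload exists on a forced unwind, so pass no iovec
             * at all rather than a zeroed one pointing to no data   */
            return cbk (req, NULL, 0);
    }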
Change-Id: I6e858fb38644326e22239b83272b15db656035e5
BUG: 1523122
Signed-off-by: Zhang Huan <zhanghuan@open-fs.com>
xdr_replymsg is called to decode the reply message, and it returns
failure if the message is corrupted. However, the value retrieved from
the global errno is 0 even when xdr_replymsg fails.
Fix this issue by simply returning a negative value if the call to
xdr_replymsg fails.
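A hedged sketch of the fix (surrounding rpc code simplified;
xdr_replymsg is the standard SunRPC decoder):

    #include <rpc/rpc.h>

    static int
    decode_reply (XDR *xdrs, struct rpc_msg *reply)
    {
            /* return an explicit error instead of consulting errno,
             * which can still be 0 after a decode failure           */
            if (!xdr_replymsg (xdrs, reply))
                    return -1;
            return 0;
    }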
Change-Id: I2b9a1dc97652fbb6cf6568ea617f120713784a55
BUG: 1523122
Signed-off-by: Zhang Huan <zhanghuan@open-fs.com>
Problem: There's a race between the last glusterfs_handle_terminate()
response sent to glusterd and the kill that happens immediately if the
terminated brick is the last brick.
Solution: When it is the last brick for the brick process, instead of
glusterfsd killing itself, glusterd will kill the process in the case
of brick multiplexing. The gf_attach utility is also changed
accordingly.
Change-Id: I386c19ca592536daa71294a13d9fc89a26d7e8c0
fixes: bz#1545048
BUG: 1545048
Signed-off-by: Sanju Rakonde <srakonde@redhat.com>
ENOSPC returned by a file migration is no longer
considered a rebalance failure.
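A hedged sketch of the accounting change (names are illustrative, not
the actual dht rebalance code):

    #include <errno.h>

    /* classify one file-migration result: ENOSPC is counted as a
     * skipped file rather than a rebalance failure                */
    static int
    classify_migration (int op_ret, int op_errno,
                        unsigned long *skipped, unsigned long *failures)
    {
            if (op_ret >= 0)
                    return 0;
            if (op_errno == ENOSPC)
                    (*skipped)++;
            else
                    (*failures)++;
            return -1;
    }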
Change-Id: I21cf3a8acdc827bc478e138d6cb5db649d53a28c
fixes: bz#1553598
Signed-off-by: N Balachandran <nbalacha@redhat.com>
Problem: if a lookup is done on a newly added brick for a path on which
the limit has been reached, the lookup fails to heal the directory tree
due to quota.
Solution: Tag the lookup as an internal fop and ignore it in quota.
Since marking an fop as internal does not usually give enough
contextual information, new flags are introduced to pass the contextual
info. dict_check_flag and dict_set_flag are added to aid flag
operations.
A flag is a single bit in a bit array (currently limited to 256 bits).
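For illustration, the bit-array semantics could look like the sketch
below (the real dict_set_flag/dict_check_flag operate on a value stored
in the dict and may differ in detail):

    #include <errno.h>

    #define DICT_MAX_FLAGS 256   /* the bit array is 256 bits (32 bytes) */

    static int
    flag_set (unsigned char *bits, int flag)
    {
            if (flag < 0 || flag >= DICT_MAX_FLAGS)
                    return -EINVAL;
            bits[flag / 8] |= (unsigned char)(1u << (flag % 8));
            return 0;
    }

    static int
    flag_check (const unsigned char *bits, int flag)
    {
            if (flag < 0 || flag >= DICT_MAX_FLAGS)
                    return -EINVAL;
            return !!(bits[flag / 8] & (1u << (flag % 8)));
    }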
Change-Id: Ifb6a68bcaffedd425dd0f01f7db24edd5394c095
fixes: bz#1505355
BUG: 1505355
Signed-off-by: Sanoj Unnikrishnan <sunnikri@redhat.com>
commit fef9293 changed network.inode-lru-limit from 50000 to 200000 in
nl-cache group profile but the test wasn't changed to reflect it
accordingly.
Change-Id: Ibb5fb0a387f160f6b726246b161a9a7b33135755
fixes: bz#1560589
Signed-off-by: Atin Mukherjee <amukherj@redhat.com>
The problem of lost stderr was introduced in
commit feea851fad4f89b48bfe89fe3b75250cc7bd6501.
Change-Id: Ic98f9bc9682ae3bd9c3ebea3855667fc8ba2843d
BUG: 1559130
Signed-off-by: Niklas Hambüchen <mail@nh2.me>
The inode table size is currently set to 200000. Hence the testcase,
which was expecting the old value 50000, needs to be changed.
Change-Id: I8e44b1d0a2da1e8100bebd25f48bb36e2897b4f8
fixes: bz#1560393
Signed-off-by: Susant Palai <spalai@redhat.com>
By default the server inode table size is 16K. When upcall is enabled,
there are going to be too many forgets sent on inodes, as the brick can
hold only 16K inodes in memory, so we increased this to 50K. This is
still less than the client inode table size. We have seen a performance
improvement when the server inode table size is set to 200000 (almost
the same as the client inode table size). Hence changing the value to
200000. Increasing this increases the memory consumption by <1MB.
BUG: 1559235
Change-Id: I931db965cd34bf33094328541bd5a633b3357805
Signed-off-by: Poornima G <pgurusid@redhat.com>
alert-time, soft timeout, hard timeout, default soft limit and
deem-statfs will be settable through the volume set command; hence they
are marked as settable.
Other options are used only via quota commands.
Updates #302
Change-Id: I02d258cc3aa7fe58ccbadd59441cce64cfd9ba6e
Signed-off-by: Sanoj Unnikrishnan <sunnikri@redhat.com>
Requesting ipv6-default even if you explicitly disable libtirpc will
then implicitly enable libtirpc because that is required. That is fine
but the configure summary should not then show TIRPC as disabled when
it is not.
The result has also been made clearer by stating that TIRPC is
"missing" when it has been tried but not found.
BUG: 1553938
Change-Id: I945bd6859aaf3defa682b0d05ee34a9827b9c45f
Signed-off-by: James Le Cuirot <chewi@gentoo.org>
The current behaviour disables ipv6-default when no switch is given at
all but otherwise checks if libtirpc was requested, regardless of
whether you have given --with-ipv6-default or --without-ipv6-default.
I believe the intention was to enable when libtirpc is requested by
default but otherwise respect the switch given.
This is important because ipv6-default breaks Gluster for systems that
have IPv6 disabled.
BUG: 1553926
Change-Id: I76b91ae2699574b2e5b777453732bb5cbd79bbca
Signed-off-by: James Le Cuirot <chewi@gentoo.org>
Provide correct error message for changelog end time check
Updated error message to print "wrong result for end".
Original patch by Keith Schincke <kschinck@redhat.com>
from https://review.gluster.org/#/c/8121/
Change-Id: Ia3458cbac7784bfc71c05da67391a3f8259f18f0
BUG: 1559126
Signed-off-by: Niklas Hambüchen <mail@nh2.me>
So far the --direct-io-mode option has been presented as being
Boolean valued. That is however not exact, as a third behavior is
chosen if the option is not specified.
We now accept the "auto" value as an explicit choice for the default
heuristics, and indicate in the descriptions of the option (which
occur in the command-line help and in the glusterfs / mount.glusterfs
man pages) that auto is the default.
The default heuristics was briefly described in the
commandline help. We are getting rid of that, because:
- it's not the right place to provide such details;
- there is no guarantee of keeping the current heuristics
so it might go out of sync with reality;
- that is already the case to some degree, because the
description did not take into account that the default
heuristics varies between platforms (on Mac, it's just
"off"), and that xlators can also prescribe direct I/O
for the file of their choice (see change
I3fe3312cd96baa4eecfe1247ab7255b4f455f049).
Change-Id: Ia83479c0c67fe66b7fc2e0e8db5b7792d9f44b28
Signed-off-by: Csaba Henk <csaba@redhat.com>
* Say which file had the problem
* Dump openssl error stack
Fixes gluster/glusterfs#431.
Change-Id: I66e9a0ae7758e9d7d8a5f19cc8ff898f01f2b491
Signed-off-by: Niklas Hambüchen <mail@nh2.me>
When the self-heal daemon is doing a full sweep it uses readdirp to
get extra stat information from each file. This information is
obtained in two steps by the posix xlator: first the directory is
read to get the entries and then stat is called on each entry to get
additional info. Between these two steps, it's possible that the file
is removed by the user, so we'll get an error, leaving the stat info
empty.
EC's heal daemon was using the gfid blindly, causing an assert failure
when protocol/client was trying to encode the gfid.
To fix the problem a check has been added. If we detect a null gfid, we
simply ignore it and continue healing.
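The check presumably amounts to something like this hedged sketch
(glusterfs has gf_uuid_is_null for this; the standalone helper below is
illustrative):

    #include <string.h>

    /* a gfid left all-zero means the stat step failed after readdir
     * (e.g. the file was removed in between), so skip the entry     */
    static int
    gfid_is_null (const unsigned char gfid[16])
    {
            static const unsigned char zero[16];
            return memcmp (gfid, zero, 16) == 0;
    }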
Change-Id: I2e4acdcecd0b6951055e50d1c37d686a2186a228
BUG: 1558016
Signed-off-by: Xavi Hernandez <xhernandez@redhat.com>
Problem:
when dd happens on a sharded replicate volume, all the writes on shards
happen through an anon-fd. When the writes don't come quickly enough,
the old anon-fd is closed and a new fd gets created to serve the new
writes. open-fd-count is decremented only after the fd is closed as
part of fd_destroy(). So even when one fd is on the way to being
closed, a new fd will be created, and during this short period it
appears as though there are multiple fds opened on the file. AFR thinks
another application opened the same file and switches off eager-lock,
leading to extra latency.
Fix:
Introduce a different count, active-fd-count, whose life cycle starts
at fd_bind() and ends just before fd_destroy().
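A hedged sketch of the intended life cycle (stub types, no locking; the
real fd.c code differs):

    struct inode_stub {
            unsigned long active_fd_count;
    };

    /* incremented when the fd becomes visible on the inode ... */
    static void
    on_fd_bind (struct inode_stub *inode)
    {
            inode->active_fd_count++;
    }

    /* ... and decremented just before teardown, so a dying fd no
     * longer counts as "open" while a replacement is created      */
    static void
    on_fd_destroy (struct inode_stub *inode)
    {
            inode->active_fd_count--;
    }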
BUG: 1557932
Change-Id: I2e221f6030feeedf29fbb3bd6554673b8a5b9c94
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
Problem:
shard_post_lookup_fsync_handler() goes over the list of inode-ctx that
need to be fsynced, and in the cbk it removes each of the inode-ctx
from the list. When the first member of the list is removed, it tries
to modify the list head's memory with the latest next/prev pointers,
and when this happens there is no guarantee that the list head, which
is from the stack memory of shard_post_lookup_fsync_handler(), is still
valid.
Fix:
Do list_del_init() in the loop before winding fsync.
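A hedged sketch using the kernel-style list helpers glusterfs ships in
list.h (names simplified; the real shard code differs):

    #include "list.h"   /* glusterfs's list_head helpers */

    struct fsync_ctx {
            struct list_head fsync_list;
    };

    static void
    drain_fsync_list (struct list_head *head)
    {
            struct fsync_ctx *ctx = NULL;
            struct fsync_ctx *tmp = NULL;

            list_for_each_entry_safe (ctx, tmp, head, fsync_list) {
                    /* detach before winding so the cbk never touches
                     * a list head living on the handler's stack      */
                    list_del_init (&ctx->fsync_list);
                    /* wind fsync for this ctx here */
            }
    }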
BUG: 1557876
Change-Id: If429d3634219e1a435bd0da0ed985c646c59c2ca
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
Performing pause/resume on geo-replication with the wrong user (a user
other than the one you set up) always returns success. This further
leads to snapshot creation failure, as an active geo-replication
session is detected.
Change-Id: I6e96e8dd3e861348b057475387f0093cb903ae88
BUG: 1550936
Signed-off-by: Sunny Kumar <sunkumar@redhat.com>
Problem: TLS verification fails while using an intermediate CA
if mgmt SSL is enabled.
Solution: There are two main issues behind the TLS verification
failure:
1) the ssl API to set the certificate depth is not called
2) the current code does not allow the certificate depth to be set
while mgmt SSL is enabled.
After applying this patch, to set the certificate depth the user needs
to set the option transport.socket.ssl-cert-depth <depth> in
/var/lib/glusterd/secure-access instead of setting it in
/etc/glusterfs/glusterd.vol. At the time secure_mgmt is set in ctx, we
check the value of cert-depth and save it in ctx. If the user does not
provide any value for cert-depth, the default value of 1 is used.
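The OpenSSL side of this is presumably a call along these lines (a
hedged sketch; where the socket transport actually applies it may
differ):

    #include <openssl/ssl.h>

    /* apply the configured depth so chains signed through an
     * intermediate CA can verify                               */
    static void
    apply_cert_depth (SSL_CTX *ssl_ctx, int cert_depth)
    {
            SSL_CTX_set_verify_depth (ssl_ctx, cert_depth);
    }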
BUG: 1555154
Change-Id: I89e9a9e1026e37efb5c20f9ec62b1989ef644f35
Signed-off-by: Mohit Agrawal <moagrawa@redhat.com>
- Removed an unused struct member and its one-time usage.
- Cleaned up wrong whitespace.
The member 'client_latency' was not used otherwise since it was added
by
commit 07cc8679cdf3b29680f4f105d0222da168d8bfc1
Author: Kevin Vigor <kvigor@fb.com>
Date: Tue Mar 21 08:23:25 2017 -0700
Halo Replication feature for AFR translator
Change-Id: Ibb0ea828d4090bbe8897f6af326b317884162a00
BUG: 1495153
Signed-off-by: Sven Fischer <sven@fischer-abc.de>
Problem:
1) Afr's eager-lock only works for data transactions.
2) When there are conflicting writes, a write with a conflicting region
initiates unlock of the eager-lock, leading to extra pre-ops and
post-ops on the file. When eager-lock goes off, it leads to extra
fsyncs for random-write workloads in afr.
Solution (that is modeled after EC):
In EC, when there is a conflicting write, it waits for the current write to
complete before it winds the conflicted write. This leads to better utilization
of network and disk, because we will not be doing extra xattrops and FSYNCs and
inodelk/unlock. Moved fd based counters to inode based counters.
I tried to model the solution based on EC's locking, but it is not similar to
AFR because we had to keep backward compatibility.
Lifecycle of lock:
==================
The first transaction is added to the inode->owners list and an inodelk
will be sent on the wire. All the next transactions will be put in the
inode->waiters list until the first transaction completes inodelk and
[f]xattrop completely. Once [f]xattrop also completes, all the requests
in the inode->waiters list are checked for conflicts against the
existing locks in the inode->owners list; those that do not conflict
are added to the inode->owners list and resume their transaction. When
these transactions complete the fop phase they are moved to the
inode->post_op list and resume the transactions that were paused
because of conflicts. Post-op and unlock will not be issued on the wire
until it is the last transaction on that inode. The last transaction,
when it has to perform post-op, can choose to sleep for
delayed-post-op-secs. During that time, if any other transaction comes,
it will wake up the sleeping transaction, take over ownership of the
lock, and the cycle continues. If delayed-post-op-secs expires, the
timer thread will wake up the sleeping transaction, which will set
lock->release to true and start doing post-op and then unlock. During
this time, if any other transactions come, they will be put in the
inode->frozen list. Once the previous unlock completes, it will move
the frozen list to the waiters list, move the first element from this
waiters list to the owners list, attempt the lock, and the cycle
continues. This is the general idea. There is logic at the time of
delaying and at the time of a new transaction or in the flush fop to
wake up existing sleeping transactions or choose whether to delay a
transaction etc., which is subject to change based on future
enhancements.
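A hedged sketch of the per-inode state implied by the description above
(illustrative names, not the actual afr.h definitions):

    #include "list.h"   /* glusterfs's list_head helpers */

    struct afr_lock_state {
            struct list_head owners;   /* transactions holding the lock       */
            struct list_head waiters;  /* queued until inodelk+xattrop finish */
            struct list_head post_op;  /* fop phase done, post-op pending     */
            struct list_head frozen;   /* arrived while unlock is in flight   */
            int              release;  /* set when delayed-post-op-secs fires */
    };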
Fixes: #418
BUG: 1549606
Change-Id: I88b570bbcf332a27c82d2767dfa82472f60055dc
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
Problem:
Whenever we read data from a file over NFS, NFS reads more data than
requested and caches it. Based on the stat information, it checks
whether the cached/pre-read data is still valid.
Consider a 4 + 2 EC volume with all the bricks on different nodes.
In EC, with the round-robin read policy, reads are sent to a different
set of data bricks each time. This balances the read fops across all
the bricks and avoids heating up (overloading) the same set of bricks.
Due to small differences in clock speed, it is possible that we get
minor differences in atime, mtime or ctime from different bricks. That
might cause a different stat to be returned to NFS, based on which NFS
will discard cached/pre-read data that has not actually changed and
could have been used.
Solution:
Change the read policy for EC to gfid-hash. That will force all the
reads to go to the same set of bricks.
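A minimal sketch of a gfid-hash style pick (illustrative only; EC's
actual hashing and read-policy code differ):

    /* map a 16-byte gfid to a stable starting brick index, so every
     * read of the same file lands on the same set of bricks         */
    static int
    pick_read_subvol (const unsigned char gfid[16], int data_bricks)
    {
            unsigned int hash = 0;
            int          i;

            for (i = 0; i < 16; i++)
                    hash = hash * 31u + gfid[i];
            return (int)(hash % (unsigned int)data_bricks);
    }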
Change-Id: I825441cc519e94bf3dc3aa0bd4cb7c6ae6392c84
BUG: 1554743
Signed-off-by: Ashish Pandey <aspandey@redhat.com>
In the Jenkins regression tests, brick multiplexing is enabled by the
is_brick_mx_enabled function and not by setting the
cluster.brick-multiplex option. Hence check the count of bricks and its
logs; this fixes the failure.
Change-Id: Ibb2ed8fbffd3765f283da741689304a5579d447c
BUG: 1555167
Signed-off-by: Varsha Rao <varao@redhat.com>
Self-heal creates a thread per brick to sweep the index looking for
files that need to be healed. These threads are started before the
volume comes online, so nothing is done but waiting for the next
sweep. This happens once per minute.
When a replace brick command is executed, the new graph is loaded and
all index sweeper threads started. When all bricks have reported, a
getxattr request is sent to the root directory of the volume. This
causes a heal on it (because the new brick doesn't have good data),
and marks its contents as pending to be healed. This is done by the
index sweeper thread on the next round, one minute later.
This patch solves this problem by waking all index sweeper threads
after a successful check on the root directory.
Additionally, the index sweep thread scans the index directory
sequentially, but it might happen that after healing a directory entry
more index entries are created but skipped by the current directory
scan. This causes the remaining entries to be processed on the next
round, one minute later. The same can happen in the next round, so
the heal runs in bursts and takes a long time to finish, especially
on volumes with many directory levels.
This patch solves this problem by immediately restarting the index
sweep if a directory has been healed.
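A hedged sketch of the wake-up (patterned on the daemon's per-brick
healer threads; field names are illustrative):

    #include <pthread.h>

    struct healer {
            pthread_mutex_t mutex;
            pthread_cond_t  cond;
            int             rerun;   /* request an immediate re-sweep */
    };

    static void
    healer_wake (struct healer *h)
    {
            pthread_mutex_lock (&h->mutex);
            h->rerun = 1;
            pthread_cond_signal (&h->cond);
            pthread_mutex_unlock (&h->mutex);
    }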
Change-Id: I58d9ab6ef17b30f704dc322e1d3d53b904e5f30e
BUG: 1547662
Signed-off-by: Xavi Hernandez <jahernan@redhat.com>
This test does:
1. mount a volume
2. kill a brick in the volume
3. mkdir (/somedir)
In my local tests and in [1], I see that the mkdir in step 3 fails
because there is no dht-layout on the root directory.
The reason, I think, is that by the time the first lookup on "/" hit
dht, a brick had been killed as per step 2. This means the layout was
not healed for "/", and since this is a new volume, no layout is
present on it. Note that the first lookup done on "/" by fuse-bridge is
not synchronized with the parent process of the daemonized glusterfs
mount completing. IOW, by the time the glusterfs command has executed,
there is no guarantee that the lookup on "/" is complete. So, if step 2
races ahead of fuse_first_lookup on "/", we end up with an invalid
dht-layout on "/", resulting in failures.
Doing an operation like ls makes sure that the lookup on "/" is
completed before we kill a brick.
Change-Id: Ie0c4e442c4c629fad6f7ae850437e3d63fe4bea9
Signed-off-by: Raghavendra G <rgowdapp@redhat.com>
BUG: 1543279
Because bug-924726.t depends on netstat, the test failed before. This
got resolved by adding a respective check to run-tests.sh.
Enabled the respective test again.
Change-Id: I70c9bff03379ed9ee8cd95842c3501dfb50b8e86
BUG: 1312830
Signed-off-by: Sven Fischer <sven@fischer-abc.de>