mirror of https://github.com/samba-team/samba.git synced 2025-02-04 17:47:26 +03:00

457 Commits

Author SHA1 Message Date
Ronnie Sahlberg
df00979158 When we create new election data to send during elections, we must re-read the node flags from the main daemon to catch when the STOPPED flag is changed.
(This used to be ctdb commit ca4982c40d81db528fe915d5ecc01fcf7df0b522)
2009-07-17 11:37:03 +10:00
Ronnie Sahlberg
0c5f5ae58d stopped nodes cannot win a recmaster election
stopped nodes must yield the recmaster role

(This used to be ctdb commit b75ac1185481060ab71bd743e1e48d333d716eba)
2009-07-09 14:44:03 +10:00
Ronnie Sahlberg
b57811bee6 change the infolevel when logging stop/continue commands
(This used to be ctdb commit 1e007c833098b03dd81797c081da1ae1b10c971c)
2009-07-09 14:34:12 +10:00
Ronnie Sahlberg
82c1be95ed recovery daemon needs to monitor when the local ctdb daemon is stopped and ensure that the databases get frozen and the node enters recovery mode
(This used to be ctdb commit 99f239f8b96c8c0a06ac8ca8b8083be96265865a)
2009-07-09 14:19:32 +10:00
Ronnie Sahlberg
41a519191e dont let other nodes modify the STOPPED flag for the local process when pushing out flags changes
(This used to be ctdb commit 501a2747d839ca291b70c761098549cf6d47a158)
2009-07-09 13:20:14 +10:00
Ronnie Sahlberg
88f3c40d9c add two new controls, STOP_NODE and CONTINUE_NODE
that are used to stop/continue a node instead of using modflags messages

(This used to be ctdb commit 54b4a02053a0f98f8c424e7f658890254023d39a)
2009-07-09 12:22:46 +10:00
Ronnie Sahlberg
66c8d4fb3d make it possible to start the daemon in STOPPED mode
(This used to be ctdb commit 866aa995dc029db6e510060e9e95a8ca149094ac)
2009-07-09 11:57:20 +10:00
Ronnie Sahlberg
1593e67399 send ARPs with an interval of 1.1 seconds during ip takeover.
this is to better handle linux clients which often default to ignore grat arps that arrive within 1 second of each other.

(This used to be ctdb commit 5664da36943b4901a807a9594b0f45e859aafbf3)
2009-07-07 11:40:01 +10:00
Ronnie Sahlberg
289c58e9b6 add a new command "ctdb ipreallocate", this command will force the recovery master to perform a full ip reallocation process.
the ctdb command will block until the ip reallocation has completed

(This used to be ctdb commit abad7b97fe0c066b33f6e75d0953bbed892a3216)
2009-07-02 13:00:26 +10:00
Ronnie Sahlberg
e6e1ff32a5 dont try sending a keepalive if the transport is down
(This used to be ctdb commit 5cdc04669db8c2ddbbff5af82307a16e8d807b83)
2009-06-30 12:17:05 +10:00
Ronnie Sahlberg
6450ae533a Dont even try allocating and sending a CALL packet if the transport is down
(This used to be ctdb commit cb8dd896914d4e44ad7b8bb000176a7c78f394ae)
2009-06-30 12:16:13 +10:00
Ronnie Sahlberg
127754e192 failing a dmaster send due to the transport being down is fatal
(This used to be ctdb commit c17dafc79bec25bbb796478c33f503503d382a20)
2009-06-30 12:14:58 +10:00
Ronnie Sahlberg
757ba01ddc if we fail a dmaster migration due to the transport being down, then that is a fatal condition.
(This used to be ctdb commit 75dea671f68ac6649095357c36b3697a927721e9)
2009-06-30 12:13:15 +10:00
Ronnie Sahlberg
dd1774cd85 dont try to send error packets if the transport is down
(This used to be ctdb commit 65b94d280731df3245b26d69f39acfaf5bccf0d8)
2009-06-30 12:10:27 +10:00
Ronnie Sahlberg
d4b30b34aa dont even try to send a message from the main daemon if the transport is down
(This used to be ctdb commit 9a2c4c3ed09ac9ea781d06999d11e5c3b5b4a97a)
2009-06-30 12:09:28 +10:00
Ronnie Sahlberg
9e5064dcea Dont try to allocate and send packets if the transport is down
(This used to be ctdb commit 945f04f06a425fd3940a2e4b832c63223a3f26b3)
2009-06-30 12:03:12 +10:00
Ronnie Sahlberg
22fb69d337 dont even try to allocate a packet if the transport is down since it will fail
(This used to be ctdb commit a73f316cb9cec877dc0bc3f7baa21be1b1454273)
2009-06-30 11:55:42 +10:00
Ronnie Sahlberg
816db4be38 Do not allow the "VerifyRecoveryLock" tunable to be changed if there is no reclock file
(This used to be ctdb commit 5334e40978350b6b597ee020bac52e37c8f9a8ba)
2009-06-25 14:45:17 +10:00
Ronnie Sahlberg
969cb64056 disable VerifyRecoveryLock when the user modifies the filename
(This used to be ctdb commit d973cb6e83b2f7cc37bd39c1219dcfbd4911a8ee)
2009-06-25 14:34:21 +10:00
Ronnie Sahlberg
5b235c3999 add a control to set the reclock file
(This used to be ctdb commit 36cc2e586f03fa497ee9b06f3e6afc80219c4aaa)
2009-06-25 14:25:18 +10:00
Ronnie Sahlberg
7f8d98ebb0 update the recovery daemon to read the recovery lock file off the main daemon and handle when the file is changed/enabled/disabled
(This used to be ctdb commit 31acc11a6389d4dd9f7b71b7cfa2f2450076f1f7)
2009-06-25 12:55:43 +10:00
Ronnie Sahlberg
2b253c094c add a control to read the current reclock file from a node
(This used to be ctdb commit ed6a4cbcdcbb4e0df83bec8be67c30288bf9bd41)
2009-06-25 12:17:19 +10:00
Ronnie Sahlberg
77ef745394 Allow setting the recovery lock file as "", which means that we do not use a file and that we implicitly also disable the recovery lock checking.
Update the init script to allow starting without a reclock file.

(This used to be ctdb commit 07855ff5eba71e7d607d52e234a42553d9b93605)
2009-06-25 11:50:45 +10:00
Ronnie Sahlberg
180a576f7b Dont access the reclock file at all if VerifyRecoveryLock is zero and also
make sure the reclock file is closed if the variable is cleared at runtime

(This used to be ctdb commit a25f4888689a0725971606163d87c39a41669292)
2009-06-25 11:41:18 +10:00
Ronnie Sahlberg
de1402d471 dont log an error if waitpid returns -1 and errno is ECHILD
(This used to be ctdb commit fdf50f3e774e3980af81c0b6f4ff81d085f4f697)
2009-06-19 15:55:13 +10:00
Ronnie Sahlberg
baead0fdcc dont leak file descriptors when set recmode times out
(This used to be ctdb commit fc8a364eb095ec11ca01246a583bf1dc53510141)
2009-06-19 14:58:06 +10:00
Ronnie Sahlberg
d3c5fb4bd1 dont leak file descriptors
(This used to be ctdb commit 268c3e4b269a92741a02280c84384178e73de10e)
2009-06-19 14:54:22 +10:00
Ronnie Sahlberg
d72b14e86c in the recovery daemon, check that the recovery master can access the recovery lock file and verify it is not stale from a child process.
This allows us to timeout the operation if the underlying filesystem has become temporarily unresponsive without causing a new recovery.

(This used to be ctdb commit d177b08f1dc79534491f27726b05405d47e12e20)
2009-06-19 14:44:26 +10:00
Ronnie Sahlberg
1183b364f1 reduce the timeout we wait for the reclock child process to finish to 5 seconds
before we log an error and abort

(This used to be ctdb commit 6d1e4321b63973c2e53c63d386e8cc0bd9605cae)
2009-06-19 13:09:11 +10:00
Ronnie Sahlberg
0ddf79a3bc increase the timeout before we shut down when the recovery daemon is hung
(This used to be ctdb commit facddcacb4a961cddb117818fa38a3e97770b2fa)
2009-06-18 09:20:18 +10:00
Ronnie Sahlberg
d1c40424f6 When we ban a node, only drop the IPs on the node being banned, not on every node
(This used to be ctdb commit 46e8c3737e6ff54fc80de8e962e922924c27bc35)
2009-06-10 10:35:20 +10:00
Ronnie Sahlberg
b046f5e3aa when adding an ip, try manually adding and taking over the ip instead of triggering a full recovery to do the same thing
(This used to be ctdb commit 4d5d22e64270cfb31be6acd71f4f97ec43df5b2c)
2009-06-05 17:00:47 +10:00
Ronnie Sahlberg
5371e3a793 lower the loglevel when we log that we skip an eventscript because it is not executable
(This used to be ctdb commit c265df3c7950aab51b8b6ef17040229b97345c35)
2009-06-01 15:29:36 +10:00
Ronnie Sahlberg
6c0c3577f8 dont try to queue packets for sending to (recently) deleted nodes since these nodes do not have a queue.
(This used to be ctdb commit 1b7c88ae7643f9bcc52b1d33095f97de88fc2316)
2009-06-01 14:56:19 +10:00
Ronnie Sahlberg
8a0880c843 when building the initial vnnmap, make sure to skip any deleted nodes
(This used to be ctdb commit 0cd66c744cd9533ce8d4c4374bcee3bf49b66dae)
2009-06-01 14:44:15 +10:00
Ronnie Sahlberg
dc5e4906cc use num_nodes and the nodes array instead of walking the vnnmap
when counting the number of active nodes

(This used to be ctdb commit df20cd9b05ad9ca72e32ccc42354eafc12b68c04)
2009-06-01 14:39:34 +10:00
Ronnie Sahlberg
e6170b5389 add a new node state : DELETED.
This is used to mark nodes as being DELETED internally in ctdb
so that nodes are not renumbered if / when they are removed from the nodes file.

This is used to be able to do "ctdb reloadnodes" at runtime without
causing nodes to be renumbered.
To do this, instead of deleting a node from the nodes file, just comment it out like

   1.0.0.1
   #1.0.0.2
   1.0.0.3

After removing 1.0.0.2 from the cluster, the remaining nodes retain their
pnns from prior to the deletion, namely 0 and 2.

Any line in the nodes file that is commented out represents a DELETED pnn

(This used to be ctdb commit 6a5e4fd7fa391206b463bb4e976502f3ac5bd343)
2009-06-01 14:18:34 +10:00
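
The pnn numbering described in the commit above can be sketched roughly as follows. This is an illustration only, written in Python rather than ctdb's C, and it is not the code from the commit: the function names `parse_nodes_file` and `active_pnns` are hypothetical, and skipping blank lines without consuming a pnn is an assumption. The one behaviour taken from the commit message is that a commented-out line still occupies its pnn and marks that node as DELETED.

```python
def parse_nodes_file(text):
    """Return (pnn, address, deleted) tuples, one per non-blank line.

    Hypothetical helper: a line starting with '#' keeps its slot in the
    numbering but is flagged as DELETED, so later nodes are not renumbered.
    """
    nodes = []
    for line in text.splitlines():
        entry = line.strip()
        if not entry:
            continue  # assumption: blank lines do not consume a pnn
        deleted = entry.startswith("#")
        address = entry.lstrip("#").strip() if deleted else entry
        # The pnn is the position of the line, deleted or not.
        nodes.append((len(nodes), address, deleted))
    return nodes


def active_pnns(nodes):
    """Return the pnns of all nodes that are not marked DELETED."""
    return [pnn for pnn, _address, deleted in nodes if not deleted]


example = "1.0.0.1\n#1.0.0.2\n1.0.0.3\n"
nodes = parse_nodes_file(example)
print(active_pnns(nodes))  # [0, 2], matching the commit message
```

Keeping the deleted entry in the list is what lets 1.0.0.3 stay at pnn 2 instead of sliding down to pnn 1.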
Ronnie Sahlberg
4259156050 dont remove the socket when the daemon stops. This can race if the
service is immediately restarted

(This used to be ctdb commit b18356764cd49d934eab901e596bb75c6e3ecdf8)
2009-05-29 18:16:13 +10:00
Ronnie Sahlberg
96340bd166 Revert "we only need to have transaction nesting disabled when we start the new transaction for the recovery"
This reverts commit bf8dae63d10498e6b6179bbacdd72f1ff0fc60be.

(This used to be ctdb commit 87292029cb444ffab130ff7dae47a629c2d15787)
2009-05-25 16:55:27 +10:00
Ronnie Sahlberg
270907faec Revert "set the TDB_NO_NESTING flag for the tdb before we start a transaction from within recovery"
This reverts commit 1b2029dbb055ff07367ebc1f307f5241320227b2.

(This used to be ctdb commit 9762a3408f10409b629637d237ec513a825a6059)
2009-05-25 16:55:02 +10:00
Ronnie Sahlberg
26e1486db7 Whitespace changes and using the CTDB_NO_MEMORY() macro changes to
the previous patch.

(This used to be ctdb commit d623ea7c04daa6349b42d50862843c9f86115488)
2009-05-21 11:49:16 +10:00
Sumit Bose
2fcedf6dac add missing checks on so far ignored return values
Most of these were found during a review by Jim Meyering <meyering@redhat.com>

(This used to be ctdb commit 3aee5ee1deb4a19be3bd3a4ce3abbe09de763344)
2009-05-21 11:22:21 +10:00
Sumit Bose
11988fc77a structure member node_list_file is not used anywhere
(This used to be ctdb commit 0e84ea23d1d998d4d4ac7d8a858b3d8294f056cb)
2009-05-21 11:16:43 +10:00
Sumit Bose
9171a7784c structure member logfile is not used anywhere
(This used to be ctdb commit 4f86c991812c2d0bddbe3de9a9906cf5df118cd4)
2009-05-21 11:15:43 +10:00
Ronnie Sahlberg
9a3e19658d Change the loglevel of "registered tcp client for ..." to INFO
instead of ERR

(This used to be ctdb commit 92b5580c38c23b99c1692708540983b0c0fcd6cf)
2009-05-19 08:55:42 +10:00
Ronnie Sahlberg
98a54c4675 Track how long it takes to take out the recovery lock from both the main daemon and also from the recovery daemon.
Log this in "ctdb statistics".

Also add a variable "RecLockLatencyMs" that will log an error every time it takes longer than this to access the reclock file.

(This used to be ctdb commit 042377ed803bb8f7ca9d6ea1a387427b7b8ba45a)
2009-05-14 10:33:25 +10:00
Ronnie Sahlberg
42891227a4 add extra debug statements to the log to make it easier to see when a recovery daemon has hung due to the underlying filesystem hanging.
(This used to be ctdb commit 5b0067a4e335cbbf6e606646e612d4bfcfdb7441)
2009-05-12 18:39:34 +10:00
root
08492a524b change the talloc hierarchy for the main transaction_start context and the individual transaction_all handles
(This used to be ctdb commit 919b29850671b59bcf748aec25658ea09d8b4f1c)
2009-05-06 07:33:07 +10:00
root
af25fa38f3 fixed a problem with clients disconnecting during a traverse
When a client (such as smbstatus) is killed, it may have outstanding
traverse children on remote nodes. We need to catch the client
disconnect in ctdbd and send a control to all nodes telling them to
kill those outstanding traverse children.

(This used to be ctdb commit f2fb2df4619a14f7f6c11f9132ee7d793028042c)
2009-05-06 07:32:25 +10:00
root
bfea570af4 when tracking the ctdb statistics, only decrement num_clients and pending_calls IFF the counter is >0
Otherwise there is a chance that we reset the statistics to zero after the counter has been incremented (client connects), and when the client disconnects we decrement it to a negative number.

This is a purely cosmetic patch with no operational impact on ctdb

(This used to be ctdb commit 72f1c696ee77899f7973878f2568a60d199d4fea)
2009-05-01 12:30:26 +10:00
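
The guarded decrement described in the last commit above can be sketched as follows. This is a minimal illustration in Python, not the ctdb code (which is C): the `Statistics` class and its method names are hypothetical, and only the counter name `num_clients` comes from the commit message.

```python
class Statistics:
    """Hypothetical stand-in for the daemon's statistics counters."""

    def __init__(self):
        self.num_clients = 0

    def client_connected(self):
        self.num_clients += 1

    def client_disconnected(self):
        # Only decrement if the counter is above zero, so that a reset
        # between connect and disconnect cannot drive it negative.
        if self.num_clients > 0:
            self.num_clients -= 1

    def reset(self):
        self.num_clients = 0


stats = Statistics()
stats.client_connected()     # counter is 1
stats.reset()                # statistics reset while the client is connected
stats.client_disconnected()  # without the guard this would give -1
print(stats.num_clients)     # 0
```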