mirror of https://github.com/samba-team/samba.git synced 2025-01-14 19:24:43 +03:00

550 Commits

Author SHA1 Message Date
Ronnie Sahlberg
ae209c74c8 don't reset the event script context every time we start a new "ctdb eventscript ..."
command.
Use the existing context used for non-monitor events

Multiple concurrent uses of "ctdb eventscript ..." could otherwise lead to a SEGV

(This used to be ctdb commit 80a8d728e9680040e00d24361dfc9367dd372a56)
2009-11-19 11:03:51 +11:00
Ronnie Sahlberg
bc2675119d add an in-memory ring buffer where we store the last 500000 log entries regardless of log level.
add commands to extract this in-memory buffer and to clear it

(This used to be ctdb commit 29d2ee8d9c6c6f36b2334480f646d6db209f370e)
2009-11-18 12:44:18 +11:00
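A ring buffer of the kind this commit describes can be sketched as follows. This is a minimal illustration, not the ctdb implementation: the names `log_ring`, `ring_add`, `ring_get` and `ring_clear` are hypothetical, and the capacity is shrunk from 500000 to 4 so the wrap-around is easy to follow.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical sizes; the commit keeps the last 500000 entries. */
#define RING_ENTRIES 4
#define RING_MSG_LEN 64

struct log_ring {
    char msg[RING_ENTRIES][RING_MSG_LEN];
    unsigned int next;   /* slot the next entry is written to */
    unsigned int count;  /* number of valid entries, at most RING_ENTRIES */
};

/* Store one entry, overwriting the oldest once the buffer is full. */
static void ring_add(struct log_ring *r, const char *text)
{
    assert(r != NULL && text != NULL);
    snprintf(r->msg[r->next], RING_MSG_LEN, "%s", text);
    r->next = (r->next + 1) % RING_ENTRIES;
    if (r->count < RING_ENTRIES) {
        r->count++;
    }
}

/* Return the i-th oldest stored entry, or NULL if i is out of range. */
static const char *ring_get(const struct log_ring *r, unsigned int i)
{
    unsigned int oldest;

    assert(r != NULL);
    if (i >= r->count) {
        return NULL;
    }
    oldest = (r->next + RING_ENTRIES - r->count) % RING_ENTRIES;
    return r->msg[(oldest + i) % RING_ENTRIES];
}

/* Discard all stored entries. */
static void ring_clear(struct log_ring *r)
{
    assert(r != NULL);
    r->next = 0;
    r->count = 0;
}
```

Storing entries regardless of log level means `ring_add` would be called before any level filtering; extracting the buffer is then a loop over `ring_get` from 0 up to the stored count.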
Ronnie Sahlberg
24c593d21f create a new event context for the syslog daemon
(This used to be ctdb commit 354c0edacf2d6cec5b295e139d4fec618bad1b06)
2009-11-17 12:07:10 +11:00
Ronnie Sahlberg
61de178e0a set up a pipe between the main daemon and the child we use for syslogging so that we can clean up the child process when we stop ctdbd
(This used to be ctdb commit cb8df973ccd446d87fbdd9a27843e54841ba5d89)
2009-11-16 15:17:32 +11:00
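The pipe technique mentioned in this commit can be sketched as below. It assumes the common pattern where the parent holds the write end and the child treats end-of-file on the read end as the signal to exit; whether ctdbd does exactly this is not stated in the commit, and `start_watched_child` is a hypothetical name. In this sketch the child does nothing except watch the pipe.

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Fork a child that blocks on the read end of a pipe. The parent keeps the
 * write end open; when it is closed (or the parent dies), read() returns 0
 * in the child and the child exits.
 * Returns the child's pid and stores the parent's write end in *write_fd,
 * or returns -1 on failure.
 */
static pid_t start_watched_child(int *write_fd)
{
    int fds[2];
    pid_t pid;

    assert(write_fd != NULL);
    if (pipe(fds) != 0) {
        return -1;
    }

    pid = fork();
    if (pid == -1) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {
        char c;
        ssize_t n;

        close(fds[1]);  /* the child keeps only the read end */
        do {
            n = read(fds[0], &c, 1);  /* data is ignored; only EOF matters */
        } while (n > 0 || (n == -1 && errno == EINTR));
        _exit(0);
    }

    close(fds[0]);  /* the parent keeps only the write end */
    *write_fd = fds[1];
    return pid;
}
```

The parent never has to write to the pipe. Closing the descriptor, or simply exiting, is enough for the child to notice.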
Volker Lendecke
1fa1830f81 Fix a segfault in the eventscript timeout handler.
The state was freed too early.

Signed-off-by: Michael Adam <obnox@samba.org>

(This used to be ctdb commit eda052101728cf922ce892e3c53b4f37e7ceac42)
2009-11-05 11:13:53 +01:00
Ronnie Sahlberg
d8f7fd88ac don't use the pointer after it has been talloc_free()d.
(This used to be ctdb commit 1cbf06a126621b3e932925cdad2ef9c009f93d4e)
2009-11-05 16:07:23 +11:00
Ronnie Sahlberg
4bf4e15379 move the check to skip vacuuming on persistent databases to the ctdb_vacuuming_init() function
(This used to be ctdb commit fb83dba255fc91413a475b273e374e0c4d538137)
2009-11-03 10:48:27 +11:00
Michael Adam
fe9929165f server: disable vacuuming for persistent tdbs.
The vacuum process treats persistent databases the same as
non-persistent and thus ignores the extra state for transactions.
This way, it breaks the api-level transactions.

Michael

(This used to be ctdb commit f98fefbc566eefbfcc660646af6e25256ab82b13)
2009-11-03 00:16:28 +01:00
Ronnie Sahlberg
e33722a569 start the syslog child a little later, after we have forked and detached from the local shell
(This used to be ctdb commit 9ffd54b73c0d64b67e8e736d7cb54490e77ffa78)
2009-10-30 19:39:11 +11:00
Ronnie Sahlberg
5d73f19418 create a child process to write to syslog.
use a udp socket on the ctdbd port to send messages to the syslog child process for logging.

we need this when syslog becomes "slow", like very slow, and on boxes where syslog is limited to 100 lines per second and starts to block after that

(This used to be ctdb commit 1446f4c247310e2ff2d522055bd8927d1a78d017)
2009-10-30 18:53:17 +11:00
Michael Adam
673a8588b1 server: fix debug message in trans2_commit (refusing persistent store during transaction)
log the right db_id
also log the client_id

Michael

(This used to be ctdb commit 48ac5c77698ab7a28d24629cc8a6985011c5d14d)
2009-10-30 09:29:25 +11:00
Michael Adam
1de0c6f807 server: uniformly log db and client ids as 8-digit hex numbers in trans2_commit
Michael

(This used to be ctdb commit 2febdd23f754a2d4699bed36b941442ab362a376)
2009-10-30 09:28:06 +11:00
Michael Adam
7384dfe4a9 server: line-wrap a debug statement in trans2_commit
Michael

(This used to be ctdb commit 3be446434adb0f3095ac0ef4b7c4a6258780b863)
2009-10-30 09:27:33 +11:00
Michael Adam
7bfa959a86 server: output client_id in some debug messages in trans2_commit
Michael

(This used to be ctdb commit 11fefd02e6c9531ffb28b9e6acaf42ba39757d87)
2009-10-30 09:26:51 +11:00
Michael Adam
4d073bd779 server: fix a debug message in trans2_commit - log the correct db_id
Michael

(This used to be ctdb commit ab9657b5a66d5665e6c5fd1bf8eb4074a3bffeec)
2009-10-30 09:26:16 +11:00
Michael Adam
dca16d5f64 server: extend a debug message in ctdb_control_trans2_error()
Michael

(This used to be ctdb commit 0fb9573d1c838b436ab9be83e197b68f35f94acb)
2009-10-30 09:24:17 +11:00
Michael Adam
2187e6c379 server: add positive debug statements to trans2_commit and trans2_finished
When the operation completed / started successfully.

Michael

(This used to be ctdb commit 0df012d58eb83195ea0365be19e0566dbc394a66)
2009-10-30 09:23:29 +11:00
Michael Adam
0113744fec server: trans2_active: don't report a transaction active on the node that performs the transaction
Otherwise a node can lock itself out, e.g. when a commit control times out...

Michael

(This used to be ctdb commit cb432e30351d5e5a41e98da3c7b1c2a4d400a3a2)
2009-10-30 09:22:18 +11:00
Wolfgang Mueller-Friedt
9713b8ea9a ensure tdb names end with .tdb. and any number of digits
(This used to be ctdb commit 8ab1349feb64a91cb500c130ea299e2182491f06)
2009-10-29 13:46:37 +11:00
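The naming rule in this commit can be illustrated with a small check. This sketch assumes "any number of digits" means at least one digit after `.tdb.`; the commit does not say whether zero digits are accepted, and `tdb_name_is_valid` is a hypothetical name.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical check: true if `name` ends in ".tdb." followed by one or
 * more decimal digits, for example "locking.tdb.0".
 */
static bool tdb_name_is_valid(const char *name)
{
    size_t len;
    size_t i;

    assert(name != NULL);
    len = strlen(name);
    i = len;

    /* Walk backwards over the trailing digits. */
    while (i > 0 && isdigit((unsigned char)name[i - 1])) {
        i--;
    }
    if (i == len) {
        return false;  /* no trailing digits at all */
    }
    if (i < 5) {
        return false;  /* too short to contain ".tdb." */
    }
    return memcmp(name + i - 5, ".tdb.", 5) == 0;
}
```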
Wolfgang Mueller-Friedt
2c137b7030 vacuuming needed an additional check before getting rid of the record: there is a gap between selecting the records and deleting them, so we have to check whether the records can still be deleted when we are actually about to delete them
(This used to be ctdb commit a6fbc65aca35c41c428a82d7402e43c6eaac1d6e)
2009-10-29 13:45:17 +11:00
Ronnie Sahlberg
f5e90ec3b5 Revert "From Wolfgang M."
This reverts commit 5b70fa8cfd5916d3c212823ad5cc1b251ae175ed.

(This used to be ctdb commit 363e7e939ad46b3f75c83c30d4163d63876c2456)
2009-10-29 13:44:12 +11:00
Ronnie Sahlberg
023d09cd38 Revert "update the "uptime" command to indicate the "time since last" is the time since the last recovery OR failover."
This reverts commit 3b0d44497800a16400d05a30bdaf6e6c285d4b36.

(This used to be ctdb commit cb36bbb5418290e8e5b770d2d836285b15da2a6f)
2009-10-29 10:49:00 +11:00
Ronnie Sahlberg
279b7ca564 update the "uptime" command to indicate the "time since last" is the time since the last recovery OR failover.
(This used to be ctdb commit 3b0d44497800a16400d05a30bdaf6e6c285d4b36)
2009-10-29 10:37:10 +11:00
Michael Adam
abac42ca34 server: add a new ctdb control CTDB_TRANS2_ACTIVE
This asks the daemon whether a transaction is currently active on a
given DB on that node. More precisely this asks for the transaction_active
flag in the ctdb_db_context that is set in the CTDB_TRANS2_COMMIT
control and cleared in the CTDB_TRANS2_ERROR or CTDB_TRANS2_FINISHED controls.

This will be useful for fixing race conditions in the transaction code.

Michael

(This used to be ctdb commit 8d430ae6968dfe566614379436fc3c56003fcd88)
2009-10-29 10:14:30 +11:00
Ronnie Sahlberg
d379b30182 create a separate context for non-monitor eventscripts so they don't collide
(This used to be ctdb commit 325de818f88f339a16dc4544e899a2d735933c44)
2009-10-28 17:35:15 +11:00
Ronnie Sahlberg
f8a8c0d6e4 return 0 in the event script callback if it was aborted by a different script
(This used to be ctdb commit 8d5cb2586a1d5a0255cc18295430927b914d4527)
2009-10-28 16:40:31 +11:00
Ronnie Sahlberg
e07ca41886 change the eventscript handling to allow EventScriptTimeout for each individual script instead of for the entire set of scripts
restructure the talloc hierarchy to allow this

(This used to be ctdb commit 64da4402c6ad485f1d0a604878a7b0c01a0ea5f0)
2009-10-28 16:11:54 +11:00
Ronnie Sahlberg
3526bc830d Enhance the logging from eventscripts.
When a single script is finished, also log the name of the script, the duration it took and the return status.

In the loop where we signal back to the main daemon that the script finished, do this once every 100ms instead of once every 1 second

(This used to be ctdb commit 6a1f7a7b1b3a0b8f89998db8fdad83bbb4e9b5a5)
2009-10-28 09:07:43 +11:00
Ronnie Sahlberg
d1bf89a617 temporarily try allowing clients to attach to databases even if the node is banned/stopped or inactive in any other way.
(This used to be ctdb commit 227fe99f105bdc3a4f1000f238cbe3adeb3f22f0)
2009-10-27 15:17:45 +11:00
Ronnie Sahlberg
1d7681709b don't run the monitor event so frequently after an event has failed.
use _exit() instead of exit() when terminating an eventscript.

(This used to be ctdb commit cc30ee2f4f33cb75b2be980c2d4dff6c7c23852f)
2009-10-27 13:51:45 +11:00
Ronnie Sahlberg
4d40b86805 for debugging
add a global variable holding the pid of the main daemon.
change the tracking of time() in the event loop to only check/warn when called from the main daemon

(This used to be ctdb commit a10fc51f4c30e85ada6d4b7347b0f9a8ebc76637)
2009-10-27 13:18:52 +11:00
Stefan Metzmacher
198866d82d server: if takeover runs when the recovery master becomes unhealthy
The problem was this:

When the monitor event fails, the node->flags get updated,
and an update (containing the old and new flags) is sent to
the recovery master.

If the recovery master sends the update to itself (the same process),
it was comparing the node->flags variable with the received new flags.
This check always found both flag values to be equal
and never set the rec->need_takeover_run variable to true.

There were two problems. First, the push_flags_handler() function
didn't pass the received old flags.

And the ctdb_control_modflags() function ignored the received old flags.

metze

(This used to be ctdb commit 8ec633b64a05a2d903c2b9639909f15f6375548f)
2009-10-26 14:21:45 +11:00
Stefan Metzmacher
7a616a0d7b server: print out the full 64-bit srvid on 32-bit hosts
metze

(This used to be ctdb commit 440e870d61267054b24404bcb69e599226353949)
2009-10-26 14:20:52 +11:00
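One portable way to print a full 64-bit value on hosts where `long` is 32 bits is the `PRIx64` macro from `<inttypes.h>`. The commit does not say which approach it took, so the following is a sketch of the general technique, with `format_srvid` as a hypothetical helper.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Format a 64-bit srvid as 16 hex digits. PRIx64 expands to the correct
 * conversion for uint64_t on both 32-bit and 64-bit hosts, whereas "%lx"
 * is only correct where long is 64 bits wide.
 */
static int format_srvid(char *buf, size_t buflen, uint64_t srvid)
{
    assert(buf != NULL);
    return snprintf(buf, buflen, "0x%016" PRIx64, srvid);
}
```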
Ronnie Sahlberg
2d06e9d252 automatically re-activate the reclock file check if we set the reclock file to something
(This used to be ctdb commit db250cad7c92c1cc0a690725a4e39531a2e1b7fd)
2009-10-26 10:13:20 +11:00
Ronnie Sahlberg
5aaa15fdb2 lower the log level of a debug message
(This used to be ctdb commit 496dc2e80b714811c6e69dc928deaad61cf603b1)
2009-10-26 09:35:18 +11:00
Ronnie Sahlberg
86d1b4c465 Add a mechanism where we can register notifications to be sent out to a SRVID when the client disconnects.
The way to use this is from a client to :
1, first create a message handle and bind it to a SRVID
   A special prefix for the srvid space has been set aside for samba :
   Only samba is allowed to use srvid's with the top 32 bits set like this.
   The lower 32 bits are for samba to use internally.

2, register a "notification" using the new control :
                    CTDB_CONTROL_REGISTER_NOTIFY         = 114,
   This control takes as indata a structure like this :
struct ctdb_client_notify_register {
        uint64_t srvid;
        uint32_t len;
        uint8_t notify_data[1];
};

srvid is the srvid used in the space set aside above.
len and notify_data are an arbitrary blob.
When notifications are later sent out to all clients, this is the payload of that notification message.

If a client has registered with control 114 and then disconnects from ctdbd, ctdbd will broadcast a message to that srvid to all nodes/listeners in the cluster.

A client can register itself with as many different srvid's as it wants, but this is handled through a linked list from the client structure, so it is mainly designed for "few notifications per client".

3, a client that no longer wants to have a notification set up can deregister using control
                    CTDB_CONTROL_DEREGISTER_NOTIFY       = 115,
which takes this as arguments :
struct ctdb_client_notify_deregister {
        uint64_t srvid;
};

When a client deregisters, a message will no longer be sent to all other clients when this client disconnects from ctdbd.

(This used to be ctdb commit f1b6ee4a55cdca60f93d992f0431d91bf301af2c)
2009-10-23 15:24:51 +11:00
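Because `notify_data` is a trailing one-element array, the indata for the register control has to be allocated with room for the payload. The sketch below shows one way to build such a buffer. The struct layout is taken from the commit message; the helper `notify_register_new` is hypothetical, and the size reported in `*out_size` (header up to `notify_data` plus `len`) is an assumption about what would be passed as the control's data length, which the commit does not specify.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Layout as given in the commit message. */
struct ctdb_client_notify_register {
    uint64_t srvid;
    uint32_t len;
    uint8_t notify_data[1];
};

/*
 * Hypothetical helper: allocate a register request carrying `len` bytes of
 * payload. Stores the header-plus-payload size in *out_size. The caller
 * frees the result with free().
 */
static struct ctdb_client_notify_register *
notify_register_new(uint64_t srvid, const void *data, uint32_t len,
                    size_t *out_size)
{
    size_t size = offsetof(struct ctdb_client_notify_register, notify_data) + len;
    size_t alloc = size;
    struct ctdb_client_notify_register *req;

    assert(out_size != NULL);
    assert(len == 0 || data != NULL);

    /* Never allocate less than the struct itself. */
    if (alloc < sizeof(*req)) {
        alloc = sizeof(*req);
    }
    req = calloc(1, alloc);
    if (req == NULL) {
        return NULL;
    }
    req->srvid = srvid;
    req->len = len;
    if (len > 0) {
        memcpy(req->notify_data, data, len);
    }
    *out_size = size;
    return req;
}
```

The srvid passed in would come from the range set aside for samba. The sketch treats it as an opaque 64-bit value.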
Ronnie Sahlberg
c61c655769 when scripts time out, log pstree to a file in /tmp and just log the filename in the messages file
(This used to be ctdb commit 0785afba8e5cd501b9e0ecb4a6a44edf43b57ab0)
2009-10-23 13:55:21 +11:00
Ronnie Sahlberg
3c9b43531a set the eventscripts to time out after 20 seconds
change the ban count to 10 failures before we ban by default

(This used to be ctdb commit 38d7487bc68c8cf85980004aceeef24ae32d6f36)
2009-10-23 13:54:45 +11:00
Ronnie Sahlberg
e627fae600 if a lock wait child died/finished, we could have released the lockwait handle and set it to NULL before we call the destructors for releasing the waiters.
The waiters reference the lockwait handle in order to remove themselves from the linked list, which caused a SEGV.

We don't actually need to remove ourselves from this list here, since if the parent freeze_handle holding the list is freed, then all waiters are released as well. The only place we actually need to relink the waiter is in ctdb_freeze_lock_handler, where we want to respond back to the clients and release the waiters but still keep the freeze_handle hanging around.

(This used to be ctdb commit e01ab46bafad09a5e320d420734db129d35863bc)
2009-10-22 13:41:28 +11:00
Ronnie Sahlberg
902c476c03 From Volker L
Fix some warnings and an incorrect check for a talloc failure

(This used to be ctdb commit 27296a47b3d057a6729287acf128b2b67775ecde)
2009-10-22 12:19:40 +11:00
Ronnie Sahlberg
831f9e05a6 From Wolfgang M.
With the new vacuuming code, don't treat an invalid dmaster as fatal. Let it update to the new value instead.

(This used to be ctdb commit 5b70fa8cfd5916d3c212823ad5cc1b251ae175ed)
2009-10-22 07:58:44 +11:00
Ronnie Sahlberg
d5fd4fc0ce During tests it is common to add/delete test eventscripts at runtime.
This can race with the eventscript handling, which does:

list all scripts, sort them, then execute them

So trap status code 127, which means the script could not be executed (or /bin/sh does not exist), and treat it as not causing the node to become unhealthy.

(This used to be ctdb commit befabc917edb036ca81f5216f65a6d62b26ee83e)
2009-10-21 16:50:39 +11:00
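The special handling of status 127 can be sketched as below. Exit code 127 is what the shell conventionally reports when a command cannot be found, and `system()` reports it when the shell itself cannot be executed. Both function names are hypothetical, and the real eventscript code may classify other statuses differently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <sys/wait.h>

/*
 * Hypothetical helper: decide whether a script's wait status should count
 * as a failure. Exit code 127 is deliberately not treated as a failure,
 * since the script may simply have been removed after it was listed.
 */
static bool script_status_is_failure(int wait_status)
{
    int code;

    if (!WIFEXITED(wait_status)) {
        return true;  /* killed by a signal or otherwise abnormal */
    }
    code = WEXITSTATUS(wait_status);
    if (code == 0 || code == 127) {
        return false;
    }
    return true;
}

/* Run a command through /bin/sh and classify the result. */
static bool run_and_check(const char *cmd)
{
    int status;

    assert(cmd != NULL);
    status = system(cmd);
    if (status == -1) {
        return true;  /* the child could not be created at all */
    }
    return script_status_is_failure(status);
}
```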
Ronnie Sahlberg
a92ba7f729 lower the debug levels for the "create FD messages" so we don't fill up the logs.
(This used to be ctdb commit 87146db2769c2ec494813685bf9cec0d2a6336c3)
2009-10-21 15:26:24 +11:00
Ronnie Sahlberg
9b8c72c446 When clients have blocked, perhaps because the node is banned or stopped and the client is blocked trying to tdb_fetch() a record, make sure we don't queue up too many REQ_MESSAGES.
Add a new tunable to control the maximum queue size we allow to a blocked client before we start discarding REQ_MESSAGES instead of queueing them for delivery.

This avoids queueing up a very large number of MESSAGES that samba instances send to each other, destined for nodes that are blocked/banned/stopped for extended periods.

(This used to be ctdb commit f76d6fed8f9630450263b9fa4b5fdf3493fb1e11)
2009-10-21 15:20:55 +11:00
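The queue limit described in this commit amounts to a bounded counter per client. The sketch below is illustrative only: `struct client_queue` and `queue_message` are hypothetical names, the tunable's real name is not given in the commit, and treating a limit of 0 as "unlimited" is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-client queue state; max_queued stands in for the new tunable. */
struct client_queue {
    size_t queued;      /* messages currently waiting for delivery */
    size_t max_queued;  /* assumed: 0 means no limit */
    size_t dropped;     /* messages discarded because the queue was full */
};

/* Returns true if the message was accepted for queueing, false if discarded. */
static bool queue_message(struct client_queue *q)
{
    assert(q != NULL);
    if (q->max_queued != 0 && q->queued >= q->max_queued) {
        q->dropped++;
        return false;
    }
    q->queued++;
    return true;
}
```

Counting the discarded messages is not mentioned in the commit. It is included here because it makes the effect of the limit visible when debugging.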
Michael Adam
769a36c048 In ctdb_ltdb_store(), add a missing transaction_cancel when local store failed.
Spotted by Volker.

Michael

(This used to be ctdb commit 0a4d409baabf242a87c06293789d589c896b104c)
2009-10-21 12:49:59 +11:00
Ronnie Sahlberg
14b14a2efb Improve the log message when we skip the ip allocation check from the recovery daemon.
We also skip this check if we are already in the process of performing an ip reallocation, and not only when we are performing a full recovery.

(This used to be ctdb commit 1a09b02767f3928d3c5db0e0afc59bb938e4a445)
2009-10-21 11:51:30 +11:00
Ronnie Sahlberg
28f277acd4 From Wolfgang Mueller
make sure to always create the vactun database and get rid of some annoying log messages

(This used to be ctdb commit 54f9c314a0354f1039208fe6ac7dc159b6db8750)
2009-10-20 13:01:15 +11:00
Ronnie Sahlberg
d788dd3627 From Wolfgang Mueller
Add a tunable so that when scripts start to hang/time out, we can make the node unhealthy instead of banned

(This used to be ctdb commit 2e9fc6f0609833c6d8146196011ef780669d615d)
2009-10-20 12:59:48 +11:00
Ronnie Sahlberg
598419e57b Don't run eventscript monitor when the databases are frozen.
The databases can become frozen a while before we do the actual recovery
since we have the re-recovery timeout.

There is no point in doing much monitoring if we are waiting for a recovery,
or if we are banned.
This will eliminate some annoying log entries where certain tests will fail if the databases are locked.

(This used to be ctdb commit ff824676fab94168707aada7423ae766bc0f711c)
2009-10-15 16:03:43 +11:00
Ronnie Sahlberg
9de3652380 add logging every time we create a file descriptor in the main ctdb daemon
so we can spot if there are leaks.

plug two file descriptor leaks that occur when sending an ARP fails,
and one leak when we cannot parse the local address while establishing a tcp connection

(This used to be ctdb commit ddd089810a14efe4be6e1ff3eccaa604e4913c9e)
2009-10-15 11:24:54 +11:00