mirror of https://github.com/samba-team/samba.git synced 2024-12-23 17:34:34 +03:00
Commit Graph

337 Commits

Author SHA1 Message Date
Andrew Tridgell
98502135e7 added new multi-record transaction commit code
(This used to be ctdb commit 9ff3380099fe6f4d39de126db0826971a10ee692)
2008-07-30 19:57:00 +10:00
Andrew Tridgell
abe0232818 rename the structure we use for marshalling multiple records
(This used to be ctdb commit 4d205476d286570a6e1f52b59af42858ce051106)
2008-07-30 14:24:56 +10:00
Andrew Tridgell
79793708a4 fixed buffering in ctdb logging code to handle multiple lines
correctly

(This used to be ctdb commit e8ef9891aa31c374921b23cc74e1eda1f8218bf0)
2008-07-23 15:25:52 +10:00
Ronnie Sahlberg
1bfcca524d From Michael Adams,
change one element from private to private_data

Signed-off-by: Ronnie Sahlberg <ronniesahlberg@gmail.com>

(This used to be ctdb commit 0de79352c9b36c118e36905f08ebbe38ecbb957e)
2008-07-22 09:07:42 +10:00
Ronnie Sahlberg
d0707c98c0 if a new node enters the cluster, that node will already be frozen at start
but the rest of the nodes are not frozen.

at this stage an election is called by the new node.

Since in this case the nodes are not frozen, we cannot modify the recmaster
of the nodes so it is expected that this control would fail.

Add a boolean to send_election_request() to make it not
try to set the recmaster locally for the case where we are in an election phase
while not frozen.

(This used to be ctdb commit c5035657606283d2e35bea40992505e84ca8e7be)
2008-07-18 12:07:25 +10:00
Ronnie Sahlberg
6d5f96c249 lower a debug statement
(This used to be ctdb commit 3d58f9b524a40c7b43a2a855212db090e9becefa)
2008-07-18 10:41:18 +10:00
Ronnie Sahlberg
8b520bcb5f lower a debug message
(This used to be ctdb commit 554dcf16d37c8b9e4704df11d21fb272f30f5cec)
2008-07-18 10:38:51 +10:00
Ronnie Sahlberg
90ff67dc74 Only decrement the "number of persistent writes in flight" if/when
it is >0, or we will break if used against an unpatched samba server

(This used to be ctdb commit 52a38487f981fd5981c02a7a063ad2c598591c10)
2008-07-17 18:47:20 +10:00
Ronnie Sahlberg
6eb4e46fe1 Add two new controls to start and cancel a persistent update.
This allows ctdb to automatically start a new full blown recovery
if a client has started updating the local tdb for a persistent database
but is kill -9ed before it has ensured the update is distributed clusterwide.

(This used to be ctdb commit 1ffccb3e0b3b5bd376c5302304029af393709518)
2008-07-17 13:50:55 +10:00
Ronnie Sahlberg
0964c59dc6 Do not allow "ctdb eventscript" to start new eventscripts while we are in recovery mode
(This used to be ctdb commit 8140825e1d06053a900fd0adf0a150622c0fc146)
2008-07-17 09:04:15 +10:00
Ronnie Sahlberg
e4e298e10e change how we filter out "empty" records in the traverse code
so that we output the same list of keys in "catdb" as "tdbdump".

when traversing a persistent database, as an optimization, only
traverse on the local node (and thus skip checking if we are
dmaster or not). If the local node is not part of the vnnmap and thus
would not be guaranteed to have an up-to-date persistent database
we instead traverse it on one of the other nodes that are in the vnnmap.

(This used to be ctdb commit 2b0bd6c302545f2533a7a67dfc6bb5f9f60799f7)
2008-07-16 12:23:18 +10:00
Ronnie Sahlberg
66222af5e4 Fix a very subtle race where we could get a double free of talloced
memory if ctdb_run_eventscript() was called
during processing of ctdb_event_script_timeout() for
user-invoked eventscripts (eventscripts invoked by "ctdb eventscript ...").

Signed-off-by: Ronnie Sahlberg <ronniesahlberg@gmail.com>

(This used to be ctdb commit 15bc66ae801b0c69a65a7a2acf5df151e76edc2a)
2008-07-11 10:33:46 +10:00
Ronnie Sahlberg
ab8535eaa5 make LVS a capability so that we can see which nodes are configured with
LVS and which are not using LVS.

"ctdb getcapabilities"

(This used to be ctdb commit 172d01fb34f032e098b1c77a7b0f17bf11301640)
2008-07-10 10:37:22 +10:00
Ronnie Sahlberg
334db8ccba proper waitpid() fix.
remove all waitpid() calls and use the event system to trap sigchld

(This used to be ctdb commit 77458b2b6b51b2970c12b0e5b097088d3fb9d358)
2008-07-09 14:02:54 +10:00
Ronnie Sahlberg
522830dea8 Revert "waitpid() can block if it takes a long time before the child terminates"
This reverts commit bfba5c7249eff8a10a43b53c1b89dd44b625fd10.

revert the waitpid changes. we need to waitpid for some children so should
refactor the approach completely

(This used to be ctdb commit 702ced6c2fe569c01fe96c60d0f35a7e61506a96)
2008-07-08 17:41:31 +10:00
Ronnie Sahlberg
79425ddec5 Revert "set sigchild to SIG_IGN instead of SIG_DFL"
This reverts commit b1f1e80d3ad50280a300f2ed021513cf0a6f3a76.

(This used to be ctdb commit 2030e9ff2ca044181b72c3b87d513bf27057b5a2)
2008-07-08 17:40:53 +10:00
Ronnie Sahlberg
71d2315eee set sigchild to SIG_IGN instead of SIG_DFL
(This used to be ctdb commit b1f1e80d3ad50280a300f2ed021513cf0a6f3a76)
2008-07-08 16:31:23 +10:00
Ronnie Sahlberg
d67de4a7d2 waitpid() can block if it takes a long time before the child terminates
so we should not call it from the main daemon.

1, set SIGCHLD to SIG_DFL to make sure we ignore this signal

2, get rid of all waitpid() calls

3, change reporting of event script status code from _exit()/waitpid() to write()/read() one byte across the pipe.

(This used to be ctdb commit bfba5c7249eff8a10a43b53c1b89dd44b625fd10)
2008-07-08 03:48:11 +10:00
Ronnie Sahlberg
6bfbec28a4 use more liberal handling of event scripts timing out.
If the event script that timed out was for the "monitor" event, then
even if it timed out we still return SUCCESS back to the guy invoking the eventscript.
Only consider the eventscript for "monitor" to have failed with an error
IFF it actually terminated with an error, or if it timed out 5 times in a row and hung.

(This used to be ctdb commit 60f3c04bd8b20ecbe937ffed08875cdc6898b422)
2008-07-07 20:38:59 +10:00
Ronnie Sahlberg
2003196816 we need a 'case x:' in our ugly 'encode the control opcode as a linenumber in valgrind output' hack to make it work
(This used to be ctdb commit f4929e164be1703f74fc332e740b85cfe1ae3e73)
2008-07-07 08:52:04 +10:00
Ronnie Sahlberg
64e02585e7 If a transaction commit fails, log this error and cancel all pending transactions to the
databases instead of calling ctdb_fatal()

(This used to be ctdb commit ff2985aaef999d180277db4cf644fee0ea79c14d)
2008-07-07 08:51:05 +10:00
Ronnie Sahlberg
f25fd04f73 in the destructor for the lock-wait child, make sure that we cancel any pending
transactions.

(This used to be ctdb commit 45b6ff64f6ddf037b810c4e5f8b9f04d71067b98)
2008-07-07 08:50:12 +10:00
Andrew Tridgell
9999f18369 an extraordinarily ugly patch!
This is a hack to allow backtraces under valgrind to show what opcode
is getting uninitialised bytes

(This used to be ctdb commit 67bb12c8f0af5914efb44b76bc6ddbb11fc0fcdf)
2008-07-04 18:00:24 +10:00
Andrew Tridgell
50cd520c6a don't use mmap in tdb if --nosetsched is set. That makes valgrind
happier (it doesn't like the mmap/msync calls in tdb)

(This used to be ctdb commit f3a729998ce67f5d2e3b2ad41d96e8f04c0d18d8)
2008-07-04 17:32:21 +10:00
Andrew Tridgell
b3bcb42774 fixed a warning
(This used to be ctdb commit 015cd221c3c62eaa3cd0351fb8e93292c7c293aa)
2008-07-04 17:04:37 +10:00
Andrew Tridgell
60e5d83cb0 fixed some incorrect CTDB_NO_MEMORY*() calls found after fixing the
_VOID variant

(This used to be ctdb commit 07c9133aedecaee3607ad3b6fa94e5c56417a9de)
2008-07-04 17:04:26 +10:00
Andrew Tridgell
07e145316c zero out the ctdb->freeze_handle when we free it
This prevents heap corruption when a freeze child dies

(This used to be ctdb commit 4edc6d40cb63936146af99030b7819683238abfc)
2008-07-04 16:05:04 +10:00
Ronnie Sahlberg
64c4639ce9 we don't need to explicitly thaw the databases from the recovery daemon
since this is already done implicitly when we changed recovery mode
back to normal

(This used to be ctdb commit af1f6cf7561fe9cb5c97f940d4458c83bdd8e2a0)
2008-07-03 12:46:09 +10:00
Ronnie Sahlberg
ef769e7237 track both when we last started and ended a recovery.
make ctdb uptime print how long the recovery took

in the recovery daemon when we check that the public ip address
allocation on the local node is correct (we have the ips we should have
and we don't have any we shouldn't have) use ctdb uptime and check the
recovery start/stop times and make sure we don't check for ip allocation
inconsistencies during a recovery, where the ip address allocation is in flux.

(This used to be ctdb commit f86551580349b7f662f9a07e4eb0c1189e38e429)
2008-07-02 13:55:59 +10:00
Ronnie Sahlberg
bb2019bb0f update a comment to reflect that this is not always a real recovery
it can also be printed when we just do an ip reallocation

(This used to be ctdb commit e4c9e511fc5e15e0638ebb9117cb4a65ca8fda4b)
2008-07-02 12:01:19 +10:00
Ronnie Sahlberg
1ccc4a8e2b test
(This used to be ctdb commit 4f2d722cf29175c3c207e6ebb6d4f9e370767249)
2008-06-26 14:14:37 +10:00
Ronnie Sahlberg
f1b3ddc357 Revert "test"
This reverts commit f71287a28d66db202fe52f9a43b6daf2389d7f66.

(This used to be ctdb commit a928857e38d645baca62cea7f7367488d140dca7)
2008-06-26 14:00:36 +10:00
Ronnie Sahlberg
2cffc2e9c6 test
(This used to be ctdb commit f71287a28d66db202fe52f9a43b6daf2389d7f66)
2008-06-26 13:51:18 +10:00
Ronnie Sahlberg
c5de452dca reduce loglevel of the info message we are updating the flags on all nodes
(This used to be ctdb commit 9a98a21979558dcd6421b3fcb97d21ab82b792d8)
2008-06-26 13:15:41 +10:00
Ronnie Sahlberg
c5e7e0b2fd force an update of the flags from the recmaster after each monitoring run
(This used to be ctdb commit 251aeadc8b16a9c27a4bae78c97ad6e93e6cfdf4)
2008-06-26 13:08:37 +10:00
Ronnie Sahlberg
cfc0af79ce third attempt for fixing a freeze child writing to the socket
(This used to be ctdb commit b8c8c5cb351747863c5d1366b57c96122ade5db0)
2008-06-26 11:52:26 +10:00
Ronnie Sahlberg
97f8bf16c5 verify that the recmaster has the correct flags for us and if not tell the recmaster what the flags should be
(This used to be ctdb commit 3387597926ad71e4140cc504b828486d99a3ec8e)
2008-06-26 11:08:09 +10:00
Ronnie Sahlberg
2910ea1606 only loop over the write if the write failed
(This used to be ctdb commit b99d687894cb69d863345713055d9c8dc1b29194)
2008-06-26 11:02:08 +10:00
Ronnie Sahlberg
77ef05e95b the write() from the freeze child process can fail
try writing many times and log an error if the write failed

(This used to be ctdb commit f15b224e42e81cda84b98f01f919d463e80fb89f)
2008-06-26 09:54:27 +10:00
Ronnie Sahlberg
fd921aea28 ban the node after 3 failed scripts by default
(This used to be ctdb commit b4e6d8e37c7f985f357af82b4a524959bb97ec4c)
2008-06-13 13:45:23 +10:00
Ronnie Sahlberg
779468ab3f if the event scripts hang EventScriptsBanCount times in a row
the node will ban itself for the default recovery ban period

(This used to be ctdb commit 7239d7ecd54037b11eddf47328a3129d281e7d4a)
2008-06-13 13:18:06 +10:00
Ronnie Sahlberg
30535c815d when an eventscript has timed out, log the event options (i.e. "monitor" "takeip 1.2..." etc)
to the log

(This used to be ctdb commit dbe31581abf35fc4a32d3cbf487dd34e2b9c937a)
2008-06-13 12:18:00 +10:00
Ronnie Sahlberg
e6d1d766c5 make it possible to re-start a recovery without marking the current node as
the culprit.

(This used to be ctdb commit 3a69fad0b1dee4a482461680c556358409e53c4d)
2008-06-13 11:47:42 +10:00
Ronnie Sahlberg
4b6b094860 add a callback for failed nodes to the async control helper.
this callback is called for every node where the control failed (or timed out)

when we issue the start recovery control from recovery master,
set any node that fails as a culprit so it will eventually be banned

(This used to be ctdb commit 72f89bac13cbe8c3ca3e7a942469cd2ff25abba2)
2008-06-12 16:53:36 +10:00
Ronnie Sahlberg
d8433cacb2 first cut to convert takeover_callback_state{}
to use ctdb_sock_addr instead of sockaddr_in

(This used to be ctdb commit 5444ebd0815e335a75ef4857546e23f490a22338)
2008-06-04 17:12:57 +10:00
Ronnie Sahlberg
598fba7fad fix a comment
note that we don't actually send the ipv6 "gratuitous arp" on the wire just yet
(since ipv6 doesn't use arp),
but all the infrastructure is there for when we implement sending raw neighbor discovery packets

(This used to be ctdb commit b87fab857bc9b3537527be93b7f68484502d6b84)
2008-06-04 15:23:06 +10:00
Ronnie Sahlberg
7d39ac131b convert handling of gratuitous arps and their controls and helpers to
use the ctdb_sock_addr structure so they work for both ipv4 and ipv6

(This used to be ctdb commit 86d6f53512d358ff68b58dac737ffa7576c3cce6)
2008-06-04 15:13:00 +10:00
Ronnie Sahlberg
1c88f422d5 add a parameter for the tdb-flags to the client function
ctdb_attach() so that we can pass TDB_NOSYNC when we attach to
a persistent database and want fast unsafe writes instead of
slow but safe tdb_transaction writes.

enhance the ctdb_persistent test suite to test both safe and unsafe writes

(This used to be ctdb commit 4948574f5a290434f3edd0c052cf13f3645deec4)
2008-06-04 10:46:20 +10:00
Ronnie Sahlberg
60a3fb926d don't bother casting to a void* private_data pointer,
just pass it as 'state' structure

(This used to be ctdb commit 1d7c3eb454e33cd17c74606c4ea011fd79959c80)
2008-05-28 13:40:12 +10:00
Ronnie Sahlberg
0b0f5bc5e6 remove another field we don't need in the childwrite_handle structure
(This used to be ctdb commit 70085523f4c35a20786023c489325554e2a6f9c1)
2008-05-28 13:31:58 +10:00