1
0
mirror of https://github.com/samba-team/samba.git synced 2024-12-24 21:34:56 +03:00
Commit Graph

307 Commits

Author SHA1 Message Date
Ronnie Sahlberg
1ccc4a8e2b test
(This used to be ctdb commit 4f2d722cf29175c3c207e6ebb6d4f9e370767249)
2008-06-26 14:14:37 +10:00
Ronnie Sahlberg
f1b3ddc357 Revert "test"
This reverts commit f71287a28d66db202fe52f9a43b6daf2389d7f66.

(This used to be ctdb commit a928857e38d645baca62cea7f7367488d140dca7)
2008-06-26 14:00:36 +10:00
Ronnie Sahlberg
2cffc2e9c6 test
(This used to be ctdb commit f71287a28d66db202fe52f9a43b6daf2389d7f66)
2008-06-26 13:51:18 +10:00
Ronnie Sahlberg
c5de452dca reduce loglevel of the info message we are updating the flags on all nodes
(This used to be ctdb commit 9a98a21979558dcd6421b3fcb97d21ab82b792d8)
2008-06-26 13:15:41 +10:00
Ronnie Sahlberg
c5e7e0b2fd force an update of the flags from the recmaster after each monitoring run
(This used to be ctdb commit 251aeadc8b16a9c27a4bae78c97ad6e93e6cfdf4)
2008-06-26 13:08:37 +10:00
Ronnie Sahlberg
cfc0af79ce third attempt for fixing a freeze child writing to the socket
(This used to be ctdb commit b8c8c5cb351747863c5d1366b57c96122ade5db0)
2008-06-26 11:52:26 +10:00
Ronnie Sahlberg
97f8bf16c5 verify that the recmaster has the correct flags for us and if not tell the recmaster what the flags should be
(This used to be ctdb commit 3387597926ad71e4140cc504b828486d99a3ec8e)
2008-06-26 11:08:09 +10:00
Ronnie Sahlberg
2910ea1606 only loop over the write it the write failed
(This used to be ctdb commit b99d687894cb69d863345713055d9c8dc1b29194)
2008-06-26 11:02:08 +10:00
Ronnie Sahlberg
77ef05e95b the write() from the freeze child process can fail
try writing many times and log an error if the write failed

(This used to be ctdb commit f15b224e42e81cda84b98f01f919d463e80fb89f)
2008-06-26 09:54:27 +10:00
Ronnie Sahlberg
fd921aea28 ban the node after 3 failed scripts by default
(This used to be ctdb commit b4e6d8e37c7f985f357af82b4a524959bb97ec4c)
2008-06-13 13:45:23 +10:00
Ronnie Sahlberg
779468ab3f if the event scripts hangs EventScriptsBanCount consecutive times in a row
the node will ban itself for the default recovery ban period

(This used to be ctdb commit 7239d7ecd54037b11eddf47328a3129d281e7d4a)
2008-06-13 13:18:06 +10:00
Ronnie Sahlberg
30535c815d when a eventscript has timed out, log the event options (i.e. "monitor" "takeip 1.2..." etc)
to the log

(This used to be ctdb commit dbe31581abf35fc4a32d3cbf487dd34e2b9c937a)
2008-06-13 12:18:00 +10:00
Ronnie Sahlberg
e6d1d766c5 make it possible to re-start a recovery without marking the current node as
the culprit.

(This used to be ctdb commit 3a69fad0b1dee4a482461680c556358409e53c4d)
2008-06-13 11:47:42 +10:00
Ronnie Sahlberg
4b6b094860 add a callback for failed nodes to the async control helper.
this callback is called for every node where the control failed (or timed out)

when we issue the start recovery control from recovery master,
set any node that fails as a culprit   so it will eventually be banned

(This used to be ctdb commit 72f89bac13cbe8c3ca3e7a942469cd2ff25abba2)
2008-06-12 16:53:36 +10:00
Ronnie Sahlberg
d8433cacb2 first cut to convert takeover_callback_state{}
to use ctdb_sock_addr instead of sockaddr_in

(This used to be ctdb commit 5444ebd0815e335a75ef4857546e23f490a22338)
2008-06-04 17:12:57 +10:00
Ronnie Sahlberg
598fba7fad fix a comment
note that we dont actually send the ipv6 "gratious arp" on the wire just yet.
(since ipv6 doesnt use arp)
but all the infrastructure is there when we implement sending raw neig.disc. packets

(This used to be ctdb commit b87fab857bc9b3537527be93b7f68484502d6b84)
2008-06-04 15:23:06 +10:00
Ronnie Sahlberg
7d39ac131b convert handling of gratious arps and their controls and helpers to
use the ctdb_sock_addr structure so tehy work for both ipv4 and ipv6

(This used to be ctdb commit 86d6f53512d358ff68b58dac737ffa7576c3cce6)
2008-06-04 15:13:00 +10:00
Ronnie Sahlberg
1c88f422d5 add a parameter for the tdb-flags to the client function
ctdb_attach()   so that we can pass TDB_NOSYNC when we attach to
a persistent database and want fast unsafe writes instead of
slow but safe tdb_transaction writes.

enhance the ctdb_persistent test suite to test both safe and unsafe writes

(This used to be ctdb commit 4948574f5a290434f3edd0c052cf13f3645deec4)
2008-06-04 10:46:20 +10:00
Ronnie Sahlberg
60a3fb926d dont bother casting to a void* private_data pointer,
just pass it as 'state' structure

(This used to be ctdb commit 1d7c3eb454e33cd17c74606c4ea011fd79959c80)
2008-05-28 13:40:12 +10:00
Ronnie Sahlberg
0b0f5bc5e6 remove another field we dont need in the childwrite_handle structure
(This used to be ctdb commit 70085523f4c35a20786023c489325554e2a6f9c1)
2008-05-28 13:31:58 +10:00
Ronnie Sahlberg
71ec7b25b0 remote a comment that is no longer relevant
remove a field in the childwrite_handle structure we dont need

(This used to be ctdb commit a53db1ec3f29f4418ff51e0f452026c12470bf93)
2008-05-28 13:30:22 +10:00
Ronnie Sahlberg
ceaf488f05 do persistent writes in a child process
(This used to be ctdb commit 2da3d1f876f5d654f849af8a3e588f5a61300c3d)
2008-05-28 13:04:25 +10:00
Ronnie Sahlberg
0941019cb7 restore a timeout value to the default settings instead of the hardcoded 3 second test value
(This used to be ctdb commit 437752d002a108bcbbf6dc8bfb5dbf16dc5f1c58)
2008-05-22 16:33:36 +10:00
Ronnie Sahlberg
dd6c9d5a78 fix some memory hierarchy bugs in allocation of the state structure for persistent writes.
since these two controls (UPDATE_RECORD and PERSISTENT_STORE) can respond
asynchronously to the control,   we can not allocate the state variable as a child off ctdb_req_control  instead we must allocate state as a child off ctdb itself
and steal ctdb_req_control so it becomes a child of state.

othervise both ctdb_req_control and also state will be released immediately after we have finished setting up the async reply and returned.

(This used to be ctdb commit 6f6de0becd179be9eb9a6bf70562b090205ce196)
2008-05-22 16:29:46 +10:00
Ronnie Sahlberg
d895f43504 cleanup of the previous patch.
With these patches, ctdbd will enforce and (by default) always use
tdb_transactions when updating/writing records to a persistent database.

This might come with a small performance degratation  since transactions
are slower than no transactions at all.

If a client, such as samba wants to use a persistent database but does NOT
want to pay the performance penalty, it can specify TDB_NOSYNC  as the
srvid parameter in the ctdb_control() for CTDB_CONTROL_DB_ATTACH_PERSISTENT.

In this case CTDBD will remember that "this database is not that important"
so I can use unsafe (no transaction) tdb_stores to write the updates.
It will be faster than the default (always use transaction) but less crash safe.

(This used to be ctdb commit 3d85d2cf669686f89cacdc481eaa97aef1ba62c0)
2008-05-22 13:12:53 +10:00
Ronnie Sahlberg
ed2cf0291d second try for safe transaction stores into persistend tdb databases
for stores into persistent databases, ALWAYS use a lockwait child take out the lock for the record and never the daemon itself.

(This used to be ctdb commit 7fb6cf549de1b5e9ac5a3e4483c7591850ea2464)
2008-05-22 12:47:33 +10:00
Ronnie Sahlberg
92a0c0fc13 lowe the loglevel for the warning that releaseip was called for a non-public address.
the address might be a public address on a different node so no need to fiull up the logs with thoise messages

(This used to be ctdb commit c8181476748395fe6ec5284c49e9d37b882d15ea)
2008-05-21 11:50:41 +10:00
Ronnie Sahlberg
9c23bf7776 lower the loglevel for when we have "tickles" for an ip address that is not a public address on the local node (it may be a public address on other nodes)
(This used to be ctdb commit 1360c2f08a463f288b344d02025e84113743026d)
2008-05-21 11:44:50 +10:00
Ronnie Sahlberg
f4fd4d0af8 dont disable/enable monitoring for each eventscript, instead
just disable the monitoring during the "startrecovery" event and enable it again once recovery has completed

(This used to be ctdb commit 68029894f80804c9f31fc90ed0c1b58f75812c3d)
2008-05-16 08:20:40 +10:00
Ronnie Sahlberg
37b681627e dont check whether the "recovered" event was successful or not
since  this event wont run unless the recovery mode is normal   but we
can not know what the recovery mode will be in the future on a remote node
so since we issue these commands   that will execute in the future at some other node
it is pointless to try to check if it worked or not

in particular if "failure to successfully run the eventscript" would then trigger a full new recovery which is disruptive and expensive.

(This used to be ctdb commit 2c292039a0139dcf5bb2bd964eb6f8902d094c50)
2008-05-15 15:01:01 +10:00
Ronnie Sahlberg
f2661ec859 remove some unnessecary tests if ->vnn is null or not
(This used to be ctdb commit f0169ac8166a19d65ce254496e21d095aed87c2f)
2008-05-15 13:28:19 +10:00
Ronnie Sahlberg
09cc3ccff5 Update some debug statements. Dont say that recovery failed if the failed function was invoked from outside of recovery
(This used to be ctdb commit 3038d0b74895b51af4f85f2f304508ed16d245f4)
2008-05-15 12:28:52 +10:00
Ronnie Sahlberg
3e14bbcce6 Merge git://git.samba.org/tridge/ctdb
(This used to be ctdb commit d5fb4489f83f1f956b2c083cfad1861c5ddde283)
2008-05-15 08:02:51 +10:00
Andrew Tridgell
8ec3665231 put the return in the right place
We were refusing the 'startrecovery' event

(This used to be ctdb commit 788d38812d73729f11d12e9812b16092c0ae4123)
2008-05-14 22:05:09 +10:00
Andrew Tridgell
e465110f95 Fix the chicken and egg problem with ctdb/samba and a registry smb.conf
This attempts to fix the problem of ctdb event scripts blocking due to
attempted access to the ctdb databases during recovery. The changes are:

  - now only the 'shutdown' and 'startrecovery' events can be called
    with the databases locked in recovery. The event scripts must ensure
    that for these two events no database access is attempted

  - the recovered, takeip and releaseip events could previously be called
    inside a recovery. The code now ensures that this doesn't happen, delaying
    the events till after recovery has finished

  - the 50.samba event script now avoids using testparm unless it is really
    needed

This needs extensive testing.

(This used to be ctdb commit e3cdb8f2be6a44ec877efcd75c7297edb008a80b)
2008-05-14 20:57:04 +10:00
Ronnie Sahlberg
909ff219e0 Start implementing support for ipv6.
This enhances the framework for sending tcp tickles to be able to send ipv6 tickles as well.

Since we can not use one single RAW socket to send both handcrafted ipv4 and ipv6 packets, instead of always opening TWO sockets, one ipv4 and one ipv6 we get rid of the helper ctdb_sys_open_sending_socket() and just open (and close)  a raw socket of the appropriate type inside ctdb_sys_send_tcp().
We know which type of socket v4/v6 to use based on the sin_family of the destination address.

Since ctdb_sys_send_tcp() opens its own socket  we no longer nede to pass a socket
descriptor as a parameter.  Get rid of this redundant parameter and fixup all callers.

(This used to be ctdb commit 406a2a1e364cf71eb15e5aeec3b87c62f825da92)
2008-05-14 15:47:47 +10:00
Ronnie Sahlberg
b8eb5925cf Try to use tdb transactions when updating a record and record header inside the ctdb daemon.
If a transaction could be started, do safe transaction store when updating the record inside the daemon.
If the transaction could not be started (maybe another samba process has a lock on the database?) then just do a normal store instead (instead of blocking the ctdb daemon).

The client can "signal" ctdb that updates to this database should, if possible, be done using safe transactions by specifying the TDB_NOSYNC flag when attaching to the database.
The TDB flags are passed to ctdb in the "srvid" field of the control header when attaching using the CTDB_CONTROL_DB_ATTACH_PERSISTENT.

Currently, samba3.2 does not yet tell ctdbd to handle any persistent databases using safe transactions.

If samba3.2 wants a particular persistent database to be handled using
safe transactions inside the ctdbd daemon, it should pass
TDB_NOSYNC as the flags to the call to attach to a persistent database
in ctdbd_db_attach()     it currently specifies 0 as the srvid

(This used to be ctdb commit 8d6ecf47318188448d934ab76e40da7e4cece67d)
2008-05-12 13:37:31 +10:00
Ronnie Sahlberg
adf40341a7 ctdb->methods becomes NULL when we shutdown the transport.
If we shutdown the transport   and CTDB later decides to send a command out
for queueing, the call to ctdb->methods->allocate_pkt() will SEGV.

This could trigger for example when we are in the process of shuttind down CTDBD and have already shutdown the transport but we are still waiting for the
"shutdown" eventscripts to finish.
If the event scripts now take much much longer to execute for some reason, this
race condition becomes much more probable.

Decorate all dereferencing of ctdb->methods->    with a check that ctdb->menthods is non-NULL

(This used to be ctdb commit c4c2c53918da6fb566d6e9cbd6b02e61ae2921e7)
2008-05-11 14:28:33 +10:00
Ronnie Sahlberg
f196afd58b fix a bug where the public ip addresses of the cluster would not be redistributed across the cluster after a recovery was performed.
Remove a bogus check inside the recovery daemon that ONLY redistributed public addresses IFF the local node had/served public addresses.
This was a valid optimization long ago when we enforced that all nodes must use the same public addresses file   but is invalid today where we can have different public addresses configs on all nodes  and even have some nodes that do NOT use public addresses at all.

(This used to be ctdb commit 5833e6b99d9afaf35dc8354df8676b9115418b23)
2008-05-09 13:41:31 +10:00
Andrew Tridgell
abe6d816bb fixed realloc bug
Should always use type safe talloc functions when possible. In this case we were allocating bytes instead of uint32_t

(This used to be ctdb commit cb14ee57dd0a589242da1ac2830bb7939df460a5)
2008-05-08 19:59:24 +10:00
Ronnie Sahlberg
92b61cd7d5 Expand the client async framework so that it can take a callback function.
This allows us to use the async framework also for controls that return
outdata.

Add a "capabilities" field to the ctdb_node structure. This field is
only initialized and kept valid inside the recovery daemon context and not
inside the main ctdb daemon.

change the GET_CAPABILITIES control to return the capabilities in outdata instead of in the res return variable.

When performing a recovery inside the recovery daemon, read the capabilities from all connected nodes and update the ctdb->nodes list of nodes.
when building the new vnnmap after the database rebuild in recovery, do not include any nodes which lack the LMASTER capability in the new vnnmap.
Unless there are no available connected node that sports the LMASTER capability in which case we let the local node (recmaster) take on the lmaster role temporarily (i.e. become a member of the vnnmap list)

(This used to be ctdb commit 0f1883c69c689b28b0c04148774840b2c4081df6)
2008-05-06 15:42:59 +10:00
Ronnie Sahlberg
2c23959616 make sure we lose all elections for recmaster role if we do not have the recmaster capability.
(unless there are no other node at all available with this capability)

(This used to be ctdb commit 8556e9dc897c6b9b9be0b52f391effb1f72fbd80)
2008-05-06 13:56:56 +10:00
Ronnie Sahlberg
6863c8f573 close and reopen the reclock pnn file at regular intervals.
handle failure to get/hold the reclock pnn file better and just
treat it as a transient backend filesystem error and try again later
instead of shutting down the recovery daemon

when we have lost the pnn file   and if we are recmaster
release the recmaster role so that someone else can become recmaster isntead

(This used to be ctdb commit e513277fb09b951427be8351d04c877e0a15359d)
2008-05-06 13:27:17 +10:00
Ronnie Sahlberg
80f85dc390 Monitor that the recovery daemon is still running from the main ctdb daemon
and if it has terminated, then we shut down the main daemon as well

(This used to be ctdb commit 7e587acaf8006254e89ff9b4bf48454821c85863)
2008-05-06 11:19:17 +10:00
Ronnie Sahlberg
d86e48d5ff Add ability to disable recmaster and lmaster roles through sysconfig file and
command line arguments

(This used to be ctdb commit 34b952e4adc53ee82345275a0e28231fa1b2533e)
2008-05-06 10:41:22 +10:00
Ronnie Sahlberg
a9c45f9513 Add a capabilities field to the ctdb structure
Define two capabilities :
can be recmaster
can be lmaster
Default both capabilities to YES

Update the ctdb tool to read capabilities off a node

(This used to be ctdb commit 50f1255ea9ed15bb8fa11cf838b29afa77e857fd)
2008-05-06 10:02:27 +10:00
Ronnie Sahlberg
073f4a7cb4 when a node disgrees with us re who is recmaster
make it mark that node as a lcuprit so it eventually gets banned

(This used to be ctdb commit eff3f326f8ce6070c9f3c430cd14d1b71a8db220)
2008-04-22 00:56:27 +10:00
Ronnie Sahlberg
0e1a20b603 Revert "Revert "Revert "- accept an optional set of tdb_flags from clients on open a database,"""
remove the transaction stuff and push   so that the git tree will work

This reverts commit 539bbdd9b0d0346b42e66ef2fcfb16f39bbe098b.

(This used to be ctdb commit 876d3aca18c27c2239116c8feb6582b3a68c6571)
2008-04-10 15:59:51 +10:00
Ronnie Sahlberg
39f119b42c Revert "Revert "- accept an optional set of tdb_flags from clients on open a database,""
This reverts commit 171d1d71ef9f2373620bd7da3adaecb405338603.

(This used to be ctdb commit 539bbdd9b0d0346b42e66ef2fcfb16f39bbe098b)
2008-04-10 14:57:41 +10:00
Ronnie Sahlberg
9684befa16 Revert "- accept an optional set of tdb_flags from clients on open a database,"
This reverts commit 49330f97c78ca0669615297ac3d8498651831214.

(This used to be ctdb commit 171d1d71ef9f2373620bd7da3adaecb405338603)
2008-04-10 14:45:45 +10:00