mirror of https://github.com/samba-team/samba.git synced 2025-01-11 05:18:09 +03:00
Commit Graph

2498 Commits

Author SHA1 Message Date
Stefan Metzmacher
1c6829f3c2 ctdb_client: fix DEBUG statement in ctdb_ctrl_modflags()
metze

(This used to be ctdb commit a244b75ee49556b0ff51e254cc812594ee3b23a7)
2009-10-26 14:22:07 +11:00
Stefan Metzmacher
198866d82d server: if takeover runs when the recovery master becomes unhealthy
The problem was this:

When the monitor event fails, the node->flags get updated,
and an update (containing the old and new flags) is sent to
the recovery master.

If the recovery master sends the update to itself (the same process),
it was comparing the node->flags variable with the received new flags.
This check always found both flag values to be equal
and never set the rec->need_takeover_run variable to true.

There were two problems. First, the push_flags_handler() function
didn't pass on the received old flags.

Second, the ctdb_control_modflags() function ignored the received old flags.

metze

(This used to be ctdb commit 8ec633b64a05a2d903c2b9639909f15f6375548f)
2009-10-26 14:21:45 +11:00
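The failure mode described in this commit can be sketched in C. The struct, flag value and function names below are hypothetical (they are not ctdb's real flag-change handling); the sketch only shows why comparing against an already-updated local copy never detects a change, while comparing the old and new flags carried in the update does.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag bit and message layout, for illustration only. */
#define SKETCH_NODE_FLAGS_UNHEALTHY 0x00000002u

struct sketch_flag_change {
	uint32_t pnn;        /* node whose flags changed */
	uint32_t old_flags;  /* flags before the change, as seen by the sender */
	uint32_t new_flags;  /* flags after the change */
};

/*
 * Buggy pattern: compare the locally cached flags with the received new
 * flags.  When the recovery master sends the update to itself, its cached
 * copy has already been updated, so the two values are always equal.
 */
static bool flags_changed_buggy(uint32_t cached_flags,
				const struct sketch_flag_change *c)
{
	return cached_flags != c->new_flags;
}

/* Fixed pattern: compare the old and new flags carried in the update. */
static bool flags_changed_fixed(const struct sketch_flag_change *c)
{
	return c->old_flags != c->new_flags;
}
```

With the fixed comparison, a node going from healthy to unhealthy is seen as a change even when sender and receiver are the same process, so a takeover run can be scheduled.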
Stefan Metzmacher
7a616a0d7b server: print out the full 64-bit srvid on 32-bit hosts
metze

(This used to be ctdb commit 440e870d61267054b24404bcb69e599226353949)
2009-10-26 14:20:52 +11:00
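A minimal sketch of printing a 64-bit srvid portably, assuming a hypothetical format_srvid() helper. The commit does not say which technique ctdb used; a cast to unsigned long long with "%llu" would work equally well.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Format a 64-bit srvid.  Passing a uint64_t to "%lu" (or casting it to
 * unsigned long) loses the upper 32 bits on hosts where long is 32 bits;
 * PRIx64 expands to the correct conversion for uint64_t everywhere.
 */
static int format_srvid(char *buf, size_t len, uint64_t srvid)
{
	return snprintf(buf, len, "srvid=0x%016" PRIx64, srvid);
}
```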
Stefan Metzmacher
ee97e2676d tcp: don't log an error when we succefully bind to the desired address
metze

(This used to be ctdb commit 752a9c81de97be509de7e7feddde749cc5ee22a8)
2009-10-26 14:20:23 +11:00
Ronnie Sahlberg
299b027b8c patch the event loop so we read the current time every iteration.
log an error if the clock jumps backwards
also log an error if the clock jumps >5 seconds forward (we assume here we will get at least one event every 5 seconds)

(This used to be ctdb commit 11193e1e192bee6f579bdf1303153571a82711d7)
2009-10-26 13:20:35 +11:00
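The check this commit adds to the event loop can be sketched as a comparison between the time read in the previous iteration and the current one. The check_clock() helper and enum are hypothetical; only the 5-second forward threshold comes from the commit text.

```c
#include <sys/time.h>

/* Forward jump tolerated between iterations: 5 seconds, per the commit. */
#define MAX_FORWARD_JUMP_USEC (5LL * 1000000)

enum clock_check { CLOCK_OK, CLOCK_BACKWARDS, CLOCK_FORWARD_JUMP };

/*
 * Compare the time read in this event-loop iteration with the time read
 * in the previous one.  The caller would log an error for the two
 * abnormal results and then remember "now" for the next iteration.
 */
static enum clock_check check_clock(const struct timeval *prev,
				    const struct timeval *now)
{
	long long prev_us = (long long)prev->tv_sec * 1000000 + prev->tv_usec;
	long long now_us = (long long)now->tv_sec * 1000000 + now->tv_usec;

	if (now_us < prev_us)
		return CLOCK_BACKWARDS;
	if (now_us - prev_us > MAX_FORWARD_JUMP_USEC)
		return CLOCK_FORWARD_JUMP;
	return CLOCK_OK;
}
```

In the loop, each iteration would call gettimeofday(&now, NULL), pass both values to check_clock(), log on an abnormal result and then set prev = now.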
Ronnie Sahlberg
8aacfa348d Suggestion from Volker:
make ctdb_queue_length() cheaper by using a counter variable instead of counting the number of packets each time.

(This used to be ctdb commit 331c6e3afd96d8b5e191153a631efdbdabb6ea33)
2009-10-26 12:20:52 +11:00
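The idea is to keep a counter that is updated on every enqueue and dequeue, so reading the length is O(1) instead of a walk over the packet list. A minimal sketch on a hypothetical singly linked queue (ctdb's real struct ctdb_queue differs):

```c
#include <stddef.h>

/* Hypothetical minimal packet queue, for illustration only. */
struct pkt {
	struct pkt *next;
};

struct pkt_queue {
	struct pkt *head, *tail;
	unsigned int length;  /* maintained on every push/pop */
};

static void queue_push(struct pkt_queue *q, struct pkt *p)
{
	p->next = NULL;
	if (q->tail != NULL)
		q->tail->next = p;
	else
		q->head = p;
	q->tail = p;
	q->length++;
}

static struct pkt *queue_pop(struct pkt_queue *q)
{
	struct pkt *p = q->head;

	if (p == NULL)
		return NULL;
	q->head = p->next;
	if (q->head == NULL)
		q->tail = NULL;
	q->length--;
	return p;
}

/* O(1): read the counter instead of counting the packets each time. */
static unsigned int queue_length(const struct pkt_queue *q)
{
	return q->length;
}
```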
Ronnie Sahlberg
c36fa583f3 disable the multipath eventscript by default
(This used to be ctdb commit e79c3bcead7bd4bfb74d0aec81908da71551c107)
2009-10-26 10:22:00 +11:00
Ronnie Sahlberg
9db2a5ca05 update the manpage for ctdb setreclock
(This used to be ctdb commit ab4a6a58fb002ec29c19d167800e47987b023fe4)
2009-10-26 10:11:00 +11:00
Ronnie Sahlberg
2d06e9d252 automatically re-activate the reclock file check if we set the reclock file to something
(This used to be ctdb commit db250cad7c92c1cc0a690725a4e39531a2e1b7fd)
2009-10-26 10:13:20 +11:00
Ronnie Sahlberg
5aaa15fdb2 lower the log level of a debug message
(This used to be ctdb commit 496dc2e80b714811c6e69dc928deaad61cf603b1)
2009-10-26 09:35:18 +11:00
Ronnie Sahlberg
86d1b4c465 Add a mechanism where we can register notifications to be sent out to a SRVID when the client disconnects.
To use this from a client:
1. First, create a message handle and bind it to a SRVID.
   A special prefix of the srvid space has been set aside for Samba:
   only Samba is allowed to use srvids with the top 32 bits set like this.
   The lower 32 bits are for Samba to use internally.

2. Register a "notification" using the new control:
                    CTDB_CONTROL_REGISTER_NOTIFY         = 114,
   This control takes as indata a structure like this:
struct ctdb_client_notify_register {
        uint64_t srvid;
        uint32_t len;
        uint8_t notify_data[1];
};

srvid is the srvid, taken from the space set aside above.
len and notify_data are an arbitrary blob.
When notifications are later sent out to all clients, this blob is the payload of the notification message.

If a client has registered with control 114 and then disconnects from ctdbd, ctdbd will broadcast a message to that srvid to all nodes/listeners in the cluster.

A client can register itself with as many different srvids as it wants, but they are kept in a linked list hanging off the client structure, so the mechanism is mainly designed for "few notifications per client".

3. A client that no longer wants to have a notification set up can deregister using the control
                    CTDB_CONTROL_DEREGISTER_NOTIFY       = 115,
which takes this as its argument:
struct ctdb_client_notify_deregister {
        uint64_t srvid;
};

When a client deregisters, no message will be broadcast when that client later disconnects from ctdbd.

(This used to be ctdb commit f1b6ee4a55cdca60f93d992f0431d91bf301af2c)
2009-10-23 15:24:51 +11:00
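The indata for CTDB_CONTROL_REGISTER_NOTIFY (114) is a variable-length blob: the fixed srvid and len fields followed by len bytes of payload. The sketch below shows how such a blob could be assembled. build_notify_register() and make_srvid() are hypothetical helpers; the real Samba srvid prefix value is not stated in this commit, so the prefix is a parameter, and sending the control itself is omitted because the client call is not shown here.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Layout as given in the commit message. */
struct ctdb_client_notify_register {
	uint64_t srvid;
	uint32_t len;
	uint8_t notify_data[1];
};

/* Hypothetical: reserved upper-32-bit prefix plus caller-chosen low half. */
static uint64_t make_srvid(uint32_t prefix, uint32_t id)
{
	return ((uint64_t)prefix << 32) | id;
}

/*
 * Build the indata blob.  *wire_len receives the number of bytes to send
 * (header plus payload).  The caller frees the result; NULL is returned
 * on allocation failure.
 */
static struct ctdb_client_notify_register *
build_notify_register(uint64_t srvid, const void *data, uint32_t len,
		      size_t *wire_len)
{
	size_t hdr = offsetof(struct ctdb_client_notify_register, notify_data);
	size_t size = hdr + len;
	struct ctdb_client_notify_register *n;

	/* Never allocate less than the struct itself (it has padding). */
	n = calloc(1, size < sizeof(*n) ? sizeof(*n) : size);
	if (n == NULL)
		return NULL;
	n->srvid = srvid;
	n->len = len;
	if (len > 0)
		memcpy((uint8_t *)n + hdr, data, len);
	*wire_len = size;
	return n;
}
```

Deregistration (control 115) needs no such helper, since struct ctdb_client_notify_deregister is a fixed-size struct holding only the srvid.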
Ronnie Sahlberg
c61c655769 when scripts time out, log pstree to a file in /tmp and just log the filename in the messages file
(This used to be ctdb commit 0785afba8e5cd501b9e0ecb4a6a44edf43b57ab0)
2009-10-23 13:55:21 +11:00
Ronnie Sahlberg
3c9b43531a set the eventscripts to timeout after 20 seconds
change the ban count to 10 failures before we ban by default

(This used to be ctdb commit 38d7487bc68c8cf85980004aceeef24ae32d6f36)
2009-10-23 13:54:45 +11:00
Ronnie Sahlberg
65757fe1d6 Merge commit 'martins/master'
(This used to be ctdb commit 514a60c57557042e463efeff53dd11b9fec40561)
2009-10-23 10:43:13 +11:00
Ronnie Sahlberg
42718a8842 new version 1.0.99
(This used to be ctdb commit 14fca8383b6b1da49278a9181a975543b956161b)
2009-10-22 18:16:33 +11:00
Martin Schwenke
69cca03851 Merge commit 'origin/master'
(This used to be ctdb commit f3e09f2cfd33e79e69fc8c84ce4781a31a7a0437)
2009-10-22 17:48:09 +11:00
Martin Schwenke
a128b7e3bb Document onnode -n and -f options.
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 431f79f7c9038ebd95d27c2465207ca40b8f4f23)
2009-10-22 17:47:10 +11:00
Ronnie Sahlberg
e627fae600 if a lock wait child died/finished, we could have released the lockwait handle and set it to NULL before we call the destructors for releasing the waiters.
The waiters reference the lockwait handle in order to remove themselves from the linked list, which caused a SEGV.

We don't actually need to remove ourselves from this list here, since if the parent freeze_handle holding the list is freed, then all waiters are released as well. The only place we actually need to relink the waiter is in ctdb_freeze_lock_handler, where we want to respond back to the clients and release the waiters but still keep the freeze_handle hanging around.

(This used to be ctdb commit e01ab46bafad09a5e320d420734db129d35863bc)
2009-10-22 13:41:28 +11:00
Ronnie Sahlberg
902c476c03 From Volker L
Fix some warnings and an incorrect check for a talloc failure

(This used to be ctdb commit 27296a47b3d057a6729287acf128b2b67775ecde)
2009-10-22 12:19:40 +11:00
Ronnie Sahlberg
831f9e05a6 From Wolfgang M.
With the new vacuuming code, don't treat an invalid dmaster as fatal. Let it update to the new value instead.

(This used to be ctdb commit 5b70fa8cfd5916d3c212823ad5cc1b251ae175ed)
2009-10-22 07:58:44 +11:00
Martin Schwenke
8b2101bc61 Merge commit 'origin/master'
(This used to be ctdb commit 61282d4a9be9e544aaa86f3cffc5b58e417f5ab1)
2009-10-21 21:48:15 +11:00
Martin Schwenke
12798118a1 Test suite: Remove the disable/enable monitor tests - they are useless.
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 8264c42969d4be7fc6c5b4d56f8b5ef7c62b3bfb)
2009-10-21 21:47:06 +11:00
Martin Schwenke
f2a9ba6976 Test suite: Fix the timeouts on the skip share check tests.
The timeout for waiting for state changes isn't very predictable.  It
is "about" MonitorInterval seconds...  but can be longer given the
duration of eventscript runs and other things.  So, we change the
timeout to MonitorInterval + EventScriptTimeout, hoping it never takes
that long.

Move the eventscript installation/removal from the old fake-tests into
a function in the functions file.  Implement supporting functions to
create/remove/check-for various files that it handles.  Also add a
function that uses all of this that waits for the next monitor event
(but only if all other monitor events pass).

The final check in the skip share check tests uses the above and waits
for a monitor event, and then checks that the node is still healthy.

Also enhance the wait_until function to handle a command starting with
'!' (as a separate word) to make it easy to wait for a file not to
exist.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 25e82a8a667a54c6921ef076c63fdd738dd75d19)
2009-10-21 21:36:39 +11:00
Ronnie Sahlberg
d5fd4fc0ce During tests it is common to add/delete test eventscripts at runtime.
This can race with the eventscript handling, which does:

list all scripts, sort them, then execute them

so trap status code 127, which means the script could not be executed (or /bin/sh does not exist), and do not let it cause the node to become unhealthy.

(This used to be ctdb commit befabc917edb036ca81f5216f65a6d62b26ee83e)
2009-10-21 16:50:39 +11:00
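A sketch of how a script's wait status could be classified so that exit status 127 is ignored. run_script() and script_status_is_failure() are hypothetical helpers, not ctdb's code; the child mirrors the shell convention by exiting with 127 if the exec itself fails.

```c
#include <stdbool.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Run one eventscript through /bin/sh and return its raw wait status,
 * or -1 if fork/waitpid failed.
 */
static int run_script(const char *path, const char *event)
{
	int wstatus;
	pid_t pid = fork();

	if (pid == -1)
		return -1;
	if (pid == 0) {
		execl("/bin/sh", "sh", path, event, (char *)NULL);
		_exit(127);  /* exec itself failed, e.g. /bin/sh is missing */
	}
	if (waitpid(pid, &wstatus, 0) == -1)
		return -1;
	return wstatus;
}

/*
 * Should this wait status mark the node unhealthy?  127 means the script
 * could not be executed (for instance it was deleted after the directory
 * was listed), so it is ignored rather than counted as a failure.
 */
static bool script_status_is_failure(int wstatus)
{
	if (WIFEXITED(wstatus)) {
		int rc = WEXITSTATUS(wstatus);
		return rc != 0 && rc != 127;
	}
	return true;  /* terminated by a signal */
}
```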
Ronnie Sahlberg
a92ba7f729 lower the debug levels for the "create FD messages" so we don't fill up the logs.
(This used to be ctdb commit 87146db2769c2ec494813685bf9cec0d2a6336c3)
2009-10-21 15:26:24 +11:00
Ronnie Sahlberg
9b8c72c446 When clients have blocked, perhaps because the node is banned or stopped and the client is blocked trying to tdb_fetch() a record, make sure we don't queue up too many REQ_MESSAGES.
Add a new tunable to control the maximum queue size we allow to a blocked client before we start discarding REQ_MESSAGES instead of queueing them for delivery.

    This avoids queueing up a very large number of the MESSAGES that Samba processes send to each other, for nodes that are blocked/banned/stopped for extended periods.

(This used to be ctdb commit f76d6fed8f9630450263b9fa4b5fdf3493fb1e11)
2009-10-21 15:20:55 +11:00
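The discard policy can be sketched as a bound on each client's outgoing queue. The tunable's real name is not given in this commit, so max_queue_depth, the struct and the function below are hypothetical stand-ins.

```c
/* Hypothetical per-client state; ctdb's real client structure differs. */
struct sketch_client {
	unsigned int queue_length;  /* packets waiting to be written */
};

enum deliver_result { DELIVER_QUEUED, DELIVER_DROPPED };

/*
 * Queue a REQ_MESSAGE for a client unless its queue already holds
 * max_queue_depth packets.  A client whose queue has grown that large is
 * treated as blocked, so further messages are discarded instead of
 * piling up in memory.
 */
static enum deliver_result deliver_req_message(struct sketch_client *c,
					       unsigned int max_queue_depth)
{
	if (c->queue_length >= max_queue_depth)
		return DELIVER_DROPPED;
	c->queue_length++;
	return DELIVER_QUEUED;
}
```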
Ronnie Sahlberg
149ea4e577 don't restart ctdb when installing the rpm
(This used to be ctdb commit ead97cabeb1e0b73bff9d45f8aec8b226769ee9f)
2009-10-21 13:54:02 +11:00
Michael Adam
769a36c048 In ctdb_ltdb_store(), add a missing transaction_cancel when local store failed.
Spotted by Volker.

Michael

(This used to be ctdb commit 0a4d409baabf242a87c06293789d589c896b104c)
2009-10-21 12:49:59 +11:00
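The bug class fixed here is an error path that returns without cancelling an open transaction. The real code uses tdb's tdb_transaction_start(), tdb_transaction_commit() and tdb_transaction_cancel(); the sketch below swaps in a hypothetical toy store so it runs without libtdb, and only models whether a transaction is left open.

```c
#include <stdbool.h>

/* Toy stand-in for a tdb handle, for illustration only. */
struct toy_db {
	bool in_transaction;
	bool fail_store;  /* force the store to fail, for demonstration */
	int value;
};

static int toy_transaction_start(struct toy_db *db)
{
	if (db->in_transaction)
		return -1;  /* a transaction is still open: refuse */
	db->in_transaction = true;
	return 0;
}

static int toy_store(struct toy_db *db, int v)
{
	if (db->fail_store)
		return -1;
	db->value = v;
	return 0;
}

static int toy_transaction_commit(struct toy_db *db)
{
	db->in_transaction = false;
	return 0;
}

static void toy_transaction_cancel(struct toy_db *db)
{
	db->in_transaction = false;
}

/* Store under a transaction, cancelling it on every error path. */
static int store_in_transaction(struct toy_db *db, int v)
{
	if (toy_transaction_start(db) != 0)
		return -1;
	if (toy_store(db, v) != 0) {
		toy_transaction_cancel(db);  /* the step that was missing */
		return -1;
	}
	return toy_transaction_commit(db);
}
```

Without the cancel, the failed store would leave the transaction open and every later store would fail at the start step.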
Ronnie Sahlberg
14b14a2efb improve the log message when we skip the ip allocation check from the recovery daemon.
We also skip this check if we are already in the process of performing an ip reallocation, not only when we are performing a full recovery.

(This used to be ctdb commit 1a09b02767f3928d3c5db0e0afc59bb938e4a445)
2009-10-21 11:51:30 +11:00
Ronnie Sahlberg
ff8363697d treat interfaces with the name ethX* as bond devices
(This used to be ctdb commit 3997d7e5471810e9a2f145ce2e795073dfc5eded)
2009-10-21 11:34:17 +11:00
Martin Schwenke
7b1e9267f2 Test suite: A timeout of MonitorInterval seconds sometimes isn't enough.
Monitor events sometimes happen a little bit more than MonitorInterval
seconds apart.  This changes some timeouts to MonitorInterval + 1
seconds.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 6ef4364b3349145b2fec23e0431cd6df6dcadd41)
2009-10-20 17:11:01 +11:00
Martin Schwenke
cd0424cde1 Merge commit 'origin/master'
(This used to be ctdb commit a4aac7312947aa3b26bc26993f04b586c64f18cb)
2009-10-20 16:53:04 +11:00
Martin Schwenke
b84c2d3a6e Test suite: New tests for validating SKIP_SHARE_CHECK options.
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit f50d64a8ac91415ca297216d2103ff940076f02b)
2009-10-20 16:52:22 +11:00
Martin Schwenke
43780f5f57 Test suite: Update 99_ctdb_uninstall_eventscript.sh to use ctdb_init().
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 2b478b0f5f09dd06626592573f053706ac637edd)
2009-10-20 16:51:06 +11:00
Martin Schwenke
d79f7647e7 Test suite: Fix bug in node_has_status().
This function has been broken since it was updated to work with the
"stopped" state (probably commit
67c5bfb5f02c9d45a32d976021ede4fb2174dfe9).  Although ${var#:*:0}
removes the shortest matching prefix of $var, '*' can match substrings
that include ':' if '0' isn't where you expect.  So we were making
unexpected matches and incorrectly returning true for some cases.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 11137bc2d492a62a26ec9f9f62ff362e81643f66)
2009-10-20 16:45:29 +11:00
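The pattern problem can be reproduced outside the shell with POSIX fnmatch(), which implements the same glob rules. The hypothetical strip_shortest_prefix() below emulates ${s#pattern}, and the status strings in the usage are made up for illustration (the real status format is not shown here); the point is that `*` happily spans a ':'.

```c
#include <fnmatch.h>
#include <string.h>

/*
 * Emulate the shell expansion ${s#pattern}: remove the shortest prefix
 * of s that matches the glob pattern.  Prefixes longer than the buffer
 * are not examined (fine for short status strings like these).
 */
static const char *strip_shortest_prefix(const char *s, const char *pattern)
{
	char prefix[256];
	size_t n = strlen(s);

	for (size_t len = 0; len <= n && len < sizeof(prefix); len++) {
		memcpy(prefix, s, len);
		prefix[len] = '\0';
		if (fnmatch(pattern, prefix, 0) == 0)
			return s + len;
	}
	return s;  /* no prefix matched: s is unchanged */
}
```

For ":0:1:0:0" the shortest match of ":*:0" is ":0:1:0", with `*` consuming "0:1" across a field separator, leaving ":0".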
Martin Schwenke
469ee69363 Test suite: add -x option to ctdb_init() function.
This facilitates tracing of tests.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 1f906bd3476e7cebf217e35b5477d6a7bb615a0c)
2009-10-20 16:44:44 +11:00
Ronnie Sahlberg
6dd7a8bcfa version 1.0.98
(This used to be ctdb commit 02862c086d045497f49f3c060700419815d607e7)
2009-10-20 15:36:35 +11:00
Ronnie Sahlberg
28f277acd4 From Wolfgang Mueller
make sure to always create the vactun database and get rid of some annoying log messages

(This used to be ctdb commit 54f9c314a0354f1039208fe6ac7dc159b6db8750)
2009-10-20 13:01:15 +11:00
Ronnie Sahlberg
d788dd3627 From Wolfgang Mueller
Add a tunable so that when scripts start to hang/time out, we can make the node unhealthy instead of banned

(This used to be ctdb commit 2e9fc6f0609833c6d8146196011ef780669d615d)
2009-10-20 12:59:48 +11:00
Martin Schwenke
b77094e897 Merge commit 'origin/master'
(This used to be ctdb commit b3ae2b753261443dca317803752a9d61285a3270)
2009-10-19 16:46:45 +11:00
Ronnie Sahlberg
58780f4137 add a directory where multiple local scripts can be added to run when executing eventscripts
(This used to be ctdb commit 27d152a918680a59c7412aec7e1772f25b72d469)
2009-10-19 16:22:15 +11:00
Ronnie Sahlberg
cdc77af3ab wait a bit longer before shutting down when the reclock file is missing
print the filename of the missing file when we turn unhealthy, and also
print the output of 'df'

(This used to be ctdb commit 97ded8a629ec762f71bad28515e4fbc810790b1d)
2009-10-19 15:33:20 +11:00
Ronnie Sahlberg
1e91fd0a25 Revert "dont shutdown a node when the reclock file is temporarily unavailable."
This reverts commit f5e9f3007c10a937158bc8cdfabf33c984cf9c50.

(This used to be ctdb commit 02f68dc60e0b7bf26d631850b12834d5c71a88f2)
2009-10-19 15:30:44 +11:00
Martin Schwenke
aca9d7f104 Merge branch 'onnode_options'
(This used to be ctdb commit 454125ccfda04aa6b4e14f5c05164d29f41a0ead)
2009-10-16 16:39:46 +11:00
Martin Schwenke
b20d680070 Merge commit 'origin/master'
(This used to be ctdb commit 5ad283458e59ea8232e01f34be007901c10c8a2e)
2009-10-16 16:36:48 +11:00
Martin Schwenke
0bff3b4289 initscript: when stopping on Red Hat use the success/failure functions.
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit bf5402b41282da94fee1ab3e4546ec089ff12f37)
2009-10-16 16:35:56 +11:00
Ronnie Sahlberg
598419e57b Don't run eventscript monitor when the databases are frozen.
The databases can become frozen a while before we do the actual recovery
since we have the re-recovery timeout.

There is no point in doing much monitoring if we are waiting for a recovery,
or if we are banned.
This will eliminate some annoying log entries where certain tests will fail if the databases are locked.

(This used to be ctdb commit ff824676fab94168707aada7423ae766bc0f711c)
2009-10-15 16:03:43 +11:00
Ronnie Sahlberg
d258616984 dont shutdown a node when the reclock file is temporarily unavailable.
Leave the node as UNHEALTHY; this stops clients from accessing the node until
the reclock file can be accessed again.

(This used to be ctdb commit f5e9f3007c10a937158bc8cdfabf33c984cf9c50)
2009-10-15 13:19:10 +11:00
Ronnie Sahlberg
9de3652380 add logging every time we create a file descriptor in the main ctdb daemon
so we can spot if there are leaks.

plug two file descriptor leaks that occurred when sending an ARP failed,
and one leak when we could not parse the local address while establishing a TCP connection

(This used to be ctdb commit ddd089810a14efe4be6e1ff3eccaa604e4913c9e)
2009-10-15 11:24:54 +11:00
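A sketch of both halves of this commit: a wrapper that logs each descriptor it creates, and an error path that closes the descriptor instead of leaking it. logged_socket(), open_and_parse() and the log text are hypothetical, not ctdb's actual functions or messages.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical wrapper: log every socket descriptor the daemon creates. */
static int logged_socket(int domain, int type, const char *why)
{
	int fd = socket(domain, type, 0);

	if (fd != -1)
		fprintf(stderr, "created socket fd %d for %s\n", fd, why);
	return fd;
}

/*
 * Open a socket and then parse an address.  The close() on the parse
 * error path is the kind of fix described above: without it, every
 * malformed address would leak one descriptor.
 */
static int open_and_parse(const char *addr_text)
{
	struct in_addr addr;
	int fd = logged_socket(AF_INET, SOCK_DGRAM, "address check");

	if (fd == -1)
		return -1;
	if (inet_pton(AF_INET, addr_text, &addr) != 1) {
		close(fd);
		return -1;
	}
	/* ... the socket would be used here ... */
	close(fd);
	return 0;
}
```

Because POSIX always hands out the lowest free descriptor number, a leak shows up as steadily rising fd numbers in the log.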
Ronnie Sahlberg
6152a7060b new version 1.0.97
(This used to be ctdb commit ef992a64d2376b621d4d2973ae22e567158aee12)
2009-10-15 07:41:56 +11:00