mirror of https://github.com/samba-team/samba.git synced 2024-12-24 21:34:56 +03:00
Commit Graph

383 Commits

Author SHA1 Message Date
Ronnie Sahlberg
edb7241c05 redesign how reloadnodes is implemented.
modify the transport methods to allow restarting individual connections
and set up destructors properly.

only tear down/set-up tcp connections to nodes removed from the cluster
or nodes added to the cluster.
Leave tcp connections to unchanged nodes connected.

make "ctdb reloadnodes" explicitly cause a recovery of the cluster once
the files have been reloaded

(This used to be ctdb commit d1057ed6de7de9f2a64d8fa012c52647e89b515b)
2008-12-02 13:26:30 +11:00
Ronnie Sahlberg
a782bdbacd new version 1.0.66

(This used to be ctdb commit 499a01fece2a5f24f1b2943cf3dc6e9a3a8ca3b5)
2008-11-24 19:06:02 +11:00
Ronnie Sahlberg
1e2831898c allow changing the recmaster even when the database is not frozen
(This used to be ctdb commit 03e2e436db5cfd29a56d13f5d2101e42389bfc94)
2008-11-21 16:24:12 +11:00
Andrew Tridgell
59b6a9a9e6 fixed problem with looping ctdb recoveries
After a node failure, GPFS can get into a state where non-blocking
fcntl() locks can take a long time. This leads to the ctdb set_recmode
test timing out, which leads to a recovery failure, and a new
recovery. The recovery loop can last a long time.

The fix is to consider a fcntl timeout as a success of this test. The
test is to see that we can't lock the shared reclock file, so a
timeout is fine for a success.

(This used to be ctdb commit 6579a6a2a7161214adedf0f67dce62f4a4ad1afe)
2008-11-21 10:24:13 +11:00
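The decision rule in the commit above can be sketched as follows. This is only an illustration of the logic described in the message, not the ctdb code: `LockAttempt` and `set_recmode_test_passed` are hypothetical names, and the actual fcntl() call is left out.

```python
from enum import Enum


class LockAttempt(Enum):
    """Hypothetical outcomes of trying to lock the shared reclock file."""
    ACQUIRED = "acquired"    # we got the lock, so nobody else holds it
    DENIED = "denied"        # the lock is held elsewhere
    TIMED_OUT = "timed out"  # the lock attempt did not finish in time


def set_recmode_test_passed(outcome):
    """The test checks that we can NOT take the lock.

    A denied lock is the expected result. Per the commit message, a
    timeout is also counted as success, so that slow locks after a
    node failure do not cause repeated recoveries.
    """
    return outcome in (LockAttempt.DENIED, LockAttempt.TIMED_OUT)
```

Only an acquired lock counts as a failure here, since that is the one outcome showing that the reclock file was not held.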
Ronnie Sahlberg
331b9bdb5f don't override/change CTDB_BASE if it is already set by the shell
(This used to be ctdb commit 0a6f9326cb99f14b5c9edd0d8854d8229df49910)
2008-11-20 16:39:56 +11:00
Ronnie Sahlberg
a2a5904f66 Keepalive packets were only sent every KeepaliveInterval if the socket
had been completely idle during that interval.
If we had been sending other packets such as Messages, Calls or Controls
there wouldn't be any need for an explicit keepalive and thus we didn't
send one.

This does make it somewhat awkward when analyzing traces since it is
non-intuitive when keepalives are sent and when they are not sent.

Change the keepalive logic to always send a keepalive regardless of
whether the link is idle or not.

(This used to be ctdb commit 7a18f33ec7512100dd067c65f0470889ff8fd591)
2008-11-20 13:35:08 +11:00
Ronnie Sahlberg
94a56ea410 rewrite the handling of flag updates across the cluster to eliminate a
race between the ctdb tool and the recovery daemon both at once
trying to push flag changes across the cluster.

(This used to be ctdb commit a9a1156ea4e10483a4bf4265b8e9203f0af033aa)
2008-11-20 12:43:18 +11:00
Ronnie Sahlberg
06728fdac9 we actually need a ctdb_db variable
(This used to be ctdb commit aba984f1b85f5a2d370b093061cf15843ee53758)
2008-11-03 21:54:52 +11:00
Ronnie Sahlberg
d7007793ea latency is measured in us, not ms
use an explicit ctdb_db variable instead of dereferencing state

(This used to be ctdb commit 8c6a02fb423a8cbcbfc706767e3d353cd48073c3)
2008-10-30 13:34:10 +11:00
Ronnie Sahlberg
e1b0cea427 add control and logging of very high latencies.
log the type of operation and the database name for all latencies higher
than a threshold

(This used to be ctdb commit 1d581dcd507e8e13d7ae085ff4d6a9f3e2aaeba5)
2008-10-30 12:49:53 +11:00
Ronnie Sahlberg
b9bd20ce55 add a context and a timed event so that once we have been in recovery
mode for too long we drop all public ip addresses

(This used to be ctdb commit 403c68f96e1380dd07217c688de2730464f77ea0)
2008-10-22 11:04:41 +11:00
Ronnie Sahlberg
beed899c4f null out the pointer before we reload the nodes file
(This used to be ctdb commit 4b0f32047e8bece0a052bdbe2209afe91b7e8ce3)
2008-10-17 21:38:42 +11:00
Ronnie Sahlberg
a924ef78b6 when we reload the nodes file, we may need to reload the nodes file
inside the recovery daemon as well.

(This used to be ctdb commit 82fd2b6b5cd8e988c38fa6b74121a048757bdeef)
2008-10-17 21:18:06 +11:00
Ronnie Sahlberg
ce66008e08 specify a "script log level" on the commandline to set under which log
level any/all output from eventscripts will be logged as

(This used to be ctdb commit cdc79d4f22f1a6aec5c34115969421f93663932a)
2008-10-17 07:56:12 +11:00
Ronnie Sahlberg
5808a7be96 allow multiple eventscripts using the same prefix.
this eases the pain for users that use out of tree eventscripts

(This used to be ctdb commit 8313dfb6fc5404cd2d065af6620412f8664ada11)
2008-10-16 17:57:50 +11:00
Ronnie Sahlberg
233b0e5cbb lower the loglevel for the informational message that a TCP_ADD operation
described an ip address not known to be a public address.

This could happen if someone for genuine reasons accesses a share
through a static ip address.
It can also happen if non-homogeneous public address configurations are
used and when a tcp description is pushed out to a different node that
does not serve/know the specific ip address.

(This used to be ctdb commit 9b1d089c99413f3681440f3cf33c293d118c9108)
2008-10-15 03:02:09 +11:00
Ronnie Sahlberg
41d19e650c Revert "from Mathieu Parent <math.parent@gmail.com>"
This reverts commit dc9cd4779db4a89697731e4cf415be51067a07c1.

Conflicts:

(This used to be ctdb commit d13da2e8fe2fab619540525d98a5502a23ab7d20)
2008-10-15 01:08:29 +11:00
Ronnie Sahlberg
cb300382b0 update TAKEIP/RELEASEIP/GETPUBLICIP/GETNODEMAP controls so we retain an
older ipv4-only version of these controls.

We need this so that we are backward compatible with old versions of ctdb
and so that we can interoperate with an ipv4-only recmaster during a
rolling upgrade.

(This used to be ctdb commit 6b76c520f97127099bd9fbaa0fa7af1c61947fb7)
2008-10-14 10:40:29 +11:00
Ronnie Sahlberg
e5a3a73e64 from Mathieu Parent <math.parent@gmail.com>
Hi,

I have attached a patch necessary as debian log dir (/var/log) is not
a subdir of VARDIR (/var/lib on rpm systems, /var/lib/ctdb on debian).
As I don't know much about autotools and friends, this patch may be
hacky.

This is part of the process to minimize diff between distributions.

(This used to be ctdb commit dc9cd4779db4a89697731e4cf415be51067a07c1)
2008-10-13 08:27:33 +11:00
Ronnie Sahlberg
3411e98e14 skip empty lines in the public addresses file, instead of skipping all
non-empty lines

(This used to be ctdb commit dc108adada33bb713f71a2859eda3b439ed0cd1a)
2008-10-07 19:34:34 +11:00
Ronnie Sahlberg
374906860c from Michael Adams : allow #-style comments in the nodes and public
addresses file

(This used to be ctdb commit 5f96b33a379c80ed8a39de1ee41f254cf48733f9)
2008-10-07 19:25:10 +11:00
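The two commits above describe how lines in the nodes and public addresses files are filtered. A minimal sketch of that filtering, assuming comments occupy a whole line (the commit messages do not say whether trailing comments are allowed); `parse_address_lines` is a hypothetical helper, not a ctdb function:

```python
def parse_address_lines(text):
    """Return the meaningful lines of a nodes/public-addresses style file.

    Assumptions for illustration: empty lines are skipped, and a line
    whose first non-blank character is '#' is treated as a comment.
    """
    entries = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # skip empty lines (and keep the non-empty ones)
        if line.startswith("#"):
            continue  # skip #-style comment lines
        entries.append(line)
    return entries
```

For example, `parse_address_lines("10.0.0.1\n\n# spare node\n10.0.0.2\n")` keeps only the two address lines.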
Ronnie Sahlberg
46187433ca remove an unused variable
(This used to be ctdb commit 4237bd3753dcb024c17461e974414bef1b609416)
2008-10-07 18:14:44 +11:00
Ronnie Sahlberg
1778280d50 When we reload the nodes file
instead of shutting down/restarting the entire tcp layer
just bounce all outgoing connections and reconnect

(This used to be ctdb commit e701a531868149f16561011e65794a4a46ee6596)
2008-10-07 18:12:54 +11:00
Ronnie Sahlberg
3e274e5f8c use the correct tunable failcount not timeout
(This used to be ctdb commit 475cfada33b4c13aaaca773d5485bbe26bffbf46)
2008-09-17 14:24:12 +10:00
Ronnie Sahlberg
a3bbe238c9 The ctdb daemon keeps track of whether the recovery process is running
correctly by measuring how long it was since the last successful
communication with the recovery daemon was recorded.

After a certain timeout the ctdb daemon would deem the recovery daemon
as inoperable and shut down.

If the system clock is suddenly changed forward by many (60 or more)
seconds this could cause the timeout to trigger prematurely/immediately
where ctdb would incorrectly think that more than 60 seconds had passed
since last successful communications and thus abort.

Instead of checking for one timeout occurring, only deem the recovery
daemon to be "down" and trigger a shutdown if communications have
timed out for three intervals in a row.

(This used to be ctdb commit 196968c552e6ebcb57389d769a4b25f42fa8bc5d)
2008-09-17 14:17:41 +10:00
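The "three intervals in a row" rule from the commit above can be sketched like this. It is an illustration under stated assumptions, not the ctdb implementation: the class name, the 60 second interval and the method names are all hypothetical, and timestamps are passed in explicitly so the behaviour is easy to follow.

```python
class RecoveryDaemonWatchdog:
    """Hypothetical watchdog that tolerates a single forward clock jump."""

    def __init__(self, now, interval=60.0, max_missed=3):
        self.interval = interval      # assumed timeout per check, in seconds
        self.max_missed = max_missed  # consecutive timeouts before shutdown
        self.last_contact = now
        self.missed = 0

    def contact(self, now):
        """Record a successful communication with the recovery daemon."""
        self.last_contact = now
        self.missed = 0

    def check(self, now):
        """Periodic check; returns True when a shutdown should be triggered."""
        if now - self.last_contact > self.interval:
            self.missed += 1  # this interval timed out
        else:
            self.missed = 0   # contact was recent enough
        return self.missed >= self.max_missed
```

With this shape, a sudden clock jump produces at most one missed interval before the next successful contact resets the counter, while a recovery daemon that is really hung keeps missing intervals until the limit is reached.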
Ronnie Sahlberg
ad56356005 fix a slow memory leak in the recovery daemon in the error paths for the
memdump function

(This used to be ctdb commit 5e641ef9d6cca286061138a9680dcf2495736e8b)
2008-09-16 09:00:48 +10:00
Ronnie Sahlberg
7b718fffd7 fix some slow memory leaks in the vacuuming handler in the recovery
daemon

(This used to be ctdb commit 95bf36559d62f29e6f538f3a173b504ef3258341)
2008-09-16 07:55:57 +10:00
Ronnie Sahlberg
ab3649155a From Volker L
Fix a slow memory leak in the recovery daemon if there is a recovery
triggered during the public ip reassignment process

(This used to be ctdb commit 0aca4daf908b76d6013ff3dfad41beb9114fc1a3)
2008-09-16 06:50:28 +10:00
Ronnie Sahlberg
3bedb7f6d1 lower the debug level for when printing that the nodeflags have changed
(This used to be ctdb commit a89977f8cb2463a87147dcc0ad936cb5d4131670)
2008-09-09 13:55:31 +10:00
Ronnie Sahlberg
6474f3278d additional monitoring between the two daemons.
we currently only monitor that the daemons are running by kill(0, pid)
and verifying that the domain socket between them is ok.

this is not sufficient since we can have a situation where the recovery
daemon is hung.

this new code monitors that the recovery daemon is operating.
if the recovery daemon hangs, we log this and shut down the main daemon

(This used to be ctdb commit cd69d292292eaab3aac0e9d9fc57cb621597c63c)
2008-09-09 13:44:46 +10:00
Ronnie Sahlberg
70c7525a02 zero out the address structure to keep valgrind happy
(This used to be ctdb commit 8060e591b0eb2d184b5a7444487477225d2e1dbf)
2008-08-29 12:26:02 +10:00
Ronnie Sahlberg
a35fa0aa8f rename ctdb_tcp_client back to the original name ctdb_control_tcp
(This used to be ctdb commit 4d1c0418cfe6170bc081684dbe45908a5d285f0b)
2008-08-27 10:24:35 +10:00
Ronnie Sahlberg
eb23d7b6d4 we must canonicalize the sockaddr structures in killtcp so that we do the necessary downgrade if required
(This used to be ctdb commit 2f8b33948e395228cbac3450c0c684e49069abf0)
2008-08-20 12:02:54 +10:00
Ronnie Sahlberg
ef997d344f initial ipv6 patch
Signed-off-by: Ronnie Sahlberg <ronniesahlberg@gmail.com>

(This used to be ctdb commit 1f131f21386f428bbbbb29098d56c2f64596583b)
2008-08-19 14:58:29 +10:00
Andrew Tridgell
76528cfc6b fixed a memory leak in the recovery daemon
thanks to vl for spotting this

(This used to be ctdb commit 96df98d9f86ecc6bb1a458eb2101e5c1bc0f96e6)
2008-08-11 23:33:05 +10:00
Andrew Tridgell
1431210d46 fixed send of release IP message
(This used to be ctdb commit db6bc3745a56cc12e60e727190a098a6527690d6)
2008-08-08 22:06:39 +10:00
Andrew Tridgell
aa1bc0abba added a new control CTDB_CONTROL_TRANS2_COMMIT_RETRY so we can tell
the difference between an initial commit attempt and a retry, which
allows us to get the persistent updates counter right for retries

(This used to be ctdb commit 7f29c50ccbc7789bfbc20bcb4b65758af9ebe6c5)
2008-08-08 13:11:28 +10:00
Andrew Tridgell
5a0249d34c return a more detailed error code from a trans2 commit error
(This used to be ctdb commit 6915661a460cd589b441ac7cd8695f35c4e83113)
2008-08-08 09:58:49 +10:00
Andrew Tridgell
66d154ef5f Merge commit 'ronnie/1.0.53'
(This used to be ctdb commit 58e6dc722ad1e2415b71baf1d471885169dde14d)
2008-08-08 00:48:19 +10:00
Andrew Tridgell
5ee51ae84e fixed a looping error bug with the new transactions code
(This used to be ctdb commit 0592ba2a4fbd1b3b7a6bd0780eadbd6d449baaad)
2008-08-08 00:44:33 +10:00
Ronnie Sahlberg
31fcc1bbb2 Merge git://git.samba.org/tridge/ctdb
(This used to be ctdb commit 66c61137a5c01afcbae329ffbe121e78ae087399)
2008-08-07 18:50:48 +10:00
Andrew Tridgell
bbedba23c7 cover some corner cases where the persistent database could become
inconsistent

(This used to be ctdb commit c76c214be401cb116265ed17ffe6c77c979ded82)
2008-08-07 13:34:18 +10:00
Ronnie Sahlberg
b9d8bb23af remove the reclock file we store pnn counts in.
This file creates additional locking stress on the backend filesystem and we may not need it anyway.

(This used to be ctdb commit 84236e03e40bcf46fa634d106903277c149a734f)
2008-08-06 11:52:26 +10:00
Andrew Tridgell
78acc59784 implemented replayable transactions in ctdb to prevent deadlock
(This used to be ctdb commit b6d9a0396fb4b325778d3810dc656f719f31b9f1)
2008-08-04 14:51:51 +10:00
Andrew Tridgell
cf739ac892 renamed the pulldb structure to a ctdb_marshall_buffer
(This used to be ctdb commit bad53b2d342bb9760497e6f4a61e64ca50d6e771)
2008-07-30 19:59:18 +10:00
Andrew Tridgell
ca3eaf87e1 make sure we honor the TDB_NOSYNC flag from clients in the server
(This used to be ctdb commit 9806d18b93218c216d538e28f9ed495269f0a938)
2008-07-30 19:58:49 +10:00
Andrew Tridgell
98502135e7 added new multi-record transaction commit code
(This used to be ctdb commit 9ff3380099fe6f4d39de126db0826971a10ee692)
2008-07-30 19:57:00 +10:00
Andrew Tridgell
abe0232818 rename the structure we use for marshalling multiple records
(This used to be ctdb commit 4d205476d286570a6e1f52b59af42858ce051106)
2008-07-30 14:24:56 +10:00
Andrew Tridgell
79793708a4 fixed buffering in ctdb logging code to handle multiple lines
correctly

(This used to be ctdb commit e8ef9891aa31c374921b23cc74e1eda1f8218bf0)
2008-07-23 15:25:52 +10:00
Ronnie Sahlberg
1bfcca524d From Michael Adams,
change one element from private to private_data

Signed-off-by: Ronnie Sahlberg <ronniesahlberg@gmail.com>

(This used to be ctdb commit 0de79352c9b36c118e36905f08ebbe38ecbb957e)
2008-07-22 09:07:42 +10:00