1
0
mirror of https://github.com/samba-team/samba.git synced 2024-12-25 23:21:54 +03:00
Commit Graph

1365 Commits

Author SHA1 Message Date
Andrew Tridgell
ec5995221f fixed order of changelog
(This used to be ctdb commit 05940cc8a7c7e75b976f2f0151d03fdf63c59395)
2007-12-27 10:19:09 +11:00
Andrew Tridgell
d116235197 updated release info
(This used to be ctdb commit 657aac41b2c2f7e4d53e5709d4eb8dbd9c5f5616)
2007-12-27 10:13:54 +11:00
Andrew Tridgell
2a2f1e3d91 fixed segv on failed ctdb_ctrl_getnodemap
(This used to be ctdb commit 5daf9a72f0e60a9af7cf32ae6d759be7d94857ec)
2007-12-27 10:07:01 +11:00
Andrew Tridgell
36fd19774a update release number and changelog
(This used to be ctdb commit fa9723e1e43cdbb5c0c3c31fd79c50aa1298ba3d)
2007-12-04 15:50:43 +11:00
Andrew Tridgell
6ef3bff4ed merge from ronnie
(This used to be ctdb commit 072ef744951d3aa59dd8be70578b99b18c37d988)
2007-12-04 15:20:40 +11:00
Andrew Tridgell
a55c3709ea make DeterministicIPs the default
(This used to be ctdb commit e7d077e98a40a62dbd6bfd174f29afba7b5529ef)
2007-12-04 15:18:27 +11:00
Ronnie Sahlberg
7cef33b40a rework banning/unbanning nodes
ctdb_recoverd.c
Always handle banning/unbanning locally on the node that is being 
banned/unbanned instead of on the recovery master.
This means that if a ban request comes in to the recovery master for a 
remote node, we pass the request on to the remote node instead of 
setting up the ban and ban timeouts locally.

ctdb.c
send ban/unban requests to the node being banned/unbanned instead of to 
the recmaster

(This used to be ctdb commit 880dd9f5fd0b91e450da93e195cc5c62cb1dcd6e)
2007-12-03 15:45:53 +11:00
Ronnie Sahlberg
64008e28bb for the banned status, we should allocate this structure as a child of
the banned_nodes array and not the rec structure so that  ban_state is 
destroyed when the banned_nodes array gets destroyed
(and so that when this struct is destroyed, that any pending 
ctdb_ban_timeout events are also destroyed.)

othervise we may end up with multiple ban_timeout timed events going in 
parallell since we destroy/recreate the banned_nodes structure during 
election   but we never destroy/recreate the rec structure.

(This used to be ctdb commit fbd663d56a2a4421a5c0e541962c87e2e9c7cd82)
2007-12-03 11:39:17 +11:00
Ronnie Sahlberg
ad6abacca7 merge from tridge
(This used to be ctdb commit db2f9197ede28cc19c190c38e977bff09f13b729)
2007-12-03 10:21:45 +11:00
Andrew Tridgell
7edb41692e merge from ronnie
(This used to be ctdb commit 6653a0b67381310236e548e5fc0a9e27209b44e0)
2007-12-03 10:19:24 +11:00
Ronnie Sahlberg
2f1baf34d3 up the loglevel for the enable/disable monitoring to level 1
(This used to be ctdb commit 5043a0afeedbd30c7f64c2733c8ae5bf75479a98)
2007-12-01 10:06:42 +11:00
Ronnie Sahlberg
07dd0f6ff0 log that monitoring has been "disabled" not that it has been "stopped"
when monitoring is disabled

(This used to be ctdb commit e7c92f661a523deae9544b679d412ae79cc0ede7)
2007-11-30 10:53:35 +11:00
Ronnie Sahlberg
975fbc8e22 always set up a new monitoring event regardless of whether monitoring is
enabled or not

(This used to be ctdb commit c3035f46d1a65d2d97c8be7e679d59e471c092c2)
2007-11-30 10:14:43 +11:00
Ronnie Sahlberg
50573c5391 add ctdb_disable/enable_monitoring() that only modifies the monitoring
flag.
change calling of the recovered/takeip/releaseip event scripts to use 
these enable/disable functions instead of stopping/starting monitoring.

when we disable monitoring we want all events to still be running
in particular the events to monitor for dead nodes  and we only want to 
supress running the monitor event scripts

(This used to be ctdb commit a006dcc4f75aba950dd701ad7d1a84e89df285e8)
2007-11-30 10:09:54 +11:00
Ronnie Sahlberg
0eb6c04dc1 get rid of the control to set the monitoring mode.
monitoring should always be enabled
(though a node may want to temporarily disable running the "monitor"
event scripts but can do so internally without the need for this 
control)

(This used to be ctdb commit e3a33618026823e6af845fd8513cddb08e6b5584)
2007-11-30 10:00:04 +11:00
Ronnie Sahlberg
192ba82b73 ->monitor_context is NULL when monitoring is disabled.
Check whether monitoring is enabled or not before creating new events
and log why the event is not set up othervise

(This used to be ctdb commit 2f352b2606c04a65ce461fc2e99e6d6251ac4f20)
2007-11-30 09:02:37 +11:00
Ronnie Sahlberg
8ac8cce487 dont manipulate ctdb->monitoring_mode directly from the SET_MON_MODE
control, instead call ctdb_start/stop_monitoring()

ctdb_stop_monitoring() dont allocate a new monitoring context, leave it 
NULL. Also set the monitoring_mode in this function so that 
ctdb_stop/start_monitoring() and ->monitoring_mode are kept in sync.
Add a debug message to log that we have stopped monitoring.

ctdb_start_monitoring()  check whether monitoring is already active and 
make the function idempotent.
Create the monitoring context when monitoring is started.
Update ->monitoring_mode once the monitoring has been started.
Add a debug message to log that we have started monitoring.

When we temporarily stop monitoring while running an event script,
restart monitoring after the event script wrapper returns instead of in 
the event script callback.

Let monitoring_mode start out as DISABLED and let it be enabled once we call ctdb_start_monitoring.

dont check for MONITORING_DISABLED in check_fore_dead_nodes(). If 
monitoring is disabled, this event handler will not be called.

(This used to be ctdb commit 3a93ae8bdcffb1adbd6243844f3058fc742f76aa)
2007-11-30 08:44:34 +11:00
Ronnie Sahlberg
5c3a270991 move ctdb_set_culprit higher up in the file
when we are the recmaster and we update the local flags for all the 
nodes, if one of the nodes fail to respond and give us his flags,
set that node as a "culprit"

as one of the first things to do in the monitor_cluster loop, check if 
the current culprit has caused too many (20) failures and if so ban that 
node.


this is for the situation where a remote node may still be CONNECTED but 
it fails to respond to the getnodemap control  causing the recovery 
master to loop in monitor_cluster   aborting the monitoring when the 
node fails to respond   but before anything will trigger a call to 
do_recovery().
If one or more of the databases or nodes are frozen at this stage, this 
would lead to smbd being blocked for potentially a longish time.

(This used to be ctdb commit 83b0261f2cb453195b86f547d360400103a8b795)
2007-11-28 15:04:20 +11:00
Ronnie Sahlberg
9e73dc87cc Add a --node-ip argument so that one can specify which ip address a
specific instance of ctdbd should bind to. This helps when running a
"virtual" cluster on a single machine where all instcances bind to 
different alias interfaces.

If --node-ip is specified, then we will only try to bind to this ip 
address only. Othervise we fall back to the original method trying the
ip addresses in /etc/ctdb/nodes one by one until we find one we can bind 
to.

No variable in /etc/sysconfig/ctdb added since this parameter only makes 
sense in a virtual test/debug cluster.

(This used to be ctdb commit d96cb02c2c24f9eabbc53d3d38e90dea49cff3e0)
2007-11-26 10:52:55 +11:00
Ronnie Sahlberg
0597be3386 when monitoring the node from the recovery daemon, check that the
recovery daemon and the ctdb daemon both agree on whether the node is 
banned or not   and if they disagree then reban the node again after 
logging an error to the debug log

(This used to be ctdb commit 6cd6e534493066edd4bb2c6ae5be0e9a9d495aa0)
2007-11-23 12:41:29 +11:00
Ronnie Sahlberg
a260145f9f check for recursive bans in ctdb_ban_node() and remove the previous ban
if this is an attempt to ban an already banned node

(This used to be ctdb commit 214f2d7b04d0a491d466fc85c8d016efde416f9e)
2007-11-23 12:38:37 +11:00
Ronnie Sahlberg
6b284e5905 add log output for when ctdb_ban_node() and ctdb_unban_node() are called
when these functions are called to ban or unban a node make sure we 
update the CTDB_NODE_BANNED flag in rec->node_flags since this field and
flag are checked during the election process

(This used to be ctdb commit 740c632ae96a2d34327d1b575780aaf079d93f4f)
2007-11-23 12:36:14 +11:00
Ronnie Sahlberg
b5e79fb06f If update_local_flags() finds that a node has changed its BANNED status
so it differs from what the local ctdb daemon on the recovery master 
thinks it should be  we should call for a re-election

(This used to be ctdb commit 21ad6039c31ef5cc0e40a35a41220f91943947cb)
2007-11-23 11:53:06 +11:00
Ronnie Sahlberg
b2a81fb6b1 when we as the recovery daemon on the recovery master detects that the
flags differ between the local ctdb daemon and the remote node
we can force a flags update on all nodes and not just the local daemon

(This used to be ctdb commit a924eb89c966ecbae029ca137e06cffd40cc70fd)
2007-11-23 11:31:42 +11:00
Ronnie Sahlberg
af5bc9b915 add an extra log if we get a modflags control but it doesnt change any
flags


in update_local_flags()
(this is only called if we are or we belive we are the recmaster)
when we detect that the flags of a remote node is different from what 
our local node thinks the flags should be for that remote node
we should send a node-flag-changed message to the local daemon so 
that it updates the flags for that node.

(This used to be ctdb commit 36225e4e271f7a4065398253747fb20054f99a53)
2007-11-23 10:52:29 +11:00
Ronnie Sahlberg
c36ce05d08 if we get a modflag control but the flags remain unchanged, log this
(This used to be ctdb commit 5a0cd9b37b21665054bd35facd87f0a6ff4dcd55)
2007-11-23 10:31:51 +11:00
Ronnie Sahlberg
e95a4b5cdb when we print "Remote node had flags xx local had flags xx
we swapped the flags when printing them to the log

(This used to be ctdb commit 9fc8831a7fcd34763567227d61cd525ec441ebf2)
2007-11-23 09:54:38 +11:00
Ronnie Sahlberg
6b5cfde016 merge from tridge
(This used to be ctdb commit f9e3531747d293711016bce99f08f42366c9d85b)
2007-11-23 09:51:41 +11:00
Andrew Tridgell
330bf59ab1 increase release number
(This used to be ctdb commit 1178fcce1a701441d43d4ed959f0ba6b50a5b07d)
2007-11-18 15:15:19 +11:00
Andrew Tridgell
74b1678d9d - merge from ronnie
- auto-detect CTDB_MANAGES_WINBIND from smb.conf if not set

(This used to be ctdb commit 3d675c7bcedbd483c923df54d1af068758edc206)
2007-11-18 15:14:54 +11:00
Andrew Tridgell
4608fcc3a4 need public_addresses for test suite
(This used to be ctdb commit 6d79994eace4802ab72dda2793028264c47d2d56)
2007-11-18 15:01:26 +11:00
Ronnie Sahlberg
b09d3de759 from Christian A
when monitoring that all nfs shares are available, allow both ' ' and 
'\t' characters to separate the exported directory from the options
in /etc/exports

(This used to be ctdb commit ac6cfe9de0acdcf9461068684fa890504454aae4)
2007-11-16 13:37:27 +11:00
Ronnie Sahlberg
9f4b0dab03 only check port 21 when monitoring vsftpd
(This used to be ctdb commit 41b0d71aaee186138eddc97d49503841fa26f234)
2007-11-15 06:56:02 +11:00
Ronnie Sahlberg
dfa6829621 add CTDB_MANAGES_WINBIND to /etc/sysconfig/ctdb to allow ctdb to be used
in environments where samba is used without winbind

(This used to be ctdb commit 1ae5af14f90fd81a20b14c02c0c5ad355a609134)
2007-11-14 16:17:52 +11:00
Ronnie Sahlberg
04ff06d135 merge from tridge
(This used to be ctdb commit 8e5d488ae2f7cae2a7a6386ed85a3d26f7d39261)
2007-11-13 13:27:00 +11:00
Andrew Tridgell
0943fba979 make it easier to test starting large numbers of virtual nodes
(This used to be ctdb commit cf61bf8b8806d29772985c904d5ee15c24d4d767)
2007-11-13 10:28:06 +11:00
Andrew Tridgell
45f0fdfc20 make election handling much more scalable
(This used to be ctdb commit 05938d462b92bd9ecb8e35f53651bded47c48675)
2007-11-13 10:27:44 +11:00
Ronnie Sahlberg
3d98766334 merge from tridge
(This used to be ctdb commit f2c8ef106e41c38f73adfc196b6cc328174fbd58)
2007-11-13 07:38:58 +11:00
Andrew Tridgell
3427793f01 don't do the first startup event until we are out of recovery
(This used to be ctdb commit 689940eb6e23f16ee063331caf3986613a8963ea)
2007-11-12 13:10:15 +11:00
Ronnie Sahlberg
71dfab2e31 merge from tridge
(This used to be ctdb commit 6ccf7a2545c57545111a6236c9b4b493b8464060)
2007-11-12 12:28:20 +11:00
Andrew Tridgell
bde886988b prevent a deadly embrace between smbd and ctdbd by moving the calling
of the startup event scripts after the point where recovery has
started and the node is in normal operation

This makes the 'startup' script just a special type of the 'monitor'
script which is called first

(This used to be ctdb commit 7424c30a5fd04aea0137c466b4318c3f185280d8)
2007-11-12 10:53:11 +11:00
Ronnie Sahlberg
3c1f9882a8 revert 773
(This used to be ctdb commit 5a1c8f458ddc9b0ff532afda6007e32db10a71c8)
2007-11-12 10:23:35 +11:00
Ronnie Sahlberg
df5dd43e7c add a new tunable "CheckNodesFile" that when set to 0 will disable the
check in the recovery daemon that all nodes are using the same 
/etc/ctdb/nodes file.

Also add some more missing checks that the pnn used is a valid pnn 
before using it to dereferencing the ctdb->nodes array


This is useful since it allows us to add more physical nodes to a an 
existing cluster without having to bring down the entire cluster.

The to add an additional node to an existing cluster would then be
1, on all nodes set CheckNodesFile=0 using 'ctdb setvar'
2, on all nodes add CTDB_SET_CheckNodesFile=0 to /etc/sysconfig/ctdb
For each each node, one at a time :
3, use 'ctdb disable' to stop the hosted services
4, service ctdb stop
5, service ctdb start
Once all nodes have been restarted 
6, on all nodes remove CTDB_SET_CheckNodesFile=0 from 
/etc/sysconfig/ctdb
7, on all nodes set CheckNodesFile=0 using 'ctdb setvar'

8, configure and start up the new node

During this procedure, only one node at a time was brought 
down/restarted and was so only for a short period.

(This used to be ctdb commit 462501a32143e943ce350bd904a47c0955414a51)
2007-11-05 13:36:11 +11:00
Andrew Tridgell
82bd652749 patch from michael adam
(This used to be ctdb commit a7a3bef90f033bab5cb110a6ef77a8bef48f2588)
2007-11-02 13:20:29 +11:00
Ronnie Sahlberg
8b1ad1073b merge from tridge
(This used to be ctdb commit 10302eeecc36c4ce94a4e2e0e57864be790325da)
2007-11-01 09:00:14 +11:00
Andrew Tridgell
29e48fe54a increase release number
(This used to be ctdb commit dc648b1bb6becc52dcf900add97418a5634367eb)
2007-10-30 10:19:43 +11:00
Andrew Tridgell
684282f7a1 added bonding info to ctdb_diagnostics
(This used to be ctdb commit 71b5fc434bc5d88eb0669ee29aa932ba12737e07)
2007-10-30 10:18:52 +11:00
Andrew Tridgell
87bfa7d61e merge from ronnie
(This used to be ctdb commit 22b110549ff35f2560043abd5d85bed4b35295ee)
2007-10-29 13:43:12 +11:00
root
2a70ac8801 the while loop in the startup event runs as a subshell so we need an extra || exit 1 at the end
to propagate the error code back to the caller of the script

(This used to be ctdb commit c30d5c328784059949f5e82a07008e9632234f20)
2007-10-29 12:34:45 +11:00
Ronnie Sahlberg
8599f2008d if bond* interfaces are used as public interfaces we can not rely on ethtool but
have to check /proc for the status instead

(This used to be ctdb commit 4ed7747267aea265b7a71c651abf6d5db4f4718b)
2007-10-29 10:51:16 +11:00