1
0
mirror of https://github.com/samba-team/samba.git synced 2024-12-24 21:34:56 +03:00
Commit Graph

1355 Commits

Author SHA1 Message Date
Andrew Tridgell
d116235197 updated release info
(This used to be ctdb commit 657aac41b2c2f7e4d53e5709d4eb8dbd9c5f5616)
2007-12-27 10:13:54 +11:00
Andrew Tridgell
2a2f1e3d91 fixed segv on failed ctdb_ctrl_getnodemap
(This used to be ctdb commit 5daf9a72f0e60a9af7cf32ae6d759be7d94857ec)
2007-12-27 10:07:01 +11:00
Andrew Tridgell
36fd19774a update release number and changelog
(This used to be ctdb commit fa9723e1e43cdbb5c0c3c31fd79c50aa1298ba3d)
2007-12-04 15:50:43 +11:00
Andrew Tridgell
6ef3bff4ed merge from ronnie
(This used to be ctdb commit 072ef744951d3aa59dd8be70578b99b18c37d988)
2007-12-04 15:20:40 +11:00
Andrew Tridgell
a55c3709ea make DeterministicIPs the default
(This used to be ctdb commit e7d077e98a40a62dbd6bfd174f29afba7b5529ef)
2007-12-04 15:18:27 +11:00
Ronnie Sahlberg
7cef33b40a rework banning/unbanning nodes
ctdb_recoverd.c
Always handle banning/unbanning locally on the node that is being 
banned/unbanned instead of on the recovery master.
This means that if a ban request comes in to the recovery master for a 
remote node, we pass the request on to the remote node instead of 
setting up the ban and ban timeouts locally.

ctdb.c
send ban/unban requests to the node being banned/unbanned instead of to 
the recmaster

(This used to be ctdb commit 880dd9f5fd0b91e450da93e195cc5c62cb1dcd6e)
2007-12-03 15:45:53 +11:00
Ronnie Sahlberg
64008e28bb for the banned status, we should allocate this structure as a child of
the banned_nodes array and not the rec structure so that  ban_state is 
destroyed when the banned_nodes array gets destroyed
(and so that when this struct is destroyed, that any pending 
ctdb_ban_timeout events are also destroyed.)

othervise we may end up with multiple ban_timeout timed events going in 
parallell since we destroy/recreate the banned_nodes structure during 
election   but we never destroy/recreate the rec structure.

(This used to be ctdb commit fbd663d56a2a4421a5c0e541962c87e2e9c7cd82)
2007-12-03 11:39:17 +11:00
Andrew Tridgell
7edb41692e merge from ronnie
(This used to be ctdb commit 6653a0b67381310236e548e5fc0a9e27209b44e0)
2007-12-03 10:19:24 +11:00
Ronnie Sahlberg
2f1baf34d3 up the loglevel for the enable/disable monitoring to level 1
(This used to be ctdb commit 5043a0afeedbd30c7f64c2733c8ae5bf75479a98)
2007-12-01 10:06:42 +11:00
Ronnie Sahlberg
07dd0f6ff0 log that monitoring has been "disabled" not that it has been "stopped"
when monitoring is disabled

(This used to be ctdb commit e7c92f661a523deae9544b679d412ae79cc0ede7)
2007-11-30 10:53:35 +11:00
Ronnie Sahlberg
975fbc8e22 always set up a new monitoring event regardless of whether monitoring is
enabled or not

(This used to be ctdb commit c3035f46d1a65d2d97c8be7e679d59e471c092c2)
2007-11-30 10:14:43 +11:00
Ronnie Sahlberg
50573c5391 add ctdb_disable/enable_monitoring() that only modifies the monitoring
flag.
change calling of the recovered/takeip/releaseip event scripts to use 
these enable/disable functions instead of stopping/starting monitoring.

when we disable monitoring we want all events to still be running
in particular the events to monitor for dead nodes  and we only want to 
supress running the monitor event scripts

(This used to be ctdb commit a006dcc4f75aba950dd701ad7d1a84e89df285e8)
2007-11-30 10:09:54 +11:00
Ronnie Sahlberg
0eb6c04dc1 get rid of the control to set the monitoring mode.
monitoring should always be enabled
(though a node may want to temporarily disable running the "monitor"
event scripts but can do so internally without the need for this 
control)

(This used to be ctdb commit e3a33618026823e6af845fd8513cddb08e6b5584)
2007-11-30 10:00:04 +11:00
Ronnie Sahlberg
192ba82b73 ->monitor_context is NULL when monitoring is disabled.
Check whether monitoring is enabled or not before creating new events
and log why the event is not set up othervise

(This used to be ctdb commit 2f352b2606c04a65ce461fc2e99e6d6251ac4f20)
2007-11-30 09:02:37 +11:00
Ronnie Sahlberg
8ac8cce487 dont manipulate ctdb->monitoring_mode directly from the SET_MON_MODE
control, instead call ctdb_start/stop_monitoring()

ctdb_stop_monitoring() dont allocate a new monitoring context, leave it 
NULL. Also set the monitoring_mode in this function so that 
ctdb_stop/start_monitoring() and ->monitoring_mode are kept in sync.
Add a debug message to log that we have stopped monitoring.

ctdb_start_monitoring()  check whether monitoring is already active and 
make the function idempotent.
Create the monitoring context when monitoring is started.
Update ->monitoring_mode once the monitoring has been started.
Add a debug message to log that we have started monitoring.

When we temporarily stop monitoring while running an event script,
restart monitoring after the event script wrapper returns instead of in 
the event script callback.

Let monitoring_mode start out as DISABLED and let it be enabled once we call ctdb_start_monitoring.

dont check for MONITORING_DISABLED in check_fore_dead_nodes(). If 
monitoring is disabled, this event handler will not be called.

(This used to be ctdb commit 3a93ae8bdcffb1adbd6243844f3058fc742f76aa)
2007-11-30 08:44:34 +11:00
Ronnie Sahlberg
5c3a270991 move ctdb_set_culprit higher up in the file
when we are the recmaster and we update the local flags for all the 
nodes, if one of the nodes fail to respond and give us his flags,
set that node as a "culprit"

as one of the first things to do in the monitor_cluster loop, check if 
the current culprit has caused too many (20) failures and if so ban that 
node.


this is for the situation where a remote node may still be CONNECTED but 
it fails to respond to the getnodemap control  causing the recovery 
master to loop in monitor_cluster   aborting the monitoring when the 
node fails to respond   but before anything will trigger a call to 
do_recovery().
If one or more of the databases or nodes are frozen at this stage, this 
would lead to smbd being blocked for potentially a longish time.

(This used to be ctdb commit 83b0261f2cb453195b86f547d360400103a8b795)
2007-11-28 15:04:20 +11:00
Ronnie Sahlberg
9e73dc87cc Add a --node-ip argument so that one can specify which ip address a
specific instance of ctdbd should bind to. This helps when running a
"virtual" cluster on a single machine where all instcances bind to 
different alias interfaces.

If --node-ip is specified, then we will only try to bind to this ip 
address only. Othervise we fall back to the original method trying the
ip addresses in /etc/ctdb/nodes one by one until we find one we can bind 
to.

No variable in /etc/sysconfig/ctdb added since this parameter only makes 
sense in a virtual test/debug cluster.

(This used to be ctdb commit d96cb02c2c24f9eabbc53d3d38e90dea49cff3e0)
2007-11-26 10:52:55 +11:00
Ronnie Sahlberg
0597be3386 when monitoring the node from the recovery daemon, check that the
recovery daemon and the ctdb daemon both agree on whether the node is 
banned or not   and if they disagree then reban the node again after 
logging an error to the debug log

(This used to be ctdb commit 6cd6e534493066edd4bb2c6ae5be0e9a9d495aa0)
2007-11-23 12:41:29 +11:00
Ronnie Sahlberg
a260145f9f check for recursive bans in ctdb_ban_node() and remove the previous ban
if this is an attempt to ban an already banned node

(This used to be ctdb commit 214f2d7b04d0a491d466fc85c8d016efde416f9e)
2007-11-23 12:38:37 +11:00
Ronnie Sahlberg
6b284e5905 add log output for when ctdb_ban_node() and ctdb_unban_node() are called
when these functions are called to ban or unban a node make sure we 
update the CTDB_NODE_BANNED flag in rec->node_flags since this field and
flag are checked during the election process

(This used to be ctdb commit 740c632ae96a2d34327d1b575780aaf079d93f4f)
2007-11-23 12:36:14 +11:00
Ronnie Sahlberg
b5e79fb06f If update_local_flags() finds that a node has changed its BANNED status
so it differs from what the local ctdb daemon on the recovery master 
thinks it should be  we should call for a re-election

(This used to be ctdb commit 21ad6039c31ef5cc0e40a35a41220f91943947cb)
2007-11-23 11:53:06 +11:00
Ronnie Sahlberg
b2a81fb6b1 when we as the recovery daemon on the recovery master detects that the
flags differ between the local ctdb daemon and the remote node
we can force a flags update on all nodes and not just the local daemon

(This used to be ctdb commit a924eb89c966ecbae029ca137e06cffd40cc70fd)
2007-11-23 11:31:42 +11:00
Ronnie Sahlberg
af5bc9b915 add an extra log if we get a modflags control but it doesnt change any
flags


in update_local_flags()
(this is only called if we are or we belive we are the recmaster)
when we detect that the flags of a remote node is different from what 
our local node thinks the flags should be for that remote node
we should send a node-flag-changed message to the local daemon so 
that it updates the flags for that node.

(This used to be ctdb commit 36225e4e271f7a4065398253747fb20054f99a53)
2007-11-23 10:52:29 +11:00
Ronnie Sahlberg
c36ce05d08 if we get a modflag control but the flags remain unchanged, log this
(This used to be ctdb commit 5a0cd9b37b21665054bd35facd87f0a6ff4dcd55)
2007-11-23 10:31:51 +11:00
Ronnie Sahlberg
e95a4b5cdb when we print "Remote node had flags xx local had flags xx
we swapped the flags when printing them to the log

(This used to be ctdb commit 9fc8831a7fcd34763567227d61cd525ec441ebf2)
2007-11-23 09:54:38 +11:00
Andrew Tridgell
330bf59ab1 increase release number
(This used to be ctdb commit 1178fcce1a701441d43d4ed959f0ba6b50a5b07d)
2007-11-18 15:15:19 +11:00
Andrew Tridgell
74b1678d9d - merge from ronnie
- auto-detect CTDB_MANAGES_WINBIND from smb.conf if not set

(This used to be ctdb commit 3d675c7bcedbd483c923df54d1af068758edc206)
2007-11-18 15:14:54 +11:00
Andrew Tridgell
4608fcc3a4 need public_addresses for test suite
(This used to be ctdb commit 6d79994eace4802ab72dda2793028264c47d2d56)
2007-11-18 15:01:26 +11:00
Ronnie Sahlberg
b09d3de759 from Christian A
when monitoring that all nfs shares are available, allow both ' ' and 
'\t' characters to separate the exported directory from the options
in /etc/exports

(This used to be ctdb commit ac6cfe9de0acdcf9461068684fa890504454aae4)
2007-11-16 13:37:27 +11:00
Ronnie Sahlberg
9f4b0dab03 only check port 21 when monitoring vsftpd
(This used to be ctdb commit 41b0d71aaee186138eddc97d49503841fa26f234)
2007-11-15 06:56:02 +11:00
Ronnie Sahlberg
dfa6829621 add CTDB_MANAGES_WINBIND to /etc/sysconfig/ctdb to allow ctdb to be used
in environments where samba is used without winbind

(This used to be ctdb commit 1ae5af14f90fd81a20b14c02c0c5ad355a609134)
2007-11-14 16:17:52 +11:00
Andrew Tridgell
0943fba979 make it easier to test starting large numbers of virtual nodes
(This used to be ctdb commit cf61bf8b8806d29772985c904d5ee15c24d4d767)
2007-11-13 10:28:06 +11:00
Andrew Tridgell
45f0fdfc20 make election handling much more scalable
(This used to be ctdb commit 05938d462b92bd9ecb8e35f53651bded47c48675)
2007-11-13 10:27:44 +11:00
Andrew Tridgell
3427793f01 don't do the first startup event until we are out of recovery
(This used to be ctdb commit 689940eb6e23f16ee063331caf3986613a8963ea)
2007-11-12 13:10:15 +11:00
Andrew Tridgell
bde886988b prevent a deadly embrace between smbd and ctdbd by moving the calling
of the startup event scripts after the point where recovery has
started and the node is in normal operation

This makes the 'startup' script just a special type of the 'monitor'
script which is called first

(This used to be ctdb commit 7424c30a5fd04aea0137c466b4318c3f185280d8)
2007-11-12 10:53:11 +11:00
Andrew Tridgell
82bd652749 patch from michael adam
(This used to be ctdb commit a7a3bef90f033bab5cb110a6ef77a8bef48f2588)
2007-11-02 13:20:29 +11:00
Andrew Tridgell
29e48fe54a increase release number
(This used to be ctdb commit dc648b1bb6becc52dcf900add97418a5634367eb)
2007-10-30 10:19:43 +11:00
Andrew Tridgell
684282f7a1 added bonding info to ctdb_diagnostics
(This used to be ctdb commit 71b5fc434bc5d88eb0669ee29aa932ba12737e07)
2007-10-30 10:18:52 +11:00
root
2a70ac8801 the while loop in the startup event runs as a subshell so we need an extra || exit 1 at the end
to propagate the error code back to the caller of the script

(This used to be ctdb commit c30d5c328784059949f5e82a07008e9632234f20)
2007-10-29 12:34:45 +11:00
Ronnie Sahlberg
8599f2008d if bond* interfaces are used as public interfaces we can not rely on ethtool but
have to check /proc for the status instead

(This used to be ctdb commit 4ed7747267aea265b7a71c651abf6d5db4f4718b)
2007-10-29 10:51:16 +11:00
Ronnie Sahlberg
ba6f9ae4a7 merge from tridge
(This used to be ctdb commit c7777b966f6a6e0f4126c03300338fdc822ac6c9)
2007-10-29 08:50:51 +11:00
Ronnie Sahlberg
bd73497a18 merge from tridge
(This used to be ctdb commit 919ba610c61cfaf5ecc1ab64ad8be34a80d928f4)
2007-10-29 08:40:46 +11:00
Andrew Tridgell
6d75f0703e added monitoring of ftp ports
(This used to be ctdb commit 4780e078fb55d69053f78a4bbc7c67e569bb5dae)
2007-10-26 14:53:09 +10:00
Ronnie Sahlberg
533a530177 since service nfs stop/start sometimes fail to bring up the mount daemon on rhel5
check if mountd is running during monitoring and if it is not, try to restart it

(This used to be ctdb commit 3d4b74669164b519398aeeacd59714f1e3884eff)
2007-10-23 12:35:43 +10:00
Andrew Tridgell
1d6b4f418d update release number
(This used to be ctdb commit fe6766940b2cf8a84ed51824158c956362a5806d)
2007-10-23 11:56:52 +10:00
Andrew Tridgell
2cea351f45 merge from ronnie
(This used to be ctdb commit cc70a2cc5f5400d6480cb609e1fa203236917976)
2007-10-23 11:45:36 +10:00
Ronnie Sahlberg
44ab81763d merge from tridge
(This used to be ctdb commit 938e375a80ce2f1827117c38554f576f73a5c71e)
2007-10-23 06:42:45 +10:00
Andrew Tridgell
6e6de1e4b7 fixed a problem with backgrounding onnnode
(This used to be ctdb commit 4e23630224bb219cfbbf129c4562da5a4c2d601a)
2007-10-22 21:11:02 +10:00
Andrew Tridgell
8e22bca5ca fixed a double close of a socket, leading to an EPOLL error
(This used to be ctdb commit bbe8ad842bdfedd37ef14a6be07ad939113fe9b1)
2007-10-22 16:41:11 +10:00
Ronnie Sahlberg
6a32af60b8 nfs may take a while to stop so do it in hte background
(This used to be ctdb commit 2ccaeaf6a65731c17173a4945e3e00e230e67d35)
2007-10-22 15:14:49 +10:00
Andrew Tridgell
2d8afd85d5 another place where we need to mark connect_fde as freed
(This used to be ctdb commit d047fbeafebe4b150602f9a91802795659058b16)
2007-10-22 15:13:32 +10:00
Andrew Tridgell
2931ed5d17 fixed a valgrind uninitialised memory error due to pad bytes
(This used to be ctdb commit aea9b0c8d467fe19815c046969e9c1049a3a20ac)
2007-10-22 15:13:08 +10:00
Andrew Tridgell
f09537e7f1 prevent a double free
(This used to be ctdb commit 5a1b923abb36c6deb99ae178fdd54f12235dc309)
2007-10-22 14:07:35 +10:00
Ronnie Sahlberg
1d6a74f943 when shutting down, we should stop monitoring
(This used to be ctdb commit 325683ef8f326f0565a827ff2c493adcab6e0d64)
2007-10-22 12:34:51 +10:00
Ronnie Sahlberg
4a97876fb7 when we are shutting down, we should first shut down the recovery daemon
(This used to be ctdb commit 39ade6b329adcd3234124d6a8daaa6181abf739b)
2007-10-22 12:34:08 +10:00
Ronnie Sahlberg
f022df1d40 dont set parameters in statd-callout if they should be set they
bshould be set from 10.interfaces

(This used to be ctdb commit 0c7c2dae0a976922de58793d576855bc37cd38e1)
2007-10-22 10:18:38 +10:00
Ronnie Sahlberg
caad5dc38d dont set some of the sysctl variables in statd-callout. these are
mainly useful for avoiding ack-storms when doing very rapid 
failover/failback during testing   but should not be required in 
real-world.

this gets rid of a lof of annoying messages from the messages file

(This used to be ctdb commit 50d289dcce2caa7c7be9b6faa3b38b69c2237038)
2007-10-21 06:42:33 +10:00
Andrew Tridgell
1a8338e443 increase release number
(This used to be ctdb commit 747ff96f1d93c52ba7548d0540266b0277d88ac1)
2007-10-19 12:22:24 +10:00
Andrew Tridgell
f47f758fe8 merge from ronnie
(This used to be ctdb commit d444fdc7782496abe4b27003b647ac49fb52e6be)
2007-10-19 09:39:07 +10:00
Andrew Tridgell
623e216dcf remove a incorrectly added file
(This used to be ctdb commit ff01a32db81b6c04d42634f5660181c270988264)
2007-10-19 09:30:55 +10:00
Ronnie Sahlberg
e81e008a36 add missing ) in the IB transport (which i dont compile for)
(This used to be ctdb commit 7f7a184bae87d46bd589d11068b6443b007366b4)
2007-10-19 09:05:37 +10:00
Ronnie Sahlberg
fe7b5b4d85 add a stub restart method for IB
(This used to be ctdb commit d318504ad5a49dbdfa307be39ae88df839e6308d)
2007-10-19 09:04:52 +10:00
Ronnie Sahlberg
d1ba047b7f add a new transport method so that when a node is marked as dead, we
shut down and restart the transport

othervise, if we use the tcp transport the tcp connection might try to 
retransmit the queued data during the time the node is unavailable.
this together with the exponential backoff for tcp means that the tcp 
connection quickly reaches the maximum backoff rto which is often 60 or 
120 seconds.   this would mean that it could take up to 60/120 seconds 
before the tcp layer detects that the connection is dead and it has to 
be reestablished.

(This used to be ctdb commit 0256db470879ce556b0f00070f7ebeaf37e529ab)
2007-10-19 08:58:30 +10:00
Ronnie Sahlberg
755511d28d set the flags explicitely isnstead of masking them in
(This used to be ctdb commit 27a5f9dead44890683f9dbc4f07cda11264aa03b)
2007-10-18 16:54:00 +10:00
Andrew Tridgell
b814462c38 added some debug lines to help track down a problem
(This used to be ctdb commit 2ca31e9de179f76e392a26cc8305e2473357c760)
2007-10-18 16:27:36 +10:00
Andrew Tridgell
5e3d5b1314 merge from ronnie
(This used to be ctdb commit a6b094fdede0ae850e87877fad0b9dd1f3a26869)
2007-10-18 15:51:15 +10:00
Andrew Tridgell
d939a2901b merge from ronnie
(This used to be ctdb commit 75d4b386293e186a6bb8532515585ab72670d663)
2007-10-18 15:44:02 +10:00
Ronnie Sahlberg
e4ec6e9d6b flush the route cache when we have added the single public ip to the
node

cleanup and remove everything when we do a shutdown event

(This used to be ctdb commit 221432f45073bc7624803058c8bbf18838e7ceeb)
2007-10-18 14:13:48 +10:00
Ronnie Sahlberg
537841fadb use NF_DROP instead of NF_STOLEN when we tell the kernel to not worry
about this packet any more and just forget it ever saw it

(This used to be ctdb commit 42a2a777cbc15a8cbbea7ecf2fb1c6dafa242d0c)
2007-10-17 15:03:58 +10:00
Ronnie Sahlberg
9a93f4b8df reverse the order in which public ips are listed so it matches the order
of the public_addresses file

(This used to be ctdb commit ce987661edd9160982e65866fb773445d296e5c7)
2007-10-17 13:42:42 +10:00
Ronnie Sahlberg
805ba22d65 merge from tridge
(This used to be ctdb commit 87760a95ec0a9e3cb2c415c569235a1ff58318cb)
2007-10-17 10:10:52 +10:00
Andrew Tridgell
85f91b9d5c increase release number
(This used to be ctdb commit 69fe7ce1d7874ce51d79de29adc53c207cb8869f)
2007-10-16 20:14:04 +10:00
Andrew Tridgell
6b9d73a96d more detail on multipath config
(This used to be ctdb commit 78c44f2267cbef5fbc57d56dfd5ff40972733a1f)
2007-10-16 20:13:28 +10:00
Ronnie Sahlberg
ce7a054d20 add back the test inside the daemon that if someone asks us to drop
recovery mode back to NORMAL that we can not lock the reclock file   
since at this stage it MUST be locked by the recovery daemon.

in order to avoid a non-blocking fnctl() lock from blocking and cause 
"issues"  we move the 'test that we can not lock reclock file' into a 
child process.

(This used to be ctdb commit 3af994641ec2234e37da1fa1f693441586471a7e)
2007-10-16 15:27:07 +10:00
Ronnie Sahlberg
056aac6e0c add a new tunable : DeterministicIPs that makes the allocation of
public addresses to nodes deterministic.

Activate it by adding CTDB_SET_DeterministicIPs=1 in /etc/sysconfig/ctdb

When this is set,    the first entry in /etc/ctdb/public_addresses will 
always be hosted by node 0, when that node is available, the second 
entry by node1 and so on.

This tunable allows the allocation of addresses to become very 
unbalanced and is only for debugging/testing use.
Beware, this feature requires that /etc/ctdb/public_addresses are 
identical on all the nodes in the cluster.

(This used to be ctdb commit f0ca221f235731542090d8a6c86f2b7cd2ce2f96)
2007-10-16 12:15:02 +10:00
Ronnie Sahlberg
25d3a031d0 include system/network.h so we get the prototype for inet_aton()
(This used to be ctdb commit 7145764b2d217f88a723dcb0ffd4e5a1567d64cf)
2007-10-16 11:29:33 +10:00
Ronnie Sahlberg
7e2e1b14fb merge from tridge
(This used to be ctdb commit 9e6bc12c9be2dabcfb9c6aeef257ef4737287fab)
2007-10-16 11:26:22 +10:00
Ronnie Sahlberg
b3ff7d904d dont try to lock the file from inside the ctdb daemon.
eventhough we dont want a blocking lock it does appear that the fcntl()
call can block for a while if gpfs is in the process of rebuilding 
itself after a node arriving/leaving the cluster

(This used to be ctdb commit 6c0d206dea7116db71bccb4802a93dd7283249f6)
2007-10-16 09:50:31 +10:00
Andrew Tridgell
d7f6b63f0a only link to -lipq if needed
(This used to be ctdb commit 7c378d881e37db0f14e07ccba19fde1f9f4f0831)
2007-10-15 14:44:06 +10:00
Andrew Tridgell
574db736f2 improved handling of systems without libipq.h
(This used to be ctdb commit cfa8ddd3ca53c0160558137cccfc7e73e46ec36c)
2007-10-15 14:37:54 +10:00
Andrew Tridgell
9570939337 disable ipmux code until we have a configure test
(This used to be ctdb commit fd83f0f3eb233f22ce9b5b4afbc4f26e3c865b3c)
2007-10-15 14:29:47 +10:00
Andrew Tridgell
99bc0aca93 sync flags between nodes in monitor loop in recmaster
(This used to be ctdb commit 6eef86e06388fc53a1212f1e2783ae174c6cd210)
2007-10-15 14:28:51 +10:00
Andrew Tridgell
0e855c0772 merge from ronnie
(This used to be ctdb commit d18712caba11855010be52f90bac656683076676)
2007-10-15 14:17:49 +10:00
Andrew Tridgell
0a5bfdd315 disable optimisation for now, until we find a occasional segv
(This used to be ctdb commit d09570c70551aa40390ce9ceffe7bc234e1afafe)
2007-10-15 13:31:09 +10:00
Andrew Tridgell
174879621e add config option for disabling bans
(This used to be ctdb commit 153b911f7f957d4c564b04f5aa878033a02da9e4)
2007-10-15 13:22:58 +10:00
Ronnie Sahlberg
ebe772b1b2 use $CTDB_BASE in 90.ipmux instead of hardcoding it to /etc/ctdb
(This used to be ctdb commit 6abb46b010851f5719f12273b4a3d46ec986f0c7)
2007-10-11 07:51:57 +10:00
Ronnie Sahlberg
870a57a55b use kill_tcp_connections() to kill off all tcp connections to the
"single public ip" address when we do a recovery

(This used to be ctdb commit 19b52a2d5db31efa9e7c77037097ff8539986ac3)
2007-10-11 07:30:10 +10:00
Ronnie Sahlberg
fa5d51c238 move the kill_tcp_connections() function from 10.interfaces to functions
(This used to be ctdb commit 055948530fb16bf49c42fc4489f29a21665156c0)
2007-10-11 07:27:38 +10:00
Ronnie Sahlberg
1a4999076b first check that recovery master is connected (we know this from our own
flags)

then pull the flags off recovery master before checking if it is banned

(This used to be ctdb commit 94c1d234e57a40eda2d8b892dd9fbe1ffc4b3433)
2007-10-11 07:10:17 +10:00
Ronnie Sahlberg
167e100d4b simplify election handling
make sure we read and update the flags from all remote nodes before we 
reach the first codepath that can call do_recovery()
since during do_recovery() we need to know what the flags are.

(This used to be ctdb commit e85f3806483ea420559d449e0e4d81bec996740f)
2007-10-11 06:16:36 +10:00
Ronnie Sahlberg
33a6aa3c3f merge from tridge
(This used to be ctdb commit 4690a205fe4325b03ab044bdb5fbc9aa3e94db6e)
2007-10-10 10:49:55 +10:00
Andrew Tridgell
011a205b86 make sure reconnected nodes start off as unhealthy so they don't get a public IP
(This used to be ctdb commit c733ec6760cae01ce277f491caf1355e46de5cf7)
2007-10-10 10:45:22 +10:00
Ronnie Sahlberg
bdd67bba1e add a --single-public-ip argument to ctdbd to specify the ip address
used in single public ip address mode.
when using this argument, --public-interface must also be used.

add a vnn structure to the ctdb context to describe the single public ip 
address


update the killtcp control in the daemon that if a socketpair that is to 
be killed does not match a normal public address it checks if the 
destination address maches the single public ip address and if so uses 
that vnn structure from the ctdb context


this allows killtcp to kill also connections to the single public ip 
instead of only normal public addresses

(This used to be ctdb commit 5661ba17b91f62821dec1c76056c78b99752a90b)
2007-10-10 09:42:32 +10:00
Ronnie Sahlberg
7735957693 remove some debug outputs
(This used to be ctdb commit f29c0b52df1f455909ba133e3ad3bc462dc32929)
2007-10-09 13:45:42 +10:00
Ronnie Sahlberg
03e0277e03 send out gratious arps when we are starting up serving the "single
public ip" but before we start the ipmux tool

(This used to be ctdb commit dad1a80f39763314825939095f7656c13dcdbdc3)
2007-10-09 12:00:12 +10:00
Ronnie Sahlberg
80cd82f8e4 add a control to send gratious arps from the ctdb daemon
(This used to be ctdb commit 563819dd1acb344f95aabb4bad990b36f7ea4520)
2007-10-09 11:56:09 +10:00
Ronnie Sahlberg
292e9d9109 add an initial test version of an ip multiplex tool that allows us
to have one single public ip address for the entire cluster.

this ip address is attached to lo on all nodes but only the recmaster 
will respond to arp requests for this address.
the recmaster then runs an ipmux process that will pass any incoming 
packets to this ip address onto the other node sin the cluster based on 
the ip address of the client host


to use this feature one must
1, have one fixed ip address in the customers network attached 
permanently attached to an interface
2, set CTDB_PUBLI_INTERFACE=
   to specify on which interface the clients attach to the node
3, CTDB_SINGLE_PUBLI_IP=ip-address
   to specify which ipaddress should be the "single public ip address"




to test with only one single client,   attach several ip addresses to 
the client and ping the public address from the client with different -I 
options.   look in network trace to see to which node the packet is 
passed onto.

(This used to be ctdb commit 50d648c95e4e6d7c2867a034c2b550086d853320)
2007-10-08 14:05:22 +10:00
Ronnie Sahlberg
ab5d098bf6 add a function in the ctdb tool to determine whether the local node is
the recmaster or not.

return 0 if the node is the recmaster and 1 (true) if it is not or if 
we could not communicate with the ctdb daemon.


call it 'isnotrecmaster' to cope with that if the tool could not bind to 
the socket to tyalk to the daemon, the tool will automatically return an 
error and exit code 1
thus the tool will only return 0 if it could talk successfully to the 
local daemon and if the local daemon confirms this node is the recmaster

(This used to be ctdb commit ae5fcb790b6c3985f514fa8a96bc00c2619f2a28)
2007-10-08 09:47:20 +10:00
Ronnie Sahlberg
de6c5ed14d merge from tridge
(This used to be ctdb commit 02cda01c032804cb1c53593ceb98685c827e2d58)
2007-10-06 08:11:24 +10:00
Andrew Tridgell
50770008df fixed several places where we set the recovery culprit incorrectly
(This used to be ctdb commit d9da73395fa443801fc68ec53a42b548e832d58a)
2007-10-05 13:51:31 +10:00
Andrew Tridgell
4115492992 - catch ESTALE in the recovery lock by trying a read()
- priortise nodes that are unbanned and healthy in the election

(This used to be ctdb commit 929feb475dfdf7283f0e99b50b179e1c91d3a39f)
2007-10-05 13:28:21 +10:00
Andrew Tridgell
fb48f2d5a2 we are the culprit if we can't get the reclock
(This used to be ctdb commit 1d320e113c6134ff6822b985a47131d8204af35a)
2007-10-05 12:01:40 +10:00
Ronnie Sahlberg
72379ee3eb change async.private to async.private_data since private is a reserved
work in c++

(This used to be ctdb commit 79eb28f6cd5dcc30b04966d202a050eaf98a2552)
2007-09-26 14:25:32 +10:00
Andrew Tridgell
ef995062b2 upped version number
(This used to be ctdb commit 4312e20e047ddb0f825c5e0c51d85dfa6a1b7df8)
2007-09-24 15:27:01 +10:00
Ronnie Sahlberg
359448ff00 when we have a public ip address mismatch (i.e. we hold addresses we
shouldnt   or we are not holding addresses wqe should)
we must first freeze the local node before we set the recovery mode

(This used to be ctdb commit a77a77e8b5180f6a4a1f3d7d4ff03811f3b71b56)
2007-09-24 10:52:26 +10:00
Andrew Tridgell
e3d0ec8797 fixed a fd leak on the recovery lock
(This used to be ctdb commit 186f35c42ed4fcc9ed44390b0dd036ece475d45e)
2007-09-24 10:19:07 +10:00
Andrew Tridgell
80100c3573 run monitoring more quickly when unhealthy and at startup
(This used to be ctdb commit ff1c205928e3ef5bcc6bf4e4b2122a19fa38d8f4)
2007-09-24 10:12:18 +10:00
Andrew Tridgell
b87ddd9148 no longer wait at startup for services to become available, instead
set the node initially unhealthy and let the status monitoring bring the node online.
This fixes a problem with winbindd, where it refused to start because secrets.tdb was not populated
but we could not populate ctdbd, because the net command would not run while ctdbd was still doing startup
and thus frozen
(This used to be ctdb commit 3a001b793dd76fb96addf1e2ccb74da326fbcfbc)
2007-09-24 10:00:14 +10:00
Andrew Tridgell
4178cb98a1 fixed a valgrind error, and some warnings
(This used to be ctdb commit c0f52dbb385fa0748680adb7c40755c92e577551)
2007-09-24 09:57:14 +10:00
Andrew Tridgell
416c0cec6e make the persistent dbdir configurable
(This used to be ctdb commit 2587b887dcfce26b12c66fcb5d34e92da42a1776)
2007-09-21 16:12:04 +10:00
Andrew Tridgell
2607c222fc avoid using connected nodes that aren't in the vnn map yet
(This used to be ctdb commit 2b5ae133f5f6fa9ad1a8896fe4b4c542d4ca462d)
2007-09-21 15:44:13 +10:00
Ronnie Sahlberg
51d912063c in ctdb_control_persistent_store() we must talloc_steal() the pointer to
c   to prevent it from being immediately freed (and our persistent store 
state with it) if we need to wait asynchronously for other nodes before 
we can reply back to the client

(This used to be ctdb commit fa5915280933e4d2e7d4d07199829c9c2b87a335)
2007-09-21 15:19:33 +10:00
Ronnie Sahlberg
61e885d0b9 when ctdb attaches to a database it broadcasts the attach to all other
nodes so that the db is created on them as well

when we send this broadcast   we must use the correct control and not 
assume all databases created are of the temporary kind 

(This used to be ctdb commit 106f816d4a0814ca4418de051289d9fc62df7dd2)
2007-09-21 13:47:40 +10:00
Ronnie Sahlberg
e9f45419da merge from tridge
(This used to be ctdb commit bb283ee8ebaea848366e9c3b3d3244da459a7967)
2007-09-21 13:20:29 +10:00
Andrew Tridgell
c60988325d added support for persistent databases in ctdbd
(This used to be ctdb commit 3115090a0d882beca9d70761130b74bb0821f201)
2007-09-21 12:24:02 +10:00
Ronnie Sahlberg
577fffdc5d merge from tridge
(This used to be ctdb commit c4262a3f980a9fabe0985fc3281e9f5de1cdf35e)
2007-09-19 11:54:45 +10:00
Ronnie Sahlberg
dc0bd93789 one more command to run to enable winbind for vsftpd
(This used to be ctdb commit 1f6d13a364cde58b66a4bc52e909cc68b8c807d7)
2007-09-19 11:53:48 +10:00
Andrew Tridgell
81bfa58d58 make sure we set close on exec on any possibly inherited fds
(This used to be ctdb commit d9dec82076f14a348e7b67b4350180681ff86f32)
2007-09-19 11:46:37 +10:00
Andrew Tridgell
0438b07b53 separate out the various fs display ops
(This used to be ctdb commit dc89e1a428da5d5ca2a9c4988c05de3ea65f00f4)
2007-09-19 11:46:11 +10:00
Andrew Tridgell
bd7eeebe16 expanded ctdb_diagnostics a bit
(This used to be ctdb commit 70a4bb3dc7e624ad778949dbc874c2617fd532e6)
2007-09-17 15:31:33 +10:00
Ronnie Sahlberg
d9f936fefe add documantation of additional requirements for FTP so that users can
log in and access files using the AD username/password

(This used to be ctdb commit 679e125770247fc24dfb14b5781d44f639457ecd)
2007-09-17 13:01:16 +10:00
Andrew Tridgell
c6abc4e1a8 increase release number
(This used to be ctdb commit b213f4f1bf5bcfb40b9ce176df22216e3ebbe964)
2007-09-14 19:27:11 +10:00
Andrew Tridgell
ed75f988d5 merge from ronnie
(This used to be ctdb commit 913c33a7d2f67570548fecc568dba874e5f72dd2)
2007-09-14 15:23:23 +10:00
Ronnie Sahlberg
2d0261afeb let ctdb ip only print the ip addresses known to the specified node
and not the entire cluster

(This used to be ctdb commit eb1f67a56d752c9f42a9a26a6697a7ab8e668b3a)
2007-09-14 15:19:44 +10:00
Ronnie Sahlberg
90a37c4fb4 update vnn -> pnn in documentation
(This used to be ctdb commit bb62b4df514255b95d3871b254bec9c440bc4a06)
2007-09-14 14:24:53 +10:00
Ronnie Sahlberg
05e1f67381 documentation updates
it is --event-script-dir      not --event-script

add explanation of the public_addresses file

(This used to be ctdb commit 21325b23e786ac1c2abc07ea75b0814e9c725a9e)
2007-09-14 14:19:12 +10:00
Andrew Tridgell
c62490569b cope with non-standard install dirs in event scripts
(This used to be ctdb commit 52fff5345873690a9cc86495f414343eaa3bd540)
2007-09-14 14:14:03 +10:00
Andrew Tridgell
305f432e50 fix pkill args
(This used to be ctdb commit 9690de97b4746f4a79830465e3a1679e9fbda671)
2007-09-14 11:59:04 +10:00
Andrew Tridgell
955d4d8615 make sure all public IPs are removed at startup
(This used to be ctdb commit b16f33787f2a9471285037f4a6d470e826536570)
2007-09-14 11:56:40 +10:00
Ronnie Sahlberg
8edcd3f83f during startup make sure to delete any public addresses from any
interface

(This used to be ctdb commit 18d80ea6db39e61f60e4c01de164d58bcbd8ab10)
2007-09-14 10:37:10 +10:00
Ronnie Sahlberg
6052078b53 let each node verify that they have a correct assignment of public ip
addresses (i.e. htey hold those they should hold   and they dont hold 
any of those they shouldnt hold)

if an inconsistency is found, mark the local node as recovery mode 
active
and wait for the recovery master to trigger a full blown recovery

(This used to be ctdb commit 55a5bfc8244c5b9cdda3f11992f384f00566b5dc)
2007-09-14 10:16:36 +10:00
Andrew Tridgell
42fc00bda9 - merge from ronnie
- add a flag to check that recovery completed correctly. If not, re-trigger it in monitoring

(This used to be ctdb commit d5ed941d9bab4af30d8b5f9b77bdf43d9218d69b)
2007-09-14 09:49:12 +10:00
Andrew Tridgell
3b159e4e60 wait for ctdbd to finish cleanup before considering "service ctdb stop" to be done
(This used to be ctdb commit 216eb4be7ec481cfe9aaeeada257b77cb394d2e4)
2007-09-14 09:25:11 +10:00
Andrew Tridgell
9cf96a5e4c nicer use of testparm
(This used to be ctdb commit a611ea930fb9dae6e56f6a74b2bdc9e08066d4d1)
2007-09-14 09:24:34 +10:00
Ronnie Sahlberg
4c20141659 update the section about event scripts
(This used to be ctdb commit a0744480c85a4e8648bd0ae7600f90d311b931ea)
2007-09-14 08:56:27 +10:00
Ronnie Sahlberg
528b8af87f disable nfsv4 in etc/sysconfig/nfs
(This used to be ctdb commit b71e11f0e27bb3ff908ad171aa5b1f724609ad05)
2007-09-14 08:15:24 +10:00
Ronnie Sahlberg
4186d8eaba when a ctdb_takeover_run has failed we must make sure that
need_takeover_run is set to true  or else we might forget to rerun it 
again during the next recovery


othervise,  need_takeover_run is only set to true IFF the node flags for 
a remote node and the local nodes differ.
It is possible that a takeover run fails  and thus the reassignment of 
ip addresses is incomplete  but before we get back to the test in    
monitor_cluster()  that all the node flags of all nodes have converged 
and they now match each others again.   and thus causing 
monitor_cluster() to fail to realize that a takeover run is needed.

(This used to be ctdb commit ae7e866787cebd14394983ce1834387c959d1022)
2007-09-13 14:51:37 +10:00
Andrew Tridgell
2f86c3f827 ensure smbd and winbindd do die in 50.samba
(This used to be ctdb commit 6f23affedb626fc7a5ca86c4763f3045a5586231)
2007-09-13 14:36:23 +10:00
Ronnie Sahlberg
ab1c8c074e merge from tridge
(This used to be ctdb commit eda3caa77be352967a41ff9bddda5296c94797a9)
2007-09-13 14:28:18 +10:00
Andrew Tridgell
9d50595b8a prevent recursion in the calling of ctdb_takeover_run
(This used to be ctdb commit 0fbdeb7c91b965d9bc5ecc7b24e31070378d8f1d)
2007-09-13 14:08:18 +10:00
Andrew Tridgell
6fa6101b1a more shell scripting fixes in 10.interface
(This used to be ctdb commit 4ee2230b3f2ae7437a9d0cf973eb4645d276accd)
2007-09-13 11:57:42 +10:00
Andrew Tridgell
30de14fe79 force recovery if unable to tell a node to release an IP
(This used to be ctdb commit 6895788d2499344a03357e5c1103cb8383e9eaf7)
2007-09-13 11:19:49 +10:00
Andrew Tridgell
25940014c0 fixed script errors in 10.interface
(This used to be ctdb commit 0c759614d27758cef3eba5942b2cccad54193cbb)
2007-09-13 11:19:30 +10:00
Andrew Tridgell
3c0f61cb92 we don't need the is_loopback logic in ctdb any more
(This used to be ctdb commit 4ecf29ade0099c7180932288191de9840c8d90a9)
2007-09-13 10:45:06 +10:00
Andrew Tridgell
4f261ae191 remove more cruft from the logs
(This used to be ctdb commit b67f35c483b6cbb5facaa6380c7794709f44213a)
2007-09-13 10:39:05 +10:00
Andrew Tridgell
023b885793 new approach for killing TCP connections on IP release
(This used to be ctdb commit c33a0db29b5604966f582b1f8c5fd66760c72197)
2007-09-13 10:24:48 +10:00
Andrew Tridgell
1b53ecc445 remove clutter from ctdb log file
(This used to be ctdb commit 54d5dcaaee0498f40bbee5059cc72d0ca75d33b7)
2007-09-13 10:03:18 +10:00
Andrew Tridgell
a919f6927a fixed return code
(This used to be ctdb commit 30165b5a19f9bd9d1f62c9c222df0711c1c6a927)
2007-09-13 10:02:56 +10:00
Andrew Tridgell
96c54c6188 handle hung or slow ctdb daemons on shutdown
(This used to be ctdb commit a3089211782ab12387c1b04efa28914c94d89b30)
2007-09-12 13:26:24 +10:00
Andrew Tridgell
6c77184d96 - set arp_ignore to prevent replying to arp requests for addresses on loopback
- put removed IPs on loopback with scope host
- check for nul strings in ethtool call
;

(This used to be ctdb commit e2df1d6d08e67a36ff05a590a34c56e900741287)
2007-09-12 13:23:36 +10:00
Andrew Tridgell
67bd64ef35 - don't allow the registration of clients with IPs we don't hold
- change some debug levels to make tracking of IP release problems easier
(This used to be ctdb commit 5f9aed62adaf87750f953412c55b29c58e4bb6c0)
2007-09-12 13:22:31 +10:00
Andrew Tridgell
a478c78f03 changed some debug levels
(This used to be ctdb commit ed764533e1c2f8982e1577ca5e7f5f4482a15345)
2007-09-12 13:21:19 +10:00
Ronnie Sahlberg
536d393452 use the public addresses variable instead of hardcoding the path
(This used to be ctdb commit 8e23f173cda8a76bbc243863bfc49fe8c7b907f4)
2007-09-12 07:28:24 +10:00
Ronnie Sahlberg
98f968d8d3 move all ip addresses onto loopback when we startup ctdb
(This used to be ctdb commit 5d7500f7d93f0d36ffbf3c966c5b38f82f0376c7)
2007-09-12 07:26:30 +10:00
Andrew Tridgell
a6728e0520 fixed location of arp_filter
(This used to be ctdb commit ea239c82fca2b9a648d21e5c603e632011958452)
2007-09-11 16:38:32 +10:00
Andrew Tridgell
5b65a6c7f0 get interface right
(This used to be ctdb commit e0edc38d7e897f7de2850eb2cfd17fea75c16fcc)
2007-09-10 20:45:27 +10:00
Ronnie Sahlberg
a9a8ad07b4 grab the interface name from tok and not from the uninitialized array
(This used to be ctdb commit 23a47ca2331a163b5fde03bd2f6f1d478633aede)
2007-09-10 16:34:11 +10:00
Ronnie Sahlberg
9c1b2f4856 merged patch from tridge
(This used to be ctdb commit 90ab044093f67b656e21861ce12d6fee5794d21f)
2007-09-10 16:23:06 +10:00
Andrew Tridgell
8cd7ca149e fixed a pointer cast warning
(This used to be ctdb commit df0e7a4aa13112d613702d8ea0fb0e18510d293c)
2007-09-10 15:16:17 +10:00
Andrew Tridgell
57d8102cf8 added back --public-interface to startup script
(This used to be ctdb commit 9e9cb3c0da7251f522c655366ef0868037577a9c)
2007-09-10 15:09:28 +10:00
Andrew Tridgell
f3ae1cdb02 - use struct sockaddr_in more consistently instead of string addresses
- allow for public_address lines with a defaulting interface

(This used to be ctdb commit 29cb760f76e639a0f2ce1d553645a9dc26ee09e5)
2007-09-10 14:27:29 +10:00
Andrew Tridgell
70ec39b1b1 add back in --public-interface as a default
(This used to be ctdb commit cdf56daf69b2c8381ee673943e982ad20f19affd)
2007-09-10 14:26:35 +10:00
Andrew Tridgell
42168177ef merge from ronnie
(This used to be ctdb commit 1f21d4d563232926c35d03c4d69eb69190823dc6)
2007-09-10 13:21:11 +10:00
Andrew Tridgell
f3927719c9 add crontab and sysctl output
(This used to be ctdb commit b1b59f3294ee7a5ed6d685f373bf19d3152170fa)
2007-09-10 11:27:07 +10:00
Ronnie Sahlberg
50381480eb update a comment
(This used to be ctdb commit e7d3ef4443686529299e8f293398cc0522235627)
2007-09-10 07:45:57 +10:00
Ronnie Sahlberg
4ac749bfa4 change the signature to ctdb_sys_have_ip() to also return:
a bool that specifies whether the ip was held by a loopback adaptor or 
not
 the name of the interface where the ip was held

when we release an ip address from an interface, move the ip address 
over to the loopback interface

when we release an ip address  after we have move it onto loopback, 
use 60.nfs to kill off the server side (the local part) of the tcp 
connection   so that the tcp connections dont survive a 
failover/failback

61.nfstickle,   since we kill hte tcp connections when we release an ip 
address   we no longer need to restart the nfs service in 61.nfstickle

update ctdb_takeover to use the new signature for ctdb_sys_have_ip

when we add a tcp connection to kill in ctdb_killtcp_add_connection()
check if either the srouce or destination address match a known public 
address

(This used to be ctdb commit f9fd2a4719c50f6b8e01d0a1b3a74b76b52ecaf3)
2007-09-10 07:20:44 +10:00
Ronnie Sahlberg
0ebd7beb4b set /proc/sys/net/ipv4/conf/all/arp_filter to 1 by default when
10.interfaces startsup

this setting makes the system only respond to APR requests from the NIC 
where the ip address is tied to and adds to the 
"principle of least surprise" when using multihoming servers

(This used to be ctdb commit 39ddf347dc45f599964a4c17e67e71faed00e544)
2007-09-08 08:09:02 +10:00
Ronnie Sahlberg
d91b28f8b7 ctdb ip must loop over all connected nodes to pull hte public ip list
and merge into a big list   since with the deassociation between a node 
and a public ipaddress    the /etc/ctdb/public_addresses files can 
differ between nodes and no node know about all public addresses that a 
cluster can use

(This used to be ctdb commit e208294fed183977cacc44b2cd1195c11d967c18)
2007-09-07 16:45:19 +10:00
Ronnie Sahlberg
3cad21d6be remove the ctdb publicip command
this command no longer makes sense when there is no on-to-one mapping 
between a node and its default public ip

(This used to be ctdb commit 91280db7f6dd3d659edd86fae21ba347d6f9da9e)
2007-09-07 15:39:26 +10:00
Ronnie Sahlberg
d0dd8df752 update web nfs with the new NFS_HOSTNAME variable we need to be able to
stat notify using the correct hostname

(This used to be ctdb commit 1498e33e48a4654e02b74a00ef7473fed3225d69)
2007-09-07 12:20:48 +10:00
Ronnie Sahlberg
eb7a15730e add a short delay after stopping nfslock to make it less likely that
"weird" things happen

(This used to be ctdb commit 4934c083cbcc19714094e08a0b7da1fb6fdc8a5a)
2007-09-07 12:14:53 +10:00
Ronnie Sahlberg
68c37f9b41 merge from tridge
(This used to be ctdb commit 58c918b1bfe09c31049769dee266129cbad4cb20)
2007-09-07 09:21:40 +10:00
Ronnie Sahlberg
fa872de664 60.nfs:
we must always restart the lockmanager when the cluster has been 
reconfigured and ip addresses has changed. This is to make sure we get a 
clusterwide grace period for nfs locking.
if we dont do this and only restart locking on the nodes that were 
direclty affected, a different client can take out a conflicting lock 
from a different node before affected clients has had a chance to
reclaim all the locks lost during reconfigure.
grace period on rhel5 kernel has bene increased to 90 seconds!

statd-callout:
we must restart lockmanager to ensure a clusterwide grace period for 
nfs. this makes locking "more correct" for nfs clients and prevents
other clients/nodes from taking out a conflicting lock while a different
client/node tries to reclaim lost locks.
This makes it "almost consistent" for NFS clients   but there is still 
the possibility that a cifs client can take out a conflicting lock 
before an nfs client has had a chance to reclaim an existing lock.
This can not be solved with anything less than making the kernel nfs 
lock manager "samba aware" and making samba aware of the internal state 
of the kernel lock manager so that they can cooperate.

we can not just stop/start the lockmanager back to back in rhel5 since 
if they are stopped/started too close to eachother then when the new 
lockmanager upon starting up sends out statd notifications two things 
can happen:
1, new lockmanager sends out notification BEFORE it has registered with 
portmapper leading to 
  lockmanager starts
  lockmanager sends notification to the client
  client tries to recover the lock and tries to portmap the lockmanager
  port on the server.
  server is not (yet) registered with portmapper and server responds
  "no such program" to hte clients request to discover where lockmanager
   is.
  client then just completely gives up reclaiming the lock and doesnt 
  even reattempt the portmapper call after some timeout.
  ==> lock reclaim failed.
2, if they are started back to back, and a client tries to reclaim the
   lock  the lockmanager sometimes sends two responses back to back
   to the client.   one with status NLM_GRANTED (==you got the lock 
reclaimed) and one with status NLM_DENIED (==you could not get the lock 
reclaimed)
   This confuses the client and leads to the server thinking that the 
client does have the lock   and the client thinking it has not got the 
lock    and orphaned locks result.


We also send out additional notification messages of different formats
to allow more legacy clients to interoperate with locking.

(This used to be ctdb commit 13208c1aab2942e28dff87e38e6794bf0c026033)
2007-09-07 08:52:56 +10:00
Ronnie Sahlberg
82984577f1 we dont need the rpc.statd on shared directory neither do we need
PUBLIC_IP anymore

(This used to be ctdb commit fd571ac87f65928e92dde6977745083bf381df1a)
2007-09-06 11:32:18 +10:00
Ronnie Sahlberg
00453a375a improve the handling of hosts to notify with statd
(This used to be ctdb commit cc87bda7e344bc777b9620a6211e62de4dce4e3b)
2007-09-06 11:30:49 +10:00
Ronnie Sahlberg
19546fb007 specify the additional ports for nfs
(This used to be ctdb commit 1934163f0b393738615a05854082a7d488003e1c)
2007-09-06 10:26:44 +10:00
Ronnie Sahlberg
f7d193e9ce the event scripts for nfs are called 60.nfs and 61.nfstickle
(This used to be ctdb commit b15f1c25560320993b93aa3d943985dab4e47947)
2007-09-06 10:18:13 +10:00
Ronnie Sahlberg
0781616ef9 document NFS_TICKLE_SHARED_DIRECTORY on our web page
(This used to be ctdb commit 40ec29f602897e9b01a6747806f502ab38423d54)
2007-09-06 08:21:11 +10:00
Ronnie Sahlberg
46eecfea27 we dont use 'sendip' any more so dont check for it and exit from the
61.nfstickles script if it is missing from the host

(This used to be ctdb commit 8eac441e24f4ef33b55f9eaa4856b5c1e1c15213)
2007-09-05 15:39:51 +10:00
Ronnie Sahlberg
a9c8456ed6 we should always get data back from getnodemap
(This used to be ctdb commit ff999a4b56f714c58c81baa454a2d39d04944136)
2007-09-05 14:59:29 +10:00
Ronnie Sahlberg
e4eeceaf3a dont dereference vnn before we have assigned it a pointer value
(This used to be ctdb commit 2a8fc69aea8527b22a3fe57427677e4caff57338)
2007-09-05 14:29:44 +10:00
Andrew Tridgell
c572d3c226 added a diagnostics tool for ctdb
(This used to be ctdb commit 032a2238caf688656b00e06bf363182368e037e1)
2007-09-05 14:20:34 +10:00
Ronnie Sahlberg
77ec4d5248 allow different nodes in the cluster to use different public_addresses
files
so that we can partition the cluster into different subsets of nodes 
which each serve a different subset of the public addresses

(This used to be ctdb commit 889e0fe69e4c88c6166282b12843b8d9727552d6)
2007-09-04 23:15:23 +10:00
Ronnie Sahlberg
8f819c6a0e get rid of the ctdb_vnn_list structure and just use a single list of
ctdb_vnn

(This used to be ctdb commit 7b9fd06321af17043136b1420b57284450ae7ba5)
2007-09-04 18:20:29 +10:00
Ronnie Sahlberg
cf45c5096c we cant have takeover_ctx hanging off ctdb since it is freed/recreated
everytime we release an ip.
this context is used to hold all resources needed when sending out 
gratious arps and tcp tickles during ip takeover.

we hang it off the vnn structure that manages that particular ip address 
instead   so that we can have multiple ones going in parallell

this bug (or the same bug in different shape) has probably been in ctdb 
for very very long   but is likely to be hard to trigger

(This used to be ctdb commit c58db1cadaba253b2659573673b28c235ef7db76)
2007-09-04 14:36:52 +10:00
Ronnie Sahlberg
3e6be59f61 fix typo in debug output
(This used to be ctdb commit 011a777c6e538ca79f104c7884a4f0e222997382)
2007-09-04 14:21:35 +10:00
Ronnie Sahlberg
784eac9079 dont just always return 0 from the killtcp control.
return 0 or -1 so that the ctdb tool knows whether the control succeeded 
or not

(This used to be ctdb commit cace8b40090be5529ec6b463d3839d0e22f4039d)
2007-09-04 14:19:18 +10:00
Ronnie Sahlberg
a50e83448c change vnn to pnn in the traverse structure
(This used to be ctdb commit d56ae0963b420edea6a2d5eeb408a9811af3f3f6)
2007-09-04 10:49:21 +10:00
Ronnie Sahlberg
f69321edc8 change debug output from vnn to pnn
(This used to be ctdb commit 93a7cf759ae3f9af6671b9f8589e1399a669b46f)
2007-09-04 10:47:02 +10:00
Ronnie Sahlberg
d66d9cdd22 change debug output from vnn to pnn
change ctdb_daemon_send_message to take pnn as parameter isntead of vnn

(This used to be ctdb commit e352a2bbf9bb9a0b2c4f8329e8a529cf02414097)
2007-09-04 10:45:41 +10:00
Ronnie Sahlberg
0c91261340 change ctdb_send_message to take pnn as parameter instead of vnn
(This used to be ctdb commit 93dd4fba2e0fa6a011d15406652836785a974880)
2007-09-04 10:42:20 +10:00
Ronnie Sahlberg
157be530dd change ctdb_ctrl_getvnn to ctdb_ctrl_getpnn
(This used to be ctdb commit ef47cc4cd416065c69382e4d9e76c30a0a34e42f)
2007-09-04 10:38:48 +10:00
Ronnie Sahlberg
211b497818 change ctdb_node_flags_change.vnn to ctdb_node_flags_changed.pnn
change ctdb_ban_info.vnn to ctdb_ban_info.pnn

(This used to be ctdb commit fcedd40e0493948829e1c921d4fe30e9196e398a)
2007-09-04 10:33:10 +10:00
Ronnie Sahlberg
6f693bbcbd change server_id.vnn to server_id.pnn
(This used to be ctdb commit 26f2ee2b754a9271454412f05111a19b3013c6eb)
2007-09-04 10:21:51 +10:00
Ronnie Sahlberg
583b6e6ba6 change ctdb_get_vnn to ctdb_get_pnn
(This used to be ctdb commit 1e19930198c2bcc7ccb755e0ee51555fb823029a)
2007-09-04 10:18:44 +10:00
Ronnie Sahlberg
4ba9990143 change vnn to pnn in the ctdb tool
(This used to be ctdb commit 822556a4d4ba23459be3a25cbd3f48d1f64ba95f)
2007-09-04 10:14:41 +10:00
Ronnie Sahlberg
fc9d39c3a6 change ctdb_validate_vnn to ctdb_validate_pnn
(This used to be ctdb commit a4a1f41b69475b9dc16d8fd7f8965c32e96c32f0)
2007-09-04 10:09:58 +10:00
Ronnie Sahlberg
eb4cf6a686 change ctdb->vnn to ctdb->pnn
(This used to be ctdb commit 8c776e5707e503ec6586aae39ac6b3ea5a2fd2bc)
2007-09-04 10:06:36 +10:00
Ronnie Sahlberg
12ebb74838 change how we do public addresses and takeover so that we can have
multiple public addresses spread across multiple interfaces on each 
node.

this is a massive patch since we have previously made the assumtion that 
we only have one public address per node.

get rid of the public_interface argument.  the public addresses file 
now explicitely lists which interface the address belongs to

(This used to be ctdb commit 462ebbc791e906a6b874c862defea43235597ca8)
2007-09-04 09:50:07 +10:00
Andrew Tridgell
7423bcaabe up the release number
(This used to be ctdb commit 71a6213c92a12bf794c17c30ae4987149b68fe1b)
2007-08-30 17:51:05 +10:00