mirror of https://github.com/samba-team/samba.git synced 2024-12-25 23:21:54 +03:00
Commit Graph

545 Commits

Author SHA1 Message Date
Martin Schwenke
3380c6ce1d Eventscript functions: add $CTDB_ETCDIR and hook service() functions.
* $CTDB_ETCDIR defaults to /etc but can be changed for testing.  All
  hard-coded instances of /etc have been changed to $CTDB_ETCDIR.
  This includes references to /etc/init.d and /etc/sysconfig.

* service() and nice_service() functions now call new function
  _service().  This makes it easier to override these functions (say,
  in rc.local) for testing and call most of the existing functionality
  using _service().

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit f43c9a7604b779bb6257ddb2bf3cbe266d496a63)
2011-08-03 16:45:54 +10:00
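The hook pattern described in commit 3380c6ce1d can be sketched roughly as follows. This is an illustration only: the body of `_service()` here is a hypothetical stand-in that prints the command it would run, not the actual code from the functions file.

```shell
# Hypothetical stand-in for the real implementation: it only reports
# the init script it would call, honouring an overridable $CTDB_ETCDIR.
_service ()
{
    _name="$1"
    _action="$2"
    echo "${CTDB_ETCDIR:-/etc}/init.d/${_name} ${_action}"
}

# Thin wrapper: a test harness can redefine service() and still reach
# the original behaviour through _service().
service ()
{
    _service "$@"
}
```

A test setup (say, in rc.local) could then redefine `service()` to log its arguments and delegate to `_service "$@"`, while pointing `CTDB_ETCDIR` at a scratch directory.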
Martin Schwenke
d31fbcab4b Set $CTDB_VARDIR in the functions file.
This will be needed when eventscripts that use it are called
externally.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit ebd53b66b0cc66d9d04830781886234167fc2164)
2011-08-03 16:44:49 +10:00
Martin Schwenke
652bf326e1 Eventscripts - 10.interfaces should not check orphaned interfaces.
If the last IP address on an interface is removed then that
interface should no longer be checked by 10.interfaces.  However,
"ctdb ifaces" still lists such interfaces so they are currently
checked.

The problem really needs to be addressed in ctdbd but a neat quick
eventscript fix will be minimally invasive...

This changes the code to use "ctdb -Y ip -v" instead of "ctdb -Y
ifaces".  The former includes details of all public addresses and
associated interfaces, so when an address is removed there is no
output for it.  This avoids orphaned interfaces from being listed.

The logic is also slightly improved so that $IFACES includes just a
(non-uniquified) list of interfaces, allowing an existing loop to be
removed.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 49b2d1bd9554461ed8edbfc21e777c0eca9e1443)
2011-08-02 16:53:14 +10:00
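A rough sketch of the extraction described in commit 652bf326e1 is shown below. The field layout is an assumption (colon-separated machine-readable output with a header line and the configured interfaces as a comma-separated list in field 6); check the output of `ctdb -Y ip -v` on a real node before relying on it. `list_ifaces` is a hypothetical name.

```shell
# Reads "ctdb -Y ip -v"-style lines on stdin and prints every configured
# interface, one per line.  Duplicates are kept (non-uniquified).
# ASSUMPTION: fields are colon-separated, line 1 is a header, and field 6
# holds a comma-separated interface list.
list_ifaces ()
{
    awk -F: 'NR > 1 && $6 != "" {
        n = split($6, a, ",")
        for (i = 1; i <= n; i++) print a[i]
    }'
}
```

Because an address that has been removed produces no output line at all, its interface never reaches the list, which is the point of the change.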
Ronnie Sahlberg
18af72f08f change the name for the key for the record where we store the public address config from public-addresses... to public_addresses...
CQ1019030

(This used to be ctdb commit 114d5034ff4880848588caf493382a537a1469ae)
2011-06-28 15:40:46 +10:00
Mathieu Parent
c262fe6a8f Fix bashism
... again ;-)

Signed-off-by: Michael Adam <obnox@samba.org>

(This used to be ctdb commit 2266586c1839af032622be54dc7f71e39d2bd9ef)
2011-05-14 22:30:25 +02:00
Ronnie Sahlberg
d020b2c950 When using multiple VLANs, some funky stuff can sometimes happen when
adding/removing IP addresses, causing routes to be dropped by the system.

The easiest workaround for this is to unconditionally try to reapply
all static routes for all interfaces once ipreallocation has finished,
not just adding them back on the affected interface.

This works around a funky issue in
CQ S1023538

(This used to be ctdb commit 84600d1f53632d5fe76c308727f31f61b5ec1010)
2011-05-12 12:06:45 +10:00
Ronnie Sahlberg
d1edf44e4f If samba fails to start for some reason, make this cause the startup event to fail too, so that ctdbd will re-try the startup event later.
Or else this will leave samba not running.

CQ S1023394

(This used to be ctdb commit f90485b08d32cbe56050718a3b28ca0fe1d64e0f)
2011-05-10 09:59:38 +10:00
Ronnie Sahlberg
ee9e137759 Don't exit from checking interfaces once we have found one interface that is not
in use by public addresses.  This can happen when we have removed existing interfaces/ip addresses and prevents us from verifying the status of other interfaces

(This used to be ctdb commit d67955b42f7627be9dae995230c8fcbb8a948ec2)
2011-05-10 07:53:43 +10:00
Ronnie Sahlberg
2e2e37fdd6 Remove logging of spam/errors from the 10.interface
script if/when we have for example NATGW configured but no public addresses defined on that interface

CQ S1023378

(This used to be ctdb commit 8837daa424732aeb5a20814b1709c345a97a0e09)
2011-05-09 08:10:49 +10:00
Ronnie Sahlberg
d97e42183e bonding mode 4 monitoring:
we cannot just check if MII Status is up for bonding mode 4, since the kernel will always report the bond device as UP
even if all cables are disconnected.

For mode 4, ignore the status of the bond device and instead check if at least one slave interface is up
when determining if the device is good or bad

(This used to be ctdb commit a6930cec6d9503dba18b9d4839d87a1c1a8ddba2)
2011-04-13 09:05:58 +10:00
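The mode 4 check from commit d97e42183e might look something like this sketch. It assumes the usual layout of a Linux bonding status file (such as /proc/net/bonding/bond0), where each "Slave Interface:" line is followed by that slave's own "MII Status:" line; `bond_has_active_slave` is a hypothetical name, not the function used in the eventscript.

```shell
# Succeeds if at least one slave listed in the bonding status file $1
# reports "MII Status: up".  The bond device's own MII Status line
# (which appears before the first slave) is deliberately ignored.
bond_has_active_slave ()
{
    awk '
        /^Slave Interface:/ { in_slave = 1; next }
        in_slave && /^MII Status:/ {
            if ($3 == "up") found = 1
            in_slave = 0
        }
        END { exit (found ? 0 : 1) }
    ' "$1"
}
```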
Ronnie Sahlberg
c04505724a IFACE handling. Assume links are always good on startup (they almost always
Simplify the handling of setting the links in the 10.interface eventscript
and remove the optimization to only call setifacelink on state change
to make the code simpler to read.

If a take ip event fails, flag the node as unhealthy.

Add a check to the interface script to check if the interface exists
or if it has been deleted.
So that we can capture and become UNHEALTHY if someone deletes an interface
we are using to host public addresses.

(This used to be ctdb commit 4ab63d2a7262aff30d5eced184c294c9c9dd4974)
2011-04-11 07:40:05 +10:00
Ronnie Sahlberg
55853a4683 NATGW: dont set arp_ignore in 11.natgw anymore since we no longer
need this for the natgw functionality

(This used to be ctdb commit bf3bf2967e3781c918e33b3a210e68e0ccca0c51)
2011-04-06 11:33:11 +10:00
Michael Adam
c9dc10292e ctdb.init: print a warning when tdbdump is found but tdbtool or "tdbtool check" is not available
(This used to be ctdb commit afb26e38b617b85cdac14a7cd6dd3c85b8fddbc4)
2011-04-05 13:50:00 +02:00
Michael Adam
faa6d8d7e2 ctdb.init: check for availability of "tdbtool check" and "tdbdump"
Print a warning if neither is available.

(This used to be ctdb commit 4137d2a7d31cdce22847cebfc0239cfe2d8e937c)
2011-04-05 13:43:56 +02:00
Mathieu Parent
a5a6140b7e Correction of spelling errors
* continous -> continuous
* activete  -> activate

(thanks to lintian)

See https://bugzilla.samba.org/show_bug.cgi?id=6935

Signed-off-by: Michael Adam <obnox@samba.org>

(This used to be ctdb commit fb6987c2f747d6dbf9bb3899a480124d1c242a90)
2011-03-23 00:35:23 +01:00
Ronnie Sahlberg
a453e79050 50.samba : Tell winbind about every time we add/remove an ip from the node
CQ S1021636

(This used to be ctdb commit 87b279027616cffbcedfd534ac0032cd51238dfe)
2011-02-18 11:29:35 +11:00
Ronnie Sahlberg
d32a4dd501 remove checking for filesystems and filesystem health from the cnfs script.
remove the gpfsmount and gpfsumount entry points

(This used to be ctdb commit 7db5a4832a9555be53c301f198f72b9e075a8ae7)
2011-02-18 10:11:56 +11:00
Ronnie Sahlberg
ef0ab7eee1 60.nfs
Don't update the statd settings that often.
When we have very many nodes and very many ips, this would generate
a lot of unnecessary load on the system

(This used to be ctdb commit 0c030c9384500f340d8382c20e1e91b11aa377e9)
2011-02-18 10:10:34 +11:00
Martin Schwenke
59c5a9f279 Eventscripts: lower the fail/restart limits for nfsd.
We were potentially leaving a node unable to serve requests for too
long.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 5be8610ffa33db49e33949560d0ef2fa5f3c0c73)
2011-01-11 16:49:46 +11:00
Martin Schwenke
96378d6dc8 Eventscripts: use "startstop_nfs restart" to reconfigure NFS.
This was defaulting to just "service nfs restart", which doesn't have
the workarounds we need.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 0f462e9e9fe12b595f3c7452123db8e69548abd6)
2011-01-11 16:49:14 +11:00
Martin Schwenke
3efd5ef77c Eventscripts: only autostart during a monitor event.
Otherwise we might short-circuit events that are run only once and
actually need to do something.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit c4f9e8a43540bc049b2771e0a2d76d37b9d17331)
2011-01-11 16:48:50 +11:00
Martin Schwenke
fb8f199651 Eventscripts: print a message when reconfiguring a service.
Otherwise there can be strange error messages from services
stopping/starting, without any context.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 8bcf7ab164429ddc0ae530133e114f186a8146dd)
2011-01-11 16:48:17 +11:00
Martin Schwenke
934ae76d38 Eventscripts: work around NFS restart failure under load.
"service nfs restart" can fail.  To stop nfsd it sends a SIGINT and
nfsd might take a while to process it if the system is loaded.
Starting nfsd may then fail because resources are still in use.

This does some /proc magic to tell nfsd to do no more processing.  It
then runs service stop, kills nfsd with SIGKILL, and then runs service
start.  This is much less likely to fail.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit a9bf4f82852975b0b627f61ceb2d23401f630805)
2011-01-11 16:47:43 +11:00
Ronnie Sahlberg
47aad74673 TYPO
(This used to be ctdb commit 38dc1ac2e87416a22c9356596286b773d601e71c)
2011-01-11 16:17:33 +11:00
Ronnie Sahlberg
2a3442d972 STATD is 100027 not 1000247
(This used to be ctdb commit f4cf15a2b06ffefde0cba803603b48040ad0fa05)
2011-01-11 16:16:28 +11:00
Ronnie Sahlberg
7e747aab8d 60.nfs Check if we have rpc.statd and if not, skip checking for statd
availability at all (since we can't restart it, there is no point checking
if it is alive)

(This used to be ctdb commit 6075e85ba6c0f58fd1ab2ce3b09dd3d6ff491365)
2011-01-06 15:49:15 +11:00
Ronnie Sahlberg
ded7c23122 41.HTTPD
Httpd can be very slow to start on some platforms,
wait 5 monitor intervals before we try to restart it if
it has not bound to port 80 yet.
After 10 failed intervals, flag the node as unhealthy.

(This used to be ctdb commit 6ec1993aa5f2778b8227ce5f6eca0d19e4ae9788)
2010-12-22 10:31:41 +11:00
Ronnie Sahlberg
e9ff38be7d 60.nfs
Try to restart LOCKD after 10 failures and
flag the node as unhealthy after 15 failures

(This used to be ctdb commit 5a67889c9166835aef3443051812d14af07dfca5)
2010-12-22 10:31:31 +11:00
Ronnie Sahlberg
57e74f6d8a Dont run net serverid wipe in the background
(This used to be ctdb commit 76c515f9f05f4fb5683b5ff65cf136c168fd882f)
2010-12-22 10:31:26 +11:00
Ronnie Sahlberg
97a6eccaf7 50.samba
Net serverid wipe can take a bit of time sometimes so background it.

Only perform auto start/stop of the managed service on the monitor event

(This used to be ctdb commit deba5cbbf7703a1a24ce88a06c73fca056e05521)
2010-12-14 21:19:28 +11:00
Ronnie Sahlberg
1e41ab5fa3 LVS
update lvs configuration on ipreallocated events too

(This used to be ctdb commit a4e98073d955676fdcbb91affae1de1a733d0bc2)
2010-12-13 14:24:16 +11:00
Ronnie Sahlberg
c26c6a01cf only run "serverid wipe" if we are actually running samba.
we dont need to run this on systems where we do run winbind but not samba

(This used to be ctdb commit fcb9e8d1e1c78439ea42adb8b05ad84fbca7f724)
2010-12-10 13:42:12 +11:00
Ronnie Sahlberg
8147d29598 add a missing part of the import of the previous ganesha patch
(This used to be ctdb commit 171b8855bb2feae7f7dd6a079571f3113dedd6f4)
2010-12-06 11:50:15 +11:00
Chandra Seetharaman
5e485d5ca0 make changes to ctdb event scripts to support NFS-Ganesha.
make changes to ctdb event scripts to support NFS-Ganesha.

Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com>
Signed-off-by: Ronnie Sahlberg <ronniesahlberg@gmail.com>

(This used to be ctdb commit 7298588ed54492f106954c893dd86b0a36783470)
2010-12-06 11:50:12 +11:00
Ronnie Sahlberg
8959c8e850 dont try starting samba through the "init" event
(This used to be ctdb commit e314a449606418a4c4eac6eb319bfcdf1c398cd3)
2010-12-03 11:40:38 +11:00
Ronnie Sahlberg
6ed0009125 When we are no longer the natgw master, dont put the natgw ip on loopback.
We put the ip on loopback just to make sure we would still interoperate with
non-standard configurations on unix-KDC, that are configured to verify the optional
HostAddresses field.
This is not required for AD, since AD does not use this field, and is replaced in
unix land with other/better mechanisms than this "dodgy" check.

This makes it "easier" for applications that have bound to the natgw address
to detect a socket problem and try to reconnect/recover if the ip address
is completely missing from the system.

At the same time, use the winbind specific hook that exists to explicitly tell winbindd: this address is gone, so if you have bound to it, this is a good time to close and rebind your socket.

cq 1020333

(This used to be ctdb commit 0da94869d2912b2a412ba3fbd2137d88ce4e4389)
2010-11-29 12:45:59 +11:00
Ronnie Sahlberg
ebcc866ae0 update autostart/stop to work for samba
(This used to be ctdb commit 37ab57e2adaecc3f7996ea20af45a5df0cd8be76)
2010-11-22 20:42:26 +11:00
Ronnie Sahlberg
a3e7dfadca add an explicit _is_managed_service to iscsi eventscript
(This used to be ctdb commit 44f683a1ba15944d3306a0effd572de3280ff975)
2010-11-18 14:15:56 +11:00
Ronnie Sahlberg
193d9d50d1 Dont pollute the logs with a "file not found" message
CQ S1020745

(This used to be ctdb commit ea8bb7b26bb879a895c267d49672433182390d0d)
2010-11-18 13:54:15 +11:00
Martin Schwenke
c00db6f271 60.nfs eventscript should do nothing if NFS isn't managed by CTDB.
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 582e5cd077501e8d4131a9c7981781471308edfd)
2010-11-18 13:36:40 +11:00
Martin Schwenke
a2af87482b Eventscript functions - catch failures in ctdb_service_start().
ctdb_service_start() currently succeeds if ctdb_counter_init()
succeeds.

This changes it to fail when a service start fails.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit ddb73962d72d933bf0edc28be0dbb45bea7e5ef4)
2010-11-18 12:15:05 +11:00
Martin Schwenke
3ab768e8d4 50.samba eventscript should stop/start services when they become (un)managed.
When the value of $CTDB_MANAGES_SAMBA or $CTDB_MANAGES_WINBIND changes (or
corresponding changes are made to $CTDB_MANAGED_VERSIONS), the
associated service should be started or stopped as necessary.

This adds calls to ctdb_start_stop_service() to manage
starting/stopping samba and winbind.

An associated cleanup is made to the initial checks that one of
$CTDB_MANAGES_SAMBA or $CTDB_MANAGES_WINBIND is set, replacing them
with calls to is_ctdb_managed_service().

To handle the winbind cases ctdb_start_stop_service() and
is_ctdb_managed_service() are updated to take an optional service name
parameter.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit d98f175e8420d921a123ae9c0ce00945350b1537)
2010-11-18 12:12:30 +11:00
Ronnie Sahlberg
4fe85e5be5 add a new support function ctdb_check_counter_equal()
update nfs to try to restart the service after 10 consecutive failures
and to flag the node unhealthy after 15

add similar function to mountd

(This used to be ctdb commit 1569a54bb82fc433895ed68f816cf48399ad9d40)
2010-11-17 13:54:57 +11:00
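The restart-then-unhealthy thresholds from commit 4fe85e5be5 can be illustrated with a small file-backed counter. Everything here is a hypothetical sketch: the helper names, the counter file location and the exact comparison are assumptions, and the real ctdb_check_counter_equal() in the functions file may work differently.

```shell
# Hypothetical failure counter kept as one line per failure in a file.
COUNTER_FILE="${TMPDIR:-/tmp}/example_failcount.$$"

counter_init ()  { : > "$COUNTER_FILE"; }
counter_incr ()  { echo x >> "$COUNTER_FILE"; }
counter_value () { wc -l < "$COUNTER_FILE" | tr -d ' '; }

# Succeeds only when the failure count is exactly $1.
counter_equal () { [ "$(counter_value)" -eq "$1" ]; }

# Called once per monitor event in which the service check failed.
# Prints the action the caller should take.
on_check_failure ()
{
    counter_incr
    if counter_equal 10 ; then
        echo "restart"
    elif [ "$(counter_value)" -ge 15 ] ; then
        echo "unhealthy"
    else
        echo "wait"
    fi
}
```

Testing for equality rather than "at least 10" means the restart is attempted once, on the tenth consecutive failure, instead of on every later monitor event. A successful check would call `counter_init` to reset the count.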
Martin Schwenke
8fe1ec3754 Eventscripts: make loadconfig() function hookable by the test suite.
Rename loadconfig() to _loadconfig().  Add a new loadconfig() that
simply calls _loadconfig().

This makes it easy for the test suite to override loadconfig().

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 1d77a3adfff893b3c01b87f791e72c0d3148425c)
2010-11-17 11:46:48 +11:00
Martin Schwenke
e23ca7dba5 Make a time comparison in 60.nfs eventscript more readable.
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 26077e6c8eb126584af587e7416154ea4858aea2)
2010-11-17 11:44:26 +11:00
Martin Schwenke
6ab5ae2c9b 60.nfs only fails or warns after 10 consecutive nfsd/statd failures.
These failures are sometimes the result of slow restarts so we want to
avoid dirtying the logs or marking a node unhealthy because of them,
unless they are excessive.

For these 2 cases we use the existing fail counting code but hack a
temporary service_name in a subshell to allow separate fail counts.

We also update ctdb_check_rpc() so that it captures the error output
from rpcinfo and we add a message including the service name to the
beginning.  The error is printed to stdout but is also stored in
ctdb_check_rpc_out to allow it to be conditionally used by the caller.
This function also now returns non-zero rather than exiting on
failure.

Other direct rpcinfo calls are replaced by calls to ctdb_check_rpc()
for consistency.

Option handling code for service restarts is cleaned up so that it fits
in 80 columns.  A more informative restart message is now used in all
cases, printing the exact command being used to start a service.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 79c25fe241cf5d8f92e23d3736823ebaf4e1769d)
2010-11-17 11:43:09 +11:00
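The behaviour described for ctdb_check_rpc() in commit 6ab5ae2c9b (capture the rpcinfo error, prefix it with the service name, store it, and return non-zero rather than exit) could be sketched as below. `check_rpc_sketch` is a hypothetical name and the message wording is invented; only the variable name ctdb_check_rpc_out comes from the commit message.

```shell
# Sketch only: probe an RPC service over UDP on localhost.
# $1 = service name for the message, $2 = RPC program number,
# $3 = program version.
# On failure the error text is printed to stdout, kept in
# ctdb_check_rpc_out for the caller, and 1 is returned (no exit).
check_rpc_sketch ()
{
    _name="$1" ; _prog="$2" ; _vers="$3"

    if ! ctdb_check_rpc_out=$(rpcinfo -u localhost "$_prog" "$_vers" 2>&1) ; then
        ctdb_check_rpc_out="ERROR: ${_name} failed RPC check:
${ctdb_check_rpc_out}"
        echo "$ctdb_check_rpc_out"
        return 1
    fi
}
```

A caller that only wants to log after repeated failures can run `check_rpc_sketch nfs 100003 3 >/dev/null` and print `$ctdb_check_rpc_out` later if its fail counter crosses the threshold.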
Ronnie Sahlberg
055eafb790 this stuff is just so fragile that it will enter infinite recovery and fail loops
on any kind of tiny unexpected error

unconditionally try to remove ip addresses from both old and new interface
before trying to add it to the new interface to make it less
fragile

(This used to be ctdb commit 80acca2c91c9053c799365bae918db7ed8bdc56f)
2010-11-10 14:55:25 +11:00
Ronnie Sahlberg
ebed26d755 delete from old interface before adding to new interface
this stops the script from failing with an error if
both interfaces are specified as the same, which otherwise breaks and leads to an infinite recovery loop

(This used to be ctdb commit 565de03a784ed441490f8cd0b137b5cec8716d55)
2010-11-10 14:55:25 +11:00
Ronnie Sahlberg
76578b9533 dont delete all ips from the system during the initial "init" event
leave any ips as they are and let the recovery daemon remove them as required

(This used to be ctdb commit 8ab311719857847b4cf327507b0af1793551e73c)
2010-11-10 14:55:23 +11:00
Ronnie Sahlberg
a1cfa23d60 Both nfs and nfslock scripts can fail under redhat in very rare situations.
Ctdb can also be configured to ignore checking for knfsd and if it is alive.
In that situation, no attempt will be made to restart nfs, and since nfs is not running, lockd cannot be restarted either.

To work around this, every time we try to restart the lockmanager, also try to restart nfsd

(This used to be ctdb commit 953dbfbddad656a64e30a6aca115cb1479d11573)
2010-10-28 13:45:40 +11:00
Ronnie Sahlberg
0d75856bb7 When shutting down, we always unconditionally try to remove the natgw address
even if we are not currently the natgw master.
This adds extra reliability in case we have stopped previously without removing it properly,
but does add spam messages to syslog every time we shut down.

Stop these spam messages from polluting the syslog upon normal shutdown

(This used to be ctdb commit cd84da6f247ee46bbab8318298d1cd3cfc87aba9)
2010-10-28 13:38:07 +11:00
Ronnie Sahlberg
14c8228292 Redirect the output from 00.ctdb pfetch to stdout.
Normally, the config.tdb database would not exist, so we do not need
to spam syslog with a "config.tdb does not exist" message every time we start ctdb

(This used to be ctdb commit 5792809b72e534161c5ca9ef5c9897abcb3b899c)
2010-10-28 13:35:55 +11:00
Stefan Metzmacher
ab6beb6b7f events.d/11.routing: handle "updateip" event
metze

(This used to be ctdb commit 034635418c7e5274d6bdf4cccc7a10e3b631e2d4)
2010-10-21 11:09:46 +11:00
Ronnie Sahlberg
b4e3a95039 try to restart NFS LOCKD if it failed to start
(This used to be ctdb commit 2913cc93a9a172caf9e0d6675cfa4de4cc957b13)
2010-10-14 08:13:09 +11:00
Ronnie Sahlberg
0de79c12ba Make sure the statd directory exists before trying to access the
"update trigger" file.

CQ 1020344

(This used to be ctdb commit 171f98f6f7ce7d01f47c44043ad599702711b12d)
2010-10-12 08:02:18 +11:00
Ronnie Sahlberg
842d9aab4e move extracting the config from config.tdb for public addresses
into its own function

(This used to be ctdb commit 2d478a39ed8303b0371112d61630660d12b7db2c)
2010-10-12 02:57:53 +11:00
Ronnie Sahlberg
f7febd28af dont stop checking interfaces after the first bond device
continue the loop to process all other interfaces too

(This used to be ctdb commit 500ade4e6a58ea786a665f6be7cf30f43c882570)
2010-10-09 10:55:43 +11:00
Ronnie Sahlberg
51a38dc4a4 Spotted by rusty.
Add a missing $
so we delete $_ip   and not _ip

(This used to be ctdb commit e9d04c5f419eaa0338a3beefba32c52be00242a8)
2010-10-08 15:53:36 +11:00
Ronnie Sahlberg
f5c0539dc6 Change how NATGW is configured to allow special nodes that do not have
network connectivity outside of the cluster to still be able to
participate in a natgw group.
These nodes can not become natgw master since they lack external network
connectivity.

These nodes are configured just the same way as for any other node with
NATGW, with the following two exceptions :
* we do NOT set CTDB_NATGW_PUBLIC_IFACE at all on these nodes.
  since these nodes lack external network we should not check the interface
  for link.
* we must set CTDB_NATGW_SLAVE_ONLY=yes to flag that this is a node that
  can not become natgw master.

(This used to be ctdb commit ab7b00a37e55beffc074be95b55d8a5c7cb9eef2)
2010-09-08 09:20:16 +10:00
Ronnie Sahlberg
dc2f87737d Dont store temporary runtime data in $CTDB_BASE/state
since that will usually be /etc/ctdb/state and storing this under /etc is just
wrong.

Add a new variable CTDB_VARDIR that defaults to /var/ctdb and store the data there instead.

(This used to be ctdb commit 516423c25afa9861d9988096efa8a4a2b12b31b1)
2010-09-03 12:43:28 +10:00
Ronnie Sahlberg
c7df27e32d make sure all statd state directories exist before we try to reference them
or else tar and friends will throw an error in the log

(This used to be ctdb commit 96cbd2c0aa9a4641a42b3c33374675fa732ed1e5)
2010-09-01 15:49:57 +10:00
Ronnie Sahlberg
8be5bf1567 dont print a lot of log information about shutting down vsftpd
(This used to be ctdb commit 1a41cd7332703629001201eea8ae9b94f1341c9d)
2010-09-01 13:29:38 +10:00
Ronnie Sahlberg
9ef21f1c07 ouch, remove a dummy debug printout that snuck in there somehow
(This used to be ctdb commit 14c4d99513b4bdb94f60c3e9c4823e04b0833e60)
2010-08-30 19:48:41 +10:00
Ronnie Sahlberg
2b4d9170c2 Merge commit 'martins/master'
(This used to be ctdb commit cc8c851e2e0b46f00b18a6dc61fd2774e97850dd)
2010-08-30 18:22:05 +10:00
Ronnie Sahlberg
12cc826231 Remove the dependency on the underlying cluster filesystem for handling
the clusterwide persistent data associated with the lock manager and
statd notifications.

Use persistent databases to store this data instead of a shared directory.

(This used to be ctdb commit fc0678d351187cfa4c71123f97c0f493aacd5d16)
2010-08-30 18:14:41 +10:00
Ronnie Sahlberg
c95f4258d8 Add a new event "ipreallocated"
This is called every time a reallocation is performed.

    While STARTRECOVERY/RECOVERED events are only called when
    we do ipreallocation as part of a full database/cluster recovery,
    this new event can be used to trigger on when we just do a light
    failover due to a node becoming unhealthy.

    I.e. situations where we do a failover but we do not perform a full
    cluster recovery.

    Use this to trigger for natgw so we select a new natgw master node
    when failover happens and not just when cluster rebuilds happen.

(This used to be ctdb commit 7f4c591388adae20e98984001385cba26598ec67)
2010-08-30 18:09:30 +10:00
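An eventscript reacting to the new event from commit c95f4258d8 might dispatch on its first argument along these lines. This is a minimal sketch: `handle_event` and `select_natgw_master` are hypothetical placeholders, and the real natgw script does considerably more.

```shell
# Hypothetical placeholder for the real natgw master selection.
select_natgw_master () { echo "reselecting natgw master"; }

# $1 is the event name passed to the eventscript.
handle_event ()
{
    case "$1" in
        recovered|ipreallocated)
            # Full recoveries and light failovers both trigger reselection.
            select_natgw_master
            ;;
        *)
            # Events this script does not care about are ignored.
            ;;
    esac
}
```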
Martin Schwenke
a104d1d823 NFS tickles: use addtickle/deltickle instead of shared tickle directory.
This adds a new function update_tickles() that tracks tickles for a
given port using the new ctdb addtickle/deltickle commands.  This
function is used in events.d/60.nfs to handle NFS tickles.

events.d/61.nfstickle is removed.  The
/proc/sys/net/ipv4/tcp_tw_recycle setup is also moved to
events.d/60.nfs.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit dca4c4ebf3c35f8db3ae208efb7a83abbf726ed6)
2010-08-26 14:59:59 +10:00
Ronnie Sahlberg
3edec07807 Add a configuration database, implemented as a persistent database.
This database can be used, as an option, to store
the public address assignment instead of editing the /etc/ctdb/public-addresses file manually.

This configuration is stored in one record per key, with a key-name of
public-addresses:node#<pnn>
where <pnn> is the node number.

The content of this record is the same syntax as the /etc/ctdb/public-addresses file.

When ctdbd starts, if this key exists and contains data, it is extracted from the database and compared with the normal file /etc/ctdb/public-addresses.

If the content differs, the config database "wins" and is used to overwrite/update the /etc/ctdb/public-addresses file, after which ctdbd is restarted.

The main benefit with this option is that it can be used to update the public address configuration for nodes that are offline/unreachable by updating their configuration in the persistent database.
Once the offline node is available again, it will resync its databases with the rest of the cluster, find out that the config has changed, apply the changes and restart ctdbd automatically.

The command to store the public address configuration for a node into the persistent database is :

ctdb pstore config.tdb public-addresses:node#<pnn> <filename>

where <pnn> is the node# we wish to update the config for, and <filename> is a file containing the new content for that node's public address configuration.

(This used to be ctdb commit 292d7435a360efd7f15a7a99f658a605e07c0a81)
2010-08-25 11:49:56 +10:00
Ronnie Sahlberg
2e8aac6689 Merge commit 'rusty/ports-from-1.0.112' into foo
(This used to be ctdb commit 13e58d92f5f1723e850a82ae030d0ca57e89b1ee)
2010-08-19 13:17:56 +10:00
Ronnie Sahlberg
729f1ddea0 On RHEL, "service nfs stop;service nfs start" and "service nfs restart"
sometimes (very rarely) fail to restart the service.

    Add a function to restart NFSd on SLES and RHEL-like systems.

    If we detect the system is unhealthy due to kNFSd not running,
    try to restart the service again "service nfs restart" and
    hope for the best.

CQ1019372

(This used to be ctdb commit 25c4ce7e919f13226219f036bcffd2be76b2f06c)
2010-08-19 07:18:22 +10:00
Martin Schwenke
6ce1501aa1 Move NAT gateway firewall rules to recovered|updatenatgw events.
The existing code wasn't working as designed in the start event.  It
should work here.

BZ: 62613
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit aeb70c7e7822854eb87873a5c7783e27e6e72318)
2010-08-18 11:40:07 +09:30
Martin Schwenke
b930c885b3 initscript: wait until we can ping ctdbd before setting tunables.
Currently we do a "sleep 1" after starting and before running
set_ctdb_variables to set the tunables.  This is too arbitrary and
might fail if the system is heavily loaded.  This, for example, could
result in some nodes running with DeterministicIPs and some without,
in which case a different IP allocation algorithm would run depending
on who is the recmaster!

This makes the start function wait until "ctdb ping" succeeds (with 10
second timeout) before trying to run set_ctdb_variables.  If a timeout
occurs then the start function attempts to kill ctdbd before exiting
with a failure.

It also cleans up the status reporting code for Red Hat and SUSE so
that the final status code is reported.  Currently there are cases
where a correct status is prematurely reported before a failure
occurs.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit cdcd05662a30b51caaeeab4ac44138cac2474e0a)
2010-08-05 15:29:40 +10:00
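The wait-with-timeout logic described in commit b930c885b3 can be sketched generically as follows. `wait_until` is a hypothetical helper, not the code from the initscript.

```shell
# Runs the command given after $1 roughly once per second until it
# succeeds.  Gives up and returns 1 after $1 failed attempts.
wait_until ()
{
    _timeout="$1" ; shift
    _t=0
    while ! "$@" >/dev/null 2>&1 ; do
        _t=$((_t + 1))
        [ "$_t" -ge "$_timeout" ] && return 1
        sleep 1
    done
    return 0
}
```

In a start function this would be used as `wait_until 10 ctdb ping`, with the failure branch killing ctdbd and reporting an error before any tunables are set.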
Martin Schwenke
fe64a8f87a Optimise 61.nfstickle to write the tickles more efficiently.
Currently the file for each IP address is reopened to append the
details of each source socket.

This optimisation puts all the logic into awk, including the matching
of output lines from netstat.  The source sockets for each
destination IP are written into an array entry and then each array
entry is written to the corresponding file in a single operation.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 6549e9b01538998d51a5f72bfc569776d232b024)
2010-07-30 16:50:18 +10:00
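The array-per-destination approach from commit fe64a8f87a can be illustrated with the sketch below. It assumes the input has already been reduced to "destination-ip source-socket" pairs, whereas the real script also matches the netstat output inside awk; `write_tickles` is a hypothetical name.

```shell
# Reads "destination-ip source-socket" pairs on stdin and writes one
# file per destination IP under directory $1.  Each file is opened once
# and written in a single operation.
write_tickles ()
{
    awk -v dir="$1" '
        { socks[$1] = socks[$1] $2 "\n" }
        END {
            for (ip in socks) {
                f = dir "/" ip
                printf "%s", socks[ip] > f
                close(f)
            }
        }
    '
}
```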
Stefan Metzmacher
794230775c events/10.interface: we need to mark interfaces as "up" if we don't know how to monitor them
metze

(This used to be ctdb commit 1e08d1578d1960fcfc5fdd85492fbd6d194e5e94)
2010-07-30 16:33:27 +10:00
Stefan Metzmacher
7b1345d446 config/interface_modify.sh: do the echo before running the script
metze
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

(This used to be ctdb commit bb1d2bd31073304fc203868517144f61d12b7fc2)
2010-07-15 15:06:51 +09:30
Stefan Metzmacher
3b9eeb1049 config/interface_modify.sh: before calling a script check if it exists and is executable
For non bash shells $_s_script might end with '/*'.

We do the workaround this way, because it makes sense to check
that a script is executable, before trying to execute it.

metze

[ This actually applies to any shell -- Rusty Russell ]
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

(This used to be ctdb commit e665cfde03fc9ec2264e99512ed5470872a2fd04)
2010-07-15 15:06:39 +09:30
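The guard described in commit 3b9eeb1049 (together with the echo-before-run ordering from 7b1345d446) might look like this sketch. `run_scripts` is a hypothetical helper, not the code in interface_modify.sh.

```shell
# Runs every executable regular file in directory $1, passing along the
# remaining arguments.  If the directory is empty the glob is left
# unexpanded as "$1/*", which fails the tests below and is skipped.
run_scripts ()
{
    _dir="$1" ; shift
    for _s in "$_dir"/* ; do
        [ -f "$_s" ] && [ -x "$_s" ] || continue
        # Announce the script before running it, so that any output it
        # produces has context in the log.
        echo "running $_s"
        "$_s" "$@"
    done
}
```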
Rusty Russell
34ce8a4f02 config: wrap iptables in flock to avoid concurrency.
When doing a releaseip event, we do them in parallel for all the separate
IPs.  This creates a problem for iptables, which isn't reentrant, giving
the strange message:
	iptables encountered unknown error "18446744073709551615" while initializing table "filter"

The worst possible symptom of this is that releaseip won't remove the rule
which prevents us listening to clients during releaseip, and the node will be
healthy but non-responsive.

The simple workaround is to flock-wrap iptables.  Better would be to rework
the code so we didn't need to use iptables in these paths.

CQ:S1018353
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

(This used to be ctdb commit 72d6914ee913272312d7b68f1be5ad05ad06587d)
2010-07-15 10:45:24 +09:30
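The flock workaround from commit 34ce8a4f02 amounts to serialising every call behind one lock file. A minimal sketch, assuming flock(1) from util-linux is installed; `with_lock` and the lock file path are hypothetical.

```shell
# Runs the command given after $1 while holding an exclusive lock on
# the file $1.  Concurrent callers using the same lock file block until
# the current holder finishes.
with_lock ()
{
    _lock="$1" ; shift
    flock "$_lock" "$@"
}
```

An eventscript could then define a wrapper such as `iptables () { with_lock "$CTDB_VARDIR/iptables.flock" /sbin/iptables "$@"; }` so that parallel releaseip events no longer run iptables at the same time. The binary path and lock file name in that wrapper are assumptions.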
Ronnie Sahlberg
004b849feb Dont check linkstatus for loopback. This interface never has
issues with the physical layer

(This used to be ctdb commit d938b80a1c409a9ec4b554ddca5b0d949be53d9e)
2010-06-01 14:51:09 +10:00
Ronnie Sahlberg
db9e00eec8 Prevent clients from connecting to the natgw address.
This address is dedicated for outgoing connections.

BZ62613

(This used to be ctdb commit f0e48dd833a4408449083148c172c2136b934e5b)
2010-06-01 12:43:32 +10:00
Ronnie Sahlberg
ad2b7c28b6 Add monitoring of quorum and make the node UNHEALTHY when quorum is lost
(This used to be ctdb commit d58b575e15015c5ef9493ab3ad3e8657c5787e2c)
2010-05-25 12:46:28 +10:00
Ronnie Sahlberg
03b112cb33 in 62.cnfs, lines in /etc/exports can have the exports quoted,
so strip off any initial " on the exports line

(This used to be ctdb commit dce2244e8ac6617c335cfcd721c3795071b9f2b2)
2010-05-25 12:46:08 +10:00
Michael Adam
b40fa22239 functions: when checking for a directory also check whether it can be accessed.
Thanks to "waKKu" on irc for this improvement.

Michael

(This used to be ctdb commit 81e1483dd0ce2cd091721e456c0c194cc58442f3)
2010-05-11 11:29:45 +02:00
Ronnie Sahlberg
1cb2b0b2d0 Add a new eventscript 62.cnfs to integrate better with gpfs/cnfs
(This used to be ctdb commit 4a679422dc231aa98605b9cc322e4ab442f7bde4)
2010-05-04 13:56:55 +10:00
Ronnie Sahlberg
d6ae1c4173 If the admin makes a configuration mistake and configures NATGW to use the
same ip address as a normal public-address,
check for this in the natgw script and warn the user.

Also prevent ctdb from starting up since this configuration will not work.

BZ60933

(This used to be ctdb commit 480af69b63b9162c85d8e04461ca9e4a083c04a4)
2010-04-28 08:51:06 +10:00
Ronnie Sahlberg
2d9fee4f85 Add a setting where CTDB will monitor and warn for low memory conditions.
CTDB_MONITOR_FREE_MEMORY_WARN

BZ 59747

(This used to be ctdb commit 83446b2e7e28e3ed6627c1950053018b8799984a)
2010-04-23 09:08:38 +10:00
Ronnie Sahlberg
8ef5db522a In the example script to remove all ip addresses after a ctdb crash,
add the NATGW address as one to be removed in addition to the
public addresses.

(This used to be ctdb commit 234b86fb19aae7a43f1dd2c0f69b03164fe5aaca)
2010-04-23 09:08:26 +10:00
Ronnie Sahlberg
4f191982ca add an example script that can be called from crontab to cleanup
and release public ip addresses if ctdbd is no longer running

(This used to be ctdb commit 1cdaaa0a3f53d1b075340a33dfdc42b534e99187)
2010-04-22 14:23:02 +10:00
Ronnie Sahlberg
40434a7c98 add a missing ||
to make the 10.interface script not fail with a syntax error

(This used to be ctdb commit a9831070344a6dcf46c55250f9d74a5870f37dfe)
2010-04-22 14:22:46 +10:00
Martin Schwenke
f765f0ceca Fix a thinko in 2ea0a9f1a93781a0d036feb9fcc0d120b182922f.
If the driver is virtio_net then we assume that the link is up rather
than ignoring the check altogether.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 3044d07da2a58260fa06bf489890b279bcf3ec39)
2010-04-20 10:52:31 +10:00
Ralph Wuerthner
d2f7bf804c ethtool does not support virtio_net devices.
Skip link test for this type of devices

Signed-off-by: Ralph Wuerthner <ralph.wuerthner@de.ibm.com>
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 2ea0a9f1a93781a0d036feb9fcc0d120b182922f)
2010-04-15 16:38:19 +10:00
Michael Adam
df77489477 events:50.samba: wipe the local part of the serverid db before starting winbind/smbd/nmbd
This is necessary for the new serverid approach.

Michael

(This used to be ctdb commit 8956f32e571093db7f285b83e4dd32960f8afc7c)
2010-03-29 17:05:06 +11:00
Stefan Metzmacher
940e58bf3f config: let 13.per_ip_routing use a flock for generate_auto_link_local()
metze

(This used to be ctdb commit dc2d0d0e559308ad2676f9ad973746c147d65eb9)
2010-03-18 11:57:16 +01:00
Ronnie Sahlberg
d4f7a59960 Merge root@10.1.1.27:/shared/ctdb/ctdb-git
(This used to be ctdb commit e59310132d8126ee3afc191b5db56e80a32986e8)
2010-03-11 18:15:41 +11:00
Wolfgang Mueller-Friedt
e26a26fd7a ctdb_setstatus in /etc/ctdb/functions was not working correctly because it was called with a wrong parameter list
(This used to be ctdb commit e1e285d9f7fa3237dbbacca52a4eb2b264fa5986)
2010-03-11 17:52:42 +11:00
Mathieu Parent
c57c06df8c Fix some more bashisms
(This used to be ctdb commit 3d82ca5b1b8ba2770c739493aa0cdd34bb4827d8)
2010-03-10 17:41:40 +11:00
Mathieu Parent
e7bca0dcfc Correct nice_service()
nice takes a binary as argument and not a function or builtin command

(This used to be ctdb commit e21b40db64b314a24caa2bc611cb48b93decb5aa)
2010-03-10 17:39:56 +11:00
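The point made in commit e7bca0dcfc above, that nice needs a binary rather than a function or builtin, can be demonstrated with a small sketch. The names `run_niced` and `say_hello` are hypothetical and exist only for this illustration; the actual change to nice_service() is not shown in the log.

```shell
#!/bin/sh
# Hypothetical wrapper: run an external command at lower priority.
# "$@" must name an executable that can be found via $PATH, because
# nice exec()s its argument instead of asking the shell to run it.
run_niced() {
    nice "$@"
}

# A shell function exists only inside the current shell process.
say_hello() {
    echo "hello"
}

# Works: sh is a binary, so nice can exec it.
run_niced sh -c 'echo "hello from a child shell"'

# Fails: nice cannot see the say_hello function defined above.
run_niced say_hello 2>/dev/null || echo "nice could not run a shell function"
```

One way around the limitation is to hand nice a real program (for example an init script or `sh -c '...'`) and do the function-level work inside that program.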
Michael Adam
ff48fc3933 fix bug #7152: check NFS-Shares, fails with too long path-names
Thanks to Thomas Sesselmann <t.sesselmann@dkfz.de> .

Michael

(This used to be ctdb commit da5fc07baa9aa806c3cba52c00fb10cf8b7f2dc5)
2010-02-23 21:08:23 +11:00
Stefan Metzmacher
e44c2396a7 config/13.per_ip_routing: fix typo in error message
metze

(This used to be ctdb commit 4b06665b77cb24d488f4ef03cc9ad5fd5d0feb0e)
2010-02-23 10:38:50 +01:00
Stefan Metzmacher
d79a70bca3 config/13.per_ip_routing: use better names for release_script and setup_script
As the basename of the script will be used for the readd script
from setup_iface_ip_readd_script, it's now easier to identify
what script is called by delete_ip_from_iface() while readding
ips to the interface.

metze

(This used to be ctdb commit 3ee225b0b6ed37c22478bd145ced56b1b9b86842)
2010-02-23 10:38:50 +01:00
Stefan Metzmacher
08d69d2cec config/13.per_ip_routing: register the setup script with setup_iface_ip_readd_script()
This is needed because we need to resetup the routing table when
the delete_ip_from_iface() function readds the ip to the interface.

metze

(This used to be ctdb commit ea87185ec9977006ef72d5a68c875154e4c84099)
2010-02-23 10:38:50 +01:00
Stefan Metzmacher
3a0d830e4c config/13.per_ip_routing: add a setup_per_ip_routing() function
This combines the logic into a shell function which can be used by the
"takeip" and "updateip" hooks.

We check the return values of the "ip" commands now
instead of ignoring them.

We now create a setup_script.sh similar to the release_script.sh
which makes it easier to analyze problems.

metze

(This used to be ctdb commit 624e8878851b4957cc7c02e922ec86926d6927ee)
2010-02-23 10:38:49 +01:00
Stefan Metzmacher
3419e9c4dd server: add "setup" event
This is needed because the "init" event can't use 'ctdb' commands.

metze

(This used to be ctdb commit 1493436b6b24eb05a23b7a339071ad85f70de8f4)
2010-02-23 10:38:49 +01:00
Stefan Metzmacher
061c2a7182 config/10.interface: use delete_ip_from_iface also in the "init" event
metze

(This used to be ctdb commit e2bc5c25116747c58505fe1cb3e2d164257377d1)
2010-02-23 10:38:49 +01:00
Stefan Metzmacher
90769bf4eb config/11.natgw: use delete_ip_from_iface() instead of remove_ip()
This also initializes the variables correctly for the
shutdown|removenatgw code path to delete_all.

metze

(This used to be ctdb commit 2c2cbed4fcbc868a990fa6b32fc96126ffc61bb5)
2010-02-23 10:38:48 +01:00
Stefan Metzmacher
d71c40cad7 config: make remove_ip() a wrapper of delete_ip_from_iface()
metze

(This used to be ctdb commit e66d6636b80e3614f183366ec92fc3c6d5c323da)
2010-02-23 10:38:48 +01:00
Stefan Metzmacher
3bd1910428 config: interface_modify states in a $CTDB_BASE/state/interface_modify directory
metze

(This used to be ctdb commit 756c8b953fef7132dae74b5b244baeb3108dec54)
2010-02-23 10:38:48 +01:00
Stefan Metzmacher
d8ab328ee1 config: add setup_iface_ip_readd_script() helper function
This adds a generic infrastructure to register scripts which will
be called when the delete_ip_from_iface() function needs to readd
secondary ips to an interface.

metze

(This used to be ctdb commit ac97d65f44e8dc8bf2ec8f68e4db3448521755a2)
2010-02-23 10:38:47 +01:00
Stefan Metzmacher
feebd033eb config: readd ips with a broadcast address in delete_ip_from_iface()
metze

(This used to be ctdb commit e7a6f64cf5bce5abdc47f5db96b286c5a8d66aff)
2010-02-23 10:38:47 +01:00
Ronnie Sahlberg
af79d2c08b Make sure that the natgw eventscript also triggers on the "stopped" event
to remove the natgw configuration and ip assignments used.

BZ61036

(This used to be ctdb commit 344b1f95b126ecabeb4576330038b08bf88e8cb8)
2010-02-23 10:16:17 +11:00
Ronnie Sahlberg
6091dce975 From Sumit Bose <sbose@redhat.com>
Fixes for init script to meet guidelines

(This used to be ctdb commit 9f484404030211df85a215fd2280568a2ec020fb)
2010-02-22 14:06:52 +11:00
Ronnie Sahlberg
5439401dd2 try to restart rpc-rquotad if it is not running
bz60317

(This used to be ctdb commit 2263cd74d511247debadd0f6602bc6396b46ac5e)
2010-02-16 11:02:37 +11:00
Ronnie Sahlberg
70c1e39e64 Add a variable CTDB_CHECK_SWAP_IS_NOT_USED="yes"
to control whether or not to check if we are swapping, and produce
useful output into the logfile if we are.

For production systems with dedicated nas-heads we should never swap.
But for developer/test systems we often use smaller nondedicated systems where
we can no longer guarantee that we will not be using swap.

(This used to be ctdb commit db87849bf3380914a63a626412bec209dbea7d20)
2010-02-16 11:01:39 +11:00
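A swap check gated by CTDB_CHECK_SWAP_IS_NOT_USED, as described in commit 70c1e39e64 above, might look something like the sketch below. The log does not show how the eventscript measures swap usage, so reading SwapTotal and SwapFree from /proc/meminfo is an assumption, and the function names `swap_used_kb` and `check_swap` are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the amount of swap in use, in kB, computed
# from a meminfo-style file (defaults to /proc/meminfo on Linux).
swap_used_kb() {
    awk '/^SwapTotal:/ {t=$2} /^SwapFree:/ {f=$2} END {print t - f}' \
        "${1:-/proc/meminfo}"
}

# Hypothetical check: only runs when the variable is set to "yes", and
# only warns; it never fails the caller.
check_swap() {
    [ "$CTDB_CHECK_SWAP_IS_NOT_USED" = "yes" ] || return 0
    _used=$(swap_used_kb "$1")
    if [ "$_used" -gt 0 ]; then
        echo "WARNING: ${_used} kB of swap in use"
    fi
}

# Usage (Linux only):
#   CTDB_CHECK_SWAP_IS_NOT_USED="yes"
#   check_swap
```

Leaving the variable unset makes the check a no-op, which matches the stated goal of not flagging developer or test systems that are expected to swap.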
Ronnie Sahlberg
64111bb02b Add a new variable : CTDB_NFS_SKIP_KNFSD_ALIVE_CHECK
when set to "yes" this will skip checking if knfsd has hung or not.

bz59626

(This used to be ctdb commit b0bf3794753c5bb898295b5109707953cc3dcec5)
2010-02-16 10:59:53 +11:00
Martin Schwenke
d25ab9eca0 Merge commit 'origin/master'
(This used to be ctdb commit 19523fbb12db1ec1e5ee38de1b2d3b99a74c6ca4)
2010-02-10 20:24:28 +11:00
Rusty Russell
34b8b98078 event scripts: add logging for low memory conditions
We should never enter swap; if we do, show the memory state of the machine and the process list.  This will help us diagnose what caused the condition before it's too late and the box starts OOM-killing processes.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

(This used to be ctdb commit 627a6d67a0e9e61f8713e62695b3518c51909230)
2010-02-09 12:46:35 +10:30
Martin Schwenke
56b178e1a2 eventscripts: stop loadconfig function from loading ctdb config file twice.
If "$1" was empty then loadconfig would load the ctdb config twice.
This stops that from happening.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 0406d406da70aaee7ad6aac236114905c5d03ed2)
2010-01-22 17:19:12 +11:00
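The double-load problem fixed in commit 56b178e1a2 above can be illustrated with a simplified loader. This is a sketch under assumptions, not the real loadconfig: the `CONF_DIR` variable, the file layout and the fallback of an empty argument to "ctdb" are all hypothetical.

```shell
#!/bin/sh
# Hypothetical directory holding one config file per service.
CONF_DIR="${CONF_DIR:-/etc/sysconfig}"

# Simplified loader: always read the ctdb file once, then read the file
# for the named service only if it is a different file.
loadconfig() {
    # An empty or missing argument is treated as "ctdb".
    _name="${1:-ctdb}"

    [ -r "$CONF_DIR/ctdb" ] && . "$CONF_DIR/ctdb"

    # Without this comparison, "loadconfig" with an empty argument
    # would source the ctdb file a second time.
    if [ "$_name" != "ctdb" ] && [ -r "$CONF_DIR/$_name" ]; then
        . "$CONF_DIR/$_name"
    fi
    return 0
}
```

Sourcing a config file twice is usually harmless, but it can matter when the file appends to a variable or has other side effects.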
Martin Schwenke
407a8f7205 eventscript: Use of $NFS_TICKLE_SHARED_DIRECTORY must be after loadconfig.
Proper fix for 085d1bea78fabf754ef6dd6d323f74a1d361e45c's workaround.
$NFS_TICKLE_SHARED_DIRECTORY was being used before it is set via
loadconfig.

Ronnie actually spotted this one.  :-)

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit ee8b2e298351d05197a2e1494f3331433644c1e6)
2010-01-22 17:14:50 +11:00
Martin Schwenke
02e68340e8 initscript: Remove bash-ism.
Also, change the order of the comparison so it is consistent with
others in the script.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 44696e15cdb23e7656d3bb0ead54f509495738a7)
2010-01-22 17:13:17 +11:00
Martin Schwenke
d6b0578cfb initscript: handle spaces in option values inserted into $CTDB_OPTIONS.
This puts single quotes around everything and uses eval on the
command-lines that actually start ctdbd.  The eval causes the single
quotes to be interpreted.

The "redhat" init style no longer uses the Red Hat daemon function.
It loses the quoting and re-splits on spaces.  Instead we add an extra
line that uses the success/failure functions to keep things pretty.
Note that this means that we don't respect daemon's
$DAEMON_COREFILE_LIMIT variable but we do our own core file handling
with $CTDB_SUPPRESS_COREFILE anyway.  daemon's core file handling was
probably overriding what we were doing anyway, so this can be regarded
as a bug fix.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 522fbb012524fe41a67dbe43589a282dda6bcbe2)
2010-01-22 15:34:21 +11:00
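The quoting approach described in commit d6b0578cfb above (single quotes around each value, then eval on the command line) can be sketched as follows. The option names, the `OPTIONS` variable and the `add_option` and `count_args` helpers are hypothetical stand-ins, and the sketch assumes that no value contains a single quote.

```shell
#!/bin/sh
OPTIONS=""

# Append --name='value' to OPTIONS. The single quotes are stored as
# literal characters. Assumes the value itself contains no single quote.
add_option() {
    OPTIONS="$OPTIONS --$1='$2'"
}

# Stand-in for the daemon: report how many arguments it received.
count_args() {
    echo "$#"
}

add_option logfile "/var/log/my log.txt"
add_option dbdir "/var/ctdb"

# Without eval the quotes stay literal and the value is split on the
# space, so the command sees three arguments.
count_args $OPTIONS

# With eval the shell re-parses the string, the quotes are interpreted,
# and each option arrives as a single argument (two in total).
eval "count_args $OPTIONS"
```

This also shows why a helper that re-splits its arguments on spaces, as the commit says the Red Hat daemon function does, would undo the quoting.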
Stefan Metzmacher
12c8dd215c config: 10.interface: search "ethtool" in $PATH instead of using a hardcoded path
This is very useful for testing, I use such a script:

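cat ~/bin/ethtool
 #!/bin/sh
 # Test wrapper: force the link down on selected interfaces,
 # then run the real ethtool.

 IFACE=$1

 case "$IFACE" in
        Neth2|Neth3|Neth4|Neth5)
                # continue to the "ip link set down" below
                ;;
        *)
                # any other interface: run the real ethtool unchanged
                exec /usr/sbin/ethtool "$@"
                ;;
 esac

 ip link set down "$IFACE"

 exec /usr/sbin/ethtool "$@"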

metze

(This used to be ctdb commit 3bab985cf615720eded4d47b4f9f37a9c28840aa)
2010-01-20 11:11:04 +01:00
Stefan Metzmacher
ea5843075c events: add updateip event to 13.per_ip_routing
metze

(This used to be ctdb commit 829150e814a5e6c85d0f21421f46f41e81d74c53)
2010-01-20 11:11:02 +01:00
Stefan Metzmacher
6a818e66ae events: 10.interface handle updateip event
metze

(This used to be ctdb commit a5cdf1277387f8c6292153c37fa9ceb64707d04f)
2010-01-20 11:11:02 +01:00
Stefan Metzmacher
98ee69c66d server: add updateip event
metze

(This used to be ctdb commit 712ed0c4c0bff1be9e96a54b62512787a4aa6259)
2010-01-20 11:11:01 +01:00
Stefan Metzmacher
50bff8c886 config: add CTDB_PARTIALLY_ONLINE_INTERFACES to ctdb.sysconfig
With this option set to "yes", we don't become unhealthy
as long as at least one interface is still available.

metze

(This used to be ctdb commit d054eb33c6ae92560cddb40732e5dcf622591a3c)
2010-01-20 11:11:01 +01:00
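The health rule described for CTDB_PARTIALLY_ONLINE_INTERFACES in commit 50bff8c886 above can be written down as a small decision function. This is a sketch of the rule as stated in the commit message only; the function name `interfaces_healthy` and its two count arguments are hypothetical, and the behaviour when the option is not "yes" (every interface must be up) is an assumption.

```shell
#!/bin/sh
# Hypothetical helper: decide node health from interface counts.
#   $1 = number of monitored interfaces that are up
#   $2 = total number of monitored interfaces
interfaces_healthy() {
    _up="$1"
    _total="$2"
    if [ "$CTDB_PARTIALLY_ONLINE_INTERFACES" = "yes" ]; then
        # One working interface is enough to stay healthy.
        [ "$_up" -ge 1 ]
    else
        # Assumed default: every interface must be up.
        [ "$_up" -eq "$_total" ]
    fi
}

# Usage: one of two interfaces is up.
CTDB_PARTIALLY_ONLINE_INTERFACES="yes"
interfaces_healthy 1 2 && echo "node stays healthy"
```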
Stefan Metzmacher
5d2c3ef656 config: 10.interfaces call monitor_interfaces on startup
metze

(This used to be ctdb commit 615dec051c26aac628f120e96bf12fb39fc6d28a)
2010-01-20 11:11:01 +01:00
Stefan Metzmacher
94e7101070 config: 10.interfaces call ctdb ifaces and ctdb setifacelink for monitoring
metze

(This used to be ctdb commit c465f63585c419ba59a6b04cbbf78ae615a7259d)
2010-01-20 11:11:01 +01:00
Stefan Metzmacher
9c89dd9210 events: splitout a monitor_interfaces function in 10.interface
metze

(This used to be ctdb commit b5ba56dea57db97d6c6ba3e7582e74fe0e3041fc)
2010-01-20 11:11:01 +01:00
Stefan Metzmacher
9a43f5e42b events: 10.interfaces allow multiple interfaces per public address
metze

(This used to be ctdb commit f9837f8b6f887d28f29aeb3eeffe8cfb423b40b4)
2010-01-20 11:10:58 +01:00
Stefan Metzmacher
628ac65709 config: add 13.per_ip_routing event script
With this script it's possible to generate routing tables
per public ip address.

metze

(This used to be ctdb commit ff5678fbec2daef461143acf00cef3f94d7655fc)
2010-01-20 11:10:57 +01:00
Stefan Metzmacher
2ecf8053f9 config: add some ipv4 helper shell functions
Many thanks to Michael Adam <obnox@samba.org>
for the basic work.

metze

(This used to be ctdb commit ff9c641763702ae99632bbf4d0825d578440c074)
2010-01-20 11:10:57 +01:00
Stefan Metzmacher
4493ba6ffa config: add interface_modify.sh and call it under flock to make modification on interfaces atomic
When two releaseip events run in parallel it's possible that the 2nd script
readds a secondary ip that was removed by the 1st script.

metze

(This used to be ctdb commit e02417b2a55c45ac2c125b1b3463c9c39e7bc07a)
2010-01-20 11:10:48 +01:00
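The race described in commit 4493ba6ffa above is the classic case for serialising a critical section with a lock file. A minimal sketch using flock(1) from util-linux is shown below; the wrapper name `run_locked` and the lock file path in the usage comment are hypothetical, and the arguments that interface_modify.sh takes are not given in the log.

```shell
#!/bin/sh
# Hypothetical wrapper: run a command while holding an exclusive lock on
# a lock file, using flock(1) from util-linux. A second caller using the
# same lock file blocks until the first one has finished.
run_locked() {
    _lockfile="$1"
    shift
    flock "$_lockfile" "$@"
}

# Usage (path is a placeholder):
#   run_locked /var/lock/example-interface.lock sh -c 'echo "modifying interface"'
```

With every interface modification going through the same lock, a second releaseip run cannot interleave with the first and readd an address that was just removed.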
Stefan Metzmacher
c251ac20fa events/10.interfaces: move some parts to helper functions
metze

(This used to be ctdb commit 24cd42769d8f32b90a8876a6a08a36ab23076cd1)
2010-01-20 09:44:37 +01:00
Stefan Metzmacher
d01870f138 config/functions: add tickle_tcp_connections()
metze

(This used to be ctdb commit 2397f13d7b5ca3847ef148187c6b179d06f6a47a)
2010-01-20 09:44:37 +01:00
Stefan Metzmacher
fd06167caa server: add "init" event
This is needed because the "startup" event runs after the initial recovery,
but we need to do some actions before the initial recovery.

metze

(This used to be ctdb commit e953808449c102258abb6cba6f4abf486dda3b82)
2010-01-20 09:44:36 +01:00
Stefan Metzmacher
9cba540514 lib/util: import fault/backtrace handling from samba.
metze

(This used to be ctdb commit 8171d66f0061fe23ed6dfef87ffe63bfc19596eb)
2010-01-20 09:44:36 +01:00
Ronnie Sahlberg
21e5b44673 source the nfs sysconfig file from the 61.nfstickles script
(This used to be ctdb commit 085d1bea78fabf754ef6dd6d323f74a1d361e45c)
2010-01-20 10:35:02 +11:00
Ronnie Sahlberg
a1d60b1511 Make the size of the in memory ringbuffer for keeping the recent log messages
configurable using --log-ringbuf-size=<num-entries>.

Add an entry in the sysconfig file to set this persistently.

(This used to be ctdb commit c79c2da69bc352f509e7fca4b9172a4b7f23c0f8)
2010-01-15 15:38:56 +11:00
Martin Schwenke
b65a44a4ec Revert "Use wbinfo --ping-dc instead of wbinfo -p since this is a more reliable way to determine if winbindd is in a useful state."
This reverts commit 7c95e56ba871a4e0cb893a5cb5d821e7ff6e6dd6.

wbinfo --ping-dc is proving too unreliable.

(This used to be ctdb commit b70021856e76df1ba407c83cfc19bf332fbfc869)
2010-01-12 21:02:44 +11:00
Martin Schwenke
96066d8816 Revert "events/50.samba: only use wbinfo --ping-dc if available"
This reverts commit 7b73834ba3ac197cc8a3020c111f9bb2c567e70b.

wbinfo --ping-dc is proving too unreliable.

(This used to be ctdb commit 178f429a7b6d1008d35e857b6ca1df6adb60d255)
2010-01-12 21:02:11 +11:00
Ronnie Sahlberg
4c722fe34c fix a conflict in the merge from rusty
Merge commit 'rusty/ctdb-no-setsched'

Conflicts:

	server/ctdb_vacuum.c

(This used to be ctdb commit b4365045797f520a7914afdb69ebd1a8dacfa0d9)
2009-12-17 08:18:04 +11:00
Rusty Russell
f148735928 Add --valgrinding flag instead of --nosetsched
The do_setsched was being tested for whether to mmap tdbs: let's make it
explicit.  We can also happily move the kill-child eventscript hack under
this flag.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> 


(This used to be ctdb commit 2ee86cc1f311d7b7504c7b14d142b9c4f6f4b469)
2009-12-16 20:59:15 +10:30
Stefan Metzmacher
96977cc5c4 config: add CTDB_MAX_PERSISTENT_CHECK_ERRORS option
metze

(This used to be ctdb commit fc5f556d488488040303438aefecb5ae2a8e54bc)
2009-12-16 08:08:33 +01:00
Stefan Metzmacher
0c735f03d4 config: try to use tdbtool <tdb> check instead of tdbdump for persistent db checks
metze

(This used to be ctdb commit 52e6d81f4d8a4035272d9256d01bafb8ed593027)
2009-12-16 08:08:33 +01:00
Stefan Metzmacher
0c907f4965 config: load 'ctdb' config before 'nfs' config in statd-callout
All other scripts do 'loadconfig ctdb' before any other 'loadconfig foo'
call. I think we should do the same in statd-callout.

Otherwise it's very confusing, if you have configured some Options
in /etc/sysconfig/ctdb, but /etc/ctdb/statd-callout doesn't notice
them.

metze

(This used to be ctdb commit 10d95581fb90bfdf58ec32345c4e36c27acf4f37)
2009-12-16 08:03:55 +01:00
Ronnie Sahlberg
50820f9e18 Bond devices can have any name the user configures, so
when checking link status for an interface, first
check if this interface is in fact a bond device
(by the presence of a /proc/net/bonding/IFACE file)
and use that file for checking status.

Otherwise assume ib* is an infiniband interface which we don't know how
to check, or otherwise it is an ethernet interface and ethtool should
hopefully work.

(This used to be ctdb commit 8cc6c5de3d7abb0b72eaa6e769e70963b02d84cb)
2009-12-09 11:33:04 +11:00
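The selection order described in commit 50820f9e18 above (bond file first, then the ib* name pattern, then ethtool) can be sketched as a classifier. The function name `link_check_method` and the `PROC_BONDING` override are hypothetical; the override exists only so the sketch can be tried without real bond devices.

```shell
#!/bin/sh
# Hypothetical helper: print which link check applies to interface $1.
# PROC_BONDING is an assumption added for testing; by default the
# directory named in the commit message is used.
link_check_method() {
    _iface="$1"
    if [ -f "${PROC_BONDING:-/proc/net/bonding}/$_iface" ]; then
        # A bond device can have any name, so the file decides.
        echo "bond"
    else
        case "$_iface" in
            ib*)
                # InfiniBand: no link check is known for these.
                echo "infiniband"
                ;;
            *)
                # Anything else is treated as ethernet.
                echo "ethtool"
                ;;
        esac
    fi
}
```

Checking for the bonding file before looking at the name matters because an administrator may give a bond device a name that would otherwise match one of the later patterns.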
Ronnie Sahlberg
3ca3f4c771 make sure to also check that interfaces used for NATGW are ok
and have a link.
if not the node should become unhealthy

(This used to be ctdb commit 03b5bbaae1b53830a4cd20d3079ab8f45ffce923)
2009-12-09 11:13:29 +11:00
Stefan Metzmacher
af170d1a8a events/50.samba: only use wbinfo --ping-dc if available
metze

(This used to be ctdb commit 7b73834ba3ac197cc8a3020c111f9bb2c567e70b)
2009-12-08 07:38:00 +11:00
Ronnie Sahlberg
cdabe16777 Use wbinfo --ping-dc instead of wbinfo -p since this is a more reliable way to determine if winbindd is in a useful state.
(This used to be ctdb commit 7c95e56ba871a4e0cb893a5cb5d821e7ff6e6dd6)
2009-12-07 18:27:46 +11:00
Martin Schwenke
b17bf38c64 Eventscripts: Fix syntax error in 00.ctdb.
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 9ea261f791ab919eb1ce5b37073b4f1d30694bb8)
2009-12-01 18:08:57 +11:00
Martin Schwenke
50a26cf75e Eventscripts: Remove executable bit accidentally set on some scripts.
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 4c6e68ae942c05224c5f8b683fbc2dc1adced8ee)
2009-12-01 17:54:45 +11:00