1
0
mirror of https://github.com/samba-team/samba.git synced 2024-12-23 17:34:34 +03:00
Commit Graph

3416 Commits

Author SHA1 Message Date
Ronnie Sahlberg
ea0df6d882 Revert scheduling back to use real-time processes
Revert this patch:
commit 482c302d46e2162d0cf552f8456bc49573ae729d

We may need to use real-time processes for the main daemon and the recovery daemon to handle the cases where systems come under very high loads.

(This used to be ctdb commit 08bef9dcab6e4da15fc783f8624e5ed09aa060b5)
2011-01-11 07:40:35 +11:00
Ronnie Sahlberg
7e747aab8d 60.nfs Check if we have rpc.statd and if not, skip checking for statd
availability at all (since we cant restart it, there is not point checking
if it is alive)

(This used to be ctdb commit 6075e85ba6c0f58fd1ab2ce3b09dd3d6ff491365)
2011-01-06 15:49:15 +11:00
Ronnie Sahlberg
ded7c23122 41.HTTPD
Httpd can be very slow to start on some platforms,
wait 5 monitor intervals before we try to restart it if
it has not bound to port 80 yet.
After 10 failed intervals, flag the node as unhealthy.

(This used to be ctdb commit 6ec1993aa5f2778b8227ce5f6eca0d19e4ae9788)
2010-12-22 10:31:41 +11:00
Ronnie Sahlberg
e9ff38be7d 60.nfs
Try to restart LOCKD after 10 failures and
flag the node as unhealthy after 15 failures

(This used to be ctdb commit 5a67889c9166835aef3443051812d14af07dfca5)
2010-12-22 10:31:31 +11:00
Ronnie Sahlberg
57e74f6d8a Dont run net serverid wipe in the background
(This used to be ctdb commit 76c515f9f05f4fb5683b5ff65cf136c168fd882f)
2010-12-22 10:31:26 +11:00
Ronnie Sahlberg
97a6eccaf7 50.samba
Net serverid wipe can take a bit of time sometimes so background it.

Only perform auto start/stop of the managed service on the monitor event

(This used to be ctdb commit deba5cbbf7703a1a24ce88a06c73fca056e05521)
2010-12-14 21:19:28 +11:00
Ronnie Sahlberg
99d7e39efc ctdb addip:
After finishing "ctdb addip"  wait for an implicit "iptakeover" to complete
the assignment to a node.

This makes it more wasteful and timeconsuming when adding multiple ips
at once, or the same ip to multiple nodes,
but makes it easier to script the use of this command.

(This used to be ctdb commit d86cbf3d7d426c558d110d67dc985634c754a522)
2010-12-13 14:24:30 +11:00
Ronnie Sahlberg
1e41ab5fa3 LVS
update lvs configuration on ipreallocated events too

(This used to be ctdb commit a4e98073d955676fdcbb91affae1de1a733d0bc2)
2010-12-13 14:24:16 +11:00
Ronnie Sahlberg
a9a6ae064d When assigning the single-public-ip during startup,
flag the interface as initially being "link ok"
so that we can add it and startup.

The eventscript can later drop the flag if required

(This used to be ctdb commit 720849b756c825fb8b285f09972a8c39f1888a99)
2010-12-13 14:24:04 +11:00
Ronnie Sahlberg
220c5371c7 Revert "server: when we migrate off a record with data, set the MIGRATED_WITH_DATA flag"
This reverts commit 17e231abf5ade83d7fa624b5cf54ae876e2795aa.

(This used to be ctdb commit 23f81ba39ee7cd8a7360f4602b3eb264eb221552)
2010-12-13 14:23:48 +11:00
Ronnie Sahlberg
dff88a8a6a Revert "Add a new header flag for "migrated with data" and set this to 1"
This reverts commit a8cc35191df1cd4b866897df71d317ce5f198cb5.

(This used to be ctdb commit 7c37435fb517a621c45b21a21b4eb15f8bbd3c83)
2010-12-13 14:23:32 +11:00
Ronnie Sahlberg
f815237f8f libctdb
fix a compile problem after renaming a structure field

(This used to be ctdb commit f44c02f45dbc13e3cc2e89ee1c96bd0d57042fcc)
2010-12-10 14:19:53 +11:00
Ronnie Sahlberg
e42fc0d4f5 LibCTDB
Add an input queue where we keep received pdus we have not yet processed
This allows us to perform SYNC calls from an ASYNC callback

(This used to be ctdb commit c111e98d3ad7bd3d09f4081e9bb1443d3722672f)
2010-12-10 13:42:25 +11:00
Ronnie Sahlberg
c26c6a01cf only run "serverid wipe" if we are actually running samba.
we dont need to run this on systems where we do run winbind but not samba

(This used to be ctdb commit fcb9e8d1e1c78439ea42adb8b05ad84fbca7f724)
2010-12-10 13:42:12 +11:00
Rusty Russell
4b9e5fbe46 idtree: fix overflow for v. large ids on allocation and removal
(Imported from SAMBA commit 09a6538969ac).

Chris Cowan tracked down a SEGV in sub_alloc: idp->level can actually
be equal to 7 (MAX_LEVEL) there, as it can be in sub_remove.

(We unfairly blamed a shift of a signed var for this crash in commit
 2db1987f5a).

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

(This used to be ctdb commit 73764104356d3738d9d20a9d06ce51535f74f475)
2010-12-07 15:37:25 +11:00
Ronnie Sahlberg
8e53df6f41 Add a new header flag for "migrated with data" and set this to 1
when we migrate a non-empty record onto the node
or a non-empty record off the node

When we migrate a record back to the lmaster and yield the dmaster role,
inspect this flag if if it is still not set, we can delete the record from
the local database as soon as we have migrated it back to the lmaster.

(This used to be ctdb commit a8cc35191df1cd4b866897df71d317ce5f198cb5)
2010-12-07 15:33:41 +11:00
Ronnie Sahlberg
a75bf138ab add new command line functions
ctdb readkey <dbid> <key>
ctdb writekey <dbid> <key> <value>

these are mainly intended for debugging of databases and dmaster migration issues

(This used to be ctdb commit 70c2e7dd04727371590fb94579ffd20318fbeb58)
2010-12-07 15:33:08 +11:00
Ronnie Sahlberg
c69ada0090 add a new ctdb_ltdb function to delete a record in a normal database
(This used to be ctdb commit fe9070ec9be69e6a6fcbf9899e7ced24541c9c3a)
2010-12-07 15:32:30 +11:00
Michael Adam
6f77811cb1 server: when we migrate off a record with data, set the MIGRATED_WITH_DATA flag
(This used to be ctdb commit 17e231abf5ade83d7fa624b5cf54ae876e2795aa)
2010-12-07 15:31:57 +11:00
Ronnie Sahlberg
1df9ed58b0 Add 60.ganesha to what gets installed by make install as well as by the RPM
(This used to be ctdb commit 8a6da384f3fa08b1c5eba79d6febc7af7b3d9229)
2010-12-06 11:50:18 +11:00
Ronnie Sahlberg
8147d29598 add a missing part of the import of the previous ganesha patch
(This used to be ctdb commit 171b8855bb2feae7f7dd6a079571f3113dedd6f4)
2010-12-06 11:50:15 +11:00
Chandra Seetharaman
5e485d5ca0 make changes to ctdb event scripts to support NFS-Ganesha.
make changes to ctdb event scripts to support NFS-Ganesha.

Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com>
Signed-off-by: Ronnie Sahlberg <ronniesahlberg@gmail.com>

(This used to be ctdb commit 7298588ed54492f106954c893dd86b0a36783470)
2010-12-06 11:50:12 +11:00
Ronnie Sahlberg
c2c53db49d during ip allocation, there are failure modes where a node might hold a ip address
but thinks it is still unassigned (-1).

add code to the recovery daemon to detect this case and trigger a reallocation
so that the ip gets covered

and change the takeip code to allow for this condition, taking on an ip address that is
already hosted.

cq s1021073

(This used to be ctdb commit 9020baf27cab7821c9094cda185206fb7af0fee7)
2010-12-03 13:30:39 +11:00
Ronnie Sahlberg
8959c8e850 dont try starting samba through the "init" event
(This used to be ctdb commit e314a449606418a4c4eac6eb319bfcdf1c398cd3)
2010-12-03 11:40:38 +11:00
Ronnie Sahlberg
6ed0009125 When we are no longer the natgw master, dont put the natgw ip on loopback.
We put the ip on loopback just to make sure we would still interoperate with
non-standard configurations on unix-KDC, that are configured to verify the optional
HostAddresses field.
This is not required for AD, since AD does not use this field, and is replaced in
unix land with other/better mechanisms than this "dodgy" check.

This makes it "easier" for applications that have bound to the natgw address
to detect a socket problem and try to reconnect/recover if the ip address
is completely missing from the system.

At the same time, use the winbind specific hook that exists to explicitely tell winbindd : this address is gone, so if you have bound to it, this is a good time to close and rebind your socket.

cq 1020333

(This used to be ctdb commit 0da94869d2912b2a412ba3fbd2137d88ce4e4389)
2010-11-29 12:45:59 +11:00
Ronnie Sahlberg
ebcc866ae0 update autostart/stop to work for samba
(This used to be ctdb commit 37ab57e2adaecc3f7996ea20af45a5df0cd8be76)
2010-11-22 20:42:26 +11:00
Ronnie Sahlberg
a3e7dfadca add an explicit _is_managed_service to iscsi eventscript
(This used to be ctdb commit 44f683a1ba15944d3306a0effd572de3280ff975)
2010-11-18 14:15:56 +11:00
Ronnie Sahlberg
193d9d50d1 Dont pollute the logs with a "file not found" message
CQ S1020745

(This used to be ctdb commit ea8bb7b26bb879a895c267d49672433182390d0d)
2010-11-18 13:54:15 +11:00
Martin Schwenke
c00db6f271 60.nfs eventscript should do nothing if NFS isn't managed by CTDB.
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 582e5cd077501e8d4131a9c7981781471308edfd)
2010-11-18 13:36:40 +11:00
Martin Schwenke
a2af87482b Eventscript functions - catch failures in ctdb_service_start().
ctdb_service_start() currently succeeds if ctdb_counter_init()
succeeds.

This changes it to fail when a service start fails.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit ddb73962d72d933bf0edc28be0dbb45bea7e5ef4)
2010-11-18 12:15:05 +11:00
Martin Schwenke
3ab768e8d4 50.samba eventscript should stop/start services when they become (un)managed.
When the value of $CTDB_MANAGES_SAMBA or $CTDB_MANAGES_WINBIND (or
corresponding changes are made to $CTDB_MANAGED_VERSIONS), the
associated service should be started or stopped as necessary.

This add calls to ctdb_start_stop_service() to manage
starting/stopping samba and winbind.

An associated cleanup is made to the initial checks that one of
$CTDB_MANAGES_SAMBA or $CTDB_MANAGES_WINBIND is set, replacing them
with calls to is_ctdb_managed_service().

To handle the winbind cases ctdb_start_stop_service() and
is_ctdb_managed_service() are updated to take an optional service name
parameter.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit d98f175e8420d921a123ae9c0ce00945350b1537)
2010-11-18 12:12:30 +11:00
Ronnie Sahlberg
4fe85e5be5 add a new support function ctdb_check_counter_equal()
update nfs to try to restart the service after 10 consecutive failures
and to flag the node unhealthy after 15

add similar function to mountd

(This used to be ctdb commit 1569a54bb82fc433895ed68f816cf48399ad9d40)
2010-11-17 13:54:57 +11:00
Martin Schwenke
8fe1ec3754 Eventscripts: make loadconfig() function hookable by the test suite.
Rename loadconfig() to _loadconfig().  Add a new loadconfig() that
simply calls _loadconfig().

This makes it easy for the test suite to override loadconfig().

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 1d77a3adfff893b3c01b87f791e72c0d3148425c)
2010-11-17 11:46:48 +11:00
Martin Schwenke
e23ca7dba5 Make a time comparison in 60.nfs eventscript more readable.
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 26077e6c8eb126584af587e7416154ea4858aea2)
2010-11-17 11:44:26 +11:00
Martin Schwenke
6ab5ae2c9b 60.nfs only fails or warns after 10 consecutive nfsd/statd failures.
These failures are sometimes the result of slow restarts so we want to
avoid dirtying the logs or marking a node unhealthy because of them,
unless they are excessive.

For these 2 cases we use the existing fail counting code but hack a
temporary service_name in a subshell to allow separate fail counts.

We also update ctdb_check_rpc() so that it captures the error output
from rpcinfo and we add a message including the service name to the
beginning.  The error is printed to stdout but is also stored in
ctdb_check_rpc_out to allow it to be conditionally used by the caller.
This function also now returns non-zero rather than exiting on
failure.

Other direct rpcinfo calls are relaced by called to ctdb_check_rpc()
for consistency.

Option handling code for service restarts is cleaned up so that fits
in 80 columns.  A more informative restart messageis now used in all
cases, printing the exact command being used to start a service.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 79c25fe241cf5d8f92e23d3736823ebaf4e1769d)
2010-11-17 11:43:09 +11:00
Martin Schwenke
44d66153af Test suite: fix typo in ctdb ping test grep pattern.
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit ad18bfa398e582474afe25340368e39d4e74e3c6)
2010-11-17 11:39:51 +11:00
Martin Schwenke
0ff7924d15 Test suite: match changed output for ctdb ping to disconnected node.
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit a9f5ae2a548e1096c086888adc886cb604d372fa)
2010-11-17 11:38:45 +11:00
Martin Schwenke
bcb6e726b6 Test suite: make statistics test cope with changes to statistics output.
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 9e88466a99b5245d5f0ebab553be8d2b9b9a58e2)
2010-11-17 11:35:05 +11:00
Ronnie Sahlberg
714901e103 initialize the statistics to the current time, not start of epoch
this makes "ctdb statistics" show correct "start of starts collection"

(This used to be ctdb commit 7303058616fdb1d7f58cce2349c034e9f611275e)
2010-11-15 16:44:21 +11:00
Ronnie Sahlberg
dbcf0de18c Dont exit the update ip function if the old and new interfaces are the same
since if they are the same for whatever reason this triggers the system
to go into an infinite loop and is unrobust

The scriptds have been changed instead to be able to cope with this
situation for enhanced robustness

During takeover_run and when merging all ip allocations across the cluster
try to kepe track of when and which node currently hosts an ip address
so that we avoid extra ip failovers between nodes

(This used to be ctdb commit cf778b5aaf6356401e3985acccc7df9e08ab6930)
2010-11-10 14:55:25 +11:00
Ronnie Sahlberg
34958c284b change the takeover script timeout to 9 seconds from 5
(This used to be ctdb commit cd09c3f8fd9700261f77779aee9cf71dbd4e441e)
2010-11-10 14:55:25 +11:00
Ronnie Sahlberg
7e29fd6093 Dont check remote ip allocation if public ip mgmt is disabled
(This used to be ctdb commit 441ad00af842a8b7b5291de60d8ab08a064f5327)
2010-11-10 14:55:25 +11:00
Ronnie Sahlberg
055eafb790 this stuff is just so fragile that it will enter infinite recovery and fail loops
on any kind of tiny unexpected error

unconditionally try to remove ip addresses from both old and new interface
before trying to add it to the new interface to make it less
fragile

(This used to be ctdb commit 80acca2c91c9053c799365bae918db7ed8bdc56f)
2010-11-10 14:55:25 +11:00
Ronnie Sahlberg
ebed26d755 delete from old interface before adding to new interface
this stops the script from failing with an error if
both interfaces are specified as the same, which otherwise breaks and leads to an infinite recovery loop

(This used to be ctdb commit 565de03a784ed441490f8cd0b137b5cec8716d55)
2010-11-10 14:55:25 +11:00
Ronnie Sahlberg
83e68b62dd delay loading the public ip address file until after we have started the transport and discovered ouw own pnn number
(This used to be ctdb commit 1b57fc866fc836b5dbd3ef7b646e5a0f4280e81e)
2010-11-10 14:55:24 +11:00
Ronnie Sahlberg
6fa8e1fddb when we load the public address file, at the same time check if we are already hosting the public address, if so, set ourselves up as the pnn for that address
(This used to be ctdb commit 0f2a2dac91a61be188c3578c8bb89d47cbf9a0f8)
2010-11-10 14:55:24 +11:00
Ronnie Sahlberg
a6ed66dfd0 dont check the public ip assignment or if even we are hosting them and shouldnt
when public ips have been disabled

(This used to be ctdb commit 7d07a74dc7f907ac757d20626f68e257d7ba16be)
2010-11-10 14:55:24 +11:00
Ronnie Sahlberg
5f76f3c0e2 Add a new tunable : DisableIPFailover that when set to non 0
will stopp any ip reallocations at all from happening.

(This used to be ctdb commit d8d37493478a26c5f1809a5f3df89ffd6e149281)
2010-11-10 14:55:24 +11:00
Ronnie Sahlberg
93058e7976 change the default for how long to waqit before dropping all ips to 120 seconds
(This used to be ctdb commit e5f03346133157734b4759d43c3ab8203028d5c2)
2010-11-10 14:55:23 +11:00
Ronnie Sahlberg
76578b9533 dont delete all ips from the system during the initial "init" event
leave any ips as they are and let the recovery daemon remove them as required

(This used to be ctdb commit 8ab311719857847b4cf327507b0af1793551e73c)
2010-11-10 14:55:23 +11:00