mirror of https://github.com/samba-team/samba.git synced 2025-01-13 13:18:06 +03:00
227 Commits

Author SHA1 Message Date
Stefan Metzmacher
3aa5c979f3 recoverd: try to become the recovery master if we have the capability, but the current master doesn't
metze
(cherry picked from commit 6ba8af28f8a8f79db65120a97d7157dcc5c7e083)

Signed-off-by: Michael Adam <obnox@samba.org>

(This used to be ctdb commit ccd67cf7f26713e695000d89d9ce8cfa78bfe00f)
2011-11-29 10:28:52 +01:00
Ronnie Sahlberg
b18a22b820 This breaks the build since the recovery loop is different in master
compared to old 1.0 branches.
This must have been mistakenly applied to master when you intended to push
to a different branch, I guess.

Revert "recoverd: try to become the recovery master if we have the capability, but the current master doesn't"

This reverts commit a97d417aba85e901540147a4dff4794249442939.

(This used to be ctdb commit c19cb751077b78cf4b6e28a1e3746d4ffedbfd68)
2011-11-29 14:38:02 +11:00
Stefan Metzmacher
b02b55bd12 recoverd: try to become the recovery master if we have the capability, but the current master doesn't
metze

(This used to be ctdb commit a97d417aba85e901540147a4dff4794249442939)
2011-11-26 23:47:00 +01:00
Stefan Metzmacher
7a962685d3 recoverd: let async_getcap_callback() also update ctdb->capabilities
metze

(This used to be ctdb commit ef5b47d1183ee99c39ae63045a994d35255ac829)
2011-11-26 23:30:33 +01:00
Martin Schwenke
02612ea2bc Clean up warnings: remove changed_flags in monitor_helper
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 3e4fa518f02db75e4e4a7f326a71df226913f8a8)
2011-11-09 14:45:01 +11:00
Ronnie Sahlberg
0dc5584101 Merge branch 'master-readonly-records' into foo
Conflicts:

	Makefile.in
	tools/ctdb.c

(This used to be ctdb commit 0fedef0ffba4178126eee9544c5e2db52f5db893)
2011-09-12 09:34:34 +10:00
David Disseldorp
5296da5609 client: add timeout argument to ctdb_attach
Rather than using a fixed 2 second CTDB_CONTROL_GETDBPATH timeout.

(This used to be ctdb commit 9e178671560cb95121e11d718a76b05380ecd6c5)
2011-09-06 13:57:04 +02:00
Ronnie Sahlberg
63dc96cdb2 ReadOnly: Change the ctdb_db structure to keep a uint8_t for flags instead of a boolean for
the persistent flag.
This is the same size as the original boolean but allows us to add additional flags for the database.

(This used to be ctdb commit 7462761638d25880ad46024ad4ef21667eb99a98)
2011-09-01 10:21:55 +10:00
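Illustration for 63dc96cdb2: a minimal sketch of replacing a single boolean with a uint8_t bit field, as the message above describes. The struct, the flag names and the helper functions are hypothetical and are not taken from the ctdb source; only the idea (one byte, several flags) comes from the commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag bits; illustrative names, not the ctdb definitions. */
#define EXAMPLE_DB_FLAG_PERSISTENT 0x01
#define EXAMPLE_DB_FLAG_READONLY   0x02

/* Simplified stand-in for a database context. */
struct example_db {
    uint8_t flags; /* previously: bool persistent */
};

/* Set one or more flag bits without disturbing the others. */
static void example_db_set_flag(struct example_db *db, uint8_t flag)
{
    assert(db != NULL);
    db->flags |= flag;
}

/* The old boolean becomes a test of a single bit. */
static bool example_db_is_persistent(const struct example_db *db)
{
    assert(db != NULL);
    return (db->flags & EXAMPLE_DB_FLAG_PERSISTENT) != 0;
}
```

Existing callers that read the old boolean would go through a test such as example_db_is_persistent(), while new features can claim further bits of the same byte.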
Ronnie Sahlberg
10caf186e1 remove log message we don't need
S1026492

(This used to be ctdb commit c5f6e44b92210519d4bfc24611cae3f9978cc2e5)
2011-08-04 13:49:57 +10:00
Ronnie Sahlberg
ae35e9e5b2 Cleanup of logging messages/spamming
Reduce an informational message about not performing ip reallocation
from NOTICE (the default) to INFO.
These messages are normal during startup or when stopped/banned when
we will be in recovery mode for a while.

Remove a message in the loop waiting for initial startup to complete about
the generation being invalid. It is always invalid at this stage before we have
finished initial recovery.

Rate-limit the informational messages for CTDB_WAIT_UNTIL_RECOVERED
so that we only print them once per second for the first 60 seconds and after that only once per 10 minutes.
These messages are normal during startup, but we should not be logging them every second in cases where we remain in recovery mode for an extended period during startup, such as when suspended or permabanned.

CQ S1023302

(This used to be ctdb commit 3a0af8780dc595acbed880f288fcbc4f62c862fb)
2011-05-04 10:42:32 +10:00
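Illustration for ae35e9e5b2: a minimal sketch of the rate limit described above (once per second for the first 60 seconds, then once per 10 minutes). It assumes the caller is invoked once per second with the number of whole seconds waited so far; the function and macro names are hypothetical and this is not the code from the commit.

```c
#include <assert.h>
#include <stdbool.h>

#define EXAMPLE_FAST_LOG_WINDOW_SECS 60   /* log every tick during this window */
#define EXAMPLE_SLOW_LOG_PERIOD_SECS 600  /* afterwards, log once per 10 minutes */

static_assert(EXAMPLE_SLOW_LOG_PERIOD_SECS > 0, "period is used as a modulus");

/* Decide whether the "still waiting for recovery" message should be
 * printed on this tick. elapsed_secs counts seconds since waiting began. */
static bool example_should_log_wait(unsigned int elapsed_secs)
{
    if (elapsed_secs < EXAMPLE_FAST_LOG_WINDOW_SECS) {
        return true;
    }
    return (elapsed_secs % EXAMPLE_SLOW_LOG_PERIOD_SECS) == 0;
}
```

With these constants, a node stuck in recovery for an hour would emit 60 messages in the first minute and 6 more afterwards, instead of 3600.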
Michael Adam
2ad1c3f6c7 server: in the VACUUM_FETCH handler, add the VACUUM_MIGRATION to the call flags
This way, the records coming in via this handler can be treated appropriately.
Namely, they can be deleted instead of being stored when they meet the fast-path
vacuuming criteria (empty, never migrated with data...)

(This used to be ctdb commit fb5d832104970320359b3e474eb291ca3d629380)
2011-03-14 13:35:44 +01:00
Michael Adam
89f27f9424 recoverd: in a recovery, set the MIGRATED_WITH_DATA flag on all records
Those records that are kept after recovery, are non-empty, and
stored identically on all nodes. So this is as if they had been
migrated with data.

Pair-Programmed-With: Stefan Metzmacher <metze@samba.org>

(This used to be ctdb commit 101be642e492a3a54231e2e3e6553a59380fe702)
2011-03-14 13:35:43 +01:00
Ronnie Sahlberg
49a30783d3 If/when the recovery daemon terminates unexpectedly, try to restart it again from the main daemon instead of just shutting down the main daemon too.
While it does not address the reason for the recovery daemon shutting down, it reduces the impact of such issues and makes the system more robust.

(This used to be ctdb commit 0566ef3d6cef809bda204877c493c80ff9eb2c40)
2011-03-01 12:13:58 +11:00
Ronnie Sahlberg
d236c970d0 recoverd: avoid triggering a full recovery if just some ip allocation
has failed.
We don't need to rebuild the databases in this situation, we just
need to try again to sort out the ip address allocations.

(This used to be ctdb commit 044c398ffea23d36ee033c8ddf07d11028197346)
2011-01-11 07:40:49 +11:00
Ronnie Sahlberg
c4006ce844 Add ctdb_fork() which will fork a child process and drop the real-time
scheduler for the child.

Use ctdb_fork() from callers where we don't want the child to be running
at real-time privilege.

(This used to be ctdb commit 58795a4c9e0624e20fa3e0023b65127053edd103)
2011-01-11 07:40:41 +11:00
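Illustration for c4006ce844: a minimal sketch of a fork wrapper that returns the child to the default scheduling policy. This is not the actual ctdb_fork() implementation; the function names are hypothetical, and the sketch assumes the POSIX sched_setscheduler() call with SCHED_OTHER is how the real-time policy is dropped.

```c
#include <assert.h>
#include <sched.h>
#include <sys/types.h>
#include <unistd.h>

/* Switch a process to the default time-sharing policy.
 * pid 0 means the calling process. Returns 0 on success, -1 on error. */
static int example_drop_realtime(pid_t pid)
{
    /* SCHED_OTHER requires a static priority of 0. */
    struct sched_param param = { .sched_priority = 0 };

    assert(pid >= 0);
    return sched_setscheduler(pid, SCHED_OTHER, &param);
}

/* Hypothetical wrapper: fork, and in the child give up any real-time
 * scheduling inherited from the parent. Returns what fork() returns. */
static pid_t example_fork_no_realtime(void)
{
    pid_t pid = fork();

    if (pid == 0) {
        /* Child only. A failure here is ignored in this sketch;
         * real code would probably want to log it. */
        (void)example_drop_realtime(0);
    }
    return pid;
}
```

Callers that do want a real-time child would keep using plain fork(), which matches the split the commit message describes.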
Ronnie Sahlberg
c2c53db49d during ip allocation, there are failure modes where a node might hold an ip address
but thinks it is still unassigned (-1).

add code to the recovery daemon to detect this case and trigger a reallocation
so that the ip gets covered

and change the takeip code to allow for this condition, taking on an ip address that is
already hosted.

cq s1021073

(This used to be ctdb commit 9020baf27cab7821c9094cda185206fb7af0fee7)
2010-12-03 13:30:39 +11:00
Ronnie Sahlberg
7e29fd6093 Don't check remote ip allocation if public ip mgmt is disabled
(This used to be ctdb commit 441ad00af842a8b7b5291de60d8ab08a064f5327)
2010-11-10 14:55:25 +11:00
Ronnie Sahlberg
a6ed66dfd0 don't check the public ip assignment, or even whether we are hosting addresses we shouldn't,
when public ips have been disabled

(This used to be ctdb commit 7d07a74dc7f907ac757d20626f68e257d7ba16be)
2010-11-10 14:55:24 +11:00
Ronnie Sahlberg
5f76f3c0e2 Add a new tunable: DisableIPFailover that, when set to non-zero,
will stop any ip reallocations at all from happening.

(This used to be ctdb commit d8d37493478a26c5f1809a5f3df89ffd6e149281)
2010-11-10 14:55:24 +11:00
Ronnie Sahlberg
107d020cfa update/improve the log message related to rerecovery timeouts
(This used to be ctdb commit 8b4d1df3abcae03cf7a339d8390c816682a43019)
2010-09-28 08:47:12 +10:00
Stefan Metzmacher
5e46150490 server/recoverd: if we can't get the recovery lock, ban ourself
metze

(This used to be ctdb commit 80b8889267339b870868841ff077e850bc5b52e2)
2010-09-14 15:49:01 +10:00
Stefan Metzmacher
ff77985f38 server/recoverd: do takeover_run after verifying the reclock file
metze

(This used to be ctdb commit 93df096773c89f21f77b3bcf9aa90bf28881b852)
2010-09-14 15:48:37 +10:00
Ronnie Sahlberg
2e8aac6689 Merge commit 'rusty/ports-from-1.0.112' into foo
(This used to be ctdb commit 13e58d92f5f1723e850a82ae030d0ca57e89b1ee)
2010-08-19 13:17:56 +10:00
Rusty Russell
9fbb191b78 logging: give a unique logging name to each forked child.
This means we can distinguish which child is logging, esp. via syslog where we have no pid.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

(This used to be ctdb commit 68b3761a0874429b90731741f0531f76dcfbb081)
2010-08-18 11:46:32 +09:30
Rusty Russell
f93440c4b7 event: Update events to latest Samba version 0.9.8
In Samba this is now called "tevent", and while we use the backwards
compatibility wrappers they don't offer EVENT_FD_AUTOCLOSE: that is now
a separate tevent_fd_set_auto_close() function.

This is based on Samba version 7f29f817fa.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>


(This used to be ctdb commit 85e5e760cc91eb3157d3a88996ce474491646726)
2010-08-18 09:16:31 +09:30
Rusty Russell
8f8959a145 speed startup: with --sloppy-start, cut initial election timeout to 1/2 second.
Seconds between ctdbd first log message and node healthy:
BEFORE:	4.03
AFTER: 2.02

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>


(This used to be ctdb commit 8f17731dea4287d4f9b21dc58c1bdf26c8a0e628)
2010-06-22 22:55:20 +09:30
Rusty Russell
fabeea6197 speed startup: don't wait a full recovery interval if we've already waited
We currently sleep for one second, whether or not we've already slept.
Change this to sleep for the remainder of the second, if at all.

Seconds between ctdbd first log message and node healthy:
BEFORE:	18.09
AFTER: 17.08

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>


(This used to be ctdb commit fde760b5f39c77172308a583da4c2443b71541c9)
2010-06-22 22:50:35 +09:30
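Illustration for fabeea6197: a minimal sketch of "sleep for the remainder of the second, if at all". The helper name is hypothetical, and times are plain doubles in seconds for simplicity; the commit itself may measure time differently.

```c
#include <assert.h>

/* Given the desired interval and the time already spent in this
 * iteration, return how much longer to sleep (never negative). */
static double example_remaining_sleep(double interval, double elapsed)
{
    assert(interval >= 0.0);

    if (elapsed < 0.0) {
        elapsed = 0.0;      /* clock went backwards: treat as no time spent */
    }
    if (elapsed >= interval) {
        return 0.0;         /* already waited long enough, do not sleep */
    }
    return interval - elapsed;
}
```

For a one second interval, an iteration that already took 0.25 seconds sleeps 0.75 seconds more, and one that took two seconds does not sleep at all.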
Rusty Russell
f7efc1f8e8 speed startup: alter recovery loop
We do a recovery on startup.  But the code does:
   Sleep for ctdb->tunable.recover_interval.
   Check for recovery.
   
We want to do it in the other order.  This is best done by extracting
the loop into a separate "main_loop" function.

Seconds between ctdbd first log message and node healthy:
BEFORE:	24.09
AFTER: 23.58

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>


(This used to be ctdb commit 097046025176b9fcb670839d1a9f100f890e7ed2)
2010-06-22 22:50:23 +09:30
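Illustration for f7efc1f8e8: a minimal sketch of the reordering described above, where the loop body is extracted into a main_loop function and the check runs before the sleep. Only the name main_loop comes from the commit message; the trace structure and the other helpers are hypothetical stand-ins that record the order of steps instead of doing real work.

```c
#include <assert.h>
#include <stddef.h>

#define EXAMPLE_TRACE_MAX 16

/* Records the order of steps as a string: 'C' = check, 'S' = sleep. */
struct example_trace {
    char steps[EXAMPLE_TRACE_MAX];
    size_t len;
};

static void example_trace_add(struct example_trace *t, char step)
{
    assert(t != NULL);
    if (t->len < EXAMPLE_TRACE_MAX - 1) {
        t->steps[t->len++] = step;
        t->steps[t->len] = '\0';
    }
}

/* One pass of the extracted loop body: check whether recovery is needed. */
static void main_loop(struct example_trace *t)
{
    example_trace_add(t, 'C');
}

/* Stand-in for sleeping for the recovery interval. */
static void example_sleep_interval(struct example_trace *t)
{
    example_trace_add(t, 'S');
}

/* New ordering: check first, then sleep, so the first check is immediate. */
static void example_monitor(struct example_trace *t, int rounds)
{
    for (int i = 0; i < rounds; i++) {
        main_loop(t);
        example_sleep_interval(t);
    }
}
```

Running two rounds records check, sleep, check, sleep; with the old ordering the first step would have been a sleep, which is the startup delay the commit removes.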
Ronnie Sahlberg
bc208bc916 rename ctdb_set_message_handler to ctdb_client_set_message_handler
to avoid a collision with the function of the same name in libctdb

(This used to be ctdb commit 41dbdd4fc0ab560420fb0e24a3179ff7c94c5bb7)
2010-06-02 09:51:47 +10:00
Ronnie Sahlberg
761a075de9 rename ctdb_send_message to ctdb_client_send_message to resolve a collision with the function of the same name in libctdb
(This used to be ctdb commit ac3292c12832484a22715f1d46aa23f3b7c8a6f6)
2010-06-02 09:45:21 +10:00
Rusty Russell
d5f6026a22 libctdb: reorganize headers: remove ctdb.h, add ctdb_client.h and ctdb_protocol.h
ctdb_client.h is the existing internal client interface (which was mainly
in ctdb.h), and ctdb_protocol.h is the information needed for the wire
protocol only.

ctdb.h will be the new, shiny, libctdb API.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

(This used to be ctdb commit 4bba6b8cd47b352f98d41f9f06258d5ac3c9adef)
2010-05-20 15:18:30 +09:30
Ronnie Sahlberg
7a62592fc5 when performing a recovery,
ensure that all nodes use the same reclock file setting as the recovery master

(This used to be ctdb commit 26793ad42b77c2328a00ac9a12bca813c7425245)
2010-05-06 09:33:08 +10:00
Ronnie Sahlberg
62742bd337 Don't check ip assignment across the cluster while ip-verification
checks are disabled

(This used to be ctdb commit 189f4a5af1053271b0834522e35c336df959aa03)
2010-05-03 15:52:02 +10:00
Ronnie Sahlberg
4a43428440 The recent change to the recovery daemon to keep track of and
verify that all nodes agree on the most recent ip address assignments
broke "ctdb moveip ..." since that call would never trigger
a full takeover run and thus would immediately trigger an inconsistency.

Add a new message to the recovery daemon where we can tell the recovery daemon to update its assignments.

BZ62782

(This used to be ctdb commit e7069082e5f0380dcddee247db8754218ce18cab)
2010-05-03 15:47:17 +10:00
Ronnie Sahlberg
06885ea9a7 In the recovery daemon, keep track of which node we have assigned public ip
addresses and verify that the remote nodes have/keep a consistent view of
assigned addresses.

If a remote node has an inconsistent view of addresses vis-à-vis the recovery
master this will trigger a full ip reallocation.

(This used to be ctdb commit f3bf2ab61f8dbbc806ec23a68a87aaedd458e712)
2010-04-08 14:25:26 +10:00
Ronnie Sahlberg
3f226d0c8e Lower the loglevel for "Recovery lock successfully taken"
from ERR to NOTICE

BZ62086

(This used to be ctdb commit 7fa8486f9ffe2a039360b07423f734bdd884fe1d)
2010-04-07 10:45:03 +10:00
Volker Lendecke
184ca81bcd Fix a typo in run_startrecovery_eventscript
(This used to be ctdb commit 4f807b3a2d859f13c3e59e1ae737e9b145d7d613)
2010-03-29 17:06:28 +11:00
Ronnie Sahlberg
d7c00d8d7e Drop the debug level for logging fd creation to DEBUG_DEBUG
(This used to be ctdb commit eae1d4f9e52e73b4d8769868fffdafa590d03784)
2010-02-04 06:37:41 +11:00
Stefan Metzmacher
dbe912793e server: reload the public addresses before doing a takeover run
metze

(This used to be ctdb commit 0e41a2204fa8a1e77dc83c0d4b253ab272b5c72d)
2010-01-20 11:11:04 +01:00
Stefan Metzmacher
5fa6a51388 server: monitor interfaces in verify_ip_allocation()
metze

(This used to be ctdb commit 965a65520693e3731b5b0250127b04c777087808)
2010-01-20 11:11:01 +01:00
Stefan Metzmacher
22ade0e456 server: only trigger one takeover run in verify_ip_allocation()
metze

(This used to be ctdb commit 10bc087d0280057962177721bdd6d4f28743b311)
2010-01-20 11:11:01 +01:00
Stefan Metzmacher
37880b0d0a server: use CTDB_PUBLIC_IP_FLAGS_ONLY_AVAILABLE during a takeover run
We now ask for the known and available interfaces.
This means a node gets a RELEASE_IP event for all interfaces
it "knows", but doesn't serve and a node only gets a TAKE_IP event
for "available" interfaces.

metze

(This used to be ctdb commit a695a38e49e7c3e15a9706392dc920eeab1f11ba)
2010-01-20 11:10:59 +01:00
Stefan Metzmacher
2f36e78d88 server: add missing goto again after do_recovery()
metze

(This used to be ctdb commit 898894d3acbcc0add2ab0706a3172a446622f687)
2010-01-20 09:44:35 +01:00
Ronnie Sahlberg
4c722fe34c fix a conflict in the merge from rusty
Merge commit 'rusty/ctdb-no-setsched'

Conflicts:

	server/ctdb_vacuum.c

(This used to be ctdb commit b4365045797f520a7914afdb69ebd1a8dacfa0d9)
2009-12-17 08:18:04 +11:00
Rusty Russell
f148735928 Add --valgrinding flag instead of --nosetsched
The do_setsched was being tested for whether to mmap tdbs: let's make it
explicit.  We can also happily move the kill-child eventscript hack under
this flag.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> 


(This used to be ctdb commit 2ee86cc1f311d7b7504c7b14d142b9c4f6f4b469)
2009-12-16 20:59:15 +10:30
Stefan Metzmacher
8fbb5b7915 server/recovery: update flags on nodes before syncing dbs
metze

(This used to be ctdb commit 49d2dca9ad837e1b397294fb0e966bf0b77f751c)
2009-12-16 08:03:56 +01:00
Stefan Metzmacher
77d43d01aa server: create recdb.tdb.X in /var/ctdb/state/
metze

(This used to be ctdb commit 92e05282d6c4f16e55d914cc3bde3738ea2d44ad)
2009-12-16 08:03:56 +01:00
Stefan Metzmacher
003985acfd ctdb: pass TDB_DISALLOW_NESTING to all tdb_open/tdb_wrap_open calls
metze

Signed-off-by: Stefan Metzmacher <metze@samba.org>

(This used to be ctdb commit 1635e931b909c66eb3b1f5357e3a549b1a0da70d)
2009-12-16 08:03:55 +01:00
Ronnie Sahlberg
0982299bed Revert "Make fetch_locked more scalable"
This reverts commit 5736e17c139c9a8049e235429aeae0c6c9d0e93d.

(This used to be ctdb commit 3d2d877d877146ca09a28a3a44f4840eb36fd377)
2009-12-15 14:26:28 +11:00
Michael Adam
b41d9a2bcc Revert "recovery: add special pull-logic for persistent databases"
This reverts commit 8aef46d2aab3efb322dda51eaa202653cefd5222.

This special recovery logic is wrong now with the transaction rewrite.
The treatment of persistent databases will later be rewritten to use the
database sequence number.

Michael

(This used to be ctdb commit c5a0aef668a63f927d6184612b13ce316eb4a0be)
2009-12-12 00:45:40 +01:00