mirror of https://github.com/samba-team/samba.git synced 2025-01-27 14:04:05 +03:00


Author SHA1 Message Date
Ronnie Sahlberg
80be59d35e when we change state between healthy/unhealthy, make sure we ask the recovery
master to perform an explicit ip reallocation.

This is more reliable and faster than having the recovery daemon track these
changes, and since we now have an explicit method to ask the recovery daemon
to perform an explicit ip reallocation, we should use this.

(This used to be ctdb commit 3807681e74f4bfe92befdae6ed616ff5f1a99880)
2009-10-14 11:59:16 +11:00
Ronnie Sahlberg
4b7a208b16 allow a pre .95 version of a recovery master to freeze databases on a post .95 node by remapping priority numbers and log this to log.ctdb
(This used to be ctdb commit 343c005367789e108c0320e95d7a264535d68dd8)
2009-10-14 10:14:03 +11:00
Ronnie Sahlberg
070f781e39 always create the nfs state directories during the monitor event.
this allows us to configure and enable nfs at runtime without having to restart ctdbd

(This used to be ctdb commit f6e39d35713475defaa08a623e194f3f2f8f7d53)
2009-10-14 09:15:24 +11:00
Ronnie Sahlberg
3ac5a52969 Port Volker's deadlock avoidance patch to HEAD.
This patch ensures that we lock all non-notify related databases first and
then the notify databases, to avoid a deadlock where samba needs to lock records in two databases at once (with notify being the second database).

Newer versions of samba will instead use the set-db-prio control to set this explicitly on a per-database basis rather than relying on hardcoded database names. This patch will be reverted in the future once updated versions of samba have been pushed out.

(This used to be ctdb commit 70e7781df1f118a0e2632a9c634f3fd388fa6c8c)
2009-10-14 08:17:49 +11:00
Ronnie Sahlberg
98b5caf003 we must break the loop as soon as we find that a suitable recmaster exists,
otherwise "ctdb ipreallocate" will silently fail to update the addresses.

(This used to be ctdb commit 346fa055f4106497b87df97da5ebd6e51fa1ef8c)
2009-10-13 09:49:05 +11:00
Ronnie Sahlberg
2cb9580464 new version 1.0.95
(This used to be ctdb commit 3501d6b70bd905d6fdc4e74fe2cedc3ba77e4b86)
2009-10-12 18:53:20 +11:00
Ronnie Sahlberg
d66c77d960 use the correct expected size for the _cancel control
(This used to be ctdb commit 5974b5f7998ef96aeadb7377f32ef1ab85bb5943)
2009-10-12 18:41:57 +11:00
Ronnie Sahlberg
44f1d1fea7 add a dispatch to the recovery transaction cancel call
(This used to be ctdb commit c1d7c11978d27d2ee41a2129b31d9ab61a43f8da)
2009-10-12 18:31:59 +11:00
Ronnie Sahlberg
df0dba1862 Merge commit 'martins/master'
(This used to be ctdb commit 5f14874c5c705dd637f88a77f30c930fea1201d2)
2009-10-12 16:51:36 +11:00
Ronnie Sahlberg
122c423b82 add a new control for explicitly cancelling recovery transactions, i.e. the
transactions we start across all tdb databases during the recovery.

this allows us to properly clean up and delete these tdb transactions on a
recovery failure.

(This used to be ctdb commit b2ce8b900a7d00944c84e0574fea5b371064a06d)
2009-10-12 16:48:05 +11:00
Martin Schwenke
ab98c1b0f1 Clean up ctdb_check_directories* eventscript functions.
There are 2 problems with this code:

* The loop in ctdb_check_directories_probe() breaks on filenames
  containing whitespace.

  The fix to protect them is to pass "$@" to this function and have it
  operate on "$@".

  Note that there's still a problem with whitespace in filenames in
  the 50.samba eventscript.  To fix this ctdb_check_directories_probe
  should read the filenames from stdin.  Another time...

* The check for '%' in filenames in ctdb_check_directories_probe()
  ends up involving several forks.  On a modern machine this can cost
  a couple of minutes when checking a large number of directories.

  The fix is to use a case statement.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit eb1fecaef9aa5cb85dff7d4f7af8a9878deabed8)
2009-10-12 16:32:49 +11:00
Martin Schwenke
d8e2ddc5a8 40.vsftpd: reset the fail counter in the "recovered" event.
Each recovery that involves IP reassignments results in a restart of
vsftpd in the "recovered" event.  Currently, we can have several
recoveries in quick succession and the "monitor" event following each
can fail because vsftpd isn't ready yet.  This results in cumulative
failures, so the node is marked unhealthy, even though vsftpd has
never had a proper opportunity to become ready.

This resets the fail count after each recovery.

While we're here, also move the delete of the restart flag file into
the body of the conditional.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 318abeb4b913a8d846e7eaf4cf5c2a67b61ce974)
2009-10-12 16:17:37 +11:00
Ronnie Sahlberg
771802b212 allow setting the recmode even when not completely frozen.
we sometimes have to do this when we want to trigger a recovery

(This used to be ctdb commit 46194e87e189521375b39b4ef33da2b493429fd8)
2009-10-12 13:06:16 +11:00
Ronnie Sahlberg
73c0adb029 initial attempt at freezing databases in priority order
(This used to be ctdb commit e8d692590da1070c87a4144031e3306d190ebed2)
2009-10-12 12:08:39 +11:00
Ronnie Sahlberg
d4c98516a2 update the freeze/thaw commands to be able to send the requested database priority to freeze/thaw to the daemon.
this is encoded in the srvid field of the request header

(This used to be ctdb commit 0cb3d33caa42ed783e03bc825b181dde4cf63616)
2009-10-12 09:22:17 +11:00
Ronnie Sahlberg
ae57e54566 during recovery, update all remote nodes so they use the same priorities
for the databases as this node.

(This used to be ctdb commit 465dc95fef0ff6651ff49fa94e4cf2ebd1036ac4)
2009-10-10 16:28:20 +11:00
Ronnie Sahlberg
3219f81710 add a control to read the db priority from a database
(This used to be ctdb commit ca6d045e419f308f57e74d4c978907afb05ddb85)
2009-10-10 15:04:18 +11:00
Ronnie Sahlberg
6cf7d8e131 add a control to set a database priority. Let newly created databases default to priority 1.
database priorities will be used to control the order in which databases are locked during recovery.
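As an illustration only, the lock ordering these priorities are meant to enforce can be sketched in C. This is a minimal sketch, not ctdb's code: `struct db`, `MAX_DB_PRIORITY` and `lock_in_priority_order()` are hypothetical names, and the actual lock call is left as a comment. Because every process takes all priority-1 databases before any priority-2 database, two processes can never each hold one database while waiting on the other, which is the notify deadlock described in the deadlock avoidance commit above.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: illustrative upper bound, not ctdb's real limit. */
#define MAX_DB_PRIORITY 3

/* Hypothetical database descriptor, not ctdb's real structure. */
struct db {
    const char *name;
    unsigned priority; /* newly created databases default to 1 */
};

/*
 * Visit databases priority by priority: every priority-1 database first,
 * then priority 2, and so on. The resulting lock order is written into
 * `order` (room for n entries) and the number of entries is returned.
 * Databases whose priority is outside 1..MAX_DB_PRIORITY are not visited.
 */
static size_t lock_in_priority_order(const struct db *dbs, size_t n,
                                     const char **order)
{
    size_t k = 0;
    for (unsigned prio = 1; prio <= MAX_DB_PRIORITY; prio++) {
        for (size_t i = 0; i < n; i++) {
            if (dbs[i].priority == prio) {
                assert(k < n);
                order[k++] = dbs[i].name; /* a real daemon would lock dbs[i] here */
            }
        }
    }
    return k;
}
```

Given the array locking.tdb (priority 1), notify.tdb (priority 2), brlock.tdb (priority 1), the lock order comes out as locking.tdb, brlock.tdb, notify.tdb. The database names are only examples.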

(This used to be ctdb commit 67741c0ee01916d94cace8e9462ef02507e06078)
2009-10-10 14:26:09 +11:00
Ronnie Sahlberg
e8e2f35985 verify the DISABLED flag by comparing it with the previous flags we have registered for that node, not with what the node says the difference is.
this prevents a situation where the remote node may cause spurious ip reallocations.
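A hypothetical sketch of the check this commit describes: keep the flags last recorded for each node and derive the DISABLED transition from them, instead of trusting the change mask the remote node sends. `NODE_FLAG_DISABLED` and its bit value, `struct node_state` and `update_flags()` are assumptions for illustration, not ctdb's definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: illustrative bit value, not necessarily ctdb's. */
#define NODE_FLAG_DISABLED 0x00000004u

/* Hypothetical per-node record kept by the local daemon. */
struct node_state {
    uint32_t recorded_flags; /* flags we last registered for this node */
};

/*
 * Store the node's newly reported flags and return true only if the
 * DISABLED bit differs from what we had recorded. Changes to other bits
 * never trigger an ip reallocation here.
 */
static bool update_flags(struct node_state *node, uint32_t new_flags)
{
    assert(node != NULL);
    bool changed =
        ((node->recorded_flags ^ new_flags) & NODE_FLAG_DISABLED) != 0;
    node->recorded_flags = new_flags;
    return changed;
}
```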

(This used to be ctdb commit dd122351efaeef5475cdec111eb900110d83ec35)
2009-10-10 13:55:11 +11:00
Ronnie Sahlberg
05137e4718 Fix bug spotted by Metze:
the argument to ctdb_control_event_Script_disabled() is a string, not a uint32

(This used to be ctdb commit 687535b51622d1fac7ccb38fa640bf1febd69fd8)
2009-10-09 22:22:11 +11:00
Ronnie Sahlberg
eb9a77c887 version 1.0.94
(This used to be ctdb commit 5cb4d63bf6887d15aba37fafc3f6b6ba38027f13)
2009-10-08 19:17:57 +11:00
Ronnie Sahlberg
342148628f if a node fails to become frozen during recovery, mark it as a culprit so it will soon get banned
(This used to be ctdb commit f72d33ac73ebb1af802bacdfb30279df3cd8b8f9)
2009-10-08 16:45:25 +11:00
Ronnie Sahlberg
d29c4b5c4d version 1.0.93
(This used to be ctdb commit e77bf5708df6782b4516f698b9981a1d27e2f10b)
2009-10-06 17:05:14 +11:00
Ronnie Sahlberg
42193cbff8 update natgw eventscript to allow you to force it to update and/or to remove the configuration at runtime
(This used to be ctdb commit deed52b7e4aac94b4d11a8d89d08739e1dfd4ed7)
2009-10-06 16:09:24 +11:00
Martin Schwenke
2fa921ba92 Merge commit 'origin/master'
(This used to be ctdb commit 7d91de8a837a12082c343980428153720dcad741)
2009-10-06 13:39:31 +11:00
Martin Schwenke
47f5347963 Document CTDB_NODES_FILE environment variable used by onnode.
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 22f0065cd6b66fa0f623f465aaca98883955ac79)
2009-10-06 13:38:00 +11:00
Ronnie Sahlberg
134ed842fa always send the release/take ip controls to make sure all nodes are updated
(This used to be ctdb commit 789703ea684717781c176fd3a2a24d96abde220b)
2009-10-06 12:25:44 +11:00
Ronnie Sahlberg
166b1c97b4 add a new message to ask the recovery daemon to temporarily disable checking ip address consistency.
This is useful when we are moving addresses using moveip in the cluster, since otherwise we could collide with the recovery daemon's own check and cause a recovery.

(This used to be ctdb commit 9c63858c0b22c81eaccb9865a414af0bbb2833d4)
2009-10-06 12:11:32 +11:00
Ronnie Sahlberg
617e393f6b update addip/moveip/delip to make it less likely to trigger an accidental recovery
(This used to be ctdb commit 3befe5526e147d49451fddc930aaafc3dbe2e9c1)
2009-10-06 11:41:18 +11:00
Ronnie Sahlberg
50712d48d3 change some loglevels and also print the pnn of the ip for takeip/releaseip logging
(This used to be ctdb commit 9d95dfbd12898975ba0d8560d95a974210d3de7c)
2009-10-06 11:40:38 +11:00
Ronnie Sahlberg
71e4259150 add a new function to collect a list of all active nodes EXCEPT a certain node
(This used to be ctdb commit be52954d921e7d443304cf49fbd488c619a9c4ec)
2009-10-06 10:52:31 +11:00
Ronnie Sahlberg
3133dadd8f allocate takeoverip state as a child of vnn and also make the takeoverip context a child of vnn
(This used to be ctdb commit 804e5905be51f43c8a338bfbe216fd8d5718850f)
2009-10-06 09:35:15 +11:00
Ronnie Sahlberg
709fc77878 When adding a public ip to a node, make sure to push the assignment of ip addresses out to all nodes so all nodes become aware who currently holds the ip.
(This used to be ctdb commit e8df6fc301fb7faf72c72eb39ea68d44d1526b00)
2009-10-06 08:19:25 +11:00
Ronnie Sahlberg
1d60064139 version 1.0.92
(This used to be ctdb commit 9ffb0d08d34cbafed0e49350a3a72b15d92c8ea7)
2009-10-02 14:38:16 +10:00
Ronnie Sahlberg
f8334e2f68 we should close this file on exec
(This used to be ctdb commit c1c0ebb8da9a6c29ee83868a311f07f30cb4ed16)
2009-10-02 13:41:54 +10:00
Ronnie Sahlberg
2ab8f6a368 Merge commit 'martins/master'
(This used to be ctdb commit 9b206d96da3341836cc25aee5693f551f6f3a80e)
2009-10-01 15:46:01 +10:00
Martin Schwenke
3edf5532d5 Test suite: The ctdb ping test should allow time to go backwards.
Time can actually go backwards during this test if ntpd happens to
adjust it a little bit.  So we should cope...

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 23ae9e9863ea90c6fb3f105403fd098041fa73f4)
2009-10-01 15:39:09 +10:00
Ronnie Sahlberg
dfc2500a1f don't exit on a commit failure
(This used to be ctdb commit 4e9a3a5dc232bac12ab387ea0cf4f1b279bed5c1)
2009-10-01 14:53:35 +10:00
Ronnie Sahlberg
63278ad040 Revert "Revert "allow the transaction commit to fail""
This reverts commit 74e416108df6934f45ca646d709785dd76ab3c35.

(This used to be ctdb commit d1d370033d5007ad1c2c34cd9eeac53001f4b13e)
2009-10-01 14:51:32 +10:00
Ronnie Sahlberg
32286b08ac document how to use the notification script
(This used to be ctdb commit b77e4698e7f83443243965f93b84237f2903cd46)
2009-10-01 14:31:55 +10:00
Ronnie Sahlberg
e90dd8015f add a new notification to trigger when ctdb has started
(This used to be ctdb commit b1fe04f2e9447f762a0b805763deb29296585ff8)
2009-10-01 14:05:30 +10:00
Martin Schwenke
b27600253d Minor fixes to 01.reclock eventscript.
test -z really needs its argument to be quoted.  Simplified a status
test.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit fe26da7780545b1ecc0a7da5bc1cf8beaeea94cc)
2009-09-30 21:21:56 +10:00
Martin Schwenke
78b7043411 40.vsftpd monitor event only fails after 2 failures to connect to port 21.
Change the monitor event in 40.vsftpd so it only fails if there are 2
successive failures connecting to port 21.  This reduces the
likelihood of unhealthy nodes due to vsftpd being restarted for
reconfiguration due to node failover or system reconfiguration.

New eventscript functions ctdb_counter_init, ctdb_counter_incr,
ctdb_counter_limit.  These are used to count arbitrary things in
eventscripts, depending on the eventscript name and a tag that is
passed, and determine if a specified limit has been hit.  They're good
for counting failures!

These functions are used in 40.vsftpd and also in 01.reclock - the
latter used to do the counting without these functions.
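The counting pattern can be sketched in C as below. This is a hypothetical in-memory stand-in, not the shell functions themselves: how ctdb_counter_init, ctdb_counter_incr and ctdb_counter_limit store their state is not described here. `struct counter`, `monitor_should_fail()` and treating "limit hit" as count >= limit are assumptions; the limit of 2 successive failures is the one stated for 40.vsftpd.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one eventscript counter. */
struct counter {
    const char *script; /* eventscript name, e.g. "40.vsftpd" */
    const char *tag;    /* what is being counted */
    unsigned count;
};

static void counter_init(struct counter *c, const char *script, const char *tag)
{
    assert(c != NULL);
    c->script = script;
    c->tag = tag;
    c->count = 0;
}

static void counter_incr(struct counter *c)
{
    c->count++;
}

/* Assumption: the limit counts as hit once count reaches it. */
static bool counter_limit(const struct counter *c, unsigned limit)
{
    return c->count >= limit;
}

/* One monitor run: report failure only after 2 successive failed connects. */
static bool monitor_should_fail(struct counter *c, bool connected)
{
    if (connected) {
        counter_init(c, c->script, c->tag); /* a success resets the count */
        return false;
    }
    counter_incr(c);
    return counter_limit(c, 2);
}
```

A single failed connection is tolerated; a second one in a row is reported, and any successful connection starts the count again.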

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit cfe63636a163730ae9ad3554b78519b3c07d8896)
2009-09-30 21:05:16 +10:00
Martin Schwenke
e169ba85f3 Merge commit 'origin/master'
(This used to be ctdb commit 803cfb4cd2f6d139f466053a6d7e104fcb772ef5)
2009-09-30 19:22:59 +10:00
Ronnie Sahlberg
11c56dfd56 New version 1.0.91
(This used to be ctdb commit d1332f4d5d3d3e4b4e0cd362a6903d09e0d5fcbb)
2009-09-29 13:31:41 +10:00
Ronnie Sahlberg
c971d934a9 From Wolfgang Mueller-Friedt
Remove the explicit vacuum/repack commands from the 00.ctdb eventscript
and implement this in the ctdb daemon.

Combine vacuuming and repacking into one cheap read traverse to enumerate all candidate records
and one write traverse that both repacks the database and deletes the records locally where we are lmaster and where the records have already been deleted remotely.

this code also adds initial autotuning heuristics for the vacuum intervals and how many records to delete in each iteration.

minor stylistic changes made by ronnie s

(This used to be ctdb commit 95a3ee551241aa164967991fe5efe078e1714bde)
2009-09-29 13:27:19 +10:00
Martin Schwenke
e976209996 Merge commit 'origin/master'
(This used to be ctdb commit 096cdc0c12d22d99f8405bee5cb9f05c616c8492)
2009-09-29 12:59:10 +10:00
Ronnie Sahlberg
9bac6f2e2c change the reclock fail count to 19 monitor intervals before we shut down ctdbd
(This used to be ctdb commit 6e35feb06ec036b9036c5d1cdd94f7cef140d8a6)
2009-09-28 14:12:59 +10:00
Ronnie Sahlberg
4f0f2cc196 add a new eventscript 01.reclock
if the reclock file has been set, then this script will test that the
reclock file can actually be accessed.
if the file does not exist, or if an attempt to stat the file hangs,
the node will be marked unhealthy after the third failed monitoring event,
and after the tenth failure ctdb itself will shut down.
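The escalation described in this commit can be sketched as a small state update. The thresholds 3 and 10 come from the commit message (a later commit in this log changes the shutdown threshold to 19 monitor intervals); the enum, `reclock_monitor()`, and the assumption that a successful check resets the count are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define UNHEALTHY_AFTER 3  /* "after the third failed monitoring event" */
#define SHUTDOWN_AFTER 10  /* "after the tenth failure" */

enum reclock_action { RECLOCK_OK, RECLOCK_UNHEALTHY, RECLOCK_SHUTDOWN };

/* One monitor event: `accessible` is whether the reclock file could be stat'ed. */
static enum reclock_action reclock_monitor(unsigned *fail_count, bool accessible)
{
    assert(fail_count != NULL);
    if (accessible) {
        *fail_count = 0; /* assumption: a successful check resets the count */
        return RECLOCK_OK;
    }
    ++*fail_count;
    if (*fail_count >= SHUTDOWN_AFTER)
        return RECLOCK_SHUTDOWN;
    if (*fail_count >= UNHEALTHY_AFTER)
        return RECLOCK_UNHEALTHY;
    return RECLOCK_OK;
}
```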

(This used to be ctdb commit 2cb04747887674def299e574fccb827c1c3194e7)
2009-09-28 14:06:40 +10:00
Ronnie Sahlberg
22dde50be3 add machine-readable output for the ctdb getreclock command
(This used to be ctdb commit 5e7dc36f1649824db2f9dab34bede8b388502a57)
2009-09-28 13:39:54 +10:00