mirror of https://github.com/samba-team/samba.git synced 2025-01-26 10:04:02 +03:00

4845 Commits

Author SHA1 Message Date
Amitay Isaacs
d2411e74f1 recoverd: Update capabilities only if the current node is active
Since we do an early return if a node is stopped or banned, move update
capabilities code below the early return and just before we check the
capabilities of current recovery master.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 93bcb6617e1024f810533e12390a572f51703ca0)
2013-07-02 12:59:09 +10:00
Amitay Isaacs
73e6cc765d recoverd: No need to check if node is recovery master when inactive
If a node is stopped or banned, it will cause an early return from the
main_loop, so this check is redundant.  The election will be called by an
active node.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 815ddd3341b7e9db39e05a3a3fcd9a1420f053bc)
2013-07-02 12:59:09 +10:00
Amitay Isaacs
870409ed1c recoverd: Always do an early exit from main_loop if node is stopped or banned
A stopped or banned node cannot do anything useful.  So do not participate
in any cluster activity and do not cause any unnecessary network traffic.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 2396981c4bcf30530aeb7f4395093cc202105b50)
2013-07-02 12:59:09 +10:00
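The early-exit pattern described in the commit above can be illustrated with a minimal sketch. The flag values and function names here are hypothetical and are not taken from the CTDB sources; the sketch only shows a loop iteration returning before any cluster work when the node is banned or stopped.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag bits for this sketch; not CTDB's real values. */
#define SKETCH_FLAG_BANNED  0x1u
#define SKETCH_FLAG_STOPPED 0x2u

/* A node that is banned or stopped is treated as inactive. */
bool sketch_node_is_inactive(uint32_t flags)
{
	return (flags & (SKETCH_FLAG_BANNED | SKETCH_FLAG_STOPPED)) != 0;
}

/*
 * One iteration of a monitoring loop.  Returns the number of cluster
 * operations performed, which is zero for an inactive node because
 * the function returns before doing any of them.
 */
int sketch_main_loop_once(uint32_t own_flags)
{
	int operations = 0;

	if (sketch_node_is_inactive(own_flags)) {
		return operations;   /* early exit: no cluster activity */
	}

	operations++;   /* e.g. update capabilities */
	operations++;   /* e.g. check the recovery master */
	return operations;
}
```

Placing the check first means every later step can assume the node is active, which is why the follow-up commits can drop their own inactive-node checks.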
Amitay Isaacs
7b761c4b97 recoverd: Do not set banning credits on a node if current node is inactive
If the current node is banned or stopped, then it should not assign banning
credits to other nodes since the current node will not have up-to-date flags
of other nodes.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 38304f88e0c634e97d4687c25adef975f71537b8)
2013-07-02 12:59:09 +10:00
Amitay Isaacs
5deebd3b75 banning: Do not come out of ban if databases are not frozen
Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit a60f228f8380f222f838eb619d2ab55f96f11ac2)
2013-07-02 12:59:09 +10:00
Amitay Isaacs
9a944d71dc banning: No need to check if banned pnn is for local node
If the banned pnn is not the local node, the function returns early.
So no need for additional check.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 297d93cecc3c0655e72ecac38508e113bdbeab9c)
2013-07-02 12:59:08 +10:00
Amitay Isaacs
c6914e3891 banning: Make ctdb_local_node_got_banned() a void function
When this function is called, we are already committed to banning
and there is no point in failing this function.  If freezing of
databases fails, it will be fixed by the recovery daemon.

(This used to be ctdb commit bb178338658b4ae32382a1f62f7c21cee1d4878f)
2013-07-02 12:59:08 +10:00
Amitay Isaacs
cf1d4bfde3 recoverd: Also check if current node is in recovery when it is banned
Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 6a9dbb8fb0f1f6e8c206189cdc2d33bb371ea2a8)
2013-07-02 12:59:08 +10:00
Amitay Isaacs
3052006bf9 recoverd: Set node_flags information as soon as we get nodemap
Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 8d622660a14c929e365d306147b378ea6ab92175)
2013-07-02 12:59:08 +10:00
Amitay Isaacs
36d8d25b6c recoverd: Remove old comment as the code corresponding to that has gone away
Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 34af2cdf686d5d77854cbaa7bbcd8f878e9171c7)
2013-07-02 12:59:08 +10:00
Amitay Isaacs
ea00a5ecf5 banning: Log ban state changes for other nodes at higher debug level
Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit c6f8407648abb37f2ed781afa5171dad8c9f59e9)
2013-07-02 12:59:08 +10:00
Amitay Isaacs
622ccd09f9 freeze: Make ctdb_start_freeze() a void function
If this function fails due to memory errors, there is no way to recover.
The best course of action is to abort.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 46efe7a886f8c4c56f19536adc98a73c22db906a)
2013-07-02 12:59:08 +10:00
Amitay Isaacs
cf17247d31 freeze: If priority is invalid here, it's time to abort
ctdb_start_freeze() is called from ctdb_control_freeze() which fixes the
priority if it's 0 and returns an error if it's invalid.  Other callers of
ctdb_start_freeze() are internal to CTDB.  So if priority is invalid in
ctdb_start_freeze(), definitely something is seriously wrong.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 87716e8f504d659515d3dbcf93badbf106873bc8)
2013-07-02 12:59:08 +10:00
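The split described above, where the externally reachable path validates its input and the internal path aborts, might look roughly like the following sketch. The function names, the default priority of 1, and the valid range of 1 to 3 are assumptions made for illustration, not details confirmed by the commit message.

```c
#include <stdint.h>
#include <stdlib.h>

/* Assumed for this sketch: valid priorities are 1..3. */
#define SKETCH_NUM_PRIORITIES 3u

/*
 * External entry point: map priority 0 to the default and reject
 * anything out of range with an error code instead of aborting.
 * Returns 0 on success and stores the usable priority in *out.
 */
int sketch_control_freeze_priority(uint32_t requested, uint32_t *out)
{
	if (requested == 0) {
		requested = 1;
	}
	if (requested > SKETCH_NUM_PRIORITIES) {
		return -1;
	}
	*out = requested;
	return 0;
}

/*
 * Internal entry point: callers are trusted, so an invalid priority
 * indicates a programming error and the process aborts.
 */
void sketch_start_freeze(uint32_t priority)
{
	if (priority < 1 || priority > SKETCH_NUM_PRIORITIES) {
		abort();
	}
	/* ... freezing of databases at this priority would start here ... */
}
```

Input from outside the daemon gets an error return, while a bad value reaching the internal function can only come from a bug, so aborting surfaces it immediately.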
Amitay Isaacs
6fe0089bc0 freeze: Log message from ctdb_start_freeze() and ctdb_control_freeze()
This ensures that whenever databases are frozen either via sending
control or by calling ctdb_start_freeze(), the action is logged.
Since ctdb_control_freeze() calls ctdb_start_freeze(), move the log
message into the early-return path taken when databases are already frozen.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 478e24bceda3fedfba54ccb48faa115df726b819)
2013-07-02 12:57:03 +10:00
Amitay Isaacs
d439aa05a8 recoverd: Print banning message only after verifying pnn
Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 4be8dff3a4451192f838497b4747273685959bed)
2013-06-28 14:20:12 +10:00
Amitay Isaacs
6960bf78ff recoverd: When updating flags on nodes, send updated flags and not old flags
This was broken by commit a9a1156ea4e10483a4bf4265b8e9203f0af033aa.
Instead of a SRVID_SET_NODE_FLAGS message to the recovery daemon, a control
was sent to the local daemon, which in turn informed the recovery daemon.
While making this change, the old flags were sent via CONTROL_MODIFY_FLAGS.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 7eb2f89979360b6cc98ca9b17c48310277fa89fc)
2013-06-28 14:20:12 +10:00
Martin Schwenke
442953c540 tools/ctdb: Add "force" option to "recover" command
At the moment there is no easy way to force a recovery when attempting
to reproduce certain classes of bugs.  This option is added without
documentation because it is dangerous until the bugs are fixed!  :-)

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 4f87925a287f612a6ab3b5da1a387a31c7bea28f)
2013-06-28 14:18:00 +10:00
Amitay Isaacs
f9191c061a client: Exit with non-zero status when unix socket is closed
Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 733fc909425860f6a02c205c2d8f34a731853922)
2013-06-25 17:48:23 +10:00
Martin Schwenke
55de6c56ce doc: Fix ctdb ping entry in manpage
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit abeb65ef02d018a7c14d4f8cea71e15c6cf9e357)
2013-06-22 15:54:19 +10:00
Martin Schwenke
356647949b doc: Fix documentation for NoIPTakeover in ctdbd manpage
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 5d0215be5aefe492258a92c7bff2d41960379580)
2013-06-22 15:54:19 +10:00
Martin Schwenke
ed45a2e115 doc: Update notification script section in ctdbd manpage
The example notification script is now much more useful.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 4ba7c73eeab98296c9168e0b0fed1f6bb9f32733)
2013-06-22 15:54:19 +10:00
Martin Schwenke
017b966669 doc: Add nodestatus command to the ctdb manpage
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 4369c8e6ead9062ef7855ada375df74262acf925)
2013-06-22 15:54:19 +10:00
Martin Schwenke
51150c7727 doc: Update NEWS
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit cd6227aa38d3bb4e5043faeffe436004e27b6d06)
2013-06-22 15:54:14 +10:00
Martin Schwenke
16d374f75e tests: Integration tests use "ctdb nodestatus" for healthy cluster check
Also check that we're not in recovery mode.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit b7aaa28b3a6a2de923417f3d143f8d516447711e)
2013-06-22 15:51:17 +10:00
Martin Schwenke
0a80d65c2e tests: Integration test infrastructure should do only a single recovery
No need for 2 recoveries after a restart.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit b953524185632d7f96a76d8f3bbed7ac1d143d40)
2013-06-22 15:51:17 +10:00
Martin Schwenke
44e885e98e ctdbd: Fix panic on overlapping shutdowns
The runstate can't be set to SHUTDOWN twice, so the current naive code
causes a panic on the 2nd shutdown.  This regression was introduced in
commit 8076773a9924dcf8aff16f7d96b2b9ac383ecc28.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit f1b7ca8dc3f34a59c7b3e55748f974ac9ed8f458)
2013-06-22 15:51:16 +10:00
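One way to tolerate overlapping shutdown requests is to make the request idempotent. The sketch below is an illustration under assumed names (the enum, struct, and function are hypothetical) and is not the fix applied in the commit.

```c
#include <stdbool.h>

/* Hypothetical run states; only the transition matters for this sketch. */
enum sketch_runstate {
	SKETCH_RUNSTATE_RUNNING,
	SKETCH_RUNSTATE_SHUTDOWN
};

struct sketch_daemon {
	enum sketch_runstate runstate;
};

/*
 * Request a shutdown.  Returns true if this call started the
 * shutdown and false if one was already in progress, so a second
 * request becomes a harmless no-op instead of an invalid transition.
 */
bool sketch_request_shutdown(struct sketch_daemon *d)
{
	if (d->runstate == SKETCH_RUNSTATE_SHUTDOWN) {
		return false;
	}
	d->runstate = SKETCH_RUNSTATE_SHUTDOWN;
	return true;
}
```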
Martin Schwenke
6a52a87028 ctdbd: Refactor shutdown sequence
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit b32fd04bfbf33062d45365b37a7247e272a76ceb)
2013-06-22 15:51:02 +10:00
Martin Schwenke
01d879806b eventscripts: "setup" event doesn't need to wait for SETUP runstate
The "setup" event isn't called until ctdbd is in CTDB_RUNSTATE_SETUP
anyway...

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 9ea57af557028b1d2e5c560e7bcf4d014b9a8b1e)
2013-06-20 13:01:10 +10:00
Martin Schwenke
3b2f7330cc tests/eventscripts: New tests for 00.ctdb "init" event
These test dropping of IPs and TDB checking.

New stubs for date, tdbdump, tdbtool.

Enhance ip stub to handle "ip addr show to ..."

Tweak some infrastructure.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit aabf0bf41cb8ec344f06b69492fb6c2a27f9e900)
2013-06-20 13:01:10 +10:00
Martin Schwenke
4eed91b54a eventscripts: 13.per_ip_routing should not try hard to find public_addresses
This essentially reverts d4621277240721e6d130a930b0100506b64467ea.
This was added for testing but the test code was actually broken.
CTDB itself will only process public IPs if $CTDB_PUBLIC_ADDRESSES is
set, so no code should try to be more flexible than that!

The test code has been fixed instead.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 3b11b27f3e22e99947bc2d6c49c4427bd7a0e332)
2013-06-20 13:01:10 +10:00
Martin Schwenke
2ceed3b0c8 tests/eventscripts: setup_ctdb() should always set $CTDB_PUBLIC_ADDRESSES
Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit c3e7a6e10d486ba0dbafdf110db540675b2317bc)
2013-06-20 13:01:10 +10:00
Martin Schwenke
58d499d3ae logging: Notify parent when logging daemon is up
Messages are lost until it is really up because syslogd_is_started is
set too early.  Adding a pipe to do the notification allows the parent
to wait and only set syslogd_is_started when the logging daemon is
actually ready.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit f3dd2eec200d6eeada2ea19cd7e76f1edfad6167)
2013-06-20 13:01:10 +10:00
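The pipe-based notification described above can be sketched with plain POSIX calls. This is a simplified illustration, not the CTDB logging code: the function name is hypothetical, and the child here exits immediately after signalling instead of running a logging loop.

```c
#include <stdbool.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Fork a child and block until it reports that it is ready.
 * Returns true only after the readiness byte has been received.
 */
bool sketch_start_child_and_wait(void)
{
	int fds[2];
	if (pipe(fds) != 0) {
		return false;
	}

	pid_t pid = fork();
	if (pid < 0) {
		close(fds[0]);
		close(fds[1]);
		return false;
	}

	if (pid == 0) {
		/* Child: finish setup, then notify the parent. */
		close(fds[0]);
		char ready = 1;
		ssize_t written = write(fds[1], &ready, 1);
		close(fds[1]);
		_exit(written == 1 ? 0 : 1);
	}

	/* Parent: read() blocks until the child writes or exits. */
	close(fds[1]);
	char byte = 0;
	ssize_t got = read(fds[0], &byte, 1);
	close(fds[0]);
	waitpid(pid, NULL, 0);   /* the sketch's child exits right away */

	/* Only now would a "started" flag be set. */
	return got == 1 && byte == 1;
}
```

Because the parent closes its copy of the write end, a child that dies before signalling makes read() return 0, so the parent does not wait forever.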
Martin Schwenke
6317285c4f scripts: Move TDB checking from initscript to "init" event
It makes sense to do this in the "init" event and make the initscript
less complicated.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 3bc93f312b8464fbfa2b2c44fffedc591fe5a3e0)
2013-06-20 13:01:10 +10:00
Martin Schwenke
961468146e scripts: Move dropping of all IPs from initscript to "init" event
It makes sense to do this in the "init" event and make the initscript
less complicated.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 0b77cceb49a30a181063adc7868d42d2851318e8)
2013-06-20 13:01:09 +10:00
Martin Schwenke
bee02e06e6 scripts: drop_ip() should use delete_ip_from_iface()
Otherwise secondary addresses that aren't owned by CTDB could be
dropped.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 5ffce65a1ad659b198ddf647622b899bdde45c72)
2013-06-20 13:01:09 +10:00
Martin Schwenke
a1eb516f0a scripts: drop_all_public_ips() now prints messages to stdout, not log
Change all callers to maintain current behaviour.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 0b67397ef5419c781a35916575151da7b7e7cc27)
2013-06-20 13:01:09 +10:00
Martin Schwenke
26d0746b5d ctdbd: "init" event should run earlier in daemon initialisation
It should run before:

* the transport is started;
* databases are attached; and
* processing configuration files (e.g. nodes, public_addresses).

Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 0a0c8543f167e11b75a622513367b083e42cbd3f)
2013-06-20 13:01:09 +10:00
Amitay Isaacs
a4f4e391f0 tools/ctdb: Do not exit prematurely on control timeout if retrying in a loop
This avoids premature exits from "ctdb stop" and "ctdb continue" due to
intermittent control (e.g. getpnn, getnodemap) timeouts.

This needs a proper fix to distinguish between timeout and failure
conditions and take appropriate action.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit c48583fd238496a81ddc46a21892f0b49559036a)
2013-06-20 12:52:00 +10:00
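The distinction between timeout and failure that the commit above calls for could be expressed along these lines. This is a hedged sketch: the result enum, the callback type, and both functions are hypothetical and do not correspond to the ctdb tool's actual control API.

```c
enum sketch_result { SKETCH_OK, SKETCH_TIMEOUT, SKETCH_FAILURE };

typedef enum sketch_result (*sketch_control_fn)(void *state);

/*
 * Call `control` until it succeeds, retrying only on timeouts.
 * A hard failure stops immediately; running out of attempts
 * reports the last timeout.
 */
enum sketch_result sketch_retry_control(sketch_control_fn control,
					void *state, int max_attempts)
{
	enum sketch_result res = SKETCH_TIMEOUT;

	for (int i = 0; i < max_attempts; i++) {
		res = control(state);
		if (res != SKETCH_TIMEOUT) {
			break;   /* success or hard failure */
		}
	}
	return res;
}

/* Example control: times out until the counter reaches zero. */
enum sketch_result sketch_flaky_control(void *state)
{
	int *timeouts_left = state;

	if (*timeouts_left > 0) {
		(*timeouts_left)--;
		return SKETCH_TIMEOUT;
	}
	return SKETCH_OK;
}
```

With a counter of 2 and five attempts allowed, the loop absorbs both timeouts and returns success on the third call.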
Amitay Isaacs
585a2715a6 packaging: Update the minimum required library versions
Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 5f8547b1531bba4950b3d873a997585c3a16d31e)
2013-06-17 10:44:31 +10:00
Amitay Isaacs
4a9ed315c7 build: Enable VERBOSE option to display build command line
make V=1 or make VERBOSE=1 will display build commands.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 02c63c591cc273122b3a547bb301b92f0e4bd217)
2013-06-14 16:45:27 +10:00
Mathieu Parent
d82b9ae410 build: Fix tdb.h path to enable building with system TDB library
(This used to be ctdb commit f8bf99de3a5f56be67aaa67ed836458b1cf73e86)
2013-06-14 16:45:27 +10:00
Mathieu Parent
ecaf710193 libctdb: Include config.h in libctdb/ctdb.c
Bug-Debian: http://bugs.debian.org/703551

(This used to be ctdb commit 14a79c0f3967c88f8ffc8200d122f6c5ffdb63a8)
2013-06-14 16:45:27 +10:00
Amitay Isaacs
d0c858f211 ctdbd: Make sure we don't kill init process by mistake
If getpgrp() fails, it will return -1 and that will send KILL signal to init
process (PID 1).  This does not happen on RHEL, but does on AIX.

Reported-by: Chris Cowan <cc@us.ibm.com>

Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit edb2a3556d03e248b42f63dd2c62382b723bc98f)
2013-06-14 16:39:48 +10:00
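The hazard described above comes from negating the process group id before passing it to kill(): negating -1 gives 1, the PID of init. A minimal guard might look like the sketch below. The function name is hypothetical, and rejecting group 1 as well is an extra precaution assumed here, not something stated in the commit.

```c
#include <sys/types.h>

/*
 * Compute the pid argument for kill() that targets a whole process
 * group.  Returns 0 and stores the negated group id in *target, or
 * returns -1 when the group id is unusable.
 *
 * Negating a failed lookup (-1) would yield 1, the init process, and
 * negating group 1 would yield -1, which kill() treats as "every
 * process the caller may signal".  Rejecting group 1 is an extra
 * precaution assumed by this sketch.
 */
int sketch_group_kill_target(pid_t pgrp, pid_t *target)
{
	if (pgrp <= 1) {
		return -1;
	}
	*target = -pgrp;
	return 0;
}
```

A caller would pass the result of getpgrp() and only call kill(target, SIGKILL) when the function returns 0.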
Martin Schwenke
27ba5b44b6 tests/eventscripts: Unit tests for $CTDB_NFS_DUMP_STUCK_THREADS
Includes minor test infrastructure updates.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit cd4358b01c6c3d413b431f5760029d2b163b9c03)
2013-06-14 15:15:07 +10:00
Martin Schwenke
d82c0ef923 tests/eventscripts: Fix -X tracing in iterate_test()
... and delete a bogus comment.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 0e2b5a8f89440a53f996482ac0c98b31a4f2cad3)
2013-06-14 15:15:07 +10:00
Martin Schwenke
02dd1bf00f tests/eventscripts: Add unit tests for $CTDB_MONITOR_NFS_THREAD_COUNT
Includes minor test infrastructure updates.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit ce2ef2be8aa22c0baf868daac8d4cf27246baa14)
2013-06-14 15:15:07 +10:00
Martin Schwenke
45878d4363 eventscripts: New configuration variable $CTDB_NFS_DUMP_STUCK_THREADS
If some nfsd threads are still alive after a shutdown during a restart
then this indicates the maximum number of threads for which a stack
trace should be dumped.  This can be useful for trying to determine
why nfsd is stuck.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 2503245db10d567af708a04edd3a3b488c24f401)
2013-06-14 15:15:06 +10:00
Martin Schwenke
f408caea2a eventscripts: Add new option $CTDB_MONITOR_NFS_THREAD_COUNT
Consider the following example:

1. There are 256 nfsd threads configured.
2. 200 threads are "stuck" in system calls, perhaps waiting for the
   underlying filesystem when an attempt is made to restart NFS.
3. 56 threads exit when NFS is stopped.
4. 56 new threads are started when NFS is started.
5. 200 "stuck" threads exit leaving only 56 threads running.

Setting this option to "yes" makes the 60.nfs monitor event look for
this situation and try to correct it.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 99b0d8b8ecc36dfc493775b9ebced54539c182d2)
2013-06-13 20:01:22 +10:00
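The arithmetic in the five-step example above can be replayed in a few lines. This sketch only models the thread counts. It is not the 60.nfs event script, and both function names are hypothetical.

```c
#include <stdbool.h>

/*
 * Decide whether the running nfsd thread count has fallen below the
 * configured count, as in the example where 256 threads are
 * configured but only 56 remain.
 */
bool sketch_nfs_threads_need_fix(int configured, int running)
{
	return running < configured;
}

/*
 * Replay the five steps of the example and return the number of
 * threads left running at the end.
 */
int sketch_threads_after_restart(int configured, int stuck)
{
	int running = configured;          /* step 1 */
	int exited = configured - stuck;   /* step 3: non-stuck threads exit */
	running -= exited;                 /* only stuck threads remain */
	running += exited;                 /* step 4: replacements start */
	running -= stuck;                  /* step 5: stuck threads exit */
	return running;
}
```

For 256 configured threads with 200 stuck, the replay ends with 56 running, which is the shortfall the monitor event is meant to detect.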
Martin Schwenke
7513f0ba61 recoverd: Log node that causes takeover run to fail
Extend takeover_fail_callback() to just log (and not do any ban
processing) when the callback data is NULL.  Always call
ctdb_takeover_run() with the callback so that useful errors are always
logged.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit c429394afbabaee09f9216dc743419adddf523ea)
2013-06-13 15:55:48 +10:00
Martin Schwenke
896c5d5bf4 doc: Add release notes for 2.2
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit ac0892d3a57adb0587a37de0f94fa686bed8970f)
2013-05-30 12:30:32 +10:00