mirror of https://github.com/samba-team/samba.git synced 2024-12-25 23:21:54 +03:00
Commit Graph

2876 Commits

Author SHA1 Message Date
Martin Schwenke
5e92afeb33 Test suite: allow setting of timeout triggers for all events, not just monitor.
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit f319bd54369a2bc7d32c3bda7fc22f2ef1a51c3a)
2009-12-18 14:42:58 +11:00
Ronnie Sahlberg
01a0824288 Version 1.0.110
(This used to be ctdb commit 859e18db681dabe0990793d03f58e59a061aa8bb)
2009-12-18 12:32:58 +11:00
Rusty Russell
4dce0690de eventscript: fix cleanup path when setting up script list
We shouldn't set ctdb->current_monitor until we set destructor: that's
what cleans it up.

Also, free state->scripts on no-scripts exit path: it's not a child of
state because we need it in the destructor.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

(This used to be ctdb commit 843a2ed5ef85f628788b0caf7417c6b61b5c6d3f)
2009-12-18 12:31:34 +11:00
Stefan Metzmacher
77c4a86351 server: add set_close_on_exec() on more fds
metze

(This used to be ctdb commit 7101ae80bf4e530f48e31e4c58707aa45a9fd3d5)
2009-12-17 14:41:07 +01:00
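The close-on-exec change above (77c4a86351) concerns descriptors leaking into forked children such as event scripts. As a rough illustration only, and not the C helper in ctdb, the same idea can be sketched in Python on a POSIX system. The function name mirrors the one in the commit subject; `demo()` is a hypothetical helper added for this sketch.

```python
import fcntl
import os


def set_close_on_exec(fd: int) -> None:
    """Mark fd so the kernel closes it automatically across exec()."""
    flags = fcntl.fcntl(fd, fcntl.F_GETFD)
    fcntl.fcntl(fd, fcntl.F_SETFD, flags | fcntl.FD_CLOEXEC)


def demo() -> bool:
    """Hypothetical helper: return True if the flag ends up set on a pipe."""
    r, w = os.pipe()
    try:
        # Python creates descriptors non-inheritable by default, so clear
        # the flag first to mimic what a plain C pipe() would hand back.
        os.set_inheritable(r, True)
        set_close_on_exec(r)
        return not os.get_inheritable(r)
    finally:
        os.close(r)
        os.close(w)
```

Reading the flags before setting them preserves any other descriptor flags that may already be present.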
Stefan Metzmacher
bbfa4402e4 server: fix fd leaks in the new logging code
metze

(This used to be ctdb commit 140070dd81b39545fe2d56f70e9b9c96bfdae07f)
2009-12-17 13:05:39 +01:00
Ronnie Sahlberg
9b507abd6e version 1.0.109
(This used to be ctdb commit 99894a70fe2ebfe43daae7e88ff0fc9cab33e0fb)
2009-12-17 15:49:01 +11:00
Rusty Russell
8aec7e5656 eventscript: remove cb_status, fix uninitialized bug when monitoring aborted
Previously we updated cb_status as each script finished.  Since we're storing
the status anyway, we can calculate it by iterating the scripts array
itself, providing clear and uniform behavior on all code paths.

In particular, this fixes a longstanding bug when we abort monitor
scripts to run some other script: the cb_status was uninitialized.  In
this case, we need to hand *something* to the callback; 0 might make
us go healthy when we shouldn't.  So we use the last status (normally,
this will be the just-saved current status).

In addition, we make the case of failing the first fork for the script
and failing other script forks the same: the error is returned via the
callback and saved for viewing through 'ctdb scriptstatus'.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

(This used to be ctdb commit 5d50f0e16948d18009f6623f132113f7273efc7f)
2009-12-17 15:39:46 +11:00
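The cb_status change above (8aec7e5656) replaces an incrementally updated status with one derived from the stored per-script results. A minimal Python sketch of that idea follows. It is not the ctdb C code: the `ScriptResult` type, the "first non-zero status wins" rule, and the script names used with it are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ScriptResult:
    name: str    # hypothetical script name
    status: int  # exit status saved when the script finished


def overall_status(scripts: List[ScriptResult], aborted: bool, last_status: int) -> int:
    """Derive the status handed to the callback from the saved results.

    Assumed rule for this sketch: an aborted run reports the previous
    status (never a made-up 0), otherwise the first non-zero status wins.
    """
    if aborted:
        return last_status
    for script in scripts:
        if script.status != 0:
            return script.status
    return 0
```

Because the result is computed from the array on every code path, there is no separate variable that can be left uninitialized when a monitor run is cut short.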
Ronnie Sahlberg
4c722fe34c fix a conflict in the merge from rusty
Merge commit 'rusty/ctdb-no-setsched'

Conflicts:

	server/ctdb_vacuum.c

(This used to be ctdb commit b4365045797f520a7914afdb69ebd1a8dacfa0d9)
2009-12-17 08:18:04 +11:00
Rusty Russell
af2613e16f ctdb: use mlockall, cautiously
We don't want ctdb stalling due to paging; this can be far worse than
scheduling delays.  But if we simply do mlockall(MCL_FUTURE), it
increases the risk that mmap (ie. tdb open) or malloc will fail,
causing us to abort.

This patch is a compromise: we mlock all current pages (including
10k of future stack for expansion) and then relock when a client
asks us to open a TDB.  We warn, but don't exit, if it fails.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>


(This used to be ctdb commit 82f778e85440bc713d3f87c08ddc955d3cfce926)
2009-12-16 20:57:20 +10:30
Rusty Russell
c488ba440a Remove RT priority, use niceness.
1) It's buggy.  Code needs to be carefully written (ie. no busy
   loops) to handle running with it, and we fork and run scripts.[1]

2) It makes debugging harder.  If ctdbd loops (as has happened recently)
   it can be extremely hard to get in and see what's happening.  We've already
   seen the valgrind hacks.

3) We have seen recent scheduler problems.  Perhaps they are unrelated,
   but removing this very unusual setup is unlikely to hurt.

4) It doesn't make anything faster.  Under all but the most perverse of
   circumstances, 99% of the cpu gives the same performance as 100%, and
   we will always preempt normal processes anyway.

[1] I made this worse in 0fafdcb8d353 "eventscript: fork() a child for
    each script" by removing the switch_from_server_to_client() which
    restored it, but even that was only for monitor scripts.  Others were
    run with RT priority.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>


(This used to be ctdb commit 482c302d46e2162d0cf552f8456bc49573ae729d)
2009-12-16 19:26:22 +10:30
Rusty Russell
f148735928 Add --valgrinding flag instead of --nosetsched
The do_setsched was being tested for whether to mmap tdbs: let's make it
explicit.  We can also happily move the kill-child eventscript hack under
this flag.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> 


(This used to be ctdb commit 2ee86cc1f311d7b7504c7b14d142b9c4f6f4b469)
2009-12-16 20:59:15 +10:30
Stefan Metzmacher
4626665b7e doc: regenerate manpages
metze

(This used to be ctdb commit e3825407a509110c786d618efcdfa56ba93380a5)
2009-12-16 08:08:34 +01:00
Stefan Metzmacher
fffb9d55b1 doc: fix docbook warnings for ctdb.1 and onnode.1 manpages
metze

(This used to be ctdb commit 0d1300aa2325c94d8fb1c3cf8d454e5eee43dde9)
2009-12-16 08:08:34 +01:00
Stefan Metzmacher
b22d893e91 doc/ctdb.1: update example "ctdb listvars" output
metze

(This used to be ctdb commit 33ec6943fb2d01b6df0ce4515d37c671b18d237f)
2009-12-16 08:08:34 +01:00
Stefan Metzmacher
685806d47f doc/ctdb.1: make clear the database is specified by name for "ctdb backupdb"
metze

(This used to be ctdb commit 9b4c76973a8cf03ddc1a9b3777a350f739c00892)
2009-12-16 08:08:34 +01:00
Stefan Metzmacher
3bfc58d47a doc/ctdb.1: document "ctdb getdbstatus <dbname>"
metze

(This used to be ctdb commit a90f3dd25a22f9b8777ff6946ce1721859e9479a)
2009-12-16 08:08:34 +01:00
Stefan Metzmacher
58dd03f43d doc/ctdb.1: add "See also" for ctdb getdbmap
metze

(This used to be ctdb commit bf48ae41ef5fb8e4675be448d13db522465d8d72)
2009-12-16 08:08:34 +01:00
Stefan Metzmacher
7809aa9a5d doc/ctdb.1: document "ctdb dumpdbbackup <file>"
metze

(This used to be ctdb commit 8e6b8be51fd1bda789675650a94df0115ee9e238)
2009-12-16 08:08:34 +01:00
Stefan Metzmacher
fa6d8641d9 doc/ctdb.1: document -Y output for ctdb getdbmap
metze

(This used to be ctdb commit c09acd0896089a612ee3a1e78711abd98bd9cc99)
2009-12-16 08:08:34 +01:00
Stefan Metzmacher
0056cec9b4 doc/ctdb.1: document UNHEALTHY for "ctdb getdbmap"
metze

(This used to be ctdb commit 3cdb8be02acc23074c8137a54faea62fee4567a0)
2009-12-16 08:08:33 +01:00
Stefan Metzmacher
1634b44c74 doc/ctdb.1: document "ctdb wipedb"
metze

(This used to be ctdb commit fce390194dadb4961b46c706a1826442eef8c63d)
2009-12-16 08:08:33 +01:00
Stefan Metzmacher
96977cc5c4 config: add CTDB_MAX_PERSISTENT_CHECK_ERRORS option
metze

(This used to be ctdb commit fc5f556d488488040303438aefecb5ae2a8e54bc)
2009-12-16 08:08:33 +01:00
Stefan Metzmacher
0c735f03d4 config: try to use tdbtool <tdb> check instead of tdbdump for persistent db checks
metze

(This used to be ctdb commit 52e6d81f4d8a4035272d9256d01bafb8ed593027)
2009-12-16 08:08:33 +01:00
Stefan Metzmacher
a03cf0040b ctdb: print out some hints how to debug a "ctdb catdb" failure
metze

(This used to be ctdb commit 504cf78d00d1120b556124340b9312f890b8b8b9)
2009-12-16 08:08:33 +01:00
Stefan Metzmacher
965c000c6e ctdb: add machine-readable output for "ctdb -Y getdbmap"
metze

(This used to be ctdb commit 45cfcd44093c7d2681e2ffd5cfb402823e8809f4)
2009-12-16 08:08:33 +01:00
Stefan Metzmacher
aa07a46bf5 ctdb: disallow "ctdb backupdb" on unhealthy databases
metze

(This used to be ctdb commit ecf799093c1989f5499c9d61ce8cc8a98d759160)
2009-12-16 08:08:33 +01:00
Stefan Metzmacher
c4bc231267 client: add "ctdb dumpdbbackup <filename>"
metze

(This used to be ctdb commit c63a0368d9d4b526ac1e49d891d3a1b7b8d20320)
2009-12-16 08:08:33 +01:00
Stefan Metzmacher
aa658b6777 client: make ctdb_dumpdb_record() public
metze

(This used to be ctdb commit 1cdc8dbb9cb971cf6dd6cd22b1adaf70ddc77e65)
2009-12-16 08:08:32 +01:00
Stefan Metzmacher
fb50e08942 tools/ctdb: let "ctdb restoredb" and "ctdb wipedb" mark the db as healthy on all
nodes

metze

(This used to be ctdb commit d1b10b0c0c323c39742a18e98a1dab7e82ddc7be)
2009-12-16 08:08:32 +01:00
Stefan Metzmacher
c56ce3d2f2 tools/ctdb: add "ctdb getdbstatus <dbname>"
metze

(This used to be ctdb commit 910c19f12448d293a755d1eb46d20f9591f8da7a)
2009-12-16 08:08:32 +01:00
Stefan Metzmacher
927dd3d9e5 tools/ctdb: display db health in "ctdb getdbmap"
metze

(This used to be ctdb commit c34535ff4dc6a44909283641596e0ed7c2316fbd)
2009-12-16 08:08:32 +01:00
Stefan Metzmacher
0e436b46c6 client: add ctdb_ctrl_getdbhealth()
metze

(This used to be ctdb commit 5abe44d0113839d3a45c9a31d30856aa70c2ea1f)
2009-12-16 08:08:32 +01:00
Stefan Metzmacher
f1f0af2b67 server: add CTDB_CONTROL_DB_SET_HEALTHY and CTDB_CONTROL_DB_GET_HEALTH
metze

(This used to be ctdb commit 7332d900538f0cbcd953a723417a0fe31dc9807c)
2009-12-16 08:08:29 +01:00
Stefan Metzmacher
94bc40307a server: Use tdb_check to verify persistent tdbs on startup
Depending on --max-persistent-check-errors we allow ctdb
to start with unhealthy persistent databases.

The default is 0 which means to reject a startup with
unhealthy dbs.

The health of the persistent databases is checked after each
recovery. Node monitoring and the "startup" is deferred
until all persistent databases are healthy.

Databases can become healthy automatically by a completely
HEALTHY node joining the cluster. Or by an administrator
with "ctdb backupdb/restoredb" or "ctdb wipedb".

metze

(This used to be ctdb commit 15f133d5150ed1badb4fef7d644f10cd08a25cb5)
2009-12-16 08:06:10 +01:00
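The startup check described above (94bc40307a) can be pictured as a simple threshold test. The Python sketch below is an illustration under stated assumptions, not ctdb's implementation: `may_start` is a hypothetical name, the database names are placeholders, and the comparison rule (start only if the number of unhealthy databases does not exceed the limit) is inferred from the commit message.

```python
from typing import Dict


def may_start(db_healthy: Dict[str, bool], max_check_errors: int = 0) -> bool:
    """Return True if startup is allowed given per-database health.

    db_healthy maps a persistent database name to the result of its
    integrity check. With the default limit of 0, a single unhealthy
    database is enough to reject startup.
    """
    unhealthy = sum(1 for healthy in db_healthy.values() if not healthy)
    return unhealthy <= max_check_errors
```

For example, `may_start({"db1.tdb": True, "db2.tdb": False})` is rejected with the default limit, while passing `max_check_errors=1` would let the node come up with that one unhealthy database.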
Stefan Metzmacher
9069d3a7fb server: move error handling to a 'fail' label in ctdb_control_transaction_commit()
metze

(This used to be ctdb commit d874463235fa299e83fe562291c688aca3b85cf3)
2009-12-16 08:03:56 +01:00
Stefan Metzmacher
8fbb5b7915 server/recovery: update flags on nodes before syncing dbs
metze

(This used to be ctdb commit 49d2dca9ad837e1b397294fb0e966bf0b77f751c)
2009-12-16 08:03:56 +01:00
Stefan Metzmacher
b74918b465 server: open /var/ctdb/state/persistent_health.tdb.X on startup
This node internal tdb will store the HEALTH state of persistent
tdbs.

metze

(This used to be ctdb commit cbda4666be88c11a810a192a70667b57f773ace1)
2009-12-16 08:03:56 +01:00
Stefan Metzmacher
7f05a423e2 server: create vactune.tdb.X with 0600 permissions
metze

(This used to be ctdb commit 21677ed6fb8c589f348321533c608cad58c4ec93)
2009-12-16 08:03:56 +01:00
Stefan Metzmacher
473f02ed48 server: create vactune.tdb.X under /var/ctdb/state
metze

(This used to be ctdb commit 1db17f312558fe59983a3465680e56c9f0c19e36)
2009-12-16 08:03:56 +01:00
Stefan Metzmacher
77d43d01aa server: create recdb.tdb.X in /var/ctdb/state/
metze

(This used to be ctdb commit 92e05282d6c4f16e55d914cc3bde3738ea2d44ad)
2009-12-16 08:03:56 +01:00
Stefan Metzmacher
9a96ae0c97 server: only do the mkdir() calls for db_directory* once at the start
metze

(This used to be ctdb commit f30f33685db50860b6cd6fd1b6bdc3066620a78f)
2009-12-16 08:03:56 +01:00
Stefan Metzmacher
b48228e7f9 server: add db_directory_state to ctdb_context
metze

(This used to be ctdb commit 656a6ec5ed81ccfbb86144156a3158e48f105ee4)
2009-12-16 08:03:55 +01:00
Stefan Metzmacher
cda5884854 server: create tdbs with 0600 permissions in ctdb_local_attach()
metze

(This used to be ctdb commit 6529a1328b9ec304ad306674651b2a67e4426e23)
2009-12-16 08:03:55 +01:00
Stefan Metzmacher
0c907f4965 config: load 'ctdb' config before 'nfs' config in statd-callout
All other scripts do 'loadconfig ctdb' before any other 'loadconfig foo'
call. I think we should do the same in statd-callout.

Otherwise it's very confusing, if you have configured some Options
in /etc/sysconfig/ctdb, but /etc/ctdb/statd-callout doesn't notice
them.

metze

(This used to be ctdb commit 10d95581fb90bfdf58ec32345c4e36c27acf4f37)
2009-12-16 08:03:55 +01:00
Stefan Metzmacher
003985acfd ctdb: pass TDB_DISALLOW_NESTING to all tdb_open/tdb_wrap_open calls
metze

Signed-off-by: Stefan Metzmacher <metze@samba.org>

(This used to be ctdb commit 1635e931b909c66eb3b1f5357e3a549b1a0da70d)
2009-12-16 08:03:55 +01:00
Simo Sorce
2d24073e97 Fix release script with newer versions of git
(cherry picked from commit 4334092cba)

Signed-off-by: Stefan Metzmacher <metze@samba.org>

(This used to be ctdb commit 093f57a2c00f2d629a3b58e58202f1a7e1bbd406)
2009-12-16 08:03:54 +01:00
Matthias Dieter Wallnöfer
6e3a572135 tdb tools: Mostly cosmetic adaptions
Signed-off-by: Stefan Metzmacher <metze@samba.org>
(cherry picked from samba commit 9776cb0345)

Signed-off-by: Stefan Metzmacher <metze@samba.org>

(This used to be ctdb commit d1873bd81bfc9f486b88f3a38c65c7de8f5a0909)
2009-12-16 08:03:54 +01:00
Stefan Metzmacher
859ffb09b6 tdb: change version to 1.2.0 after adding TDB_*ALLOW_NESTING
metze
(cherry picked from samba commit 5ca0a4bfd6)

Signed-off-by: Stefan Metzmacher <metze@samba.org>

(This used to be ctdb commit 04aeac728f56c65b973762f09977de1b1b99099e)
2009-12-16 08:03:54 +01:00
Stefan Metzmacher
5cbf0183f3 tdb: add TDB_DISALLOW_NESTING and make TDB_ALLOW_NESTING the default behavior
We need to keep TDB_ALLOW_NESTING as default behavior,
so that existing code continues to work.

However we may change the default together with a major version
number change in future.

metze
(cherry picked from samba commit 3b9f19ed91)

Signed-off-by: Stefan Metzmacher <metze@samba.org>

(This used to be ctdb commit c1c0ede32dc00ed619d1cf5fda40a9de43995f3a)
2009-12-16 08:03:54 +01:00
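The nesting flags introduced above (5cbf0183f3) keep nested transactions allowed by default and let callers opt out explicitly. The Python sketch below models only that decision. The `Txn` class is hypothetical, the numeric flag values are assumed for illustration, and rejecting a handle that sets both flags is likewise an assumption of this sketch.

```python
TDB_ALLOW_NESTING = 512      # value assumed for illustration
TDB_DISALLOW_NESTING = 1024  # value assumed for illustration


class Txn:
    """Toy transaction counter that honours the nesting flags."""

    def __init__(self, flags: int = 0) -> None:
        if flags & TDB_ALLOW_NESTING and flags & TDB_DISALLOW_NESTING:
            raise ValueError("conflicting nesting flags")
        # Nesting stays allowed unless the caller explicitly opts out,
        # so existing callers that pass no flag keep working.
        self.allow_nesting = not (flags & TDB_DISALLOW_NESTING)
        self.depth = 0

    def start(self) -> bool:
        """Begin a (possibly nested) transaction; False if rejected."""
        if self.depth > 0 and not self.allow_nesting:
            return False
        self.depth += 1
        return True

    def commit(self) -> None:
        if self.depth == 0:
            raise RuntimeError("no transaction in progress")
        self.depth -= 1
```

A caller that passes `TDB_DISALLOW_NESTING`, as the later ctdb commit 003985acfd does for its tdb_open calls, gets a failure on the second `start()` instead of a silently nested transaction.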
Stefan Metzmacher
b768146ef5 tdb: always set tdb->tracefd to -1 to be safe on goto fail
metze
(cherry picked from samba commit 85449b7bcc)

Signed-off-by: Stefan Metzmacher <metze@samba.org>

(This used to be ctdb commit 855391c1e37012b0d6c673a304bb8da8a1efcd72)
2009-12-16 08:03:53 +01:00
Volker Lendecke
e0f59b0a19 tdb: Fix a C++ warning (cherry picked from samba commit be88a126ea)
Signed-off-by: Stefan Metzmacher <metze@samba.org>

(This used to be ctdb commit 6126f04bd4982b66564dcccd92a15baf9cb856f3)
2009-12-16 08:03:53 +01:00
Kirill Smelkov
a96af9815b tdb: update README a bit
While studying tdb, I've noticed a couple of mismatches between readme
and actual code:

- tdb_open_ex changed its log_fn argument to log_ctx
- there is now no tdb_update(), which it seems was transformed into
  non-exported tdb_update_hash()

There were other mismatches, but I don't remember them now, sorry.

Signed-off-by: Kirill Smelkov <kirr@mns.spb.ru>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
(cherry picked from samba commit 83de5c8263)

Signed-off-by: Stefan Metzmacher <metze@samba.org>

(This used to be ctdb commit 7a88f1df9190674deaf5dcbedad02ae4120a5263)
2009-12-16 08:03:53 +01:00
Kirill Smelkov
ad6e28c032 tdb: add tests for double .close() in pytdb
The reason I do it is that when using older python-tdb as shipped in
Debian Lenny, python interpreter crashes on this test:

    (gdb) bt
    #0  0xb7f8c424 in __kernel_vsyscall ()
    #1  0xb7df5640 in raise () from /lib/i686/cmov/libc.so.6
    #2  0xb7df7018 in abort () from /lib/i686/cmov/libc.so.6
    #3  0xb7e3234d in __libc_message () from /lib/i686/cmov/libc.so.6
    #4  0xb7e38624 in malloc_printerr () from /lib/i686/cmov/libc.so.6
    #5  0xb7e3a826 in free () from /lib/i686/cmov/libc.so.6
    #6  0xb7b39c84 in tdb_close () from /usr/lib/libtdb.so.1
    #7  0xb7b43e14 in ?? () from /var/lib/python-support/python2.5/_tdb.so
    #8  0x0a038d08 in ?? ()
    #9  0x00000000 in ?? ()

master's pytdb does not (we have a check for self->closed in obj_close()),
but still...

Signed-off-by: Kirill Smelkov <kirr@mns.spb.ru>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
(cherry picked from samba commit 71a21393dd)

Signed-off-by: Stefan Metzmacher <metze@samba.org>

(This used to be ctdb commit 03372b4ea8ba2938468a5c0fc234d604966ce070)
2009-12-16 08:03:53 +01:00
Kirill Smelkov
5ea8bd3851 tdb: reset tdb->fd to -1 in tdb_close()
So that erroneous double tdb_close() calls do not try to close() same
fd again. This is like SAFE_FREE() but for fd.

Signed-off-by: Kirill Smelkov <kirr@mns.spb.ru>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
(cherry picked from samba commit b4424f8234)

Signed-off-by: Stefan Metzmacher <metze@samba.org>

(This used to be ctdb commit f5c992bdaeb73ef726ff4728a9922721474cd6f5)
2009-12-16 08:03:53 +01:00
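The fd reset above (5ea8bd3851) guards against a second close hitting a descriptor number that may since have been reused by something else. A minimal Python sketch of the same pattern follows. The `Handle` class is hypothetical and stands in for a database handle; it is not pytdb or tdb code.

```python
import os


class Handle:
    """Toy stand-in for a handle that owns one file descriptor."""

    def __init__(self, path: str) -> None:
        self.fd = os.open(path, os.O_RDONLY)

    def close(self) -> bool:
        """Close the descriptor once; return False if already closed."""
        if self.fd == -1:
            return False
        os.close(self.fd)
        self.fd = -1  # poison the field so a repeated close() is a no-op
        return True
```

Calling `close()` twice on `Handle(os.devnull)` returns `True` and then `False`, and the second call never reaches `os.close()`.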
Kirill Smelkov
8af1b8bf96 tdb: fix typo in python's Tdb.get() docstring
It's Tdb.get(), not Tdb.fetch().

Signed-off-by: Kirill Smelkov <kirr@mns.spb.ru>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
(cherry picked from samba commit cfed5f946d)

Signed-off-by: Stefan Metzmacher <metze@samba.org>

(This used to be ctdb commit 76aacdd8e1106f26565e25903091a757b59cd7e2)
2009-12-16 08:03:53 +01:00
Andrew Tridgell
a51ad4a3be tdb: detect tdb store of identical records and skip
This can help with ldb where we rewrite the index records
(cherry picked from samba commit d4c0e8fdf0)

Signed-off-by: Stefan Metzmacher <metze@samba.org>

(This used to be ctdb commit 470750fa2e3cf987f10de48451b1ee13aab03907)
2009-12-16 08:03:52 +01:00
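The identical-record check above (a51ad4a3be) avoids rewriting a record whose value has not changed. As an illustration only, the Python sketch below applies the idea to an in-memory dictionary. The `Store` class and its `writes` counter are hypothetical and say nothing about how tdb compares records on disk.

```python
from typing import Dict


class Store:
    """Toy key/value store that skips writes of unchanged values."""

    def __init__(self) -> None:
        self.data: Dict[bytes, bytes] = {}
        self.writes = 0  # number of stores that actually modified data

    def store(self, key: bytes, value: bytes) -> bool:
        """Return True if a write happened, False if it was skipped."""
        if self.data.get(key) == value:
            return False
        self.data[key] = value
        self.writes += 1
        return True
```

A workload that rewrites the same index record repeatedly, as the commit message describes for ldb, would then pay for only the first write.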
Stefan Metzmacher
2b10c14e35 tdb: rename 'struct list_struct' into 'struct tdb_record'
metze
(cherry picked from samba commit 3b62e250c0)

Signed-off-by: Stefan Metzmacher <metze@samba.org>

(This used to be ctdb commit 03b3682e3fa53c9f5fdf2c4beac8b5d030fd2630)
2009-12-16 08:03:52 +01:00
Rusty Russell
f836eeb79b lib/tdb: make tdbtool use tdb_check() for "check" command
Also, set logging function so we get more informative messages.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

(cherry picked from samba commit 0944931159)
Signed-off-by: Stefan Metzmacher <metze@samba.org>

(This used to be ctdb commit 6ac7ef8bf4d384f880c7f483ace70f8e08c15a8b)
2009-12-16 08:03:52 +01:00
Rusty Russell
b0a4a82370 lib/tdb: add tdb_check()
ctdb wants a quick way to detect corrupt tdbs; particularly, tdbs with
loops in their hash chains.  tdb_check() provides this.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

(cherry picked from samba commit 022b4d4aa6)

Signed-off-by: Stefan Metzmacher <metze@samba.org>

(This used to be ctdb commit df1a3ce0380fa9d8722b2f9b16f65557095e4c83)
2009-12-16 08:03:52 +01:00
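The commit above (b0a4a82370) names loops in hash chains as the corruption tdb_check() is meant to catch. One generic way to detect a loop in a singly linked chain is Floyd's tortoise-and-hare, sketched below in Python. This is a swapped-in textbook technique, not the algorithm tdb_check() uses; the dictionary of "next" offsets and the `None` terminator are assumptions of the sketch.

```python
from typing import Dict, Optional


def chain_has_loop(next_off: Dict[int, Optional[int]], head: Optional[int]) -> bool:
    """Detect a cycle in a chain of record offsets.

    next_off maps a record offset to the offset of the next record in
    the chain, or None at the end of the chain.
    """
    slow = fast = head
    while fast is not None and next_off.get(fast) is not None:
        slow = next_off.get(slow)                # advance one step
        fast = next_off.get(next_off.get(fast))  # advance two steps
        if slow is not None and slow == fast:
            return True
    return False
```

The check uses constant extra memory, which matters when a corrupt chain could otherwise make a traversal run forever.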
Kirill Smelkov
b4bb5257f6 tdb: kill last bits from swig
We no longer use swig for pytdb, so there is no need for swig make
rules. Also pytdb.c header should be updated.

Signed-off-by: Kirill Smelkov <kirr@mns.spb.ru>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
(cherry picked from samba commit ecbe5ebd8d)

Signed-off-by: Stefan Metzmacher <metze@samba.org>

(This used to be ctdb commit 27611d6a0c313732e438cb24c82b9de126e50156)
2009-12-16 08:03:52 +01:00
Stefan Metzmacher
42d131d951 lib/tdb: sync build files from samba master
metze

Signed-off-by: Stefan Metzmacher <metze@samba.org>

(This used to be ctdb commit 1d5c5a221c28f1dc652a80ed516a0f18ba588d9f)
2009-12-16 08:03:52 +01:00
Stefan Metzmacher
4576b9aae8 s3 build: Remove unused fstat check to fix a bunch of HAVE_FSTAT warnings (cherry picked from samba commit 2c2545d45a)
Signed-off-by: Stefan Metzmacher <metze@samba.org>

(This used to be ctdb commit c40d14b1b7ba7c9ae40c0306a2e552504e0f92a6)
2009-12-16 08:03:51 +01:00
Stefan Metzmacher
a60db0255b lib/tdb: include replace.h and system/filesys.h in pytdb.c
This fixes the build on Tru64.

metze
(cherry picked from samba commit 3718cf294a)

Signed-off-by: Stefan Metzmacher <metze@samba.org>

(This used to be ctdb commit 5652e403be099f35cdd29fda8ba4fe2c35de8035)
2009-12-16 08:03:51 +01:00
Stefan Metzmacher
c0131bbcc7 Avoid using a utility header for Python replacements included in Samba, since this will not be shipped with talloc/tdb/tevent/etc. (cherry picked from samba commit ba5d6e6d70)
Signed-off-by: Stefan Metzmacher <metze@samba.org>

(This used to be ctdb commit da47169c4d3bc1b446b49610d892df05638e912c)
2009-12-16 08:03:51 +01:00
Stefan Metzmacher
c5827f2c2d s3/s4 build: Fix Py_RETURN_NONE to work with python versions < 2.4 (cherry picked from samba commit 61a23c5eea)
Signed-off-by: Stefan Metzmacher <metze@samba.org>

(This used to be ctdb commit 4130c5dd10869b071124e2bf04d6807bbb11ab1f)
2009-12-16 08:03:51 +01:00
Stefan Metzmacher
31bc9f3ee1 py: Properly increase the reference counter of Py_None. (cherry picked from samba commit d2c70d24e1)
Signed-off-by: Stefan Metzmacher <metze@samba.org>

(This used to be ctdb commit e7242221c3318a5c312e17ff4074bef80b639ca8)
2009-12-16 08:03:51 +01:00
Jelmer Vernooij
fc9d260310 Make sure to not close tdb database more than once. (cherry picked from samba commit 6fe6983e4c)
Signed-off-by: Stefan Metzmacher <metze@samba.org>

(This used to be ctdb commit 52f78e040749d24058ee1f575d949b57d15f5987)
2009-12-16 08:03:51 +01:00
Jelmer Vernooij
c9907f2634 Implement missing functions in pytdb. (cherry picked from samba commit 2da551bbcc)
Signed-off-by: Stefan Metzmacher <metze@samba.org>

(This used to be ctdb commit 3a671b11770057c91e0ae646499d4714f52bc5c0)
2009-12-16 08:03:51 +01:00
Stefan Metzmacher
50ceb83180 Add simple manually written replacement for the tdb module. (cherry picked from samba commit 2a61fd41e9)
Signed-off-by: Stefan Metzmacher <metze@samba.org>

(This used to be ctdb commit 2231ee0aa163d68383dd9636f25f033fe7c1f3e7)
2009-12-16 08:03:50 +01:00
Jelmer Vernooij
4c69b5e0fd tdb: Add simple reimplementation of tdbdump in Python as an example of the tdb Python bindings. (This used to be commit 47d797f788) (cherry picked from samba commit 6bdd1425b7)
Signed-off-by: Stefan Metzmacher <metze@samba.org>

(This used to be ctdb commit 363c34d4bb488609317794cd3153d85c12643110)
2009-12-16 08:03:50 +01:00
Jeremy Allison
e58068d666 Remove unnecessary msync. Jeremy. (cherry picked from samba commit 0bae1ef3de) (This used to be commit db2acaf46f) (cherry picked from samba commit a1cf3ad5d6)
Signed-off-by: Stefan Metzmacher <metze@samba.org>

(This used to be ctdb commit 236dc2fa29b3c0caec51859dbd469f0a13f5917e)
2009-12-16 08:03:50 +01:00
Stefan Metzmacher
61c7444943 The msync manpage reports that msync *must* be called before munmap. Failure to do so may result in lost data. Fix an ifdef check, I really think we meant to check HAVE_MMAP here. (cherry picked from samba commit 74c8575b3f) (This used to be commit 8fd54bb55f) (cherry picked from samba commit b39e332bd7)
Signed-off-by: Stefan Metzmacher <metze@samba.org>

(This used to be ctdb commit 5aa0ab328c36ecd4d7ec03f921e6027340c2ef13)
2009-12-16 08:03:50 +01:00
Volker Lendecke
a3b4f9b59e Attempt to fix bug 5684
With the ctdb checkin dde9f3f006 tdb optimized out write lock checks for
write-enabled transaction. Sadly, this also removed the possibility to ever
remove dead records left over from tdb_delete calls within a transaction.

Tridge, please check this! Did dde9f3f006 have any reason beyond performance
optimizations?

Thanks,

Volker
(cherry picked from samba commit 3f884c4ae3)
(This used to be commit 1d85e0647e)
(cherry picked from samba commit 8c88209c6f)

Signed-off-by: Stefan Metzmacher <metze@samba.org>

(This used to be ctdb commit b02bf7659f04f1fa203834bd75a2392b48e56c16)
2009-12-16 08:03:50 +01:00
Slava Semushin
06ab3cfe60 lib/tdb/tools/tdbtorture.c: fixed memory leak.
Found by cppcheck:
[lib/tdb/tools/tdbtorture.c:326]: (error) Memory leak: pids
(cherry picked from samba commit 497b9e460b)

Signed-off-by: Stefan Metzmacher <metze@samba.org>

(This used to be ctdb commit 5d4cc4b018a538dc3f1d79fe091f3e6e67003daf)
2009-12-16 08:03:50 +01:00
Andrew Tridgell
3e04b100c1 added basic testing of tdb_transaction_prepare_commit() in tdbtorture (cherry picked from samba commit 84547b8dba)
Signed-off-by: Stefan Metzmacher <metze@samba.org>

(This used to be ctdb commit 923b61fe722c0aec8a5b6ac8dd1df74957dc102b)
2009-12-16 08:03:50 +01:00
Andrew Tridgell
5b6b852691 make tdbbackup use transactions
tdbbackup was originally written before we had transactions, and it
attempted to use its own fsync() calls to make it safe. Now that we
have transactions we can do it in a much safer (and faster!) fashion
(cherry picked from samba commit 2e4247782b)

Signed-off-by: Stefan Metzmacher <metze@samba.org>

(This used to be ctdb commit cd23d36ada9631095ca68663516de0c8d8c3bbed)
2009-12-16 08:03:49 +01:00
Andrew Tridgell
6eaaa52a1d fixed tdbbackup to give tdb error messages (cherry picked from samba commit 08be1420ba)
Signed-off-by: Stefan Metzmacher <metze@samba.org>

(This used to be ctdb commit 3d44412593b8748a5158e15b83cd9eb548231194)
2009-12-16 08:03:49 +01:00
Rusty Russell
b52a06ffc6 lib/tdb: add -t (always use transactions) option to tdbtorture
This means you can kill it at any time and expect no corruption.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

(cherry picked from samba commit 0fc6800005)

Signed-off-by: Stefan Metzmacher <metze@samba.org>

(This used to be ctdb commit f7278a277ed91587cae5b5e3660dad7124bdb73f)
2009-12-16 08:03:49 +01:00
Rusty Russell
8dcc760f1e lib/tdb: wean off TDB_ERRCODE.
It was a regrettable hack which I used to reduce line count in tdb; in fact it caused confusion as can be seen in this patch.
In particular, ecode now needs to be set before TDB_LOG anyway, and having it exposed in
the header is useless (the struct tdb_context isn't defined, so it's doubly useless).
Also, we should never set errno, as io.c was doing.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

(cherry picked from samba commit b77f41d58b)

Signed-off-by: Stefan Metzmacher <metze@samba.org>

(This used to be ctdb commit a6620f6e74aadc708395b21b42303d1082192fcc)
2009-12-16 08:03:49 +01:00
Rusty Russell
7f857c4d14 lib/tdb: TDB_TRACE support (for developers)
When TDB_TRACE is defined (in tdb_private.h), verbose tracing of tdb operations is enabled.
This can be replayed using "replay_trace" from http://ccan.ozlabs.org/info/tdb.

The majority of this patch comes from moving internal functions to _<funcname> to
avoid double-tracing.  There should be no additional overhead for the normal (!TDB_TRACE)
case.

Note that the verbose traces compress really well with rzip.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
(cherry picked from samba commit 703004340c)

Signed-off-by: Stefan Metzmacher <metze@samba.org>

(This used to be ctdb commit b01b756cb577f32a1ec4597efb00017241e01685)
2009-12-16 08:03:49 +01:00
Andrew Tridgell
805ef91707 tdb: fixed the intermittent failure of tdbtorture in the build farm
There was a race condition that caused the torture.tdb to be left in a
state that needed recovery. The torture code thought that any message
from the tdb code was an error, so the "recovered" message, which is a
TDB_DEBUG_TRACE message, marked the run as being an error when it
isn't.
(cherry picked from samba commit 5dcf0069b6)

Signed-off-by: Stefan Metzmacher <metze@samba.org>

(This used to be ctdb commit 43c97b259b19c42b4edc7f83dbfc5e486568b4e3)
2009-12-16 08:03:49 +01:00
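The tdbtorture fix above (805ef91707) comes down to not counting trace-level log messages as failures. A small Python sketch of such a log callback follows. The `TortureRun` class is hypothetical, and the level ordering is an assumption modelled on the TDB_DEBUG_TRACE name in the commit message.

```python
from enum import IntEnum
from typing import List, Tuple


class DebugLevel(IntEnum):
    FATAL = 0
    ERROR = 1
    WARNING = 2
    TRACE = 3


class TortureRun:
    """Collects log messages and counts only real problems as errors."""

    def __init__(self) -> None:
        self.error_count = 0
        self.messages: List[Tuple[DebugLevel, str]] = []

    def log(self, level: DebugLevel, message: str) -> None:
        self.messages.append((level, message))
        # Informational trace output, such as a "recovered" notice,
        # must not mark the run as failed.
        if level != DebugLevel.TRACE:
            self.error_count += 1
```

With this rule a run that logs only a trace-level recovery notice still finishes with `error_count == 0`.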
Michael Adam
8fd54bbbe1 tdb:tdbtool: fix indentation.
Michael
(cherry picked from samba commit e440a2e11e)

Signed-off-by: Stefan Metzmacher <metze@samba.org>

(This used to be ctdb commit c1b8d32b4ef87b9d8f37b451f47fcee2ea753d21)
2009-12-16 08:03:48 +01:00
Stefan Metzmacher
42648556a6 Fix all warnings in source3 with gcc4.3. Jeremy. (cherry picked from samba commit 07e0094365)
Signed-off-by: Stefan Metzmacher <metze@samba.org>

(This used to be ctdb commit e4d49c182e12c2d429d0414209cc2c8ccc19dc91)
2009-12-16 08:03:48 +01:00
Tim Prouty
a04fecb1c2 s3/s4: Fix "shadows a global declaration" warning (cherry picked from samba commit e48a5cd5d4)
Signed-off-by: Stefan Metzmacher <metze@samba.org>

(This used to be ctdb commit 94f5728a77b8c772fb16c4744c24b45de8016e52)
2009-12-16 08:03:48 +01:00
Tim Prouty
c8366fcfb4 tdb: Fix some recently introduced warnings in tdbtool (cherry picked from samba commit c299833bf8)
Signed-off-by: Stefan Metzmacher <metze@samba.org>

(This used to be ctdb commit 5830a2427b84e1cce74390b58fe12c45b5a056a6)
2009-12-16 08:03:48 +01:00
Andrew Tridgell
e0bed62820 added some more speed tests to tdbtool
This adds 3 simple speed tests to tdbtool, for transaction store,
store and fetch.

On my laptop this shows transactions costing about 10ms
(cherry picked from samba commit e15027155d)

Signed-off-by: Stefan Metzmacher <metze@samba.org>

(This used to be ctdb commit 463279c972fa4538919bdd1dff48ca6b2fb8d49c)
2009-12-16 08:03:48 +01:00
Michael Adam
886cb3e86d tdb:tdbtool: add transaction_start/_commit/_cancel commands.
So one can perform tdbtool operations protected by transactions.

Michael
(cherry picked from samba commit 91e1bab2e9)

Signed-off-by: Stefan Metzmacher <metze@samba.org>

(This used to be ctdb commit 35a5b874b925380f7c227e47aebb590c9db4739e)
2009-12-16 08:03:48 +01:00
Michael Adam
168bb40b4b tdb:tdbtool: add the "speed" command to the help text.
Michael
(cherry picked from samba commit 817383d88d)

Signed-off-by: Stefan Metzmacher <metze@samba.org>

(This used to be ctdb commit dc287a7d7420cca0b104049e689a73202bc535f8)
2009-12-16 08:03:47 +01:00
Holger Hetterich
0a0281444d Added a simple tdb integrity check to tdbtool. The command "check" runs traverse on the currently open tdb, and returns the number of entries if the integrity check is successful. (cherry picked from samba commit 42366bcbbd)
Signed-off-by: Stefan Metzmacher <metze@samba.org>

(This used to be ctdb commit 02b35ba77672727c96ad004be37c7f6f1d3fe474)
2009-12-16 08:03:47 +01:00
Andrew Tridgell
09f7874151 tdb: allow reads after prepare commit
We previously only allowed a commit to happen after a prepare
commit. It is in fact safe to allow reads between a prepare and a
commit, and the s4 replication code can make use of that, so allow it.
(cherry picked from samba commit 46c99ec2a3)

Signed-off-by: Stefan Metzmacher <metze@samba.org>

(This used to be ctdb commit 5ef5ddb8369e5e76173285fe9a08498dc8dc73ab)
2009-12-16 08:03:46 +01:00
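The change above (09f7874151) relaxes what may happen between a prepare and a commit. The Python sketch below models the resulting state rules on an in-memory dictionary. The `Transaction` class is hypothetical, and the rule that stores are still refused after a prepare is an assumption of this sketch, not something the commit message states.

```python
from typing import Dict, Optional


class Transaction:
    """Toy two-phase transaction: reads always work, writes stop at prepare."""

    def __init__(self, committed: Optional[Dict[bytes, bytes]] = None) -> None:
        self.committed: Dict[bytes, bytes] = dict(committed or {})
        self.pending: Dict[bytes, bytes] = {}
        self.prepared = False

    def fetch(self, key: bytes) -> Optional[bytes]:
        # Reads are allowed in every state, including after prepare_commit().
        if key in self.pending:
            return self.pending[key]
        return self.committed.get(key)

    def store(self, key: bytes, value: bytes) -> None:
        if self.prepared:
            raise RuntimeError("write after prepare_commit is not allowed")
        self.pending[key] = value

    def prepare_commit(self) -> None:
        self.prepared = True

    def commit(self) -> None:
        if not self.prepared:
            self.prepare_commit()
        self.committed.update(self.pending)
        self.pending.clear()
        self.prepared = False
```

A caller such as replication code can thus prepare, read back what it is about to commit, and only then call `commit()`.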
Michael Adam
d05b49aaf2 tdb:mksigs: allow PRINTF_ATTRIBUTE(..) macros function types as function args
Michael
(cherry picked from samba commit 55dcf928eb)

Signed-off-by: Stefan Metzmacher <metze@samba.org>

(This used to be ctdb commit ef1dc585d869a9e48164cd65bafc92c1da245007)
2009-12-16 08:03:46 +01:00
Michael Adam
ab37ff7c04 tdb:mksigs: normalize bool -> _Bool
Michael
(cherry picked from samba commit cfa4e7ec75)

Signed-off-by: Stefan Metzmacher <metze@samba.org>

(This used to be ctdb commit 0ae735b7a2096a40e5e47086ec41d9d45ef6d36b)
2009-12-16 08:03:46 +01:00
Michael Adam
a0dba36390 tdb:mksigs: ignore symbols (like _DEPRECATED_) after closing function parentheses
Michael
(cherry picked from samba commit 25939a627f)

Signed-off-by: Stefan Metzmacher <metze@samba.org>

(This used to be ctdb commit 2e69647404c87c438ae7c180277ac3b532941efd)
2009-12-16 08:03:46 +01:00
Michael Adam
6d9ce0ef50 tdb:mksigs: correctly ignore multiline function typedefs
by first concatenating multiline parentheses and removing typedefs afterwards.

Michael
(cherry picked from samba commit 13bfcd5a93)

Signed-off-by: Stefan Metzmacher <metze@samba.org>

(This used to be ctdb commit 37225f1ed3f70d7259c2af2c51c671105c34476a)
2009-12-16 08:03:46 +01:00
Michael Adam
67da31e222 tdb:mksigs: ignore struct forward declarations.
Michael
(cherry picked from samba commit ecd12bfb38)

Signed-off-by: Stefan Metzmacher <metze@samba.org>

(This used to be ctdb commit 66fffa577e051212ac7541be906b6c80f4a7c0c9)
2009-12-16 08:03:45 +01:00
Michael Adam
67d7709140 tdb:mksyms: allow characters after closing functions parenthesis.
Michael
(cherry picked from samba commit 400f08450b)

Signed-off-by: Stefan Metzmacher <metze@samba.org>

(This used to be ctdb commit 310d673b7cb9000d76437d78e43bc2bf133e4e14)
2009-12-16 08:03:44 +01:00
Michael Adam
31b9126d29 tdb:mksyms: allow double pointer return value of functions.
Michael
(cherry picked from samba commit 907e05595f)

Signed-off-by: Stefan Metzmacher <metze@samba.org>

(This used to be ctdb commit f70e371d70e334a7254649b2bb09aa382e6f09bb)
2009-12-16 08:03:44 +01:00
Günther Deschner
9872a2faab tdb: fix c++ build warning.
Guenther
(cherry picked from samba commit 1c2f4919ab)

Signed-off-by: Stefan Metzmacher <metze@samba.org>

(This used to be ctdb commit 9d5015e6fc68d3eb9e7b7178dbaf8c129dc79471)
2009-12-16 08:03:44 +01:00
Michael Adam
2b29e30df5 One would expect I could spell my name...
(cherry picked from samba commit 0d120be36b)

Signed-off-by: Stefan Metzmacher <metze@samba.org>

(This used to be ctdb commit efa4a795db7fb2bddaab3969850d1554fc5f4da1)
2009-12-16 08:03:43 +01:00
Michael Adam
6e522eb198 tdb: add script/abi_checks.sh. check for abi changes without gcc magic.
USAGE: abi_checks.sh LIBRARY_NAME header1 [header2 ...]

This creates symbol signature lists using the mksyms and mksigs scripts
and compares them with the checked in lists.

Michael
(cherry picked from samba commit 9636e0d373)

Signed-off-by: Stefan Metzmacher <metze@samba.org>

(This used to be ctdb commit 724d71dc838750fff91a45359feeb6e71bf0a4c7)
2009-12-16 08:03:43 +01:00
Michael Adam
52b657756a tdb: add script to extract signatures from header files.
This produces output like the output gcc produces when
invoked with the -aux-info switch.

Run like this: cat include/tdb.h | ./script/mksigs.pl

This simple parser is probably too coarse to handle all
possible header files, but it treats tdb.h correctly...

Michael
(cherry picked from samba commit 0760a04ef9)

Signed-off-by: Stefan Metzmacher <metze@samba.org>

(This used to be ctdb commit 141422d9dc24b15b7b8bc7831adab90367a729f7)
2009-12-16 08:03:43 +01:00
Michael Adam
53336a9cb1 tdb: add scripts to extract library symbols (exports file) from headers
Michael
(cherry picked from samba commit 006fd0c43c)

Signed-off-by: Stefan Metzmacher <metze@samba.org>

(This used to be ctdb commit aed864dceaf6ec1e6e6066a587c708b485901200)
2009-12-16 08:03:43 +01:00
Rusty Russell
eb9d367843 lib/tdb: don't overwrite TDBs with different version numbers.
In future, this may happen, and we don't want to clobber them.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
(cherry picked from samba commit 398d0c2929)

Signed-off-by: Stefan Metzmacher <metze@samba.org>

(This used to be ctdb commit eebd467961dad6cfb38c2a5d6e4b4dbf86e55e63)
2009-12-16 08:03:43 +01:00
Jeremy Allison
41d4e2dc7c Add define guards around otherwise unused variable.
Jeremy.
(cherry picked from samba commit 4fc9f9c3f9)

Signed-off-by: Stefan Metzmacher <metze@samba.org>

(This used to be ctdb commit 6f8614de0f20d4c507aecd744d9c3f6545078127)
2009-12-16 08:03:42 +01:00
Rusty Russell
e1217b7bdb There is one signedness issue in tdb which prevents traverses of TDB records over the 2G offset on systems which support 64 bit file offsets. This fixes that case.
On systems with 32 bit offsets, expansion and fcntl locking on these records
will fail anyway.  SAMBA already does '#define _FILE_OFFSET_BITS 64' in
config.h (on my 32-bit x86 Linux system at least) to get 64 bit file offsets.
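
A minimal sketch of the kind of signedness problem described above, not the actual tdb code. It assumes a 32-bit offset that is passed through a signed variable before a bounds check; the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bounds check that routes the offset through a signed
 * 32-bit variable. Offsets above 2G become negative (the conversion is
 * implementation-defined, but wraps on common platforms) and are
 * wrongly rejected. */
int in_bounds_signed(uint32_t off, uint32_t map_size)
{
	int32_t soff = (int32_t)off;

	assert(map_size > 0);
	return soff >= 0 && (uint32_t)soff < map_size;
}

/* The same check kept unsigned throughout: records above 2G pass. */
int in_bounds_unsigned(uint32_t off, uint32_t map_size)
{
	assert(map_size > 0);
	return off < map_size;
}
```

With an offset of 0x90000000 and a map size of 0xA0000000, the signed variant rejects the record while the unsigned variant accepts it.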

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
(cherry picked from samba commit 252f7da702)

Signed-off-by: Stefan Metzmacher <metze@samba.org>

(This used to be ctdb commit 2d768f664e6db65b3b7e0c732f33ee2b806892f9)
2009-12-16 08:03:42 +01:00
Ronnie Sahlberg
640c48c844 Revert "cleanup: remove a tunable we no longer use in the eventscripts any more :"
This reverts commit 401f421fa003d9515df15e759b50b56e0c67d69c.

Conflicts:

	include/ctdb_private.h
	server/ctdb_tunables.c

(This used to be ctdb commit b883d19a495a41a22db37f9c2cf6250fee529de0)
2009-12-16 09:51:17 +11:00
Ronnie Sahlberg
fcd16342f6 Merge branch 'trans3'
(This used to be ctdb commit b765e12a5fb87a6121e49b349017b6a961929346)
2009-12-15 21:00:22 +11:00
Ronnie Sahlberg
b3104bd1d0 Author: Rusty Russell <rusty@rustcorp.com.au>
Date:   Tue Dec 15 15:53:30 2009 +1030

    eventscript: hack to avoid overloading valgrind

    Now we fork one child per script, when running under valgrind the load
    gets quite high.  This is because valgrind does a lot of work after exit,
    and we don't wait for the children to finish; we start the next one when
    the child reports status via the pipe.

    This fix is ugly, but simple.

    Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

(This used to be ctdb commit 6ed34d5320c39d8a55f2a36ad4c1ab574e0b0796)
2009-12-15 20:56:16 +11:00
Ronnie Sahlberg
842aa60d52 This is a dodgy patch.
I saw once where the master ctdbd logging structure was talloc freed
which caused issues.
So only free the structure if it is NOT the master structure.

This needs to be looked into in more detail.

(This used to be ctdb commit bcf494b81f4277dc75f05faccf0c446bd15f6e2b)
2009-12-15 19:04:52 +11:00
Ronnie Sahlberg
0982299bed Revert "Make fetch_locked more scalable"
This reverts commit 5736e17c139c9a8049e235429aeae0c6c9d0e93d.

(This used to be ctdb commit 3d2d877d877146ca09a28a3a44f4840eb36fd377)
2009-12-15 14:26:28 +11:00
Ronnie Sahlberg
5a7e9900df Merge commit 'obnox/ctdb-wip-trans3' into trans3
(This used to be ctdb commit ac06a0e042e7d024060d6e87a49bda9ccc072c52)
2009-12-15 14:25:55 +11:00
Ronnie Sahlberg
3b53c02e34 add a new test tool that just locks and releases the same record over and over
(This used to be ctdb commit 24767be2eb9aed29704c2a4097bab5466cb6728f)
2009-12-15 12:14:49 +11:00
Ronnie Sahlberg
244bc5cc8f ctdb_fetch requires the number of nodes being specified.
Have it log an error and terminate if this parameter was omitted.

(This used to be ctdb commit 340be0179f55acfff77f8c3c8be958679227bde1)
2009-12-15 11:29:16 +11:00
Ronnie Sahlberg
e2e30df2e9 When setting up the logging, set the event to trigger a read of a log message from a child process as a child of the "log" structure and not the ctdb structure,
or else we can crash if we receive log messages from a child but the log structure has been freed()

(This used to be ctdb commit ea9e39369379939abf6a4076fa2014c10c1a9ad0)
2009-12-15 10:45:18 +11:00
Ronnie Sahlberg
db0d2a1b8f From rusty:
Subject: eventscript: fix spinning at 100% cpu when child exits.

ctdbd was spinning reading 0 from a pipe, as soon as the first
eventscript finishes.

This was caused by the intersection between a78b8ea7168e "Run only one
event for each epoll_wait/select call" and 32cfdc3aec34 "eventscript:
ctdb_fork_with_logging()".  Unavoidable mid-air collision, since both
worked fine and both were developed simultaneously.

When the script exits, we have two pipes open to it: one for any
stdout/stderr for logging (ctdb_log_handler), and one for the result
(ctdb_event_script_handler).  The latter frees everything, including
the log fd and event structure.

We used to get one callback to ctdb_log_handler, which got a harmless
0-length read, then one to ctdb_event_script_handler which cleaned up.
Now we only do one callback per poll, we need the logging function to
clean itself up so we can make progress.
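
The idea of a handler that cleans up after itself on a 0-length read can be sketched as follows. This is an illustration only, not the ctdb_log_handler code; the function name and its return convention are assumptions.

```c
#include <assert.h>
#include <errno.h>
#include <unistd.h>

/* Hypothetical handler for a readable child pipe.
 * Returns bytes read (> 0), 0 on EOF, or -1 on error.
 * On EOF or a hard error it closes the fd and sets *fd to -1, so the
 * event loop stops polling a descriptor that would only ever return 0. */
ssize_t handle_child_pipe(int *fd, char *buf, size_t len)
{
	ssize_t n;

	assert(fd != NULL && *fd >= 0);
	n = read(*fd, buf, len);
	if (n > 0) {
		return n;
	}
	if (n < 0 && (errno == EINTR || errno == EAGAIN)) {
		return -1;	/* transient: keep the fd and retry later */
	}
	close(*fd);
	*fd = -1;
	return n;
}
```

A caller would remove the fd from its poll set as soon as it sees *fd == -1, which is what prevents the spin.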

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

(This used to be ctdb commit 211ea7907e8e96041aa6f7d086551d64d065a8a3)
2009-12-15 10:23:58 +11:00
Ronnie Sahlberg
649ba2631d Rename the tunable EventScriptBanCount to EventScriptTimeoutCount
since we no longer ban nodes when dodgy scripts continue to hang.

We now only mark nodes as unhealthy if monitor events fail or timeout. Never ban.

(This used to be ctdb commit 5c8e56fc7a518e115bceac257867739283cf6a1e)
2009-12-14 15:53:23 +11:00
Ronnie Sahlberg
ed6b5a8c68 cleanup: remove a tunable we no longer use in the eventscripts any more :
EventScriptUnhealthyOnTimeout

(This used to be ctdb commit 401f421fa003d9515df15e759b50b56e0c67d69c)
2009-12-14 15:48:47 +11:00
Rusty Russell
cab8da8dc4 ctdb: don't print OUTPUT: for DISABLED scripts
In other news, did you know ctime() returns a \n-terminated string?
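
For reference, a small sketch of stripping that trailing newline before printing. The helper name is hypothetical and this is not the code from the patch.

```c
#include <assert.h>
#include <string.h>
#include <time.h>

/* Copy the ctime() text for t into buf, without the trailing '\n'.
 * The result is truncated if buf is too small. */
void ctime_no_newline(time_t t, char *buf, size_t len)
{
	const char *s = ctime(&t);
	size_t n;

	assert(buf != NULL && len > 0);
	if (s == NULL) {
		buf[0] = '\0';
		return;
	}
	n = strcspn(s, "\n");	/* length up to, not including, the newline */
	if (n >= len) {
		n = len - 1;
	}
	memcpy(buf, s, n);
	buf[n] = '\0';
}
```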

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

(This used to be ctdb commit 1b4e7bb548976b99f122142b040494b6f9911962)
2009-12-14 15:46:49 +11:00
Rusty Russell
784fa9fd8a eventscript: fix monitoring when killed by another script command
Commit c1ba1392fe "eventscript: get rid of ctdb_control_event_script_finished
altogether" was wrong: there is one case where we want to free the script
without transferring their status to last_status.  This happens because we
always kill a running monitor command when we run any other command.

This still isn't quite right (and never was): the callback will be called
with status value 0, which might flip us to HEALTHY if we were unhealthy.
This is conveniently fixed in my next set of patches :)

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

(This used to be ctdb commit 0ea0e27d93398df997d3df9d8bf112358af3a4a5)
2009-12-14 15:46:14 +11:00
Ronnie Sahlberg
e76561f544 remove the variable "disable when unhealthy"
there is no rational need for a setting where we permanently mark nodes as disabled every time an eventscript fails

(This used to be ctdb commit 68a8ee99b128a5ec883600735626bdb3bbc9c503)
2009-12-14 15:40:54 +11:00
Michael Adam
b41d9a2bcc Revert "recovery: add special pull-logic for persistent databases"
This reverts commit 8aef46d2aab3efb322dda51eaa202653cefd5222.

This special recovery logic is wrong now with the transaction rewrite.
The treatment of persistent databases will later be rewritten to use the
database sequence number.

Michael

(This used to be ctdb commit c5a0aef668a63f927d6184612b13ce316eb4a0be)
2009-12-12 00:45:40 +01:00
Volker Lendecke
f6ea3e6bcf Make fetch_locked more scalable
This patch improves the handling of the fetch_lock operation on non-persistent
databases that ctdb clients have to do very frequently.

The normal flow is as follows:

1. Client does a local fetch_lock on the database

2. Client looks if the local node is dmaster.
   If yes, everything is fine
   If no, continue here

3. Client unlocks the local record

4. Client issues a "get me the record" call to ctdbd

5. ctdbd goes out and fetches the dmaster role

6. ctdbd tells the client to retry

7. Client starts over again

The problem is between step 6 and 7: Before the client has had the chance to
retry (i.e. catch the record with a fetch_locked), another node might have come
asking ctdbd to migrate away the record again. This is a real problem, I've
seen >20 loops of this kind in real workloads.

This patch does the following: Whenever ctdb receives a record as result of
step 5, it puts the key on a "holdback list". As long as a key is on this list,
a request to migrate away the dmaster is put on hold. It is the client's duty
to issue the "CTDB_CONTROL_GOTIT" control when it has successfully done step 2
after having asked ctdb to fetch the record. This will release the key from the
"holdback list" and re-issue all dmaster migration requests.

As a safeguard against malicious clients, once a second (default 1000msecs,
tunable "HoldbackCleanupInterval" in milliseconds) ctdbd goes over the list of
held back keys, deletes them and releases all held back migration requests.
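
The holdback list described above can be sketched roughly like this. It is a simplified illustration, not the patch itself: keys are assumed to be short NUL-terminated strings (real ctdb keys are binary blobs), the list is a fixed-size array, and all names are hypothetical.

```c
#include <assert.h>
#include <string.h>

#define HOLDBACK_MAX    16
#define HOLDBACK_KEYLEN 32

struct holdback_list {
	char keys[HOLDBACK_MAX][HOLDBACK_KEYLEN];
	int used[HOLDBACK_MAX];
};

/* Step 5 completed: the record arrived, so hold its key back.
 * Returns 0 on success, -1 if the list is full. */
int holdback_add(struct holdback_list *hl, const char *key)
{
	int i;

	assert(strlen(key) < HOLDBACK_KEYLEN);
	for (i = 0; i < HOLDBACK_MAX; i++) {
		if (!hl->used[i]) {
			strcpy(hl->keys[i], key);
			hl->used[i] = 1;
			return 0;
		}
	}
	return -1;
}

/* A migration request for a held key would be queued, not served. */
int holdback_is_held(const struct holdback_list *hl, const char *key)
{
	int i;

	for (i = 0; i < HOLDBACK_MAX; i++) {
		if (hl->used[i] && strcmp(hl->keys[i], key) == 0) {
			return 1;
		}
	}
	return 0;
}

/* The client's "got it" notification releases the key. */
void holdback_release(struct holdback_list *hl, const char *key)
{
	int i;

	for (i = 0; i < HOLDBACK_MAX; i++) {
		if (hl->used[i] && strcmp(hl->keys[i], key) == 0) {
			hl->used[i] = 0;
		}
	}
}

/* Periodic safeguard: drop every held key. */
void holdback_cleanup(struct holdback_list *hl)
{
	memset(hl->used, 0, sizeof(hl->used));
}
```

The sketch leaves out the queue of deferred migration requests that would be re-issued on release or cleanup.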

(This used to be ctdb commit 5736e17c139c9a8049e235429aeae0c6c9d0e93d)
2009-12-12 00:45:39 +01:00
Volker Lendecke
b664a86bc2 Import "talloc_array_length" from upstream talloc
(This used to be ctdb commit 844aa6300ee4d87561e698001ebc15ac1e455528)
2009-12-12 00:45:39 +01:00
Michael Adam
aea324336c tests: temporarily disable the transaction test tool.
Make it return success for make test.
This is temporarily disabled until the rewrite of the
transaction code (in samba and the daemon) using the global
lock feature has been ported to the ctdb client code.

Michael

(This used to be ctdb commit 78ca29352aa39f4ef4e41096b92d55cb2e0d348a)
2009-12-12 00:45:39 +01:00
Michael Adam
46de365e78 Add a new control CTDB_GET_DB_SEQNUM - fetch a persistent db's sequence number.
Michael

(This used to be ctdb commit a7e3b5fac6b3f5d74473f26eb86c067b35647996)
2009-12-12 00:45:39 +01:00
Michael Adam
8dedde81cd define CTDB_DB_SEQNUM_KEY - used with the new implementation of transactions.
Michael

(This used to be ctdb commit 4b1dbcf0853bdc4832d39a477823ae34f216da52)
2009-12-12 00:45:38 +01:00
Volker Lendecke
9f16f655fa Tiny simplification of ctdb_queue_packet()
(This used to be ctdb commit 1640da1cab7e8b545367824204c82931f3346848)
2009-12-12 00:45:38 +01:00
Volker Lendecke
24d04a3e89 Rename a struct member for clarity
(This used to be ctdb commit 6af5e74a21546d723008d69d6752ebebf898c947)
2009-12-12 00:45:37 +01:00
Michael Adam
faacd5ca79 server: add a new control CTDB_CONTROL_TRANS3_COMMIT
This is a simplified version of the trans2 commit control:
It just rolls out the marshall buffer to all active nodes.

It is the main ctdbd part of the re-implementation of the
persistent transactions. The client code is changed to
take a global lock to start a transaction and store into
the marshal buffer instead of writing to the local tdb
under a local transaction.

The old transaction implementation is going to be
removed in a later commit.

Michael

(This used to be ctdb commit f66428f9d2013080a414404c1ba6117888352fd6)
2009-12-12 00:43:26 +01:00
Ronnie Sahlberg
a8549ef700 From: Volker Lendecke <vl@samba.org>
Date: Wed, 9 Dec 2009 22:45:12 +0100
Subject: [PATCH] Revert an accidental commit

(This used to be ctdb commit af6656f2844d8fd72204a70358c9d589dbe1bd34)
2009-12-10 08:53:55 +11:00
Michael Adam
54b9a49e2e tests: remove the no_trans mode from ctdb_transaction.
Writes without transaction are not possible any more on
persistent databases.

Michael

(This used to be ctdb commit 59f46d7261dfdbdef900bf95dd9eb28ad22a46b2)
2009-12-09 22:04:48 +01:00
Michael Adam
332017925f tests: remove the persistent_unsafe writes test.
This is useless now that persistent write operations without
transaction are forbidden.

Michael

(This used to be ctdb commit b022863d44026c19d5aae54aa485b670bea0540e)
2009-12-09 21:57:00 +01:00
Michael Adam
aa6e42a4ba tests: remove persistent_safe write test.
This is useless now that persistent writes without transactions are forbidden.

Michael

(This used to be ctdb commit 9ac82311d796e1fab31f8de62b8ccc754445093c)
2009-12-09 21:56:59 +01:00
Michael Adam
c32ff2bbb0 test: add test 54_ctdb_transaction_recovery.sh
This is like the 53_ctdb_transaction test, but it additionally
runs a loop with recoveries while the transactions are running.

When called like this, the transaction loops run for 10 minutes:

CTDB_TEST_TIMELIMIT=600 tests/scripts/run_tests tests/simple/54_ctdb_transaction_recovery.sh

The default timelimit is 30 seconds.

Michael

(This used to be ctdb commit 2ff2679e8f3d50ebf735f2c420898a84268bdc95)
2009-12-09 21:56:59 +01:00
Michael Adam
edfc6a8c12 test: get value for --timelimit from environment var CTDB_TEST_TIMELIMIT in transaction test
Michael

(This used to be ctdb commit c13077ca64f6e6569c30ef7fcb044e5711dce1a3)
2009-12-09 21:56:59 +01:00
Michael Adam
c2c9a04cf2 client: lower level of commit retry message WARNING->DEBUG
This can happen frequently when recoveries intercept transactions.

Michael

(This used to be ctdb commit c46adb210e47530488503e20d682d4d182c0fb79)
2009-12-09 21:56:59 +01:00
Michael Adam
97d780bc20 client: lower debug level of transaction-active-retry message to DEBUG
This reduces some noise.

Michael

(This used to be ctdb commit 54d227811753f4a87f1a2c9dc0b1389f5ca2a12f)
2009-12-09 21:56:59 +01:00
Michael Adam
ea65e80223 call: lower the debug message "refusing migration while transaction" to lvl INFO
This gets just too noisy on a busy system.
And it is purely informational anyway...

Michael

(This used to be ctdb commit 7f64a00c76203fdf6673c3f862a4bfd17fb848d7)
2009-12-09 21:56:59 +01:00
Volker Lendecke
a0d9bd3c13 Run only one event for each epoll_wait/select call
This might be a bit less efficient, but experience in winbind has shown that
event callbacks can trigger changes in the socket state in very hard to
diagnose ways.

(This used to be ctdb commit a78b8ea7168e5fdb2d62379ad3112008b2748576)
2009-12-10 07:52:16 +11:00
Christian Ambach
47f8c380d2 reduce vacuuming lognoise
syslog.h says:

LOG_NOTICE      5    normal but significant condition
LOG_INFO        6    informational

Several vacuuming-related messages were logged at NOTICE level although I don't
see any real significance; to me these are just informational messages.

Signed-off-by: Christian Ambach <christian.ambach@de.ibm.com>

(This used to be ctdb commit 142111983c103e90ccccbe26fd580c4eb28e949f)
2009-12-10 07:33:59 +11:00
Christian Ambach
4269d37ce8 improve time jump logging
add the __location__ macro to the logs to get a better idea
in which loop the problem occurred

Signed-off-by: Christian Ambach <christian.ambach@de.ibm.com>

(This used to be ctdb commit dccb549fd6a6e338063699544e52f2a1a6a966b5)
2009-12-10 07:31:04 +11:00
Ronnie Sahlberg
839670253a Merge commit 'rusty/script-report'
(This used to be ctdb commit 6e8b279ed307eccac08386e98510361ba3ab3d36)
2009-12-09 14:26:42 +11:00
Ronnie Sahlberg
50820f9e18 Bond devices can have any name the user configures, so
when checking link status for an interface, first
check if this interface is in fact a bond device
(by the presence of a /proc/net/bonding/IFACE file)
and use that file for checking status.

Otherwise assume ib* is an infiniband interface, which we don't know how
to check; anything else is assumed to be an ethernet interface, where
ethtool should hopefully work.
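
The decision order above can be sketched in C as follows. The real check lives in an event script, so this is only an illustration; the function, the enum, and the idea of passing the bonding directory as a parameter (normally /proc/net/bonding) are assumptions.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

enum iface_kind { IFACE_BOND, IFACE_INFINIBAND, IFACE_ETHERNET };

/* Classify an interface in the order described above:
 * 1. a file named after it in bonding_dir means it is a bond device,
 * 2. otherwise an "ib" prefix means infiniband (link state not checked),
 * 3. otherwise assume ethernet, where ethtool would be used. */
enum iface_kind classify_iface(const char *bonding_dir, const char *iface)
{
	char path[512];
	int n;

	assert(bonding_dir != NULL && iface != NULL);
	n = snprintf(path, sizeof(path), "%s/%s", bonding_dir, iface);
	if (n > 0 && (size_t)n < sizeof(path) && access(path, F_OK) == 0) {
		return IFACE_BOND;
	}
	if (strncmp(iface, "ib", 2) == 0) {
		return IFACE_INFINIBAND;
	}
	return IFACE_ETHERNET;
}
```

Checking for the bonding file first is what lets a bond device carry any user-chosen name, including one that starts with "ib".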

(This used to be ctdb commit 8cc6c5de3d7abb0b72eaa6e769e70963b02d84cb)
2009-12-09 11:33:04 +11:00
Ronnie Sahlberg
3ca3f4c771 make sure to also check that interfaces used for NATGW are ok
and have a link.
if not the node should become unhealthy

(This used to be ctdb commit 03b5bbaae1b53830a4cd20d3079ab8f45ffce923)
2009-12-09 11:13:29 +11:00
Stefan Metzmacher
af170d1a8a events/50.samba: only use wbinfo --ping-dc if available
metze

(This used to be ctdb commit 7b73834ba3ac197cc8a3020c111f9bb2c567e70b)
2009-12-08 07:38:00 +11:00
Rusty Russell
a46c3b4f2a ctdb: scriptstatus can now query non-monitor events
We also no longer return an error before scripts have been run; a special
zero-length data means we have never run the scripts.

"ctdb scriptstatus all" returns all event script results.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>


(This used to be ctdb commit 9b90d671581e390e2892d3a68f3ca98d58bef4df)
2009-12-08 01:50:55 +10:30
Rusty Russell
5d99a1a47c eventscript: expose call names and enum
We're going to need this so ctdb can query non-monitor status.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>


(This used to be ctdb commit 53bc5ca23ca55a3ac63a440051f16716944a2a51)
2009-12-08 01:47:13 +10:30
Rusty Russell
0dbe76f88f eventscript: lock logging on timeout.
Ronnie suggested this; seems like a very good idea.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>


(This used to be ctdb commit 93153bca68926401dc9ae7fd77ed3f17be923344)
2009-12-08 01:32:36 +10:30
Rusty Russell
9e87377e7a ctdb: support --machinereadable (-Y) for scriptstatus
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>


(This used to be ctdb commit 47ffe75848f216568ce3db0a60ca88cfe3d6903a)
2009-12-08 01:31:53 +10:30
Rusty Russell
b29067b02f eventscript: get rid of ctdb_control_event_script_finished altogether
We always have to call it before freeing the state; we should just do
this work in the destructor itself.

Unfortunately, the script state would already be freed by the time
the state destructor is called, so we make the script state a child of
ctdb, and talloc_free() it manually on the one path which doesn't use
the destructor.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>


(This used to be ctdb commit c1ba1392fe52762960e896ace0aca0ee4faa94d5)
2009-12-08 12:29:10 +10:30
Rusty Russell
d3593c2f83 eventscript: save state for all script invocations
Rather than only transferring to last_status for monitor events, do
it for every event (ctdb->last_status is now an array). 

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>


(This used to be ctdb commit c73ea56275d4be76f7ed983d7565b20237dbdce3)
2009-12-08 12:27:48 +10:30
Rusty Russell
6960fa96eb eventscript: cleanup finished to take state arg
We only need ctdb->current_monitor so we can kill it when we want to run
something else; we don't need to use it here as we always know what script
we are running.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>


(This used to be ctdb commit 4cf1b7c32bcf7e4b65aec1fa7ee1a4b162cac889)
2009-12-08 12:24:56 +10:30
Rusty Russell
e548a335bd eventscript: use wire format internally for script status.
The only difference between the exposed and internal structure now is
that the name and output fields were pointers.  Switch to using
ctdb_scripts_wire/ctdb_script_wire internally as well so marshalling
is a noop.

We now reject scripts which are too long and truncate logging to the
511 characters we have space for (the entire output will be in the
normal ctdbd log).
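
A rough sketch of truncating captured output to a fixed 511-character field. The struct and function names are hypothetical; only the 511-character limit comes from the message above.

```c
#include <assert.h>
#include <string.h>

#define OUTPUT_MAX 511	/* characters kept, plus one byte for the NUL */

struct script_report {
	char output[OUTPUT_MAX + 1];
};

/* Append text to the report, silently dropping whatever does not fit.
 * Returns the number of bytes actually stored. */
size_t report_append(struct script_report *r, const char *text)
{
	size_t used, room, n;

	assert(r != NULL && text != NULL);
	used = strlen(r->output);
	room = OUTPUT_MAX - used;
	n = strlen(text);
	if (n > room) {
		n = room;
	}
	memcpy(r->output + used, text, n);
	r->output[used + n] = '\0';
	return n;
}
```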

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>


(This used to be ctdb commit fd2f04554e604bc421806be96b987e601473a9b8)
2009-12-08 12:48:17 +10:30
Rusty Russell
9753b7e793 eventscript: rename ctdb_monitoring_wire to ctdb_scripts_wire
We're going to allow fetching status of all script runs, so this
name is no longer appropriate.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>


(This used to be ctdb commit f5cb41ecf3fa986b8af243e8546eb3b985cd902a)
2009-12-08 00:51:24 +10:30
Rusty Russell
3ff8bf8138 eventscript: get_current_script() helper
This neatens the code slightly.  We also use the name 'current' in
ctdb_event_script_handler() for uniformity.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>


(This used to be ctdb commit e9661b383e0c50b9e3d114b7434dfe601aff5744)
2009-12-08 12:47:24 +10:30
Rusty Russell
cc678d572f eventscript: use an array rather than a linked list of scripts
This brings us closer to the wire format, by using a simple array
and a 'current' iterator.

The downside is that a 'struct ctdb_script' is no longer a talloc
object: the state must be passed to our log fn, and the current
script extracted with &state->scripts->scripts[state->current].

The wackiness of marshalling is simplified, and as a bonus, we can
distinguish between an empty event directory
(state->scripts->num_scripts == 0) and an error (state->scripts ==
NULL).
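
The array-plus-iterator layout and the empty-versus-error distinction might look roughly like this. The struct and function names are made up for the sketch and are not the ctdb definitions.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_SCRIPTS 8

struct script_entry {
	char name[32];
	int status;
};

struct script_array {
	unsigned int num_scripts;
	struct script_entry scripts[MAX_SCRIPTS];
};

struct run_state {
	struct script_array *scripts;	/* NULL: listing the directory failed */
	unsigned int current;		/* index of the script being run */
};

enum list_result { LIST_ERROR, LIST_EMPTY, LIST_OK };

/* NULL means error; a valid array with zero entries means empty dir. */
enum list_result classify_scripts(const struct run_state *state)
{
	if (state->scripts == NULL) {
		return LIST_ERROR;
	}
	if (state->scripts->num_scripts == 0) {
		return LIST_EMPTY;
	}
	return LIST_OK;
}

/* Entries are plain array slots, so the current one is found by index. */
struct script_entry *current_script(struct run_state *state)
{
	assert(state->scripts != NULL);
	assert(state->current < state->scripts->num_scripts);
	return &state->scripts->scripts[state->current];
}
```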

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>


(This used to be ctdb commit 76e8bdc11b953398ce8850de57aa51f30cb46bff)
2009-12-08 12:47:05 +10:30
Rusty Russell
1eda08ea29 eventscript: record script status for all events
This unifies almost everything: the state->current pointer points to
the struct ctdb_script where we record start, finish, status and
output.

We still only marshall up the monitor events; the rest disappear when
the state structure is freed.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>


(This used to be ctdb commit c476c81f3e3d8fc62f2e53d82fce5774044ee9ce)
2009-12-08 12:46:18 +10:30
Rusty Russell
9b50f7ee67 eventscript: use scripts array directly, rather than separate list
We rename ctdb_monitor_script_status to ctdb_script, and instead of
allocating them as the scripts are executed, we allocate them up front
and keep a "current" iterator.

This slightly simplifies the code, though it means we only marshall up
to the last successfully run script.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>


(This used to be ctdb commit b2a300768536d10bd867a987ad4cf1c5268c44bc)
2009-12-08 12:45:17 +10:30
Rusty Russell
23e24c503c eventscript: ctdb_fork_with_logging()
A new helper function which sets up an event attached to the child's
stdout/stderr which gets routed to the logging callback after being
placed in the normal logs.

This is a generalization of the previous code which was hardcoded to
call ctdb_log_event_script_output.

The only subtlety is that we hang the child fds off the output buffer;
the destructor for that will flush, which means it has to be destroyed
before the output buffer is.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>


(This used to be ctdb commit 32cfdc3aec34272612f43a3588e4cabed9c85b68)
2009-12-08 12:44:30 +10:30
Rusty Russell
e84d2f7edb eventscript: pass struct ctdb_log_state directly to ctdb_log_handler().
The current logging logic assumes that any stdout/stderr belongs to
the currently running monitor script output.  This isn't quite right
anyway, and we'd like to capture stderr output of other script
invocations.

So we move towards multiple struct ctdb_log_state by handing it
directly to ctdb_log_handler to use, rather than having it assume
ctdb->log.  We need a ctdb pointer inside the log struct now though.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>


(This used to be ctdb commit 497766cf186442de00fb324343150442457be858)
2009-12-08 00:31:29 +10:30
Rusty Russell
c309d22f9a eventscript: remove unused ctdb_ctrl_event_script*
The child no longer uses ctdb_ctrl_event_script_init or
ctdb_ctrl_event_script_finished, and the others are redundant: it
doesn't need to tell us it's starting a script when it only runs one.

We move start and stop calls to the parent, and eliminate the RPC
infrastructure altogether.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>


(This used to be ctdb commit 391926a87a7af73840f10bb314c0a2f951a0854c)
2009-12-08 00:27:40 +10:30
Rusty Russell
69c30c6ba0 eventscript: refactor forking code into fork_child_for_script()
We do the same thing in two places: fire off a child from the initial
ctdb_event_script_callback_v() and also from the ctdb_event_script_handler()
when it's done.

Unify this logic into fork_child_for_script().

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>


(This used to be ctdb commit 814704a3286756d40c2a6c508c1c0b77fa711891)
2009-12-08 00:22:55 +10:30
Rusty Russell
dd53eee7a2 eventscript: fork() a child for each script.
We rename child_run_scripts() to child_run_script(), because it now
runs a single script rather than walking the list.  When it's
finished, we fork the next child from the ctdb_event_script_handler()
callback.

ctdb_control_event_script_init() and ctdb_control_event_script_finished()
are now called directly by the parent process; the child still calls
ctdb_ctrl_event_script_start() and ctdb_ctrl_event_script_stop() before
and after the script.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>


(This used to be ctdb commit 0fafdcb8d3532a05846abaa5805b2e2f3cee8f47)
2009-12-08 00:21:25 +10:30
Rusty Russell
640b22ff61 eventscript: store from_user and script_list inside state structure
This means all the state about running the scripts is in that structure,
which helps in the next patch.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>


(This used to be ctdb commit 020fd21e0905e7f11400f6537988645987f2bb32)
2009-12-08 00:15:18 +10:30
Rusty Russell
b8e347ec9c eventscript: use direct script state pointer for current monitor
We put a "scripts" member in ctdb_event_script_state, rather than using
a special struct for monitor events.  This will fit better as we further
unify the different events, and holds the reports from the child process
running each monitor script.

Rather than making the monitor state a child of current_monitor_status_ctx,
we just point current_monitor directly at it.  This means we need to reset
that pointer in the destructor for ctdb_event_script_state.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>


(This used to be ctdb commit 9a2b4f6b17e54685f878d75bad27aa5090b4571f)
2009-12-08 00:14:01 +10:30
Rusty Russell
a4c2a98ba9 eventscript: make current_monitor_status_ctx serve as monitor_event_script_ctx
We have monitor_event_script_ctx and other_event_script_ctx, and
current_monitor_status_ctx in struct ctdb_context.  This seems more
complex than it needs to be.

We use a single "event_script_ctx" as parent for all event script
state structures.  Then we explicitly reparent monitor events under
current_monitor_status_ctx: this is freed every script invocation to
kill off any running scripts anyway.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>


(This used to be ctdb commit 0d925e6f2767691fa561f15bbb857a2aec531143)
2009-12-08 00:09:20 +10:30
Rusty Russell
68e224d9a4 eventscript: split ctdb_run_event_script into multiple parts
Simple refactoring in preparation for switching to one-child-per-script.
We also call the functions run by the child process "child_".

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>


(This used to be ctdb commit bfee777faff75e9bed4aedc1558957483616a6d3)
2009-12-07 23:55:03 +10:30
Rusty Russell
9a0c171fa7 eventscript: hoist work out of child process, into parent
This is the start of a move towards finer-grained reporting, with one
child per script.  Simple code motion to do sanity check and get the
list of scripts before fork().

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>


(This used to be ctdb commit 816b9177f51ae5b21b92ff4a404f548fe9723c96)
2009-12-07 23:53:35 +10:30
Rusty Russell
9914d3f561 eventscript: don't make ourselves healthy if we're under ban_count
If we've timed out, but we've not timed out more than
ctdb->tunable.script_ban_count, we pretend we haven't.

There's a logic bug in the way this is done: if we were unhealthy before,
this would set us to "healthy" again (status == 0).  I don't think this
would happen in real life, but it's a little surprising.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>


(This used to be ctdb commit e6488c0e05bab5c4c2c0a6370930b0b27e5ed56e)
2009-12-07 23:52:01 +10:30
Rusty Russell
928b8dcb31 eventscript: handle banning within the callbacks
Currently the timeout handler in eventscript.c does the banning if a
timeout happens.  However, because monitor events are different, it has
to special case them.

As we call the callback anyway in this case, we should make that handle
-ETIME as it sees fit: for everyone but the monitor event, we simply ban
ourselves.  The more complicated monitor event banning logic is now in
ctdb_monitor.c where it belongs.

Note: I wrapped the other bans in "if (status == -ETIME)", though they
should probably ban themselves on any error.  This change should be a
noop.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>


(This used to be ctdb commit 9ecee127e19a9e7cae114a66f3514ee7a75276c5)
2009-12-07 23:48:57 +10:30
Rusty Russell
5190932507 eventscript: expose ctdb_ban_self()
eventscript.c uses this now, but our next patch makes others use it.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>


(This used to be ctdb commit a305cb7743c24386e464f6b2efab7e2108bb1e7e)
2009-12-07 23:18:40 +10:30
Rusty Russell
0dd46797d6 eventscript: handle v. unlikely timeout race
If we time out just as the child exits, we currently will report an
uninitialized cb_status field.  Set it to -ETIME as expected.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>


(This used to be ctdb commit 024386931bda9757079f206238ae09bae4de6ea2)
2009-12-07 23:17:23 +10:30
Rusty Russell
d5d88ecaaf eventscript: replace other -1 returns with -errno
This completes our "problem with script" reporting; we never set cb_status
to -1 on error.  Real errnos are used where the failure is a system call
(eg. read, setpgid), otherwise -EIO is used if we couldn't communicate with
the parent.

The latter case is a bit useless, since the parent probably won't see
the error anyway, but it's neater.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>


(This used to be ctdb commit 1269458547795c90d544371332ba1de68df29548)
2009-12-07 23:15:56 +10:30
Rusty Russell
672e06f438 eventscript: simplify ctdb_run_event_script loop
If we break, we avoid cut & paste code inside the loop.  Need to initialize
ret to 0 for the "no scripts" case.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>


(This used to be ctdb commit ec36ced9446da7e3bf866466d265ee8e18f606c1)
2009-12-07 23:13:12 +10:30
Rusty Russell
c70afe0cd4 eventscript: handle and report generic stat/execution errors
Rather than ignoring deleted event scripts (or pretending that they were "OK"),
and discarding other stat errors, we save the errno and turn it into a negative
status.

This gives us a bit more information if we can't execute a script (eg.
too many symlinks or other weird errors).

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>


(This used to be ctdb commit 5d894e1ae5228df6bbe4fc305ccba19803fa3798)
2009-12-07 23:12:19 +10:30
Rusty Russell
b9b75bd065 eventscript: use -ENOEXEC for disabled status value
This unifies code paths and simplifies things: we just hand -ENOEXEC to
ctdb_ctrl_event_script_stop().

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>


(This used to be ctdb commit eadf5e44ef97d7703a7d3bce0e7ea0f21cb11f14)
2009-12-07 23:11:47 +10:30
Rusty Russell
ce378014c7 eventscript: enhance script delete race check
We currently assume 127 == script removed.  The script can also return 127;
best to re-check the execution status in this case (and for 126, which will
happen if the script is non-executable).

If the script is no longer executable/not present, we ignore it.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>


(This used to be ctdb commit 0a53d6b5ac81daf0efa32f35e7758ede2a5bdb63)
2009-12-07 23:09:02 +10:30
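Note on ce378014c7: the re-check described above could look roughly like the following. This is a minimal sketch, not the ctdb code; `check_executable_sketch()` and `script_should_be_ignored()` are hypothetical names, and testing only the owner execute bit is an assumption.

```c
#include <errno.h>
#include <stdbool.h>
#include <sys/stat.h>

/* Hypothetical stand-in for check_executable(): 0 if the path can be
 * stat()ed and has the owner execute bit set, otherwise a negative errno. */
static int check_executable_sketch(const char *path)
{
	struct stat st;

	if (stat(path, &st) != 0)
		return -errno;
	if (!(st.st_mode & S_IXUSR))
		return -ENOEXEC;
	return 0;
}

/* Exit codes 126 and 127 are ambiguous: the shell reports them for
 * "not executable" and "not found", but a script may return them too.
 * Only ignore the result if the script really is gone or non-executable. */
static bool script_should_be_ignored(const char *path, int exit_code)
{
	if (exit_code != 126 && exit_code != 127)
		return false;
	return check_executable_sketch(path) != 0;
}
```

With this shape, a script that deliberately exits 127 while still present and executable is reported as a normal failure rather than silently skipped.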
Rusty Russell
8993d6f523 eventscript: check_executable() to centralize stat/perm checks
This is used later in the "script vanished" check.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>


(This used to be ctdb commit 8ddb97040842375daf378cbb5816d0c2b031fa65)
2009-12-07 23:09:39 +10:30
Rusty Russell
949803528d talloc: save errno over talloc_free
As we start to use errno more, it's a huge pain if talloc_free() can blatt
it (esp. destructors).

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>


(This used to be ctdb commit 76a0ca77feba14e1e1162c195ffbdf516e62aa4d)
2009-12-07 23:05:58 +10:30
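Note on 949803528d: the idea of preserving errno across a free can be illustrated without talloc. The sketch below uses plain `free()` and a hypothetical destructor hook; `free_with_destructor()` and `noisy_destructor()` are illustrative names, not talloc API.

```c
#include <errno.h>
#include <stdlib.h>

typedef void (*destructor_fn)(void *ptr);

/* Release memory, running an optional destructor first, without letting
 * either step disturb the caller's errno. */
static void free_with_destructor(void *ptr, destructor_fn destructor)
{
	int saved_errno = errno;

	if (destructor != NULL)
		destructor(ptr);
	free(ptr);
	errno = saved_errno;
}

/* A destructor that clobbers errno, as a close() or unlink() inside a
 * real destructor might. */
static void noisy_destructor(void *ptr)
{
	(void)ptr;
	errno = EBADF;
}
```

A caller can then free on an error path and still report the errno of the call that actually failed.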
Rusty Russell
066a791770 eventscript: use -ETIME for timeout status value
This starts the move toward more expressive encoding of return values:
positive values mean the script ran, negative means we had a problem with
the script (and the value is the errno).

This does timeout, but changes the ctdb tool to recognize it.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>


(This used to be ctdb commit 0eb1d0aa14e68b598d9e281c8a02b8f94a042fd9)
2009-12-07 23:09:42 +10:30
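Note on 066a791770: the status convention described above (non-negative means the script ran, negative is an errno) could be decoded along these lines. The enum and function names are hypothetical; the -ETIME and -ENOEXEC meanings come from this commit and from b9b75bd065.

```c
#include <errno.h>

enum script_outcome {
	SCRIPT_OK,        /* script ran and exited 0 */
	SCRIPT_FAILED,    /* script ran and exited non-zero */
	SCRIPT_TIMED_OUT, /* status == -ETIME */
	SCRIPT_DISABLED,  /* status == -ENOEXEC */
	SCRIPT_ERROR      /* any other negative errno */
};

/* Map a raw status value onto one of the outcomes above. */
static enum script_outcome classify_status(int status)
{
	if (status == 0)
		return SCRIPT_OK;
	if (status > 0)
		return SCRIPT_FAILED;
	if (status == -ETIME)
		return SCRIPT_TIMED_OUT;
	if (status == -ENOEXEC)
		return SCRIPT_DISABLED;
	return SCRIPT_ERROR;
}
```

For the remaining negative values, a reporting tool could print `strerror(-status)`.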
Rusty Russell
85a6f4a4dd eventscript: marshall onto last_status immediately
This simplifies the code a little: last_status is now ready to go
(it's only used by the scriptstatus command at the moment).

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>


(This used to be ctdb commit 6be931266a4e41fd0253f760936ad9707dd97c47)
2009-12-07 23:09:40 +10:30
Ronnie Sahlberg
2c80c91c87 version 1.0.108
(This used to be ctdb commit fff280878e670e93a818c0071f3172056214e8c4)
2009-12-07 19:04:41 +11:00
Ronnie Sahlberg
cdabe16777 Use wbinfo --ping-dc instead of wbinfo -p since this is a more reliable way to determine if winbindd is in a useful state.
(This used to be ctdb commit 7c95e56ba871a4e0cb893a5cb5d821e7ff6e6dd6)
2009-12-07 18:27:46 +11:00
Michael Adam
3420278b3a packaging: package tests/bin/ctdb_transaction under /usr/share/doc/tests/bin
For testing/diagnostic purposes.

Michael

(This used to be ctdb commit b796d736946856abfbe53de95dfcd73072ee8ccd)
2009-12-04 23:18:12 +01:00
Michael Adam
98c108fa33 client: improve two error messages in ctdb_transaction_commit().
Michael

(This used to be ctdb commit d971b2ca84c0451dc7e5acbf4a5ade06270a2044)
2009-12-04 15:06:54 +01:00
Michael Adam
c1039fba0e server:trans2_commit: move the check for active recovery down.
This needs to be done after the control-dispatcher:
In the TRANS2_COMMIT control, the client->db_id needs
to be set before bailing out, since otherwise the
next TRANS2_COMMIT_RETRY will fail...

Michael

(This used to be ctdb commit 59faf3f923a5989b5ee94ef02a12827412775bae)
2009-12-04 15:03:21 +01:00
Michael Adam
cc7438d87d client: increase the number of commit retries 10-->100
To cope with timeouts when recoveries and transactions collide.
Maybe 100 is too high.

Michael

(This used to be ctdb commit c23d804165e84bdf95ba960c953c736d361011d7)
2009-12-04 15:03:16 +01:00
Michael Adam
b3fd495522 client: untangle checks and produce more detailed error messages
in ctdb_transaction_fetch_start

Michael

(This used to be ctdb commit 428914377851a98b3fc893798783fbfebffc1c0d)
2009-12-04 15:03:16 +01:00
Michael Adam
7afefed6ae client: increase the rsn of the __transaction_lock__ when storing
So that it is correctly handled by recoveries.
Also explicitly set the dmaster field to the current node's pnn.

Michael

(This used to be ctdb commit 03a5bb727b9db1ba952632f08ceb5355f0df842d)
2009-12-04 15:02:41 +01:00
Michael Adam
ffe62722cb recovery: add special pull-logic for persistent databases
The decision mechanism which records of a persistent db
are to be pulled into the recdb during recovery is now
as follows:

* Usually a record with the higher rsn than that already
  stored is taken. (Just as for normal tdbs.)

* If a transaction is running on some node, then that
  node's copies of all records are taken and are not
  overwritten later by other nodes' copies.

In order to keep track of whether a record's copy was obtained
from a node with a transaction running, the recovery mechanism
misuses the ctdb tdb header field 'lacount' in the recdb.
It is cleared later when pushing out the recdb database to the
other nodes.

This way, an incomplete transaction is not spoiled when
a recovery interrupts and the replay should usually succeed
(possibly after a few retries).

Michael

(This used to be ctdb commit 8aef46d2aab3efb322dda51eaa202653cefd5222)
2009-12-04 15:00:21 +01:00
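Note on ffe62722cb: the two pull rules above can be read as a single decision function. The following is only a sketch of that reading; `struct rec_header_sketch`, its `from_transaction_node` field (standing in for the reused 'lacount' marker) and `should_take_record()` are hypothetical, and the precedence between the rules is an assumption.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct rec_header_sketch {
	uint64_t rsn;
	bool from_transaction_node; /* stands in for the 'lacount' marker */
};

/* Decide whether an incoming copy of a persistent-db record should
 * replace what the recovery database already holds.
 * `stored` is NULL when the recdb has no copy yet. */
static bool should_take_record(const struct rec_header_sketch *stored,
			       uint64_t incoming_rsn,
			       bool source_in_transaction)
{
	if (stored == NULL)
		return true;
	/* Copies from a node with a running transaction always win... */
	if (source_in_transaction)
		return true;
	/* ...and are never overwritten by other nodes' copies. */
	if (stored->from_transaction_node)
		return false;
	/* Otherwise fall back to the usual "higher rsn wins" rule. */
	return incoming_rsn > stored->rsn;
}
```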
Michael Adam
0635f8b98f make ctdb_ctrl_transaction_active public.
Michael

(This used to be ctdb commit e5496a83ef4a01604195b27c4b97f50d4979510e)
2009-12-04 11:30:22 +01:00
Michael Adam
9a8134e862 recovery: for persistent db's don't set the dmaster to the recmaster node number
It is important to keep track of the dmaster (i.e. the node that last committed
a transaction containing changes to this node).

Michael

(This used to be ctdb commit fe68972eb9cf3aa1f16ba1aacf57ade5d66e647c)
2009-12-04 11:30:21 +01:00
Michael Adam
f96e8166de recovery: pass the persistent flag to recover_database()
and further down to pull_remote_database(), pull_one_remote_database(),
and push_recdb_database().

This is in preparation of special handling of persistent databases
during recoveries.

Michael

(This used to be ctdb commit 90abc4ac7c16e854cf6e8f96b60a77bc92e35e07)
2009-12-04 11:30:21 +01:00
Michael Adam
814e3c501f tests:ctdb_transaction: print extra counters when a commit fails
Michael

(This used to be ctdb commit 4113385865f53a57b18ea752a7dad8a08bed588e)
2009-12-04 11:30:21 +01:00
Michael Adam
27dc0adfb5 client: in catdb, print the keyname first, and separate records by a blank line
Michael

(This used to be ctdb commit b9882710e12f28c96a0af298e419160f00578241)
2009-12-04 11:30:21 +01:00
Michael Adam
f09090f9ba packaging: remove the lib/popt from the tarball in debian mode
Debian CTDB packaging fails when this is included.

Michael

(This used to be ctdb commit 574702f8d701fe3e493b31948420b2981eb36f93)
2009-12-04 11:30:21 +01:00
Michael Adam
522c60182e packaging: rework maketarball.sh to accept an arbitrary githash to pack
The githash can be specified through the environment variable "GITHASH"
that can contain a commit hash or a tag name, e.g.

The call syntax is now

[GITHASH=xyz] [USE_GITHASH=yes/no] [DEBIAN_MODE=yes/no] maketarball.sh

Michael

(This used to be ctdb commit 41aa9bdfa2934f564bdc14374362437dfad0045f)
2009-12-04 11:30:20 +01:00
Michael Adam
92c5d9eefc ctdb: add command "ctdb wipedb" to wipe the contents of an attached tdb
Michael

(This used to be ctdb commit 5a7c1e7f15693522bbf1c39a53be2304ece9a134)
2009-12-04 11:30:20 +01:00
Michael Adam
0213cb4d0b tests: turn printfs into DEBUG statements in the ctdb_transaction test
Michael

(This used to be ctdb commit 0e130d79ab71cf3aa65c40af91866823246a0283)
2009-12-04 11:30:20 +01:00
Martin Schwenke
7b6072b63d Merge branch 'status-test-2'
(This used to be ctdb commit 5fc297a6bd49d9366703eef3edb9bdf0fe8505cc)
2009-12-04 14:44:46 +11:00
Ronnie Sahlberg
e28c652cca Don't store debug level DEBUG_DEBUG in the in-memory ringbuffer.
It is unlikely we will need something this verbose for normal troubleshooting.
This allows us to keep a significantly longer time interval of log messages
in the 500k slots available in the ringbuffer.

(This used to be ctdb commit cc99c05c0c6484ad574039a454e6133852cb41fa)
2009-12-04 11:45:37 +11:00
Ronnie Sahlberg
8f442f1c0c Use statically allocated ringbuffer to store the last 500k log entries
in memory instead of dynamically allocated ones so that we reduce the pressure
on malloc/free.

(This used to be ctdb commit c5cbb95512f034abeec515579983bf7ac55eadd9)
2009-12-04 11:36:27 +11:00
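Note on 8f442f1c0c: a statically allocated ring buffer avoids any allocation per log call. A minimal sketch follows, with a tiny slot count for illustration (the commit speaks of 500k slots); the names, the entry layout and the message length are assumptions.

```c
#include <stddef.h>
#include <stdio.h>

#define LOG_SLOTS   8  /* illustrative; far smaller than the real buffer */
#define LOG_MSG_LEN 64

struct log_entry {
	int level;
	char msg[LOG_MSG_LEN];
};

/* Storage lives in static memory: no malloc/free per message. */
static struct log_entry log_ring[LOG_SLOTS];
static unsigned long log_written; /* total entries ever stored */

/* Store a message, overwriting the oldest entry once the ring is full. */
static void ring_log(int level, const char *msg)
{
	struct log_entry *e = &log_ring[log_written % LOG_SLOTS];

	e->level = level;
	snprintf(e->msg, sizeof(e->msg), "%s", msg);
	log_written++;
}

/* Return the i-th oldest retained message, or NULL if out of range. */
static const char *ring_get(unsigned long i)
{
	unsigned long retained =
		log_written < LOG_SLOTS ? log_written : LOG_SLOTS;
	unsigned long first = log_written - retained;

	if (i >= retained)
		return NULL;
	return log_ring[(first + i) % LOG_SLOTS].msg;
}
```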
Ronnie Sahlberg
daae501d91 Document the procedure to remove/change the NATGW configuration at
runtime without restarting the ctdb service

(This used to be ctdb commit 0a0526e03ef995b6b6634f5b75c7a17cb7b5df8f)
2009-12-04 08:33:56 +11:00
Rusty Russell
774bf144c1 eventscript: reduce code duplication for ending a script, and fix bug
Commit 50c2caed57c0 removed a gratuitous talloc_steal from the code in
ctdb_control_event_script_finished(), but not ctdb_event_script_timeout().

Easiest to call ctdb_control_event_script_finished() at the bottom of the
timeout routine.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>


(This used to be ctdb commit 17fa252d0d6981fbae8083a818f26d5ce9c5102e)
2009-12-02 16:15:57 +10:30
Ronnie Sahlberg
e56c5b2a67 lower the loglevel for the message that a client has attached to a persistent database
(This used to be ctdb commit 2027cf3881ba890648c543bacbfd5b06464efc10)
2009-12-02 14:53:21 +11:00
Ronnie Sahlberg
fab11acc65 lower the loglevel for the message that a client has attached through a domain socket
(This used to be ctdb commit de9e5236b20d70eac5ed29991703d6d25a103963)
2009-12-02 14:51:57 +11:00
Ronnie Sahlberg
6bad4a4836 Add a proper function to process a process-exist control in the daemon.
This control is only used by samba when samba wants to check if a subrecord held by a <node-id>:<smbd-pid> is still valid or if it can be reclaimed.

If the node is banned or stopped, we kill the smbd process and return that the process does not exist to the caller. This allows us to recover subrecords from stopped/banned nodes where smbd is hung waiting for the databases to thaw.

bz58185

(This used to be ctdb commit 157807af72ed4f7314afbc9c19756f9787b92c15)
2009-12-02 13:58:27 +11:00
Ronnie Sahlberg
1c7de7a2ed Add a doubly linked list to the ctdb_context to store a mapping between client pids and client structures.
Add the mapping to the list every time we accept() a new client connection
and set it up to remove in the destructor when the client structure is freed.

(This used to be ctdb commit f75d379377f5d4abbff2576ddc5d58d91dc53bf4)
2009-12-02 13:41:04 +11:00
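Note on 1c7de7a2ed: the pid-to-client mapping could be kept in a list shaped like the one below. This is a plain-C sketch with hypothetical names (`struct daemon_ctx`, `pid_map_add()`, `pid_map_remove()`, `pid_map_find()`); in the commit the removal is tied to the client structure's destructor, which is modelled here as an explicit call.

```c
#include <stdlib.h>
#include <sys/types.h>

struct client {
	int fd;
};

struct pid_map_entry {
	struct pid_map_entry *prev, *next;
	pid_t pid;
	struct client *client;
};

struct daemon_ctx {
	struct pid_map_entry *pid_list; /* head of the doubly linked list */
};

/* Called when a new client connection is accepted. */
static struct pid_map_entry *pid_map_add(struct daemon_ctx *ctx, pid_t pid,
					 struct client *client)
{
	struct pid_map_entry *e = calloc(1, sizeof(*e));

	if (e == NULL)
		return NULL;
	e->pid = pid;
	e->client = client;
	e->next = ctx->pid_list;
	if (e->next != NULL)
		e->next->prev = e;
	ctx->pid_list = e;
	return e;
}

/* Called when the client structure goes away. */
static void pid_map_remove(struct daemon_ctx *ctx, struct pid_map_entry *e)
{
	if (e->prev != NULL)
		e->prev->next = e->next;
	else
		ctx->pid_list = e->next;
	if (e->next != NULL)
		e->next->prev = e->prev;
	free(e);
}

/* Look up the client registered for a pid, or NULL if there is none. */
static struct client *pid_map_find(const struct daemon_ctx *ctx, pid_t pid)
{
	const struct pid_map_entry *e;

	for (e = ctx->pid_list; e != NULL; e = e->next)
		if (e->pid == pid)
			return e->client;
	return NULL;
}
```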
Ronnie Sahlberg
bf27dc2d53 Use the PID we pick up from the domain socket when a client connects
and store this in the client structure.

There is no need any more to rely on the hack where samba sends special
message handler registrations that encode the pid in the srvid.

This might not work on AIX since I recall some issues with getting the pid
this way on that platform.

(This used to be ctdb commit b4a7efa7e53e060a91dea0e8e57b116e2aeacebf)
2009-12-02 13:17:12 +11:00
Ronnie Sahlberg
2b4fbe5c41 version 1.0.107
(This used to be ctdb commit 22f00368b4cb3a6bfb92033a7dbe693d31b41a54)
2009-12-02 11:28:42 +11:00
Rusty Russell
9e84872ecd ctdb_io: fix use-after-free on invalid packets
Wolfgang saw a talloc complaint about using freed memory in ctdb_tcp_read_cb.
His fix was to remove the talloc_free() in that function, which causes
loops when a socket is closed (as it does not get removed from the event
system), eg:
	netcat 192.168.1.2 4379 < /dev/null

The real bug is that when we have more than one pending packet in the
queue, we loop calling the callback without any safeguards should that
callback free the queue (as it tends to do on invalid packets).  This
can be reproduced by sending more than one bogus packet at once:
	# Length word at start: 4 == empty packet (assumed little endian)
	/usr/bin/printf \\4\\0\\0\\0\\4\\0\\0\\0 > /tmp/pkt
	netcat 192.168.1.2 4379 < /tmp/pkt

Using a destructor we can check if the callback frees us, and exit
immediately.  Elsewhere, we return after the callback anyway.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

(This used to be ctdb commit 4d0523dd94fb07e860b3e8118691f93d1ef8d0fa)
2009-12-02 11:27:23 +11:00
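Note on 9e84872ecd: the "check whether the callback freed us" guard can be shown without talloc. In this sketch the free routine plays the role of the destructor and sets a flag that the delivery loop inspects after every callback. All names are hypothetical, and treating a length of 4 or less as invalid is an assumption based on the "empty packet" comment above.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

struct queue;
typedef void (*packet_cb)(struct queue *q, int length);

struct queue {
	packet_cb callback;
	bool *destroyed; /* non-NULL while a delivery loop is watching */
};

/* Plays the role of the destructor: tell any watcher that we are gone. */
static void queue_free(struct queue *q)
{
	if (q->destroyed != NULL)
		*q->destroyed = true;
	free(q);
}

/* Callback that tears the queue down when it sees a bogus packet. */
static void reject_empty_packet(struct queue *q, int length)
{
	if (length <= 4)
		queue_free(q);
}

/* Deliver every pending packet, stopping at once if the callback freed
 * the queue. Returns the number of packets handed to the callback. */
static size_t deliver_all(const int *lengths, size_t count)
{
	struct queue *q = malloc(sizeof(*q));
	bool destroyed = false;
	size_t delivered = 0;
	size_t i;

	if (q == NULL)
		return 0;
	q->callback = reject_empty_packet;
	q->destroyed = &destroyed;

	for (i = 0; i < count; i++) {
		q->callback(q, lengths[i]);
		delivered++;
		if (destroyed)
			return delivered; /* q must not be touched again */
	}
	q->destroyed = NULL;
	queue_free(q);
	return delivered;
}
```

Without the flag check, the second bogus packet in the reproduction above would be delivered through a queue that no longer exists.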
Ronnie Sahlberg
6f045cad29 version 1.0.106
(This used to be ctdb commit b5a21fd39269a6e2a9d1c8182dd42a1773ccbb3f)
2009-12-02 11:26:51 +11:00
Martin Schwenke
b17bf38c64 Eventscripts: Fix syntax error in 00.ctdb.
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 9ea261f791ab919eb1ce5b37073b4f1d30694bb8)
2009-12-01 18:08:57 +11:00
Michael Adam
016d092169 packaging:maketarball.sh: add a DEBIAN_MODE to the tarball creation
It is triggered by setting DEBIAN_MODE=yes in the environment.
This creates a tarball suitable for use in debian packages.
The differences from the standard tarball are these:

* The tar ball file is called ctdb_VERSION.orig.tar.gz
* The base directory in the tar ball is ctdb-VERSION.orig/

Michael

(This used to be ctdb commit 83e7c161efa93cd7acdfc803142b4fb3bfde7538)
2009-12-01 18:02:20 +11:00
Michael Adam
15bd5fb8e7 configure:maketarball.sh: call autogen.sh and include configure in the tarball
Michael

(This used to be ctdb commit bc8aee079e09164e06533a1474f5e9d899795933)
2009-12-01 18:02:05 +11:00
Michael Adam
7430da3839 packaging:maketarball.sh: create the specfile from the ctdb.spec.in
Michael

(This used to be ctdb commit bb8d02abd88899d259085b9b23fa52accb222be9)
2009-12-01 18:01:46 +11:00
Martin Schwenke
50a26cf75e Eventscripts: Remove executable bit accidentally set on some scripts.
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 4c6e68ae942c05224c5f8b683fbc2dc1adced8ee)
2009-12-01 17:54:45 +11:00
Martin Schwenke
db25ca69e5 Eventscript argument cleanups and introduction of ctdb_standard_event_handler.
The functions file no longer causes a side-effect by doing a shift.
It also doesn't set a convenience variable for $1.

All eventscripts now explicitly use "$1" in their case statement, as
does the initscript.  The absence of a shift means that the
takeip/releaseip events now explicitly reference $2-$4 rather than
$1-$3.

New function ctdb_standard_event_handler handles the status and
setstatus events, and exits for either of those events.  It is called
via a default case in each eventscript, replacing an explicit status
case where applicable.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 3d55408cbbb3bb71670b80f3dad5639ea0be5b5b)
2009-12-01 17:43:47 +11:00
Ronnie Sahlberg
2000711cb1 when we detect an ip-allocation mismatch, just force a new ip reassignment
instead of a full blown recovery

(This used to be ctdb commit 4f50aa8bb8be544058523f2f544109a26c2b3b51)
2009-12-01 16:06:59 +11:00
Ronnie Sahlberg
698a0e4e9a When starting up ctdbd, wait until all initial recoveries have finished
and until we have gone through a full re-recovery timeout without triggering
any pending recoveries before we start up the services and start monitoring
the node.

(This used to be ctdb commit 821333afb458358f90446062b0242790695e5060)
2009-12-01 13:19:58 +11:00
Ronnie Sahlberg
569001afd0 Merge commit 'martins/status-test-2'
Conflicts:

	server/eventscript.c

(This used to be ctdb commit e9b3477a5b9a2eff18f727e7d59338bfb5214793)
2009-12-01 10:53:18 +11:00
Martin Schwenke
ad431c3520 Event scripts: functions file now intercepts status and setstatus.
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit a1f37fdc5217e57d2d643d77a811afca747685e0)
2009-11-27 15:57:33 +11:00
Ronnie Sahlberg
3bc643b46b remove a stray ) so we compile
(This used to be ctdb commit 16db4882635d84b8410a77e2ea8b08d0a257b0ab)
2009-11-27 13:35:39 +11:00
Ronnie Sahlberg
266a163c89 don't use talloc_steal() on an object that is already a child of ctdb.
(This used to be ctdb commit 50c2caed57c0520f506eaaeeb0bba2c272da6ef6)
2009-11-27 13:28:31 +11:00
Ronnie Sahlberg
eaa6218def Merge commit 'martins/status-test' into status-test-2
(This used to be ctdb commit 937823cc73eb098230acff4b1583f6d01f26c21a)
2009-11-27 12:50:45 +11:00
Martin Schwenke
dc2c8dfde1 Merge commit 'martins-svart/status-test-2' into status-test
(This used to be ctdb commit 0e6c06ac38fd82adf124d111717502055501974a)
2009-11-27 12:49:31 +11:00
Martin Schwenke
ce06d3de46 Event script infrastructure: add reload event to check_options().
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit c278c798d41a35f58ca81f8f0e08e4dab51eba9b)
2009-11-27 12:04:02 +11:00
Ronnie Sahlberg
09b9bb2f9f Merge commit 'martins/status-test' into status-test-2
(This used to be ctdb commit 28d0648725e7de4e4d0e8569e3fbfb0fa1d7f934)
2009-11-26 16:26:25 +11:00
Martin Schwenke
88cd194d6a Merge commit 'martins-svart/status-test-2' into status-test
(This used to be ctdb commit 143f1fa3cc4588505e3992c601153ea08be8432d)
2009-11-26 16:25:15 +11:00
Martin Schwenke
a64ccf07c1 Add flag to ctdb_event_script_callback indicating when called by client.
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit a1d654a982ca56fade82552f4e6b5586236d3233)
2009-11-26 15:49:49 +11:00
Ronnie Sahlberg
ed4f3ea3cc resolve some conflicts from merging from martins branch
(This used to be ctdb commit d3e7407dc9854ec358d081777c5450ec68b17862)
2009-11-26 13:42:12 +11:00
Ronnie Sahlberg
e17fa0fdee change the lock wait child handling to use a pipe instead of a socketpair
remove a stray alarm(30) that caused databases to be unlocked after 30 seconds.

(This used to be ctdb commit 12b187f971d857353403393a9850503e0e558672)
2009-11-26 12:08:35 +11:00
Martin Schwenke
8029db6a91 Merge commit 'martins-svart/status-test-2' into status-test
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit a2830594ebeb54eb51ff90999cb12370aeec6e8b)
2009-11-26 10:49:47 +11:00
Martin Schwenke
ece15620c0 Event scripts: use $script_name rather than $service name for status.
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 517e9d9b188b18dffc712a8fecddb41540d27b8d)
2009-11-25 16:42:14 +11:00
Martin Schwenke
ee10ea202b Event scripts: Respect CTDB_MANAGES_NFS and add function log_status_cat.
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 5d97c07be13a8209a81dfc8f73e49371949e4dc3)
2009-11-25 16:34:49 +11:00
Martin Schwenke
1edcb89948 More eventscript cleanups. Initial smoke testing seems OK.
Apart from lots of cleanup work, this also fixes a bug where the share
checks did not cope with directory names containing spaces.
The previous commit also loaded the config incorrectly.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 3c93336ab92c2e4829ff4dc360045bfa6df21d50)
2009-11-25 16:30:47 +11:00
Ronnie Sahlberg
926261aafc use a binary tree and sort all ipv4/v6 addresses before we assign them out on nodes.
(This used to be ctdb commit 862526e558099fad4c8259cb88da9b776aa7f80d)
2009-11-25 11:54:40 +11:00
Rusty Russell
3188df4a88 eventscript: check that ctdb forced script events correct
Now we're doing checking, we might as well make sure the commands from
"ctdb eventscripts" are valid.

This gets rid of the "UNKNOWN" event type.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>


(This used to be ctdb commit 1d24a3869fe89fc9a109fd9e9b69df5fc665a5f6)
2009-11-25 11:02:29 +10:30
Ronnie Sahlberg
cd44c8b4e5 It is better to plainly disallow clients from connecting here
if the node is BANNED.
Don't even let them attach at all
to the database

Revert "temporarily try allowing clients to attach to databases even if
the node is banned/stopped or inactive in any other way."

This reverts commit 227fe99f105bdc3a4f1000f238cbe3adeb3f22f0.

(This used to be ctdb commit 10a3680fb3917ecafc824e73872eace321026172)
2009-11-25 08:03:42 +11:00
Martin Schwenke
1c7445d547 Merge commit 'origin/status-test' into status-test
(This used to be ctdb commit 2e60749de3714239224cc04170a9aeeee158153f)
2009-11-24 16:14:54 +11:00
Rusty Russell
ff59bb34af eventscript: check that ctdb forced script events correct
Now we're doing checking, we might as well make sure the commands from
"ctdb eventscripts" are valid.

This gets rid of the "UNKNOWN" event type.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>


(This used to be ctdb commit 66b22980b14601f29fe8cc64bd8f29883c7ca1c0)
2009-11-24 11:24:22 +10:30
Rusty Russell
0b4b83aea0 eventscript: check that internal script events are being invoked correctly
This is not as good as a compile-time check, but at least we check that the
number of arguments is correct.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>


(This used to be ctdb commit 83b7b233cb4707e826f6ba260bd630c8bc8f1e76)
2009-11-24 11:23:13 +10:30
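Note on 0b4b83aea0: a run-time argument-count check of this kind might be as simple as counting whitespace-separated words in the options string. The helper names below are hypothetical, and the three-argument takeip example is only an illustration.

```c
#include <ctype.h>
#include <stdbool.h>

/* Count whitespace-separated arguments in an options string. */
static unsigned count_args(const char *options)
{
	unsigned n = 0;
	bool in_word = false;

	for (; *options != '\0'; options++) {
		if (isspace((unsigned char)*options)) {
			in_word = false;
		} else if (!in_word) {
			in_word = true;
			n++;
		}
	}
	return n;
}

/* True if the options string carries exactly the expected number of
 * arguments for the event being invoked. */
static bool check_args(unsigned expected, const char *options)
{
	return count_args(options) == expected;
}
```

For example, an event expecting three extra arguments would accept "eth0 10.0.0.1 24" and reject "eth0 10.0.0.1".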
Rusty Russell
187efa08ab eventscript: check that internal script events are being invoked correctly
This is not as good as a compile-time check, but at least we check that the
number of arguments is correct.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>


(This used to be ctdb commit a6d353519932eee48f9241ad8887b692882906c9)
2009-11-24 11:23:13 +10:30
Rusty Russell
534c709cba eventscript: remove call name from state->options
Finally, we remove the call name (eg. "monitor" or "start") from the
options field of the struct: it now contains only extra options.

This is clearer, and mainly involves adding some %s to debug statements.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>


(This used to be ctdb commit 33fb0e7ba047ca73969b59bccf70a04a17c25a0a)
2009-11-24 11:22:46 +10:30
Rusty Russell
0ef91a4e1f eventscript: remove call name from state->options
Finally, we remove the call name (eg. "monitor" or "start") from the
options field of the struct: it now contains only extra options.

This is clearer, and mainly involves adding some %s to debug statements.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>


(This used to be ctdb commit b0648c7f08eba87ec3c9714e2525c9b621bfb4ef)
2009-11-24 11:22:46 +10:30
Rusty Russell
461f52736d eventscript: put call type into state struct.
This means we can get rid of more strcmp; they can simply use the
state->call value instead.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>


(This used to be ctdb commit 6c79fa33e26cc4f0873577f8e122b1495b4c427e)
2009-11-24 11:19:58 +10:30
Rusty Russell
205011cb61 eventscript: put call type into state struct.
This means we can get rid of more strcmp; they can simply use the
state->call value instead.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>


(This used to be ctdb commit 834c93b3e1b8f4151b8a2cd82c2dd8bacc17f66c)
2009-11-24 11:19:58 +10:30
Rusty Russell
2d9254404d eventscript: introduce enum for different event script calls.
Rather than doing strcmp everywhere, pass an explicit enum around.  This
also subtly documents what options are available.  The "options" arg
is now used for extra arguments only.

Unfortunately, gcc complains on empty format strings, so we make
ctdb_event_script() take no varargs, and add ctdb_event_script_args().  We
leave ctdb_event_script_callback() taking varargs, which means callers
have to do "%s", "".

For the moment, we have CTDB_EVENT_UNKNOWN for handling forced scripts
from the ctdb tool.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>


(This used to be ctdb commit 8001488be4f2beb25e943fe01b2afc2e8779930d)
2009-11-24 11:16:49 +10:30
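Note on 2d9254404d: replacing scattered strcmp calls with an enum usually means converting the name once at the boundary and passing the enum around afterwards. A minimal sketch, with a hypothetical and deliberately incomplete set of events (the names are taken from events mentioned elsewhere in this log):

```c
#include <stddef.h>
#include <string.h>

enum event_call {
	EVENT_MONITOR,
	EVENT_TAKEIP,
	EVENT_RELEASEIP,
	EVENT_RELOAD,
	EVENT_UNKNOWN /* doubles as the size of the name table */
};

static const char *const event_names[EVENT_UNKNOWN] = {
	[EVENT_MONITOR]   = "monitor",
	[EVENT_TAKEIP]    = "takeip",
	[EVENT_RELEASEIP] = "releaseip",
	[EVENT_RELOAD]    = "reload",
};

/* The only place a string comparison is needed: parsing external input. */
static enum event_call event_from_name(const char *name)
{
	int i;

	for (i = 0; i < EVENT_UNKNOWN; i++)
		if (strcmp(name, event_names[i]) == 0)
			return (enum event_call)i;
	return EVENT_UNKNOWN;
}

/* Convert back to a string for logging and for the script command line. */
static const char *event_name(enum event_call call)
{
	return call < EVENT_UNKNOWN ? event_names[call] : "unknown";
}
```

Internal callers can then write `call == EVENT_MONITOR`, and a `switch` over the enum lets the compiler warn about unhandled events.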
Rusty Russell
e0c6e2f489 eventscript: introduce enum for different event script calls.
Rather than doing strcmp everywhere, pass an explicit enum around.  This
also subtly documents what options are available.  The "options" arg
is now used for extra arguments only.

Unfortunately, gcc complains on empty format strings, so we make
ctdb_event_script() take no varargs, and add ctdb_event_script_args().  We
leave ctdb_event_script_callback() taking varargs, which means callers
have to do "%s", "".

For the moment, we have CTDB_EVENT_UNKNOWN for handling forced scripts
from the ctdb tool.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>


(This used to be ctdb commit 470822b329f9d3ca9bef518b56e9ce28d5fedda2)
2009-11-24 11:16:49 +10:30
Rusty Russell
2763df22de eventscript: put timeout inside ctdb_event_script_callback_v
Everyone uses the same timeout value, so just remove it from the API.
If we ever need variable timeouts, that might as well be central too.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>


(This used to be ctdb commit 533c3e053293941d2a9484b495e78d45f478bb08)
2009-11-24 11:09:46 +10:30
Rusty Russell
5dee5769d3 eventscript: put timeout inside ctdb_event_script_callback_v
Everyone uses the same timeout value, so just remove it from the API.
If we ever need variable timeouts, that might as well be central too.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>


(This used to be ctdb commit fe8027309c1f7b987cd368fa98f9b28741baa786)
2009-11-24 11:09:46 +10:30
Rusty Russell
3845c6e5b8 eventscript: cleanup ctdb_event_script_v
ctdb_event_script_v doesn't take varargs.  ctdb_run_event_script is
a better name, and fix comment.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>


(This used to be ctdb commit 466beafadb37011fe273de8810ab0012e92a1fd8)
2009-11-24 11:09:01 +10:30
Rusty Russell
1d68bb35b2 eventscript: typo cleanups
1) ctdb_event_script_v doesn't take varargs.  ctdb_run_event_script is
   a better name, and fix comment.
2) Fix indentation on allowed_scripts.
3) Comment on run_eventscripts_callback is wrong; it's the callback
   for any ctdb forced event.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>


(This used to be ctdb commit e7d57d7ae678b24dab3364a348838c6a3398942c)
2009-11-24 11:08:39 +10:30
Rusty Russell
ab675516cc eventscript: fix bug in timeouts on forced eventscripts. Again.
In 15bc66ae801b0c69, Ronnie fixed a double-free race.  The problem was that
ctdb_run_eventscripts() hands a context to ctdb_event_script_callback() to
hang its data off, which gets freed in the callback.  This particularly
hurt in ctdb_event_script_timeout.

There's nothing wrong with this, but obviously we should make the callback
call last of all.  At the time, ctdb_event_script_timeout() carefully
extracted everything from the struct ctdb_event_script_state before
calling ->callback.

This was cleaned up in 64da4402c6ad485f (Ronnie again), and now state
was referred to after the callback again.  But the same change introduced
a direct use-after-free bug which caused an occasional oops.

So in our last episode (eda052101728cf92) Volker fixed this, and Michael
committed it.

But we still have the double free bug which 15bc66ae801b0c69 was supposed
to fix!  Let's try to fix this in a more permanent way, by always doing
the callback from the destructor.  This means we need to hold the status,
and don't send the KILL signal if ->child is set to 0.

Finally, add a comment about freeing ourselves in run_eventscripts_callback
and the structure definition.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>


(This used to be ctdb commit b90bdb07c1f6913ddbf11bde9684bdc8af61c549)
2009-11-24 11:06:53 +10:30
Rusty Russell
0339a83897 eventscript: fix bug in timeouts on forced eventscripts. Again.
In 15bc66ae801b0c69, Ronnie fixed a double-free race.  The problem was that
ctdb_run_eventscripts() hands a context to ctdb_event_script_callback() to
hang its data off, which gets freed in the callback.  This particularly
hurt in ctdb_event_script_timeout.

There's nothing wrong with this, but obviously we should make the callback
call last of all.  At the time, ctdb_event_script_timeout() carefully
extracted everything from the struct ctdb_event_script_state before
calling ->callback.

This was cleaned up in 64da4402c6ad485f (Ronnie again), and now state
was referred to after the callback again.  But the same change introduced
a direct use-after-free bug which caused an occasional oops.

So in our last episode (eda052101728cf92) Volker fixed this, and Michael
committed it.

But we still have the double free bug which 15bc66ae801b0c69 was supposed
to fix!  Let's try to fix this in a more permanent way, by always doing
the callback from the destructor.  This means we need to hold the status,
and don't send the KILL signal if ->child is set to 0.

Finally, add a comment about freeing ourselves in run_eventscripts_callback
and the structure definition.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>


(This used to be ctdb commit 20b15de068d042b292725945927ceda1b01d07c0)
2009-11-24 11:06:53 +10:30
Rusty Russell
8723045c61 eventscript: clean up forked handler event code
Write the whole int through the pipe, rather than quietly cutting it
off.  Also, use -2 as the result if the read fails; -1 comes from many
paths if the child fails before running the script.

Add a comment about why we don't need to check the write.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>


(This used to be ctdb commit 6804f880436645b52c09a78fa300377fa8058d0e)
2009-11-24 11:00:13 +10:30
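Note on 8723045c61: passing the complete int through the pipe, and reserving -2 for a failed read, could look like this. The function names are hypothetical; the round-trip helper exists only to exercise both ends within one process.

```c
#include <unistd.h>

/* Child side: send the complete int, not a truncated byte. */
static int send_status(int fd, int status)
{
	ssize_t n = write(fd, &status, sizeof(status));

	return n == (ssize_t)sizeof(status) ? 0 : -1;
}

/* Parent side: -2 marks a failed or short read, so it cannot be
 * confused with the -1 a child may report before running the script. */
static int recv_status(int fd)
{
	int status;
	ssize_t n = read(fd, &status, sizeof(status));

	if (n != (ssize_t)sizeof(status))
		return -2;
	return status;
}

/* Push one status value through a fresh pipe and read it back. */
static int roundtrip_status(int status)
{
	int fds[2];
	int got;

	if (pipe(fds) != 0)
		return -2;
	(void)send_status(fds[1], status);
	got = recv_status(fds[0]);
	close(fds[0]);
	close(fds[1]);
	return got;
}
```

A value such as 300 survives the round trip intact, which would not be the case if only a single byte were written.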
Ronnie Sahlberg
e6b69fa760 rework and simplify the eventscript handling
This version has no trailing whitespace, and fixed 


(This used to be ctdb commit defbe318152fc479e8076ad70433cdb4971951af)
2009-11-25 11:00:11 +10:30
Rusty Russell
b320d434b2 eventscript: clean up forked handler event code
Write the whole int through the pipe, rather than quietly cutting it
off.  Also, use -2 as the result if the read fails; -1 comes from many
paths if the child fails before running the script.

Add a comment about why we don't need to check the write.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>


(This used to be ctdb commit c715746c2f40eb9b21dbf011d16f1f1b0b53fdf9)
2009-11-24 11:00:13 +10:30
Ronnie Sahlberg
a3d072049e reduce the log level for three vacuuming related log messages
(This used to be ctdb commit fbc453733d53359b9eba34a7ca9123237a7ecca5)
2009-11-24 09:27:22 +11:00
Ronnie Sahlberg
eb3b787394 rework and simplify the eventscript handling
(This used to be ctdb commit c5f798116bf3b7954e23c7267b056ee1f5560f45)
2009-11-24 07:40:51 +11:00
Martin Schwenke
d595f41f38 More eventscript cleanups. Initial smoke testing seems OK.
Apart from lots of cleanup work, this also fixes a bug where the share
checks did not cope with directory names containing spaces.
The previous commit also loaded the config incorrectly.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 35a60a63a9b5c7d98dde514ae552239506b691c9)
2009-11-20 16:45:36 +11:00
Martin Schwenke
a4a048b5cd Now vaguely tested initscript updates.
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit f1e350f9edb74cc44b6c5be4c062fd93e98ba8c4)
2009-11-19 16:48:19 +11:00
Martin Schwenke
ee513c1ba2 More untested eventscript factorisation.
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit ac655b0a65b32d809d47fec9821f7f31bb2fe2a7)
2009-11-19 15:00:17 +11:00
Martin Schwenke
4ea6069de4 Test suite: Make the CIFS tickle test wait until it sees the required tickle.
The test depended on the exit code of "ctdb gettickles", which always
succeeds.  This change wraps the command in a function that checks
whether the tickle we're interested in is registered.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit c4b05a731e1bee8f5b46529773a4f5389b2b6064)
2009-11-19 14:54:05 +11:00
Ronnie Sahlberg
894a2f9c0b new version 1.0.105
(This used to be ctdb commit 5fdf842db09cd806248cdbdce2270f39ed213872)
2009-11-19 11:08:14 +11:00
Ronnie Sahlberg
ae209c74c8 don't reset the event script context every time we start a new "ctdb eventscript ..."
command.
Use the existing context used for non-monitor events

Multiple concurrent uses of "ctdb eventscript ..." could otherwise lead to a SEGV

(This used to be ctdb commit 80a8d728e9680040e00d24361dfc9367dd372a56)
2009-11-19 11:03:51 +11:00
Ronnie Sahlberg
cc2d81a77c make the ringbuffer logging more efficient and marshall the data by writing to a tmpfile instead of continuously talloc resizing a blob
(This used to be ctdb commit 6427f0b68d60b556a023f64e15e156000ba6f943)
2009-11-18 19:10:50 +11:00
Ronnie Sahlberg
bc2675119d add an in-memory ringbuffer where we store the last 500000 log entries regardless of log level.
add commands to extract this in-memory buffer and to clear it

(This used to be ctdb commit 29d2ee8d9c6c6f36b2334480f646d6db209f370e)
2009-11-18 12:44:18 +11:00
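The in-memory ringbuffer described in commit bc2675119d can be illustrated with a minimal sketch. This is not the ctdb implementation: the struct, the function names, and the tiny capacity are hypothetical, chosen only to show how a fixed-size buffer keeps the most recent entries regardless of log level and overwrites the oldest once full.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical sizes; the commit message mentions 500000 entries. */
#define RING_CAPACITY 4
#define ENTRY_MAX 128

struct log_ring {
    char entries[RING_CAPACITY][ENTRY_MAX];
    int levels[RING_CAPACITY];
    size_t next;   /* slot the next entry will be written to */
    size_t count;  /* number of valid entries, at most RING_CAPACITY */
};

static void ring_init(struct log_ring *r)
{
    memset(r, 0, sizeof(*r));
}

/* Store an entry whatever its level; overwrite the oldest one when full. */
static void ring_add(struct log_ring *r, int level, const char *msg)
{
    assert(msg != NULL);
    snprintf(r->entries[r->next], ENTRY_MAX, "%s", msg);
    r->levels[r->next] = level;
    r->next = (r->next + 1) % RING_CAPACITY;
    if (r->count < RING_CAPACITY) {
        r->count++;
    }
}

/* Return the i-th oldest retained entry, or NULL if i is out of range. */
static const char *ring_get(const struct log_ring *r, size_t i)
{
    if (i >= r->count) {
        return NULL;
    }
    size_t start = (r->next + RING_CAPACITY - r->count) % RING_CAPACITY;
    return r->entries[(start + i) % RING_CAPACITY];
}

/* Drop all retained entries. */
static void ring_clear(struct log_ring *r)
{
    r->next = 0;
    r->count = 0;
}
```

Extracting the buffer then amounts to calling ring_get() for indices 0 up to count - 1, and clearing it is a reset of the two counters.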
Ronnie Sahlberg
24c593d21f create a new event context for the syslog daemon
(This used to be ctdb commit 354c0edacf2d6cec5b295e139d4fec618bad1b06)
2009-11-17 12:07:10 +11:00
Ronnie Sahlberg
61de178e0a set up a pipe between the main daemon and the child we use for syslogging so that we can clean up the child process when we stop ctdbd
(This used to be ctdb commit cb8df973ccd446d87fbdd9a27843e54841ba5d89)
2009-11-16 15:17:32 +11:00
Martin Schwenke
73cb65bf1a Eventscripts: Untested factorisations and introduction of status event.
This is the first stage of an experimental change to eventscripts.
Ronnie and I did a few hours of factorisation of 40.vsftpd and applied
many of the changes to 41.httpd.  Other eventscripts were also
modified.

At this stage this is completely untested.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 364e70b763f0ccd7714d15723ad3ea4d7e2968a1)
2009-11-13 18:28:25 +11:00
Ronnie Sahlberg
93d902e8f7 test of a change to make ctdbd use "status" event instead of the "monitor" event.
This allows running the actual monitoring asynchronously from ctdbd
and only using "status" to pick up the actual results.

(This used to be ctdb commit 1908bac812650ca25151051f5d86815e0b8ed319)
2009-11-13 12:37:55 +11:00
Ronnie Sahlberg
2861bbdd5a Merge commit 'martins/master'
(This used to be ctdb commit b6bde176af69354ccfb00e6a3169f6b355a59d15)
2009-11-13 12:25:31 +11:00
Martin Schwenke
386d23757b Test suite: Fix the NFS and CIFS tickle tests.
The NFS test sleeps for MonitorInterval to give CTDB time to record an
NFS tickle.  However, this isn't always long enough.  This changes the
test to wait until a monitor event has actually occurred.

The CIFS test assumes that Samba is able to register a tickle with
CTDB before it notices that netstat has registered the tickle and can
use onnode to ask CTDB about it.  That is an incorrect assumption -
sometimes we can get to the point of asking CTDB about the tickle
before Samba and CTDB have processed it.  This adds a timeout loop
that makes the CIFS test wait until the tickle has been registered or
fail after 10 seconds.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 20a9d35933d89dc7eb710075f360686a49d78609)
2009-11-13 09:44:34 +11:00
Martin Schwenke
9dabb86f3f Merge commit 'origin/master'
(This used to be ctdb commit ffb911896704ddf6bd5a66e43ba2ae8c382e68de)
2009-11-11 12:16:30 +11:00
Mathieu Parent
2a66b7dae4 Fix bashism in events.d/11.natgw
Signed-off-by: Michael Adam <obnox@samba.org>

(This used to be ctdb commit 6ccb495d1110157c06596763c7e252f3182c251e)
2009-11-10 12:07:30 +01:00
Ronnie Sahlberg
14a6592511 version 1.0.104
(This used to be ctdb commit 5e13a25df5ccf184bd48595c99765a592bbc5969)
2009-11-06 11:16:05 +11:00
Ronnie Sahlberg
3cbaf935af suggestion from metze,
use killtcp and kill both directions of the nfs connections.
we used to kill only one direction since the other direction was unkillable
but recent kernels allow us to kill both

(This used to be ctdb commit 8001ae580bcc28d45f6026b529d7ffc247cbba34)
2009-11-06 09:54:03 +11:00
Ronnie Sahlberg
f88fbb5f1e suggestion from Christian,
don't allow UNHEALTHY nodes to become natgw master, unless all nodes
are unhealthy

(This used to be ctdb commit e8e7129ff1371065fbd75e1aea844d6d04a96fa9)
2009-11-06 08:19:32 +11:00
Volker Lendecke
1fa1830f81 Fix a segfault in the eventscript timeout handler.
The state was freed too early.

Signed-off-by: Michael Adam <obnox@samba.org>

(This used to be ctdb commit eda052101728cf922ce892e3c53b4f37e7ceac42)
2009-11-05 11:13:53 +01:00
Michael Adam
85a4d9a943 ctdb.sysconfig: add a comment section about CTDB_RUN_TIMEOUT_MONITOR
Michael

(This used to be ctdb commit b7dc1e0720991cc65353e07cf87608acea21ba27)
2009-11-05 11:13:53 +01:00
Michael Adam
95333e0ee7 Add a 99.timeout event script to trigger monitor timeouts.
This just sleeps for twice the value of EventScriptTimeout
in the monitor action. It is not run by default, but
can be activated by setting CTDB_RUN_TIMEOUT_MONITOR
in /etc/sysconfig/ctdb .

Michael

(This used to be ctdb commit 1a3ecdee85b82bb3234a92ae6bcdeb92238eb7ee)
2009-11-05 11:13:47 +01:00
Ronnie Sahlberg
d8f7fd88ac don't use the pointer after it has been talloc_free()d.
(This used to be ctdb commit 1cbf06a126621b3e932925cdad2ef9c009f93d4e)
2009-11-05 16:07:23 +11:00
Ronnie Sahlberg
0d3bff5fa6 From Rusty
It's much nicer for post-mortem debugging to have a body to examine.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

(This used to be ctdb commit 058e21d96c3c02759833fd5ddfe7b43e6a5f5740)
2009-11-05 15:57:46 +11:00
Ronnie Sahlberg
c915f2e5d5 add an extra test for the bond devices and check that there is an active slave.
this handles the case where all links do have a physical layer, but where all slaves have been disabled using ifdown

(This used to be ctdb commit bf50709630df000583f2b0ef0edc177c01d60eaf)
2009-11-05 12:12:06 +11:00
Ronnie Sahlberg
2501638e15 don't verify winbindd is running properly at startup
(This used to be ctdb commit 9e1b99221c8f257129641f6eda2795537b7ce9de)
2009-11-04 07:50:26 +11:00
Ronnie Sahlberg
666d1d019b new version 1.0.103
(This used to be ctdb commit 020e2e30e56b9675f345ee62d6bf585396208059)
2009-11-03 11:46:37 +11:00
Ronnie Sahlberg
4bf4e15379 move the check to skip vacuuming on persistent database to the ctdb_vacuuming_init() function
(This used to be ctdb commit fb83dba255fc91413a475b273e374e0c4d538137)
2009-11-03 10:48:27 +11:00
Michael Adam
e38dda00e7 packaging: use githash in rpm release by default.
setting USE_GITHASH=no in the environment makes
makerpms.sh omit the git hash

Michael

(This used to be ctdb commit 209ff041596e39688186c99995863ed3e816b8e4)
2009-11-03 00:16:28 +01:00
Michael Adam
fe9929165f server: disable vacuuming for persistent tdbs.
The vacuum process treats persistent databases the same as
non-persistent and thus ignores the extra state for transactions.
This way, it breaks the api-level transactions.

Michael

(This used to be ctdb commit f98fefbc566eefbfcc660646af6e25256ab82b13)
2009-11-03 00:16:28 +01:00
Michael Adam
c532347a45 client: randomize the transaction_start retry loop:
instead of sleeping 1 second, sleep between 1 and 100 milliseconds

Michael

(This used to be ctdb commit a5d90d8ed8b44355c4ffb9c32ded772025fcc174)
2009-10-30 22:02:21 +11:00
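The retry behaviour described in commit c532347a45, combined with the lock-before-check ordering from commit 118185670d, can be sketched roughly as follows. This is an illustration under stated assumptions, not the ctdb client code: struct txn_db, db_lock(), db_unlock(), commit_in_progress() and transaction_start_sketch() are hypothetical stand-ins, and the "commit in progress" state is simulated with a counter.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical stand-in for a database handle. */
struct txn_db {
    bool locked;
    int commits_pending; /* simulated: attempts that still see a commit in progress */
};

static void db_lock(struct txn_db *db)   { db->locked = true; }
static void db_unlock(struct txn_db *db) { db->locked = false; }

/* Simulated check; a real client would inspect state stored in the database. */
static bool commit_in_progress(struct txn_db *db)
{
    if (db->commits_pending > 0) {
        db->commits_pending--;
        return true;
    }
    return false;
}

/* Random delay between 1 and 100 milliseconds, as in the commit message. */
static long retry_delay_ms(void)
{
    return 1 + rand() % 100;
}

static void sleep_ms(long ms)
{
    struct timespec ts = { .tv_sec = ms / 1000, .tv_nsec = (ms % 1000) * 1000000L };
    nanosleep(&ts, NULL);
}

/* Returns the number of retries; the lock is held on return. */
static int transaction_start_sketch(struct txn_db *db)
{
    int retries = 0;

    for (;;) {
        db_lock(db);                 /* take the lock first ... */
        if (!commit_in_progress(db)) {
            return retries;          /* ... then check; no gap between the two */
        }
        db_unlock(db);               /* a commit is running: back off and retry */
        sleep_ms(retry_delay_ms());
        retries++;
    }
}
```

Randomizing the delay means that several clients that back off at the same moment are less likely to retry at the same moment, and the shorter sleep lets a waiting client proceed soon after the commit finishes.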
Michael Adam
de875c7eec Revert "dont exit on a commit failure"
This reverts commit 4e9a3a5dc232bac12ab387ea0cf4f1b279bed5c1.

Transaction commit should not be allowed to fail.
This is a real error.

Michael

(This used to be ctdb commit 825c506da76d7afd0714b75b8c8727874183a618)
2009-10-30 22:01:53 +11:00
Michael Adam
118185670d client: fix a race in the local race condition fix in transaction_start
The gap that remained is between checking whether a transaction commit
is in progress and taking the lock. Now we first take the lock and then
check whether a transaction commit is in progress. If so, we release the
lock, wait for one second and retry.

Michael

(This used to be ctdb commit b95524c08bf12914120cb6c818ecc1c99738fe37)
2009-10-30 22:01:16 +11:00
Michael Adam
c2855a11a8 client: add a debug message when a transaction_commit needs to be retried
Michael

(This used to be ctdb commit 9e4902c7d3ad1329c296f4196fcb1396f2a7a6a0)
2009-10-30 22:00:42 +11:00
Michael Adam
5fa3a2c96a packaging(RPM): don't touch the run levels in ctdb install/update.
We should really leave it up to the administrator to decide
whether ctdb should be started automatically at boot-time.

Michael

(This used to be ctdb commit c1d8496f9fd5e8046f3d990264258dfb054f3b32)
2009-10-30 21:42:34 +11:00
Ronnie Sahlberg
e33722a569 start the syslog child a little later, after we have forked and detached from the local shell
(This used to be ctdb commit 9ffd54b73c0d64b67e8e736d7cb54490e77ffa78)
2009-10-30 19:39:11 +11:00
Ronnie Sahlberg
5d73f19418 create a child process to write to syslog.
use a udp socket on the ctdbd port to send messages to the syslog child process for logging.

we need this when syslog becomes very slow, and on boxes where syslog is limited to 100 lines per second and starts to block after that

(This used to be ctdb commit 1446f4c247310e2ff2d522055bd8927d1a78d017)
2009-10-30 18:53:17 +11:00
Michael Adam
673a8588b1 server: fix debug message in trans2_commit (refusing persistent store during transaction)
log the right db_id
also log the client_id

Michael

(This used to be ctdb commit 48ac5c77698ab7a28d24629cc8a6985011c5d14d)
2009-10-30 09:29:25 +11:00
Michael Adam
45c17515c3 client: log db_id as 8-digit hex in ctdb_transaction_fetch_start()
Michael

(This used to be ctdb commit d7b9babda2f7c7f7b95ee19ec75c37200816c6ef)
2009-10-30 09:28:49 +11:00
Michael Adam
1de0c6f807 server: uniformly log db and client ids as 8-digit hex numbers in trans2_commit
Michael

(This used to be ctdb commit 2febdd23f754a2d4699bed36b941442ab362a376)
2009-10-30 09:28:06 +11:00