mirror of https://github.com/samba-team/samba.git synced 2025-01-25 06:04:04 +03:00

2734 Commits

Author SHA1 Message Date
Ronnie Sahlberg
6f80ff4109 From Elia Pinto <gitter.spiros@gmail.com>
We don't need to include getopt.h under AIX

(This used to be ctdb commit fcebbc3484ce56c57def745ea51c053dfb02a657)
2010-02-22 14:00:33 +11:00
Ronnie Sahlberg
68decc38ca Ignore any scripts that time out for most events, except startup.
Always treat hung scripts (except startup) as success.

(This used to be ctdb commit b6d939c9758c7d2e39206838492f2f644dd61db7)
2010-02-16 11:21:27 +11:00
Ronnie Sahlberg
5439401dd2 try to restart rpc-rquotad if it is not running
bz60317

(This used to be ctdb commit 2263cd74d511247debadd0f6602bc6396b46ac5e)
2010-02-16 11:02:37 +11:00
Rusty Russell
435fb78d13 Leave sequence number alone when merely migrating records.
(Based on earlier version from Ronnie which modified tdb; this one
is standalone).

When storing records in a tdb that has "automatic seqnum updates"
also check if the actual data for the record has changed or not.

If it has not changed at all, except for possibly the header,
this is likely just a dmaster migration operation in which case
we want to write the record to the tdb but we do not want the tdb
sequence number to be increased.

This resolves the problem of notify.tdb being thrashed under load:
the heuristic in smbd to only reread this when the sequence number
increases (rarely) breaks down.
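
The check described above can be modelled roughly as follows. This is an illustrative Python sketch, not the ctdb C code: the `Store` class, the fixed `HEADER_LEN`, and the record layout (header bytes followed by data) are assumptions made for the example.

```python
HEADER_LEN = 8  # assumed header size, for illustration only


class Store:
    """Toy key/value store with an automatic sequence number."""

    def __init__(self):
        self.records = {}
        self.seqnum = 0

    def store(self, key, record):
        old = self.records.get(key)
        # Compare only the payload after the header: a dmaster migration
        # rewrites the header but leaves the data untouched.
        data_changed = old is None or old[HEADER_LEN:] != record[HEADER_LEN:]
        self.records[key] = record  # the record is always written
        if data_changed:
            self.seqnum += 1  # bump only on a real data change
        return data_changed
```

In this model a reader that rereads only when `seqnum` increases is not disturbed by header-only rewrites.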

Before, running nbench --num-progs=512 across 4 nodes, we saw numbers like:
 512      1496  118.33 MB/sec  execute 60 sec  latency 0.00 msec
And turning on latency tracking, this was typical in the logs:
 ctdbd: High latency 9380914.000000s for operation lockwait on database notify.tdb

After this commit:
  512      2451  143.85 MB/sec  execute 60 sec  latency 0.00 msec
And no more latency messages...

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

(This used to be ctdb commit 9ed2f8b2fcb7e3f0d795eef22cfa317066490709)
2010-02-16 11:02:25 +11:00
Ronnie Sahlberg
96a61ca907 Reduce loglevel for two eventscript related debug messages
(This used to be ctdb commit f8994790e65baebb81bbfad646cdda6234b6d29a)
2010-02-16 11:02:11 +11:00
Ronnie Sahlberg
06fdfddf27 Reducing the log level for a debug message
DEBUG(DEBUG_DEBUG,("pnn %u starting migration of %08x t\

(This used to be ctdb commit 6ce4b21b00cce1530aff022584bf695c257a5d55)
2010-02-16 11:02:01 +11:00
Ronnie Sahlberg
ce9d57bc36 Reduce the log level for two debug messages
DEBUG(DEBUG_DEBUG,("pnn %u dmaster response %08x\n", ctdb->pnn, ctdb_has
       DEBUG(DEBUG_DEBUG,("pnn %u dmaster request on %08x for %u from %u\n",

(This used to be ctdb commit a3473e7a445b14520a49585c460429dfbfe1fce0)
2010-02-16 11:01:52 +11:00
Ronnie Sahlberg
70c1e39e64 Add a variable CTDB_CHECK_SWAP_IS_NOT_USED="yes"
to control whether or not to check if we are swapping, and produce
useful output into the logfile if we are.

For production systems with dedicated nas-heads we should never swap.
But for developer/test systems we often use smaller nondedicated systems where
we can no longer guarantee that we will not be using swap.
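
One way such a check could work is sketched below. It is a guess at the general idea, not the event script itself: the function names are hypothetical, and reading `SwapTotal` and `SwapFree` from `/proc/meminfo`-style text is an assumption about how swap use might be detected.

```python
def swap_used_kb(meminfo_text):
    """Return SwapTotal - SwapFree in kB from /proc/meminfo-style text."""
    fields = {}
    for line in meminfo_text.splitlines():
        name, sep, rest = line.partition(":")
        if sep and rest.split():
            fields[name.strip()] = int(rest.split()[0])
    return fields.get("SwapTotal", 0) - fields.get("SwapFree", 0)


def should_warn(meminfo_text, check_swap="yes"):
    # Only warn when the check is enabled and some swap is in use.
    return check_swap == "yes" and swap_used_kb(meminfo_text) > 0
```

Here `check_swap` stands in for the value of CTDB_CHECK_SWAP_IS_NOT_USED.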

(This used to be ctdb commit db87849bf3380914a63a626412bec209dbea7d20)
2010-02-16 11:01:39 +11:00
Ronnie Sahlberg
7f2f7364ad lower the loglevel for a debug message for redundant releases of public ips
(This used to be ctdb commit cfc1a4f878b61c85063af649d2339431e799647d)
2010-02-16 11:01:09 +11:00
Ronnie Sahlberg
64111bb02b Add a new variable: CTDB_NFS_SKIP_KNFSD_ALIVE_CHECK
When set to "yes", this skips checking whether knfsd has hung.

bz59626

(This used to be ctdb commit b0bf3794753c5bb898295b5109707953cc3dcec5)
2010-02-16 10:59:53 +11:00
Andrew Tridgell
c137725af8 fixed printing of high latency
(This used to be ctdb commit 88aacab30a36d66fe03d120bbf655edfe791ec32)
2010-02-16 10:58:24 +11:00
Martin Schwenke
47bebba8fe Test suite: Make "ctdb ip" test backward compatible with older ctdb versions.
Recent updates to the test meant that it only worked with the latest
ctdb versions.  This changes things so that we never bother matching
the machine readable header, just the actual data in the output.  It
also takes a slightly more liberal approach in massaging the human
readable output to ensure it matches the machine readable output.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 8a1cb5dc1ddf82f3b9cbb23e40b3914b3d5c2783)
2010-02-10 20:27:53 +11:00
Martin Schwenke
d25ab9eca0 Merge commit 'origin/master'
(This used to be ctdb commit 19523fbb12db1ec1e5ee38de1b2d3b99a74c6ca4)
2010-02-10 20:24:28 +11:00
Ronnie Sahlberg
e01c8454ef commands that relate to manual failover of ip addresses (moveip)
can sometimes take a long time, so allow a longer timeout for the controls used.

(This used to be ctdb commit 144c69b633eeb17e120f962162feed6de3dc16a6)
2010-02-09 18:34:47 +11:00
Ronnie Sahlberg
ca9386a7f4 Don't just exit(0) upon successful completion of waiting for an ipreallocate to finish.
Return success back to the caller instead.

Otherwise things like 'ctdb enable -n all' will just finish after the first disabled node has become enabled.

(This used to be ctdb commit f4eb41cd3a1099da8265351818fba9bd4688a188)
2010-02-09 14:35:10 +11:00
Rusty Russell
34b8b98078 event scripts: add logging for low memory conditions
We should never enter swap; if we do, show the memory state of the machine and the process list.  This will help us diagnose what caused the condition before it's too late and the box starts OOM-killing processes.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

(This used to be ctdb commit 627a6d67a0e9e61f8713e62695b3518c51909230)
2010-02-09 12:46:35 +10:30
Andrew Tridgell
2406733ed2 ctdb: migrate to new dlinklist.h from Samba
(This used to be ctdb commit f63c091f12f8d582e9518673365c7c52479c470c)
2010-02-09 09:20:55 +11:00
Martin Schwenke
e128e590b3 onnode documentation - update documentation to reflect recent onnode changes.
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 2fb2eb0fd7396de33474ce43fe95c66a5784d05b)
2010-02-05 15:30:39 +11:00
Martin Schwenke
240625fe9d Merge branch 'master' of git://git.samba.org/sahlberg/ctdb
(This used to be ctdb commit a442668923d4d8f8d624e00138fe37d76d593d21)
2010-02-05 14:00:23 +11:00
Andrew Tridgell
f23b82b58c ctdb: when we fill the client packet queue we need to drop the client
We can't just drop packets to the list, as those packets could be part
of the core protocol the client is using. This happens (for example)
when Samba is doing a traverse. If we drop a traverse packet then
Samba hangs indefinitely. We are better off dropping the ctdb socket
to Samba.
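
The policy can be sketched as below. This is a simplified Python model, not the ctdb_io.c code: `ClientQueue`, `max_len`, and the `connected` flag are hypothetical names used only for illustration.

```python
from collections import deque


class ClientQueue:
    """Outgoing packet queue for one client connection."""

    def __init__(self, max_len):
        self.max_len = max_len
        self.packets = deque()
        self.connected = True

    def send(self, packet):
        if not self.connected:
            return False
        if len(self.packets) >= self.max_len:
            # Dropping one packet could silently break the client's protocol
            # (e.g. a traverse), so disconnect the client instead.
            self.packets.clear()
            self.connected = False
            return False
        self.packets.append(packet)
        return True
```

A disconnected client sees an error and can reconnect, whereas a client that silently lost one packet would wait forever.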

(This used to be ctdb commit a7a86dafa4d88a6bbc6a71b77ed79a178fd802a6)
2010-02-04 15:37:59 +11:00
Andrew Tridgell
3eb9735be5 ctdb: move ctdb_io.c to use TLIST_*() macros
This will make large packet queues much more efficient

(This used to be ctdb commit e3f198056230073135ea6354bbef30c5bb022f8f)
2010-02-04 15:37:53 +11:00
Andrew Tridgell
5fd88a1c42 util: added TLIST_*() macros
The TLIST_*() macros are like the DLIST_*() macros, but take both a
head and tail pointer for the list. This means that adding an element
to the end of the list is efficient (it doesn't need to walk the
list).

We should move all uses of the DLIST_*() macros which use
DLIST_ADD_END() to use the TLIST_*() macros instead.
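
The benefit of keeping a tail pointer can be illustrated with a small model. The real TLIST_*() macros are C preprocessor macros whose definitions are not shown here; the `Node` and `TailList` classes below are hypothetical Python stand-ins for the same idea.

```python
class Node:
    def __init__(self, value):
        self.value = value
        self.prev = None
        self.next = None


class TailList:
    """Doubly linked list that tracks both head and tail."""

    def __init__(self):
        self.head = None
        self.tail = None

    def add_end(self, node):
        # O(1): link after the current tail, no traversal needed.
        node.prev = self.tail
        node.next = None
        if self.tail is None:
            self.head = node
        else:
            self.tail.next = node
        self.tail = node

    def remove(self, node):
        # Unlink and fix up head/tail when the node sits at either end.
        if node.prev is None:
            self.head = node.next
        else:
            node.prev.next = node.next
        if node.next is None:
            self.tail = node.prev
        else:
            node.next.prev = node.prev
        node.prev = node.next = None

    def values(self):
        out = []
        cur = self.head
        while cur is not None:
            out.append(cur.value)
            cur = cur.next
        return out
```

With only a head pointer, `add_end` would have to walk every node first, which gets expensive for long packet queues.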

(This used to be ctdb commit 2d05a71349e9ade869b62cf261c2a9a21818a474)
2010-02-04 15:37:45 +11:00
Ronnie Sahlberg
7a889c5f1d When trying to enable/disable a node,
check if the node is already enabled/disabled and log an informational
message if so.

(This used to be ctdb commit c3eec8f10764a647106087099eeb47b7196f7aac)
2010-02-04 10:03:21 +11:00
Ronnie Sahlberg
a2857b1504 We only queued up to 1000 packets per queue before we started dropping
packets, to keep the queue from growing excessively if smbd has blocked.

This could cause traverse packets to be discarded if the main
smbd daemon does a traverse of a database while there is a recovery
(sending a reconfigured message to smbd, causing an avalanche of unlock
messages to be sent across the cluster).

This avalanche of messages could also cause the traversal message to be
discarded, causing the main smbd process to hang indefinitely waiting
for the traversal message that will never arrive.

Bump the maximum queue length before starting to discard messages from
1000 to 1000000 and at the same time rework the queueing slightly so we
can append messages cheaply to the queue instead of walking the list
from head to tail every time.

(This used to be ctdb commit 59ba5d7f80e0465e5076533374fb9ee862ed7bb6)
2010-02-04 09:54:06 +11:00
Ronnie Sahlberg
7a5254ae69 add two new debug controls to send and receive messages
ctdb msglisten and msgsend

(This used to be ctdb commit 8c89aac20260dc7f3746e29fe99f17422a77cb88)
2010-02-04 09:45:32 +11:00
Ronnie Sahlberg
d7c00d8d7e Drop the debug level for logging fd creation to DEBUG_DEBUG
(This used to be ctdb commit eae1d4f9e52e73b4d8769868fffdafa590d03784)
2010-02-04 06:37:41 +11:00
Volker Lendecke
68273bbab8 tdb: fix an early release of the global lock that can cause data corruption
There was a bug in tdb where the

                tdb_brlock(tdb, GLOBAL_LOCK, F_UNLCK, F_SETLKW, 0, 1);

(ending the transaction-"mutex") was done before the

                        /* remove the recovery marker */

This means that when a transaction is committed there is a window where another
opener of the file sees the transaction marker while the transaction committer
is still fully functional and working on it. This led to the transaction being
rolled back by that second opener of the file while transaction_commit() gave
no error to the caller.

This patch moves the F_UNLCK to after the recovery marker was removed, closing
this window.
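
The ordering problem can be shown with a deliberately simplified, single-threaded model. Nothing below is tdb code: `ToyDb`, `opener_check`, and the boolean flags are assumptions standing in for the file lock and the on-disk recovery marker.

```python
class ToyDb:
    def __init__(self):
        self.lock_held = False
        self.recovery_marker = False
        self.rolled_back = False

    def opener_check(self):
        # In this model a second opener can only inspect the file once the
        # lock is free. If it then sees the marker, it rolls back.
        if not self.lock_held and self.recovery_marker:
            self.rolled_back = True


def commit(db, unlock_before_marker_removed):
    db.lock_held = True
    db.recovery_marker = True  # set while the commit is in progress
    if unlock_before_marker_removed:
        db.lock_held = False       # old, buggy order
        db.opener_check()          # opener slips into the window
        db.recovery_marker = False
    else:
        db.recovery_marker = False
        db.lock_held = False       # fixed order: unlock last
        db.opener_check()
    return db.rolled_back
```

With the unlock moved to the end, the opener never observes the marker while the committer is still working.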

(This used to be ctdb commit 898b5edfe757cb145960b8f3631029bfd5592119)
2010-02-02 07:52:15 +11:00
Martin Schwenke
56b178e1a2 eventscripts: stop loadconfig function from loading ctdb config file twice.
If "$1" was empty then loadconfig would load the ctdb config twice.
This stops that from happening.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 0406d406da70aaee7ad6aac236114905c5d03ed2)
2010-01-22 17:19:12 +11:00
Martin Schwenke
407a8f7205 eventscript: Use of $NFS_TICKLE_SHARED_DIRECTORY must be after loadconfig.
Proper fix for 085d1bea78fabf754ef6dd6d323f74a1d361e45c's workaround.
$NFS_TICKLE_SHARED_DIRECTORY was being used before it is set via
loadconfig.

Ronnie actually spotted this one.  :-)

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit ee8b2e298351d05197a2e1494f3331433644c1e6)
2010-01-22 17:14:50 +11:00
Martin Schwenke
02e68340e8 initscript: Remove bash-ism.
Also, change the order of the comparison so it is consistent with
others in the script.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 44696e15cdb23e7656d3bb0ead54f509495738a7)
2010-01-22 17:13:17 +11:00
Martin Schwenke
d6b0578cfb initscript: handle spaces in option values inserted into $CTDB_OPTIONS.
This puts single quotes around everything and uses eval on the
command-lines that actually start ctdbd.  The eval causes the single
quotes to be interpreted.
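
The effect of quoting plus a quote-aware second parse can be imitated in Python with `shlex`, which follows POSIX shell quoting rules. This is only an analogy for what the shell's eval does; the helper names and the option values in the usage note are illustrative assumptions.

```python
import shlex


def build_options(pairs):
    """Join option/value pairs into one string, single-quoting each value."""
    return " ".join(f"{opt}={shlex.quote(val)}" for opt, val in pairs)


def naive_split(options):
    # Whitespace splitting ignores quotes, so a value with a space breaks.
    return options.split()


def quote_aware_split(options):
    # shlex.split honours the quotes, roughly as the shell does under eval.
    return shlex.split(options)
```

For example, a value such as `/var/log/my ctdb.log` survives `quote_aware_split` as one argument but is cut in two by `naive_split`, which mirrors how re-splitting on spaces loses the quoting.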

The "redhat" init style no longer uses the Red Hat daemon function.
It loses the quoting and re-splits on spaces.  Instead we add an extra
line that uses the success/failure functions to keep things pretty.
Note that this means that we don't respect daemon's
$DAEMON_COREFILE_LIMIT variable but we do our own core file handling
with $CTDB_SUPPRESS_COREFILE anyway.  daemon's core file handling was
probably overriding what we were doing anyway, so this can be regarded
as a bug fix.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 522fbb012524fe41a67dbe43589a282dda6bcbe2)
2010-01-22 15:34:21 +11:00
Martin Schwenke
52dbd65825 onnode: update algorithm for finding nodes file.
2 changes:

* If a relative nodes file is specified via -f or $CTDB_NODES_FILE but
  this file does not exist then try looking for the file in /etc/ctdb
  (or $CTDB_BASE if set).

* If a nodes file is specified via -f or $CTDB_NODES_FILE but this
  file does not exist (even when checked as per above) then do not
  fall back to /etc/ctdb/nodes (or $CTDB_BASE if set).  The old
  behaviour was surprising and hid errors.
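
The two rules above can be sketched as a lookup function. onnode itself is a shell script; this Python version is only a model, and `find_nodes_file`, `explicit` (standing for the -f or $CTDB_NODES_FILE value), and `ctdb_base` are hypothetical names.

```python
import os


def find_nodes_file(explicit=None, ctdb_base=None):
    """Return the nodes file path, or None if an explicit file is missing."""
    base = ctdb_base or "/etc/ctdb"
    if explicit is None:
        # Nothing requested: use the default nodes file.
        return os.path.join(base, "nodes")
    if os.path.isfile(explicit):
        return explicit
    if not os.path.isabs(explicit):
        # Relative name not found as given: retry under the base directory.
        candidate = os.path.join(base, explicit)
        if os.path.isfile(candidate):
            return candidate
    # An explicitly requested file is missing: do not fall back silently.
    return None
```

Returning `None` instead of the default path is what surfaces a mistyped file name as an error.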

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 60aa570aaa77d293b963105b3f605f9625a4594b)
2010-01-21 18:52:44 +11:00
Martin Schwenke
7569b21f2d onnode - respect $CTDB_BASE rather than hard-coding /etc/ctdb.
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 503e4908b3028330bc25dc6de8561dbd53ee6a8d)
2010-01-21 18:52:31 +11:00
Stefan Metzmacher
12c8dd215c config: 10.interface: search "ethtool" in $PATH instead of using a hardcoded path
This is very useful for testing, I use such a script:

cat ~/bin/ethtool
 #!/bin/sh

 IFACE=$1

 case "$IFACE" in
        Neth2)
                ;;
        Neth3)
                ;;
        Neth4)
                ;;
        Neth5)
                ;;
        *)
                exec /usr/sbin/ethtool "$@"
                ;;
 esac

 ip link set down "$IFACE"

 exec /usr/sbin/ethtool "$@"

metze

(This used to be ctdb commit 3bab985cf615720eded4d47b4f9f37a9c28840aa)
2010-01-20 11:11:04 +01:00
Stefan Metzmacher
dbe912793e server: reload the public addresses before doing a takeover run
metze

(This used to be ctdb commit 0e41a2204fa8a1e77dc83c0d4b253ab272b5c72d)
2010-01-20 11:11:04 +01:00
Stefan Metzmacher
76cb4ce34c server: ban ourself if the ctdb and kernel knowledge of a public ip differs
metze

(This used to be ctdb commit 48e0af91113d6cead6cae3f28d8d8f610cacaa71)
2010-01-20 11:11:04 +01:00
Stefan Metzmacher
405368eeb0 server: give an error if we're getting a takeover_ip event with a wrong pnn
metze

(This used to be ctdb commit 2f44d6f3d290cc1b37b19ec34edfbad12cc0c0a7)
2010-01-20 11:11:04 +01:00
Stefan Metzmacher
a5ba5c129a server: return an error if we get a takeover ip event and we cannot serve the ip
metze

(This used to be ctdb commit f5c221e6abc118aefa489aa7e07755af952fd2bb)
2010-01-20 11:11:03 +01:00
Stefan Metzmacher
55d824bd77 server: print node number as signed integer on release ip event
metze

(This used to be ctdb commit 6c456face30606641f6b8beaad3121c9b05ca763)
2010-01-20 11:11:03 +01:00
Stefan Metzmacher
c5e579b56a server: debug redundant takeover ip events with level INFO
metze

(This used to be ctdb commit 7bc9969c4c28f2c4a4848bd730db3c63bb9204fe)
2010-01-20 11:11:03 +01:00
Stefan Metzmacher
ffdf32dedf server: be less verbose on redundant release_ip events
metze

(This used to be ctdb commit 72ef5f891f85ce51f5ca7e0c03d0c7cc955be110)
2010-01-20 11:11:03 +01:00
Stefan Metzmacher
58d7c44b1c server: add a ctdb_do_updateip()
metze

(This used to be ctdb commit eded224368dded2264e53546c196b1b485cb2094)
2010-01-20 11:11:02 +01:00
Stefan Metzmacher
aa485b17bb server: split out a ctdb_do_takeover_ip() function
metze

(This used to be ctdb commit 8fd6f4aab0c173b4c9c4c02c546e7d2ec1a98423)
2010-01-20 11:11:02 +01:00
Stefan Metzmacher
da59e0b162 server: split out a ctdb_announce_vnn_iface() function
metze

(This used to be ctdb commit ec87a51660cfa8a6851923f757fed31f7ffc7153)
2010-01-20 11:11:02 +01:00
Stefan Metzmacher
ea5843075c events: add updateip event to 13.per_ip_routing
metze

(This used to be ctdb commit 829150e814a5e6c85d0f21421f46f41e81d74c53)
2010-01-20 11:11:02 +01:00
Stefan Metzmacher
6a818e66ae events: 10.interface handle updateip event
metze

(This used to be ctdb commit a5cdf1277387f8c6292153c37fa9ceb64707d04f)
2010-01-20 11:11:02 +01:00
Stefan Metzmacher
98ee69c66d server: add updateip event
metze

(This used to be ctdb commit 712ed0c4c0bff1be9e96a54b62512787a4aa6259)
2010-01-20 11:11:01 +01:00
Stefan Metzmacher
50bff8c886 config: add CTDB_PARTIALLY_ONLINE_INTERFACES to ctdb.sysconfig
With this option set to "yes", we don't become unhealthy
as long as at least one interface is still available.
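
The rule can be written as a small predicate. This is a sketch under assumptions: `node_is_healthy` is a hypothetical name, and treating any down interface as unhealthy when the option is not "yes" is inferred from the description rather than taken from the event script.

```python
def node_is_healthy(interfaces_up, partially_online="no"):
    """interfaces_up maps interface name -> True if the link is usable."""
    states = list(interfaces_up.values())
    if partially_online == "yes":
        # One working interface is enough to stay healthy.
        return any(states)
    # Assumed default: every monitored interface must be up.
    return all(states)
```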

metze

(This used to be ctdb commit d054eb33c6ae92560cddb40732e5dcf622591a3c)
2010-01-20 11:11:01 +01:00
Stefan Metzmacher
179c098e86 server: start with disabled interfaces and let the event scripts enable the interfaces explicitly
This makes sure that we don't get public addresses assigned during the
initial recovery and remove them again in the startup event.

metze

(This used to be ctdb commit f872e8c63a2f8979e6a0d088630575bdd4d7b4f1)
2010-01-20 11:11:01 +01:00
Stefan Metzmacher
5d2c3ef656 config: 10.interfaces call monitor_interfaces on startup
metze

(This used to be ctdb commit 615dec051c26aac628f120e96bf12fb39fc6d28a)
2010-01-20 11:11:01 +01:00