samba-mirror

mirror of https://github.com/samba-team/samba.git synced 2024-12-23 17:34:34 +03:00

Author	SHA1	Message	Date
Rusty Russell	f93440c4b7	event: Update events to latest Samba version 0.9.8 In Samba this is now called "tevent", and while we use the backwards compatibility wrappers they don't offer EVENT_FD_AUTOCLOSE: that is now a separate tevent_fd_set_auto_close() function. This is based on Samba version `7f29f817fa`. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit 85e5e760cc91eb3157d3a88996ce474491646726)	2010-08-18 09:16:31 +09:30
Rusty Russell	7061ceffd8	Report client for queue errors. We've been seeing "Invalid packet of length 0" errors, but we don't know what is sending them. Add a name for each queue, and print nread. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit e6cf0e8f14f4263fbd8b995418909199924827e9)	2010-07-01 23:08:49 +10:00
Ronnie Sahlberg	fa618aa66a	add additional logging when tdb_chainunlock() fails so we can see where it was called from when it fails (This used to be ctdb commit 0c091b3db6bdefd371787d87bc749593ea8e3c76)	2010-06-09 14:37:16 +10:00
Rusty Russell	d5f6026a22	libctdb: reorganize headers: remove ctdb.h, add ctdb_client.h and ctdb_protocol.h ctdb_client.h is the existing internal client interface (which was mainly in ctdb.h), and ctdb_protocol.h is the information needed for the wire protocol only. ctdb.h will be the new, shiny, libctdb API. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit 4bba6b8cd47b352f98d41f9f06258d5ac3c9adef)	2010-05-20 15:18:30 +09:30
Ronnie Sahlberg	eeeb89e3e2	Reduce the loglevel for two log messages for Registering and Deregistering server ids. BZ61890 (This used to be ctdb commit 944434eb6420774e42e58984c6ddaa326a6853bd)	2010-03-30 11:57:25 +11:00
Stefan Metzmacher	3419e9c4dd	server: add "setup" event This is needed because the "init" event can't use 'ctdb' commands. metze (This used to be ctdb commit 1493436b6b24eb05a23b7a339071ad85f70de8f4)	2010-02-23 10:38:49 +01:00
Andrew Tridgell	f23b82b58c	ctdb: when we fill the client packet queue we need to drop the client We can't just drop packets to the list, as those packets could be part of the core protocol the client is using. This happens (for example) when Samba is doing a traverse. If we drop a traverse packet then Samba hangs indefinitely. We are better off dropping the ctdb socket to Samba. (This used to be ctdb commit a7a86dafa4d88a6bbc6a71b77ed79a178fd802a6)	2010-02-04 15:37:59 +11:00
Stefan Metzmacher	fd06167caa	server: add "init" event This is needed because the "startup" event runs after the initial recovery, but we need to do some actions before the initial recovery. metze (This used to be ctdb commit e953808449c102258abb6cba6f4abf486dda3b82)	2010-01-20 09:44:36 +01:00
Ronnie Sahlberg	4c722fe34c	fix a conflict in the merge from rusty Merge commit 'rusty/ctdb-no-setsched' Conflicts: server/ctdb_vacuum.c (This used to be ctdb commit b4365045797f520a7914afdb69ebd1a8dacfa0d9)	2009-12-17 08:18:04 +11:00
Rusty Russell	af2613e16f	ctdb: use mlockall, cautiously We don't want ctdb stalling due to paging; this can be far worse than scheduling delays. But if we simply do mlockall(MCL_FUTURE), it increases the risk that mmap (ie. tdb open) or malloc will fail, causing us to abort. This patch is a compromise: we mlock all current pages (including 10k of future stack for expansion) and then relock when a client asks us to open a TDB. We warn, but don't exit, if it fails. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit 82f778e85440bc713d3f87c08ddc955d3cfce926)	2009-12-16 20:57:20 +10:30
Rusty Russell	c488ba440a	Remove RT priority, use niceness. 1) It's buggy. Code needs to be carefully written (ie. no busy loops) to handle running with it, and we fork and run scripts.[1] 2) It makes debugging harder. If ctdbd loops (as has happened recently) it can be extremely hard to get in and see what's happening. We've already seen the valgrind hacks. 3) We have seen recent scheduler problems. Perhaps they are unrelated, but removing this very unusual setup is unlikely to hurt. 4) It doesn't make anything faster. Under all but the most perverse of circumstances, 99% of the cpu gives the same performance as 100%, and we will always preempt normal processes anyway. [1] I made this worse in 0fafdcb8d353 "eventscript: fork() a child for each script" by removing the switch_from_server_to_client() which restored it, but even that was only for monitor scripts. Others were run with RT priority. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit 482c302d46e2162d0cf552f8456bc49573ae729d)	2009-12-16 19:26:22 +10:30
Stefan Metzmacher	94bc40307a	server: Use tdb_check to verify persistent tdbs on startup Depending on --max-persistent-check-errors we allow ctdb to start with unhealthy persistent databases. The default is 0 which means to reject a startup with unhealthy dbs. The health of the persistent databases is checked after each recovery. Node monitoring and the "startup" event are deferred until all persistent databases are healthy. Databases can become healthy automatically when a completely HEALTHY node joins the cluster, or by an administrator with "ctdb backupdb/restoredb" or "ctdb wipedb". metze (This used to be ctdb commit 15f133d5150ed1badb4fef7d644f10cd08a25cb5)	2009-12-16 08:06:10 +01:00
Stefan Metzmacher	9a96ae0c97	server: only do the mkdir() calls for db_directory* once at the start metze (This used to be ctdb commit f30f33685db50860b6cd6fd1b6bdc3066620a78f)	2009-12-16 08:03:56 +01:00
Volker Lendecke	a0d9bd3c13	Run only one event for each epoll_wait/select call This might be a bit less efficient, but experience in winbind has shown that event callbacks can trigger changes in the socket state in very hard to diagnose ways. (This used to be ctdb commit a78b8ea7168e5fdb2d62379ad3112008b2748576)	2009-12-10 07:52:16 +11:00
Ronnie Sahlberg	fab11acc65	lower the loglevel for the message that a client has attached through a domain socket (This used to be ctdb commit de9e5236b20d70eac5ed29991703d6d25a103963)	2009-12-02 14:51:57 +11:00
Ronnie Sahlberg	6bad4a4836	Add a proper function to process a process-exist control in the daemon. This control is only used by samba when samba wants to check if a subrecord held by a <node-id>:<smbd-pid> is still valid or if it can be reclaimed. If the node is banned or stopped, we kill the smbd process and return that the process does not exist to the caller. This allows us to recover subrecords from stopped/banned nodes where smbd is hung waiting for the databases to thaw. bz58185 (This used to be ctdb commit 157807af72ed4f7314afbc9c19756f9787b92c15)	2009-12-02 13:58:27 +11:00
Ronnie Sahlberg	1c7de7a2ed	Add a doubly linked list to the ctdb_context to store a mapping between client pids and client structures. Add the mapping to the list every time we accept() a new client connection and set it up to be removed in the destructor when the client structure is freed. (This used to be ctdb commit f75d379377f5d4abbff2576ddc5d58d91dc53bf4)	2009-12-02 13:41:04 +11:00
Ronnie Sahlberg	bf27dc2d53	Use the PID we pick up from the domain socket when a client connects and store this in the client structure. There is no need to rely on the hack that samba sends some special message handle registrations that encode the pid in the srvid any more. This might not work on AIX since I recall some issues getting the pid in this way on that platform. (This used to be ctdb commit b4a7efa7e53e060a91dea0e8e57b116e2aeacebf)	2009-12-02 13:17:12 +11:00
Ronnie Sahlberg	e33722a569	start the syslog child a little later, after we have forked and detached from the local shell (This used to be ctdb commit 9ffd54b73c0d64b67e8e736d7cb54490e77ffa78)	2009-10-30 19:39:11 +11:00
Ronnie Sahlberg	4d40b86805	for debugging add a global variable holding the pid of the main daemon. change the tracking of time() in the event loop to only check/warn when called from the main daemon (This used to be ctdb commit a10fc51f4c30e85ada6d4b7347b0f9a8ebc76637)	2009-10-27 13:18:52 +11:00
Ronnie Sahlberg	86d1b4c465	Add a mechanism where we can register notifications to be sent out to a SRVID when the client disconnects. The way to use this is from a client to : 1, first create a message handle and bind it to a SRVID A special prefix for the srvid space has been set aside for samba : Only samba is allowed to use srvids with the top 32 bits set like this. The lower 32 bits are for samba to use internally. 2, register a "notification" using the new control : CTDB_CONTROL_REGISTER_NOTIFY = 114, This control takes as indata a structure like this : struct ctdb_client_notify_register { uint64_t srvid; uint32_t len; uint8_t notify_data[1]; }; srvid is the srvid used in the space set aside above. len and notify_data form an arbitrary blob. When notifications are later sent out to all clients, this is the payload of that notification message. If a client has registered with control 114 and then disconnects from ctdbd, ctdbd will broadcast a message to that srvid to all nodes/listeners in the cluster. A client can register itself with as many different srvids as it wants, but this is handled through a linked list from the client structure so it is mainly designed for "few notifications per client". 3, a client that no longer wants to have a notification set up can deregister using control CTDB_CONTROL_DEREGISTER_NOTIFY = 115, which takes this as arguments : struct ctdb_client_notify_deregister { uint64_t srvid; }; When a client deregisters, a message will no longer be sent to all other clients when this client disconnects from ctdbd. (This used to be ctdb commit f1b6ee4a55cdca60f93d992f0431d91bf301af2c)	2009-10-23 15:24:51 +11:00
Ronnie Sahlberg	a92ba7f729	lower the debug levels for the "create FD messages" so we don't fill up the logs. (This used to be ctdb commit 87146db2769c2ec494813685bf9cec0d2a6336c3)	2009-10-21 15:26:24 +11:00
Ronnie Sahlberg	9b8c72c446	When clients have blocked, perhaps because the node is banned or stopped and the client is blocked trying to tdb_fetch() a record, make sure we don't queue up too many REQ_MESSAGES. Add a new tunable to control the maximum queue size we allow to a blocked client before we start discarding REQ_MESSAGES instead of queueing them for delivery. This avoids queueing up a very large number of MESSAGES that samba sends between processes to nodes that are blocked/banned/stopped for extended periods. (This used to be ctdb commit f76d6fed8f9630450263b9fa4b5fdf3493fb1e11)	2009-10-21 15:20:55 +11:00
Ronnie Sahlberg	9de3652380	add logging every time we create a file descriptor in the main ctdb daemon so we can spot if there are leaks. plug two file descriptor leaks that occur when sending ARP fails and one leak when we cannot parse the local address during TCP connection establishment (This used to be ctdb commit ddd089810a14efe4be6e1ff3eccaa604e4913c9e)	2009-10-15 11:24:54 +11:00
Ronnie Sahlberg	73c0adb029	initial attempt at freezing databases in priority order (This used to be ctdb commit e8d692590da1070c87a4144031e3306d190ebed2)	2009-10-12 12:08:39 +11:00
Michael Adam	4cd06a330e	Fix persistent transaction commit race condition. In ctdb_client.c:ctdb_transaction_commit(), after a failed TRANS2_COMMIT control call (for instance due to the 1-second timeout being exceeded waiting for a busy node's reply), there is a 1-second gap between the transaction_cancel() and replay_transaction() calls in which there is no lock on the persistent db. And due to the lack of global state indicating that a transaction is in progress in ctdbd, other nodes may succeed in starting transactions on the db in this gap and, even worse, work on top of the possibly already pushed changes. So the data diverges on the several nodes. This change fixes this by introducing global state for a transaction commit being active in the ctdb_db_context struct and in a db_id field in the client so that a client keeps track of _which_ tdb it has a transaction commit running on. These data are set by ctdb upon entering the trans2_commit control and they are cleared in the trans2_error or trans2_finished controls. This makes it impossible to start another transaction or migrate a record to a different node while a transaction is active on a persistent tdb, including the retry loop. This approach is deadlock-free and still allows the recovery process to be started in the retry-gap between cancel and replay. Also note that this solution does not require any change in the client side. This was debugged and developed together with Stefan Metzmacher <metze@samba.org> - thanks! Michael (This used to be ctdb commit f88103516e5ad723062fb95fcb07a128f1069d69)	2009-07-29 11:12:39 +10:00
Ronnie Sahlberg	d4b30b34aa	don't even try to send a message from the main daemon if the transport is down (This used to be ctdb commit 9a2c4c3ed09ac9ea781d06999d11e5c3b5b4a97a)	2009-06-30 12:09:28 +10:00
Ronnie Sahlberg	4259156050	don't remove the socket when the daemon stops. This can race if the service is immediately restarted (This used to be ctdb commit b18356764cd49d934eab901e596bb75c6e3ecdf8)	2009-05-29 18:16:13 +10:00
Sumit Bose	2fcedf6dac	add missing checks on so far ignored return values Most of these were found during a review by Jim Meyering <meyering@redhat.com> (This used to be ctdb commit 3aee5ee1deb4a19be3bd3a4ce3abbe09de763344)	2009-05-21 11:22:21 +10:00
root	bfea570af4	when tracking the ctdb statistics, only decrement num_clients and pending_calls IFF the counter is >0 Otherwise there is the chance that we will reset the statistics after the counter has been incremented (client connects) to zero and when the client disconnects we decrement it to a negative number. this is a pure cosmetic patch with no operational impact to ctdb (This used to be ctdb commit 72f1c696ee77899f7973878f2568a60d199d4fea)	2009-05-01 12:30:26 +10:00
Ronnie Sahlberg	e5e2f6f8f7	increase the listen queue. Now that the eventscripts may become clients and connect back to the server we do get a lot more concurrent connection attempts (takeip/releaseip are performed in parallel) (This used to be ctdb commit 018f8b0b1823ef59b46f1a671aec5309d10628f4)	2009-04-06 14:00:41 +10:00
Ronnie Sahlberg	94a56ea410	rewrite the handling of flag updates across the cluster to eliminate a race between the ctdb tool and the recovery daemon both at once trying to push flag changes across the cluster. (This used to be ctdb commit a9a1156ea4e10483a4bf4265b8e9203f0af033aa)	2008-11-20 12:43:18 +11:00
Ronnie Sahlberg	06728fdac9	we actually need a ctdb_db variable (This used to be ctdb commit aba984f1b85f5a2d370b093061cf15843ee53758)	2008-11-03 21:54:52 +11:00
Ronnie Sahlberg	d7007793ea	latency is measured in us, not ms use an explicit ctdb_db variable instead of dereferencing state (This used to be ctdb commit 8c6a02fb423a8cbcbfc706767e3d353cd48073c3)	2008-10-30 13:34:10 +11:00
Ronnie Sahlberg	e1b0cea427	add control and logging of very high latencies. log the type of operation and the database name for all latencies higher than a threshold (This used to be ctdb commit 1d581dcd507e8e13d7ae085ff4d6a9f3e2aaeba5)	2008-10-30 12:49:53 +11:00
Ronnie Sahlberg	6474f3278d	additional monitoring between the two daemons. we currently only monitor that the daemons are running by kill(pid, 0) and verifying that the domain socket between them is ok. this is not sufficient since we can have a situation where the recovery daemon is hung. this new code monitors that the recovery daemon is operating. if the recovery daemon hangs, we log this and shut down the main daemon (This used to be ctdb commit cd69d292292eaab3aac0e9d9fc57cb621597c63c)	2008-09-09 13:44:46 +10:00
Ronnie Sahlberg	ef997d344f	initial ipv6 patch Signed-off-by: Ronnie Sahlberg <ronniesahlberg@gmail.com> (This used to be ctdb commit 1f131f21386f428bbbbb29098d56c2f64596583b)	2008-08-19 14:58:29 +10:00
Ronnie Sahlberg	8b520bcb5f	lower a debug message (This used to be ctdb commit 554dcf16d37c8b9e4704df11d21fb272f30f5cec)	2008-07-18 10:38:51 +10:00
Ronnie Sahlberg	6eb4e46fe1	Add two new controls to start and cancel a persistent update. This allows ctdb to automatically start a new full blown recovery if a client has started updating the local tdb for a persistent database but is kill -9ed before it has ensured the update is distributed clusterwide. (This used to be ctdb commit 1ffccb3e0b3b5bd376c5302304029af393709518)	2008-07-17 13:50:55 +10:00
Ronnie Sahlberg	334db8ccba	proper waitpid() fix. remove all waitpid() calls and use the event system to trap sigchld (This used to be ctdb commit 77458b2b6b51b2970c12b0e5b097088d3fb9d358)	2008-07-09 14:02:54 +10:00
Ronnie Sahlberg	522830dea8	Revert "waitpid() can block if it takes a long time before the child terminates" This reverts commit bfba5c7249eff8a10a43b53c1b89dd44b625fd10. revert the waitpid changes. we need to waitpid for some children so we should refactor the approach completely (This used to be ctdb commit 702ced6c2fe569c01fe96c60d0f35a7e61506a96)	2008-07-08 17:41:31 +10:00
Ronnie Sahlberg	79425ddec5	Revert "set sigchild to SIG_IGN instead of SIG_DFL" This reverts commit b1f1e80d3ad50280a300f2ed021513cf0a6f3a76. (This used to be ctdb commit 2030e9ff2ca044181b72c3b87d513bf27057b5a2)	2008-07-08 17:40:53 +10:00
Ronnie Sahlberg	71d2315eee	set sigchild to SIG_IGN instead of SIG_DFL (This used to be ctdb commit b1f1e80d3ad50280a300f2ed021513cf0a6f3a76)	2008-07-08 16:31:23 +10:00
Ronnie Sahlberg	d67de4a7d2	waitpid() can block if it takes a long time before the child terminates so we should not call it from the main daemon. 1, set SIGCHLD to SIG_DFL to make sure we ignore this signal 2, get rid of all waitpid() calls 3, change reporting of event script status code from _exit()/waitpid() to write()/read() one byte across the pipe. (This used to be ctdb commit bfba5c7249eff8a10a43b53c1b89dd44b625fd10)	2008-07-08 03:48:11 +10:00
Ronnie Sahlberg	adf40341a7	ctdb->methods becomes NULL when we shut down the transport. If we shut down the transport and CTDB later decides to send a command out for queueing, the call to ctdb->methods->allocate_pkt() will SEGV. This could trigger for example when we are in the process of shutting down CTDBD and have already shut down the transport but we are still waiting for the "shutdown" eventscripts to finish. If the event scripts now take much longer to execute for some reason, this race condition becomes much more probable. Decorate all dereferencing of ctdb->methods-> with a check that ctdb->methods is non-NULL (This used to be ctdb commit c4c2c53918da6fb566d6e9cbd6b02e61ae2921e7)	2008-05-11 14:28:33 +10:00
Ronnie Sahlberg	cd1858d126	fix compiler warning during a fatal error failing to lock down the socket (This used to be ctdb commit 0ad22de1a614dc2d1926546027be5f5eea3381ed)	2008-04-10 09:56:49 +10:00
Ronnie Sahlberg	2da3fe1b17	From Chris Cowan secure the domain socket and set permissions properly (This used to be ctdb commit ac6a362fc2fc4a56b4c310478a96eb12daace176)	2008-04-10 06:51:53 +10:00
Ronnie Sahlberg	6b797f148c	From Chris Cowan Add support in AIX to track the PID of a client that connects to the unix domain socket (This used to be ctdb commit 4c006c675d577d4a45f4db2929af6d50bc28dd9e)	2008-04-03 10:58:51 +11:00
Ronnie Sahlberg	03d30f405d	decorate the memdump output with a nice field for ctdb_client structures to show the pid of the client that attached (This used to be ctdb commit 0d9314302d0b988b6ab5d533deef40c5b343c249)	2008-04-01 17:17:21 +11:00
Ronnie Sahlberg	27a7f854f5	add improvements to tracking memory usage in ctdbd and the recovery daemon, and a ctdb command to pull the talloc memory map from a recovery daemon: ctdb rddumpmemory (This used to be ctdb commit d23950be7406cf288f48b660c0f57a9b8d7bdd05)	2008-04-01 15:34:54 +11:00


76 Commits