samba-mirror

mirror of https://github.com/samba-team/samba.git synced 2025-01-28 17:47:29 +03:00

Author	SHA1	Message	Date
Amitay Isaacs	cb8310ddb6	recoverd: Improve log message when nodes disagree on recmaster Signed-off-by: Amitay Isaacs <amitay@gmail.com> (This used to be ctdb commit 7b7aa7b599536cd60ebb84d363607bb4e953248a)	2013-08-14 16:55:51 +10:00
Amitay Isaacs	3c0a477911	common: Null terminate process name string so valgrind doesn't complain Signed-off-by: Amitay Isaacs <amitay@gmail.com> (This used to be ctdb commit 1c9025fdd08d1cea342af7487d0123015e08831b)	2013-08-14 16:55:51 +10:00
Amitay Isaacs	ae30b61255	vacuuming: Fix vacuuming bug where requests keep bouncing between nodes (part 2) This is caused by corruption of a record header such that the records on two nodes point to each other as dmaster. This makes a request for that record bounce between nodes endlessly. Signed-off-by: Amitay Isaacs <amitay@gmail.com> (This used to be ctdb commit f0853013655ac3bedf1b793de128fb679c6db6c6)	2013-08-14 16:55:51 +10:00
Amitay Isaacs	ee8d573069	vacuuming: Fix vacuuming bug where requests keep bouncing between nodes (part 1) This is caused by corruption of a record header such that the records on two nodes point to each other as dmaster. This makes a request for that record bounce between nodes endlessly. Signed-off-by: Amitay Isaacs <amitay@gmail.com> (This used to be ctdb commit a610bc351f0754c84c78c27d02f9a695e60c5b0f)	2013-08-14 16:55:51 +10:00
Amitay Isaacs	f9be4803cb	db_wrap: Make sure tdb messages are logged correctly Signed-off-by: Amitay Isaacs <amitay@gmail.com> (This used to be ctdb commit 60cb40d090e45ff6134c098a238fac7ad854f134)	2013-08-14 16:55:51 +10:00
Martin Schwenke	fec69034ee	eventscripts: Become unhealthy faster on nfsd failure Anecdotal evidence suggests that most nfsd RPC check failures are due to cluster filesystem or storage problems. Apparently these are rarely helped by attempting to restart the NFS service because the restart tends to hang. Fail after 2 nfsd RPC check failures, instead of waiting for 6 failures. Restart on every 10th failure to try to bring the node back to good health. Update unit tests to match. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit e9ef93f7b6dad59eabaa32124df81f3e74c651ef)	2013-08-14 16:10:30 +10:00
Martin Schwenke	4cb3e2cd78	tools/ctdb: Increase default control timeout to 10 seconds The current 3 second timeout is arbitrary and users trip over it sometimes. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit b49c4f39666d5b1596213bf41bcdc47ed3c327ae)	2013-08-14 15:57:04 +10:00
Martin Schwenke	e6ce2f55ef	eventscripts: Improve message logged when a counter hits a limit It should print the actual number of consecutive failures rather than the limit. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit ff5f0d1e29af2b293e30cdc54bed03a644be7038)	2013-08-14 15:57:04 +10:00
Martin Schwenke	35d9631eda	eventscripts: Print a message when waiting for TCP connections to be killed This makes the gaps in the logs more obvious. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 11fbf4789d783dd0bac22754b374dd9ea4b03bad)	2013-08-14 15:57:04 +10:00
Martin Schwenke	b1f7337d2b	eventscripts: New configuration variable $CTDB_RPCINFO_LOCALHOST Passing "localhost" to the rpcinfo command causes overheads, like reading /etc/services multiple times. Signed-off-by: Martin Schwenke <martin@meltin.net> Pair-programmed-with: Amitay Isaacs <amitay@gmail.com> (This used to be ctdb commit 1d61988af9e4fa3621a3e2d06a859bcb53df2d67)	2013-08-14 15:57:04 +10:00
Martin Schwenke	0ca046577f	eventscripts: Add modulo (%) operator to ctdb_check_counter() Also add it to the corresponding eventscript unit test infrastructure. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit f4ef83a256f59eeb00b9a5bc10c28347e1ad1031)	2013-08-14 15:57:03 +10:00
Martin Schwenke	bdbe37b24f	eventscripts: Separate out RPC service restart code While doing this: * Explicitly assign RPC program and version information in _nfs_check_rpc_common(). This is more lines of code but is easier to read. * Don't print the options when starting a service. Trying to print it makes the code messy for little benefit. Update the eventscript unit testing code and a Ganesha test to reflect this. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit e8b531405665885196c95fe1608db33a255bf761)	2013-08-14 15:57:03 +10:00
Martin Schwenke	2afb5632c7	tests/eventscripts: Override background_with_logging(), just prepend "&" That is, output that goes through background_with_logging() just gets "&" prepended to each line. This is cleaner than having the tests grovel through logs. Update some 49.winbind/50.samba tests to deal with this. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 3ba933d806106d12bc48b83b22d0f314d9d1e5e5)	2013-08-14 15:57:03 +10:00
Martin Schwenke	df539a66cb	eventscripts: Remove support for RPC service 'q' and 's' restart flags They're hard to maintain and provide very little benefit. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 1a1be43f8466d46913dcdfe6dcedb94316cd28ad)	2013-08-14 15:57:03 +10:00
Martin Schwenke	5459cdc8a6	eventscripts: When restarting the nfslock service only show output of start That is, /dev/null the "stop" output. This is consistent with the way CTDB generally deals with the output when stopping a service. It also makes updating the eventscript unit tests easier. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit c7332526b1b488abefeb4be78a7cd3f2f9abc451)	2013-08-14 15:57:03 +10:00
Martin Schwenke	d63cf0e7a7	tests/simple: Unreachable node test should wait for recovery to complete This should minimise the chances of a control timing out. Signed-off-by: Martin Schwenke <martin@meltin.net> Pair-programmed-with: Amitay Isaacs <amitay@gmail.com> (This used to be ctdb commit 63be516673c5d9c0d543617bf1bb8bca919956a8)	2013-08-14 15:57:03 +10:00
Martin Schwenke	0997b0c400	tests/simple: Fix the missing IP test Update the missing IP test to wait until restarts are complete. Otherwise a service restart can collide with the following monitor event and cause chaos. Also, do not disable 10.interface until it matters. Disabling it too early can cause even more chaos if something goes wrong with the monitor step. Signed-off-by: Martin Schwenke <martin@meltin.net> Pair-programmed-with: Amitay Isaacs <amitay@gmail.com> (This used to be ctdb commit 4e3bd06916bd3adac213fb18c7c2a24854b02d45)	2013-08-14 15:57:03 +10:00
Amitay Isaacs	8f1e94dfa4	recoverd: Use TDB_INCOMPATIBLE_HASH when creating volatile databases When creating missing databases either locally or remotely, recovery master calls ctdb_ctrl_createdb(). Recovery master always passes 0 for tdb_flags. For volatile databases, if TDB_INCOMPATIBLE_HASH is not specified, then they will be attached without using jenkins hash causing database corruption. Signed-off-by: Amitay Isaacs <amitay@gmail.com> (This used to be ctdb commit 2fc6b6403707a292d134140fc0b9145b454992c5)	2013-08-14 15:54:48 +10:00
Amitay Isaacs	de6b97ce4f	Revert "recoverd: Use correct tdb flags when creating missing databases" This reverts commit 10a057d8e15c8c18e540598a940d3548c731b0b4. This approach would not work when creating local databases since currently there is no control to receive TDB flags for remote databases. Signed-off-by: Amitay Isaacs <amitay@gmail.com> (This used to be ctdb commit ca61eb776ab862bd269e45ee0f9f96e7e1e0e001)	2013-08-14 14:15:33 +10:00
Amitay Isaacs	d349b56e2d	common/io: Keep queue buffer size multiple of 4K Currently queue buffer size is realloc'd every time we need to extend the buffer. Small increments can cause memory fragmentation. Instead always extend buffer in multiples of 4K. This should reduce multiple talloc_realloc calls when there are lots of packets in the socket buffer. Also, if queue buffer has grown larger than 64K, throw away the buffer once all the requests in the queue have been processed. That way queue does not hold on to large buffers. Signed-off-by: Amitay Isaacs <amitay@gmail.com> (This used to be ctdb commit 5e9b1a7e24d058ff88aaa0563db36a804e866fa9)	2013-08-09 11:07:37 +10:00
Martin Schwenke	6f9090648a	packaging: Allow setting custom release number in RPM spec file Signed-off-by: Martin Schwenke <martin@meltin.net> Pair-Programmed-With: Amitay Isaacs <amitay@gmail.com> (This used to be ctdb commit 867afb247bd8cc86c8d738f051a44cc534cafacf)	2013-08-09 11:07:37 +10:00
Amitay Isaacs	a98baa539e	ctdbd: When a record is made sticky, log only once Instead of logging from ctdb_request_call(), log the message from ctdb_make_record_sticky(). That way if the record is already sticky, the message is not repeated unnecessarily. Signed-off-by: Amitay Isaacs <amitay@gmail.com> (This used to be ctdb commit 44a64d1c388bfe3c3388b191edfaedecfb7bb831)	2013-08-09 11:07:37 +10:00
Amitay Isaacs	d42cea6efe	ctdbd: Improve high hopcount log messages when request is redirected Signed-off-by: Amitay Isaacs <amitay@gmail.com> (This used to be ctdb commit 9cde47e1a5bf1b9ca3b4da8c2db94caac2b1aa5e)	2013-08-09 11:07:37 +10:00
Martin Schwenke	98163e01a9	scripts: Do not run ctdb tool commands when debugging hung "init" event CTDB daemon is not ready to accept clients in INIT runstate (init event). CTDB daemon will start accepting connections in SETUP runstate (setup event) and later. Also, minor log formatting changes. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 81d7ce03b28d592a1337639e14d9ea141e20bfff)	2013-08-09 11:04:55 +10:00
Amitay Isaacs	ded2f28954	ctdbd: Avoid leaking file descriptor if talloc fails Signed-off-by: Amitay Isaacs <amitay@gmail.com> (This used to be ctdb commit d7f6bc3fed2dc61e6e587b4c0ec0ac27d533bbbe)	2013-08-09 11:04:55 +10:00
Amitay Isaacs	a030b938ca	eventscript: Wait for debug hung script to finish or timeout before continuing Currently if the debug hung script takes a long time to finish, the subsequent monitor event can collide with the previous event, which has not yet finished. Signed-off-by: Amitay Isaacs <amitay@gmail.com> (This used to be ctdb commit 9e99e0eb072e2b845914ee3896acbc66b96138d7)	2013-08-09 11:04:55 +10:00
Amitay Isaacs	f5ddb49e62	eventscripts: Use configured RECLOCK file instead of asking CTDB On a cluster where the recovery lock file is not being used, asking the CTDB daemon is unnecessary overhead. And if CTDB is using a recovery lock file, then changing the configuration without restarting is stupid. Signed-off-by: Amitay Isaacs <amitay@gmail.com> Pair-Programmed-With: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 44eb86e6042adb6efe75d2a5528b82a0f21d496d)	2013-08-09 11:04:55 +10:00
Amitay Isaacs	477a51aba5	locking: Do not create multiple lock processes for the same key If there are multiple lock helper processes waiting for the same record, then it will cause a thundering herd when that record has been unlocked. So avoid scheduling lock contexts for the same record. This will also mean that multiple requests will get queued up behind the same lock context and can be processed quickly once the lock has been obtained. Signed-off-by: Amitay Isaacs <amitay@gmail.com> (This used to be ctdb commit ebecc3a18f1cb397a78b56eaf8f752dd5495bcc9)	2013-08-09 11:04:55 +10:00
Amitay Isaacs	9ba793a80f	locking: Move function find_lock_context() before ctdb_lock_schedule() So that ctdb_lock_schedule() can call this function without requiring extra prototype declaration. Signed-off-by: Amitay Isaacs <amitay@gmail.com> (This used to be ctdb commit 68af5405acc123b5a90decd2123e2a02961a8fcf)	2013-08-09 11:04:42 +10:00
Amitay Isaacs	b77fec9381	ctdbd: Print set db sticky message after it's set Signed-off-by: Amitay Isaacs <amitay@gmail.com> (This used to be ctdb commit 824dcec35ec461d78e22b2ea109473b32bfe3972)	2013-08-01 11:08:26 +10:00
Amitay Isaacs	1d9d1d8cf9	tests: Add a test program to hold a lock on a database Signed-off-by: Amitay Isaacs <amitay@gmail.com> (This used to be ctdb commit f6b066a23610fb0092298861c21a9b354b91e2f1)	2013-08-01 11:08:26 +10:00
Amitay Isaacs	f15e1a28a7	recoverd: Use correct tdb flags when creating missing databases When creating missing databases either locally or remotely, make sure to use the correct tdb flags from other nodes. Without this, volatile databases can get attached without TDB_INCOMPATIBLE_HASH flag. Signed-off-by: Amitay Isaacs <amitay@gmail.com> (This used to be ctdb commit 10a057d8e15c8c18e540598a940d3548c731b0b4)	2013-08-01 11:08:25 +10:00
Amitay Isaacs	e44c38dc45	client: Always use jenkins hash when attaching volatile databases Signed-off-by: Amitay Isaacs <amitay@gmail.com> (This used to be ctdb commit 7e7e59c4047c78159387089eca65d90037bcf722)	2013-08-01 11:08:25 +10:00
Amitay Isaacs	5ba280d8ce	recoverd: Make sure to use jenkins hash for recovery databases Signed-off-by: Amitay Isaacs <amitay@gmail.com> (This used to be ctdb commit 32c83e209823e9a4d6306bb7fd63d4500f3e2668)	2013-08-01 10:51:14 +10:00
Amitay Isaacs	f1f787ccac	recoverd: Assemble up-to-date node flags information from remote nodes Currently nodemap used by recovery master is the one obtained from the local node. This information may have been updated while processing main loop. Before comparing node flags on all the nodes, create up-to-date node flags information based on the information received from all the nodes. Signed-off-by: Amitay Isaacs <amitay@gmail.com> (This used to be ctdb commit fcf77dec5af973a0e32f3999bc012053a6f47a96)	2013-07-30 15:34:32 +10:00
Amitay Isaacs	16b519c51b	tools/ctdb: Only print the hot records with non-zero hopcount Signed-off-by: Amitay Isaacs <amitay@gmail.com> (This used to be ctdb commit 049d9beb3783482490e6273a434ccbad23f85f0a)	2013-07-30 15:34:32 +10:00
Amitay Isaacs	0993387f4a	ctdbd: Don't consider a hot record if the hopcount is zero Signed-off-by: Amitay Isaacs <amitay@gmail.com> (This used to be ctdb commit ab35773518ad15588013f4d859f7bee790437450)	2013-07-30 15:34:32 +10:00
Amitay Isaacs	054d8727ed	ctdbd: Fix updating of hot keys in database statistics Signed-off-by: Amitay Isaacs <amitay@gmail.com> (This used to be ctdb commit fde4b4db5a57f75c5efa5647c309f33e0d5a68f3)	2013-07-29 16:00:46 +10:00
Amitay Isaacs	d8fc36781c	ctdbd: Remove incomplete ctdb_db_statistics_wire structure Instead of maintaining another structure, add an element as place holder for marshall buffer of hot keys. This avoids duplication of the structure. Signed-off-by: Amitay Isaacs <amitay@gmail.com> (This used to be ctdb commit e73b2e12adc9db1dedb48d32bba3a8406a80f4cd)	2013-07-29 16:00:46 +10:00
Amitay Isaacs	854216236b	Revert "ctdbd: Remove incomplete ctdb_db_statistics_wire structure" The structure cannot be removed without adding support for marshalling keys for hot records. This reverts commit 26a4653df594d351ca0dc1bd5f5b2f5b0eb0a9a5. Signed-off-by: Amitay Isaacs <amitay@gmail.com> (This used to be ctdb commit 023ca2e84f5ed064a288526b9c2bc7e06674dd81)	2013-07-29 16:00:46 +10:00
Martin Schwenke	e14fa50941	doc: Update XML files to use standard DocBook DTD This simplifies building since we don't use any of the Samba extensions. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 57aa2dffea60abd73a95233f8b761cc676adebb6)	2013-07-29 15:58:51 +10:00
Martin Schwenke	3c73949317	initscript: The wrapper script should export CTDB_SOCKET This ensures that any invocation of the ctdb tool (within the wrapper) gets the desired value. This at least ensures that ctdbd will be started. If a non-standard value is set for CTDB_SOCKET then command-line users will still need the variable in their environment. Signed-off-by: Martin Schwenke <martin@meltin.net> Pair-programmed-with: Amitay Isaacs <amitay@gmail.com> (This used to be ctdb commit 37ccc7c6cc43a80aaa92291aea7a438f4225488a)	2013-07-29 15:58:51 +10:00
Martin Schwenke	a5cb72cac3	ctdbd: Kill client process without checking for tracked child Commit f73a4b1495830bcdd094a93732a89dd53b3c2f78 added a safety check to ensure that CTDB never kills unrelated processes. However, client processes are unrelated. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 782814288bb560099ee44b607bf35f3eddf37f82)	2013-07-29 15:58:51 +10:00
Martin Schwenke	a8dd716146	eventscripts: kill_tcp_connections() should send connections to stdin This avoids issuing multiple "ctdb killtcp" commands to terminate tcp connections, one per connection. This will considerably reduce the time when there is a large number of tcp connections. This also makes it possible to avoid calling "ctdb killtcp" when there are no connections. Add a couple of unit tests for killtcp and update eventscript unit test infrastructure to support. Signed-off-by: Martin Schwenke <martin@meltin.net> Pair-programmed-with: Amitay Isaacs <amitay@gmail.com> (This used to be ctdb commit a20d94717d2e4ab866d8a002cdf39c0669b74c6a)	2013-07-29 15:53:06 +10:00
Martin Schwenke	200c28fbb2	tools/ctdb: Allow killtcp to read connections from standard input This will allow eventscripts to send information about multiple tcp connections to a single "ctdb killtcp" command, saving the overhead of setting up a client connection per tcp connection. Signed-off-by: Martin Schwenke <martin@meltin.net> Pair-programmed-with: Amitay Isaacs <amitay@gmail.com> (This used to be ctdb commit af5aa369c266430fe912df0c26116b68bac3572e)	2013-07-29 15:51:03 +10:00
Martin Schwenke	34d55048bc	tests: Always tally the number of passed/failed tests Regardless of whether a summary is being printed! Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit a69e03a5e4671e998d45b4fef8611a421bbdb3e1)	2013-07-29 15:49:23 +10:00
Martin Schwenke	f46ab595d1	recoverd: Call takeover fail callback only once per node Currently the fail callback is called once per (takeip/releaseip) control failure. This is overkill and can get a node banned much too quickly. Instead, keep track of control failures per node and only call fail callback once per failed node. Signed-off-by: Martin Schwenke <martin@meltin.net> Pair-programmed-with: Amitay Isaacs <amitay@gmail.com> (This used to be ctdb commit bf4a7c1ad87e0e848296d15d63eb8cd901ca5335)	2013-07-29 15:48:48 +10:00
Martin Schwenke	67b22b6e94	scripts: Run scriptstatus for hung event The timeout information printed by ctdbd is less than useful because it refers to the cumulative time taken by the eventscripts run so far. Adding scriptstatus output indicates where time was actually spent. Since there is now quite a bit of output, serialise the calls to this script using flock. Signed-off-by: Martin Schwenke <martin@meltin.net> Pair-programmed-with: Amitay Isaacs <amitay@gmail.com> (This used to be ctdb commit 1b016b2dfc5d7d3f2a42ce4dfe569608e90eb714)	2013-07-29 14:02:13 +10:00
Martin Schwenke	6cbcc4a8d9	ctdbd: Pass event name to hung script debugger Signed-off-by: Martin Schwenke <martin@meltin.net> Pair-programmed-with: Amitay Isaacs <amitay@gmail.com> (This used to be ctdb commit e0f3fa1020e13b84bdd672538168d148f1847d57)	2013-07-23 11:28:07 +10:00
Martin Schwenke	6882625cfe	tests/complex: Fix NFS tests to work with root_squash Refactor the NFS test setup/cleanup code into new common functions. Signed-off-by: Martin Schwenke <martin@meltin.net> Pair-programmed-with: Amitay Isaacs <amitay@gmail.com> (This used to be ctdb commit 29e98017221326bdc9b1c4f7c05b3b495c1de29b)	2013-07-23 11:28:07 +10:00
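
Note on d349b56e2d (common/io: Keep queue buffer size multiple of 4K): the arithmetic described in that message can be sketched as below. This is an illustrative model only, not the ctdb code. The names PAGE, DROP_THRESHOLD, round_up_to_page and QueueBuffer are hypothetical, and the real change operates on a talloc-allocated C buffer rather than Python integers. It shows growth rounded up to a multiple of 4K, and a buffer larger than 64K being dropped once the queue has been drained.

```python
# Hypothetical model of the buffer-sizing policy in the commit message.
PAGE = 4096                  # growth granularity (4K), from the commit message
DROP_THRESHOLD = 64 * 1024   # buffers larger than 64K are thrown away when idle


def round_up_to_page(nbytes: int) -> int:
    """Return the smallest multiple of PAGE that is >= nbytes."""
    return ((nbytes + PAGE - 1) // PAGE) * PAGE


class QueueBuffer:
    """Tracks only sizes; a stand-in for a realloc'd byte buffer."""

    def __init__(self) -> None:
        self.capacity = 0   # bytes currently allocated
        self.used = 0       # bytes currently holding queued data
        self.reallocs = 0   # how many times the buffer had to grow

    def append(self, nbytes: int) -> None:
        """Queue nbytes more data, growing in 4K multiples when needed."""
        needed = self.used + nbytes
        if needed > self.capacity:
            self.capacity = round_up_to_page(needed)
            self.reallocs += 1
        self.used = needed

    def drained(self) -> None:
        """Call once every queued request has been processed."""
        self.used = 0
        if self.capacity > DROP_THRESHOLD:
            # Do not hold on to an oversized buffer.
            self.capacity = 0
```

In this model, one hundred 100-byte appends grow the buffer only when a 4K boundary is crossed, instead of once per append, which is the reduction in realloc calls the message aims for.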


5076 Commits