mirror of https://github.com/samba-team/samba.git synced 2025-01-18 06:04:06 +03:00

1374 Commits

Author SHA1 Message Date
Martin Schwenke
c503997746 recoverd: Move disabling of IP checks into do_takeover_run()
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 48b603fbf16311daa47b01e7a33d477ed51da56d)
2013-09-19 12:54:30 +10:00
Martin Schwenke
bbbb55eef9 recoverd: do_takeover_run() should mark when a takeover run is in progress
Nested takeover runs should never happen, so they should fail.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 8ed29c60c0a7dd29f2a6efdf694d38e94281e1c4)
2013-09-19 12:54:30 +10:00
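The in-progress marking described in this commit can be pictured as a simple re-entrancy guard. The sketch below is illustrative only and is not the ctdb C code: the `Recoverd` class, its attribute names, and the `run` callable are assumptions made for the example. It also folds in the behaviour from the neighbouring commits, where `need_takeover_run` is cleared on success and set on failure.

```python
class Recoverd:
    """Hypothetical stand-in for the recovery daemon state."""

    def __init__(self):
        self.takeover_run_in_progress = False
        self.need_takeover_run = True

    def do_takeover_run(self, run):
        """Call run() unless a takeover run is already in progress."""
        if self.takeover_run_in_progress:
            # Nested takeover runs should never happen, so fail early.
            return False
        self.takeover_run_in_progress = True
        try:
            ok = bool(run())
        finally:
            # Always clear the marker, even if run() raises.
            self.takeover_run_in_progress = False
        # A successful run clears the flag, a failed run sets it.
        self.need_takeover_run = not ok
        return ok
```

A nested call made from inside `run` returns `False` without touching any state, while the outer call completes normally.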
Martin Schwenke
a1f915f6b5 recoverd: takeover_fail_callback() doesn't need to set rec->need_takeover_run
It is set on every failure anyway.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit e5f94c7857405bdeac233069003c3769b3dc3616)
2013-09-19 12:54:30 +10:00
Martin Schwenke
701c450e90 recoverd: Fail takeover run if "ipreallocated" fails
Previously flagging a failure was probably avoided because of attempts
to run "ipreallocated" events on stopped and banned nodes, which would
fail because they are in recovery.  Given the change to a new control
and that fallback only retries the old method on active nodes, this
should never fail in reasonable circumstances.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 53722430ad35f80935aabd12fa07654126443b8b)
2013-09-19 12:54:30 +10:00
Martin Schwenke
e167e2e7c7 recoverd: New function do_takeover_run()
Factor the calling sequence for ctdb_takeover_run() into a new
function and call it instead.  This changes rec->need_takeover_run to
false for each successful takeover run and that seems to be the right
thing to do.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 9a3f0c0e61ca5c17e020c6e0463d73c7cf4f7c09)
2013-09-19 12:54:30 +10:00
Martin Schwenke
30a50c6e1e recoverd: Stabilise the recovery master role
On rare occasions a node that has been inactive will trigger an
election when it becomes active again.  If that node has been up
for the longest then it will win the election and the recovery master
role will spuriously move.

While a node remains inactive we reset the priority time to discourage
it from winning elections.  The priority time will now reflect roughly
how long the node has been active rather than how long it has been up.
That means the most stable node is more likely to win elections.

Having a stable recovery master means that disabling takeover runs
while reloading IPs is more likely to succeed.  It also improves the
chances of being able to cache information in the recovery master -
for example, between takeover runs.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit f0f48f22f45e4c82eba2582efae307e25385de81)
2013-09-19 12:54:29 +10:00
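The effect of resetting the priority time can be sketched as follows. This is a deliberately simplified model, not the real election logic: the `Node` fields, the `tick` and `elect` helpers, and the rule "earliest priority time wins" are assumptions for illustration, and a real election may weigh other criteria as well.

```python
from dataclasses import dataclass


@dataclass
class Node:
    pnn: int
    boot_time: float      # when the daemon started
    priority_time: float  # start of the current active period
    active: bool = True


def tick(node, now):
    """Periodic check: an inactive node keeps resetting its priority time."""
    if not node.active:
        node.priority_time = now


def elect(nodes):
    """Pick the active node with the earliest priority time (ties go to lowest pnn)."""
    candidates = [n for n in nodes if n.active]
    return min(candidates, key=lambda n: (n.priority_time, n.pnn)).pnn
```

In this model a node that booted first but sat inactive no longer beats a node that has been active continuously, because its priority time reflects when it last became active rather than when it booted.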
Martin Schwenke
630196423a recoverd: Banned nodes should not be told to run "ipreallocated" event
They will reject it because they are in recovery.  This can result in
extra banning credits being applied to banned nodes.

This corresponds to commit 9132e6814ed927fa317f333f03dedb18f75d0e5b
from the 1.2.40 branch.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 403938804caf1322f9773d63197e4303a7b2a788)
2013-09-18 17:16:35 +10:00
Martin Schwenke
8d11da3546 recoverd: Remove an orphaned comment
This should have been removed with the associated code in commit
14bd0b6961ef1294e9cba74ce875386b7dfbf446.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 36de63843de10a1f2a9ccdbbee24cc1d08542984)
2013-09-11 15:35:16 +10:00
Martin Schwenke
4e62553fcb recoverd: Update a comment to use current terminology
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit ea5576071b22e1877903ec0921d375626a23e13b)
2013-09-11 15:35:10 +10:00
Michael Adam
18f17aaa33 server: standardize formatting of comment block for ctdb_reply_dmaster() while I'm at it..
This was the comment block I was touching and meant to adapt in
commit 00d3bf092e2f72eda330978c75ec85f17e870553.
My search was apparently not unique...

Signed-off-by: Michael Adam <obnox@samba.org>

(This used to be ctdb commit 09940255011b119dc6af3304f5d3e9568e6006fd)
2013-08-26 13:24:32 +02:00
Martin Schwenke
3afcc53516 recoverd: Remove an unused temporary talloc context
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit da22d5e60dc023009854025cc9e6bc4b0a84c60e)
2013-08-22 17:00:20 +10:00
Martin Schwenke
1ae731198a recoverd: Move struct ctdb_public_ip_list back into ctdb_takeover.c
This is an internal structure.  It was moved into ctdb_private.h a
long time ago to allow unit testing.  Unit test compilation was
changed shortly afterwards to make this unnecessary.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit db57261d7dc264e161659a8c547f44fbd9e88eeb)
2013-08-22 17:00:20 +10:00
Martin Schwenke
e657f75484 recoverd: Log more information when interfaces change
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 3ef93a1a3e60cdf5d8954e7a16a988ea6126916b)
2013-08-22 17:00:20 +10:00
Amitay Isaacs
58e96eb178 traverse: Log when database traverse is started
Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 256b157232c60bc432c94e54b1fae9699f737557)
2013-08-22 17:00:19 +10:00
Amitay Isaacs
e850a6d2ca ctdbd: Finish eventscript callback processing before debugging hung script
This ensures that the eventscript results are updated and the callback
is processed before the hung script is debugged, so "ctdb scriptstatus"
output is useful when run from the debug hung script.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Pair-Programmed-With: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 4ed2efb838d2ac97746666f614ebef5fdf3cdd5e)
2013-08-22 17:00:19 +10:00
Amitay Isaacs
19444f7c3d ctdbd: Make sure call data is freed if doing an early return
This should avoid memory bloat when a request bounces between nodes.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 7677fb263f06a97398e2c546e32273fb96edca69)
2013-08-22 16:59:49 +10:00
Amitay Isaacs
1467b666f2 Revert "LACOUNT: Add back lacount mechanism to defer migrating a fetched/read copy until after default of 20 consecutive requests from the same node"
This reverts commit 035c0d981bde8c0eee8b3f24ba8e2dc817e5b504.

This is a premature optimization.  A record can bounce between nodes
very quickly if it is contended.  There is no need to hold a record
on a node unnecessarily.  If record contention becomes bad,
enabling sticky records on a database is a better idea.

Conflicts:
	include/ctdb_private.h
	server/ctdb_tunables.c

Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit ac417b0003f0116f116834ad2ac51482d25cfa0d)
2013-08-22 14:08:52 +10:00
Amitay Isaacs
59dae19f5a ctdbd: Print a log message when a key becomes hot
Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 48f40985f4592c28402303ccbb458756f4914f75)
2013-08-22 14:08:52 +10:00
Michael Adam
621bfe8b0d server: standardize formatting of comment block for ctdb_reply_dmaster() while I'm at it..
Signed-off-by: Michael Adam <obnox@samba.org>

(This used to be ctdb commit 00d3bf092e2f72eda330978c75ec85f17e870553)
2013-08-19 17:12:33 +02:00
Michael Adam
922246de73 server: fix wording and punctuation in comment block for ctdb_reply_dmaster().
Signed-off-by: Michael Adam <obnox@samba.org>

(This used to be ctdb commit cb3a1c5af3b796dba30cae07118670d3c9e57df7)
2013-08-19 17:12:32 +02:00
Amitay Isaacs
cb8310ddb6 recoverd: Improve log message when nodes disagree on recmaster
Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 7b7aa7b599536cd60ebb84d363607bb4e953248a)
2013-08-14 16:55:51 +10:00
Amitay Isaacs
ae30b61255 vacuuming: Fix vacuuming bug where requests keep bouncing between nodes (part 2)
This is caused by corruption of a record header such that the records
on two nodes point to each other as dmaster.  This makes a request for
that record bounce between nodes endlessly.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit f0853013655ac3bedf1b793de128fb679c6db6c6)
2013-08-14 16:55:51 +10:00
Amitay Isaacs
ee8d573069 vacuuming: Fix vacuuming bug where requests keep bouncing between nodes (part 1)
This is caused by corruption of a record header such that the records
on two nodes point to each other as dmaster.  This makes a request for
that record bounce between nodes endlessly.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit a610bc351f0754c84c78c27d02f9a695e60c5b0f)
2013-08-14 16:55:51 +10:00
Amitay Isaacs
de6b97ce4f Revert "recoverd: Use correct tdb flags when creating missing databases"
This reverts commit 10a057d8e15c8c18e540598a940d3548c731b0b4.

This approach would not work when creating local databases since currently
there is no control to receive TDB flags for remote databases.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit ca61eb776ab862bd269e45ee0f9f96e7e1e0e001)
2013-08-14 14:15:33 +10:00
Amitay Isaacs
a98baa539e ctdbd: When a record is made sticky, log only once
Instead of logging from ctdb_request_call(), log the message from
ctdb_make_record_sticky().  That way if the record is already sticky, the
message is not repeated unnecessarily.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 44a64d1c388bfe3c3388b191edfaedecfb7bb831)
2013-08-09 11:07:37 +10:00
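The "log only once" behaviour amounts to logging at the point where the state actually changes rather than at every call site. A minimal sketch, assuming a hypothetical `StickyRecords` class and logger name that are not part of ctdb:

```python
import logging

log = logging.getLogger("ctdbd-sketch")  # hypothetical logger name


class StickyRecords:
    """Hypothetical tracker for which keys have been made sticky."""

    def __init__(self):
        self._sticky = set()

    def make_record_sticky(self, key):
        """Mark key as sticky; log and return True only on the first call."""
        if key in self._sticky:
            # Already sticky: nothing changes, so nothing is logged.
            return False
        self._sticky.add(key)
        log.info("record %r made sticky", key)
        return True
```

Callers can invoke `make_record_sticky()` on every request without repeating the message.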
Amitay Isaacs
d42cea6efe ctdbd: Improve high hopcount log messages when request is redirected
Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 9cde47e1a5bf1b9ca3b4da8c2db94caac2b1aa5e)
2013-08-09 11:07:37 +10:00
Amitay Isaacs
ded2f28954 ctdbd: Avoid leaking file descriptor if talloc fails
Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit d7f6bc3fed2dc61e6e587b4c0ec0ac27d533bbbe)
2013-08-09 11:04:55 +10:00
Amitay Isaacs
a030b938ca eventscript: Wait for debug hung script to finish or timeout before continuing
Currently, if the debug hung script takes a long time to finish, the subsequent
monitor event can collide with the previous event, which has not yet finished.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 9e99e0eb072e2b845914ee3896acbc66b96138d7)
2013-08-09 11:04:55 +10:00
Amitay Isaacs
477a51aba5 locking: Do not create multiple lock processes for the same key
If there are multiple lock helper processes waiting for the same record, then
it will cause a thundering herd when that record has been unlocked.  So avoid
scheduling lock contexts for the same record.  This will also mean that
multiple requests will get queued up behind the same lock context and can be
processed quickly once the lock has been obtained.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit ebecc3a18f1cb397a78b56eaf8f752dd5495bcc9)
2013-08-09 11:04:55 +10:00
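The scheduling idea in this commit can be sketched as one pending lock context per key, with later requests queued behind it. The code below is an illustration under assumptions, not the ctdb implementation: `LockScheduler`, `request`, `lock_obtained`, and the `helpers_started` counter (standing in for starting a lock helper process) are all hypothetical names.

```python
class LockScheduler:
    """Hypothetical scheduler that keeps at most one lock context per key."""

    def __init__(self):
        self.contexts = {}        # key -> callbacks waiting for that lock
        self.helpers_started = 0  # stands in for starting a lock helper process

    def request(self, key, callback):
        waiters = self.contexts.get(key)
        if waiters is not None:
            # A context for this key already exists: queue behind it
            # instead of starting another helper.
            waiters.append(callback)
            return
        self.contexts[key] = [callback]
        self.helpers_started += 1

    def lock_obtained(self, key):
        """Run every queued request for key, in arrival order."""
        for callback in self.contexts.pop(key, []):
            callback(key)
```

With this shape, only one waiter per record wakes when the record is unlocked, and the queued requests are then processed together.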
Amitay Isaacs
9ba793a80f locking: Move function find_lock_context() before ctdb_lock_schedule()
So that ctdb_lock_schedule() can call this function without requiring extra
prototype declaration.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 68af5405acc123b5a90decd2123e2a02961a8fcf)
2013-08-09 11:04:42 +10:00
Amitay Isaacs
b77fec9381 ctdbd: Print set db sticky message after it's set
Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 824dcec35ec461d78e22b2ea109473b32bfe3972)
2013-08-01 11:08:26 +10:00
Amitay Isaacs
f15e1a28a7 recoverd: Use correct tdb flags when creating missing databases
When creating missing databases either locally or remotely, make sure
to use the correct tdb flags from other nodes.  Without this, volatile
databases can get attached without TDB_INCOMPATIBLE_HASH flag.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 10a057d8e15c8c18e540598a940d3548c731b0b4)
2013-08-01 11:08:25 +10:00
Amitay Isaacs
5ba280d8ce recoverd: Make sure to use jenkins hash for recovery databases
Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 32c83e209823e9a4d6306bb7fd63d4500f3e2668)
2013-08-01 10:51:14 +10:00
Amitay Isaacs
f1f787ccac recoverd: Assemble up-to-date node flags information from remote nodes
Currently the nodemap used by the recovery master is the one obtained from the
local node.  This information may have been updated while processing the main
loop.  Before comparing node flags on all the nodes, create up-to-date node
flags information based on the information received from all the nodes.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit fcf77dec5af973a0e32f3999bc012053a6f47a96)
2013-07-30 15:34:32 +10:00
Amitay Isaacs
0993387f4a ctdbd: Don't consider a hot record if the hopcount is zero
Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit ab35773518ad15588013f4d859f7bee790437450)
2013-07-30 15:34:32 +10:00
Amitay Isaacs
054d8727ed ctdbd: Fix updating of hot keys in database statistics
Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit fde4b4db5a57f75c5efa5647c309f33e0d5a68f3)
2013-07-29 16:00:46 +10:00
Amitay Isaacs
d8fc36781c ctdbd: Remove incomplete ctdb_db_statistics_wire structure
Instead of maintaining another structure, add an element as a placeholder for
the marshall buffer of hot keys.  This avoids duplication of the structure.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit e73b2e12adc9db1dedb48d32bba3a8406a80f4cd)
2013-07-29 16:00:46 +10:00
Amitay Isaacs
854216236b Revert "ctdbd: Remove incomplete ctdb_db_statistics_wire structure"
The structure cannot be removed without adding support for marshalling keys
for hot records.

This reverts commit 26a4653df594d351ca0dc1bd5f5b2f5b0eb0a9a5.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 023ca2e84f5ed064a288526b9c2bc7e06674dd81)
2013-07-29 16:00:46 +10:00
Martin Schwenke
a5cb72cac3 ctdbd: Kill client process without checking for tracked child
Commit f73a4b1495830bcdd094a93732a89dd53b3c2f78 added a safety check
to ensure that CTDB never kills unrelated processes.  However, client
processes are unrelated.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 782814288bb560099ee44b607bf35f3eddf37f82)
2013-07-29 15:58:51 +10:00
Martin Schwenke
f46ab595d1 recoverd: Call takeover fail callback only once per node
Currently the fail callback is called once per (takeip/releaseip) control
failure.  This is overkill and can get a node banned much too quickly.

Instead, keep track of control failures per node and only call fail
callback once per failed node.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit bf4a7c1ad87e0e848296d15d63eb8cd901ca5335)
2013-07-29 15:48:48 +10:00
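Tracking failures per node, as this commit describes, can be sketched by collecting failed node numbers in a set and invoking the callback once for each. The function name, the `(pnn, ok)` result pairs, and the callback signature below are assumptions for illustration, not the actual ctdb interfaces.

```python
def run_takeover_controls(results, fail_callback):
    """Process (pnn, ok) pairs, one per takeip/releaseip control result.

    fail_callback is called once per failed node, however many
    controls failed on that node.  Returns True if nothing failed.
    """
    failed = set()
    for pnn, ok in results:
        if not ok:
            failed.add(pnn)
    for pnn in sorted(failed):
        fail_callback(pnn)
    return not failed
```

For example, three failed controls on node 1 and one on node 2 produce exactly two callback invocations, so a node is not penalised several times for a single bad run.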
Martin Schwenke
6cbcc4a8d9 ctdbd: Pass event name to hung script debugger
Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit e0f3fa1020e13b84bdd672538168d148f1847d57)
2013-07-23 11:28:07 +10:00
Martin Schwenke
88ba32b787 ctdbd: Sleep at exit to allow time for log messages to flush
Register print_exit_message() earlier so that it covers most of the
early exits.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 90d792cf28d6a823141e4c417b6978f02a9cf596)
2013-07-19 15:40:59 +10:00
Martin Schwenke
84f5528d9b ctdbd: Exit if something is already listening on CTDB socket
Don't blindly remove the socket.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 3dd5b925dcf0e9a5b877638e471c5ecf36b46c58)
2013-07-19 15:40:43 +10:00
Martin Schwenke
a3bef911f3 ctdbd: Allow extra recovery to repair persistent DBs during first recovery
Commit 8076773a9924dcf8aff16f7d96b2b9ac383ecc28 introduced a potential
regression because a node may not have completed the "recovered" event
(so might still be in CTDB_RUNSTATE_FIRST_RECOVERY) when another node
becomes healthy.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 57ef5d3827ea3417a32703e259a53ce6fd10ac45)
2013-07-19 15:35:41 +10:00
Martin Schwenke
ca13f28eef recoverd: Really fix bogus info in message about changed flags
Commit 9119a568c2b4601318f7751f537dca2f92a7230b attempted to fix this.
However, this was wrong because old_flags and new_flags were confused.
The latter has since been fixed in commit
7eb2f89979360b6cc98ca9b17c48310277fa89fc so this can now be fixed
properly.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 40f2825d6e818dc8c745b6385a545969dfb45fbc)
2013-07-11 15:18:06 +10:00
Sumit Bose
157f1cfefd Fixes for various issues found by Coverity
Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 05bfdbbd0d4abdfbcf28e3930086723508b35952)
2013-07-11 15:16:55 +10:00
Sumit Bose
d039f799ac Check return value of tdb_delete()
Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 5cdcc3d45d358ddbcd7e864898eed9cbd9935429)
2013-07-11 15:16:55 +10:00
Martin Schwenke
a86f1f109a recoverd: Recovery daemon should use ctdb_get_pnn, which can't fail
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit c6fded59fa4da67f738a90fdacb51900e41801f9)
2013-07-10 15:19:27 +10:00
Amitay Isaacs
14c49eabe4 ctdbd: Print tdb flags when logging attached to database message
Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 846109169ee5e3d03135156e45c8dac93aa2e95b)
2013-07-10 14:33:19 +10:00
Amitay Isaacs
1c21f37e57 ctdbd: Set process names for child processes
This helps distinguish processes in process list in top, perf, etc.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 2493f57ce268d6fe7e4c40a87852c347fd60d29e)
2013-07-10 14:33:19 +10:00