mirror of https://github.com/samba-team/samba.git synced 2025-01-28 17:47:29 +03:00

4844 Commits

Author SHA1 Message Date
Martin Schwenke
e7cc998570 recoverd: Defer ipreallocated requests when takeover runs are disabled
The takeover run will fail anyway but deferring seems like a cleaner
option.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 428f800bcdf3dbfe19de8bb36099fbf01ebeaab4)
2013-09-19 12:54:31 +10:00
Martin Schwenke
2f472b4573 recoverd: Reimplement CTDB_SRVID_DISABLE_IP_CHECK
Use disable_takeover_runs_handler() instead of maintaining duplicate
logic.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 0a51a85915486b2a8fded7ba6444b18c6c1ee8e8)
2013-09-19 12:54:31 +10:00
Martin Schwenke
5f0913d321 recoverd: New SRVID message CTDB_SRVID_DISABLE_TAKEOVER_RUNS
This implements a superset of CTDB_SRVID_DISABLE_IP_CHECK.  It stops
the IP checks but also causes any attempted takeover runs to fail and
be rescheduled.

This is meant to completely stop IP movements.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 00db4de53a0d86013e79e6577e7e6cf3ef864e56)
2013-09-19 12:54:31 +10:00
Martin Schwenke
e79b750e5e tools/ctdb: Add a wait_for_all option to srvid_broadcast()
This will be useful for other SRVIDs.

The error checking in the handler depends on the SRVID responding with
a uint32_t where <0 indicates an error and >=0 is a PNN that
succeeded.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 52050e1c75b21961dafe2bc410268b44240ab24e)
2013-09-19 12:54:31 +10:00
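
The reply convention described in the entry above (a negative value is an error, a non-negative value is the PNN that succeeded) could be handled roughly as follows. This is only a sketch: `reply_tally`, `tally_reply()` and `broadcast_done()` are hypothetical names, not CTDB functions, and it assumes the 32-bit reply is read as a signed value and that wait_for_all means "wait until every expected node has answered".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical tally of SRVID replies; not a CTDB structure. */
struct reply_tally {
    size_t num_ok;     /* replies carrying a PNN (>= 0) */
    size_t num_failed; /* replies carrying a negative error code */
};

/* Assumption: the 32-bit reply payload is interpreted as signed. */
bool reply_is_success(int32_t result)
{
    return result >= 0;
}

/* Record one reply in the tally. */
void tally_reply(struct reply_tally *t, int32_t result)
{
    assert(t != NULL);
    if (reply_is_success(result)) {
        t->num_ok++;
    } else {
        t->num_failed++;
    }
}

/* Assumed wait_for_all semantics: with the flag set, wait for a reply
 * from every expected node; without it, the first reply is enough. */
bool broadcast_done(const struct reply_tally *t, size_t num_nodes,
                    bool wait_for_all)
{
    size_t seen = t->num_ok + t->num_failed;

    return wait_for_all ? (seen >= num_nodes) : (seen >= 1);
}
```

Keeping separate success and failure counts lets a caller decide afterwards whether a partial failure is fatal or worth a retry.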
Martin Schwenke
51db81344e tools/ctdb: Factor out SRVID broadcast code from ipreallocate()
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit a566fb5e70282c4e9f76654b1be4dc80829dced0)
2013-09-19 12:54:30 +10:00
Martin Schwenke
8a6979dac3 tools/ctdb: Change ipreallocate() to use a local done flag
Instead of the current global variable.  This is in anticipation of
abstracting the code.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit c58ee0eddf7ae3283e3ca8bd25575e6e677e1b17)
2013-09-19 12:54:30 +10:00
Martin Schwenke
0ba7e2ce31 recoverd: Factor out the SRVID handling code
The code that handles IP reallocate requests can be reused.

This also changes the result back to a SRVID caller to the PNN on
success or a negative error code on failure.  None of the callers
currently look at the result so this is harmless... but it will be
useful later.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit e4eae6e3291baa299a1d0f733ab11b138ee699a3)
2013-09-19 12:54:30 +10:00
Martin Schwenke
4c3f8dc3bb recoverd: Make the SRVID request structure generic
No need for a separate one for each SRVID.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit d9c22b04d5aa7938a3965bd3144568664eb772ce)
2013-09-19 12:54:30 +10:00
Martin Schwenke
c503997746 recoverd: Move disabling of IP checks into do_takeover_run()
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 48b603fbf16311daa47b01e7a33d477ed51da56d)
2013-09-19 12:54:30 +10:00
Martin Schwenke
bbbb55eef9 recoverd: do_takeover_run() should mark when a takeover run is in progress
Nested takeover runs should never happen so they should fail.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 8ed29c60c0a7dd29f2a6efdf694d38e94281e1c4)
2013-09-19 12:54:30 +10:00
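
The guard described in the entry above, together with the disabling and need_takeover_run behaviour from the neighbouring commits in this log, might look something like the sketch below. It is not the code from do_takeover_run(): `rec_state`, `guarded_takeover_run()` and the two stand-in callbacks are hypothetical, and the real function takes different arguments.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical subset of the recovery daemon state. */
struct rec_state {
    bool takeover_run_in_progress;
    bool takeover_runs_disabled;
    bool need_takeover_run;
};

typedef bool (*takeover_fn)(struct rec_state *rec);

/* Refuse nested or disabled runs; otherwise mark the run as in
 * progress for its duration and record whether a rerun is needed. */
bool guarded_takeover_run(struct rec_state *rec, takeover_fn run)
{
    bool ok;

    assert(rec != NULL && run != NULL);

    if (rec->takeover_run_in_progress || rec->takeover_runs_disabled) {
        rec->need_takeover_run = true; /* reschedule for later */
        return false;
    }

    rec->takeover_run_in_progress = true;
    ok = run(rec);
    rec->takeover_run_in_progress = false;

    /* Cleared on success, set on every failure. */
    rec->need_takeover_run = !ok;
    return ok;
}

/* Stand-in for a takeover run that succeeds. */
bool run_ok(struct rec_state *rec)
{
    (void)rec;
    return true;
}

/* Stand-in that (incorrectly) tries to start a nested run. */
bool run_nested(struct rec_state *rec)
{
    return guarded_takeover_run(rec, run_ok);
}
```

With this shape a nested attempt fails immediately and leaves need_takeover_run set, so the run is retried on a later pass rather than silently lost.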
Martin Schwenke
a1f915f6b5 recoverd: takeover_fail_callback() doesn't need to set rec->need_takeover_run
It is set on every failure anyway.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit e5f94c7857405bdeac233069003c3769b3dc3616)
2013-09-19 12:54:30 +10:00
Martin Schwenke
701c450e90 recoverd: Fail takeover run if "ipreallocated" fails
Previously flagging a failure was probably avoided because of attempts
to run "ipreallocated" events on stopped and banned nodes, which would
fail because they are in recovery.  Given the change to a new control
and that fallback only retries the old method on active nodes, this
should never fail in reasonable circumstances.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 53722430ad35f80935aabd12fa07654126443b8b)
2013-09-19 12:54:30 +10:00
Martin Schwenke
e167e2e7c7 recoverd: New function do_takeover_run()
Factor the calling sequence for ctdb_takeover_run() into a new
function and call it instead.  This changes rec->need_takeover_run to
false for each successful takeover run and that seems to be the right
thing to do.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 9a3f0c0e61ca5c17e020c6e0463d73c7cf4f7c09)
2013-09-19 12:54:30 +10:00
Martin Schwenke
30a50c6e1e recoverd: Stabilise the recovery master role
On rare occasions a node that has been inactive will trigger
an election when it becomes active again.  If that node has been up
for the longest then it will win the election and the recovery master
role will spuriously move.

While a node remains inactive we reset the priority time to discourage
it from winning elections.  The priority time will now reflect roughly
how long the node has been active rather than how long it has been up.
That means the most stable node is more likely to win elections.

Having a stable recovery master means that disabling takeover runs
while reloading IPs is more likely to succeed.  It also improves the
chances of being able to cache information in the recovery master -
for example, between takeover runs.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit f0f48f22f45e4c82eba2582efae307e25385de81)
2013-09-19 12:54:29 +10:00
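
The priority-time idea in the entry above can be illustrated with a small sketch. The names `election_node`, `refresh_priority_time()` and `beats()` are hypothetical, the tie-break on PNN is an arbitrary assumption, and the real election compares more fields than this.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical per-node election data. */
struct election_node {
    unsigned int pnn;
    bool active;
    time_t priority_time; /* earlier means higher priority */
};

/* Called periodically.  While a node is inactive its priority time
 * keeps being reset to "now", so it reflects roughly how long the
 * node has been active rather than how long it has been up. */
void refresh_priority_time(struct election_node *n, time_t now)
{
    assert(n != NULL);
    if (!n->active) {
        n->priority_time = now;
    }
}

/* True if a should win against b: the earlier priority time wins.
 * Breaking ties on the lower PNN is an assumption for this sketch. */
bool beats(const struct election_node *a, const struct election_node *b)
{
    if (a->priority_time != b->priority_time) {
        return a->priority_time < b->priority_time;
    }
    return a->pnn < b->pnn;
}
```

A node that was up early but has been inactive therefore loses to a node that has been continuously active, which is what keeps the recovery master role from moving.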
Martin Schwenke
630196423a recoverd: Banned nodes should not be told to run "ipreallocated" event
They will reject it because they are in recovery.  This can result in
extra banning credits being applied to banned nodes.

This corresponds to commit 9132e6814ed927fa317f333f03dedb18f75d0e5b
from the 1.2.40 branch.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 403938804caf1322f9773d63197e4303a7b2a788)
2013-09-18 17:16:35 +10:00
Martin Schwenke
d30e269ecc common: Make parse_ip() valgrind-clean
Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit c0bb147ca09e82019b05ec22995623cffc3184e2)
2013-09-11 15:35:38 +10:00
Martin Schwenke
8d11da3546 recoverd: Remove an orphaned comment
This should have been removed with the associated code in commit
14bd0b6961ef1294e9cba74ce875386b7dfbf446.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 36de63843de10a1f2a9ccdbbee24cc1d08542984)
2013-09-11 15:35:16 +10:00
Martin Schwenke
4e62553fcb recoverd: Update a comment to use current terminology
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit ea5576071b22e1877903ec0921d375626a23e13b)
2013-09-11 15:35:10 +10:00
Martin Schwenke
fe7f66547b client: Remove unused function list_of_active_nodes_except_pnn()
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit d8a76cf79f07dfb5a93c6c9a13f16e3268c7dd57)
2013-09-11 15:35:03 +10:00
Martin Schwenke
c870f01160 tools/ctdb: list_of_active_nodes_except_pnn() -> list_of_nodes()
list_of_active_nodes_except_pnn() is only used here and can be removed
if we remove this call.  Less is more...

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit d4e206fb818048b7fab4797c877b854bdbb1ab70)
2013-09-11 15:34:58 +10:00
Martin Schwenke
2d31ec2131 tools/ctdb: Fix a memory leak in parse_nodestring()
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 8753a094b97340deb26dd44f6ea345ca0a642a95)
2013-09-11 15:34:51 +10:00
Martin Schwenke
e003699686 tests/eventscripts: Tests for memory checking in 00.ctdb
... plus updates to test infrastructure to support.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 4a388fc6bf54636b7e1f6da8e6aa451cddd574f7)
2013-09-11 15:34:42 +10:00
Martin Schwenke
b88bf1275c eventscripts: Clean up monitoring of system memory in 00.ctdb
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 16fcff0d1993b7a0479341862ea44d10bd5c6d6d)
2013-09-11 15:34:30 +10:00
Michael Adam
18f17aaa33 server: standardize formatting of comment block for ctdb_reply_dmaster() while I'm at it..
This was the comment block I was touching and meant to adapt in
commit 00d3bf092e2f72eda330978c75ec85f17e870553.
My search was apparently not unique...

Signed-off-by: Michael Adam <obnox@samba.org>

(This used to be ctdb commit 09940255011b119dc6af3304f5d3e9568e6006fd)
2013-08-26 13:24:32 +02:00
Martin Schwenke
128e2cb29d doc: Update NEWS
Signed-off-by: Martin Schwenke <martin@meltin.net>
Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit c446579fc442955ecc74f5566eaa0635c3171498)
2013-08-22 18:07:49 +10:00
Amitay Isaacs
7531b9528f build: Fix build dependencies for ctdb_lock_tdb
Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit eb8575718400c45626cd1b2e0fd247bc3ebff655)
2013-08-22 17:59:59 +10:00
Martin Schwenke
1c3f4f55b0 tests/simple: Minimise the chance of a monitor event being cancelled
A monitor event following a "ctdb delip" might reconfigure services.
If the monitor event is cancelled then a service might be stopped but
not yet restarted and this could result in the subsequent monitor
events failing.

This obviously needs to be fixed in CTDB itself.  This will happen by
making "ctdb reloadips" the supported way of reconfiguring IPs.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 618ea3660e36e7bd92b686e1ca8728cf63c3c068)
2013-08-22 17:00:20 +10:00
Martin Schwenke
aecd66d0a0 packaging: Remove pushd/popd from maketarball.sh, don't need bash
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 3ffca990a18cbd31c8bd3ae01c6671d60da58f58)
2013-08-22 17:00:20 +10:00
Martin Schwenke
a04fb43708 tools/ctdb_diagnostics: Add output of "ctdb getdbmap"
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit f0d69a9079b7aecc68f1d2d8510702046b618b19)
2013-08-22 17:00:20 +10:00
Martin Schwenke
6c468c94a2 tools/ctdb_diagnostics: Safer temporary file creation
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 406e1cb1fdd17ddd239774d0228e3657b73ae68f)
2013-08-22 17:00:20 +10:00
Martin Schwenke
cc74417341 eventscripts: Avoid using a temporary file in 62.cnfs
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 81833052d7ee8f76b1e98376a0273448640cfa8e)
2013-08-22 17:00:20 +10:00
Martin Schwenke
bb974f150b scripts: Remove gdb_backtrace
This uses potentially insecure temporary files and is not referenced
anywhere else.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 4b914d7e217202f3d11a8e95f9f74bc17869475b)
2013-08-22 17:00:20 +10:00
Martin Schwenke
d1918ba27a tools/ctdb: Make most non-auto-all commands abort if run with -n all
Or if run with -n A,B,...

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit b1d8732b5da18ae80aea1df0e66b0b5cdcd919bc)
2013-08-22 17:00:20 +10:00
Martin Schwenke
fd79a86d8f tools/ctdb: Remove more non-essential fetching of PNN from daemon
The useful cases are either CTDB_CURRENT_NODE, in which case
ctdb_get_pnn() does the job, or a PNN, which is... ummm... a PNN!  :-)

This works because parse_nodestring() validates PNNs.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 7b3f7eea2465efb099a2faf3e42174bc97b13a16)
2013-08-22 17:00:20 +10:00
Martin Schwenke
3402ae9ffb tools/ctdb: Improve auto-all settings for some commands
* ipreallocate is cluster-wide so should not be auto-all

* enablescript, disablescript, getreclock, setreclock, natgwlist can
  all be auto-all without issues

* xpnn, ipiface are local-only so don't work with -n, so might as well
  not be auto-all

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 123a4677528cb46bee1c6dad8a5162eba9880bc1)
2013-08-22 17:00:20 +10:00
Martin Schwenke
3afcc53516 recoverd: Remove an unused temporary talloc context
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit da22d5e60dc023009854025cc9e6bc4b0a84c60e)
2013-08-22 17:00:20 +10:00
Martin Schwenke
1ae731198a recoverd: Move struct ctdb_public_ip_list back into ctdb_takeover.c
This is an internal structure.  It was moved into ctdb_private.h a
long time ago to allow unit testing.  Unit test compilation was
changed shortly afterwards to make this unnecessary.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit db57261d7dc264e161659a8c547f44fbd9e88eeb)
2013-08-22 17:00:20 +10:00
Martin Schwenke
e657f75484 recoverd: Log more information when interfaces change
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 3ef93a1a3e60cdf5d8954e7a16a988ea6126916b)
2013-08-22 17:00:20 +10:00
Amitay Isaacs
58e96eb178 traverse: Log when database traverse is started
Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 256b157232c60bc432c94e54b1fae9699f737557)
2013-08-22 17:00:19 +10:00
Amitay Isaacs
e850a6d2ca ctdbd: Finish eventscript callback processing before debugging hung script
This ensures that the result of eventscripts is updated and callback is
processed before debugging hung script.  So "ctdb scriptstatus" output
will be useful from debug hung script.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Pair-Programmed-With: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 4ed2efb838d2ac97746666f614ebef5fdf3cdd5e)
2013-08-22 17:00:19 +10:00
Amitay Isaacs
19444f7c3d ctdbd: Make sure call data is freed if doing an early return
This should avoid memory bloat when a request bounces between nodes.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 7677fb263f06a97398e2c546e32273fb96edca69)
2013-08-22 16:59:49 +10:00
Amitay Isaacs
a61a4b1254 common/io: Limit the queue buffer size for fair scheduling via tevent
If we process all the data available in a socket buffer, CTDB can stay busy
processing lots of packets via immediate event mechanism in tevent.  After
processing an immediate event, tevent returns without epoll_wait.  So as long
as there are immediate events, tevent will never poll other FDs.  CTDB will
report this as "Event handling took xx seconds" warning.  This is misleading
since CTDB is very busy processing packets, but never gets to the point of
polling FDs.

The improvement in socket handling made it worse when handling traverse
control.  There were lots of packets filled in the socket buffer quickly and
CTDB stayed busy processing those packets and not polling other FDs and timer
events.  This can lead to controls timing out and, in the worst case, other
nodes marking the busy node as disconnected.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 92939c1178d04116d842708bc2d6a9c2950e36cc)
2013-08-22 14:08:52 +10:00
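
One way to picture the limit described in the entry above is a cap on how much is pulled from the socket per readable event, so that only a bounded number of packets can be queued as immediate events before the event loop polls other FDs again. This is a sketch only: `QUEUE_BUFFER_CAP` and `bytes_to_read()` are hypothetical, and the actual limit and mechanism used in CTDB are not shown in this log.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical cap on buffered data; the real value is not stated here. */
#define QUEUE_BUFFER_CAP 1024

/* Decide how many bytes to read on one readable event, given how much
 * the socket has available and how much is already buffered.  Anything
 * beyond the cap stays in the socket until the next event. */
size_t bytes_to_read(size_t available, size_t already_buffered)
{
    size_t room;

    assert(already_buffered <= QUEUE_BUFFER_CAP);

    room = QUEUE_BUFFER_CAP - already_buffered;
    return available < room ? available : room;
}
```

Leaving the remainder in the socket means the next read happens only after the event loop has had a chance to service timers and other descriptors.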
Amitay Isaacs
cfb7f74fa2 Revert "common/io: Keep queue buffer size multiple of 4K"
This reverts commit 5e9b1a7e24d058ff88aaa0563db36a804e866fa9.

This is not the best approach.  Allowing queue buffer size to grow
indefinitely causes large number of CTDB packets to be queued up very
quickly which when processed via immediate events will block CTDB from
processing events from other FDs.  If there are immediate events queued
up, tevent will never process any of the FDs till all immediate events
are processed.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit d8b094e804efc53fae9f44c6ef961b7b5797d290)
2013-08-22 14:08:52 +10:00
Amitay Isaacs
1467b666f2 Revert "LACOUNT: Add back lacount mechanism to defer migrating a fetched/read copy until after default of 20 consecutive requests from the same node"
This reverts commit 035c0d981bde8c0eee8b3f24ba8e2dc817e5b504.

This is a premature optimization.  Record can bounce between nodes
very quickly if it is a contended record.  There is no need to hold a
record on a node unnecessarily.  In case record contention becomes bad,
enabling sticky records on a database is a better idea.

Conflicts:
	include/ctdb_private.h
	server/ctdb_tunables.c

Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit ac417b0003f0116f116834ad2ac51482d25cfa0d)
2013-08-22 14:08:52 +10:00
Amitay Isaacs
59dae19f5a ctdbd: Print a log message when a key becomes hot
Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 48f40985f4592c28402303ccbb458756f4914f75)
2013-08-22 14:08:52 +10:00
Amitay Isaacs
27fd34e9ff ctdbd: For volatile databases, write an empty record with rsn=0 only on dmaster
An empty record with rsn=0 should not be written on any node other than
dmaster.  This is however not true for persistent databases.  So currently
apply the check only for volatile databases.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit df83ae7a047dab4803e0d94b1c11df48ae17ca96)
2013-08-22 14:08:52 +10:00
Martin Schwenke
73da6c0201 tools/ctdb: Fix message in showban when node is banned
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 5cdad2b8ebd71a5e458c301d00eac00a211feeb3)
2013-08-21 14:02:36 +10:00
Martin Schwenke
b74c232b8a tools/ctdb: Reimplement ban/unban using update_flags_wait_and_ipreallocate()
This has the side effect of making these commands more resilient to
control timeouts.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 0fe79662e20e347d9e1cb12a42cd356e33572402)
2013-08-21 14:02:36 +10:00
Martin Schwenke
b42b0e4676 tools/ctdb: Factor out common pattern used in disable/enable/stop/continue
Now we will only have one set of bugs.  :-)

Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 444521c852749558f39dc6131acce9e47eefd489)
2013-08-21 14:02:36 +10:00
Martin Schwenke
f72f4c362b tools/ctdb: Factor, simplify and improve robustness of ipreallocate code
Having other functions call control_ipreallocate() suggests that it
might look at the argc/argv arguments that are passed.  This is not
the case.  Change the callers so they call the new ipreallocate()
function instead.

Broadcast CTDB_SRVID_TAKEOVER_RUN to all connected nodes.  Inactive
nodes will ignore it.  This is safe since we only want 1 reply.  If we
didn't get a response, we don't actually care if there's no active
recovery master - just fire, wait, retry, ...

Ignore some failures on the basis that they might be transient, so it
is probably worth retrying.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 4bf0b1c9d21986eecb7682f935bd6154c65533cc)
2013-08-21 14:02:36 +10:00
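
The "fire, wait, retry" approach in the entry above could be sketched as a bounded retry loop. The names `fire_wait_retry()`, `attempt_fn` and `reply_after_n()` are hypothetical, and the attempt limit is an assumption made so the sketch terminates; the log does not say whether the real code bounds its retries.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* One attempt: broadcast, wait for a reply, report whether one came.
 * A failed attempt is treated as possibly transient. */
typedef bool (*attempt_fn)(void *private_data);

/* Retry until an attempt succeeds or max_attempts is reached.
 * Optionally reports how many attempts were made. */
bool fire_wait_retry(attempt_fn attempt, void *private_data,
                     unsigned int max_attempts,
                     unsigned int *attempts_used)
{
    unsigned int i;

    assert(attempt != NULL);

    for (i = 1; i <= max_attempts; i++) {
        if (attempt(private_data)) {
            if (attempts_used != NULL) {
                *attempts_used = i;
            }
            return true;
        }
    }

    if (attempts_used != NULL) {
        *attempts_used = max_attempts;
    }
    return false;
}

/* Stand-in attempt: fails until the counter reaches zero. */
bool reply_after_n(void *private_data)
{
    unsigned int *remaining = private_data;

    if (*remaining == 0) {
        return true;
    }
    (*remaining)--;
    return false;
}
```

Because only one reply is wanted, the caller does not need to know which node is the recovery master before sending; it simply repeats the broadcast until something answers.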