1
0
mirror of https://github.com/samba-team/samba.git synced 2025-03-09 08:58:35 +03:00

4765 Commits

Author SHA1 Message Date
Martin Schwenke
6c468c94a2 tools/ctdb_diagnostics: Safer temporary file creation
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 406e1cb1fdd17ddd239774d0228e3657b73ae68f)
2013-08-22 17:00:20 +10:00
Martin Schwenke
cc74417341 eventscripts: Avoid using a temporary file in 62.cnfs
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 81833052d7ee8f76b1e98376a0273448640cfa8e)
2013-08-22 17:00:20 +10:00
Martin Schwenke
bb974f150b scripts: Remove gdb_backtrace
This uses potentially insecure temporary files and is not referenced
anywhere else.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 4b914d7e217202f3d11a8e95f9f74bc17869475b)
2013-08-22 17:00:20 +10:00
Martin Schwenke
d1918ba27a tools/ctdb: Make most non-auto-all commands abort if run with -n all
Or if run with -n A,B,...

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit b1d8732b5da18ae80aea1df0e66b0b5cdcd919bc)
2013-08-22 17:00:20 +10:00
Martin Schwenke
fd79a86d8f tools/ctdb: Remove more non-essential fetching of PNN from daemon
The useful cases are either CTDB_CURRENT_NODE, in which case
ctdb_get_pnn() does the job, or a PNN, which is... ummm... a PNN!  :-)

This works because parse_nodestring() validates PNNs.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 7b3f7eea2465efb099a2faf3e42174bc97b13a16)
2013-08-22 17:00:20 +10:00
Martin Schwenke
3402ae9ffb tools/ctdb: Improve auto-all settings for some commands
* ipreallocate is cluster-wide so should not be auto-all

* enablescript, disablescript, getreclock, setreclock, natgwlist can
  all be auto-all without issues

* xpnn, ipiface a local-only so don't work with -n, so might as well
  not be auto-all

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 123a4677528cb46bee1c6dad8a5162eba9880bc1)
2013-08-22 17:00:20 +10:00
Martin Schwenke
3afcc53516 recoverd: Remove an unused temporary talloc context
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit da22d5e60dc023009854025cc9e6bc4b0a84c60e)
2013-08-22 17:00:20 +10:00
Martin Schwenke
1ae731198a recoverd: Move struct ctdb_public_ip_list back into ctdb_takeover.c
This is an internal structure.  It was moved into ctdb_private.h a
long time ago to allow unit testing.  Unit test compilation was
changed shortly afterwards to make this unnecessary.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit db57261d7dc264e161659a8c547f44fbd9e88eeb)
2013-08-22 17:00:20 +10:00
Martin Schwenke
e657f75484 recoverd: Log more information when interfaces change
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 3ef93a1a3e60cdf5d8954e7a16a988ea6126916b)
2013-08-22 17:00:20 +10:00
Amitay Isaacs
58e96eb178 traverse: Log when database traverse is started
Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 256b157232c60bc432c94e54b1fae9699f737557)
2013-08-22 17:00:19 +10:00
Amitay Isaacs
e850a6d2ca ctdbd: Finish eventscript callback processing before debugging hung script
This ensures that the result of eventscripts is updated and callback is
processed before debugging hung script.  So "ctdb scriptstatus" output
will be useful from debug hung script.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Pair-Programmed-With: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 4ed2efb838d2ac97746666f614ebef5fdf3cdd5e)
2013-08-22 17:00:19 +10:00
Amitay Isaacs
19444f7c3d ctdbd: Make sure call data is freed if doing an early return
This should avoid memory bloat when a request bounces between nodes.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 7677fb263f06a97398e2c546e32273fb96edca69)
2013-08-22 16:59:49 +10:00
Amitay Isaacs
a61a4b1254 common/io: Limit the queue buffer size for fair scheduling via tevent
If we process all the data available in a socket buffer, CTDB can stay busy
processing lots of packets via immediate event mechanism in tevent.  After
processing an immediate event, tevent returns without epoll_wait.  So as long
as there are immediate events, tevent will never poll other FDs.  CTDB will
report this as "Event handling took xx seconds" warning.  This is misleading
since CTDB is very busy processing packets, but never gets to the point of
polling FDs.

The improvement in socket handling made it worse when handling traverse
control.  There were lots of packets filled in the socket buffer quickly and
CTDB stayed busy processing those packets and not polling other FDs and timer
events.  This can lead to controls timing out and in worse case other nodes
marking busy node as disconnected.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 92939c1178d04116d842708bc2d6a9c2950e36cc)
2013-08-22 14:08:52 +10:00
Amitay Isaacs
cfb7f74fa2 Revert "common/io: Keep queue buffer size multiple of 4K"
This reverts commit 5e9b1a7e24d058ff88aaa0563db36a804e866fa9.

This is not the best approach.  Allowing queue buffer size to grow
indefinitely causes large number of CTDB packets to be queued up very
quickly which when processed via immediate events will block CTDB from
processing events from other FDs.  If there are immediate events queued
up, tevent will never process any of the FDs till all immediate events
are processed.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit d8b094e804efc53fae9f44c6ef961b7b5797d290)
2013-08-22 14:08:52 +10:00
Amitay Isaacs
1467b666f2 Revert "LACOUNT: Add back lacount mechanism to defer migrating a fetched/read copy until after default of 20 consecutive requests from the same node"
This reverts commit 035c0d981bde8c0eee8b3f24ba8e2dc817e5b504.

This is a premature optimization.  Record can bounce between nodes
very quickly if it is a contended record.  There is no need to hold a
record on a node unnecessarily.  In case record contention becomes bad,
enabling sticky records on a database is a better idea.

Conflicts:
	include/ctdb_private.h
	server/ctdb_tunables.c

Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit ac417b0003f0116f116834ad2ac51482d25cfa0d)
2013-08-22 14:08:52 +10:00
Amitay Isaacs
59dae19f5a ctdbd: Print a log message when a key becomes hot
Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 48f40985f4592c28402303ccbb458756f4914f75)
2013-08-22 14:08:52 +10:00
Amitay Isaacs
27fd34e9ff ctdbd: For volatile databases, write an empty record with rsn=0 only on dmaster
Empty record with rsn=0 should not be written on any other node other than
dmaster.  This is however not true for persistent databases.  So currently
apply the check only for volatile databases.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit df83ae7a047dab4803e0d94b1c11df48ae17ca96)
2013-08-22 14:08:52 +10:00
Martin Schwenke
73da6c0201 tools/ctdb: Fix message in showban when node is banned
Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 5cdad2b8ebd71a5e458c301d00eac00a211feeb3)
2013-08-21 14:02:36 +10:00
Martin Schwenke
b74c232b8a tools/ctdb: Reimplement ban/unban using update_flags_wait_and_ipreallocate()
This has the side effect of making these commands more resilient to
control timeouts.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 0fe79662e20e347d9e1cb12a42cd356e33572402)
2013-08-21 14:02:36 +10:00
Martin Schwenke
b42b0e4676 tools/ctdb: Factor out common pattern used in disable/enable/stop/continue
Now we will only have one set of bugs.  :-)

Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 444521c852749558f39dc6131acce9e47eefd489)
2013-08-21 14:02:36 +10:00
Martin Schwenke
f72f4c362b tools/ctdb: Factor, simplify and improve robustness of ipreallocate code
Having other functions call control_ipreallocate() suggests that the
it might look at the argv/argv arguments that are passed.  This is not
the case.  Change the callers so they call the new ipreallocate()
function instead.

Broadcast CTDB_SRVID_TAKEOVER_RUN to all connected nodes.  Inactive
nodes will ignore it.  This is safe since we only want 1 reply.  If we
didn't get a response, we don't actually care if there's no active
recovery master - just fire, wait, retry, ...

Ignore some failures on the basis that they might be transient, so it
is probably worth retrying.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 4bf0b1c9d21986eecb7682f935bd6154c65533cc)
2013-08-21 14:02:36 +10:00
Martin Schwenke
db121b4c8f tools/ctdb: Use ctdb_get_pnn() to get PNN of the current node
This has already been stored at connect time and can't fail.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit d8eb2e7fdd7645719370dad4f2faa5c3fffa8249)
2013-08-21 14:02:36 +10:00
Michael Adam
aa1360aeb2 util: In passing the code, fix a space vs. tab in set_close_on_exec().
Signed-off-by: Michael Adam <obnox@samba.org>

(This used to be ctdb commit f9556a6f1fe0046308c8b363e6dcaf3f7ce6f2b7)
2013-08-19 17:12:33 +02:00
Michael Adam
621bfe8b0d server: standardize formatting of comment block for ctdb_reply_dmaster() while I'm at it..
Signed-off-by: Michael Adam <obnox@samba.org>

(This used to be ctdb commit 00d3bf092e2f72eda330978c75ec85f17e870553)
2013-08-19 17:12:33 +02:00
Michael Adam
922246de73 server: fix wording and punctuation in comment block for ctdb_reply_dmaster().
Signed-off-by: Michael Adam <obnox@samba.org>

(This used to be ctdb commit cb3a1c5af3b796dba30cae07118670d3c9e57df7)
2013-08-19 17:12:32 +02:00
Amitay Isaacs
cb8310ddb6 recoverd: Improve log message when nodes disagree on recmaster
Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 7b7aa7b599536cd60ebb84d363607bb4e953248a)
2013-08-14 16:55:51 +10:00
Amitay Isaacs
3c0a477911 common: Null terminate process name string so valgrind doesn't complain
Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 1c9025fdd08d1cea342af7487d0123015e08831b)
2013-08-14 16:55:51 +10:00
Amitay Isaacs
ae30b61255 vacuuming: Fix vacuuming bug where requests keep bouncing between nodes (part 2)
This is caused by corruption of a record header such that the records
on two nodes point to each other as dmaster.  This makes a request for
that record bounce between nodes endlessly.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit f0853013655ac3bedf1b793de128fb679c6db6c6)
2013-08-14 16:55:51 +10:00
Amitay Isaacs
ee8d573069 vacuuming: Fix vacuuming bug where requests keep bouncing between nodes (part 1)
This is caused by corruption of a record header such that the records
on two nodes point to each other as dmaster.  This makes a request for
that record bounce between nodes endlessly.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit a610bc351f0754c84c78c27d02f9a695e60c5b0f)
2013-08-14 16:55:51 +10:00
Amitay Isaacs
f9be4803cb db_wrap: Make sure tdb messages are logged correctly
Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 60cb40d090e45ff6134c098a238fac7ad854f134)
2013-08-14 16:55:51 +10:00
Martin Schwenke
fec69034ee eventscripts: Become unhealthy faster on nfsd failure
Anecdotal evidence suggests that most nfsd RPC check failures are due
to cluster filesystem or storage problem.  Apparently these are rarely
helped by attempting to restart the NFS service because the restart
tends to hang.

Fail after 2 nfsd RPC check failures, instead of waiting for 6
failures.  Restart on every 10th failure to try to bring the node back
to good health.

Update unit tests to match.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit e9ef93f7b6dad59eabaa32124df81f3e74c651ef)
2013-08-14 16:10:30 +10:00
Martin Schwenke
4cb3e2cd78 tools/ctdb: Increase default control timeout to 10 seconds
The current 3 second timeout is arbitrary and users trip over it
sometimes.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit b49c4f39666d5b1596213bf41bcdc47ed3c327ae)
2013-08-14 15:57:04 +10:00
Martin Schwenke
e6ce2f55ef eventscripts: Improve message logged when a counter hits a limit
It should print the actual number of consecutive failures rather than
the limit.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit ff5f0d1e29af2b293e30cdc54bed03a644be7038)
2013-08-14 15:57:04 +10:00
Martin Schwenke
35d9631eda eventscripts: Print a message when waiting for TCP connections to be killed
This makes the gaps in the logs more obvious.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 11fbf4789d783dd0bac22754b374dd9ea4b03bad)
2013-08-14 15:57:04 +10:00
Martin Schwenke
b1f7337d2b eventscripts: New configuration variable $CTDB_RPCINFO_LOCALHOST
Passing "localhost" to the rpcinfo command causes overheads, like
reading /etc/services multiple times.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 1d61988af9e4fa3621a3e2d06a859bcb53df2d67)
2013-08-14 15:57:04 +10:00
Martin Schwenke
0ca046577f eventscripts: Add modulo (%) operator to ctdb_check_counter()
Also add it to the corresponding eventscript unit test infrastructure.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit f4ef83a256f59eeb00b9a5bc10c28347e1ad1031)
2013-08-14 15:57:03 +10:00
Martin Schwenke
bdbe37b24f eventscripts: Separate out RPC service restart code
While doing this:

* Explicitly assign RPC program and version information in
  _nfs_check_rpc_common().  This is more lines of code but is easier
  to read.

* Don't print the options when starting a service.  Trying to print it
  makes the code messy for little benefit.

  Update the eventscript unit testing code and a Ganesha test to
  reflect this.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit e8b531405665885196c95fe1608db33a255bf761)
2013-08-14 15:57:03 +10:00
Martin Schwenke
2afb5632c7 tests/eventscripts: Override background_with_logging(), just prepend "&"
That is, output that goes through background_with_logging() just gets
"&" prepended to each line.  This is cleaner than having the tests
grovel through logs.

Update some 49.winbind/50.samba tests to deal with this.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 3ba933d806106d12bc48b83b22d0f314d9d1e5e5)
2013-08-14 15:57:03 +10:00
Martin Schwenke
df539a66cb eventscripts: Remove support for RPC service 'q' and 's' restart flags
They're hard to maintain and provide very little benefit.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 1a1be43f8466d46913dcdfe6dcedb94316cd28ad)
2013-08-14 15:57:03 +10:00
Martin Schwenke
5459cdc8a6 eventscripts: When restarting the nfslock service only show output of start
That is, /dev/null the "stop" output.  This is consistent with the way
CTDB generally deals with the output when stopping a service.

It also makes updating the eventscript unit tests easier.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit c7332526b1b488abefeb4be78a7cd3f2f9abc451)
2013-08-14 15:57:03 +10:00
Martin Schwenke
d63cf0e7a7 tests/simple: Unreachable node test should wait for recovery to complete
This should minimise the chances of a control timing out.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 63be516673c5d9c0d543617bf1bb8bca919956a8)
2013-08-14 15:57:03 +10:00
Martin Schwenke
0997b0c400 tests/simple: Fix the missing IP test
Update the missing IP test to wait until restarts are complete.
Otherwise a service restart can collide with the following monitor
event and cause chaos.

Also, do not disable 10.interface until it matters.  Disabling it too
early can cause even more chaos if something goes wrong with the
monitor step.

Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 4e3bd06916bd3adac213fb18c7c2a24854b02d45)
2013-08-14 15:57:03 +10:00
Amitay Isaacs
8f1e94dfa4 recoverd: Use TDB_INCOMPATIBLE_HASH when creating volatile databases
When creating missing databases either locally or remotely, recovery
master calls ctdb_ctrl_createdb().  Recovery master always passes 0
for tdb_flags.  For volatile databases, if TDB_INCOMPATIBLE_HASH is not
specified, then they will be attached without using jenkins hash causing
database corruption.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 2fc6b6403707a292d134140fc0b9145b454992c5)
2013-08-14 15:54:48 +10:00
Amitay Isaacs
de6b97ce4f Revert "recoverd: Use correct tdb flags when creating missing databases"
This reverts commit 10a057d8e15c8c18e540598a940d3548c731b0b4.

This approach would not work when creating local databases since currently
there is no control to receive TDB flags for remote databases.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit ca61eb776ab862bd269e45ee0f9f96e7e1e0e001)
2013-08-14 14:15:33 +10:00
Amitay Isaacs
d349b56e2d common/io: Keep queue buffer size multiple of 4K
Currently queue buffer size is realloc'd every time we need to extend the
buffer.  Small increments can cause memory fragmentation.  Instead always
extend buffer in multiples of 4K.  This should reduce multiple talloc_realloc
calls when there are lots of packets in the socket buffer.

Also, if queue buffer has grown larger than 64K, throw away the buffer once
all the requests in the queue have been processed.  That way queue does not
hold on to large buffers.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 5e9b1a7e24d058ff88aaa0563db36a804e866fa9)
2013-08-09 11:07:37 +10:00
Martin Schwenke
6f9090648a packaging: Allow setting custom release number in RPM spec file
Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-Programmed-With: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 867afb247bd8cc86c8d738f051a44cc534cafacf)
2013-08-09 11:07:37 +10:00
Amitay Isaacs
a98baa539e ctdbd: When a record is made sticky, log only once
Instead of logging from ctdb_request_call(), log the message from
ctdb_make_record_sticky().  That way if the record is already sticky, the
message is not repeated unnecessarily.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 44a64d1c388bfe3c3388b191edfaedecfb7bb831)
2013-08-09 11:07:37 +10:00
Amitay Isaacs
d42cea6efe ctdbd: Improve high hopcount log messages when request is redirected
Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 9cde47e1a5bf1b9ca3b4da8c2db94caac2b1aa5e)
2013-08-09 11:07:37 +10:00
Martin Schwenke
98163e01a9 scripts: Do not run ctdb tool commands when debugging hung "init" event
CTDB daemon is not ready to accept clients in INIT runstate (init event).
CTDB daemon will start accepting connections in SETUP runstate (setup event)
and later.

Also, minor log formatting changes.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 81d7ce03b28d592a1337639e14d9ea141e20bfff)
2013-08-09 11:04:55 +10:00
Amitay Isaacs
ded2f28954 ctdbd: Avoid leaking file descriptor if talloc fails
Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit d7f6bc3fed2dc61e6e587b4c0ec0ac27d533bbbe)
2013-08-09 11:04:55 +10:00