IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
BUG: https://bugzilla.samba.org/show_bug.cgi?id=11956
In do_recovery, after the recovery and takeover is complete, recoverd
event is triggered. When the parallel database recovery was separated,
ctdb_recovery_helper implemented sending END_RECOVERY control which
causes recoverd event to be triggered. So when there is parallel database
recovery, recoverd event is triggered twice.
Instead move the call to run_recovered_eventscript() explicitly in
the serial recovery code path. This avoids the duplication trigger of
recoverd event.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
(cherry picked from commit ecb74721e78942e66aaf2d2f88f141305e311328)
Autobuild-User(v4-4-test): Karolin Seeger <kseeger@samba.org>
Autobuild-Date(v4-4-test): Thu Jun 9 16:46:53 CEST 2016 on sn-devel-144
If one or more nodes are misbehaving during recovery, keep track of
failures as ban_credits. If the node with the highest ban_credits exceeds
5 ban credits, then tell recovery daemon to assign banning credits.
This will ban only a single node at a time in case of recovery failure.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
Autobuild-User(master): Martin Schwenke <martins@samba.org>
Autobuild-Date(master): Fri Mar 25 06:57:32 CET 2016 on sn-devel-144
(cherry picked from commit c51b8c22349bde6a3280c51ac147cab5ea27b5a6)
The last 3 patches address
BUG: https://bugzilla.samba.org/show_bug.cgi?id=11941
This will be called from recovery helper to assign banning credits to
misbehaving node.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
(cherry picked from commit ae366fb932e9d42fbde5aa48f04d70e15dc36888)
Once DB_PUSH_START is processed as part of recovery, push_started
flag tracks if there are multiple attempts to send DB_PUSH_START.
In DB_PUSH_CONFIRM, once the record count is confirmed, all information
related to DB_PUSH should be reset. However, The push_started flag was
not reset when the push_state was reset.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Wed Jun 8 14:31:52 CEST 2016 on sn-devel-144
(cherry picked from commit c620bf5debd57a4a5d7f893a2b6383098ff7a919)
The last 29 patches address
BUG: https://bugzilla.samba.org/show_bug.cgi?id=11940
The timeout RecoverTimeout (default 120) is used for control messages
sent during the recovery. If any of the nodes does not respond to any
of the recovery control messages for RecoverTimeout seconds, then it
will cause a failure of recovery of a database. Recovery helper will
retry the recovery for a database 5 times.
In the worst case, if a database could not be recovered within 5 attempts,
a total of 600 seconds would have passed. During this time period other
timeouts will be triggered causing unnecessary failures as follows:
1. During the recovery, even though recoverd is processing events,
it does not send a ping message to ctdb daemon. If a ping message is
not received for RecdPingTimeout (default 60) seconds, then ctdb will
count it as unresponsive recovery daemon. If the recovery daemon
fails for RecdFailCount (default 10) times, then ctdb daemon will
restart recovery daemon. So after 600 seconds, ctdb daemon will
restart recovery daemon.
2. If ctdb daemon stays in recovery for RecoveryDropAllIPs (default 120),
then it will drop all the public addresses. This will cause all
SMB client to be disconnected unnecessarily. The released public
addresses will not be taken over till the recovery is complete.
To avoid dropping of IPs and restarting recovery daemon during a delayed
recovery, adjust RecoverTimeout to 30 seconds and limit number of
retries for recovering a database to 3. If we don't hear from a node
for more than 25 seconds, then the node is considered disconnected.
So 30 seconds is sufficient timeout for controls during recovery.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
Autobuild-User(master): Martin Schwenke <martins@samba.org>
Autobuild-Date(master): Mon Jun 6 08:49:15 CEST 2016 on sn-devel-144
(cherry picked from commit 93dcca2a5f7af9698c9ba1024dbce1d1a66d4efb)
... instead of hardcoding number of retries.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
(cherry picked from commit ad7a407a13b87ec13d94a808111d2583bfd1d217)
This abstraction uses capabilities of the remote nodes to either send
older PUSH_DB controls or newer DB_PUSH_START and DB_PUSH_CONFIRM
controls.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
(cherry picked from commit ffea827bae2a8054ad488ae82eedb021cdfb71c4)
This abstraction depending on the capability of the remote node either
uses older PULL_DB control or newer DB_PULL control.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
(cherry picked from commit b96a4759b397d873d56ccdd0c0b26e770cc10b89)
Also, rename traverse function and traverse state for recdb_records
consistently.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
(cherry picked from commit 9058fe06df7f24e9a1b8fd5792a537a1fa8f5a60)
This variable is used to set the dmaster value for each record in
recdb_traverse().
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
(cherry picked from commit 70011a1bfb24d9f4b2a042d353f907ef7c49bc1c)
This will be used to limit the size of record buffer sent in newer
controls for recovery and existing controls for vacuuming.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
(cherry picked from commit c41808e6d292ec4cd861af545478e7dfb5c448e8)
During the recovery process, the timeout value for sending all controls
is decided by RecoverTimeout tunable. So in the recovery process,
first get the tunables, so the control timeout gets set correctly.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
(cherry picked from commit 700f39372ad9c893f8eb8884cd24999dc7026e60)
This matches the behaviour during serial database recovery.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Martin Schwenke <martin@meltin.net>
Autobuild-User(master): Martin Schwenke <martins@samba.org>
Autobuild-Date(master): Thu Feb 11 08:01:14 CET 2016 on sn-devel-144
(cherry picked from commit 19a411f839c5c34fec2aa5e6cb095346be56d94e)
If the node becomes stopped or banned after recovery is marked
active, then it will never freeze the databases, and hence the
node will keep banning itself indefinitely, until ctdbd is restarted.
This is a regression from 4.3, introduced with
b4357a79d916b1f8ade8fa78563fbef0ce670aa9
and
d8f3b490bbb691c9916eed0df5b980c1aef23c85
BUG: https://bugzilla.samba.org/show_bug.cgi?id=11945
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Michael Adam <obnox@samba.org>
Reviewed-by: Martin Schwenke <martin@meltin.net>
Autobuild-User(master): Michael Adam <obnox@samba.org>
Autobuild-Date(master): Wed Jun 1 17:36:12 CEST 2016 on sn-devel-144
(cherry picked from commit f8141e91a693912ea1107a49320e83702a80757a)
Signed-off-by: Michael Adam <obnox@samba.org>
Reviewed-by: Jeremy Allison <jra@samba.org>
(cherry picked from commit 476672b647e44898a6de8894b23e598ad13b1fcf)
This reverts commit 0ff90f4fac74e61192aff100b168e38ce0adfabb.
BUG: https://bugzilla.samba.org/show_bug.cgi?id=11707
The checks against database generation are not required since
the global generation is updated as part of updating vnnmap
before the actual database recovery. This change was done in
5aab31a39a3589b910a78b96071d6aa5e6547696.
Checking only against the database generation is incomplete. It can
cause CTDB to abort if the following sequence of events happen.
- CTDB gets REQ_DMASTER packet (gen1)
This packet processing gets deferred to get a record lock
- CTDB goes into recovery, marks RECOVERY_ACTIVE
CTDB recovery helper updates vnnmap (gen2)
- CTDB processes REQ_DMASTER packet (gen1)
The check against database generation (gen1) succeeds.
The check for lmaster is now invalid because VNNMAP has changed.
This will cause CTDB to abort due to protocol error.
Reverting the patch stops processing packets of older generation before
they get into call processing.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Signed-off-by: Martin Schwenke <martin@meltin.net>
Autobuild-User(master): Martin Schwenke <martins@samba.org>
Autobuild-Date(master): Tue Feb 9 12:39:24 CET 2016 on sn-devel-144
(cherry picked from commit b71c2e42308d23f08e1dd38c9a45ee8f25c65404)
Commit cfa0ffe78073f9e3a014bb127fb9a4b7ad95fceb introduced a memory
leak. Never assume...
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Michael Adam <obnox@samba.org>
The first element of these structures is a 32-bit PNN. On 64-bit
systems this field can be followed by 32-bits of padding. When the
structures are copied this can cause uninitialised memory to be
copied.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Michael Adam <obnox@samba.org>
Thousands of these can be generated each second, rendering INFO level
debugging useless.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Michael Adam <obnox@samba.org>
Shorter temporary variables for compactness/readability. "tmp_ip" is
5 characters longer than "t". In each for statement it is used 4
times, so costs 20 characters. Save those extra characters so that
future edits will avoid going over 80 columns.
Tweak whitespace for readability, rewrap some code.
No functional changes.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
As per the comment:
If the IP address is hosted on this node then remove the connection.
Otherwise this function has been called because the server IP
address has been released to another node and the client has exited.
This means that we should not delete the connection information.
The takeover node processes connections too.
This doesn't matter at the moment, since the empty connection list for
an IP address that has been released will never be pushed to another
node. However, it matters if the connection information is stored in
a real replicated database.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
In a subsequent commit ctdb_takeover_client_destructor_hook() needs to
know the VNN. So just have both callers of
ctdb_remove_tcp_connection() do the lookup and pass in the VNN.
This should cause no change in behaviour.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Tickle list updates are broadcast to all connected nodes and are
accepted even when received on the same node that sent them. This
could actually lead to lost connection information when information
about new connections is received while an update is in-flight.
Instead, return early when the IP is hosted on the current node, since
it is the only one that could have sent the update.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
It hasn't worked since commit cda5f02c7c3491917d831ee23b93278dfaa5c82b
in 2009, which reworked the banning code. Since then
ctdb_control_modflags() has contained a comment saying:
/* we don't let other nodes modify our BANNED status */
Unbanning all nodes originally occurred here when the recovery master
role moved to a new node. The logic could have been meant for the
case when the old recovery master was malfunctioning, so got banned.
If any other nodes had been banned by this recovery master then they
would be unbanned. However, this would also unban the old recovery
master, which is probably suboptimal. The logic would also trigger if
a node was banned for a good reason and then the recovery master was
stopped. So, apart from doing nothing, the logic is too simplistic so
might as well be removed.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
The banning code caters for the case where the node specified in the
bantime data is not the node receiving the control. This never
happens. There are 2 places where ctdb_ctrl_set_ban() is called: the
ctdb CLI tool and the recovery daemon. Both pass the same node in the
bantime data that they are sending the control to. There are no plans
to do anything more elaborate, so just delete the handling of this
special case.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
This can be easily decomposed into 2 separate arrays.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Autobuild-User(master): Amitay Isaacs <amitay@samba.org>
Autobuild-Date(master): Mon Nov 23 05:34:55 CET 2015 on sn-devel-104
This puts all of the memory allocation for ipalloc_state into its init
function. This also simplifies the code because
set_ipflags_internal() can no longer fail because it no longer
allocates memory.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
This is cleaner than returning ipflags and assigning them into
ipalloc_state afterwards.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
Instead of local or passed temporary contexts.
This has the side effect of making ipalloc_state available inside the
modified functions, making future use of ipalloc_state simpler.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
The only likely failure is out of memory, so just return boolean
value.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>