A stopped or banned node cannot do anything useful, so it should not
participate in any cluster activity or cause any unnecessary network traffic.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit 2396981c4bcf30530aeb7f4395093cc202105b50)
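The rule in the commit above (a stopped or banned node takes no part in
cluster activity) can be sketched as a single flag test. This is only an
illustration: the flag names and bit values below are hypothetical and are
not taken from the CTDB headers.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag bits for this sketch; not the real CTDB values. */
#define SKETCH_NODE_BANNED  0x1u
#define SKETCH_NODE_STOPPED 0x2u

/* Return true when the node may take part in cluster activity,
 * i.e. when it is neither banned nor stopped. */
static bool sketch_node_is_active(uint32_t flags)
{
    return (flags & (SKETCH_NODE_BANNED | SKETCH_NODE_STOPPED)) == 0;
}
```

A caller would check this before starting any periodic cluster work and
return early when it is false.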
If the current node is banned or stopped, then it should not assign banning
credits to other nodes since the current node will not have up-to-date flags
of other nodes.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit 38304f88e0c634e97d4687c25adef975f71537b8)
If the banned pnn is not the local node, the function returns early.
So there is no need for an additional check.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit 297d93cecc3c0655e72ecac38508e113bdbeab9c)
When this function is called, we are already committed to banning
and there is no point in failing this function. If freezing of the
databases fails, it will be fixed by the recovery daemon.
(This used to be ctdb commit bb178338658b4ae32382a1f62f7c21cee1d4878f)
If this function fails due to memory errors, there is no way to recover.
The best course of action is to abort.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit 46efe7a886f8c4c56f19536adc98a73c22db906a)
ctdb_start_freeze() is called from ctdb_control_freeze(), which fixes the
priority if it is 0 and returns an error if it is invalid. Other callers of
ctdb_start_freeze() are internal to CTDB. So if the priority is invalid in
ctdb_start_freeze(), something is seriously wrong.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit 87716e8f504d659515d3dbcf93badbf106873bc8)
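The split described in the commit above, where the control entry point
tolerates bad input and the internal function treats it as a bug, might look
roughly like this. Everything here is an assumption made for illustration:
the sketch_ names, the number of priority levels, mapping 0 to 1, and the
choice of abort() as the reaction to an invalid internal call.

```c
#include <stdint.h>
#include <stdlib.h>

/* Assumed number of database priority levels for this sketch. */
#define SKETCH_NUM_DB_PRIORITIES 3

/* Control side: input comes from outside, so fix it up or reject it.
 * Returns 0 on success, -1 if the priority is out of range. */
static int sketch_control_fix_priority(uint32_t *priority)
{
    if (*priority == 0) {
        *priority = 1; /* assumed default */
    }
    if (*priority > SKETCH_NUM_DB_PRIORITIES) {
        return -1;
    }
    return 0;
}

/* Internal side: every caller is trusted, so an invalid priority
 * indicates a programming error rather than bad input. */
static void sketch_start_freeze(uint32_t priority)
{
    if (priority < 1 || priority > SKETCH_NUM_DB_PRIORITIES) {
        abort();
    }
    /* ... freeze the databases at this priority ... */
}
```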
This ensures that whenever databases are frozen, either by sending a
control or by calling ctdb_start_freeze(), the action is logged.
Since ctdb_control_freeze() calls ctdb_start_freeze(), move its log
message into the early return taken when databases are already frozen.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit 478e24bceda3fedfba54ccb48faa115df726b819)
This was broken by commit a9a1156ea4e10483a4bf4265b8e9203f0af033aa.
That commit replaced the SRVID_SET_NODE_FLAGS message to the recovery
daemon with a control sent to the local daemon, which in turn informed the
recovery daemon. In making this change, the old flags were sent via
CONTROL_MODIFY_FLAGS.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit 7eb2f89979360b6cc98ca9b17c48310277fa89fc)
The runstate can't be set to SHUTDOWN twice, so the current naive code
causes a panic on the 2nd shutdown. This regression was introduced in
commit 8076773a9924dcf8aff16f7d96b2b9ac383ecc28.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit f1b7ca8dc3f34a59c7b3e55748f974ac9ed8f458)
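A minimal sketch of the kind of guard the shutdown fix above implies: a
second shutdown request is detected and ignored instead of setting the state
again. The types and names are hypothetical and do not reflect the actual
CTDB structures.

```c
#include <stdbool.h>

/* Hypothetical two-state model; the real daemon has more states. */
enum sketch_state { SKETCH_STATE_RUNNING, SKETCH_STATE_SHUTDOWN };

struct sketch_daemon {
    enum sketch_state state;
};

/* Begin shutdown. Returns true if this call started the shutdown,
 * false if a shutdown was already in progress. */
static bool sketch_shutdown(struct sketch_daemon *d)
{
    if (d->state == SKETCH_STATE_SHUTDOWN) {
        return false; /* second request: nothing to do */
    }
    d->state = SKETCH_STATE_SHUTDOWN;
    /* ... run shutdown actions exactly once ... */
    return true;
}
```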
Messages are lost until the logging daemon is really up because
syslogd_is_started is set too early. Adding a pipe to do the
notification allows the parent to wait and only set syslogd_is_started
when the logging daemon is actually ready.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit f3dd2eec200d6eeada2ea19cd7e76f1edfad6167)
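The pipe notification described in the commit above follows a common POSIX
pattern: the child writes one byte when its setup is complete, and the parent
blocks on the read end before treating the child as started. The sketch below
shows that pattern in isolation. The function name is hypothetical, and the
child exits immediately where a real logging daemon would enter its main loop.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child and block until it reports readiness over a pipe.
 * Returns the child's PID once it is ready, or -1 on error. */
static pid_t sketch_start_logger(void)
{
    int fds[2];
    pid_t pid;
    char c = 0;
    ssize_t n;

    if (pipe(fds) != 0) {
        return -1;
    }

    pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    if (pid == 0) {
        /* Child: finish setup, then tell the parent we are ready. */
        close(fds[0]);
        /* ... open sockets, install handlers, etc. ... */
        if (write(fds[1], &c, 1) != 1) {
            _exit(1);
        }
        close(fds[1]);
        /* A real logging daemon would enter its main loop here. */
        _exit(0);
    }

    /* Parent: wait for the readiness byte before continuing. */
    close(fds[1]);
    n = read(fds[0], &c, 1);
    close(fds[0]);
    if (n != 1) {
        /* The child went away without signalling; reap it. */
        waitpid(pid, NULL, 0);
        return -1;
    }
    return pid;
}
```

Only after this function returns a valid PID would the parent set its
"started" flag, so no message is sent to a child that cannot yet receive it.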
It should run before:
* the transport is started;
* databases are attached; and
* processing configuration files (e.g. nodes, public_addresses).
Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit 0a0c8543f167e11b75a622513367b083e42cbd3f)
If getpgrp() fails, it returns -1, and the KILL signal is then sent to the
init process (PID 1). This does not happen on RHEL, but does on AIX.
Reported-by: Chris Cowan <cc@us.ibm.com>
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit edb2a3556d03e248b42f63dd2c62382b723bc98f)
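The getpgrp() hazard above presumably arises because a process group is
signalled with kill(-pgrp, sig): if pgrp is -1, the negated value is 1, which
addresses PID 1. A defensive wrapper, sketched here under that assumption and
with a hypothetical name, refuses any group ID that cannot be a real target.

```c
#include <signal.h>
#include <sys/types.h>
#include <unistd.h>

/* Signal a whole process group, refusing invalid group IDs.
 * pgrp == -1 would become kill(1, sig), i.e. the init process;
 * pgrp == 1 would become kill(-1, sig), i.e. every process we may signal.
 * Returns -1 without signalling anything in those cases. */
static int sketch_kill_pgrp(pid_t pgrp, int sig)
{
    if (pgrp <= 1) {
        return -1;
    }
    return kill(-pgrp, sig);
}
```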
Extend takeover_fail_callback() to just log (and not do any ban
processing) when the callback data is NULL. Always call
ctdb_takeover_run() with the callback so that useful errors are always
logged.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit c429394afbabaee09f9216dc743419adddf523ea)
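The takeover_fail_callback() behaviour described above, always log but only
do ban processing when callback data is supplied, can be illustrated as
follows. The signature, the struct, and the credit counter are hypothetical
stand-ins, not the real CTDB definitions.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define SKETCH_MAX_NODES 8

/* Hypothetical ban bookkeeping passed as callback data. */
struct sketch_ban_state {
    unsigned int credits[SKETCH_MAX_NODES];
};

/* Called for each node whose control failed. The failure is always
 * logged; ban accounting happens only when callback_data is non-NULL. */
static void sketch_fail_callback(uint32_t pnn, int res, void *callback_data)
{
    struct sketch_ban_state *bs = callback_data;

    fprintf(stderr, "Node %u failed the takeover run, res=%d\n",
            (unsigned int)pnn, res);

    if (bs == NULL) {
        return; /* log-only mode */
    }
    if (pnn < SKETCH_MAX_NODES) {
        bs->credits[pnn]++;
    }
}
```

Passing the callback unconditionally, with NULL data where banning is not
wanted, keeps the error logging in one place.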
Consider the case of upgrading a cluster node by node, where some
nodes are still running older versions of CTDB without the
IPREALLOCATED control. If a "new" node takes over as recovery master
and a failover occurs, then it will attempt to send IPREALLOCATED
controls to all nodes. The "old" nodes will fail in a fairly
nondescript way (result == -1).
To try to handle this situation, fall back to the EVENTSCRIPT control
to handle "ipreallocated". Only do this on the failed nodes.
However, do not do this on nodes that timed out (they've probably
implemented the control and we should call the regular fail_callback
to get those nodes banned) or for stopped nodes (since they can't
actually run the "ipreallocated" event via the EVENTSCRIPT control).
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit b2654853ce9b7c18c5874b080bc94d3118078a5d)
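The per-node decision in the IPREALLOCATED fallback above can be written as a
small pure function. This is a sketch only: the result code used for a
timeout and all names are hypothetical, and what happens to stopped nodes
beyond "no fallback" is not specified by the commit message.

```c
#include <stdbool.h>

/* Hypothetical result code meaning "the control timed out". */
#define SKETCH_RES_TIMEOUT (-62)

enum sketch_action {
    SKETCH_FALLBACK_EVENTSCRIPT, /* retry via the EVENTSCRIPT control */
    SKETCH_FAIL_CALLBACK,        /* regular failure handling (may ban) */
    SKETCH_NO_FALLBACK           /* node cannot run the event at all */
};

/* Decide how to treat a node whose IPREALLOCATED control failed. */
static enum sketch_action
sketch_on_ipreallocated_failure(int res, bool node_stopped)
{
    if (res == SKETCH_RES_TIMEOUT) {
        /* The node probably implements the control but did not answer. */
        return SKETCH_FAIL_CALLBACK;
    }
    if (node_stopped) {
        /* Stopped nodes cannot run "ipreallocated" via EVENTSCRIPT. */
        return SKETCH_NO_FALLBACK;
    }
    /* Nondescript failure: likely an older node without the control. */
    return SKETCH_FALLBACK_EVENTSCRIPT;
}
```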
Since the complete database is not locked when the receive_records
control is received, it is possible that we may not be able to obtain
a lock on a chain. We will try again to store this record.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
Reviewed-by: Michael Adam <obnox@samba.org>
(This used to be ctdb commit 32723c9efdad1c6ca4aa53f308ccd9bef1aadfff)
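One way to picture the retry described above: each pass stores the records
whose chain lock can be taken without blocking and leaves the rest pending
for a later pass. The record type, the try-lock callback, and the demo lock
are all hypothetical and stand in for the real database calls.

```c
#include <stdbool.h>
#include <stddef.h>

struct sketch_record {
    int key;
    bool stored;
};

/* Hypothetical non-blocking chain lock; returns true if acquired. */
typedef bool (*sketch_trylock_fn)(int key, void *private_data);

/* One pass over the records. Records whose chain cannot be locked are
 * left for the next pass. Returns the number still pending. */
static size_t sketch_store_pass(struct sketch_record *recs, size_t n,
                                sketch_trylock_fn trylock,
                                void *private_data)
{
    size_t pending = 0;

    for (size_t i = 0; i < n; i++) {
        if (recs[i].stored) {
            continue;
        }
        if (!trylock(recs[i].key, private_data)) {
            pending++; /* try again later */
            continue;
        }
        /* ... write the record, then unlock the chain ... */
        recs[i].stored = true;
    }
    return pending;
}

/* Demo lock for experimenting: fails until the counter reaches zero. */
static bool sketch_demo_trylock(int key, void *private_data)
{
    int *failures_left = private_data;

    (void)key;
    if (*failures_left > 0) {
        (*failures_left)--;
        return false;
    }
    return true;
}
```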
Currently the order of the first IP allocation, including the first
"ipreallocated" event, and the "startup" event is undefined. Both of
these events can (re)start services.
This stops IPs being hosted before the "startup" event has completed.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit f15dd562fd8c08cafd957ce9509102db7eb49668)
If a tunable is not implemented on a remote node then this should not
be fatal. In this case the takeover run can continue using benign
defaults for the tunables.
However, timeouts and any unexpected errors should be fatal. These
should abort the takeover run because they can lead to unexpected IP
movements.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit c0c27762ea728ed86405b29c642ba9e43200f4ae)
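The policy in the commit above separates three outcomes when fetching a
tunable from a remote node: success, "tunable not implemented" (continue with
a default), and anything else (abort the takeover run). A sketch of that
classification follows; the result codes and names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical result codes for a remote tunable fetch. */
#define SKETCH_RES_OK              0
#define SKETCH_RES_UNKNOWN_TUNABLE (-2)

/* Decide whether the takeover run may continue after fetching a tunable.
 * On true, *value holds either the fetched value or the default.
 * On false (timeout or unexpected error), the run should be aborted. */
static bool sketch_handle_tunable_result(int res, uint32_t fetched,
                                         uint32_t default_value,
                                         uint32_t *value)
{
    if (res == SKETCH_RES_OK) {
        *value = fetched;
        return true;
    }
    if (res == SKETCH_RES_UNKNOWN_TUNABLE) {
        /* Older node without this tunable: use a benign default. */
        *value = default_value;
        return true;
    }
    return false;
}
```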
Both of the current defaults are implicitly 0. It is better to make
the defaults obvious.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 1190bb0d9c14dc5889c2df56f6c8986db23d81a1)
Otherwise callers can't tell the difference between some other failure
(e.g. memory allocation failure) and an unknown tunable.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit 03fd90d41f9cd9b8c42dc6b8b8d46ae19101a544)
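To let callers distinguish an unknown tunable from other failures, as the
commit above requires, a lookup can return a dedicated code for "not found".
The table layout, names, and numeric codes in this sketch are assumptions
made for illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical return codes. */
#define SKETCH_TUNABLE_OK      0
#define SKETCH_TUNABLE_ERROR   (-1) /* e.g. bad arguments, no memory */
#define SKETCH_TUNABLE_UNKNOWN (-2) /* no tunable with that name */

struct sketch_tunable {
    const char *name;
    uint32_t value;
};

/* Look up a tunable by name, reporting "unknown" separately from
 * any other kind of failure. */
static int sketch_get_tunable(const struct sketch_tunable *table, size_t n,
                              const char *name, uint32_t *value)
{
    if (table == NULL || name == NULL || value == NULL) {
        return SKETCH_TUNABLE_ERROR;
    }
    for (size_t i = 0; i < n; i++) {
        if (strcmp(table[i].name, name) == 0) {
            *value = table[i].value;
            return SKETCH_TUNABLE_OK;
        }
    }
    return SKETCH_TUNABLE_UNKNOWN;
}
```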
The "setup" event can fail when one of the eventscripts fails to run
its "setup" event. If this occurs then the eventscript should log an
error. The stack trace and core file generated when we abort provide
no useful information.
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit c50eca6fbf49a6c7bf50905334704f8d2d3237d7)
This adds more serialisation to the startup, ensuring that the
"startup" event runs after everything to do with the first recovery
(including the "recovered" event).
Given that it now takes longer to get to the "startup" state, the
initscript needs to wait until ctdbd gets to "first_recovery".
Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit ed6814ff0a59ddbb1c1b3128b505380f60d7aeb7)
Also add a new client function, ctdb_ctrl_get_runstate().
Signed-off-by: Martin Schwenke <martin@meltin.net>
(This used to be ctdb commit dc4220e6f618cc688b3ca8e52bcb3eec6cb55bb1)
Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit f43fe3a560d5915c1a9893256f4e7bfe3d7e290a)
This deconstructs ctdb_start_transport(), which did much more than
starting the transport.
This removes a very unlikely race and adds some clarity. The setup
event is supposed to set the tunables before the first recovery.
However, there was nothing stopping the first recovery from starting
before the setup event had completed.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit c31feb27dcdb748b5333321c85fe54852dfa1bcf)
This allows states, including startup and shutdown states, to be
clearly tracked. This doesn't include regular runtime "states", which
are handled by node flags.
Introduce new functions ctdb_set_runstate(), runstate_to_string() and
runstate_from_string().
Signed-off-by: Martin Schwenke <martin@meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit 8076773a9924dcf8aff16f7d96b2b9ac383ecc28)
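A pair of conversion helpers like runstate_to_string() and
runstate_from_string() is usually driven by one lookup table. The sketch
below shows that shape only. The sketch_ names are hypothetical, and the list
of states is an assumption pieced together from the states mentioned in these
commit messages; it may not match the real enum.

```c
#include <stddef.h>
#include <string.h>

/* Assumed set of run states for this sketch. */
enum sketch_runstate {
    SKETCH_RUNSTATE_UNKNOWN,
    SKETCH_RUNSTATE_INIT,
    SKETCH_RUNSTATE_SETUP,
    SKETCH_RUNSTATE_FIRST_RECOVERY,
    SKETCH_RUNSTATE_STARTUP,
    SKETCH_RUNSTATE_RUNNING,
    SKETCH_RUNSTATE_SHUTDOWN
};

static const struct {
    enum sketch_runstate state;
    const char *name;
} sketch_runstate_map[] = {
    { SKETCH_RUNSTATE_INIT,           "INIT" },
    { SKETCH_RUNSTATE_SETUP,          "SETUP" },
    { SKETCH_RUNSTATE_FIRST_RECOVERY, "FIRST_RECOVERY" },
    { SKETCH_RUNSTATE_STARTUP,        "STARTUP" },
    { SKETCH_RUNSTATE_RUNNING,        "RUNNING" },
    { SKETCH_RUNSTATE_SHUTDOWN,       "SHUTDOWN" }
};

#define SKETCH_MAP_LEN \
    (sizeof(sketch_runstate_map) / sizeof(sketch_runstate_map[0]))

/* Map a state to its name; unmapped values yield "UNKNOWN". */
static const char *sketch_runstate_to_string(enum sketch_runstate state)
{
    for (size_t i = 0; i < SKETCH_MAP_LEN; i++) {
        if (sketch_runstate_map[i].state == state) {
            return sketch_runstate_map[i].name;
        }
    }
    return "UNKNOWN";
}

/* Map a name back to a state; unrecognised names yield UNKNOWN. */
static enum sketch_runstate sketch_runstate_from_string(const char *name)
{
    if (name != NULL) {
        for (size_t i = 0; i < SKETCH_MAP_LEN; i++) {
            if (strcmp(sketch_runstate_map[i].name, name) == 0) {
                return sketch_runstate_map[i].state;
            }
        }
    }
    return SKETCH_RUNSTATE_UNKNOWN;
}
```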
These functions were used in the locking child process to do the locking.
With the locking helper, they are no longer required.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit c660f33c3eaa1b4a2c4e951c1982979e57374ed4)
These functions were used in the locking child process to do the locking.
With the locking helper, they are no longer required.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>
(This used to be ctdb commit 6ea3212a7b177c6c06b1484cf9e8b2f4036653d9)