IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
Don't rely on availability of bios from the global fs_bio_set,
we should use our own bio_set for meta data IO.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Symptom: messages similar to
"FIXME asender in bm_change_bits_to,
bitmap locked for 'write from resync_finished' by worker"
If a resync or verify is finished (or aborted), a full bitmap writeout
is triggered. If we have ongoing local IO, the bitmap may still change
during that writeout, pending and not yet processed acks may cause bits
to be cleared, while new writes may cause bits to be to be set.
To fix this, introduce the drbd_bm_write_copy_pages() variant.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
DRBD can freeze IO, due to fencing policy (fencing resource-and-stonith),
or because we lost access to data (on-no-data-accessible suspend-io).
Resuming from there (re-connect, or re-attach, or explicit admin
intervention) should "just work".
Unfortunately, if the re-attach/re-connect did not happen within
the timeout, since the commit
drbd: Implemented real timeout checking for request processing time
if so configured, the request_timer_fn() would timeout and
detach/disconnect virtually immediately.
This change tracks the most recent attach and connect, and does not
timeout within <configured timeout interval> after attach/connect.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Changes to the role and disk state should be delayed or rejected
while we establish a connection.
This is necessary, since the peer will base its resync decision
on the UUIDs and the state we sent in the drbd_connect() function.
The most prominent example for this race is becoming primary after
sending state and UUIDs and before the state changes to C_WF_CONNECTION.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
We used to have a barrier implementation where barrier_nr 0 was
reserved. That is long gone. Just use the full sequence space.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
We assumed only bios with bi_idx == 0 would end up
in drbd_make_request().
That is wrong.
At least device mapper, in __clone_and_map(), may submit
clones only covering a partial bio, but sharing
the original bvec, by adjusting bi_idx and relevant
other bio members of the clone.
We used __bio_for_each_segment() in various places,
even though that is documented as
* drivers should not use the __ version unless they _really_ want to
* run through the entire bio and not just pending pieces
Impact: we would send the full bio bvec, even for the clone
with bi_idx > 0, which will cause data corruption on the
peer (because we submit wrong data at the clone offset),
and will cause a DRBD protocol error, disconnect/reconnect
and resync (thus fixing the corruption),
because the next package header would be expected right
in the middle of the sent data, causing DRBD magic mismatch.
Fix: drop the assert, and use bio_for_each_segment()
instead of the __ version.
Conflicts:
drbd/drbd_tracing.c
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
If the backing device is already frozen during attach, we failed
to recognize that. The current disk-timeout code works on top
of the drbd_request objects. During attach we do not allow IO
and therefore never generate a drbd_request object but block
before that in drbd_make_request().
This patch adds the timeout to all drbd_md_sync_page_io().
Before this patch we used to go from D_ATTACHING directly
to D_DISKLESS if IO failed during attach. We can no longer
do this since we have to stay in D_FAILED until all IO
ops issued to the backing device returned.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
I.e. in C_WF_REPORT_PARAMS or in C_WF_CONNECTION.
Sending may already work in these cstates, but the peer still expects
the HandShake / ConnectionFeatures packet.
Actually triggered by the Testuite on kugel.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
If the asender thread, or request_timer_fn(), or some other part of
the code, decided to drop the connection (because of timeout or other),
but the receiver just now was processing a P_STATE packet, there was a
chance that receive_state() would do a hard state change
"re-establishing" an already failed connection without additional handshake.
Log excerpt:
Remote failed to finish a request within ko-count * timeout
peer( Secondary -> Unknown ) conn( Connected -> Timeout ) pdsk( UpToDate -> DUnknown )
asender terminated
...
peer( Unknown -> Secondary ) conn( Timeout -> Connected ) pdsk( DUnknown -> UpToDate ) peer_isp( 0 -> 1 )
...
Connection closed
peer( Secondary -> Unknown ) conn( Connected -> Unconnected ) pdsk( UpToDate -> DUnknown ) peer_isp( 1 -> 0 )
receiver terminated
Impact:
while the connection state is erroneously "Connected",
requests may be queued and even sent,
which would never be acknowledged,
and may have been missed by the cleanup.
These requests would never be completed.
The next drbd_suspend_io() will then lock up,
waiting forever for these requests to complete.
Fixed in several code paths:
Make sure the connection state is NetworkFailure or worse
before starting the cleanup in drbd_disconnect().
This should make sure the cleanup won't miss any requests.
Disallow receive_state() to "upgrade" the connection state
from an error state. This will make sure the "illegal" state
transition won't happen.
For all connection failure states,
relax the safe-guard in sanitize_state() again
to silently mask out those state changes
(e.g. Timeout -> Connected becomes Timeout -> Timeout).
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
DRBD state changes schedule after_state_ch() actions to a worker thread,
which decides on the old and new states of that change, whether to send
an informational state update packet (P_STATE) to the peer.
If it decides to drbd_send_state(), it would however always send the
_curent_ state, which, if a second state change happens before the
after_state_ch() of the first ran, may "fast-forward" the peer's view
about this node. In most cases that is harmless, but sometimes this can
confuse DRBD, for example into not actually starting a necessary resync
if you do a very tight detach/attach loop on a Connected Secondary.
Fix this by always sending the "new" state of the respective state
transition which scheduled this after_state_ch() work.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
If we get into the C_BROKEN_PIPE cstate once, the state engine set the
thi->t_state of the receiver thread to restarting. But with the while loop
in drbdd_init() a new connection gets established. After the call into
drbdd() returns immediately since the thi->t_state is not RUNNING. The
restart of drbd_init() then resets thi->t_state to RUNNING.
I.e. after entering C_BROKEN_PIPE once, the next successful established
connection gets wasted.
The two parts of the fix:
* Do not cause the thread to restart if we detect the issue
with the sockets while we are in C_WF_CONNECTION.
* Make sure that all actions that would have set us to C_BROKEN_PIPE
happen before the state change to C_WF_REPORT_PARAMS.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
...when the peer has inconsistent data. In that case we failed to
clear the susp_nod flag. When the local disk was attached again
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Now, the new edition of the clause only fires if a diskless
peer gets promoted.
This is a fixup for "drbd: Delayed creation of current-UUID".
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
When the disk-timeout is active, and it expires for a single request,
we consider the local disk as D_FAILED. Note: With this change,
I made both timeout based state transitions HARD state transitions.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
The new function drbd_md_get_buffer() aborts waiting for the buffer
in case the disk failes in the meantime.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Regression introduced with 8.3.11 commit:
drbd: Take a more conservative approach when deciding max_bio_size
Never ever tell an older drbd, that we support more than 32KiB
in a single data request (packet).
Never believe an older drbd, that is supports more than 32KiB
in a single data request (packet)
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
The reason for this change is that, with when doing
'drbdadm invalidate' on a disconnected resource caused
an "implicitly set pdsk from UpToDate to DUnknown" message,
which was missleading.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
module_param(bool) used to counter-intuitively take an int. In
fddd5201 (mid-2009) we allowed bool or int/unsigned int using a messy
trick.
It's time to remove the int/unsigned int option. For this version
it'll simply give a warning, but it'll break next kernel version.
Acked-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Still missing: rcu_readlock() on the various call sites that
access/iterate over those idrs.
We don't need a specific write lock, as we only modify from
configuration context, which is already strictly serialized.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Don't rely on availability of bios from the global fs_bio_set,
we should use our own bio_set for meta data IO.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Up to now it only operated on minor numbers. Now it can work also
on named connections.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Instead of artificially enlarging the command decoding arrays to
P_MAX_CMD entries, check if an index is within the valid range using the
ARRAY_SIZE() macro.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
The previous algorithm for dealing with overlapping concurrent writes
was generating unnecessary warnings for scenarios which could be
legitimate, and did not always handle partially overlapping requests
correctly. Improve it algorithm as follows:
* While local or remote write requests are in progress, conflicting new
local write requests will be delayed (commit 82172f7).
* When a conflict between a local and remote write request is detected,
the node with the discard flag decides how to resolve the conflict: It
will ask its peer to discard conflicting requests which are fully
contained in the local request and retry requests which overlap only
partially. This involves a protocol change.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Use a new on-disk transaction format for the activity log, which allows
for multiple changes to the active set per transaction.
Using 4k transaction blocks, we can now get rid of the work-around code
to deal with devices not supporting 512 byte logical block size.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Some open-coded clear_bit(); smp_mb__after_clear_bit();
should in fact have been smp_mb__before_clear_bit(); clear_bit();
Instead, use clear_bit_unlock() to annotate the intention,
and have it do the right thing.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
In compatibility mode with old DRBDs, use that as the state_mutex
as well.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
No longer work callbacks must operate on a mdev. From now on they
can also operate on a tconn.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>