Fix race condition in psync2-pingoff test (#9712)

Test failed on freebsd:
```
*** [err]: Make the old master a replica of the new one and check conditions in tests/integration/psync2-pingoff.tcl
Expected '162' to be equal to '176' (context: type eval line 18 cmd {assert_equal [status $R(0) master_repl_offset] [status $R(1) master_repl_offset]} proc ::test)
```

There are two possible race conditions in the test.

1. The code waits for sync_full to increment, and assumes that means the
master did the fork. But in fact there are cases the master will increment
that sync_full counter (after replica asks for sync), but will see that
there's already a fork running and will delay the fork creation.

In this case the INCR will be executed before the fork happens, so it'll
not be in the command stream. Solve that by waiting for `master_link_status: up`
on the replica before the INCR.

2. The repl-ping-replica-period is still high (1 second), so there's a chance the
master will send an additional PING between the two calls to INFO (the line that
fails is the one that samples INFO from both servers). So there's a chance one of
them will have an incremented offset due to PING and the other won't have it yet.

In theory we can wait for the repl_offset to match, but then we risk facing a
situation where that race will hide an offset mis-match. so instead, i think we
should just change repl-ping-replica-period to prevent further pings from being pushed.

Co-authored-by: Oran Agra <oran@redislabs.com>
This commit is contained in:
Binbin 2021-11-01 22:07:08 +08:00 committed by GitHub
parent f1f3cceb50
commit cea7809cea
No known key found for this signature in database
GPG Key ID: 4AEE18F83AFDEB23

View File

@ -53,10 +53,16 @@ start_server {} {
}
test "Make the old master a replica of the new one and check conditions" {
# We set the new master's ping period to a high value, so that there's
# no chance for a race condition of sending a PING in between the two
# INFO calls in the assert for master_repl_offset match below.
$R(1) CONFIG SET repl-ping-replica-period 1000
assert_equal [status $R(1) sync_full] 0
$R(0) REPLICAOF $R_host(1) $R_port(1)
wait_for_condition 50 1000 {
[status $R(1) sync_full] == 1
[status $R(0) master_link_status] == "up"
} else {
fail "The new master was not able to sync"
}