BUG/MEDIUM: server/dns: preserve server's port upon resolution timeout or error

@boi4 reported in GH #2578 that since 3.0-dev1 for servers with address
learned from A/AAAA records after a DNS flap server would be put out of
maintenance with proper address but with invalid port (== 0), making it
unusable and causing tcp checks to fail:

[NOTICE]   (1) : Loading success.
[WARNING]  (8) : Server mybackend/myserver1 is going DOWN for maintenance (DNS refused status). 0 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
[ALERT]    (8) : backend 'mybackend' has no server available!
[WARNING]  (8) : mybackend/myserver1: IP changed from '(none)' to '127.0.0.1' by 'myresolver/ns1'.
[WARNING]  (8) : Server mybackend/myserver1 ('myhost') is UP/READY (resolves again).
[WARNING]  (8) : Server mybackend/myserver1 administratively READY thanks to valid DNS answer.
[WARNING]  (8) : Server mybackend/myserver1 is DOWN, reason: Layer4 connection problem, info: "Connection refused", check duration: 0ms. 0 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.

@boi4 also mentioned that this used to work fine before.

Willy suggested that this regression may have been introduced by 64c9c8e
("BUG/MINOR: server/dns: use server_set_inetaddr() to unset srv addr from DNS")

Turns out he was right! Indeed, in 64c9c8e we systematically memset the
whole server_inetaddr struct (which contains both the requested server's
addr and port planned for atomic update) instead of only memsetting the
addr part of the structure: except when SRV records are involved (SRV
records provide both the address and the port unlike A or AAAA records),
we must not reset the server's port upon DNS errors because the port may
have been provided at config time and we don't want to lose its value.

Big thanks to @boi4 for his well-documented issue that really helped us to
pinpoint the bug right on time for the dev-13 release.

No backport needed (unless 64c9c8e gets backported).
This commit is contained in:
Aurelien DARRAGON 2024-05-24 13:55:41 +02:00
parent 98ed11b0c5
commit c16eba8183
2 changed files with 10 additions and 7 deletions

View File

@ -725,7 +725,7 @@ static void resolv_srvrq_cleanup_srv(struct server *srv)
ha_free(&srv->hostname_dn);
srv->hostname_dn_len = 0;
memset(&srv_addr, 0, sizeof(srv_addr));
/* unset server's addr */
/* unset server's addr AND port */
server_set_inetaddr(srv, &srv_addr, SERVER_INETADDR_UPDATER_NONE, NULL);
srv->flags |= SRV_F_NO_RESOLUTION;

View File

@ -4597,8 +4597,9 @@ int snr_resolution_cb(struct resolv_requester *requester, struct dns_counters *c
if (has_no_ip && !snr_set_srv_down(s)) {
struct server_inetaddr srv_addr;
memset(&srv_addr, 0, sizeof(srv_addr));
/* unset server's addr */
/* unset server's addr, keep port */
server_get_inetaddr(s, &srv_addr);
memset(&srv_addr.addr, 0, sizeof(srv_addr.addr));
server_set_inetaddr(s, &srv_addr, SERVER_INETADDR_UPDATER_NONE, NULL);
}
return 1;
@ -4611,8 +4612,9 @@ int snr_resolution_cb(struct resolv_requester *requester, struct dns_counters *c
if (has_no_ip && !snr_set_srv_down(s)) {
struct server_inetaddr srv_addr;
memset(&srv_addr, 0, sizeof(srv_addr));
/* unset server's addr */
/* unset server's addr, keep port */
server_get_inetaddr(s, &srv_addr);
memset(&srv_addr.addr, 0, sizeof(srv_addr.addr));
server_set_inetaddr(s, &srv_addr, SERVER_INETADDR_UPDATER_NONE, NULL);
}
return 0;
@ -4697,8 +4699,9 @@ int snr_resolution_error_cb(struct resolv_requester *requester, int error_code)
if (!snr_set_srv_down(s)) {
struct server_inetaddr srv_addr;
memset(&srv_addr, 0, sizeof(srv_addr));
/* unset server's addr */
/* unset server's addr, keep port */
server_get_inetaddr(s, &srv_addr);
memset(&srv_addr.addr, 0, sizeof(srv_addr.addr));
server_set_inetaddr(s, &srv_addr, SERVER_INETADDR_UPDATER_NONE, NULL);
HA_SPIN_UNLOCK(SERVER_LOCK, &s->lock);
resolv_detach_from_resolution_answer_items(requester->resolution, requester);