MEDIUM: session: implement half-closed timeouts (client-fin and server-fin)

Long-lived sessions often end up half-closed, which leaves a lot of sessions
in FIN_WAIT state in the system tables with no way for haproxy to get rid of
them. This typically happens because clients suddenly disconnect without
sending any packet (eg: FIN or RST was lost in the path), and while the
server detects this using an application-level heartbeat, haproxy does not
close the connection.

This patch adds two new timeouts : "timeout client-fin" and
"timeout server-fin". The former allows one to override the client-facing
timeout when a FIN has been received or sent. The latter does the same for
server-facing connections, which is less useful.
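
The override rule described above can be sketched in isolation. This is a minimal illustration and not haproxy's code: the `side_timeouts` struct, the `TIMEOUT_UNSET` constant and the `effective_client_timeout()` helper are hypothetical names, and timeouts are plain milliseconds rather than haproxy ticks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TIMEOUT_UNSET 0  /* assumption: 0 means "not configured" */

/* Hypothetical container for the client-facing timeouts, in milliseconds. */
struct side_timeouts {
    int client;     /* "timeout client" */
    int tunnel;     /* "timeout tunnel" */
    int client_fin; /* "timeout client-fin" */
};

/* Pick the timeout to apply on the client side:
 * - in tunnel mode, a configured tunnel timeout replaces "timeout client";
 * - once one direction is shut down, a configured "timeout client-fin"
 *   takes precedence over both.
 */
static int effective_client_timeout(const struct side_timeouts *t,
                                    bool in_tunnel, bool half_closed)
{
    int to;

    assert(t != NULL);
    to = t->client;
    if (in_tunnel && t->tunnel != TIMEOUT_UNSET)
        to = t->tunnel;
    if (half_closed && t->client_fin != TIMEOUT_UNSET)
        to = t->client_fin;
    return to;
}
```

With `client` = 30s, `tunnel` = 1h and `client_fin` = 30s, an established tunnel would use 1h, and the same tunnel would drop to 30s once one direction is shut down. If `client_fin` is left unset, the half-closed connection keeps whichever of the other timeouts already applied.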
Willy Tarreau 2014-05-10 14:30:07 +02:00
parent 941aac0072
commit 05cdd9655d
4 changed files with 126 additions and 12 deletions

@ -1397,6 +1397,7 @@ tcp-response content - - X X
tcp-response inspect-delay - - X X
timeout check X - X X
timeout client X X X -
timeout client-fin X X X -
timeout clitimeout (deprecated) X X X -
timeout connect X - X X
timeout contimeout (deprecated) X - X X
@ -1404,6 +1405,7 @@ timeout http-keep-alive X X X X
timeout http-request X X X X
timeout queue X - X X
timeout server X - X X
timeout server-fin X - X X
timeout srvtimeout (deprecated) X - X X
timeout tarpit X X X X
timeout tunnel X - X X
@ -7548,7 +7550,8 @@ timeout clitimeout <timeout> (deprecated)
losses by specifying timeouts that are slightly above multiples of 3 seconds
(eg: 4 or 5 seconds). If some long-lived sessions are mixed with short-lived
sessions (eg: WebSocket and HTTP), it's worth considering "timeout tunnel",
which overrides "timeout client" and "timeout server" for tunnels.
which overrides "timeout client" and "timeout server" for tunnels, as well as
"timeout client-fin" for half-closed connections.
This parameter is specific to frontends, but can be specified once for all in
"defaults" sections. This is in fact one of the easiest solutions not to
@ -7564,6 +7567,31 @@ timeout clitimeout <timeout> (deprecated)
See also : "clitimeout", "timeout server", "timeout tunnel".
timeout client-fin <timeout>
Set the inactivity timeout on the client side for half-closed connections.
May be used in sections : defaults | frontend | listen | backend
yes | yes | yes | no
Arguments :
<timeout> is the timeout value specified in milliseconds by default, but
can be in any other unit if the number is suffixed by the unit,
as explained at the top of this document.
The inactivity timeout applies when the client is expected to acknowledge or
send data while one direction is already shut down. This timeout is different
from "timeout client" in that it only applies to connections which are closed
in one direction. This is particularly useful to avoid keeping connections in
FIN_WAIT state for too long when clients do not disconnect cleanly. This
problem is particularly common with long connections such as RDP or WebSocket.
Note that this timeout can override "timeout tunnel" when a connection shuts
down in one direction.
This parameter is specific to frontends, but can be specified once for all in
"defaults" sections. By default it is not set, so half-closed connections
will use the other timeouts (timeout.client or timeout.tunnel).
See also : "timeout client", "timeout server-fin", and "timeout tunnel".
timeout connect <timeout>
timeout contimeout <timeout> (deprecated)
Set the maximum time to wait for a connection attempt to a server to succeed.
@ -7741,6 +7769,32 @@ timeout srvtimeout <timeout> (deprecated)
See also : "srvtimeout", "timeout client" and "timeout tunnel".
timeout server-fin <timeout>
Set the inactivity timeout on the server side for half-closed connections.
May be used in sections : defaults | frontend | listen | backend
yes | no | yes | yes
Arguments :
<timeout> is the timeout value specified in milliseconds by default, but
can be in any other unit if the number is suffixed by the unit,
as explained at the top of this document.
The inactivity timeout applies when the server is expected to acknowledge or
send data while one direction is already shut down. This timeout is different
from "timeout server" in that it only applies to connections which are closed
in one direction. This is particularly useful to avoid keeping connections in
FIN_WAIT state for too long when a remote server does not disconnect cleanly.
This problem is particularly common with long connections such as RDP or WebSocket.
Note that this timeout can override "timeout tunnel" when a connection shuts
down in one direction. This setting was provided for completeness, but in most
situations, it should not be needed.
This parameter is specific to backends, but can be specified once for all in
"defaults" sections. By default it is not set, so half-closed connections
will use the other timeouts (timeout.server or timeout.tunnel).
See also : "timeout client-fin", "timeout server", and "timeout tunnel".
timeout tarpit <timeout>
Set the duration for which tarpitted connections will be maintained
May be used in sections : defaults | frontend | listen | backend
@ -7782,6 +7836,14 @@ timeout tunnel <timeout>
to a proxy), or after the first response when no keepalive/close option is
specified.
Since this timeout is usually used in conjunction with long-lived connections,
it usually is a good idea to also set "timeout client-fin" to handle the
situation where a client suddenly disappears from the net and does not
acknowledge a close, or sends a shutdown and does not acknowledge pending
data anymore. This can happen in lossy networks where firewalls are present,
and is detected by the presence of large amounts of sessions in a FIN_WAIT
state.
The value is specified in milliseconds by default, but can be in any other
unit if the number is suffixed by the unit, as specified at the top of this
document. Whatever the expected normal idle time, it is a good practice to
@ -7797,11 +7859,11 @@ timeout tunnel <timeout>
option http-server-close
timeout connect 5s
timeout client 30s
timeout client 30s
timeout client-fin 30s
timeout server 30s
timeout tunnel 1h # timeout to use with WebSocket and CONNECT
See also : "timeout client", "timeout server".
See also : "timeout client", "timeout client-fin", "timeout server".
transparent (deprecated)

@ -284,6 +284,8 @@ struct proxy {
int httpka; /* maximum time for a new HTTP request when using keep-alive */
int check; /* maximum time for complete check */
int tunnel; /* I/O timeout to use in tunnel mode (in ticks) */
int clientfin; /* timeout to apply to client half-closed connections */
int serverfin; /* timeout to apply to server half-closed connections */
} timeout;
char *id, *desc; /* proxy id (name) and description */
struct list pendconns; /* pending connections with no server assigned yet */

@ -187,10 +187,19 @@ static int proxy_parse_timeout(char **args, int section, struct proxy *proxy,
tv = &proxy->timeout.tunnel;
td = &defpx->timeout.tunnel;
cap = PR_CAP_BE;
} else if (!strcmp(args[0], "client-fin")) {
tv = &proxy->timeout.clientfin;
td = &defpx->timeout.clientfin;
cap = PR_CAP_FE;
} else if (!strcmp(args[0], "server-fin")) {
tv = &proxy->timeout.serverfin;
td = &defpx->timeout.serverfin;
cap = PR_CAP_BE;
} else {
memprintf(err,
"'timeout' supports 'client', 'server', 'connect', 'check', "
"'queue', 'http-keep-alive', 'http-request', 'tunnel' or 'tarpit', (got '%s')",
"'queue', 'http-keep-alive', 'http-request', 'tunnel', 'tarpit', "
"'client-fin' and 'server-fin' (got '%s')",
args[0]);
return -1;
}
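
The keyword dispatch in the hunk above pairs each timeout name with the proxy capability it requires. As a rough sketch of that pairing only, the lookup below uses a small table; `timeout_kw`, `timeout_kws`, `timeout_kw_cap()` and the `CAP_FE`/`CAP_BE` flags are hypothetical stand-ins, not haproxy identifiers, and only the three keywords visible in this hunk are listed.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical capability flags, standing in for PR_CAP_FE / PR_CAP_BE. */
enum { CAP_FE = 1, CAP_BE = 2 };

/* One entry per timeout keyword: its name and the capability it needs. */
struct timeout_kw {
    const char *name;
    int cap;
};

static const struct timeout_kw timeout_kws[] = {
    { "tunnel",     CAP_BE },
    { "client-fin", CAP_FE },
    { "server-fin", CAP_BE },
};

/* Return the capability required by <name>, or 0 if the keyword is unknown. */
static int timeout_kw_cap(const char *name)
{
    size_t i;

    assert(name != NULL);
    for (i = 0; i < sizeof(timeout_kws) / sizeof(timeout_kws[0]); i++) {
        if (strcmp(name, timeout_kws[i].name) == 0)
            return timeout_kws[i].cap;
    }
    return 0;
}
```

A caller could compare the returned flag against the capabilities of the section being parsed, and treat a return value of 0 as the "unknown keyword" case that the patch reports through `memprintf()`.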

@ -2189,11 +2189,17 @@ struct task *process_session(struct task *t)
/* first, let's check if the request buffer needs to shutdown(write), which may
* happen either because the input is closed or because we want to force a close
* once the server has begun to respond.
* once the server has begun to respond. If a half-closed timeout is set, we adjust
* the other side's timeout as well.
*/
if (unlikely((s->req->flags & (CF_SHUTW|CF_SHUTW_NOW|CF_AUTO_CLOSE|CF_SHUTR)) ==
(CF_AUTO_CLOSE|CF_SHUTR)))
channel_shutw_now(s->req);
(CF_AUTO_CLOSE|CF_SHUTR))) {
channel_shutw_now(s->req);
if (tick_isset(s->fe->timeout.clientfin)) {
s->rep->wto = s->fe->timeout.clientfin;
s->rep->wex = tick_add(now_ms, s->rep->wto);
}
}
/* shutdown(write) pending */
if (unlikely((s->req->flags & (CF_SHUTW|CF_SHUTW_NOW)) == CF_SHUTW_NOW &&
@ -2201,6 +2207,10 @@ struct task *process_session(struct task *t)
if (s->req->flags & CF_READ_ERROR)
s->req->cons->flags |= SI_FL_NOLINGER;
si_shutw(s->req->cons);
if (tick_isset(s->be->timeout.serverfin)) {
s->rep->rto = s->be->timeout.serverfin;
s->rep->rex = tick_add(now_ms, s->rep->rto);
}
}
/* shutdown(write) done on server side, we must stop the client too */
@ -2213,6 +2223,10 @@ struct task *process_session(struct task *t)
if (s->req->prod->flags & SI_FL_NOHALF)
s->req->prod->flags |= SI_FL_NOLINGER;
si_shutr(s->req->prod);
if (tick_isset(s->fe->timeout.clientfin)) {
s->rep->wto = s->fe->timeout.clientfin;
s->rep->wex = tick_add(now_ms, s->rep->wto);
}
}
/* it's possible that an upper layer has requested a connection setup or abort.
@ -2308,13 +2322,26 @@ struct task *process_session(struct task *t)
channel_forward(s->rep, CHN_INFINITE_FORWARD);
/* if we have no analyser anymore in any direction and have a
* tunnel timeout set, use it now.
* tunnel timeout set, use it now. Note that we must respect
* the half-closed timeouts as well.
*/
if (!s->req->analysers && s->be->timeout.tunnel) {
s->req->rto = s->req->wto = s->rep->rto = s->rep->wto =
s->be->timeout.tunnel;
s->req->rex = s->req->wex = s->rep->rex = s->rep->wex =
tick_add(now_ms, s->be->timeout.tunnel);
if ((s->req->flags & CF_SHUTR) && tick_isset(s->fe->timeout.clientfin))
s->rep->wto = s->fe->timeout.clientfin;
if ((s->req->flags & CF_SHUTW) && tick_isset(s->be->timeout.serverfin))
s->rep->rto = s->be->timeout.serverfin;
if ((s->rep->flags & CF_SHUTR) && tick_isset(s->be->timeout.serverfin))
s->req->wto = s->be->timeout.serverfin;
if ((s->rep->flags & CF_SHUTW) && tick_isset(s->fe->timeout.clientfin))
s->req->rto = s->fe->timeout.clientfin;
s->req->rex = tick_add(now_ms, s->req->rto);
s->req->wex = tick_add(now_ms, s->req->wto);
s->rep->rex = tick_add(now_ms, s->rep->rto);
s->rep->wex = tick_add(now_ms, s->rep->wto);
}
}
@ -2344,13 +2371,23 @@ struct task *process_session(struct task *t)
/* first, let's check if the response buffer needs to shutdown(write) */
if (unlikely((s->rep->flags & (CF_SHUTW|CF_SHUTW_NOW|CF_AUTO_CLOSE|CF_SHUTR)) ==
(CF_AUTO_CLOSE|CF_SHUTR)))
(CF_AUTO_CLOSE|CF_SHUTR))) {
channel_shutw_now(s->rep);
if (tick_isset(s->be->timeout.serverfin)) {
s->req->wto = s->be->timeout.serverfin;
s->req->wex = tick_add(now_ms, s->req->wto);
}
}
/* shutdown(write) pending */
if (unlikely((s->rep->flags & (CF_SHUTW|CF_SHUTW_NOW)) == CF_SHUTW_NOW &&
channel_is_empty(s->rep)))
channel_is_empty(s->rep))) {
si_shutw(s->rep->cons);
if (tick_isset(s->fe->timeout.clientfin)) {
s->req->rto = s->fe->timeout.clientfin;
s->req->rex = tick_add(now_ms, s->req->rto);
}
}
/* shutdown(write) done on the client side, we must stop the server too */
if (unlikely((s->rep->flags & (CF_SHUTW|CF_SHUTR|CF_SHUTR_NOW)) == CF_SHUTW) &&
@ -2362,6 +2399,10 @@ struct task *process_session(struct task *t)
if (s->rep->prod->flags & SI_FL_NOHALF)
s->rep->prod->flags |= SI_FL_NOLINGER;
si_shutr(s->rep->prod);
if (tick_isset(s->be->timeout.serverfin)) {
s->req->wto = s->be->timeout.serverfin;
s->req->wex = tick_add(now_ms, s->req->wto);
}
}
if (s->req->prod->state == SI_ST_DIS || s->req->cons->state == SI_ST_DIS)