2007-10-29 01:09:36 +01:00
/*
* AF_INET/AF_INET6 SOCK_STREAM protocol layer (tcp)
*
MEDIUM: samples: move payload-based fetches and ACLs to their own file
The file acl.c is a real mess: it contains both functions to parse and
process ACLs, and some sample extraction functions which act on buffers.
Some other payload analysers were arbitrarily dispatched to proto_tcp.c.
So now we're moving all payload-based fetches and ACLs to payload.c
which is capable of extracting data from buffers and relies on everything
that is protocol-independent. That way we can safely inflate this file
and only use the other ones when some fetches are really specific (eg:
HTTP, SSL, ...).
As a result of this cleanup, the following new sample fetches became
available even if they're not really useful :
always_false, always_true, rep_ssl_hello_type, rdp_cookie_cnt,
req_len, req_ssl_hello_type, req_ssl_sni, req_ssl_ver, wait_end
The function 'acl_fetch_nothing' was wrong and never used anywhere so it
was removed.
The "rdp_cookie" sample fetch used to have a mandatory argument while it
was optional in ACLs, which are supposed to iterate over RDP cookies. So
we're making it optional as a fetch too, and it will return the first one.
2013-01-07 21:59:07 +01:00
* Copyright 2000-2013 Willy Tarreau <w@1wt.eu>
2007-10-29 01:09:36 +01:00
*
* This program is free software; you can redistribute it and/or
* modify it under the terms of the GNU General Public License
* as published by the Free Software Foundation; either version
* 2 of the License, or (at your option) any later version.
*
*/
# include <ctype.h>
# include <errno.h>
# include <fcntl.h>
# include <stdio.h>
# include <stdlib.h>
# include <string.h>
# include <time.h>
# include <sys/param.h>
# include <sys/socket.h>
# include <sys/stat.h>
# include <sys/types.h>
# include <sys/un.h>
2009-08-24 15:11:06 +04:00
# include <netinet/tcp.h>
[MAJOR] implement tcp request content inspection
Some people need to inspect contents of TCP requests before
deciding to forward a connection or not. A future extension
of this demand might consist in selecting a server farm
depending on the protocol detected in the request.
For this reason, a new state CL_STINSPECT has been added on
the client side. It is immediately entered upon accept() if
the statement "tcp-request inspect-delay <xxx>" is found in
the frontend configuration. Haproxy will then wait up to
this amount of time trying to find a matching ACL, and will
either accept or reject the connection depending on the
"tcp-request content <action> {if|unless}" rules, where
<action> is either "accept" or "reject".
Note that it only waits that long if no definitive verdict
can be found earlier. That generally implies calling a fetch()
function which does not have enough information to decode
some contents, or a match() function which only finds the
beginning of what it's looking for.
It is only at the ACL level that partial data may be processed
as such, because we need to distinguish between MISS and FAIL
*before* applying the term negation.
Thus it is enough to add "| ACL_PARTIAL" to the last argument
when calling acl_exec_cond() to indicate that we expect
ACL_PAT_MISS to be returned if some data is missing (for
fetch() or match()). This is the only case we may return
this value. For this reason, the ACL check in process_cli()
has become a lot simpler.
A new ACL "req_len" of type "int" has been added. Right now
it is already possible to drop requests which talk too early
(eg: for SMTP) or which don't talk at all (eg: HTTP/SSL).
Also, the acl fetch() functions have been extended in order
to permit reporting of missing data in case of fetch failure,
using the ACL_TEST_F_MAY_CHANGE flag.
The default behaviour is unchanged, and if no rule matches,
the request is accepted.
As a side effect, all layer 7 fetching functions have been
cleaned up so that they now check for the validity of the
layer 7 pointer before dereferencing it.
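The MISS/FAIL distinction described above can be illustrated with a small sketch. The names used here (tri_t, match_prefix, eval_term) are hypothetical and are not HAProxy's actual ACL API. The only point is that negation is applied to a definitive verdict, while a MISS is passed through unchanged so that the caller can keep waiting for more data until the inspect delay expires.

```c
#include <stddef.h>

/* Hypothetical three-state verdict, loosely modelled on the
 * FAIL / MISS / PASS distinction described in the commit message.
 */
typedef enum { TRI_FAIL = 0, TRI_MISS = 1, TRI_PASS = 2 } tri_t;

/* Hypothetical matcher: does <buf> (<len> bytes received so far) start
 * with <pfx>? Returns TRI_MISS when the bytes seen so far agree with the
 * prefix but more data is needed before a verdict can be given.
 */
tri_t match_prefix(const char *buf, size_t len, const char *pfx, size_t plen)
{
	size_t i;

	for (i = 0; i < plen; i++) {
		if (i >= len)
			return TRI_MISS;   /* only the beginning was found */
		if (buf[i] != pfx[i])
			return TRI_FAIL;   /* definitive mismatch */
	}
	return TRI_PASS;
}

/* Evaluates one possibly negated term. Negation only swaps the two
 * definitive verdicts; TRI_MISS stays TRI_MISS.
 */
tri_t eval_term(tri_t verdict, int negate)
{
	if (verdict == TRI_MISS || !negate)
		return verdict;
	return verdict == TRI_PASS ? TRI_FAIL : TRI_PASS;
}
```

If the negation were applied first, a partial match under "unless" would be turned into a definitive verdict before all the data had been seen, which is what the commit message warns against.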
2008-07-14 23:54:42 +02:00
# include <common/cfgparse.h>
2007-10-29 01:09:36 +01:00
# include <common/compat.h>
# include <common/config.h>
# include <common/debug.h>
# include <common/errors.h>
# include <common/mini-clist.h>
# include <common/standard.h>
# include <types/global.h>
2014-06-13 16:18:52 +02:00
# include <types/capture.h>
2009-08-16 14:02:45 +02:00
# include <types/server.h>
2007-10-29 01:09:36 +01:00
# include <proto/acl.h>
2012-04-20 14:45:49 +02:00
# include <proto/arg.h>
2012-08-24 19:22:53 +02:00
# include <proto/channel.h>
2012-07-06 14:29:45 +02:00
# include <proto/connection.h>
2012-09-02 22:34:23 +02:00
# include <proto/fd.h>
2012-09-12 22:58:11 +02:00
# include <proto/listener.h>
2009-08-16 14:02:45 +02:00
# include <proto/log.h>
# include <proto/port_range.h>
2012-09-12 22:58:11 +02:00
# include <proto/protocol.h>
2007-10-29 01:09:36 +01:00
# include <proto/proto_tcp.h>
2008-07-14 23:54:42 +02:00
# include <proto/proxy.h>
2012-04-27 21:52:18 +02:00
# include <proto/sample.h>
2010-06-14 21:04:55 +02:00
# include <proto/session.h>
2010-06-05 19:13:27 +02:00
# include <proto/stick_table.h>
2013-09-11 23:20:29 +02:00
# include <proto/stream_interface.h>
2010-06-05 19:13:27 +02:00
# include <proto/task.h>
2007-10-29 01:09:36 +01:00
2008-01-13 18:40:14 +01:00
# ifdef CONFIG_HAP_CTTPROXY
# include <import/ip_tproxy.h>
# endif
2010-10-22 16:06:11 +02:00
static int tcp_bind_listeners ( struct protocol * proto , char * errmsg , int errlen ) ;
static int tcp_bind_listener ( struct listener * listener , char * errmsg , int errlen ) ;
2007-10-29 01:09:36 +01:00
/* Note: must not be declared <const> as its list will be overwritten */
static struct protocol proto_tcpv4 = {
. name = " tcpv4 " ,
. sock_domain = AF_INET ,
. sock_type = SOCK_STREAM ,
. sock_prot = IPPROTO_TCP ,
. sock_family = AF_INET ,
. sock_addrlen = sizeof ( struct sockaddr_in ) ,
. l3_addrlen = 32 / 8 ,
2012-05-07 21:22:09 +02:00
. accept = & listener_accept ,
2012-05-07 18:12:14 +02:00
. connect = tcp_connect_server ,
2010-10-22 16:06:11 +02:00
. bind = tcp_bind_listener ,
2007-10-29 01:09:36 +01:00
. bind_all = tcp_bind_listeners ,
. unbind_all = unbind_all_listeners ,
. enable_all = enable_all_listeners ,
2012-05-11 16:16:40 +02:00
. get_src = tcp_get_src ,
. get_dst = tcp_get_dst ,
MEDIUM: protocol: implement a "drain" function in protocol layers
Since commit cfd97c6f was merged into 1.5-dev14 (BUG/MEDIUM: checks:
prevent TIME_WAITs from appearing also on timeouts), some valid health
checks sometimes used to show some TCP resets. For example, this HTTP
health check sent to a local server :
19:55:15.742818 IP 127.0.0.1.16568 > 127.0.0.1.8000: S 3355859679:3355859679(0) win 32792 <mss 16396,nop,nop,sackOK,nop,wscale 7>
19:55:15.742841 IP 127.0.0.1.8000 > 127.0.0.1.16568: S 1060952566:1060952566(0) ack 3355859680 win 32792 <mss 16396,nop,nop,sackOK,nop,wscale 7>
19:55:15.742863 IP 127.0.0.1.16568 > 127.0.0.1.8000: . ack 1 win 257
19:55:15.745402 IP 127.0.0.1.16568 > 127.0.0.1.8000: P 1:23(22) ack 1 win 257
19:55:15.745488 IP 127.0.0.1.8000 > 127.0.0.1.16568: FP 1:146(145) ack 23 win 257
19:55:15.747109 IP 127.0.0.1.16568 > 127.0.0.1.8000: R 23:23(0) ack 147 win 257
After some discussion with Chris Huang-Leaver, it appeared clear that
what we want is to only send the RST when we have no other choice, which
means when the server has not closed. So we still keep SYN/SYN-ACK/RST
for pure TCP checks, but don't want to see an RST emitted as above when
the server has already sent the FIN.
The solution against this consists in implementing a "drain" function at
the protocol layer, which, when defined, causes as much as possible of
the input socket buffer to be flushed to make recv() return zero so that
we know that the server's FIN was received and ACKed. On Linux, we can make
use of MSG_TRUNC on TCP sockets, which has the benefit of draining everything
at once without even copying data. On other platforms, we read up to one
buffer of data before the close. If recv() manages to get the final zero,
we don't disable lingering. Same for hard errors. Otherwise we do.
In practice, on HTTP health checks we generally find that the close was
pending and is returned upon first recv() call. The network trace becomes
cleaner :
19:55:23.650621 IP 127.0.0.1.16561 > 127.0.0.1.8000: S 3982804816:3982804816(0) win 32792 <mss 16396,nop,nop,sackOK,nop,wscale 7>
19:55:23.650644 IP 127.0.0.1.8000 > 127.0.0.1.16561: S 4082139313:4082139313(0) ack 3982804817 win 32792 <mss 16396,nop,nop,sackOK,nop,wscale 7>
19:55:23.650666 IP 127.0.0.1.16561 > 127.0.0.1.8000: . ack 1 win 257
19:55:23.651615 IP 127.0.0.1.16561 > 127.0.0.1.8000: P 1:23(22) ack 1 win 257
19:55:23.651696 IP 127.0.0.1.8000 > 127.0.0.1.16561: FP 1:146(145) ack 23 win 257
19:55:23.652628 IP 127.0.0.1.16561 > 127.0.0.1.8000: F 23:23(0) ack 147 win 257
19:55:23.652655 IP 127.0.0.1.8000 > 127.0.0.1.16561: . ack 24 win 257
This change should be backported to 1.4 which is where Chris encountered
this issue. The code is different, so probably the tcp_drain() function
will have to be put in the checks only.
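A minimal sketch of the portable path described above (reading pending input until recv() returns zero) might look like the following. This is not the tcp_drain() function itself: the name drain_socket, the 4096-byte buffer and the limit of two reads are assumptions made for illustration. The Linux MSG_TRUNC variant mentioned in the message is deliberately left out.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Hypothetical helper, not HAProxy's tcp_drain(): consumes pending input
 * on socket <fd> without blocking, in the hope of seeing the peer's FIN.
 * Returns 1 if the peer's shutdown (or a hard error) was observed, in
 * which case a normal close() is fine; returns 0 otherwise, in which case
 * the caller may prefer to disable lingering before closing.
 */
int drain_socket(int fd)
{
	char buf[4096];      /* assumed chunk size */
	int turns = 2;       /* assumed limit: read at most two chunks */
	ssize_t len;

	while (turns) {
		len = recv(fd, buf, sizeof(buf), MSG_DONTWAIT);
		if (len == 0)
			return 1;          /* FIN received */
		if (len < 0) {
			if (errno == EINTR)
				continue;  /* interrupted, retry */
			if (errno == EAGAIN || errno == EWOULDBLOCK)
				return 0;  /* peer has not closed yet */
			return 1;          /* hard error: connection is dead anyway */
		}
		turns--;                   /* got data, try once more */
	}
	return 0;                          /* data still pending, give up */
}
```

A caller would typically run this just before close() on a health check socket and only fall back to an RST when it returns 0.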
2013-06-10 19:56:38 +02:00
. drain = tcp_drain ,
2014-07-07 20:22:12 +02:00
. pause = tcp_pause_listener ,
2007-10-29 01:09:36 +01:00
. listeners = LIST_HEAD_INIT ( proto_tcpv4 . listeners ) ,
. nb_listeners = 0 ,
} ;
/* Note: must not be declared <const> as its list will be overwritten */
static struct protocol proto_tcpv6 = {
. name = " tcpv6 " ,
. sock_domain = AF_INET6 ,
. sock_type = SOCK_STREAM ,
. sock_prot = IPPROTO_TCP ,
. sock_family = AF_INET6 ,
. sock_addrlen = sizeof ( struct sockaddr_in6 ) ,
. l3_addrlen = 128 / 8 ,
2012-05-07 21:22:09 +02:00
. accept = & listener_accept ,
2012-05-07 18:12:14 +02:00
. connect = tcp_connect_server ,
2010-10-22 16:06:11 +02:00
. bind = tcp_bind_listener ,
2007-10-29 01:09:36 +01:00
. bind_all = tcp_bind_listeners ,
. unbind_all = unbind_all_listeners ,
. enable_all = enable_all_listeners ,
2012-05-11 16:16:40 +02:00
. get_src = tcp_get_src ,
. get_dst = tcp_get_dst ,
2013-06-10 19:56:38 +02:00
. drain = tcp_drain ,
2014-07-07 20:22:12 +02:00
. pause = tcp_pause_listener ,
2007-10-29 01:09:36 +01:00
. listeners = LIST_HEAD_INIT ( proto_tcpv6 . listeners ) ,
. nb_listeners = 0 ,
} ;
2011-03-10 22:26:24 +01:00
/* Binds ipv4/ipv6 address <local> to socket <fd>, unless <flags> is set, in which
 * case we try to bind <remote>. <flags> is a 2-bit field consisting of :
 *  - 0 : ignore remote address (may even be a NULL pointer)
 *  - 1 : use provided address
 *  - 2 : use provided port
 *  - 3 : use both
 *
 * The function supports multiple foreign binding methods :
 *   - linux_tproxy: we directly bind to the foreign address
 *   - cttproxy: we bind to a local address then nat.
 * The second one can be used as a fallback for the first one.
 * This function returns 0 when everything's OK, 1 if it could not bind to the
 * local address, 2 if it could not bind to the foreign address.
 */
2011-03-10 22:26:24 +01:00
int tcp_bind_socket ( int fd , int flags , struct sockaddr_storage * local , struct sockaddr_storage * remote )
2008-01-13 18:40:14 +01:00
{
2011-03-10 22:26:24 +01:00
struct sockaddr_storage bind_addr ;
2008-01-13 18:40:14 +01:00
int foreign_ok = 0 ;
int ret ;
static int ip_transp_working = 1 ;
2012-07-13 14:34:59 +02:00
static int ip6_transp_working = 1 ;
2013-05-08 22:49:23 +02:00
2012-07-13 14:34:59 +02:00
switch ( local - > ss_family ) {
case AF_INET :
if ( flags & & ip_transp_working ) {
2013-05-08 22:49:23 +02:00
/* This deserves some explanation. Some platforms will support
* multiple combinations of certain methods , so we try the
* supported ones until one succeeds .
*/
if ( 0
# if defined(IP_TRANSPARENT)
| | ( setsockopt ( fd , SOL_IP , IP_TRANSPARENT , & one , sizeof ( one ) ) = = 0 )
# endif
# if defined(IP_FREEBIND)
| | ( setsockopt ( fd , SOL_IP , IP_FREEBIND , & one , sizeof ( one ) ) = = 0 )
2013-05-08 23:22:39 +02:00
# endif
# if defined(IP_BINDANY)
| | ( setsockopt ( fd , IPPROTO_IP , IP_BINDANY , & one , sizeof ( one ) ) = = 0 )
2013-05-08 23:30:23 +02:00
# endif
# if defined(SO_BINDANY)
| | ( setsockopt ( fd , SOL_SOCKET , SO_BINDANY , & one , sizeof ( one ) ) = = 0 )
2013-05-08 22:49:23 +02:00
# endif
)
2012-07-13 14:34:59 +02:00
foreign_ok = 1 ;
else
ip_transp_working = 0 ;
}
break ;
case AF_INET6 :
if ( flags & & ip6_transp_working ) {
2013-05-08 22:49:23 +02:00
if ( 0
# if defined(IPV6_TRANSPARENT)
| | ( setsockopt ( fd , SOL_IPV6 , IPV6_TRANSPARENT , & one , sizeof ( one ) ) = = 0 )
2013-05-08 23:22:39 +02:00
# endif
2014-03-03 21:10:51 +01:00
# if defined(IP_FREEBIND)
| | ( setsockopt ( fd , SOL_IP , IP_FREEBIND , & one , sizeof ( one ) ) = = 0 )
# endif
2013-05-08 23:22:39 +02:00
# if defined(IPV6_BINDANY)
| | ( setsockopt ( fd , IPPROTO_IPV6 , IPV6_BINDANY , & one , sizeof ( one ) ) = = 0 )
2013-05-08 23:30:23 +02:00
# endif
# if defined(SO_BINDANY)
| | ( setsockopt ( fd , SOL_SOCKET , SO_BINDANY , & one , sizeof ( one ) ) = = 0 )
2013-05-08 22:49:23 +02:00
# endif
)
2012-07-13 14:34:59 +02:00
foreign_ok = 1 ;
else
ip6_transp_working = 0 ;
}
break ;
2008-01-13 18:40:14 +01:00
}
2013-05-08 22:49:23 +02:00
2008-01-13 18:40:14 +01:00
if ( flags ) {
memset ( & bind_addr , 0 , sizeof ( bind_addr ) ) ;
2011-04-19 07:20:57 +02:00
bind_addr . ss_family = remote - > ss_family ;
2011-03-10 22:26:24 +01:00
switch ( remote - > ss_family ) {
case AF_INET :
if ( flags & 1 )
( ( struct sockaddr_in * ) & bind_addr ) - > sin_addr = ( ( struct sockaddr_in * ) remote ) - > sin_addr ;
if ( flags & 2 )
( ( struct sockaddr_in * ) & bind_addr ) - > sin_port = ( ( struct sockaddr_in * ) remote ) - > sin_port ;
break ;
case AF_INET6 :
if ( flags & 1 )
( ( struct sockaddr_in6 * ) & bind_addr ) - > sin6_addr = ( ( struct sockaddr_in6 * ) remote ) - > sin6_addr ;
if ( flags & 2 )
( ( struct sockaddr_in6 * ) & bind_addr ) - > sin6_port = ( ( struct sockaddr_in6 * ) remote ) - > sin6_port ;
break ;
2011-12-16 21:25:11 +01:00
default :
/* we don't want to try to bind to an unknown address family */
foreign_ok = 0 ;
2011-03-10 22:26:24 +01:00
}
2008-01-13 18:40:14 +01:00
}
2011-06-24 15:11:37 +09:00
setsockopt ( fd , SOL_SOCKET , SO_REUSEADDR , & one , sizeof ( one ) ) ;
2008-01-13 18:40:14 +01:00
if ( foreign_ok ) {
2014-05-09 22:56:10 +02:00
if ( is_inet_addr ( & bind_addr ) ) {
2012-10-26 19:57:58 +02:00
ret = bind ( fd , ( struct sockaddr * ) & bind_addr , get_addr_len ( & bind_addr ) ) ;
if ( ret < 0 )
return 2 ;
}
2008-01-13 18:40:14 +01:00
}
else {
2014-05-09 22:56:10 +02:00
if ( is_inet_addr ( local ) ) {
2012-10-26 19:57:58 +02:00
ret = bind ( fd , ( struct sockaddr * ) local , get_addr_len ( local ) ) ;
if ( ret < 0 )
return 1 ;
}
2008-01-13 18:40:14 +01:00
}
if ( ! flags )
return 0 ;
# ifdef CONFIG_HAP_CTTPROXY
2011-03-20 14:03:54 +01:00
if ( ! foreign_ok & & remote - > ss_family = = AF_INET ) {
2008-01-13 18:40:14 +01:00
struct in_tproxy itp1 , itp2 ;
memset ( & itp1 , 0 , sizeof ( itp1 ) ) ;
itp1 . op = TPROXY_ASSIGN ;
2011-03-20 14:03:54 +01:00
itp1 . v . addr . faddr = ( ( struct sockaddr_in * ) & bind_addr ) - > sin_addr ;
itp1 . v . addr . fport = ( ( struct sockaddr_in * ) & bind_addr ) - > sin_port ;
2008-01-13 18:40:14 +01:00
/* set connect flag on socket */
itp2 . op = TPROXY_FLAGS ;
itp2 . v . flags = ITP_CONNECT | ITP_ONCE ;
if ( setsockopt ( fd , SOL_IP , IP_TPROXY , & itp1 , sizeof ( itp1 ) ) ! = - 1 & &
setsockopt ( fd , SOL_IP , IP_TPROXY , & itp2 , sizeof ( itp2 ) ) ! = - 1 ) {
foreign_ok = 1 ;
}
}
# endif
if ( ! foreign_ok )
/* we could not bind to a foreign address */
return 2 ;
return 0 ;
}
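The way tcp_bind_socket() interprets its 2-bit <flags> field can be shown in isolation. The sketch below covers IPv4 only; the macro names BIND_USE_ADDR and BIND_USE_PORT and the helper build_foreign_addr4 are hypothetical and do not exist in the source, which tests the literal values 1 and 2.

```c
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>

/* Hypothetical names for the two bits of the <flags> field. */
#define BIND_USE_ADDR 1
#define BIND_USE_PORT 2

/* Fills <out> with the parts of IPv4 address <remote> selected by <flags>.
 * Fields which are not selected stay zeroed, which bind() interprets as
 * "any address" (INADDR_ANY) or "any port" (0).
 */
void build_foreign_addr4(int flags, const struct sockaddr_in *remote,
                         struct sockaddr_in *out)
{
	memset(out, 0, sizeof(*out));
	out->sin_family = AF_INET;
	if (flags & BIND_USE_ADDR)
		out->sin_addr = remote->sin_addr;
	if (flags & BIND_USE_PORT)
		out->sin_port = remote->sin_port;
}
```

With flags set to 1 the kernel still picks the source port, while 3 requests both the foreign address and the foreign port.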
2009-08-16 14:02:45 +02:00
/*
 * This function initiates a TCP connection establishment to the target assigned
 * to connection <conn> using (conn->{target,addr.to}). A source address may be
 * pointed to by conn->addr.from in case of transparent proxying. Normal source
 * bind addresses are still determined locally (due to the possible need of a
 * source port). conn->target may point either to a valid server or to a backend,
 * depending on its object type. Only OBJ_TYPE_PROXY and OBJ_TYPE_SERVER are
 * supported. The <data> parameter is a boolean indicating whether there are data
 * waiting to be sent or not, in order to adjust data write polling and, on
 * some platforms, the ability to avoid an empty initial ACK. The <delack> argument
 * allows the caller to force using a delayed ACK when establishing the connection :
 *   - 0 = no delayed ACK unless data are advertised and backend has tcp-smart-connect
 *   - 1 = delayed ACK if backend has tcp-smart-connect, regardless of data
 *   - 2 = delayed ACK regardless of backend options
 *
 * Note that a pending send_proxy message accounts for data.
 *
 * It can return one of :
 *  - SN_ERR_NONE if everything's OK
 *  - SN_ERR_SRVTO if the connection attempt timed out (ETIMEDOUT)
 *  - SN_ERR_SRVCL if the connection was refused by the server
 *  - SN_ERR_PRXCOND if the connection has been limited by the proxy (maxconn)
 *  - SN_ERR_RESOURCE if a system resource is lacking (eg: fd limits, ports, ...)
 *  - SN_ERR_INTERNAL for any other purely internal errors
 * Additionally, in the case of SN_ERR_RESOURCE, an emergency log will be emitted.
 *
 * The connection's fd is inserted only when SN_ERR_NONE is returned, otherwise
 * it's invalid and the caller has nothing to do.
 */
2011-03-03 18:27:32 +01:00
2012-11-24 10:24:27 +01:00
int tcp_connect_server ( struct connection * conn , int data , int delack )
2009-08-16 14:02:45 +02:00
{
int fd ;
2011-03-04 22:04:29 +01:00
struct server * srv ;
struct proxy * be ;
2012-12-08 22:49:11 +01:00
struct conn_src * src ;
2011-03-04 22:04:29 +01:00
2014-01-24 16:08:19 +01:00
conn - > flags = CO_FL_WAIT_L4_CONN ; /* connection in progress */
2012-11-12 00:42:33 +01:00
switch ( obj_type ( conn - > target ) ) {
case OBJ_TYPE_PROXY :
be = objt_proxy ( conn - > target ) ;
2011-03-04 22:04:29 +01:00
srv = NULL ;
break ;
2012-11-12 00:42:33 +01:00
case OBJ_TYPE_SERVER :
srv = objt_server ( conn - > target ) ;
2011-03-04 22:04:29 +01:00
be = srv - > proxy ;
break ;
default :
2014-01-24 16:08:19 +01:00
conn - > flags | = CO_FL_ERROR ;
2011-03-04 22:04:29 +01:00
return SN_ERR_INTERNAL ;
}
2009-08-16 14:02:45 +02:00
2012-08-30 21:11:38 +02:00
if ( ( fd = conn - > t . sock . fd = socket ( conn - > addr . to . ss_family , SOCK_STREAM , IPPROTO_TCP ) ) = = - 1 ) {
2009-08-16 14:02:45 +02:00
qfprintf ( stderr , " Cannot get a server socket. \n " ) ;
2014-01-24 16:08:19 +01:00
if ( errno = = ENFILE ) {
conn - > err_code = CO_ER_SYS_FDLIM ;
2009-08-16 14:02:45 +02:00
send_log ( be , LOG_EMERG ,
" Proxy %s reached system FD limit at %d. Please check system tunables. \n " ,
be - > id , maxfd ) ;
2014-01-24 16:08:19 +01:00
}
else if ( errno = = EMFILE ) {
conn - > err_code = CO_ER_PROC_FDLIM ;
2009-08-16 14:02:45 +02:00
send_log ( be , LOG_EMERG ,
" Proxy %s reached process FD limit at %d. Please check 'ulimit-n' and restart. \n " ,
be - > id , maxfd ) ;
2014-01-24 16:08:19 +01:00
}
else if ( errno = = ENOBUFS | | errno = = ENOMEM ) {
conn - > err_code = CO_ER_SYS_MEMLIM ;
2009-08-16 14:02:45 +02:00
send_log ( be , LOG_EMERG ,
" Proxy %s reached system memory limit at %d sockets. Please check system tunables. \n " ,
be - > id , maxfd ) ;
2014-01-24 16:08:19 +01:00
}
else if ( errno = = EAFNOSUPPORT | | errno = = EPROTONOSUPPORT ) {
conn - > err_code = CO_ER_NOPROTO ;
}
else
conn - > err_code = CO_ER_SOCK_ERR ;
2009-08-16 14:02:45 +02:00
/* this is a resource error */
2014-01-24 16:08:19 +01:00
conn - > flags | = CO_FL_ERROR ;
2009-08-16 14:02:45 +02:00
return SN_ERR_RESOURCE ;
}
if ( fd > = global . maxsock ) {
/* do not log anything there, it's a normal condition when this option
* is used to serialize connections to a server !
*/
Alert ( " socket(): not enough free sockets. Raise -n argument. Giving up. \n " ) ;
close ( fd ) ;
2014-01-24 16:08:19 +01:00
conn - > err_code = CO_ER_CONF_FDLIM ;
conn - > flags | = CO_FL_ERROR ;
2009-08-16 14:02:45 +02:00
return SN_ERR_PRXCOND ; /* it is a configuration limit */
}
if ( ( fcntl ( fd , F_SETFL , O_NONBLOCK ) = = - 1 ) | |
2011-06-24 15:11:37 +09:00
( setsockopt ( fd , IPPROTO_TCP , TCP_NODELAY , & one , sizeof ( one ) ) = = - 1 ) ) {
2009-08-16 14:02:45 +02:00
qfprintf ( stderr , " Cannot set client socket to non blocking mode. \n " ) ;
close ( fd ) ;
2014-01-24 16:08:19 +01:00
conn - > err_code = CO_ER_SOCK_ERR ;
conn - > flags | = CO_FL_ERROR ;
2009-08-16 14:02:45 +02:00
return SN_ERR_INTERNAL ;
}
if ( be - > options & PR_O_TCP_SRV_KA )
2011-06-24 15:11:37 +09:00
setsockopt ( fd , SOL_SOCKET , SO_KEEPALIVE , & one , sizeof ( one ) ) ;
2009-08-16 14:02:45 +02:00
/* allow specific binding :
* - server - specific at first
* - proxy - specific next
*/
2012-12-08 22:49:11 +01:00
if ( srv & & srv - > conn_src . opts & CO_SRC_BIND )
src = & srv - > conn_src ;
else if ( be - > conn_src . opts & CO_SRC_BIND )
src = & be - > conn_src ;
else
src = NULL ;
if ( src ) {
2009-08-16 14:02:45 +02:00
int ret , flags = 0 ;
2014-05-09 22:56:10 +02:00
if ( is_inet_addr ( & conn - > addr . from ) ) {
2012-12-08 22:49:11 +01:00
switch ( src - > opts & CO_SRC_TPROXY_MASK ) {
2012-12-08 22:29:20 +01:00
case CO_SRC_TPROXY_ADDR :
case CO_SRC_TPROXY_CLI :
2012-10-26 19:57:58 +02:00
flags = 3 ;
break ;
2012-12-08 22:29:20 +01:00
case CO_SRC_TPROXY_CIP :
case CO_SRC_TPROXY_DYN :
2012-10-26 19:57:58 +02:00
flags = 1 ;
break ;
}
2009-08-16 14:02:45 +02:00
}
2010-03-29 19:36:59 +02:00
2009-08-16 14:02:45 +02:00
# ifdef SO_BINDTODEVICE
/* Note: this might fail if not CAP_NET_RAW */
2012-12-08 22:49:11 +01:00
if ( src - > iface_name )
setsockopt ( fd , SOL_SOCKET , SO_BINDTODEVICE , src - > iface_name , src - > iface_len + 1 ) ;
2009-08-16 14:02:45 +02:00
# endif
2012-12-08 22:49:11 +01:00
if ( src - > sport_range ) {
2009-08-16 14:02:45 +02:00
int attempts = 10 ; /* should be more than enough to find a spare port */
2012-12-08 22:49:11 +01:00
struct sockaddr_storage sa ;
2009-08-16 14:02:45 +02:00
ret = 1 ;
2012-12-08 22:49:11 +01:00
sa = src - > source_addr ;
2009-08-16 14:02:45 +02:00
do {
/* note: in case of retry, we may have to release a previously
* allocated port , hence this loop ' s construct .
*/
2009-10-18 07:25:52 +02:00
port_range_release_port ( fdinfo [ fd ] . port_range , fdinfo [ fd ] . local_port ) ;
fdinfo [ fd ] . port_range = NULL ;
2009-08-16 14:02:45 +02:00
if ( ! attempts )
break ;
attempts - - ;
2012-12-08 22:49:11 +01:00
fdinfo [ fd ] . local_port = port_range_alloc_port ( src - > sport_range ) ;
2014-01-24 16:08:19 +01:00
if ( ! fdinfo [ fd ] . local_port ) {
conn - > err_code = CO_ER_PORT_RANGE ;
2009-08-16 14:02:45 +02:00
break ;
2014-01-24 16:08:19 +01:00
}
2009-08-16 14:02:45 +02:00
2012-12-08 22:49:11 +01:00
fdinfo [ fd ] . port_range = src - > sport_range ;
set_host_port ( & sa , fdinfo [ fd ] . local_port ) ;
2009-08-16 14:02:45 +02:00
2012-12-08 22:49:11 +01:00
ret = tcp_bind_socket ( fd , flags , & sa , & conn - > addr . from ) ;
2014-01-24 16:08:19 +01:00
if ( ret ! = 0 )
conn - > err_code = CO_ER_CANT_BIND ;
2009-08-16 14:02:45 +02:00
} while ( ret ! = 0 ) ; /* binding NOK */
}
else {
2012-12-08 22:49:11 +01:00
ret = tcp_bind_socket ( fd , flags , & src - > source_addr , & conn - > addr . from ) ;
2014-01-24 16:08:19 +01:00
if ( ret ! = 0 )
conn - > err_code = CO_ER_CANT_BIND ;
2009-08-16 14:02:45 +02:00
}
2012-12-08 22:49:11 +01:00
if ( unlikely ( ret ! = 0 ) ) {
2009-10-18 07:25:52 +02:00
port_range_release_port ( fdinfo [ fd ] . port_range , fdinfo [ fd ] . local_port ) ;
fdinfo [ fd ] . port_range = NULL ;
2009-08-16 14:02:45 +02:00
close ( fd ) ;
if ( ret = = 1 ) {
2012-12-08 22:49:11 +01:00
Alert ( " Cannot bind to source address before connect() for backend %s. Aborting. \n " ,
2009-08-16 14:02:45 +02:00
be - > id ) ;
send_log ( be , LOG_EMERG ,
2012-12-08 22:49:11 +01:00
" Cannot bind to source address before connect() for backend %s. \n " ,
2009-08-16 14:02:45 +02:00
be - > id ) ;
} else {
2012-12-08 22:49:11 +01:00
Alert ( " Cannot bind to tproxy source address before connect() for backend %s. Aborting. \n " ,
2009-08-16 14:02:45 +02:00
be - > id ) ;
send_log ( be , LOG_EMERG ,
2012-12-08 22:49:11 +01:00
" Cannot bind to tproxy source address before connect() for backend %s. \n " ,
2009-08-16 14:02:45 +02:00
be - > id ) ;
}
2014-01-24 16:08:19 +01:00
conn - > flags | = CO_FL_ERROR ;
2009-08-16 14:02:45 +02:00
return SN_ERR_RESOURCE ;
}
}
2013-10-24 21:45:00 +02:00
/* if a send_proxy is there, there are data */
data | = conn - > send_proxy_ofs ;
2009-08-24 15:11:06 +04:00
# if defined(TCP_QUICKACK)
2009-08-16 14:02:45 +02:00
/* disabling tcp quick ack now allows the first request to leave the
* machine with the first ACK . We only do this if there are pending
2012-11-24 10:24:27 +01:00
* data in the buffer .
2009-08-16 14:02:45 +02:00
*/
2012-11-24 10:24:27 +01:00
if ( delack = = 2 | | ( ( delack | | data ) & & ( be - > options2 & PR_O2_SMARTCON ) ) )
2011-06-24 15:11:37 +09:00
setsockopt ( fd , IPPROTO_TCP , TCP_QUICKACK , & zero , sizeof ( zero ) ) ;
2009-08-16 14:02:45 +02:00
# endif
2010-01-21 17:43:04 +01:00
if ( global . tune . server_sndbuf )
setsockopt ( fd , SOL_SOCKET , SO_SNDBUF , & global . tune . server_sndbuf , sizeof ( global . tune . server_sndbuf ) ) ;
if ( global . tune . server_rcvbuf )
setsockopt ( fd , SOL_SOCKET , SO_RCVBUF , & global . tune . server_rcvbuf , sizeof ( global . tune . server_rcvbuf ) ) ;
2012-08-30 21:11:38 +02:00
	if ((connect(fd, (struct sockaddr *)&conn->addr.to, get_addr_len(&conn->addr.to)) == -1) &&
2009-08-16 14:02:45 +02:00
	    (errno != EINPROGRESS) && (errno != EALREADY) && (errno != EISCONN)) {
2012-12-08 23:03:28 +01:00
		if (errno == EAGAIN || errno == EADDRINUSE || errno == EADDRNOTAVAIL) {
2009-08-16 14:02:45 +02:00
			char *msg;
2014-01-24 16:08:19 +01:00
			if (errno == EAGAIN || errno == EADDRNOTAVAIL) {
2009-08-16 14:02:45 +02:00
				msg = "no free ports";
2014-01-24 16:08:19 +01:00
				conn->err_code = CO_ER_FREE_PORTS;
			}
			else {
2009-08-16 14:02:45 +02:00
				msg = "local address already in use";
2014-01-24 16:08:19 +01:00
				conn->err_code = CO_ER_ADDR_INUSE;
			}
2009-08-16 14:02:45 +02:00
2012-12-08 23:03:28 +01:00
			qfprintf(stderr, "Connect() failed for backend %s: %s.\n", be->id, msg);
2009-10-18 07:25:52 +02:00
			port_range_release_port(fdinfo[fd].port_range, fdinfo[fd].local_port);
			fdinfo[fd].port_range = NULL;
2009-08-16 14:02:45 +02:00
			close(fd);
2012-12-08 23:03:28 +01:00
			send_log(be, LOG_ERR, "Connect() failed for backend %s: %s.\n", be->id, msg);
2014-01-24 16:08:19 +01:00
			conn->flags |= CO_FL_ERROR;
2009-08-16 14:02:45 +02:00
			return SN_ERR_RESOURCE;
		} else if (errno == ETIMEDOUT) {
			//qfprintf(stderr,"Connect(): ETIMEDOUT");
2009-10-18 07:25:52 +02:00
			port_range_release_port(fdinfo[fd].port_range, fdinfo[fd].local_port);
			fdinfo[fd].port_range = NULL;
2009-08-16 14:02:45 +02:00
			close(fd);
2014-01-24 16:08:19 +01:00
			conn->err_code = CO_ER_SOCK_ERR;
			conn->flags |= CO_FL_ERROR;
2009-08-16 14:02:45 +02:00
			return SN_ERR_SRVTO;
		} else {
			// (errno == ECONNREFUSED || errno == ENETUNREACH || errno == EACCES || errno == EPERM)
			//qfprintf(stderr,"Connect(): %d", errno);
2009-10-18 07:25:52 +02:00
			port_range_release_port(fdinfo[fd].port_range, fdinfo[fd].local_port);
			fdinfo[fd].port_range = NULL;
2009-08-16 14:02:45 +02:00
			close(fd);
2014-01-24 16:08:19 +01:00
			conn->err_code = CO_ER_SOCK_ERR;
			conn->flags |= CO_FL_ERROR;
2009-08-16 14:02:45 +02:00
			return SN_ERR_SRVCL;
		}
	}
2012-12-08 18:53:44 +01:00
	conn->flags |= CO_FL_ADDR_TO_SET;
2012-05-11 19:53:32 +02:00
2013-10-24 21:45:00 +02:00
	/* Prepare to send a few handshakes related to the on-wire protocol. */
	if (conn->send_proxy_ofs)
2013-10-24 22:01:26 +02:00
		conn->flags |= CO_FL_SEND_PROXY;
2013-10-24 21:45:00 +02:00
MAJOR: connection: add two new flags to indicate readiness of control/transport
Currently the control and transport layers of a connection are supposed
to be initialized when their respective pointers are not NULL. This will
not work anymore when we plan to reuse connections, because there is an
asymmetry between the accept() side and the connect() side :
- on accept() side, the fd is set first, then the ctrl layer then the
transport layer ; upon error, they must be undone in the reverse order,
then the FD must be closed. The FD must not be deleted if the control
layer was not yet initialized ;
- on the connect() side, the fd is set last and there is no reliable way
to know if it has been initialized or not. In practice it's initialized
to -1 first but this is hackish and supposes that local FDs only will
be used forever. Also, there are even fewer solutions for keeping track
of the transport layer's state.
Also it is possible to support delayed close() when something (eg: logs)
tracks some information requiring the transport and/or control layers,
making it even more difficult to clean them.
So the proposed solution is to add two flags to the connection :
- CO_FL_CTRL_READY is set when the control layer is initialized (fd_insert)
and cleared after it's released (fd_delete).
- CO_FL_XPRT_READY is set when the transport layer is initialized (xprt->init)
and cleared after it's released (xprt->close).
The functions have been adapted to rely on this and not on the pointers
anymore. conn_xprt_close() was unused and dangerous : it did not close
the control layer (eg: the socket itself) but still marks the transport
layer as closed, preventing any future call to conn_full_close() from
finishing the job.
The problem comes from conn_full_close() in fact. It needs to close the
xprt and ctrl layers independently. After that we're still having an issue :
we don't know based on ->ctrl alone whether the fd was registered or not.
For this we use the two new flags CO_FL_XPRT_READY and CO_FL_CTRL_READY. We
now rely on this and not on conn->xprt nor conn->ctrl anymore to decide what
remains to be done on the connection.
In order not to miss some flag assignments, we introduce conn_ctrl_init()
to initialize the control layer, register the fd using fd_insert() and set
the flag, and conn_ctrl_close() which unregisters the fd and removes the
flag, but only if the transport layer was closed.
Similarly, at the transport layer, conn_xprt_init() calls ->init and sets
the flag, while conn_xprt_close() checks the flag, calls ->close and clears
the flag, regardless of xprt_ctx or xprt_st. This also ensures that the ->init
and the ->close functions are called only once each and in the correct order.
Note that conn_xprt_close() does nothing if the transport layer is still
tracked.
conn_full_close() now simply calls conn_xprt_close() then conn_ctrl_close()
in turn, which do nothing if CO_FL_XPRT_TRACKED is set.
In order to handle the error path, we also provide conn_force_close() which
ignores CO_FL_XPRT_TRACKED and closes the transport and the control layers
in turn. All relevant instances of fd_delete() have been replaced with
conn_force_close(). Now we always know what state the connection is in and
we can expect to split its initialization.
2013-10-21 16:30:56 +02:00
	conn_ctrl_init(conn);       /* registers the FD */
2013-12-15 14:19:38 +01:00
	fdtab[fd].linger_risk = 1;  /* close hard if needed */
2012-08-30 21:11:38 +02:00
	conn_sock_want_send(conn);  /* for connect status */
2012-08-31 13:54:11 +02:00
REORG: connection: rename the data layer the "transport layer"
While working on the changes required to make the health checks use the
new connections, it started to become obvious that some naming was not
logical at all in the connections. Specifically, it is not logical to
call the "data layer" the layer which is in charge of all the handshake
and which does not yet provide a data layer once established until a
session has allocated all the required buffers.
In fact, it's more a transport layer, which makes much more sense. The
transport layer offers a medium on which data can transit, and it offers
the functions to move these data when the upper layer requests this. And
it is the upper layer which iterates over the transport layer's functions
to move data which should be called the data layer.
The use case where it's obvious is with embryonic sessions : an incoming
SSL connection is accepted. Only the connection is allocated, not the
buffers nor stream interface, etc... The connection handles the SSL
handshake by itself. Once this handshake is complete, we can't use the
data functions because the buffers and stream interface are not there
yet. Hence we have to first call a specific function to complete the
session initialization, after which we'll be able to use the data
functions. This clearly proves that SSL here is only a transport layer
and that the stream interface constitutes the data layer.
A similar change will be performed to rename app_cb => data, but the
two could not be in the same commit for obvious reasons.
2012-10-03 00:19:48 +02:00
	if (conn_xprt_init(conn) < 0) {
2013-10-21 16:30:56 +02:00
		conn_force_close(conn);
2014-01-24 16:08:19 +01:00
		conn->flags |= CO_FL_ERROR;
2012-08-31 13:54:11 +02:00
		return SN_ERR_RESOURCE;
2012-09-06 14:04:41 +02:00
	}
2012-08-31 13:54:11 +02:00
2012-08-30 22:23:13 +02:00
	if (data)
2012-08-30 21:11:38 +02:00
		conn_data_want_send(conn);  /* prepare to send data if any */
2009-08-16 14:02:45 +02:00
	return SN_ERR_NONE;  /* connection is OK */
}
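The readiness-flag scheme described in the "add two new flags" commit message above can be sketched in isolation. This is only an illustration under stated assumptions: the `demo_*` names and `DEMO_FL_*` values are hypothetical and are not HAProxy's actual definitions, and the layer callbacks are replaced by simple counters.

```c
/* Hypothetical flag values; HAProxy's real CO_FL_* constants differ. */
enum {
	DEMO_FL_CTRL_READY   = 1 << 0, /* control layer initialized (fd registered) */
	DEMO_FL_XPRT_READY   = 1 << 1, /* transport layer initialized */
	DEMO_FL_XPRT_TRACKED = 1 << 2, /* something still tracks the transport layer */
};

struct demo_conn {
	unsigned int flags;
	int init_calls;  /* how many times the transport init actually ran */
	int close_calls; /* how many times the transport close actually ran */
};

/* Marks the control layer ready (stands in for registering the fd). */
static void demo_ctrl_init(struct demo_conn *c)
{
	c->flags |= DEMO_FL_CTRL_READY;
}

/* Runs the transport init at most once, then sets the flag. */
static int demo_xprt_init(struct demo_conn *c)
{
	if (c->flags & DEMO_FL_XPRT_READY)
		return 0;
	c->init_calls++;
	c->flags |= DEMO_FL_XPRT_READY;
	return 0;
}

/* Closes the transport layer only if it is ready and not tracked. */
static void demo_xprt_close(struct demo_conn *c)
{
	if ((c->flags & (DEMO_FL_XPRT_READY | DEMO_FL_XPRT_TRACKED)) == DEMO_FL_XPRT_READY) {
		c->close_calls++;
		c->flags &= ~DEMO_FL_XPRT_READY;
	}
}

/* Releases the control layer only once the transport layer is closed. */
static void demo_ctrl_close(struct demo_conn *c)
{
	if ((c->flags & (DEMO_FL_XPRT_READY | DEMO_FL_CTRL_READY)) == DEMO_FL_CTRL_READY)
		c->flags &= ~DEMO_FL_CTRL_READY;
}

/* Polite close: does nothing while the transport layer is tracked. */
static void demo_full_close(struct demo_conn *c)
{
	demo_xprt_close(c);
	demo_ctrl_close(c);
}

/* Error-path close: ignores tracking and tears both layers down. */
static void demo_force_close(struct demo_conn *c)
{
	if (c->flags & DEMO_FL_XPRT_READY) {
		c->close_calls++;
		c->flags &= ~DEMO_FL_XPRT_READY;
	}
	c->flags &= ~DEMO_FL_CTRL_READY;
}
```

With a tracked transport layer, `demo_full_close()` leaves both flags set, while `demo_force_close()` clears them. The second behaviour is the one the error path needs when the transport initialization fails.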
2012-05-11 16:16:40 +02:00
/*
 * Retrieves the source address for the socket <fd>, with <dir> indicating
 * if we're a listener (=0) or an initiator (!=0). It returns 0 in case of
 * success, -1 in case of error. The socket's source address is stored in
 * <sa> for <salen> bytes.
 */
int tcp_get_src(int fd, struct sockaddr *sa, socklen_t salen, int dir)
{
	if (dir)
		return getsockname(fd, sa, &salen);
	else
		return getpeername(fd, sa, &salen);
}
/*
 * Retrieves the original destination address for the socket <fd>, with <dir>
 * indicating if we're a listener (=0) or an initiator (!=0). In the case of a
 * listener, if the original destination address was translated, the original
 * address is retrieved. It returns 0 in case of success, -1 in case of error.
 * The socket's destination address is stored in <sa> for <salen> bytes.
 */
int tcp_get_dst(int fd, struct sockaddr *sa, socklen_t salen, int dir)
{
	if (dir)
		return getpeername(fd, sa, &salen);
#if defined(TPROXY) && defined(SO_ORIGINAL_DST)
	else if (getsockopt(fd, SOL_IP, SO_ORIGINAL_DST, sa, &salen) == 0)
		return 0;
#endif
	else
		return getsockname(fd, sa, &salen);
}
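The <dir> convention used by tcp_get_src() can be checked on a loopback connection: the initiator's local name and the peer name seen by the accepted socket should describe the same endpoint. The sketch below is an assumption-laden illustration, not HAProxy code; `demo_get_src()` mirrors the selection logic above and `demo_loopback_pair()` is a hypothetical helper that needs a working loopback interface.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Same selection as tcp_get_src(): an initiator (dir != 0) asks for its
 * local name, a listener (dir == 0) asks for the peer's name.
 */
static int demo_get_src(int fd, struct sockaddr *sa, socklen_t salen, int dir)
{
	return dir ? getsockname(fd, sa, &salen) : getpeername(fd, sa, &salen);
}

/* Hypothetical helper: builds a connected TCP pair over 127.0.0.1.
 * Returns 0 on success and fills <client> and <accepted>, -1 on error.
 */
static int demo_loopback_pair(int *client, int *accepted)
{
	struct sockaddr_in addr;
	socklen_t len = sizeof(addr);
	int lfd = socket(AF_INET, SOCK_STREAM, 0);

	if (lfd < 0)
		return -1;

	memset(&addr, 0, sizeof(addr));
	addr.sin_family = AF_INET;
	addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
	addr.sin_port = 0; /* let the kernel pick a free port */

	if (bind(lfd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
	    listen(lfd, 1) < 0 ||
	    getsockname(lfd, (struct sockaddr *)&addr, &len) < 0) {
		close(lfd);
		return -1;
	}

	*client = socket(AF_INET, SOCK_STREAM, 0);
	if (*client < 0) {
		close(lfd);
		return -1;
	}
	if (connect(*client, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
		close(*client);
		close(lfd);
		return -1;
	}

	*accepted = accept(lfd, NULL, NULL);
	close(lfd);
	if (*accepted < 0) {
		close(*client);
		return -1;
	}
	return 0;
}
```

Calling `demo_get_src()` with dir set to 1 on the client socket and with dir set to 0 on the accepted socket should yield the same port, since both calls name the client end of the connection.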
MEDIUM: protocol: implement a "drain" function in protocol layers
Since commit cfd97c6f was merged into 1.5-dev14 (BUG/MEDIUM: checks:
prevent TIME_WAITs from appearing also on timeouts), some valid health
checks sometimes used to show some TCP resets. For example, this HTTP
health check sent to a local server :
19:55:15.742818 IP 127.0.0.1.16568 > 127.0.0.1.8000: S 3355859679:3355859679(0) win 32792 <mss 16396,nop,nop,sackOK,nop,wscale 7>
19:55:15.742841 IP 127.0.0.1.8000 > 127.0.0.1.16568: S 1060952566:1060952566(0) ack 3355859680 win 32792 <mss 16396,nop,nop,sackOK,nop,wscale 7>
19:55:15.742863 IP 127.0.0.1.16568 > 127.0.0.1.8000: . ack 1 win 257
19:55:15.745402 IP 127.0.0.1.16568 > 127.0.0.1.8000: P 1:23(22) ack 1 win 257
19:55:15.745488 IP 127.0.0.1.8000 > 127.0.0.1.16568: FP 1:146(145) ack 23 win 257
19:55:15.747109 IP 127.0.0.1.16568 > 127.0.0.1.8000: R 23:23(0) ack 147 win 257
After some discussion with Chris Huang-Leaver, it appeared clear that
what we want is to only send the RST when we have no other choice, which
means when the server has not closed. So we still keep SYN/SYN-ACK/RST
for pure TCP checks, but don't want to see an RST emitted as above when
the server has already sent the FIN.
The solution against this consists in implementing a "drain" function at
the protocol layer, which, when defined, causes as much as possible of
the input socket buffer to be flushed to make recv() return zero so that
we know that the server's FIN was received and ACKed. On Linux, we can make
use of MSG_TRUNC on TCP sockets, which has the benefit of draining everything
at once without even copying data. On other platforms, we read up to one
buffer of data before the close. If recv() manages to get the final zero,
we don't disable lingering. Same for hard errors. Otherwise we do.
In practice, on HTTP health checks we generally find that the close was
pending and is returned upon first recv() call. The network trace becomes
cleaner :
19:55:23.650621 IP 127.0.0.1.16561 > 127.0.0.1.8000: S 3982804816:3982804816(0) win 32792 <mss 16396,nop,nop,sackOK,nop,wscale 7>
19:55:23.650644 IP 127.0.0.1.8000 > 127.0.0.1.16561: S 4082139313:4082139313(0) ack 3982804817 win 32792 <mss 16396,nop,nop,sackOK,nop,wscale 7>
19:55:23.650666 IP 127.0.0.1.16561 > 127.0.0.1.8000: . ack 1 win 257
19:55:23.651615 IP 127.0.0.1.16561 > 127.0.0.1.8000: P 1:23(22) ack 1 win 257
19:55:23.651696 IP 127.0.0.1.8000 > 127.0.0.1.16561: FP 1:146(145) ack 23 win 257
19:55:23.652628 IP 127.0.0.1.16561 > 127.0.0.1.8000: F 23:23(0) ack 147 win 257
19:55:23.652655 IP 127.0.0.1.8000 > 127.0.0.1.16561: . ack 24 win 257
This change should be backported to 1.4 which is where Chris encountered
this issue. The code is different, so probably the tcp_drain() function
will have to be put in the checks only.
2013-06-10 19:56:38 +02:00
/* Tries to drain any pending incoming data from the socket to reach the
2014-01-20 11:26:12 +01:00
 * receive shutdown. Returns positive if the shutdown was found, negative
 * if EAGAIN was hit, otherwise zero. This is useful to decide whether we
 * can close a connection cleanly or we must kill it hard.
2013-06-10 19:56:38 +02:00
*/
int tcp_drain(int fd)
{
	int turns = 2;
	int len;
	while (turns) {
#ifdef MSG_TRUNC_CLEARS_INPUT
		len = recv(fd, NULL, INT_MAX, MSG_DONTWAIT | MSG_NOSIGNAL | MSG_TRUNC);
		if (len == -1 && errno == EFAULT)
#endif
			len = recv(fd, trash.str, trash.size, MSG_DONTWAIT | MSG_NOSIGNAL);
2014-01-20 11:56:37 +01:00
		if (len == 0) {
			/* cool, shutdown received */
			fdtab[fd].linger_risk = 0;
2013-06-10 19:56:38 +02:00
			return 1;
2014-01-20 11:56:37 +01:00
		}
2013-06-10 19:56:38 +02:00
		if (len < 0) {
MAJOR: polling: rework the whole polling system
This commit heavily changes the polling system in order to definitely
fix the frequent breakage of SSL which needs to remember the last
EAGAIN before deciding whether to poll or not. Now we have a state per
direction for each FD, as opposed to a previous and current state
previously. An FD can have up to 8 different states for each direction,
each of which being the result of a 3-bit combination. These 3 bits
indicate a wish to access the FD, the readiness of the FD and the
subscription of the FD to the polling system.
This means that it will now be possible to remember the state of a
file descriptor across disable/enable sequences that generally happen
during forwarding, where enabling reading on a previously disabled FD
would result in forgetting the EAGAIN flag it met last time.
Several new state manipulation functions have been introduced or
adapted :
- fd_want_{recv,send} : enable receiving/sending on the FD regardless
of its state (sets the ACTIVE flag) ;
- fd_stop_{recv,send} : stop receiving/sending on the FD regardless
of its state (clears the ACTIVE flag) ;
- fd_cant_{recv,send} : report a failure to receive/send on the FD
corresponding to EAGAIN (clears the READY flag) ;
- fd_may_{recv,send} : report the ability to receive/send on the FD
as reported by poll() (sets the READY flag) ;
Some functions are used to report the current FD status :
- fd_{recv,send}_active
- fd_{recv,send}_ready
- fd_{recv,send}_polled
Some functions were removed :
- fd_ev_clr(), fd_ev_set(), fd_ev_rem(), fd_ev_wai()
The POLLHUP/POLLERR flags are now reported as ready so that the I/O layers
know they can try to access the file descriptor to get this information.
In order to simplify the conditions to add/remove cache entries, a new
function fd_alloc_or_release_cache_entry() was created to be used from
pollers while scanning for updates.
The following pollers have been updated :
ev_select() : done, built, tested on Linux 3.10
ev_poll() : done, built, tested on Linux 3.10
ev_epoll() : done, built, tested on Linux 3.10 & 3.13
ev_kqueue() : done, built, tested on OpenBSD 5.2
2014-01-10 16:58:45 +01:00
			if (errno == EAGAIN) {
				/* connection not closed yet */
				fd_cant_recv(fd);
2014-01-20 11:26:12 +01:00
				return -1;
2014-01-10 16:58:45 +01:00
}
2013-06-10 19:56:38 +02:00
			if (errno == EINTR)  /* oops, try again */
				continue;
			/* other errors indicate a dead connection, fine. */
2014-01-20 11:56:37 +01:00
			fdtab[fd].linger_risk = 0;
MEDIUM: protocol: implement a "drain" function in protocol layers
Since commit cfd97c6f was merged into 1.5-dev14 (BUG/MEDIUM: checks:
prevent TIME_WAITs from appearing also on timeouts), some valid health
checks sometimes used to show some TCP resets. For example, this HTTP
health check sent to a local server :
19:55:15.742818 IP 127.0.0.1.16568 > 127.0.0.1.8000: S 3355859679:3355859679(0) win 32792 <mss 16396,nop,nop,sackOK,nop,wscale 7>
19:55:15.742841 IP 127.0.0.1.8000 > 127.0.0.1.16568: S 1060952566:1060952566(0) ack 3355859680 win 32792 <mss 16396,nop,nop,sackOK,nop,wscale 7>
19:55:15.742863 IP 127.0.0.1.16568 > 127.0.0.1.8000: . ack 1 win 257
19:55:15.745402 IP 127.0.0.1.16568 > 127.0.0.1.8000: P 1:23(22) ack 1 win 257
19:55:15.745488 IP 127.0.0.1.8000 > 127.0.0.1.16568: FP 1:146(145) ack 23 win 257
19:55:15.747109 IP 127.0.0.1.16568 > 127.0.0.1.8000: R 23:23(0) ack 147 win 257
After some discussion with Chris Huang-Leaver, it appeared clear that
what we want is to only send the RST when we have no other choice, which
means when the server has not closed. So we still keep SYN/SYN-ACK/RST
for pure TCP checks, but don't want to see an RST emitted as above when
the server has already sent the FIN.
The solution against this consists in implementing a "drain" function at
the protocol layer, which, when defined, causes as much as possible of
the input socket buffer to be flushed to make recv() return zero so that
we know that the server's FIN was received and ACKed. On Linux, we can make
use of MSG_TRUNC on TCP sockets, which has the benefit of draining everything
at once without even copying data. On other platforms, we read up to one
buffer of data before the close. If recv() manages to get the final zero,
we don't disable lingering. Same for hard errors. Otherwise we do.
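A minimal sketch of the draining logic described above, assuming a non-blocking socket. The names drain_socket() and drain_demo() are hypothetical and not part of the source. Only the portable path (recv() into a scratch buffer) is shown; the Linux MSG_TRUNC shortcut is left out.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: read and discard pending input on non-blocking <fd>.
 * Returns 1 if the peer's shutdown (recv() == 0) or a hard error was seen,
 * in which case lingering does not need to be disabled before close().
 * Returns 0 if the peer has not closed yet or data may remain.
 */
int drain_socket(int fd)
{
	char buf[4096];
	int turns = 2;

	while (turns) {
		ssize_t len = recv(fd, buf, sizeof(buf), 0);

		if (len == 0)
			return 1;	/* shutdown received */
		if (len < 0) {
			if (errno == EINTR)
				continue;	/* interrupted, try again */
			if (errno == EAGAIN || errno == EWOULDBLOCK)
				return 0;	/* connection not closed yet */
			return 1;	/* other errors: dead connection */
		}
		turns--;	/* we read some data, try again once */
	}
	return 0;	/* some data are still present, give up */
}

/* Demo on a local socket pair: the peer sends a short response and,
 * if <peer_closes> is set, closes before we drain.
 */
int drain_demo(int peer_closes)
{
	static const char resp[] = "HTTP/1.0 200 OK\r\n\r\n";
	int sv[2];
	int ret;

	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == -1)
		return -1;
	if (fcntl(sv[0], F_SETFL, O_NONBLOCK) == -1 ||
	    write(sv[1], resp, sizeof(resp) - 1) != (ssize_t)(sizeof(resp) - 1)) {
		close(sv[0]);
		close(sv[1]);
		return -1;
	}
	if (peer_closes)
		close(sv[1]);
	ret = drain_socket(sv[0]);
	close(sv[0]);
	if (!peer_closes)
		close(sv[1]);
	return ret;
}
```

With a peer that has already closed, the second recv() returns zero and the caller may close normally; with a peer that is still open, the function reports that the close is not yet known.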
In practice, on HTTP health checks we generally find that the close was
pending and is returned upon first recv() call. The network trace becomes
cleaner :
19:55:23.650621 IP 127.0.0.1.16561 > 127.0.0.1.8000: S 3982804816:3982804816(0) win 32792 <mss 16396,nop,nop,sackOK,nop,wscale 7>
19:55:23.650644 IP 127.0.0.1.8000 > 127.0.0.1.16561: S 4082139313:4082139313(0) ack 3982804817 win 32792 <mss 16396,nop,nop,sackOK,nop,wscale 7>
19:55:23.650666 IP 127.0.0.1.16561 > 127.0.0.1.8000: . ack 1 win 257
19:55:23.651615 IP 127.0.0.1.16561 > 127.0.0.1.8000: P 1:23(22) ack 1 win 257
19:55:23.651696 IP 127.0.0.1.8000 > 127.0.0.1.16561: FP 1:146(145) ack 23 win 257
19:55:23.652628 IP 127.0.0.1.16561 > 127.0.0.1.8000: F 23:23(0) ack 147 win 257
19:55:23.652655 IP 127.0.0.1.8000 > 127.0.0.1.16561: . ack 24 win 257
This change should be backported to 1.4 which is where Chris encountered
this issue. The code is different, so probably the tcp_drain() function
will have to be put in the checks only.
2013-06-10 19:56:38 +02:00
return 1 ;
}
/* OK we read some data, let's try again once */
turns - - ;
}
/* some data are still present, give up */
return 0 ;
}
2012-05-11 19:53:32 +02:00
/* This is the callback which is set when a connection establishment is pending
2013-12-04 16:11:04 +01:00
* and we have nothing to send . It updates the FD polling status . It returns 0
* if it fails in a fatal way or needs to poll to go further , otherwise it
* returns non - zero and removes the CO_FL_WAIT_L4_CONN flag from the connection ' s
* flags . In case of error , it sets CO_FL_ERROR and leaves the error code in
* errno . The error checking is done in two passes in order to limit the number
* of syscalls in the normal case :
* - if POLL_ERR was reported by the poller , we check for a pending error on
* the socket before proceeding . If found , it ' s assigned to errno so that
* upper layers can see it .
* - otherwise connect ( ) is used to check the connection state again , since
* the getsockopt return cannot reliably be used to know if the connection
* is still pending or ready . This one may often return an error as well ,
* since we don ' t always have POLL_ERR ( eg : OSX or cached events ) .
2012-05-11 19:53:32 +02:00
*/
2012-07-23 18:53:03 +02:00
int tcp_connect_probe ( struct connection * conn )
2012-05-11 19:53:32 +02:00
{
2012-07-23 18:53:03 +02:00
int fd = conn - > t . sock . fd ;
2013-12-04 16:11:04 +01:00
socklen_t lskerr ;
int skerr ;
2012-05-11 19:53:32 +02:00
2012-07-06 14:54:49 +02:00
if ( conn - > flags & CO_FL_ERROR )
2012-08-09 14:45:22 +02:00
return 0 ;
2012-05-11 19:53:32 +02:00
2014-01-23 13:50:42 +01:00
if ( ! conn_ctrl_ready ( conn ) )
MAJOR: connection: add two new flags to indicate readiness of control/transport
Currently the control and transport layers of a connection are supposed
to be initialized when their respective pointers are not NULL. This will
not work anymore when we plan to reuse connections, because there is an
asymmetry between the accept() side and the connect() side :
- on accept() side, the fd is set first, then the ctrl layer then the
transport layer ; upon error, they must be undone in the reverse order,
then the FD must be closed. The FD must not be deleted if the control
layer was not yet initialized ;
- on the connect() side, the fd is set last and there is no reliable way
to know if it has been initialized or not. In practice it's initialized
to -1 first but this is hackish and supposes that only local FDs will
be used forever. Also, there are even fewer solutions for keeping track
of the transport layer's state.
Also it is possible to support delayed close() when something (eg: logs)
tracks some information requiring the transport and/or control layers,
making it even more difficult to clean them.
So the proposed solution is to add two flags to the connection :
- CO_FL_CTRL_READY is set when the control layer is initialized (fd_insert)
and cleared after it's released (fd_delete).
- CO_FL_XPRT_READY is set when the transport layer is initialized (xprt->init)
and cleared after it's released (xprt->close).
The functions have been adapted to rely on this and not on the pointers
anymore. conn_xprt_close() was unused and dangerous : it did not close
the control layer (eg: the socket itself) but still marked the transport
layer as closed, preventing any future call to conn_full_close() from
finishing the job.
The problem comes from conn_full_close() in fact. It needs to close the
xprt and ctrl layers independently. After that we still have an issue :
we don't know based on ->ctrl alone whether the fd was registered or not.
For this we use the two new flags CO_FL_XPRT_READY and CO_FL_CTRL_READY. We
now rely on this and not on conn->xprt nor conn->ctrl anymore to decide what
remains to be done on the connection.
In order not to miss some flag assignments, we introduce conn_ctrl_init()
to initialize the control layer, register the fd using fd_insert() and set
the flag, and conn_ctrl_close() which unregisters the fd and removes the
flag, but only if the transport layer was closed.
Similarly, at the transport layer, conn_xprt_init() calls ->init and sets
the flag, while conn_xprt_close() checks the flag, calls ->close and clears
the flag, regardless of xprt_ctx or xprt_st. This also ensures that the ->init
and the ->close functions are called only once each and in the correct order.
Note that conn_xprt_close() does nothing if the transport layer is still
tracked.
conn_full_close() now simply calls conn_xprt_close() then conn_ctrl_close()
in turn, which do nothing if CO_FL_XPRT_TRACKED is set.
In order to handle the error path, we also provide conn_force_close() which
ignores CO_FL_XPRT_TRACKED and closes the transport and the control layers
in turn. All relevant instances of fd_delete() have been replaced with
conn_force_close(). Now we always know what state the connection is in and
we can expect to split its initialization.
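The flag-based life cycle described in this message can be sketched with a toy model. Everything here (the toy_ names, the flag values, the counters) is hypothetical and only mirrors the ordering rules stated above; it is not the actual connection API.

```c
/* Toy flags, values chosen arbitrarily for the illustration. */
enum {
	TOY_CTRL_READY   = 0x1,	/* control layer initialized (fd registered) */
	TOY_XPRT_READY   = 0x2,	/* transport layer initialized */
	TOY_XPRT_TRACKED = 0x4,	/* something still tracks the transport layer */
};

struct toy_conn {
	unsigned int flags;
	int xprt_inits;		/* how many times the transport init ran */
	int xprt_closes;	/* how many times the transport close ran */
};

static void toy_ctrl_init(struct toy_conn *c)
{
	c->flags |= TOY_CTRL_READY;	/* stands for fd_insert() */
}

static void toy_xprt_init(struct toy_conn *c)
{
	if (c->flags & TOY_XPRT_READY)
		return;			/* init runs only once */
	c->xprt_inits++;
	c->flags |= TOY_XPRT_READY;
}

static void toy_xprt_close(struct toy_conn *c)
{
	/* only close a ready transport layer which is not tracked */
	if ((c->flags & (TOY_XPRT_READY | TOY_XPRT_TRACKED)) != TOY_XPRT_READY)
		return;
	c->xprt_closes++;
	c->flags &= ~TOY_XPRT_READY;
}

static void toy_ctrl_close(struct toy_conn *c)
{
	/* only release the control layer once the transport layer is closed */
	if ((c->flags & (TOY_CTRL_READY | TOY_XPRT_READY)) != TOY_CTRL_READY)
		return;
	c->flags &= ~TOY_CTRL_READY;	/* stands for fd_delete() */
}

static void toy_full_close(struct toy_conn *c)
{
	toy_xprt_close(c);
	toy_ctrl_close(c);
}

static void toy_force_close(struct toy_conn *c)
{
	c->flags &= ~TOY_XPRT_TRACKED;	/* error path: ignore tracking */
	toy_full_close(c);
}

/* Returns the flags left after closing a fully initialized connection. */
unsigned int toy_close_demo(int tracked, int force)
{
	struct toy_conn c = { 0, 0, 0 };

	toy_ctrl_init(&c);
	toy_xprt_init(&c);
	if (tracked)
		c.flags |= TOY_XPRT_TRACKED;
	if (force)
		toy_force_close(&c);
	else
		toy_full_close(&c);
	return c.flags;
}
```

In this model a tracked connection survives a full close untouched, while the forced close releases both layers in order.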
2013-10-21 16:30:56 +02:00
return 0 ;
2012-07-06 14:54:49 +02:00
if ( ! ( conn - > flags & CO_FL_WAIT_L4_CONN ) )
2012-07-23 20:05:00 +02:00
return 1 ; /* strange we were called while ready */
2012-05-11 19:53:32 +02:00
2014-01-20 15:13:07 +01:00
if ( ! fd_send_ready ( fd ) )
return 0 ;
2013-12-04 16:11:04 +01:00
/* we might be the first witness of FD_POLL_ERR. Note that FD_POLL_HUP
* without FD_POLL_IN also indicates a hangup without input data meaning
* there was no connection .
*/
if ( fdtab [ fd ] . ev & FD_POLL_ERR | |
( fdtab [ fd ] . ev & ( FD_POLL_IN | FD_POLL_HUP ) ) = = FD_POLL_HUP ) {
skerr = 0 ;
lskerr = sizeof ( skerr ) ;
getsockopt ( fd , SOL_SOCKET , SO_ERROR , & skerr , & lskerr ) ;
errno = skerr ;
if ( errno = = EAGAIN )
errno = 0 ;
if ( errno )
goto out_error ;
}
2012-07-23 15:07:23 +02:00
2013-12-04 16:11:04 +01:00
/* Use connect() to check the state of the socket. This has the
* advantage of giving us the following info :
2012-07-06 17:12:34 +02:00
* - error
* - connecting ( EALREADY , EINPROGRESS )
* - connected ( EISCONN , 0 )
2012-05-11 19:53:32 +02:00
*/
2012-08-30 21:11:38 +02:00
if ( connect ( fd , ( struct sockaddr * ) & conn - > addr . to , get_addr_len ( & conn - > addr . to ) ) < 0 ) {
2012-08-17 17:33:53 +02:00
if ( errno = = EALREADY | | errno = = EINPROGRESS ) {
2012-12-10 17:03:52 +01:00
__conn_sock_stop_recv ( conn ) ;
2014-01-22 20:02:06 +01:00
fd_cant_send ( fd ) ;
2012-07-23 20:05:00 +02:00
return 0 ;
2012-08-17 17:33:53 +02:00
}
2012-05-20 18:35:19 +02:00
2012-07-06 17:12:34 +02:00
if ( errno & & errno ! = EISCONN )
2012-05-20 18:35:19 +02:00
goto out_error ;
2012-07-06 17:12:34 +02:00
/* otherwise we're connected */
2012-05-20 18:35:19 +02:00
}
2012-05-11 19:53:32 +02:00
2012-07-06 16:02:29 +02:00
/* The FD is ready now, we'll mark the connection as complete and
REORG: connection: rename the data layer the "transport layer"
While working on the changes required to make the health checks use the
new connections, it started to become obvious that some naming was not
logical at all in the connections. Specifically, it is not logical to
call the "data layer" the layer which is in charge for all the handshake
and which does not yet provide a data layer once established until a
session has allocated all the required buffers.
In fact, it's more a transport layer, which makes much more sense. The
transport layer offers a medium on which data can transit, and it offers
the functions to move these data when the upper layer requests this. And
it is the upper layer which iterates over the transport layer's functions
to move data which should be called the data layer.
The use case where it's obvious is with embryonic sessions : an incoming
SSL connection is accepted. Only the connection is allocated, not the
buffers nor stream interface, etc... The connection handles the SSL
handshake by itself. Once this handshake is complete, we can't use the
data functions because the buffers and stream interface are not there
yet. Hence we have to first call a specific function to complete the
session initialization, after which we'll be able to use the data
functions. This clearly proves that SSL here is only a transport layer
and that the stream interface constitutes the data layer.
A similar change will be performed to rename app_cb => data, but the
two could not be in the same commit for obvious reasons.
2012-10-03 00:19:48 +02:00
* forward the event to the transport layer which will notify the
* data layer .
2012-05-20 18:35:19 +02:00
*/
2012-07-06 14:54:49 +02:00
conn - > flags & = ~ CO_FL_WAIT_L4_CONN ;
2012-07-23 20:05:00 +02:00
return 1 ;
2012-05-11 19:53:32 +02:00
out_error :
2012-07-23 20:05:00 +02:00
/* Write error on the file descriptor. Report it to the connection
* and disable polling on this FD .
2012-05-11 19:53:32 +02:00
*/
2014-01-20 11:56:37 +01:00
fdtab [ fd ] . linger_risk = 0 ;
2013-12-04 23:44:10 +01:00
conn - > flags | = CO_FL_ERROR | CO_FL_SOCK_RD_SH | CO_FL_SOCK_WR_SH ;
2012-12-10 17:03:52 +01:00
__conn_sock_stop_both ( conn ) ;
2012-08-09 14:45:22 +02:00
return 0 ;
2012-05-11 19:53:32 +02:00
}
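A sketch of the connect()-based probe used above, assuming a loopback TCP listener can be created. probe_connect() and probe_demo() are hypothetical names. The errno mapping follows the comment in the function: EALREADY and EINPROGRESS mean the connection is still pending, while 0 and EISCONN mean it is established.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <poll.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: call connect() again on a non-blocking socket to
 * learn its state. Returns 1 if connected, 0 if still connecting, -1 on
 * error (the error code is left in errno).
 */
int probe_connect(int fd, const struct sockaddr *to, socklen_t len)
{
	if (connect(fd, to, len) == 0)
		return 1;
	if (errno == EALREADY || errno == EINPROGRESS)
		return 0;
	if (errno == EISCONN)
		return 1;
	return -1;
}

/* Demo: connect to a local listener and probe until the state is known. */
int probe_demo(void)
{
	struct sockaddr_in addr;
	socklen_t alen = sizeof(addr);
	struct pollfd pfd;
	int lfd, cfd, ret = -1;

	memset(&addr, 0, sizeof(addr));
	addr.sin_family = AF_INET;
	addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
	addr.sin_port = 0;	/* let the kernel pick a free port */

	lfd = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);
	cfd = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);
	if (lfd == -1 || cfd == -1)
		goto out;
	if (bind(lfd, (struct sockaddr *)&addr, sizeof(addr)) == -1 ||
	    listen(lfd, 1) == -1 ||
	    getsockname(lfd, (struct sockaddr *)&addr, &alen) == -1 ||
	    fcntl(cfd, F_SETFL, O_NONBLOCK) == -1)
		goto out;

	ret = probe_connect(cfd, (struct sockaddr *)&addr, sizeof(addr));
	while (ret == 0) {
		/* wait for writability for up to 1s, then ask connect() again */
		pfd.fd = cfd;
		pfd.events = POLLOUT;
		if (poll(&pfd, 1, 1000) <= 0) {
			ret = -1;
			break;
		}
		ret = probe_connect(cfd, (struct sockaddr *)&addr, sizeof(addr));
	}
out:
	if (lfd != -1)
		close(lfd);
	if (cfd != -1)
		close(cfd);
	return ret;
}
```

The listener never calls accept(); the kernel completes the handshake from the backlog, which is enough for the probe to report an established connection.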
2007-10-29 01:09:36 +01:00
/* This function tries to bind a TCPv4/v6 listener. It may return a warning or
2013-01-24 01:41:38 +01:00
* an error message in < errmsg > if the message is at most < errlen > bytes long
* ( including ' \0 ' ) . Note that < errmsg > may be NULL if < errlen > is also zero .
* The return value is composed from ERR_ABORT , ERR_WARN ,
2007-10-29 01:09:36 +01:00
* ERR_ALERT , ERR_RETRYABLE and ERR_FATAL . ERR_NONE indicates that everything
* was alright and that no message was returned . ERR_RETRYABLE means that an
* error occurred but that it may vanish after a retry ( eg : port in use ) , and
2012-04-06 17:39:26 -07:00
* ERR_FATAL indicates a non - fixable error . ERR_WARN and ERR_ALERT do not alter
2007-10-29 01:09:36 +01:00
* the meaning of the error , but just indicate that a message is present which
* should be displayed with the respective level . Last , ERR_ABORT indicates
* that it ' s pointless to try to start other listeners . No error message is
* returned if errlen is zero .
*/
int tcp_bind_listener ( struct listener * listener , char * errmsg , int errlen )
{
__label__ tcp_return , tcp_close_return ;
int fd , err ;
2013-03-10 23:51:38 +01:00
int ext , ready ;
socklen_t ready_len ;
2007-10-29 01:09:36 +01:00
const char * msg = NULL ;
/* ensure we never return garbage */
2013-01-24 01:41:38 +01:00
if ( errlen )
2007-10-29 01:09:36 +01:00
* errmsg = 0 ;
if ( listener - > state ! = LI_ASSIGNED )
return ERR_NONE ; /* already bound */
err = ERR_NONE ;
2013-03-10 23:51:38 +01:00
/* if the listener already has an fd assigned, then we were offered the
* fd by an external process ( most likely the parent ) , and we don ' t want
* to create a new socket . However we still want to set a few flags on
* the socket .
*/
fd = listener - > fd ;
ext = ( fd > = 0 ) ;
if ( ! ext & & ( fd = socket ( listener - > addr . ss_family , SOCK_STREAM , IPPROTO_TCP ) ) = = - 1 ) {
2007-10-29 01:09:36 +01:00
err | = ERR_RETRYABLE | ERR_ALERT ;
msg = " cannot create listening socket " ;
goto tcp_return ;
}
2008-11-30 23:15:34 +01:00
2007-10-29 01:09:36 +01:00
if ( fd > = global . maxsock ) {
err | = ERR_FATAL | ERR_ABORT | ERR_ALERT ;
msg = " not enough free sockets (raise '-n' parameter) " ;
goto tcp_close_return ;
}
2009-06-14 15:24:37 +02:00
if ( fcntl ( fd , F_SETFL , O_NONBLOCK ) = = - 1 ) {
2007-10-29 01:09:36 +01:00
err | = ERR_FATAL | ERR_ALERT ;
msg = " cannot make socket non-blocking " ;
goto tcp_close_return ;
}
2013-03-10 23:51:38 +01:00
if ( ! ext & & setsockopt ( fd , SOL_SOCKET , SO_REUSEADDR , & one , sizeof ( one ) ) = = - 1 ) {
2007-10-29 01:09:36 +01:00
/* not fatal but should be reported */
msg = " cannot do so_reuseaddr " ;
err | = ERR_ALERT ;
}
if ( listener - > options & LI_O_NOLINGER )
2011-06-24 15:11:37 +09:00
setsockopt ( fd , SOL_SOCKET , SO_LINGER , & nolinger , sizeof ( struct linger ) ) ;
2008-11-30 23:15:34 +01:00
2007-10-29 01:09:36 +01:00
# ifdef SO_REUSEPORT
/* OpenBSD supports this. As it's also defined by old libc versions on Linux
* whose kernel may not support it , it might return an error that we will
* silently ignore .
*/
2013-03-10 23:51:38 +01:00
if ( ! ext )
setsockopt ( fd , SOL_SOCKET , SO_REUSEPORT , & one , sizeof ( one ) ) ;
2008-01-13 14:49:51 +01:00
# endif
2013-05-08 22:49:23 +02:00
2013-03-10 23:51:38 +01:00
if ( ! ext & & ( listener - > options & LI_O_FOREIGN ) ) {
2012-07-13 14:34:59 +02:00
switch ( listener - > addr . ss_family ) {
case AF_INET :
2013-05-08 22:49:23 +02:00
if ( 1
# if defined(IP_TRANSPARENT)
& & ( setsockopt ( fd , SOL_IP , IP_TRANSPARENT , & one , sizeof ( one ) ) = = - 1 )
# endif
# if defined(IP_FREEBIND)
& & ( setsockopt ( fd , SOL_IP , IP_FREEBIND , & one , sizeof ( one ) ) = = - 1 )
2013-05-08 23:22:39 +02:00
# endif
# if defined(IP_BINDANY)
& & ( setsockopt ( fd , IPPROTO_IP , IP_BINDANY , & one , sizeof ( one ) ) = = - 1 )
2013-05-08 23:30:23 +02:00
# endif
# if defined(SO_BINDANY)
& & ( setsockopt ( fd , SOL_SOCKET , SO_BINDANY , & one , sizeof ( one ) ) = = - 1 )
2013-05-08 22:49:23 +02:00
# endif
) {
2012-07-13 14:34:59 +02:00
msg = " cannot make listening socket transparent " ;
err | = ERR_ALERT ;
}
break ;
case AF_INET6 :
2013-05-08 22:49:23 +02:00
if ( 1
# if defined(IPV6_TRANSPARENT)
& & ( setsockopt ( fd , SOL_IPV6 , IPV6_TRANSPARENT , & one , sizeof ( one ) ) = = - 1 )
2013-05-08 23:22:39 +02:00
# endif
2014-03-03 21:10:51 +01:00
# if defined(IP_FREEBIND)
& & ( setsockopt ( fd , SOL_IP , IP_FREEBIND , & one , sizeof ( one ) ) = = - 1 )
# endif
2013-05-08 23:22:39 +02:00
# if defined(IPV6_BINDANY)
& & ( setsockopt ( fd , IPPROTO_IPV6 , IPV6_BINDANY , & one , sizeof ( one ) ) = = - 1 )
2013-05-08 23:30:23 +02:00
# endif
# if defined(SO_BINDANY)
& & ( setsockopt ( fd , SOL_SOCKET , SO_BINDANY , & one , sizeof ( one ) ) = = - 1 )
2013-05-08 22:49:23 +02:00
# endif
) {
2012-07-13 14:34:59 +02:00
msg = " cannot make listening socket transparent " ;
err | = ERR_ALERT ;
}
break ;
}
2008-01-13 14:49:51 +01:00
}
2013-05-08 22:49:23 +02:00
2009-02-04 17:19:29 +01:00
# ifdef SO_BINDTODEVICE
/* Note: this might fail if not CAP_NET_RAW */
2013-03-10 23:51:38 +01:00
if ( ! ext & & listener - > interface ) {
2009-02-04 17:19:29 +01:00
if ( setsockopt ( fd , SOL_SOCKET , SO_BINDTODEVICE ,
2009-03-06 00:48:23 +01:00
listener - > interface , strlen ( listener - > interface ) + 1 ) = = - 1 ) {
2009-02-04 17:19:29 +01:00
msg = " cannot bind listener to device " ;
err | = ERR_WARN ;
}
}
2009-06-14 18:48:19 +02:00
# endif
2009-08-24 15:11:06 +04:00
# if defined(TCP_MAXSEG)
2010-12-24 15:26:39 +01:00
if ( listener - > maxseg > 0 ) {
2009-08-24 15:11:06 +04:00
if ( setsockopt ( fd , IPPROTO_TCP , TCP_MAXSEG ,
2009-06-14 18:48:19 +02:00
& listener - > maxseg , sizeof ( listener - > maxseg ) ) = = - 1 ) {
msg = " cannot set MSS " ;
err | = ERR_WARN ;
}
}
2009-10-13 07:34:14 +02:00
# endif
# if defined(TCP_DEFER_ACCEPT)
if ( listener - > options & LI_O_DEF_ACCEPT ) {
/* defer accept by up to one second */
int accept_delay = 1 ;
if ( setsockopt ( fd , IPPROTO_TCP , TCP_DEFER_ACCEPT , & accept_delay , sizeof ( accept_delay ) ) = = - 1 ) {
msg = " cannot enable DEFER_ACCEPT " ;
err | = ERR_WARN ;
}
}
2012-10-05 16:21:00 +02:00
# endif
# if defined(TCP_FASTOPEN)
if ( listener - > options & LI_O_TCP_FO ) {
/* TFO needs a queue length, let's use the configured backlog */
int qlen = listener - > backlog ? listener - > backlog : listener - > maxconn ;
if ( setsockopt ( fd , IPPROTO_TCP , TCP_FASTOPEN , & qlen , sizeof ( qlen ) ) = = - 1 ) {
msg = " cannot enable TCP_FASTOPEN " ;
err | = ERR_WARN ;
}
}
2007-10-29 01:09:36 +01:00
# endif
2012-11-24 11:55:28 +01:00
# if defined(IPV6_V6ONLY)
if ( listener - > options & LI_O_V6ONLY )
setsockopt ( fd , IPPROTO_IPV6 , IPV6_V6ONLY , & one , sizeof ( one ) ) ;
2012-11-24 15:07:23 +01:00
else if ( listener - > options & LI_O_V4V6 )
setsockopt ( fd , IPPROTO_IPV6 , IPV6_V6ONLY , & zero , sizeof ( zero ) ) ;
2012-11-24 11:55:28 +01:00
# endif
2013-03-10 23:51:38 +01:00
if ( ! ext & & bind ( fd , ( struct sockaddr * ) & listener - > addr , listener - > proto - > sock_addrlen ) = = - 1 ) {
2007-10-29 01:09:36 +01:00
err | = ERR_RETRYABLE | ERR_ALERT ;
msg = " cannot bind socket " ;
goto tcp_close_return ;
}
2008-11-30 23:15:34 +01:00
2013-03-10 23:51:38 +01:00
ready = 0 ;
ready_len = sizeof ( ready ) ;
if ( getsockopt ( fd , SOL_SOCKET , SO_ACCEPTCONN , & ready , & ready_len ) = = - 1 )
ready = 0 ;
if ( ! ( ext & & ready ) & & /* only listen if not already done by external process */
listen ( fd , listener - > backlog ? listener - > backlog : listener - > maxconn ) = = - 1 ) {
2007-10-29 01:09:36 +01:00
err | = ERR_RETRYABLE | ERR_ALERT ;
msg = " cannot listen to socket " ;
goto tcp_close_return ;
}
2008-11-30 23:15:34 +01:00
2009-08-24 15:11:06 +04:00
# if defined(TCP_QUICKACK)
2009-06-14 12:07:01 +02:00
if ( listener - > options & LI_O_NOQUICKACK )
2011-06-24 15:11:37 +09:00
setsockopt ( fd , IPPROTO_TCP , TCP_QUICKACK , & zero , sizeof ( zero ) ) ;
2009-06-14 12:07:01 +02:00
# endif
2007-10-29 01:09:36 +01:00
/* the socket is ready */
listener - > fd = fd ;
listener - > state = LI_LISTEN ;
2008-08-29 23:36:51 +02:00
fdtab [ fd ] . owner = listener ; /* reference the listener instead of a task */
2012-07-06 12:25:58 +02:00
fdtab [ fd ] . iocb = listener - > proto - > accept ;
2010-05-28 18:46:57 +02:00
fd_insert ( fd ) ;
2007-10-29 01:09:36 +01:00
tcp_return :
2010-11-01 19:26:01 +01:00
if ( msg & & errlen ) {
char pn [ INET6_ADDRSTRLEN ] ;
2011-09-05 00:36:48 +02:00
addr_to_str ( & listener - > addr , pn , sizeof ( pn ) ) ;
snprintf ( errmsg , errlen , " %s [%s:%d] " , msg , pn , get_host_port ( & listener - > addr ) ) ;
2010-11-01 19:26:01 +01:00
}
2007-10-29 01:09:36 +01:00
return err ;
tcp_close_return :
close ( fd ) ;
goto tcp_return ;
}
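A reduced sketch of the same bind sequence, showing how the return value accumulates flags and how SO_ACCEPTCONN reports whether a socket is already listening. The SK_ERR_* values and the sk_ helpers are assumptions made for this example, not the project's definitions, and the external-fd case is left out.

```c
#include <arpa/inet.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Assumed flag values, for illustration only. */
#define SK_ERR_NONE      0x00
#define SK_ERR_RETRYABLE 0x01	/* may vanish after a retry (eg: port in use) */
#define SK_ERR_FATAL     0x02	/* non-fixable error */
#define SK_ERR_ALERT     0x10	/* a message is present, alert level */

/* Returns non-zero if <fd> is already in listening state. */
int sk_is_listening(int fd)
{
	int ready = 0;
	socklen_t len = sizeof(ready);

	if (getsockopt(fd, SOL_SOCKET, SO_ACCEPTCONN, &ready, &len) == -1)
		return 0;
	return ready != 0;
}

/* Binds a TCPv4 listener to <addr>. On success the socket is stored in
 * <*fd> and SK_ERR_NONE is returned. <errmsg> may be NULL if <errlen> is 0.
 */
int sk_bind_listener(int *fd, const struct sockaddr_in *addr,
                     char *errmsg, int errlen)
{
	const char *msg = NULL;
	int err = SK_ERR_NONE;
	int one = 1;
	int s;

	if (errlen)
		*errmsg = 0;	/* never return garbage */

	if ((s = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP)) == -1) {
		err |= SK_ERR_RETRYABLE | SK_ERR_ALERT;
		msg = "cannot create listening socket";
		goto done;
	}
	if (fcntl(s, F_SETFL, O_NONBLOCK) == -1) {
		err |= SK_ERR_FATAL | SK_ERR_ALERT;
		msg = "cannot make socket non-blocking";
		goto close_done;
	}
	if (setsockopt(s, SOL_SOCKET, SO_REUSEADDR, &one, sizeof(one)) == -1) {
		msg = "cannot do so_reuseaddr";	/* not fatal */
		err |= SK_ERR_ALERT;
	}
	if (bind(s, (const struct sockaddr *)addr, sizeof(*addr)) == -1) {
		err |= SK_ERR_RETRYABLE | SK_ERR_ALERT;
		msg = "cannot bind socket";
		goto close_done;
	}
	if (listen(s, 16) == -1) {
		err |= SK_ERR_RETRYABLE | SK_ERR_ALERT;
		msg = "cannot listen to socket";
		goto close_done;
	}
	*fd = s;
done:
	if (msg && errlen)
		snprintf(errmsg, errlen, "%s [port %d]", msg, ntohs(addr->sin_port));
	return err;
close_done:
	close(s);
	goto done;
}

/* Demo: bind once on a free loopback port, then try the same port again. */
int sk_bind_demo(void)
{
	struct sockaddr_in addr;
	socklen_t alen = sizeof(addr);
	char errmsg[100];
	int fd1 = -1, fd2 = -1, err;

	memset(&addr, 0, sizeof(addr));
	addr.sin_family = AF_INET;
	addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
	if (sk_bind_listener(&fd1, &addr, errmsg, sizeof(errmsg)) != SK_ERR_NONE)
		return -1;
	if (!sk_is_listening(fd1) ||
	    getsockname(fd1, (struct sockaddr *)&addr, &alen) == -1) {
		close(fd1);
		return -1;
	}
	err = sk_bind_listener(&fd2, &addr, errmsg, sizeof(errmsg));
	if (fd2 != -1)
		close(fd2);
	close(fd1);
	return err;
}
```

The second attempt hits a port that is already listening, so the demo is expected to report a retryable error carrying an alert-level message rather than a fatal one.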
/* This function creates all TCP sockets bound to the protocol entry <proto>.
* It is intended to be used as the protocol ' s bind_all ( ) function .
* The sockets will be registered but not added to any fd_set , in order not to
* lose them across the fork ( ) . A call to enable_all_listeners ( ) is needed
* to complete initialization . The return value is composed from ERR_ * .
*/
2010-10-22 16:06:11 +02:00
static int tcp_bind_listeners ( struct protocol * proto , char * errmsg , int errlen )
2007-10-29 01:09:36 +01:00
{
struct listener * listener ;
int err = ERR_NONE ;
list_for_each_entry ( listener , & proto - > listeners , proto_list ) {
2010-10-22 16:06:11 +02:00
err | = tcp_bind_listener ( listener , errmsg , errlen ) ;
if ( err & ERR_ABORT )
2007-10-29 01:09:36 +01:00
break ;
}
return err ;
}
/* Add listener to the list of tcpv4 listeners. The listener's state
* is automatically updated from LI_INIT to LI_ASSIGNED . The number of
* listeners is updated . This is the function to use to add a new listener .
*/
void tcpv4_add_listener ( struct listener * listener )
{
if ( listener - > state ! = LI_INIT )
return ;
listener - > state = LI_ASSIGNED ;
listener - > proto = & proto_tcpv4 ;
LIST_ADDQ ( & proto_tcpv4 . listeners , & listener - > proto_list ) ;
proto_tcpv4 . nb_listeners + + ;
}
/* Add listener to the list of tcpv6 listeners. The listener's state
* is automatically updated from LI_INIT to LI_ASSIGNED . The number of
* listeners is updated . This is the function to use to add a new listener .
*/
void tcpv6_add_listener ( struct listener * listener )
{
if ( listener - > state ! = LI_INIT )
return ;
listener - > state = LI_ASSIGNED ;
listener - > proto = & proto_tcpv6 ;
LIST_ADDQ ( & proto_tcpv6 . listeners , & listener - > proto_list ) ;
proto_tcpv6 . nb_listeners + + ;
}
2014-07-07 20:22:12 +02:00
/* Pause a listener. Returns < 0 in case of failure, 0 if the listener
* was totally stopped , or > 0 if correctly paused .
*/
int tcp_pause_listener ( struct listener * l )
{
if ( shutdown ( l - > fd , SHUT_WR ) ! = 0 )
return - 1 ; /* Solaris dies here */
if ( listen ( l - > fd , l - > backlog ? l - > backlog : l - > maxconn ) ! = 0 )
return - 1 ; /* OpenBSD dies here */
if ( shutdown ( l - > fd , SHUT_RD ) ! = 0 )
return - 1 ; /* should always be OK */
return 1 ;
}
2008-11-30 23:15:34 +01:00
/* This function performs the TCP request analysis on the current request. It
* returns 1 if the processing can continue on next analysers , or zero if it
* needs more data , encounters an error , or wants to immediately abort the
2010-08-03 14:02:05 +02:00
* request . It relies on buffers flags , and updates s - > req - > analysers . The
* function may be called for frontend rules and backend rules . It only relies
* on the backend pointer so this works for both cases .
2008-11-30 23:15:34 +01:00
*/
2012-07-02 15:11:27 +02:00
int tcp_inspect_request ( struct session * s , struct channel * req , int an_bit )
2008-11-30 23:15:34 +01:00
{
struct tcp_rule * rule ;
2010-08-03 19:34:32 +02:00
struct stksess * ts ;
struct stktable * t ;
2008-11-30 23:15:34 +01:00
int partial ;
2012-03-01 18:19:58 +01:00
DPRINTF ( stderr , " [%u] %s: session=%p b=%p, exp(r,w)=%u,%u bf=%08x bh=%d analysers=%02x \n " ,
2008-11-30 23:15:34 +01:00
now_ms , __FUNCTION__ ,
s ,
req ,
req - > rex , req - > wex ,
req - > flags ,
2012-10-12 23:49:43 +02:00
req - > buf - > i ,
2008-11-30 23:15:34 +01:00
req - > analysers ) ;
/* We don't know whether we have enough data, so must proceed
* this way :
* - iterate through all rules in their declaration order
* - if one rule returns MISS , it means the inspect delay is
* not over yet , then return immediately , otherwise consider
* it as a non - match .
* - if one rule returns OK , then return OK
* - if one rule returns KO , then return KO
*/
2012-10-12 23:49:43 +02:00
if ( ( req - > flags & CF_SHUTR ) | | buffer_full ( req - > buf , global . tune . maxrewrite ) | |
2012-08-27 20:46:07 +02:00
! s - > be - > tcp_req . inspect_delay | | tick_is_expired ( req - > analyse_exp , now_ms ) )
2012-04-25 10:13:36 +02:00
partial = SMP_OPT_FINAL ;
2008-11-30 23:15:34 +01:00
else
2012-04-25 10:13:36 +02:00
partial = 0 ;
2008-11-30 23:15:34 +01:00
2010-08-03 14:02:05 +02:00
list_for_each_entry ( rule , & s - > be - > tcp_req . inspect_rules , list ) {
2013-11-28 22:21:02 +01:00
enum acl_test_res ret = ACL_TEST_PASS ;
2008-11-30 23:15:34 +01:00
if ( rule - > cond ) {
2012-04-25 10:13:36 +02:00
ret = acl_exec_cond ( rule - > cond , s - > be , s , & s - > txn , SMP_OPT_DIR_REQ | partial ) ;
2014-06-13 16:18:52 +02:00
if ( ret = = ACL_TEST_MISS )
goto missing_data ;
2008-11-30 23:15:34 +01:00
ret = acl_pass ( ret ) ;
if ( rule - > cond - > pol = = ACL_COND_UNLESS )
ret = ! ret ;
}
if ( ret ) {
/* we have a matching rule. */
if ( rule - > action = = TCP_ACT_REJECT ) {
2012-08-28 00:06:31 +02:00
channel_abort ( req ) ;
channel_abort ( s - > rep ) ;
2008-11-30 23:15:34 +01:00
req - > analysers = 0 ;
2009-10-04 15:43:17 +02:00
2011-03-10 23:25:56 +01:00
s - > be - > be_counters . denied_req + + ;
s - > fe - > fe_counters . denied_req + + ;
2009-10-04 15:43:17 +02:00
if ( s - > listener - > counters )
2010-05-23 23:50:44 +02:00
s - > listener - > counters - > denied_req + + ;
2009-10-04 15:43:17 +02:00
2008-11-30 23:15:34 +01:00
if ( ! ( s - > flags & SN_ERR_MASK ) )
s - > flags | = SN_ERR_PRXCOND ;
if ( ! ( s - > flags & SN_FINST_MASK ) )
s - > flags | = SN_FINST_R ;
return 0 ;
}
2013-10-30 19:24:00 +01:00
else if ( rule - > action > = TCP_ACT_TRK_SC0 & & rule - > action < = TCP_ACT_TRK_SCMAX ) {
2012-12-09 12:00:04 +01:00
/* Note: only the first valid tracking parameter of each
* applies .
*/
struct stktable_key * key ;
2014-06-25 17:01:56 +02:00
struct sample smp ;
2012-12-09 12:00:04 +01:00
2014-01-28 23:18:23 +01:00
if ( stkctr_entry ( & s - > stkctr [ tcp_trk_idx ( rule - > action ) ] ) )
2013-10-30 19:24:00 +01:00
continue ;
2012-12-09 12:00:04 +01:00
t = rule - > act_prm . trk_ctr . table . t ;
2014-06-25 17:01:56 +02:00
key = stktable_fetch_key ( t , s - > be , s , & s - > txn , SMP_OPT_DIR_REQ | partial , rule - > act_prm . trk_ctr . expr , & smp ) ;
if ( smp . flags & SMP_F_MAY_CHANGE )
goto missing_data ;
2012-12-09 12:00:04 +01:00
if ( key & & ( ts = stktable_get_entry ( t , key ) ) ) {
2013-05-28 17:40:25 +02:00
session_track_stkctr ( & s - > stkctr [ tcp_trk_idx ( rule - > action ) ] , t , ts ) ;
2014-01-28 23:18:23 +01:00
stkctr_set_flags ( & s - > stkctr [ tcp_trk_idx ( rule - > action ) ] , STKCTR_TRACK_CONTENT ) ;
2013-05-28 17:40:25 +02:00
if ( s - > fe ! = s - > be )
2014-01-28 23:18:23 +01:00
stkctr_set_flags ( & s - > stkctr [ tcp_trk_idx ( rule - > action ) ] , STKCTR_TRACK_BACKEND ) ;
2010-08-03 19:34:32 +02:00
}
}
2014-06-13 16:18:52 +02:00
else if ( rule - > action = = TCP_ACT_CAPTURE ) {
struct sample * key ;
struct cap_hdr * h = rule - > act_prm . cap . hdr ;
char * * cap = s - > txn . req . cap ;
int len ;
key = sample_fetch_string ( s - > be , s , & s - > txn , SMP_OPT_DIR_REQ | partial , rule - > act_prm . cap . expr ) ;
if ( ! key )
continue ;
if ( key - > flags & SMP_F_MAY_CHANGE )
goto missing_data ;
if ( cap [ h - > index ] = = NULL )
cap [ h - > index ] = pool_alloc2 ( h - > pool ) ;
if ( cap [ h - > index ] = = NULL ) /* no more capture memory */
continue ;
len = key - > data . str . len ;
if ( len > h - > len )
len = h - > len ;
memcpy ( cap [ h - > index ] , key - > data . str . str , len ) ;
cap [ h - > index ] [ len ] = 0 ;
}
2010-08-03 19:34:32 +02:00
else {
2008-11-30 23:15:34 +01:00
/* otherwise accept */
2010-08-03 19:34:32 +02:00
break ;
}
2008-11-30 23:15:34 +01:00
}
}
/* if we get there, it means we have no rule which matches, or
* we have an explicit accept , so we apply the default accept .
*/
2009-07-07 10:55:49 +02:00
req - > analysers & = ~ an_bit ;
2008-11-30 23:15:34 +01:00
req - > analyse_exp = TICK_ETERNITY ;
return 1 ;
2014-06-13 16:18:52 +02:00
missing_data :
channel_dont_connect ( req ) ;
/* just set the request timeout once at the beginning of the request */
if ( ! tick_isset ( req - > analyse_exp ) & & s - > be - > tcp_req . inspect_delay )
req - > analyse_exp = tick_add ( now_ms , s - > be - > tcp_req . inspect_delay ) ;
return 0 ;
2008-11-30 23:15:34 +01:00
}
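The rule walk described in the comment above (MISS waits unless the inspection is final, "unless" inverts the condition, the first matching rule decides, default is accept) can be sketched with a toy evaluator. All toy_ names, the prefix-based condition and the rule layout are hypothetical; they only stand in for the real ACL machinery.

```c
#include <string.h>

enum toy_res { TOY_FAIL, TOY_MISS, TOY_PASS };
enum toy_verdict { TOY_WAIT, TOY_REJECT, TOY_ACCEPT };

struct toy_rule {
	const char *prefix;	/* condition: payload starts with <prefix> */
	int unless;		/* invert the condition */
	int reject;		/* action: 1 = reject, 0 = accept */
};

/* Evaluates one condition on the <len> bytes received so far. */
static enum toy_res toy_eval(const struct toy_rule *r, const char *buf,
                             size_t len, int final)
{
	size_t plen = strlen(r->prefix);
	size_t n = len < plen ? len : plen;

	if (memcmp(buf, r->prefix, n) != 0)
		return TOY_FAIL;	/* already known not to match */
	if (len < plen)
		return final ? TOY_FAIL : TOY_MISS;	/* need more data */
	return TOY_PASS;
}

/* Walks the rules in declaration order and returns the verdict. */
enum toy_verdict toy_inspect(const struct toy_rule *rules, int nb,
                             const char *buf, size_t len, int final)
{
	int i;

	for (i = 0; i < nb; i++) {
		enum toy_res res = toy_eval(&rules[i], buf, len, final);
		int match;

		if (res == TOY_MISS)
			return TOY_WAIT;	/* inspect delay not over yet */
		match = (res == TOY_PASS);
		if (rules[i].unless)
			match = !match;
		if (!match)
			continue;
		return rules[i].reject ? TOY_REJECT : TOY_ACCEPT;
	}
	return TOY_ACCEPT;	/* no rule matched: default accept */
}

/* Demo rule set: accept payloads starting with "GET ", reject the rest. */
enum toy_verdict toy_inspect_demo(const char *payload, int final)
{
	static const struct toy_rule rules[] = {
		{ "GET ", 0, 0 },
		{ "",     0, 1 },
	};

	return toy_inspect(rules, 2, payload, strlen(payload), final);
}
```

A partial payload such as "GE" makes the walk wait while more data may still arrive, but is rejected once the inspection is final, which is the role the partial flag plays in the function above.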
2010-09-23 17:56:44 +02:00
/* This function performs the TCP response analysis on the current response. It
* returns 1 if the processing can continue on next analysers , or zero if it
* needs more data , encounters an error , or wants to immediately abort the
* response . It relies on buffers flags , and updates s - > rep - > analysers . The
* function may be called for backend rules .
*/
2012-07-02 15:11:27 +02:00
int tcp_inspect_response ( struct session * s , struct channel * rep , int an_bit )
2010-09-23 17:56:44 +02:00
{
struct tcp_rule * rule ;
int partial ;
2012-03-01 18:19:58 +01:00
DPRINTF ( stderr , " [%u] %s: session=%p b=%p, exp(r,w)=%u,%u bf=%08x bh=%d analysers=%02x \n " ,
2010-09-23 17:56:44 +02:00
now_ms , __FUNCTION__ ,
s ,
rep ,
rep - > rex , rep - > wex ,
rep - > flags ,
2012-10-12 23:49:43 +02:00
rep - > buf - > i ,
2010-09-23 17:56:44 +02:00
rep - > analysers ) ;
/* We don't know whether we have enough data, so must proceed
* this way :
* - iterate through all rules in their declaration order
* - if one rule returns MISS , it means the inspect delay is
* not over yet , then return immediately , otherwise consider
* it as a non - match .
* - if one rule returns OK , then return OK
* - if one rule returns KO , then return KO
*/
2012-08-27 23:14:58 +02:00
if ( rep - > flags & CF_SHUTR | | tick_is_expired ( rep - > analyse_exp , now_ms ) )
2012-04-25 10:13:36 +02:00
partial = SMP_OPT_FINAL ;
2010-09-23 17:56:44 +02:00
else
2012-04-25 10:13:36 +02:00
partial = 0 ;
2010-09-23 17:56:44 +02:00
list_for_each_entry ( rule , & s - > be - > tcp_rep . inspect_rules , list ) {
2013-11-28 22:21:02 +01:00
enum acl_test_res ret = ACL_TEST_PASS ;
2010-09-23 17:56:44 +02:00
if ( rule - > cond ) {
2012-04-25 10:13:36 +02:00
ret = acl_exec_cond ( rule - > cond , s - > be , s , & s - > txn , SMP_OPT_DIR_RES | partial ) ;
2013-11-28 18:22:00 +01:00
if ( ret = = ACL_TEST_MISS ) {
2010-09-23 17:56:44 +02:00
/* just set the analyser timeout once at the beginning of the response */
if ( ! tick_isset ( rep - > analyse_exp ) & & s - > be - > tcp_rep . inspect_delay )
2013-11-04 15:56:53 +01:00
rep - > analyse_exp = tick_add ( now_ms , s - > be - > tcp_rep . inspect_delay ) ;
2010-09-23 17:56:44 +02:00
return 0 ;
}
ret = acl_pass ( ret ) ;
if ( rule - > cond - > pol = = ACL_COND_UNLESS )
ret = ! ret ;
}
if ( ret ) {
/* we have a matching rule. */
if ( rule - > action = = TCP_ACT_REJECT ) {
2012-08-28 00:06:31 +02:00
channel_abort ( rep ) ;
channel_abort ( s - > req ) ;
2010-09-23 17:56:44 +02:00
rep - > analysers = 0 ;
2011-03-10 23:25:56 +01:00
s - > be - > be_counters . denied_resp + + ;
s - > fe - > fe_counters . denied_resp + + ;
2010-09-23 17:56:44 +02:00
if ( s - > listener - > counters )
s - > listener - > counters - > denied_resp + + ;
if ( ! ( s - > flags & SN_ERR_MASK ) )
s - > flags | = SN_ERR_PRXCOND ;
if ( ! ( s - > flags & SN_FINST_MASK ) )
s - > flags | = SN_FINST_D ;
return 0 ;
}
2013-09-11 23:20:29 +02:00
else if ( rule - > action = = TCP_ACT_CLOSE ) {
rep - > prod - > flags | = SI_FL_NOLINGER | SI_FL_NOHALF ;
si_shutr ( rep - > prod ) ;
si_shutw ( rep - > prod ) ;
break ;
}
2010-09-23 17:56:44 +02:00
else {
/* otherwise accept */
break ;
}
}
}
/* if we get there, it means we have no rule which matches, or
* we have an explicit accept , so we apply the default accept .
*/
rep - > analysers & = ~ an_bit ;
rep - > analyse_exp = TICK_ETERNITY ;
return 1 ;
}
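Both analysers arm their inspect-delay timer only once and reset it to eternity when done. The sketch below shows one way such wrapping millisecond timers can behave; the toy_ helpers and the choice of zero as the "eternity" value are assumptions for illustration, since the real tick helpers are not shown in this excerpt.

```c
#include <stdint.h>

#define TOY_ETERNITY 0u	/* assumed marker for "no timer set" */

static int toy_tick_isset(uint32_t t)
{
	return t != TOY_ETERNITY;
}

/* Adds <delay> ms to <now>, skipping the reserved "eternity" value. */
uint32_t toy_tick_add(uint32_t now, uint32_t delay)
{
	uint32_t t = now + delay;	/* wraps modulo 2^32 */

	if (!toy_tick_isset(t))
		t++;
	return t;
}

/* Returns non-zero if <exp> is set and not later than <now>. The signed
 * difference keeps the comparison valid across the 32-bit wrap.
 */
int toy_tick_is_expired(uint32_t exp, uint32_t now)
{
	if (!toy_tick_isset(exp))
		return 0;
	return (int32_t)(exp - now) <= 0;
}

/* Arms the timer only if it is not set yet and a delay is configured,
 * and returns the resulting expiration date.
 */
uint32_t toy_arm_once(uint32_t exp, uint32_t now, uint32_t delay)
{
	if (!toy_tick_isset(exp) && delay)
		exp = toy_tick_add(now, delay);
	return exp;
}
```

Arming only once matters because the analyser is called again each time data arrive; re-arming on every call would let a slow sender postpone the deadline indefinitely.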
2010-05-31 10:30:33 +02:00
/* This function performs the TCP layer4 analysis on the current request. It
* returns 0 if a reject rule matches , otherwise 1 if either an accept rule
* matches or if no more rule matches . It can only use rules which don ' t need
2013-10-01 10:45:07 +02:00
* any data . This only works on connection - based client - facing stream interfaces .
2010-05-31 10:30:33 +02:00
*/
int tcp_exec_req_rules ( struct session * s )
{
struct tcp_rule * rule ;
2010-08-03 16:29:52 +02:00
struct stksess * ts ;
2010-06-14 21:04:55 +02:00
struct stktable * t = NULL ;
2013-10-01 10:45:07 +02:00
struct connection * conn = objt_conn ( s - > si [ 0 ] . end ) ;
2010-06-14 21:04:55 +02:00
int result = 1 ;
2013-11-28 22:21:02 +01:00
enum acl_test_res ret ;
2010-05-31 10:30:33 +02:00
2013-10-01 10:45:07 +02:00
if ( ! conn )
return result ;
2010-05-31 10:30:33 +02:00
list_for_each_entry ( rule , & s - > fe - > tcp_req . l4_rules , list ) {
2013-11-28 18:22:00 +01:00
ret = ACL_TEST_PASS ;
2010-05-31 10:30:33 +02:00
if ( rule - > cond ) {
2012-04-25 10:13:36 +02:00
ret = acl_exec_cond ( rule - > cond , s - > fe , s , NULL , SMP_OPT_DIR_REQ | SMP_OPT_FINAL ) ;
2010-05-31 10:30:33 +02:00
ret = acl_pass ( ret ) ;
if ( rule - > cond - > pol = = ACL_COND_UNLESS )
ret = ! ret ;
}
if ( ret ) {
/* we have a matching rule. */
if ( rule - > action = = TCP_ACT_REJECT ) {
2011-03-10 23:25:56 +01:00
s - > fe - > fe_counters . denied_conn + + ;
2010-05-31 10:30:33 +02:00
if ( s - > listener - > counters )
2010-06-05 15:43:21 +02:00
s - > listener - > counters - > denied_conn + + ;
2010-05-31 10:30:33 +02:00
if ( ! ( s - > flags & SN_ERR_MASK ) )
s - > flags | = SN_ERR_PRXCOND ;
if ( ! ( s - > flags & SN_FINST_MASK ) )
s - > flags | = SN_FINST_R ;
2010-06-14 21:04:55 +02:00
result = 0 ;
break ;
}
2013-10-30 19:24:00 +01:00
else if ( rule - > action > = TCP_ACT_TRK_SC0 & & rule - > action < = TCP_ACT_TRK_SCMAX ) {
2012-12-09 12:00:04 +01:00
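/* Note: only the first valid tracking parameter of each
* counter applies : once a counter already tracks an entry ,
* later rules targeting the same counter are skipped .
*/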
struct stktable_key * key ;
2014-01-28 23:18:23 +01:00
if ( stkctr_entry ( & s - > stkctr [ tcp_trk_idx ( rule - > action ) ] ) )
2013-10-30 19:24:00 +01:00
continue ;
2012-12-09 12:00:04 +01:00
t = rule - > act_prm . trk_ctr . table . t ;
2014-06-25 16:20:53 +02:00
key = stktable_fetch_key ( t , s - > be , s , & s - > txn , SMP_OPT_DIR_REQ | SMP_OPT_FINAL , rule - > act_prm . trk_ctr . expr , NULL ) ;
2012-12-09 12:00:04 +01:00
2013-05-28 17:40:25 +02:00
if ( key & & ( ts = stktable_get_entry ( t , key ) ) )
session_track_stkctr ( & s - > stkctr [ tcp_trk_idx ( rule - > action ) ] , t , ts ) ;
2010-06-14 21:04:55 +02:00
}
2013-06-11 20:40:55 +02:00
else if ( rule - > action = = TCP_ACT_EXPECT_PX ) {
2013-10-01 10:45:07 +02:00
conn - > flags | = CO_FL_ACCEPT_PROXY ;
conn_sock_want_recv ( conn ) ;
2013-06-11 20:40:55 +02:00
}
2010-06-14 21:04:55 +02:00
else {
/* otherwise it's an accept */
break ;
2010-05-31 10:30:33 +02:00
}
}
}
2010-06-14 21:04:55 +02:00
return result ;
2010-05-31 10:30:33 +02:00
}
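The evaluation order used by tcp_exec_req_rules() above can be modelled outside HAProxy. The block below is only a simplified sketch under stated assumptions: every `demo_*` name is hypothetical, the `matched` field stands in for the result of acl_exec_cond(), and a plain counter replaces the stick-table tracking. It shows that "reject" and "accept" end the evaluation while a tracking rule lets it continue, and that the default verdict is accept.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the TCP_ACT_* actions. */
enum demo_action { DEMO_ACCEPT, DEMO_REJECT, DEMO_TRACK };

/* Hypothetical polarity of a rule's condition. */
enum demo_pol { DEMO_NONE, DEMO_IF, DEMO_UNLESS };

struct demo_rule {
	enum demo_action action;
	enum demo_pol pol;  /* DEMO_NONE: the rule always applies */
	int matched;        /* precomputed result of the condition (0 or 1) */
};

/* Returns 0 if a reject rule applies first, otherwise 1. <tracked> receives
 * the number of tracking rules that applied before the verdict.
 */
int demo_exec_rules(const struct demo_rule *rules, size_t nb, int *tracked)
{
	size_t i;

	assert(rules != NULL || nb == 0);
	assert(tracked != NULL);
	*tracked = 0;

	for (i = 0; i < nb; i++) {
		int ret = 1; /* a rule without condition always applies */

		if (rules[i].pol != DEMO_NONE) {
			ret = !!rules[i].matched;
			if (rules[i].pol == DEMO_UNLESS)
				ret = !ret;
		}
		if (!ret)
			continue;

		if (rules[i].action == DEMO_REJECT)
			return 0;
		if (rules[i].action == DEMO_TRACK) {
			(*tracked)++;
			continue; /* non-terminal: keep evaluating */
		}
		return 1; /* explicit accept */
	}
	return 1; /* no terminal rule applied: default accept */
}
```

In this model, a "reject unless" rule whose condition is true is skipped, exactly like the `ret = ! ret` inversion in the function above.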
2010-09-23 17:56:44 +02:00
/* Parse a tcp-response rule. Return a negative value in case of failure */
static int tcp_parse_response_rule ( char * * args , int arg , int section_type ,
2014-02-11 03:31:34 +01:00
struct proxy * curpx , struct proxy * defpx ,
struct tcp_rule * rule , char * * err ,
unsigned int where ,
const char * file , int line )
2010-09-23 17:56:44 +02:00
{
if ( curpx = = defpx | | ! ( curpx - > cap & PR_CAP_BE ) ) {
2012-05-08 19:47:01 +02:00
memprintf ( err , " %s %s is only allowed in 'backend' sections " ,
args [ 0 ] , args [ 1 ] ) ;
2010-09-23 17:56:44 +02:00
return - 1 ;
}
if ( strcmp ( args [ arg ] , " accept " ) = = 0 ) {
arg + + ;
rule - > action = TCP_ACT_ACCEPT ;
}
else if ( strcmp ( args [ arg ] , " reject " ) = = 0 ) {
arg + + ;
rule - > action = TCP_ACT_REJECT ;
}
2013-09-11 23:20:29 +02:00
else if ( strcmp ( args [ arg ] , " close " ) = = 0 ) {
arg + + ;
rule - > action = TCP_ACT_CLOSE ;
}
2010-09-23 17:56:44 +02:00
else {
2012-05-08 19:47:01 +02:00
memprintf ( err ,
2013-09-11 23:20:29 +02:00
" '%s %s' expects 'accept', 'close' or 'reject' in %s '%s' (got '%s') " ,
2012-05-08 19:47:01 +02:00
args [ 0 ] , args [ 1 ] , proxy_type_str ( curpx ) , curpx - > id , args [ arg ] ) ;
2010-09-23 17:56:44 +02:00
return - 1 ;
}
if ( strcmp ( args [ arg ] , " if " ) = = 0 | | strcmp ( args [ arg ] , " unless " ) = = 0 ) {
2014-02-11 03:31:34 +01:00
if ( ( rule - > cond = build_acl_cond ( file , line , curpx , ( const char * * ) args + arg , err ) ) = = NULL ) {
2012-05-08 19:47:01 +02:00
memprintf ( err ,
" '%s %s %s' : error detected in %s '%s' while parsing '%s' condition : %s " ,
args [ 0 ] , args [ 1 ] , args [ 2 ] , proxy_type_str ( curpx ) , curpx - > id , args [ arg ] , * err ) ;
2010-09-23 17:56:44 +02:00
return - 1 ;
}
}
else if ( * args [ arg ] ) {
2012-05-08 19:47:01 +02:00
memprintf ( err ,
" '%s %s %s' only accepts 'if' or 'unless', in %s '%s' (got '%s') " ,
2010-09-23 17:56:44 +02:00
args [ 0 ] , args [ 1 ] , args [ 2 ] , proxy_type_str ( curpx ) , curpx - > id , args [ arg ] ) ;
return - 1 ;
}
return 0 ;
}
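The chain of strcmp() calls in tcp_parse_response_rule() maps an action keyword to an action code. One possible alternative, shown here only as a sketch, is a small lookup table. The `demo_rep_*` names and the enum values are assumptions made for this example and are not part of HAProxy.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical action codes for the example; not HAProxy's TCP_ACT_* values. */
enum demo_rep_action {
	DEMO_REP_NONE = -1,
	DEMO_REP_ACCEPT,
	DEMO_REP_REJECT,
	DEMO_REP_CLOSE
};

/* Keyword table: adding an action only requires adding a row. */
static const struct {
	const char *kw;
	enum demo_rep_action action;
} demo_rep_kws[] = {
	{ "accept", DEMO_REP_ACCEPT },
	{ "reject", DEMO_REP_REJECT },
	{ "close",  DEMO_REP_CLOSE  },
};

/* Returns the action matching <kw>, or DEMO_REP_NONE if it is unknown. */
enum demo_rep_action demo_rep_lookup(const char *kw)
{
	size_t i;

	assert(kw != NULL);
	for (i = 0; i < sizeof(demo_rep_kws) / sizeof(demo_rep_kws[0]); i++) {
		if (strcmp(kw, demo_rep_kws[i].kw) == 0)
			return demo_rep_kws[i].action;
	}
	return DEMO_REP_NONE;
}
```

The lookup is case-sensitive and matches whole words only, like the strcmp() chain it models.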
2010-08-06 15:08:45 +02:00
/* Parse a tcp-request rule. Return a negative value in case of failure */
static int tcp_parse_request_rule ( char * * args , int arg , int section_type ,
MEDIUM: samples: use new flags to describe compatibility between fetches and their usages
Sample fetches were relying on two flags SMP_CAP_REQ/SMP_CAP_RES to describe
whether they were compatible with requests rules or with response rules. This
was never reliable because we need a finer granularity (eg: an HTTP request
method needs to parse an HTTP request, and is available past this point).
Some fetches are also dependent on the context (eg: "hdr" uses request or
response depending on where it's involved, causing some ambiguity).
In order to solve this, we need to precisely indicate in fetches what they
use, and their users will have to compare with what they have.
So now we have a bunch of bits indicating where the sample is fetched in the
processing chain, with a few variants indicating for some of them if it is
permanent or volatile (eg: an HTTP status is stored into the transaction so
it is permanent, despite being caught in the response contents).
The fetches also have a second mask indicating their validity domain. This one
is computed from a conversion table at registration time, so there is no need
for doing it by hand. This validity domain consists in a bitmask with one bit
set for each usage point in the processing chain. Some provisions were made
for upcoming controls such as connection-based TCP rules which apply on top of
the connection layer but before instantiating the session.
Then everywhere a fetch is used, the bit for the control point is checked in
the fetch's validity domain, and it becomes possible to finely ensure that a
fetch will work or not.
Note that we need these two separate bitfields because some fetches are usable
both in request and response (eg: "hdr", "payload"). So the keyword will have
a "use" field made of a combination of several SMP_USE_* values, which will be
converted into a wider list of SMP_VAL_* flags.
The knowledge of permanent vs dynamic information has disappeared for now, as
it was never used. Later we'll probably reintroduce it differently when
dealing with variables. Its only use at the moment could have been to avoid
caching a dynamic rate measurement, but nothing is cached as of now.
2013-01-07 15:42:20 +01:00
struct proxy * curpx , struct proxy * defpx ,
struct tcp_rule * rule , char * * err ,
2014-02-11 03:31:34 +01:00
unsigned int where , const char * file , int line )
2010-08-06 15:08:45 +02:00
{
if ( curpx = = defpx ) {
2012-05-08 19:47:01 +02:00
memprintf ( err , " %s %s is not allowed in 'defaults' sections " ,
args [ 0 ] , args [ 1 ] ) ;
2010-08-06 15:08:45 +02:00
return - 1 ;
}
if ( ! strcmp ( args [ arg ] , " accept " ) ) {
arg + + ;
rule - > action = TCP_ACT_ACCEPT ;
}
else if ( ! strcmp ( args [ arg ] , " reject " ) ) {
arg + + ;
rule - > action = TCP_ACT_REJECT ;
}
2014-06-13 16:18:52 +02:00
else if ( strcmp ( args [ arg ] , " capture " ) = = 0 ) {
struct sample_expr * expr ;
struct cap_hdr * hdr ;
int kw = arg ;
int len = 0 ;
if ( ! ( curpx - > cap & PR_CAP_FE ) ) {
memprintf ( err ,
" '%s %s %s' : proxy '%s' has no frontend capability " ,
args [ 0 ] , args [ 1 ] , args [ kw ] , curpx - > id ) ;
return - 1 ;
}
if ( ! ( where & SMP_VAL_FE_REQ_CNT ) ) {
memprintf ( err ,
" '%s %s' is not allowed in '%s %s' rules in %s '%s' " ,
args [ arg ] , args [ arg + 1 ] , args [ 0 ] , args [ 1 ] , proxy_type_str ( curpx ) , curpx - > id ) ;
return - 1 ;
}
arg + + ;
curpx - > conf . args . ctx = ARGC_CAP ;
expr = sample_parse_expr ( args , & arg , file , line , err , & curpx - > conf . args ) ;
if ( ! expr ) {
memprintf ( err ,
" '%s %s %s' : %s " ,
args [ 0 ] , args [ 1 ] , args [ kw ] , * err ) ;
return - 1 ;
}
if ( ! ( expr - > fetch - > val & where ) ) {
memprintf ( err ,
" '%s %s %s' : fetch method '%s' extracts information from '%s', none of which is available here " ,
args [ 0 ] , args [ 1 ] , args [ kw ] , args [ arg - 1 ] , sample_src_names ( expr - > fetch - > use ) ) ;
free ( expr ) ;
return - 1 ;
}
if ( strcmp ( args [ arg ] , " len " ) = = 0 ) {
arg + + ;
if ( ! args [ arg ] ) {
memprintf ( err ,
" '%s %s %s' : missing length value " ,
args [ 0 ] , args [ 1 ] , args [ kw ] ) ;
free ( expr ) ;
return - 1 ;
}
/* parse the capture length, which must be a strictly positive integer */
len = atoi ( args [ arg ] ) ;
if ( len < = 0 ) {
memprintf ( err ,
" '%s %s %s' : length must be > 0 " ,
args [ 0 ] , args [ 1 ] , args [ kw ] ) ;
free ( expr ) ;
return - 1 ;
}
arg + + ;
}
if ( ! len ) {
memprintf ( err ,
" '%s %s %s' : a positive 'len' argument is mandatory " ,
args [ 0 ] , args [ 1 ] , args [ kw ] ) ;
free ( expr ) ;
return - 1 ;
}
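hdr = calloc ( 1 , sizeof ( * hdr ) ) ;
if ( ! hdr ) {
memprintf ( err ,
" '%s %s %s' : out of memory " ,
args [ 0 ] , args [ 1 ] , args [ kw ] ) ;
free ( expr ) ;
return - 1 ;
}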
hdr - > next = curpx - > req_cap ;
hdr - > name = NULL ; /* not a header capture */
hdr - > namelen = 0 ;
hdr - > len = len ;
hdr - > pool = create_pool ( " caphdr " , hdr - > len + 1 , MEM_F_SHARED ) ;
hdr - > index = curpx - > nb_req_cap + + ;
curpx - > req_cap = hdr ;
curpx - > to_log | = LW_REQHDR ;
/* check if we need to allocate an hdr_idx struct for HTTP parsing */
curpx - > http_needed | = ! ! ( expr - > fetch - > use & SMP_USE_HTTP_ANY ) ;
rule - > act_prm . cap . expr = expr ;
rule - > act_prm . cap . hdr = hdr ;
rule - > action = TCP_ACT_CAPTURE ;
}
2013-07-23 19:15:30 +02:00
else if ( strncmp ( args [ arg ] , " track-sc " , 8 ) = = 0 & &
args [ arg ] [ 8 ] > = ' 0 ' & & args [ arg ] [ 8 ] < ' 0 ' + MAX_SESS_STKCTR & &
args [ arg ] [ 9 ] = = ' \0 ' ) { /* track-sc0 .. track-sc<MAX_SESS_STKCTR-1> */
2012-12-09 12:00:04 +01:00
struct sample_expr * expr ;
2012-05-08 19:47:01 +02:00
int kw = arg ;
2010-08-06 15:08:45 +02:00
arg + + ;
MAJOR: sample: maintain a per-proxy list of the fetch args to resolve
While ACL args were resolved after all the config was parsed, it was not the
case with sample fetch args because they're almost everywhere now.
The issue is that ACLs now solely rely on sample fetches, so their args
resolving doesn't work anymore. And many fetches involving a server, a
proxy or a userlist don't work at all.
The real issue is that at the bottom layers we have no information about
proxies, line numbers, even ACLs in order to report understandable errors,
and that at the top layers we have no visibility over the locations where
fetches are referenced (think log node).
After several unsatisfying attempts at a solution, we now have a new
concept of args list. The principle is that every proxy has a list head
which contains a number of indications such as the config keyword, the
context where it's used, the file and line number, etc... and a list of
arguments. This list head is of the same type as the elements, so it
serves as a template for adding new elements. This way, it is filled from
top to bottom by the callers with the information they have (eg: line
numbers, ACL name, ...) and the lower layers just have to duplicate it and
add an element when they face an argument they cannot resolve yet.
Then at the end of the configuration parsing, a loop passes over each
proxy's list and resolves all the args in sequence. And this way there is
all necessary information to report verbose errors.
The first immediate benefit is that for the first time we got very precise
location of issues (arg number in a keyword in its context, ...). Second,
in order to do this we had to parse log-format and unique-id-format a bit
earlier, so that was a great opportunity for doing so when the directives
are encountered (unless it's a default section). This way, the recorded
line numbers for these args are the ones of the place where the log format
is declared, not the end of the file.
Userlists report slightly more information now. They're the only remaining
ones in the ACL resolving function.
2013-04-02 16:34:32 +02:00
curpx - > conf . args . ctx = ARGC_TRK ;
2014-02-11 14:00:19 +01:00
expr = sample_parse_expr ( args , & arg , file , line , err , & curpx - > conf . args ) ;
2012-12-09 12:00:04 +01:00
if ( ! expr ) {
2012-05-08 19:47:01 +02:00
memprintf ( err ,
2012-12-09 12:00:04 +01:00
" '%s %s %s' : %s " ,
2013-12-12 23:16:54 +01:00
args [ 0 ] , args [ 1 ] , args [ kw ] , * err ) ;
2012-12-09 12:00:04 +01:00
return - 1 ;
2012-05-08 19:47:01 +02:00
}
2010-08-06 15:08:45 +02:00
2013-01-07 15:42:20 +01:00
if ( ! ( expr - > fetch - > val & where ) ) {
2012-05-08 19:47:01 +02:00
memprintf ( err ,
2013-01-07 15:42:20 +01:00
" '%s %s %s' : fetch method '%s' extracts information from '%s', none of which is available here " ,
2013-04-12 08:26:32 +02:00
args [ 0 ] , args [ 1 ] , args [ kw ] , args [ arg - 1 ] , sample_src_names ( expr - > fetch - > use ) ) ;
2012-12-09 12:00:04 +01:00
free ( expr ) ;
return - 1 ;
}
/* check if we need to allocate an hdr_idx struct for HTTP parsing */
2013-03-24 07:22:08 +01:00
curpx - > http_needed | = ! ! ( expr - > fetch - > use & SMP_USE_HTTP_ANY ) ;
2012-12-09 12:00:04 +01:00
if ( strcmp ( args [ arg ] , " table " ) = = 0 ) {
2012-12-09 16:57:27 +01:00
arg + + ;
if ( ! args [ arg ] ) {
2012-12-09 12:00:04 +01:00
memprintf ( err ,
" '%s %s %s' : missing table name " ,
args [ 0 ] , args [ 1 ] , args [ kw ] ) ;
free ( expr ) ;
return - 1 ;
}
/* we copy the table name for now, it will be resolved later */
2012-12-09 16:57:27 +01:00
rule - > act_prm . trk_ctr . table . n = strdup ( args [ arg ] ) ;
2012-12-09 12:00:04 +01:00
arg + + ;
2012-05-08 19:47:01 +02:00
}
2012-12-09 12:00:04 +01:00
rule - > act_prm . trk_ctr . expr = expr ;
2013-06-17 15:04:07 +02:00
rule - > action = TCP_ACT_TRK_SC0 + args [ kw ] [ 8 ] - ' 0 ' ;
2010-08-06 15:08:45 +02:00
}
2013-06-11 20:40:55 +02:00
else if ( strcmp ( args [ arg ] , " expect-proxy " ) = = 0 ) {
if ( strcmp ( args [ arg + 1 ] , " layer4 " ) ! = 0 ) {
memprintf ( err ,
" '%s %s %s' only supports 'layer4' in %s '%s' (got '%s') " ,
args [ 0 ] , args [ 1 ] , args [ arg ] , proxy_type_str ( curpx ) , curpx - > id , args [ arg + 1 ] ) ;
return - 1 ;
}
if ( ! ( where & SMP_VAL_FE_CON_ACC ) ) {
memprintf ( err ,
" '%s %s' is not allowed in '%s %s' rules in %s '%s' " ,
args [ arg ] , args [ arg + 1 ] , args [ 0 ] , args [ 1 ] , proxy_type_str ( curpx ) , curpx - > id ) ;
return - 1 ;
}
arg + = 2 ;
rule - > action = TCP_ACT_EXPECT_PX ;
}
2010-08-06 15:08:45 +02:00
else {
2012-05-08 19:47:01 +02:00
memprintf ( err ,
2013-07-23 19:15:30 +02:00
" '%s %s' expects 'accept', 'reject', 'capture', 'expect-proxy', 'track-sc0' ... 'track-sc%d' "
" in %s '%s' (got '%s') " ,
args [ 0 ] , args [ 1 ] , MAX_SESS_STKCTR - 1 , proxy_type_str ( curpx ) , curpx - > id , args [ arg ] ) ;
2010-08-06 15:08:45 +02:00
return - 1 ;
}
if ( strcmp ( args [ arg ] , " if " ) = = 0 | | strcmp ( args [ arg ] , " unless " ) = = 0 ) {
2014-02-11 03:31:34 +01:00
if ( ( rule - > cond = build_acl_cond ( file , line , curpx , ( const char * * ) args + arg , err ) ) = = NULL ) {
2012-05-08 19:47:01 +02:00
memprintf ( err ,
" '%s %s %s' : error detected in %s '%s' while parsing '%s' condition : %s " ,
args [ 0 ] , args [ 1 ] , args [ 2 ] , proxy_type_str ( curpx ) , curpx - > id , args [ arg ] , * err ) ;
2010-08-06 15:08:45 +02:00
return - 1 ;
}
}
else if ( * args [ arg ] ) {
2012-05-08 19:47:01 +02:00
memprintf ( err ,
" '%s %s %s' only accepts 'if' or 'unless', in %s '%s' (got '%s') " ,
2010-08-06 15:08:45 +02:00
args [ 0 ] , args [ 1 ] , args [ 2 ] , proxy_type_str ( curpx ) , curpx - > id , args [ arg ] ) ;
return - 1 ;
}
return 0 ;
}
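Two small parsing steps from tcp_parse_request_rule() are easy to get subtly wrong: extracting the counter index from a "track-sc<N>" keyword, and reading the mandatory "len" value of a capture. The sketch below illustrates both in isolation. The `demo_*` helpers are hypothetical and not part of HAProxy, `max` plays the role of MAX_SESS_STKCTR and is assumed to be at most 10, and strtol() is used in place of atoi() so that trailing garbage is rejected.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/* Returns the counter index encoded in "track-sc<N>", or -1 if <kw> does not
 * have that form or if N is not strictly below <max> (assumed <= 10).
 */
int demo_track_sc_index(const char *kw, int max)
{
	assert(kw != NULL);
	if (strncmp(kw, "track-sc", 8) != 0)
		return -1;
	/* kw[8] is tested before kw[9] so we never read past the terminator */
	if (kw[8] < '0' || kw[8] > '9' || kw[9] != '\0')
		return -1;
	if (kw[8] - '0' >= max)
		return -1;
	return kw[8] - '0';
}

/* Parses a strictly positive decimal length. Returns it, or -1 on error
 * (empty string, trailing characters, overflow, zero or negative value).
 */
int demo_parse_len(const char *s)
{
	char *end;
	long v;

	assert(s != NULL);
	errno = 0;
	v = strtol(s, &end, 10);
	if (end == s || *end != '\0' || errno == ERANGE || v <= 0 || v > INT_MAX)
		return -1;
	return (int)v;
}
```

Note that atoi("12abc") yields 12, whereas this helper reports an error for the same input. Whether such strictness is desirable for existing configurations is a design choice.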
2010-09-23 17:56:44 +02:00
/* This function should be called to parse a line starting with the "tcp-response"
* keyword .
*/
static int tcp_parse_tcp_rep ( char * * args , int section_type , struct proxy * curpx ,
2012-09-18 20:02:48 +02:00
struct proxy * defpx , const char * file , int line ,
char * * err )
2010-09-23 17:56:44 +02:00
{
const char * ptr = NULL ;
unsigned int val ;
int warn = 0 ;
int arg ;
struct tcp_rule * rule ;
2013-01-07 15:42:20 +01:00
unsigned int where ;
2013-03-25 08:12:18 +01:00
const struct acl * acl ;
2013-03-31 22:59:32 +02:00
const char * kw ;
2010-09-23 17:56:44 +02:00
if ( ! * args [ 1 ] ) {
2012-05-08 19:47:01 +02:00
memprintf ( err , " missing argument for '%s' in %s '%s' " ,
args [ 0 ] , proxy_type_str ( curpx ) , curpx - > id ) ;
2010-09-23 17:56:44 +02:00
return - 1 ;
}
if ( strcmp ( args [ 1 ] , " inspect-delay " ) = = 0 ) {
if ( curpx = = defpx | | ! ( curpx - > cap & PR_CAP_BE ) ) {
2012-05-08 19:47:01 +02:00
memprintf ( err , " %s %s is only allowed in 'backend' sections " ,
args [ 0 ] , args [ 1 ] ) ;
2010-09-23 17:56:44 +02:00
return - 1 ;
}
if ( ! * args [ 2 ] | | ( ptr = parse_time_err ( args [ 2 ] , & val , TIME_UNIT_MS ) ) ) {
2012-05-08 19:47:01 +02:00
memprintf ( err ,
" '%s %s' expects a positive delay in milliseconds, in %s '%s' " ,
args [ 0 ] , args [ 1 ] , proxy_type_str ( curpx ) , curpx - > id ) ;
if ( ptr )
memprintf ( err , " %s (unexpected character '%c') " , * err , * ptr ) ;
2010-09-23 17:56:44 +02:00
return - 1 ;
}
if ( curpx - > tcp_rep . inspect_delay ) {
2012-05-08 19:47:01 +02:00
memprintf ( err , " ignoring %s %s (was already defined) in %s '%s' " ,
args [ 0 ] , args [ 1 ] , proxy_type_str ( curpx ) , curpx - > id ) ;
2010-09-23 17:56:44 +02:00
return 1 ;
}
curpx - > tcp_rep . inspect_delay = val ;
return 0 ;
}
2011-06-24 15:11:37 +09:00
rule = calloc ( 1 , sizeof ( * rule ) ) ;
2010-09-23 17:56:44 +02:00
LIST_INIT ( & rule - > list ) ;
arg = 1 ;
2013-01-07 15:42:20 +01:00
where = 0 ;
2010-09-23 17:56:44 +02:00
if ( strcmp ( args [ 1 ] , " content " ) = = 0 ) {
arg + + ;
2013-01-07 15:42:20 +01:00
if ( curpx - > cap & PR_CAP_FE )
where | = SMP_VAL_FE_RES_CNT ;
if ( curpx - > cap & PR_CAP_BE )
where | = SMP_VAL_BE_RES_CNT ;
2014-02-11 03:31:34 +01:00
if ( tcp_parse_response_rule ( args , arg , section_type , curpx , defpx , rule , err , where , file , line ) < 0 )
2010-09-23 17:56:44 +02:00
goto error ;
2013-03-25 08:12:18 +01:00
acl = rule - > cond ? acl_cond_conflicts ( rule - > cond , where ) : NULL ;
if ( acl ) {
if ( acl - > name & & * acl - > name )
memprintf ( err ,
" acl '%s' will never match in '%s %s' because it only involves keywords that are incompatible with '%s' " ,
acl - > name , args [ 0 ] , args [ 1 ] , sample_ckp_names ( where ) ) ;
else
memprintf ( err ,
" anonymous acl will never match in '%s %s' because it uses keyword '%s' which is incompatible with '%s' " ,
args [ 0 ] , args [ 1 ] ,
2013-03-31 22:59:32 +02:00
LIST_ELEM ( acl - > expr . n , struct acl_expr * , list ) - > kw ,
2013-03-25 08:12:18 +01:00
sample_ckp_names ( where ) ) ;
2010-09-23 17:56:44 +02:00
2013-03-25 08:12:18 +01:00
warn + + ;
}
else if ( rule - > cond & & acl_cond_kw_conflicts ( rule - > cond , where , & acl , & kw ) ) {
if ( acl - > name & & * acl - > name )
memprintf ( err ,
" acl '%s' involves keyword '%s' which is incompatible with '%s' " ,
2013-03-31 22:59:32 +02:00
acl - > name , kw , sample_ckp_names ( where ) ) ;
2013-03-25 08:12:18 +01:00
else
memprintf ( err ,
" anonymous acl involves keyword '%s' which is incompatible with '%s' " ,
2013-03-31 22:59:32 +02:00
kw , sample_ckp_names ( where ) ) ;
2010-09-23 17:56:44 +02:00
warn + + ;
}
LIST_ADDQ ( & curpx - > tcp_rep . inspect_rules , & rule - > list ) ;
}
else {
2012-05-08 19:47:01 +02:00
memprintf ( err ,
" '%s' expects 'inspect-delay' or 'content' in %s '%s' (got '%s') " ,
args [ 0 ] , proxy_type_str ( curpx ) , curpx - > id , args [ 1 ] ) ;
2010-09-23 17:56:44 +02:00
goto error ;
}
return warn ;
error :
free ( rule ) ;
return - 1 ;
}
2008-07-14 23:54:42 +02:00
/* This function should be called to parse a line starting with the "tcp-request"
* keyword .
*/
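/* Illustrative configuration sketch, not part of the original source. The
 * proxy name and the blacklist path are hypothetical. It shows the three
 * forms this parser accepts after "tcp-request" :
 *
 *     frontend smtp_in
 *         tcp-request connection reject if { src -f /etc/haproxy/blacklist.lst }
 *         tcp-request inspect-delay 30s
 *         tcp-request content reject if { req_len gt 0 }
 *
 * As the code below enforces, "connection" rules are only accepted in proxies
 * with the frontend capability, and "inspect-delay" is refused in a
 * "defaults" section. A return value above zero means a warning was emitted.
 */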
static int tcp_parse_tcp_req ( char * * args , int section_type , struct proxy * curpx ,
2012-09-18 20:02:48 +02:00
struct proxy * defpx , const char * file , int line ,
char * * err )
2008-07-14 23:54:42 +02:00
{
const char * ptr = NULL ;
2008-08-17 17:13:47 +02:00
unsigned int val ;
2010-05-23 22:40:30 +02:00
int warn = 0 ;
2010-06-14 16:44:27 +02:00
int arg ;
2010-05-23 22:40:30 +02:00
struct tcp_rule * rule ;
MEDIUM: samples: use new flags to describe compatibility between fetches and their usages
Sample fetches were relying on two flags SMP_CAP_REQ/SMP_CAP_RES to describe
whether they were compatible with request rules or with response rules. This
was never reliable because we need a finer granularity (eg: an HTTP request
method needs to parse an HTTP request, and is available past this point).
Some fetches are also dependent on the context (eg: "hdr" uses request or
response depending where it's involved, causing some ambiguity).
In order to solve this, we need to precisely indicate in fetches what they
use, and their users will have to compare with what they have.
So now we have a bunch of bits indicating where the sample is fetched in the
processing chain, with a few variants indicating for some of them if it is
permanent or volatile (eg: an HTTP status is stored into the transaction so
it is permanent, despite being caught in the response contents).
The fetches also have a second mask indicating their validity domain. This one
is computed from a conversion table at registration time, so there is no need
for doing it by hand. This validity domain consists in a bitmask with one bit
set for each usage point in the processing chain. Some provisions were made
for upcoming controls such as connection-based TCP rules which apply on top of
the connection layer but before instantiating the session.
Then everywhere a fetch is used, the bit for the control point is checked in
the fetch's validity domain, and it becomes possible to finely ensure that a
fetch will work or not.
Note that we need these two separate bitfields because some fetches are usable
both in request and response (eg: "hdr", "payload"). So the keyword will have
a "use" field made of a combination of several SMP_USE_* values, which will be
converted into a wider list of SMP_VAL_* flags.
The knowledge of permanent vs dynamic information has disappeared for now, as
it was never used. Later we'll probably reintroduce it differently when
dealing with variables. Its only use at the moment could have been to avoid
caching a dynamic rate measurement, but nothing is cached as of now.
2013-01-07 15:42:20 +01:00
unsigned int where ;
2013-03-25 08:12:18 +01:00
const struct acl * acl ;
2013-03-31 22:59:32 +02:00
const char * kw ;
2008-07-14 23:54:42 +02:00
if ( ! * args [ 1 ] ) {
2012-05-08 19:47:01 +02:00
if ( curpx = = defpx )
memprintf ( err , " missing argument for '%s' in defaults section " , args [ 0 ] ) ;
else
memprintf ( err , " missing argument for '%s' in %s '%s' " ,
args [ 0 ] , proxy_type_str ( curpx ) , curpx - > id ) ;
2008-07-14 23:54:42 +02:00
return - 1 ;
}
if ( ! strcmp ( args [ 1 ] , " inspect-delay " ) ) {
if ( curpx = = defpx ) {
2012-05-08 19:47:01 +02:00
memprintf ( err , " %s %s is not allowed in 'defaults' sections " ,
args [ 0 ] , args [ 1 ] ) ;
2008-07-14 23:54:42 +02:00
return - 1 ;
}
if ( ! * args [ 2 ] | | ( ptr = parse_time_err ( args [ 2 ] , & val , TIME_UNIT_MS ) ) ) {
2012-05-08 19:47:01 +02:00
memprintf ( err ,
" '%s %s' expects a positive delay in milliseconds, in %s '%s' " ,
args [ 0 ] , args [ 1 ] , proxy_type_str ( curpx ) , curpx - > id ) ;
if ( ptr )
memprintf ( err , " %s (unexpected character '%c') " , * err , * ptr ) ;
2008-07-14 23:54:42 +02:00
return - 1 ;
}
if ( curpx - > tcp_req . inspect_delay ) {
2012-05-08 19:47:01 +02:00
memprintf ( err , " ignoring %s %s (was already defined) in %s '%s' " ,
args [ 0 ] , args [ 1 ] , proxy_type_str ( curpx ) , curpx - > id ) ;
2008-07-14 23:54:42 +02:00
return 1 ;
}
curpx - > tcp_req . inspect_delay = val ;
return 0 ;
}
2011-06-24 15:11:37 +09:00
rule = calloc ( 1 , sizeof ( * rule ) ) ;
2010-08-20 13:35:41 +02:00
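if ( ! rule ) {
/* calloc() may fail; do not dereference a NULL rule below */
memprintf ( err , " out of memory while parsing '%s' " , args [ 0 ] ) ;
return - 1 ;
}
LIST_INIT ( & rule - > list ) ;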
2010-06-14 16:44:27 +02:00
arg = 1 ;
2013-01-07 15:42:20 +01:00
where = 0 ;
2010-06-14 16:44:27 +02:00
2010-08-06 15:08:45 +02:00
if ( strcmp ( args [ 1 ] , " content " ) = = 0 ) {
2010-08-03 19:34:32 +02:00
arg + + ;
2013-01-07 15:42:20 +01:00
if ( curpx - > cap & PR_CAP_FE )
where | = SMP_VAL_FE_REQ_CNT ;
if ( curpx - > cap & PR_CAP_BE )
where | = SMP_VAL_BE_REQ_CNT ;
2014-02-11 03:31:34 +01:00
if ( tcp_parse_request_rule ( args , arg , section_type , curpx , defpx , rule , err , where , file , line ) < 0 )
2010-06-14 16:44:27 +02:00
goto error ;
2008-07-14 23:54:42 +02:00
2013-03-25 08:12:18 +01:00
acl = rule - > cond ? acl_cond_conflicts ( rule - > cond , where ) : NULL ;
if ( acl ) {
if ( acl - > name & & * acl - > name )
memprintf ( err ,
" acl '%s' will never match in '%s %s' because it only involves keywords that are incompatible with '%s' " ,
acl - > name , args [ 0 ] , args [ 1 ] , sample_ckp_names ( where ) ) ;
else
memprintf ( err ,
" anonymous acl will never match in '%s %s' because it uses keyword '%s' which is incompatible with '%s' " ,
args [ 0 ] , args [ 1 ] ,
2013-03-31 22:59:32 +02:00
LIST_ELEM ( acl - > expr . n , struct acl_expr * , list ) - > kw ,
2013-03-25 08:12:18 +01:00
sample_ckp_names ( where ) ) ;
2008-07-27 22:02:32 +02:00
2013-03-25 08:12:18 +01:00
warn + + ;
}
else if ( rule - > cond & & acl_cond_kw_conflicts ( rule - > cond , where , & acl , & kw ) ) {
if ( acl - > name & & * acl - > name )
memprintf ( err ,
" acl '%s' involves keyword '%s' which is incompatible with '%s' " ,
2013-03-31 22:59:32 +02:00
acl - > name , kw , sample_ckp_names ( where ) ) ;
2013-03-25 08:12:18 +01:00
else
memprintf ( err ,
" anonymous acl involves keyword '%s' which is incompatible with '%s' " ,
2013-03-31 22:59:32 +02:00
kw , sample_ckp_names ( where ) ) ;
2008-07-27 22:02:32 +02:00
warn + + ;
}
2012-12-09 12:00:04 +01:00
2010-08-20 13:35:41 +02:00
LIST_ADDQ ( & curpx - > tcp_req . inspect_rules , & rule - > list ) ;
2010-06-14 21:04:55 +02:00
}
2010-08-06 15:08:45 +02:00
else if ( strcmp ( args [ 1 ] , " connection " ) = = 0 ) {
2010-06-14 21:04:55 +02:00
arg + + ;
2010-08-03 16:29:52 +02:00
2010-08-06 15:08:45 +02:00
if ( ! ( curpx - > cap & PR_CAP_FE ) ) {
2012-05-08 19:47:01 +02:00
memprintf ( err , " %s %s is not allowed because %s %s is not a frontend " ,
args [ 0 ] , args [ 1 ] , proxy_type_str ( curpx ) , curpx - > id ) ;
2011-07-15 13:14:06 +09:00
goto error ;
2010-08-06 15:08:45 +02:00
}
2010-08-03 16:29:52 +02:00
2013-01-07 15:42:20 +01:00
where | = SMP_VAL_FE_CON_ACC ;
2014-02-11 03:31:34 +01:00
if ( tcp_parse_request_rule ( args , arg , section_type , curpx , defpx , rule , err , where , file , line ) < 0 )
2010-08-03 16:29:52 +02:00
goto error ;
2013-03-25 08:12:18 +01:00
acl = rule - > cond ? acl_cond_conflicts ( rule - > cond , where ) : NULL ;
if ( acl ) {
if ( acl - > name & & * acl - > name )
memprintf ( err ,
" acl '%s' will never match in '%s %s' because it only involves keywords that are incompatible with '%s' " ,
acl - > name , args [ 0 ] , args [ 1 ] , sample_ckp_names ( where ) ) ;
else
memprintf ( err ,
" anonymous acl will never match in '%s %s' because it uses keyword '%s' which is incompatible with '%s' " ,
args [ 0 ] , args [ 1 ] ,
2013-03-31 22:59:32 +02:00
LIST_ELEM ( acl - > expr . n , struct acl_expr * , list ) - > kw ,
2013-03-25 08:12:18 +01:00
sample_ckp_names ( where ) ) ;
2010-06-14 21:04:55 +02:00
2013-03-25 08:12:18 +01:00
warn + + ;
}
else if ( rule - > cond & & acl_cond_kw_conflicts ( rule - > cond , where , & acl , & kw ) ) {
if ( acl - > name & & * acl - > name )
2012-05-08 19:47:01 +02:00
memprintf ( err ,
2013-03-25 08:12:18 +01:00
" acl '%s' involves keyword '%s' which is incompatible with '%s' " ,
2013-03-31 22:59:32 +02:00
acl - > name , kw , sample_ckp_names ( where ) ) ;
2013-03-25 08:12:18 +01:00
else
2012-05-08 19:47:01 +02:00
memprintf ( err ,
2013-03-25 08:12:18 +01:00
" anonymous acl involves keyword '%s' which is incompatible with '%s' " ,
2013-03-31 22:59:32 +02:00
kw , sample_ckp_names ( where ) ) ;
2010-08-06 15:08:45 +02:00
warn + + ;
}
2012-12-09 12:00:04 +01:00
2010-08-20 13:35:41 +02:00
LIST_ADDQ ( & curpx - > tcp_req . l4_rules , & rule - > list ) ;
2010-06-14 21:04:55 +02:00
}
2010-05-23 22:40:30 +02:00
else {
2012-05-08 19:47:01 +02:00
if ( curpx = = defpx )
memprintf ( err ,
" '%s' expects 'inspect-delay', 'connection', or 'content' in defaults section (got '%s') " ,
args [ 0 ] , args [ 1 ] ) ;
else
memprintf ( err ,
" '%s' expects 'inspect-delay', 'connection', or 'content' in %s '%s' (got '%s') " ,
2013-04-10 16:31:11 +02:00
args [ 0 ] , proxy_type_str ( curpx ) , curpx - > id , args [ 1 ] ) ;
2010-06-14 16:44:27 +02:00
goto error ;
2010-05-23 22:40:30 +02:00
}
return warn ;
2010-06-14 16:44:27 +02:00
error :
free ( rule ) ;
return - 1 ;
2008-07-14 23:54:42 +02:00
}
2010-05-24 20:55:15 +02:00
2012-04-23 23:13:20 +02:00
/************************************************************************/
2013-01-07 21:59:07 +01:00
/* All supported sample fetch functions must be declared here */
2012-04-23 23:13:20 +02:00
/************************************************************************/
2012-04-25 17:31:42 +02:00
/* fetch the connection's source IPv4/IPv6 address */
2010-05-24 20:55:15 +02:00
static int
2012-04-25 17:31:42 +02:00
smp_fetch_src ( struct proxy * px , struct session * l4 , void * l7 , unsigned int opt ,
2013-07-22 16:29:32 +02:00
const struct arg * args , struct sample * smp , const char * kw )
2010-05-24 20:55:15 +02:00
{
2013-10-01 10:45:07 +02:00
struct connection * cli_conn = objt_conn ( l4 - > si [ 0 ] . end ) ;
if ( ! cli_conn )
return 0 ;
switch ( cli_conn - > addr . from . ss_family ) {
2011-12-16 17:49:52 +01:00
case AF_INET :
2013-10-01 10:45:07 +02:00
smp - > data . ipv4 = ( ( struct sockaddr_in * ) & cli_conn - > addr . from ) - > sin_addr ;
2012-04-23 18:53:56 +02:00
smp - > type = SMP_T_IPV4 ;
2011-12-16 17:49:52 +01:00
break ;
case AF_INET6 :
2013-10-01 10:45:07 +02:00
smp - > data . ipv6 = ( ( struct sockaddr_in6 * ) & cli_conn - > addr . from ) - > sin6_addr ;
2012-04-23 18:53:56 +02:00
smp - > type = SMP_T_IPV6 ;
2011-12-16 17:49:52 +01:00
break ;
default :
2010-10-22 17:14:01 +02:00
return 0 ;
2011-12-16 17:49:52 +01:00
}
2010-10-22 17:14:01 +02:00
2012-04-23 16:16:37 +02:00
smp - > flags = 0 ;
2010-05-24 20:55:15 +02:00
return 1 ;
}
2011-12-16 17:06:15 +01:00
/* fetch the connection's source port as an unsigned integer */
2010-05-24 20:55:15 +02:00
static int
2012-04-25 16:21:44 +02:00
smp_fetch_sport ( struct proxy * px , struct session * l4 , void * l7 , unsigned int opt ,
2013-07-22 16:29:32 +02:00
const struct arg * args , struct sample * smp , const char * kw )
2010-05-24 20:55:15 +02:00
{
2013-10-01 10:45:07 +02:00
struct connection * cli_conn = objt_conn ( l4 - > si [ 0 ] . end ) ;
if ( ! cli_conn )
return 0 ;
2012-04-23 18:53:56 +02:00
smp - > type = SMP_T_UINT ;
2013-10-01 10:45:07 +02:00
if ( ! ( smp - > data . uint = get_host_port ( & cli_conn - > addr . from ) ) )
2010-10-22 17:14:01 +02:00
return 0 ;
2012-04-23 16:16:37 +02:00
smp - > flags = 0 ;
2010-05-24 20:55:15 +02:00
return 1 ;
}
2012-04-25 17:31:42 +02:00
/* fetch the connection's destination IPv4/IPv6 address */
2010-05-24 20:55:15 +02:00
static int
2012-04-25 17:31:42 +02:00
smp_fetch_dst ( struct proxy * px , struct session * l4 , void * l7 , unsigned int opt ,
2013-07-22 16:29:32 +02:00
const struct arg * args , struct sample * smp , const char * kw )
2010-05-24 20:55:15 +02:00
{
2013-10-01 10:45:07 +02:00
struct connection * cli_conn = objt_conn ( l4 - > si [ 0 ] . end ) ;
2010-05-24 20:55:15 +02:00
2013-10-01 10:45:07 +02:00
if ( ! cli_conn )
return 0 ;
conn_get_to_addr ( cli_conn ) ;
switch ( cli_conn - > addr . to . ss_family ) {
2011-12-16 17:49:52 +01:00
case AF_INET :
2013-10-01 10:45:07 +02:00
smp - > data . ipv4 = ( ( struct sockaddr_in * ) & cli_conn - > addr . to ) - > sin_addr ;
2012-04-23 18:53:56 +02:00
smp - > type = SMP_T_IPV4 ;
2011-12-16 17:49:52 +01:00
break ;
case AF_INET6 :
2013-10-01 10:45:07 +02:00
smp - > data . ipv6 = ( ( struct sockaddr_in6 * ) & cli_conn - > addr . to ) - > sin6_addr ;
2012-04-23 18:53:56 +02:00
smp - > type = SMP_T_IPV6 ;
2011-12-16 17:49:52 +01:00
break ;
default :
2010-10-22 17:14:01 +02:00
return 0 ;
2011-12-16 17:49:52 +01:00
}
2010-10-22 17:14:01 +02:00
2012-04-23 16:16:37 +02:00
smp - > flags = 0 ;
2010-05-24 20:55:15 +02:00
return 1 ;
}
2011-12-16 17:06:15 +01:00
/* fetch the frontend connection's destination port as an unsigned integer */
2010-05-24 20:55:15 +02:00
static int
2012-04-25 16:21:44 +02:00
smp_fetch_dport ( struct proxy * px , struct session * l4 , void * l7 , unsigned int opt ,
2013-07-22 16:29:32 +02:00
const struct arg * args , struct sample * smp , const char * kw )
2010-05-24 20:55:15 +02:00
{
2013-10-01 10:45:07 +02:00
struct connection * cli_conn = objt_conn ( l4 - > si [ 0 ] . end ) ;
if ( ! cli_conn )
return 0 ;
conn_get_to_addr ( cli_conn ) ;
2010-05-24 20:55:15 +02:00
2012-04-23 18:53:56 +02:00
smp - > type = SMP_T_UINT ;
2013-10-01 10:45:07 +02:00
if ( ! ( smp - > data . uint = get_host_port ( & cli_conn - > addr . to ) ) )
2010-10-22 17:14:01 +02:00
return 0 ;
2012-04-23 16:16:37 +02:00
smp - > flags = 0 ;
2010-05-24 20:55:15 +02:00
return 1 ;
}
2012-11-24 11:55:28 +01:00
# ifdef IPV6_V6ONLY
2012-11-24 15:07:23 +01:00
/* parse the "v4v6" bind keyword */
static int bind_parse_v4v6 ( char * * args , int cur_arg , struct proxy * px , struct bind_conf * conf , char * * err )
{
struct listener * l ;
list_for_each_entry ( l , & conf - > listeners , by_bind ) {
if ( l - > addr . ss_family = = AF_INET6 )
l - > options | = LI_O_V4V6 ;
}
return 0 ;
}
2012-11-24 11:55:28 +01:00
/* parse the "v6only" bind keyword */
static int bind_parse_v6only ( char * * args , int cur_arg , struct proxy * px , struct bind_conf * conf , char * * err )
{
struct listener * l ;
list_for_each_entry ( l , & conf - > listeners , by_bind ) {
if ( l - > addr . ss_family = = AF_INET6 )
l - > options | = LI_O_V6ONLY ;
}
return 0 ;
}
# endif
2013-05-08 22:49:23 +02:00
# ifdef CONFIG_HAP_TRANSPARENT
2012-09-12 23:27:21 +02:00
/* parse the "transparent" bind keyword */
2012-09-20 16:48:07 +02:00
static int bind_parse_transparent ( char * * args , int cur_arg , struct proxy * px , struct bind_conf * conf , char * * err )
2012-09-12 23:27:21 +02:00
{
struct listener * l ;
2012-09-20 16:48:07 +02:00
list_for_each_entry ( l , & conf - > listeners , by_bind ) {
if ( l - > addr . ss_family = = AF_INET | | l - > addr . ss_family = = AF_INET6 )
l - > options | = LI_O_FOREIGN ;
2012-09-12 23:27:21 +02:00
}
return 0 ;
}
# endif
# ifdef TCP_DEFER_ACCEPT
/* parse the "defer-accept" bind keyword */
2012-09-20 16:48:07 +02:00
static int bind_parse_defer_accept ( char * * args , int cur_arg , struct proxy * px , struct bind_conf * conf , char * * err )
2012-09-12 23:27:21 +02:00
{
struct listener * l ;
2012-09-20 16:48:07 +02:00
list_for_each_entry ( l , & conf - > listeners , by_bind ) {
if ( l - > addr . ss_family = = AF_INET | | l - > addr . ss_family = = AF_INET6 )
l - > options | = LI_O_DEF_ACCEPT ;
2012-09-12 23:27:21 +02:00
}
return 0 ;
}
# endif
2012-10-05 16:21:00 +02:00
# ifdef TCP_FASTOPEN
2013-02-13 23:35:39 +01:00
/* parse the "tfo" bind keyword */
2012-10-05 16:21:00 +02:00
static int bind_parse_tfo ( char * * args , int cur_arg , struct proxy * px , struct bind_conf * conf , char * * err )
{
struct listener * l ;
list_for_each_entry ( l , & conf - > listeners , by_bind ) {
if ( l - > addr . ss_family = = AF_INET | | l - > addr . ss_family = = AF_INET6 )
l - > options | = LI_O_TCP_FO ;
}
return 0 ;
}
# endif
2012-09-12 23:27:21 +02:00
# ifdef TCP_MAXSEG
/* parse the "mss" bind keyword */
2012-09-20 16:48:07 +02:00
static int bind_parse_mss ( char * * args , int cur_arg , struct proxy * px , struct bind_conf * conf , char * * err )
2012-09-12 23:27:21 +02:00
{
struct listener * l ;
int mss ;
if ( ! * args [ cur_arg + 1 ] ) {
2012-09-20 19:43:14 +02:00
memprintf ( err , " '%s' : missing MSS value " , args [ cur_arg ] ) ;
2012-09-12 23:27:21 +02:00
return ERR_ALERT | ERR_FATAL ;
}
mss = atoi ( args [ cur_arg + 1 ] ) ;
if ( ! mss | | abs ( mss ) > 65535 ) {
2012-09-20 19:43:14 +02:00
memprintf ( err , " '%s' : expects an MSS with an absolute value between 1 and 65535 " , args [ cur_arg ] ) ;
2012-09-12 23:27:21 +02:00
return ERR_ALERT | ERR_FATAL ;
}
2012-09-20 16:48:07 +02:00
list_for_each_entry ( l , & conf - > listeners , by_bind ) {
if ( l - > addr . ss_family = = AF_INET | | l - > addr . ss_family = = AF_INET6 )
l - > maxseg = mss ;
}
2012-09-12 23:27:21 +02:00
return 0 ;
}
# endif
# ifdef SO_BINDTODEVICE
/* parse the "interface" bind keyword */
2012-09-20 16:48:07 +02:00
static int bind_parse_interface ( char * * args , int cur_arg , struct proxy * px , struct bind_conf * conf , char * * err )
2012-09-12 23:27:21 +02:00
{
struct listener * l ;
if ( ! * args [ cur_arg + 1 ] ) {
2012-09-20 19:43:14 +02:00
memprintf ( err , " '%s' : missing interface name " , args [ cur_arg ] ) ;
2012-09-12 23:27:21 +02:00
return ERR_ALERT | ERR_FATAL ;
}
2012-09-20 16:48:07 +02:00
list_for_each_entry ( l , & conf - > listeners , by_bind ) {
if ( l - > addr . ss_family = = AF_INET | | l - > addr . ss_family = = AF_INET6 )
l - > interface = strdup ( args [ cur_arg + 1 ] ) ;
}
2012-09-12 23:27:21 +02:00
global . last_checks | = LSTCHK_NETADM ;
return 0 ;
}
# endif
2013-06-21 23:16:39 +02:00
static struct cfg_kw_list cfg_kws = { ILH , {
2013-01-07 21:59:07 +01:00
{ CFG_LISTEN , " tcp-request " , tcp_parse_tcp_req } ,
2010-09-23 17:56:44 +02:00
{ CFG_LISTEN , " tcp-response " , tcp_parse_tcp_rep } ,
2008-07-14 23:54:42 +02:00
{ 0 , NULL , NULL } ,
} } ;
2013-01-07 21:59:07 +01:00
2012-04-19 18:42:05 +02:00
/* Note: must not be declared <const> as its list will be overwritten.
* Please take care of keeping this list alphabetically sorted .
*/
2013-06-21 23:16:39 +02:00
static struct acl_kw_list acl_kws = { ILH , {
2013-01-07 21:59:07 +01:00
{ /* END */ } ,
2008-07-14 23:54:42 +02:00
} } ;
2013-01-07 21:59:07 +01:00
2012-04-25 17:31:42 +02:00
/* Note: must not be declared <const> as its list will be overwritten.
 * Note: fetches that may return multiple types must be declared as the lowest
 * common denominator, the type that can be cast into all other ones. For
 * instance, a fetch returning either IPv4 or IPv6 must be declared as IPv4.
 */
2013-06-21 23:16:39 +02:00
static struct sample_fetch_kw_list sample_fetch_keywords = { ILH , {
2013-01-07 21:59:07 +01:00
{ " dst " , smp_fetch_dst , 0 , NULL , SMP_T_IPV4 , SMP_USE_L4CLI } ,
{ " dst_port " , smp_fetch_dport , 0 , NULL , SMP_T_UINT , SMP_USE_L4CLI } ,
{ " src " , smp_fetch_src , 0 , NULL , SMP_T_IPV4 , SMP_USE_L4CLI } ,
{ " src_port " , smp_fetch_sport , 0 , NULL , SMP_T_UINT , SMP_USE_L4CLI } ,
{ /* END */ } ,
2010-05-24 20:55:15 +02:00
} } ;
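/* Usage sketch (illustrative only, not taken from this file): the four fetches
 * registered above are referenced by keyword from the configuration. The
 * frontend name, ACL names, ports and networks below are hypothetical.
 *
 *     frontend fe_example
 *         bind :8080,:8443
 *         acl from_lan  src      10.0.0.0/8
 *         acl to_admin  dst_port 8443
 *         tcp-request connection reject if to_admin !from_lan
 *
 * Note that smp_fetch_sport() and smp_fetch_dport() return 0 (no sample) when
 * the port reported by get_host_port() is zero.
 */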
2012-09-12 23:27:21 +02:00
/************************************************************************/
/* All supported bind keywords must be declared here. */
/************************************************************************/
/* Note: must not be declared <const> as its list will be overwritten.
* Please take care of keeping this list alphabetically sorted , doing so helps
* all code contributors .
* Optional keywords are also declared with a NULL - > parse ( ) function so that
* the config parser can report an appropriate error when a known keyword was
* not enabled .
*/
2012-09-18 18:24:39 +02:00
static struct bind_kw_list bind_kws = { " TCP " , { } , {
2012-09-12 23:27:21 +02:00
# ifdef TCP_DEFER_ACCEPT
{ " defer-accept " , bind_parse_defer_accept , 0 } , /* wait for some data for 1 second max before doing accept */
# endif
# ifdef SO_BINDTODEVICE
{ " interface " , bind_parse_interface , 1 } , /* specifically bind to this interface */
# endif
# ifdef TCP_MAXSEG
{ " mss " , bind_parse_mss , 1 } , /* set MSS of listening socket */
# endif
2012-10-05 16:21:00 +02:00
# ifdef TCP_FASTOPEN
{ " tfo " , bind_parse_tfo , 0 } , /* enable TCP_FASTOPEN of listening socket */
# endif
2013-05-08 22:49:23 +02:00
# ifdef CONFIG_HAP_TRANSPARENT
2012-09-12 23:27:21 +02:00
{ " transparent " , bind_parse_transparent , 0 } , /* transparently bind to the specified addresses */
2012-11-24 11:55:28 +01:00
# endif
# ifdef IPV6_V6ONLY
2012-11-24 15:07:23 +01:00
{ " v4v6 " , bind_parse_v4v6 , 0 } , /* force socket to bind to IPv4+IPv6 */
2012-11-24 11:55:28 +01:00
{ " v6only " , bind_parse_v6only , 0 } , /* force socket to bind to IPv6 only */
2012-09-12 23:27:21 +02:00
# endif
/* the versions with the NULL parse function, so that a known but disabled keyword can be reported */
{ " defer-accept " , NULL , 0 } ,
{ " interface " , NULL , 1 } ,
{ " mss " , NULL , 1 } ,
{ " transparent " , NULL , 0 } ,
2012-11-24 15:07:23 +01:00
{ " v4v6 " , NULL , 0 } ,
2012-11-24 11:55:28 +01:00
{ " v6only " , NULL , 0 } ,
2012-09-12 23:27:21 +02:00
{ NULL , NULL , 0 } ,
} } ;
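/* Usage sketch (illustrative only): bind lines combining some of the keywords
 * declared above. The addresses, the interface name and the MSS value are
 * hypothetical. A keyword is only parsed when the matching option was
 * available at build time; for the keywords also listed with a NULL parse
 * function, the config parser can then report them as known but not enabled.
 *
 *     bind :::443 v4v6 mss 1400 defer-accept
 *     bind 192.0.2.10:80 interface eth0 transparent
 */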
2007-10-29 01:09:36 +01:00
__attribute__ ( ( constructor ) )
static void __tcp_protocol_init ( void )
{
protocol_register ( & proto_tcpv4 ) ;
protocol_register ( & proto_tcpv6 ) ;
2012-04-27 21:37:17 +02:00
sample_register_fetches ( & sample_fetch_keywords ) ;
2008-07-14 23:54:42 +02:00
cfg_register_keywords ( & cfg_kws ) ;
acl_register_keywords ( & acl_kws ) ;
2012-09-12 23:27:21 +02:00
bind_register_keywords ( & bind_kws ) ;
2007-10-29 01:09:36 +01:00
}
/*
* Local variables :
* c - indent - level : 8
* c - basic - offset : 8
* End :
*/