From 985fc567347a0bb9db742017f087489f3f876079 Mon Sep 17 00:00:00 2001
From: Willy Tarreau
Date: Sun, 1 Apr 2007 09:44:10 +0200
Subject: [PATCH] [DOC] added some docs about http headers storage and acls

---
 doc/design-thoughts/config-language.txt | 145 ++++++++++++++++++++++++
 doc/internals/header-tree.txt           | 124 ++++++++++++++++++++
 doc/internals/http-docs.txt             |   5 +
 3 files changed, 274 insertions(+)
 create mode 100644 doc/internals/header-tree.txt
 create mode 100644 doc/internals/http-docs.txt

diff --git a/doc/design-thoughts/config-language.txt b/doc/design-thoughts/config-language.txt
index 539c7a5c3..510ada68e 100644
--- a/doc/design-thoughts/config-language.txt
+++ b/doc/design-thoughts/config-language.txt
@@ -115,3 +115,148 @@ Sinon, peut-
 req in switch URI =^ "/images/" images:"/"
+
+2007/03/31 - More precise requirements.
+
+1) no branching or other extension inside "listen" sections, it is too complex.
+
+Distinguish incoming (in) data from outgoing (out) data.
+
+The frontend only sees incoming requests and outgoing responses.
+The backend sees in/out requests and in/out responses.
+The frontend allows branching from one set of request filters to another.
+The frontend and the sets of request filters can branch to a
+backend.
+
+-----------+--------+----------+----------+---------+----------+
+ \   Where |        |          |          |         |          |
+  \______  | Listen | Frontend | ReqRules | Backend | RspRules |
+          \|        |          |          |         |          |
+Capability |        |          |          |         |          |
+-----------+--------+----------+----------+---------+----------+
+Frontend   |   X    |    X     |          |         |          |
+-----------+--------+----------+----------+---------+----------+
+FiltReqIn  |   X    |    X     |    X     |    X    |          |
+-----------+--------+----------+----------+---------+----------+
+JumpFiltReq|   X    |    X     |    X     |         |          | \
+-----------+--------+----------+----------+---------+----------+  > = ReqJump
+SetBackend |   X    |    X     |    X     |         |          | /
+-----------+--------+----------+----------+---------+----------+
+FiltReqOut |        |          |          |    X    |          |
+-----------+--------+----------+----------+---------+----------+
+FiltRspIn  |   X    |          |          |    X    |    X     |
+-----------+--------+----------+----------+---------+----------+
+JumpFiltRsp|        |          |          |    X    |    X     |
+-----------+--------+----------+----------+---------+----------+
+FiltRspOut |        |    X     |          |    X    |    X     |
+-----------+--------+----------+----------+---------+----------+
+Backend    |   X    |          |          |    X    |          |
+-----------+--------+----------+----------+---------+----------+
+
+Conclusion
+----------
+
+At least 8 basic capabilities need to be distinguished:
+ - ability to receive connections (frontend)
+ - ability to filter incoming requests
+ - ability to branch to a backend or to a set of request rules
+ - ability to filter outgoing requests
+ - ability to filter incoming responses
+ - ability to branch to another set of response rules
+ - ability to filter the outgoing response
+ - ability to manage servers (backend)
+
+Remark
+------
+ - we often need to apply a small processing to a host/uri/other set. This
+   small processing may consist of a few filters as well as a rewrite of the
+   (host,uri) pair.
+
+
+Proposal: ACL
+
+Syntax:
+-------
+
+    acl ...
+
+This will create an acl referenced by its name, which will match if applying
+the operator with at least one of the values
+to the subject matches.
+
+Operators:
+----------
+
+Always 2 characters:
+
+    [=!][~=*^%/.]
+
+First character:
+    '=' : OK if the test succeeds
+    '!' : OK if the test fails.
+
+Second character:
+    '~' : compares against a regex
+    '=' : compares string to string
+    '*' : compares the end of the string (eg: =* ".mydomain.com")
+    '^' : compares the beginning of the string (eg: =^ "/images/")
+    '%' : looks for a substring
+    '/' : compares against a whole word, accepting '/' as a delimiter
+    '.' : compares against a whole word, accepting '.' as a delimiter
+
+An action is then executed conditionally if all of the mentioned ACLs are
+validated (or invalidated for those preceded by a "!"):
+
+    on [!] ...
+
+
+Example:
+--------
+
+    acl www_pub host =. www www01 dev preprod
+    acl imghost host =. images
+    acl imgdir uri =/ img
+    acl imagedir uri =/ images
+    acl msie h(user-agent) =% "MSIE"
+
+    set_host "images" on www_pub imgdir
+    remap_uri "/img" "/" on www_pub imgdir
+    remap_uri "/images" "/" on www_pub imagedir
+    setbe images on imghost
+    reqdel "Cookie" on all
+
+
+
+Possible actions:
+
+    req {in|out} {append|delete|rem|add|set|rep|mapuri|rewrite|reqline|deny|allow|setbe|tarpit}
+    resp {in|out} {append|delete|rem|add|set|rep|maploc|rewrite|stsline|deny|allow}
+
+    req in append
+    req in delete
+    req in rem
+    req in add
+    req in set
+    req in rep
+    req in mapuri
+    req in rewrite
+    req in reqline
+    req in deny
+    req in allow
+    req in tarpit
+    req in setbe
+
+    resp out maploc
+    resp out stsline
+
+Strings must be delimited by the same character at the beginning and at the
+end, which must be escaped if it is present in the string. Everything found
+between the ending character and the first spaces is considered as options
+passed to the processing. For example:
+
+    req in rep host /www/i /www/
+    req in rep connection /keep-alive/i "close"
+
+It would be convenient to be able to perform a remap at the same time as a setbe.
+
+Captures: split them into in/out. Make them conditional?
diff --git a/doc/internals/header-tree.txt b/doc/internals/header-tree.txt
new file mode 100644
index 000000000..9a9736129
--- /dev/null
+++ b/doc/internals/header-tree.txt
@@ -0,0 +1,124 @@
+2007/03/30 - Header storage in trees
+
+This documentation describes how to store headers in radix trees, providing
+fast access to any known position, while retaining the ability to grow/reduce
+any arbitrary header without having to recompute all positions.
+
+Principle :
+  We have a radix tree represented in an integer array, in which each entry
+  holds the total number of bytes used by all headers whose position is below
+  it. This ensures that we can compute any header's position in O(log(N))
+  where N is the number of headers.
+
+Example with N=16 :
+   +-----------------------+
+   |                       |
+   +-----------+           +-----------+
+   |           |           |           |
+   +-----+     +-----+     +-----+     +-----+
+   |     |     |     |     |     |     |     |
+   +--+  +--+  +--+  +--+  +--+  +--+  +--+  +--+
+   |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |
+
+   0  1  2  3  4  5  6  7  8  9  A  B  C  D  E  F
+
+  To reach header 6, we have to compute hdr[0]+hdr[4]+hdr[6]
+
+  With this method, it becomes easy to grow any header and update the array.
+  To achieve this, for each bit position starting from the right, we replace
+  that bit and all bits on its right with one 1 followed by zeroes. We update
+  the resulting entry if its index is higher than the current position, and
+  we stop as soon as the index reaches the number of stored headers.
+
+  For instance, if we want to grow hdr[6], we proceed like this :
+
+    6 = 0110 (BIN)
+
+  Let's consider the values to update :
+
+    (bit 0) : (0110 & ~0001) | 0001 = 0111 = 7 >  6 => update
+    (bit 1) : (0110 & ~0011) | 0010 = 0110 = 6 <= 6 => leave it
+    (bit 2) : (0110 & ~0111) | 0100 = 0100 = 4 <= 6 => leave it
+    (bit 3) : (0110 & ~1111) | 1000 = 1000 = 8 >  6 => update
+    (bit 4) : 10000 = 16, beyond the end of the array, stop.
+
+
+It's easy to walk through the tree too. We only have one iteration per bit
+changing from X to the ancestor, and one per bit from the ancestor to Y.
+The ancestor is found while walking. To go from X to Y :
+
+   pos = pos(X)
+
+   while (Y != X) {
+      if (Y > X) {
+         // walk from Y to ancestor
+         pos += hdr[Y]
+         Y &= (Y - 1)
+      } else {
+         // walk from X to ancestor
+         pos -= hdr[X]
+         X &= (X - 1)
+      }
+   }
+
+However, it is not trivial anymore to linearly walk the tree. We have to move
+from a known place to another known place, but a jump to next entry costs the
+same as a jump to a random place.
+
+Other caveats :
+  - it is not possible to remove a header, it is only possible to empty it.
+  - it is not possible to insert a header, as that would imply a renumbering.
+  => this means that a "defrag" function is required. Headers should preferably
+     be added, then should be stuffed on top of destroyed ones, then only
+     inserted if absolutely required.
+
+
+When we have this, we can then focus on a 32-bit header descriptor which would
+look like this :
+
+{
+    unsigned line_len :13;  /* total line length, including CRLF */
+    unsigned name_len :6;   /* header name length, max 63 chars */
+    unsigned sp1      :5;   /* max spaces before value : 31 */
+    unsigned sp2      :8;   /* max spaces after value : 255 */
+}
+
+Example :
+
+  Connection:      close           \r\n
+  <---------+-----+-----+-------------> line_len
+  <-------->|     |     |               name_len
+            <----->     |               sp1
+                        <-------------> sp2
+Rem:
+  - if there are more than 31 spaces before the value, the buffer will have to
+    be moved before being registered
+
+  - if there are more than 255 spaces after the value, the buffer will have to
+    be moved before being registered
+
+  - we can use the empty header name as an indicator for a deleted header
+
+  - it would be wise to format a new request before sending lots of random
+    spaces to the servers.
+
+  - normal clients do not send such crap, so those operations *may* reasonably
+    be more expensive than the rest provided that other ones are very fast.
+
+It would be handy to have the following macros :
+
+  hdr_eon(hdr)  => end of name
+  hdr_sov(hdr)  => start of value
+  hdr_eof(hdr)  => end of value
+  hdr_vlen(hdr) => length of value
+  hdr_hlen(hdr) => total header length
+
+
+A 48-bit encoding would look like this :
+
+  Connection:      close           \r\n
+  <---------+------+---+--------------> eoh = 16 bits
+  <-------->|      |   |                eon = 8 bits
+  <--------------->|   |                sov = 8 bits
+                   <--->                vlen = 16 bits
+
diff --git a/doc/internals/http-docs.txt b/doc/internals/http-docs.txt
new file mode 100644
index 000000000..4ed24806d
--- /dev/null
+++ b/doc/internals/http-docs.txt
@@ -0,0 +1,5 @@
+Many interesting RFC and drafts linked to from this site :
+
+  http://www.web-cache.com/Writings/protocols-standards.html
+
+
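The lookup, grow and walk operations described in doc/internals/header-tree.txt can be sketched in C as follows. This is only an illustration, not code from the patch. It assumes that tree[i] holds the cumulated length of headers (i & (i - 1)) through (i - 1), and that tree[0] holds the offset of the first header. The names hdr_pos, hdr_grow and hdr_dist are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed layout (not stated in the patch): tree[i] holds the total length
 * of headers [i & (i - 1), i - 1], and tree[0] the offset of header 0.
 */

/* Start offset of header <i>: add the entries met while clearing the lowest
 * set bit of <i> until 0 is reached (6 -> 4 -> 0 gives hdr[6]+hdr[4]+hdr[0]).
 */
static unsigned int hdr_pos(const unsigned int *tree, size_t i)
{
	unsigned int pos = tree[0];

	while (i) {
		pos += tree[i];
		i &= i - 1;
	}
	return pos;
}

/* Grow header <i> by <delta> bytes (shrink it if <delta> is negative) in a
 * tree of <n> entries. For each bit, the bit and everything on its right are
 * replaced with a 1 followed by zeroes; the resulting entry is updated only
 * when its index is above <i>.
 */
static void hdr_grow(unsigned int *tree, size_t n, size_t i, int delta)
{
	size_t bit, j;

	assert(i < n);
	for (bit = 1; bit && bit < n; bit <<= 1) {
		j = (i & ~((bit << 1) - 1)) | bit;
		if (j >= n)
			break;
		if (j > i)
			tree[j] += (unsigned int)delta;
	}
}

/* Signed distance in bytes from header <x> to header <y>, walking both
 * indexes up to their common ancestor as in the pseudo-code of the document.
 */
static long hdr_dist(const unsigned int *tree, size_t x, size_t y)
{
	long dist = 0;

	while (y != x) {
		if (y > x) {
			dist += (long)tree[y];
			y &= y - 1;
		} else {
			dist -= (long)tree[x];
			x &= x - 1;
		}
	}
	return dist;
}
```

With a zero-initialized array of 16 entries, calling hdr_grow() once per header with its length builds the tree, after which hdr_pos(tree, 6) reads exactly tree[0] + tree[4] + tree[6]. Growing header 6 only touches entries 7 and 8, which matches the worked example in the document.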