[DOC] added some docs about http headers storage and acls

2007-04-01 09:44:10 +02:00
commit 985fc56734
parent 422505801f
3 changed files with 274 additions and 0 deletions
--- a/doc/design-thoughts/config-language.txt
+++ b/doc/design-thoughts/config-language.txt
@ -115,3 +115,148 @@ Sinon, peut-
        req in  switch   URI     =^ "/images/" images:"/"
 2007/03/31 - More precise requirements.
 1) no branching or other extensions in the "listen" sections, it is too complex.
 Distinguish incoming (in) data from outgoing (out) data.
 The frontend only sees incoming requests and outgoing responses.
 The backend sees requests in/out and responses in/out.
 The frontend can branch from one set of request filters to others. Both the
 frontend and the sets of request filters can branch to a backend.
 -----------+--------+----------+----------+---------+----------+
  \  Where |        |          |          |         |          |
   \______ | Listen | Frontend | ReqRules | Backend | RspRules |
          \|        |          |          |         |          |
 Capability |        |          |          |         |          |
 -----------+--------+----------+----------+---------+----------+
 Frontend   |    X   |     X    |          |         |          |
 -----------+--------+----------+----------+---------+----------+
 FiltReqIn  |    X   |     X    |     X    |    X    |          |
 -----------+--------+----------+----------+---------+----------+
 JumpFiltReq|    X   |     X    |     X    |         |          | \
 -----------+--------+----------+----------+---------+----------+  > = ReqJump
 SetBackend |    X   |     X    |     X    |         |          | /
 -----------+--------+----------+----------+---------+----------+
 FiltReqOut |        |          |          |    X    |          |
 -----------+--------+----------+----------+---------+----------+
 FiltRspIn  |    X   |          |          |    X    |     X    |
 -----------+--------+----------+----------+---------+----------+
 JumpFiltRsp|        |          |          |    X    |     X    |
 -----------+--------+----------+----------+---------+----------+
 FiltRspOut |        |     X    |          |    X    |     X    |
 -----------+--------+----------+----------+---------+----------+
 Backend    |    X   |          |          |    X    |          |
 -----------+--------+----------+----------+---------+----------+
 In conclusion
 -------------
 At least 8 basic features need to be distinguished :
 - ability to receive connections (frontend)
 - ability to filter incoming requests
 - ability to branch to a backend or to a set of request rules
 - ability to filter outgoing requests
 - ability to filter incoming responses
 - ability to branch to another set of response rules
 - ability to filter the outgoing response
 - ability to manage servers (backend)
 Remark
 ------
 - we often need to apply a small amount of processing to a host/uri/other
   set. This processing may consist of a few filters as well as a rewrite of
   the (host,uri) pair.
 Proposal : ACL
 Syntax :
 --------
   acl <name> <what> <operator> <value> ...
 This creates an ACL referenced by the name <name>, which matches if applying
 the operator <operator> to the subject <what> succeeds for at least one of
 the values <value>.
 Operators :
 -----------
 Always 2 characters :
  [=!][~=*^%/.]
 First character :
   '=' : OK if the test succeeds
   '!' : OK if the test fails.
 Second character :
   '~' : matches against a regex
   '=' : compares string to string
   '*' : compares the end of the string (ex: =* ".mydomain.com")
   '^' : compares the beginning of the string (ex: =^ "/images/")
   '%' : searches for a substring
   '/' : compares with a whole word, accepting '/' as a delimiter
   '.' : compares with a whole word, accepting '.' as a delimiter
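The operator list above can be sketched in C as follows. This is only an illustration, not an existing implementation: the names `acl_test` and `match_word` are hypothetical, the '~' (regex) operator is left out, and the "whole word" semantics of '/' and '.' (a word bounded by the delimiter or by either end of the string) is an assumed interpretation of the notes.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: true if `word` occurs in `s` as a whole word,
 * bounded by `delim` or by either end of the string (assumed semantics). */
static bool match_word(const char *s, const char *word, char delim)
{
    size_t wlen = strlen(word);
    const char *p = s;

    if (wlen == 0)
        return false;
    while ((p = strstr(p, word)) != NULL) {
        bool left_ok  = (p == s) || (p[-1] == delim);
        bool right_ok = (p[wlen] == '\0') || (p[wlen] == delim);

        if (left_ok && right_ok)
            return true;
        p++;
    }
    return false;
}

/* Hypothetical evaluator for one 2-character operator such as "=^" or "!%".
 * Returns 1 if the ACL test is OK, 0 if not, -1 for an unsupported operator. */
int acl_test(const char *op, const char *subject, const char *value)
{
    size_t slen = strlen(subject), vlen = strlen(value);
    bool ok;

    if (strlen(op) != 2 || (op[0] != '=' && op[0] != '!'))
        return -1;
    switch (op[1]) {
    case '=': ok = strcmp(subject, value) == 0; break;
    case '^': ok = strncmp(subject, value, vlen) == 0; break;
    case '*': ok = slen >= vlen &&
                   strcmp(subject + slen - vlen, value) == 0; break;
    case '%': ok = strstr(subject, value) != NULL; break;
    case '/': ok = match_word(subject, value, '/'); break;
    case '.': ok = match_word(subject, value, '.'); break;
    default:  return -1;   /* '~' (regex) is not covered by this sketch */
    }
    return (op[0] == '=') ? ok : !ok;
}
```

An ACL carrying several values would call `acl_test` once per value and match as soon as one call returns 1.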
 An action is then executed conditionally if all of the mentioned ACLs match
 (or do not match, for those preceded by a "!") :
   <what> <where> <action> on [!]<aclname> ...
 Example :
 ---------
   acl www_pub host =. www www01 dev preprod
   acl imghost host =. images
   acl imgdir   uri =/ img
   acl imagedir uri =/ images
   acl msie h(user-agent) =% "MSIE"
   set_host  "images"       on www_pub imgdir 
   remap_uri "/img"    "/"  on www_pub imgdir
   remap_uri "/images" "/"  on www_pub imagedir
   setbe images             on imghost
   reqdel "Cookie"          on all
 Possible actions :
   req  {in|out} {append|delete|rem|add|set|rep|mapuri|rewrite|reqline|deny|allow|setbe|tarpit}
   resp {in|out} {append|delete|rem|add|set|rep|maploc|rewrite|stsline|deny|allow}
   req in append <line>
   req in delete <line_regex>
   req in rem <header>
   req in add <header> <new_value>
   req in set <header> <new_value>
   req in rep <header> <old_value> <new_value>
   req in mapuri  <old_uri_prefix> <new_uri_prefix>
   req in rewrite <old_uri_regex>  <new_uri>
   req in reqline <old_req_regex>  <new_req>
   req in deny
   req in allow
   req in tarpit
   req in setbe <backend>
   resp out maploc <old_location_prefix> <new_loc_prefix>
   resp out stsline <old_sts_regex> <new_sts_regex>
 Strings must be delimited by the same character at the beginning and at the
 end, and this character must be escaped if it appears inside the string.
 Everything found between the closing character and the first following space
 is considered as options passed to the processing. For example :
   req in rep host /www/i /www/
   req in rep connection /keep-alive/i "close"
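A minimal sketch of how such a delimited token could be split into its string and its options. The function name `parse_delimited` is hypothetical, and the use of a backslash as the escape character is an assumption, since the notes do not say how escaping is written.

```c
#include <stddef.h>

/* Parses one delimited token starting at `in` (e.g. /www/i or "close").
 * Copies the unescaped string to `str` and the trailing option characters
 * to `opts`. Returns the number of input characters consumed, or -1 on
 * error (missing closing delimiter or output buffer too small).
 * Assumption: a backslash escapes the next character. */
int parse_delimited(const char *in, char *str, size_t str_sz,
                    char *opts, size_t opts_sz)
{
    char delim = in[0];
    size_t i = 1, n = 0, o = 0;

    if (delim == '\0' || str_sz == 0 || opts_sz == 0)
        return -1;
    while (in[i] != delim) {
        if (in[i] == '\\' && in[i + 1] != '\0')
            i++;                        /* keep the escaped character as-is */
        if (in[i] == '\0' || n + 1 >= str_sz)
            return -1;
        str[n++] = in[i++];
    }
    str[n] = '\0';
    i++;                                /* skip the closing delimiter */
    while (in[i] != '\0' && in[i] != ' ' && in[i] != '\t') {
        if (o + 1 >= opts_sz)
            return -1;
        opts[o++] = in[i++];
    }
    opts[o] = '\0';
    return (int)i;
}
```

With the first example above, `/www/i` would yield the string `www` and the option `i`, while `"close"` yields `close` with no option.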
 It would be convenient to be able to perform a remap at the same time as a
 setbe.
 Captures: split them into in/out. Make them conditional ?
--- a/doc/internals/header-tree.txt
+++ b/doc/internals/header-tree.txt
@ -0,0 +1,124 @@
 2007/03/30 - Header storage in trees
 This documentation describes how to store headers in radix trees, providing
 fast access to any known position, while retaining the ability to grow/reduce
 any arbitrary header without having to recompute all positions.
 Principle :
  We have a radix tree stored in an integer array. Each entry holds the total
  number of bytes used by the headers located between its parent entry (its
  index with the lowest set bit cleared) and itself. This ensures that we can
  compute any header's position in O(log(N)) where N is the number of headers.
 Example with N=16 :
   +-----------------------+
   |                       |
   +-----------+           +-----------+
   |           |           |           |
   +-----+     +-----+     +-----+     +-----+
   |     |     |     |     |     |     |     |
   +--+  +--+  +--+  +--+  +--+  +--+  +--+  +--+
   |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |
   0  1  2  3  4  5  6  7  8  9  A  B  C  D  E  F
   To reach header 6, we have to compute hdr[0]+hdr[4]+hdr[6]
   With this method, it becomes easy to grow any header and update the array.
   To achieve this, for each bit starting from the right, we replace that bit
   and all bits to its right with a 1 followed by zeroes. The resulting entry
   is updated if its index is higher than the current position, and we stop
   once the index exceeds the number of stored headers.
   For instance, if we want to grow hdr[6], we proceed like this :
   6 = 0110 (BIN)
   Let's consider the values to update :
   (bit 0) : (0110 & ~0001) | 0001 = 0111 = 7 >  6 => update
   (bit 1) : (0110 & ~0011) | 0010 = 0110 = 6 <= 6 => leave it
   (bit 2) : (0110 & ~0111) | 0100 = 0100 = 4 <= 6 => leave it
   (bit 3) : (0110 & ~1111) | 1000 = 1000 = 8 >  6 => update
   (bit 4) : larger than array size, stop.
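The lookup and the update described above might look like this in C. It is a sketch under stated assumptions: the names `hdr_pos`, `hdr_resize` and `NB_HDR` are hypothetical, the array holds plain `int` byte counts, and the tree size is a power of two as in the N=16 example.

```c
#define NB_HDR 16   /* tree size, assumed to be a power of two */

/* Offset of header `idx`: sum the entries met while clearing the lowest
 * set bit of the index, e.g. 6 -> 4 -> 0 gives hdr[0]+hdr[4]+hdr[6]. */
int hdr_pos(const int *hdr, unsigned idx)
{
    int pos = hdr[0];

    while (idx) {
        pos += hdr[idx];
        idx &= idx - 1;
    }
    return pos;
}

/* Header `idx` changed size by `delta` bytes. For each bit, build the
 * candidate index (that bit set, lower bits cleared) and update it only
 * if it lies above `idx`. */
void hdr_resize(int *hdr, unsigned idx, int delta)
{
    unsigned bit, cand;

    for (bit = 1; bit < NB_HDR; bit <<= 1) {
        cand = (idx & ~((bit << 1) - 1)) | bit;
        if (cand > idx)
            hdr[cand] += delta;
    }
}
```

Starting from a zeroed array, calling `hdr_resize(hdr, i, len)` once per header is one way to populate the tree, since adding a header of `len` bytes is the same as growing an empty one.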
 It's easy to walk through the tree too. We only have one iteration per bit
 changing from X to the ancestor, and one per bit from the ancestor to Y.
 The ancestor is found while walking. To go from X to Y :
   pos = pos(X)
   while (Y != X) {
     if (Y > X) {
       // walk from Y to ancestor
       pos += hdr[Y]
       Y &= (Y - 1)
     } else {
       // walk from X to ancestor
       pos -= hdr[X]
       X &= (X - 1)
     }
   }
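The same walk, written as a compilable C function. The name `hdr_walk` and the `int` array type are assumptions made for this sketch. Unlike the pseudocode, X and Y are passed by value, so the caller's indexes are left untouched.

```c
/* Returns the offset of header `y`, given that header `x` is known to sit
 * at offset `pos_x`. Each iteration clears the lowest set bit of the larger
 * index until both meet at their common ancestor. */
int hdr_walk(const int *hdr, unsigned x, int pos_x, unsigned y)
{
    int pos = pos_x;

    while (y != x) {
        if (y > x) {
            /* walk from Y to the ancestor */
            pos += hdr[y];
            y &= y - 1;
        } else {
            /* walk from X to the ancestor */
            pos -= hdr[x];
            x &= x - 1;
        }
    }
    return pos;
}
```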
 However, it is not trivial anymore to linearly walk the tree. We have to move
 from a known place to another known place, but a jump to next entry costs the
 same as a jump to a random place.
 Other caveats :
  - it is not possible to remove a header, it is only possible to empty it.
  - it is not possible to insert a header, as that would imply a renumbering.
  => this means that a "defrag" function is required. Headers should preferably
     be added, then should be stuffed on top of destroyed ones, then only
     inserted if absolutely required.
 When we have this, we can then focus on a 32-bit header descriptor which would
 look like this :
 {
  unsigned line_len :13; /* total line length, including CRLF */
  unsigned name_len  :6; /* header name length, max 63 chars */
  unsigned sp1       :5; /* max spaces before value : 31 */
  unsigned sp2       :8; /* max spaces after value : 255 */
 }
 Example :
  Connection:      close           \r\n
  <---------+-----+-----+-------------> line_len
  <-------->|     |     |               name_len
            <----->     |               sp1
                        <-------------> sp2
 Rem:
  - if there are more than 31 spaces before the value, the buffer will have to
    be moved before being registered
  - if there are more than 255 spaces after the value, the buffer will have to
    be moved before being registered
  - we can use the empty header name as an indicator for a deleted header
  - it would be wise to format a new request before sending lots of random
    spaces to the servers.
  - normal clients do not send such crap, so those operations *may* reasonably
    be more expensive than the rest provided that other ones are very fast.
 It would be handy to have the following macros :
  hdr_eon(hdr)  => end of name
  hdr_sov(hdr)  => start of value
  hdr_eof(hdr)  => end of value
  hdr_vlen(hdr) => length of value
  hdr_hlen(hdr) => total header length
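A possible shape for these macros on top of the 32-bit descriptor. This is a sketch, not a settled layout: the struct name `hdr_desc` is hypothetical, and it assumes that sp1 counts only the spaces between the ':' and the value and that sp2 counts only the trailing spaces, with the colon (1 byte) and the CRLF (2 bytes) accounted for separately. All results are offsets or lengths relative to the start of the line.

```c
/* 32-bit header descriptor, fields as listed in the notes. */
struct hdr_desc {
    unsigned line_len :13; /* total line length, including CRLF */
    unsigned name_len  :6; /* header name length, max 63 chars */
    unsigned sp1       :5; /* spaces before value (assumed: after ':') */
    unsigned sp2       :8; /* spaces after value (assumed: before CRLF) */
};

/* Macro names taken from the list above; `h` is a struct hdr_desc pointer. */
#define hdr_eon(h)  ((h)->name_len)                      /* end of name */
#define hdr_sov(h)  ((h)->name_len + 1u + (h)->sp1)      /* start of value */
#define hdr_eof(h)  ((h)->line_len - 2u - (h)->sp2)      /* end of value */
#define hdr_vlen(h) (hdr_eof(h) - hdr_sov(h))            /* length of value */
#define hdr_hlen(h) ((h)->line_len)                      /* total length */
```

For a "Connection" header with 6 spaces before "close" and 11 after it, the descriptor would be {35, 10, 6, 11} under these assumptions, giving a value that starts at offset 17 and is 5 bytes long.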
 A 48-bit encoding would look like this :
  Connection:      close           \r\n
  <---------+------+---+--------------> eoh = 16 bits
  <-------->|      |   |                eon = 8 bits
  <--------------->|   |                sov = 8 bits
                   <--->                vlen = 16 bits
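As an illustration, the 48-bit encoding could be filled from a raw header line as shown below. The names `hdr48` and `hdr48_fill` are hypothetical, and the sketch assumes that eon is the offset of the ':' and that only space characters surround the value.

```c
#include <stddef.h>
#include <stdint.h>

struct hdr48 {
    uint16_t eoh;   /* end of header: line length including CRLF */
    uint8_t  eon;   /* end of name (assumed: offset of the ':') */
    uint8_t  sov;   /* start of value */
    uint16_t vlen;  /* value length, trailing spaces excluded */
};

/* Fills `h` from one header line of `len` bytes ending in CRLF.
 * Returns 0 on success, -1 if the line is malformed or a field overflows. */
int hdr48_fill(struct hdr48 *h, const char *line, size_t len)
{
    size_t eon = 0, sov, eov;

    if (len < 2 || len > UINT16_MAX ||
        line[len - 2] != '\r' || line[len - 1] != '\n')
        return -1;
    while (eon < len - 2 && line[eon] != ':')
        eon++;
    if (eon == len - 2)
        return -1;                      /* no colon found */
    sov = eon + 1;
    while (sov < len - 2 && line[sov] == ' ')
        sov++;                          /* skip spaces before the value */
    eov = len - 2;
    while (eov > sov && line[eov - 1] == ' ')
        eov--;                          /* drop spaces after the value */
    if (eon > UINT8_MAX || sov > UINT8_MAX)
        return -1;
    h->eoh  = (uint16_t)len;
    h->eon  = (uint8_t)eon;
    h->sov  = (uint8_t)sov;
    h->vlen = (uint16_t)(eov - sov);
    return 0;
}
```

Compared with the 32-bit form, this layout stores offsets directly, so the value can be located without adding the space counts, at the cost of two more bytes per header.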
--- a/doc/internals/http-docs.txt
+++ b/doc/internals/http-docs.txt
@ -0,0 +1,5 @@
 Many interesting RFCs and drafts are linked from this site :
  http://www.web-cache.com/Writings/protocols-standards.html