From 985fc567347a0bb9db742017f087489f3f876079 Mon Sep 17 00:00:00 2001
From: Willy Tarreau <w@1wt.eu>
Date: Sun, 1 Apr 2007 09:44:10 +0200
Subject: [PATCH] [DOC] added some docs about http headers storage and acls

---
 doc/design-thoughts/config-language.txt | 145 ++++++++++++++++++++++++
 doc/internals/header-tree.txt           | 124 ++++++++++++++++++++
 doc/internals/http-docs.txt             |   5 +
 3 files changed, 274 insertions(+)
 create mode 100644 doc/internals/header-tree.txt
 create mode 100644 doc/internals/http-docs.txt
diff --git a/doc/design-thoughts/config-language.txt b/doc/design-thoughts/config-language.txt
index 539c7a5c3..510ada68e 100644
--- a/doc/design-thoughts/config-language.txt
+++ b/doc/design-thoughts/config-language.txt
@@ -115,3 +115,148 @@ Sinon, peut-
 
         req in  switch   URI     =^ "/images/" images:"/"
 
+
+2007/03/31 - More precise requirements.
+
+1) no branching or other extensions in "listen" sections, it's too complex.
+
+Distinguish incoming (in) data from outgoing (out) data.
+
+The frontend only sees incoming requests and outgoing responses.
+The backend sees both in/out requests and in/out responses.
+The frontend allows branching from one set of request filters to
+another. The frontend and the sets of request filters may branch
+to a backend.
+
+-----------+--------+----------+----------+---------+----------+
+  \  Where |        |          |          |         |          |
+   \______ | Listen | Frontend | ReqRules | Backend | RspRules |
+          \|        |          |          |         |          |
+Capability |        |          |          |         |          |
+-----------+--------+----------+----------+---------+----------+
+Frontend   |    X   |     X    |          |         |          |
+-----------+--------+----------+----------+---------+----------+
+FiltReqIn  |    X   |     X    |     X    |    X    |          |
+-----------+--------+----------+----------+---------+----------+
+JumpFiltReq|    X   |     X    |     X    |         |          | \
+-----------+--------+----------+----------+---------+----------+  > = ReqJump
+SetBackend |    X   |     X    |     X    |         |          | /
+-----------+--------+----------+----------+---------+----------+
+FiltReqOut |        |          |          |    X    |          |
+-----------+--------+----------+----------+---------+----------+
+FiltRspIn  |    X   |          |          |    X    |     X    |
+-----------+--------+----------+----------+---------+----------+
+JumpFiltRsp|        |          |          |    X    |     X    |
+-----------+--------+----------+----------+---------+----------+
+FiltRspOut |        |     X    |          |    X    |     X    |
+-----------+--------+----------+----------+---------+----------+
+Backend    |    X   |          |          |    X    |          |
+-----------+--------+----------+----------+---------+----------+
+
+In conclusion
+-------------
+
+We need to distinguish at least 8 basic capabilities :
+ - ability to accept connections (frontend)
+ - ability to filter incoming requests
+ - ability to branch to a backend or to a set of request rules
+ - ability to filter outgoing requests
+ - ability to filter incoming responses
+ - ability to branch to another set of response rules
+ - ability to filter the outgoing response
+ - ability to manage servers (backend)
+
+Remark
+------
+ - we often need to apply a small amount of processing to a host/uri/other
+   set. This processing may consist of a few filters as well as a rewrite of
+   the (host,uri) pair.
+
+
+Proposal : ACL
+
+Syntax :
+--------
+
+   acl <name> <what> <operator> <value> ...
+
+This creates an ACL referenced under the name <name>, which is validated if
+applying the operator <operator> with at least one of the values <value> to
+the subject <what> succeeds.
+
+Operators :
+-----------
+
+Always 2 characters :
+
+  [=!][~=*^%/.]
+
+First character :
+   '=' : OK if the test succeeds
+   '!' : OK if the test fails.
+
+Second character :
+   '~' : matches against a regex
+   '=' : compares string to string
+   '*' : compares the end of the string (ex: =* ".mydomain.com")
+   '^' : compares the beginning of the string (ex: =^ "/images/")
+   '%' : searches for a substring
+   '/' : compares with a whole word, accepting '/' as a delimiter
+   '.' : compares with a whole word, accepting '.' as a delimiter
+
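The operators above can be sketched in C as follows. This is a minimal sketch, not an actual implementation: match_word(), match_one() and acl_match() are hypothetical names, the '~' (regex) operator is left out, and '!' is applied to each individual test, following the literal wording of the ACL definition (the ACL is validated as soon as one value passes).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* True if <word> appears in <subject> as a whole word, words being
 * separated by <delim> ('/' for URIs, '.' for host names). */
static bool match_word(const char *subject, const char *word, char delim)
{
    size_t wlen = strlen(word);

    if (wlen == 0)
        return false;
    for (const char *p = strstr(subject, word); p; p = strstr(p + 1, word)) {
        bool starts = (p == subject) || (p[-1] == delim);
        bool ends = (p[wlen] == '\0') || (p[wlen] == delim);

        if (starts && ends)
            return true;
    }
    return false;
}

/* Apply the second operator character to one value. */
static bool match_one(char op, const char *subject, const char *value)
{
    size_t slen = strlen(subject), vlen = strlen(value);

    switch (op) {
    case '=': return strcmp(subject, value) == 0;
    case '*': return slen >= vlen && strcmp(subject + slen - vlen, value) == 0;
    case '^': return strncmp(subject, value, vlen) == 0;
    case '%': return strstr(subject, value) != NULL;
    case '/':
    case '.': return match_word(subject, value, op);
    default:  return false;   /* '~' (regex) is not covered by this sketch */
    }
}

/* The ACL is validated if at least one value passes the test; a leading
 * '!' inverts each individual test. */
static bool acl_match(const char *op, const char *subject,
                      const char *const *values, size_t nvalues)
{
    for (size_t i = 0; i < nvalues; i++) {
        bool ok = match_one(op[1], subject, values[i]);

        if (op[0] == '!')
            ok = !ok;
        if (ok)
            return true;
    }
    return false;
}
```

For instance, "acl www_pub host =. www www01 dev preprod" would then be acl_match("=.", host, values, 4) with the four words as values.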
+Then an action is executed conditionally if all the mentioned ACLs are
+validated (or invalidated, for those preceded by a "!") :
+
+   <what> <where> <action> on [!]<aclname> ...
+
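One possible way to evaluate the "on [!]<aclname> ..." condition is sketched below, assuming each named ACL has already been evaluated for the current request. The struct acl_result type, acl_lookup() and cond_holds() are hypothetical, and treating an unknown ACL name as not matching is an assumption of this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Result of one named ACL, already evaluated for the current request. */
struct acl_result {
    const char *name;
    bool matched;
};

/* Unknown ACL names are treated as not matching (an assumption). */
static bool acl_lookup(const struct acl_result *res, size_t nres,
                       const char *name)
{
    for (size_t i = 0; i < nres; i++)
        if (strcmp(res[i].name, name) == 0)
            return res[i].matched;
    return false;
}

/* "on [!]<aclname> ...": every ACL must be validated, or invalidated
 * when its name is preceded by '!' (a logical AND of all terms). */
static bool cond_holds(const char *const *refs, size_t nrefs,
                       const struct acl_result *res, size_t nres)
{
    for (size_t i = 0; i < nrefs; i++) {
        const char *name = refs[i];
        bool negate = (name[0] == '!');

        if (negate)
            name++;
        if (acl_lookup(res, nres, name) == negate)
            return false;
    }
    return true;
}
```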
+
+Exemple :
+---------
+
+   acl www_pub host =. www www01 dev preprod
+   acl imghost host =. images
+   acl imgdir   uri =/ img
+   acl imagedir uri =/ images
+   acl msie h(user-agent) =% "MSIE"
+
+   set_host  "images"       on www_pub imgdir
+   remap_uri "/img"    "/"  on www_pub imgdir
+   remap_uri "/images" "/"  on www_pub imagedir
+   setbe images             on imghost
+   reqdel "Cookie"          on all
+
+
+
+Possible actions :
+
+   req  {in|out} {append|delete|rem|add|set|rep|mapuri|rewrite|reqline|deny|allow|setbe|tarpit}
+   resp {in|out} {append|delete|rem|add|set|rep|maploc|rewrite|stsline|deny|allow}
+
+   req in append <line>
+   req in delete <line_regex>
+   req in rem <header>
+   req in add <header> <new_value>
+   req in set <header> <new_value>
+   req in rep <header> <old_value> <new_value>
+   req in mapuri  <old_uri_prefix> <new_uri_prefix>
+   req in rewrite <old_uri_regex>  <new_uri>
+   req in reqline <old_req_regex>  <new_req>
+   req in deny
+   req in allow
+   req in tarpit
+   req in setbe <backend>
+
+   resp out maploc <old_location_prefix> <new_loc_prefix>
+   resp out stsline <old_sts_regex> <new_sts_regex>
+
+Strings must be delimited by the same character at the beginning and at the
+end, and that character must be escaped if it appears inside the string.
+Everything between the closing delimiter and the first space is treated as
+options passed to the processing. For example :
+
+   req in rep host /www/i /www/
+   req in rep connection /keep-alive/i "close"
+
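The string rule above can be sketched as a small parser. Assumptions not stated in the notes: the backslash is the escape character, only an escaped delimiter is unescaped (other backslashes are kept so that regexes such as /a\.b/ survive), and parse_delimited() is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Parse one delimited argument such as /www/i or "close". The first
 * character is the delimiter; "\<delim>" stands for a literal delimiter and
 * any other backslash is kept as-is. The unescaped string goes to <out>,
 * the characters between the closing delimiter and the first space go to
 * <opts>. Returns a pointer just past the token, or NULL on error. */
static const char *parse_delimited(const char *in, char *out, size_t outsz,
                                   char *opts, size_t optsz)
{
    char delim = in[0];
    size_t o = 0, p = 0;

    if (delim == '\0' || outsz == 0 || optsz == 0)
        return NULL;
    for (in++; *in != delim; in++) {
        if (*in == '\0')
            return NULL;                  /* missing closing delimiter */
        if (in[0] == '\\' && in[1] == delim)
            in++;                         /* escaped delimiter */
        if (o + 1 >= outsz)
            return NULL;                  /* output buffer too small */
        out[o++] = *in;
    }
    for (in++; *in != '\0' && *in != ' '; in++) {
        if (p + 1 >= optsz)
            return NULL;                  /* options buffer too small */
        opts[p++] = *in;
    }
    out[o] = '\0';
    opts[p] = '\0';
    return in;
}
```

With "/www/i /www/", this yields the string "www" with options "i", and returns a pointer to the space preceding the next argument.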
+It would be handy to be able to perform a remap at the same time as a setbe.
+
+Captures: separate them into in/out. Make them conditional?
diff --git a/doc/internals/header-tree.txt b/doc/internals/header-tree.txt
new file mode 100644
index 000000000..9a9736129
--- /dev/null
+++ b/doc/internals/header-tree.txt
@@ -0,0 +1,124 @@
+2007/03/30 - Header storage in trees
+
+This documentation describes how to store headers in radix trees, providing
+fast access to any known position, while retaining the ability to grow/reduce
+any arbitrary header without having to recompute all positions.
+
+Principle :
+  We have a radix tree stored in an integer array, where each entry holds the
+  total number of bytes used by all the headers it covers, i.e. those whose
+  position is below it in the tree. This ensures that we can compute any
+  header's position in O(log(N)) where N is the number of headers.
+
+Example with N=16 :
+
+   +-----------------------+
+   |                       |
+   +-----------+           +-----------+
+   |           |           |           |
+   +-----+     +-----+     +-----+     +-----+
+   |     |     |     |     |     |     |     |
+   +--+  +--+  +--+  +--+  +--+  +--+  +--+  +--+
+   |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |
+
+   0  1  2  3  4  5  6  7  8  9  A  B  C  D  E  F
+
+   To reach header 6, we have to compute hdr[0]+hdr[4]+hdr[6]
+
+   With this method, it becomes easy to grow any header and update the array:
+   for each bit k from the right, replace bits 0..k of the index with a 1 at
+   bit k followed by zeroes, update that entry if it is higher than the
+   header's index, and stop once it exceeds the number of stored headers.
+
+   For instance, if we want to grow hdr[6], we proceed like this :
+
+   6 = 0110 (BIN)
+
+   Let's consider the values to update :
+
+   (bit 0) : (0110 & ~0001) | 0001 = 0111 = 7 >  6 => update
+   (bit 1) : (0110 & ~0011) | 0010 = 0110 = 6 <= 6 => leave it
+   (bit 2) : (0110 & ~0111) | 0100 = 0100 = 4 <= 6 => leave it
+   (bit 3) : (0110 & ~1111) | 1000 = 1000 = 8 >  6 => update
+   (bit 4) : (0110 & ~11111) | 10000 = 10000 = 16 : beyond array size, stop.
+
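A C sketch of the position lookup and of the update procedure follows. It relies on one reading of the array, which is an assumption consistent with the hdr[0]+hdr[4]+hdr[6] example and with the hdr[6] update example: hdr[0] holds the offset of the first header, and for i > 0, hdr[i] holds the total length of headers (i & (i - 1)) to i - 1. The helpers hdr_build(), hdr_pos() and hdr_grow() are hypothetical.

```c
#include <assert.h>
#include <limits.h>

/* Build the array from individual header lengths (naive, for clarity). */
static void hdr_build(int *hdr, const int *len, unsigned n, int base)
{
    hdr[0] = base;
    for (unsigned i = 1; i < n; i++) {
        hdr[i] = 0;
        for (unsigned j = i & (i - 1); j < i; j++)
            hdr[i] += len[j];
    }
}

/* Position of header i: hdr[0] plus the entries met while clearing the
 * lowest set bit of i, e.g. hdr[0] + hdr[4] + hdr[6] for i = 6. */
static int hdr_pos(const int *hdr, unsigned i)
{
    int pos = hdr[0];

    for (; i != 0; i &= i - 1)
        pos += hdr[i];
    return pos;
}

/* Header i changed size by delta bytes (negative to shrink). For each bit k,
 * replace bits 0..k of i with a 1 at bit k; any index above i covers header i
 * and gets updated. Stop once the index reaches the number of entries n. */
static void hdr_grow(int *hdr, unsigned n, unsigned i, int delta)
{
    assert(i < n);
    for (unsigned k = 0; k < sizeof(unsigned) * CHAR_BIT; k++) {
        unsigned mask = (2u << k) - 1;        /* bits 0..k */
        unsigned j = (i & ~mask) | (1u << k);

        if (j >= n)
            break;
        if (j > i)
            hdr[j] += delta;
    }
}
```

Under this reading, the layout is that of a Fenwick (binary indexed) tree, with index 0 reused to hold the base offset; growing hdr[6] with N=16 touches exactly entries 7 and 8.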
+
+It's easy to walk through the tree too. We only need one iteration per bit
+cleared on the way from X to the common ancestor, and one per bit cleared on
+the way from Y to it. The ancestor is found while walking. To go from X to Y :
+
+   pos = pos(X);
+
+   while (Y != X) {
+     if (Y > X) {
+       // walk from Y to ancestor
+       pos += hdr[Y];
+       Y &= (Y - 1);
+     } else {
+       // walk from X to ancestor
+       pos -= hdr[X];
+       X &= (X - 1);
+     }
+   }
+
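The walk can be sketched in C, assuming (as an interpretation, not stated explicitly) that hdr[0] holds the offset of the first header and that for i > 0, hdr[i] holds the total length of headers (i & (i - 1)) to i - 1. hdr_walk() and the compact hdr_build() helper are hypothetical; the loop itself follows the pseudocode, given the known position of header X.

```c
#include <assert.h>

/* Build the array from individual header lengths (naive, for clarity). */
static void hdr_build(int *hdr, const int *len, unsigned n, int base)
{
    hdr[0] = base;
    for (unsigned i = 1; i < n; i++) {
        hdr[i] = 0;
        for (unsigned j = i & (i - 1); j < i; j++)
            hdr[i] += len[j];
    }
}

/* Position of header y, given the known position pos_x of header x. Each
 * step clears the lowest set bit of the larger index, so both meet at
 * their common ancestor; entries on y's side are added, x's subtracted. */
static int hdr_walk(const int *hdr, unsigned x, int pos_x, unsigned y)
{
    int pos = pos_x;

    while (y != x) {
        if (y > x) {
            pos += hdr[y];     /* walk from Y to the ancestor */
            y &= y - 1;
        } else {
            pos -= hdr[x];     /* walk from X to the ancestor */
            x &= x - 1;
        }
    }
    return pos;
}
```

For example, going from header 5 to header 6 adds hdr[6], then subtracts hdr[5], meeting at ancestor 4.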
+However, walking the tree linearly is no longer trivial. We always have to move
+from one known place to another, and a jump to the next entry may cost as much
+as a jump to a random place (O(log(N))).
+
+Other caveats :
+  - it is not possible to remove a header, it is only possible to empty it.
+  - it is not possible to insert a header, as that would imply renumbering.
+  => this means that a "defrag" function is required. Headers should preferably
+     be appended, otherwise stored in place of deleted ones, and only inserted
+     if absolutely required.
+
+
+Once we have this, we can focus on a 32-bit header descriptor which would
+look like this :
+
+{
+  unsigned line_len :13; /* total line length, including CRLF */
+  unsigned name_len  :6; /* header name length, max 63 chars */
+  unsigned sp1       :5; /* max spaces before value : 31 */
+  unsigned sp2       :8; /* max spaces after value : 255 */
+}
+
+Example :
+
+  Connection:      close           \r\n
+  <---------+-----+-----+-------------> line_len
+  <-------->|     |     |               name_len
+            <----->     |               sp1
+                        <-------------> sp2
+Rem:
+  - if there are more than 31 spaces before the value, the buffer contents
+    will have to be moved before the header can be registered
+
+  - if there are more than 255 spaces after the value, the buffer contents
+    will have to be moved before the header can be registered
+
+  - we can use the empty header name as an indicator for a deleted header
+
+  - it would be wise to reformat the request rather than forward lots of
+    random spaces to the servers.
+
+  - normal clients do not send such crap, so those operations *may* reasonably
+    be more expensive than the rest provided that other ones are very fast.
+
+It would be handy to have the following macros :
+
+  hdr_eon(hdr)  => end of name
+  hdr_sov(hdr)  => start of value
+  hdr_eof(hdr)  => end of value
+  hdr_vlen(hdr) => length of value
+  hdr_hlen(hdr) => total header length
+
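A sketch of the 32-bit descriptor with the macros listed above. The diagram leaves some room for interpretation, so the field meanings here are assumptions: name_len excludes the ':', sp1 counts the spaces between the ':' and the value, sp2 counts the spaces between the value and the CRLF, and line_len includes the CRLF. struct hdr_desc and hdr_parse() are hypothetical names; all offsets are relative to the start of the line.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct hdr_desc {
    unsigned line_len : 13;  /* total line length, including CRLF */
    unsigned name_len :  6;  /* header name length, ':' excluded */
    unsigned sp1      :  5;  /* spaces between ':' and the value */
    unsigned sp2      :  8;  /* spaces between the value and CRLF */
};

/* Offsets are relative to the start of the header line. */
#define hdr_eon(h)  ((h).name_len)                      /* end of name */
#define hdr_sov(h)  ((h).name_len + 1 + (h).sp1)        /* start of value */
#define hdr_eof(h)  ((h).line_len - 2 - (h).sp2)        /* end of value */
#define hdr_vlen(h) (hdr_eof(h) - hdr_sov(h))           /* length of value */
#define hdr_hlen(h) ((h).line_len)                      /* total header length */

/* Fill <h> from one CRLF-terminated line. Returns -1 when a field does not
 * fit, i.e. when the buffer contents would first have to be moved. */
static int hdr_parse(const char *line, size_t len, struct hdr_desc *h)
{
    const char *colon;
    size_t name, sov, eov;

    if (len < 3 || line[len - 2] != '\r' || line[len - 1] != '\n')
        return -1;
    colon = memchr(line, ':', len - 2);
    if (!colon)
        return -1;
    name = (size_t)(colon - line);
    for (sov = name + 1; sov < len - 2 && line[sov] == ' '; sov++)
        ;
    for (eov = len - 2; eov > sov && line[eov - 1] == ' '; eov--)
        ;
    if (len > 8191 || name > 63 || sov - name - 1 > 31 || len - 2 - eov > 255)
        return -1;
    h->line_len = (unsigned)len;
    h->name_len = (unsigned)name;
    h->sp1 = (unsigned)(sov - name - 1);
    h->sp2 = (unsigned)(len - 2 - eov);
    return 0;
}
```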
+
+A 48-bit encoding would look like this :
+
+  Connection:      close           \r\n
+  <---------+------+---+--------------> eoh = 16 bits
+  <-------->|      |   |                eon = 8 bits
+  <--------------->|   |                sov = 8 bits
+                   <--->                vlen = 16 bits
+
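The 48-bit variant can be sketched as a struct whose field widths follow the diagram, with all values relative to the start of the line. Interpreting eon as the offset of the ':' is an assumption, and struct hdr48 with its helpers are hypothetical names. With these member types, common ABIs lay the struct out in 6 bytes, although C does not guarantee it.

```c
#include <assert.h>
#include <stdint.h>

struct hdr48 {
    uint16_t eoh;   /* end of header: total length, CRLF included */
    uint8_t  eon;   /* end of name (assumed: offset of the ':') */
    uint8_t  sov;   /* start of value */
    uint16_t vlen;  /* length of value */
};

/* End of value and total length are read directly, without any
 * arithmetic on space counts. */
static inline unsigned hdr48_eov(struct hdr48 h) { return (unsigned)h.sov + h.vlen; }
static inline unsigned hdr48_hlen(struct hdr48 h) { return h.eoh; }
```

Compared with the 32-bit form, the start and length of the value are stored directly instead of being derived from space counts, at the cost of 16 more bits per header.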
diff --git a/doc/internals/http-docs.txt b/doc/internals/http-docs.txt
new file mode 100644
index 000000000..4ed24806d
--- /dev/null
+++ b/doc/internals/http-docs.txt
@@ -0,0 +1,5 @@
+Many interesting RFCs and drafts are linked to from this site :
+
+  http://www.web-cache.com/Writings/protocols-standards.html
+
+