DOC: Documentation for hashing function, with test results.

Summary:
Added a document for hashing under internal docs explaining
hashing in haproxy along with the results of tests under the test
folder.

These documents together explain the motivation for adding
options for hashing algorithms with the option of enabling or
disabling of avalanche.
This commit is contained in:
Bhaskar Maddala 2013-11-20 12:55:21 -05:00 committed by Willy Tarreau
parent 125d099662
commit 10e26de4ea
2 changed files with 397 additions and 0 deletions

83
doc/internals/hashing.txt Normal file
View File

@ -0,0 +1,83 @@
2013/11/20 - How hashing works internally in haproxy - maddalab@gmail.com
This document describes how Haproxy implements hashing both map-based and
consistent hashing, both prior to versions 1.5 and the motivation and tests
that were done when providing additional options starting in version 1.5.
A note on hashing in general, hash functions strive to have little
correlation between input and output. The heart of a hash function is its
mixing step. The behavior of the mixing step largely determines whether the
hash function is collision-resistant. Hash functions that are collision
resistant are more likely to have an even distribution of load.
The purpose of the mixing function is to spread the effect of each message
bit throughout all the bits of the internal state. Ideally every bit in the
hash state is affected by every bit in the message. And we want to do that
as quickly as possible simply for the sake of program performance. A
function is said to satisfy the strict avalanche criterion if, whenever a
single input bit is complemented (toggled between 0 and 1), each of the
output bits should change with a probability of one half for an arbitrary
selection of the remaining input bits.
To guard against a combination of hash function and input that results in
high rate of collisions, haproxy implements an avalanche algorithm on the
result of the hashing function. In all versions 1.4 and prior avalanche is
always applied when using the consistent hashing directive. It is intended
to provide quite a good distribution for little input variations. The result
is quite suited to fit over a 32-bit space with enough variations so that
a randomly picked number falls equally before any server position, which is
ideal for consistently hashed backends, a common use case for caches.
In all versions 1.4 and prior Haproxy implements the SDBM hashing function.
However tests show that alternatives to SDBM have a better cache
distribution on different hashing criteria. Additional tests involving
alternatives for hash input and an option to trigger avalanche, we found
different algorithms perform better on different criteria. DJB2 performs
well when hashing ascii text and is a good choice when hashing on host
header. Other alternatives perform better on numbers and are a good choice
when using source ip. The results also vary by use of the avalanche flag.
The results of the testing can be found under the tests folder. Here is
a summary of the discussion on the results on 1 input criteria and the
methodology used to generate the results.
A note of the setup when validating the results independently, one
would want to avoid backend server counts that may skew the results. As
an example with DJB2 avoid 33 servers. Please see the implementations of
the hashing function, which can be found in the links under references.
The following was the set up used
(a) hash-type consistent/map-based
(b) avalanche on/off
(c) balanche host(hdr)
(d) 3 criteria for inputs
- ~ 10K requests, including duplicates
- ~ 46K requests, unique requests from 1 MM requests were obtained
- ~ 250K requests, including duplicates
(e) 17 servers in backend, all servers were assigned the same weight
Result of the hashing were obtained across the server via monitoring log
files for haproxy. Population Standard deviation was used to evaluate the
efficacy of the hashing algorithm. Lower standard deviation, indicates
a better distribution of load across the backends.
On 10K requests, when using consistent hashing with avalanche on host
headers, DJB2 significantly out performs SDBM. Std dev on SDBM was 48.95
and DJB2 was 26.29. This relationship is inverted with avalanche disabled,
however DJB2 with avalanche enabled out performs SDBM with avalanche
disabled.
On map-based hashing SDBM out performs DJB2 irrespective of the avalanche
option. SDBM without avalanche is marginally better than with avalanche.
DJB2 performs significantly worse with avalanche enabled.
Summary: The results of the testing indicate that there isn't a hashing
algorithm that can be applied across all input criteria. It is necessary
to support alternatives to SDBM, which is generally the best option, with
algorithms that are better for different inputs. Avalanche is not always
applicable and may result in less smooth distribution.
References:
Mixing Functions/Avalanche: http://home.comcast.net/~bretm/hash/3.html
Hash Functions: http://www.cse.yorku.ca/~oz/hash.html

314
tests/hashing-results.txt Normal file
View File

@ -0,0 +1,314 @@
These are the result of tests conducted to determine efficacy of hashing
algorithms and avalache application in haproxy. All results below were
generated using version 1.5. See the document on hashing under internal docs
for a detailed description on the tests the methodology and interpretation
of the results.
The following was the set up used
(a) hash-type consistent/map-bases
(b) avalanche on/off
(c) balanche host(hdr)
(d) 3 criteria for inputs
- ~ 10K requests, including duplicates
- ~ 46K requests, unique requests from 1 MM requests were obtained
- ~ 250K requests, including duplicates
(e) 17 servers in backend, all servers were assigned the same weight
The results can be interpreted across 3 dimensions corresponding to input
criteria (a)/(b) and (d) above
== 10 K requests ==
=== Consistent with avalanche ===
Servers SDBM DJB2 WT6
-----------------------------
1 576 637 592
2 552 608 600
3 539 559 551
4 578 586 493
5 534 555 549
6 614 607 576
7 519 556 554
8 591 565 607
9 529 604 575
10 642 550 678
11 537 591 506
12 568 571 567
13 589 606 572
14 648 568 711
15 645 557 603
16 583 627 591
17 699 596 618
-----------------------------
Std Dev 48.95 26.29 51.75
=== Consistent without avalanche ===
Servers SDBM DJB2 WT6
-----------------------------
1 612 627 579
2 631 607 563
3 585 605 604
4 594 502 518
5 583 526 602
6 589 594 555
7 591 602 511
8 518 540 623
9 550 519 523
10 600 637 647
11 568 536 550
12 552 605 645
13 547 556 564
14 615 674 635
15 642 624 618
16 575 585 609
17 591 604 597
-----------------------------
Std Dev 30.71 45.97 42.52
=== Map based without avalanche ===
Servers SDBM DJB2 WT6
-----------------------------
1 602 560 598
2 576 583 583
3 579 624 593
4 608 587 551
5 579 549 588
6 582 560 590
7 553 616 562
8 568 600 551
9 594 607 620
10 574 611 635
11 578 607 603
12 563 581 547
13 604 531 572
14 621 606 618
15 600 561 602
16 555 570 585
17 607 590 545
-----------------------------
Std Dev 19.24 25.56 26.29
=== Map based with avalanche ===
Servers SDBM DJB2 WT6
-----------------------------
Servers SDBM DJB2 WT6
1 548 641 597
2 612 563 655
3 596 536 595
4 609 574 537
5 586 610 570
6 600 568 562
7 589 573 578
8 584 549 573
9 561 636 603
10 607 553 603
11 554 602 616
12 560 577 568
13 597 534 570
14 597 647 570
15 563 581 647
16 575 647 565
17 605 552 534
-----------------------------
Std Dev 20.23 37.47 32.16
== Uniques in 1 MM ==
=== Consistent with avalanche ===
Servers SDBM DJB2 WT6
-----------------------------
1 2891 2963 2947
2 2802 2849 2771
3 2824 2854 2904
4 2704 2740 2763
5 2664 2699 2646
6 2902 2876 2935
7 2829 2745 2730
8 2648 2768 2800
9 2710 2741 2689
10 3070 3111 3106
11 2733 2638 2589
12 2828 2866 2885
13 2876 2961 2870
14 3090 2997 3044
15 2871 2879 2827
16 2881 2727 2921
17 2936 2845 2832
-----------------------------
Std Dev 121.66 118.59 131.61
=== Consistent without avalanche ===
Servers SDBM DJB2 WT6
-----------------------------
1 2879 2930 2863
2 2835 2856 2853
3 2875 2741 2899
4 2720 2718 2761
5 2703 2754 2689
6 2848 2901 2925
7 2829 2756 2838
8 2761 2779 2805
9 2719 2671 2746
10 3015 3176 3079
11 2620 2661 2656
12 2879 2773 2713
13 2829 2844 2925
14 3064 2951 3041
15 2898 2928 2877
16 2880 2867 2791
17 2905 2953 2798
-----------------------------
Std Dev 107.65 125.2 111.34
=== Map based without avalanche ===
Servers SDBM DJB2 WT6
-----------------------------
1 2863 2837 2923
2 2966 2829 2847
3 2865 2803 2808
4 2682 2816 2787
5 2847 2782 2815
6 2910 2862 2862
7 2821 2784 2793
8 2837 2834 2796
9 2857 2891 2859
10 2829 2906 2873
11 2742 2851 2841
12 2790 2837 2870
13 2765 2902 2794
14 2870 2732 2900
15 2898 2891 2759
16 2877 2860 2863
17 2840 2842 2869
-----------------------------
Std Dev 64.65 45.16 43.38
=== Map based with avalanche ===
Servers SDBM DJB2 WT6
-----------------------------
1 2816 2859 2895
2 2841 2739 2789
3 2846 2903 2888
4 2817 2878 2812
5 2750 2794 2852
6 2816 2917 2847
7 2792 2782 2786
8 2800 2814 2868
9 2854 2883 2842
10 2770 2854 2855
11 2851 2854 2837
12 2910 2846 2776
13 2904 2792 2882
14 2984 2767 2854
15 2766 2863 2823
16 2902 2797 2907
17 2840 2917 2746
-----------------------------
Std Dev 58.39 52.16 43.72
== 250K requests ==
=== Consistent with avalanche ===
Servers SDBM DJB2 WT6
-----------------------------
1 14182 12996 20924
2 14881 18376 8901
3 13537 17935 13639
4 11031 12582 19758
5 15429 10084 12112
6 18712 12574 14052
7 14271 11257 14538
8 12048 18582 16653
9 10570 10283 13949
10 11683 13081 23530
11 9288 14828 10818
12 10775 13607 19844
13 10036 19138 15413
14 31903 15222 11824
15 21276 11963 10405
16 17233 23116 11316
17 11437 12668 10616
-----------------------------
Std Dev 5355.95 3512.39 4096.65
=== Consistent without avalanche ===
Servers SDBM DJB2 WT6
-----------------------------
1 12411 17833 11831
2 14213 11165 14833
3 11431 10241 11671
4 14080 13913 20224
5 10886 12101 14272
6 15168 12470 14641
7 18802 12211 10164
8 18678 11852 12421
9 17468 10865 17655
10 19801 28493 13221
11 10885 20201 13507
12 20419 11660 14078
13 12591 18616 13906
14 12798 18200 24152
15 13338 10532 14111
16 11715 10478 14759
17 13608 17461 12846
-----------------------------
Std Dev 3113.33 4749.97 3256.04
=== Map based without avalanche ===
Servers SDBM DJB2 WT6
-----------------------------
1 14660 12483 11472
2 11118 11552 12146
3 15407 19952 11032
4 15444 12218 14572
5 22091 11434 13738
6 18273 17587 21337
7 10527 16784 15118
8 13013 12010 17195
9 15754 9886 14611
10 13758 11613 14844
11 19564 16453 17403
12 9692 17246 14469
13 13905 11885 20024
14 19401 15350 10611
15 11889 25485 11172
16 13846 13928 12109
17 9950 12426 16439
-----------------------------
Std Dev 3481.45 3847.74 3031.93
=== Map based with avalanche ===
Servers SDBM DJB2 WT6
-----------------------------
1 15546 11454 12871
2 15388 11464 17587
3 11767 15527 14785
4 15843 13214 11420
5 11129 12192 15083
6 15647 17875 11051
7 18723 13629 23006
8 10938 11295 11223
9 12653 17202 23347
10 10108 12867 14178
11 12116 11190 20523
12 14982 12341 11881
13 13221 13929 11828
14 17642 19621 15320
15 12410 26171 11721
16 25075 14764 13544
17 15104 13557 8924
-----------------------------
Std Dev 3521.83 3742.21 4101.2