samba-mirror

TDB2: Goodbye TDB2, Hello NTDB. This renames everything from tdb2 to ntdb: importantly, we no longer use the tdb_ namespace, so you can link against both ntdb and tdb if you want to. This also enables building of standalone ntdb by the autobuild script. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

2012-06-18 17:00:26 +04:00

/*

ntdb: remove hash table trees. TDB2 started with a top-level hash of 1024 entries, divided into 128 groups of 8 buckets. When a bucket filled, the 8 bucket group expanded into pointers into 8 new 64-entry hash tables. When these filled, they expanded in turn, etc. It's a nice idea to automatically expand the hash tables, but it doesn't pay off. Remove it for NTDB. 1) It only beats TDB performance when the database is huge and the TDB hashsize is small. We are about 20% slower on medium-size databases (1000 to 10000 records), worse on really small ones. 2) Since we're 64 bits, our hash tables are already twice as expensive as TDB. 3) Since our hash function is good, it means that all groups tend to fill at the same time, meaning the hash enlarges by a factor of 128 all at once, leading to a very large database at that point. 4) Our efficiency would improve if we enlarged the top level, but that makes our minimum db size even worse: it's already over 8k, and jumps to 1M after about 1000 entries! 5) Making the sub group size larger gives a shallower tree, which performs better, but makes the "hash explosion" problem worse. 6) The code is complicated, having to handle delete and reshuffling groups of hash buckets, and expansion of buckets. 7) We have to handle the case where all the records somehow end up with the same hash value, which requires special code to chain records for that case. On the other hand, it would be nice if we didn't degrade as badly as TDB does when the hash chains get long. This patch removes the hash-growing code, but instead of chaining like TDB does when a bucket fills, we point the bucket to an array of record pointers. Since each on-disk NTDB pointer contains some hash bits from the record (we steal the upper 8 bits of the offset), 99.5% of the time we don't need to load the record to determine if it matches. This makes an array of offsets much more cache-friendly than a linked list. Here are the times (in ns) for tdb_store of N records, tdb_store of N records the second time, and a fetch of all N records. I've also included the final database size and the smbtorture local.[n]tdb_speed results. Benchmark details: 1) Compiled with -O2. 2) assert() was disabled in TDB2 and NTDB. 3) The "optimize fetch" patch was applied to NTDB. 10 runs, using tmpfs (otherwise massive swapping as db hits ~30M, despite plenty of RAM). Insert Re-ins Fetch Size dbspeed (nsec) (nsec) (nsec) (Kb) (ops/sec) TDB (10000 hashsize): 100 records: 3882 3320 1609 53 203204 1000 records: 3651 3281 1571 115 218021 10000 records: 3404 3326 1595 880 202874 100000 records: 4317 3825 2097 8262 126811 1000000 records: 11568 11578 9320 77005 25046 TDB2 (1024 hashsize, expandable): 100 records: 3867 3329 1699 17 187100 1000 records: 4040 3249 1639 154 186255 10000 records: 4143 3300 1695 1226 185110 100000 records: 4481 3425 1800 17848 163483 1000000 records: 4055 3534 1878 106386 160774 NTDB (8192 hashsize) 100 records: 4259 3376 1692 82 190852 1000 records: 3640 3275 1566 130 195106 10000 records: 4337 3438 1614 773 188362 100000 records: 4750 5165 1746 9001 169197 1000000 records: 4897 5180 2341 83838 121901 Analysis: 1) TDB wins on small databases, beating TDB2 by ~15%, NTDB by ~10%. 2) TDB starts to lose when hash chains get 10 long (fetch 10% slower than TDB2/NTDB). 3) TDB does horribly when hash chains get 100 long (fetch 4x slower than NTDB, 5x slower than TDB2, insert about 2-3x slower). 4) TDB2 databases are 40% larger than TDB1. NTDB is about 15% larger than TDB1

2012-06-19 07:13:04 +04:00

								uint32_t ntdb_jenkins_hash(const void *key, size_t length, uint32_t seed,
							

TDB2: Goodbye TDB2, Hello NTDB. This renames everything from tdb2 to ntdb: importantly, we no longer use the tdb_ namespace, so you can link against both ntdb and tdb if you want to. This also enables building of standalone ntdb by the autobuild script. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

2012-06-18 17:00:26 +04:00

											  void *unused)
							

ntdb: remove hash table trees. TDB2 started with a top-level hash of 1024 entries, divided into 128 groups of 8 buckets. When a bucket filled, the 8 bucket group expanded into pointers into 8 new 64-entry hash tables. When these filled, they expanded in turn, etc. It's a nice idea to automatically expand the hash tables, but it doesn't pay off. Remove it for NTDB. 1) It only beats TDB performance when the database is huge and the TDB hashsize is small. We are about 20% slower on medium-size databases (1000 to 10000 records), worse on really small ones. 2) Since we're 64 bits, our hash tables are already twice as expensive as TDB. 3) Since our hash function is good, it means that all groups tend to fill at the same time, meaning the hash enlarges by a factor of 128 all at once, leading to a very large database at that point. 4) Our efficiency would improve if we enlarged the top level, but that makes our minimum db size even worse: it's already over 8k, and jumps to 1M after about 1000 entries! 5) Making the sub group size larger gives a shallower tree, which performs better, but makes the "hash explosion" problem worse. 6) The code is complicated, having to handle delete and reshuffling groups of hash buckets, and expansion of buckets. 7) We have to handle the case where all the records somehow end up with the same hash value, which requires special code to chain records for that case. On the other hand, it would be nice if we didn't degrade as badly as TDB does when the hash chains get long. This patch removes the hash-growing code, but instead of chaining like TDB does when a bucket fills, we point the bucket to an array of record pointers. Since each on-disk NTDB pointer contains some hash bits from the record (we steal the upper 8 bits of the offset), 99.5% of the time we don't need to load the record to determine if it matches. This makes an array of offsets much more cache-friendly than a linked list. Here are the times (in ns) for tdb_store of N records, tdb_store of N records the second time, and a fetch of all N records. I've also included the final database size and the smbtorture local.[n]tdb_speed results. Benchmark details: 1) Compiled with -O2. 2) assert() was disabled in TDB2 and NTDB. 3) The "optimize fetch" patch was applied to NTDB. 10 runs, using tmpfs (otherwise massive swapping as db hits ~30M, despite plenty of RAM). Insert Re-ins Fetch Size dbspeed (nsec) (nsec) (nsec) (Kb) (ops/sec) TDB (10000 hashsize): 100 records: 3882 3320 1609 53 203204 1000 records: 3651 3281 1571 115 218021 10000 records: 3404 3326 1595 880 202874 100000 records: 4317 3825 2097 8262 126811 1000000 records: 11568 11578 9320 77005 25046 TDB2 (1024 hashsize, expandable): 100 records: 3867 3329 1699 17 187100 1000 records: 4040 3249 1639 154 186255 10000 records: 4143 3300 1695 1226 185110 100000 records: 4481 3425 1800 17848 163483 1000000 records: 4055 3534 1878 106386 160774 NTDB (8192 hashsize) 100 records: 4259 3376 1692 82 190852 1000 records: 3640 3275 1566 130 195106 10000 records: 4337 3438 1614 773 188362 100000 records: 4750 5165 1746 9001 169197 1000000 records: 4897 5180 2341 83838 121901 Analysis: 1) TDB wins on small databases, beating TDB2 by ~15%, NTDB by ~10%. 2) TDB starts to lose when hash chains get 10 long (fetch 10% slower than TDB2/NTDB). 3) TDB does horribly when hash chains get 100 long (fetch 4x slower than NTDB, 5x slower than TDB2, insert about 2-3x slower). 4) TDB2 databases are 40% larger than TDB1. NTDB is about 15% larger than TDB1

2012-06-19 07:13:04 +04:00

									return hash_stable((const unsigned char *)key, length, seed);
							

TDB2: Goodbye TDB2, Hello NTDB. This renames everything from tdb2 to ntdb: importantly, we no longer use the tdb_ namespace, so you can link against both ntdb and tdb if you want to. This also enables building of standalone ntdb by the autobuild script. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

2012-06-18 17:00:26 +04:00

}

ntdb: remove hash table trees. TDB2 started with a top-level hash of 1024 entries, divided into 128 groups of 8 buckets. When a bucket filled, the 8 bucket group expanded into pointers into 8 new 64-entry hash tables. When these filled, they expanded in turn, etc. It's a nice idea to automatically expand the hash tables, but it doesn't pay off. Remove it for NTDB. 1) It only beats TDB performance when the database is huge and the TDB hashsize is small. We are about 20% slower on medium-size databases (1000 to 10000 records), worse on really small ones. 2) Since we're 64 bits, our hash tables are already twice as expensive as TDB. 3) Since our hash function is good, it means that all groups tend to fill at the same time, meaning the hash enlarges by a factor of 128 all at once, leading to a very large database at that point. 4) Our efficiency would improve if we enlarged the top level, but that makes our minimum db size even worse: it's already over 8k, and jumps to 1M after about 1000 entries! 5) Making the sub group size larger gives a shallower tree, which performs better, but makes the "hash explosion" problem worse. 6) The code is complicated, having to handle delete and reshuffling groups of hash buckets, and expansion of buckets. 7) We have to handle the case where all the records somehow end up with the same hash value, which requires special code to chain records for that case. On the other hand, it would be nice if we didn't degrade as badly as TDB does when the hash chains get long. This patch removes the hash-growing code, but instead of chaining like TDB does when a bucket fills, we point the bucket to an array of record pointers. Since each on-disk NTDB pointer contains some hash bits from the record (we steal the upper 8 bits of the offset), 99.5% of the time we don't need to load the record to determine if it matches. This makes an array of offsets much more cache-friendly than a linked list. Here are the times (in ns) for tdb_store of N records, tdb_store of N records the second time, and a fetch of all N records. I've also included the final database size and the smbtorture local.[n]tdb_speed results. Benchmark details: 1) Compiled with -O2. 2) assert() was disabled in TDB2 and NTDB. 3) The "optimize fetch" patch was applied to NTDB. 10 runs, using tmpfs (otherwise massive swapping as db hits ~30M, despite plenty of RAM). Insert Re-ins Fetch Size dbspeed (nsec) (nsec) (nsec) (Kb) (ops/sec) TDB (10000 hashsize): 100 records: 3882 3320 1609 53 203204 1000 records: 3651 3281 1571 115 218021 10000 records: 3404 3326 1595 880 202874 100000 records: 4317 3825 2097 8262 126811 1000000 records: 11568 11578 9320 77005 25046 TDB2 (1024 hashsize, expandable): 100 records: 3867 3329 1699 17 187100 1000 records: 4040 3249 1639 154 186255 10000 records: 4143 3300 1695 1226 185110 100000 records: 4481 3425 1800 17848 163483 1000000 records: 4055 3534 1878 106386 160774 NTDB (8192 hashsize) 100 records: 4259 3376 1692 82 190852 1000 records: 3640 3275 1566 130 195106 10000 records: 4337 3438 1614 773 188362 100000 records: 4750 5165 1746 9001 169197 1000000 records: 4897 5180 2341 83838 121901 Analysis: 1) TDB wins on small databases, beating TDB2 by ~15%, NTDB by ~10%. 2) TDB starts to lose when hash chains get 10 long (fetch 10% slower than TDB2/NTDB). 3) TDB does horribly when hash chains get 100 long (fetch 4x slower than NTDB, 5x slower than TDB2, insert about 2-3x slower). 4) TDB2 databases are 40% larger than TDB1. NTDB is about 15% larger than TDB1

2012-06-19 07:13:04 +04:00

								uint32_t ntdb_hash(struct ntdb_context *ntdb, const void *ptr, size_t len)
							

TDB2: Goodbye TDB2, Hello NTDB. This renames everything from tdb2 to ntdb: importantly, we no longer use the tdb_ namespace, so you can link against both ntdb and tdb if you want to. This also enables building of standalone ntdb by the autobuild script. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

2012-06-18 17:00:26 +04:00

{

ntdb: remove hash table trees. TDB2 started with a top-level hash of 1024 entries, divided into 128 groups of 8 buckets. When a bucket filled, the 8 bucket group expanded into pointers into 8 new 64-entry hash tables. When these filled, they expanded in turn, etc. It's a nice idea to automatically expand the hash tables, but it doesn't pay off. Remove it for NTDB. 1) It only beats TDB performance when the database is huge and the TDB hashsize is small. We are about 20% slower on medium-size databases (1000 to 10000 records), worse on really small ones. 2) Since we're 64 bits, our hash tables are already twice as expensive as TDB. 3) Since our hash function is good, it means that all groups tend to fill at the same time, meaning the hash enlarges by a factor of 128 all at once, leading to a very large database at that point. 4) Our efficiency would improve if we enlarged the top level, but that makes our minimum db size even worse: it's already over 8k, and jumps to 1M after about 1000 entries! 5) Making the sub group size larger gives a shallower tree, which performs better, but makes the "hash explosion" problem worse. 6) The code is complicated, having to handle delete and reshuffling groups of hash buckets, and expansion of buckets. 7) We have to handle the case where all the records somehow end up with the same hash value, which requires special code to chain records for that case. On the other hand, it would be nice if we didn't degrade as badly as TDB does when the hash chains get long. This patch removes the hash-growing code, but instead of chaining like TDB does when a bucket fills, we point the bucket to an array of record pointers. Since each on-disk NTDB pointer contains some hash bits from the record (we steal the upper 8 bits of the offset), 99.5% of the time we don't need to load the record to determine if it matches. This makes an array of offsets much more cache-friendly than a linked list. Here are the times (in ns) for tdb_store of N records, tdb_store of N records the second time, and a fetch of all N records. I've also included the final database size and the smbtorture local.[n]tdb_speed results. Benchmark details: 1) Compiled with -O2. 2) assert() was disabled in TDB2 and NTDB. 3) The "optimize fetch" patch was applied to NTDB. 10 runs, using tmpfs (otherwise massive swapping as db hits ~30M, despite plenty of RAM). Insert Re-ins Fetch Size dbspeed (nsec) (nsec) (nsec) (Kb) (ops/sec) TDB (10000 hashsize): 100 records: 3882 3320 1609 53 203204 1000 records: 3651 3281 1571 115 218021 10000 records: 3404 3326 1595 880 202874 100000 records: 4317 3825 2097 8262 126811 1000000 records: 11568 11578 9320 77005 25046 TDB2 (1024 hashsize, expandable): 100 records: 3867 3329 1699 17 187100 1000 records: 4040 3249 1639 154 186255 10000 records: 4143 3300 1695 1226 185110 100000 records: 4481 3425 1800 17848 163483 1000000 records: 4055 3534 1878 106386 160774 NTDB (8192 hashsize) 100 records: 4259 3376 1692 82 190852 1000 records: 3640 3275 1566 130 195106 10000 records: 4337 3438 1614 773 188362 100000 records: 4750 5165 1746 9001 169197 1000000 records: 4897 5180 2341 83838 121901 Analysis: 1) TDB wins on small databases, beating TDB2 by ~15%, NTDB by ~10%. 2) TDB starts to lose when hash chains get 10 long (fetch 10% slower than TDB2/NTDB). 3) TDB does horribly when hash chains get 100 long (fetch 4x slower than NTDB, 5x slower than TDB2, insert about 2-3x slower). 4) TDB2 databases are 40% larger than TDB1. NTDB is about 15% larger than TDB1

2012-06-19 07:13:04 +04:00

												 const struct ntdb_used_record *rec,
							

2012-06-19 07:13:09 +04:00

												 const NTDB_DATA *key,
							

TDB2: Goodbye TDB2, Hello NTDB. This renames everything from tdb2 to ntdb: importantly, we no longer use the tdb_ namespace, so you can link against both ntdb and tdb if you want to. This also enables building of standalone ntdb by the autobuild script. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

2012-06-18 17:00:26 +04:00

{

2012-06-19 07:13:09 +04:00

									rkey = ntdb_access_read(ntdb, off + sizeof(*rec),
							

TDB2: Goodbye TDB2, Hello NTDB. This renames everything from tdb2 to ntdb: importantly, we no longer use the tdb_ namespace, so you can link against both ntdb and tdb if you want to. This also enables building of standalone ntdb by the autobuild script. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

2012-06-18 17:00:26 +04:00

									if (NTDB_PTR_IS_ERR(rkey)) {
							

2012-06-19 07:13:09 +04:00

									if (memcmp(rkey, key->dptr, key->dsize) == 0) {
							

TDB2: Goodbye TDB2, Hello NTDB. This renames everything from tdb2 to ntdb: importantly, we no longer use the tdb_ namespace, so you can link against both ntdb and tdb if you want to. This also enables building of standalone ntdb by the autobuild script. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

2012-06-18 17:00:26 +04:00

									ntdb_access_release(ntdb, rkey);
							

ntdb: remove hash table trees. TDB2 started with a top-level hash of 1024 entries, divided into 128 groups of 8 buckets. When a bucket filled, the 8 bucket group expanded into pointers into 8 new 64-entry hash tables. When these filled, they expanded in turn, etc. It's a nice idea to automatically expand the hash tables, but it doesn't pay off. Remove it for NTDB. 1) It only beats TDB performance when the database is huge and the TDB hashsize is small. We are about 20% slower on medium-size databases (1000 to 10000 records), worse on really small ones. 2) Since we're 64 bits, our hash tables are already twice as expensive as TDB. 3) Since our hash function is good, it means that all groups tend to fill at the same time, meaning the hash enlarges by a factor of 128 all at once, leading to a very large database at that point. 4) Our efficiency would improve if we enlarged the top level, but that makes our minimum db size even worse: it's already over 8k, and jumps to 1M after about 1000 entries! 5) Making the sub group size larger gives a shallower tree, which performs better, but makes the "hash explosion" problem worse. 6) The code is complicated, having to handle delete and reshuffling groups of hash buckets, and expansion of buckets. 7) We have to handle the case where all the records somehow end up with the same hash value, which requires special code to chain records for that case. On the other hand, it would be nice if we didn't degrade as badly as TDB does when the hash chains get long. This patch removes the hash-growing code, but instead of chaining like TDB does when a bucket fills, we point the bucket to an array of record pointers. Since each on-disk NTDB pointer contains some hash bits from the record (we steal the upper 8 bits of the offset), 99.5% of the time we don't need to load the record to determine if it matches. This makes an array of offsets much more cache-friendly than a linked list. Here are the times (in ns) for tdb_store of N records, tdb_store of N records the second time, and a fetch of all N records. I've also included the final database size and the smbtorture local.[n]tdb_speed results. Benchmark details: 1) Compiled with -O2. 2) assert() was disabled in TDB2 and NTDB. 3) The "optimize fetch" patch was applied to NTDB. 10 runs, using tmpfs (otherwise massive swapping as db hits ~30M, despite plenty of RAM). Insert Re-ins Fetch Size dbspeed (nsec) (nsec) (nsec) (Kb) (ops/sec) TDB (10000 hashsize): 100 records: 3882 3320 1609 53 203204 1000 records: 3651 3281 1571 115 218021 10000 records: 3404 3326 1595 880 202874 100000 records: 4317 3825 2097 8262 126811 1000000 records: 11568 11578 9320 77005 25046 TDB2 (1024 hashsize, expandable): 100 records: 3867 3329 1699 17 187100 1000 records: 4040 3249 1639 154 186255 10000 records: 4143 3300 1695 1226 185110 100000 records: 4481 3425 1800 17848 163483 1000000 records: 4055 3534 1878 106386 160774 NTDB (8192 hashsize) 100 records: 4259 3376 1692 82 190852 1000 records: 3640 3275 1566 130 195106 10000 records: 4337 3438 1614 773 188362 100000 records: 4750 5165 1746 9001 169197 1000000 records: 4897 5180 2341 83838 121901 Analysis: 1) TDB wins on small databases, beating TDB2 by ~15%, NTDB by ~10%. 2) TDB starts to lose when hash chains get 10 long (fetch 10% slower than TDB2/NTDB). 3) TDB does horribly when hash chains get 100 long (fetch 4x slower than NTDB, 5x slower than TDB2, insert about 2-3x slower). 4) TDB2 databases are 40% larger than TDB1. NTDB is about 15% larger than TDB1

2012-06-19 07:13:04 +04:00

											   uint32_t hash,
							

ntdb: take advantage of direct access across expand. This means we no longer have to unmap if we want to compare a record. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

2012-06-22 04:14:41 +04:00

											   const char **rptr)
							

TDB2: Goodbye TDB2, Hello NTDB. This renames everything from tdb2 to ntdb: importantly, we no longer use the tdb_ namespace, so you can link against both ntdb and tdb if you want to. This also enables building of standalone ntdb by the autobuild script. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

2012-06-18 17:00:26 +04:00

{

ntdb: remove hash table trees. TDB2 started with a top-level hash of 1024 entries, divided into 128 groups of 8 buckets. When a bucket filled, the 8 bucket group expanded into pointers into 8 new 64-entry hash tables. When these filled, they expanded in turn, etc. It's a nice idea to automatically expand the hash tables, but it doesn't pay off. Remove it for NTDB. 1) It only beats TDB performance when the database is huge and the TDB hashsize is small. We are about 20% slower on medium-size databases (1000 to 10000 records), worse on really small ones. 2) Since we're 64 bits, our hash tables are already twice as expensive as TDB. 3) Since our hash function is good, it means that all groups tend to fill at the same time, meaning the hash enlarges by a factor of 128 all at once, leading to a very large database at that point. 4) Our efficiency would improve if we enlarged the top level, but that makes our minimum db size even worse: it's already over 8k, and jumps to 1M after about 1000 entries! 5) Making the sub group size larger gives a shallower tree, which performs better, but makes the "hash explosion" problem worse. 6) The code is complicated, having to handle delete and reshuffling groups of hash buckets, and expansion of buckets. 7) We have to handle the case where all the records somehow end up with the same hash value, which requires special code to chain records for that case. On the other hand, it would be nice if we didn't degrade as badly as TDB does when the hash chains get long. This patch removes the hash-growing code, but instead of chaining like TDB does when a bucket fills, we point the bucket to an array of record pointers. Since each on-disk NTDB pointer contains some hash bits from the record (we steal the upper 8 bits of the offset), 99.5% of the time we don't need to load the record to determine if it matches. This makes an array of offsets much more cache-friendly than a linked list. Here are the times (in ns) for tdb_store of N records, tdb_store of N records the second time, and a fetch of all N records. I've also included the final database size and the smbtorture local.[n]tdb_speed results. Benchmark details: 1) Compiled with -O2. 2) assert() was disabled in TDB2 and NTDB. 3) The "optimize fetch" patch was applied to NTDB. 10 runs, using tmpfs (otherwise massive swapping as db hits ~30M, despite plenty of RAM). Insert Re-ins Fetch Size dbspeed (nsec) (nsec) (nsec) (Kb) (ops/sec) TDB (10000 hashsize): 100 records: 3882 3320 1609 53 203204 1000 records: 3651 3281 1571 115 218021 10000 records: 3404 3326 1595 880 202874 100000 records: 4317 3825 2097 8262 126811 1000000 records: 11568 11578 9320 77005 25046 TDB2 (1024 hashsize, expandable): 100 records: 3867 3329 1699 17 187100 1000 records: 4040 3249 1639 154 186255 10000 records: 4143 3300 1695 1226 185110 100000 records: 4481 3425 1800 17848 163483 1000000 records: 4055 3534 1878 106386 160774 NTDB (8192 hashsize) 100 records: 4259 3376 1692 82 190852 1000 records: 3640 3275 1566 130 195106 10000 records: 4337 3438 1614 773 188362 100000 records: 4750 5165 1746 9001 169197 1000000 records: 4897 5180 2341 83838 121901 Analysis: 1) TDB wins on small databases, beating TDB2 by ~15%, NTDB by ~10%. 2) TDB starts to lose when hash chains get 10 long (fetch 10% slower than TDB2/NTDB). 3) TDB does horribly when hash chains get 100 long (fetch 4x slower than NTDB, 5x slower than TDB2, insert about 2-3x slower). 4) TDB2 databases are 40% larger than TDB1. NTDB is about 15% larger than TDB1

2012-06-19 07:13:04 +04:00

									if (bits_from(hash, ntdb->hash_bits, NTDB_OFF_UPPER_STEAL)
							

TDB2: Goodbye TDB2, Hello NTDB. This renames everything from tdb2 to ntdb: importantly, we no longer use the tdb_ namespace, so you can link against both ntdb and tdb if you want to. This also enables building of standalone ntdb by the autobuild script. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

2012-06-18 17:00:26 +04:00

										ntdb->stats.compare_wrong_offsetbits++;
							

2012-06-19 07:13:09 +04:00

									return key_matches(ntdb, rec, off, key, rptr);
							

TDB2: Goodbye TDB2, Hello NTDB. This renames everything from tdb2 to ntdb: importantly, we no longer use the tdb_ namespace, so you can link against both ntdb and tdb if you want to. This also enables building of standalone ntdb by the autobuild script. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

2012-06-18 17:00:26 +04:00

}

ntdb: remove hash table trees. TDB2 started with a top-level hash of 1024 entries, divided into 128 groups of 8 buckets. When a bucket filled, the 8 bucket group expanded into pointers into 8 new 64-entry hash tables. When these filled, they expanded in turn, etc. It's a nice idea to automatically expand the hash tables, but it doesn't pay off. Remove it for NTDB. 1) It only beats TDB performance when the database is huge and the TDB hashsize is small. We are about 20% slower on medium-size databases (1000 to 10000 records), worse on really small ones. 2) Since we're 64 bits, our hash tables are already twice as expensive as TDB. 3) Since our hash function is good, it means that all groups tend to fill at the same time, meaning the hash enlarges by a factor of 128 all at once, leading to a very large database at that point. 4) Our efficiency would improve if we enlarged the top level, but that makes our minimum db size even worse: it's already over 8k, and jumps to 1M after about 1000 entries! 5) Making the sub group size larger gives a shallower tree, which performs better, but makes the "hash explosion" problem worse. 6) The code is complicated, having to handle delete and reshuffling groups of hash buckets, and expansion of buckets. 7) We have to handle the case where all the records somehow end up with the same hash value, which requires special code to chain records for that case. On the other hand, it would be nice if we didn't degrade as badly as TDB does when the hash chains get long. This patch removes the hash-growing code, but instead of chaining like TDB does when a bucket fills, we point the bucket to an array of record pointers. Since each on-disk NTDB pointer contains some hash bits from the record (we steal the upper 8 bits of the offset), 99.5% of the time we don't need to load the record to determine if it matches. This makes an array of offsets much more cache-friendly than a linked list. Here are the times (in ns) for tdb_store of N records, tdb_store of N records the second time, and a fetch of all N records. I've also included the final database size and the smbtorture local.[n]tdb_speed results. Benchmark details: 1) Compiled with -O2. 2) assert() was disabled in TDB2 and NTDB. 3) The "optimize fetch" patch was applied to NTDB. 10 runs, using tmpfs (otherwise massive swapping as db hits ~30M, despite plenty of RAM). Insert Re-ins Fetch Size dbspeed (nsec) (nsec) (nsec) (Kb) (ops/sec) TDB (10000 hashsize): 100 records: 3882 3320 1609 53 203204 1000 records: 3651 3281 1571 115 218021 10000 records: 3404 3326 1595 880 202874 100000 records: 4317 3825 2097 8262 126811 1000000 records: 11568 11578 9320 77005 25046 TDB2 (1024 hashsize, expandable): 100 records: 3867 3329 1699 17 187100 1000 records: 4040 3249 1639 154 186255 10000 records: 4143 3300 1695 1226 185110 100000 records: 4481 3425 1800 17848 163483 1000000 records: 4055 3534 1878 106386 160774 NTDB (8192 hashsize) 100 records: 4259 3376 1692 82 190852 1000 records: 3640 3275 1566 130 195106 10000 records: 4337 3438 1614 773 188362 100000 records: 4750 5165 1746 9001 169197 1000000 records: 4897 5180 2341 83838 121901 Analysis: 1) TDB wins on small databases, beating TDB2 by ~15%, NTDB by ~10%. 2) TDB starts to lose when hash chains get 10 long (fetch 10% slower than TDB2/NTDB). 3) TDB does horribly when hash chains get 100 long (fetch 4x slower than NTDB, 5x slower than TDB2, insert about 2-3x slower). 4) TDB2 databases are 40% larger than TDB1. NTDB is about 15% larger than TDB1

2012-06-19 07:13:04 +04:00

								static bool is_chain(ntdb_off_t val)
							

TDB2: Goodbye TDB2, Hello NTDB. This renames everything from tdb2 to ntdb: importantly, we no longer use the tdb_ namespace, so you can link against both ntdb and tdb if you want to. This also enables building of standalone ntdb by the autobuild script. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

2012-06-18 17:00:26 +04:00

{

ntdb: remove hash table trees. TDB2 started with a top-level hash of 1024 entries, divided into 128 groups of 8 buckets. When a bucket filled, the 8 bucket group expanded into pointers into 8 new 64-entry hash tables. When these filled, they expanded in turn, etc. It's a nice idea to automatically expand the hash tables, but it doesn't pay off. Remove it for NTDB. 1) It only beats TDB performance when the database is huge and the TDB hashsize is small. We are about 20% slower on medium-size databases (1000 to 10000 records), worse on really small ones. 2) Since we're 64 bits, our hash tables are already twice as expensive as TDB. 3) Since our hash function is good, it means that all groups tend to fill at the same time, meaning the hash enlarges by a factor of 128 all at once, leading to a very large database at that point. 4) Our efficiency would improve if we enlarged the top level, but that makes our minimum db size even worse: it's already over 8k, and jumps to 1M after about 1000 entries! 5) Making the sub group size larger gives a shallower tree, which performs better, but makes the "hash explosion" problem worse. 6) The code is complicated, having to handle delete and reshuffling groups of hash buckets, and expansion of buckets. 7) We have to handle the case where all the records somehow end up with the same hash value, which requires special code to chain records for that case. On the other hand, it would be nice if we didn't degrade as badly as TDB does when the hash chains get long. This patch removes the hash-growing code, but instead of chaining like TDB does when a bucket fills, we point the bucket to an array of record pointers. Since each on-disk NTDB pointer contains some hash bits from the record (we steal the upper 8 bits of the offset), 99.5% of the time we don't need to load the record to determine if it matches. This makes an array of offsets much more cache-friendly than a linked list. Here are the times (in ns) for tdb_store of N records, tdb_store of N records the second time, and a fetch of all N records. I've also included the final database size and the smbtorture local.[n]tdb_speed results. Benchmark details: 1) Compiled with -O2. 2) assert() was disabled in TDB2 and NTDB. 3) The "optimize fetch" patch was applied to NTDB. 10 runs, using tmpfs (otherwise massive swapping as db hits ~30M, despite plenty of RAM). Insert Re-ins Fetch Size dbspeed (nsec) (nsec) (nsec) (Kb) (ops/sec) TDB (10000 hashsize): 100 records: 3882 3320 1609 53 203204 1000 records: 3651 3281 1571 115 218021 10000 records: 3404 3326 1595 880 202874 100000 records: 4317 3825 2097 8262 126811 1000000 records: 11568 11578 9320 77005 25046 TDB2 (1024 hashsize, expandable): 100 records: 3867 3329 1699 17 187100 1000 records: 4040 3249 1639 154 186255 10000 records: 4143 3300 1695 1226 185110 100000 records: 4481 3425 1800 17848 163483 1000000 records: 4055 3534 1878 106386 160774 NTDB (8192 hashsize) 100 records: 4259 3376 1692 82 190852 1000 records: 3640 3275 1566 130 195106 10000 records: 4337 3438 1614 773 188362 100000 records: 4750 5165 1746 9001 169197 1000000 records: 4897 5180 2341 83838 121901 Analysis: 1) TDB wins on small databases, beating TDB2 by ~15%, NTDB by ~10%. 2) TDB starts to lose when hash chains get 10 long (fetch 10% slower than TDB2/NTDB). 3) TDB does horribly when hash chains get 100 long (fetch 4x slower than NTDB, 5x slower than TDB2, insert about 2-3x slower). 4) TDB2 databases are 40% larger than TDB1. NTDB is about 15% larger than TDB1

2012-06-19 07:13:04 +04:00

									return val & (1ULL << NTDB_OFF_CHAIN_BIT);
							

TDB2: Goodbye TDB2, Hello NTDB. This renames everything from tdb2 to ntdb: importantly, we no longer use the tdb_ namespace, so you can link against both ntdb and tdb if you want to. This also enables building of standalone ntdb by the autobuild script. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

2012-06-18 17:00:26 +04:00

}

ntdb: remove hash table trees. TDB2 started with a top-level hash of 1024 entries, divided into 128 groups of 8 buckets. When a bucket filled, the 8 bucket group expanded into pointers into 8 new 64-entry hash tables. When these filled, they expanded in turn, etc. It's a nice idea to automatically expand the hash tables, but it doesn't pay off. Remove it for NTDB. 1) It only beats TDB performance when the database is huge and the TDB hashsize is small. We are about 20% slower on medium-size databases (1000 to 10000 records), worse on really small ones. 2) Since we're 64 bits, our hash tables are already twice as expensive as TDB. 3) Since our hash function is good, it means that all groups tend to fill at the same time, meaning the hash enlarges by a factor of 128 all at once, leading to a very large database at that point. 4) Our efficiency would improve if we enlarged the top level, but that makes our minimum db size even worse: it's already over 8k, and jumps to 1M after about 1000 entries! 5) Making the sub group size larger gives a shallower tree, which performs better, but makes the "hash explosion" problem worse. 6) The code is complicated, having to handle delete and reshuffling groups of hash buckets, and expansion of buckets. 7) We have to handle the case where all the records somehow end up with the same hash value, which requires special code to chain records for that case. On the other hand, it would be nice if we didn't degrade as badly as TDB does when the hash chains get long. This patch removes the hash-growing code, but instead of chaining like TDB does when a bucket fills, we point the bucket to an array of record pointers. Since each on-disk NTDB pointer contains some hash bits from the record (we steal the upper 8 bits of the offset), 99.5% of the time we don't need to load the record to determine if it matches. This makes an array of offsets much more cache-friendly than a linked list. Here are the times (in ns) for tdb_store of N records, tdb_store of N records the second time, and a fetch of all N records. I've also included the final database size and the smbtorture local.[n]tdb_speed results. Benchmark details: 1) Compiled with -O2. 2) assert() was disabled in TDB2 and NTDB. 3) The "optimize fetch" patch was applied to NTDB. 10 runs, using tmpfs (otherwise massive swapping as db hits ~30M, despite plenty of RAM). Insert Re-ins Fetch Size dbspeed (nsec) (nsec) (nsec) (Kb) (ops/sec) TDB (10000 hashsize): 100 records: 3882 3320 1609 53 203204 1000 records: 3651 3281 1571 115 218021 10000 records: 3404 3326 1595 880 202874 100000 records: 4317 3825 2097 8262 126811 1000000 records: 11568 11578 9320 77005 25046 TDB2 (1024 hashsize, expandable): 100 records: 3867 3329 1699 17 187100 1000 records: 4040 3249 1639 154 186255 10000 records: 4143 3300 1695 1226 185110 100000 records: 4481 3425 1800 17848 163483 1000000 records: 4055 3534 1878 106386 160774 NTDB (8192 hashsize) 100 records: 4259 3376 1692 82 190852 1000 records: 3640 3275 1566 130 195106 10000 records: 4337 3438 1614 773 188362 100000 records: 4750 5165 1746 9001 169197 1000000 records: 4897 5180 2341 83838 121901 Analysis: 1) TDB wins on small databases, beating TDB2 by ~15%, NTDB by ~10%. 2) TDB starts to lose when hash chains get 10 long (fetch 10% slower than TDB2/NTDB). 3) TDB does horribly when hash chains get 100 long (fetch 4x slower than NTDB, 5x slower than TDB2, insert about 2-3x slower). 4) TDB2 databases are 40% larger than TDB1. NTDB is about 15% larger than TDB1

2012-06-19 07:13:04 +04:00

								static ntdb_off_t hbucket_off(ntdb_off_t base, ntdb_len_t idx)
							

TDB2: Goodbye TDB2, Hello NTDB. This renames everything from tdb2 to ntdb: importantly, we no longer use the tdb_ namespace, so you can link against both ntdb and tdb if you want to. This also enables building of standalone ntdb by the autobuild script. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

2012-06-18 17:00:26 +04:00

{

ntdb: remove hash table trees. TDB2 started with a top-level hash of 1024 entries, divided into 128 groups of 8 buckets. When a bucket filled, the 8 bucket group expanded into pointers into 8 new 64-entry hash tables. When these filled, they expanded in turn, etc. It's a nice idea to automatically expand the hash tables, but it doesn't pay off. Remove it for NTDB. 1) It only beats TDB performance when the database is huge and the TDB hashsize is small. We are about 20% slower on medium-size databases (1000 to 10000 records), worse on really small ones. 2) Since we're 64 bits, our hash tables are already twice as expensive as TDB. 3) Since our hash function is good, it means that all groups tend to fill at the same time, meaning the hash enlarges by a factor of 128 all at once, leading to a very large database at that point. 4) Our efficiency would improve if we enlarged the top level, but that makes our minimum db size even worse: it's already over 8k, and jumps to 1M after about 1000 entries! 5) Making the sub group size larger gives a shallower tree, which performs better, but makes the "hash explosion" problem worse. 6) The code is complicated, having to handle delete and reshuffling groups of hash buckets, and expansion of buckets. 7) We have to handle the case where all the records somehow end up with the same hash value, which requires special code to chain records for that case. On the other hand, it would be nice if we didn't degrade as badly as TDB does when the hash chains get long. This patch removes the hash-growing code, but instead of chaining like TDB does when a bucket fills, we point the bucket to an array of record pointers. Since each on-disk NTDB pointer contains some hash bits from the record (we steal the upper 8 bits of the offset), 99.5% of the time we don't need to load the record to determine if it matches. This makes an array of offsets much more cache-friendly than a linked list. Here are the times (in ns) for tdb_store of N records, tdb_store of N records the second time, and a fetch of all N records. I've also included the final database size and the smbtorture local.[n]tdb_speed results. Benchmark details: 1) Compiled with -O2. 2) assert() was disabled in TDB2 and NTDB. 3) The "optimize fetch" patch was applied to NTDB. 10 runs, using tmpfs (otherwise massive swapping as db hits ~30M, despite plenty of RAM). Insert Re-ins Fetch Size dbspeed (nsec) (nsec) (nsec) (Kb) (ops/sec) TDB (10000 hashsize): 100 records: 3882 3320 1609 53 203204 1000 records: 3651 3281 1571 115 218021 10000 records: 3404 3326 1595 880 202874 100000 records: 4317 3825 2097 8262 126811 1000000 records: 11568 11578 9320 77005 25046 TDB2 (1024 hashsize, expandable): 100 records: 3867 3329 1699 17 187100 1000 records: 4040 3249 1639 154 186255 10000 records: 4143 3300 1695 1226 185110 100000 records: 4481 3425 1800 17848 163483 1000000 records: 4055 3534 1878 106386 160774 NTDB (8192 hashsize) 100 records: 4259 3376 1692 82 190852 1000 records: 3640 3275 1566 130 195106 10000 records: 4337 3438 1614 773 188362 100000 records: 4750 5165 1746 9001 169197 1000000 records: 4897 5180 2341 83838 121901 Analysis: 1) TDB wins on small databases, beating TDB2 by ~15%, NTDB by ~10%. 2) TDB starts to lose when hash chains get 10 long (fetch 10% slower than TDB2/NTDB). 3) TDB does horribly when hash chains get 100 long (fetch 4x slower than NTDB, 5x slower than TDB2, insert about 2-3x slower). 4) TDB2 databases are 40% larger than TDB1. NTDB is about 15% larger than TDB1

2012-06-19 07:13:04 +04:00

									return base + sizeof(struct ntdb_used_record)
							

TDB2: Goodbye TDB2, Hello NTDB. This renames everything from tdb2 to ntdb: importantly, we no longer use the tdb_ namespace, so you can link against both ntdb and tdb if you want to. This also enables building of standalone ntdb by the autobuild script. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

2012-06-18 17:00:26 +04:00

}

ntdb: remove hash table trees. TDB2 started with a top-level hash of 1024 entries, divided into 128 groups of 8 buckets. When a bucket filled, the 8 bucket group expanded into pointers into 8 new 64-entry hash tables. When these filled, they expanded in turn, etc. It's a nice idea to automatically expand the hash tables, but it doesn't pay off. Remove it for NTDB. 1) It only beats TDB performance when the database is huge and the TDB hashsize is small. We are about 20% slower on medium-size databases (1000 to 10000 records), worse on really small ones. 2) Since we're 64 bits, our hash tables are already twice as expensive as TDB. 3) Since our hash function is good, it means that all groups tend to fill at the same time, meaning the hash enlarges by a factor of 128 all at once, leading to a very large database at that point. 4) Our efficiency would improve if we enlarged the top level, but that makes our minimum db size even worse: it's already over 8k, and jumps to 1M after about 1000 entries! 5) Making the sub group size larger gives a shallower tree, which performs better, but makes the "hash explosion" problem worse. 6) The code is complicated, having to handle delete and reshuffling groups of hash buckets, and expansion of buckets. 7) We have to handle the case where all the records somehow end up with the same hash value, which requires special code to chain records for that case. On the other hand, it would be nice if we didn't degrade as badly as TDB does when the hash chains get long. This patch removes the hash-growing code, but instead of chaining like TDB does when a bucket fills, we point the bucket to an array of record pointers. Since each on-disk NTDB pointer contains some hash bits from the record (we steal the upper 8 bits of the offset), 99.5% of the time we don't need to load the record to determine if it matches. This makes an array of offsets much more cache-friendly than a linked list. Here are the times (in ns) for tdb_store of N records, tdb_store of N records the second time, and a fetch of all N records. I've also included the final database size and the smbtorture local.[n]tdb_speed results. Benchmark details: 1) Compiled with -O2. 2) assert() was disabled in TDB2 and NTDB. 3) The "optimize fetch" patch was applied to NTDB. 10 runs, using tmpfs (otherwise massive swapping as db hits ~30M, despite plenty of RAM). Insert Re-ins Fetch Size dbspeed (nsec) (nsec) (nsec) (Kb) (ops/sec) TDB (10000 hashsize): 100 records: 3882 3320 1609 53 203204 1000 records: 3651 3281 1571 115 218021 10000 records: 3404 3326 1595 880 202874 100000 records: 4317 3825 2097 8262 126811 1000000 records: 11568 11578 9320 77005 25046 TDB2 (1024 hashsize, expandable): 100 records: 3867 3329 1699 17 187100 1000 records: 4040 3249 1639 154 186255 10000 records: 4143 3300 1695 1226 185110 100000 records: 4481 3425 1800 17848 163483 1000000 records: 4055 3534 1878 106386 160774 NTDB (8192 hashsize) 100 records: 4259 3376 1692 82 190852 1000 records: 3640 3275 1566 130 195106 10000 records: 4337 3438 1614 773 188362 100000 records: 4750 5165 1746 9001 169197 1000000 records: 4897 5180 2341 83838 121901 Analysis: 1) TDB wins on small databases, beating TDB2 by ~15%, NTDB by ~10%. 2) TDB starts to lose when hash chains get 10 long (fetch 10% slower than TDB2/NTDB). 3) TDB does horribly when hash chains get 100 long (fetch 4x slower than NTDB, 5x slower than TDB2, insert about 2-3x slower). 4) TDB2 databases are 40% larger than TDB1. NTDB is about 15% larger than TDB1

2012-06-19 07:13:04 +04:00

								 * Otherwise, hinfo is filled in.
							

TDB2: Goodbye TDB2, Hello NTDB. This renames everything from tdb2 to ntdb: importantly, we no longer use the tdb_ namespace, so you can link against both ntdb and tdb if you want to. This also enables building of standalone ntdb by the autobuild script. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

2012-06-18 17:00:26 +04:00

								 * If not found, the return value is 0.
							

2012-06-19 07:13:09 +04:00

											 NTDB_DATA key,
							

TDB2: Goodbye TDB2, Hello NTDB. This renames everything from tdb2 to ntdb: importantly, we no longer use the tdb_ namespace, so you can link against both ntdb and tdb if you want to. This also enables building of standalone ntdb by the autobuild script. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

2012-06-18 17:00:26 +04:00

{

ntdb: remove hash table trees. TDB2 started with a top-level hash of 1024 entries, divided into 128 groups of 8 buckets. When a bucket filled, the 8 bucket group expanded into pointers into 8 new 64-entry hash tables. When these filled, they expanded in turn, etc. It's a nice idea to automatically expand the hash tables, but it doesn't pay off. Remove it for NTDB. 1) It only beats TDB performance when the database is huge and the TDB hashsize is small. We are about 20% slower on medium-size databases (1000 to 10000 records), worse on really small ones. 2) Since we're 64 bits, our hash tables are already twice as expensive as TDB. 3) Since our hash function is good, it means that all groups tend to fill at the same time, meaning the hash enlarges by a factor of 128 all at once, leading to a very large database at that point. 4) Our efficiency would improve if we enlarged the top level, but that makes our minimum db size even worse: it's already over 8k, and jumps to 1M after about 1000 entries! 5) Making the sub group size larger gives a shallower tree, which performs better, but makes the "hash explosion" problem worse. 6) The code is complicated, having to handle delete and reshuffling groups of hash buckets, and expansion of buckets. 7) We have to handle the case where all the records somehow end up with the same hash value, which requires special code to chain records for that case. On the other hand, it would be nice if we didn't degrade as badly as TDB does when the hash chains get long. This patch removes the hash-growing code, but instead of chaining like TDB does when a bucket fills, we point the bucket to an array of record pointers. Since each on-disk NTDB pointer contains some hash bits from the record (we steal the upper 8 bits of the offset), 99.5% of the time we don't need to load the record to determine if it matches. This makes an array of offsets much more cache-friendly than a linked list. Here are the times (in ns) for tdb_store of N records, tdb_store of N records the second time, and a fetch of all N records. I've also included the final database size and the smbtorture local.[n]tdb_speed results. Benchmark details: 1) Compiled with -O2. 2) assert() was disabled in TDB2 and NTDB. 3) The "optimize fetch" patch was applied to NTDB. 10 runs, using tmpfs (otherwise massive swapping as db hits ~30M, despite plenty of RAM). Insert Re-ins Fetch Size dbspeed (nsec) (nsec) (nsec) (Kb) (ops/sec) TDB (10000 hashsize): 100 records: 3882 3320 1609 53 203204 1000 records: 3651 3281 1571 115 218021 10000 records: 3404 3326 1595 880 202874 100000 records: 4317 3825 2097 8262 126811 1000000 records: 11568 11578 9320 77005 25046 TDB2 (1024 hashsize, expandable): 100 records: 3867 3329 1699 17 187100 1000 records: 4040 3249 1639 154 186255 10000 records: 4143 3300 1695 1226 185110 100000 records: 4481 3425 1800 17848 163483 1000000 records: 4055 3534 1878 106386 160774 NTDB (8192 hashsize) 100 records: 4259 3376 1692 82 190852 1000 records: 3640 3275 1566 130 195106 10000 records: 4337 3438 1614 773 188362 100000 records: 4750 5165 1746 9001 169197 1000000 records: 4897 5180 2341 83838 121901 Analysis: 1) TDB wins on small databases, beating TDB2 by ~15%, NTDB by ~10%. 2) TDB starts to lose when hash chains get 10 long (fetch 10% slower than TDB2/NTDB). 3) TDB does horribly when hash chains get 100 long (fetch 4x slower than NTDB, 5x slower than TDB2, insert about 2-3x slower). 4) TDB2 databases are 40% larger than TDB1. NTDB is about 15% larger than TDB1

2012-06-19 07:13:04 +04:00

									ntdb_off_t off, val;
							

TDB2: Goodbye TDB2, Hello NTDB. This renames everything from tdb2 to ntdb: importantly, we no longer use the tdb_ namespace, so you can link against both ntdb and tdb if you want to. This also enables building of standalone ntdb by the autobuild script. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

2012-06-18 17:00:26 +04:00

									enum NTDB_ERROR ecode;
							

ntdb: remove hash table trees. TDB2 started with a top-level hash of 1024 entries, divided into 128 groups of 8 buckets. When a bucket filled, the 8 bucket group expanded into pointers into 8 new 64-entry hash tables. When these filled, they expanded in turn, etc. It's a nice idea to automatically expand the hash tables, but it doesn't pay off. Remove it for NTDB. 1) It only beats TDB performance when the database is huge and the TDB hashsize is small. We are about 20% slower on medium-size databases (1000 to 10000 records), worse on really small ones. 2) Since we're 64 bits, our hash tables are already twice as expensive as TDB. 3) Since our hash function is good, it means that all groups tend to fill at the same time, meaning the hash enlarges by a factor of 128 all at once, leading to a very large database at that point. 4) Our efficiency would improve if we enlarged the top level, but that makes our minimum db size even worse: it's already over 8k, and jumps to 1M after about 1000 entries! 5) Making the sub group size larger gives a shallower tree, which performs better, but makes the "hash explosion" problem worse. 6) The code is complicated, having to handle delete and reshuffling groups of hash buckets, and expansion of buckets. 7) We have to handle the case where all the records somehow end up with the same hash value, which requires special code to chain records for that case. On the other hand, it would be nice if we didn't degrade as badly as TDB does when the hash chains get long. This patch removes the hash-growing code, but instead of chaining like TDB does when a bucket fills, we point the bucket to an array of record pointers. Since each on-disk NTDB pointer contains some hash bits from the record (we steal the upper 8 bits of the offset), 99.5% of the time we don't need to load the record to determine if it matches. This makes an array of offsets much more cache-friendly than a linked list. Here are the times (in ns) for tdb_store of N records, tdb_store of N records the second time, and a fetch of all N records. I've also included the final database size and the smbtorture local.[n]tdb_speed results. Benchmark details: 1) Compiled with -O2. 2) assert() was disabled in TDB2 and NTDB. 3) The "optimize fetch" patch was applied to NTDB. 10 runs, using tmpfs (otherwise massive swapping as db hits ~30M, despite plenty of RAM). Insert Re-ins Fetch Size dbspeed (nsec) (nsec) (nsec) (Kb) (ops/sec) TDB (10000 hashsize): 100 records: 3882 3320 1609 53 203204 1000 records: 3651 3281 1571 115 218021 10000 records: 3404 3326 1595 880 202874 100000 records: 4317 3825 2097 8262 126811 1000000 records: 11568 11578 9320 77005 25046 TDB2 (1024 hashsize, expandable): 100 records: 3867 3329 1699 17 187100 1000 records: 4040 3249 1639 154 186255 10000 records: 4143 3300 1695 1226 185110 100000 records: 4481 3425 1800 17848 163483 1000000 records: 4055 3534 1878 106386 160774 NTDB (8192 hashsize) 100 records: 4259 3376 1692 82 190852 1000 records: 3640 3275 1566 130 195106 10000 records: 4337 3438 1614 773 188362 100000 records: 4750 5165 1746 9001 169197 1000000 records: 4897 5180 2341 83838 121901 Analysis: 1) TDB wins on small databases, beating TDB2 by ~15%, NTDB by ~10%. 2) TDB starts to lose when hash chains get 10 long (fetch 10% slower than TDB2/NTDB). 3) TDB does horribly when hash chains get 100 long (fetch 4x slower than NTDB, 5x slower than TDB2, insert about 2-3x slower). 4) TDB2 databases are 40% larger than TDB1. NTDB is about 15% larger than TDB1

2012-06-19 07:13:04 +04:00

									struct ntdb_used_record chdr;
							

TDB2: Goodbye TDB2, Hello NTDB. This renames everything from tdb2 to ntdb: importantly, we no longer use the tdb_ namespace, so you can link against both ntdb and tdb if you want to. This also enables building of standalone ntdb by the autobuild script. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

2012-06-18 17:00:26 +04:00

ntdb: remove hash table trees. TDB2 started with a top-level hash of 1024 entries, divided into 128 groups of 8 buckets. When a bucket filled, the 8 bucket group expanded into pointers into 8 new 64-entry hash tables. When these filled, they expanded in turn, etc. It's a nice idea to automatically expand the hash tables, but it doesn't pay off. Remove it for NTDB. 1) It only beats TDB performance when the database is huge and the TDB hashsize is small. We are about 20% slower on medium-size databases (1000 to 10000 records), worse on really small ones. 2) Since we're 64 bits, our hash tables are already twice as expensive as TDB. 3) Since our hash function is good, it means that all groups tend to fill at the same time, meaning the hash enlarges by a factor of 128 all at once, leading to a very large database at that point. 4) Our efficiency would improve if we enlarged the top level, but that makes our minimum db size even worse: it's already over 8k, and jumps to 1M after about 1000 entries! 5) Making the sub group size larger gives a shallower tree, which performs better, but makes the "hash explosion" problem worse. 6) The code is complicated, having to handle delete and reshuffling groups of hash buckets, and expansion of buckets. 7) We have to handle the case where all the records somehow end up with the same hash value, which requires special code to chain records for that case. On the other hand, it would be nice if we didn't degrade as badly as TDB does when the hash chains get long. This patch removes the hash-growing code, but instead of chaining like TDB does when a bucket fills, we point the bucket to an array of record pointers. Since each on-disk NTDB pointer contains some hash bits from the record (we steal the upper 8 bits of the offset), 99.5% of the time we don't need to load the record to determine if it matches. This makes an array of offsets much more cache-friendly than a linked list. Here are the times (in ns) for tdb_store of N records, tdb_store of N records the second time, and a fetch of all N records. I've also included the final database size and the smbtorture local.[n]tdb_speed results. Benchmark details: 1) Compiled with -O2. 2) assert() was disabled in TDB2 and NTDB. 3) The "optimize fetch" patch was applied to NTDB. 10 runs, using tmpfs (otherwise massive swapping as db hits ~30M, despite plenty of RAM). Insert Re-ins Fetch Size dbspeed (nsec) (nsec) (nsec) (Kb) (ops/sec) TDB (10000 hashsize): 100 records: 3882 3320 1609 53 203204 1000 records: 3651 3281 1571 115 218021 10000 records: 3404 3326 1595 880 202874 100000 records: 4317 3825 2097 8262 126811 1000000 records: 11568 11578 9320 77005 25046 TDB2 (1024 hashsize, expandable): 100 records: 3867 3329 1699 17 187100 1000 records: 4040 3249 1639 154 186255 10000 records: 4143 3300 1695 1226 185110 100000 records: 4481 3425 1800 17848 163483 1000000 records: 4055 3534 1878 106386 160774 NTDB (8192 hashsize) 100 records: 4259 3376 1692 82 190852 1000 records: 3640 3275 1566 130 195106 10000 records: 4337 3438 1614 773 188362 100000 records: 4750 5165 1746 9001 169197 1000000 records: 4897 5180 2341 83838 121901 Analysis: 1) TDB wins on small databases, beating TDB2 by ~15%, NTDB by ~10%. 2) TDB starts to lose when hash chains get 10 long (fetch 10% slower than TDB2/NTDB). 3) TDB does horribly when hash chains get 100 long (fetch 4x slower than NTDB, 5x slower than TDB2, insert about 2-3x slower). 4) TDB2 databases are 40% larger than TDB1. NTDB is about 15% larger than TDB1

2012-06-19 07:13:04 +04:00

									h->table = NTDB_HASH_OFFSET;
							

TDB2: Goodbye TDB2, Hello NTDB. This renames everything from tdb2 to ntdb: importantly, we no longer use the tdb_ namespace, so you can link against both ntdb and tdb if you want to. This also enables building of standalone ntdb by the autobuild script. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

2012-06-18 17:00:26 +04:00

									if (ecode != NTDB_SUCCESS) {
							

ntdb: remove hash table trees. TDB2 started with a top-level hash of 1024 entries, divided into 128 groups of 8 buckets. When a bucket filled, the 8 bucket group expanded into pointers into 8 new 64-entry hash tables. When these filled, they expanded in turn, etc. It's a nice idea to automatically expand the hash tables, but it doesn't pay off. Remove it for NTDB. 1) It only beats TDB performance when the database is huge and the TDB hashsize is small. We are about 20% slower on medium-size databases (1000 to 10000 records), worse on really small ones. 2) Since we're 64 bits, our hash tables are already twice as expensive as TDB. 3) Since our hash function is good, it means that all groups tend to fill at the same time, meaning the hash enlarges by a factor of 128 all at once, leading to a very large database at that point. 4) Our efficiency would improve if we enlarged the top level, but that makes our minimum db size even worse: it's already over 8k, and jumps to 1M after about 1000 entries! 5) Making the sub group size larger gives a shallower tree, which performs better, but makes the "hash explosion" problem worse. 6) The code is complicated, having to handle delete and reshuffling groups of hash buckets, and expansion of buckets. 7) We have to handle the case where all the records somehow end up with the same hash value, which requires special code to chain records for that case. On the other hand, it would be nice if we didn't degrade as badly as TDB does when the hash chains get long. This patch removes the hash-growing code, but instead of chaining like TDB does when a bucket fills, we point the bucket to an array of record pointers. Since each on-disk NTDB pointer contains some hash bits from the record (we steal the upper 8 bits of the offset), 99.5% of the time we don't need to load the record to determine if it matches. This makes an array of offsets much more cache-friendly than a linked list. Here are the times (in ns) for tdb_store of N records, tdb_store of N records the second time, and a fetch of all N records. I've also included the final database size and the smbtorture local.[n]tdb_speed results. Benchmark details: 1) Compiled with -O2. 2) assert() was disabled in TDB2 and NTDB. 3) The "optimize fetch" patch was applied to NTDB. 10 runs, using tmpfs (otherwise massive swapping as db hits ~30M, despite plenty of RAM). Insert Re-ins Fetch Size dbspeed (nsec) (nsec) (nsec) (Kb) (ops/sec) TDB (10000 hashsize): 100 records: 3882 3320 1609 53 203204 1000 records: 3651 3281 1571 115 218021 10000 records: 3404 3326 1595 880 202874 100000 records: 4317 3825 2097 8262 126811 1000000 records: 11568 11578 9320 77005 25046 TDB2 (1024 hashsize, expandable): 100 records: 3867 3329 1699 17 187100 1000 records: 4040 3249 1639 154 186255 10000 records: 4143 3300 1695 1226 185110 100000 records: 4481 3425 1800 17848 163483 1000000 records: 4055 3534 1878 106386 160774 NTDB (8192 hashsize) 100 records: 4259 3376 1692 82 190852 1000 records: 3640 3275 1566 130 195106 10000 records: 4337 3438 1614 773 188362 100000 records: 4750 5165 1746 9001 169197 1000000 records: 4897 5180 2341 83838 121901 Analysis: 1) TDB wins on small databases, beating TDB2 by ~15%, NTDB by ~10%. 2) TDB starts to lose when hash chains get 10 long (fetch 10% slower than TDB2/NTDB). 3) TDB does horribly when hash chains get 100 long (fetch 4x slower than NTDB, 5x slower than TDB2, insert about 2-3x slower). 4) TDB2 databases are 40% larger than TDB1. NTDB is about 15% larger than TDB1

2012-06-19 07:13:04 +04:00

									off = hbucket_off(h->table, h->bucket);
							

TDB2: Goodbye TDB2, Hello NTDB. This renames everything from tdb2 to ntdb: importantly, we no longer use the tdb_ namespace, so you can link against both ntdb and tdb if you want to. This also enables building of standalone ntdb by the autobuild script. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

2012-06-18 17:00:26 +04:00

}

ntdb: remove hash table trees. TDB2 started with a top-level hash of 1024 entries, divided into 128 groups of 8 buckets. When a bucket filled, the 8 bucket group expanded into pointers into 8 new 64-entry hash tables. When these filled, they expanded in turn, etc. It's a nice idea to automatically expand the hash tables, but it doesn't pay off. Remove it for NTDB. 1) It only beats TDB performance when the database is huge and the TDB hashsize is small. We are about 20% slower on medium-size databases (1000 to 10000 records), worse on really small ones. 2) Since we're 64 bits, our hash tables are already twice as expensive as TDB. 3) Since our hash function is good, it means that all groups tend to fill at the same time, meaning the hash enlarges by a factor of 128 all at once, leading to a very large database at that point. 4) Our efficiency would improve if we enlarged the top level, but that makes our minimum db size even worse: it's already over 8k, and jumps to 1M after about 1000 entries! 5) Making the sub group size larger gives a shallower tree, which performs better, but makes the "hash explosion" problem worse. 6) The code is complicated, having to handle delete and reshuffling groups of hash buckets, and expansion of buckets. 7) We have to handle the case where all the records somehow end up with the same hash value, which requires special code to chain records for that case. On the other hand, it would be nice if we didn't degrade as badly as TDB does when the hash chains get long. This patch removes the hash-growing code, but instead of chaining like TDB does when a bucket fills, we point the bucket to an array of record pointers. Since each on-disk NTDB pointer contains some hash bits from the record (we steal the upper 8 bits of the offset), 99.5% of the time we don't need to load the record to determine if it matches. This makes an array of offsets much more cache-friendly than a linked list. Here are the times (in ns) for tdb_store of N records, tdb_store of N records the second time, and a fetch of all N records. I've also included the final database size and the smbtorture local.[n]tdb_speed results. Benchmark details: 1) Compiled with -O2. 2) assert() was disabled in TDB2 and NTDB. 3) The "optimize fetch" patch was applied to NTDB. 10 runs, using tmpfs (otherwise massive swapping as db hits ~30M, despite plenty of RAM). Insert Re-ins Fetch Size dbspeed (nsec) (nsec) (nsec) (Kb) (ops/sec) TDB (10000 hashsize): 100 records: 3882 3320 1609 53 203204 1000 records: 3651 3281 1571 115 218021 10000 records: 3404 3326 1595 880 202874 100000 records: 4317 3825 2097 8262 126811 1000000 records: 11568 11578 9320 77005 25046 TDB2 (1024 hashsize, expandable): 100 records: 3867 3329 1699 17 187100 1000 records: 4040 3249 1639 154 186255 10000 records: 4143 3300 1695 1226 185110 100000 records: 4481 3425 1800 17848 163483 1000000 records: 4055 3534 1878 106386 160774 NTDB (8192 hashsize) 100 records: 4259 3376 1692 82 190852 1000 records: 3640 3275 1566 130 195106 10000 records: 4337 3438 1614 773 188362 100000 records: 4750 5165 1746 9001 169197 1000000 records: 4897 5180 2341 83838 121901 Analysis: 1) TDB wins on small databases, beating TDB2 by ~15%, NTDB by ~10%. 2) TDB starts to lose when hash chains get 10 long (fetch 10% slower than TDB2/NTDB). 3) TDB does horribly when hash chains get 100 long (fetch 4x slower than NTDB, 5x slower than TDB2, insert about 2-3x slower). 4) TDB2 databases are 40% larger than TDB1. NTDB is about 15% larger than TDB1

2012-06-19 07:13:04 +04:00

/* Directly in hash table? */

2012-06-19 07:13:09 +04:00

									if (!likely(is_chain(val))) {
							

ntdb: remove hash table trees. TDB2 started with a top-level hash of 1024 entries, divided into 128 groups of 8 buckets. When a bucket filled, the 8 bucket group expanded into pointers into 8 new 64-entry hash tables. When these filled, they expanded in turn, etc. It's a nice idea to automatically expand the hash tables, but it doesn't pay off. Remove it for NTDB. 1) It only beats TDB performance when the database is huge and the TDB hashsize is small. We are about 20% slower on medium-size databases (1000 to 10000 records), worse on really small ones. 2) Since we're 64 bits, our hash tables are already twice as expensive as TDB. 3) Since our hash function is good, it means that all groups tend to fill at the same time, meaning the hash enlarges by a factor of 128 all at once, leading to a very large database at that point. 4) Our efficiency would improve if we enlarged the top level, but that makes our minimum db size even worse: it's already over 8k, and jumps to 1M after about 1000 entries! 5) Making the sub group size larger gives a shallower tree, which performs better, but makes the "hash explosion" problem worse. 6) The code is complicated, having to handle delete and reshuffling groups of hash buckets, and expansion of buckets. 7) We have to handle the case where all the records somehow end up with the same hash value, which requires special code to chain records for that case. On the other hand, it would be nice if we didn't degrade as badly as TDB does when the hash chains get long. This patch removes the hash-growing code, but instead of chaining like TDB does when a bucket fills, we point the bucket to an array of record pointers. Since each on-disk NTDB pointer contains some hash bits from the record (we steal the upper 8 bits of the offset), 99.5% of the time we don't need to load the record to determine if it matches. This makes an array of offsets much more cache-friendly than a linked list. Here are the times (in ns) for tdb_store of N records, tdb_store of N records the second time, and a fetch of all N records. I've also included the final database size and the smbtorture local.[n]tdb_speed results. Benchmark details: 1) Compiled with -O2. 2) assert() was disabled in TDB2 and NTDB. 3) The "optimize fetch" patch was applied to NTDB. 10 runs, using tmpfs (otherwise massive swapping as db hits ~30M, despite plenty of RAM). Insert Re-ins Fetch Size dbspeed (nsec) (nsec) (nsec) (Kb) (ops/sec) TDB (10000 hashsize): 100 records: 3882 3320 1609 53 203204 1000 records: 3651 3281 1571 115 218021 10000 records: 3404 3326 1595 880 202874 100000 records: 4317 3825 2097 8262 126811 1000000 records: 11568 11578 9320 77005 25046 TDB2 (1024 hashsize, expandable): 100 records: 3867 3329 1699 17 187100 1000 records: 4040 3249 1639 154 186255 10000 records: 4143 3300 1695 1226 185110 100000 records: 4481 3425 1800 17848 163483 1000000 records: 4055 3534 1878 106386 160774 NTDB (8192 hashsize) 100 records: 4259 3376 1692 82 190852 1000 records: 3640 3275 1566 130 195106 10000 records: 4337 3438 1614 773 188362 100000 records: 4750 5165 1746 9001 169197 1000000 records: 4897 5180 2341 83838 121901 Analysis: 1) TDB wins on small databases, beating TDB2 by ~15%, NTDB by ~10%. 2) TDB starts to lose when hash chains get 10 long (fetch 10% slower than TDB2/NTDB). 3) TDB does horribly when hash chains get 100 long (fetch 4x slower than NTDB, 5x slower than TDB2, insert about 2-3x slower). 4) TDB2 databases are 40% larger than TDB1. NTDB is about 15% larger than TDB1

2012-06-19 07:13:04 +04:00

										if (val) {
							

ntdb: take advantage of direct access across expand. This means we no longer have to unmap if we want to compare a record. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

2012-06-22 04:14:41 +04:00

											berr = match(ntdb, h->h, &key, val, rec, rptr);
							

ntdb: remove hash table trees. TDB2 started with a top-level hash of 1024 entries, divided into 128 groups of 8 buckets. When a bucket filled, the 8 bucket group expanded into pointers into 8 new 64-entry hash tables. When these filled, they expanded in turn, etc. It's a nice idea to automatically expand the hash tables, but it doesn't pay off. Remove it for NTDB. 1) It only beats TDB performance when the database is huge and the TDB hashsize is small. We are about 20% slower on medium-size databases (1000 to 10000 records), worse on really small ones. 2) Since we're 64 bits, our hash tables are already twice as expensive as TDB. 3) Since our hash function is good, it means that all groups tend to fill at the same time, meaning the hash enlarges by a factor of 128 all at once, leading to a very large database at that point. 4) Our efficiency would improve if we enlarged the top level, but that makes our minimum db size even worse: it's already over 8k, and jumps to 1M after about 1000 entries! 5) Making the sub group size larger gives a shallower tree, which performs better, but makes the "hash explosion" problem worse. 6) The code is complicated, having to handle delete and reshuffling groups of hash buckets, and expansion of buckets. 7) We have to handle the case where all the records somehow end up with the same hash value, which requires special code to chain records for that case. On the other hand, it would be nice if we didn't degrade as badly as TDB does when the hash chains get long. This patch removes the hash-growing code, but instead of chaining like TDB does when a bucket fills, we point the bucket to an array of record pointers. Since each on-disk NTDB pointer contains some hash bits from the record (we steal the upper 8 bits of the offset), 99.5% of the time we don't need to load the record to determine if it matches. This makes an array of offsets much more cache-friendly than a linked list. Here are the times (in ns) for tdb_store of N records, tdb_store of N records the second time, and a fetch of all N records. I've also included the final database size and the smbtorture local.[n]tdb_speed results. Benchmark details: 1) Compiled with -O2. 2) assert() was disabled in TDB2 and NTDB. 3) The "optimize fetch" patch was applied to NTDB. 10 runs, using tmpfs (otherwise massive swapping as db hits ~30M, despite plenty of RAM). Insert Re-ins Fetch Size dbspeed (nsec) (nsec) (nsec) (Kb) (ops/sec) TDB (10000 hashsize): 100 records: 3882 3320 1609 53 203204 1000 records: 3651 3281 1571 115 218021 10000 records: 3404 3326 1595 880 202874 100000 records: 4317 3825 2097 8262 126811 1000000 records: 11568 11578 9320 77005 25046 TDB2 (1024 hashsize, expandable): 100 records: 3867 3329 1699 17 187100 1000 records: 4040 3249 1639 154 186255 10000 records: 4143 3300 1695 1226 185110 100000 records: 4481 3425 1800 17848 163483 1000000 records: 4055 3534 1878 106386 160774 NTDB (8192 hashsize) 100 records: 4259 3376 1692 82 190852 1000 records: 3640 3275 1566 130 195106 10000 records: 4337 3438 1614 773 188362 100000 records: 4750 5165 1746 9001 169197 1000000 records: 4897 5180 2341 83838 121901 Analysis: 1) TDB wins on small databases, beating TDB2 by ~15%, NTDB by ~10%. 2) TDB starts to lose when hash chains get 10 long (fetch 10% slower than TDB2/NTDB). 3) TDB does horribly when hash chains get 100 long (fetch 4x slower than NTDB, 5x slower than TDB2, insert about 2-3x slower). 4) TDB2 databases are 40% larger than TDB1. NTDB is about 15% larger than TDB1

2012-06-19 07:13:04 +04:00

											if (berr < 0) {
							

TDB2: Goodbye TDB2, Hello NTDB. This renames everything from tdb2 to ntdb: importantly, we no longer use the tdb_ namespace, so you can link against both ntdb and tdb if you want to. This also enables building of standalone ntdb by the autobuild script. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

2012-06-18 17:00:26 +04:00

}

ntdb: remove hash table trees. TDB2 started with a top-level hash of 1024 entries, divided into 128 groups of 8 buckets. When a bucket filled, the 8 bucket group expanded into pointers into 8 new 64-entry hash tables. When these filled, they expanded in turn, etc. It's a nice idea to automatically expand the hash tables, but it doesn't pay off. Remove it for NTDB. 1) It only beats TDB performance when the database is huge and the TDB hashsize is small. We are about 20% slower on medium-size databases (1000 to 10000 records), worse on really small ones. 2) Since we're 64 bits, our hash tables are already twice as expensive as TDB. 3) Since our hash function is good, it means that all groups tend to fill at the same time, meaning the hash enlarges by a factor of 128 all at once, leading to a very large database at that point. 4) Our efficiency would improve if we enlarged the top level, but that makes our minimum db size even worse: it's already over 8k, and jumps to 1M after about 1000 entries! 5) Making the sub group size larger gives a shallower tree, which performs better, but makes the "hash explosion" problem worse. 6) The code is complicated, having to handle delete and reshuffling groups of hash buckets, and expansion of buckets. 7) We have to handle the case where all the records somehow end up with the same hash value, which requires special code to chain records for that case. On the other hand, it would be nice if we didn't degrade as badly as TDB does when the hash chains get long. This patch removes the hash-growing code, but instead of chaining like TDB does when a bucket fills, we point the bucket to an array of record pointers. Since each on-disk NTDB pointer contains some hash bits from the record (we steal the upper 8 bits of the offset), 99.5% of the time we don't need to load the record to determine if it matches. This makes an array of offsets much more cache-friendly than a linked list. Here are the times (in ns) for tdb_store of N records, tdb_store of N records the second time, and a fetch of all N records. I've also included the final database size and the smbtorture local.[n]tdb_speed results. Benchmark details: 1) Compiled with -O2. 2) assert() was disabled in TDB2 and NTDB. 3) The "optimize fetch" patch was applied to NTDB. 10 runs, using tmpfs (otherwise massive swapping as db hits ~30M, despite plenty of RAM). Insert Re-ins Fetch Size dbspeed (nsec) (nsec) (nsec) (Kb) (ops/sec) TDB (10000 hashsize): 100 records: 3882 3320 1609 53 203204 1000 records: 3651 3281 1571 115 218021 10000 records: 3404 3326 1595 880 202874 100000 records: 4317 3825 2097 8262 126811 1000000 records: 11568 11578 9320 77005 25046 TDB2 (1024 hashsize, expandable): 100 records: 3867 3329 1699 17 187100 1000 records: 4040 3249 1639 154 186255 10000 records: 4143 3300 1695 1226 185110 100000 records: 4481 3425 1800 17848 163483 1000000 records: 4055 3534 1878 106386 160774 NTDB (8192 hashsize) 100 records: 4259 3376 1692 82 190852 1000 records: 3640 3275 1566 130 195106 10000 records: 4337 3438 1614 773 188362 100000 records: 4750 5165 1746 9001 169197 1000000 records: 4897 5180 2341 83838 121901 Analysis: 1) TDB wins on small databases, beating TDB2 by ~15%, NTDB by ~10%. 2) TDB starts to lose when hash chains get 10 long (fetch 10% slower than TDB2/NTDB). 3) TDB does horribly when hash chains get 100 long (fetch 4x slower than NTDB, 5x slower than TDB2, insert about 2-3x slower). 4) TDB2 databases are 40% larger than TDB1. NTDB is about 15% larger than TDB1

2012-06-19 07:13:04 +04:00

											if (berr) {
							

TDB2: Goodbye TDB2, Hello NTDB. This renames everything from tdb2 to ntdb: importantly, we no longer use the tdb_ namespace, so you can link against both ntdb and tdb if you want to. This also enables building of standalone ntdb by the autobuild script. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

2012-06-18 17:00:26 +04:00

}

ntdb: remove hash table trees. TDB2 started with a top-level hash of 1024 entries, divided into 128 groups of 8 buckets. When a bucket filled, the 8 bucket group expanded into pointers into 8 new 64-entry hash tables. When these filled, they expanded in turn, etc. It's a nice idea to automatically expand the hash tables, but it doesn't pay off. Remove it for NTDB. 1) It only beats TDB performance when the database is huge and the TDB hashsize is small. We are about 20% slower on medium-size databases (1000 to 10000 records), worse on really small ones. 2) Since we're 64 bits, our hash tables are already twice as expensive as TDB. 3) Since our hash function is good, it means that all groups tend to fill at the same time, meaning the hash enlarges by a factor of 128 all at once, leading to a very large database at that point. 4) Our efficiency would improve if we enlarged the top level, but that makes our minimum db size even worse: it's already over 8k, and jumps to 1M after about 1000 entries! 5) Making the sub group size larger gives a shallower tree, which performs better, but makes the "hash explosion" problem worse. 6) The code is complicated, having to handle delete and reshuffling groups of hash buckets, and expansion of buckets. 7) We have to handle the case where all the records somehow end up with the same hash value, which requires special code to chain records for that case. On the other hand, it would be nice if we didn't degrade as badly as TDB does when the hash chains get long. This patch removes the hash-growing code, but instead of chaining like TDB does when a bucket fills, we point the bucket to an array of record pointers. Since each on-disk NTDB pointer contains some hash bits from the record (we steal the upper 8 bits of the offset), 99.5% of the time we don't need to load the record to determine if it matches. This makes an array of offsets much more cache-friendly than a linked list. Here are the times (in ns) for tdb_store of N records, tdb_store of N records the second time, and a fetch of all N records. I've also included the final database size and the smbtorture local.[n]tdb_speed results. Benchmark details: 1) Compiled with -O2. 2) assert() was disabled in TDB2 and NTDB. 3) The "optimize fetch" patch was applied to NTDB. 10 runs, using tmpfs (otherwise massive swapping as db hits ~30M, despite plenty of RAM). Insert Re-ins Fetch Size dbspeed (nsec) (nsec) (nsec) (Kb) (ops/sec) TDB (10000 hashsize): 100 records: 3882 3320 1609 53 203204 1000 records: 3651 3281 1571 115 218021 10000 records: 3404 3326 1595 880 202874 100000 records: 4317 3825 2097 8262 126811 1000000 records: 11568 11578 9320 77005 25046 TDB2 (1024 hashsize, expandable): 100 records: 3867 3329 1699 17 187100 1000 records: 4040 3249 1639 154 186255 10000 records: 4143 3300 1695 1226 185110 100000 records: 4481 3425 1800 17848 163483 1000000 records: 4055 3534 1878 106386 160774 NTDB (8192 hashsize) 100 records: 4259 3376 1692 82 190852 1000 records: 3640 3275 1566 130 195106 10000 records: 4337 3438 1614 773 188362 100000 records: 4750 5165 1746 9001 169197 1000000 records: 4897 5180 2341 83838 121901 Analysis: 1) TDB wins on small databases, beating TDB2 by ~15%, NTDB by ~10%. 2) TDB starts to lose when hash chains get 10 long (fetch 10% slower than TDB2/NTDB). 3) TDB does horribly when hash chains get 100 long (fetch 4x slower than NTDB, 5x slower than TDB2, insert about 2-3x slower). 4) TDB2 databases are 40% larger than TDB1. NTDB is about 15% larger than TDB1

2012-06-19 07:13:04 +04:00

/* If you want to insert here, make a chain. */

TDB2: Goodbye TDB2, Hello NTDB. This renames everything from tdb2 to ntdb: importantly, we no longer use the tdb_ namespace, so you can link against both ntdb and tdb if you want to. This also enables building of standalone ntdb by the autobuild script. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

2012-06-18 17:00:26 +04:00

}

ntdb: remove hash table trees. TDB2 started with a top-level hash of 1024 entries, divided into 128 groups of 8 buckets. When a bucket filled, the 8 bucket group expanded into pointers into 8 new 64-entry hash tables. When these filled, they expanded in turn, etc. It's a nice idea to automatically expand the hash tables, but it doesn't pay off. Remove it for NTDB. 1) It only beats TDB performance when the database is huge and the TDB hashsize is small. We are about 20% slower on medium-size databases (1000 to 10000 records), worse on really small ones. 2) Since we're 64 bits, our hash tables are already twice as expensive as TDB. 3) Since our hash function is good, it means that all groups tend to fill at the same time, meaning the hash enlarges by a factor of 128 all at once, leading to a very large database at that point. 4) Our efficiency would improve if we enlarged the top level, but that makes our minimum db size even worse: it's already over 8k, and jumps to 1M after about 1000 entries! 5) Making the sub group size larger gives a shallower tree, which performs better, but makes the "hash explosion" problem worse. 6) The code is complicated, having to handle delete and reshuffling groups of hash buckets, and expansion of buckets. 7) We have to handle the case where all the records somehow end up with the same hash value, which requires special code to chain records for that case. On the other hand, it would be nice if we didn't degrade as badly as TDB does when the hash chains get long. This patch removes the hash-growing code, but instead of chaining like TDB does when a bucket fills, we point the bucket to an array of record pointers. Since each on-disk NTDB pointer contains some hash bits from the record (we steal the upper 8 bits of the offset), 99.5% of the time we don't need to load the record to determine if it matches. This makes an array of offsets much more cache-friendly than a linked list. Here are the times (in ns) for tdb_store of N records, tdb_store of N records the second time, and a fetch of all N records. I've also included the final database size and the smbtorture local.[n]tdb_speed results. Benchmark details: 1) Compiled with -O2. 2) assert() was disabled in TDB2 and NTDB. 3) The "optimize fetch" patch was applied to NTDB. 10 runs, using tmpfs (otherwise massive swapping as db hits ~30M, despite plenty of RAM). Insert Re-ins Fetch Size dbspeed (nsec) (nsec) (nsec) (Kb) (ops/sec) TDB (10000 hashsize): 100 records: 3882 3320 1609 53 203204 1000 records: 3651 3281 1571 115 218021 10000 records: 3404 3326 1595 880 202874 100000 records: 4317 3825 2097 8262 126811 1000000 records: 11568 11578 9320 77005 25046 TDB2 (1024 hashsize, expandable): 100 records: 3867 3329 1699 17 187100 1000 records: 4040 3249 1639 154 186255 10000 records: 4143 3300 1695 1226 185110 100000 records: 4481 3425 1800 17848 163483 1000000 records: 4055 3534 1878 106386 160774 NTDB (8192 hashsize) 100 records: 4259 3376 1692 82 190852 1000 records: 3640 3275 1566 130 195106 10000 records: 4337 3438 1614 773 188362 100000 records: 4750 5165 1746 9001 169197 1000000 records: 4897 5180 2341 83838 121901 Analysis: 1) TDB wins on small databases, beating TDB2 by ~15%, NTDB by ~10%. 2) TDB starts to lose when hash chains get 10 long (fetch 10% slower than TDB2/NTDB). 3) TDB does horribly when hash chains get 100 long (fetch 4x slower than NTDB, 5x slower than TDB2, insert about 2-3x slower). 4) TDB2 databases are 40% larger than TDB1. NTDB is about 15% larger than TDB1

2012-06-19 07:13:04 +04:00

										return 0;
							

TDB2: Goodbye TDB2, Hello NTDB. This renames everything from tdb2 to ntdb: importantly, we no longer use the tdb_ namespace, so you can link against both ntdb and tdb if you want to. This also enables building of standalone ntdb by the autobuild script. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

2012-06-18 17:00:26 +04:00

ntdb: remove hash table trees. TDB2 started with a top-level hash of 1024 entries, divided into 128 groups of 8 buckets. When a bucket filled, the 8 bucket group expanded into pointers into 8 new 64-entry hash tables. When these filled, they expanded in turn, etc. It's a nice idea to automatically expand the hash tables, but it doesn't pay off. Remove it for NTDB. 1) It only beats TDB performance when the database is huge and the TDB hashsize is small. We are about 20% slower on medium-size databases (1000 to 10000 records), worse on really small ones. 2) Since we're 64 bits, our hash tables are already twice as expensive as TDB. 3) Since our hash function is good, it means that all groups tend to fill at the same time, meaning the hash enlarges by a factor of 128 all at once, leading to a very large database at that point. 4) Our efficiency would improve if we enlarged the top level, but that makes our minimum db size even worse: it's already over 8k, and jumps to 1M after about 1000 entries! 5) Making the sub group size larger gives a shallower tree, which performs better, but makes the "hash explosion" problem worse. 6) The code is complicated, having to handle delete and reshuffling groups of hash buckets, and expansion of buckets. 7) We have to handle the case where all the records somehow end up with the same hash value, which requires special code to chain records for that case. On the other hand, it would be nice if we didn't degrade as badly as TDB does when the hash chains get long. This patch removes the hash-growing code, but instead of chaining like TDB does when a bucket fills, we point the bucket to an array of record pointers. Since each on-disk NTDB pointer contains some hash bits from the record (we steal the upper 8 bits of the offset), 99.5% of the time we don't need to load the record to determine if it matches. This makes an array of offsets much more cache-friendly than a linked list. Here are the times (in ns) for tdb_store of N records, tdb_store of N records the second time, and a fetch of all N records. I've also included the final database size and the smbtorture local.[n]tdb_speed results. Benchmark details: 1) Compiled with -O2. 2) assert() was disabled in TDB2 and NTDB. 3) The "optimize fetch" patch was applied to NTDB. 10 runs, using tmpfs (otherwise massive swapping as db hits ~30M, despite plenty of RAM). Insert Re-ins Fetch Size dbspeed (nsec) (nsec) (nsec) (Kb) (ops/sec) TDB (10000 hashsize): 100 records: 3882 3320 1609 53 203204 1000 records: 3651 3281 1571 115 218021 10000 records: 3404 3326 1595 880 202874 100000 records: 4317 3825 2097 8262 126811 1000000 records: 11568 11578 9320 77005 25046 TDB2 (1024 hashsize, expandable): 100 records: 3867 3329 1699 17 187100 1000 records: 4040 3249 1639 154 186255 10000 records: 4143 3300 1695 1226 185110 100000 records: 4481 3425 1800 17848 163483 1000000 records: 4055 3534 1878 106386 160774 NTDB (8192 hashsize) 100 records: 4259 3376 1692 82 190852 1000 records: 3640 3275 1566 130 195106 10000 records: 4337 3438 1614 773 188362 100000 records: 4750 5165 1746 9001 169197 1000000 records: 4897 5180 2341 83838 121901 Analysis: 1) TDB wins on small databases, beating TDB2 by ~15%, NTDB by ~10%. 2) TDB starts to lose when hash chains get 10 long (fetch 10% slower than TDB2/NTDB). 3) TDB does horribly when hash chains get 100 long (fetch 4x slower than NTDB, 5x slower than TDB2, insert about 2-3x slower). 4) TDB2 databases are 40% larger than TDB1. NTDB is about 15% larger than TDB1

2012-06-19 07:13:04 +04:00

/* Nope? Iterate through chain. */

TDB2: Goodbye TDB2, Hello NTDB. This renames everything from tdb2 to ntdb: importantly, we no longer use the tdb_ namespace, so you can link against both ntdb and tdb if you want to. This also enables building of standalone ntdb by the autobuild script. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

2012-06-18 17:00:26 +04:00

ntdb: remove hash table trees. TDB2 started with a top-level hash of 1024 entries, divided into 128 groups of 8 buckets. When a bucket filled, the 8 bucket group expanded into pointers into 8 new 64-entry hash tables. When these filled, they expanded in turn, etc. It's a nice idea to automatically expand the hash tables, but it doesn't pay off. Remove it for NTDB. 1) It only beats TDB performance when the database is huge and the TDB hashsize is small. We are about 20% slower on medium-size databases (1000 to 10000 records), worse on really small ones. 2) Since we're 64 bits, our hash tables are already twice as expensive as TDB. 3) Since our hash function is good, it means that all groups tend to fill at the same time, meaning the hash enlarges by a factor of 128 all at once, leading to a very large database at that point. 4) Our efficiency would improve if we enlarged the top level, but that makes our minimum db size even worse: it's already over 8k, and jumps to 1M after about 1000 entries! 5) Making the sub group size larger gives a shallower tree, which performs better, but makes the "hash explosion" problem worse. 6) The code is complicated, having to handle delete and reshuffling groups of hash buckets, and expansion of buckets. 7) We have to handle the case where all the records somehow end up with the same hash value, which requires special code to chain records for that case. On the other hand, it would be nice if we didn't degrade as badly as TDB does when the hash chains get long. This patch removes the hash-growing code, but instead of chaining like TDB does when a bucket fills, we point the bucket to an array of record pointers. Since each on-disk NTDB pointer contains some hash bits from the record (we steal the upper 8 bits of the offset), 99.5% of the time we don't need to load the record to determine if it matches. This makes an array of offsets much more cache-friendly than a linked list. Here are the times (in ns) for tdb_store of N records, tdb_store of N records the second time, and a fetch of all N records. I've also included the final database size and the smbtorture local.[n]tdb_speed results. Benchmark details: 1) Compiled with -O2. 2) assert() was disabled in TDB2 and NTDB. 3) The "optimize fetch" patch was applied to NTDB. 10 runs, using tmpfs (otherwise massive swapping as db hits ~30M, despite plenty of RAM). Insert Re-ins Fetch Size dbspeed (nsec) (nsec) (nsec) (Kb) (ops/sec) TDB (10000 hashsize): 100 records: 3882 3320 1609 53 203204 1000 records: 3651 3281 1571 115 218021 10000 records: 3404 3326 1595 880 202874 100000 records: 4317 3825 2097 8262 126811 1000000 records: 11568 11578 9320 77005 25046 TDB2 (1024 hashsize, expandable): 100 records: 3867 3329 1699 17 187100 1000 records: 4040 3249 1639 154 186255 10000 records: 4143 3300 1695 1226 185110 100000 records: 4481 3425 1800 17848 163483 1000000 records: 4055 3534 1878 106386 160774 NTDB (8192 hashsize) 100 records: 4259 3376 1692 82 190852 1000 records: 3640 3275 1566 130 195106 10000 records: 4337 3438 1614 773 188362 100000 records: 4750 5165 1746 9001 169197 1000000 records: 4897 5180 2341 83838 121901 Analysis: 1) TDB wins on small databases, beating TDB2 by ~15%, NTDB by ~10%. 2) TDB starts to lose when hash chains get 10 long (fetch 10% slower than TDB2/NTDB). 3) TDB does horribly when hash chains get 100 long (fetch 4x slower than NTDB, 5x slower than TDB2, insert about 2-3x slower). 4) TDB2 databases are 40% larger than TDB1. NTDB is about 15% larger than TDB1

2012-06-19 07:13:04 +04:00

									ecode = ntdb_read_convert(ntdb, h->table, &chdr, sizeof(chdr));
							

TDB2: Goodbye TDB2, Hello NTDB. This renames everything from tdb2 to ntdb: importantly, we no longer use the tdb_ namespace, so you can link against both ntdb and tdb if you want to. This also enables building of standalone ntdb by the autobuild script. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

2012-06-18 17:00:26 +04:00

ntdb: remove hash table trees. TDB2 started with a top-level hash of 1024 entries, divided into 128 groups of 8 buckets. When a bucket filled, the 8 bucket group expanded into pointers into 8 new 64-entry hash tables. When these filled, they expanded in turn, etc. It's a nice idea to automatically expand the hash tables, but it doesn't pay off. Remove it for NTDB. 1) It only beats TDB performance when the database is huge and the TDB hashsize is small. We are about 20% slower on medium-size databases (1000 to 10000 records), worse on really small ones. 2) Since we're 64 bits, our hash tables are already twice as expensive as TDB. 3) Since our hash function is good, it means that all groups tend to fill at the same time, meaning the hash enlarges by a factor of 128 all at once, leading to a very large database at that point. 4) Our efficiency would improve if we enlarged the top level, but that makes our minimum db size even worse: it's already over 8k, and jumps to 1M after about 1000 entries! 5) Making the sub group size larger gives a shallower tree, which performs better, but makes the "hash explosion" problem worse. 6) The code is complicated, having to handle delete and reshuffling groups of hash buckets, and expansion of buckets. 7) We have to handle the case where all the records somehow end up with the same hash value, which requires special code to chain records for that case. On the other hand, it would be nice if we didn't degrade as badly as TDB does when the hash chains get long. This patch removes the hash-growing code, but instead of chaining like TDB does when a bucket fills, we point the bucket to an array of record pointers. Since each on-disk NTDB pointer contains some hash bits from the record (we steal the upper 8 bits of the offset), 99.5% of the time we don't need to load the record to determine if it matches. This makes an array of offsets much more cache-friendly than a linked list. Here are the times (in ns) for tdb_store of N records, tdb_store of N records the second time, and a fetch of all N records. I've also included the final database size and the smbtorture local.[n]tdb_speed results. Benchmark details: 1) Compiled with -O2. 2) assert() was disabled in TDB2 and NTDB. 3) The "optimize fetch" patch was applied to NTDB. 10 runs, using tmpfs (otherwise massive swapping as db hits ~30M, despite plenty of RAM). Insert Re-ins Fetch Size dbspeed (nsec) (nsec) (nsec) (Kb) (ops/sec) TDB (10000 hashsize): 100 records: 3882 3320 1609 53 203204 1000 records: 3651 3281 1571 115 218021 10000 records: 3404 3326 1595 880 202874 100000 records: 4317 3825 2097 8262 126811 1000000 records: 11568 11578 9320 77005 25046 TDB2 (1024 hashsize, expandable): 100 records: 3867 3329 1699 17 187100 1000 records: 4040 3249 1639 154 186255 10000 records: 4143 3300 1695 1226 185110 100000 records: 4481 3425 1800 17848 163483 1000000 records: 4055 3534 1878 106386 160774 NTDB (8192 hashsize) 100 records: 4259 3376 1692 82 190852 1000 records: 3640 3275 1566 130 195106 10000 records: 4337 3438 1614 773 188362 100000 records: 4750 5165 1746 9001 169197 1000000 records: 4897 5180 2341 83838 121901 Analysis: 1) TDB wins on small databases, beating TDB2 by ~15%, NTDB by ~10%. 2) TDB starts to lose when hash chains get 10 long (fetch 10% slower than TDB2/NTDB). 3) TDB does horribly when hash chains get 100 long (fetch 4x slower than NTDB, 5x slower than TDB2, insert about 2-3x slower). 4) TDB2 databases are 40% larger than TDB1. NTDB is about 15% larger than TDB1

2012-06-19 07:13:04 +04:00

									h->table_size = rec_data_length(&chdr) / sizeof(ntdb_off_t);
							

ntdb: take advantage of direct access across expand. This means we no longer have to unmap if we want to compare a record. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

2012-06-22 04:14:41 +04:00

									arr = ntdb_access_read(ntdb, hbucket_off(h->table, 0),
							

ntdb: remove hash table trees. TDB2 started with a top-level hash of 1024 entries, divided into 128 groups of 8 buckets. When a bucket filled, the 8 bucket group expanded into pointers into 8 new 64-entry hash tables. When these filled, they expanded in turn, etc. It's a nice idea to automatically expand the hash tables, but it doesn't pay off. Remove it for NTDB. 1) It only beats TDB performance when the database is huge and the TDB hashsize is small. We are about 20% slower on medium-size databases (1000 to 10000 records), worse on really small ones. 2) Since we're 64 bits, our hash tables are already twice as expensive as TDB. 3) Since our hash function is good, it means that all groups tend to fill at the same time, meaning the hash enlarges by a factor of 128 all at once, leading to a very large database at that point. 4) Our efficiency would improve if we enlarged the top level, but that makes our minimum db size even worse: it's already over 8k, and jumps to 1M after about 1000 entries! 5) Making the sub group size larger gives a shallower tree, which performs better, but makes the "hash explosion" problem worse. 6) The code is complicated, having to handle delete and reshuffling groups of hash buckets, and expansion of buckets. 7) We have to handle the case where all the records somehow end up with the same hash value, which requires special code to chain records for that case. On the other hand, it would be nice if we didn't degrade as badly as TDB does when the hash chains get long. This patch removes the hash-growing code, but instead of chaining like TDB does when a bucket fills, we point the bucket to an array of record pointers. Since each on-disk NTDB pointer contains some hash bits from the record (we steal the upper 8 bits of the offset), 99.5% of the time we don't need to load the record to determine if it matches. This makes an array of offsets much more cache-friendly than a linked list. Here are the times (in ns) for tdb_store of N records, tdb_store of N records the second time, and a fetch of all N records. I've also included the final database size and the smbtorture local.[n]tdb_speed results. Benchmark details: 1) Compiled with -O2. 2) assert() was disabled in TDB2 and NTDB. 3) The "optimize fetch" patch was applied to NTDB. 10 runs, using tmpfs (otherwise massive swapping as db hits ~30M, despite plenty of RAM). Insert Re-ins Fetch Size dbspeed (nsec) (nsec) (nsec) (Kb) (ops/sec) TDB (10000 hashsize): 100 records: 3882 3320 1609 53 203204 1000 records: 3651 3281 1571 115 218021 10000 records: 3404 3326 1595 880 202874 100000 records: 4317 3825 2097 8262 126811 1000000 records: 11568 11578 9320 77005 25046 TDB2 (1024 hashsize, expandable): 100 records: 3867 3329 1699 17 187100 1000 records: 4040 3249 1639 154 186255 10000 records: 4143 3300 1695 1226 185110 100000 records: 4481 3425 1800 17848 163483 1000000 records: 4055 3534 1878 106386 160774 NTDB (8192 hashsize) 100 records: 4259 3376 1692 82 190852 1000 records: 3640 3275 1566 130 195106 10000 records: 4337 3438 1614 773 188362 100000 records: 4750 5165 1746 9001 169197 1000000 records: 4897 5180 2341 83838 121901 Analysis: 1) TDB wins on small databases, beating TDB2 by ~15%, NTDB by ~10%. 2) TDB starts to lose when hash chains get 10 long (fetch 10% slower than TDB2/NTDB). 3) TDB does horribly when hash chains get 100 long (fetch 4x slower than NTDB, 5x slower than TDB2, insert about 2-3x slower). 4) TDB2 databases are 40% larger than TDB1. NTDB is about 15% larger than TDB1

2012-06-19 07:13:04 +04:00

									found_empty = false;
							

ntdb: take advantage of direct access across expand. This means we no longer have to unmap if we want to compare a record. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

2012-06-22 04:14:41 +04:00

										if (arr[i] == 0) {
							

ntdb: remove hash table trees. TDB2 started with a top-level hash of 1024 entries, divided into 128 groups of 8 buckets. When a bucket filled, the 8 bucket group expanded into pointers into 8 new 64-entry hash tables. When these filled, they expanded in turn, etc. It's a nice idea to automatically expand the hash tables, but it doesn't pay off. Remove it for NTDB. 1) It only beats TDB performance when the database is huge and the TDB hashsize is small. We are about 20% slower on medium-size databases (1000 to 10000 records), worse on really small ones. 2) Since we're 64 bits, our hash tables are already twice as expensive as TDB. 3) Since our hash function is good, it means that all groups tend to fill at the same time, meaning the hash enlarges by a factor of 128 all at once, leading to a very large database at that point. 4) Our efficiency would improve if we enlarged the top level, but that makes our minimum db size even worse: it's already over 8k, and jumps to 1M after about 1000 entries! 5) Making the sub group size larger gives a shallower tree, which performs better, but makes the "hash explosion" problem worse. 6) The code is complicated, having to handle delete and reshuffling groups of hash buckets, and expansion of buckets. 7) We have to handle the case where all the records somehow end up with the same hash value, which requires special code to chain records for that case. On the other hand, it would be nice if we didn't degrade as badly as TDB does when the hash chains get long. This patch removes the hash-growing code, but instead of chaining like TDB does when a bucket fills, we point the bucket to an array of record pointers. Since each on-disk NTDB pointer contains some hash bits from the record (we steal the upper 8 bits of the offset), 99.5% of the time we don't need to load the record to determine if it matches. This makes an array of offsets much more cache-friendly than a linked list. Here are the times (in ns) for tdb_store of N records, tdb_store of N records the second time, and a fetch of all N records. I've also included the final database size and the smbtorture local.[n]tdb_speed results. Benchmark details: 1) Compiled with -O2. 2) assert() was disabled in TDB2 and NTDB. 3) The "optimize fetch" patch was applied to NTDB. 10 runs, using tmpfs (otherwise massive swapping as db hits ~30M, despite plenty of RAM). Insert Re-ins Fetch Size dbspeed (nsec) (nsec) (nsec) (Kb) (ops/sec) TDB (10000 hashsize): 100 records: 3882 3320 1609 53 203204 1000 records: 3651 3281 1571 115 218021 10000 records: 3404 3326 1595 880 202874 100000 records: 4317 3825 2097 8262 126811 1000000 records: 11568 11578 9320 77005 25046 TDB2 (1024 hashsize, expandable): 100 records: 3867 3329 1699 17 187100 1000 records: 4040 3249 1639 154 186255 10000 records: 4143 3300 1695 1226 185110 100000 records: 4481 3425 1800 17848 163483 1000000 records: 4055 3534 1878 106386 160774 NTDB (8192 hashsize) 100 records: 4259 3376 1692 82 190852 1000 records: 3640 3275 1566 130 195106 10000 records: 4337 3438 1614 773 188362 100000 records: 4750 5165 1746 9001 169197 1000000 records: 4897 5180 2341 83838 121901 Analysis: 1) TDB wins on small databases, beating TDB2 by ~15%, NTDB by ~10%. 2) TDB starts to lose when hash chains get 10 long (fetch 10% slower than TDB2/NTDB). 3) TDB does horribly when hash chains get 100 long (fetch 4x slower than NTDB, 5x slower than TDB2, insert about 2-3x slower). 4) TDB2 databases are 40% larger than TDB1. NTDB is about 15% larger than TDB1

2012-06-19 07:13:04 +04:00

											if (!found_empty) {
							

ntdb: take advantage of direct access across expand. This means we no longer have to unmap if we want to compare a record. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

2012-06-22 04:14:41 +04:00

											berr = match(ntdb, h->h, &key, arr[i], rec, rptr);
							

TDB2: Goodbye TDB2, Hello NTDB. This renames everything from tdb2 to ntdb: importantly, we no longer use the tdb_ namespace, so you can link against both ntdb and tdb if you want to. This also enables building of standalone ntdb by the autobuild script. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

2012-06-18 17:00:26 +04:00

											if (berr < 0) {
							

ntdb: take advantage of direct access across expand. This means we no longer have to unmap if we want to compare a record. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

2012-06-22 04:14:41 +04:00

												ntdb_access_release(ntdb, arr);
							

TDB2: Goodbye TDB2, Hello NTDB. This renames everything from tdb2 to ntdb: importantly, we no longer use the tdb_ namespace, so you can link against both ntdb and tdb if you want to. This also enables building of standalone ntdb by the autobuild script. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

2012-06-18 17:00:26 +04:00

												goto fail;
							

ntdb: remove hash table trees. TDB2 started with a top-level hash of 1024 entries, divided into 128 groups of 8 buckets. When a bucket filled, the 8 bucket group expanded into pointers into 8 new 64-entry hash tables. When these filled, they expanded in turn, etc. It's a nice idea to automatically expand the hash tables, but it doesn't pay off. Remove it for NTDB. 1) It only beats TDB performance when the database is huge and the TDB hashsize is small. We are about 20% slower on medium-size databases (1000 to 10000 records), worse on really small ones. 2) Since we're 64 bits, our hash tables are already twice as expensive as TDB. 3) Since our hash function is good, it means that all groups tend to fill at the same time, meaning the hash enlarges by a factor of 128 all at once, leading to a very large database at that point. 4) Our efficiency would improve if we enlarged the top level, but that makes our minimum db size even worse: it's already over 8k, and jumps to 1M after about 1000 entries! 5) Making the sub group size larger gives a shallower tree, which performs better, but makes the "hash explosion" problem worse. 6) The code is complicated, having to handle delete and reshuffling groups of hash buckets, and expansion of buckets. 7) We have to handle the case where all the records somehow end up with the same hash value, which requires special code to chain records for that case. On the other hand, it would be nice if we didn't degrade as badly as TDB does when the hash chains get long. This patch removes the hash-growing code, but instead of chaining like TDB does when a bucket fills, we point the bucket to an array of record pointers. Since each on-disk NTDB pointer contains some hash bits from the record (we steal the upper 8 bits of the offset), 99.5% of the time we don't need to load the record to determine if it matches. This makes an array of offsets much more cache-friendly than a linked list. Here are the times (in ns) for tdb_store of N records, tdb_store of N records the second time, and a fetch of all N records. I've also included the final database size and the smbtorture local.[n]tdb_speed results. Benchmark details: 1) Compiled with -O2. 2) assert() was disabled in TDB2 and NTDB. 3) The "optimize fetch" patch was applied to NTDB. 10 runs, using tmpfs (otherwise massive swapping as db hits ~30M, despite plenty of RAM). Insert Re-ins Fetch Size dbspeed (nsec) (nsec) (nsec) (Kb) (ops/sec) TDB (10000 hashsize): 100 records: 3882 3320 1609 53 203204 1000 records: 3651 3281 1571 115 218021 10000 records: 3404 3326 1595 880 202874 100000 records: 4317 3825 2097 8262 126811 1000000 records: 11568 11578 9320 77005 25046 TDB2 (1024 hashsize, expandable): 100 records: 3867 3329 1699 17 187100 1000 records: 4040 3249 1639 154 186255 10000 records: 4143 3300 1695 1226 185110 100000 records: 4481 3425 1800 17848 163483 1000000 records: 4055 3534 1878 106386 160774 NTDB (8192 hashsize) 100 records: 4259 3376 1692 82 190852 1000 records: 3640 3275 1566 130 195106 10000 records: 4337 3438 1614 773 188362 100000 records: 4750 5165 1746 9001 169197 1000000 records: 4897 5180 2341 83838 121901 Analysis: 1) TDB wins on small databases, beating TDB2 by ~15%, NTDB by ~10%. 2) TDB starts to lose when hash chains get 10 long (fetch 10% slower than TDB2/NTDB). 3) TDB does horribly when hash chains get 100 long (fetch 4x slower than NTDB, 5x slower than TDB2, insert about 2-3x slower). 4) TDB2 databases are 40% larger than TDB1. NTDB is about 15% larger than TDB1

2012-06-19 07:13:04 +04:00

/* We found it! */

ntdb: take advantage of direct access across expand. This means we no longer have to unmap if we want to compare a record. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

2012-06-22 04:14:41 +04:00

												off = arr[i] & NTDB_OFF_MASK;
							

ntdb: remove hash table trees. TDB2 started with a top-level hash of 1024 entries, divided into 128 groups of 8 buckets. When a bucket filled, the 8 bucket group expanded into pointers into 8 new 64-entry hash tables. When these filled, they expanded in turn, etc. It's a nice idea to automatically expand the hash tables, but it doesn't pay off. Remove it for NTDB. 1) It only beats TDB performance when the database is huge and the TDB hashsize is small. We are about 20% slower on medium-size databases (1000 to 10000 records), worse on really small ones. 2) Since we're 64 bits, our hash tables are already twice as expensive as TDB. 3) Since our hash function is good, it means that all groups tend to fill at the same time, meaning the hash enlarges by a factor of 128 all at once, leading to a very large database at that point. 4) Our efficiency would improve if we enlarged the top level, but that makes our minimum db size even worse: it's already over 8k, and jumps to 1M after about 1000 entries! 5) Making the sub group size larger gives a shallower tree, which performs better, but makes the "hash explosion" problem worse. 6) The code is complicated, having to handle delete and reshuffling groups of hash buckets, and expansion of buckets. 7) We have to handle the case where all the records somehow end up with the same hash value, which requires special code to chain records for that case. On the other hand, it would be nice if we didn't degrade as badly as TDB does when the hash chains get long. This patch removes the hash-growing code, but instead of chaining like TDB does when a bucket fills, we point the bucket to an array of record pointers. Since each on-disk NTDB pointer contains some hash bits from the record (we steal the upper 8 bits of the offset), 99.5% of the time we don't need to load the record to determine if it matches. This makes an array of offsets much more cache-friendly than a linked list. Here are the times (in ns) for tdb_store of N records, tdb_store of N records the second time, and a fetch of all N records. I've also included the final database size and the smbtorture local.[n]tdb_speed results. Benchmark details: 1) Compiled with -O2. 2) assert() was disabled in TDB2 and NTDB. 3) The "optimize fetch" patch was applied to NTDB. 10 runs, using tmpfs (otherwise massive swapping as db hits ~30M, despite plenty of RAM). Insert Re-ins Fetch Size dbspeed (nsec) (nsec) (nsec) (Kb) (ops/sec) TDB (10000 hashsize): 100 records: 3882 3320 1609 53 203204 1000 records: 3651 3281 1571 115 218021 10000 records: 3404 3326 1595 880 202874 100000 records: 4317 3825 2097 8262 126811 1000000 records: 11568 11578 9320 77005 25046 TDB2 (1024 hashsize, expandable): 100 records: 3867 3329 1699 17 187100 1000 records: 4040 3249 1639 154 186255 10000 records: 4143 3300 1695 1226 185110 100000 records: 4481 3425 1800 17848 163483 1000000 records: 4055 3534 1878 106386 160774 NTDB (8192 hashsize) 100 records: 4259 3376 1692 82 190852 1000 records: 3640 3275 1566 130 195106 10000 records: 4337 3438 1614 773 188362 100000 records: 4750 5165 1746 9001 169197 1000000 records: 4897 5180 2341 83838 121901 Analysis: 1) TDB wins on small databases, beating TDB2 by ~15%, NTDB by ~10%. 2) TDB starts to lose when hash chains get 10 long (fetch 10% slower than TDB2/NTDB). 3) TDB does horribly when hash chains get 100 long (fetch 4x slower than NTDB, 5x slower than TDB2, insert about 2-3x slower). 4) TDB2 databases are 40% larger than TDB1. NTDB is about 15% larger than TDB1

2012-06-19 07:13:04 +04:00

												return off;
							

TDB2: Goodbye TDB2, Hello NTDB. This renames everything from tdb2 to ntdb: importantly, we no longer use the tdb_ namespace, so you can link against both ntdb and tdb if you want to. This also enables building of standalone ntdb by the autobuild script. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

2012-06-18 17:00:26 +04:00

}

ntdb: remove hash table trees. TDB2 started with a top-level hash of 1024 entries, divided into 128 groups of 8 buckets. When a bucket filled, the 8 bucket group expanded into pointers into 8 new 64-entry hash tables. When these filled, they expanded in turn, etc. It's a nice idea to automatically expand the hash tables, but it doesn't pay off. Remove it for NTDB. 1) It only beats TDB performance when the database is huge and the TDB hashsize is small. We are about 20% slower on medium-size databases (1000 to 10000 records), worse on really small ones. 2) Since we're 64 bits, our hash tables are already twice as expensive as TDB. 3) Since our hash function is good, it means that all groups tend to fill at the same time, meaning the hash enlarges by a factor of 128 all at once, leading to a very large database at that point. 4) Our efficiency would improve if we enlarged the top level, but that makes our minimum db size even worse: it's already over 8k, and jumps to 1M after about 1000 entries! 5) Making the sub group size larger gives a shallower tree, which performs better, but makes the "hash explosion" problem worse. 6) The code is complicated, having to handle delete and reshuffling groups of hash buckets, and expansion of buckets. 7) We have to handle the case where all the records somehow end up with the same hash value, which requires special code to chain records for that case. On the other hand, it would be nice if we didn't degrade as badly as TDB does when the hash chains get long. This patch removes the hash-growing code, but instead of chaining like TDB does when a bucket fills, we point the bucket to an array of record pointers. Since each on-disk NTDB pointer contains some hash bits from the record (we steal the upper 8 bits of the offset), 99.5% of the time we don't need to load the record to determine if it matches. This makes an array of offsets much more cache-friendly than a linked list. Here are the times (in ns) for tdb_store of N records, tdb_store of N records the second time, and a fetch of all N records. I've also included the final database size and the smbtorture local.[n]tdb_speed results. Benchmark details: 1) Compiled with -O2. 2) assert() was disabled in TDB2 and NTDB. 3) The "optimize fetch" patch was applied to NTDB. 10 runs, using tmpfs (otherwise massive swapping as db hits ~30M, despite plenty of RAM). Insert Re-ins Fetch Size dbspeed (nsec) (nsec) (nsec) (Kb) (ops/sec) TDB (10000 hashsize): 100 records: 3882 3320 1609 53 203204 1000 records: 3651 3281 1571 115 218021 10000 records: 3404 3326 1595 880 202874 100000 records: 4317 3825 2097 8262 126811 1000000 records: 11568 11578 9320 77005 25046 TDB2 (1024 hashsize, expandable): 100 records: 3867 3329 1699 17 187100 1000 records: 4040 3249 1639 154 186255 10000 records: 4143 3300 1695 1226 185110 100000 records: 4481 3425 1800 17848 163483 1000000 records: 4055 3534 1878 106386 160774 NTDB (8192 hashsize) 100 records: 4259 3376 1692 82 190852 1000 records: 3640 3275 1566 130 195106 10000 records: 4337 3438 1614 773 188362 100000 records: 4750 5165 1746 9001 169197 1000000 records: 4897 5180 2341 83838 121901 Analysis: 1) TDB wins on small databases, beating TDB2 by ~15%, NTDB by ~10%. 2) TDB starts to lose when hash chains get 10 long (fetch 10% slower than TDB2/NTDB). 3) TDB does horribly when hash chains get 100 long (fetch 4x slower than NTDB, 5x slower than TDB2, insert about 2-3x slower). 4) TDB2 databases are 40% larger than TDB1. NTDB is about 15% larger than TDB1

2012-06-19 07:13:04 +04:00

}

TDB2: Goodbye TDB2, Hello NTDB. This renames everything from tdb2 to ntdb: importantly, we no longer use the tdb_ namespace, so you can link against both ntdb and tdb if you want to. This also enables building of standalone ntdb by the autobuild script. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

2012-06-18 17:00:26 +04:00

}

ntdb: take advantage of direct access across expand. This means we no longer have to unmap if we want to compare a record. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

2012-06-22 04:14:41 +04:00

									ntdb_access_release(ntdb, arr);
							

ntdb: remove hash table trees. TDB2 started with a top-level hash of 1024 entries, divided into 128 groups of 8 buckets. When a bucket filled, the 8 bucket group expanded into pointers into 8 new 64-entry hash tables. When these filled, they expanded in turn, etc. It's a nice idea to automatically expand the hash tables, but it doesn't pay off. Remove it for NTDB. 1) It only beats TDB performance when the database is huge and the TDB hashsize is small. We are about 20% slower on medium-size databases (1000 to 10000 records), worse on really small ones. 2) Since we're 64 bits, our hash tables are already twice as expensive as TDB. 3) Since our hash function is good, it means that all groups tend to fill at the same time, meaning the hash enlarges by a factor of 128 all at once, leading to a very large database at that point. 4) Our efficiency would improve if we enlarged the top level, but that makes our minimum db size even worse: it's already over 8k, and jumps to 1M after about 1000 entries! 5) Making the sub group size larger gives a shallower tree, which performs better, but makes the "hash explosion" problem worse. 6) The code is complicated, having to handle delete and reshuffling groups of hash buckets, and expansion of buckets. 7) We have to handle the case where all the records somehow end up with the same hash value, which requires special code to chain records for that case. On the other hand, it would be nice if we didn't degrade as badly as TDB does when the hash chains get long. This patch removes the hash-growing code, but instead of chaining like TDB does when a bucket fills, we point the bucket to an array of record pointers. Since each on-disk NTDB pointer contains some hash bits from the record (we steal the upper 8 bits of the offset), 99.5% of the time we don't need to load the record to determine if it matches. This makes an array of offsets much more cache-friendly than a linked list. Here are the times (in ns) for tdb_store of N records, tdb_store of N records the second time, and a fetch of all N records. I've also included the final database size and the smbtorture local.[n]tdb_speed results. Benchmark details: 1) Compiled with -O2. 2) assert() was disabled in TDB2 and NTDB. 3) The "optimize fetch" patch was applied to NTDB. 10 runs, using tmpfs (otherwise massive swapping as db hits ~30M, despite plenty of RAM). Insert Re-ins Fetch Size dbspeed (nsec) (nsec) (nsec) (Kb) (ops/sec) TDB (10000 hashsize): 100 records: 3882 3320 1609 53 203204 1000 records: 3651 3281 1571 115 218021 10000 records: 3404 3326 1595 880 202874 100000 records: 4317 3825 2097 8262 126811 1000000 records: 11568 11578 9320 77005 25046 TDB2 (1024 hashsize, expandable): 100 records: 3867 3329 1699 17 187100 1000 records: 4040 3249 1639 154 186255 10000 records: 4143 3300 1695 1226 185110 100000 records: 4481 3425 1800 17848 163483 1000000 records: 4055 3534 1878 106386 160774 NTDB (8192 hashsize) 100 records: 4259 3376 1692 82 190852 1000 records: 3640 3275 1566 130 195106 10000 records: 4337 3438 1614 773 188362 100000 records: 4750 5165 1746 9001 169197 1000000 records: 4897 5180 2341 83838 121901 Analysis: 1) TDB wins on small databases, beating TDB2 by ~15%, NTDB by ~10%. 2) TDB starts to lose when hash chains get 10 long (fetch 10% slower than TDB2/NTDB). 3) TDB does horribly when hash chains get 100 long (fetch 4x slower than NTDB, 5x slower than TDB2, insert about 2-3x slower). 4) TDB2 databases are 40% larger than TDB1. NTDB is about 15% larger than TDB1

2012-06-19 07:13:04 +04:00

									return 0;
							

TDB2: Goodbye TDB2, Hello NTDB. This renames everything from tdb2 to ntdb: importantly, we no longer use the tdb_ namespace, so you can link against both ntdb and tdb if you want to. This also enables building of standalone ntdb by the autobuild script. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

2012-06-18 17:00:26 +04:00

ntdb: remove hash table trees. TDB2 started with a top-level hash of 1024 entries, divided into 128 groups of 8 buckets. When a bucket filled, the 8 bucket group expanded into pointers into 8 new 64-entry hash tables. When these filled, they expanded in turn, etc. It's a nice idea to automatically expand the hash tables, but it doesn't pay off. Remove it for NTDB. 1) It only beats TDB performance when the database is huge and the TDB hashsize is small. We are about 20% slower on medium-size databases (1000 to 10000 records), worse on really small ones. 2) Since we're 64 bits, our hash tables are already twice as expensive as TDB. 3) Since our hash function is good, it means that all groups tend to fill at the same time, meaning the hash enlarges by a factor of 128 all at once, leading to a very large database at that point. 4) Our efficiency would improve if we enlarged the top level, but that makes our minimum db size even worse: it's already over 8k, and jumps to 1M after about 1000 entries! 5) Making the sub group size larger gives a shallower tree, which performs better, but makes the "hash explosion" problem worse. 6) The code is complicated, having to handle delete and reshuffling groups of hash buckets, and expansion of buckets. 7) We have to handle the case where all the records somehow end up with the same hash value, which requires special code to chain records for that case. On the other hand, it would be nice if we didn't degrade as badly as TDB does when the hash chains get long. This patch removes the hash-growing code, but instead of chaining like TDB does when a bucket fills, we point the bucket to an array of record pointers. Since each on-disk NTDB pointer contains some hash bits from the record (we steal the upper 8 bits of the offset), 99.5% of the time we don't need to load the record to determine if it matches. This makes an array of offsets much more cache-friendly than a linked list. Here are the times (in ns) for tdb_store of N records, tdb_store of N records the second time, and a fetch of all N records. I've also included the final database size and the smbtorture local.[n]tdb_speed results. Benchmark details: 1) Compiled with -O2. 2) assert() was disabled in TDB2 and NTDB. 3) The "optimize fetch" patch was applied to NTDB. 10 runs, using tmpfs (otherwise massive swapping as db hits ~30M, despite plenty of RAM). Insert Re-ins Fetch Size dbspeed (nsec) (nsec) (nsec) (Kb) (ops/sec) TDB (10000 hashsize): 100 records: 3882 3320 1609 53 203204 1000 records: 3651 3281 1571 115 218021 10000 records: 3404 3326 1595 880 202874 100000 records: 4317 3825 2097 8262 126811 1000000 records: 11568 11578 9320 77005 25046 TDB2 (1024 hashsize, expandable): 100 records: 3867 3329 1699 17 187100 1000 records: 4040 3249 1639 154 186255 10000 records: 4143 3300 1695 1226 185110 100000 records: 4481 3425 1800 17848 163483 1000000 records: 4055 3534 1878 106386 160774 NTDB (8192 hashsize) 100 records: 4259 3376 1692 82 190852 1000 records: 3640 3275 1566 130 195106 10000 records: 4337 3438 1614 773 188362 100000 records: 4750 5165 1746 9001 169197 1000000 records: 4897 5180 2341 83838 121901 Analysis: 1) TDB wins on small databases, beating TDB2 by ~15%, NTDB by ~10%. 2) TDB starts to lose when hash chains get 10 long (fetch 10% slower than TDB2/NTDB). 3) TDB does horribly when hash chains get 100 long (fetch 4x slower than NTDB, 5x slower than TDB2, insert about 2-3x slower). 4) TDB2 databases are 40% larger than TDB1. NTDB is about 15% larger than TDB1

2012-06-19 07:13:04 +04:00

									ntdb_unlock_hash(ntdb, h->bucket, ltype);
							

TDB2: Goodbye TDB2, Hello NTDB. This renames everything from tdb2 to ntdb: importantly, we no longer use the tdb_ namespace, so you can link against both ntdb and tdb if you want to. This also enables building of standalone ntdb by the autobuild script. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

2012-06-18 17:00:26 +04:00

									return NTDB_ERR_TO_OFF(ecode);
							

ntdb: remove hash table trees. TDB2 started with a top-level hash of 1024 entries, divided into 128 groups of 8 buckets. When a bucket filled, the 8 bucket group expanded into pointers into 8 new 64-entry hash tables. When these filled, they expanded in turn, etc. It's a nice idea to automatically expand the hash tables, but it doesn't pay off. Remove it for NTDB. 1) It only beats TDB performance when the database is huge and the TDB hashsize is small. We are about 20% slower on medium-size databases (1000 to 10000 records), worse on really small ones. 2) Since we're 64 bits, our hash tables are already twice as expensive as TDB. 3) Since our hash function is good, it means that all groups tend to fill at the same time, meaning the hash enlarges by a factor of 128 all at once, leading to a very large database at that point. 4) Our efficiency would improve if we enlarged the top level, but that makes our minimum db size even worse: it's already over 8k, and jumps to 1M after about 1000 entries! 5) Making the sub group size larger gives a shallower tree, which performs better, but makes the "hash explosion" problem worse. 6) The code is complicated, having to handle delete and reshuffling groups of hash buckets, and expansion of buckets. 7) We have to handle the case where all the records somehow end up with the same hash value, which requires special code to chain records for that case. On the other hand, it would be nice if we didn't degrade as badly as TDB does when the hash chains get long. This patch removes the hash-growing code, but instead of chaining like TDB does when a bucket fills, we point the bucket to an array of record pointers. Since each on-disk NTDB pointer contains some hash bits from the record (we steal the upper 8 bits of the offset), 99.5% of the time we don't need to load the record to determine if it matches. This makes an array of offsets much more cache-friendly than a linked list. Here are the times (in ns) for tdb_store of N records, tdb_store of N records the second time, and a fetch of all N records. I've also included the final database size and the smbtorture local.[n]tdb_speed results. Benchmark details: 1) Compiled with -O2. 2) assert() was disabled in TDB2 and NTDB. 3) The "optimize fetch" patch was applied to NTDB. 10 runs, using tmpfs (otherwise massive swapping as db hits ~30M, despite plenty of RAM). Insert Re-ins Fetch Size dbspeed (nsec) (nsec) (nsec) (Kb) (ops/sec) TDB (10000 hashsize): 100 records: 3882 3320 1609 53 203204 1000 records: 3651 3281 1571 115 218021 10000 records: 3404 3326 1595 880 202874 100000 records: 4317 3825 2097 8262 126811 1000000 records: 11568 11578 9320 77005 25046 TDB2 (1024 hashsize, expandable): 100 records: 3867 3329 1699 17 187100 1000 records: 4040 3249 1639 154 186255 10000 records: 4143 3300 1695 1226 185110 100000 records: 4481 3425 1800 17848 163483 1000000 records: 4055 3534 1878 106386 160774 NTDB (8192 hashsize) 100 records: 4259 3376 1692 82 190852 1000 records: 3640 3275 1566 130 195106 10000 records: 4337 3438 1614 773 188362 100000 records: 4750 5165 1746 9001 169197 1000000 records: 4897 5180 2341 83838 121901 Analysis: 1) TDB wins on small databases, beating TDB2 by ~15%, NTDB by ~10%. 2) TDB starts to lose when hash chains get 10 long (fetch 10% slower than TDB2/NTDB). 3) TDB does horribly when hash chains get 100 long (fetch 4x slower than NTDB, 5x slower than TDB2, insert about 2-3x slower). 4) TDB2 databases are 40% larger than TDB1. NTDB is about 15% larger than TDB1

2012-06-19 07:13:04 +04:00

								static ntdb_off_t encode_offset(const struct ntdb_context *ntdb,
							

TDB2: Goodbye TDB2, Hello NTDB. This renames everything from tdb2 to ntdb: importantly, we no longer use the tdb_ namespace, so you can link against both ntdb and tdb if you want to. This also enables building of standalone ntdb by the autobuild script. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

2012-06-18 17:00:26 +04:00

{

ntdb: remove hash table trees. TDB2 started with a top-level hash of 1024 entries, divided into 128 groups of 8 buckets. When a bucket filled, the 8 bucket group expanded into pointers into 8 new 64-entry hash tables. When these filled, they expanded in turn, etc. It's a nice idea to automatically expand the hash tables, but it doesn't pay off. Remove it for NTDB. 1) It only beats TDB performance when the database is huge and the TDB hashsize is small. We are about 20% slower on medium-size databases (1000 to 10000 records), worse on really small ones. 2) Since we're 64 bits, our hash tables are already twice as expensive as TDB. 3) Since our hash function is good, it means that all groups tend to fill at the same time, meaning the hash enlarges by a factor of 128 all at once, leading to a very large database at that point. 4) Our efficiency would improve if we enlarged the top level, but that makes our minimum db size even worse: it's already over 8k, and jumps to 1M after about 1000 entries! 5) Making the sub group size larger gives a shallower tree, which performs better, but makes the "hash explosion" problem worse. 6) The code is complicated, having to handle delete and reshuffling groups of hash buckets, and expansion of buckets. 7) We have to handle the case where all the records somehow end up with the same hash value, which requires special code to chain records for that case. On the other hand, it would be nice if we didn't degrade as badly as TDB does when the hash chains get long. This patch removes the hash-growing code, but instead of chaining like TDB does when a bucket fills, we point the bucket to an array of record pointers. Since each on-disk NTDB pointer contains some hash bits from the record (we steal the upper 8 bits of the offset), 99.5% of the time we don't need to load the record to determine if it matches. This makes an array of offsets much more cache-friendly than a linked list. Here are the times (in ns) for tdb_store of N records, tdb_store of N records the second time, and a fetch of all N records. I've also included the final database size and the smbtorture local.[n]tdb_speed results. Benchmark details: 1) Compiled with -O2. 2) assert() was disabled in TDB2 and NTDB. 3) The "optimize fetch" patch was applied to NTDB. 10 runs, using tmpfs (otherwise massive swapping as db hits ~30M, despite plenty of RAM). Insert Re-ins Fetch Size dbspeed (nsec) (nsec) (nsec) (Kb) (ops/sec) TDB (10000 hashsize): 100 records: 3882 3320 1609 53 203204 1000 records: 3651 3281 1571 115 218021 10000 records: 3404 3326 1595 880 202874 100000 records: 4317 3825 2097 8262 126811 1000000 records: 11568 11578 9320 77005 25046 TDB2 (1024 hashsize, expandable): 100 records: 3867 3329 1699 17 187100 1000 records: 4040 3249 1639 154 186255 10000 records: 4143 3300 1695 1226 185110 100000 records: 4481 3425 1800 17848 163483 1000000 records: 4055 3534 1878 106386 160774 NTDB (8192 hashsize) 100 records: 4259 3376 1692 82 190852 1000 records: 3640 3275 1566 130 195106 10000 records: 4337 3438 1614 773 188362 100000 records: 4750 5165 1746 9001 169197 1000000 records: 4897 5180 2341 83838 121901 Analysis: 1) TDB wins on small databases, beating TDB2 by ~15%, NTDB by ~10%. 2) TDB starts to lose when hash chains get 10 long (fetch 10% slower than TDB2/NTDB). 3) TDB does horribly when hash chains get 100 long (fetch 4x slower than NTDB, 5x slower than TDB2, insert about 2-3x slower). 4) TDB2 databases are 40% larger than TDB1. NTDB is about 15% larger than TDB1

2012-06-19 07:13:04 +04:00

									ntdb_off_t extra;
							

TDB2: Goodbye TDB2, Hello NTDB. This renames everything from tdb2 to ntdb: importantly, we no longer use the tdb_ namespace, so you can link against both ntdb and tdb if you want to. This also enables building of standalone ntdb by the autobuild script. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

2012-06-18 17:00:26 +04:00

ntdb: remove hash table trees. TDB2 started with a top-level hash of 1024 entries, divided into 128 groups of 8 buckets. When a bucket filled, the 8 bucket group expanded into pointers into 8 new 64-entry hash tables. When these filled, they expanded in turn, etc. It's a nice idea to automatically expand the hash tables, but it doesn't pay off. Remove it for NTDB. 1) It only beats TDB performance when the database is huge and the TDB hashsize is small. We are about 20% slower on medium-size databases (1000 to 10000 records), worse on really small ones. 2) Since we're 64 bits, our hash tables are already twice as expensive as TDB. 3) Since our hash function is good, it means that all groups tend to fill at the same time, meaning the hash enlarges by a factor of 128 all at once, leading to a very large database at that point. 4) Our efficiency would improve if we enlarged the top level, but that makes our minimum db size even worse: it's already over 8k, and jumps to 1M after about 1000 entries! 5) Making the sub group size larger gives a shallower tree, which performs better, but makes the "hash explosion" problem worse. 6) The code is complicated, having to handle delete and reshuffling groups of hash buckets, and expansion of buckets. 7) We have to handle the case where all the records somehow end up with the same hash value, which requires special code to chain records for that case. On the other hand, it would be nice if we didn't degrade as badly as TDB does when the hash chains get long. This patch removes the hash-growing code, but instead of chaining like TDB does when a bucket fills, we point the bucket to an array of record pointers. Since each on-disk NTDB pointer contains some hash bits from the record (we steal the upper 8 bits of the offset), 99.5% of the time we don't need to load the record to determine if it matches. This makes an array of offsets much more cache-friendly than a linked list. Here are the times (in ns) for tdb_store of N records, tdb_store of N records the second time, and a fetch of all N records. I've also included the final database size and the smbtorture local.[n]tdb_speed results. Benchmark details: 1) Compiled with -O2. 2) assert() was disabled in TDB2 and NTDB. 3) The "optimize fetch" patch was applied to NTDB. 10 runs, using tmpfs (otherwise massive swapping as db hits ~30M, despite plenty of RAM). Insert Re-ins Fetch Size dbspeed (nsec) (nsec) (nsec) (Kb) (ops/sec) TDB (10000 hashsize): 100 records: 3882 3320 1609 53 203204 1000 records: 3651 3281 1571 115 218021 10000 records: 3404 3326 1595 880 202874 100000 records: 4317 3825 2097 8262 126811 1000000 records: 11568 11578 9320 77005 25046 TDB2 (1024 hashsize, expandable): 100 records: 3867 3329 1699 17 187100 1000 records: 4040 3249 1639 154 186255 10000 records: 4143 3300 1695 1226 185110 100000 records: 4481 3425 1800 17848 163483 1000000 records: 4055 3534 1878 106386 160774 NTDB (8192 hashsize) 100 records: 4259 3376 1692 82 190852 1000 records: 3640 3275 1566 130 195106 10000 records: 4337 3438 1614 773 188362 100000 records: 4750 5165 1746 9001 169197 1000000 records: 4897 5180 2341 83838 121901 Analysis: 1) TDB wins on small databases, beating TDB2 by ~15%, NTDB by ~10%. 2) TDB starts to lose when hash chains get 10 long (fetch 10% slower than TDB2/NTDB). 3) TDB does horribly when hash chains get 100 long (fetch 4x slower than NTDB, 5x slower than TDB2, insert about 2-3x slower). 4) TDB2 databases are 40% larger than TDB1. NTDB is about 15% larger than TDB1

2012-06-19 07:13:04 +04:00

									assert((new_off & (1ULL << NTDB_OFF_CHAIN_BIT)) == 0);
							

TDB2: Goodbye TDB2, Hello NTDB. This renames everything from tdb2 to ntdb: importantly, we no longer use the tdb_ namespace, so you can link against both ntdb and tdb if you want to. This also enables building of standalone ntdb by the autobuild script. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

2012-06-18 17:00:26 +04:00

ntdb: remove hash table trees. TDB2 started with a top-level hash of 1024 entries, divided into 128 groups of 8 buckets. When a bucket filled, the 8 bucket group expanded into pointers into 8 new 64-entry hash tables. When these filled, they expanded in turn, etc. It's a nice idea to automatically expand the hash tables, but it doesn't pay off. Remove it for NTDB. 1) It only beats TDB performance when the database is huge and the TDB hashsize is small. We are about 20% slower on medium-size databases (1000 to 10000 records), worse on really small ones. 2) Since we're 64 bits, our hash tables are already twice as expensive as TDB. 3) Since our hash function is good, it means that all groups tend to fill at the same time, meaning the hash enlarges by a factor of 128 all at once, leading to a very large database at that point. 4) Our efficiency would improve if we enlarged the top level, but that makes our minimum db size even worse: it's already over 8k, and jumps to 1M after about 1000 entries! 5) Making the sub group size larger gives a shallower tree, which performs better, but makes the "hash explosion" problem worse. 6) The code is complicated, having to handle delete and reshuffling groups of hash buckets, and expansion of buckets. 7) We have to handle the case where all the records somehow end up with the same hash value, which requires special code to chain records for that case. On the other hand, it would be nice if we didn't degrade as badly as TDB does when the hash chains get long. This patch removes the hash-growing code, but instead of chaining like TDB does when a bucket fills, we point the bucket to an array of record pointers. Since each on-disk NTDB pointer contains some hash bits from the record (we steal the upper 8 bits of the offset), 99.5% of the time we don't need to load the record to determine if it matches. This makes an array of offsets much more cache-friendly than a linked list. Here are the times (in ns) for tdb_store of N records, tdb_store of N records the second time, and a fetch of all N records. I've also included the final database size and the smbtorture local.[n]tdb_speed results. Benchmark details: 1) Compiled with -O2. 2) assert() was disabled in TDB2 and NTDB. 3) The "optimize fetch" patch was applied to NTDB. 10 runs, using tmpfs (otherwise massive swapping as db hits ~30M, despite plenty of RAM). Insert Re-ins Fetch Size dbspeed (nsec) (nsec) (nsec) (Kb) (ops/sec) TDB (10000 hashsize): 100 records: 3882 3320 1609 53 203204 1000 records: 3651 3281 1571 115 218021 10000 records: 3404 3326 1595 880 202874 100000 records: 4317 3825 2097 8262 126811 1000000 records: 11568 11578 9320 77005 25046 TDB2 (1024 hashsize, expandable): 100 records: 3867 3329 1699 17 187100 1000 records: 4040 3249 1639 154 186255 10000 records: 4143 3300 1695 1226 185110 100000 records: 4481 3425 1800 17848 163483 1000000 records: 4055 3534 1878 106386 160774 NTDB (8192 hashsize) 100 records: 4259 3376 1692 82 190852 1000 records: 3640 3275 1566 130 195106 10000 records: 4337 3438 1614 773 188362 100000 records: 4750 5165 1746 9001 169197 1000000 records: 4897 5180 2341 83838 121901 Analysis: 1) TDB wins on small databases, beating TDB2 by ~15%, NTDB by ~10%. 2) TDB starts to lose when hash chains get 10 long (fetch 10% slower than TDB2/NTDB). 3) TDB does horribly when hash chains get 100 long (fetch 4x slower than NTDB, 5x slower than TDB2, insert about 2-3x slower). 4) TDB2 databases are 40% larger than TDB1. NTDB is about 15% larger than TDB1

2012-06-19 07:13:04 +04:00

									return new_off | extra;
							

TDB2: Goodbye TDB2, Hello NTDB. This renames everything from tdb2 to ntdb: importantly, we no longer use the tdb_ namespace, so you can link against both ntdb and tdb if you want to. This also enables building of standalone ntdb by the autobuild script. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

2012-06-18 17:00:26 +04:00

}

ntdb: remove hash table trees. TDB2 started with a top-level hash of 1024 entries, divided into 128 groups of 8 buckets. When a bucket filled, the 8 bucket group expanded into pointers into 8 new 64-entry hash tables. When these filled, they expanded in turn, etc. It's a nice idea to automatically expand the hash tables, but it doesn't pay off. Remove it for NTDB. 1) It only beats TDB performance when the database is huge and the TDB hashsize is small. We are about 20% slower on medium-size databases (1000 to 10000 records), worse on really small ones. 2) Since we're 64 bits, our hash tables are already twice as expensive as TDB. 3) Since our hash function is good, it means that all groups tend to fill at the same time, meaning the hash enlarges by a factor of 128 all at once, leading to a very large database at that point. 4) Our efficiency would improve if we enlarged the top level, but that makes our minimum db size even worse: it's already over 8k, and jumps to 1M after about 1000 entries! 5) Making the sub group size larger gives a shallower tree, which performs better, but makes the "hash explosion" problem worse. 6) The code is complicated, having to handle delete and reshuffling groups of hash buckets, and expansion of buckets. 7) We have to handle the case where all the records somehow end up with the same hash value, which requires special code to chain records for that case. On the other hand, it would be nice if we didn't degrade as badly as TDB does when the hash chains get long. This patch removes the hash-growing code, but instead of chaining like TDB does when a bucket fills, we point the bucket to an array of record pointers. Since each on-disk NTDB pointer contains some hash bits from the record (we steal the upper 8 bits of the offset), 99.5% of the time we don't need to load the record to determine if it matches. This makes an array of offsets much more cache-friendly than a linked list. Here are the times (in ns) for tdb_store of N records, tdb_store of N records the second time, and a fetch of all N records. I've also included the final database size and the smbtorture local.[n]tdb_speed results. Benchmark details: 1) Compiled with -O2. 2) assert() was disabled in TDB2 and NTDB. 3) The "optimize fetch" patch was applied to NTDB. 10 runs, using tmpfs (otherwise massive swapping as db hits ~30M, despite plenty of RAM). Insert Re-ins Fetch Size dbspeed (nsec) (nsec) (nsec) (Kb) (ops/sec) TDB (10000 hashsize): 100 records: 3882 3320 1609 53 203204 1000 records: 3651 3281 1571 115 218021 10000 records: 3404 3326 1595 880 202874 100000 records: 4317 3825 2097 8262 126811 1000000 records: 11568 11578 9320 77005 25046 TDB2 (1024 hashsize, expandable): 100 records: 3867 3329 1699 17 187100 1000 records: 4040 3249 1639 154 186255 10000 records: 4143 3300 1695 1226 185110 100000 records: 4481 3425 1800 17848 163483 1000000 records: 4055 3534 1878 106386 160774 NTDB (8192 hashsize) 100 records: 4259 3376 1692 82 190852 1000 records: 3640 3275 1566 130 195106 10000 records: 4337 3438 1614 773 188362 100000 records: 4750 5165 1746 9001 169197 1000000 records: 4897 5180 2341 83838 121901 Analysis: 1) TDB wins on small databases, beating TDB2 by ~15%, NTDB by ~10%. 2) TDB starts to lose when hash chains get 10 long (fetch 10% slower than TDB2/NTDB). 3) TDB does horribly when hash chains get 100 long (fetch 4x slower than NTDB, 5x slower than TDB2, insert about 2-3x slower). 4) TDB2 databases are 40% larger than TDB1. NTDB is about 15% larger than TDB1

2012-06-19 07:13:04 +04:00

/* Simply overwrite the hash entry we found before. */

TDB2: Goodbye TDB2, Hello NTDB. This renames everything from tdb2 to ntdb: importantly, we no longer use the tdb_ namespace, so you can link against both ntdb and tdb if you want to. This also enables building of standalone ntdb by the autobuild script. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

2012-06-18 17:00:26 +04:00

{

ntdb: remove hash table trees. TDB2 started with a top-level hash of 1024 entries, divided into 128 groups of 8 buckets. When a bucket filled, the 8 bucket group expanded into pointers into 8 new 64-entry hash tables. When these filled, they expanded in turn, etc. It's a nice idea to automatically expand the hash tables, but it doesn't pay off. Remove it for NTDB. 1) It only beats TDB performance when the database is huge and the TDB hashsize is small. We are about 20% slower on medium-size databases (1000 to 10000 records), worse on really small ones. 2) Since we're 64 bits, our hash tables are already twice as expensive as TDB. 3) Since our hash function is good, it means that all groups tend to fill at the same time, meaning the hash enlarges by a factor of 128 all at once, leading to a very large database at that point. 4) Our efficiency would improve if we enlarged the top level, but that makes our minimum db size even worse: it's already over 8k, and jumps to 1M after about 1000 entries! 5) Making the sub group size larger gives a shallower tree, which performs better, but makes the "hash explosion" problem worse. 6) The code is complicated, having to handle delete and reshuffling groups of hash buckets, and expansion of buckets. 7) We have to handle the case where all the records somehow end up with the same hash value, which requires special code to chain records for that case. On the other hand, it would be nice if we didn't degrade as badly as TDB does when the hash chains get long. This patch removes the hash-growing code, but instead of chaining like TDB does when a bucket fills, we point the bucket to an array of record pointers. Since each on-disk NTDB pointer contains some hash bits from the record (we steal the upper 8 bits of the offset), 99.5% of the time we don't need to load the record to determine if it matches. This makes an array of offsets much more cache-friendly than a linked list. Here are the times (in ns) for tdb_store of N records, tdb_store of N records the second time, and a fetch of all N records. I've also included the final database size and the smbtorture local.[n]tdb_speed results. Benchmark details: 1) Compiled with -O2. 2) assert() was disabled in TDB2 and NTDB. 3) The "optimize fetch" patch was applied to NTDB. 10 runs, using tmpfs (otherwise massive swapping as db hits ~30M, despite plenty of RAM). Insert Re-ins Fetch Size dbspeed (nsec) (nsec) (nsec) (Kb) (ops/sec) TDB (10000 hashsize): 100 records: 3882 3320 1609 53 203204 1000 records: 3651 3281 1571 115 218021 10000 records: 3404 3326 1595 880 202874 100000 records: 4317 3825 2097 8262 126811 1000000 records: 11568 11578 9320 77005 25046 TDB2 (1024 hashsize, expandable): 100 records: 3867 3329 1699 17 187100 1000 records: 4040 3249 1639 154 186255 10000 records: 4143 3300 1695 1226 185110 100000 records: 4481 3425 1800 17848 163483 1000000 records: 4055 3534 1878 106386 160774 NTDB (8192 hashsize) 100 records: 4259 3376 1692 82 190852 1000 records: 3640 3275 1566 130 195106 10000 records: 4337 3438 1614 773 188362 100000 records: 4750 5165 1746 9001 169197 1000000 records: 4897 5180 2341 83838 121901 Analysis: 1) TDB wins on small databases, beating TDB2 by ~15%, NTDB by ~10%. 2) TDB starts to lose when hash chains get 10 long (fetch 10% slower than TDB2/NTDB). 3) TDB does horribly when hash chains get 100 long (fetch 4x slower than NTDB, 5x slower than TDB2, insert about 2-3x slower). 4) TDB2 databases are 40% larger than TDB1. NTDB is about 15% larger than TDB1

2012-06-19 07:13:04 +04:00

									return ntdb_write_off(ntdb, hbucket_off(h->table, h->bucket),
							

TDB2: Goodbye TDB2, Hello NTDB. This renames everything from tdb2 to ntdb: importantly, we no longer use the tdb_ namespace, so you can link against both ntdb and tdb if you want to. This also enables building of standalone ntdb by the autobuild script. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

2012-06-18 17:00:26 +04:00

}

ntdb: remove hash table trees. TDB2 started with a top-level hash of 1024 entries, divided into 128 groups of 8 buckets. When a bucket filled, the 8 bucket group expanded into pointers into 8 new 64-entry hash tables. When these filled, they expanded in turn, etc. It's a nice idea to automatically expand the hash tables, but it doesn't pay off. Remove it for NTDB. 1) It only beats TDB performance when the database is huge and the TDB hashsize is small. We are about 20% slower on medium-size databases (1000 to 10000 records), worse on really small ones. 2) Since we're 64 bits, our hash tables are already twice as expensive as TDB. 3) Since our hash function is good, it means that all groups tend to fill at the same time, meaning the hash enlarges by a factor of 128 all at once, leading to a very large database at that point. 4) Our efficiency would improve if we enlarged the top level, but that makes our minimum db size even worse: it's already over 8k, and jumps to 1M after about 1000 entries! 5) Making the sub group size larger gives a shallower tree, which performs better, but makes the "hash explosion" problem worse. 6) The code is complicated, having to handle delete and reshuffling groups of hash buckets, and expansion of buckets. 7) We have to handle the case where all the records somehow end up with the same hash value, which requires special code to chain records for that case. On the other hand, it would be nice if we didn't degrade as badly as TDB does when the hash chains get long. This patch removes the hash-growing code, but instead of chaining like TDB does when a bucket fills, we point the bucket to an array of record pointers. Since each on-disk NTDB pointer contains some hash bits from the record (we steal the upper 8 bits of the offset), 99.5% of the time we don't need to load the record to determine if it matches. This makes an array of offsets much more cache-friendly than a linked list. Here are the times (in ns) for tdb_store of N records, tdb_store of N records the second time, and a fetch of all N records. I've also included the final database size and the smbtorture local.[n]tdb_speed results. Benchmark details: 1) Compiled with -O2. 2) assert() was disabled in TDB2 and NTDB. 3) The "optimize fetch" patch was applied to NTDB. 10 runs, using tmpfs (otherwise massive swapping as db hits ~30M, despite plenty of RAM). Insert Re-ins Fetch Size dbspeed (nsec) (nsec) (nsec) (Kb) (ops/sec) TDB (10000 hashsize): 100 records: 3882 3320 1609 53 203204 1000 records: 3651 3281 1571 115 218021 10000 records: 3404 3326 1595 880 202874 100000 records: 4317 3825 2097 8262 126811 1000000 records: 11568 11578 9320 77005 25046 TDB2 (1024 hashsize, expandable): 100 records: 3867 3329 1699 17 187100 1000 records: 4040 3249 1639 154 186255 10000 records: 4143 3300 1695 1226 185110 100000 records: 4481 3425 1800 17848 163483 1000000 records: 4055 3534 1878 106386 160774 NTDB (8192 hashsize) 100 records: 4259 3376 1692 82 190852 1000 records: 3640 3275 1566 130 195106 10000 records: 4337 3438 1614 773 188362 100000 records: 4750 5165 1746 9001 169197 1000000 records: 4897 5180 2341 83838 121901 Analysis: 1) TDB wins on small databases, beating TDB2 by ~15%, NTDB by ~10%. 2) TDB starts to lose when hash chains get 10 long (fetch 10% slower than TDB2/NTDB). 3) TDB does horribly when hash chains get 100 long (fetch 4x slower than NTDB, 5x slower than TDB2, insert about 2-3x slower). 4) TDB2 databases are 40% larger than TDB1. NTDB is about 15% larger than TDB1

2012-06-19 07:13:04 +04:00

								enum NTDB_ERROR delete_from_hash(struct ntdb_context *ntdb,
							

TDB2: Goodbye TDB2, Hello NTDB. This renames everything from tdb2 to ntdb: importantly, we no longer use the tdb_ namespace, so you can link against both ntdb and tdb if you want to. This also enables building of standalone ntdb by the autobuild script. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

2012-06-18 17:00:26 +04:00

{

ntdb: remove hash table trees. TDB2 started with a top-level hash of 1024 entries, divided into 128 groups of 8 buckets. When a bucket filled, the 8 bucket group expanded into pointers into 8 new 64-entry hash tables. When these filled, they expanded in turn, etc. It's a nice idea to automatically expand the hash tables, but it doesn't pay off. Remove it for NTDB. 1) It only beats TDB performance when the database is huge and the TDB hashsize is small. We are about 20% slower on medium-size databases (1000 to 10000 records), worse on really small ones. 2) Since we're 64 bits, our hash tables are already twice as expensive as TDB. 3) Since our hash function is good, it means that all groups tend to fill at the same time, meaning the hash enlarges by a factor of 128 all at once, leading to a very large database at that point. 4) Our efficiency would improve if we enlarged the top level, but that makes our minimum db size even worse: it's already over 8k, and jumps to 1M after about 1000 entries! 5) Making the sub group size larger gives a shallower tree, which performs better, but makes the "hash explosion" problem worse. 6) The code is complicated, having to handle delete and reshuffling groups of hash buckets, and expansion of buckets. 7) We have to handle the case where all the records somehow end up with the same hash value, which requires special code to chain records for that case. On the other hand, it would be nice if we didn't degrade as badly as TDB does when the hash chains get long. This patch removes the hash-growing code, but instead of chaining like TDB does when a bucket fills, we point the bucket to an array of record pointers. Since each on-disk NTDB pointer contains some hash bits from the record (we steal the upper 8 bits of the offset), 99.5% of the time we don't need to load the record to determine if it matches. This makes an array of offsets much more cache-friendly than a linked list. Here are the times (in ns) for tdb_store of N records, tdb_store of N records the second time, and a fetch of all N records. I've also included the final database size and the smbtorture local.[n]tdb_speed results. Benchmark details: 1) Compiled with -O2. 2) assert() was disabled in TDB2 and NTDB. 3) The "optimize fetch" patch was applied to NTDB. 10 runs, using tmpfs (otherwise massive swapping as db hits ~30M, despite plenty of RAM). Insert Re-ins Fetch Size dbspeed (nsec) (nsec) (nsec) (Kb) (ops/sec) TDB (10000 hashsize): 100 records: 3882 3320 1609 53 203204 1000 records: 3651 3281 1571 115 218021 10000 records: 3404 3326 1595 880 202874 100000 records: 4317 3825 2097 8262 126811 1000000 records: 11568 11578 9320 77005 25046 TDB2 (1024 hashsize, expandable): 100 records: 3867 3329 1699 17 187100 1000 records: 4040 3249 1639 154 186255 10000 records: 4143 3300 1695 1226 185110 100000 records: 4481 3425 1800 17848 163483 1000000 records: 4055 3534 1878 106386 160774 NTDB (8192 hashsize) 100 records: 4259 3376 1692 82 190852 1000 records: 3640 3275 1566 130 195106 10000 records: 4337 3438 1614 773 188362 100000 records: 4750 5165 1746 9001 169197 1000000 records: 4897 5180 2341 83838 121901 Analysis: 1) TDB wins on small databases, beating TDB2 by ~15%, NTDB by ~10%. 2) TDB starts to lose when hash chains get 10 long (fetch 10% slower than TDB2/NTDB). 3) TDB does horribly when hash chains get 100 long (fetch 4x slower than NTDB, 5x slower than TDB2, insert about 2-3x slower). 4) TDB2 databases are 40% larger than TDB1. NTDB is about 15% larger than TDB1

2012-06-19 07:13:04 +04:00

									return ntdb_write_off(ntdb, hbucket_off(h->table, h->bucket), 0);
							

TDB2: Goodbye TDB2, Hello NTDB. This renames everything from tdb2 to ntdb: importantly, we no longer use the tdb_ namespace, so you can link against both ntdb and tdb if you want to. This also enables building of standalone ntdb by the autobuild script. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

2012-06-18 17:00:26 +04:00

}

ntdb: remove hash table trees. TDB2 started with a top-level hash of 1024 entries, divided into 128 groups of 8 buckets. When a bucket filled, the 8 bucket group expanded into pointers into 8 new 64-entry hash tables. When these filled, they expanded in turn, etc. It's a nice idea to automatically expand the hash tables, but it doesn't pay off. Remove it for NTDB. 1) It only beats TDB performance when the database is huge and the TDB hashsize is small. We are about 20% slower on medium-size databases (1000 to 10000 records), worse on really small ones. 2) Since we're 64 bits, our hash tables are already twice as expensive as TDB. 3) Since our hash function is good, it means that all groups tend to fill at the same time, meaning the hash enlarges by a factor of 128 all at once, leading to a very large database at that point. 4) Our efficiency would improve if we enlarged the top level, but that makes our minimum db size even worse: it's already over 8k, and jumps to 1M after about 1000 entries! 5) Making the sub group size larger gives a shallower tree, which performs better, but makes the "hash explosion" problem worse. 6) The code is complicated, having to handle delete and reshuffling groups of hash buckets, and expansion of buckets. 7) We have to handle the case where all the records somehow end up with the same hash value, which requires special code to chain records for that case. On the other hand, it would be nice if we didn't degrade as badly as TDB does when the hash chains get long. This patch removes the hash-growing code, but instead of chaining like TDB does when a bucket fills, we point the bucket to an array of record pointers. Since each on-disk NTDB pointer contains some hash bits from the record (we steal the upper 8 bits of the offset), 99.5% of the time we don't need to load the record to determine if it matches. This makes an array of offsets much more cache-friendly than a linked list. Here are the times (in ns) for tdb_store of N records, tdb_store of N records the second time, and a fetch of all N records. I've also included the final database size and the smbtorture local.[n]tdb_speed results. Benchmark details: 1) Compiled with -O2. 2) assert() was disabled in TDB2 and NTDB. 3) The "optimize fetch" patch was applied to NTDB. 10 runs, using tmpfs (otherwise massive swapping as db hits ~30M, despite plenty of RAM). Insert Re-ins Fetch Size dbspeed (nsec) (nsec) (nsec) (Kb) (ops/sec) TDB (10000 hashsize): 100 records: 3882 3320 1609 53 203204 1000 records: 3651 3281 1571 115 218021 10000 records: 3404 3326 1595 880 202874 100000 records: 4317 3825 2097 8262 126811 1000000 records: 11568 11578 9320 77005 25046 TDB2 (1024 hashsize, expandable): 100 records: 3867 3329 1699 17 187100 1000 records: 4040 3249 1639 154 186255 10000 records: 4143 3300 1695 1226 185110 100000 records: 4481 3425 1800 17848 163483 1000000 records: 4055 3534 1878 106386 160774 NTDB (8192 hashsize) 100 records: 4259 3376 1692 82 190852 1000 records: 3640 3275 1566 130 195106 10000 records: 4337 3438 1614 773 188362 100000 records: 4750 5165 1746 9001 169197 1000000 records: 4897 5180 2341 83838 121901 Analysis: 1) TDB wins on small databases, beating TDB2 by ~15%, NTDB by ~10%. 2) TDB starts to lose when hash chains get 10 long (fetch 10% slower than TDB2/NTDB). 3) TDB does horribly when hash chains get 100 long (fetch 4x slower than NTDB, 5x slower than TDB2, insert about 2-3x slower). 4) TDB2 databases are 40% larger than TDB1. NTDB is about 15% larger than TDB1

2012-06-19 07:13:04 +04:00

								enum NTDB_ERROR add_to_hash(struct ntdb_context *ntdb,
							

TDB2: Goodbye TDB2, Hello NTDB. This renames everything from tdb2 to ntdb: importantly, we no longer use the tdb_ namespace, so you can link against both ntdb and tdb if you want to. This also enables building of standalone ntdb by the autobuild script. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

2012-06-18 17:00:26 +04:00

{

ntdb: remove hash table trees. TDB2 started with a top-level hash of 1024 entries, divided into 128 groups of 8 buckets. When a bucket filled, the 8 bucket group expanded into pointers into 8 new 64-entry hash tables. When these filled, they expanded in turn, etc. It's a nice idea to automatically expand the hash tables, but it doesn't pay off. Remove it for NTDB. 1) It only beats TDB performance when the database is huge and the TDB hashsize is small. We are about 20% slower on medium-size databases (1000 to 10000 records), worse on really small ones. 2) Since we're 64 bits, our hash tables are already twice as expensive as TDB. 3) Since our hash function is good, it means that all groups tend to fill at the same time, meaning the hash enlarges by a factor of 128 all at once, leading to a very large database at that point. 4) Our efficiency would improve if we enlarged the top level, but that makes our minimum db size even worse: it's already over 8k, and jumps to 1M after about 1000 entries! 5) Making the sub group size larger gives a shallower tree, which performs better, but makes the "hash explosion" problem worse. 6) The code is complicated, having to handle delete and reshuffling groups of hash buckets, and expansion of buckets. 7) We have to handle the case where all the records somehow end up with the same hash value, which requires special code to chain records for that case. On the other hand, it would be nice if we didn't degrade as badly as TDB does when the hash chains get long. This patch removes the hash-growing code, but instead of chaining like TDB does when a bucket fills, we point the bucket to an array of record pointers. Since each on-disk NTDB pointer contains some hash bits from the record (we steal the upper 8 bits of the offset), 99.5% of the time we don't need to load the record to determine if it matches. This makes an array of offsets much more cache-friendly than a linked list. Here are the times (in ns) for tdb_store of N records, tdb_store of N records the second time, and a fetch of all N records. I've also included the final database size and the smbtorture local.[n]tdb_speed results. Benchmark details: 1) Compiled with -O2. 2) assert() was disabled in TDB2 and NTDB. 3) The "optimize fetch" patch was applied to NTDB. 10 runs, using tmpfs (otherwise massive swapping as db hits ~30M, despite plenty of RAM). Insert Re-ins Fetch Size dbspeed (nsec) (nsec) (nsec) (Kb) (ops/sec) TDB (10000 hashsize): 100 records: 3882 3320 1609 53 203204 1000 records: 3651 3281 1571 115 218021 10000 records: 3404 3326 1595 880 202874 100000 records: 4317 3825 2097 8262 126811 1000000 records: 11568 11578 9320 77005 25046 TDB2 (1024 hashsize, expandable): 100 records: 3867 3329 1699 17 187100 1000 records: 4040 3249 1639 154 186255 10000 records: 4143 3300 1695 1226 185110 100000 records: 4481 3425 1800 17848 163483 1000000 records: 4055 3534 1878 106386 160774 NTDB (8192 hashsize) 100 records: 4259 3376 1692 82 190852 1000 records: 3640 3275 1566 130 195106 10000 records: 4337 3438 1614 773 188362 100000 records: 4750 5165 1746 9001 169197 1000000 records: 4897 5180 2341 83838 121901 Analysis: 1) TDB wins on small databases, beating TDB2 by ~15%, NTDB by ~10%. 2) TDB starts to lose when hash chains get 10 long (fetch 10% slower than TDB2/NTDB). 3) TDB does horribly when hash chains get 100 long (fetch 4x slower than NTDB, 5x slower than TDB2, insert about 2-3x slower). 4) TDB2 databases are 40% larger than TDB1. NTDB is about 15% larger than TDB1

2012-06-19 07:13:04 +04:00

									ntdb_off_t chain;
							

TDB2: Goodbye TDB2, Hello NTDB. This renames everything from tdb2 to ntdb: importantly, we no longer use the tdb_ namespace, so you can link against both ntdb and tdb if you want to. This also enables building of standalone ntdb by the autobuild script. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

2012-06-18 17:00:26 +04:00

ntdb: remove hash table trees. TDB2 started with a top-level hash of 1024 entries, divided into 128 groups of 8 buckets. When a bucket filled, the 8 bucket group expanded into pointers into 8 new 64-entry hash tables. When these filled, they expanded in turn, etc. It's a nice idea to automatically expand the hash tables, but it doesn't pay off. Remove it for NTDB. 1) It only beats TDB performance when the database is huge and the TDB hashsize is small. We are about 20% slower on medium-size databases (1000 to 10000 records), worse on really small ones. 2) Since we're 64 bits, our hash tables are already twice as expensive as TDB. 3) Since our hash function is good, it means that all groups tend to fill at the same time, meaning the hash enlarges by a factor of 128 all at once, leading to a very large database at that point. 4) Our efficiency would improve if we enlarged the top level, but that makes our minimum db size even worse: it's already over 8k, and jumps to 1M after about 1000 entries! 5) Making the sub group size larger gives a shallower tree, which performs better, but makes the "hash explosion" problem worse. 6) The code is complicated, having to handle delete and reshuffling groups of hash buckets, and expansion of buckets. 7) We have to handle the case where all the records somehow end up with the same hash value, which requires special code to chain records for that case. On the other hand, it would be nice if we didn't degrade as badly as TDB does when the hash chains get long. This patch removes the hash-growing code, but instead of chaining like TDB does when a bucket fills, we point the bucket to an array of record pointers. Since each on-disk NTDB pointer contains some hash bits from the record (we steal the upper 8 bits of the offset), 99.5% of the time we don't need to load the record to determine if it matches. This makes an array of offsets much more cache-friendly than a linked list. Here are the times (in ns) for tdb_store of N records, tdb_store of N records the second time, and a fetch of all N records. I've also included the final database size and the smbtorture local.[n]tdb_speed results. Benchmark details: 1) Compiled with -O2. 2) assert() was disabled in TDB2 and NTDB. 3) The "optimize fetch" patch was applied to NTDB. 10 runs, using tmpfs (otherwise massive swapping as db hits ~30M, despite plenty of RAM). Insert Re-ins Fetch Size dbspeed (nsec) (nsec) (nsec) (Kb) (ops/sec) TDB (10000 hashsize): 100 records: 3882 3320 1609 53 203204 1000 records: 3651 3281 1571 115 218021 10000 records: 3404 3326 1595 880 202874 100000 records: 4317 3825 2097 8262 126811 1000000 records: 11568 11578 9320 77005 25046 TDB2 (1024 hashsize, expandable): 100 records: 3867 3329 1699 17 187100 1000 records: 4040 3249 1639 154 186255 10000 records: 4143 3300 1695 1226 185110 100000 records: 4481 3425 1800 17848 163483 1000000 records: 4055 3534 1878 106386 160774 NTDB (8192 hashsize) 100 records: 4259 3376 1692 82 190852 1000 records: 3640 3275 1566 130 195106 10000 records: 4337 3438 1614 773 188362 100000 records: 4750 5165 1746 9001 169197 1000000 records: 4897 5180 2341 83838 121901 Analysis: 1) TDB wins on small databases, beating TDB2 by ~15%, NTDB by ~10%. 2) TDB starts to lose when hash chains get 10 long (fetch 10% slower than TDB2/NTDB). 3) TDB does horribly when hash chains get 100 long (fetch 4x slower than NTDB, 5x slower than TDB2, insert about 2-3x slower). 4) TDB2 databases are 40% larger than TDB1. NTDB is about 15% larger than TDB1

2012-06-19 07:13:04 +04:00

/* We hit an empty bucket during search? That's where it goes. */

TDB2: Goodbye TDB2, Hello NTDB. This renames everything from tdb2 to ntdb: importantly, we no longer use the tdb_ namespace, so you can link against both ntdb and tdb if you want to. This also enables building of standalone ntdb by the autobuild script. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

2012-06-18 17:00:26 +04:00

}

ntdb: remove hash table trees. TDB2 started with a top-level hash of 1024 entries, divided into 128 groups of 8 buckets. When a bucket filled, the 8 bucket group expanded into pointers into 8 new 64-entry hash tables. When these filled, they expanded in turn, etc. It's a nice idea to automatically expand the hash tables, but it doesn't pay off. Remove it for NTDB. 1) It only beats TDB performance when the database is huge and the TDB hashsize is small. We are about 20% slower on medium-size databases (1000 to 10000 records), worse on really small ones. 2) Since we're 64 bits, our hash tables are already twice as expensive as TDB. 3) Since our hash function is good, it means that all groups tend to fill at the same time, meaning the hash enlarges by a factor of 128 all at once, leading to a very large database at that point. 4) Our efficiency would improve if we enlarged the top level, but that makes our minimum db size even worse: it's already over 8k, and jumps to 1M after about 1000 entries! 5) Making the sub group size larger gives a shallower tree, which performs better, but makes the "hash explosion" problem worse. 6) The code is complicated, having to handle delete and reshuffling groups of hash buckets, and expansion of buckets. 7) We have to handle the case where all the records somehow end up with the same hash value, which requires special code to chain records for that case. On the other hand, it would be nice if we didn't degrade as badly as TDB does when the hash chains get long. This patch removes the hash-growing code, but instead of chaining like TDB does when a bucket fills, we point the bucket to an array of record pointers. Since each on-disk NTDB pointer contains some hash bits from the record (we steal the upper 8 bits of the offset), 99.5% of the time we don't need to load the record to determine if it matches. This makes an array of offsets much more cache-friendly than a linked list. Here are the times (in ns) for tdb_store of N records, tdb_store of N records the second time, and a fetch of all N records. I've also included the final database size and the smbtorture local.[n]tdb_speed results. Benchmark details: 1) Compiled with -O2. 2) assert() was disabled in TDB2 and NTDB. 3) The "optimize fetch" patch was applied to NTDB. 10 runs, using tmpfs (otherwise massive swapping as db hits ~30M, despite plenty of RAM). Insert Re-ins Fetch Size dbspeed (nsec) (nsec) (nsec) (Kb) (ops/sec) TDB (10000 hashsize): 100 records: 3882 3320 1609 53 203204 1000 records: 3651 3281 1571 115 218021 10000 records: 3404 3326 1595 880 202874 100000 records: 4317 3825 2097 8262 126811 1000000 records: 11568 11578 9320 77005 25046 TDB2 (1024 hashsize, expandable): 100 records: 3867 3329 1699 17 187100 1000 records: 4040 3249 1639 154 186255 10000 records: 4143 3300 1695 1226 185110 100000 records: 4481 3425 1800 17848 163483 1000000 records: 4055 3534 1878 106386 160774 NTDB (8192 hashsize) 100 records: 4259 3376 1692 82 190852 1000 records: 3640 3275 1566 130 195106 10000 records: 4337 3438 1614 773 188362 100000 records: 4750 5165 1746 9001 169197 1000000 records: 4897 5180 2341 83838 121901 Analysis: 1) TDB wins on small databases, beating TDB2 by ~15%, NTDB by ~10%. 2) TDB starts to lose when hash chains get 10 long (fetch 10% slower than TDB2/NTDB). 3) TDB does horribly when hash chains get 100 long (fetch 4x slower than NTDB, 5x slower than TDB2, insert about 2-3x slower). 4) TDB2 databases are 40% larger than TDB1. NTDB is about 15% larger than TDB1

2012-06-19 07:13:04 +04:00

/* Full at top-level? Create a 2-element chain. */

TDB2: Goodbye TDB2, Hello NTDB. This renames everything from tdb2 to ntdb: importantly, we no longer use the tdb_ namespace, so you can link against both ntdb and tdb if you want to. This also enables building of standalone ntdb by the autobuild script. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

2012-06-18 17:00:26 +04:00

ntdb: remove hash table trees. TDB2 started with a top-level hash of 1024 entries, divided into 128 groups of 8 buckets. When a bucket filled, the 8 bucket group expanded into pointers into 8 new 64-entry hash tables. When these filled, they expanded in turn, etc. It's a nice idea to automatically expand the hash tables, but it doesn't pay off. Remove it for NTDB. 1) It only beats TDB performance when the database is huge and the TDB hashsize is small. We are about 20% slower on medium-size databases (1000 to 10000 records), worse on really small ones. 2) Since we're 64 bits, our hash tables are already twice as expensive as TDB. 3) Since our hash function is good, it means that all groups tend to fill at the same time, meaning the hash enlarges by a factor of 128 all at once, leading to a very large database at that point. 4) Our efficiency would improve if we enlarged the top level, but that makes our minimum db size even worse: it's already over 8k, and jumps to 1M after about 1000 entries! 5) Making the sub group size larger gives a shallower tree, which performs better, but makes the "hash explosion" problem worse. 6) The code is complicated, having to handle delete and reshuffling groups of hash buckets, and expansion of buckets. 7) We have to handle the case where all the records somehow end up with the same hash value, which requires special code to chain records for that case. On the other hand, it would be nice if we didn't degrade as badly as TDB does when the hash chains get long. This patch removes the hash-growing code, but instead of chaining like TDB does when a bucket fills, we point the bucket to an array of record pointers. Since each on-disk NTDB pointer contains some hash bits from the record (we steal the upper 8 bits of the offset), 99.5% of the time we don't need to load the record to determine if it matches. This makes an array of offsets much more cache-friendly than a linked list. Here are the times (in ns) for tdb_store of N records, tdb_store of N records the second time, and a fetch of all N records. I've also included the final database size and the smbtorture local.[n]tdb_speed results. Benchmark details: 1) Compiled with -O2. 2) assert() was disabled in TDB2 and NTDB. 3) The "optimize fetch" patch was applied to NTDB. 10 runs, using tmpfs (otherwise massive swapping as db hits ~30M, despite plenty of RAM). Insert Re-ins Fetch Size dbspeed (nsec) (nsec) (nsec) (Kb) (ops/sec) TDB (10000 hashsize): 100 records: 3882 3320 1609 53 203204 1000 records: 3651 3281 1571 115 218021 10000 records: 3404 3326 1595 880 202874 100000 records: 4317 3825 2097 8262 126811 1000000 records: 11568 11578 9320 77005 25046 TDB2 (1024 hashsize, expandable): 100 records: 3867 3329 1699 17 187100 1000 records: 4040 3249 1639 154 186255 10000 records: 4143 3300 1695 1226 185110 100000 records: 4481 3425 1800 17848 163483 1000000 records: 4055 3534 1878 106386 160774 NTDB (8192 hashsize) 100 records: 4259 3376 1692 82 190852 1000 records: 3640 3275 1566 130 195106 10000 records: 4337 3438 1614 773 188362 100000 records: 4750 5165 1746 9001 169197 1000000 records: 4897 5180 2341 83838 121901 Analysis: 1) TDB wins on small databases, beating TDB2 by ~15%, NTDB by ~10%. 2) TDB starts to lose when hash chains get 10 long (fetch 10% slower than TDB2/NTDB). 3) TDB does horribly when hash chains get 100 long (fetch 4x slower than NTDB, 5x slower than TDB2, insert about 2-3x slower). 4) TDB2 databases are 40% larger than TDB1. NTDB is about 15% larger than TDB1

2012-06-19 07:13:04 +04:00

/* One element is old value, the other is the new value. */

TDB2: Goodbye TDB2, Hello NTDB. This renames everything from tdb2 to ntdb: importantly, we no longer use the tdb_ namespace, so you can link against both ntdb and tdb if you want to. This also enables building of standalone ntdb by the autobuild script. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

2012-06-18 17:00:26 +04:00

ntdb: remove hash table trees. TDB2 started with a top-level hash of 1024 entries, divided into 128 groups of 8 buckets. When a bucket filled, the 8 bucket group expanded into pointers into 8 new 64-entry hash tables. When these filled, they expanded in turn, etc. It's a nice idea to automatically expand the hash tables, but it doesn't pay off. Remove it for NTDB. 1) It only beats TDB performance when the database is huge and the TDB hashsize is small. We are about 20% slower on medium-size databases (1000 to 10000 records), worse on really small ones. 2) Since we're 64 bits, our hash tables are already twice as expensive as TDB. 3) Since our hash function is good, it means that all groups tend to fill at the same time, meaning the hash enlarges by a factor of 128 all at once, leading to a very large database at that point. 4) Our efficiency would improve if we enlarged the top level, but that makes our minimum db size even worse: it's already over 8k, and jumps to 1M after about 1000 entries! 5) Making the sub group size larger gives a shallower tree, which performs better, but makes the "hash explosion" problem worse. 6) The code is complicated, having to handle delete and reshuffling groups of hash buckets, and expansion of buckets. 7) We have to handle the case where all the records somehow end up with the same hash value, which requires special code to chain records for that case. On the other hand, it would be nice if we didn't degrade as badly as TDB does when the hash chains get long. This patch removes the hash-growing code, but instead of chaining like TDB does when a bucket fills, we point the bucket to an array of record pointers. Since each on-disk NTDB pointer contains some hash bits from the record (we steal the upper 8 bits of the offset), 99.5% of the time we don't need to load the record to determine if it matches. This makes an array of offsets much more cache-friendly than a linked list. Here are the times (in ns) for tdb_store of N records, tdb_store of N records the second time, and a fetch of all N records. I've also included the final database size and the smbtorture local.[n]tdb_speed results. Benchmark details: 1) Compiled with -O2. 2) assert() was disabled in TDB2 and NTDB. 3) The "optimize fetch" patch was applied to NTDB. 10 runs, using tmpfs (otherwise massive swapping as db hits ~30M, despite plenty of RAM). Insert Re-ins Fetch Size dbspeed (nsec) (nsec) (nsec) (Kb) (ops/sec) TDB (10000 hashsize): 100 records: 3882 3320 1609 53 203204 1000 records: 3651 3281 1571 115 218021 10000 records: 3404 3326 1595 880 202874 100000 records: 4317 3825 2097 8262 126811 1000000 records: 11568 11578 9320 77005 25046 TDB2 (1024 hashsize, expandable): 100 records: 3867 3329 1699 17 187100 1000 records: 4040 3249 1639 154 186255 10000 records: 4143 3300 1695 1226 185110 100000 records: 4481 3425 1800 17848 163483 1000000 records: 4055 3534 1878 106386 160774 NTDB (8192 hashsize) 100 records: 4259 3376 1692 82 190852 1000 records: 3640 3275 1566 130 195106 10000 records: 4337 3438 1614 773 188362 100000 records: 4750 5165 1746 9001 169197 1000000 records: 4897 5180 2341 83838 121901 Analysis: 1) TDB wins on small databases, beating TDB2 by ~15%, NTDB by ~10%. 2) TDB starts to lose when hash chains get 10 long (fetch 10% slower than TDB2/NTDB). 3) TDB does horribly when hash chains get 100 long (fetch 4x slower than NTDB, 5x slower than TDB2, insert about 2-3x slower). 4) TDB2 databases are 40% larger than TDB1. NTDB is about 15% larger than TDB1

2012-06-19 07:13:04 +04:00

										chain = alloc(ntdb, 0, sizeof(pair), NTDB_CHAIN_MAGIC, true);
							

TDB2: Goodbye TDB2, Hello NTDB. This renames everything from tdb2 to ntdb: importantly, we no longer use the tdb_ namespace, so you can link against both ntdb and tdb if you want to. This also enables building of standalone ntdb by the autobuild script. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

2012-06-18 17:00:26 +04:00

}

ntdb: remove hash table trees. TDB2 started with a top-level hash of 1024 entries, divided into 128 groups of 8 buckets. When a bucket filled, the 8 bucket group expanded into pointers into 8 new 64-entry hash tables. When these filled, they expanded in turn, etc. It's a nice idea to automatically expand the hash tables, but it doesn't pay off. Remove it for NTDB. 1) It only beats TDB performance when the database is huge and the TDB hashsize is small. We are about 20% slower on medium-size databases (1000 to 10000 records), worse on really small ones. 2) Since we're 64 bits, our hash tables are already twice as expensive as TDB. 3) Since our hash function is good, it means that all groups tend to fill at the same time, meaning the hash enlarges by a factor of 128 all at once, leading to a very large database at that point. 4) Our efficiency would improve if we enlarged the top level, but that makes our minimum db size even worse: it's already over 8k, and jumps to 1M after about 1000 entries! 5) Making the sub group size larger gives a shallower tree, which performs better, but makes the "hash explosion" problem worse. 6) The code is complicated, having to handle delete and reshuffling groups of hash buckets, and expansion of buckets. 7) We have to handle the case where all the records somehow end up with the same hash value, which requires special code to chain records for that case. On the other hand, it would be nice if we didn't degrade as badly as TDB does when the hash chains get long. This patch removes the hash-growing code, but instead of chaining like TDB does when a bucket fills, we point the bucket to an array of record pointers. Since each on-disk NTDB pointer contains some hash bits from the record (we steal the upper 8 bits of the offset), 99.5% of the time we don't need to load the record to determine if it matches. This makes an array of offsets much more cache-friendly than a linked list. Here are the times (in ns) for tdb_store of N records, tdb_store of N records the second time, and a fetch of all N records. I've also included the final database size and the smbtorture local.[n]tdb_speed results. Benchmark details: 1) Compiled with -O2. 2) assert() was disabled in TDB2 and NTDB. 3) The "optimize fetch" patch was applied to NTDB. 10 runs, using tmpfs (otherwise massive swapping as db hits ~30M, despite plenty of RAM). Insert Re-ins Fetch Size dbspeed (nsec) (nsec) (nsec) (Kb) (ops/sec) TDB (10000 hashsize): 100 records: 3882 3320 1609 53 203204 1000 records: 3651 3281 1571 115 218021 10000 records: 3404 3326 1595 880 202874 100000 records: 4317 3825 2097 8262 126811 1000000 records: 11568 11578 9320 77005 25046 TDB2 (1024 hashsize, expandable): 100 records: 3867 3329 1699 17 187100 1000 records: 4040 3249 1639 154 186255 10000 records: 4143 3300 1695 1226 185110 100000 records: 4481 3425 1800 17848 163483 1000000 records: 4055 3534 1878 106386 160774 NTDB (8192 hashsize) 100 records: 4259 3376 1692 82 190852 1000 records: 3640 3275 1566 130 195106 10000 records: 4337 3438 1614 773 188362 100000 records: 4750 5165 1746 9001 169197 1000000 records: 4897 5180 2341 83838 121901 Analysis: 1) TDB wins on small databases, beating TDB2 by ~15%, NTDB by ~10%. 2) TDB starts to lose when hash chains get 10 long (fetch 10% slower than TDB2/NTDB). 3) TDB does horribly when hash chains get 100 long (fetch 4x slower than NTDB, 5x slower than TDB2, insert about 2-3x slower). 4) TDB2 databases are 40% larger than TDB1. NTDB is about 15% larger than TDB1

2012-06-19 07:13:04 +04:00

										ecode = ntdb_write_convert(ntdb,
							

TDB2: Goodbye TDB2, Hello NTDB. This renames everything from tdb2 to ntdb: importantly, we no longer use the tdb_ namespace, so you can link against both ntdb and tdb if you want to. This also enables building of standalone ntdb by the autobuild script. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

2012-06-18 17:00:26 +04:00

}

ntdb: remove hash table trees. TDB2 started with a top-level hash of 1024 entries, divided into 128 groups of 8 buckets. When a bucket filled, the 8 bucket group expanded into pointers into 8 new 64-entry hash tables. When these filled, they expanded in turn, etc. It's a nice idea to automatically expand the hash tables, but it doesn't pay off. Remove it for NTDB. 1) It only beats TDB performance when the database is huge and the TDB hashsize is small. We are about 20% slower on medium-size databases (1000 to 10000 records), worse on really small ones. 2) Since we're 64 bits, our hash tables are already twice as expensive as TDB. 3) Since our hash function is good, it means that all groups tend to fill at the same time, meaning the hash enlarges by a factor of 128 all at once, leading to a very large database at that point. 4) Our efficiency would improve if we enlarged the top level, but that makes our minimum db size even worse: it's already over 8k, and jumps to 1M after about 1000 entries! 5) Making the sub group size larger gives a shallower tree, which performs better, but makes the "hash explosion" problem worse. 6) The code is complicated, having to handle delete and reshuffling groups of hash buckets, and expansion of buckets. 7) We have to handle the case where all the records somehow end up with the same hash value, which requires special code to chain records for that case. On the other hand, it would be nice if we didn't degrade as badly as TDB does when the hash chains get long. This patch removes the hash-growing code, but instead of chaining like TDB does when a bucket fills, we point the bucket to an array of record pointers. Since each on-disk NTDB pointer contains some hash bits from the record (we steal the upper 8 bits of the offset), 99.5% of the time we don't need to load the record to determine if it matches. This makes an array of offsets much more cache-friendly than a linked list. Here are the times (in ns) for tdb_store of N records, tdb_store of N records the second time, and a fetch of all N records. I've also included the final database size and the smbtorture local.[n]tdb_speed results. Benchmark details: 1) Compiled with -O2. 2) assert() was disabled in TDB2 and NTDB. 3) The "optimize fetch" patch was applied to NTDB. 10 runs, using tmpfs (otherwise massive swapping as db hits ~30M, despite plenty of RAM). Insert Re-ins Fetch Size dbspeed (nsec) (nsec) (nsec) (Kb) (ops/sec) TDB (10000 hashsize): 100 records: 3882 3320 1609 53 203204 1000 records: 3651 3281 1571 115 218021 10000 records: 3404 3326 1595 880 202874 100000 records: 4317 3825 2097 8262 126811 1000000 records: 11568 11578 9320 77005 25046 TDB2 (1024 hashsize, expandable): 100 records: 3867 3329 1699 17 187100 1000 records: 4040 3249 1639 154 186255 10000 records: 4143 3300 1695 1226 185110 100000 records: 4481 3425 1800 17848 163483 1000000 records: 4055 3534 1878 106386 160774 NTDB (8192 hashsize) 100 records: 4259 3376 1692 82 190852 1000 records: 3640 3275 1566 130 195106 10000 records: 4337 3438 1614 773 188362 100000 records: 4750 5165 1746 9001 169197 1000000 records: 4897 5180 2341 83838 121901 Analysis: 1) TDB wins on small databases, beating TDB2 by ~15%, NTDB by ~10%. 2) TDB starts to lose when hash chains get 10 long (fetch 10% slower than TDB2/NTDB). 3) TDB does horribly when hash chains get 100 long (fetch 4x slower than NTDB, 5x slower than TDB2, insert about 2-3x slower). 4) TDB2 databases are 40% larger than TDB1. NTDB is about 15% larger than TDB1

2012-06-19 07:13:04 +04:00

/* Full bucket. Expand. */

TDB2: Goodbye TDB2, Hello NTDB. This renames everything from tdb2 to ntdb: importantly, we no longer use the tdb_ namespace, so you can link against both ntdb and tdb if you want to. This also enables building of standalone ntdb by the autobuild script. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

2012-06-18 17:00:26 +04:00

}

ntdb: remove hash table trees. TDB2 started with a top-level hash of 1024 entries, divided into 128 groups of 8 buckets. When a bucket filled, the 8 bucket group expanded into pointers into 8 new 64-entry hash tables. When these filled, they expanded in turn, etc. It's a nice idea to automatically expand the hash tables, but it doesn't pay off. Remove it for NTDB. 1) It only beats TDB performance when the database is huge and the TDB hashsize is small. We are about 20% slower on medium-size databases (1000 to 10000 records), worse on really small ones. 2) Since we're 64 bits, our hash tables are already twice as expensive as TDB. 3) Since our hash function is good, it means that all groups tend to fill at the same time, meaning the hash enlarges by a factor of 128 all at once, leading to a very large database at that point. 4) Our efficiency would improve if we enlarged the top level, but that makes our minimum db size even worse: it's already over 8k, and jumps to 1M after about 1000 entries! 5) Making the sub group size larger gives a shallower tree, which performs better, but makes the "hash explosion" problem worse. 6) The code is complicated, having to handle delete and reshuffling groups of hash buckets, and expansion of buckets. 7) We have to handle the case where all the records somehow end up with the same hash value, which requires special code to chain records for that case. On the other hand, it would be nice if we didn't degrade as badly as TDB does when the hash chains get long. This patch removes the hash-growing code, but instead of chaining like TDB does when a bucket fills, we point the bucket to an array of record pointers. Since each on-disk NTDB pointer contains some hash bits from the record (we steal the upper 8 bits of the offset), 99.5% of the time we don't need to load the record to determine if it matches. This makes an array of offsets much more cache-friendly than a linked list. Here are the times (in ns) for tdb_store of N records, tdb_store of N records the second time, and a fetch of all N records. I've also included the final database size and the smbtorture local.[n]tdb_speed results. Benchmark details: 1) Compiled with -O2. 2) assert() was disabled in TDB2 and NTDB. 3) The "optimize fetch" patch was applied to NTDB. 10 runs, using tmpfs (otherwise massive swapping as db hits ~30M, despite plenty of RAM). Insert Re-ins Fetch Size dbspeed (nsec) (nsec) (nsec) (Kb) (ops/sec) TDB (10000 hashsize): 100 records: 3882 3320 1609 53 203204 1000 records: 3651 3281 1571 115 218021 10000 records: 3404 3326 1595 880 202874 100000 records: 4317 3825 2097 8262 126811 1000000 records: 11568 11578 9320 77005 25046 TDB2 (1024 hashsize, expandable): 100 records: 3867 3329 1699 17 187100 1000 records: 4040 3249 1639 154 186255 10000 records: 4143 3300 1695 1226 185110 100000 records: 4481 3425 1800 17848 163483 1000000 records: 4055 3534 1878 106386 160774 NTDB (8192 hashsize) 100 records: 4259 3376 1692 82 190852 1000 records: 3640 3275 1566 130 195106 10000 records: 4337 3438 1614 773 188362 100000 records: 4750 5165 1746 9001 169197 1000000 records: 4897 5180 2341 83838 121901 Analysis: 1) TDB wins on small databases, beating TDB2 by ~15%, NTDB by ~10%. 2) TDB starts to lose when hash chains get 10 long (fetch 10% slower than TDB2/NTDB). 3) TDB does horribly when hash chains get 100 long (fetch 4x slower than NTDB, 5x slower than TDB2, insert about 2-3x slower). 4) TDB2 databases are 40% larger than TDB1. NTDB is about 15% larger than TDB1

2012-06-19 07:13:04 +04:00

									if (rec_extra_padding(&chdr) >= sizeof(new_off)) {
							

TDB2: Goodbye TDB2, Hello NTDB. This renames everything from tdb2 to ntdb: importantly, we no longer use the tdb_ namespace, so you can link against both ntdb and tdb if you want to. This also enables building of standalone ntdb by the autobuild script. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

2012-06-18 17:00:26 +04:00

ntdb: remove hash table trees. TDB2 started with a top-level hash of 1024 entries, divided into 128 groups of 8 buckets. When a bucket filled, the 8 bucket group expanded into pointers into 8 new 64-entry hash tables. When these filled, they expanded in turn, etc. It's a nice idea to automatically expand the hash tables, but it doesn't pay off. Remove it for NTDB. 1) It only beats TDB performance when the database is huge and the TDB hashsize is small. We are about 20% slower on medium-size databases (1000 to 10000 records), worse on really small ones. 2) Since we're 64 bits, our hash tables are already twice as expensive as TDB. 3) Since our hash function is good, it means that all groups tend to fill at the same time, meaning the hash enlarges by a factor of 128 all at once, leading to a very large database at that point. 4) Our efficiency would improve if we enlarged the top level, but that makes our minimum db size even worse: it's already over 8k, and jumps to 1M after about 1000 entries! 5) Making the sub group size larger gives a shallower tree, which performs better, but makes the "hash explosion" problem worse. 6) The code is complicated, having to handle delete and reshuffling groups of hash buckets, and expansion of buckets. 7) We have to handle the case where all the records somehow end up with the same hash value, which requires special code to chain records for that case. On the other hand, it would be nice if we didn't degrade as badly as TDB does when the hash chains get long. This patch removes the hash-growing code, but instead of chaining like TDB does when a bucket fills, we point the bucket to an array of record pointers. Since each on-disk NTDB pointer contains some hash bits from the record (we steal the upper 8 bits of the offset), 99.5% of the time we don't need to load the record to determine if it matches. This makes an array of offsets much more cache-friendly than a linked list. Here are the times (in ns) for tdb_store of N records, tdb_store of N records the second time, and a fetch of all N records. I've also included the final database size and the smbtorture local.[n]tdb_speed results. Benchmark details: 1) Compiled with -O2. 2) assert() was disabled in TDB2 and NTDB. 3) The "optimize fetch" patch was applied to NTDB. 10 runs, using tmpfs (otherwise massive swapping as db hits ~30M, despite plenty of RAM). Insert Re-ins Fetch Size dbspeed (nsec) (nsec) (nsec) (Kb) (ops/sec) TDB (10000 hashsize): 100 records: 3882 3320 1609 53 203204 1000 records: 3651 3281 1571 115 218021 10000 records: 3404 3326 1595 880 202874 100000 records: 4317 3825 2097 8262 126811 1000000 records: 11568 11578 9320 77005 25046 TDB2 (1024 hashsize, expandable): 100 records: 3867 3329 1699 17 187100 1000 records: 4040 3249 1639 154 186255 10000 records: 4143 3300 1695 1226 185110 100000 records: 4481 3425 1800 17848 163483 1000000 records: 4055 3534 1878 106386 160774 NTDB (8192 hashsize) 100 records: 4259 3376 1692 82 190852 1000 records: 3640 3275 1566 130 195106 10000 records: 4337 3438 1614 773 188362 100000 records: 4750 5165 1746 9001 169197 1000000 records: 4897 5180 2341 83838 121901 Analysis: 1) TDB wins on small databases, beating TDB2 by ~15%, NTDB by ~10%. 2) TDB starts to lose when hash chains get 10 long (fetch 10% slower than TDB2/NTDB). 3) TDB does horribly when hash chains get 100 long (fetch 4x slower than NTDB, 5x slower than TDB2, insert about 2-3x slower). 4) TDB2 databases are 40% larger than TDB1. NTDB is about 15% larger than TDB1

2012-06-19 07:13:04 +04:00

										ecode = set_header(ntdb, &chdr, NTDB_CHAIN_MAGIC, 0,
							

TDB2: Goodbye TDB2, Hello NTDB. This renames everything from tdb2 to ntdb: importantly, we no longer use the tdb_ namespace, so you can link against both ntdb and tdb if you want to. This also enables building of standalone ntdb by the autobuild script. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

2012-06-18 17:00:26 +04:00

ntdb: remove hash table trees. TDB2 started with a top-level hash of 1024 entries, divided into 128 groups of 8 buckets. When a bucket filled, the 8 bucket group expanded into pointers into 8 new 64-entry hash tables. When these filled, they expanded in turn, etc. It's a nice idea to automatically expand the hash tables, but it doesn't pay off. Remove it for NTDB. 1) It only beats TDB performance when the database is huge and the TDB hashsize is small. We are about 20% slower on medium-size databases (1000 to 10000 records), worse on really small ones. 2) Since we're 64 bits, our hash tables are already twice as expensive as TDB. 3) Since our hash function is good, it means that all groups tend to fill at the same time, meaning the hash enlarges by a factor of 128 all at once, leading to a very large database at that point. 4) Our efficiency would improve if we enlarged the top level, but that makes our minimum db size even worse: it's already over 8k, and jumps to 1M after about 1000 entries! 5) Making the sub group size larger gives a shallower tree, which performs better, but makes the "hash explosion" problem worse. 6) The code is complicated, having to handle delete and reshuffling groups of hash buckets, and expansion of buckets. 7) We have to handle the case where all the records somehow end up with the same hash value, which requires special code to chain records for that case. On the other hand, it would be nice if we didn't degrade as badly as TDB does when the hash chains get long. This patch removes the hash-growing code, but instead of chaining like TDB does when a bucket fills, we point the bucket to an array of record pointers. Since each on-disk NTDB pointer contains some hash bits from the record (we steal the upper 8 bits of the offset), 99.5% of the time we don't need to load the record to determine if it matches. This makes an array of offsets much more cache-friendly than a linked list. Here are the times (in ns) for tdb_store of N records, tdb_store of N records the second time, and a fetch of all N records. I've also included the final database size and the smbtorture local.[n]tdb_speed results. Benchmark details: 1) Compiled with -O2. 2) assert() was disabled in TDB2 and NTDB. 3) The "optimize fetch" patch was applied to NTDB. 10 runs, using tmpfs (otherwise massive swapping as db hits ~30M, despite plenty of RAM). Insert Re-ins Fetch Size dbspeed (nsec) (nsec) (nsec) (Kb) (ops/sec) TDB (10000 hashsize): 100 records: 3882 3320 1609 53 203204 1000 records: 3651 3281 1571 115 218021 10000 records: 3404 3326 1595 880 202874 100000 records: 4317 3825 2097 8262 126811 1000000 records: 11568 11578 9320 77005 25046 TDB2 (1024 hashsize, expandable): 100 records: 3867 3329 1699 17 187100 1000 records: 4040 3249 1639 154 186255 10000 records: 4143 3300 1695 1226 185110 100000 records: 4481 3425 1800 17848 163483 1000000 records: 4055 3534 1878 106386 160774 NTDB (8192 hashsize) 100 records: 4259 3376 1692 82 190852 1000 records: 3640 3275 1566 130 195106 10000 records: 4337 3438 1614 773 188362 100000 records: 4750 5165 1746 9001 169197 1000000 records: 4897 5180 2341 83838 121901 Analysis: 1) TDB wins on small databases, beating TDB2 by ~15%, NTDB by ~10%. 2) TDB starts to lose when hash chains get 10 long (fetch 10% slower than TDB2/NTDB). 3) TDB does horribly when hash chains get 100 long (fetch 4x slower than NTDB, 5x slower than TDB2, insert about 2-3x slower). 4) TDB2 databases are 40% larger than TDB1. NTDB is about 15% larger than TDB1

2012-06-19 07:13:04 +04:00

										if (ecode != NTDB_SUCCESS) {
							

TDB2: Goodbye TDB2, Hello NTDB. This renames everything from tdb2 to ntdb: importantly, we no longer use the tdb_ namespace, so you can link against both ntdb and tdb if you want to. This also enables building of standalone ntdb by the autobuild script. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

2012-06-18 17:00:26 +04:00

}

ntdb: remove hash table trees. TDB2 started with a top-level hash of 1024 entries, divided into 128 groups of 8 buckets. When a bucket filled, the 8 bucket group expanded into pointers into 8 new 64-entry hash tables. When these filled, they expanded in turn, etc. It's a nice idea to automatically expand the hash tables, but it doesn't pay off. Remove it for NTDB. 1) It only beats TDB performance when the database is huge and the TDB hashsize is small. We are about 20% slower on medium-size databases (1000 to 10000 records), worse on really small ones. 2) Since we're 64 bits, our hash tables are already twice as expensive as TDB. 3) Since our hash function is good, it means that all groups tend to fill at the same time, meaning the hash enlarges by a factor of 128 all at once, leading to a very large database at that point. 4) Our efficiency would improve if we enlarged the top level, but that makes our minimum db size even worse: it's already over 8k, and jumps to 1M after about 1000 entries! 5) Making the sub group size larger gives a shallower tree, which performs better, but makes the "hash explosion" problem worse. 6) The code is complicated, having to handle delete and reshuffling groups of hash buckets, and expansion of buckets. 7) We have to handle the case where all the records somehow end up with the same hash value, which requires special code to chain records for that case. On the other hand, it would be nice if we didn't degrade as badly as TDB does when the hash chains get long. This patch removes the hash-growing code, but instead of chaining like TDB does when a bucket fills, we point the bucket to an array of record pointers. Since each on-disk NTDB pointer contains some hash bits from the record (we steal the upper 8 bits of the offset), 99.5% of the time we don't need to load the record to determine if it matches. This makes an array of offsets much more cache-friendly than a linked list. Here are the times (in ns) for tdb_store of N records, tdb_store of N records the second time, and a fetch of all N records. I've also included the final database size and the smbtorture local.[n]tdb_speed results. Benchmark details: 1) Compiled with -O2. 2) assert() was disabled in TDB2 and NTDB. 3) The "optimize fetch" patch was applied to NTDB. 10 runs, using tmpfs (otherwise massive swapping as db hits ~30M, despite plenty of RAM). Insert Re-ins Fetch Size dbspeed (nsec) (nsec) (nsec) (Kb) (ops/sec) TDB (10000 hashsize): 100 records: 3882 3320 1609 53 203204 1000 records: 3651 3281 1571 115 218021 10000 records: 3404 3326 1595 880 202874 100000 records: 4317 3825 2097 8262 126811 1000000 records: 11568 11578 9320 77005 25046 TDB2 (1024 hashsize, expandable): 100 records: 3867 3329 1699 17 187100 1000 records: 4040 3249 1639 154 186255 10000 records: 4143 3300 1695 1226 185110 100000 records: 4481 3425 1800 17848 163483 1000000 records: 4055 3534 1878 106386 160774 NTDB (8192 hashsize) 100 records: 4259 3376 1692 82 190852 1000 records: 3640 3275 1566 130 195106 10000 records: 4337 3438 1614 773 188362 100000 records: 4750 5165 1746 9001 169197 1000000 records: 4897 5180 2341 83838 121901 Analysis: 1) TDB wins on small databases, beating TDB2 by ~15%, NTDB by ~10%. 2) TDB starts to lose when hash chains get 10 long (fetch 10% slower than TDB2/NTDB). 3) TDB does horribly when hash chains get 100 long (fetch 4x slower than NTDB, 5x slower than TDB2, insert about 2-3x slower). 4) TDB2 databases are 40% larger than TDB1. NTDB is about 15% larger than TDB1

2012-06-19 07:13:04 +04:00

/* We need to reallocate the chain. */

TDB2: Goodbye TDB2, Hello NTDB. This renames everything from tdb2 to ntdb: importantly, we no longer use the tdb_ namespace, so you can link against both ntdb and tdb if you want to. This also enables building of standalone ntdb by the autobuild script. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

2012-06-18 17:00:26 +04:00

}

ntdb: remove hash table trees. TDB2 started with a top-level hash of 1024 entries, divided into 128 groups of 8 buckets. When a bucket filled, the 8 bucket group expanded into pointers into 8 new 64-entry hash tables. When these filled, they expanded in turn, etc. It's a nice idea to automatically expand the hash tables, but it doesn't pay off. Remove it for NTDB. 1) It only beats TDB performance when the database is huge and the TDB hashsize is small. We are about 20% slower on medium-size databases (1000 to 10000 records), worse on really small ones. 2) Since we're 64 bits, our hash tables are already twice as expensive as TDB. 3) Since our hash function is good, it means that all groups tend to fill at the same time, meaning the hash enlarges by a factor of 128 all at once, leading to a very large database at that point. 4) Our efficiency would improve if we enlarged the top level, but that makes our minimum db size even worse: it's already over 8k, and jumps to 1M after about 1000 entries! 5) Making the sub group size larger gives a shallower tree, which performs better, but makes the "hash explosion" problem worse. 6) The code is complicated, having to handle delete and reshuffling groups of hash buckets, and expansion of buckets. 7) We have to handle the case where all the records somehow end up with the same hash value, which requires special code to chain records for that case. On the other hand, it would be nice if we didn't degrade as badly as TDB does when the hash chains get long. This patch removes the hash-growing code, but instead of chaining like TDB does when a bucket fills, we point the bucket to an array of record pointers. Since each on-disk NTDB pointer contains some hash bits from the record (we steal the upper 8 bits of the offset), 99.5% of the time we don't need to load the record to determine if it matches. This makes an array of offsets much more cache-friendly than a linked list. Here are the times (in ns) for tdb_store of N records, tdb_store of N records the second time, and a fetch of all N records. I've also included the final database size and the smbtorture local.[n]tdb_speed results. Benchmark details: 1) Compiled with -O2. 2) assert() was disabled in TDB2 and NTDB. 3) The "optimize fetch" patch was applied to NTDB. 10 runs, using tmpfs (otherwise massive swapping as db hits ~30M, despite plenty of RAM). Insert Re-ins Fetch Size dbspeed (nsec) (nsec) (nsec) (Kb) (ops/sec) TDB (10000 hashsize): 100 records: 3882 3320 1609 53 203204 1000 records: 3651 3281 1571 115 218021 10000 records: 3404 3326 1595 880 202874 100000 records: 4317 3825 2097 8262 126811 1000000 records: 11568 11578 9320 77005 25046 TDB2 (1024 hashsize, expandable): 100 records: 3867 3329 1699 17 187100 1000 records: 4040 3249 1639 154 186255 10000 records: 4143 3300 1695 1226 185110 100000 records: 4481 3425 1800 17848 163483 1000000 records: 4055 3534 1878 106386 160774 NTDB (8192 hashsize) 100 records: 4259 3376 1692 82 190852 1000 records: 3640 3275 1566 130 195106 10000 records: 4337 3438 1614 773 188362 100000 records: 4750 5165 1746 9001 169197 1000000 records: 4897 5180 2341 83838 121901 Analysis: 1) TDB wins on small databases, beating TDB2 by ~15%, NTDB by ~10%. 2) TDB starts to lose when hash chains get 10 long (fetch 10% slower than TDB2/NTDB). 3) TDB does horribly when hash chains get 100 long (fetch 4x slower than NTDB, 5x slower than TDB2, insert about 2-3x slower). 4) TDB2 databases are 40% larger than TDB1. NTDB is about 15% larger than TDB1

2012-06-19 07:13:04 +04:00

/* Map both and copy across old buckets. */

TDB2: Goodbye TDB2, Hello NTDB. This renames everything from tdb2 to ntdb: importantly, we no longer use the tdb_ namespace, so you can link against both ntdb and tdb if you want to. This also enables building of standalone ntdb by the autobuild script. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

2012-06-18 17:00:26 +04:00

}

ntdb: remove hash table trees. TDB2 started with a top-level hash of 1024 entries, divided into 128 groups of 8 buckets. When a bucket filled, the 8 bucket group expanded into pointers into 8 new 64-entry hash tables. When these filled, they expanded in turn, etc. It's a nice idea to automatically expand the hash tables, but it doesn't pay off. Remove it for NTDB. 1) It only beats TDB performance when the database is huge and the TDB hashsize is small. We are about 20% slower on medium-size databases (1000 to 10000 records), worse on really small ones. 2) Since we're 64 bits, our hash tables are already twice as expensive as TDB. 3) Since our hash function is good, it means that all groups tend to fill at the same time, meaning the hash enlarges by a factor of 128 all at once, leading to a very large database at that point. 4) Our efficiency would improve if we enlarged the top level, but that makes our minimum db size even worse: it's already over 8k, and jumps to 1M after about 1000 entries! 5) Making the sub group size larger gives a shallower tree, which performs better, but makes the "hash explosion" problem worse. 6) The code is complicated, having to handle delete and reshuffling groups of hash buckets, and expansion of buckets. 7) We have to handle the case where all the records somehow end up with the same hash value, which requires special code to chain records for that case. On the other hand, it would be nice if we didn't degrade as badly as TDB does when the hash chains get long. This patch removes the hash-growing code, but instead of chaining like TDB does when a bucket fills, we point the bucket to an array of record pointers. Since each on-disk NTDB pointer contains some hash bits from the record (we steal the upper 8 bits of the offset), 99.5% of the time we don't need to load the record to determine if it matches. This makes an array of offsets much more cache-friendly than a linked list. Here are the times (in ns) for tdb_store of N records, tdb_store of N records the second time, and a fetch of all N records. I've also included the final database size and the smbtorture local.[n]tdb_speed results. Benchmark details: 1) Compiled with -O2. 2) assert() was disabled in TDB2 and NTDB. 3) The "optimize fetch" patch was applied to NTDB. 10 runs, using tmpfs (otherwise massive swapping as db hits ~30M, despite plenty of RAM). Insert Re-ins Fetch Size dbspeed (nsec) (nsec) (nsec) (Kb) (ops/sec) TDB (10000 hashsize): 100 records: 3882 3320 1609 53 203204 1000 records: 3651 3281 1571 115 218021 10000 records: 3404 3326 1595 880 202874 100000 records: 4317 3825 2097 8262 126811 1000000 records: 11568 11578 9320 77005 25046 TDB2 (1024 hashsize, expandable): 100 records: 3867 3329 1699 17 187100 1000 records: 4040 3249 1639 154 186255 10000 records: 4143 3300 1695 1226 185110 100000 records: 4481 3425 1800 17848 163483 1000000 records: 4055 3534 1878 106386 160774 NTDB (8192 hashsize) 100 records: 4259 3376 1692 82 190852 1000 records: 3640 3275 1566 130 195106 10000 records: 4337 3438 1614 773 188362 100000 records: 4750 5165 1746 9001 169197 1000000 records: 4897 5180 2341 83838 121901 Analysis: 1) TDB wins on small databases, beating TDB2 by ~15%, NTDB by ~10%. 2) TDB starts to lose when hash chains get 10 long (fetch 10% slower than TDB2/NTDB). 3) TDB does horribly when hash chains get 100 long (fetch 4x slower than NTDB, 5x slower than TDB2, insert about 2-3x slower). 4) TDB2 databases are 40% larger than TDB1. NTDB is about 15% larger than TDB1

2012-06-19 07:13:04 +04:00

									new = ntdb_access_write(ntdb, hbucket_off(chain, 0),
							

TDB2: Goodbye TDB2, Hello NTDB. This renames everything from tdb2 to ntdb: importantly, we no longer use the tdb_ namespace, so you can link against both ntdb and tdb if you want to. This also enables building of standalone ntdb by the autobuild script. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

2012-06-18 17:00:26 +04:00

}

ntdb: remove hash table trees. TDB2 started with a top-level hash of 1024 entries, divided into 128 groups of 8 buckets. When a bucket filled, the 8 bucket group expanded into pointers into 8 new 64-entry hash tables. When these filled, they expanded in turn, etc. It's a nice idea to automatically expand the hash tables, but it doesn't pay off. Remove it for NTDB. 1) It only beats TDB performance when the database is huge and the TDB hashsize is small. We are about 20% slower on medium-size databases (1000 to 10000 records), worse on really small ones. 2) Since we're 64 bits, our hash tables are already twice as expensive as TDB. 3) Since our hash function is good, it means that all groups tend to fill at the same time, meaning the hash enlarges by a factor of 128 all at once, leading to a very large database at that point. 4) Our efficiency would improve if we enlarged the top level, but that makes our minimum db size even worse: it's already over 8k, and jumps to 1M after about 1000 entries! 5) Making the sub group size larger gives a shallower tree, which performs better, but makes the "hash explosion" problem worse. 6) The code is complicated, having to handle delete and reshuffling groups of hash buckets, and expansion of buckets. 7) We have to handle the case where all the records somehow end up with the same hash value, which requires special code to chain records for that case. On the other hand, it would be nice if we didn't degrade as badly as TDB does when the hash chains get long. This patch removes the hash-growing code, but instead of chaining like TDB does when a bucket fills, we point the bucket to an array of record pointers. Since each on-disk NTDB pointer contains some hash bits from the record (we steal the upper 8 bits of the offset), 99.5% of the time we don't need to load the record to determine if it matches. This makes an array of offsets much more cache-friendly than a linked list. Here are the times (in ns) for tdb_store of N records, tdb_store of N records the second time, and a fetch of all N records. I've also included the final database size and the smbtorture local.[n]tdb_speed results. Benchmark details: 1) Compiled with -O2. 2) assert() was disabled in TDB2 and NTDB. 3) The "optimize fetch" patch was applied to NTDB. 10 runs, using tmpfs (otherwise massive swapping as db hits ~30M, despite plenty of RAM). Insert Re-ins Fetch Size dbspeed (nsec) (nsec) (nsec) (Kb) (ops/sec) TDB (10000 hashsize): 100 records: 3882 3320 1609 53 203204 1000 records: 3651 3281 1571 115 218021 10000 records: 3404 3326 1595 880 202874 100000 records: 4317 3825 2097 8262 126811 1000000 records: 11568 11578 9320 77005 25046 TDB2 (1024 hashsize, expandable): 100 records: 3867 3329 1699 17 187100 1000 records: 4040 3249 1639 154 186255 10000 records: 4143 3300 1695 1226 185110 100000 records: 4481 3425 1800 17848 163483 1000000 records: 4055 3534 1878 106386 160774 NTDB (8192 hashsize) 100 records: 4259 3376 1692 82 190852 1000 records: 3640 3275 1566 130 195106 10000 records: 4337 3438 1614 773 188362 100000 records: 4750 5165 1746 9001 169197 1000000 records: 4897 5180 2341 83838 121901 Analysis: 1) TDB wins on small databases, beating TDB2 by ~15%, NTDB by ~10%. 2) TDB starts to lose when hash chains get 10 long (fetch 10% slower than TDB2/NTDB). 3) TDB does horribly when hash chains get 100 long (fetch 4x slower than NTDB, 5x slower than TDB2, insert about 2-3x slower). 4) TDB2 databases are 40% larger than TDB1. NTDB is about 15% larger than TDB1

2012-06-19 07:13:04 +04:00

									memcpy(new, old, h->bucket * sizeof(ntdb_off_t));
							

TDB2: Goodbye TDB2, Hello NTDB. This renames everything from tdb2 to ntdb: importantly, we no longer use the tdb_ namespace, so you can link against both ntdb and tdb if you want to. This also enables building of standalone ntdb by the autobuild script. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

2012-06-18 17:00:26 +04:00

ntdb: remove hash table trees. TDB2 started with a top-level hash of 1024 entries, divided into 128 groups of 8 buckets. When a bucket filled, the 8 bucket group expanded into pointers into 8 new 64-entry hash tables. When these filled, they expanded in turn, etc. It's a nice idea to automatically expand the hash tables, but it doesn't pay off. Remove it for NTDB. 1) It only beats TDB performance when the database is huge and the TDB hashsize is small. We are about 20% slower on medium-size databases (1000 to 10000 records), worse on really small ones. 2) Since we're 64 bits, our hash tables are already twice as expensive as TDB. 3) Since our hash function is good, it means that all groups tend to fill at the same time, meaning the hash enlarges by a factor of 128 all at once, leading to a very large database at that point. 4) Our efficiency would improve if we enlarged the top level, but that makes our minimum db size even worse: it's already over 8k, and jumps to 1M after about 1000 entries! 5) Making the sub group size larger gives a shallower tree, which performs better, but makes the "hash explosion" problem worse. 6) The code is complicated, having to handle delete and reshuffling groups of hash buckets, and expansion of buckets. 7) We have to handle the case where all the records somehow end up with the same hash value, which requires special code to chain records for that case. On the other hand, it would be nice if we didn't degrade as badly as TDB does when the hash chains get long. This patch removes the hash-growing code, but instead of chaining like TDB does when a bucket fills, we point the bucket to an array of record pointers. Since each on-disk NTDB pointer contains some hash bits from the record (we steal the upper 8 bits of the offset), 99.5% of the time we don't need to load the record to determine if it matches. This makes an array of offsets much more cache-friendly than a linked list. Here are the times (in ns) for tdb_store of N records, tdb_store of N records the second time, and a fetch of all N records. I've also included the final database size and the smbtorture local.[n]tdb_speed results. Benchmark details: 1) Compiled with -O2. 2) assert() was disabled in TDB2 and NTDB. 3) The "optimize fetch" patch was applied to NTDB. 10 runs, using tmpfs (otherwise massive swapping as db hits ~30M, despite plenty of RAM). Insert Re-ins Fetch Size dbspeed (nsec) (nsec) (nsec) (Kb) (ops/sec) TDB (10000 hashsize): 100 records: 3882 3320 1609 53 203204 1000 records: 3651 3281 1571 115 218021 10000 records: 3404 3326 1595 880 202874 100000 records: 4317 3825 2097 8262 126811 1000000 records: 11568 11578 9320 77005 25046 TDB2 (1024 hashsize, expandable): 100 records: 3867 3329 1699 17 187100 1000 records: 4040 3249 1639 154 186255 10000 records: 4143 3300 1695 1226 185110 100000 records: 4481 3425 1800 17848 163483 1000000 records: 4055 3534 1878 106386 160774 NTDB (8192 hashsize) 100 records: 4259 3376 1692 82 190852 1000 records: 3640 3275 1566 130 195106 10000 records: 4337 3438 1614 773 188362 100000 records: 4750 5165 1746 9001 169197 1000000 records: 4897 5180 2341 83838 121901 Analysis: 1) TDB wins on small databases, beating TDB2 by ~15%, NTDB by ~10%. 2) TDB starts to lose when hash chains get 10 long (fetch 10% slower than TDB2/NTDB). 3) TDB does horribly when hash chains get 100 long (fetch 4x slower than NTDB, 5x slower than TDB2, insert about 2-3x slower). 4) TDB2 databases are 40% larger than TDB1. NTDB is about 15% larger than TDB1

2012-06-19 07:13:04 +04:00

									ecode = ntdb_access_commit(ntdb, new);
							

TDB2: Goodbye TDB2, Hello NTDB. This renames everything from tdb2 to ntdb: importantly, we no longer use the tdb_ namespace, so you can link against both ntdb and tdb if you want to. This also enables building of standalone ntdb by the autobuild script. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

2012-06-18 17:00:26 +04:00

}

ntdb: remove hash table trees. TDB2 started with a top-level hash of 1024 entries, divided into 128 groups of 8 buckets. When a bucket filled, the 8 bucket group expanded into pointers into 8 new 64-entry hash tables. When these filled, they expanded in turn, etc. It's a nice idea to automatically expand the hash tables, but it doesn't pay off. Remove it for NTDB. 1) It only beats TDB performance when the database is huge and the TDB hashsize is small. We are about 20% slower on medium-size databases (1000 to 10000 records), worse on really small ones. 2) Since we're 64 bits, our hash tables are already twice as expensive as TDB. 3) Since our hash function is good, it means that all groups tend to fill at the same time, meaning the hash enlarges by a factor of 128 all at once, leading to a very large database at that point. 4) Our efficiency would improve if we enlarged the top level, but that makes our minimum db size even worse: it's already over 8k, and jumps to 1M after about 1000 entries! 5) Making the sub group size larger gives a shallower tree, which performs better, but makes the "hash explosion" problem worse. 6) The code is complicated, having to handle delete and reshuffling groups of hash buckets, and expansion of buckets. 7) We have to handle the case where all the records somehow end up with the same hash value, which requires special code to chain records for that case. On the other hand, it would be nice if we didn't degrade as badly as TDB does when the hash chains get long. This patch removes the hash-growing code, but instead of chaining like TDB does when a bucket fills, we point the bucket to an array of record pointers. Since each on-disk NTDB pointer contains some hash bits from the record (we steal the upper 8 bits of the offset), 99.5% of the time we don't need to load the record to determine if it matches. This makes an array of offsets much more cache-friendly than a linked list. Here are the times (in ns) for tdb_store of N records, tdb_store of N records the second time, and a fetch of all N records. I've also included the final database size and the smbtorture local.[n]tdb_speed results. Benchmark details: 1) Compiled with -O2. 2) assert() was disabled in TDB2 and NTDB. 3) The "optimize fetch" patch was applied to NTDB. 10 runs, using tmpfs (otherwise massive swapping as db hits ~30M, despite plenty of RAM). Insert Re-ins Fetch Size dbspeed (nsec) (nsec) (nsec) (Kb) (ops/sec) TDB (10000 hashsize): 100 records: 3882 3320 1609 53 203204 1000 records: 3651 3281 1571 115 218021 10000 records: 3404 3326 1595 880 202874 100000 records: 4317 3825 2097 8262 126811 1000000 records: 11568 11578 9320 77005 25046 TDB2 (1024 hashsize, expandable): 100 records: 3867 3329 1699 17 187100 1000 records: 4040 3249 1639 154 186255 10000 records: 4143 3300 1695 1226 185110 100000 records: 4481 3425 1800 17848 163483 1000000 records: 4055 3534 1878 106386 160774 NTDB (8192 hashsize) 100 records: 4259 3376 1692 82 190852 1000 records: 3640 3275 1566 130 195106 10000 records: 4337 3438 1614 773 188362 100000 records: 4750 5165 1746 9001 169197 1000000 records: 4897 5180 2341 83838 121901 Analysis: 1) TDB wins on small databases, beating TDB2 by ~15%, NTDB by ~10%. 2) TDB starts to lose when hash chains get 10 long (fetch 10% slower than TDB2/NTDB). 3) TDB does horribly when hash chains get 100 long (fetch 4x slower than NTDB, 5x slower than TDB2, insert about 2-3x slower). 4) TDB2 databases are 40% larger than TDB1. NTDB is about 15% larger than TDB1

2012-06-19 07:13:04 +04:00

TDB2: Goodbye TDB2, Hello NTDB. This renames everything from tdb2 to ntdb: importantly, we no longer use the tdb_ namespace, so you can link against both ntdb and tdb if you want to. This also enables building of standalone ntdb by the autobuild script. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

2012-06-18 17:00:26 +04:00

}

ntdb: remove hash table trees. TDB2 started with a top-level hash of 1024 entries, divided into 128 groups of 8 buckets. When a bucket filled, the 8 bucket group expanded into pointers into 8 new 64-entry hash tables. When these filled, they expanded in turn, etc. It's a nice idea to automatically expand the hash tables, but it doesn't pay off. Remove it for NTDB. 1) It only beats TDB performance when the database is huge and the TDB hashsize is small. We are about 20% slower on medium-size databases (1000 to 10000 records), worse on really small ones. 2) Since we're 64 bits, our hash tables are already twice as expensive as TDB. 3) Since our hash function is good, it means that all groups tend to fill at the same time, meaning the hash enlarges by a factor of 128 all at once, leading to a very large database at that point. 4) Our efficiency would improve if we enlarged the top level, but that makes our minimum db size even worse: it's already over 8k, and jumps to 1M after about 1000 entries! 5) Making the sub group size larger gives a shallower tree, which performs better, but makes the "hash explosion" problem worse. 6) The code is complicated, having to handle delete and reshuffling groups of hash buckets, and expansion of buckets. 7) We have to handle the case where all the records somehow end up with the same hash value, which requires special code to chain records for that case. On the other hand, it would be nice if we didn't degrade as badly as TDB does when the hash chains get long. This patch removes the hash-growing code, but instead of chaining like TDB does when a bucket fills, we point the bucket to an array of record pointers. Since each on-disk NTDB pointer contains some hash bits from the record (we steal the upper 8 bits of the offset), 99.5% of the time we don't need to load the record to determine if it matches. This makes an array of offsets much more cache-friendly than a linked list. Here are the times (in ns) for tdb_store of N records, tdb_store of N records the second time, and a fetch of all N records. I've also included the final database size and the smbtorture local.[n]tdb_speed results. Benchmark details: 1) Compiled with -O2. 2) assert() was disabled in TDB2 and NTDB. 3) The "optimize fetch" patch was applied to NTDB. 10 runs, using tmpfs (otherwise massive swapping as db hits ~30M, despite plenty of RAM). Insert Re-ins Fetch Size dbspeed (nsec) (nsec) (nsec) (Kb) (ops/sec) TDB (10000 hashsize): 100 records: 3882 3320 1609 53 203204 1000 records: 3651 3281 1571 115 218021 10000 records: 3404 3326 1595 880 202874 100000 records: 4317 3825 2097 8262 126811 1000000 records: 11568 11578 9320 77005 25046 TDB2 (1024 hashsize, expandable): 100 records: 3867 3329 1699 17 187100 1000 records: 4040 3249 1639 154 186255 10000 records: 4143 3300 1695 1226 185110 100000 records: 4481 3425 1800 17848 163483 1000000 records: 4055 3534 1878 106386 160774 NTDB (8192 hashsize) 100 records: 4259 3376 1692 82 190852 1000 records: 3640 3275 1566 130 195106 10000 records: 4337 3438 1614 773 188362 100000 records: 4750 5165 1746 9001 169197 1000000 records: 4897 5180 2341 83838 121901 Analysis: 1) TDB wins on small databases, beating TDB2 by ~15%, NTDB by ~10%. 2) TDB starts to lose when hash chains get 10 long (fetch 10% slower than TDB2/NTDB). 3) TDB does horribly when hash chains get 100 long (fetch 4x slower than NTDB, 5x slower than TDB2, insert about 2-3x slower). 4) TDB2 databases are 40% larger than TDB1. NTDB is about 15% larger than TDB1

2012-06-19 07:13:04 +04:00

/* Traverse support: returns offset of record, or 0 or -ve error. */

TDB2: Goodbye TDB2, Hello NTDB. This renames everything from tdb2 to ntdb: importantly, we no longer use the tdb_ namespace, so you can link against both ntdb and tdb if you want to. This also enables building of standalone ntdb by the autobuild script. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

2012-06-18 17:00:26 +04:00

{

ntdb: remove hash table trees. TDB2 started with a top-level hash of 1024 entries, divided into 128 groups of 8 buckets. When a bucket filled, the 8 bucket group expanded into pointers into 8 new 64-entry hash tables. When these filled, they expanded in turn, etc. It's a nice idea to automatically expand the hash tables, but it doesn't pay off. Remove it for NTDB. 1) It only beats TDB performance when the database is huge and the TDB hashsize is small. We are about 20% slower on medium-size databases (1000 to 10000 records), worse on really small ones. 2) Since we're 64 bits, our hash tables are already twice as expensive as TDB. 3) Since our hash function is good, it means that all groups tend to fill at the same time, meaning the hash enlarges by a factor of 128 all at once, leading to a very large database at that point. 4) Our efficiency would improve if we enlarged the top level, but that makes our minimum db size even worse: it's already over 8k, and jumps to 1M after about 1000 entries! 5) Making the sub group size larger gives a shallower tree, which performs better, but makes the "hash explosion" problem worse. 6) The code is complicated, having to handle delete and reshuffling groups of hash buckets, and expansion of buckets. 7) We have to handle the case where all the records somehow end up with the same hash value, which requires special code to chain records for that case. On the other hand, it would be nice if we didn't degrade as badly as TDB does when the hash chains get long. This patch removes the hash-growing code, but instead of chaining like TDB does when a bucket fills, we point the bucket to an array of record pointers. Since each on-disk NTDB pointer contains some hash bits from the record (we steal the upper 8 bits of the offset), 99.5% of the time we don't need to load the record to determine if it matches. This makes an array of offsets much more cache-friendly than a linked list. Here are the times (in ns) for tdb_store of N records, tdb_store of N records the second time, and a fetch of all N records. I've also included the final database size and the smbtorture local.[n]tdb_speed results. Benchmark details: 1) Compiled with -O2. 2) assert() was disabled in TDB2 and NTDB. 3) The "optimize fetch" patch was applied to NTDB. 10 runs, using tmpfs (otherwise massive swapping as db hits ~30M, despite plenty of RAM). Insert Re-ins Fetch Size dbspeed (nsec) (nsec) (nsec) (Kb) (ops/sec) TDB (10000 hashsize): 100 records: 3882 3320 1609 53 203204 1000 records: 3651 3281 1571 115 218021 10000 records: 3404 3326 1595 880 202874 100000 records: 4317 3825 2097 8262 126811 1000000 records: 11568 11578 9320 77005 25046 TDB2 (1024 hashsize, expandable): 100 records: 3867 3329 1699 17 187100 1000 records: 4040 3249 1639 154 186255 10000 records: 4143 3300 1695 1226 185110 100000 records: 4481 3425 1800 17848 163483 1000000 records: 4055 3534 1878 106386 160774 NTDB (8192 hashsize) 100 records: 4259 3376 1692 82 190852 1000 records: 3640 3275 1566 130 195106 10000 records: 4337 3438 1614 773 188362 100000 records: 4750 5165 1746 9001 169197 1000000 records: 4897 5180 2341 83838 121901 Analysis: 1) TDB wins on small databases, beating TDB2 by ~15%, NTDB by ~10%. 2) TDB starts to lose when hash chains get 10 long (fetch 10% slower than TDB2/NTDB). 3) TDB does horribly when hash chains get 100 long (fetch 4x slower than NTDB, 5x slower than TDB2, insert about 2-3x slower). 4) TDB2 databases are 40% larger than TDB1. NTDB is about 15% larger than TDB1

2012-06-19 07:13:04 +04:00

									ntdb_off_t i;
							

TDB2: Goodbye TDB2, Hello NTDB. This renames everything from tdb2 to ntdb: importantly, we no longer use the tdb_ namespace, so you can link against both ntdb and tdb if you want to. This also enables building of standalone ntdb by the autobuild script. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

2012-06-18 17:00:26 +04:00

ntdb: remove hash table trees. TDB2 started with a top-level hash of 1024 entries, divided into 128 groups of 8 buckets. When a bucket filled, the 8 bucket group expanded into pointers into 8 new 64-entry hash tables. When these filled, they expanded in turn, etc. It's a nice idea to automatically expand the hash tables, but it doesn't pay off. Remove it for NTDB. 1) It only beats TDB performance when the database is huge and the TDB hashsize is small. We are about 20% slower on medium-size databases (1000 to 10000 records), worse on really small ones. 2) Since we're 64 bits, our hash tables are already twice as expensive as TDB. 3) Since our hash function is good, it means that all groups tend to fill at the same time, meaning the hash enlarges by a factor of 128 all at once, leading to a very large database at that point. 4) Our efficiency would improve if we enlarged the top level, but that makes our minimum db size even worse: it's already over 8k, and jumps to 1M after about 1000 entries! 5) Making the sub group size larger gives a shallower tree, which performs better, but makes the "hash explosion" problem worse. 6) The code is complicated, having to handle delete and reshuffling groups of hash buckets, and expansion of buckets. 7) We have to handle the case where all the records somehow end up with the same hash value, which requires special code to chain records for that case. On the other hand, it would be nice if we didn't degrade as badly as TDB does when the hash chains get long. This patch removes the hash-growing code, but instead of chaining like TDB does when a bucket fills, we point the bucket to an array of record pointers. Since each on-disk NTDB pointer contains some hash bits from the record (we steal the upper 8 bits of the offset), 99.5% of the time we don't need to load the record to determine if it matches. This makes an array of offsets much more cache-friendly than a linked list. Here are the times (in ns) for tdb_store of N records, tdb_store of N records the second time, and a fetch of all N records. I've also included the final database size and the smbtorture local.[n]tdb_speed results. Benchmark details: 1) Compiled with -O2. 2) assert() was disabled in TDB2 and NTDB. 3) The "optimize fetch" patch was applied to NTDB. 10 runs, using tmpfs (otherwise massive swapping as db hits ~30M, despite plenty of RAM). Insert Re-ins Fetch Size dbspeed (nsec) (nsec) (nsec) (Kb) (ops/sec) TDB (10000 hashsize): 100 records: 3882 3320 1609 53 203204 1000 records: 3651 3281 1571 115 218021 10000 records: 3404 3326 1595 880 202874 100000 records: 4317 3825 2097 8262 126811 1000000 records: 11568 11578 9320 77005 25046 TDB2 (1024 hashsize, expandable): 100 records: 3867 3329 1699 17 187100 1000 records: 4040 3249 1639 154 186255 10000 records: 4143 3300 1695 1226 185110 100000 records: 4481 3425 1800 17848 163483 1000000 records: 4055 3534 1878 106386 160774 NTDB (8192 hashsize) 100 records: 4259 3376 1692 82 190852 1000 records: 3640 3275 1566 130 195106 10000 records: 4337 3438 1614 773 188362 100000 records: 4750 5165 1746 9001 169197 1000000 records: 4897 5180 2341 83838 121901 Analysis: 1) TDB wins on small databases, beating TDB2 by ~15%, NTDB by ~10%. 2) TDB starts to lose when hash chains get 10 long (fetch 10% slower than TDB2/NTDB). 3) TDB does horribly when hash chains get 100 long (fetch 4x slower than NTDB, 5x slower than TDB2, insert about 2-3x slower). 4) TDB2 databases are 40% larger than TDB1. NTDB is about 15% larger than TDB1

2012-06-19 07:13:04 +04:00

/* First load up chain header. */

TDB2: Goodbye TDB2, Hello NTDB. This renames everything from tdb2 to ntdb: importantly, we no longer use the tdb_ namespace, so you can link against both ntdb and tdb if you want to. This also enables building of standalone ntdb by the autobuild script. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

2012-06-18 17:00:26 +04:00

ntdb: remove hash table trees. TDB2 started with a top-level hash of 1024 entries, divided into 128 groups of 8 buckets. When a bucket filled, the 8 bucket group expanded into pointers into 8 new 64-entry hash tables. When these filled, they expanded in turn, etc. It's a nice idea to automatically expand the hash tables, but it doesn't pay off. Remove it for NTDB. 1) It only beats TDB performance when the database is huge and the TDB hashsize is small. We are about 20% slower on medium-size databases (1000 to 10000 records), worse on really small ones. 2) Since we're 64 bits, our hash tables are already twice as expensive as TDB. 3) Since our hash function is good, it means that all groups tend to fill at the same time, meaning the hash enlarges by a factor of 128 all at once, leading to a very large database at that point. 4) Our efficiency would improve if we enlarged the top level, but that makes our minimum db size even worse: it's already over 8k, and jumps to 1M after about 1000 entries! 5) Making the sub group size larger gives a shallower tree, which performs better, but makes the "hash explosion" problem worse. 6) The code is complicated, having to handle delete and reshuffling groups of hash buckets, and expansion of buckets. 7) We have to handle the case where all the records somehow end up with the same hash value, which requires special code to chain records for that case. On the other hand, it would be nice if we didn't degrade as badly as TDB does when the hash chains get long. This patch removes the hash-growing code, but instead of chaining like TDB does when a bucket fills, we point the bucket to an array of record pointers. Since each on-disk NTDB pointer contains some hash bits from the record (we steal the upper 8 bits of the offset), 99.5% of the time we don't need to load the record to determine if it matches. This makes an array of offsets much more cache-friendly than a linked list. Here are the times (in ns) for tdb_store of N records, tdb_store of N records the second time, and a fetch of all N records. I've also included the final database size and the smbtorture local.[n]tdb_speed results. Benchmark details: 1) Compiled with -O2. 2) assert() was disabled in TDB2 and NTDB. 3) The "optimize fetch" patch was applied to NTDB. 10 runs, using tmpfs (otherwise massive swapping as db hits ~30M, despite plenty of RAM). Insert Re-ins Fetch Size dbspeed (nsec) (nsec) (nsec) (Kb) (ops/sec) TDB (10000 hashsize): 100 records: 3882 3320 1609 53 203204 1000 records: 3651 3281 1571 115 218021 10000 records: 3404 3326 1595 880 202874 100000 records: 4317 3825 2097 8262 126811 1000000 records: 11568 11578 9320 77005 25046 TDB2 (1024 hashsize, expandable): 100 records: 3867 3329 1699 17 187100 1000 records: 4040 3249 1639 154 186255 10000 records: 4143 3300 1695 1226 185110 100000 records: 4481 3425 1800 17848 163483 1000000 records: 4055 3534 1878 106386 160774 NTDB (8192 hashsize) 100 records: 4259 3376 1692 82 190852 1000 records: 3640 3275 1566 130 195106 10000 records: 4337 3438 1614 773 188362 100000 records: 4750 5165 1746 9001 169197 1000000 records: 4897 5180 2341 83838 121901 Analysis: 1) TDB wins on small databases, beating TDB2 by ~15%, NTDB by ~10%. 2) TDB starts to lose when hash chains get 10 long (fetch 10% slower than TDB2/NTDB). 3) TDB does horribly when hash chains get 100 long (fetch 4x slower than NTDB, 5x slower than TDB2, insert about 2-3x slower). 4) TDB2 databases are 40% larger than TDB1. NTDB is about 15% larger than TDB1

2012-06-19 07:13:04 +04:00

									if (rec_magic(&chdr) != NTDB_CHAIN_MAGIC) {
							

TDB2: Goodbye TDB2, Hello NTDB. This renames everything from tdb2 to ntdb: importantly, we no longer use the tdb_ namespace, so you can link against both ntdb and tdb if you want to. This also enables building of standalone ntdb by the autobuild script. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

2012-06-18 17:00:26 +04:00

ntdb: remove hash table trees. TDB2 started with a top-level hash of 1024 entries, divided into 128 groups of 8 buckets. When a bucket filled, the 8 bucket group expanded into pointers into 8 new 64-entry hash tables. When these filled, they expanded in turn, etc. It's a nice idea to automatically expand the hash tables, but it doesn't pay off. Remove it for NTDB. 1) It only beats TDB performance when the database is huge and the TDB hashsize is small. We are about 20% slower on medium-size databases (1000 to 10000 records), worse on really small ones. 2) Since we're 64 bits, our hash tables are already twice as expensive as TDB. 3) Since our hash function is good, it means that all groups tend to fill at the same time, meaning the hash enlarges by a factor of 128 all at once, leading to a very large database at that point. 4) Our efficiency would improve if we enlarged the top level, but that makes our minimum db size even worse: it's already over 8k, and jumps to 1M after about 1000 entries! 5) Making the sub group size larger gives a shallower tree, which performs better, but makes the "hash explosion" problem worse. 6) The code is complicated, having to handle delete and reshuffling groups of hash buckets, and expansion of buckets. 7) We have to handle the case where all the records somehow end up with the same hash value, which requires special code to chain records for that case. On the other hand, it would be nice if we didn't degrade as badly as TDB does when the hash chains get long. This patch removes the hash-growing code, but instead of chaining like TDB does when a bucket fills, we point the bucket to an array of record pointers. Since each on-disk NTDB pointer contains some hash bits from the record (we steal the upper 8 bits of the offset), 99.5% of the time we don't need to load the record to determine if it matches. This makes an array of offsets much more cache-friendly than a linked list. Here are the times (in ns) for tdb_store of N records, tdb_store of N records the second time, and a fetch of all N records. I've also included the final database size and the smbtorture local.[n]tdb_speed results. Benchmark details: 1) Compiled with -O2. 2) assert() was disabled in TDB2 and NTDB. 3) The "optimize fetch" patch was applied to NTDB. 10 runs, using tmpfs (otherwise massive swapping as db hits ~30M, despite plenty of RAM). Insert Re-ins Fetch Size dbspeed (nsec) (nsec) (nsec) (Kb) (ops/sec) TDB (10000 hashsize): 100 records: 3882 3320 1609 53 203204 1000 records: 3651 3281 1571 115 218021 10000 records: 3404 3326 1595 880 202874 100000 records: 4317 3825 2097 8262 126811 1000000 records: 11568 11578 9320 77005 25046 TDB2 (1024 hashsize, expandable): 100 records: 3867 3329 1699 17 187100 1000 records: 4040 3249 1639 154 186255 10000 records: 4143 3300 1695 1226 185110 100000 records: 4481 3425 1800 17848 163483 1000000 records: 4055 3534 1878 106386 160774 NTDB (8192 hashsize) 100 records: 4259 3376 1692 82 190852 1000 records: 3640 3275 1566 130 195106 10000 records: 4337 3438 1614 773 188362 100000 records: 4750 5165 1746 9001 169197 1000000 records: 4897 5180 2341 83838 121901 Analysis: 1) TDB wins on small databases, beating TDB2 by ~15%, NTDB by ~10%. 2) TDB starts to lose when hash chains get 10 long (fetch 10% slower than TDB2/NTDB). 3) TDB does horribly when hash chains get 100 long (fetch 4x slower than NTDB, 5x slower than TDB2, insert about 2-3x slower). 4) TDB2 databases are 40% larger than TDB1. NTDB is about 15% larger than TDB1

2012-06-19 07:13:04 +04:00

/* Chain length is implied by data length. */

TDB2: Goodbye TDB2, Hello NTDB. This renames everything from tdb2 to ntdb: importantly, we no longer use the tdb_ namespace, so you can link against both ntdb and tdb if you want to. This also enables building of standalone ntdb by the autobuild script. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

2012-06-18 17:00:26 +04:00

ntdb: remove hash table trees. TDB2 started with a top-level hash of 1024 entries, divided into 128 groups of 8 buckets. When a bucket filled, the 8 bucket group expanded into pointers into 8 new 64-entry hash tables. When these filled, they expanded in turn, etc. It's a nice idea to automatically expand the hash tables, but it doesn't pay off. Remove it for NTDB. 1) It only beats TDB performance when the database is huge and the TDB hashsize is small. We are about 20% slower on medium-size databases (1000 to 10000 records), worse on really small ones. 2) Since we're 64 bits, our hash tables are already twice as expensive as TDB. 3) Since our hash function is good, it means that all groups tend to fill at the same time, meaning the hash enlarges by a factor of 128 all at once, leading to a very large database at that point. 4) Our efficiency would improve if we enlarged the top level, but that makes our minimum db size even worse: it's already over 8k, and jumps to 1M after about 1000 entries! 5) Making the sub group size larger gives a shallower tree, which performs better, but makes the "hash explosion" problem worse. 6) The code is complicated, having to handle delete and reshuffling groups of hash buckets, and expansion of buckets. 7) We have to handle the case where all the records somehow end up with the same hash value, which requires special code to chain records for that case. On the other hand, it would be nice if we didn't degrade as badly as TDB does when the hash chains get long. This patch removes the hash-growing code, but instead of chaining like TDB does when a bucket fills, we point the bucket to an array of record pointers. Since each on-disk NTDB pointer contains some hash bits from the record (we steal the upper 8 bits of the offset), 99.5% of the time we don't need to load the record to determine if it matches. This makes an array of offsets much more cache-friendly than a linked list. Here are the times (in ns) for tdb_store of N records, tdb_store of N records the second time, and a fetch of all N records. I've also included the final database size and the smbtorture local.[n]tdb_speed results. Benchmark details: 1) Compiled with -O2. 2) assert() was disabled in TDB2 and NTDB. 3) The "optimize fetch" patch was applied to NTDB. 10 runs, using tmpfs (otherwise massive swapping as db hits ~30M, despite plenty of RAM). Insert Re-ins Fetch Size dbspeed (nsec) (nsec) (nsec) (Kb) (ops/sec) TDB (10000 hashsize): 100 records: 3882 3320 1609 53 203204 1000 records: 3651 3281 1571 115 218021 10000 records: 3404 3326 1595 880 202874 100000 records: 4317 3825 2097 8262 126811 1000000 records: 11568 11578 9320 77005 25046 TDB2 (1024 hashsize, expandable): 100 records: 3867 3329 1699 17 187100 1000 records: 4040 3249 1639 154 186255 10000 records: 4143 3300 1695 1226 185110 100000 records: 4481 3425 1800 17848 163483 1000000 records: 4055 3534 1878 106386 160774 NTDB (8192 hashsize) 100 records: 4259 3376 1692 82 190852 1000 records: 3640 3275 1566 130 195106 10000 records: 4337 3438 1614 773 188362 100000 records: 4750 5165 1746 9001 169197 1000000 records: 4897 5180 2341 83838 121901 Analysis: 1) TDB wins on small databases, beating TDB2 by ~15%, NTDB by ~10%. 2) TDB starts to lose when hash chains get 10 long (fetch 10% slower than TDB2/NTDB). 3) TDB does horribly when hash chains get 100 long (fetch 4x slower than NTDB, 5x slower than TDB2, insert about 2-3x slower). 4) TDB2 databases are 40% larger than TDB1. NTDB is about 15% larger than TDB1

2012-06-19 07:13:04 +04:00

									i = ntdb_find_nonzero_off(ntdb, hbucket_off(h->table, 0), h->bucket,
							

TDB2: Goodbye TDB2, Hello NTDB. This renames everything from tdb2 to ntdb: importantly, we no longer use the tdb_ namespace, so you can link against both ntdb and tdb if you want to. This also enables building of standalone ntdb by the autobuild script. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

2012-06-18 17:00:26 +04:00

}

ntdb: remove hash table trees. TDB2 started with a top-level hash of 1024 entries, divided into 128 groups of 8 buckets. When a bucket filled, the 8 bucket group expanded into pointers into 8 new 64-entry hash tables. When these filled, they expanded in turn, etc. It's a nice idea to automatically expand the hash tables, but it doesn't pay off. Remove it for NTDB. 1) It only beats TDB performance when the database is huge and the TDB hashsize is small. We are about 20% slower on medium-size databases (1000 to 10000 records), worse on really small ones. 2) Since we're 64 bits, our hash tables are already twice as expensive as TDB. 3) Since our hash function is good, it means that all groups tend to fill at the same time, meaning the hash enlarges by a factor of 128 all at once, leading to a very large database at that point. 4) Our efficiency would improve if we enlarged the top level, but that makes our minimum db size even worse: it's already over 8k, and jumps to 1M after about 1000 entries! 5) Making the sub group size larger gives a shallower tree, which performs better, but makes the "hash explosion" problem worse. 6) The code is complicated, having to handle delete and reshuffling groups of hash buckets, and expansion of buckets. 7) We have to handle the case where all the records somehow end up with the same hash value, which requires special code to chain records for that case. On the other hand, it would be nice if we didn't degrade as badly as TDB does when the hash chains get long. This patch removes the hash-growing code, but instead of chaining like TDB does when a bucket fills, we point the bucket to an array of record pointers. Since each on-disk NTDB pointer contains some hash bits from the record (we steal the upper 8 bits of the offset), 99.5% of the time we don't need to load the record to determine if it matches. This makes an array of offsets much more cache-friendly than a linked list. Here are the times (in ns) for tdb_store of N records, tdb_store of N records the second time, and a fetch of all N records. I've also included the final database size and the smbtorture local.[n]tdb_speed results. Benchmark details: 1) Compiled with -O2. 2) assert() was disabled in TDB2 and NTDB. 3) The "optimize fetch" patch was applied to NTDB. 10 runs, using tmpfs (otherwise massive swapping as db hits ~30M, despite plenty of RAM). Insert Re-ins Fetch Size dbspeed (nsec) (nsec) (nsec) (Kb) (ops/sec) TDB (10000 hashsize): 100 records: 3882 3320 1609 53 203204 1000 records: 3651 3281 1571 115 218021 10000 records: 3404 3326 1595 880 202874 100000 records: 4317 3825 2097 8262 126811 1000000 records: 11568 11578 9320 77005 25046 TDB2 (1024 hashsize, expandable): 100 records: 3867 3329 1699 17 187100 1000 records: 4040 3249 1639 154 186255 10000 records: 4143 3300 1695 1226 185110 100000 records: 4481 3425 1800 17848 163483 1000000 records: 4055 3534 1878 106386 160774 NTDB (8192 hashsize) 100 records: 4259 3376 1692 82 190852 1000 records: 3640 3275 1566 130 195106 10000 records: 4337 3438 1614 773 188362 100000 records: 4750 5165 1746 9001 169197 1000000 records: 4897 5180 2341 83838 121901 Analysis: 1) TDB wins on small databases, beating TDB2 by ~15%, NTDB by ~10%. 2) TDB starts to lose when hash chains get 10 long (fetch 10% slower than TDB2/NTDB). 3) TDB does horribly when hash chains get 100 long (fetch 4x slower than NTDB, 5x slower than TDB2, insert about 2-3x slower). 4) TDB2 databases are 40% larger than TDB1. NTDB is about 15% larger than TDB1

2012-06-19 07:13:04 +04:00

									if (i != h->table_size) {
							

TDB2: Goodbye TDB2, Hello NTDB. This renames everything from tdb2 to ntdb: importantly, we no longer use the tdb_ namespace, so you can link against both ntdb and tdb if you want to. This also enables building of standalone ntdb by the autobuild script. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

2012-06-18 17:00:26 +04:00

}

ntdb: remove hash table trees. TDB2 started with a top-level hash of 1024 entries, divided into 128 groups of 8 buckets. When a bucket filled, the 8 bucket group expanded into pointers into 8 new 64-entry hash tables. When these filled, they expanded in turn, etc. It's a nice idea to automatically expand the hash tables, but it doesn't pay off. Remove it for NTDB. 1) It only beats TDB performance when the database is huge and the TDB hashsize is small. We are about 20% slower on medium-size databases (1000 to 10000 records), worse on really small ones. 2) Since we're 64 bits, our hash tables are already twice as expensive as TDB. 3) Since our hash function is good, it means that all groups tend to fill at the same time, meaning the hash enlarges by a factor of 128 all at once, leading to a very large database at that point. 4) Our efficiency would improve if we enlarged the top level, but that makes our minimum db size even worse: it's already over 8k, and jumps to 1M after about 1000 entries! 5) Making the sub group size larger gives a shallower tree, which performs better, but makes the "hash explosion" problem worse. 6) The code is complicated, having to handle delete and reshuffling groups of hash buckets, and expansion of buckets. 7) We have to handle the case where all the records somehow end up with the same hash value, which requires special code to chain records for that case. On the other hand, it would be nice if we didn't degrade as badly as TDB does when the hash chains get long. This patch removes the hash-growing code, but instead of chaining like TDB does when a bucket fills, we point the bucket to an array of record pointers. Since each on-disk NTDB pointer contains some hash bits from the record (we steal the upper 8 bits of the offset), 99.5% of the time we don't need to load the record to determine if it matches. This makes an array of offsets much more cache-friendly than a linked list. Here are the times (in ns) for tdb_store of N records, tdb_store of N records the second time, and a fetch of all N records. I've also included the final database size and the smbtorture local.[n]tdb_speed results. Benchmark details: 1) Compiled with -O2. 2) assert() was disabled in TDB2 and NTDB. 3) The "optimize fetch" patch was applied to NTDB. 10 runs, using tmpfs (otherwise massive swapping as db hits ~30M, despite plenty of RAM). Insert Re-ins Fetch Size dbspeed (nsec) (nsec) (nsec) (Kb) (ops/sec) TDB (10000 hashsize): 100 records: 3882 3320 1609 53 203204 1000 records: 3651 3281 1571 115 218021 10000 records: 3404 3326 1595 880 202874 100000 records: 4317 3825 2097 8262 126811 1000000 records: 11568 11578 9320 77005 25046 TDB2 (1024 hashsize, expandable): 100 records: 3867 3329 1699 17 187100 1000 records: 4040 3249 1639 154 186255 10000 records: 4143 3300 1695 1226 185110 100000 records: 4481 3425 1800 17848 163483 1000000 records: 4055 3534 1878 106386 160774 NTDB (8192 hashsize) 100 records: 4259 3376 1692 82 190852 1000 records: 3640 3275 1566 130 195106 10000 records: 4337 3438 1614 773 188362 100000 records: 4750 5165 1746 9001 169197 1000000 records: 4897 5180 2341 83838 121901 Analysis: 1) TDB wins on small databases, beating TDB2 by ~15%, NTDB by ~10%. 2) TDB starts to lose when hash chains get 10 long (fetch 10% slower than TDB2/NTDB). 3) TDB does horribly when hash chains get 100 long (fetch 4x slower than NTDB, 5x slower than TDB2, insert about 2-3x slower). 4) TDB2 databases are 40% larger than TDB1. NTDB is about 15% larger than TDB1

2012-06-19 07:13:04 +04:00

/* Go back up to hash table. */

TDB2: Goodbye TDB2, Hello NTDB. This renames everything from tdb2 to ntdb: importantly, we no longer use the tdb_ namespace, so you can link against both ntdb and tdb if you want to. This also enables building of standalone ntdb by the autobuild script. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

2012-06-18 17:00:26 +04:00

}

ntdb: remove hash table trees. TDB2 started with a top-level hash of 1024 entries, divided into 128 groups of 8 buckets. When a bucket filled, the 8 bucket group expanded into pointers into 8 new 64-entry hash tables. When these filled, they expanded in turn, etc. It's a nice idea to automatically expand the hash tables, but it doesn't pay off. Remove it for NTDB. 1) It only beats TDB performance when the database is huge and the TDB hashsize is small. We are about 20% slower on medium-size databases (1000 to 10000 records), worse on really small ones. 2) Since we're 64 bits, our hash tables are already twice as expensive as TDB. 3) Since our hash function is good, it means that all groups tend to fill at the same time, meaning the hash enlarges by a factor of 128 all at once, leading to a very large database at that point. 4) Our efficiency would improve if we enlarged the top level, but that makes our minimum db size even worse: it's already over 8k, and jumps to 1M after about 1000 entries! 5) Making the sub group size larger gives a shallower tree, which performs better, but makes the "hash explosion" problem worse. 6) The code is complicated, having to handle delete and reshuffling groups of hash buckets, and expansion of buckets. 7) We have to handle the case where all the records somehow end up with the same hash value, which requires special code to chain records for that case. On the other hand, it would be nice if we didn't degrade as badly as TDB does when the hash chains get long. This patch removes the hash-growing code, but instead of chaining like TDB does when a bucket fills, we point the bucket to an array of record pointers. Since each on-disk NTDB pointer contains some hash bits from the record (we steal the upper 8 bits of the offset), 99.5% of the time we don't need to load the record to determine if it matches. This makes an array of offsets much more cache-friendly than a linked list. Here are the times (in ns) for tdb_store of N records, tdb_store of N records the second time, and a fetch of all N records. I've also included the final database size and the smbtorture local.[n]tdb_speed results. Benchmark details: 1) Compiled with -O2. 2) assert() was disabled in TDB2 and NTDB. 3) The "optimize fetch" patch was applied to NTDB. 10 runs, using tmpfs (otherwise massive swapping as db hits ~30M, despite plenty of RAM). Insert Re-ins Fetch Size dbspeed (nsec) (nsec) (nsec) (Kb) (ops/sec) TDB (10000 hashsize): 100 records: 3882 3320 1609 53 203204 1000 records: 3651 3281 1571 115 218021 10000 records: 3404 3326 1595 880 202874 100000 records: 4317 3825 2097 8262 126811 1000000 records: 11568 11578 9320 77005 25046 TDB2 (1024 hashsize, expandable): 100 records: 3867 3329 1699 17 187100 1000 records: 4040 3249 1639 154 186255 10000 records: 4143 3300 1695 1226 185110 100000 records: 4481 3425 1800 17848 163483 1000000 records: 4055 3534 1878 106386 160774 NTDB (8192 hashsize) 100 records: 4259 3376 1692 82 190852 1000 records: 3640 3275 1566 130 195106 10000 records: 4337 3438 1614 773 188362 100000 records: 4750 5165 1746 9001 169197 1000000 records: 4897 5180 2341 83838 121901 Analysis: 1) TDB wins on small databases, beating TDB2 by ~15%, NTDB by ~10%. 2) TDB starts to lose when hash chains get 10 long (fetch 10% slower than TDB2/NTDB). 3) TDB does horribly when hash chains get 100 long (fetch 4x slower than NTDB, 5x slower than TDB2, insert about 2-3x slower). 4) TDB2 databases are 40% larger than TDB1. NTDB is about 15% larger than TDB1

2012-06-19 07:13:04 +04:00

/* Keeps hash locked unless returns 0 or error. */

TDB2: Goodbye TDB2, Hello NTDB. This renames everything from tdb2 to ntdb: importantly, we no longer use the tdb_ namespace, so you can link against both ntdb and tdb if you want to. This also enables building of standalone ntdb by the autobuild script. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

2012-06-18 17:00:26 +04:00

{

ntdb: remove hash table trees. TDB2 started with a top-level hash of 1024 entries, divided into 128 groups of 8 buckets. When a bucket filled, the 8 bucket group expanded into pointers into 8 new 64-entry hash tables. When these filled, they expanded in turn, etc. It's a nice idea to automatically expand the hash tables, but it doesn't pay off. Remove it for NTDB. 1) It only beats TDB performance when the database is huge and the TDB hashsize is small. We are about 20% slower on medium-size databases (1000 to 10000 records), worse on really small ones. 2) Since we're 64 bits, our hash tables are already twice as expensive as TDB. 3) Since our hash function is good, it means that all groups tend to fill at the same time, meaning the hash enlarges by a factor of 128 all at once, leading to a very large database at that point. 4) Our efficiency would improve if we enlarged the top level, but that makes our minimum db size even worse: it's already over 8k, and jumps to 1M after about 1000 entries! 5) Making the sub group size larger gives a shallower tree, which performs better, but makes the "hash explosion" problem worse. 6) The code is complicated, having to handle delete and reshuffling groups of hash buckets, and expansion of buckets. 7) We have to handle the case where all the records somehow end up with the same hash value, which requires special code to chain records for that case. On the other hand, it would be nice if we didn't degrade as badly as TDB does when the hash chains get long. This patch removes the hash-growing code, but instead of chaining like TDB does when a bucket fills, we point the bucket to an array of record pointers. Since each on-disk NTDB pointer contains some hash bits from the record (we steal the upper 8 bits of the offset), 99.5% of the time we don't need to load the record to determine if it matches. This makes an array of offsets much more cache-friendly than a linked list. Here are the times (in ns) for tdb_store of N records, tdb_store of N records the second time, and a fetch of all N records. I've also included the final database size and the smbtorture local.[n]tdb_speed results. Benchmark details: 1) Compiled with -O2. 2) assert() was disabled in TDB2 and NTDB. 3) The "optimize fetch" patch was applied to NTDB. 10 runs, using tmpfs (otherwise massive swapping as db hits ~30M, despite plenty of RAM). Insert Re-ins Fetch Size dbspeed (nsec) (nsec) (nsec) (Kb) (ops/sec) TDB (10000 hashsize): 100 records: 3882 3320 1609 53 203204 1000 records: 3651 3281 1571 115 218021 10000 records: 3404 3326 1595 880 202874 100000 records: 4317 3825 2097 8262 126811 1000000 records: 11568 11578 9320 77005 25046 TDB2 (1024 hashsize, expandable): 100 records: 3867 3329 1699 17 187100 1000 records: 4040 3249 1639 154 186255 10000 records: 4143 3300 1695 1226 185110 100000 records: 4481 3425 1800 17848 163483 1000000 records: 4055 3534 1878 106386 160774 NTDB (8192 hashsize) 100 records: 4259 3376 1692 82 190852 1000 records: 3640 3275 1566 130 195106 10000 records: 4337 3438 1614 773 188362 100000 records: 4750 5165 1746 9001 169197 1000000 records: 4897 5180 2341 83838 121901 Analysis: 1) TDB wins on small databases, beating TDB2 by ~15%, NTDB by ~10%. 2) TDB starts to lose when hash chains get 10 long (fetch 10% slower than TDB2/NTDB). 3) TDB does horribly when hash chains get 100 long (fetch 4x slower than NTDB, 5x slower than TDB2, insert about 2-3x slower). 4) TDB2 databases are 40% larger than TDB1. NTDB is about 15% larger than TDB1

2012-06-19 07:13:04 +04:00

									ntdb_off_t val, i;
							

TDB2: Goodbye TDB2, Hello NTDB. This renames everything from tdb2 to ntdb: importantly, we no longer use the tdb_ namespace, so you can link against both ntdb and tdb if you want to. This also enables building of standalone ntdb by the autobuild script. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

2012-06-18 17:00:26 +04:00

									enum NTDB_ERROR ecode;
							

ntdb: remove hash table trees. TDB2 started with a top-level hash of 1024 entries, divided into 128 groups of 8 buckets. When a bucket filled, the 8 bucket group expanded into pointers into 8 new 64-entry hash tables. When these filled, they expanded in turn, etc. It's a nice idea to automatically expand the hash tables, but it doesn't pay off. Remove it for NTDB. 1) It only beats TDB performance when the database is huge and the TDB hashsize is small. We are about 20% slower on medium-size databases (1000 to 10000 records), worse on really small ones. 2) Since we're 64 bits, our hash tables are already twice as expensive as TDB. 3) Since our hash function is good, it means that all groups tend to fill at the same time, meaning the hash enlarges by a factor of 128 all at once, leading to a very large database at that point. 4) Our efficiency would improve if we enlarged the top level, but that makes our minimum db size even worse: it's already over 8k, and jumps to 1M after about 1000 entries! 5) Making the sub group size larger gives a shallower tree, which performs better, but makes the "hash explosion" problem worse. 6) The code is complicated, having to handle delete and reshuffling groups of hash buckets, and expansion of buckets. 7) We have to handle the case where all the records somehow end up with the same hash value, which requires special code to chain records for that case. On the other hand, it would be nice if we didn't degrade as badly as TDB does when the hash chains get long. This patch removes the hash-growing code, but instead of chaining like TDB does when a bucket fills, we point the bucket to an array of record pointers. Since each on-disk NTDB pointer contains some hash bits from the record (we steal the upper 8 bits of the offset), 99.5% of the time we don't need to load the record to determine if it matches. This makes an array of offsets much more cache-friendly than a linked list. Here are the times (in ns) for tdb_store of N records, tdb_store of N records the second time, and a fetch of all N records. I've also included the final database size and the smbtorture local.[n]tdb_speed results. Benchmark details: 1) Compiled with -O2. 2) assert() was disabled in TDB2 and NTDB. 3) The "optimize fetch" patch was applied to NTDB. 10 runs, using tmpfs (otherwise massive swapping as db hits ~30M, despite plenty of RAM). Insert Re-ins Fetch Size dbspeed (nsec) (nsec) (nsec) (Kb) (ops/sec) TDB (10000 hashsize): 100 records: 3882 3320 1609 53 203204 1000 records: 3651 3281 1571 115 218021 10000 records: 3404 3326 1595 880 202874 100000 records: 4317 3825 2097 8262 126811 1000000 records: 11568 11578 9320 77005 25046 TDB2 (1024 hashsize, expandable): 100 records: 3867 3329 1699 17 187100 1000 records: 4040 3249 1639 154 186255 10000 records: 4143 3300 1695 1226 185110 100000 records: 4481 3425 1800 17848 163483 1000000 records: 4055 3534 1878 106386 160774 NTDB (8192 hashsize) 100 records: 4259 3376 1692 82 190852 1000 records: 3640 3275 1566 130 195106 10000 records: 4337 3438 1614 773 188362 100000 records: 4750 5165 1746 9001 169197 1000000 records: 4897 5180 2341 83838 121901 Analysis: 1) TDB wins on small databases, beating TDB2 by ~15%, NTDB by ~10%. 2) TDB starts to lose when hash chains get 10 long (fetch 10% slower than TDB2/NTDB). 3) TDB does horribly when hash chains get 100 long (fetch 4x slower than NTDB, 5x slower than TDB2, insert about 2-3x slower). 4) TDB2 databases are 40% larger than TDB1. NTDB is about 15% larger than TDB1

2012-06-19 07:13:04 +04:00

									if (h->table != NTDB_HASH_OFFSET) {
							

TDB2: Goodbye TDB2, Hello NTDB. This renames everything from tdb2 to ntdb: importantly, we no longer use the tdb_ namespace, so you can link against both ntdb and tdb if you want to. This also enables building of standalone ntdb by the autobuild script. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

2012-06-18 17:00:26 +04:00

ntdb: remove hash table trees. TDB2 started with a top-level hash of 1024 entries, divided into 128 groups of 8 buckets. When a bucket filled, the 8 bucket group expanded into pointers into 8 new 64-entry hash tables. When these filled, they expanded in turn, etc. It's a nice idea to automatically expand the hash tables, but it doesn't pay off. Remove it for NTDB. 1) It only beats TDB performance when the database is huge and the TDB hashsize is small. We are about 20% slower on medium-size databases (1000 to 10000 records), worse on really small ones. 2) Since we're 64 bits, our hash tables are already twice as expensive as TDB. 3) Since our hash function is good, it means that all groups tend to fill at the same time, meaning the hash enlarges by a factor of 128 all at once, leading to a very large database at that point. 4) Our efficiency would improve if we enlarged the top level, but that makes our minimum db size even worse: it's already over 8k, and jumps to 1M after about 1000 entries! 5) Making the sub group size larger gives a shallower tree, which performs better, but makes the "hash explosion" problem worse. 6) The code is complicated, having to handle delete and reshuffling groups of hash buckets, and expansion of buckets. 7) We have to handle the case where all the records somehow end up with the same hash value, which requires special code to chain records for that case. On the other hand, it would be nice if we didn't degrade as badly as TDB does when the hash chains get long. This patch removes the hash-growing code, but instead of chaining like TDB does when a bucket fills, we point the bucket to an array of record pointers. Since each on-disk NTDB pointer contains some hash bits from the record (we steal the upper 8 bits of the offset), 99.5% of the time we don't need to load the record to determine if it matches. This makes an array of offsets much more cache-friendly than a linked list. Here are the times (in ns) for tdb_store of N records, tdb_store of N records the second time, and a fetch of all N records. I've also included the final database size and the smbtorture local.[n]tdb_speed results. Benchmark details: 1) Compiled with -O2. 2) assert() was disabled in TDB2 and NTDB. 3) The "optimize fetch" patch was applied to NTDB. 10 runs, using tmpfs (otherwise massive swapping as db hits ~30M, despite plenty of RAM). Insert Re-ins Fetch Size dbspeed (nsec) (nsec) (nsec) (Kb) (ops/sec) TDB (10000 hashsize): 100 records: 3882 3320 1609 53 203204 1000 records: 3651 3281 1571 115 218021 10000 records: 3404 3326 1595 880 202874 100000 records: 4317 3825 2097 8262 126811 1000000 records: 11568 11578 9320 77005 25046 TDB2 (1024 hashsize, expandable): 100 records: 3867 3329 1699 17 187100 1000 records: 4040 3249 1639 154 186255 10000 records: 4143 3300 1695 1226 185110 100000 records: 4481 3425 1800 17848 163483 1000000 records: 4055 3534 1878 106386 160774 NTDB (8192 hashsize) 100 records: 4259 3376 1692 82 190852 1000 records: 3640 3275 1566 130 195106 10000 records: 4337 3438 1614 773 188362 100000 records: 4750 5165 1746 9001 169197 1000000 records: 4897 5180 2341 83838 121901 Analysis: 1) TDB wins on small databases, beating TDB2 by ~15%, NTDB by ~10%. 2) TDB starts to lose when hash chains get 10 long (fetch 10% slower than TDB2/NTDB). 3) TDB does horribly when hash chains get 100 long (fetch 4x slower than NTDB, 5x slower than TDB2, insert about 2-3x slower). 4) TDB2 databases are 40% larger than TDB1. NTDB is about 15% larger than TDB1

2012-06-19 07:13:04 +04:00

/* We dropped lock, bucket might have moved! */

TDB2: Goodbye TDB2, Hello NTDB. This renames everything from tdb2 to ntdb: importantly, we no longer use the tdb_ namespace, so you can link against both ntdb and tdb if you want to. This also enables building of standalone ntdb by the autobuild script. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

2012-06-18 17:00:26 +04:00

ntdb: remove hash table trees. TDB2 started with a top-level hash of 1024 entries, divided into 128 groups of 8 buckets. When a bucket filled, the 8 bucket group expanded into pointers into 8 new 64-entry hash tables. When these filled, they expanded in turn, etc. It's a nice idea to automatically expand the hash tables, but it doesn't pay off. Remove it for NTDB. 1) It only beats TDB performance when the database is huge and the TDB hashsize is small. We are about 20% slower on medium-size databases (1000 to 10000 records), worse on really small ones. 2) Since we're 64 bits, our hash tables are already twice as expensive as TDB. 3) Since our hash function is good, it means that all groups tend to fill at the same time, meaning the hash enlarges by a factor of 128 all at once, leading to a very large database at that point. 4) Our efficiency would improve if we enlarged the top level, but that makes our minimum db size even worse: it's already over 8k, and jumps to 1M after about 1000 entries! 5) Making the sub group size larger gives a shallower tree, which performs better, but makes the "hash explosion" problem worse. 6) The code is complicated, having to handle delete and reshuffling groups of hash buckets, and expansion of buckets. 7) We have to handle the case where all the records somehow end up with the same hash value, which requires special code to chain records for that case. On the other hand, it would be nice if we didn't degrade as badly as TDB does when the hash chains get long. This patch removes the hash-growing code, but instead of chaining like TDB does when a bucket fills, we point the bucket to an array of record pointers. Since each on-disk NTDB pointer contains some hash bits from the record (we steal the upper 8 bits of the offset), 99.5% of the time we don't need to load the record to determine if it matches. This makes an array of offsets much more cache-friendly than a linked list. Here are the times (in ns) for tdb_store of N records, tdb_store of N records the second time, and a fetch of all N records. I've also included the final database size and the smbtorture local.[n]tdb_speed results. Benchmark details: 1) Compiled with -O2. 2) assert() was disabled in TDB2 and NTDB. 3) The "optimize fetch" patch was applied to NTDB. 10 runs, using tmpfs (otherwise massive swapping as db hits ~30M, despite plenty of RAM). Insert Re-ins Fetch Size dbspeed (nsec) (nsec) (nsec) (Kb) (ops/sec) TDB (10000 hashsize): 100 records: 3882 3320 1609 53 203204 1000 records: 3651 3281 1571 115 218021 10000 records: 3404 3326 1595 880 202874 100000 records: 4317 3825 2097 8262 126811 1000000 records: 11568 11578 9320 77005 25046 TDB2 (1024 hashsize, expandable): 100 records: 3867 3329 1699 17 187100 1000 records: 4040 3249 1639 154 186255 10000 records: 4143 3300 1695 1226 185110 100000 records: 4481 3425 1800 17848 163483 1000000 records: 4055 3534 1878 106386 160774 NTDB (8192 hashsize) 100 records: 4259 3376 1692 82 190852 1000 records: 3640 3275 1566 130 195106 10000 records: 4337 3438 1614 773 188362 100000 records: 4750 5165 1746 9001 169197 1000000 records: 4897 5180 2341 83838 121901 Analysis: 1) TDB wins on small databases, beating TDB2 by ~15%, NTDB by ~10%. 2) TDB starts to lose when hash chains get 10 long (fetch 10% slower than TDB2/NTDB). 3) TDB does horribly when hash chains get 100 long (fetch 4x slower than NTDB, 5x slower than TDB2, insert about 2-3x slower). 4) TDB2 databases are 40% larger than TDB1. NTDB is about 15% larger than TDB1

2012-06-19 07:13:04 +04:00

/* We don't remove chains: there should still be one there! */

TDB2: Goodbye TDB2, Hello NTDB. This renames everything from tdb2 to ntdb: importantly, we no longer use the tdb_ namespace, so you can link against both ntdb and tdb if you want to. This also enables building of standalone ntdb by the autobuild script. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

2012-06-18 17:00:26 +04:00

}

ntdb: remove hash table trees. TDB2 started with a top-level hash of 1024 entries, divided into 128 groups of 8 buckets. When a bucket filled, the 8 bucket group expanded into pointers into 8 new 64-entry hash tables. When these filled, they expanded in turn, etc. It's a nice idea to automatically expand the hash tables, but it doesn't pay off. Remove it for NTDB. 1) It only beats TDB performance when the database is huge and the TDB hashsize is small. We are about 20% slower on medium-size databases (1000 to 10000 records), worse on really small ones. 2) Since we're 64 bits, our hash tables are already twice as expensive as TDB. 3) Since our hash function is good, it means that all groups tend to fill at the same time, meaning the hash enlarges by a factor of 128 all at once, leading to a very large database at that point. 4) Our efficiency would improve if we enlarged the top level, but that makes our minimum db size even worse: it's already over 8k, and jumps to 1M after about 1000 entries! 5) Making the sub group size larger gives a shallower tree, which performs better, but makes the "hash explosion" problem worse. 6) The code is complicated, having to handle delete and reshuffling groups of hash buckets, and expansion of buckets. 7) We have to handle the case where all the records somehow end up with the same hash value, which requires special code to chain records for that case. On the other hand, it would be nice if we didn't degrade as badly as TDB does when the hash chains get long. This patch removes the hash-growing code, but instead of chaining like TDB does when a bucket fills, we point the bucket to an array of record pointers. Since each on-disk NTDB pointer contains some hash bits from the record (we steal the upper 8 bits of the offset), 99.5% of the time we don't need to load the record to determine if it matches. This makes an array of offsets much more cache-friendly than a linked list. Here are the times (in ns) for tdb_store of N records, tdb_store of N records the second time, and a fetch of all N records. I've also included the final database size and the smbtorture local.[n]tdb_speed results. Benchmark details: 1) Compiled with -O2. 2) assert() was disabled in TDB2 and NTDB. 3) The "optimize fetch" patch was applied to NTDB. 10 runs, using tmpfs (otherwise massive swapping as db hits ~30M, despite plenty of RAM). Insert Re-ins Fetch Size dbspeed (nsec) (nsec) (nsec) (Kb) (ops/sec) TDB (10000 hashsize): 100 records: 3882 3320 1609 53 203204 1000 records: 3651 3281 1571 115 218021 10000 records: 3404 3326 1595 880 202874 100000 records: 4317 3825 2097 8262 126811 1000000 records: 11568 11578 9320 77005 25046 TDB2 (1024 hashsize, expandable): 100 records: 3867 3329 1699 17 187100 1000 records: 4040 3249 1639 154 186255 10000 records: 4143 3300 1695 1226 185110 100000 records: 4481 3425 1800 17848 163483 1000000 records: 4055 3534 1878 106386 160774 NTDB (8192 hashsize) 100 records: 4259 3376 1692 82 190852 1000 records: 3640 3275 1566 130 195106 10000 records: 4337 3438 1614 773 188362 100000 records: 4750 5165 1746 9001 169197 1000000 records: 4897 5180 2341 83838 121901 Analysis: 1) TDB wins on small databases, beating TDB2 by ~15%, NTDB by ~10%. 2) TDB starts to lose when hash chains get 10 long (fetch 10% slower than TDB2/NTDB). 3) TDB does horribly when hash chains get 100 long (fetch 4x slower than NTDB, 5x slower than TDB2, insert about 2-3x slower). 4) TDB2 databases are 40% larger than TDB1. NTDB is about 15% larger than TDB1

2012-06-19 07:13:04 +04:00

/* Find next bucket in the chain. */

TDB2: Goodbye TDB2, Hello NTDB. This renames everything from tdb2 to ntdb: importantly, we no longer use the tdb_ namespace, so you can link against both ntdb and tdb if you want to. This also enables building of standalone ntdb by the autobuild script. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

2012-06-18 17:00:26 +04:00

}

ntdb: remove hash table trees. TDB2 started with a top-level hash of 1024 entries, divided into 128 groups of 8 buckets. When a bucket filled, the 8 bucket group expanded into pointers into 8 new 64-entry hash tables. When these filled, they expanded in turn, etc. It's a nice idea to automatically expand the hash tables, but it doesn't pay off. Remove it for NTDB. 1) It only beats TDB performance when the database is huge and the TDB hashsize is small. We are about 20% slower on medium-size databases (1000 to 10000 records), worse on really small ones. 2) Since we're 64 bits, our hash tables are already twice as expensive as TDB. 3) Since our hash function is good, it means that all groups tend to fill at the same time, meaning the hash enlarges by a factor of 128 all at once, leading to a very large database at that point. 4) Our efficiency would improve if we enlarged the top level, but that makes our minimum db size even worse: it's already over 8k, and jumps to 1M after about 1000 entries! 5) Making the sub group size larger gives a shallower tree, which performs better, but makes the "hash explosion" problem worse. 6) The code is complicated, having to handle delete and reshuffling groups of hash buckets, and expansion of buckets. 7) We have to handle the case where all the records somehow end up with the same hash value, which requires special code to chain records for that case. On the other hand, it would be nice if we didn't degrade as badly as TDB does when the hash chains get long. This patch removes the hash-growing code, but instead of chaining like TDB does when a bucket fills, we point the bucket to an array of record pointers. Since each on-disk NTDB pointer contains some hash bits from the record (we steal the upper 8 bits of the offset), 99.5% of the time we don't need to load the record to determine if it matches. This makes an array of offsets much more cache-friendly than a linked list. Here are the times (in ns) for tdb_store of N records, tdb_store of N records the second time, and a fetch of all N records. I've also included the final database size and the smbtorture local.[n]tdb_speed results. Benchmark details: 1) Compiled with -O2. 2) assert() was disabled in TDB2 and NTDB. 3) The "optimize fetch" patch was applied to NTDB. 10 runs, using tmpfs (otherwise massive swapping as db hits ~30M, despite plenty of RAM). Insert Re-ins Fetch Size dbspeed (nsec) (nsec) (nsec) (Kb) (ops/sec) TDB (10000 hashsize): 100 records: 3882 3320 1609 53 203204 1000 records: 3651 3281 1571 115 218021 10000 records: 3404 3326 1595 880 202874 100000 records: 4317 3825 2097 8262 126811 1000000 records: 11568 11578 9320 77005 25046 TDB2 (1024 hashsize, expandable): 100 records: 3867 3329 1699 17 187100 1000 records: 4040 3249 1639 154 186255 10000 records: 4143 3300 1695 1226 185110 100000 records: 4481 3425 1800 17848 163483 1000000 records: 4055 3534 1878 106386 160774 NTDB (8192 hashsize) 100 records: 4259 3376 1692 82 190852 1000 records: 3640 3275 1566 130 195106 10000 records: 4337 3438 1614 773 188362 100000 records: 4750 5165 1746 9001 169197 1000000 records: 4897 5180 2341 83838 121901 Analysis: 1) TDB wins on small databases, beating TDB2 by ~15%, NTDB by ~10%. 2) TDB starts to lose when hash chains get 10 long (fetch 10% slower than TDB2/NTDB). 3) TDB does horribly when hash chains get 100 long (fetch 4x slower than NTDB, 5x slower than TDB2, insert about 2-3x slower). 4) TDB2 databases are 40% larger than TDB1. NTDB is about 15% larger than TDB1

2012-06-19 07:13:04 +04:00

										if (val != 0) {
							

TDB2: Goodbye TDB2, Hello NTDB. This renames everything from tdb2 to ntdb: importantly, we no longer use the tdb_ namespace, so you can link against both ntdb and tdb if you want to. This also enables building of standalone ntdb by the autobuild script. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

2012-06-18 17:00:26 +04:00

ntdb: remove hash table trees. TDB2 started with a top-level hash of 1024 entries, divided into 128 groups of 8 buckets. When a bucket filled, the 8 bucket group expanded into pointers into 8 new 64-entry hash tables. When these filled, they expanded in turn, etc. It's a nice idea to automatically expand the hash tables, but it doesn't pay off. Remove it for NTDB. 1) It only beats TDB performance when the database is huge and the TDB hashsize is small. We are about 20% slower on medium-size databases (1000 to 10000 records), worse on really small ones. 2) Since we're 64 bits, our hash tables are already twice as expensive as TDB. 3) Since our hash function is good, it means that all groups tend to fill at the same time, meaning the hash enlarges by a factor of 128 all at once, leading to a very large database at that point. 4) Our efficiency would improve if we enlarged the top level, but that makes our minimum db size even worse: it's already over 8k, and jumps to 1M after about 1000 entries! 5) Making the sub group size larger gives a shallower tree, which performs better, but makes the "hash explosion" problem worse. 6) The code is complicated, having to handle delete and reshuffling groups of hash buckets, and expansion of buckets. 7) We have to handle the case where all the records somehow end up with the same hash value, which requires special code to chain records for that case. On the other hand, it would be nice if we didn't degrade as badly as TDB does when the hash chains get long. This patch removes the hash-growing code, but instead of chaining like TDB does when a bucket fills, we point the bucket to an array of record pointers. Since each on-disk NTDB pointer contains some hash bits from the record (we steal the upper 8 bits of the offset), 99.5% of the time we don't need to load the record to determine if it matches. This makes an array of offsets much more cache-friendly than a linked list. Here are the times (in ns) for tdb_store of N records, tdb_store of N records the second time, and a fetch of all N records. I've also included the final database size and the smbtorture local.[n]tdb_speed results. Benchmark details: 1) Compiled with -O2. 2) assert() was disabled in TDB2 and NTDB. 3) The "optimize fetch" patch was applied to NTDB. 10 runs, using tmpfs (otherwise massive swapping as db hits ~30M, despite plenty of RAM). Insert Re-ins Fetch Size dbspeed (nsec) (nsec) (nsec) (Kb) (ops/sec) TDB (10000 hashsize): 100 records: 3882 3320 1609 53 203204 1000 records: 3651 3281 1571 115 218021 10000 records: 3404 3326 1595 880 202874 100000 records: 4317 3825 2097 8262 126811 1000000 records: 11568 11578 9320 77005 25046 TDB2 (1024 hashsize, expandable): 100 records: 3867 3329 1699 17 187100 1000 records: 4040 3249 1639 154 186255 10000 records: 4143 3300 1695 1226 185110 100000 records: 4481 3425 1800 17848 163483 1000000 records: 4055 3534 1878 106386 160774 NTDB (8192 hashsize) 100 records: 4259 3376 1692 82 190852 1000 records: 3640 3275 1566 130 195106 10000 records: 4337 3438 1614 773 188362 100000 records: 4750 5165 1746 9001 169197 1000000 records: 4897 5180 2341 83838 121901 Analysis: 1) TDB wins on small databases, beating TDB2 by ~15%, NTDB by ~10%. 2) TDB starts to lose when hash chains get 10 long (fetch 10% slower than TDB2/NTDB). 3) TDB does horribly when hash chains get 100 long (fetch 4x slower than NTDB, 5x slower than TDB2, insert about 2-3x slower). 4) TDB2 databases are 40% larger than TDB1. NTDB is about 15% larger than TDB1

2012-06-19 07:13:04 +04:00

/* OK, we've reset h back to top level. */

TDB2: Goodbye TDB2, Hello NTDB. This renames everything from tdb2 to ntdb: importantly, we no longer use the tdb_ namespace, so you can link against both ntdb and tdb if you want to. This also enables building of standalone ntdb by the autobuild script. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

2012-06-18 17:00:26 +04:00

}

ntdb: remove hash table trees. TDB2 started with a top-level hash of 1024 entries, divided into 128 groups of 8 buckets. When a bucket filled, the 8 bucket group expanded into pointers into 8 new 64-entry hash tables. When these filled, they expanded in turn, etc. It's a nice idea to automatically expand the hash tables, but it doesn't pay off. Remove it for NTDB. 1) It only beats TDB performance when the database is huge and the TDB hashsize is small. We are about 20% slower on medium-size databases (1000 to 10000 records), worse on really small ones. 2) Since we're 64 bits, our hash tables are already twice as expensive as TDB. 3) Since our hash function is good, it means that all groups tend to fill at the same time, meaning the hash enlarges by a factor of 128 all at once, leading to a very large database at that point. 4) Our efficiency would improve if we enlarged the top level, but that makes our minimum db size even worse: it's already over 8k, and jumps to 1M after about 1000 entries! 5) Making the sub group size larger gives a shallower tree, which performs better, but makes the "hash explosion" problem worse. 6) The code is complicated, having to handle delete and reshuffling groups of hash buckets, and expansion of buckets. 7) We have to handle the case where all the records somehow end up with the same hash value, which requires special code to chain records for that case. On the other hand, it would be nice if we didn't degrade as badly as TDB does when the hash chains get long. This patch removes the hash-growing code, but instead of chaining like TDB does when a bucket fills, we point the bucket to an array of record pointers. Since each on-disk NTDB pointer contains some hash bits from the record (we steal the upper 8 bits of the offset), 99.5% of the time we don't need to load the record to determine if it matches. This makes an array of offsets much more cache-friendly than a linked list. Here are the times (in ns) for tdb_store of N records, tdb_store of N records the second time, and a fetch of all N records. I've also included the final database size and the smbtorture local.[n]tdb_speed results. Benchmark details: 1) Compiled with -O2. 2) assert() was disabled in TDB2 and NTDB. 3) The "optimize fetch" patch was applied to NTDB. 10 runs, using tmpfs (otherwise massive swapping as db hits ~30M, despite plenty of RAM). Insert Re-ins Fetch Size dbspeed (nsec) (nsec) (nsec) (Kb) (ops/sec) TDB (10000 hashsize): 100 records: 3882 3320 1609 53 203204 1000 records: 3651 3281 1571 115 218021 10000 records: 3404 3326 1595 880 202874 100000 records: 4317 3825 2097 8262 126811 1000000 records: 11568 11578 9320 77005 25046 TDB2 (1024 hashsize, expandable): 100 records: 3867 3329 1699 17 187100 1000 records: 4040 3249 1639 154 186255 10000 records: 4143 3300 1695 1226 185110 100000 records: 4481 3425 1800 17848 163483 1000000 records: 4055 3534 1878 106386 160774 NTDB (8192 hashsize) 100 records: 4259 3376 1692 82 190852 1000 records: 3640 3275 1566 130 195106 10000 records: 4337 3438 1614 773 188362 100000 records: 4750 5165 1746 9001 169197 1000000 records: 4897 5180 2341 83838 121901 Analysis: 1) TDB wins on small databases, beating TDB2 by ~15%, NTDB by ~10%. 2) TDB starts to lose when hash chains get 10 long (fetch 10% slower than TDB2/NTDB). 3) TDB does horribly when hash chains get 100 long (fetch 4x slower than NTDB, 5x slower than TDB2, insert about 2-3x slower). 4) TDB2 databases are 40% larger than TDB1. NTDB is about 15% larger than TDB1

2012-06-19 07:13:04 +04:00

/* We do this unlocked, then re-check. */

TDB2: Goodbye TDB2, Hello NTDB. This renames everything from tdb2 to ntdb: importantly, we no longer use the tdb_ namespace, so you can link against both ntdb and tdb if you want to. This also enables building of standalone ntdb by the autobuild script. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

2012-06-18 17:00:26 +04:00

}

ntdb: remove hash table trees. TDB2 started with a top-level hash of 1024 entries, divided into 128 groups of 8 buckets. When a bucket filled, the 8 bucket group expanded into pointers into 8 new 64-entry hash tables. When these filled, they expanded in turn, etc. It's a nice idea to automatically expand the hash tables, but it doesn't pay off. Remove it for NTDB. 1) It only beats TDB performance when the database is huge and the TDB hashsize is small. We are about 20% slower on medium-size databases (1000 to 10000 records), worse on really small ones. 2) Since we're 64 bits, our hash tables are already twice as expensive as TDB. 3) Since our hash function is good, it means that all groups tend to fill at the same time, meaning the hash enlarges by a factor of 128 all at once, leading to a very large database at that point. 4) Our efficiency would improve if we enlarged the top level, but that makes our minimum db size even worse: it's already over 8k, and jumps to 1M after about 1000 entries! 5) Making the sub group size larger gives a shallower tree, which performs better, but makes the "hash explosion" problem worse. 6) The code is complicated, having to handle delete and reshuffling groups of hash buckets, and expansion of buckets. 7) We have to handle the case where all the records somehow end up with the same hash value, which requires special code to chain records for that case. On the other hand, it would be nice if we didn't degrade as badly as TDB does when the hash chains get long. This patch removes the hash-growing code, but instead of chaining like TDB does when a bucket fills, we point the bucket to an array of record pointers. Since each on-disk NTDB pointer contains some hash bits from the record (we steal the upper 8 bits of the offset), 99.5% of the time we don't need to load the record to determine if it matches. This makes an array of offsets much more cache-friendly than a linked list. Here are the times (in ns) for tdb_store of N records, tdb_store of N records the second time, and a fetch of all N records. I've also included the final database size and the smbtorture local.[n]tdb_speed results. Benchmark details: 1) Compiled with -O2. 2) assert() was disabled in TDB2 and NTDB. 3) The "optimize fetch" patch was applied to NTDB. 10 runs, using tmpfs (otherwise massive swapping as db hits ~30M, despite plenty of RAM). Insert Re-ins Fetch Size dbspeed (nsec) (nsec) (nsec) (Kb) (ops/sec) TDB (10000 hashsize): 100 records: 3882 3320 1609 53 203204 1000 records: 3651 3281 1571 115 218021 10000 records: 3404 3326 1595 880 202874 100000 records: 4317 3825 2097 8262 126811 1000000 records: 11568 11578 9320 77005 25046 TDB2 (1024 hashsize, expandable): 100 records: 3867 3329 1699 17 187100 1000 records: 4040 3249 1639 154 186255 10000 records: 4143 3300 1695 1226 185110 100000 records: 4481 3425 1800 17848 163483 1000000 records: 4055 3534 1878 106386 160774 NTDB (8192 hashsize) 100 records: 4259 3376 1692 82 190852 1000 records: 3640 3275 1566 130 195106 10000 records: 4337 3438 1614 773 188362 100000 records: 4750 5165 1746 9001 169197 1000000 records: 4897 5180 2341 83838 121901 Analysis: 1) TDB wins on small databases, beating TDB2 by ~15%, NTDB by ~10%. 2) TDB starts to lose when hash chains get 10 long (fetch 10% slower than TDB2/NTDB). 3) TDB does horribly when hash chains get 100 long (fetch 4x slower than NTDB, 5x slower than TDB2, insert about 2-3x slower). 4) TDB2 databases are 40% larger than TDB1. NTDB is about 15% larger than TDB1

2012-06-19 07:13:04 +04:00

										val = ntdb_read_off(ntdb, hbucket_off(h->table, i));
							

TDB2: Goodbye TDB2, Hello NTDB. This renames everything from tdb2 to ntdb: importantly, we no longer use the tdb_ namespace, so you can link against both ntdb and tdb if you want to. This also enables building of standalone ntdb by the autobuild script. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

2012-06-18 17:00:26 +04:00

										if (NTDB_OFF_IS_ERR(val)) {
							

ntdb: remove hash table trees. TDB2 started with a top-level hash of 1024 entries, divided into 128 groups of 8 buckets. When a bucket filled, the 8 bucket group expanded into pointers into 8 new 64-entry hash tables. When these filled, they expanded in turn, etc. It's a nice idea to automatically expand the hash tables, but it doesn't pay off. Remove it for NTDB. 1) It only beats TDB performance when the database is huge and the TDB hashsize is small. We are about 20% slower on medium-size databases (1000 to 10000 records), worse on really small ones. 2) Since we're 64 bits, our hash tables are already twice as expensive as TDB. 3) Since our hash function is good, it means that all groups tend to fill at the same time, meaning the hash enlarges by a factor of 128 all at once, leading to a very large database at that point. 4) Our efficiency would improve if we enlarged the top level, but that makes our minimum db size even worse: it's already over 8k, and jumps to 1M after about 1000 entries! 5) Making the sub group size larger gives a shallower tree, which performs better, but makes the "hash explosion" problem worse. 6) The code is complicated, having to handle delete and reshuffling groups of hash buckets, and expansion of buckets. 7) We have to handle the case where all the records somehow end up with the same hash value, which requires special code to chain records for that case. On the other hand, it would be nice if we didn't degrade as badly as TDB does when the hash chains get long. This patch removes the hash-growing code, but instead of chaining like TDB does when a bucket fills, we point the bucket to an array of record pointers. Since each on-disk NTDB pointer contains some hash bits from the record (we steal the upper 8 bits of the offset), 99.5% of the time we don't need to load the record to determine if it matches. This makes an array of offsets much more cache-friendly than a linked list. Here are the times (in ns) for tdb_store of N records, tdb_store of N records the second time, and a fetch of all N records. I've also included the final database size and the smbtorture local.[n]tdb_speed results. Benchmark details: 1) Compiled with -O2. 2) assert() was disabled in TDB2 and NTDB. 3) The "optimize fetch" patch was applied to NTDB. 10 runs, using tmpfs (otherwise massive swapping as db hits ~30M, despite plenty of RAM). Insert Re-ins Fetch Size dbspeed (nsec) (nsec) (nsec) (Kb) (ops/sec) TDB (10000 hashsize): 100 records: 3882 3320 1609 53 203204 1000 records: 3651 3281 1571 115 218021 10000 records: 3404 3326 1595 880 202874 100000 records: 4317 3825 2097 8262 126811 1000000 records: 11568 11578 9320 77005 25046 TDB2 (1024 hashsize, expandable): 100 records: 3867 3329 1699 17 187100 1000 records: 4040 3249 1639 154 186255 10000 records: 4143 3300 1695 1226 185110 100000 records: 4481 3425 1800 17848 163483 1000000 records: 4055 3534 1878 106386 160774 NTDB (8192 hashsize) 100 records: 4259 3376 1692 82 190852 1000 records: 3640 3275 1566 130 195106 10000 records: 4337 3438 1614 773 188362 100000 records: 4750 5165 1746 9001 169197 1000000 records: 4897 5180 2341 83838 121901 Analysis: 1) TDB wins on small databases, beating TDB2 by ~15%, NTDB by ~10%. 2) TDB starts to lose when hash chains get 10 long (fetch 10% slower than TDB2/NTDB). 3) TDB does horribly when hash chains get 100 long (fetch 4x slower than NTDB, 5x slower than TDB2, insert about 2-3x slower). 4) TDB2 databases are 40% larger than TDB1. NTDB is about 15% larger than TDB1

2012-06-19 07:13:04 +04:00

											goto unlock;
							

TDB2: Goodbye TDB2, Hello NTDB. This renames everything from tdb2 to ntdb: importantly, we no longer use the tdb_ namespace, so you can link against both ntdb and tdb if you want to. This also enables building of standalone ntdb by the autobuild script. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

2012-06-18 17:00:26 +04:00

}

ntdb: remove hash table trees. TDB2 started with a top-level hash of 1024 entries, divided into 128 groups of 8 buckets. When a bucket filled, the 8 bucket group expanded into pointers into 8 new 64-entry hash tables. When these filled, they expanded in turn, etc. It's a nice idea to automatically expand the hash tables, but it doesn't pay off. Remove it for NTDB. 1) It only beats TDB performance when the database is huge and the TDB hashsize is small. We are about 20% slower on medium-size databases (1000 to 10000 records), worse on really small ones. 2) Since we're 64 bits, our hash tables are already twice as expensive as TDB. 3) Since our hash function is good, it means that all groups tend to fill at the same time, meaning the hash enlarges by a factor of 128 all at once, leading to a very large database at that point. 4) Our efficiency would improve if we enlarged the top level, but that makes our minimum db size even worse: it's already over 8k, and jumps to 1M after about 1000 entries! 5) Making the sub group size larger gives a shallower tree, which performs better, but makes the "hash explosion" problem worse. 6) The code is complicated, having to handle delete and reshuffling groups of hash buckets, and expansion of buckets. 7) We have to handle the case where all the records somehow end up with the same hash value, which requires special code to chain records for that case. On the other hand, it would be nice if we didn't degrade as badly as TDB does when the hash chains get long. This patch removes the hash-growing code, but instead of chaining like TDB does when a bucket fills, we point the bucket to an array of record pointers. Since each on-disk NTDB pointer contains some hash bits from the record (we steal the upper 8 bits of the offset), 99.5% of the time we don't need to load the record to determine if it matches. This makes an array of offsets much more cache-friendly than a linked list. Here are the times (in ns) for tdb_store of N records, tdb_store of N records the second time, and a fetch of all N records. I've also included the final database size and the smbtorture local.[n]tdb_speed results. Benchmark details: 1) Compiled with -O2. 2) assert() was disabled in TDB2 and NTDB. 3) The "optimize fetch" patch was applied to NTDB. 10 runs, using tmpfs (otherwise massive swapping as db hits ~30M, despite plenty of RAM). Insert Re-ins Fetch Size dbspeed (nsec) (nsec) (nsec) (Kb) (ops/sec) TDB (10000 hashsize): 100 records: 3882 3320 1609 53 203204 1000 records: 3651 3281 1571 115 218021 10000 records: 3404 3326 1595 880 202874 100000 records: 4317 3825 2097 8262 126811 1000000 records: 11568 11578 9320 77005 25046 TDB2 (1024 hashsize, expandable): 100 records: 3867 3329 1699 17 187100 1000 records: 4040 3249 1639 154 186255 10000 records: 4143 3300 1695 1226 185110 100000 records: 4481 3425 1800 17848 163483 1000000 records: 4055 3534 1878 106386 160774 NTDB (8192 hashsize) 100 records: 4259 3376 1692 82 190852 1000 records: 3640 3275 1566 130 195106 10000 records: 4337 3438 1614 773 188362 100000 records: 4750 5165 1746 9001 169197 1000000 records: 4897 5180 2341 83838 121901 Analysis: 1) TDB wins on small databases, beating TDB2 by ~15%, NTDB by ~10%. 2) TDB starts to lose when hash chains get 10 long (fetch 10% slower than TDB2/NTDB). 3) TDB does horribly when hash chains get 100 long (fetch 4x slower than NTDB, 5x slower than TDB2, insert about 2-3x slower). 4) TDB2 databases are 40% larger than TDB1. NTDB is about 15% larger than TDB1

2012-06-19 07:13:04 +04:00

/* Lost race, and it's empty? */

TDB2: Goodbye TDB2, Hello NTDB. This renames everything from tdb2 to ntdb: importantly, we no longer use the tdb_ namespace, so you can link against both ntdb and tdb if you want to. This also enables building of standalone ntdb by the autobuild script. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

2012-06-18 17:00:26 +04:00

continue;

ntdb: remove hash table trees. TDB2 started with a top-level hash of 1024 entries, divided into 128 groups of 8 buckets. When a bucket filled, the 8 bucket group expanded into pointers into 8 new 64-entry hash tables. When these filled, they expanded in turn, etc. It's a nice idea to automatically expand the hash tables, but it doesn't pay off. Remove it for NTDB. 1) It only beats TDB performance when the database is huge and the TDB hashsize is small. We are about 20% slower on medium-size databases (1000 to 10000 records), worse on really small ones. 2) Since we're 64 bits, our hash tables are already twice as expensive as TDB. 3) Since our hash function is good, it means that all groups tend to fill at the same time, meaning the hash enlarges by a factor of 128 all at once, leading to a very large database at that point. 4) Our efficiency would improve if we enlarged the top level, but that makes our minimum db size even worse: it's already over 8k, and jumps to 1M after about 1000 entries! 5) Making the sub group size larger gives a shallower tree, which performs better, but makes the "hash explosion" problem worse. 6) The code is complicated, having to handle delete and reshuffling groups of hash buckets, and expansion of buckets. 7) We have to handle the case where all the records somehow end up with the same hash value, which requires special code to chain records for that case. On the other hand, it would be nice if we didn't degrade as badly as TDB does when the hash chains get long. This patch removes the hash-growing code, but instead of chaining like TDB does when a bucket fills, we point the bucket to an array of record pointers. Since each on-disk NTDB pointer contains some hash bits from the record (we steal the upper 8 bits of the offset), 99.5% of the time we don't need to load the record to determine if it matches. This makes an array of offsets much more cache-friendly than a linked list. Here are the times (in ns) for tdb_store of N records, tdb_store of N records the second time, and a fetch of all N records. I've also included the final database size and the smbtorture local.[n]tdb_speed results. Benchmark details: 1) Compiled with -O2. 2) assert() was disabled in TDB2 and NTDB. 3) The "optimize fetch" patch was applied to NTDB. 10 runs, using tmpfs (otherwise massive swapping as db hits ~30M, despite plenty of RAM). Insert Re-ins Fetch Size dbspeed (nsec) (nsec) (nsec) (Kb) (ops/sec) TDB (10000 hashsize): 100 records: 3882 3320 1609 53 203204 1000 records: 3651 3281 1571 115 218021 10000 records: 3404 3326 1595 880 202874 100000 records: 4317 3825 2097 8262 126811 1000000 records: 11568 11578 9320 77005 25046 TDB2 (1024 hashsize, expandable): 100 records: 3867 3329 1699 17 187100 1000 records: 4040 3249 1639 154 186255 10000 records: 4143 3300 1695 1226 185110 100000 records: 4481 3425 1800 17848 163483 1000000 records: 4055 3534 1878 106386 160774 NTDB (8192 hashsize) 100 records: 4259 3376 1692 82 190852 1000 records: 3640 3275 1566 130 195106 10000 records: 4337 3438 1614 773 188362 100000 records: 4750 5165 1746 9001 169197 1000000 records: 4897 5180 2341 83838 121901 Analysis: 1) TDB wins on small databases, beating TDB2 by ~15%, NTDB by ~10%. 2) TDB starts to lose when hash chains get 10 long (fetch 10% slower than TDB2/NTDB). 3) TDB does horribly when hash chains get 100 long (fetch 4x slower than NTDB, 5x slower than TDB2, insert about 2-3x slower). 4) TDB2 databases are 40% larger than TDB1. NTDB is about 15% larger than TDB1

2012-06-19 07:13:04 +04:00

}

TDB2: Goodbye TDB2, Hello NTDB. This renames everything from tdb2 to ntdb: importantly, we no longer use the tdb_ namespace, so you can link against both ntdb and tdb if you want to. This also enables building of standalone ntdb by the autobuild script. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

2012-06-18 17:00:26 +04:00

ntdb: remove hash table trees. TDB2 started with a top-level hash of 1024 entries, divided into 128 groups of 8 buckets. When a bucket filled, the 8 bucket group expanded into pointers into 8 new 64-entry hash tables. When these filled, they expanded in turn, etc. It's a nice idea to automatically expand the hash tables, but it doesn't pay off. Remove it for NTDB. 1) It only beats TDB performance when the database is huge and the TDB hashsize is small. We are about 20% slower on medium-size databases (1000 to 10000 records), worse on really small ones. 2) Since we're 64 bits, our hash tables are already twice as expensive as TDB. 3) Since our hash function is good, it means that all groups tend to fill at the same time, meaning the hash enlarges by a factor of 128 all at once, leading to a very large database at that point. 4) Our efficiency would improve if we enlarged the top level, but that makes our minimum db size even worse: it's already over 8k, and jumps to 1M after about 1000 entries! 5) Making the sub group size larger gives a shallower tree, which performs better, but makes the "hash explosion" problem worse. 6) The code is complicated, having to handle delete and reshuffling groups of hash buckets, and expansion of buckets. 7) We have to handle the case where all the records somehow end up with the same hash value, which requires special code to chain records for that case. On the other hand, it would be nice if we didn't degrade as badly as TDB does when the hash chains get long. This patch removes the hash-growing code, but instead of chaining like TDB does when a bucket fills, we point the bucket to an array of record pointers. Since each on-disk NTDB pointer contains some hash bits from the record (we steal the upper 8 bits of the offset), 99.5% of the time we don't need to load the record to determine if it matches. This makes an array of offsets much more cache-friendly than a linked list. Here are the times (in ns) for tdb_store of N records, tdb_store of N records the second time, and a fetch of all N records. I've also included the final database size and the smbtorture local.[n]tdb_speed results. Benchmark details: 1) Compiled with -O2. 2) assert() was disabled in TDB2 and NTDB. 3) The "optimize fetch" patch was applied to NTDB. 10 runs, using tmpfs (otherwise massive swapping as db hits ~30M, despite plenty of RAM). Insert Re-ins Fetch Size dbspeed (nsec) (nsec) (nsec) (Kb) (ops/sec) TDB (10000 hashsize): 100 records: 3882 3320 1609 53 203204 1000 records: 3651 3281 1571 115 218021 10000 records: 3404 3326 1595 880 202874 100000 records: 4317 3825 2097 8262 126811 1000000 records: 11568 11578 9320 77005 25046 TDB2 (1024 hashsize, expandable): 100 records: 3867 3329 1699 17 187100 1000 records: 4040 3249 1639 154 186255 10000 records: 4143 3300 1695 1226 185110 100000 records: 4481 3425 1800 17848 163483 1000000 records: 4055 3534 1878 106386 160774 NTDB (8192 hashsize) 100 records: 4259 3376 1692 82 190852 1000 records: 3640 3275 1566 130 195106 10000 records: 4337 3438 1614 773 188362 100000 records: 4750 5165 1746 9001 169197 1000000 records: 4897 5180 2341 83838 121901 Analysis: 1) TDB wins on small databases, beating TDB2 by ~15%, NTDB by ~10%. 2) TDB starts to lose when hash chains get 10 long (fetch 10% slower than TDB2/NTDB). 3) TDB does horribly when hash chains get 100 long (fetch 4x slower than NTDB, 5x slower than TDB2, insert about 2-3x slower). 4) TDB2 databases are 40% larger than TDB1. NTDB is about 15% larger than TDB1

2012-06-19 07:13:04 +04:00

										if (!is_chain(val)) {
							

TDB2: Goodbye TDB2, Hello NTDB. This renames everything from tdb2 to ntdb: importantly, we no longer use the tdb_ namespace, so you can link against both ntdb and tdb if you want to. This also enables building of standalone ntdb by the autobuild script. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

2012-06-18 17:00:26 +04:00

}

ntdb: remove hash table trees. TDB2 started with a top-level hash of 1024 entries, divided into 128 groups of 8 buckets. When a bucket filled, the 8 bucket group expanded into pointers into 8 new 64-entry hash tables. When these filled, they expanded in turn, etc. It's a nice idea to automatically expand the hash tables, but it doesn't pay off. Remove it for NTDB. 1) It only beats TDB performance when the database is huge and the TDB hashsize is small. We are about 20% slower on medium-size databases (1000 to 10000 records), worse on really small ones. 2) Since we're 64 bits, our hash tables are already twice as expensive as TDB. 3) Since our hash function is good, it means that all groups tend to fill at the same time, meaning the hash enlarges by a factor of 128 all at once, leading to a very large database at that point. 4) Our efficiency would improve if we enlarged the top level, but that makes our minimum db size even worse: it's already over 8k, and jumps to 1M after about 1000 entries! 5) Making the sub group size larger gives a shallower tree, which performs better, but makes the "hash explosion" problem worse. 6) The code is complicated, having to handle delete and reshuffling groups of hash buckets, and expansion of buckets. 7) We have to handle the case where all the records somehow end up with the same hash value, which requires special code to chain records for that case. On the other hand, it would be nice if we didn't degrade as badly as TDB does when the hash chains get long. This patch removes the hash-growing code, but instead of chaining like TDB does when a bucket fills, we point the bucket to an array of record pointers. Since each on-disk NTDB pointer contains some hash bits from the record (we steal the upper 8 bits of the offset), 99.5% of the time we don't need to load the record to determine if it matches. This makes an array of offsets much more cache-friendly than a linked list. Here are the times (in ns) for tdb_store of N records, tdb_store of N records the second time, and a fetch of all N records. I've also included the final database size and the smbtorture local.[n]tdb_speed results. Benchmark details: 1) Compiled with -O2. 2) assert() was disabled in TDB2 and NTDB. 3) The "optimize fetch" patch was applied to NTDB. 10 runs, using tmpfs (otherwise massive swapping as db hits ~30M, despite plenty of RAM). Insert Re-ins Fetch Size dbspeed (nsec) (nsec) (nsec) (Kb) (ops/sec) TDB (10000 hashsize): 100 records: 3882 3320 1609 53 203204 1000 records: 3651 3281 1571 115 218021 10000 records: 3404 3326 1595 880 202874 100000 records: 4317 3825 2097 8262 126811 1000000 records: 11568 11578 9320 77005 25046 TDB2 (1024 hashsize, expandable): 100 records: 3867 3329 1699 17 187100 1000 records: 4040 3249 1639 154 186255 10000 records: 4143 3300 1695 1226 185110 100000 records: 4481 3425 1800 17848 163483 1000000 records: 4055 3534 1878 106386 160774 NTDB (8192 hashsize) 100 records: 4259 3376 1692 82 190852 1000 records: 3640 3275 1566 130 195106 10000 records: 4337 3438 1614 773 188362 100000 records: 4750 5165 1746 9001 169197 1000000 records: 4897 5180 2341 83838 121901 Analysis: 1) TDB wins on small databases, beating TDB2 by ~15%, NTDB by ~10%. 2) TDB starts to lose when hash chains get 10 long (fetch 10% slower than TDB2/NTDB). 3) TDB does horribly when hash chains get 100 long (fetch 4x slower than NTDB, 5x slower than TDB2, insert about 2-3x slower). 4) TDB2 databases are 40% larger than TDB1. NTDB is about 15% larger than TDB1

2012-06-19 07:13:04 +04:00

/* Start at beginning of chain */

TDB2: Goodbye TDB2, Hello NTDB. This renames everything from tdb2 to ntdb: importantly, we no longer use the tdb_ namespace, so you can link against both ntdb and tdb if you want to. This also enables building of standalone ntdb by the autobuild script. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

2012-06-18 17:00:26 +04:00

ntdb: remove hash table trees. TDB2 started with a top-level hash of 1024 entries, divided into 128 groups of 8 buckets. When a bucket filled, the 8 bucket group expanded into pointers into 8 new 64-entry hash tables. When these filled, they expanded in turn, etc. It's a nice idea to automatically expand the hash tables, but it doesn't pay off. Remove it for NTDB. 1) It only beats TDB performance when the database is huge and the TDB hashsize is small. We are about 20% slower on medium-size databases (1000 to 10000 records), worse on really small ones. 2) Since we're 64 bits, our hash tables are already twice as expensive as TDB. 3) Since our hash function is good, it means that all groups tend to fill at the same time, meaning the hash enlarges by a factor of 128 all at once, leading to a very large database at that point. 4) Our efficiency would improve if we enlarged the top level, but that makes our minimum db size even worse: it's already over 8k, and jumps to 1M after about 1000 entries! 5) Making the sub group size larger gives a shallower tree, which performs better, but makes the "hash explosion" problem worse. 6) The code is complicated, having to handle delete and reshuffling groups of hash buckets, and expansion of buckets. 7) We have to handle the case where all the records somehow end up with the same hash value, which requires special code to chain records for that case. On the other hand, it would be nice if we didn't degrade as badly as TDB does when the hash chains get long. This patch removes the hash-growing code, but instead of chaining like TDB does when a bucket fills, we point the bucket to an array of record pointers. Since each on-disk NTDB pointer contains some hash bits from the record (we steal the upper 8 bits of the offset), 99.5% of the time we don't need to load the record to determine if it matches. This makes an array of offsets much more cache-friendly than a linked list. Here are the times (in ns) for tdb_store of N records, tdb_store of N records the second time, and a fetch of all N records. I've also included the final database size and the smbtorture local.[n]tdb_speed results. Benchmark details: 1) Compiled with -O2. 2) assert() was disabled in TDB2 and NTDB. 3) The "optimize fetch" patch was applied to NTDB. 10 runs, using tmpfs (otherwise massive swapping as db hits ~30M, despite plenty of RAM). Insert Re-ins Fetch Size dbspeed (nsec) (nsec) (nsec) (Kb) (ops/sec) TDB (10000 hashsize): 100 records: 3882 3320 1609 53 203204 1000 records: 3651 3281 1571 115 218021 10000 records: 3404 3326 1595 880 202874 100000 records: 4317 3825 2097 8262 126811 1000000 records: 11568 11578 9320 77005 25046 TDB2 (1024 hashsize, expandable): 100 records: 3867 3329 1699 17 187100 1000 records: 4040 3249 1639 154 186255 10000 records: 4143 3300 1695 1226 185110 100000 records: 4481 3425 1800 17848 163483 1000000 records: 4055 3534 1878 106386 160774 NTDB (8192 hashsize) 100 records: 4259 3376 1692 82 190852 1000 records: 3640 3275 1566 130 195106 10000 records: 4337 3438 1614 773 188362 100000 records: 4750 5165 1746 9001 169197 1000000 records: 4897 5180 2341 83838 121901 Analysis: 1) TDB wins on small databases, beating TDB2 by ~15%, NTDB by ~10%. 2) TDB starts to lose when hash chains get 10 long (fetch 10% slower than TDB2/NTDB). 3) TDB does horribly when hash chains get 100 long (fetch 4x slower than NTDB, 5x slower than TDB2, insert about 2-3x slower). 4) TDB2 databases are 40% larger than TDB1. NTDB is about 15% larger than TDB1

2012-06-19 07:13:04 +04:00

										val = iterate_chain(ntdb, val, h);
							

TDB2: Goodbye TDB2, Hello NTDB. This renames everything from tdb2 to ntdb: importantly, we no longer use the tdb_ namespace, so you can link against both ntdb and tdb if you want to. This also enables building of standalone ntdb by the autobuild script. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

2012-06-18 17:00:26 +04:00

}

ntdb: remove hash table trees. TDB2 started with a top-level hash of 1024 entries, divided into 128 groups of 8 buckets. When a bucket filled, the 8 bucket group expanded into pointers into 8 new 64-entry hash tables. When these filled, they expanded in turn, etc. It's a nice idea to automatically expand the hash tables, but it doesn't pay off. Remove it for NTDB. 1) It only beats TDB performance when the database is huge and the TDB hashsize is small. We are about 20% slower on medium-size databases (1000 to 10000 records), worse on really small ones. 2) Since we're 64 bits, our hash tables are already twice as expensive as TDB. 3) Since our hash function is good, it means that all groups tend to fill at the same time, meaning the hash enlarges by a factor of 128 all at once, leading to a very large database at that point. 4) Our efficiency would improve if we enlarged the top level, but that makes our minimum db size even worse: it's already over 8k, and jumps to 1M after about 1000 entries! 5) Making the sub group size larger gives a shallower tree, which performs better, but makes the "hash explosion" problem worse. 6) The code is complicated, having to handle delete and reshuffling groups of hash buckets, and expansion of buckets. 7) We have to handle the case where all the records somehow end up with the same hash value, which requires special code to chain records for that case. On the other hand, it would be nice if we didn't degrade as badly as TDB does when the hash chains get long. This patch removes the hash-growing code, but instead of chaining like TDB does when a bucket fills, we point the bucket to an array of record pointers. Since each on-disk NTDB pointer contains some hash bits from the record (we steal the upper 8 bits of the offset), 99.5% of the time we don't need to load the record to determine if it matches. This makes an array of offsets much more cache-friendly than a linked list. Here are the times (in ns) for tdb_store of N records, tdb_store of N records the second time, and a fetch of all N records. I've also included the final database size and the smbtorture local.[n]tdb_speed results. Benchmark details: 1) Compiled with -O2. 2) assert() was disabled in TDB2 and NTDB. 3) The "optimize fetch" patch was applied to NTDB. 10 runs, using tmpfs (otherwise massive swapping as db hits ~30M, despite plenty of RAM). Insert Re-ins Fetch Size dbspeed (nsec) (nsec) (nsec) (Kb) (ops/sec) TDB (10000 hashsize): 100 records: 3882 3320 1609 53 203204 1000 records: 3651 3281 1571 115 218021 10000 records: 3404 3326 1595 880 202874 100000 records: 4317 3825 2097 8262 126811 1000000 records: 11568 11578 9320 77005 25046 TDB2 (1024 hashsize, expandable): 100 records: 3867 3329 1699 17 187100 1000 records: 4040 3249 1639 154 186255 10000 records: 4143 3300 1695 1226 185110 100000 records: 4481 3425 1800 17848 163483 1000000 records: 4055 3534 1878 106386 160774 NTDB (8192 hashsize) 100 records: 4259 3376 1692 82 190852 1000 records: 3640 3275 1566 130 195106 10000 records: 4337 3438 1614 773 188362 100000 records: 4750 5165 1746 9001 169197 1000000 records: 4897 5180 2341 83838 121901 Analysis: 1) TDB wins on small databases, beating TDB2 by ~15%, NTDB by ~10%. 2) TDB starts to lose when hash chains get 10 long (fetch 10% slower than TDB2/NTDB). 3) TDB does horribly when hash chains get 100 long (fetch 4x slower than NTDB, 5x slower than TDB2, insert about 2-3x slower). 4) TDB2 databases are 40% larger than TDB1. NTDB is about 15% larger than TDB1

2012-06-19 07:13:04 +04:00

										if (val != 0) {
							

TDB2: Goodbye TDB2, Hello NTDB. This renames everything from tdb2 to ntdb: importantly, we no longer use the tdb_ namespace, so you can link against both ntdb and tdb if you want to. This also enables building of standalone ntdb by the autobuild script. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

2012-06-18 17:00:26 +04:00

}

ntdb: remove hash table trees. TDB2 started with a top-level hash of 1024 entries, divided into 128 groups of 8 buckets. When a bucket filled, the 8 bucket group expanded into pointers into 8 new 64-entry hash tables. When these filled, they expanded in turn, etc. It's a nice idea to automatically expand the hash tables, but it doesn't pay off. Remove it for NTDB. 1) It only beats TDB performance when the database is huge and the TDB hashsize is small. We are about 20% slower on medium-size databases (1000 to 10000 records), worse on really small ones. 2) Since we're 64 bits, our hash tables are already twice as expensive as TDB. 3) Since our hash function is good, it means that all groups tend to fill at the same time, meaning the hash enlarges by a factor of 128 all at once, leading to a very large database at that point. 4) Our efficiency would improve if we enlarged the top level, but that makes our minimum db size even worse: it's already over 8k, and jumps to 1M after about 1000 entries! 5) Making the sub group size larger gives a shallower tree, which performs better, but makes the "hash explosion" problem worse. 6) The code is complicated, having to handle delete and reshuffling groups of hash buckets, and expansion of buckets. 7) We have to handle the case where all the records somehow end up with the same hash value, which requires special code to chain records for that case. On the other hand, it would be nice if we didn't degrade as badly as TDB does when the hash chains get long. This patch removes the hash-growing code, but instead of chaining like TDB does when a bucket fills, we point the bucket to an array of record pointers. Since each on-disk NTDB pointer contains some hash bits from the record (we steal the upper 8 bits of the offset), 99.5% of the time we don't need to load the record to determine if it matches. This makes an array of offsets much more cache-friendly than a linked list. Here are the times (in ns) for tdb_store of N records, tdb_store of N records the second time, and a fetch of all N records. I've also included the final database size and the smbtorture local.[n]tdb_speed results. Benchmark details: 1) Compiled with -O2. 2) assert() was disabled in TDB2 and NTDB. 3) The "optimize fetch" patch was applied to NTDB. 10 runs, using tmpfs (otherwise massive swapping as db hits ~30M, despite plenty of RAM). Insert Re-ins Fetch Size dbspeed (nsec) (nsec) (nsec) (Kb) (ops/sec) TDB (10000 hashsize): 100 records: 3882 3320 1609 53 203204 1000 records: 3651 3281 1571 115 218021 10000 records: 3404 3326 1595 880 202874 100000 records: 4317 3825 2097 8262 126811 1000000 records: 11568 11578 9320 77005 25046 TDB2 (1024 hashsize, expandable): 100 records: 3867 3329 1699 17 187100 1000 records: 4040 3249 1639 154 186255 10000 records: 4143 3300 1695 1226 185110 100000 records: 4481 3425 1800 17848 163483 1000000 records: 4055 3534 1878 106386 160774 NTDB (8192 hashsize) 100 records: 4259 3376 1692 82 190852 1000 records: 3640 3275 1566 130 195106 10000 records: 4337 3438 1614 773 188362 100000 records: 4750 5165 1746 9001 169197 1000000 records: 4897 5180 2341 83838 121901 Analysis: 1) TDB wins on small databases, beating TDB2 by ~15%, NTDB by ~10%. 2) TDB starts to lose when hash chains get 10 long (fetch 10% slower than TDB2/NTDB). 3) TDB does horribly when hash chains get 100 long (fetch 4x slower than NTDB, 5x slower than TDB2, insert about 2-3x slower). 4) TDB2 databases are 40% larger than TDB1. NTDB is about 15% larger than TDB1

2012-06-19 07:13:04 +04:00

TDB2: Goodbye TDB2, Hello NTDB. This renames everything from tdb2 to ntdb: importantly, we no longer use the tdb_ namespace, so you can link against both ntdb and tdb if you want to. This also enables building of standalone ntdb by the autobuild script. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

2012-06-18 17:00:26 +04:00

}

ntdb: remove hash table trees. TDB2 started with a top-level hash of 1024 entries, divided into 128 groups of 8 buckets. When a bucket filled, the 8 bucket group expanded into pointers into 8 new 64-entry hash tables. When these filled, they expanded in turn, etc. It's a nice idea to automatically expand the hash tables, but it doesn't pay off. Remove it for NTDB. 1) It only beats TDB performance when the database is huge and the TDB hashsize is small. We are about 20% slower on medium-size databases (1000 to 10000 records), worse on really small ones. 2) Since we're 64 bits, our hash tables are already twice as expensive as TDB. 3) Since our hash function is good, it means that all groups tend to fill at the same time, meaning the hash enlarges by a factor of 128 all at once, leading to a very large database at that point. 4) Our efficiency would improve if we enlarged the top level, but that makes our minimum db size even worse: it's already over 8k, and jumps to 1M after about 1000 entries! 5) Making the sub group size larger gives a shallower tree, which performs better, but makes the "hash explosion" problem worse. 6) The code is complicated, having to handle delete and reshuffling groups of hash buckets, and expansion of buckets. 7) We have to handle the case where all the records somehow end up with the same hash value, which requires special code to chain records for that case. On the other hand, it would be nice if we didn't degrade as badly as TDB does when the hash chains get long. This patch removes the hash-growing code, but instead of chaining like TDB does when a bucket fills, we point the bucket to an array of record pointers. Since each on-disk NTDB pointer contains some hash bits from the record (we steal the upper 8 bits of the offset), 99.5% of the time we don't need to load the record to determine if it matches. This makes an array of offsets much more cache-friendly than a linked list. Here are the times (in ns) for tdb_store of N records, tdb_store of N records the second time, and a fetch of all N records. I've also included the final database size and the smbtorture local.[n]tdb_speed results. Benchmark details: 1) Compiled with -O2. 2) assert() was disabled in TDB2 and NTDB. 3) The "optimize fetch" patch was applied to NTDB. 10 runs, using tmpfs (otherwise massive swapping as db hits ~30M, despite plenty of RAM). Insert Re-ins Fetch Size dbspeed (nsec) (nsec) (nsec) (Kb) (ops/sec) TDB (10000 hashsize): 100 records: 3882 3320 1609 53 203204 1000 records: 3651 3281 1571 115 218021 10000 records: 3404 3326 1595 880 202874 100000 records: 4317 3825 2097 8262 126811 1000000 records: 11568 11578 9320 77005 25046 TDB2 (1024 hashsize, expandable): 100 records: 3867 3329 1699 17 187100 1000 records: 4040 3249 1639 154 186255 10000 records: 4143 3300 1695 1226 185110 100000 records: 4481 3425 1800 17848 163483 1000000 records: 4055 3534 1878 106386 160774 NTDB (8192 hashsize) 100 records: 4259 3376 1692 82 190852 1000 records: 3640 3275 1566 130 195106 10000 records: 4337 3438 1614 773 188362 100000 records: 4750 5165 1746 9001 169197 1000000 records: 4897 5180 2341 83838 121901 Analysis: 1) TDB wins on small databases, beating TDB2 by ~15%, NTDB by ~10%. 2) TDB starts to lose when hash chains get 10 long (fetch 10% slower than TDB2/NTDB). 3) TDB does horribly when hash chains get 100 long (fetch 4x slower than NTDB, 5x slower than TDB2, insert about 2-3x slower). 4) TDB2 databases are 40% larger than TDB1. NTDB is about 15% larger than TDB1

2012-06-19 07:13:04 +04:00

									return 0;
							

TDB2: Goodbye TDB2, Hello NTDB. This renames everything from tdb2 to ntdb: importantly, we no longer use the tdb_ namespace, so you can link against both ntdb and tdb if you want to. This also enables building of standalone ntdb by the autobuild script. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

2012-06-18 17:00:26 +04:00

ntdb: remove hash table trees. TDB2 started with a top-level hash of 1024 entries, divided into 128 groups of 8 buckets. When a bucket filled, the 8 bucket group expanded into pointers into 8 new 64-entry hash tables. When these filled, they expanded in turn, etc. It's a nice idea to automatically expand the hash tables, but it doesn't pay off. Remove it for NTDB. 1) It only beats TDB performance when the database is huge and the TDB hashsize is small. We are about 20% slower on medium-size databases (1000 to 10000 records), worse on really small ones. 2) Since we're 64 bits, our hash tables are already twice as expensive as TDB. 3) Since our hash function is good, it means that all groups tend to fill at the same time, meaning the hash enlarges by a factor of 128 all at once, leading to a very large database at that point. 4) Our efficiency would improve if we enlarged the top level, but that makes our minimum db size even worse: it's already over 8k, and jumps to 1M after about 1000 entries! 5) Making the sub group size larger gives a shallower tree, which performs better, but makes the "hash explosion" problem worse. 6) The code is complicated, having to handle delete and reshuffling groups of hash buckets, and expansion of buckets. 7) We have to handle the case where all the records somehow end up with the same hash value, which requires special code to chain records for that case. On the other hand, it would be nice if we didn't degrade as badly as TDB does when the hash chains get long. This patch removes the hash-growing code, but instead of chaining like TDB does when a bucket fills, we point the bucket to an array of record pointers. Since each on-disk NTDB pointer contains some hash bits from the record (we steal the upper 8 bits of the offset), 99.5% of the time we don't need to load the record to determine if it matches. This makes an array of offsets much more cache-friendly than a linked list. Here are the times (in ns) for tdb_store of N records, tdb_store of N records the second time, and a fetch of all N records. I've also included the final database size and the smbtorture local.[n]tdb_speed results. Benchmark details: 1) Compiled with -O2. 2) assert() was disabled in TDB2 and NTDB. 3) The "optimize fetch" patch was applied to NTDB. 10 runs, using tmpfs (otherwise massive swapping as db hits ~30M, despite plenty of RAM). Insert Re-ins Fetch Size dbspeed (nsec) (nsec) (nsec) (Kb) (ops/sec) TDB (10000 hashsize): 100 records: 3882 3320 1609 53 203204 1000 records: 3651 3281 1571 115 218021 10000 records: 3404 3326 1595 880 202874 100000 records: 4317 3825 2097 8262 126811 1000000 records: 11568 11578 9320 77005 25046 TDB2 (1024 hashsize, expandable): 100 records: 3867 3329 1699 17 187100 1000 records: 4040 3249 1639 154 186255 10000 records: 4143 3300 1695 1226 185110 100000 records: 4481 3425 1800 17848 163483 1000000 records: 4055 3534 1878 106386 160774 NTDB (8192 hashsize) 100 records: 4259 3376 1692 82 190852 1000 records: 3640 3275 1566 130 195106 10000 records: 4337 3438 1614 773 188362 100000 records: 4750 5165 1746 9001 169197 1000000 records: 4897 5180 2341 83838 121901 Analysis: 1) TDB wins on small databases, beating TDB2 by ~15%, NTDB by ~10%. 2) TDB starts to lose when hash chains get 10 long (fetch 10% slower than TDB2/NTDB). 3) TDB does horribly when hash chains get 100 long (fetch 4x slower than NTDB, 5x slower than TDB2, insert about 2-3x slower). 4) TDB2 databases are 40% larger than TDB1. NTDB is about 15% larger than TDB1

2012-06-19 07:13:04 +04:00

unlock:

TDB2: Goodbye TDB2, Hello NTDB. This renames everything from tdb2 to ntdb: importantly, we no longer use the tdb_ namespace, so you can link against both ntdb and tdb if you want to. This also enables building of standalone ntdb by the autobuild script. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

2012-06-18 17:00:26 +04:00

}

ntdb: remove hash table trees. TDB2 started with a top-level hash of 1024 entries, divided into 128 groups of 8 buckets. When a bucket filled, the 8 bucket group expanded into pointers into 8 new 64-entry hash tables. When these filled, they expanded in turn, etc. It's a nice idea to automatically expand the hash tables, but it doesn't pay off. Remove it for NTDB. 1) It only beats TDB performance when the database is huge and the TDB hashsize is small. We are about 20% slower on medium-size databases (1000 to 10000 records), worse on really small ones. 2) Since we're 64 bits, our hash tables are already twice as expensive as TDB. 3) Since our hash function is good, it means that all groups tend to fill at the same time, meaning the hash enlarges by a factor of 128 all at once, leading to a very large database at that point. 4) Our efficiency would improve if we enlarged the top level, but that makes our minimum db size even worse: it's already over 8k, and jumps to 1M after about 1000 entries! 5) Making the sub group size larger gives a shallower tree, which performs better, but makes the "hash explosion" problem worse. 6) The code is complicated, having to handle delete and reshuffling groups of hash buckets, and expansion of buckets. 7) We have to handle the case where all the records somehow end up with the same hash value, which requires special code to chain records for that case. On the other hand, it would be nice if we didn't degrade as badly as TDB does when the hash chains get long. This patch removes the hash-growing code, but instead of chaining like TDB does when a bucket fills, we point the bucket to an array of record pointers. Since each on-disk NTDB pointer contains some hash bits from the record (we steal the upper 8 bits of the offset), 99.5% of the time we don't need to load the record to determine if it matches. This makes an array of offsets much more cache-friendly than a linked list. Here are the times (in ns) for tdb_store of N records, tdb_store of N records the second time, and a fetch of all N records. I've also included the final database size and the smbtorture local.[n]tdb_speed results. Benchmark details: 1) Compiled with -O2. 2) assert() was disabled in TDB2 and NTDB. 3) The "optimize fetch" patch was applied to NTDB. 10 runs, using tmpfs (otherwise massive swapping as db hits ~30M, despite plenty of RAM). Insert Re-ins Fetch Size dbspeed (nsec) (nsec) (nsec) (Kb) (ops/sec) TDB (10000 hashsize): 100 records: 3882 3320 1609 53 203204 1000 records: 3651 3281 1571 115 218021 10000 records: 3404 3326 1595 880 202874 100000 records: 4317 3825 2097 8262 126811 1000000 records: 11568 11578 9320 77005 25046 TDB2 (1024 hashsize, expandable): 100 records: 3867 3329 1699 17 187100 1000 records: 4040 3249 1639 154 186255 10000 records: 4143 3300 1695 1226 185110 100000 records: 4481 3425 1800 17848 163483 1000000 records: 4055 3534 1878 106386 160774 NTDB (8192 hashsize) 100 records: 4259 3376 1692 82 190852 1000 records: 3640 3275 1566 130 195106 10000 records: 4337 3438 1614 773 188362 100000 records: 4750 5165 1746 9001 169197 1000000 records: 4897 5180 2341 83838 121901 Analysis: 1) TDB wins on small databases, beating TDB2 by ~15%, NTDB by ~10%. 2) TDB starts to lose when hash chains get 10 long (fetch 10% slower than TDB2/NTDB). 3) TDB does horribly when hash chains get 100 long (fetch 4x slower than NTDB, 5x slower than TDB2, insert about 2-3x slower). 4) TDB2 databases are 40% larger than TDB1. NTDB is about 15% larger than TDB1

2012-06-19 07:13:04 +04:00

											     struct hash_info *h,
							

TDB2: Goodbye TDB2, Hello NTDB. This renames everything from tdb2 to ntdb: importantly, we no longer use the tdb_ namespace, so you can link against both ntdb and tdb if you want to. This also enables building of standalone ntdb by the autobuild script. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

2012-06-18 17:00:26 +04:00

{

ntdb: remove hash table trees. TDB2 started with a top-level hash of 1024 entries, divided into 128 groups of 8 buckets. When a bucket filled, the 8 bucket group expanded into pointers into 8 new 64-entry hash tables. When these filled, they expanded in turn, etc. It's a nice idea to automatically expand the hash tables, but it doesn't pay off. Remove it for NTDB. 1) It only beats TDB performance when the database is huge and the TDB hashsize is small. We are about 20% slower on medium-size databases (1000 to 10000 records), worse on really small ones. 2) Since we're 64 bits, our hash tables are already twice as expensive as TDB. 3) Since our hash function is good, it means that all groups tend to fill at the same time, meaning the hash enlarges by a factor of 128 all at once, leading to a very large database at that point. 4) Our efficiency would improve if we enlarged the top level, but that makes our minimum db size even worse: it's already over 8k, and jumps to 1M after about 1000 entries! 5) Making the sub group size larger gives a shallower tree, which performs better, but makes the "hash explosion" problem worse. 6) The code is complicated, having to handle delete and reshuffling groups of hash buckets, and expansion of buckets. 7) We have to handle the case where all the records somehow end up with the same hash value, which requires special code to chain records for that case. On the other hand, it would be nice if we didn't degrade as badly as TDB does when the hash chains get long. This patch removes the hash-growing code, but instead of chaining like TDB does when a bucket fills, we point the bucket to an array of record pointers. Since each on-disk NTDB pointer contains some hash bits from the record (we steal the upper 8 bits of the offset), 99.5% of the time we don't need to load the record to determine if it matches. This makes an array of offsets much more cache-friendly than a linked list. Here are the times (in ns) for tdb_store of N records, tdb_store of N records the second time, and a fetch of all N records. I've also included the final database size and the smbtorture local.[n]tdb_speed results. Benchmark details: 1) Compiled with -O2. 2) assert() was disabled in TDB2 and NTDB. 3) The "optimize fetch" patch was applied to NTDB. 10 runs, using tmpfs (otherwise massive swapping as db hits ~30M, despite plenty of RAM). Insert Re-ins Fetch Size dbspeed (nsec) (nsec) (nsec) (Kb) (ops/sec) TDB (10000 hashsize): 100 records: 3882 3320 1609 53 203204 1000 records: 3651 3281 1571 115 218021 10000 records: 3404 3326 1595 880 202874 100000 records: 4317 3825 2097 8262 126811 1000000 records: 11568 11578 9320 77005 25046 TDB2 (1024 hashsize, expandable): 100 records: 3867 3329 1699 17 187100 1000 records: 4040 3249 1639 154 186255 10000 records: 4143 3300 1695 1226 185110 100000 records: 4481 3425 1800 17848 163483 1000000 records: 4055 3534 1878 106386 160774 NTDB (8192 hashsize) 100 records: 4259 3376 1692 82 190852 1000 records: 3640 3275 1566 130 195106 10000 records: 4337 3438 1614 773 188362 100000 records: 4750 5165 1746 9001 169197 1000000 records: 4897 5180 2341 83838 121901 Analysis: 1) TDB wins on small databases, beating TDB2 by ~15%, NTDB by ~10%. 2) TDB starts to lose when hash chains get 10 long (fetch 10% slower than TDB2/NTDB). 3) TDB does horribly when hash chains get 100 long (fetch 4x slower than NTDB, 5x slower than TDB2, insert about 2-3x slower). 4) TDB2 databases are 40% larger than TDB1. NTDB is about 15% larger than TDB1

2012-06-19 07:13:04 +04:00

									ntdb_off_t off;
							

TDB2: Goodbye TDB2, Hello NTDB. This renames everything from tdb2 to ntdb: importantly, we no longer use the tdb_ namespace, so you can link against both ntdb and tdb if you want to. This also enables building of standalone ntdb by the autobuild script. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

2012-06-18 17:00:26 +04:00

									enum NTDB_ERROR ecode;
							

ntdb: remove hash table trees. TDB2 started with a top-level hash of 1024 entries, divided into 128 groups of 8 buckets. When a bucket filled, the 8 bucket group expanded into pointers into 8 new 64-entry hash tables. When these filled, they expanded in turn, etc. It's a nice idea to automatically expand the hash tables, but it doesn't pay off. Remove it for NTDB. 1) It only beats TDB performance when the database is huge and the TDB hashsize is small. We are about 20% slower on medium-size databases (1000 to 10000 records), worse on really small ones. 2) Since we're 64 bits, our hash tables are already twice as expensive as TDB. 3) Since our hash function is good, it means that all groups tend to fill at the same time, meaning the hash enlarges by a factor of 128 all at once, leading to a very large database at that point. 4) Our efficiency would improve if we enlarged the top level, but that makes our minimum db size even worse: it's already over 8k, and jumps to 1M after about 1000 entries! 5) Making the sub group size larger gives a shallower tree, which performs better, but makes the "hash explosion" problem worse. 6) The code is complicated, having to handle delete and reshuffling groups of hash buckets, and expansion of buckets. 7) We have to handle the case where all the records somehow end up with the same hash value, which requires special code to chain records for that case. On the other hand, it would be nice if we didn't degrade as badly as TDB does when the hash chains get long. This patch removes the hash-growing code, but instead of chaining like TDB does when a bucket fills, we point the bucket to an array of record pointers. Since each on-disk NTDB pointer contains some hash bits from the record (we steal the upper 8 bits of the offset), 99.5% of the time we don't need to load the record to determine if it matches. This makes an array of offsets much more cache-friendly than a linked list. Here are the times (in ns) for tdb_store of N records, tdb_store of N records the second time, and a fetch of all N records. I've also included the final database size and the smbtorture local.[n]tdb_speed results. Benchmark details: 1) Compiled with -O2. 2) assert() was disabled in TDB2 and NTDB. 3) The "optimize fetch" patch was applied to NTDB. 10 runs, using tmpfs (otherwise massive swapping as db hits ~30M, despite plenty of RAM). Insert Re-ins Fetch Size dbspeed (nsec) (nsec) (nsec) (Kb) (ops/sec) TDB (10000 hashsize): 100 records: 3882 3320 1609 53 203204 1000 records: 3651 3281 1571 115 218021 10000 records: 3404 3326 1595 880 202874 100000 records: 4317 3825 2097 8262 126811 1000000 records: 11568 11578 9320 77005 25046 TDB2 (1024 hashsize, expandable): 100 records: 3867 3329 1699 17 187100 1000 records: 4040 3249 1639 154 186255 10000 records: 4143 3300 1695 1226 185110 100000 records: 4481 3425 1800 17848 163483 1000000 records: 4055 3534 1878 106386 160774 NTDB (8192 hashsize) 100 records: 4259 3376 1692 82 190852 1000 records: 3640 3275 1566 130 195106 10000 records: 4337 3438 1614 773 188362 100000 records: 4750 5165 1746 9001 169197 1000000 records: 4897 5180 2341 83838 121901 Analysis: 1) TDB wins on small databases, beating TDB2 by ~15%, NTDB by ~10%. 2) TDB starts to lose when hash chains get 10 long (fetch 10% slower than TDB2/NTDB). 3) TDB does horribly when hash chains get 100 long (fetch 4x slower than NTDB, 5x slower than TDB2, insert about 2-3x slower). 4) TDB2 databases are 40% larger than TDB1. NTDB is about 15% larger than TDB1

2012-06-19 07:13:04 +04:00

									off = lock_and_iterate_hash(ntdb, h);
							

TDB2: Goodbye TDB2, Hello NTDB. This renames everything from tdb2 to ntdb: importantly, we no longer use the tdb_ namespace, so you can link against both ntdb and tdb if you want to. This also enables building of standalone ntdb by the autobuild script. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

2012-06-18 17:00:26 +04:00

ntdb: remove hash table trees. TDB2 started with a top-level hash of 1024 entries, divided into 128 groups of 8 buckets. When a bucket filled, the 8 bucket group expanded into pointers into 8 new 64-entry hash tables. When these filled, they expanded in turn, etc. It's a nice idea to automatically expand the hash tables, but it doesn't pay off. Remove it for NTDB. 1) It only beats TDB performance when the database is huge and the TDB hashsize is small. We are about 20% slower on medium-size databases (1000 to 10000 records), worse on really small ones. 2) Since we're 64 bits, our hash tables are already twice as expensive as TDB. 3) Since our hash function is good, it means that all groups tend to fill at the same time, meaning the hash enlarges by a factor of 128 all at once, leading to a very large database at that point. 4) Our efficiency would improve if we enlarged the top level, but that makes our minimum db size even worse: it's already over 8k, and jumps to 1M after about 1000 entries! 5) Making the sub group size larger gives a shallower tree, which performs better, but makes the "hash explosion" problem worse. 6) The code is complicated, having to handle delete and reshuffling groups of hash buckets, and expansion of buckets. 7) We have to handle the case where all the records somehow end up with the same hash value, which requires special code to chain records for that case. On the other hand, it would be nice if we didn't degrade as badly as TDB does when the hash chains get long. This patch removes the hash-growing code, but instead of chaining like TDB does when a bucket fills, we point the bucket to an array of record pointers. Since each on-disk NTDB pointer contains some hash bits from the record (we steal the upper 8 bits of the offset), 99.5% of the time we don't need to load the record to determine if it matches. This makes an array of offsets much more cache-friendly than a linked list. Here are the times (in ns) for tdb_store of N records, tdb_store of N records the second time, and a fetch of all N records. I've also included the final database size and the smbtorture local.[n]tdb_speed results. Benchmark details: 1) Compiled with -O2. 2) assert() was disabled in TDB2 and NTDB. 3) The "optimize fetch" patch was applied to NTDB. 10 runs, using tmpfs (otherwise massive swapping as db hits ~30M, despite plenty of RAM). Insert Re-ins Fetch Size dbspeed (nsec) (nsec) (nsec) (Kb) (ops/sec) TDB (10000 hashsize): 100 records: 3882 3320 1609 53 203204 1000 records: 3651 3281 1571 115 218021 10000 records: 3404 3326 1595 880 202874 100000 records: 4317 3825 2097 8262 126811 1000000 records: 11568 11578 9320 77005 25046 TDB2 (1024 hashsize, expandable): 100 records: 3867 3329 1699 17 187100 1000 records: 4040 3249 1639 154 186255 10000 records: 4143 3300 1695 1226 185110 100000 records: 4481 3425 1800 17848 163483 1000000 records: 4055 3534 1878 106386 160774 NTDB (8192 hashsize) 100 records: 4259 3376 1692 82 190852 1000 records: 3640 3275 1566 130 195106 10000 records: 4337 3438 1614 773 188362 100000 records: 4750 5165 1746 9001 169197 1000000 records: 4897 5180 2341 83838 121901 Analysis: 1) TDB wins on small databases, beating TDB2 by ~15%, NTDB by ~10%. 2) TDB starts to lose when hash chains get 10 long (fetch 10% slower than TDB2/NTDB). 3) TDB does horribly when hash chains get 100 long (fetch 4x slower than NTDB, 5x slower than TDB2, insert about 2-3x slower). 4) TDB2 databases are 40% larger than TDB1. NTDB is about 15% larger than TDB1

2012-06-19 07:13:04 +04:00

									if (NTDB_OFF_IS_ERR(off)) {
							

TDB2: Goodbye TDB2, Hello NTDB. This renames everything from tdb2 to ntdb: importantly, we no longer use the tdb_ namespace, so you can link against both ntdb and tdb if you want to. This also enables building of standalone ntdb by the autobuild script. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

2012-06-18 17:00:26 +04:00

ntdb: remove hash table trees. TDB2 started with a top-level hash of 1024 entries, divided into 128 groups of 8 buckets. When a bucket filled, the 8 bucket group expanded into pointers into 8 new 64-entry hash tables. When these filled, they expanded in turn, etc. It's a nice idea to automatically expand the hash tables, but it doesn't pay off. Remove it for NTDB. 1) It only beats TDB performance when the database is huge and the TDB hashsize is small. We are about 20% slower on medium-size databases (1000 to 10000 records), worse on really small ones. 2) Since we're 64 bits, our hash tables are already twice as expensive as TDB. 3) Since our hash function is good, it means that all groups tend to fill at the same time, meaning the hash enlarges by a factor of 128 all at once, leading to a very large database at that point. 4) Our efficiency would improve if we enlarged the top level, but that makes our minimum db size even worse: it's already over 8k, and jumps to 1M after about 1000 entries! 5) Making the sub group size larger gives a shallower tree, which performs better, but makes the "hash explosion" problem worse. 6) The code is complicated, having to handle delete and reshuffling groups of hash buckets, and expansion of buckets. 7) We have to handle the case where all the records somehow end up with the same hash value, which requires special code to chain records for that case. On the other hand, it would be nice if we didn't degrade as badly as TDB does when the hash chains get long. This patch removes the hash-growing code, but instead of chaining like TDB does when a bucket fills, we point the bucket to an array of record pointers. Since each on-disk NTDB pointer contains some hash bits from the record (we steal the upper 8 bits of the offset), 99.5% of the time we don't need to load the record to determine if it matches. This makes an array of offsets much more cache-friendly than a linked list. Here are the times (in ns) for tdb_store of N records, tdb_store of N records the second time, and a fetch of all N records. I've also included the final database size and the smbtorture local.[n]tdb_speed results. Benchmark details: 1) Compiled with -O2. 2) assert() was disabled in TDB2 and NTDB. 3) The "optimize fetch" patch was applied to NTDB. 10 runs, using tmpfs (otherwise massive swapping as db hits ~30M, despite plenty of RAM). Insert Re-ins Fetch Size dbspeed (nsec) (nsec) (nsec) (Kb) (ops/sec) TDB (10000 hashsize): 100 records: 3882 3320 1609 53 203204 1000 records: 3651 3281 1571 115 218021 10000 records: 3404 3326 1595 880 202874 100000 records: 4317 3825 2097 8262 126811 1000000 records: 11568 11578 9320 77005 25046 TDB2 (1024 hashsize, expandable): 100 records: 3867 3329 1699 17 187100 1000 records: 4040 3249 1639 154 186255 10000 records: 4143 3300 1695 1226 185110 100000 records: 4481 3425 1800 17848 163483 1000000 records: 4055 3534 1878 106386 160774 NTDB (8192 hashsize) 100 records: 4259 3376 1692 82 190852 1000 records: 3640 3275 1566 130 195106 10000 records: 4337 3438 1614 773 188362 100000 records: 4750 5165 1746 9001 169197 1000000 records: 4897 5180 2341 83838 121901 Analysis: 1) TDB wins on small databases, beating TDB2 by ~15%, NTDB by ~10%. 2) TDB starts to lose when hash chains get 10 long (fetch 10% slower than TDB2/NTDB). 3) TDB does horribly when hash chains get 100 long (fetch 4x slower than NTDB, 5x slower than TDB2, insert about 2-3x slower). 4) TDB2 databases are 40% larger than TDB1. NTDB is about 15% larger than TDB1

2012-06-19 07:13:04 +04:00

/* The hash for this key is still locked. */

TDB2: Goodbye TDB2, Hello NTDB. This renames everything from tdb2 to ntdb: importantly, we no longer use the tdb_ namespace, so you can link against both ntdb and tdb if you want to. This also enables building of standalone ntdb by the autobuild script. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

2012-06-18 17:00:26 +04:00

ntdb: remove hash table trees. TDB2 started with a top-level hash of 1024 entries, divided into 128 groups of 8 buckets. When a bucket filled, the 8 bucket group expanded into pointers into 8 new 64-entry hash tables. When these filled, they expanded in turn, etc. It's a nice idea to automatically expand the hash tables, but it doesn't pay off. Remove it for NTDB. 1) It only beats TDB performance when the database is huge and the TDB hashsize is small. We are about 20% slower on medium-size databases (1000 to 10000 records), worse on really small ones. 2) Since we're 64 bits, our hash tables are already twice as expensive as TDB. 3) Since our hash function is good, it means that all groups tend to fill at the same time, meaning the hash enlarges by a factor of 128 all at once, leading to a very large database at that point. 4) Our efficiency would improve if we enlarged the top level, but that makes our minimum db size even worse: it's already over 8k, and jumps to 1M after about 1000 entries! 5) Making the sub group size larger gives a shallower tree, which performs better, but makes the "hash explosion" problem worse. 6) The code is complicated, having to handle delete and reshuffling groups of hash buckets, and expansion of buckets. 7) We have to handle the case where all the records somehow end up with the same hash value, which requires special code to chain records for that case. On the other hand, it would be nice if we didn't degrade as badly as TDB does when the hash chains get long. This patch removes the hash-growing code, but instead of chaining like TDB does when a bucket fills, we point the bucket to an array of record pointers. Since each on-disk NTDB pointer contains some hash bits from the record (we steal the upper 8 bits of the offset), 99.5% of the time we don't need to load the record to determine if it matches. This makes an array of offsets much more cache-friendly than a linked list. Here are the times (in ns) for tdb_store of N records, tdb_store of N records the second time, and a fetch of all N records. I've also included the final database size and the smbtorture local.[n]tdb_speed results. Benchmark details: 1) Compiled with -O2. 2) assert() was disabled in TDB2 and NTDB. 3) The "optimize fetch" patch was applied to NTDB. 10 runs, using tmpfs (otherwise massive swapping as db hits ~30M, despite plenty of RAM). Insert Re-ins Fetch Size dbspeed (nsec) (nsec) (nsec) (Kb) (ops/sec) TDB (10000 hashsize): 100 records: 3882 3320 1609 53 203204 1000 records: 3651 3281 1571 115 218021 10000 records: 3404 3326 1595 880 202874 100000 records: 4317 3825 2097 8262 126811 1000000 records: 11568 11578 9320 77005 25046 TDB2 (1024 hashsize, expandable): 100 records: 3867 3329 1699 17 187100 1000 records: 4040 3249 1639 154 186255 10000 records: 4143 3300 1695 1226 185110 100000 records: 4481 3425 1800 17848 163483 1000000 records: 4055 3534 1878 106386 160774 NTDB (8192 hashsize) 100 records: 4259 3376 1692 82 190852 1000 records: 3640 3275 1566 130 195106 10000 records: 4337 3438 1614 773 188362 100000 records: 4750 5165 1746 9001 169197 1000000 records: 4897 5180 2341 83838 121901 Analysis: 1) TDB wins on small databases, beating TDB2 by ~15%, NTDB by ~10%. 2) TDB starts to lose when hash chains get 10 long (fetch 10% slower than TDB2/NTDB). 3) TDB does horribly when hash chains get 100 long (fetch 4x slower than NTDB, 5x slower than TDB2, insert about 2-3x slower). 4) TDB2 databases are 40% larger than TDB1. NTDB is about 15% larger than TDB1

2012-06-19 07:13:04 +04:00

									kbuf->dsize = rec_key_length(&rec);
							

TDB2: Goodbye TDB2, Hello NTDB. This renames everything from tdb2 to ntdb: importantly, we no longer use the tdb_ namespace, so you can link against both ntdb and tdb if you want to. This also enables building of standalone ntdb by the autobuild script. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

2012-06-18 17:00:26 +04:00

ntdb: remove hash table trees. TDB2 started with a top-level hash of 1024 entries, divided into 128 groups of 8 buckets. When a bucket filled, the 8 bucket group expanded into pointers into 8 new 64-entry hash tables. When these filled, they expanded in turn, etc. It's a nice idea to automatically expand the hash tables, but it doesn't pay off. Remove it for NTDB. 1) It only beats TDB performance when the database is huge and the TDB hashsize is small. We are about 20% slower on medium-size databases (1000 to 10000 records), worse on really small ones. 2) Since we're 64 bits, our hash tables are already twice as expensive as TDB. 3) Since our hash function is good, it means that all groups tend to fill at the same time, meaning the hash enlarges by a factor of 128 all at once, leading to a very large database at that point. 4) Our efficiency would improve if we enlarged the top level, but that makes our minimum db size even worse: it's already over 8k, and jumps to 1M after about 1000 entries! 5) Making the sub group size larger gives a shallower tree, which performs better, but makes the "hash explosion" problem worse. 6) The code is complicated, having to handle delete and reshuffling groups of hash buckets, and expansion of buckets. 7) We have to handle the case where all the records somehow end up with the same hash value, which requires special code to chain records for that case. On the other hand, it would be nice if we didn't degrade as badly as TDB does when the hash chains get long. This patch removes the hash-growing code, but instead of chaining like TDB does when a bucket fills, we point the bucket to an array of record pointers. Since each on-disk NTDB pointer contains some hash bits from the record (we steal the upper 8 bits of the offset), 99.5% of the time we don't need to load the record to determine if it matches. This makes an array of offsets much more cache-friendly than a linked list. Here are the times (in ns) for tdb_store of N records, tdb_store of N records the second time, and a fetch of all N records. I've also included the final database size and the smbtorture local.[n]tdb_speed results. Benchmark details: 1) Compiled with -O2. 2) assert() was disabled in TDB2 and NTDB. 3) The "optimize fetch" patch was applied to NTDB. 10 runs, using tmpfs (otherwise massive swapping as db hits ~30M, despite plenty of RAM). Insert Re-ins Fetch Size dbspeed (nsec) (nsec) (nsec) (Kb) (ops/sec) TDB (10000 hashsize): 100 records: 3882 3320 1609 53 203204 1000 records: 3651 3281 1571 115 218021 10000 records: 3404 3326 1595 880 202874 100000 records: 4317 3825 2097 8262 126811 1000000 records: 11568 11578 9320 77005 25046 TDB2 (1024 hashsize, expandable): 100 records: 3867 3329 1699 17 187100 1000 records: 4040 3249 1639 154 186255 10000 records: 4143 3300 1695 1226 185110 100000 records: 4481 3425 1800 17848 163483 1000000 records: 4055 3534 1878 106386 160774 NTDB (8192 hashsize) 100 records: 4259 3376 1692 82 190852 1000 records: 3640 3275 1566 130 195106 10000 records: 4337 3438 1614 773 188362 100000 records: 4750 5165 1746 9001 169197 1000000 records: 4897 5180 2341 83838 121901 Analysis: 1) TDB wins on small databases, beating TDB2 by ~15%, NTDB by ~10%. 2) TDB starts to lose when hash chains get 10 long (fetch 10% slower than TDB2/NTDB). 3) TDB does horribly when hash chains get 100 long (fetch 4x slower than NTDB, 5x slower than TDB2, insert about 2-3x slower). 4) TDB2 databases are 40% larger than TDB1. NTDB is about 15% larger than TDB1

2012-06-19 07:13:04 +04:00

/* They want data as well? */

TDB2: Goodbye TDB2, Hello NTDB. This renames everything from tdb2 to ntdb: importantly, we no longer use the tdb_ namespace, so you can link against both ntdb and tdb if you want to. This also enables building of standalone ntdb by the autobuild script. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

2012-06-18 17:00:26 +04:00

}

ntdb: remove hash table trees. TDB2 started with a top-level hash of 1024 entries, divided into 128 groups of 8 buckets. When a bucket filled, the 8 bucket group expanded into pointers into 8 new 64-entry hash tables. When these filled, they expanded in turn, etc. It's a nice idea to automatically expand the hash tables, but it doesn't pay off. Remove it for NTDB. 1) It only beats TDB performance when the database is huge and the TDB hashsize is small. We are about 20% slower on medium-size databases (1000 to 10000 records), worse on really small ones. 2) Since we're 64 bits, our hash tables are already twice as expensive as TDB. 3) Since our hash function is good, it means that all groups tend to fill at the same time, meaning the hash enlarges by a factor of 128 all at once, leading to a very large database at that point. 4) Our efficiency would improve if we enlarged the top level, but that makes our minimum db size even worse: it's already over 8k, and jumps to 1M after about 1000 entries! 5) Making the sub group size larger gives a shallower tree, which performs better, but makes the "hash explosion" problem worse. 6) The code is complicated, having to handle delete and reshuffling groups of hash buckets, and expansion of buckets. 7) We have to handle the case where all the records somehow end up with the same hash value, which requires special code to chain records for that case. On the other hand, it would be nice if we didn't degrade as badly as TDB does when the hash chains get long. This patch removes the hash-growing code, but instead of chaining like TDB does when a bucket fills, we point the bucket to an array of record pointers. Since each on-disk NTDB pointer contains some hash bits from the record (we steal the upper 8 bits of the offset), 99.5% of the time we don't need to load the record to determine if it matches. This makes an array of offsets much more cache-friendly than a linked list. Here are the times (in ns) for tdb_store of N records, tdb_store of N records the second time, and a fetch of all N records. I've also included the final database size and the smbtorture local.[n]tdb_speed results. Benchmark details: 1) Compiled with -O2. 2) assert() was disabled in TDB2 and NTDB. 3) The "optimize fetch" patch was applied to NTDB. 10 runs, using tmpfs (otherwise massive swapping as db hits ~30M, despite plenty of RAM). Insert Re-ins Fetch Size dbspeed (nsec) (nsec) (nsec) (Kb) (ops/sec) TDB (10000 hashsize): 100 records: 3882 3320 1609 53 203204 1000 records: 3651 3281 1571 115 218021 10000 records: 3404 3326 1595 880 202874 100000 records: 4317 3825 2097 8262 126811 1000000 records: 11568 11578 9320 77005 25046 TDB2 (1024 hashsize, expandable): 100 records: 3867 3329 1699 17 187100 1000 records: 4040 3249 1639 154 186255 10000 records: 4143 3300 1695 1226 185110 100000 records: 4481 3425 1800 17848 163483 1000000 records: 4055 3534 1878 106386 160774 NTDB (8192 hashsize) 100 records: 4259 3376 1692 82 190852 1000 records: 3640 3275 1566 130 195106 10000 records: 4337 3438 1614 773 188362 100000 records: 4750 5165 1746 9001 169197 1000000 records: 4897 5180 2341 83838 121901 Analysis: 1) TDB wins on small databases, beating TDB2 by ~15%, NTDB by ~10%. 2) TDB starts to lose when hash chains get 10 long (fetch 10% slower than TDB2/NTDB). 3) TDB does horribly when hash chains get 100 long (fetch 4x slower than NTDB, 5x slower than TDB2, insert about 2-3x slower). 4) TDB2 databases are 40% larger than TDB1. NTDB is about 15% larger than TDB1

2012-06-19 07:13:04 +04:00

									if (NTDB_PTR_IS_ERR(kbuf->dptr)) {
							

TDB2: Goodbye TDB2, Hello NTDB. This renames everything from tdb2 to ntdb: importantly, we no longer use the tdb_ namespace, so you can link against both ntdb and tdb if you want to. This also enables building of standalone ntdb by the autobuild script. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

2012-06-18 17:00:26 +04:00

ntdb: remove hash table trees. TDB2 started with a top-level hash of 1024 entries, divided into 128 groups of 8 buckets. When a bucket filled, the 8 bucket group expanded into pointers into 8 new 64-entry hash tables. When these filled, they expanded in turn, etc. It's a nice idea to automatically expand the hash tables, but it doesn't pay off. Remove it for NTDB. 1) It only beats TDB performance when the database is huge and the TDB hashsize is small. We are about 20% slower on medium-size databases (1000 to 10000 records), worse on really small ones. 2) Since we're 64 bits, our hash tables are already twice as expensive as TDB. 3) Since our hash function is good, it means that all groups tend to fill at the same time, meaning the hash enlarges by a factor of 128 all at once, leading to a very large database at that point. 4) Our efficiency would improve if we enlarged the top level, but that makes our minimum db size even worse: it's already over 8k, and jumps to 1M after about 1000 entries! 5) Making the sub group size larger gives a shallower tree, which performs better, but makes the "hash explosion" problem worse. 6) The code is complicated, having to handle delete and reshuffling groups of hash buckets, and expansion of buckets. 7) We have to handle the case where all the records somehow end up with the same hash value, which requires special code to chain records for that case. On the other hand, it would be nice if we didn't degrade as badly as TDB does when the hash chains get long. This patch removes the hash-growing code, but instead of chaining like TDB does when a bucket fills, we point the bucket to an array of record pointers. Since each on-disk NTDB pointer contains some hash bits from the record (we steal the upper 8 bits of the offset), 99.5% of the time we don't need to load the record to determine if it matches. This makes an array of offsets much more cache-friendly than a linked list. Here are the times (in ns) for tdb_store of N records, tdb_store of N records the second time, and a fetch of all N records. I've also included the final database size and the smbtorture local.[n]tdb_speed results. Benchmark details: 1) Compiled with -O2. 2) assert() was disabled in TDB2 and NTDB. 3) The "optimize fetch" patch was applied to NTDB. 10 runs, using tmpfs (otherwise massive swapping as db hits ~30M, despite plenty of RAM). Insert Re-ins Fetch Size dbspeed (nsec) (nsec) (nsec) (Kb) (ops/sec) TDB (10000 hashsize): 100 records: 3882 3320 1609 53 203204 1000 records: 3651 3281 1571 115 218021 10000 records: 3404 3326 1595 880 202874 100000 records: 4317 3825 2097 8262 126811 1000000 records: 11568 11578 9320 77005 25046 TDB2 (1024 hashsize, expandable): 100 records: 3867 3329 1699 17 187100 1000 records: 4040 3249 1639 154 186255 10000 records: 4143 3300 1695 1226 185110 100000 records: 4481 3425 1800 17848 163483 1000000 records: 4055 3534 1878 106386 160774 NTDB (8192 hashsize) 100 records: 4259 3376 1692 82 190852 1000 records: 3640 3275 1566 130 195106 10000 records: 4337 3438 1614 773 188362 100000 records: 4750 5165 1746 9001 169197 1000000 records: 4897 5180 2341 83838 121901 Analysis: 1) TDB wins on small databases, beating TDB2 by ~15%, NTDB by ~10%. 2) TDB starts to lose when hash chains get 10 long (fetch 10% slower than TDB2/NTDB). 3) TDB does horribly when hash chains get 100 long (fetch 4x slower than NTDB, 5x slower than TDB2, insert about 2-3x slower). 4) TDB2 databases are 40% larger than TDB1. NTDB is about 15% larger than TDB1

2012-06-19 07:13:04 +04:00

unlock:

TDB2: Goodbye TDB2, Hello NTDB. This renames everything from tdb2 to ntdb: importantly, we no longer use the tdb_ namespace, so you can link against both ntdb and tdb if you want to. This also enables building of standalone ntdb by the autobuild script. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

2012-06-18 17:00:26 +04:00

									return ecode;
							

ntdb: remove hash table trees. TDB2 started with a top-level hash of 1024 entries, divided into 128 groups of 8 buckets. When a bucket filled, the 8 bucket group expanded into pointers into 8 new 64-entry hash tables. When these filled, they expanded in turn, etc. It's a nice idea to automatically expand the hash tables, but it doesn't pay off. Remove it for NTDB. 1) It only beats TDB performance when the database is huge and the TDB hashsize is small. We are about 20% slower on medium-size databases (1000 to 10000 records), worse on really small ones. 2) Since we're 64 bits, our hash tables are already twice as expensive as TDB. 3) Since our hash function is good, it means that all groups tend to fill at the same time, meaning the hash enlarges by a factor of 128 all at once, leading to a very large database at that point. 4) Our efficiency would improve if we enlarged the top level, but that makes our minimum db size even worse: it's already over 8k, and jumps to 1M after about 1000 entries! 5) Making the sub group size larger gives a shallower tree, which performs better, but makes the "hash explosion" problem worse. 6) The code is complicated, having to handle delete and reshuffling groups of hash buckets, and expansion of buckets. 7) We have to handle the case where all the records somehow end up with the same hash value, which requires special code to chain records for that case. On the other hand, it would be nice if we didn't degrade as badly as TDB does when the hash chains get long. This patch removes the hash-growing code, but instead of chaining like TDB does when a bucket fills, we point the bucket to an array of record pointers. Since each on-disk NTDB pointer contains some hash bits from the record (we steal the upper 8 bits of the offset), 99.5% of the time we don't need to load the record to determine if it matches. This makes an array of offsets much more cache-friendly than a linked list. Here are the times (in ns) for tdb_store of N records, tdb_store of N records the second time, and a fetch of all N records. I've also included the final database size and the smbtorture local.[n]tdb_speed results. Benchmark details: 1) Compiled with -O2. 2) assert() was disabled in TDB2 and NTDB. 3) The "optimize fetch" patch was applied to NTDB. 10 runs, using tmpfs (otherwise massive swapping as db hits ~30M, despite plenty of RAM). Insert Re-ins Fetch Size dbspeed (nsec) (nsec) (nsec) (Kb) (ops/sec) TDB (10000 hashsize): 100 records: 3882 3320 1609 53 203204 1000 records: 3651 3281 1571 115 218021 10000 records: 3404 3326 1595 880 202874 100000 records: 4317 3825 2097 8262 126811 1000000 records: 11568 11578 9320 77005 25046 TDB2 (1024 hashsize, expandable): 100 records: 3867 3329 1699 17 187100 1000 records: 4040 3249 1639 154 186255 10000 records: 4143 3300 1695 1226 185110 100000 records: 4481 3425 1800 17848 163483 1000000 records: 4055 3534 1878 106386 160774 NTDB (8192 hashsize) 100 records: 4259 3376 1692 82 190852 1000 records: 3640 3275 1566 130 195106 10000 records: 4337 3438 1614 773 188362 100000 records: 4750 5165 1746 9001 169197 1000000 records: 4897 5180 2341 83838 121901 Analysis: 1) TDB wins on small databases, beating TDB2 by ~15%, NTDB by ~10%. 2) TDB starts to lose when hash chains get 10 long (fetch 10% slower than TDB2/NTDB). 3) TDB does horribly when hash chains get 100 long (fetch 4x slower than NTDB, 5x slower than TDB2, insert about 2-3x slower). 4) TDB2 databases are 40% larger than TDB1. NTDB is about 15% larger than TDB1

2012-06-19 07:13:04 +04:00

											     struct hash_info *h,
							

TDB2: Goodbye TDB2, Hello NTDB. This renames everything from tdb2 to ntdb: importantly, we no longer use the tdb_ namespace, so you can link against both ntdb and tdb if you want to. This also enables building of standalone ntdb by the autobuild script. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

2012-06-18 17:00:26 +04:00

											     NTDB_DATA *kbuf, size_t *dlen)
							

ntdb: remove hash table trees. TDB2 started with a top-level hash of 1024 entries, divided into 128 groups of 8 buckets. When a bucket filled, the 8 bucket group expanded into pointers into 8 new 64-entry hash tables. When these filled, they expanded in turn, etc. It's a nice idea to automatically expand the hash tables, but it doesn't pay off. Remove it for NTDB. 1) It only beats TDB performance when the database is huge and the TDB hashsize is small. We are about 20% slower on medium-size databases (1000 to 10000 records), worse on really small ones. 2) Since we're 64 bits, our hash tables are already twice as expensive as TDB. 3) Since our hash function is good, it means that all groups tend to fill at the same time, meaning the hash enlarges by a factor of 128 all at once, leading to a very large database at that point. 4) Our efficiency would improve if we enlarged the top level, but that makes our minimum db size even worse: it's already over 8k, and jumps to 1M after about 1000 entries! 5) Making the sub group size larger gives a shallower tree, which performs better, but makes the "hash explosion" problem worse. 6) The code is complicated, having to handle delete and reshuffling groups of hash buckets, and expansion of buckets. 7) We have to handle the case where all the records somehow end up with the same hash value, which requires special code to chain records for that case. On the other hand, it would be nice if we didn't degrade as badly as TDB does when the hash chains get long. This patch removes the hash-growing code, but instead of chaining like TDB does when a bucket fills, we point the bucket to an array of record pointers. Since each on-disk NTDB pointer contains some hash bits from the record (we steal the upper 8 bits of the offset), 99.5% of the time we don't need to load the record to determine if it matches. This makes an array of offsets much more cache-friendly than a linked list. Here are the times (in ns) for tdb_store of N records, tdb_store of N records the second time, and a fetch of all N records. I've also included the final database size and the smbtorture local.[n]tdb_speed results. Benchmark details: 1) Compiled with -O2. 2) assert() was disabled in TDB2 and NTDB. 3) The "optimize fetch" patch was applied to NTDB. 10 runs, using tmpfs (otherwise massive swapping as db hits ~30M, despite plenty of RAM). Insert Re-ins Fetch Size dbspeed (nsec) (nsec) (nsec) (Kb) (ops/sec) TDB (10000 hashsize): 100 records: 3882 3320 1609 53 203204 1000 records: 3651 3281 1571 115 218021 10000 records: 3404 3326 1595 880 202874 100000 records: 4317 3825 2097 8262 126811 1000000 records: 11568 11578 9320 77005 25046 TDB2 (1024 hashsize, expandable): 100 records: 3867 3329 1699 17 187100 1000 records: 4040 3249 1639 154 186255 10000 records: 4143 3300 1695 1226 185110 100000 records: 4481 3425 1800 17848 163483 1000000 records: 4055 3534 1878 106386 160774 NTDB (8192 hashsize) 100 records: 4259 3376 1692 82 190852 1000 records: 3640 3275 1566 130 195106 10000 records: 4337 3438 1614 773 188362 100000 records: 4750 5165 1746 9001 169197 1000000 records: 4897 5180 2341 83838 121901 Analysis: 1) TDB wins on small databases, beating TDB2 by ~15%, NTDB by ~10%. 2) TDB starts to lose when hash chains get 10 long (fetch 10% slower than TDB2/NTDB). 3) TDB does horribly when hash chains get 100 long (fetch 4x slower than NTDB, 5x slower than TDB2, insert about 2-3x slower). 4) TDB2 databases are 40% larger than TDB1. NTDB is about 15% larger than TDB1

2012-06-19 07:13:04 +04:00

									h->table = NTDB_HASH_OFFSET;
							

TDB2: Goodbye TDB2, Hello NTDB. This renames everything from tdb2 to ntdb: importantly, we no longer use the tdb_ namespace, so you can link against both ntdb and tdb if you want to. This also enables building of standalone ntdb by the autobuild script. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

2012-06-18 17:00:26 +04:00

}

ntdb: remove hash table trees. TDB2 started with a top-level hash of 1024 entries, divided into 128 groups of 8 buckets. When a bucket filled, the 8 bucket group expanded into pointers into 8 new 64-entry hash tables. When these filled, they expanded in turn, etc. It's a nice idea to automatically expand the hash tables, but it doesn't pay off. Remove it for NTDB. 1) It only beats TDB performance when the database is huge and the TDB hashsize is small. We are about 20% slower on medium-size databases (1000 to 10000 records), worse on really small ones. 2) Since we're 64 bits, our hash tables are already twice as expensive as TDB. 3) Since our hash function is good, it means that all groups tend to fill at the same time, meaning the hash enlarges by a factor of 128 all at once, leading to a very large database at that point. 4) Our efficiency would improve if we enlarged the top level, but that makes our minimum db size even worse: it's already over 8k, and jumps to 1M after about 1000 entries! 5) Making the sub group size larger gives a shallower tree, which performs better, but makes the "hash explosion" problem worse. 6) The code is complicated, having to handle delete and reshuffling groups of hash buckets, and expansion of buckets. 7) We have to handle the case where all the records somehow end up with the same hash value, which requires special code to chain records for that case. On the other hand, it would be nice if we didn't degrade as badly as TDB does when the hash chains get long. This patch removes the hash-growing code, but instead of chaining like TDB does when a bucket fills, we point the bucket to an array of record pointers. Since each on-disk NTDB pointer contains some hash bits from the record (we steal the upper 8 bits of the offset), 99.5% of the time we don't need to load the record to determine if it matches. This makes an array of offsets much more cache-friendly than a linked list. Here are the times (in ns) for tdb_store of N records, tdb_store of N records the second time, and a fetch of all N records. I've also included the final database size and the smbtorture local.[n]tdb_speed results. Benchmark details: 1) Compiled with -O2. 2) assert() was disabled in TDB2 and NTDB. 3) The "optimize fetch" patch was applied to NTDB. 10 runs, using tmpfs (otherwise massive swapping as db hits ~30M, despite plenty of RAM). Insert Re-ins Fetch Size dbspeed (nsec) (nsec) (nsec) (Kb) (ops/sec) TDB (10000 hashsize): 100 records: 3882 3320 1609 53 203204 1000 records: 3651 3281 1571 115 218021 10000 records: 3404 3326 1595 880 202874 100000 records: 4317 3825 2097 8262 126811 1000000 records: 11568 11578 9320 77005 25046 TDB2 (1024 hashsize, expandable): 100 records: 3867 3329 1699 17 187100 1000 records: 4040 3249 1639 154 186255 10000 records: 4143 3300 1695 1226 185110 100000 records: 4481 3425 1800 17848 163483 1000000 records: 4055 3534 1878 106386 160774 NTDB (8192 hashsize) 100 records: 4259 3376 1692 82 190852 1000 records: 3640 3275 1566 130 195106 10000 records: 4337 3438 1614 773 188362 100000 records: 4750 5165 1746 9001 169197 1000000 records: 4897 5180 2341 83838 121901 Analysis: 1) TDB wins on small databases, beating TDB2 by ~15%, NTDB by ~10%. 2) TDB starts to lose when hash chains get 10 long (fetch 10% slower than TDB2/NTDB). 3) TDB does horribly when hash chains get 100 long (fetch 4x slower than NTDB, 5x slower than TDB2, insert about 2-3x slower). 4) TDB2 databases are 40% larger than TDB1. NTDB is about 15% larger than TDB1

2012-06-19 07:13:04 +04:00

								static enum NTDB_ERROR chainlock(struct ntdb_context *ntdb,
							

TDB2: Goodbye TDB2, Hello NTDB. This renames everything from tdb2 to ntdb: importantly, we no longer use the tdb_ namespace, so you can link against both ntdb and tdb if you want to. This also enables building of standalone ntdb by the autobuild script. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

2012-06-18 17:00:26 +04:00

{

ntdb: remove hash table trees. TDB2 started with a top-level hash of 1024 entries, divided into 128 groups of 8 buckets. When a bucket filled, the 8 bucket group expanded into pointers into 8 new 64-entry hash tables. When these filled, they expanded in turn, etc. It's a nice idea to automatically expand the hash tables, but it doesn't pay off. Remove it for NTDB. 1) It only beats TDB performance when the database is huge and the TDB hashsize is small. We are about 20% slower on medium-size databases (1000 to 10000 records), worse on really small ones. 2) Since we're 64 bits, our hash tables are already twice as expensive as TDB. 3) Since our hash function is good, it means that all groups tend to fill at the same time, meaning the hash enlarges by a factor of 128 all at once, leading to a very large database at that point. 4) Our efficiency would improve if we enlarged the top level, but that makes our minimum db size even worse: it's already over 8k, and jumps to 1M after about 1000 entries! 5) Making the sub group size larger gives a shallower tree, which performs better, but makes the "hash explosion" problem worse. 6) The code is complicated, having to handle delete and reshuffling groups of hash buckets, and expansion of buckets. 7) We have to handle the case where all the records somehow end up with the same hash value, which requires special code to chain records for that case. On the other hand, it would be nice if we didn't degrade as badly as TDB does when the hash chains get long. This patch removes the hash-growing code, but instead of chaining like TDB does when a bucket fills, we point the bucket to an array of record pointers. Since each on-disk NTDB pointer contains some hash bits from the record (we steal the upper 8 bits of the offset), 99.5% of the time we don't need to load the record to determine if it matches. This makes an array of offsets much more cache-friendly than a linked list. Here are the times (in ns) for tdb_store of N records, tdb_store of N records the second time, and a fetch of all N records. I've also included the final database size and the smbtorture local.[n]tdb_speed results. Benchmark details: 1) Compiled with -O2. 2) assert() was disabled in TDB2 and NTDB. 3) The "optimize fetch" patch was applied to NTDB. 10 runs, using tmpfs (otherwise massive swapping as db hits ~30M, despite plenty of RAM). Insert Re-ins Fetch Size dbspeed (nsec) (nsec) (nsec) (Kb) (ops/sec) TDB (10000 hashsize): 100 records: 3882 3320 1609 53 203204 1000 records: 3651 3281 1571 115 218021 10000 records: 3404 3326 1595 880 202874 100000 records: 4317 3825 2097 8262 126811 1000000 records: 11568 11578 9320 77005 25046 TDB2 (1024 hashsize, expandable): 100 records: 3867 3329 1699 17 187100 1000 records: 4040 3249 1639 154 186255 10000 records: 4143 3300 1695 1226 185110 100000 records: 4481 3425 1800 17848 163483 1000000 records: 4055 3534 1878 106386 160774 NTDB (8192 hashsize) 100 records: 4259 3376 1692 82 190852 1000 records: 3640 3275 1566 130 195106 10000 records: 4337 3438 1614 773 188362 100000 records: 4750 5165 1746 9001 169197 1000000 records: 4897 5180 2341 83838 121901 Analysis: 1) TDB wins on small databases, beating TDB2 by ~15%, NTDB by ~10%. 2) TDB starts to lose when hash chains get 10 long (fetch 10% slower than TDB2/NTDB). 3) TDB does horribly when hash chains get 100 long (fetch 4x slower than NTDB, 5x slower than TDB2, insert about 2-3x slower). 4) TDB2 databases are 40% larger than TDB1. NTDB is about 15% larger than TDB1

2012-06-19 07:13:04 +04:00

									uint32_t h = ntdb_hash(ntdb, key->dptr, key->dsize);
							

TDB2: Goodbye TDB2, Hello NTDB. This renames everything from tdb2 to ntdb: importantly, we no longer use the tdb_ namespace, so you can link against both ntdb and tdb if you want to. This also enables building of standalone ntdb by the autobuild script. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

2012-06-18 17:00:26 +04:00

ntdb: remove hash table trees. TDB2 started with a top-level hash of 1024 entries, divided into 128 groups of 8 buckets. When a bucket filled, the 8 bucket group expanded into pointers into 8 new 64-entry hash tables. When these filled, they expanded in turn, etc. It's a nice idea to automatically expand the hash tables, but it doesn't pay off. Remove it for NTDB. 1) It only beats TDB performance when the database is huge and the TDB hashsize is small. We are about 20% slower on medium-size databases (1000 to 10000 records), worse on really small ones. 2) Since we're 64 bits, our hash tables are already twice as expensive as TDB. 3) Since our hash function is good, it means that all groups tend to fill at the same time, meaning the hash enlarges by a factor of 128 all at once, leading to a very large database at that point. 4) Our efficiency would improve if we enlarged the top level, but that makes our minimum db size even worse: it's already over 8k, and jumps to 1M after about 1000 entries! 5) Making the sub group size larger gives a shallower tree, which performs better, but makes the "hash explosion" problem worse. 6) The code is complicated, having to handle delete and reshuffling groups of hash buckets, and expansion of buckets. 7) We have to handle the case where all the records somehow end up with the same hash value, which requires special code to chain records for that case. On the other hand, it would be nice if we didn't degrade as badly as TDB does when the hash chains get long. This patch removes the hash-growing code, but instead of chaining like TDB does when a bucket fills, we point the bucket to an array of record pointers. Since each on-disk NTDB pointer contains some hash bits from the record (we steal the upper 8 bits of the offset), 99.5% of the time we don't need to load the record to determine if it matches. This makes an array of offsets much more cache-friendly than a linked list. Here are the times (in ns) for tdb_store of N records, tdb_store of N records the second time, and a fetch of all N records. I've also included the final database size and the smbtorture local.[n]tdb_speed results. Benchmark details: 1) Compiled with -O2. 2) assert() was disabled in TDB2 and NTDB. 3) The "optimize fetch" patch was applied to NTDB. 10 runs, using tmpfs (otherwise massive swapping as db hits ~30M, despite plenty of RAM). Insert Re-ins Fetch Size dbspeed (nsec) (nsec) (nsec) (Kb) (ops/sec) TDB (10000 hashsize): 100 records: 3882 3320 1609 53 203204 1000 records: 3651 3281 1571 115 218021 10000 records: 3404 3326 1595 880 202874 100000 records: 4317 3825 2097 8262 126811 1000000 records: 11568 11578 9320 77005 25046 TDB2 (1024 hashsize, expandable): 100 records: 3867 3329 1699 17 187100 1000 records: 4040 3249 1639 154 186255 10000 records: 4143 3300 1695 1226 185110 100000 records: 4481 3425 1800 17848 163483 1000000 records: 4055 3534 1878 106386 160774 NTDB (8192 hashsize) 100 records: 4259 3376 1692 82 190852 1000 records: 3640 3275 1566 130 195106 10000 records: 4337 3438 1614 773 188362 100000 records: 4750 5165 1746 9001 169197 1000000 records: 4897 5180 2341 83838 121901 Analysis: 1) TDB wins on small databases, beating TDB2 by ~15%, NTDB by ~10%. 2) TDB starts to lose when hash chains get 10 long (fetch 10% slower than TDB2/NTDB). 3) TDB does horribly when hash chains get 100 long (fetch 4x slower than NTDB, 5x slower than TDB2, insert about 2-3x slower). 4) TDB2 databases are 40% larger than TDB1. NTDB is about 15% larger than TDB1

2012-06-19 07:13:04 +04:00

									return ntdb_lock_hash(ntdb, bits_from(h, 0, ntdb->hash_bits), ltype);
							

TDB2: Goodbye TDB2, Hello NTDB. This renames everything from tdb2 to ntdb: importantly, we no longer use the tdb_ namespace, so you can link against both ntdb and tdb if you want to. This also enables building of standalone ntdb by the autobuild script. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

2012-06-18 17:00:26 +04:00

}

ntdb: remove hash table trees. TDB2 started with a top-level hash of 1024 entries, divided into 128 groups of 8 buckets. When a bucket filled, the 8 bucket group expanded into pointers into 8 new 64-entry hash tables. When these filled, they expanded in turn, etc. It's a nice idea to automatically expand the hash tables, but it doesn't pay off. Remove it for NTDB. 1) It only beats TDB performance when the database is huge and the TDB hashsize is small. We are about 20% slower on medium-size databases (1000 to 10000 records), worse on really small ones. 2) Since we're 64 bits, our hash tables are already twice as expensive as TDB. 3) Since our hash function is good, it means that all groups tend to fill at the same time, meaning the hash enlarges by a factor of 128 all at once, leading to a very large database at that point. 4) Our efficiency would improve if we enlarged the top level, but that makes our minimum db size even worse: it's already over 8k, and jumps to 1M after about 1000 entries! 5) Making the sub group size larger gives a shallower tree, which performs better, but makes the "hash explosion" problem worse. 6) The code is complicated, having to handle delete and reshuffling groups of hash buckets, and expansion of buckets. 7) We have to handle the case where all the records somehow end up with the same hash value, which requires special code to chain records for that case. On the other hand, it would be nice if we didn't degrade as badly as TDB does when the hash chains get long. This patch removes the hash-growing code, but instead of chaining like TDB does when a bucket fills, we point the bucket to an array of record pointers. Since each on-disk NTDB pointer contains some hash bits from the record (we steal the upper 8 bits of the offset), 99.5% of the time we don't need to load the record to determine if it matches. This makes an array of offsets much more cache-friendly than a linked list. Here are the times (in ns) for tdb_store of N records, tdb_store of N records the second time, and a fetch of all N records. I've also included the final database size and the smbtorture local.[n]tdb_speed results. Benchmark details: 1) Compiled with -O2. 2) assert() was disabled in TDB2 and NTDB. 3) The "optimize fetch" patch was applied to NTDB. 10 runs, using tmpfs (otherwise massive swapping as db hits ~30M, despite plenty of RAM). Insert Re-ins Fetch Size dbspeed (nsec) (nsec) (nsec) (Kb) (ops/sec) TDB (10000 hashsize): 100 records: 3882 3320 1609 53 203204 1000 records: 3651 3281 1571 115 218021 10000 records: 3404 3326 1595 880 202874 100000 records: 4317 3825 2097 8262 126811 1000000 records: 11568 11578 9320 77005 25046 TDB2 (1024 hashsize, expandable): 100 records: 3867 3329 1699 17 187100 1000 records: 4040 3249 1639 154 186255 10000 records: 4143 3300 1695 1226 185110 100000 records: 4481 3425 1800 17848 163483 1000000 records: 4055 3534 1878 106386 160774 NTDB (8192 hashsize) 100 records: 4259 3376 1692 82 190852 1000 records: 3640 3275 1566 130 195106 10000 records: 4337 3438 1614 773 188362 100000 records: 4750 5165 1746 9001 169197 1000000 records: 4897 5180 2341 83838 121901 Analysis: 1) TDB wins on small databases, beating TDB2 by ~15%, NTDB by ~10%. 2) TDB starts to lose when hash chains get 10 long (fetch 10% slower than TDB2/NTDB). 3) TDB does horribly when hash chains get 100 long (fetch 4x slower than NTDB, 5x slower than TDB2, insert about 2-3x slower). 4) TDB2 databases are 40% larger than TDB1. NTDB is about 15% larger than TDB1

2012-06-19 07:13:04 +04:00

									return chainlock(ntdb, &key, F_WRLCK);
							

TDB2: Goodbye TDB2, Hello NTDB. This renames everything from tdb2 to ntdb: importantly, we no longer use the tdb_ namespace, so you can link against both ntdb and tdb if you want to. This also enables building of standalone ntdb by the autobuild script. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

2012-06-18 17:00:26 +04:00

}

ntdb: remove hash table trees. TDB2 started with a top-level hash of 1024 entries, divided into 128 groups of 8 buckets. When a bucket filled, the 8 bucket group expanded into pointers into 8 new 64-entry hash tables. When these filled, they expanded in turn, etc. It's a nice idea to automatically expand the hash tables, but it doesn't pay off. Remove it for NTDB. 1) It only beats TDB performance when the database is huge and the TDB hashsize is small. We are about 20% slower on medium-size databases (1000 to 10000 records), worse on really small ones. 2) Since we're 64 bits, our hash tables are already twice as expensive as TDB. 3) Since our hash function is good, it means that all groups tend to fill at the same time, meaning the hash enlarges by a factor of 128 all at once, leading to a very large database at that point. 4) Our efficiency would improve if we enlarged the top level, but that makes our minimum db size even worse: it's already over 8k, and jumps to 1M after about 1000 entries! 5) Making the sub group size larger gives a shallower tree, which performs better, but makes the "hash explosion" problem worse. 6) The code is complicated, having to handle delete and reshuffling groups of hash buckets, and expansion of buckets. 7) We have to handle the case where all the records somehow end up with the same hash value, which requires special code to chain records for that case. On the other hand, it would be nice if we didn't degrade as badly as TDB does when the hash chains get long. This patch removes the hash-growing code, but instead of chaining like TDB does when a bucket fills, we point the bucket to an array of record pointers. Since each on-disk NTDB pointer contains some hash bits from the record (we steal the upper 8 bits of the offset), 99.5% of the time we don't need to load the record to determine if it matches. This makes an array of offsets much more cache-friendly than a linked list. Here are the times (in ns) for tdb_store of N records, tdb_store of N records the second time, and a fetch of all N records. I've also included the final database size and the smbtorture local.[n]tdb_speed results. Benchmark details: 1) Compiled with -O2. 2) assert() was disabled in TDB2 and NTDB. 3) The "optimize fetch" patch was applied to NTDB. 10 runs, using tmpfs (otherwise massive swapping as db hits ~30M, despite plenty of RAM). Insert Re-ins Fetch Size dbspeed (nsec) (nsec) (nsec) (Kb) (ops/sec) TDB (10000 hashsize): 100 records: 3882 3320 1609 53 203204 1000 records: 3651 3281 1571 115 218021 10000 records: 3404 3326 1595 880 202874 100000 records: 4317 3825 2097 8262 126811 1000000 records: 11568 11578 9320 77005 25046 TDB2 (1024 hashsize, expandable): 100 records: 3867 3329 1699 17 187100 1000 records: 4040 3249 1639 154 186255 10000 records: 4143 3300 1695 1226 185110 100000 records: 4481 3425 1800 17848 163483 1000000 records: 4055 3534 1878 106386 160774 NTDB (8192 hashsize) 100 records: 4259 3376 1692 82 190852 1000 records: 3640 3275 1566 130 195106 10000 records: 4337 3438 1614 773 188362 100000 records: 4750 5165 1746 9001 169197 1000000 records: 4897 5180 2341 83838 121901 Analysis: 1) TDB wins on small databases, beating TDB2 by ~15%, NTDB by ~10%. 2) TDB starts to lose when hash chains get 10 long (fetch 10% slower than TDB2/NTDB). 3) TDB does horribly when hash chains get 100 long (fetch 4x slower than NTDB, 5x slower than TDB2, insert about 2-3x slower). 4) TDB2 databases are 40% larger than TDB1. NTDB is about 15% larger than TDB1

2012-06-19 07:13:04 +04:00

									uint32_t h = ntdb_hash(ntdb, key.dptr, key.dsize);
							

TDB2: Goodbye TDB2, Hello NTDB. This renames everything from tdb2 to ntdb: importantly, we no longer use the tdb_ namespace, so you can link against both ntdb and tdb if you want to. This also enables building of standalone ntdb by the autobuild script. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

2012-06-18 17:00:26 +04:00

ntdb: remove hash table trees. TDB2 started with a top-level hash of 1024 entries, divided into 128 groups of 8 buckets. When a bucket filled, the 8 bucket group expanded into pointers into 8 new 64-entry hash tables. When these filled, they expanded in turn, etc. It's a nice idea to automatically expand the hash tables, but it doesn't pay off. Remove it for NTDB. 1) It only beats TDB performance when the database is huge and the TDB hashsize is small. We are about 20% slower on medium-size databases (1000 to 10000 records), worse on really small ones. 2) Since we're 64 bits, our hash tables are already twice as expensive as TDB. 3) Since our hash function is good, it means that all groups tend to fill at the same time, meaning the hash enlarges by a factor of 128 all at once, leading to a very large database at that point. 4) Our efficiency would improve if we enlarged the top level, but that makes our minimum db size even worse: it's already over 8k, and jumps to 1M after about 1000 entries! 5) Making the sub group size larger gives a shallower tree, which performs better, but makes the "hash explosion" problem worse. 6) The code is complicated, having to handle delete and reshuffling groups of hash buckets, and expansion of buckets. 7) We have to handle the case where all the records somehow end up with the same hash value, which requires special code to chain records for that case. On the other hand, it would be nice if we didn't degrade as badly as TDB does when the hash chains get long. This patch removes the hash-growing code, but instead of chaining like TDB does when a bucket fills, we point the bucket to an array of record pointers. Since each on-disk NTDB pointer contains some hash bits from the record (we steal the upper 8 bits of the offset), 99.5% of the time we don't need to load the record to determine if it matches. This makes an array of offsets much more cache-friendly than a linked list. Here are the times (in ns) for tdb_store of N records, tdb_store of N records the second time, and a fetch of all N records. I've also included the final database size and the smbtorture local.[n]tdb_speed results. Benchmark details: 1) Compiled with -O2. 2) assert() was disabled in TDB2 and NTDB. 3) The "optimize fetch" patch was applied to NTDB. 10 runs, using tmpfs (otherwise massive swapping as db hits ~30M, despite plenty of RAM). Insert Re-ins Fetch Size dbspeed (nsec) (nsec) (nsec) (Kb) (ops/sec) TDB (10000 hashsize): 100 records: 3882 3320 1609 53 203204 1000 records: 3651 3281 1571 115 218021 10000 records: 3404 3326 1595 880 202874 100000 records: 4317 3825 2097 8262 126811 1000000 records: 11568 11578 9320 77005 25046 TDB2 (1024 hashsize, expandable): 100 records: 3867 3329 1699 17 187100 1000 records: 4040 3249 1639 154 186255 10000 records: 4143 3300 1695 1226 185110 100000 records: 4481 3425 1800 17848 163483 1000000 records: 4055 3534 1878 106386 160774 NTDB (8192 hashsize) 100 records: 4259 3376 1692 82 190852 1000 records: 3640 3275 1566 130 195106 10000 records: 4337 3438 1614 773 188362 100000 records: 4750 5165 1746 9001 169197 1000000 records: 4897 5180 2341 83838 121901 Analysis: 1) TDB wins on small databases, beating TDB2 by ~15%, NTDB by ~10%. 2) TDB starts to lose when hash chains get 10 long (fetch 10% slower than TDB2/NTDB). 3) TDB does horribly when hash chains get 100 long (fetch 4x slower than NTDB, 5x slower than TDB2, insert about 2-3x slower). 4) TDB2 databases are 40% larger than TDB1. NTDB is about 15% larger than TDB1

2012-06-19 07:13:04 +04:00

									ntdb_unlock_hash(ntdb, bits_from(h, 0, ntdb->hash_bits), F_WRLCK);
							

TDB2: Goodbye TDB2, Hello NTDB. This renames everything from tdb2 to ntdb: importantly, we no longer use the tdb_ namespace, so you can link against both ntdb and tdb if you want to. This also enables building of standalone ntdb by the autobuild script. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

2012-06-18 17:00:26 +04:00

}

ntdb: remove hash table trees. TDB2 started with a top-level hash of 1024 entries, divided into 128 groups of 8 buckets. When a bucket filled, the 8 bucket group expanded into pointers into 8 new 64-entry hash tables. When these filled, they expanded in turn, etc. It's a nice idea to automatically expand the hash tables, but it doesn't pay off. Remove it for NTDB. 1) It only beats TDB performance when the database is huge and the TDB hashsize is small. We are about 20% slower on medium-size databases (1000 to 10000 records), worse on really small ones. 2) Since we're 64 bits, our hash tables are already twice as expensive as TDB. 3) Since our hash function is good, it means that all groups tend to fill at the same time, meaning the hash enlarges by a factor of 128 all at once, leading to a very large database at that point. 4) Our efficiency would improve if we enlarged the top level, but that makes our minimum db size even worse: it's already over 8k, and jumps to 1M after about 1000 entries! 5) Making the sub group size larger gives a shallower tree, which performs better, but makes the "hash explosion" problem worse. 6) The code is complicated, having to handle delete and reshuffling groups of hash buckets, and expansion of buckets. 7) We have to handle the case where all the records somehow end up with the same hash value, which requires special code to chain records for that case. On the other hand, it would be nice if we didn't degrade as badly as TDB does when the hash chains get long. This patch removes the hash-growing code, but instead of chaining like TDB does when a bucket fills, we point the bucket to an array of record pointers. Since each on-disk NTDB pointer contains some hash bits from the record (we steal the upper 8 bits of the offset), 99.5% of the time we don't need to load the record to determine if it matches. This makes an array of offsets much more cache-friendly than a linked list. Here are the times (in ns) for tdb_store of N records, tdb_store of N records the second time, and a fetch of all N records. I've also included the final database size and the smbtorture local.[n]tdb_speed results. Benchmark details: 1) Compiled with -O2. 2) assert() was disabled in TDB2 and NTDB. 3) The "optimize fetch" patch was applied to NTDB. 10 runs, using tmpfs (otherwise massive swapping as db hits ~30M, despite plenty of RAM). Insert Re-ins Fetch Size dbspeed (nsec) (nsec) (nsec) (Kb) (ops/sec) TDB (10000 hashsize): 100 records: 3882 3320 1609 53 203204 1000 records: 3651 3281 1571 115 218021 10000 records: 3404 3326 1595 880 202874 100000 records: 4317 3825 2097 8262 126811 1000000 records: 11568 11578 9320 77005 25046 TDB2 (1024 hashsize, expandable): 100 records: 3867 3329 1699 17 187100 1000 records: 4040 3249 1639 154 186255 10000 records: 4143 3300 1695 1226 185110 100000 records: 4481 3425 1800 17848 163483 1000000 records: 4055 3534 1878 106386 160774 NTDB (8192 hashsize) 100 records: 4259 3376 1692 82 190852 1000 records: 3640 3275 1566 130 195106 10000 records: 4337 3438 1614 773 188362 100000 records: 4750 5165 1746 9001 169197 1000000 records: 4897 5180 2341 83838 121901 Analysis: 1) TDB wins on small databases, beating TDB2 by ~15%, NTDB by ~10%. 2) TDB starts to lose when hash chains get 10 long (fetch 10% slower than TDB2/NTDB). 3) TDB does horribly when hash chains get 100 long (fetch 4x slower than NTDB, 5x slower than TDB2, insert about 2-3x slower). 4) TDB2 databases are 40% larger than TDB1. NTDB is about 15% larger than TDB1

2012-06-19 07:13:04 +04:00

								_PUBLIC_ enum NTDB_ERROR ntdb_chainlock_read(struct ntdb_context *ntdb,
							

TDB2: Goodbye TDB2, Hello NTDB. This renames everything from tdb2 to ntdb: importantly, we no longer use the tdb_ namespace, so you can link against both ntdb and tdb if you want to. This also enables building of standalone ntdb by the autobuild script. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

2012-06-18 17:00:26 +04:00

{

ntdb: remove hash table trees. TDB2 started with a top-level hash of 1024 entries, divided into 128 groups of 8 buckets. When a bucket filled, the 8 bucket group expanded into pointers into 8 new 64-entry hash tables. When these filled, they expanded in turn, etc. It's a nice idea to automatically expand the hash tables, but it doesn't pay off. Remove it for NTDB. 1) It only beats TDB performance when the database is huge and the TDB hashsize is small. We are about 20% slower on medium-size databases (1000 to 10000 records), worse on really small ones. 2) Since we're 64 bits, our hash tables are already twice as expensive as TDB. 3) Since our hash function is good, it means that all groups tend to fill at the same time, meaning the hash enlarges by a factor of 128 all at once, leading to a very large database at that point. 4) Our efficiency would improve if we enlarged the top level, but that makes our minimum db size even worse: it's already over 8k, and jumps to 1M after about 1000 entries! 5) Making the sub group size larger gives a shallower tree, which performs better, but makes the "hash explosion" problem worse. 6) The code is complicated, having to handle delete and reshuffling groups of hash buckets, and expansion of buckets. 7) We have to handle the case where all the records somehow end up with the same hash value, which requires special code to chain records for that case. On the other hand, it would be nice if we didn't degrade as badly as TDB does when the hash chains get long. This patch removes the hash-growing code, but instead of chaining like TDB does when a bucket fills, we point the bucket to an array of record pointers. Since each on-disk NTDB pointer contains some hash bits from the record (we steal the upper 8 bits of the offset), 99.5% of the time we don't need to load the record to determine if it matches. This makes an array of offsets much more cache-friendly than a linked list. Here are the times (in ns) for tdb_store of N records, tdb_store of N records the second time, and a fetch of all N records. I've also included the final database size and the smbtorture local.[n]tdb_speed results. Benchmark details: 1) Compiled with -O2. 2) assert() was disabled in TDB2 and NTDB. 3) The "optimize fetch" patch was applied to NTDB. 10 runs, using tmpfs (otherwise massive swapping as db hits ~30M, despite plenty of RAM). Insert Re-ins Fetch Size dbspeed (nsec) (nsec) (nsec) (Kb) (ops/sec) TDB (10000 hashsize): 100 records: 3882 3320 1609 53 203204 1000 records: 3651 3281 1571 115 218021 10000 records: 3404 3326 1595 880 202874 100000 records: 4317 3825 2097 8262 126811 1000000 records: 11568 11578 9320 77005 25046 TDB2 (1024 hashsize, expandable): 100 records: 3867 3329 1699 17 187100 1000 records: 4040 3249 1639 154 186255 10000 records: 4143 3300 1695 1226 185110 100000 records: 4481 3425 1800 17848 163483 1000000 records: 4055 3534 1878 106386 160774 NTDB (8192 hashsize) 100 records: 4259 3376 1692 82 190852 1000 records: 3640 3275 1566 130 195106 10000 records: 4337 3438 1614 773 188362 100000 records: 4750 5165 1746 9001 169197 1000000 records: 4897 5180 2341 83838 121901 Analysis: 1) TDB wins on small databases, beating TDB2 by ~15%, NTDB by ~10%. 2) TDB starts to lose when hash chains get 10 long (fetch 10% slower than TDB2/NTDB). 3) TDB does horribly when hash chains get 100 long (fetch 4x slower than NTDB, 5x slower than TDB2, insert about 2-3x slower). 4) TDB2 databases are 40% larger than TDB1. NTDB is about 15% larger than TDB1

2012-06-19 07:13:04 +04:00

									return chainlock(ntdb, &key, F_RDLCK);
							

TDB2: Goodbye TDB2, Hello NTDB. This renames everything from tdb2 to ntdb: importantly, we no longer use the tdb_ namespace, so you can link against both ntdb and tdb if you want to. This also enables building of standalone ntdb by the autobuild script. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

2012-06-18 17:00:26 +04:00

}

ntdb: remove hash table trees. TDB2 started with a top-level hash of 1024 entries, divided into 128 groups of 8 buckets. When a bucket filled, the 8 bucket group expanded into pointers into 8 new 64-entry hash tables. When these filled, they expanded in turn, etc. It's a nice idea to automatically expand the hash tables, but it doesn't pay off. Remove it for NTDB. 1) It only beats TDB performance when the database is huge and the TDB hashsize is small. We are about 20% slower on medium-size databases (1000 to 10000 records), worse on really small ones. 2) Since we're 64 bits, our hash tables are already twice as expensive as TDB. 3) Since our hash function is good, it means that all groups tend to fill at the same time, meaning the hash enlarges by a factor of 128 all at once, leading to a very large database at that point. 4) Our efficiency would improve if we enlarged the top level, but that makes our minimum db size even worse: it's already over 8k, and jumps to 1M after about 1000 entries! 5) Making the sub group size larger gives a shallower tree, which performs better, but makes the "hash explosion" problem worse. 6) The code is complicated, having to handle delete and reshuffling groups of hash buckets, and expansion of buckets. 7) We have to handle the case where all the records somehow end up with the same hash value, which requires special code to chain records for that case. On the other hand, it would be nice if we didn't degrade as badly as TDB does when the hash chains get long. This patch removes the hash-growing code, but instead of chaining like TDB does when a bucket fills, we point the bucket to an array of record pointers. Since each on-disk NTDB pointer contains some hash bits from the record (we steal the upper 8 bits of the offset), 99.5% of the time we don't need to load the record to determine if it matches. This makes an array of offsets much more cache-friendly than a linked list. Here are the times (in ns) for tdb_store of N records, tdb_store of N records the second time, and a fetch of all N records. I've also included the final database size and the smbtorture local.[n]tdb_speed results. Benchmark details: 1) Compiled with -O2. 2) assert() was disabled in TDB2 and NTDB. 3) The "optimize fetch" patch was applied to NTDB. 10 runs, using tmpfs (otherwise massive swapping as db hits ~30M, despite plenty of RAM). Insert Re-ins Fetch Size dbspeed (nsec) (nsec) (nsec) (Kb) (ops/sec) TDB (10000 hashsize): 100 records: 3882 3320 1609 53 203204 1000 records: 3651 3281 1571 115 218021 10000 records: 3404 3326 1595 880 202874 100000 records: 4317 3825 2097 8262 126811 1000000 records: 11568 11578 9320 77005 25046 TDB2 (1024 hashsize, expandable): 100 records: 3867 3329 1699 17 187100 1000 records: 4040 3249 1639 154 186255 10000 records: 4143 3300 1695 1226 185110 100000 records: 4481 3425 1800 17848 163483 1000000 records: 4055 3534 1878 106386 160774 NTDB (8192 hashsize) 100 records: 4259 3376 1692 82 190852 1000 records: 3640 3275 1566 130 195106 10000 records: 4337 3438 1614 773 188362 100000 records: 4750 5165 1746 9001 169197 1000000 records: 4897 5180 2341 83838 121901 Analysis: 1) TDB wins on small databases, beating TDB2 by ~15%, NTDB by ~10%. 2) TDB starts to lose when hash chains get 10 long (fetch 10% slower than TDB2/NTDB). 3) TDB does horribly when hash chains get 100 long (fetch 4x slower than NTDB, 5x slower than TDB2, insert about 2-3x slower). 4) TDB2 databases are 40% larger than TDB1. NTDB is about 15% larger than TDB1

2012-06-19 07:13:04 +04:00

									uint32_t h = ntdb_hash(ntdb, key.dptr, key.dsize);
							

TDB2: Goodbye TDB2, Hello NTDB. This renames everything from tdb2 to ntdb: importantly, we no longer use the tdb_ namespace, so you can link against both ntdb and tdb if you want to. This also enables building of standalone ntdb by the autobuild script. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

2012-06-18 17:00:26 +04:00

ntdb: remove hash table trees. TDB2 started with a top-level hash of 1024 entries, divided into 128 groups of 8 buckets. When a bucket filled, the 8 bucket group expanded into pointers into 8 new 64-entry hash tables. When these filled, they expanded in turn, etc. It's a nice idea to automatically expand the hash tables, but it doesn't pay off. Remove it for NTDB. 1) It only beats TDB performance when the database is huge and the TDB hashsize is small. We are about 20% slower on medium-size databases (1000 to 10000 records), worse on really small ones. 2) Since we're 64 bits, our hash tables are already twice as expensive as TDB. 3) Since our hash function is good, it means that all groups tend to fill at the same time, meaning the hash enlarges by a factor of 128 all at once, leading to a very large database at that point. 4) Our efficiency would improve if we enlarged the top level, but that makes our minimum db size even worse: it's already over 8k, and jumps to 1M after about 1000 entries! 5) Making the sub group size larger gives a shallower tree, which performs better, but makes the "hash explosion" problem worse. 6) The code is complicated, having to handle delete and reshuffling groups of hash buckets, and expansion of buckets. 7) We have to handle the case where all the records somehow end up with the same hash value, which requires special code to chain records for that case. On the other hand, it would be nice if we didn't degrade as badly as TDB does when the hash chains get long. This patch removes the hash-growing code, but instead of chaining like TDB does when a bucket fills, we point the bucket to an array of record pointers. Since each on-disk NTDB pointer contains some hash bits from the record (we steal the upper 8 bits of the offset), 99.5% of the time we don't need to load the record to determine if it matches. This makes an array of offsets much more cache-friendly than a linked list. Here are the times (in ns) for tdb_store of N records, tdb_store of N records the second time, and a fetch of all N records. I've also included the final database size and the smbtorture local.[n]tdb_speed results. Benchmark details: 1) Compiled with -O2. 2) assert() was disabled in TDB2 and NTDB. 3) The "optimize fetch" patch was applied to NTDB. 10 runs, using tmpfs (otherwise massive swapping as db hits ~30M, despite plenty of RAM). Insert Re-ins Fetch Size dbspeed (nsec) (nsec) (nsec) (Kb) (ops/sec) TDB (10000 hashsize): 100 records: 3882 3320 1609 53 203204 1000 records: 3651 3281 1571 115 218021 10000 records: 3404 3326 1595 880 202874 100000 records: 4317 3825 2097 8262 126811 1000000 records: 11568 11578 9320 77005 25046 TDB2 (1024 hashsize, expandable): 100 records: 3867 3329 1699 17 187100 1000 records: 4040 3249 1639 154 186255 10000 records: 4143 3300 1695 1226 185110 100000 records: 4481 3425 1800 17848 163483 1000000 records: 4055 3534 1878 106386 160774 NTDB (8192 hashsize) 100 records: 4259 3376 1692 82 190852 1000 records: 3640 3275 1566 130 195106 10000 records: 4337 3438 1614 773 188362 100000 records: 4750 5165 1746 9001 169197 1000000 records: 4897 5180 2341 83838 121901 Analysis: 1) TDB wins on small databases, beating TDB2 by ~15%, NTDB by ~10%. 2) TDB starts to lose when hash chains get 10 long (fetch 10% slower than TDB2/NTDB). 3) TDB does horribly when hash chains get 100 long (fetch 4x slower than NTDB, 5x slower than TDB2, insert about 2-3x slower). 4) TDB2 databases are 40% larger than TDB1. NTDB is about 15% larger than TDB1

2012-06-19 07:13:04 +04:00

									ntdb_unlock_hash(ntdb, bits_from(h, 0, ntdb->hash_bits), F_RDLCK);
							

TDB2: Goodbye TDB2, Hello NTDB. This renames everything from tdb2 to ntdb: importantly, we no longer use the tdb_ namespace, so you can link against both ntdb and tdb if you want to. This also enables building of standalone ntdb by the autobuild script. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

2012-06-18 17:00:26 +04:00

}

625 lines 16 KiB C Raw Normal View History Unescape Escape

625 lines

16 KiB

C

Raw Normal View History