mirror of https://github.com/samba-team/samba.git synced 2024-12-23 17:34:34 +03:00
275 Commits

Author SHA1 Message Date
Michael Adam
56f9231c8e tdb: use tdb_freelist_merge_adjacent in tdb_freelist_size()
So that we automatically defragment the free list when freelist_size is called
(unless the database is read only).

Signed-off-by: Michael Adam <obnox@samba.org>
Reviewed-by: Volker Lendecke <vl@samba.org>
2014-06-26 10:00:11 +02:00
Michael Adam
843a8a5c7b tdb: add tdb_freelist_merge_adjacent()
This is intended to be called to reduce fragmentation in the
freelist. It makes up for the freelist not being doubly linked:
with a doubly linked freelist, we could easily avoid creating
adjacent freelist entries. With the current singly linked list,
it is only possible to cheaply merge a new free record into a
freelist entry on the left, not on the right...

This can be called periodically, e.g. in the vacuuming process
of a ctdb cluster.

Signed-off-by: Michael Adam <obnox@samba.org>
Reviewed-by: Volker Lendecke <vl@samba.org>
2014-06-26 10:00:11 +02:00
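As an illustration of what merging adjacent free records means for 843a8a5c7b, here is a minimal in-memory sketch. It is not the tdb implementation: `struct free_range` and `merge_adjacent()` are hypothetical names, and the sketch assumes the ranges are already sorted by offset, which the real singly linked freelist does not guarantee.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of a free record: a byte range in the file. */
struct free_range {
	uint32_t off; /* start offset */
	uint32_t len; /* length in bytes */
};

/*
 * Merge ranges that touch each other. The input must be sorted by
 * offset and must not overlap. Returns the new number of ranges.
 */
size_t merge_adjacent(struct free_range *r, size_t n)
{
	size_t out = 0;
	size_t i;

	if (n == 0) {
		return 0;
	}
	for (i = 1; i < n; i++) {
		/* Precondition: sorted and non-overlapping input. */
		assert(r[i].off >= r[out].off + r[out].len);
		if (r[out].off + r[out].len == r[i].off) {
			/* r[i] starts where r[out] ends: grow r[out]. */
			r[out].len += r[i].len;
		} else {
			out++;
			r[out] = r[i];
		}
	}
	return out + 1;
}
```

Fewer, larger free ranges make it more likely that a later allocation finds a single record that is big enough.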
Michael Adam
73c439f581 tdb: add utility function check_merge_ptr_with_left_record()
Variant of check_merge_with_left_record() that reads the record
itself if necessary.

Signed-off-by: Michael Adam <obnox@samba.org>
Reviewed-by: Volker Lendecke <vl@samba.org>
2014-06-26 10:00:11 +02:00
Michael Adam
4bec28bfa9 tdb: simplify tdb_free() using check_merge_with_left_record()
Signed-off-by: Michael Adam <obnox@samba.org>
Reviewed-by: Volker Lendecke <vl@samba.org>
2014-06-26 10:00:11 +02:00
Michael Adam
117807cd2d tdb: add utility function check_merge_with_left_record()
Check whether the record left of a given freelist record is
also a freelist record, and if so, merge the two records.

Signed-off-by: Michael Adam <obnox@samba.org>
Reviewed-by: Volker Lendecke <vl@samba.org>
2014-06-26 10:00:11 +02:00
Michael Adam
66f3330be8 tdb: improve comments for tdb_free().
Signed-off-by: Michael Adam <obnox@samba.org>
Reviewed-by: Volker Lendecke <vl@samba.org>
2014-06-26 10:00:11 +02:00
Michael Adam
8be5c8a6db tdb: factor merge_with_left_record() out of tdb_free()
Signed-off-by: Michael Adam <obnox@samba.org>
Reviewed-by: Volker Lendecke <vl@samba.org>
2014-06-26 10:00:11 +02:00
Michael Adam
63673aea9f tdb: fix debug message in tdb_free()
Signed-off-by: Michael Adam <obnox@samba.org>
Reviewed-by: Volker Lendecke <vl@samba.org>
2014-06-26 10:00:11 +02:00
Michael Adam
08a76aabe9 tdb: reduce indentation in tdb_free() for merging left
Signed-off-by: Michael Adam <obnox@samba.org>
Reviewed-by: Volker Lendecke <vl@samba.org>
2014-06-26 10:00:11 +02:00
Michael Adam
87ac4ac523 tdb: increase readability of read_record_on_left()
by using early returns and better variable names,
and reducing indentation.

Signed-off-by: Michael Adam <obnox@samba.org>
Reviewed-by: Volker Lendecke <vl@samba.org>
2014-06-26 10:00:11 +02:00
Michael Adam
f5a777a36c tdb: factor read_record_on_left() out of tdb_free()
Signed-off-by: Michael Adam <obnox@samba.org>
Reviewed-by: Volker Lendecke <vl@samba.org>
2014-06-26 10:00:11 +02:00
Volker Lendecke
db5bda56bf tdb: add TDB_MUTEX_LOCKING support
This adds optional support for locking based on
shared robust mutexes.

The caller can use the TDB_MUTEX_LOCKING flag
together with TDB_CLEAR_IF_FIRST after verifying
with tdb_runtime_check_for_robust_mutexes() that
it's supported by the current system.

The caller should be aware that using TDB_MUTEX_LOCKING
implies some limitations, e.g. it's not possible to
have multiple read chainlocks on a given hash chain
from multiple processes.

Note that this doesn't make tdb thread safe!

Pair-Programmed-With: Stefan Metzmacher <metze@samba.org>
Pair-Programmed-With: Michael Adam <obnox@samba.org>
Signed-off-by: Volker Lendecke <vl@samba.org>
Signed-off-by: Stefan Metzmacher <metze@samba.org>
Signed-off-by: Michael Adam <obnox@samba.org>
Reviewed-by: Jeremy Allison <jra@samba.org>
2014-05-22 21:05:15 +02:00
Volker Lendecke
cbd73ba163 tdb: introduce tdb->hdr_ofs
This makes it possible to have some extra headers before
the real tdb content starts in the file.

This will be used e.g. to implement locking based on robust mutexes.

Pair-Programmed-With: Stefan Metzmacher <metze@samba.org>
Pair-Programmed-With: Michael Adam <obnox@samba.org>
Signed-off-by: Volker Lendecke <vl@samba.org>
Signed-off-by: Stefan Metzmacher <metze@samba.org>
Signed-off-by: Michael Adam <obnox@samba.org>
Reviewed-by: Jeremy Allison <jra@samba.org>
2014-05-22 21:05:15 +02:00
Stefan Metzmacher
c29e64d97e tdb: introduce TDB_SUPPORTED_FEATURE_FLAGS
This will allow storing a feature mask in the tdb header on disk,
so that openers can check if they can handle the features
other openers are using.

Pair-Programmed-With: Volker Lendecke <vl@samba.org>
Pair-Programmed-With: Michael Adam <obnox@samba.org>
Signed-off-by: Stefan Metzmacher <metze@samba.org>
Signed-off-by: Volker Lendecke <vl@samba.org>
Signed-off-by: Michael Adam <obnox@samba.org>
Reviewed-by: Jeremy Allison <jra@samba.org>
2014-05-22 21:05:15 +02:00
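The feature-mask check described in c29e64d97e can be sketched as follows. The bit values and the names `EX_FEATURE_A`, `EX_FEATURE_B`, `EX_SUPPORTED_FEATURES` and `features_supported()` are assumptions made for illustration; the real flag values and the real check live in the tdb sources.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical feature bits, chosen only for this example. */
#define EX_FEATURE_A 0x00000001u
#define EX_FEATURE_B 0x00000002u
#define EX_SUPPORTED_FEATURES (EX_FEATURE_A | EX_FEATURE_B)

/* An opener can handle the file only if it knows every bit that is set. */
bool features_supported(uint32_t on_disk_flags)
{
	return (on_disk_flags & ~EX_SUPPORTED_FEATURES) == 0;
}
```

An opener built before a new feature bit existed would see an unknown bit in the mask and could refuse to open the file instead of misreading it.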
Stefan Metzmacher
c0b0648555 tdb: use asprintf() to simplify tdb_summary()
Signed-off-by: Stefan Metzmacher <metze@samba.org>
Reviewed-by: Michael Adam <obnox@samba.org>
Reviewed-by: Jeremy Allison <jra@samba.org>
2014-05-22 21:05:15 +02:00
Stefan Metzmacher
e77cbe252f tdb: return ENOSYS if the tdb was created with spinlocks.
Signed-off-by: Stefan Metzmacher <metze@samba.org>
Reviewed-by: Michael Adam <obnox@samba.org>

Autobuild-User(master): Stefan Metzmacher <metze@samba.org>
Autobuild-Date(master): Mon May 12 21:07:04 CEST 2014 on sn-devel-104
2014-05-12 21:07:04 +02:00
Michael Adam
d9566085c6 tdb: consolidate tdb allocation code - re-use dead records at hash top.
When in tdb_store we re-use a dead record reactivated from the
target hash chain itself, we currently leave it in its place in
the chain. When we re-use a dead record from a different chain or
from the freelist instead, we insert it at the beginning of the
target chain.

This patch changes the behaviour to always newly store a
record at the beginning of the hash chain. This removes
a special case and hence simplifies the allocation code.
On the other hand, it introduces two additional tdb_ofs_write
calls for the in-chain-case.

Note the subtlety of the patch that by moving the case of the candidate
record's chain as new case "i=0" into the for loop, we also reverse the
order of the two steps in the for-loop body (non blocking freelist alloc
and searching for dead record in a chain) in order to keep the overall
order of execution identical.

Signed-off-by: Michael Adam <obnox@samba.org>
Reviewed-by: Volker Lendecke <vl@samba.org>
Reviewed-by: Jeremy Allison <jra@samba.org>

Autobuild-User(master): Michael Adam <obnox@samba.org>
Autobuild-Date(master): Wed Apr  9 10:37:08 CEST 2014 on sn-devel-104
2014-04-09 10:37:08 +02:00
Stefan Metzmacher
80dff80ee9 tdb: don't alter errno on success of tdb_open_ex()
Signed-off-by: Stefan Metzmacher <metze@samba.org>
Reviewed-by: Andrew Bartlett <abartlet@samba.org>
2014-04-02 09:03:42 +02:00
Volker Lendecke
3034a5a62b tdb: Reduce freelist contention
In a metadata-intensive benchmark we have seen the locking.tdb freelist to be
one of the central contention points. This patch removes most of the contention
on the freelist. Ages ago we already reduced freelist contention by using the
even much older DEAD records: If TDB_VOLATILE is set, don't directly put
deleted records on the freelist, but just mark a few of them as DEAD. The
next new record can then re-use that space without consulting the freelist.

This patch builds upon the DEAD records: If we need space and the freelist is
busy, instead of doing a blocking wait on the freelist, start looking into
other chains for DEAD records and steal them from there. This way every hash
chain becomes a small freelist. Just wander around the hash chains as long as
the freelist is still busy.

With this patch and the tdb mutex patch (following hopefully some time soon)
you can see a heavily busy clustered smbd run without locking.tdb futex
syscalls.

Signed-off-by: Volker Lendecke <vl@samba.org>
Reviewed-by: Michael Adam <obnox@samba.org>
Reviewed-by: Stefan Metzmacher <metze@samba.org>
2014-03-18 13:42:10 +01:00
Volker Lendecke
1461362e93 tdb: Make "tdb_purge_dead" internally public
Signed-off-by: Volker Lendecke <vl@samba.org>
Reviewed-by: Michael Adam <obnox@samba.org>
Reviewed-by: Stefan Metzmacher <metze@samba.org>
2014-03-18 13:42:10 +01:00
Volker Lendecke
92ce9fd9af tdb: Make "tdb_find_dead" internally public
Signed-off-by: Volker Lendecke <vl@samba.org>
Reviewed-by: Michael Adam <obnox@samba.org>
Reviewed-by: Stefan Metzmacher <metze@samba.org>
2014-03-18 13:42:10 +01:00
Volker Lendecke
4ca018692f tdb: Add "last_ptr" to tdb_find_dead
Will be used soon to unlink a dead record from a chain

Signed-off-by: Volker Lendecke <vl@samba.org>
Reviewed-by: Michael Adam <obnox@samba.org>
Reviewed-by: Stefan Metzmacher <metze@samba.org>
2014-03-18 13:42:10 +01:00
Volker Lendecke
cb09d7937c tdb: Move adding tailer space to tdb_find_dead
This aligns the tdb_find_dead API with the tdb_allocate API and thus makes it a
bit easier to understand, at least for me.

Signed-off-by: Volker Lendecke <vl@samba.org>
Reviewed-by: Michael Adam <obnox@samba.org>
Reviewed-by: Stefan Metzmacher <metze@samba.org>
2014-03-18 13:42:10 +01:00
Volker Lendecke
255edd1b41 tdb: Do a best fit search for dead records
Hash chains are (or can be made) short enough that a full search for the
best-fitting dead record is feasible. The freelist can become much longer;
there we don't do the full search but accept records which are too large.

Signed-off-by: Volker Lendecke <vl@samba.org>
Reviewed-by: Michael Adam <obnox@samba.org>
Reviewed-by: Stefan Metzmacher <metze@samba.org>
2014-03-18 13:42:10 +01:00
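A best-fit search over a short chain, as described in 255edd1b41, might look like the sketch below. It works on a hypothetical in-memory list (`struct dead_rec`, `find_best_fit()`), not on tdb's on-disk records, and it ignores the tailer space that the real code has to account for.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical in-memory stand-in for a dead record on a hash chain. */
struct dead_rec {
	uint32_t rec_len;
	struct dead_rec *next;
};

/*
 * Walk the whole chain and return the smallest record that still
 * holds `length` bytes, or NULL if none is large enough.
 */
struct dead_rec *find_best_fit(struct dead_rec *head, uint32_t length)
{
	struct dead_rec *best = NULL;
	struct dead_rec *r;

	for (r = head; r != NULL; r = r->next) {
		if (r->rec_len < length) {
			continue; /* too small */
		}
		if (best == NULL || r->rec_len < best->rec_len) {
			best = r;
		}
	}
	return best;
}
```

The full walk is linear in the chain length, which is why it suits short hash chains better than a long freelist.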
Volker Lendecke
d1ce0110f0 tdb: Don't purge records to a blocked freelist
If the freelist is heavily contended, we should avoid accessing it

Signed-off-by: Volker Lendecke <vl@samba.org>
Reviewed-by: Michael Adam <obnox@samba.org>
Reviewed-by: Stefan Metzmacher <metze@samba.org>
2014-03-18 13:42:10 +01:00
Volker Lendecke
5f7b481349 tdb: Fix a tdb corruption
tdb_purge_dead can change the next pointer of "rec" if we purge the record
right behind the current record to be deleted. Just overwrite the magic,
not the whole record with stale data.

Signed-off-by: Volker Lendecke <vl@samba.org>
Reviewed-by: Michael Adam <obnox@samba.org>
Reviewed-by: Stefan Metzmacher <metze@samba.org>
2014-03-18 13:42:10 +01:00
Michael Adam
001b9582cc tdb: always open internal databases with incompatible hash.
This makes them more efficient due to better distribution
of keys across hash chains.

Signed-off-by: Michael Adam <obnox@samba.org>
Reviewed-by: Jeremy Allison <jra@samba.org>

Autobuild-User(master): Jeremy Allison <jra@samba.org>
Autobuild-Date(master): Sat Feb 15 08:26:07 CET 2014 on sn-devel-104
2014-02-15 08:26:06 +01:00
Michael Adam
41b7acacb3 tdb: in tdb_delete_hash, make lock/unlock bracket more obvious
by using the same variable as hash as in the lock.

Signed-off-by: Michael Adam <obnox@samba.org>
Reviewed-by: Jeremy Allison <jra@samba.org>

Autobuild-User(master): Jeremy Allison <jra@samba.org>
Autobuild-Date(master): Sat Feb 15 03:21:07 CET 2014 on sn-devel-104
2014-02-15 03:21:07 +01:00
Michael Adam
cde8e290c9 tdb: simplify tdb_delete_hash() a bit
Make the lock/unlock bracket more obvious by extracting
locking (and finding) from the special cases to the top
of the function. This also lets us take lock and find
the record outside the special case branches (use dead
records or not).

There is a small semantic change implied:

In the dead records case, the record to delete is looked
up before the current dead records are potentially purged.
Hence, if the record to delete is not found, the dead
records are also not purged. This does not make a big
difference though, because purging is only delayed until
directly before the next record to delete is in fact found.

Signed-off-by: Michael Adam <obnox@samba.org>
Reviewed-by: Jeremy Allison <jra@samba.org>
2014-02-14 15:55:46 -08:00
Michael Adam
adb2cd1eee tdb: tdbtool: dump record magic with fixed number of 8 hex digits
Signed-off-by: Michael Adam <obnox@samba.org>
Reviewed-by: Jeremy Allison <jra@samba.org>
2014-02-14 15:53:25 -08:00
Michael Adam
057adfae47 tdb: tdbtool: dump record hash with fixed number of 8 hex digits
Signed-off-by: Michael Adam <obnox@samba.org>
Reviewed-by: Jeremy Allison <jra@samba.org>
2014-02-14 15:53:25 -08:00
Volker Lendecke
f3556bd03b tdb: Avoid reallocs for lockrecs
In normal operations we have at most 3 entries in this array. Don't
bother with shrinking.

Signed-off-by: Volker Lendecke <vl@samba.org>
Reviewed-by: Stefan Metzmacher <metze@samba.org>

Autobuild-User(master): Stefan Metzmacher <metze@samba.org>
Autobuild-Date(master): Sat Dec 14 13:19:47 CET 2013 on sn-devel-104
2013-12-14 13:19:47 +01:00
Christian Ambach
6d88bfcab4 lib/tdb: fix compiler warnings
about a variable shadowing a global declaration

Signed-off-by: Christian Ambach <ambi@samba.org>
Reviewed-by: Jeremy Allison <jra@samba.org>
2013-12-12 14:21:27 -08:00
Volker Lendecke
1f269fcc6e tdb: Add another overflow check to tdb_expand_adjust
Signed-off-by: Volker Lendecke <vl@samba.org>
Reviewed-by: Rusty Russell <rusty@rustcorp.com.au>

Autobuild-User(master): Volker Lendecke <vl@samba.org>
Autobuild-Date(master): Mon Jun  3 14:08:54 CEST 2013 on sn-devel-104
2013-06-03 14:08:53 +02:00
Volker Lendecke
d9b4f19e73 tdb: Make tdb_recovery_allocate overflow-safe
Signed-off-by: Volker Lendecke <vl@samba.org>
Reviewed-by: Rusty Russell <rusty@rustcorp.com.au>
2013-06-03 10:21:32 +02:00
Volker Lendecke
8b215df445 tdb: Make tdb_recovery_size overflow-safe
Signed-off-by: Volker Lendecke <vl@samba.org>
Reviewed-by: Rusty Russell <rusty@rustcorp.com.au>
2013-06-03 10:21:31 +02:00
Stefan Metzmacher
7ae09a9695 tdb: add proper OOM/ENOSPC handling to tdb_expand()
Failing to do so will result in corrupt tdbs: We will overwrite
the hash chain pointers with 0x42424242.

Pair-Programmed-With: Volker Lendecke <vl@samba.org>

Signed-off-by: Stefan Metzmacher <metze@samba.org>
Signed-off-by: Volker Lendecke <vl@samba.org>
Reviewed-by: Rusty Russell <rusty@rustcorp.com.au>
2013-06-03 10:21:30 +02:00
Stefan Metzmacher
854c5f0aac tdb: add overflow detection to tdb_expand_adjust()
We round up at maximum to a new size of 4GB,
but still return at least the given size.

The caller has to deal with ENOSPC itself.

Pair-Programmed-With: Volker Lendecke <vl@samba.org>

Signed-off-by: Stefan Metzmacher <metze@samba.org>
Signed-off-by: Volker Lendecke <vl@samba.org>
Reviewed-by: Rusty Russell <rusty@rustcorp.com.au>
2013-06-03 10:21:28 +02:00
Stefan Metzmacher
e19d46f7e3 tdb: add overflow/ENOSPC handling to tdb_expand_file()
Pair-Programmed-With: Volker Lendecke <vl@samba.org>

Signed-off-by: Stefan Metzmacher <metze@samba.org>
Signed-off-by: Volker Lendecke <vl@samba.org>
Reviewed-by: Rusty Russell <rusty@rustcorp.com.au>
2013-06-03 10:21:27 +02:00
Stefan Metzmacher
a07ba17e0c tdb: add a 'new_size' helper variable to tdb_expand_file()
Pair-Programmed-With: Volker Lendecke <vl@samba.org>

Signed-off-by: Stefan Metzmacher <metze@samba.org>
Signed-off-by: Volker Lendecke <vl@samba.org>
Reviewed-by: Rusty Russell <rusty@rustcorp.com.au>
2013-06-03 10:21:22 +02:00
Volker Lendecke
4483bf143d tdb: Add overflow-checking tdb_add_off_t
Signed-off-by: Volker Lendecke <vl@samba.org>
Reviewed-by: Rusty Russell <rusty@rustcorp.com.au>
2013-06-03 10:21:20 +02:00
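An overflow-checking addition of the kind 4483bf143d introduces can be sketched like this. The name `add_u32_checked()` is hypothetical, and the sketch assumes offsets are 32-bit unsigned values; it is not a copy of tdb_add_off_t.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Store a + b in *result and return true, or return false and leave
 * *result untouched if the sum does not fit in 32 bits.
 */
bool add_u32_checked(uint32_t a, uint32_t b, uint32_t *result)
{
	uint32_t sum = a + b; /* unsigned wraparound is well defined in C */

	if (sum < a) {
		return false; /* the addition wrapped around */
	}
	*result = sum;
	return true;
}
```

Callers can then turn a false return into an error such as ENOSPC instead of continuing with a wrapped offset.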
Rusty Russell
3bd686c5ad tdb: fix logging of offsets and lengths.
We can have offsets > 2G, so use unsigned values.  Fixes other prints to be
native types rather than casts, too.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

Reviewed-by: Andrew Bartlett <abartlet@samba.org>

Autobuild-User(master): Andrew Bartlett <abartlet@samba.org>
Autobuild-Date(master): Tue May 28 11:22:14 CEST 2013 on sn-devel-104
2013-05-28 11:22:14 +02:00
Christian Ambach
11f467d0bd tdb: include information about hash function being used in tdbtool info output
makes it possible to easily determine if the tdb under examination
uses jenkins hash or not

Signed-off-by: Christian Ambach <ambi@samba.org>
Reviewed-by: Volker Lendecke <vl@samba.org>
2013-05-14 14:34:20 +02:00
Volker Lendecke
a92c08e18b tdb: Little format change
Signed-off-by: Volker Lendecke <vl@samba.org>
Reviewed-by: Stefan Metzmacher <metze@samba.org>
2013-03-26 10:11:47 +01:00
Volker Lendecke
68698b4e64 tdb: Slightly simplify tdb_expand_file
The "else" keywords are not necessary here, we return in the preceding
if clause

Signed-off-by: Volker Lendecke <vl@samba.org>
Reviewed-by: Stefan Metzmacher <metze@samba.org>

Autobuild-User(master): Stefan Metzmacher <metze@samba.org>
Autobuild-Date(master): Tue Mar  5 14:00:47 CET 2013 on sn-devel-104
2013-03-05 14:00:47 +01:00
Volker Lendecke
a7fdd4f7c2 tdb: Slightly simplify transaction_write
realloc(NULL, ...) is equivalent to malloc. We are already using this
realloc property for tdb->lockrecs. It should not make any difference
in speed, it just makes for a little simpler code.

Signed-off-by: Volker Lendecke <vl@samba.org>
Reviewed-by: Stefan Metzmacher <metze@samba.org>

Autobuild-User(master): Stefan Metzmacher <metze@samba.org>
Autobuild-Date(master): Tue Feb 19 17:30:13 CET 2013 on sn-devel-104
2013-02-19 17:30:13 +01:00
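The realloc property that a7fdd4f7c2 relies on can be shown with a small growable array. `struct int_vec` and `int_vec_push()` are hypothetical names for this sketch and have nothing to do with tdb's transaction blocks.

```c
#include <stdlib.h>

struct int_vec {
	int *items;   /* NULL while the vector is empty */
	size_t count;
};

/* Append one value; returns 0 on success, -1 if out of memory. */
int int_vec_push(struct int_vec *v, int value)
{
	/* With items == NULL this realloc() call behaves like malloc(). */
	int *grown = realloc(v->items, (v->count + 1) * sizeof(*grown));

	if (grown == NULL) {
		return -1; /* the old array is still valid */
	}
	grown[v->count] = value;
	v->items = grown;
	v->count += 1;
	return 0;
}
```

Because the first push and every later push share one code path, no separate malloc branch is needed for the empty case.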
Volker Lendecke
fcb345f5d6 tdb: Make tdb_release_transaction_locks use tdb_allrecord_unlock
The transaction code uses tdb_allrecord_lock/upgrade, so it should also
use the tdb_allrecord_unlock function just for symmetry reasons

Signed-off-by: Volker Lendecke <vl@samba.org>
Reviewed-by: Stefan Metzmacher <metze@samba.org>
2013-02-19 15:46:45 +01:00
Volker Lendecke
3534e4e8d5 tdb: Factor out the retry loop from tdb_allrecord_upgrade
For the mutex code we will have to lock the hashchain and the record
lock area independently. So we will have to call the loop twice. And,
it's a small refactoring for the better anyway I think.

Signed-off-by: Volker Lendecke <vl@samba.org>
Reviewed-by: Stefan Metzmacher <metze@samba.org>
2013-02-19 15:46:45 +01:00
Volker Lendecke
1f93f08364 tdb: Simplify fcntl_lock() a bit
All arguments but the cmd are the same. To me this looks a bit better
and saves some bytes in the object code.

Signed-off-by: Volker Lendecke <vl@samba.org>
Reviewed-by: Stefan Metzmacher <metze@samba.org>
2013-02-19 15:46:45 +01:00
Volker Lendecke
542400a966 tdb: Use tdb_null in freelistcheck
Signed-off-by: Volker Lendecke <vl@samba.org>
Reviewed-by: Stefan Metzmacher <metze@samba.org>
2013-02-19 15:46:45 +01:00
Volker Lendecke
05235d5b44 tdb: Fix a typo
Signed-off-by: Volker Lendecke <vl@samba.org>

Autobuild-User(master): Simo Sorce <idra@samba.org>
Autobuild-Date(master): Sat Feb 16 17:13:32 CET 2013 on sn-devel-104
2013-02-16 17:13:32 +01:00
Volker Lendecke
72cd5d5ff6 tdb: Remove "header" from tdb_context
header.hash_size was the only thing we ever referenced outside of
tdb_open_ex and its direct callees. So this shrinks the tdb_context by
164 bytes.

Signed-off-by: Volker Lendecke <vl@samba.org>
Reviewed-by: Stefan Metzmacher <metze@samba.org>

Autobuild-User(master): Stefan Metzmacher <metze@samba.org>
Autobuild-Date(master): Tue Feb  5 13:18:28 CET 2013 on sn-devel-104
2013-02-05 13:18:28 +01:00
Volker Lendecke
71247ec4bd tdb: Pass argument "header" to check_header_hash
Signed-off-by: Volker Lendecke <vl@samba.org>
Reviewed-by: Stefan Metzmacher <metze@samba.org>
2013-02-05 08:55:09 +01:00
Volker Lendecke
1436107b07 tdb: Pass argument "header" to tdb_new_database
Signed-off-by: Volker Lendecke <vl@samba.org>
Reviewed-by: Stefan Metzmacher <metze@samba.org>
2013-02-05 08:54:28 +01:00
Volker Lendecke
f2d67af7bc tdb: Fix undefined prototype warnings
These functions are deliberately left without prototypes according to
3fdeaa399, but without prototypes we get warnings.

Reviewed-by: Rusty Russell <rusty@samba.org>

Autobuild-User(master): Volker Lendecke <vl@samba.org>
Autobuild-Date(master): Mon Jan  7 11:20:19 CET 2013 on sn-devel-104
2013-01-07 11:20:19 +01:00
Volker Lendecke
a444bb95a2 tdb: Add a comment explaining the "check"
I had to ask git blame to find why we have to do it here...

Reviewed-by: Rusty Russell <rusty@samba.org>
Reviewed-by: Stefan Metzmacher <metze@samba.org>

Autobuild-User(master): Stefan Metzmacher <metze@samba.org>
Autobuild-Date(master): Fri Dec 21 13:54:39 CET 2012 on sn-devel-104
2012-12-21 13:54:39 +01:00
Volker Lendecke
3109b541c9 tdb: Make tdb_new_database() follow a more conventional style
We usually "goto fail" on every error and then in normal flow set the
return variable to success. This patch removes a comment which from my
point of view is now obsolete. It violates the {} rule from README.Coding
here in favor of the style used in this function.

Reviewed-by: Rusty Russell <rusty@samba.org>
Reviewed-by: Stefan Metzmacher <metze@samba.org>
2012-12-21 11:57:01 +01:00
Volker Lendecke
d972e6fa74 tdb: Fix a typo
Reviewed-by: Rusty Russell <rusty@samba.org>
Reviewed-by: Stefan Metzmacher <metze@samba.org>
2012-12-21 11:56:47 +01:00
Volker Lendecke
c04de8f3a4 tdb: Fix a typo
Reviewed-by: Rusty Russell <rusty@samba.org>
Reviewed-by: Stefan Metzmacher <metze@samba.org>
2012-12-21 11:56:38 +01:00
Volker Lendecke
24755d75b0 tdb: Use tdb_lock_covered_by_allrecord_lock in tdb_unlock
Reviewed-by: Rusty Russell <rusty@samba.org>
Reviewed-by: Stefan Metzmacher <metze@samba.org>
2012-12-21 11:56:20 +01:00
Volker Lendecke
f8dafe5685 tdb: Factor out tdb_lock_covered_by_allrecord_lock from tdb_lock_list
Reviewed-by: Rusty Russell <rusty@samba.org>
Reviewed-by: Stefan Metzmacher <metze@samba.org>
2012-12-21 11:56:09 +01:00
Volker Lendecke
26b8545df4 tdb: Simplify logic in tdb_lock_list slightly
Reviewed-by: Rusty Russell <rusty@samba.org>
Reviewed-by: Stefan Metzmacher <metze@samba.org>
2012-12-21 11:55:55 +01:00
Volker Lendecke
0f4e7a1401 tdb: Slightly simplify tdb_lock_list
Avoid an else {} branch when we can do an early return

Reviewed-by: Rusty Russell <rusty@samba.org>
Reviewed-by: Stefan Metzmacher <metze@samba.org>
2012-12-21 11:55:15 +01:00
Volker Lendecke
116ec13bb0 tdb: Fix blank line endings
Reviewed-by: Rusty Russell <rusty@samba.org>
Reviewed-by: Stefan Metzmacher <metze@samba.org>
2012-12-21 11:54:53 +01:00
Volker Lendecke
7237fdd4dd tdb: Fix a comment
Reviewed-by: Rusty Russell <rusty@samba.org>
Reviewed-by: Stefan Metzmacher <metze@samba.org>
2012-12-21 11:54:47 +01:00
Volker Lendecke
d2b852d79b tdb: Fix a typo
Reviewed-by: Rusty Russell <rusty@samba.org>
Reviewed-by: Stefan Metzmacher <metze@samba.org>
2012-12-21 11:54:40 +01:00
Volker Lendecke
2c3fd8a13e tdb: Fix a missing CONVERT
methods->tdb_write expects data in on-disk format. For reading that
record, methods->tdb_read() has taken care of the on-disk to in-memory
representation according to the DOCONV() flag passed down. tdb_rec_write()
is a wrapper around methods->tdb_write just doing the CONVERT() on the
way to disk.

Reviewed-by: Rusty Russell <rusty@samba.org>
Reviewed-by: Stefan Metzmacher <metze@samba.org>
2012-12-21 11:54:33 +01:00
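The conversion step that 2c3fd8a13e is about can be sketched as a byte swap applied on the way to disk. The record layout and the names `ex_swap_u32()`, `struct ex_rec` and `ex_rec_to_disk()` are assumptions for illustration; tdb's real record header has more fields.

```c
#include <stdbool.h>
#include <stdint.h>

/* Reverse the byte order of a 32-bit value. */
uint32_t ex_swap_u32(uint32_t v)
{
	return ((v & 0x000000FFu) << 24) |
	       ((v & 0x0000FF00u) << 8) |
	       ((v & 0x00FF0000u) >> 8) |
	       ((v & 0xFF000000u) >> 24);
}

/* Hypothetical record header with two 32-bit fields. */
struct ex_rec {
	uint32_t next;
	uint32_t rec_len;
};

/*
 * Return the on-disk form of an in-memory record. `convert` is true
 * when the file's byte order differs from the host's.
 */
struct ex_rec ex_rec_to_disk(struct ex_rec rec, bool convert)
{
	if (convert) {
		rec.next = ex_swap_u32(rec.next);
		rec.rec_len = ex_swap_u32(rec.rec_len);
	}
	return rec;
}
```

Writing the in-memory form directly, without this step, would store fields in the wrong byte order whenever the file and the host disagree.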
Volker Lendecke
c62f8baff8 tdb: Make tdb robust against improper CLEAR_IF_FIRST restart
When winbind is restarted, there is a potential crash in tdb. Following
situation: We are in a cluster with ctdb. A winbind child hangs
in a request to the DC. Cluster monitoring decides the node has a
problem. Cluster monitoring decides to kill ctdbd. winbind child
still hangs in a RPC request. winbind parent figures that ctdb is
dead and immediately commits suicide. winbind parent is restarted by
cluster management, overwriting gencache.tdb with CLEAR_IF_FIRST. The
CLEAR_IF_FIRST logic as implemented now will not see that a child still
has the tdb open, only the parent holds the ACTIVE_LOCK due to performance
reasons. While the CLEAR_IF_FIRST logic runs, there is a very small
window where we ftruncate(tfd, 0) the file and re-write a proper header
without a lock. If the winbind child comes back during this small window,
wanting to store something into gencache.tdb, that winbind child
will crash with a SIGBUS.

Sounds unlikely? See:

[2012/09/29 07:02:31.871607,  0] lib/util.c:1183(smb_panic)
  PANIC (pid 1814517): internal error
[2012/09/29 07:02:31.877596,  0] lib/util.c:1287(log_stack_trace)
  BACKTRACE: 35 stack frames:
   #0 winbindd(log_stack_trace+0x1a) [0x7feb7d4ca18a]
   #1 winbindd(smb_panic+0x2b) [0x7feb7d4ca25b]
   #2 winbindd(+0x1a3cc4) [0x7feb7d4bacc4]
   #3 /lib64/libc.so.6(+0x32900) [0x7feb7a929900]
   #4 /lib64/libc.so.6(memcpy+0x35) [0x7feb7a97f355]
   #5 /usr/lib64/libtdb.so.1(+0x6e76) [0x7feb7b0b0e76]
   #6 /usr/lib64/libtdb.so.1(+0x3d37) [0x7feb7b0add37]
   #7 /usr/lib64/libtdb.so.1(+0x863d) [0x7feb7b0b263d]
   #8 /usr/lib64/libtdb.so.1(+0x8700) [0x7feb7b0b2700]
   #9 /usr/lib64/libtdb.so.1(+0x2505) [0x7feb7b0ac505]
   #10 /usr/lib64/libtdb.so.1(+0x25b7) [0x7feb7b0ac5b7]
   #11 /usr/lib64/libtdb.so.1(tdb_fetch+0x13) [0x7feb7b0ac633]
   #12 winbindd(gencache_set_data_blob+0x259) [0x7feb7d4d8449]
   #13 winbindd(gencache_set+0x53) [0x7feb7d4d85b3]
   #14 winbindd(gencache_del+0x5e) [0x7feb7d4d879e]
   #15 winbindd(saf_delete+0x93) [0x7feb7d54b693]
   #16 winbindd(+0xe507e) [0x7feb7d3fc07e]
   #17 winbindd(+0xe85e5) [0x7feb7d3ff5e5]
   #18 winbindd(+0xe65be) [0x7feb7d3fd5be]
   #19 winbindd(+0xe7562) [0x7feb7d3fe562]
   #20 winbindd(init_dc_connection+0x2e) [0x7feb7d3fe5be]
   #21 winbindd(+0xe75d9) [0x7feb7d3fe5d9]
   #22 winbindd(cm_connect_netlogon+0x58) [0x7feb7d3fe658]
   #23 winbindd(_wbint_PingDc+0x61) [0x7feb7d410991]
   #24 winbindd(+0x103175) [0x7feb7d41a175]
   #25 winbindd(winbindd_dual_ndrcmd+0xb7) [0x7feb7d4107d7]
   #26 winbindd(+0xf8609) [0x7feb7d40f609]
   #27 winbindd(+0xf9075) [0x7feb7d410075]
   #28 winbindd(tevent_common_loop_immediate+0xe8) [0x7feb7d4db198]
   #29 winbindd(run_events_poll+0x3c) [0x7feb7d4d93fc]
   #30 winbindd(+0x1c2b52) [0x7feb7d4d9b52]
   #31 winbindd(_tevent_loop_once+0x90) [0x7feb7d4d9f60]
   #32 winbindd(main+0x7b3) [0x7feb7d3e7aa3]
   #33 /lib64/libc.so.6(__libc_start_main+0xfd) [0x7feb7a915cdd]
   #34 winbindd(+0xce2a9) [0x7feb7d3e52a9]

This is in a winbind child, logfiles surrounding indicate the parent
was restarted.

This patch takes all chain locks around the CLEAR_IF_FIRST introduced
tdb_new_database.
2012-10-06 13:23:42 +02:00
Rusty Russell
37fd93194d tdb: Make robust against shrinking tdbs
When probing for a size change (eg. just before tdb_expand, tdb_check,
tdb_rescue) we call tdb_oob(tdb, tdb->map_size, 1, 1).  Unfortunately
this does nothing if the tdb has actually shrunk, which as Volker
demonstrated, can actually happen if a "longlived" parent crashes.

So move the map/update size/remap before the limit check.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2012-10-06 13:23:41 +02:00
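The ordering change in 37fd93194d (refresh the known size first, then do the limit check) can be modelled as below. This is a simplified sketch: `struct ex_db` and `ex_oob()` are hypothetical, and the `file_size` argument stands in for the size the real code would obtain from the file itself.

```c
#include <stdint.h>

/* Hypothetical handle that caches the last known file size. */
struct ex_db {
	uint64_t map_size;
};

/*
 * Return 0 if the range [off, off + len) lies inside the file,
 * -1 otherwise. `file_size` is the size reported by the system.
 */
int ex_oob(struct ex_db *db, uint64_t file_size, uint64_t off, uint64_t len)
{
	/* Refresh first, so that a shrunken file is noticed too. */
	if (file_size != db->map_size) {
		db->map_size = file_size;
	}
	/* Written this way, the check itself cannot overflow. */
	if (off > db->map_size || len > db->map_size - off) {
		return -1;
	}
	return 0;
}
```

If the limit check ran against the stale cached size first, a probe at the old end of a shrunken file could pass or fail for the wrong reason.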
Rusty Russell
90f463b25f tdb: add tdb_rescue()
This allows for an emergency best-effort dump.  It's a little better than
strings(1).

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2012-10-04 09:04:19 +09:30
Volker Lendecke
a168a7c791 tdb: Fix a typo
Autobuild-User(master): Volker Lendecke <vl@samba.org>
Autobuild-Date(master): Tue Oct  2 19:52:16 CEST 2012 on sn-devel-104
2012-10-02 19:52:16 +02:00
Rusty Russell
1783fe3443 tdb: make TDB_NOSYNC merely disable sync.
(As suggested by Stefan Metzmacher, based on the change to ntdb.)

Since commit ec96ea690e, we handle the case
where a process dies during a transaction commit.  Unfortunately, TDB_NOSYNC
means this no longer works, as it disables the recovery area as well as the
actual msync/fsync.  We should do everything except the syncs.

This also means we can do a complete test with $TDB_NO_FSYNC set; just
to get more complete coverage, we disable it explicitly for one test
(where we override the actual sync calls anyway).

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2012-06-22 07:35:17 +02:00
Amitay Isaacs
3fdeaa3992 lib/tdb: Add/expose lock functions to support CTDB
This patch adds two lock functions used by CTDB to perform asynchronous
locking. These functions do not actually perform any fcntl operations,
but only increment internal counters.

 - tdb_transaction_write_lock_mark()
 - tdb_transaction_write_lock_unmark()

It also exposes two internal functions
 - tdb_lock_nonblock()
 - tdb_unlock()

These functions are NOT exposed in include/tdb.h to prevent any further
uses of these functions. If you ever need to use these functions, consider
using tdb2.

Signed-off-by: Amitay Isaacs <amitay@gmail.com>
2012-03-29 20:07:03 +10:30
Rusty Russell
4442c0b2c9 lib/tdb: fix transaction issue for HAVE_INCOHERENT_MMAP.
We unmap the tdb on expand, then remap.  But when we have INCOHERENT_MMAP
(ie. OpenBSD) and we're inside a transaction, doing the expand can mean
we need to read from the database to partially fill a transaction block.
This fails, because if mmap is incoherent we never allow accessing the
database via read/write.

The solution is not to unmap and remap until we've actually written the
padding at the end of the file.

Reported-by: Amitay Isaacs <amitay@gmail.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

Autobuild-User: Rusty Russell <rusty@rustcorp.com.au>
Autobuild-Date: Fri Mar 23 02:53:15 CET 2012 on sn-devel-104
2012-03-23 02:53:15 +01:00
Rusty Russell
330e3e1b91 lib/tdb: fix missing return 0 code.
fde694274e made tdb_mmap return an int,
but didn't put the return 0 on the "internal db" case.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2012-03-23 10:41:55 +10:30
Rusty Russell
fde694274e lib/tdb: fix OpenBSD incoherent mmap.
This comment appears in two places in the code (commit
4c6a8273c6 from 2001):

	/*
	 * We must ensure the file is unmapped before doing this
	 * to ensure consistency with systems like OpenBSD where
	 * writes and mmaps are not consistent.
	 */

But this doesn't help, because if one process is using mmap and another
using pwrite, we get incoherent results, as demonstrated by OpenBSD's
failure on the tdb unit tests.

Rather than disable mmap on OpenBSD, we test for this issue and force mmap
to be enabled.  This means that we will fail on very large TDBs on 32-bit
systems, but it's better than the horrendous performance penalty on every
OpenBSD system.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2012-03-22 01:57:37 +01:00
Rusty Russell
390b9a2dd8 tdb: make tdb_private.h idempotent.
The most convenient way to write unit tests in C is to directly
#include the C files (CCAN uses this, for example).  That works quite
well, but it means that tdb_private.h now needs to be protected
against multiple inclusions.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2012-02-14 04:04:43 +10:30
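Protecting a header against multiple inclusion, as 390b9a2dd8 does for tdb_private.h, is conventionally done with an include guard. The sketch below pastes the same guarded body twice into one file to imitate a double #include; the guard name `EX_PRIVATE_H` and `struct ex_context` are made up for the example, and the actual guard used in tdb_private.h may differ.

```c
/* First inclusion: the guard macro is undefined, so the body is compiled. */
#ifndef EX_PRIVATE_H
#define EX_PRIVATE_H
struct ex_context {
	int fd;
};
#endif

/* Second inclusion: the guard macro is defined, so the body is skipped. */
#ifndef EX_PRIVATE_H
#define EX_PRIVATE_H
struct ex_context {
	int fd;
};
#endif
```

Without the guard, the second copy would redefine `struct ex_context` and the file would not compile.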
Ira Cooper
7b42ceb414 Fix compile when TDB_TRACE is enabled.
Autobuild-User: Jeremy Allison <jra@samba.org>
Autobuild-Date: Fri Jan  6 04:16:41 CET 2012 on sn-devel-104
2012-01-06 04:16:41 +01:00
Volker Lendecke
c1e9537ed0 tdb: Use tdb_parse_record in tdb_update_hash
This avoids a tdb_fetch, thus a malloc/memcpy/free in the tdb_store path
2011-12-25 13:31:58 +01:00
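The saving described in c1e9537ed0 comes from handing stored bytes to a callback in place instead of returning a malloc'ed copy. The sketch below models that pattern in memory; `ex_parse()`, `ex_compare()` and `struct ex_cmp` are hypothetical, and using the callback to compare new data against stored data is an assumption about the use case, not a copy of tdb_update_hash.

```c
#include <stddef.h>
#include <string.h>

typedef int (*ex_parser)(const unsigned char *data, size_t len,
			 void *private_data);

/* Hand the stored bytes to the parser in place, without copying them. */
int ex_parse(const unsigned char *stored, size_t len,
	     ex_parser parser, void *private_data)
{
	return parser(stored, len, private_data);
}

/* State for a parser that checks whether stored data equals `want`. */
struct ex_cmp {
	const unsigned char *want;
	size_t want_len;
	int equal;
};

int ex_compare(const unsigned char *data, size_t len, void *private_data)
{
	struct ex_cmp *c = private_data;

	c->equal = (len == c->want_len) &&
		   (memcmp(data, c->want, len) == 0);
	return 0;
}
```

The caller never owns a copy of the record, so there is nothing to allocate or free on this path.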
Rusty Russell
5767224b7f tdb: don't free old recovery area when expanding if already at EOF.
We allocate a new recovery area by expanding the file.  But if the
recovery area is already at the end of file (as shown in at least one
client case), we can simply expand the record, rather than freeing it
and creating a new one.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

Autobuild-User: Rusty Russell <rusty@rustcorp.com.au>
Autobuild-Date: Wed Dec 21 06:25:40 CET 2011 on sn-devel-104
2011-12-21 06:25:40 +01:00
Rusty Russell
3a2a755e33 tdb: use same expansion factor logic when expanding for new recovery area.
If we're expanding because the current recovery area is too small, we
expand only the amount we need.  This can quickly lead to exponential
growth when we have a slowly-expanding record (hence a
slowly-expanding transaction size).

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2011-12-21 14:17:16 +10:30
Volker Lendecke
664add1775 tdb: Avoid a malloc/memcpy in _tdb_store 2011-12-19 15:18:08 +01:00
Rusty Russell
b64494535d tdb: be more careful on 4G files.
I came across a tdb which had wrapped to 4G + 4K, and the contents had been
destroyed by processes which thought it only 4k long.  Fix this by checking
on open, and making tdb_oob() check for wrap itself.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

Autobuild-User: Rusty Russell <rusty@rustcorp.com.au>
Autobuild-Date: Mon Dec 19 07:52:01 CET 2011 on sn-devel-104
2011-12-19 07:52:01 +01:00
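A minimal sketch of the kind of wrap-safe bounds check b64494535d describes, assuming 32-bit offsets. The names range_is_oob() and size_fits_32bit() are hypothetical and are not the actual tdb_oob() code; the point is only that the comparison never computes off + len, which is where a 32-bit value would wrap.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper, not the real tdb_oob(): returns true when the
 * range [off, off + len) falls outside a file of map_size bytes.
 * The test is arranged so that off + len is never evaluated, because
 * that sum can wrap around in 32-bit arithmetic. */
static bool range_is_oob(uint32_t off, uint32_t len, uint32_t map_size)
{
	if (off > map_size) {
		return true;
	}
	return len > map_size - off;
}

/* Hypothetical open-time check: a file larger than a 32-bit offset can
 * address would be seen as (size mod 4G) bytes long, so reject it. */
static bool size_fits_32bit(uint64_t file_size)
{
	return file_size <= UINT32_MAX;
}
```

With a naive "off + len > map_size" test, an offset of 0xFFFFFFF0 and a length of 0x20 would wrap to 0x10 and be accepted for a 4k file; the subtraction form rejects it.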
Rusty Russell
ee720fc19c tdb: increment sequence number in tdb_wipe_all().
TDB2 testing revealed that tdb1 doesn't do this.  It's minor, but fix it.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

Autobuild-User: Rusty Russell <rusty@rustcorp.com.au>
Autobuild-Date: Tue Aug 16 10:47:41 CEST 2011 on sn-devel-104
2011-08-16 10:47:41 +02:00
Rusty Russell
4fa51257b2 tdb: enable VALGRIND to remove valgrind noise.
Andrew Bartlett complained that valgrind needs --partial-loads-ok=yes otherwise
the Jenkins hash makes it complain.

My benchmarking here revealed that at least with modern gcc (4.5) and CPU
(Intel i5 32 bit) there's no measurable performance penalty for the
"correct" code, so rip out the optimized one.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

Autobuild-User: Rusty Russell <rusty@rustcorp.com.au>
Autobuild-Date: Wed Jun  8 11:05:47 CEST 2011 on sn-devel-104
2011-06-08 11:05:47 +02:00
Rusty Russell
36cfa7b79e tdb: make sure we skip over recovery area correctly.
If it's really the recovery area, we can trust the rec_len field, and
don't have to go groping for bitpatterns.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

Autobuild-User: Rusty Russell <rusty@rustcorp.com.au>
Autobuild-Date: Tue Apr 19 14:15:22 CEST 2011 on sn-devel-104
2011-04-19 14:15:22 +02:00
Simo Sorce
cb884186a5 tdb_expand: limit the expansion with huge records
ldb can create huge records when saving indexes.
Limit the tdb expansion to avoid consuming a lot of memory for
no good reason if the record being saved is huge.
2011-04-18 22:15:11 +09:30
Rusty Russell
094ab60053 tdb: tdb_repack() only when it's worthwhile.
tdb_repack() is expensive and consumes memory, so we can spend some
effort to see if it's worthwhile.  In particular, tdbbackup doesn't
need to repack: it started with an empty database!

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2011-04-18 22:15:11 +09:30
Rusty Russell
6aa72dae8f tdb: fix transaction recovery area for converted tdbs.
This is why macros are dangerous; these were converting the pointers, not the
things pointed to!

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2011-04-18 22:15:11 +09:30
Volker Lendecke
0080f944b4 tdb: Fix Coverity ID 2238: SECURE_CODING 2011-03-30 09:58:32 +02:00
Volker Lendecke
25397de589 tdb: Fix Coverity ID 2192: NO_EFFECT
(ret < 0) can never be true
2011-03-27 22:22:12 +02:00
Volker Lendecke
91cad71390 tdb: Fix a C++ warning
Autobuild-User: Volker Lendecke <vlendec@samba.org>
Autobuild-Date: Sat Feb 12 19:50:55 CET 2011 on sn-devel-104
2011-02-12 19:50:55 +01:00
Rusty Russell
cac57328a6 tdb: tdb_summary() support.
Autobuild-User: Rusty Russell <rusty@rustcorp.com.au>
Autobuild-Date: Wed Dec 29 10:12:05 CET 2010 on sn-devel-104
2010-12-29 10:12:05 +01:00
Matthias Dieter Wallnöfer
989d8803f2 tdb:common/open.c - use "discard_const_p" for certain "tdb->name" assignments
In order to suppress compiler warnings.
2010-11-27 21:50:42 +01:00
Stefan Metzmacher
dedd064aa8 tdb: set tdb->name early, as it's needed for tdb_name()
tdb_name() might be used within the given log function,
which might be called from within tdb_open_ex().

metze

Autobuild-User: Stefan Metzmacher <metze@samba.org>
Autobuild-Date: Fri Nov 12 11:22:21 UTC 2010 on sn-devel-104
2010-11-12 11:22:21 +00:00
Jelmer Vernooij
62c4af9942 tdb: Set _PUBLIC_ in C file rather than header files (Debian bug 600898)
Autobuild-User: Jelmer Vernooij <jelmer@samba.org>
Autobuild-Date: Thu Oct 21 11:47:22 UTC 2010 on sn-devel-104
2010-10-21 11:47:22 +00:00
Rusty Russell
2dcf76c924 tdb: TDB_INCOMPATIBLE_HASH, to allow safe changing of default hash.
This flag to tdb_open/tdb_open_ex affects creation of a new database:
1) Uses the Jenkins lookup3 hash instead of the old gdbm hash if none is
   specified,
2) Places a non-zero field in header->rwlocks, so older versions of TDB will
   refuse to open it.

This means that the caller (ie Samba) can set this flag to safely
change the hash function.  Versions of TDB from this one on will either
use the correct hash or refuse to open (if a different hash is specified).
Older TDB versions will see the nonzero rwlocks field and refuse to open
it under any conditions.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2010-09-27 10:48:28 +09:30
Rusty Russell
ccac258d14 tdb: automatically identify Jenkins hash tdbs
If the caller to tdb_open_ex() doesn't specify a hash, and tdb_old_hash
doesn't match, try tdb_jenkins_hash.

This was Metze's idea: it makes life simpler, especially with the upcoming
TDB_INCOMPATIBLE_HASH flag.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2010-09-27 10:48:28 +09:30
Rusty Russell
3258cf3f11 tdb: add Bob Jenkins lookup3 hash as helper hash.
This is a better hash than the default: shipping it with tdb makes it easy
for callers to use it as the hash by passing it to tdb_open_ex().

This version taken from CCAN and modified, which took it from
http://www.burtleburtle.net/bob/c/lookup3.c.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2010-09-27 10:48:28 +09:30
Günther Deschner
1585c4df68 lib/tdb: fix c++ build warning in tdb_header_hash().
Guenther
2010-09-20 16:15:11 -07:00
Andrew Tridgell
ff515ff477 tdb: added TDB_NO_FSYNC env variable
this might help reduce test times and load on test machines
2010-09-16 21:09:17 +10:00
Rusty Russell
786b726300 tdb: put example hashes into header, so we notice incorrect hash_fn.
This is Stefan Metzmacher <metze@samba.org>'s patch with minor changes:
1) Use the TDB_MAGIC constant so both hashes aren't of strings.
2) Check the hash in tdb_check (paranoia, really).
3) Additional check in the (unlikely!) case where both examples hash to 0.
4) Cosmetic changes to var names and complaint message.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2010-09-13 20:05:59 +09:30
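A rough sketch of the idea in 786b726300: store the hashes of two fixed examples in the header at creation time, then recompute them with the caller's hash function on open and refuse on mismatch. Everything here is an assumption made for illustration (the struct layout, the example values, the two toy hash functions); it does not reproduce the real header format, and it does not cover the extra check for both examples hashing to 0.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t (*hash_fn)(const void *data, size_t len);

/* Hypothetical header fields holding the two example hashes. */
struct header_hashes {
	uint32_t string_hash;   /* hash of a fixed string */
	uint32_t constant_hash; /* hash of a fixed 32-bit constant */
};

/* Illustrative inputs only; not the values tdb uses. */
static const char EXAMPLE_STRING[] = "example magic string";
static const uint32_t EXAMPLE_CONSTANT = 0x12345678u;

/* Toy hash: sum of the bytes. */
static uint32_t toy_hash_a(const void *data, size_t len)
{
	const unsigned char *p = data;
	uint32_t h = 0;
	for (size_t i = 0; i < len; i++) {
		h += p[i];
	}
	return h;
}

/* Second toy hash: bitwise complement of the first, so the two
 * functions never agree on any input. */
static uint32_t toy_hash_b(const void *data, size_t len)
{
	return ~toy_hash_a(data, len);
}

/* Computed once, when the database is created. */
static struct header_hashes compute_hashes(hash_fn fn)
{
	struct header_hashes h;
	h.string_hash = fn(EXAMPLE_STRING, sizeof(EXAMPLE_STRING) - 1);
	h.constant_hash = fn(&EXAMPLE_CONSTANT, sizeof(EXAMPLE_CONSTANT));
	return h;
}

/* Checked on open: both stored hashes must match the caller's function. */
static bool hashes_match(const struct header_hashes *stored, hash_fn fn)
{
	struct header_hashes now = compute_hashes(fn);
	return now.string_hash == stored->string_hash &&
	       now.constant_hash == stored->constant_hash;
}
```

Hashing one string and one integer constant, rather than two strings, follows point 1 of the commit message.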
Rusty Russell
f77708e962 tdb: fix tdb_check() on other-endian tdbs.
We must not endian-convert the magic string, just the rest.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2010-09-13 19:59:18 +09:30
Rusty Russell
82e5644c9d tdb: fix tdb_check() on read-only TDBs to actually work.
Commit bc1c82ea13 "Fix tdb_check() to work with read-only tdb databases."
claimed to do this, but tdb_lockall_read() fails on read-only databases.

Also make sure we can still do tdb_check() inside a transaction (weird,
but we previously allowed it so don't break the API).

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2010-09-13 19:58:23 +09:30
Rusty Russell
9e0deff904 tdb: make check more robust against recovery failures.
We can end up with dead areas when we die during transaction commit;
tdb_check() fails on such a (valid) database.

This is particularly noticeable now we no longer truncate on recovery;
if the recovery area was at the end of the file we used to remove it
that way.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2010-09-13 19:55:26 +09:30
Rusty Russell
11ab43084b tdb: workaround starvation problem in locking entire database.
We saw tdb_lockall() take 71 seconds under heavy load; this is because Linux
(at least) doesn't prevent new small locks being obtained while we're waiting
for a big lock.

The workaround is to do divide and conquer using non-blocking chainlocks: if
we get down to a single chain we block.  Using a simple test program where
children did "hold lock for 100ms, sleep for 1 second" the time to do
tdb_lockall() dropped significantly.  There are ln(hashsize) locks taken in
the contended case, but that's slow anyway.

More analysis is given in my blog at http://rusty.ozlabs.org/?p=120

This may also help transactions, though in that case it's the initial
read lock which uses this gradual locking routine; the update-to-write-lock
code is separate and still tries to update in one go.

Even though ABI doesn't change, minor version bumped so behavior change
can be easily detected.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2010-08-14 02:31:22 +09:30
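The divide-and-conquer approach in 11ab43084b can be sketched roughly as below. This is an illustration under stated assumptions, not the code from the commit: struct range_locker, lock_gradually() and the fake backend are hypothetical names, the lock primitives are callbacks rather than fcntl() calls, and error handling (unlocking the first half when the second half fails) is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical lock backend: try_lock() must not block and returns
 * true on success; wait_lock() blocks until the range is locked. */
struct range_locker {
	bool (*try_lock)(void *ctx, size_t start, size_t len);
	void (*wait_lock)(void *ctx, size_t start, size_t len);
	void *ctx;
};

/* Lock chains [start, start + len): first try the whole range without
 * blocking; on contention split it in half and recurse; block only
 * once the range is down to a single chain. */
static void lock_gradually(const struct range_locker *l,
			   size_t start, size_t len)
{
	if (len == 0) {
		return;
	}
	if (l->try_lock(l->ctx, start, len)) {
		return;
	}
	if (len == 1) {
		l->wait_lock(l->ctx, start, 1);
		return;
	}
	size_t half = len / 2;
	lock_gradually(l, start, half);
	lock_gradually(l, start + half, len - half);
}

/* Fake backend for demonstration: exactly one chain is held by
 * "another process", so any range containing it cannot be try-locked. */
struct fake_chains {
	size_t busy_chain;
	size_t blocking_waits; /* number of wait_lock() calls */
	size_t locked;         /* total chains locked so far */
};

static bool fake_try(void *ctx, size_t start, size_t len)
{
	struct fake_chains *f = ctx;
	if (f->busy_chain >= start && f->busy_chain < start + len) {
		return false;
	}
	f->locked += len;
	return true;
}

static void fake_wait(void *ctx, size_t start, size_t len)
{
	struct fake_chains *f = ctx;
	(void)start;
	f->blocking_waits++;
	f->locked += len;
}
```

With one busy chain among 16, the sketch blocks exactly once, on that chain, and takes every other chain through non-blocking calls on progressively smaller ranges.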
Jeremy Allison
bc1c82ea13 Fix tdb_check() to work with read-only tdb databases.
The function tdb_lockall() uses F_WRLCK internally, which doesn't work on
a fd opened with O_RDONLY. Use tdb_lockall_read() instead.
Jeremy.
2010-07-29 08:56:35 +09:30
Günther Deschner
f7a3bd4fa4 tdb: fix the build on mac os x 10.6.4.
Guenther
2010-07-01 23:14:57 +02:00
Günther Deschner
2eab1d7fdc tdb: remove unused variable in tdb_new_database().
Guenther
2010-05-11 13:41:17 +02:00
Rusty Russell
91e4a1760d tdb: fix short write logic in tdb_new_database
Commit 207a213c/24fed55d purported to fix the problem of signals during
tdb_new_database (which could cause a spurious short write, hence a failure).
However, the code is wrong: newdb+written is not correct.

Fix this by introducing a general tdb_write_all() and using it here and in
the tracing code.

Cc: Stefan Metzmacher <metze@samba.org>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2010-05-05 15:37:18 +09:30
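A minimal sketch of a write-everything loop in the spirit of the tdb_write_all() mentioned in 91e4a1760d. The name write_fully() is hypothetical and this is not the code from the commit. Advancing a byte pointer, as done here, avoids the pitfall of adding a byte count to a struct pointer, where the addition would be scaled by the size of the struct; that is presumably the problem with "newdb+written".

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper: keep calling write() until all len bytes are
 * written. Short writes and EINTR are retried; any other error, or a
 * write that makes no progress, is reported as failure. */
static bool write_fully(int fd, const void *buf, size_t len)
{
	const char *p = buf; /* byte pointer, so p += n moves n bytes */

	while (len > 0) {
		ssize_t n = write(fd, p, len);
		if (n < 0) {
			if (errno == EINTR) {
				continue; /* interrupted by a signal: retry */
			}
			return false;
		}
		if (n == 0) {
			return false; /* no progress: avoid spinning forever */
		}
		p += n;
		len -= (size_t)n;
	}
	return true;
}
```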
Andrew Tridgell
773a8afbba tdb: update tdb ABI to use hide_symbols=True
We now use -fvisibility=hidden to hide symbols from outside the tdb
shared library.

This also moved tdb_transaction_recover() into the tdb_private.h
header, as it should never have been a public API. For that reason we
are changing the version number. We're only doing a minor version
increment as it is extremely unlikely that anyone was actually using
tdb_transaction_recover() as its locking requirements were rather
unusual.

Pair-Programmed-With: Rusty Russell <rusty@samba.org>
2010-04-20 15:50:27 +10:00
Volker Lendecke
261c3b4f1b tdb: Add a non-blocking version of tdb_transaction_start 2010-03-26 14:27:47 -04:00
Volker Lendecke
59315887a0 tdb: Fix indentation in tdb_new_database() 2010-03-25 10:30:10 +01:00
Volker Lendecke
ea8e0d5d54 Fix some nonempty blank lines 2010-03-25 10:24:45 +01:00
Volker Lendecke
fb98f60594 tdb: If tdb_parse_record does not find a record, return -1 instead of 0 2010-02-28 17:40:59 +01:00
Rusty Russell
ec96ea690e tdb: handle processes dying during transaction commit.
tdb transactions were designed to be robust against the machine
powering off, but interestingly were never designed to handle the case
where an administrator kill -9's a process during commit.  Because
recovery is only done on tdb_open, processes with the tdb already
mapped will simply use it despite it being corrupt and needing
recovery.

The solution to this is to check for recovery every time we grab a
data lock: we could have gained the lock because a process just died.
This has no measurable cost: here is the time for tdbtorture -s 0 -n 1
-l 10000:

Before:
	2.75 2.50 2.81 3.19 2.91 2.53 2.72 2.50 2.78 2.77 = Avg 2.75

After:
	2.81 2.57 3.42 2.49 3.02 2.49 2.84 2.48 2.80 2.43 = Avg 2.74

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2010-02-24 13:23:58 +10:30
Rusty Russell
1bf482b9ef patch tdb-refactor-tdb_lock-and-tdb_lock_nonblock.patch 2010-02-24 13:18:06 +10:30
Rusty Russell
8c3fda4318 tdb: don't truncate tdb on recovery
The current recovery code truncates the tdb file on recovery.  This is
fine if recovery is only done on first open, but is a really bad idea
as we move to allowing recovery on "live" databases.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2010-02-24 10:50:41 +10:30
Rusty Russell
9f295eecff tdb: remove lock ops
Now the transaction code uses the standard allrecord lock, that stops
us from trying to grab any per-record locks anyway.  We don't need to
have special noop lock ops for transactions.

This is a nice simplification: if you see brlock, you know it's really
going to grab a lock.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2010-02-24 10:49:22 +10:30
Rusty Russell
a84222bbaf tdb: rename tdb_release_extra_locks() to tdb_release_transaction_locks()
tdb_release_extra_locks() is too general: it carefully skips over the
transaction lock, even though the only caller then drops it.  Change
this, and rename it to show it's clearly transaction-specific.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2010-02-24 11:02:55 +10:30
Rusty Russell
dd1b508c63 tdb: cleanup: remove ltype argument from _tdb_transaction_cancel.
Now the transaction allrecord lock is the standard one, and thus is cleaned
in tdb_release_extra_locks(), _tdb_transaction_cancel() doesn't need to
know what type it is.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2010-02-24 12:42:24 +10:30
Rusty Russell
fca1621965 tdb: tdb_allrecord_lock/tdb_allrecord_unlock/tdb_allrecord_upgrade
Centralize locking of all chains of the tdb; rename _tdb_lockall to
tdb_allrecord_lock and _tdb_unlockall to tdb_allrecord_unlock, and
tdb_brlock_upgrade to tdb_allrecord_upgrade.

Then we use this in the transaction code.  Unfortunately, if the transaction
code records that it has grabbed the allrecord lock read-only, write locks
will fail, so we treat this upgradable lock as a write lock, and mark it
as upgradable using the otherwise-unused offset field.

One subtlety: now the transaction code is using the allrecord_lock, the
tdb_release_extra_locks() function drops it for us, so we no longer need
to do it manually in _tdb_transaction_cancel.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2010-02-17 15:42:15 +10:30
Rusty Russell
caaf5c6baa tdb: suppress record write locks when allrecord lock is taken.
Records themselves get (read) locked by the traversal code against delete.
Interestingly, this locking isn't done when the allrecord lock has been
taken, though the allrecord lock until recently didn't cover the actual
records (it now goes to end of file).

The write record lock, grabbed by the delete code, is not suppressed
by the allrecord lock.  This is now bad: it causes us to punch a hole
in the allrecord lock when we release the write record lock.  Make this
consistent: *no* record locks of any kind when the allrecord lock is
taken.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2010-02-24 10:45:26 +10:30
Rusty Russell
9341f230f8 tdb: cleanup: always grab allrecord lock to infinity.
We were previously inconsistent with our "global" lock: the
transaction code grabbed it from FREELIST_TOP to end of file, and the
rest of the code grabbed it from FREELIST_TOP to end of the hash
chains.  Change it to always grab to end of file for simplicity and
so we can merge the two.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2010-02-24 10:45:14 +10:30
Rusty Russell
1ab8776247 tdb: remove num_locks
This was redundant before this patch series: it mirrored num_lockrecs
exactly.  It still does.

Also, skip useless branch when locks == 1: unconditional assignment is
cheaper anyway.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2010-02-17 15:01:07 +10:30
Rusty Russell
d48c3e4982 tdb: use tdb_nest_lock() for seqnum lock.
This is pure overhead, but it centralizes the locking.  Realloc (esp. as
most implementations are lazy) is fast compared to the fcntl anyway.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2010-02-17 12:40:57 +10:30
Rusty Russell
4738d474c4 tdb: use tdb_nest_lock() for active lock.
Use our newly-generic nested lock tracking for the active lock.

Note that the tdb_have_extra_locks() and tdb_release_extra_locks()
functions have to skip over this lock now it is tracked.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2010-02-24 10:44:40 +10:30
Rusty Russell
9136818df3 tdb: use tdb_nest_lock() for open lock.
This never nests, so it's overkill, but it centralizes the locking into
lock.c and removes the ugly flag in the transaction code to track whether
we have the lock or not.

Note that we have a temporary hack so this places a real lock, despite
the fact that we are in a transaction.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2010-02-22 13:58:07 +10:30
Rusty Russell
e8fa70a321 tdb: use tdb_nest_lock() for transaction lock.
Rather than a boutique lock and a separate nest count, use our
newly-generic nested lock tracking for the transaction lock.

Note that the tdb_have_extra_locks() and tdb_release_extra_locks()
functions have to skip over this lock now it is tracked.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2010-02-17 12:37:34 +10:30
Rusty Russell
ce41411c84 tdb: cleanup: find_nestlock() helper.
Factor out two loops which find locks; we are going to introduce a couple
more so a helper makes sense.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2010-02-17 12:35:54 +10:30
Rusty Russell
db270734d8 tdb: cleanup: tdb_release_extra_locks() helper
Move locking intelligence back into lock.c, rather than open-coding the
lock release in transaction.c.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2010-02-24 10:41:15 +10:30
Rusty Russell
fba42f1fb4 tdb: cleanup: tdb_have_extra_locks() helper
In many places we check whether locks are held: add a helper to do this.

The _tdb_lockall() case has already checked for the allrecord lock, so
the extra work done by tdb_have_extra_locks() is merely redundant.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2010-02-17 12:34:26 +10:30
Rusty Russell
b754f61d23 tdb: don't suppress the transaction lock because of the allrecord lock.
tdb_transaction_lock() and tdb_transaction_unlock() do nothing if we
hold the allrecord lock.  However, the two locks don't overlap, so
this is wrong.

This simplification makes the transaction lock a straight-forward nested
lock.

There are two callers for these functions:
1) The transaction code, which already makes sure the allrecord_lock
   isn't held.
2) The traverse code, which wants to stop transactions whether it has the
   allrecord lock or not.  There have been deadlocks here before, however
   this should not bring them back (I hope!)

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2010-02-17 12:31:49 +10:30
Rusty Russell
5d9de604d9 tdb: cleanup: tdb_nest_lock/tdb_nest_unlock
Because fcntl locks don't nest, we track them in the tdb->lockrecs array
and only place/release them when the count goes to 1/0.  We only do this
for record locks, so we simply place the list number (or -1 for the free
list) in the structure.

To generalize this:

1) Put the offset rather than list number in struct tdb_lock_type.
2) Rename _tdb_lock() to tdb_nest_lock, make it non-static and move the
   allrecord check out to the callers (except the mark case which doesn't
   care).
3) Rename _tdb_unlock() to tdb_nest_unlock(), make it non-static and
   move the allrecord check out to the callers (except mark again).

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2010-02-17 12:26:13 +10:30
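The nesting scheme in 5d9de604d9 (count locks per offset, and only touch the real fcntl lock when the count goes to 1 or back to 0) can be sketched as follows. This is an illustration, not the tdb code: the fixed-size table, the struct names and the placed/released counters, which stand in for the real fcntl() calls, are all assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_NEST_LOCKS 8 /* illustrative fixed size */

struct nest_lock {
	uint32_t off;   /* file offset the lock covers */
	unsigned count; /* how many times it is held */
};

struct lock_table {
	struct nest_lock recs[MAX_NEST_LOCKS];
	unsigned num;
	unsigned placed;   /* stands in for fcntl() lock calls */
	unsigned released; /* stands in for fcntl() unlock calls */
};

static struct nest_lock *find_lock(struct lock_table *t, uint32_t off)
{
	for (unsigned i = 0; i < t->num; i++) {
		if (t->recs[i].off == off) {
			return &t->recs[i];
		}
	}
	return NULL;
}

/* Take the lock at off; only the first holder places the real lock. */
static int nest_lock(struct lock_table *t, uint32_t off)
{
	struct nest_lock *l = find_lock(t, off);
	if (l != NULL) {
		l->count++;
		return 0;
	}
	if (t->num == MAX_NEST_LOCKS) {
		return -1;
	}
	t->placed++; /* a real implementation would call fcntl() here */
	t->recs[t->num].off = off;
	t->recs[t->num].count = 1;
	t->num++;
	return 0;
}

/* Drop the lock at off; only the last holder releases the real lock. */
static int nest_unlock(struct lock_table *t, uint32_t off)
{
	struct nest_lock *l = find_lock(t, off);
	if (l == NULL) {
		return -1; /* not held */
	}
	assert(l->count > 0);
	if (--l->count > 0) {
		return 0;
	}
	t->released++; /* a real implementation would call fcntl() here */
	*l = t->recs[--t->num]; /* move the last entry into the gap */
	return 0;
}
```

Keying the table on the offset rather than on a hash-chain number is what lets the same bookkeeping serve locks that are not chain locks, which is the generalization point 1 of the commit describes.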
Rusty Russell
e9114a7585 tdb: cleanup: rename global_lock to allrecord_lock.
The word global is overloaded in tdb.  The global_lock inside struct
tdb_context is used to indicate we hold a lock across all the chains.

Rename it to allrecord_lock.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2010-02-17 12:19:47 +10:30
Rusty Russell
7ab422d6fb tdb: cleanup: rename GLOBAL_LOCK to OPEN_LOCK.
The word global is overloaded in tdb.  The GLOBAL_LOCK offset is used at
open time to serialize initialization (and by the transaction code to block
open).

Rename it to OPEN_LOCK.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2010-02-17 12:18:33 +10:30
Rusty Russell
a6e0ef87d2 tdb: make _tdb_transaction_cancel static.
Now tdb_open() calls tdb_transaction_cancel() instead of
_tdb_transaction_cancel, we can make it static.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2010-02-24 10:39:59 +10:30
Rusty Russell
452b4a5a6e tdb: cleanup: split brlock and brunlock methods.
This is taken from the CCAN code base: rather than using tdb_brlock for
locking and unlocking, we split it into brlock and brunlock functions.

For extra debugging information, brunlock says what kind of lock it is
unlocking (even though fcntl locks don't need this).  This requires an
extra argument to tdb_transaction_unlock() so we know whether the
lock was upgraded to a write lock or not.

We also use a "flags" argument to tdb_brlock:
1) TDB_LOCK_NOWAIT replaces lck_type = F_SETLK (vs F_SETLKW).
2) TDB_LOCK_MARK_ONLY replaces setting TDB_MARK_LOCK bit in ltype.
3) TDB_LOCK_PROBE replaces the "probe" argument.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2010-02-17 12:17:19 +10:30
Brad Hards
09e756b1d6 Spelling fixes for tdb.
Signed-off-by: Matthias Dieter Wallnöfer <mwallnoefer@yahoo.de>
2010-02-22 21:45:31 +01:00
Andrew Tridgell
1373e748aa tdb: use fdatasync() instead of fsync() in transactions
This might help on some filesystems
2010-02-13 22:36:11 +11:00
Volker Lendecke
6824c6f46b tdb: Apply some const, just for clarity 2010-02-13 12:19:09 +01:00
Rusty Russell
b37b452cb8 tdb: fix recovery reuse after crash
If a process (or the machine) dies just after writing the
recovery head (pointing at the end of file), the recovery record will be filled
with 0x42.  This will not invoke a recovery on open, since rec.magic
!= TDB_RECOVERY_MAGIC.

Unfortunately, the first transaction commit will happily reuse that
area: tdb_recovery_allocate() doesn't check the magic.  The recovery
record has length 0x42424242, and it writes that back (into the
now-valid-looking transaction header) for the next comer (which
happens to be tdb_wipe_all in my tests).

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2010-02-10 16:56:14 +10:30
Rusty Russell
6269cdcd15 tdb: give a name to the invalid recovery area constant (0)
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2010-02-10 16:56:13 +10:30
Volker Lendecke
531059696e tdb: fix an early release of the global lock that can cause data corruption
There was a bug in tdb where the

                tdb_brlock(tdb, GLOBAL_LOCK, F_UNLCK, F_SETLKW, 0, 1);

(ending the transaction-"mutex") was done before the

                        /* remove the recovery marker */

This means that when a transaction is committed there is a window where another
opener of the file sees the transaction marker while the transaction committer
is still fully functional and working on it. This led to the transaction being
rolled back by that second opener of the file while transaction_commit() gave
no error to the caller.

This patch moves the F_UNLCK to after the recovery marker was removed, closing
this window.
2010-02-01 15:06:29 +01:00
Stefan Metzmacher
3b9f19ed91 tdb: add TDB_DISALLOW_NESTING and make TDB_ALLOW_NESTING the default behavior
We need to keep TDB_ALLOW_NESTING as default behavior,
so that existing code continues to work.

However we may change the default together with a major version
number change in future.

metze
2009-11-20 09:45:36 +01:00
Ronnie Sahlberg
436b55db1f New attempt at TDB transaction nesting allow/disallow.
Make the default be that transaction nesting is not allowed and any attempt to create a nested transaction will fail with TDB_ERR_NESTING.

If an application can cope with transaction nesting and the implicit
semantics of tdb_transaction_commit(), it can enable transaction nesting
by using the TDB_ALLOW_NESTING flag.
(cherry picked from ctdb commit 3e49e41c21eb8c53084aa8cc7fd3557bdd8eb7b6)

Signed-off-by: Stefan Metzmacher <metze@samba.org>
2009-11-20 09:45:34 +01:00
Stefan Metzmacher
85449b7bcc tdb: always set tdb->tracefd to -1 to be safe on goto fail
metze
2009-11-20 09:45:34 +01:00
Volker Lendecke
be88a126ea tdb: Fix a C++ warning 2009-11-08 00:28:22 +01:00
Kirill Smelkov
b4424f8234 tdb: reset tdb->fd to -1 in tdb_close()
So that erroneous double tdb_close() calls do not try to close() the same
fd again. This is like SAFE_FREE() but for fd.

Signed-off-by: Kirill Smelkov <kirr@mns.spb.ru>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2009-10-29 10:14:33 +10:30
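The pattern behind b4424f8234, shown on a hypothetical struct handle rather than the real tdb context: after close(), the stored descriptor is set to -1, so a second call cannot close a descriptor number that may have been reused by an unrelated file in the meantime.

```c
#include <assert.h>
#include <unistd.h>

/* Hypothetical stand-in for a context that owns a file descriptor. */
struct handle {
	int fd;
};

/* Close the descriptor at most once; later calls are no-ops. */
static int handle_close(struct handle *h)
{
	int ret = 0;

	if (h->fd != -1) {
		ret = close(h->fd);
		h->fd = -1; /* mark as closed, like SAFE_FREE() does for pointers */
	}
	return ret;
}
```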
Andrew Tridgell
d4c0e8fdf0 tdb: detect tdb store of identical records and skip
This can help with ldb where we rewrite the index records
2009-10-25 13:15:18 +11:00