1
0
mirror of https://github.com/samba-team/samba.git synced 2025-01-12 09:18:10 +03:00
Commit Graph

110 Commits

Author SHA1 Message Date
Rusty Russell
ec96ea690e tdb: handle processes dying during transaction commit.
tdb transactions were designed to be robust against the machine
powering off, but interestingly were never designed to handle the case
where an administrator kill -9's a process during commit.  Because
recovery is only done on tdb_open, processes with the tdb already
mapped will simply use it despite it being corrupt and needing
recovery.

The solution to this is to check for recovery every time we grab a
data lock: we could have gained the lock because a process just died.
This has no measurable cost: here is the time for tdbtorture -s 0 -n 1
-l 10000:

Before:
	2.75 2.50 2.81 3.19 2.91 2.53 2.72 2.50 2.78 2.77 = Avg 2.75

After:
	2.81 2.57 3.42 2.49 3.02 2.49 2.84 2.48 2.80 2.43 = Avg 2.74

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2010-02-24 13:23:58 +10:30
Rusty Russell
1bf482b9ef patch tdb-refactor-tdb_lock-and-tdb_lock_nonblock.patch 2010-02-24 13:18:06 +10:30
Rusty Russell
8c3fda4318 tdb: don't truncate tdb on recovery
The current recovery code truncates the tdb file on recovery.  This is
fine if recovery is only done on first open, but is a really bad idea
as we move to allowing recovery on "live" databases.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2010-02-24 10:50:41 +10:30
Rusty Russell
9f295eecff tdb: remove lock ops
Now the transaction code uses the standard allrecord lock, that stops
us from trying to grab any per-record locks anyway.  We don't need to
have special noop lock ops for transactions.

This is a nice simplification: if you see brlock, you know it's really
going to grab a lock.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2010-02-24 10:49:22 +10:30
Rusty Russell
a84222bbaf tdb: rename tdb_release_extra_locks() to tdb_release_transaction_locks()
tdb_release_extra_locks() is too general: it carefully skips over the
transaction lock, even though the only caller then drops it.  Change
this, and rename it to show it's clearly transaction-specific.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2010-02-24 11:02:55 +10:30
Rusty Russell
dd1b508c63 tdb: cleanup: remove ltype argument from _tdb_transaction_cancel.
Now the transaction allrecord lock is the standard one, and thus is cleaned
in tdb_release_extra_locks(), _tdb_transaction_cancel() doesn't need to
know what type it is.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2010-02-24 12:42:24 +10:30
Rusty Russell
fca1621965 tdb: tdb_allrecord_lock/tdb_allrecord_unlock/tdb_allrecord_upgrade
Centralize locking of all chains of the tdb; rename _tdb_lockall to
tdb_allrecord_lock and _tdb_unlockall to tdb_allrecord_unlock, and
tdb_brlock_upgrade to tdb_allrecord_upgrade.

Then we use this in the transaction code.  Unfortunately, if the transaction
code records that it has grabbed the allrecord lock read-only, write locks
will fail, so we treat this upgradable lock as a write lock, and mark it
as upgradable using the otherwise-unused offset field.

One subtlety: now the transaction code is using the allrecord_lock, the
tdb_release_extra_locks() function drops it for us, so we no longer need
to do it manually in _tdb_transaction_cancel.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2010-02-17 15:42:15 +10:30
Rusty Russell
caaf5c6baa tdb: suppress record write locks when allrecord lock is taken.
Records themselves get (read) locked by the traversal code against delete.
Interestingly, this locking isn't done when the allrecord lock has been
taken, though the allrecord lock until recently didn't cover the actual
records (it now goes to end of file).

The write record lock, grabbed by the delete code, is not suppressed
by the allrecord lock.  This is now bad: it causes us to punch a hole
in the allrecord lock when we release the write record lock.  Make this
consistent: *no* record locks of any kind when the allrecord lock is
taken.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2010-02-24 10:45:26 +10:30
Rusty Russell
9341f230f8 tdb: cleanup: always grab allrecord lock to infinity.
We were previously inconsistent with our "global" lock: the
transaction code grabbed it from FREELIST_TOP to end of file, and the
rest of the code grabbed it from FREELIST_TOP to end of the hash
chains.  Change it to always grab to end of file for simplicity and
so we can merge the two.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2010-02-24 10:45:14 +10:30
Rusty Russell
1ab8776247 tdb: remove num_locks
This was redundant before this patch series: it mirrored num_lockrecs
exactly.  It still does.

Also, skip useless branch when locks == 1: unconditional assignment is
cheaper anyway.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2010-02-17 15:01:07 +10:30
Rusty Russell
d48c3e4982 tdb: use tdb_nest_lock() for seqnum lock.
This is pure overhead, but it centralizes the locking.  Realloc (esp. as
most implementations are lazy) is fast compared to the fnctl anyway.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2010-02-17 12:40:57 +10:30
Rusty Russell
4738d474c4 tdb: use tdb_nest_lock() for active lock.
Use our newly-generic nested lock tracking for the active lock.

Note that the tdb_have_extra_locks() and tdb_release_extra_locks()
functions have to skip over this lock now it is tracked.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2010-02-24 10:44:40 +10:30
Rusty Russell
9136818df3 tdb: use tdb_nest_lock() for open lock.
This never nests, so it's overkill, but it centralizes the locking into
lock.c and removes the ugly flag in the transaction code to track whether
we have the lock or not.

Note that we have a temporary hack so this places a real lock, despite
the fact that we are in a transaction.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2010-02-22 13:58:07 +10:30
Rusty Russell
e8fa70a321 tdb: use tdb_nest_lock() for transaction lock.
Rather than a boutique lock and a separate nest count, use our
newly-generic nested lock tracking for the transaction lock.

Note that the tdb_have_extra_locks() and tdb_release_extra_locks()
functions have to skip over this lock now it is tracked.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2010-02-17 12:37:34 +10:30
Rusty Russell
ce41411c84 tdb: cleanup: find_nestlock() helper.
Factor out two loops which find locks; we are going to introduce a couple
more so a helper makes sense.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2010-02-17 12:35:54 +10:30
Rusty Russell
db270734d8 tdb: cleanup: tdb_release_extra_locks() helper
Move locking intelligence back into lock.c, rather than open-coding the
lock release in transaction.c.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2010-02-24 10:41:15 +10:30
Rusty Russell
fba42f1fb4 tdb: cleanup: tdb_have_extra_locks() helper
In many places we check whether locks are held: add a helper to do this.

The _tdb_lockall() case has already checked for the allrecord lock, so
the extra work done by tdb_have_extra_locks() is merely redundant.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2010-02-17 12:34:26 +10:30
Rusty Russell
b754f61d23 tdb: don't suppress the transaction lock because of the allrecord lock.
tdb_transaction_lock() and tdb_transaction_unlock() do nothing if we
hold the allrecord lock.  However, the two locks don't overlap, so
this is wrong.

This simplification makes the transaction lock a straight-forward nested
lock.

There are two callers for these functions:
1) The transaction code, which already makes sure the allrecord_lock
   isn't held.
2) The traverse code, which wants to stop transactions whether it has the
   allrecord lock or not.  There have been deadlocks here before, however
   this should not bring them back (I hope!)

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2010-02-17 12:31:49 +10:30
Rusty Russell
5d9de604d9 tdb: cleanup: tdb_nest_lock/tdb_nest_unlock
Because fcntl locks don't nest, we track them in the tdb->lockrecs array
and only place/release them when the count goes to 1/0.  We only do this
for record locks, so we simply place the list number (or -1 for the free
list) in the structure.

To generalize this:

1) Put the offset rather than list number in struct tdb_lock_type.
2) Rename _tdb_lock() to tdb_nest_lock, make it non-static and move the
   allrecord check out to the callers (except the mark case which doesn't
   care).
3) Rename _tdb_unlock() to tdb_nest_unlock(), make it non-static and
   move the allrecord out to the callers (except mark again).

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2010-02-17 12:26:13 +10:30
Rusty Russell
e9114a7585 tdb: cleanup: rename global_lock to allrecord_lock.
The word global is overloaded in tdb.  The global_lock inside struct
tdb_context is used to indicate we hold a lock across all the chains.

Rename it to allrecord_lock.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2010-02-17 12:19:47 +10:30
Rusty Russell
7ab422d6fb tdb: cleanup: rename GLOBAL_LOCK to OPEN_LOCK.
The word global is overloaded in tdb.  The GLOBAL_LOCK offset is used at
open time to serialize initialization (and by the transaction code to block
open).

Rename it to OPEN_LOCK.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2010-02-17 12:18:33 +10:30
Rusty Russell
a6e0ef87d2 tdb: make _tdb_transaction_cancel static.
Now tdb_open() calls tdb_transaction_cancel() instead of
_tdb_transaction_cancel, we can make it static.

Signed-off-by: Rusty Russell<rusty@rustcorp.com.au>
2010-02-24 10:39:59 +10:30
Rusty Russell
452b4a5a6e tdb: cleanup: split brlock and brunlock methods.
This is taken from the CCAN code base: rather than using tdb_brlock for
locking and unlocking, we split it into brlock and brunlock functions.

For extra debugging information, brunlock says what kind of lock it is
unlocking (even though fnctl locks don't need this).  This requires an
extra argument to tdb_transaction_unlock() so we know whether the
lock was upgraded to a write lock or not.

We also use a "flags" argument tdb_brlock:
1) TDB_LOCK_NOWAIT replaces lck_type = F_SETLK (vs F_SETLKW).
2) TDB_LOCK_MARK_ONLY replaces setting TDB_MARK_LOCK bit in ltype.
3) TDB_LOCK_PROBE replaces the "probe" argument.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2010-02-17 12:17:19 +10:30
Brad Hards
09e756b1d6 Spelling fixes for tdb.
Signed-off-by: Matthias Dieter Wallnöfer <mwallnoefer@yahoo.de>
2010-02-22 21:45:31 +01:00
Andrew Tridgell
1373e748aa tdb: use fdatasync() instead of fsync() in transactions
This might help on some filesystems
2010-02-13 22:36:11 +11:00
Volker Lendecke
6824c6f46b tdb: Apply some const, just for clarity 2010-02-13 12:19:09 +01:00
Rusty Russell
b37b452cb8 tdb: fix recovery reuse after crash
If a process (or the machine) dies after just after writing the
recovery head (pointing at the end of file), the recovery record will filled
with 0x42.  This will not invoke a recovery on open, since rec.magic
!= TDB_RECOVERY_MAGIC.

Unfortunately, the first transaction commit will happily reuse that
area: tdb_recovery_allocate() doesn't check the magic.  The recovery
record has length 0x42424242, and it writes that back into the
now-valid-looking transaction header) for the next comer (which
happens to be tdb_wipe_all in my tests).

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2010-02-10 16:56:14 +10:30
Rusty Russell
6269cdcd15 tdb: give a name to the invalid recovery area constant (0)
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2010-02-10 16:56:13 +10:30
Volker Lendecke
531059696e tdb: fix an early release of the global lock that can cause data corruption
There was a bug in tdb where the

                tdb_brlock(tdb, GLOBAL_LOCK, F_UNLCK, F_SETLKW, 0, 1);

(ending the transaction-"mutex") was done before the

                        /* remove the recovery marker */

This means that when a transaction is committed there is a window where another
opener of the file sees the transaction marker while the transaction committer
is still fully functional and working on it. This led to transaction being
rolled back by that second opener of the file while transaction_commit() gave
no error to the caller.

This patch moves the F_UNLCK to after the recovery marker was removed, closing
this window.
2010-02-01 15:06:29 +01:00
Stefan Metzmacher
3b9f19ed91 tdb: add TDB_DISALLOW_NESTING and make TDB_ALLOW_NESTING the default behavior
We need to keep TDB_ALLOW_NESTING as default behavior,
so that existing code continues to work.

However we may change the default together with a major version
number change in future.

metze
2009-11-20 09:45:36 +01:00
Ronnie Sahlberg
436b55db1f New attempt at TDB transaction nesting allow/disallow.
Make the default be that transaction is not allowed and any attempt to create a nested transaction will fail with TDB_ERR_NESTING.

If an application can cope with transaction nesting and the implicit
semantics of tdb_transaction_commit(), it can enable transaction nesting
by using the TDB_ALLOW_NESTING flag.
(cherry picked from ctdb commit 3e49e41c21eb8c53084aa8cc7fd3557bdd8eb7b6)

Signed-off-by: Stefan Metzmacher <metze@samba.org>
2009-11-20 09:45:34 +01:00
Stefan Metzmacher
85449b7bcc tdb: always set tdb->tracefd to -1 to be safe on goto fail
metze
2009-11-20 09:45:34 +01:00
Volker Lendecke
be88a126ea tdb: Fix a C++ warning 2009-11-08 00:28:22 +01:00
Kirill Smelkov
b4424f8234 tdb: reset tdb->fd to -1 in tdb_close()
So that erroneous double tdb_close() calls do not try to close() same
fd again. This is like SAFE_FREE() but for fd.

Signed-off-by: Kirill Smelkov <kirr@mns.spb.ru>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2009-10-29 10:14:33 +10:30
Andrew Tridgell
d4c0e8fdf0 tdb: detect tdb store of identical records and skip
This can help with ldb where we rewrite the index records
2009-10-25 13:15:18 +11:00
Stefan Metzmacher
3b62e250c0 tdb: rename 'struct list_struct' into 'struct tdb_record'
metze
2009-10-23 18:27:20 +02:00
Rusty Russell
022b4d4aa6 lib/tdb: add tdb_check()
ctdb wants a quick way to detect corrupt tdbs; particularly, tdbs with
loops in their hash chains.  tdb_check() provides this.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2009-10-22 00:10:34 +10:30
Rusty Russell
b77f41d58b lib/tdb: wean off TDB_ERRCODE.
It was a regrettable hack which I used to reduce line count in tdb; in fact it caused confusion as can be seen in this patch.
In particular, ecode now needs to be set before TDB_LOG anyway, and having it exposed in
the header is useless (the struct tdb_context isn't defined, so it's doubly useless).
Also, we should never set errno, as io.c was doing.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2009-10-22 00:09:43 +10:30
Rusty Russell
703004340c lib/tdb: TDB_TRACE support (for developers)
When TDB_TRACE is defined (in tdb_private.h), verbose tracing of tdb operations is enabled.
This can be replayed using "replay_trace" from http://ccan.ozlabs.org/info/tdb.

The majority of this patch comes from moving internal functions to _<funcname> to
avoid double-tracing.  There should be no additional overhead for the normal (!TDB_TRACE)
case.

Note that the verbose traces compress really well with rzip.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2009-10-22 00:00:12 +10:30
Andrew Tridgell
46c99ec2a3 tdb: allow reads after prepare commit
We previously only allowed a commit to happen after a prepare
commit. It is in fact safe to allow reads between a prepare and a
commit, and the s4 replication code can make use of that, so allow it.
2009-09-15 14:04:22 -07:00
Günther Deschner
1c2f4919ab tdb: fix c++ build warning.
Guenther
2009-09-07 11:57:10 +02:00
Rusty Russell
398d0c2929 lib/tdb: don't overwrite TDBs with different version numbers.
In future, this may happen, and we don't want to clobber them.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2009-08-28 13:43:05 +10:00
Jeremy Allison
4fc9f9c3f9 Add define guards around otherwise unused variable.
Jeremy.
2009-08-06 11:47:08 -07:00
Rusty Russell
252f7da702 There is one signedness issue in tdb which prevents traverses of TDB records
over the 2G offset on systems which support 64 bit file offsets.  This fixes
that case.

On systems with 32 bit offsets, expansion and fcntl locking on these records
will fail anyway.  SAMBA already does '#define _FILE_OFFSET_BITS 64' in
config.h (on my 32-bit x86 Linux system at least) to get 64 bit file offsets.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2009-08-06 13:13:42 +10:00
Rusty Russell
a207cca1d3 tdb: don't alter tdb->flags in tdb_reopen_all()
The flags are user-visible, via tdb_get_flags/add_flags/remove_flags.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Stefan Metzmacher <metze@samba.org>
2009-07-31 14:40:28 +02:00
Rusty Russell
3b2f074bda tdb: Reimplementation of Metze's "lib/tdb: if we know pwrite and pread are thread/fork safe tdb_reopen_all() should be a noop".
This version just wraps the reopen code, so we still re-grab the lock and do
the normal sanity checks.

The reason we do this at all is to avoid global fd limits, see:
http://forums.fedoraforum.org/showthread.php?t=210393

Note also that this whole reopen concept is fundamentally racy: if the parent
goes away before the child calls tdb_reopen_all, the database can be left
without an active lock and another TDB_CLEAR_IF_FIRST opener will clear it.
A fork_with_tdbs() wrapper could use a pipe to solve this, but it's hardly
elegant (what if there are other independent things which have similar needs?).

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Stefan Metzmacher <metze@samba.org>
2009-07-31 14:40:28 +02:00
Rusty Russell
fa91bc6719 tdb: Revert "lib/tdb: if we know pwrite and pread are thread/fork safe tdb_reopen_all() should be a noop"
This reverts commit e17df483fb.

tdb_reopen_all also restores the active lock, required for TDB_CLEAR_IF_FIRST.

Signed-off-by: Stefan Metzmacher <metze@samba.org>
2009-07-31 14:40:28 +02:00
Rusty Russell
36c0f0f99a realloc() has that horrible overloaded free semantic when size is 0:
current code does a free of the old record in this case, then fail.
2009-07-30 13:10:33 -07:00
Rusty Russell
a88c281ddc If the record is at the end of the database, pretending it has length 1
might take us out-of-bounds.  Only pretend to be length 1 for the malloc.
2009-07-30 13:09:33 -07:00
Rusty Russell
760104188d tdb: fix locking error
54a51839ea "Make tdb transaction lock
recursive (samba version)" was broken: I "cleaned it up" and prevented
it from ever unlocking.

To see the problem:
	$ bin/tdbtorture -s 1248142523
	tdb_brlock failed (fd=3) at offset 8 rw_type=1 lck_type=14 len=1
	tdb_transaction_lock: failed to get transaction lock
	tdb_transaction_start failed: Resource deadlock avoided

My testcase relied on the *count* being correct, which it was.  Fixing that
now.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Michael Adam <obnox@samba.org>
2009-07-21 10:21:53 +02:00
Rusty Russell
54a51839ea Make tdb transaction lock recursive (samba version)
This patch replaces 6ed27edbcd and
1a416ff13c, which fixed the bug where traversals
inside transactions would release the transaction lock early.

This solution is more general, and solves the more minor symptom that nested
traversals would also release the transaction lock early.  (It was also suggestd in
Volker's comment in 6ed27ed).

This patch also applies to ctdb, if the traverse.c part is removed (ctdb's tdb
code never received the previous two fixes).

Tested using the testsuite from ccan (adapted to the samba code).  Thanks to
Michael Adam for feedback.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Michael Adam <obnox@samba.org>
2009-07-20 22:17:20 +02:00
Andrew Tridgell
a6cc04a200 overallocate all records by 25%
This greatly reduces the fragmentation of databases where records
tend to grow slowly by a small amount each time. The case where this
is most seen is the ldb index records. Adding this overallocation
reduced the size of the resulting database by more than 20x when
running a test that adds 10k users.
2009-06-01 13:13:07 +10:00
Andrew Tridgell
a386173fa1 auto-repack in transactions that expand the tdb
The idea behind this is to recover from badly fragmented free
lists. Choosing the point where the file expands is fairly arbitrary,
but seems to work well.
2009-06-01 13:11:39 +10:00
Andrew Tridgell
4b4fec65db make TDB_NOSYNC affect all the fsync/msync calls in transactions
During a transaction commit tdb normally uses fsync/msync calls to
make it crash safe. This can be disabled using the TDB_NOSYNC flag,
but it wasn't disabling all the code paths that caused a fsync/msync.
2009-05-28 16:08:28 +10:00
Jim McDonough
a91bcbccf8 Detect tight loop in tdb_find() 2009-05-21 16:29:48 -04:00
Tim Prouty
42c0931441 tdb: Remove unused variable 2009-03-31 16:24:07 -07:00
Howard Chu
b90863c0b7 Add tdb_transaction_prepare_commit()
Using tdb_transaction_prepare_commit() gives us 2-phase commits. This
allows us to safely commit across multiple tdb databases at once, with
reasonable transaction semantics

Signed-off-by: tridge@samba.org
2009-03-31 13:15:54 +11:00
Stefan Metzmacher
e17df483fb lib/tdb: if we know pwrite and pread are thread/fork safe tdb_reopen_all() should be a noop
The reason for tdb_reopen_all() is that the seek pointer on fds are shared between
parent and child.

metze
2009-02-25 13:57:11 -08:00
Andrew Tridgell
936d76802f imported the tdb_repack() code from CTDB
The tdb_repack() function repacks a TDB so that it has a single
freelist entry. The file doesn't shrink, but it does remove all
freelist fragmentation. This code originated in the CTDB vacuuming
code, but will now be used in ldb to cope with fragmentation from
re-indexing
2008-12-16 14:38:17 +11:00
Jelmer Vernooij
94855cd692 Move common libraries from root to lib/. 2008-09-17 14:11:12 +02:00