1
0
mirror of https://github.com/samba-team/samba.git synced 2024-12-23 17:34:34 +03:00
Commit Graph

80 Commits

Author SHA1 Message Date
Martin Schwenke
3f37b4418e ctdbd: Update confusing log message
Inactive can also mean stopped.  To add information, just print the
flags instead.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit a8605f7e06076e7edf84e0cc160fd3d9ab5c4b64)
2013-05-23 16:18:23 +10:00
Michael Adam
ce0916f61b ltdb_server: use CTDB_REC_RO_FLAGS where appropriate
Signed-off-by: Michael Adam <obnox@samba.org>
Reviewed-By: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 61f17e53576197def46bc61fdf0cdb5282333a3e)
2013-04-24 18:48:47 +10:00
Michael Adam
fd01c464d1 Fix a severe recovery bug that can lead to data corruption for SMB clients.
Problem:
Recovery can under certain circumstances lead to old record copies
resurrecting: Recovery selects the newest record copy purely by RSN. At
the end of the recovery, the recovery master is the dmaster for all
records in all (non-persistent) databases. And the other nodes locally
hold the complete copy of the databases. The bug is that the recovery
process does not increment the RSN on the recovery master at the end of
the recovery. Now clients acting directly on the Recovery master will
directly change a record's content on the recmaster without migration
and hence without RSN bump.  So a subsequent recovery can not tell that
the recmaster's copy is newer than the copies on the other nodes, since
their RSN is the same. Hence, if the recmaster is not node 0 (or more
precisely not the active node with the lowest node number), the recovery
will choose copies from nodes with lower number and stick to these.

Here is how to reproduce:

- assume we have a cluster with at least 2 nodes
- ensure that the recmaster is not node 0
  (maybe ensure with "onnode 0 ctdb setrecmasterrole off")
  say recmaster is node 1
- choose a new database name, say "test1.tdb"
  (make sure it is not yet attached as persistent)
- choose a key name, say "key1"
- all clustere nodes should ok and no recovery running
- now do the following on node 1:

1. dbwrap_tool test1.tdb store key1 uint32 1
2. dbwrap_tool test1.tdb fetch key1 uint32
   ==> 1
3. ctdb recover
4. dbwrap_tool test1.tdb store key1 uint32 2
5. dbwrap_tool test1.tdb fetch key1 uint32
   ==> 2
4. ctdb recover
7. dbwrap_tool test1.tdb fetch key1 uint32
   ==> 1
   ==> BUG

This is a very severe bug, since when applied to Samba's locking.tdb
database, it means that for SMB clients on clustered Samba there is
the potential for locking out oneself from previously opened files
or even worse, data corruption:

Case 1: locking out

- client on recmaster opens file
- recovery propagates open file handle (entry in locking.tdb) to
  other nodes
- client closes file
- client opens the same file
- recovery resurrects old copy of open file record in locking.tdb
  from lower node
- client closes file but fails to delete entry in locking.tdb
- client tries to open same file again but fails, since
  the old record locks it out (since the client is still connected)

Case 2: data corruption

- clien1 on recmaster opens file
- recovery propagates open file info to other nodes
- client1 closes the file and disconnects
- client2 opens the same file
- recovery resurrects old copy of locking.tdb record,
  where client2 has no entry, but client1 has.
- but client2 believes it still has a handle
- client3 opens the file and succees without
  conflicting with client2
  (the detached entry for client1 is discarded because
   the server does not exist any more).
=> both client2 and client3 believe they have exclusive
  access to the file and writing creates data corruption

Fix:

When storing a record on the dmaster, bump its RSN.

The ctdb_ltdb_store_server() is the central function for storing
a record to a local tdb from the ctdbd server context.
So this is also the place where the RSN of the record to be stored
should be incremented, when storing on the dmaster.

For the case of the record migration, this is currently done in
ctdb_become_dmaster() in ctdb_call.c, but there are other places
such as in recovery, where we should bump the RSN, but currently
don't do it.

So moving the RSN incrementation into ctdb_ltdb_store_server fixes
the recovery-record-resurrection bug.

Signed-off-by: Michael Adam <obnox@samba.org>
Reviewed-By: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit feb1d40b21a160737aead22e398f3c34ff3be8de)
2013-04-17 21:16:17 +10:00
Volker Lendecke
2d1d5d312e Add a \n to an error message
(This used to be ctdb commit 9be3b23adbfc844b71bf1d4ddf0fbc3b269f15fa)
2012-10-25 17:11:15 +11:00
Amitay Isaacs
a00e50e503 ctdbd: Replace lockwait with locking API and remove ctdb_lockwait.c
Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 2126795153dacb255e441abcb36ee05107b6282a)
2012-10-20 02:48:44 +11:00
Gregor Beck
3fd0b8a5a5 ctdbd: refuse attaching with "persistent" to a non-persistent db and v.v.
Signed-off-by: Michael Adam <obnox@samba.org>

(This used to be ctdb commit 1ebbaa620b3cfb9ff373828e4aaa84246cf3ec25)
2012-07-03 11:30:04 +02:00
Ronnie Sahlberg
59565c05cf STATISTICS: Add tracking of the 10 hottest keys per database measured in hopcount
and add mechanisms to dump it using the ctdb dbstatistics command

(This used to be ctdb commit 8307c70ed98996b430c470e9641a09fdeeb81bd8)
2012-06-13 16:19:18 +10:00
Amitay Isaacs
4392591555 Remove explicit include of lib/tevent/tevent.h.
Signed-off-by: Amitay Isaacs <amitay@gmail.com>

(This used to be ctdb commit 0681014ca5ed2a9b56f63fdace7f894beccf8a9a)
2012-04-13 17:28:14 +10:00
Ronnie Sahlberg
fa3a06246a STICKY: add prototype code to make records stick to a node to "calm" down if they are found to be very hot and accessed by a lot of clients.
This can improve performance and stop clients from having to chase a rapidly migrating/bouncing record

(This used to be ctdb commit d0d98f7e45e5084b81335b004d50bddc80cdc219)
2012-03-20 17:12:19 +11:00
Ronnie Sahlberg
cdc232f2dd READONLY: dont schedule for fast vacuum deletion if any of the readonly record flags are set
(This used to be ctdb commit b3307d78fd15f446b423f8cdd1e403f89fbe8ac8)
2012-02-21 06:54:09 +11:00
Ronnie Sahlberg
61762a96e3 ReadOnly: Make sure we dont try to fast-vacuum records that are set for readonly delegation
(This used to be ctdb commit 303134cf10a08ce61954d5de9025d9bbcb5f75ef)
2012-02-20 21:13:46 +11:00
Ronnie Sahlberg
73f8be16c6 ReadOnly: add per-database statistics to view how much delegations/revokes we have
(This used to be ctdb commit 751ed46197661eb841042ab6a02855a51dd0b17c)
2012-02-08 15:29:27 +11:00
Michael Adam
6f0b22234f ctdb_ltdb_store_server: when storing a record that is not to be scheduled for deletion, remove it from the delete queue
Pair-Programmed-With: Stefan Metzmacher <metze@samba.org>

(This used to be ctdb commit 489148e465e2b8aed87ea836e3518f43490671ca)
2011-12-23 17:39:01 +01:00
Martin Schwenke
f186dd90b6 Move some common functions to common/ctdb_ltdb.c
Move identical copies of ctdb_null_func(), ctdb_fetch_func(),
ctdb_fetch_with_header_func() from ctdb_client.c and
ctdb_ltdb_server.c to somewhere common.

This is in the context of wanting to run CCAN-style tests where most
of the ctdbd code is just included in the test program.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit 126cb0d369b2b1aed63801dc4ba0554399e8b7e4)
2011-11-11 14:31:50 +11:00
Martin Schwenke
3b47e5fa49 Fix typo in ctdb_ltdb_store_server()
The if statement uses ret but means to use ret2.

Signed-off-by: Martin Schwenke <martin@meltin.net>

(This used to be ctdb commit f40101a615f8b9826a484e4697bfea6ee2b9ba88)
2011-11-09 14:55:07 +11:00
Ronnie Sahlberg
0e79b2d1e8 Record Fetch Collapse: Collapse multiple fetch request into one single request.
When multiple clients fetch the same record concurrently, send only one single
fetch across the network and deferr all other fetches locally.
This improves performance for hot records and reduces cpu load on ctdb.

(This used to be ctdb commit 82d6946ad8b3348e8b9d3d971f24925ade02d1be)
2011-11-08 16:08:28 +11:00
Ronnie Sahlberg
90c0c235fa Remove debug message
(This used to be ctdb commit db0fdc2281c4742113c92d697371c37815db35a0)
2011-10-24 12:21:55 +11:00
Ronnie Sahlberg
0dc5584101 Merge branch 'master-readonly-records' into foo
Conflicts:

	Makefile.in
	tools/ctdb.c

(This used to be ctdb commit 0fedef0ffba4178126eee9544c5e2db52f5db893)
2011-09-12 09:34:34 +10:00
Michael Adam
a3e0079568 Add a tunable "AllowClientDBAttach" with default value 1.
When set to 0, clients will not be able to attach to databases
via the db_attach control. This might can be useful for maintenance
where ctdb should be kept running but clients should not be able
to modify databases.

(This used to be ctdb commit ddfeecda87955b4e46777599f678e6926d37f4c4)
2011-09-05 16:17:39 +10:00
Ronnie Sahlberg
206a3c0c66 ReadOnly: add a new control to activate readonly lock capability for a database.
let all databases default to not support this  until enabled through this control

(This used to be ctdb commit 908a07c42e5135a3ba30a625fc4f4e4916de197a)
2011-09-01 11:08:18 +10:00
Ronnie Sahlberg
9729d3e339 ReadOnly: Check the readonly flag instead of whether the tdb pointer is NULL or not
(This used to be ctdb commit 01314c2cb3a480917d6a632b83c39f0a48bba0e7)
2011-08-23 10:41:52 +10:00
Ronnie Sahlberg
1441b77cce ReadOnly: Add "readonly" flag to the ctdb_db_context to indicate if this database supports readonly operations or not. Add a private lock-less tdb file to the ctdb_db_context to use for tracking delegarions for records
Assume all databases will support readonly mode for now and se thte flag for all databases. At later stage we will add support to control on a per database level whether delegations will be supported or not.

(This used to be ctdb commit 502f86f79944df4bac9094f716e54110c511dc24)
2011-08-23 10:24:26 +10:00
Ronnie Sahlberg
00a870f759 ReadOnly records: Add a new RPC function FETCH_WITH_HEADER.
This function differs from the old FETCH in that this function will also fetch the record header and not just the record data

(This used to be ctdb commit c7196d16e8e03bb2a64be164d15a7502300eae0e)
2011-08-23 10:06:59 +10:00
Ronnie Sahlberg
cee8c4be94 Deferred attach: create the timed event as a child context of the da context we want to delete.
Othwervise the da context can be timed out and talloc_free()d
but the event for this already freed object will still trigger,
causing a talloc error and shutdown.

CQ S1022515

(This used to be ctdb commit 2fd27bdedb1e0d6558c07e1b74fc8e70ddf593dc)
2011-03-16 16:08:45 +11:00
Ronnie Sahlberg
3cc230b5ee Dont allow clients to connect to databases untile we are well past and through
the initial recovery phase

CQ S1022412

Signed-off-by: Michael Adam <obnox@samba.org>

(This used to be ctdb commit e02bbd915b7151c615ff64f09ad9abc9720bef7d)
2011-03-14 13:35:53 +01:00
Michael Adam
e77ed68c1a ctdb_ltdb_store_server: honour the AUTOMATIC record flag
Do not delete empty records that carry this flag but store
them and schedule them for deletetion. Do not store the flag
in the ltdb though, since this is internal only and should not
be visible to the client.

(This used to be ctdb commit f898ff21fa338358179e79381215b13a6bc77c53)
2011-03-14 13:35:51 +01:00
Michael Adam
6506314c4a ctdb_ltdb_store_server: add ability to send SCHEDULE_FOR_DELETION control to ctdb_ltdb_store.
(This used to be ctdb commit ab2711701999a5ecc23a36b3d9ba8e94f92e4c87)
2011-03-14 13:35:51 +01:00
Michael Adam
1924d0d365 ctdb_ltdb_store_server: Improve debug message in ctdb_ltdb_store when store or delete fails.
(This used to be ctdb commit 2559b2a45eb11834da3b0e0963e24351c8b7477f)
2011-03-14 13:35:51 +01:00
Michael Adam
7088e2144f ctdb_ltdb_store_server: always store the data when ctdb_ltdb_store() is called from the client
This also fixes a segfault since ctdb_lmaster uses the vnn_map.

(This used to be ctdb commit e58c8f51f27e468897af5210b80e5f5f45c3c4bb)
2011-03-14 13:35:51 +01:00
Michael Adam
6384512eb7 ctdb_ltdb_store_server: implement fastpath vacuuming deletion based on VACUUM_MIGRATED flag.
When the record has been obtained by the lmaster as part of the vacuuming-fetch
handler and it is empty and never been migrated with data, then such records
are deleted instead of being stored. These records have automatically been
deleted when leaving the former dmaster, so that they vanish for good when
hitting the lmaster in this way. This will reduces the load on traditional
vacuuming.

Pair-Programmed-With: Stefan Metzmacher <metze@samba.org>

(This used to be ctdb commit c9b65f3602f51bcbf0e6d82c12076c31e4aebe38)
2011-03-14 13:35:51 +01:00
Michael Adam
7602f9a9af ctdb_ltdb_store_server: delete an empty record that is safe to delete instead of storing locally.
When storing a record that is being migrated off to another node
and has never been migrated with data, then we can safely delete it
from the local tdb instead of storing the record with empty data.

Note: This record is not deleted if we are its lmaster or dmaster.

Pair-Programmed-With: Stefan Metzmacher <metze@samba.org>

(This used to be ctdb commit 3cca0d4b48325d86de2cb0b44bb7811a30701352)
2011-03-14 13:35:50 +01:00
Michael Adam
9e8d6b82b5 server: Use the ctdb_ltdb_store_server() in the ctdb daemon for non-persistent dbs
This is realized by adding a ctdb_ltdb_store_fn function pointer to the db
context and filling it in the attach procedure for non-persistent dbs.

(This used to be ctdb commit df49ec44de80affa5ccc637dec12a20a26e8706e)
2011-03-14 13:35:50 +01:00
Michael Adam
7948be380c server: create a server variant ctdb_ltdb_store_server() of ctdb_ltdb_store().
This is supposed to contain logic for deleting records that are safe
to delete and scheduling records for deletion. It will be called in
server context for non-persistent databases instead of the standard
ctdb_ltdb_store() function.

(This used to be ctdb commit 23631ffc152486aed9ce5b69a391e52bc4947833)
2011-03-14 13:35:50 +01:00
Michael Adam
b9c9b989ce When attaching to a non-persistent DB, initialize the delete_queue.
(This used to be ctdb commit 0aff1b61dd1b683c6739478008a5b014b933df50)
2011-03-14 13:35:45 +01:00
Ronnie Sahlberg
b611de93ad ATTACH_DB: simplify the code slightly and change the semantics to only
refuse a db attach during recovery IF we can associate the request from a
genuine real client instead of deciding this on whether client_id is zero or

This will suppress/avoid messages like these :
DB Attach to database %s refused. Can not match clientid...

(This used to be ctdb commit b05ccf366df985e0a3365aacc75761ebd438deaf)
2011-03-01 12:13:46 +11:00
Ronnie Sahlberg
8acb677c9c Deferred attach : at early startup, defer any db attach calls until we are out of recovery.
(This used to be ctdb commit eeaabd579841f60ab2c5b004cbbb1f5de2bfe685)
2011-03-01 12:13:34 +11:00
Ronnie Sahlberg
e00ca55fa4 Dont return error if trying to set db priority on a db that does not yet exist.
Just treat as a nop.

When the database is created later it will get its priority set properly.

(This used to be ctdb commit 05c934b10ad2690be9d75c9033a0b849bf16455d)
2011-02-25 10:25:01 +11:00
Ronnie Sahlberg
8eb5cfc053 Add support to create TDB databases using the new jenkins hash.
SRVID for the control to attach to a database is used to pass
tdb flags from samba to ctdb when samba attached to a database.
This has been used earlier for TDB_NOSYNC flag.

Add TDB_INCOMPATIBLE_HASH as a supported tdb flag to store in the
SRVID field when attaching to a database.

This allows samba to control if ctdb should create databases using the
new jenkins hash, or using the old hash.
This only affects new databases when they are initially created.
Existing databases remain using the old hash when attached to.

(This used to be ctdb commit e0eda175ac979828b376e8a6779b4608af52eb32)
2010-10-27 08:10:07 +11:00
Rusty Russell
f93440c4b7 event: Update events to latest Samba version 0.9.8
In Samba this is now called "tevent", and while we use the backwards
compatibility wrappers they don't offer EVENT_FD_AUTOCLOSE: that is now
a separate tevent_fd_set_auto_close() function.

This is based on Samba version 7f29f817fa.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>


(This used to be ctdb commit 85e5e760cc91eb3157d3a88996ce474491646726)
2010-08-18 09:16:31 +09:30
Ronnie Sahlberg
641da4c691 We can not be holding a chainlock at this stage, so the tdb_chainunlock() call is bogus
( a child process might be holding the lock, but not the main daemon)

(This used to be ctdb commit 9b4a83e49c5df80df8498b7384c5f53f390c1d9d)
2010-06-09 15:13:22 +10:00
Ronnie Sahlberg
fa618aa66a add additional logging when tdb_chainunlock() fails
so we can see where it was called from when it fails

(This used to be ctdb commit 0c091b3db6bdefd371787d87bc749593ea8e3c76)
2010-06-09 14:37:16 +10:00
Ronnie Sahlberg
4c722fe34c fix a conflict in the merge from rusty
Merge commit 'rusty/ctdb-no-setsched'

Conflicts:

	server/ctdb_vacuum.c

(This used to be ctdb commit b4365045797f520a7914afdb69ebd1a8dacfa0d9)
2009-12-17 08:18:04 +11:00
Rusty Russell
af2613e16f ctdb: use mlockall, cautiously
We don't want ctdb stalling due to paging; this can be far worse than
scheduling delays.  But if we simply do mlockall(MCL_FUTURE), it
increases the risk that mmap (ie. tdb open) or malloc will fail,
causing us to abort.

This patch is a compromise: we mlock all current pages (including
10k of future stack for expansion) and then relock when a client
asks us to open a TDB.  We warn, but don't exit, if it fails.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>


(This used to be ctdb commit 82f778e85440bc713d3f87c08ddc955d3cfce926)
2009-12-16 20:57:20 +10:30
Rusty Russell
f148735928 Add --valgringing flag instead of --nosetsched
The do_setsched was being tested for whether to mmap tdbs: let's make it
explicit.  We can also happily move the kill-child eventscript hack under
this flag.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> 


(This used to be ctdb commit 2ee86cc1f311d7b7504c7b14d142b9c4f6f4b469)
2009-12-16 20:59:15 +10:30
Stefan Metzmacher
f1f0af2b67 server: add CTDB_CONTROL_DB_SET_HEALTHY and CTDB_CONTROL_DB_GET_HEALTH
metze

(This used to be ctdb commit 7332d900538f0cbcd953a723417a0fe31dc9807c)
2009-12-16 08:08:29 +01:00
Stefan Metzmacher
94bc40307a server: Use tdb_check to verify persistent tdbs on startup
Depending on --max-persistent-check-errors we allow ctdb
to start with unhealthy persistent databases.

The default is 0 which means to reject a startup with
unhealthy dbs.

The health of the persistent databases is checked after each
recovery. Node monitoring and the "startup" is deferred
until all persistent databases are healthy.

Databases can become healthy automaticly by a completely
HEALTHY node joining the cluster. Or by an administrator
with "ctdb backupdb/restoredb" or "ctdb wipedb".

metze

(This used to be ctdb commit 15f133d5150ed1badb4fef7d644f10cd08a25cb5)
2009-12-16 08:06:10 +01:00
Stefan Metzmacher
b74918b465 server: open /var/ctdb/state/persistent_health.tdb.X on startup
This node internal tdb will store the HEALTH state of persistent
tdbs.

metze

(This used to be ctdb commit cbda4666be88c11a810a192a70667b57f773ace1)
2009-12-16 08:03:56 +01:00
Stefan Metzmacher
9a96ae0c97 server: only do the mkdir() calls for db_directory* once at the start
metze

(This used to be ctdb commit f30f33685db50860b6cd6fd1b6bdc3066620a78f)
2009-12-16 08:03:56 +01:00
Stefan Metzmacher
cda5884854 server: create tdbs with 0600 permissions in ctdb_local_attach()
metze

(This used to be ctdb commit 6529a1328b9ec304ad306674651b2a67e4426e23)
2009-12-16 08:03:55 +01:00
Stefan Metzmacher
003985acfd ctdb: pass TDB_DISALLOW_NESTING to all tdb_open/tdb_wrap_open calls
metze

Signed-off-by: Stefan Metzmacher <metze@samba.org>

(This used to be ctdb commit 1635e931b909c66eb3b1f5357e3a549b1a0da70d)
2009-12-16 08:03:55 +01:00