1
0
mirror of https://github.com/samba-team/samba.git synced 2025-01-11 05:18:09 +03:00
Commit Graph

84 Commits

Author SHA1 Message Date
Andrew Tridgell
a14fd9d29c make sure we don't increment rx_cnt for redirected packets, or for packets that have been requeued after a lockwait
(This used to be ctdb commit 92e5569407dba173a27e9645b4339ce3e2c00520)
2007-05-19 13:45:24 +10:00
Andrew Tridgell
28f2fc669b a better way to resend calls after recovery
(This used to be ctdb commit 444f52e134fc22aaf254d05c86d8b357ded876f4)
2007-05-19 00:56:49 +10:00
Andrew Tridgell
346dfc1bef - up rx_cnt on all packet types
- notice when a node becomes available again

(This used to be ctdb commit e05110dd6112e81f224937dfd7370d963ce9531a)
2007-05-18 23:23:36 +10:00
Ronnie Sahlberg
db4c479568 add dead node detection so that if a node does not generate any
keepalive traffic for x seconds   it is deemed dead


this triggers a recovery after a while if a ctdbd has been STOPPED    
but it doesnt recover automatically when the node reappears

(This used to be ctdb commit d6324afe0d13b5e21d06e347caca433c6b36a32a)
2007-05-18 19:19:35 +10:00
Andrew Tridgell
c105f6d789 - merge from ronnie
- fixed a memory leak found by dmitry

(This used to be ctdb commit ae87bf0005666b50850161c3843d6bc7cb5c8971)
2007-05-16 18:10:26 +10:00
Andrew Tridgell
38491de84f check for error on ctdb_ltdb_store
(This used to be ctdb commit c4a34bac4ad4d2f9699e08074668d25586e3c0da)
2007-05-15 10:16:59 +10:00
Andrew Tridgell
df49a66de4 ensure we propogate the correct rsn for a request dmaster
(This used to be ctdb commit 70c1c67db865db8a49b56e8e3e8fd56ec5063208)
2007-05-12 19:55:18 +10:00
Andrew Tridgell
36ccc10389 make sure we ignore requeued ctdb_call packets of older generations except for packets from the client
(This used to be ctdb commit facab105fbd7fe50f96bdd763ae50ddc54fbdacc)
2007-05-12 18:08:50 +10:00
Andrew Tridgell
63acf8ab95 - merge from ronnie
- increment rsn only in become_dmaster
- add torture check for rsn regression in ctdb_ltdb_store

(This used to be ctdb commit 8047506a08bb53ee01aa64f25c9f72839e1e2d68)
2007-05-11 10:33:43 +10:00
Ronnie Sahlberg
9eeb4f1a51 we must bump the rsn everytime we do a REQ_DMASTER or a REPLY_DMASTER
to make sure that the "merge records based on rsn during recovery" will
merge correctly.

this is extra important since samba3 never bumps the record when it 
writes new data to it !

(This used to be ctdb commit 857e67204065603592c2dbbadbd8667ebba9ccdb)
2007-05-11 06:08:17 +10:00
Andrew Tridgell
f8765b19bf - got rid of the complex hand marshalling in the recovery controls
- fixed the re-send of ctdb calls after a generation change

- fixed a reqid idr leak in controls

- removed the write_record test code

- use the new nonblock lockall code to prevent ctdbd from ever doing a
  blocking lock that could deadlock with smbd

- moved more of the recovery controls into ctdb_recover.c

(This used to be ctdb commit 565a21aa4f1e842309986ab97d6244801153deec)
2007-05-10 17:43:45 +10:00
Andrew Tridgell
15bc97cdaa better timeout handling for calls, controls and traverses
(This used to be ctdb commit 63346a6c59d4821b4c443939b5d88db8cd20f5fe)
2007-05-10 14:06:48 +10:00
Andrew Tridgell
2dc24c7d56 added a hopcount in ctdb_call
(This used to be ctdb commit 36d838801a2a2008c50322cdbfff65a308b1cd1a)
2007-05-01 13:25:02 +10:00
Andrew Tridgell
e21f69107f yay! finally fixed the bug that volker, ronnie and I have been chasing
for 2 days.

The main bug was in smbd, but there was a secondary (and more subtle)
bug in ctdb that the bug in smbd exposed. When we get send a dmaster
reply, we have to correctly update the dmaster in the recipient even
if the original requst has timed out, otherwise ctdbd can get into a
loop fighting over who will handle a key.

This patch also cleans up the packet allocation, and makes ctdbd
become a real daemon.

(This used to be ctdb commit 59405e59ef522b97d8e20e4b14310a217141ac7c)
2007-04-29 16:19:40 +02:00
Andrew Tridgell
1627a5d749 removed unnecessary variable
(This used to be ctdb commit ef0027faa631b00c7fc1a7c4538fbf3080248f0b)
2007-04-28 18:55:37 +02:00
Andrew Tridgell
6e09bfdaf9 much simpler redirect logic
(This used to be ctdb commit 95db9afa7dd039e1700e2f3078782f6ac66e9f51)
2007-04-28 18:18:33 +02:00
Andrew Tridgell
4b6d00974d added status all and debug all control operations
(This used to be ctdb commit 7f902f6c4270adc0543096c78415d335b88d6232)
2007-04-28 17:13:30 +02:00
Andrew Tridgell
353a82f87c factor out the packet allocation code
(This used to be ctdb commit 4cbaf22cc4bede8c95dd9b87e21bbe886307e23b)
2007-04-28 10:50:32 +02:00
Ronnie Sahlberg
916c55ec2f add a generation field to the pdu header.
this will allow a node to verify that a received pdu is sent from a node 
in the same generation instance of a cluster.

(This used to be ctdb commit e32d3ca9a622237c4e2622de98825c0962760d48)
2007-04-28 01:06:26 +10:00
Andrew Tridgell
c1a4b3c687 merge from ronnie
(This used to be ctdb commit 37ef65737571a4290a150c28cf2b0a6b221253fd)
2007-04-26 11:13:49 +02:00
Ronnie Sahlberg
bd62c78154 split the 32bit idr field into two.
store the idr as the high 16 bits and use a rotating counter for the low 
16 bits.

(This used to be ctdb commit 7c763b7b5e6ca54a6df4586893ddaf1b508b4c22)
2007-04-23 18:19:50 +10:00
Ronnie Sahlberg
42971d6565 add a comment that sometimes sending remote calls straight to the
lmaster instead of what the nodes think is the dmaster (which might be 
stale) improve performance.

(This used to be ctdb commit f535f79e6a2a6c6d07141b96e0b957fa93c684f4)
2007-04-23 17:05:09 +10:00
Andrew Tridgell
f651581460 added max_redirect_count status field
(This used to be ctdb commit ecea04741fe552aa409ab165d7c69ead9649986c)
2007-04-22 18:57:22 +02:00
Andrew Tridgell
9e8002dd67 fixed the reverse of the last bug - handle the case when the new dmaster is the lmaster
(This used to be ctdb commit b2599834d2ace7369a1b36f85fdf6eb62f047e30)
2007-04-22 18:19:49 +02:00
Andrew Tridgell
107d91e391 - when handling a record migration in the lmaster, bypass the usual
dmaster request stage, and instead directly send a dmaster
  reply. This avoids a race condition where a new call comes in for
  the same record while processing the dmaster request

- don't keep any redirect records during a ctdb call.  This prevents a
  memory leak in case of a redirect storm

(This used to be ctdb commit 59889ca0fd606c7d2156839383a09dfc5a2e4853)
2007-04-22 14:26:45 +02:00
Andrew Tridgell
520f7971cd - prevent sending dmaster requests to ourselves
- add some debug code

(This used to be ctdb commit 26ad1ec3a3b872520a735e4fe4f224f716643731)
2007-04-21 09:22:46 +02:00
Andrew Tridgell
00c706c2b8 - fixed a problem with packets to ourselves. The packets were being
processed immediately, but the input routines indirectly assumed
  they were being called as a new event (for example, a calling
  routine might queue the packet, then afterwards modify the ltdb
  record). The solution was to make self packets queue via a zero
  timeout.

- fixed unlinking of the socket in a exit in the lockwait code. Needed
  an _exit instead of exit so atexit() doesn't trigger

- print latency of lockwait delays

(This used to be ctdb commit 1b0684b4f6a976f4c5fe54394ac54d121810b298)
2007-04-20 17:58:37 +10:00
Andrew Tridgell
e5c5a91a7b - split out ctdb_ltdb_lock_fetch_requeue() into a simpler
ctdb_ltdb_lock_requeue() and a small wrapper

- use ctdb_ltdb_lock_requeue() to fix a possible hang in
  ctdb_reply_dmaster(), where the ctdb_ltdb_store() could hang waiting
  for a client. We now requeue the reply_dmaster packet until we have
  the lock

(This used to be ctdb commit 97cd7aa09ce3abbb5e3e965c5c81668e0c0133a5)
2007-04-19 17:43:27 +10:00
Andrew Tridgell
273a3944a8 - added a --torture option to all ctdb tools. This sets
CTDB_FLAG_TORTURE, which forces some race conditions to be much more
  likely. For example a 20% chance of not getting the lock on the
  first try in the daemon

- abstraced the ctdb_ltdb_lock_fetch_requeue() code to allow it to
  work with both inter-node packets and client->daemon packets

- fixed a bug left over in ctdb_call from when the client updated the
  header on a call reply

- removed CTDB_FLAG_CONNECT_WAIT flag (not needed any more)

(This used to be ctdb commit 7559dcd184666c3853127e3c8f5baef4fea327c4)
2007-04-19 16:27:56 +10:00
Andrew Tridgell
e830dfd18d much simpler fetch code!
fetch is now confined to the client code, no spcial code at
all in the daemon. 

(This used to be ctdb commit 3ec801c9717e250b902760862df188e03c9bdbf4)
2007-04-19 11:56:37 +10:00
Andrew Tridgell
d0af75d1fa - fully separate the client version of ctdb_call from the daemon
version. The client version is different enough that this is
  worthwhile

- enable local shortcut for client version of ctdb_call

- add idr_find_type(), with better error reporting in case of type
  mismatch

(This used to be ctdb commit 2388094c5f7b6ce003e86b05620c06217d63b49c)
2007-04-19 11:28:01 +10:00
Andrew Tridgell
b79e29c779 - make he packet allocation routines take a mem_ctx, which allows
us to put memory directly in the right context, avoiding quite a few
  talloc_steal calls, and simplifying the code

- make the fetch lock code in the daemon fully async

(This used to be ctdb commit d98b4b4fcadad614861c0d44a3854d97b01d0f74)
2007-04-19 10:37:44 +10:00
Andrew Tridgell
fde5a66531 avoid a deadlock the fetch_lock code. The deadlock could happen when
a client held the chainlock, and the daemon received a dmaster reply
at the same time. The daemon would not be able to process the dmaster
reply, due to the lock, but the fetch lock cannot make progres until
the dmaster reply is processed.

The solution is to not hold the lock in the client while talking to
the daemon. The client has to retry the lock after the record has
migrated. This means that forward progress is not guaranteed. We'll
have to see if that matters in practice.

(This used to be ctdb commit 737e5a1253cb048222c595a474aff71c99fc554f)
2007-04-19 10:03:20 +10:00
Andrew Tridgell
8f059f4d91 - merge volkers debug changes
- fixed memory leaks in the 3 packet receive routines. The problem was
  that the ctdb_call logic would occasionally complete and free a
  incoming packet, which would then be freed again in the packet
  receive routine. The solution is to make the packet a child of a
  temporary context in the receive routine then free that temporary
  context. That allows other routines to keep or free the packet if
  they want to, while allowing us to safely free it (via a free of the
  temporary context) in the receive function

(This used to be ctdb commit 304aaaa7235febbe97ff9ecb43875b7265ac48cd)
2007-04-18 11:20:24 +10:00
Volker Lendecke
27837c197a Clean up the call_states correctly
(This used to be ctdb commit 9fcc40a2ddd8f7f62bdd8b5ab71d182220e23af0)
2007-04-17 23:40:33 +02:00
Volker Lendecke
84d276a5be Some more debug and two memleak fixes
(This used to be ctdb commit 1e2802422794956827263265306952df5e69b377)
2007-04-17 23:03:30 +02:00
Volker Lendecke
6c597d3e83 typo
(This used to be ctdb commit bf2799504498ae452bb7244ae3eb6a51797afe9b)
2007-04-17 21:23:22 +02:00
Andrew Tridgell
7758511568 use the common cmdline code in ctdbd
add a basic debug system with -dXX

(This used to be ctdb commit af9f21cef79f888c57d3b50a23ca787c9567ce60)
2007-04-17 22:13:06 +10:00
Andrew Tridgell
0d91f8043e fixed a missing idr remove, and check the types after idr_find()
(This used to be ctdb commit 74028de89d18bfedcea17415d6d6dc2f7c69b076)
2007-04-17 20:03:01 +10:00
Andrew Tridgell
6fce6e419a update destination in a redirect reply
(This used to be ctdb commit b2836974ad270e823c630e3acf12327b53c37d88)
2007-04-17 17:11:12 +10:00
Andrew Tridgell
eba2a4b88c start using ctdb_ltdb_lock_fetch_requeue()
(This used to be ctdb commit f89ab3a06b4677f56c92768c3a8ae5ec9f5abbc2)
2007-04-17 16:54:03 +10:00
Andrew Tridgell
1a1aedf78f when we get a lmaster request, skip updating the header when we are also the new dmaster
(This used to be ctdb commit 6c48dcc5df7b855fc8e0774c9572c7b2af618348)
2007-04-17 16:35:28 +10:00
Andrew Tridgell
296b0c2a20 - send the record header from the client to the daemon when doing a
fetch, to avoid the daemon re-reading it

- suffix the database name with the node name so that testing on
  loopback doesn't result in a name collision in the database open

(This used to be ctdb commit ad30a4db75450643ff146c40faa306a021de3dd2)
2007-04-17 16:20:32 +10:00
Andrew Tridgell
6f9b29da22 - removed the non-daemon mode from ctdb, in order to simplify the
code. It may be added back later once everything is working nicely,
  or simulated using a in-process pipe instead of a unix domain socket

- rewrote the ctdb_fetch_lock() code to follow the new design

(This used to be ctdb commit 5024dd1f305fe1ecc262db2240c56f773b4f28f0)
2007-04-17 14:52:51 +10:00
Ronnie sahlberg
bccf3c7e8e create symbols for fetch lock response status
(This used to be ctdb commit d8243f474897dc65fb7286225b07bdf48b6faed0)
2007-04-17 12:42:52 +10:00
Ronnie sahlberg
11b5345afc finalize fetch lock changes to get rid of the record handle
(This used to be ctdb commit 36c1e98a5533214d5507699dc5d8bdec35cb28c2)
2007-04-17 12:36:31 +10:00
Ronnie sahlberg
481e029768 initial change to remove store_unlock pdu and use tdb chainlock in the client
(This used to be ctdb commit 87dd265d2c61125ca2fa922cfcf9371a234fff0c)
2007-04-17 11:34:45 +10:00
Andrew Tridgell
34bf25e227 - fix includes to work in both samba4 and ctdb standalone
- when we do a store_unlock the lock record becomes unlocked, so we
  must destroy it (or we leak memory)

(This used to be ctdb commit d85955640e670dd580073da96b25fb8a10c08d18)
2007-04-16 10:21:44 +10:00
Andrew Tridgell
65cdf2297a private -> private_data for samba3
(This used to be ctdb commit 080b6901173afb2ad618dd0621876ff478c7d6e5)
2007-04-13 20:38:24 +10:00
Ronnie sahlberg
03c49c0526 add store_unlock pdu's for the domain socket.
note that the store_unlock does not actually do anything yet apart from passing the pdu from client to daemon   and daemon responds.

next is to make sure the daemon actually stores the data in a database

(This used to be ctdb commit 167d6993e78f6a1d0f6607ef66925a14993ae6a1)
2007-04-13 09:41:15 +10:00