samba-mirror

mirror of https://github.com/samba-team/samba.git synced 2025-01-11 05:18:09 +03:00

Author	SHA1	Message	Date
Martin Schwenke	095fac9491	scripts: Rework ctdb-crash-cleanup.sh so that it uses existing functions This improves maintainability. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit e2aaa64925cca359c71520e01a18fc9461b0da4d)	2013-01-08 15:18:47 +11:00
Martin Schwenke	d801b02681	scripts: Make drop_all_public_ips() more robust Incorporate some of the logic from ctdb-crash-cleanup.sh that ensures IPs are deleted even if they have the wrong netmask or are on the wrong interface. Factoring out some of the code will allow it to be used elsewhere. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 03356fd5ae7a3ac35fde0289cbea7c71ecf07367)	2013-01-08 15:18:47 +11:00
Martin Schwenke	4157efdcbb	scripts: debug-hung-script.sh doesn't need functions/loadconfig Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 8507303b525d20c74e8ec4e7c4f5f275945cd3b6)	2013-01-08 15:18:47 +11:00
Martin Schwenke	f5226c9a75	scripts: statd-callout should calculate CTDB_BASE if it is not set Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 376015ba5ad6b7703ae9949a1d40a0c72dfaba0c)	2013-01-08 15:18:46 +11:00
Martin Schwenke	297b98d5b6	eventscripts: Each script should set CTDB_BASE if it is not set This makes it easier to run the scripts externally. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 740ea8ea5084149c8b552a01ee1c98c558b12384)	2013-01-08 15:18:46 +11:00
Martin Schwenke	0eb757329e	scripts: Move drop_all_public_ips() to the functions file ... so it can be improved and used elsewhere. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit b23c30253cc9eb274b895cac0f8c65245ba0a200)	2013-01-08 15:18:46 +11:00
Martin Schwenke	217ad07b72	Eventscripts: Change the default reconfigure action to do nothing A default action of restarting the service doesn't obey the principle of least surprise. It caused the NFS service restart to be implicitly reintroduced. This allows no-op functions to be removed from some eventscripts and service restart functions to be added to others. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit c75b5e5b4d000f5c7dab403df8238ceed390c1c0)	2013-01-07 10:35:39 +11:00
Martin Schwenke	3d408ca1e1	Eventscripts: Do not restart NFS on reconfigure It looks like this restart was accidentally reintroduced in commit fc0678d351187cfa4c71123f97c0f493aacd5d16 when $service_reconfigure became unset so the default action of restarting the service would occur. From there cleanups have explicitly reintroduced it and carried it through the code. Also update the unit tests affected by this change. The restart was originally removed in commit bc481c3f1a44c50648488c4f8a7f15ec395d446f. The default reconfigure action of restarting a service is clearly suboptimal and will be addressed in a separate patch. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 2629de72e1f37b5e46772c2ef8d8d0012fc4ed37)	2013-01-07 10:35:39 +11:00
Martin Schwenke	df7152fe87	Initscript: when checking status, print output of "ctdb ping" if it fails At the moment the caller has no idea why it thinks CTDB isn't running and we can't debug failures... Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 776590bf84d221092298346a28d7fc0552a67c9d)	2013-01-07 10:35:38 +11:00
Michael Adam	b64e237f9b	events/50.samba: fix testparm background update creating the smb.conf cache with "-v" results in a cache file that fails to load with "testparm -s ..." later on due to "copy = " not being processable. (Copying the empty service name fails). Signed-off-by: Michael Adam <obnox@samba.org> (This used to be ctdb commit 81788cfabe960497b050c5ee4e4e487ee061012a)	2013-01-05 01:15:19 +01:00
Martin Schwenke	8fad7670f1	Eventscripts: 10.interface should list configured interfaces The current code lists available interfaces. If IPs are configured in some other way than the public addresses file (e.g. ctdb addip) and their interfaces default to being marked down then, since down interfaces are not available, these interfaces can never be marked up. The configured interfaces should be listed instead. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit d8f010355b715e49709836e057a5d0f110919897)	2012-11-19 15:54:50 +11:00
Martin Schwenke	f082f4006f	Eventscripts: 10.interface startup event should only process interfaces once Provided that monitor_interfaces() sets the state of each interface, there's no need to mark all interfaces as up before running monitor_interfaces() in the startup event. monitor_interfaces() will set the true status of each interface anyway. The duplication is unnecessary and may cause extra action in the recovery daemon because the state of some interfaces is changed an extra time. Instead, add a comment at the top of the loop in monitor_interfaces() to warn against early loop exits. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit f243a916ee71013f7402b9c396c2ead88eb3aab0)	2012-11-14 10:57:48 +11:00
Volker Lendecke	295dfa771a	Avoid a bashism in 60.ganesha This file is #!/bin/sh. On sn-devel at least, with this /bin/sh the shell does not like == for string equality. (This used to be ctdb commit e2213db479129ce9c2b2fb88ec8c53cbd33d54b3)	2012-10-24 18:31:16 +11:00
Martin Schwenke	9f6b30a517	scripts: Refactor logging code in initscript and functions file Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 5ee242c949a98bb7397e0f7368b20d44c06fe772)	2012-10-18 20:05:43 +11:00
Martin Schwenke	ad8eb45fe2	initscript: Check that rc.ctdb is executable before running it Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 59a47c0674bacfebc17a1b44f0244727bf2fa7a4)	2012-10-18 20:05:43 +11:00
Martin Schwenke	66d0aba85b	Revert "Eventscripts - add facility to 10.interface to delete unmanaged IPs" This reverts commit 88f88d86b0d08240f749fb721b8c401c2eeb1099. This is dangerous and, on reflection, I can't see it being useful. There are often permanent IPs on interfaces that CTDB shares with its public IPs. (This used to be ctdb commit 16aba4eb620844626a1c71c58b51658caf44dea6)	2012-10-18 20:05:42 +11:00
Martin Schwenke	34a6c07e99	Eventscripts: "recovered" event should not fail on NATGW failure The recovery process has no protection against the "recovered" event failing, so this can cause a recovery loop. Instead of failing the "recovered" event, add a "monitor" event and fail that instead. In this case the failure semantics are well defined. A separate patch should ban nodes if the "recovered" event fails for an unknown reason. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit eaa7c165f58abd7e259c37d76b7dd37c91e13d9f)	2012-10-18 20:05:42 +11:00
Martin Schwenke	8d7562f3f8	common: Debug ctdb_addr_to_str() using new function ctdb_external_trace() We've seen this function report "Unknown family, 0" and then CTDB disappeared without a trace. If we can reproduce it then this might help us to debug it. The idea is that you do something like the following in /etc/sysconfig/ctdb: export CTDB_EXTERNAL_TRACE="/etc/ctdb/config/gcore_trace.sh" When we hit this error then we call out to gcore to get a core file so we can do forensics. This might block CTDB for a few seconds. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 7895bc003f087ab2f3181df3c464386f59bfcc39)	2012-10-18 20:05:42 +11:00
Michael Adam	6372592982	config/functions: fix a comment ctdb_check_counter_limits does not fail but succeed if count >= limit Signed-off-by: Michael Adam <obnox@samba.org> (This used to be ctdb commit af540ef728303b4a0a188b17c695e9aefab34489)	2012-10-17 21:56:58 +02:00
Amitay Isaacs	cc763c455d	doc: Add info about execute permissions on event scripts Signed-off-by: Amitay Isaacs <amitay@gmail.com> (This used to be ctdb commit 25d886060b138bc5e78fe93d7bebe3990264f29d)	2012-10-17 11:39:39 +11:00
Amitay Isaacs	efe77d0e35	doc: Fix documentation for setup event Signed-off-by: Amitay Isaacs <amitay@gmail.com> (This used to be ctdb commit 36d25e96a2f8ae1461c5a708a2922f0475a39900)	2012-10-17 11:39:39 +11:00
Amitay Isaacs	ce210f6978	scripts: Remove duplicate code from init script to set tunables The tunable variables defined in CTDB configuration file are currently set up from init script as well as part of "setup" event in 00.ctdb eventscript. Remove the duplication of this code and set tunable variables only from setup event. During the "setup" event, it's possible that ctdb tool commands can timeout if CTDB daemon is not ready. To guard against such eventuality, wait till "ctdb ping" command succeeds before executing any other ctdb tool commands. Signed-off-by: Amitay Isaacs <amitay@gmail.com> (This used to be ctdb commit 632c1b9c1cc2e242376358ce49fd2022b3f27aa2)	2012-10-17 11:32:41 +11:00
Martin Schwenke	74843dadad	Eventscripts: Add support for "reconfigure" pseudo-event for policy routing This rebuilds all policy routes and can be used if the configuration changes. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit c185ffd2822fcee26d07398464c59b66c61f53fa)	2012-10-11 12:10:45 +11:00
Martin Schwenke	d33b12a1c5	Eventscripts: Add service-start and service-stop pseudo-events Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit be4ad110ede9981b181ac28f31ffd855a879d5df)	2012-10-10 14:54:53 +11:00
Martin Schwenke	2d719e5c84	eventscripts: Auto-start/stop services in background If $CTDB_SERVICE_AUTOSTARTSTOP="yes" then service start/stop is done in the background with logging. Fix some unit tests for samba and winbind. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 3a3dae4cb5ec8b4b8381a4013adda25b87641f3a)	2012-10-03 08:48:23 +10:00
Martin Schwenke	f3ae31e741	Eventscripts: split 50.samba into 49.winbind and 50.samba winbind and samba can be separately managed. This makes the service starting and stopping code way too complicated, and even adds a small amount of complexity to the monitoring code. The sensible option is to split this eventscript in two. There are two potentially backward incompatible changes here: * Functionality has been removed that allowed 50.samba to manage winbind when CTDB_MANAGES_WINBIND was unset but the smb.conf "security" parameter was set to "ADS" or "DOMAIN". Maintaining this functionality would have required moving the testparm-related code to the functions file, deciding where the cache file should go, and then calling it from both 49.winbind and 50.samba. This feature wasn't of great value and asking administrators to set an extra variable in exchange for code simplicity seems like a reasonable deal. * External code will need to be changed if it calls 50.samba directly with winbind-related expectations. This is fairly obvious! Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 34535ae64420926b9a3bf7d453fed4e6f4c90115)	2012-10-03 08:46:32 +10:00
Martin Schwenke	e2d4250731	Initscript: Kill any existing ctdbd processes if the ping succeeds Initialising a new ctdbd will destroy the Unix domain socket so existing processes will be useless anyway. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 043ef77086797a703aec436a26a05c56a1bcbf2b)	2012-10-02 17:37:53 +10:00
Martin Schwenke	530415b671	Eventscripts: Indent error when a route delete fails in 11.per_ip_routing This puts it under the umbrella of the previous warning that should also have been printed. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 5c3be8f26dcde0b1b3d86928953e74d4a8b35958)	2012-09-11 12:52:22 +10:00
Martin Schwenke	0d35a8c439	eventscripts: 13.per_ip_routing should remove bogus routes on ipreallocated Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit d0d0a6f19960f233224970b8d5d19b0e37222616)	2012-09-11 12:52:22 +10:00
Martin Schwenke	e1348221d6	eventscripts: Print a warning on failure to delete a routing rule del_routing_for_ip() currently fails silently, which could hide real errors. In add_routing_for_ip() we don't want to see any error when calling del_routing_for_ip(), since we don't expect the rule to be there. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 30d69defa7e97ab5e3ba0492a27868dde2616494)	2012-09-11 12:52:22 +10:00
Martin Schwenke	5bbf4b6e30	Eventscripts: 13.per_ip_routing should always fail if config is missing Currently, if the configuration file is specified by $CTDB_PER_IP_ROUTING_CONF but is missing, takeip fails but (the absent) monitor event "succeeds", so the state of a node will flip-flop. Instead of this, if the configuration file is missing then fail early on for all events. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit c64c6c77c3f6aa2898e5a575547b587bea868c76)	2012-07-30 15:57:56 +10:00
Martin Schwenke	ff0830037e	Revert "Eventscripts - make 13.per_ip_routing fail gracefully if config is missing" When the configuration file is missing this causes the node to flip-flop between unhealthy (when takeip fails) and healthy (no monitor event here). Will reimplement this properly. This reverts commit 351ca413eec460330571ca8b01ad269728fe15df. (This used to be ctdb commit 5277d749c9111716fd723647d5421907476422bf)	2012-07-30 15:57:56 +10:00
Martin Schwenke	35de9f2583	Eventscripts: Clean up 11.routing The loops can all be done without cat or grep. The pair of loops in updateip is combined into a single loop. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 96fdda124f5511fb76190e7c7a7f0b98e6b01a31)	2012-07-30 11:24:59 +10:00
Martin Schwenke	748c3d7eb6	Initscript: clean up drop_all_public_ips() This makes the case implicit where $CTDB_PUBLIC_ADDRESSES is unset. This is OK because that's not an interesting code path. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 5b2725d1ae052e848c2487cb10c5393a877d118c)	2012-07-26 22:05:43 +10:00
Martin Schwenke	4d4768ef26	statd-callout: Fix a bug in the calculations of $STATE It is just meant to be even, so divided and multiplied by 2. Use $(( )) to make it more readable. While touching this code, make the related calculation a bit more readable too. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 25d45e69f4ffc2b26061ac13038d52a353e79e61)	2012-07-26 21:24:15 +10:00
Martin Schwenke	6717698cba	Eventscripts: Default route on NAT gateway should have a metric of 10 At the moment routes from 11.routing can fail to be added because they conflict with the default route added by 11.natgw. NAT gateway is meant to be a last resort, so routes from 11.routing should override it. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 624f4677e99ed1710a0ace76201150349b1a0335)	2012-07-26 21:14:58 +10:00
Martin Schwenke	31bdf91933	Eventscripts: Update/remove stale comments in 11.natgw Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 5d713d5e5be67f5914a661694c15d938bd67dea3)	2012-07-26 21:14:58 +10:00
Martin Schwenke	05359689f6	Eventscripts: Retrieve and build NAT gateway details better in 11.natgw * "ctdb natgw" is run twice when it doesn't need to be. * Tweak the parsing of "ctdb natgw" output so that it is done by the shell instead of a bunch of external processes. * Make default NAT gateway be -1, even on error. If the process failed entirely then it could previously be empty. * Streamline the error handling using die() for when there is no NAT gateway. * Downcase script-local variable names. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 630cfe6451ba23d959fa4907fbba42702337ed3b)	2012-07-26 21:14:58 +10:00
Martin Schwenke	e7325ebcd5	Eventscripts: Optimise building the host address in 11.natgw It can be built without forking unnecessary processes. Also downcase variable name because it is local to script. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 34f58a0773618c4508a55ad75fc4602dad5a5f4c)	2012-07-26 21:14:58 +10:00
Martin Schwenke	9a7a199132	Eventscripts: Clean up startup sanity check in 11.natgw Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit f6e421e8bf935cae790a6dc2b861eb9c7f8610b4)	2012-07-26 21:14:57 +10:00
Martin Schwenke	573fb0497a	Eventscripts: remove redundant firewall rules from 11.natgw aeb70c7e7822854eb87873a5c7783e27e6e72318 said it moved these but it redundantly duplicated them instead. That commit also fixed the problem because it moved the rules after delete_all() not out of the startup event as claimed. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 07149edaecb3caa672163e5a3b89715557d5205a)	2012-07-26 21:14:57 +10:00
Martin Schwenke	c0b7fbf2a4	Eventscripts: 11.natgw $CTDB_NATGW_PUBLIC_IP splitting optimisation $CTDB_NATGW_PUBLIC_IP can be split into $_ip and $_maskbits without forking lots of processes. Also "local" isn't supported by POSIX. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit e20fdb974158061f4627d6f360c168d764690e6f)	2012-07-26 21:14:57 +10:00
Martin Schwenke	1ba9fa2e48	Eventscripts: Fix deprecated iptables ! usage This currently causes warnings in the logs. This change is not SLES10-compatible but we already have some other non-SLES10-compatible changes. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 7640352c6697f9d4e0d13afbc8523afc64e7d462)	2012-05-25 15:26:07 +10:00
Ronnie Sahlberg	383711ac82	Merge remote branch 'martins/ganesha' (This used to be ctdb commit f23b5a160184db8c92f8c69307dc4a64adae839d)	2012-05-17 11:48:07 +10:00
Ronnie Sahlberg	dce5969d12	Debug: When scripts hang, we may need to collect additional data in order to debug why the script hung. Break this debug and data collection out into an external script to make it easier to modify what data we need to collect. For now we only collect a pstree so we can see what part of the script we hung in. S1037271 (This used to be ctdb commit 6e68797af67bee36f2bad045f94806e7e98f27e9)	2012-05-17 10:29:03 +10:00
Martin Schwenke	835e0b6d49	Eventscripts: Modernise 60.ganesha to match 60.nfs Originally from Srikrishan Malik <srikrishan.malik@in.ibm.com> with some style changes by me. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 637cab6304dae66b85668506028c76ea1ee88980)	2012-05-16 17:24:21 +10:00
Martin Schwenke	ffbe59bd44	Eventscripts: restart lockd in the background when going unhealthy Sometimes the restart can hang when there are I/O problems. Then the eventscript times out and gets killed so the node is never marked as unhealthy. Restarting in the background avoids this. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 13acd58c41fba1a33894fbd654fed69ea0eac322)	2012-05-16 17:19:55 +10:00
Martin Schwenke	92eb004162	Eventscript functions: add optional version to nfs_check_rpc_service() This can be optional because the 1st item of each action-triple is a test comparison that starts with '-'. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 92f74fd589467b46c758e116e97417edfe8773d7)	2012-05-16 17:05:05 +10:00
Martin Schwenke	fd048a1771	Eventscript functions: add optional version to nfs_check_rpc_service() This can be optional because the 1st item of each action-triple is a test comparison that starts with '-'. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 1957d53b78f101cd0cd37d9705a225deef5174a2)	2012-05-11 10:33:27 +10:00
Martin Schwenke	0c8f785628	Eventscripts: fix basename -> dirname typo I fixed one of these previously but didn't notice this one... :-( Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 0c674efd19368d41d9cc28909d2b16c1af54c86c)	2012-04-27 15:42:42 +10:00
Martin Schwenke	012015b32c	Eventscripts - Fix typo in 13.per_ip_routing support for __auto_link_local__ Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 9542e770a9780740b49122f1f52f08b32eca4b35)	2012-04-27 15:40:43 +10:00
Martin Schwenke	a3ee4a900f	Initscript - add backup of corrupt non-persistent databases Corrupt non-persistent databases never get analysed because ctdbd zeroes them at startup. Modify the initscript so that corrupt non-persistent databases are moved aside to a backup. If the number of backups for a particular database exceeds $CTDB_MAX_CORRUPT_DB_BACKUPS (default 10) then the oldest excess backups are garbage collected. Abstracts from and cleans up the code for checking persistent databases. Logging of related messages is done to syslog or a log file as specified. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 00cd75595685dae829758abf1a4cb644af7ed50e)	2012-03-28 15:02:07 +11:00
Martin Schwenke	2f5cb56017	Eventscripts - make 13.per_ip_routing fail gracefully if config is missing Currently it spews out random messages about the file being missing. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 351ca413eec460330571ca8b01ad269728fe15df)	2012-03-22 15:30:27 +11:00
Martin Schwenke	ac973b34df	Eventscripts - make 13.per_ip_routing try harder to find public_addresses Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit d4621277240721e6d130a930b0100506b64467ea)	2012-03-22 15:30:27 +11:00
Martin Schwenke	020c8190c5	Eventscripts - use set_proc() rather than accessing /proc directly Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit bdb4cdaf2aed79c8de6a8db8c01685b242808310)	2012-03-22 15:30:27 +11:00
Martin Schwenke	4f65737809	Eventscripts - 13.per_ip_routing should use dirname not basename for mkdir Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit d034845ecea66b47004bc73f2554914a397b1c9d)	2012-03-22 15:30:27 +11:00
Martin Schwenke	56d90e930d	Eventscript support - Remove unused interface_modify.sh Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 994492f79275fe84155d842f6bc288c1858217dd)	2012-03-22 15:30:27 +11:00
Martin Schwenke	476cf45049	Eventscript functions - no longer require interface_modify.sh Make add_ip_to_iface() and delete_ip_from_iface() do their own locking so the external script is no longer required. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 93f90caf91246074d9359bf31a39b26212cccc42)	2012-03-22 15:30:27 +11:00
Martin Schwenke	0b2c3d7d24	Eventscript functions - remove now-unused route/IP re-add script logic This is no longer used by 13.per_ip_routing or anything else. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 2a2ea6c61a05af2d0765e964abcc7ef04047431e)	2012-03-22 15:30:26 +11:00
Martin Schwenke	940efdb8e9	Eventscript functions - remove functions only used by 13.per_ip_routing The relevant functions are now in that script. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 45c3476d12bf0f52966b72d286f101fce1382cd2)	2012-03-22 15:30:26 +11:00
Martin Schwenke	95e10b20cb	Eventscripts - redesign and rewrite 13.per_ip_routing The current version is quite difficult to read. This one is hopefully clearer. Major changes: * The configuration file has a more forgiving syntax. Items can be separated by arbitrary whitespace. * Mappings between IP addresses and table IDs are no longer stored in files in a state directory. Instead they are stored in /etc/iproute2/rt_tables as mappings between table IDs and labels, as allowed by the ip command. The current structure of the labels is ctdb.<source-ip>. This means that once the labels are setup the routing tables can be referenced by just knowing the source IP. As with the old state directory, mappings in this file owned by CTDB are deleted when CTDB shuts down. * There are no release or re-add scripts. - Release scripts are not necessary as an optimisation because of the previous improvement (i.e. use of rt_tables). No lookup is necessary to delete rules or flush tables. - Re-add scripts are no longer used. Routes can still go missing when removal of a primary IP from an interface (or similar) causes removal of all other addresses (i.e. secondaries) and also all associated routes. However, any missing routes are now re-added in the "ipreallocated" event. This happens shortly after takeip/releaseip/updateip and means that the routes will only be re-added once. The window for missing routes is slightly bigger but is not expected to be significant. * The magic "__auto_link_local__" configuration value no longer causes a dynamic configuration file to be maintained in a state directory. The link local configuration is now generated when needed from the public_addresses file. This greatly simplifies the code. This approach is slightly less efficient but should not be significant. The above changes mean that, apart from maintaining mappings in the rt_tables file, there are no state files kept anymore. Some utility functions only used by this script have been rewritten and moved into this script. They will be removed from the functions file by a future commit. The route re-add code will also be removed from interface_modify.sh by a future commit. It is currently harmless. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 0f7cbbb55f26cf3c953e98fe5e7eaa12f59fbf78)	2012-03-22 15:30:26 +11:00
Martin Schwenke	0d67779c67	Eventscript functions - add new function die() Args: 1. Error message to be printed. 2. Optional exit code (default 1) Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 97b0c138cb97e30db27c40b4ee1481109ae90c78)	2012-03-22 15:30:26 +11:00
Ronnie Sahlberg	81fb334cff	when shutting down ctdb, allow it 30 seconds instead of 10 before we kill -9 the daemon (This used to be ctdb commit d8b400d76665f37ffd9de302eedcff9f23807225)	2012-02-21 19:02:36 +11:00
Ronnie Sahlberg	e3b85bba3f	Add a hook to the ctdb initscript that we can call out to for applications that want to track and produce audit logs when someone runs "service ctdb <something>" S1033891 (This used to be ctdb commit 4f4fbd4080a3a7226d3b82637f803c4b71217d39)	2012-02-06 12:07:08 +11:00
Mathieu Parent	956f06f3ae	Fix ctdb-crash-cleanup sysconfig handling (This used to be ctdb commit 667b174d605646b53f4855e9aaf5f8ce4fdde532)	2011-12-06 11:55:46 +11:00
Martin Schwenke	162ac70f9e	Eventscripts - add facility to 10.interface to delete unmanaged IPs For a number of reasons (delip failure, admin stupidity, ...) an interface that hosts public addresses can also contain spurious, unmanaged addresses. Add functionality to 10.interfaces, controlled by new configuration variable CTDB_DELETE_UNEXPECTED_IPS, to delete these addresses when encountered as part of a monitor event. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 88f88d86b0d08240f749fb721b8c401c2eeb1099)	2011-11-17 16:47:00 +11:00
Martin Schwenke	ba5e5f51cf	Eventscripts - remove $0 from error messages in 40.fs_use The script name is now prepended to output by ctdbd. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit bfa0fe70db195413a6d7a98f46f7a1270aba678c)	2011-11-16 16:26:49 +11:00
Martin Schwenke	9187db869e	Eventscripts: Make 40.fs_use use fewer processes and be arguably clearer. * $fs can be parsed using shell prefix and suffix removal. * df output can be parsed with a single call to sed. Failure is indicated by empty output from sed, so we check for that as the error condition, changing the associated message appropriately. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit c5ef0d1440f1d952784cc67946c414d149722d01)	2011-11-16 16:26:45 +11:00
Mathieu Parent	4617a0e9cf	config can be in /etc/default/ instead of /etc/sysconfig/ ... on Debian systems and derivatives. (ctdb_diagnostics still hardcodes /etc/sysconfig/) (This used to be ctdb commit 1341329f6125d491b82c873f793af819e677f714)	2011-11-08 16:31:15 +11:00
Mathieu Parent	91431262be	config/functions: CTDB_VARDIR is /var/lib/ctdb on Debian-like systems (This used to be ctdb commit 56160eccb62178f645b017b1257677a1e854b2bc)	2011-11-08 16:31:03 +11:00
Mathieu Parent	0250b72a65	Fix bashism in 40.fs_use Also, add -P to df, to avoid multiline on Linux when device name is long (this is the case with LVM) (This used to be ctdb commit f4d5a5810f1a840a41c3541a3b822fce44d41e9a)	2011-10-12 20:08:40 +11:00
Mathieu Parent	a1919fd316	apache's service name is not always httpd Solution 2 of <https://bugzilla.samba.org/show_bug.cgi?id=8317> (This used to be ctdb commit 8b9ac5cd8d867ff4866ac464c570d9293d03a91e)	2011-10-12 20:07:45 +11:00
Mathieu Parent	7f1ff4dbd8	Less verbosity when there is no public addresses file This partially reverts `81eff51`, but still avoids spam. (This used to be ctdb commit e646142f4d28b5401235cd5edee325f7a29f8193)	2011-10-12 20:07:03 +11:00
Martin Schwenke	205c7c7663	Eventscripts - enhance ctdb_replay_monitor_status() Print useful output and return a suitable exit code. The DISABLED and TIMEDOUT statuses use fake negative return codes, and these can't be faked from the shell. So we map DISABLED to OK and TIMEDOUT to ERROR - this should avoid nearly all surprises. When we do this we add a note to the beginning of the output. The alternative is to "fix" ctdbd to use only codes that can actually be returned by shell scripts. However, the reason for using negative codes is probably to distinguish them from real ones... Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit dda44d026e0c1b02feb02185b8c200a542be341a)	2011-08-31 15:34:43 +10:00
Martin Schwenke	aa64622137	Eventscripts - use ctdb scriptstatus -Y when replaying status Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 5be904fb1fbd546618d25509b41ab836db62a70a)	2011-08-30 16:34:43 +10:00
Martin Schwenke	b97625acb6	Eventscripts: add a synchronous synthetic reconfigure event. In the current code services can only be reconfigured asynchronously. This means that configuration file changes can be made, an asynchronous reconfigure event can be triggered, and it always succeeds. Some time later when a service is actually reconfigured then a failure may be seen. This adds a synthetic reconfigure event that reconfigures a service synchronously so that any failure is reported on exit. ctdb_service_check_reconfigure() is essentially reimplemented. If a reconfigure event is in flight and an ipreallocated or monitor event occurs then any scheduled asynchronous reconfigure is deferred until the next monitor cycle. This is to avoid reconfigures trampling on each other. In this case a monitor event will also replay the previous status to try to avoid exposing any temporary instability. If a reconfigure event collides with another reconfigure event it will exit with status 2, indicating that the reconfigure should be retried. The reconfigure event is implemented using a subprocess to control the exit from the synthetic event. As before, if a monitor event causes a scheduled synchronous reconfigure to occur then it will replay the previous status for the service, given that a reconfigure can cause temporary instability. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 220578bfd3507152b29ba4c28942f9d5e8733886)	2011-08-30 14:29:48 +10:00
Martin Schwenke	94c3429567	Eventscripts - call ctdb_check_args() in 00.ctdb This is the first eventscript. Sanity check as early as possible and everyone benefits. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 0564717fcc1e21688ae5dacbd437fd493bcb8853)	2011-08-30 09:33:47 +10:00
Martin Schwenke	bc4e62be85	Eventscripts - call ctdb_check_args() instead of doing hand checking Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit cc5bc1948dcbe8b8b25185260927b94a4b529174)	2011-08-30 09:33:47 +10:00
Martin Schwenke	7980a4cb44	Eventscripts - new function ctdb_check_args() Pass this "$@" to do common eventscript argument checking. For regular use putting this in 00.ctdb would be enough. However, for developer testing it can be useful to call this in other eventscripts. For example, 10.interfaces and 13.per_ip_routing currently check these by hand. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 36de7e7fd6dfeed61ef9977b8d5b568f90a9707b)	2011-08-30 09:33:47 +10:00
Martin Schwenke	63729fc35d	Eventscripts - ctdb_check_tcp_ports() bug fix. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit e8d9c0b251c84d6fdf6ea7d972e5f7d1d0222f9b)	2011-08-30 09:33:47 +10:00
Martin Schwenke	194de8faf8	Eventscripts - fix debugging buglet in ctdb_check_tcp_ports_ctdb() Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 61000e38d6016e58f67e292393756d0bd5262ae5)	2011-08-30 09:33:47 +10:00
Martin Schwenke	9257b57f2c	Eventscripts: New configuration variable CTDB_SERVICE_AUTOSTARTSTOP. Some of the current auto-start/stop logic is broken, particularly for Samba. Fixing it is non-trivial. If $CTDB_SERVICE_AUTOSTARTSTOP is "yes" then auto-start/stop services when told to newly manage or no longer manage them. This defaults to "yes". However, if using a canned configuration file that doesn't set $CTDB_SERVICE_AUTOSTARTSTOP then this stops the auto-start-stop logic from working. Therefore, this works around CQ S1026685 - on the system in question another daemon controls service auto-start/stop and CTDB just gets in the way. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit ef71b8290ae49117d7bcc7166598b77cb64cc8a0)	2011-08-30 09:33:47 +10:00
Martin Schwenke	54402cdff4	Eventscripts - in 60.nfs uniquify the share check directory list There are sites that have multiple entries for the same export. This optimises the share check in this case. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 1ccdae79b64b236fc27f4653606429d73c9c3595)	2011-08-30 09:33:47 +10:00
Ronnie Sahlberg	02ebd35398	Merge remote branch 'martins/eventscripts' (This used to be ctdb commit bb008c01989ebb173a3f095ebd2f90ab54f9da91)	2011-08-17 14:10:04 +10:00
Martin Schwenke	6e7dbf0543	Eventscripts - new default TCP port checker using "ctdb checktcpport" New function ctdb_check_tcp_ports_ctdb(). This should be fast... and is now the default checker. If it fails in an unexpected way we fall back to the nmap and netstat checkers. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit a1e16a707ce204817531a61455000361f972080a)	2011-08-17 14:02:45 +10:00
Martin Schwenke	1374327f6e	Eventscripts - generalise TCP port checking plus new nmap-based checker Split the netstat-specific parts of ctdb_check_tcp_ports() into new function ctdb_check_tcp_ports_netstat(). Implement new ctdb_check_tcp_ports_nmap() function that uses "nmap -PS" to check if the desired ports are listening. ctdb_check_ctdb_ports() now uses new configuration variable CTDB_TCP_PORT_CHECKERS to decide which port checkers to try. Default value is currently "nmap netstat". If nmap is not found then this will fall back to netstat - if logging is at debug level this will also fill the logs with messages saying the nmap checker failed. This indicates that either nmap should be installed or the default value of CTDB_TCP_PORT_CHECKERS should be changed (in a configuration file) to avoid trying to use nmap. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit d9651175b40b9454e7d4e98291955fcf1445085e)	2011-08-17 12:12:20 +10:00
Martin Schwenke	62f654d3d2	Eventscripts - ctdb_check_tcp_ports() only prints netstat output if debugging Use the new debug function to conditionally print the netstat output. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 44c14aeeb11080980fe07c7396d06843a4870747)	2011-08-17 10:39:54 +10:00
Martin Schwenke	86792724a2	Eventscripts - weaken TCP port check message if CTDB has just been started. Sometimes smbd and other services can take a while to start, especially when there is a lot of activity after ctdbd has just started. The TCP port check can then pollute the logs with lots of "ERROR" messages and possibly extra debug. This creates a flag file when a service is started (but not restarted) and this flag is removed the first time that TCP port checks succeed for that service. When a port check fails and the flag file still exists, a less extreme "INFO" message is printed rather than the usual "ERROR" message. This means that until the node actually becomes healthy we see more friendly messages. The subtext is that we're hearing false positive reports "recreates" of CQ S1024874 (samba stopped responding on port 445) quite often when ctdbd is started. This reduces the chances of people reporting such false recreates... Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 571865eb6ef847857129d0b1e2ba5fa7254bfe8c)	2011-08-17 10:39:53 +10:00
Martin Schwenke	5c9fbb55ce	Eventscript functions: optimise ctdb_check_tcp_ports() and add debug. ctdb_check_tcp_ports() runs "netstat -a -t -n" in a loop for each port. There are 2 problems with this: * Netstat is run on each loop iteration when it need only be run once. * The -a option is used to list all connections but the function only cares about the listening ports. There may be many thousands of non-listening ports to grep through. This changes ctdb_check_tcp_ports() to run netstat with the -l option instead of the -a option. It also only runs netstat once before the main loop. When a port is found to not be listening the output of the netstat command is now dumped to help with debugging. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 830355a8b18c53cfcc3ad1e3009bbb1a7a681fa0)	2011-08-17 10:39:53 +10:00
Martin Schwenke	f0f9271301	Eventscripts: add a debug() function and call ctdb_set_current_debuglevel() The debug function passes its arguments to echo if $CTDB_CURRENT_DEBUGLEVEL is >= 4 (i.e. DEBUG). If no args are given then use stdin - this allows the function to be used with here documents. To ensure $CTDB_CURRENT_DEBUGLEVEL is set, ctdb_set_current_debuglevel() is called near the end of the functions file. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 6143483d9f87322578c00f12081e381f425226ca)	2011-08-17 10:39:35 +10:00
Ronnie Sahlberg	ce4555b7a6	don't use too big a persistence timeout value (This used to be ctdb commit 82628e32c431d66b806399ffb9657c3a031f6428)	2011-08-17 10:00:06 +10:00
Martin Schwenke	3e1a0528b8	Eventscripts - conditionally inherit ctdbd debug level in each monitor event Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit a7eebc06f81a7b0a3fba93759bcbdeabc8c2e86e)	2011-08-17 09:14:23 +10:00
Martin Schwenke	171bef3d68	Eventscripts - new function ctdb_set_current_debuglevel() This function ensures that CTDB_CURRENT_DEBUGLEVEL is set. It works like this: 1. If it is already set then do nothing, since it might have been set some other way. The recommended "other way" would be to add a file in rc.local.d/. 2. If it is not set then set it by sourcing /var/ctdb/eventscript_debuglevel. 3. If this file does not exist then create it using output from "ctdb getdebug". If the optional 1st argument is set to "create" then don't source an existing file but create a new one instead - this is useful for creating the file just once in each event run in, say, 00.ctdb. If there's a problem getting the debug level from ctdb then it is silently set to 0 - no use spamming logs if our debug code is broken... Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 93910921c8a25f2b029733cd938069ff7c7bdab7)	2011-08-17 09:00:46 +10:00
Martin Schwenke	430ca2f606	Eventscripts - ensure the statd update-trigger file always exists. See the comment in the code for details. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 8ee9856996a8ec738e9d3ea7f1561605da526b8c)	2011-08-16 13:28:40 +10:00
Martin Schwenke	1452b63d27	Eventscripts: remove "return 0" from 50.samba service_stop(). This potentially masks errors and was basically included by accident. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit e7e4a1b4f31118027fd13a6223192f9957cf2e74)	2011-08-16 13:18:40 +10:00
Ronnie Sahlberg	81292ac0e6	Change the errors for 10.interface to clearly state ERROR: for error messages Update the tests system to catch the new error strings generated by this change (This used to be ctdb commit a2c30d88348da47d1a733a16e4c7d83c3becb6df)	2011-08-15 15:53:04 +10:00
Ronnie Sahlberg	1fb577f4b2	Merge remote branch 'martins/eventscript.10.interface' (This used to be ctdb commit 0d17daab38d4086f922a8006d4c545133adca191)	2011-08-15 15:27:50 +10:00
Ronnie Sahlberg	bc00292cfe	Merge remote branch 'martins/60_nfs_regression' (This used to be ctdb commit 845fb0ba24cf9118470c58fae7103ab8322ce079)	2011-08-15 15:22:20 +10:00
Martin Schwenke	c9d168bbe4	Eventscripts: 10.interfaces - make startup event actually mark interfaces up! The startup event intends to mark interfaces up. However, it doesn't actually do that because $INTERFACES is empty. This uses the function get_all_interfaces() to list the interfaces... and then mark them up. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit fc62bf0975c6059ee467285565d0dc3b4daaf238)	2011-08-12 16:34:34 +10:00
Martin Schwenke	5ab955a73d	Eventscripts: 10.interfaces - startup comment says assume all interfaces good. Interfaces are currently marked down. Mark them up instead, as per the comment... and discussion with Ronnie. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 35942841229cc72ce363a7236aec708f1a33136b)	2011-08-12 16:34:34 +10:00
Martin Schwenke	e7963d8a65	Eventscripts: 10.interfaces - new function get_all_interfaces(). Move existing interface listing code to new function in preparation for using it in startup event. While we're here change the "sort | uniq" into "sort -u" and save some complexity. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit cd1442531ad079b11c60f46ee9d34f5104bef219)	2011-08-12 16:34:34 +10:00
Martin Schwenke	9bdcdb76be	Eventscripts: 10.interface clean-ups - minor tweaks and new comments. * sed can read files, it doesn't need a file piped to it * use $() subshells instead of `` - they seem to quote better in dash * tweak the uniquifying code so that it is easier to read * add comments * remove some extraneous semicolons at ends of lines Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 5f49537889a92c3cb68d9203912188bedf00ecd4)	2011-08-12 16:34:13 +10:00
Martin Schwenke	32fe247e37	Eventscripts: In 60.nfs don't restart NFS when restarting rpc.lockd. This effectively reverts 953dbfbddad656a64e30a6aca115cb1479d11573 and is a policy decision. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 380c9263eb37db5a250264316e250c2160908263)	2011-08-12 16:28:09 +10:00
Martin Schwenke	7c33fb1711	Eventscripts: 10.interface clean-ups - variable name fix-ups. Change most of the uppercase variable names to lowercase for consistency with other variables, readability and so they can be easily distinguished from environment/configuration variables. Change the name of 2 of the variables to add some clarity. Changes are as follows: INTERFACES -> all_interfaces IFACES -> ctdb_interfaces IFACE -> iface I -> i REALIFACE -> realiface Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 7b201c1087b1433cfbc95de76cb4205e484ccd6f)	2011-08-12 15:57:34 +10:00
Martin Schwenke	6fa27bdf18	Eventscripts: 10.interfaces clean-ups - push logic into monitor_interfaces(). The logic in the monitor event itself is very complex. Nearly all of it can go away by adding a single check of $CTDB_PARTIALLY_ONLINE_INTERFACES to the return logic of monitor_interfaces() and reversing the sense of the corresponding check. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit fa93177442c65c2a4eb2d5d5dba0a0da1c486969)	2011-08-12 15:00:03 +10:00
Martin Schwenke	00c4cc6d22	Eventscripts: 10.interfaces clean-up - use more descriptive variable names. The name of variable $ok gives no clue to its meaning/use so this changes that variable to be named $up_interfaces_found. The return logic relating to $ok and $fail is difficult to read, so these variables are given true/false values, allowing the return logic to be simplified. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 3402930319d462eab5525410f6a676952e120182)	2011-08-12 14:49:27 +10:00
Martin Schwenke	bb5db84021	Eventscripts: 10.interfaces cleanup - new functions mark_up(), mark_down(). The same few lines of logic are used every time an interface is marked up or down. This encapsulates those few lines in 2 new functions. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit ab443c4d7d282f282792abc6a6ac224ab06abe30)	2011-08-12 14:43:15 +10:00
Martin Schwenke	1d71dd08e3	Eventscripts: change failure counts and behaviour for statd and nfsd. We reduce the number of failures before attempting a restart. However, after 6 failures we mark the cluster unhealthy and no longer try to restart. If the previous 2 attempts didn't work then there isn't any use in bogging the system down with an attempted restart on every monitor event. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit f654739080b40b7ac1b7f998cacc689d3d4e3193)	2011-08-12 14:16:17 +10:00
Martin Schwenke	398116ff29	Eventscripts: clean up 60.nfs monitor event. This adds a helper function called nfs_check_rpc_service() and uses it to make the monitor event much more readable. An example of usage is as follows: nfs_check_rpc_service "mountd" \ -ge 10 "verbose restart:b unhealthy" \ -eq 5 "restart:b" The first argument to nfs_check_rpc_service() is the name of the RPC service to be checked. The RPC service corresponding to this command is checked for availability using the rpcinfo command. If the service is available then the function succeeds and subsequent arguments are ignored. If the rpcinfo check fails then a failure counter for that particular RPC service is incremented and subsequent arguments are processed in groups of 3: 1. An integer comparison operator supported by test. 2. An integer failure limit. 3. An action string. The value of the failure counter is checked using (1) and (2) above. The first check that succeeds has its action string processed - note that this explains the somewhat curious reverse ordering of checks. In the example above: * If the counter is >= 10 then a verbose message is printed describing the failure, the service is restarted in the background and the node is marked as unhealthy (via an "exit 1" from the function). * If the counter is == 5 then the service is restarted in the background. For more action options please see the code. This also changes the ctdb_check_rpc() function so that it no longer takes a program number to check. It now just takes a real RPC program name that rpcinfo can resolve via /etc/rpc. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 9b66057964756a6245bafb436eb6106fb6a2866e)	2011-08-12 14:16:14 +10:00
Martin Schwenke	1971336200	Eventscripts: fix regression in 60.nfs export checking. Commit 35a60a63a9b5c7d98dde514ae552239506b691c9 introduced a regression, reported by "Jonathan Buzzard" <J.Buzzard@dundee.ac.uk>, as follows: Basically the use of sed in the following code snippet does not work for long exports where exportfs wraps the host or network onto the next line. exportfs | grep -v '^#' | grep '^/' | sed -e 's/[[:space:]][^[:space:]]$//' | ctdb_check_directories The result is that you get lots of blank lines being sent to ctdb_check_directories which causes the host to be marked as unhealthy and then thrashing sets in of the managed IP's making the whole cluster unusable. This tightens up the sed expression so that it is less likely to produce a spurious empty line. It also removes an unnecessary "grep -v". Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit bd39b91ad12fd05271a7fced0e6f9d8c4eba92e6)	2011-08-11 15:01:39 +10:00
Ronnie Sahlberg	f9e58b502f	Merge remote branch 'martins/eventscript.10.interface' (This used to be ctdb commit 84ac667af408816e5508719b9fdb7c5e25408640)	2011-08-11 14:15:22 +10:00
Ronnie Sahlberg	b77a78d809	Merge remote branch 'martins/eventscript_infrastructure' (This used to be ctdb commit 20864822372b6d574c545287002a429b273c4bcc)	2011-08-11 14:01:02 +10:00
Martin Schwenke	088620b026	Eventscripts: in 60.nfs move statd-notify code to service_reconfigure(). This means that it now occurs on every reconfigure event. As a result the ipreallocated event is removed. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit c45a89418ba733ff91d48340d72bdb6d2ef80051)	2011-08-11 13:56:25 +10:00
Martin Schwenke	eef89f83b2	Eventscripts - 60.nfs should define service_reconfigure(). Not $service_reconfigure. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 642292d7ba7a95567964b4160c7ee31a4f8985d1)	2011-08-11 13:55:02 +10:00
Ronnie Sahlberg	53b956fee7	When starting and stopping ctdb through the init-script, make sure we first clear all public ips before we start the daemon, in case they are still hanging around since a previous kill -9 and also make sure we drop them after we have stopped the daemon when shutting down. CQ S1027550 (This used to be ctdb commit 8de5513b3ad89711da845c7588d35b32e2f2acb6)	2011-08-11 11:48:04 +10:00
Martin Schwenke	3a760b09ed	Eventscripts: improvements to ctdb_service_check_reconfigure(). * Make this function applicable to "ipreallocated" event too. * Monitor event should not always succeed just because we reconfigure. If the service was unhealthy before the reconfigure and we end the reconfigure with "exit 0" then we can cause the node's health status to flip-flop. To avoid this we return the status of the service from the previous monitor event. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 21dfcbbdccd906fcd6ab7bba81418ce565bf63aa)	2011-08-11 10:46:57 +10:00
Martin Schwenke	e66a1af9b3	Eventscripts: 50.samba - only start/stop nmbd if $CTDB_SERVICE_NMB set. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit defaec99df8c279d8e315d5010f9146e013afda2)	2011-08-11 10:46:57 +10:00
Martin Schwenke	8fb04d451e	Eventscripts: 50.samba needs null service_reconfigure() function. Samba doesn't need to do anything for configuration changes. It will notice configuration changes and reload automatically. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit de13350c17261032a7468c2cf4d2cf4a8d66a840)	2011-08-11 10:46:57 +10:00
Martin Schwenke	b01d99a8fa	Eventscripts: 40.vsftpd service_stop() no longer /dev/null's output. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit f928c201b6d0e1cd3e5568ae65186e3cee7c4988)	2011-08-11 10:46:57 +10:00
Martin Schwenke	1ea3616dcc	Eventscripts: improvements to 41.httpd. * Reduce the failure counts so that restart attempts happen sooner. * Use service_start() and service_stop() for the restart. ctdb_service_start() resets the failure count, which isn't very useful in this context. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 01776b9f29af9ad5c8534649ece1bd100e450434)	2011-08-11 10:46:56 +10:00
Martin Schwenke	2a14f91722	Eventscript functions: new function ctdb_check_counter(). This should eventually be able to replace ctdb_check_counter_limit() and ctdb_check_counter_equal(), although it doesn't issue warnings like the former. It takes 4 optional arguments: 1. _msg - If "error" then over limit causes an error message and an exit 1. Anything else fails silently but the function returns 1. Default is "error". 2. _op - An integer operator supported by test (e.g. -eq, -ge, -gt). Default is -ge. 3. _limit - Limit for the counter to be used in comparison. Default is $service_fail_limit. 4. _service_name - Used to identify the counter. Default is $service_name. For example: ctdb_check_counter error -ge 5 foo will print a message and exit 1 if the counter for foo is >= 5, whereas ctdb_check_counter check -ge 5 foo will just return 1 if the counter for foo is >= 5, and ctdb_counter_check will print a message and exit 1 if the counter for $service_name is >= $service_fail_limit. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 5b01b7233515669e995e037205796e265643b176)	2011-08-11 10:46:56 +10:00
Martin Schwenke	219c6fd55b	Eventscripts: remove unused remove_ip() function. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 881af7c1417962b9b3ade6565b3e8eb9f9df7a97)	2011-08-11 10:46:56 +10:00
Martin Schwenke	5c948528b5	Eventscripts: startstop_nfs stop no longer redirects output to /dev/null. When stopping (as opposed to restarting) it is useful to see this information. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit a9ab1937239761dc32b143c9d225447bc6f090b4)	2011-08-11 10:46:56 +10:00
Martin Schwenke	caee6f1508	Eventscripts: fix typo in _ctdb_counter_common(). Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit f57d1722b6aa082f3f826171acc57d7d796ea95c)	2011-08-11 10:46:56 +10:00
Martin Schwenke	ab693dbcc0	Eventscripts: improve log messages in ctdb_start_stop_service(). Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 6da7095192fb172a06b434cfb02f4bfa6221b343)	2011-08-11 10:46:56 +10:00
Martin Schwenke	1b956b2b0a	Eventscript functions: fix counter regression. d362be7d32079ac1390d67056ce107bfbca2c937 wasn't well thought out. Subsequent commits depend on ctdb_counter_init() taking an argument, so this makes those cases work. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 05a8fcfbac3da2b5843b31e0fe258255cc761190)	2011-08-11 10:46:56 +10:00
Martin Schwenke	217edfa1c8	Eventscript functions: ctdb_service_check_reconfigure() acts only on monitor. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit beabf506a5eb68fc50fdbf8772c1d2bb0f7951e3)	2011-08-11 10:46:56 +10:00
Martin Schwenke	cd4074d2f8	Eventscripts: make 50.samba use $service_state_dir. Signed-off-by: Martin Schwenke <martin@meltin.net> Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 0f003f05e28037eefdce3a686fcb52cd2289af9d)	2011-08-11 10:46:56 +10:00
Martin Schwenke	3d1f0100be	Eventscripts: update 60.nfs to use ctdb_service_check_reconfigure. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 7c070b0bc86b3b9a91a9dc263b72c0567934535c)	2011-08-11 10:46:56 +10:00
Martin Schwenke	a35138a001	Eventscripts: update 60.nfs to use ctdb_setup_service_state_dir. The state directory basename becomes "nfs" rather than "statd". One line of code is moved from the "startup" event to service_start(). Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit cc4c5c19af7efe01c48f73bb5ec5e607ed79db4c)	2011-08-11 10:46:20 +10:00
Martin Schwenke	d6c5fcfbae	Eventscripts: update 40.vsftpd to use ctdb_service_check_reconfigure. To simplify we also remove the reconfigure from the recovered event because the monitor event will handle this very quickly anyway. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit da3aedd1a472b430b75989d3c157efedd382e327)	2011-08-11 10:46:20 +10:00
Martin Schwenke	4daf8bb1c8	Eventscripts: update 41.httpd to use ctdb_service_check_reconfigure. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 51c45b1c4751af41e5f9fd252763e0025f8cce3a)	2011-08-11 10:46:20 +10:00
Martin Schwenke	820d9b30ea	Eventscripts: rejig the reconfigure infrastructure. * Add an optional service name argument to existing reconfigure functions. * Use function service_reconfigure() instead of variable $service_reconfigure to specify how a service is reconfigured. * New function ctdb_service_check_reconfigure() reconfigures a service if it is flagged for reconfigure. * Remove $service_reconfigure settings from 40.vsftpd and 41.httpd - they're the defaults. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 15d4111d0761d82f57d5d4f0b1227812d14e4d7c)	2011-08-11 10:46:20 +10:00
Martin Schwenke	5b5bd3d27b	Eventscript functions: move flagging of managed services. Move flagging of managed or unmanaged services into ctdb_service_start() and ctdb_service_stop(). That way services will be correctly flagged if they are started from the startup and shutdown events. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 8675744cbd90b5a5095ed6fff7b36ae82004a457)	2011-08-11 10:46:20 +10:00
Martin Schwenke	428e32d647	Eventscript function: change service_start into a function. service_start is currently a variable. This makes passing arguments hard. We change it to be a function and put default definitions into the functions file. We use a convention that if a service name argument is passed to a redefined version of service_start() or service_stop() then it will act unconditionally. If no argument is passed then it can use internal logic to decide if services should really be started. This is useful when a single eventscript handles multiple services. This is a cherry-pick of ae38895 that needed to be reset mid-stream. There is still some breakage following this commit. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 86e4aefed9fd1028660c98e3ea758c2b75ffc1d8)	2011-08-11 10:46:20 +10:00
Martin Schwenke	f60802c776	Eventscript functions: add optional event name argument to fail count functions. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit b14f18649f42aab80ce0336c15ab6159f241c9af)	2011-08-11 10:46:20 +10:00
Martin Schwenke	ea6a53e2b3	Eventscript functions - optimise is_ctdb_managed_service(). This function generates a lot of trace when running under "set -x". This is due to the backward compatibility code. This adds 3 optimisations: 1. Before invoking the backward compatibility code, is_ctdb_managed_service() returns early if the service is listed in $CTDB_MANAGED_SERVICES. 2. ctdb_compat_managed_service() actually now updates $CTDB_MANAGED_SERVICES instead of temporary variable $t. This means that a subsequent call to is_ctdb_managed_service() will short circuit due to optimisation (1). 3. ctdb_compat_managed_service() only adds a service to $CTDB_MANAGED_SERVICES if it is the service being checked by is_ctdb_managed_service(). This stops irrelevant services being added to $CTDB_MANAGED_SERVICES multiple times by multiple calls to is_ctdb_managed_service(). Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 758f4667c60089e09a0439c1eb74f5e426ca5e2e)	2011-08-11 10:46:20 +10:00
Martin Schwenke	6ec2cfc7da	50.samba eventscript should use is_ctdb_managed_service "winbind". Currently it checks $CTDB_MANAGES_WINBIND directly in several places. This doesn't work when someone sets $CTDB_MANAGED_SERVICES directly. This modifies check_ctdb_manages_winbind() so that it returns a condition rather than modifying $CTDB_MANAGES_WINBIND. This makes some code more readable. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 538902fbc1e74134a03987b36b3733ad641f8971)	2011-08-11 10:46:20 +10:00
Martin Schwenke	e96e655430	50.samba eventscript should use is_ctdb_managed_service "samba". Currently it checks $CTDB_MANAGES_SAMBA directly. This doesn't work when someone sets $CTDB_MANAGED_SERVICES directly. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit d8f0f8948abd340088720718fef7dc858661ba23)	2011-08-11 10:46:20 +10:00
Martin Schwenke	45bcf843ec	50.samba eventscript should stop/start services when they become (un)managed. When the value of $CTDB_MANAGES_SAMBA or $CTDB_MANAGES_WINBIND changes (or corresponding changes are made to $CTDB_MANAGED_VERSIONS), the associated service should be started or stopped as necessary. This adds calls to ctdb_start_stop_service() to manage starting/stopping samba and winbind. An associated cleanup is made to the initial checks that one of $CTDB_MANAGES_SAMBA or $CTDB_MANAGES_WINBIND is set, replacing them with calls to is_ctdb_managed_service(). To handle the winbind cases ctdb_start_stop_service() and is_ctdb_managed_service() are updated to take an optional service name parameter. Signed-off-by: Martin Schwenke <martin@meltin.net> Conflicts: config/events.d/50.samba Most of this merged elsewhere. This just removes a check that this is the monitor event. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 257a2e350280c0b76ed2fac588cad167381fda52)	2011-08-11 10:46:20 +10:00
Ronnie Sahlberg	21226ee738	Add documentation for the new filesystem use monitoring (This used to be ctdb commit 9f10c5d48a08ffb3417f880c801aed2aa2dc1355)	2011-08-11 10:07:50 +10:00
Ronnie Sahlberg	ee96db07d5	Add new eventscript 40.fs_use that can be used to monitor file system use and flag a node unhealthy when they become full (This used to be ctdb commit 2fd1babf8135ad5d53f3b25ba823d840ebc66460)	2011-08-11 10:04:40 +10:00
Ronnie Sahlberg	c8a18e8f9a	make the persistent even longer for lvs to make people even happier (This used to be ctdb commit 8158077624eb763ba40c6a7b4b7faf3867b205d7)	2011-08-11 09:12:38 +10:00
Ronnie Sahlberg	543701293f	increase the persistent timeout to make people happier (This used to be ctdb commit 68ea19cb02017e93769df7f6312d5e0bef55e605)	2011-08-11 07:14:57 +10:00
Ronnie Sahlberg	f9156adef5	check the shares if they are available before we decide to try to restart nfs CQ S1027529 (This used to be ctdb commit b6c6a4588ccf6ef78fabfd76d228f56b4eb65165)	2011-08-11 07:14:16 +10:00
Martin Schwenke	4e60075228	Eventscripts - fix 10.interface bash incompatibility. In dash, this fails gracefully with nothing to stderr: t=$(cat /does_not_exist) 2>/dev/null In bash the error from cat is still printed due to different order of evaluation. This works everywhere: t=$(cat /does_not_exist 2>/dev/null) Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit a6e61867c7a58d5a77cd8641d8df0b105cddff77)	2011-08-10 16:06:26 +10:00
Martin Schwenke	06f1004da4	Merge branch 'eventscript.20.multipathd' into eventscript.00.ctdb (This used to be ctdb commit 8723b88b0b2bbeece38c74c77c50e8d8b3e2d5ca)	2011-08-10 15:32:58 +10:00
Martin Schwenke	383b203096	Merge branch 'eventscript.62.cnfs' into eventscript.20.multipathd (This used to be ctdb commit fb87fa9273db4f82e801a331b5d95059d64dfb8e)	2011-08-10 15:32:11 +10:00
Martin Schwenke	7eae4aafca	Merge branch 'eventscript.13.per_ip_routing' into eventscript.62.cnfs (This used to be ctdb commit cfa4102ec0d97e1d1d3c1ce6407ffacdb85c2e10)	2011-08-10 15:31:13 +10:00
Martin Schwenke	098da255fa	Eventscripts: update 61.cnfs to use ctdb_setup_service_state_dir. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit afafeb1fb12384bddff470d38b534f513a1f3b07)	2011-08-10 12:27:41 +10:00
Martin Schwenke	061b7adad6	Eventscripts: update 13.per_ip_routing to use ctdb_setup_service_state_dir. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 18e0236754507a9475653f04bb239c5d46ba51de)	2011-08-09 17:35:37 +10:00
Martin Schwenke	609a1e5c77	Eventscripts: update 20.multipathd to use ctdb_setup_service_state_dir. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 797ca65bdd59b14325ffd32b4d4140e9b01dbe71)	2011-08-09 17:28:09 +10:00
Martin Schwenke	f36bae1cbf	Eventscripts: fix dangerous rm -rf in 00.ctdb init event. Also remove some unnecessary absolute paths for commands, which were making the code slightly difficult to read. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 1b3f2dd62efb240f8486016fe0f8dfb73d6ccc66)	2011-08-09 16:48:57 +10:00
Martin Schwenke	dd56cde3ff	Eventscripts: 00.ctdb uses $service_state_dir, neaten update_config_from_tdb(). This also fixes a bug where update_config_from_tdb() used an incorrect filename in one place. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit a5ce2adaa39f077f56582072a97bb64d0eba4b4d)	2011-08-09 16:45:50 +10:00
Martin Schwenke	cbf030a72e	00.ctdb eventscript removes all files from $ctdb_active_dir. Without this you can get into a situation where ctdbd can not start. If the active file for a service exists but the service is not running, then trying to stop the service may fail, causing the eventscript to exit from ctdb_start_stop_service(). Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 28379ca0f747c5952d690a451834ce7421adfd34)	2011-08-09 16:42:27 +10:00
Martin Schwenke	71e9016ec2	Scripts: add note about not using absolute command paths to README. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 87e6a4a23a6ae6c276e9628ce513663f47b4ee77)	2011-08-09 16:36:37 +10:00
Martin Schwenke	d81c1319e9	Add a README to the config/ subdirectory. This includes a comment about using POSIX Bourne shell, including a suggestion not to use "local" variables. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 5ae002c7513b1b2aa5136437a1a19f8cd179b869)	2011-08-09 16:36:37 +10:00
Martin Schwenke	ee38b9a159	Eventscript functions: new function ctdb_setup_service_state_dir(). To be used by eventscripts to create a per-service directory for their own state data. $service_state_dir is set to point to the new directory. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit a273554791c2a5281aee28f8e2be0c514e14c91e)	2011-08-09 16:35:07 +10:00
Martin Schwenke	ec33c04283	Eventscript functions: new functions to remember/check if service managed. This was done ad hoc and was badly named. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 9a084a121f629b2c1bcefc1e4c4a4a5cacf53987)	2011-08-09 16:20:08 +10:00
Martin Schwenke	50dc5b01a4	Scripts: remove absolute paths from interface_modify.sh. The "ip" command is currently run as "/sbin/ip". This makes it impossible to replace with a stub in unit testing. The functions file controls $PATH, so we don't need absolute paths. This replaces the absolute paths... Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 5b4c712aab3edc0059f2e5a6730b7fdcf7e5f4ec)	2011-08-08 15:50:10 +10:00
Martin Schwenke	eec654314a	Eventscripts - Remove local variable usage in 10.interfaces. POSIX sh doesn't have local variables. Debian's dash doesn't behave the same way as bash on this construct: local var=`command that produces multiple words` It only assigns the 1st word and may print an error. Just remove the use of the "local" keyword in monitor_interfaces() to solve this. It isn't actually limiting the scope of any variables that are used outside the function. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 95d9a1e19655461288a2c7e52abf9d01ab23e05a)	2011-08-08 15:44:30 +10:00
Martin Schwenke	72362e7b56	Eventscripts: source a file specified by $CTDB_RC_LOCAL in functions file. Another unit testing hook. This is easier than dropping files into rc.local.d/ and then removing them. The file has to be executable. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit b13ac3bdaf326a6cdfd87da9195eb9630806c418)	2011-08-08 13:51:32 +10:00
Martin Schwenke	394bbe8454	Eventscript functions - use $CTDB_VARDIR instead of local $ctdb_spool_dir. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit d0c6d9b19f0dd8946f9504b0d1cf50dd21f7a592)	2011-08-08 13:21:23 +10:00
Martin Schwenke	b0e7237653	Eventscripts - remove some more absolute paths to commands. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit f5b7cb03aaf19fb4b12fc3f0c14d98ee2d7b0798)	2011-08-04 17:14:11 +10:00
Martin Schwenke	8026b3ce5a	Eventscripts - Rework the use of get_proc() for the bonding checks. Call call_proc(), put the output into a variable and then use it. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 2dfdc997f432d522034922b43cb6f8f878d11ba7)	2011-08-03 20:12:48 +10:00
Martin Schwenke	6fd94af5cc	Eventscripts: update 60.nfs service() start to use set_proc(). Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 70ebb30b90956bb1212287d267ccb72ea83740ca)	2011-08-03 20:01:38 +10:00
Martin Schwenke	4b516600a2	Eventscripts: update 10.interface to use set_proc() and get_proc(). Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 61b7f0172ba5c83c847c29fac3582c25c7754b68)	2011-08-03 19:58:25 +10:00
Martin Schwenke	cfdccc5cac	Eventscripts: use set_proc() in startstop_nfs(). Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 5a3d5c6b1ca3682bb45104e50061871dec6e9b1d)	2011-08-03 19:57:40 +10:00
Martin Schwenke	75bbc93c0b	Eventscripts: remove unnecessary absolute paths from external commands. For eventscript unit testing it will be necessary to override external commands to allow stub implementations to be used. If absolute paths aren't used then this can be done using either a fake bin/ subdirectory or by using shell functions. This removes all of the simple cases of absolute paths. Signed-off-by: Martin Schwenke <martin@meltin.net> Conflicts: config/ctdb.init config/events.d/50.samba Keep old code but remove absolute paths. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 05851d50b0078de8bf4691442d718825adca6fe8)	2011-08-03 17:19:15 +10:00
Martin Schwenke	5f4ab05766	Eventscripts: new functions set_proc() and get_proc(). These provide a thin layer around writing and reading files in /proc. They can be easily replaced by stubs for unit testing. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 637f9d8af517b73c72ed8f3cc2a2661f11eb2126)	2011-08-03 17:04:58 +10:00
Martin Schwenke	571e55ac0d	Eventscripts: remove ctdb_wait_command() and ctdb_wait_tcp_ports() functions. These haven't been used for a long time. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit f5fd361cadb3ea18d29e2d7215a7853718e48d00)	2011-08-03 17:02:41 +10:00
Martin Schwenke	e3a9991e46	Eventscripts: iptables() should put lock in $CTDB_VARDIR. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 3f04793f391c63b78ffb9c9851ab3f0daf3ed50a)	2011-08-03 16:55:43 +10:00
Martin Schwenke	3bbfdfcdd3	Make Emacs recognise that the eventscript functions file is a shell script. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit a6dfb76cfa759f6f9409f24368111c4f85ca0fbf)	2011-08-03 16:49:38 +10:00
Martin Schwenke	3380c6ce1d	Eventscript functions: add $CTDB_ETCDIR and hook service() functions. * $CTDB_ETCDIR defaults to /etc but can be changed for testing. All hard-coded instances of /etc have been changed to $CTDB_ETCDIR. This includes references to /etc/init.d and /etc/sysconfig. * service() and nice_service() functions now call new function _service(). This makes it easier to override these functions (say, in rc.local) for testing and call most of the existing functionality using _service(). Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit f43c9a7604b779bb6257ddb2bf3cbe266d496a63)	2011-08-03 16:45:54 +10:00
Martin Schwenke	d31fbcab4b	Set $CTDB_VARDIR in the functions file. This will be needed when eventscripts that use it are called externally. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit ebd53b66b0cc66d9d04830781886234167fc2164)	2011-08-03 16:44:49 +10:00
Martin Schwenke	652bf326e1	Eventscripts - 10.interfaces should not check orphaned interfaces. If the last IP address on an interface is removed then that interface should no longer be checked by 10.interfaces. However, "ctdb ifaces" still lists such interfaces so they are currently checked. The problem really needs to be addressed in ctdbd but a neat quick eventscript fix will be minimally invasive... This changes the code to use "ctdb -Y ip -v" instead of "ctdb -Y ifaces". The former includes details of all public addresses and associated interfaces, so when an address is removed there is no output for it. This prevents orphaned interfaces from being listed. The logic is also slightly improved so that $IFACES includes just a (non-uniquified) list of interfaces, allowing an existing loop to be removed. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 49b2d1bd9554461ed8edbfc21e777c0eca9e1443)	2011-08-02 16:53:14 +10:00
Ronnie Sahlberg	18af72f08f	change the name for the key for the record where we store the public address config from public-addresses... to public_addresses... CQ1019030 (This used to be ctdb commit 114d5034ff4880848588caf493382a537a1469ae)	2011-06-28 15:40:46 +10:00
Mathieu Parent	c262fe6a8f	Fix bashism ... again ;-) Signed-off-by: Michael Adam <obnox@samba.org> (This used to be ctdb commit 2266586c1839af032622be54dc7f71e39d2bd9ef)	2011-05-14 22:30:25 +02:00
Ronnie Sahlberg	d020b2c950	When using multiple VLANs, some funky stuff can sometimes happen when adding/removing IP addresses, causing routes to be dropped by the system. The easiest workaround for this is to unconditionally try to reapply all static routes for all interfaces once ipreallocation has finished, not just adding them back on the affected interface. This works around a funky issue in CQ S1023538 (This used to be ctdb commit 84600d1f53632d5fe76c308727f31f61b5ec1010)	2011-05-12 12:06:45 +10:00
Ronnie Sahlberg	d1edf44e4f	If samba fails to start for some reason, make this cause the startup event to fail too, so that ctdbd will re-try the startup event later. Or else this will leave samba not running. CQ S1023394 (This used to be ctdb commit f90485b08d32cbe56050718a3b28ca0fe1d64e0f)	2011-05-10 09:59:38 +10:00
Ronnie Sahlberg	ee9e137759	Dont exit from checking interfaces once we have found one interface that is not in use by public addresses. this can happen when we have removed existing interfaces/ip addresses and prevents us from verifying the status of other interfaces (This used to be ctdb commit d67955b42f7627be9dae995230c8fcbb8a948ec2)	2011-05-10 07:53:43 +10:00
Ronnie Sahlberg	2e2e37fdd6	Remove logging of spam/errors from the 10.interface script if/when we have for example NATGW configured but no public addresses defined on that interface CQ S1023378 (This used to be ctdb commit 8837daa424732aeb5a20814b1709c345a97a0e09)	2011-05-09 08:10:49 +10:00
Ronnie Sahlberg	d97e42183e	bonding mode 4 monitoring: we can not just check if MII Status is up for bonding mode 4, since the kernel will always report the bond device as UP even if all cables are disconnected. For mode 4, ignore the status of the bond device and instead check if at least one slave interface is up when determining if the device is good or bad (This used to be ctdb commit a6930cec6d9503dba18b9d4839d87a1c1a8ddba2)	2011-04-13 09:05:58 +10:00
Ronnie Sahlberg	c04505724a	IFACE handling. Assume links are always good on startup (they almost always Simplify the handling of setting the links in the 10.interface eventscript and remove the optimization to only call setifacelink on state change to make the code simpler to read. If a take ip event fails, flag the node as unhealthy. Add a check to the interface script to check if the interface exists or if it has been deleted, so that we can capture and become UNHEALTHY if someone deletes an interface we are using to host public addresses. (This used to be ctdb commit 4ab63d2a7262aff30d5eced184c294c9c9dd4974)	2011-04-11 07:40:05 +10:00
Ronnie Sahlberg	55853a4683	NATGW: dont set arp_ignore in 11.natgw anymore since we no longer need this for the natgw functionality (This used to be ctdb commit bf3bf2967e3781c918e33b3a210e68e0ccca0c51)	2011-04-06 11:33:11 +10:00
Michael Adam	c9dc10292e	ctdb.init: print a warning when tdbdump is found but tdbtool or "tdbtool check" is not available (This used to be ctdb commit afb26e38b617b85cdac14a7cd6dd3c85b8fddbc4)	2011-04-05 13:50:00 +02:00
Michael Adam	faa6d8d7e2	ctdb.init: check for availability of "tdbtool check" and "tdbdump" Print a warning if neither is available. (This used to be ctdb commit 4137d2a7d31cdce22847cebfc0239cfe2d8e937c)	2011-04-05 13:43:56 +02:00
Mathieu Parent	a5a6140b7e	Correction of spelling errors * continous -> continuous * activete -> activate (thanks to lintian) See https://bugzilla.samba.org/show_bug.cgi?id=6935 Signed-off-by: Michael Adam <obnox@samba.org> (This used to be ctdb commit fb6987c2f747d6dbf9bb3899a480124d1c242a90)	2011-03-23 00:35:23 +01:00
Ronnie Sahlberg	a453e79050	50.samba : Tell winbind every time we add/remove an ip from the node CQ S1021636 (This used to be ctdb commit 87b279027616cffbcedfd534ac0032cd51238dfe)	2011-02-18 11:29:35 +11:00
Ronnie Sahlberg	d32a4dd501	remove checking for filesystems and filesystem health from the cnfs script. remove the gpfsmount and gpfsumount entry points (This used to be ctdb commit 7db5a4832a9555be53c301f198f72b9e075a8ae7)	2011-02-18 10:11:56 +11:00
Ronnie Sahlberg	ef0ab7eee1	60.nfs Dont update the statd settings that often. When we have very many nodes and very many ips, this would generate a lot of unnecessary load on the system (This used to be ctdb commit 0c030c9384500f340d8382c20e1e91b11aa377e9)	2011-02-18 10:10:34 +11:00
Martin Schwenke	59c5a9f279	Eventscripts: lower the fail/restart limits for nfsd. We were potentially leaving a node unable to serve requests for too long. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 5be8610ffa33db49e33949560d0ef2fa5f3c0c73)	2011-01-11 16:49:46 +11:00
Martin Schwenke	96378d6dc8	Eventscripts: use "startstop_nfs restart" to reconfigure NFS. This was defaulting to just "service nfs restart", which doesn't have the workarounds we need. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 0f462e9e9fe12b595f3c7452123db8e69548abd6)	2011-01-11 16:49:14 +11:00
Martin Schwenke	3efd5ef77c	Eventscripts: only autostart during a monitor event. Otherwise we might short-circuit events that are run only once and actually need to do something. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit c4f9e8a43540bc049b2771e0a2d76d37b9d17331)	2011-01-11 16:48:50 +11:00
Martin Schwenke	fb8f199651	Eventscripts: print a message when reconfiguring a service. Otherwise there can be strange error messages from services stopping/starting, without any context. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 8bcf7ab164429ddc0ae530133e114f186a8146dd)	2011-01-11 16:48:17 +11:00
Martin Schwenke	934ae76d38	Eventscripts: work around NFS restart failure under load. "service nfs restart" can fail. To stop nfsd it sends a SIGINT and nfsd might take a while to process it if the system is loaded. Starting nfsd may then fail because resources are still in use. This does some /proc magic to tell nfsd to do no more processing. It then runs service stop, kills nfsd with SIGKILL, and then runs service start. This is much less likely to fail. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit a9bf4f82852975b0b627f61ceb2d23401f630805)	2011-01-11 16:47:43 +11:00
Ronnie Sahlberg	47aad74673	TYPO (This used to be ctdb commit 38dc1ac2e87416a22c9356596286b773d601e71c)	2011-01-11 16:17:33 +11:00
Ronnie Sahlberg	2a3442d972	STATD is 100027 not 1000247 (This used to be ctdb commit f4cf15a2b06ffefde0cba803603b48040ad0fa05)	2011-01-11 16:16:28 +11:00
Ronnie Sahlberg	7e747aab8d	60.nfs Check if we have rpc.statd and if not, skip checking for statd availability at all (since we cant restart it, there is no point checking if it is alive) (This used to be ctdb commit 6075e85ba6c0f58fd1ab2ce3b09dd3d6ff491365)	2011-01-06 15:49:15 +11:00
Ronnie Sahlberg	ded7c23122	41.HTTPD Httpd can be very slow to start on some platforms, wait 5 monitor intervals before we try to restart it if it has not bound to port 80 yet. After 10 failed intervals, flag the node as unhealthy. (This used to be ctdb commit 6ec1993aa5f2778b8227ce5f6eca0d19e4ae9788)	2010-12-22 10:31:41 +11:00
Ronnie Sahlberg	e9ff38be7d	60.nfs Try to restart LOCKD after 10 failures and flag the node as unhealthy after 15 failures (This used to be ctdb commit 5a67889c9166835aef3443051812d14af07dfca5)	2010-12-22 10:31:31 +11:00
Ronnie Sahlberg	57e74f6d8a	Dont run net serverid wipe in the background (This used to be ctdb commit 76c515f9f05f4fb5683b5ff65cf136c168fd882f)	2010-12-22 10:31:26 +11:00
Ronnie Sahlberg	97a6eccaf7	50.samba Net serverid wipe can take a bit of time sometimes so background it. Only perform auto start/stop of the managed service on the monitor event (This used to be ctdb commit deba5cbbf7703a1a24ce88a06c73fca056e05521)	2010-12-14 21:19:28 +11:00
Ronnie Sahlberg	1e41ab5fa3	LVS update lvs configuration on ipreallocated events too (This used to be ctdb commit a4e98073d955676fdcbb91affae1de1a733d0bc2)	2010-12-13 14:24:16 +11:00
Ronnie Sahlberg	c26c6a01cf	only run "serverid wipe" if we are actually running samba. we dont need to run this on systems where we do run winbind but not samba (This used to be ctdb commit fcb9e8d1e1c78439ea42adb8b05ad84fbca7f724)	2010-12-10 13:42:12 +11:00
Ronnie Sahlberg	8147d29598	add a missing part of the import of the previous ganesha patch (This used to be ctdb commit 171b8855bb2feae7f7dd6a079571f3113dedd6f4)	2010-12-06 11:50:15 +11:00
Chandra Seetharaman	5e485d5ca0	make changes to ctdb event scripts to support NFS-Ganesha. make changes to ctdb event scripts to support NFS-Ganesha. Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com> Signed-off-by: Ronnie Sahlberg <ronniesahlberg@gmail.com> (This used to be ctdb commit 7298588ed54492f106954c893dd86b0a36783470)	2010-12-06 11:50:12 +11:00
Ronnie Sahlberg	8959c8e850	dont try starting samba through the "init" event (This used to be ctdb commit e314a449606418a4c4eac6eb319bfcdf1c398cd3)	2010-12-03 11:40:38 +11:00
Ronnie Sahlberg	6ed0009125	When we are no longer the natgw master, dont put the natgw ip on loopback. We put the ip on loopback just to make sure we would still interoperate with non-standard configurations on unix-KDC, that are configured to verify the optional HostAddresses field. This is not required for AD, since AD does not use this field, and is replaced in unix land with other/better mechanisms than this "dodgy" check. This makes it "easier" for applications that have bound to the natgw address to detect a socket problem and try to reconnect/recover if the ip address is completely missing from the system. At the same time, use the winbind specific hook that exists to explicitly tell winbindd : this address is gone, so if you have bound to it, this is a good time to close and rebind your socket. cq 1020333 (This used to be ctdb commit 0da94869d2912b2a412ba3fbd2137d88ce4e4389)	2010-11-29 12:45:59 +11:00
Ronnie Sahlberg	ebcc866ae0	update autostart/stop to work for samba (This used to be ctdb commit 37ab57e2adaecc3f7996ea20af45a5df0cd8be76)	2010-11-22 20:42:26 +11:00
Ronnie Sahlberg	a3e7dfadca	add an explicit _is_managed_service to iscsi eventscript (This used to be ctdb commit 44f683a1ba15944d3306a0effd572de3280ff975)	2010-11-18 14:15:56 +11:00
Ronnie Sahlberg	193d9d50d1	Dont pollute the logs with a "file not found" message CQ S1020745 (This used to be ctdb commit ea8bb7b26bb879a895c267d49672433182390d0d)	2010-11-18 13:54:15 +11:00
Martin Schwenke	c00db6f271	60.nfs eventscript should do nothing if NFS isn't managed by CTDB. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 582e5cd077501e8d4131a9c7981781471308edfd)	2010-11-18 13:36:40 +11:00
Martin Schwenke	a2af87482b	Eventscript functions - catch failures in ctdb_service_start(). ctdb_service_start() currently succeeds if ctdb_counter_init() succeeds. This changes it to fail when a service start fails. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit ddb73962d72d933bf0edc28be0dbb45bea7e5ef4)	2010-11-18 12:15:05 +11:00
Martin Schwenke	3ab768e8d4	50.samba eventscript should stop/start services when they become (un)managed. When the value of $CTDB_MANAGES_SAMBA or $CTDB_MANAGES_WINBIND changes (or corresponding changes are made to $CTDB_MANAGED_VERSIONS), the associated service should be started or stopped as necessary. This adds calls to ctdb_start_stop_service() to manage starting/stopping samba and winbind. An associated cleanup is made to the initial checks that one of $CTDB_MANAGES_SAMBA or $CTDB_MANAGES_WINBIND is set, replacing them with calls to is_ctdb_managed_service(). To handle the winbind cases ctdb_start_stop_service() and is_ctdb_managed_service() are updated to take an optional service name parameter. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit d98f175e8420d921a123ae9c0ce00945350b1537)	2010-11-18 12:12:30 +11:00
Ronnie Sahlberg	4fe85e5be5	add a new support function ctdb_check_counter_equal() update nfs to try to restart the service after 10 consecutive failures and to flag the node unhealthy after 15 add similar function to mountd (This used to be ctdb commit 1569a54bb82fc433895ed68f816cf48399ad9d40)	2010-11-17 13:54:57 +11:00
Martin Schwenke	8fe1ec3754	Eventscripts: make loadconfig() function hookable by the test suite. Rename loadconfig() to _loadconfig(). Add a new loadconfig() that simply calls _loadconfig(). This makes it easy for the test suite to override loadconfig(). Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 1d77a3adfff893b3c01b87f791e72c0d3148425c)	2010-11-17 11:46:48 +11:00
Martin Schwenke	e23ca7dba5	Make a time comparison in 60.nfs eventscript more readable. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 26077e6c8eb126584af587e7416154ea4858aea2)	2010-11-17 11:44:26 +11:00
Martin Schwenke	6ab5ae2c9b	60.nfs only fails or warns after 10 consecutive nfsd/statd failures. These failures are sometimes the result of slow restarts so we want to avoid dirtying the logs or marking a node unhealthy because of them, unless they are excessive. For these 2 cases we use the existing fail counting code but hack a temporary service_name in a subshell to allow separate fail counts. We also update ctdb_check_rpc() so that it captures the error output from rpcinfo and we add a message including the service name to the beginning. The error is printed to stdout but is also stored in ctdb_check_rpc_out to allow it to be conditionally used by the caller. This function also now returns non-zero rather than exiting on failure. Other direct rpcinfo calls are replaced by calls to ctdb_check_rpc() for consistency. Option handling code for service restarts is cleaned up so that it fits in 80 columns. A more informative restart message is now used in all cases, printing the exact command being used to start a service. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 79c25fe241cf5d8f92e23d3736823ebaf4e1769d)	2010-11-17 11:43:09 +11:00
Ronnie Sahlberg	055eafb790	this stuff is just so fragile that it will enter infinite recovery and fail loops on any kind of tiny unexpected error unconditionally try to remove ip addresses from both old and new interface before trying to add it to the new interface to make it less fragile (This used to be ctdb commit 80acca2c91c9053c799365bae918db7ed8bdc56f)	2010-11-10 14:55:25 +11:00
Ronnie Sahlberg	ebed26d755	delete from old interface before adding to new interface this stops the script from failing with an error if both interfaces are specified as the same, which otherwise breaks and leads to an infinite recovery loop (This used to be ctdb commit 565de03a784ed441490f8cd0b137b5cec8716d55)	2010-11-10 14:55:25 +11:00
Ronnie Sahlberg	76578b9533	dont delete all ips from the system during the initial "init" event leave any ips as they are and let the recovery daemon remove them as required (This used to be ctdb commit 8ab311719857847b4cf327507b0af1793551e73c)	2010-11-10 14:55:23 +11:00
Ronnie Sahlberg	a1cfa23d60	Both nfs and nfslock scripts can fail under redhat in very rare situations. Ctdb can also be configured to ignore checking for knfsd and if it is alive. In that situation, no attempt will be made to restart nfs, and since nfs is not running, lockd can not be restarted either. To workaround this, every time we try to restart the lockmanager, also try to restart nfsd (This used to be ctdb commit 953dbfbddad656a64e30a6aca115cb1479d11573)	2010-10-28 13:45:40 +11:00
Ronnie Sahlberg	0d75856bb7	When shutting down, we always unconditionally try to remove the natgw address even if we are not currently the natgw master. This adds extra reliability in case we have stopped previously without removing it properly, but does add spam messages to syslog every time we shut down. Remove these spam messages from polluting the syslog upon normal shutdown (This used to be ctdb commit cd84da6f247ee46bbab8318298d1cd3cfc87aba9)	2010-10-28 13:38:07 +11:00
Ronnie Sahlberg	14c8228292	Redirect the output from 00.ctdb pfetch to stdout. Normally, the config.tdb database would not exist, so we do not need to spam syslog with a "config.tdb does not exist" message every time we start ctdb (This used to be ctdb commit 5792809b72e534161c5ca9ef5c9897abcb3b899c)	2010-10-28 13:35:55 +11:00
Stefan Metzmacher	ab6beb6b7f	events.d/11.routing: handle "updateip" event metze (This used to be ctdb commit 034635418c7e5274d6bdf4cccc7a10e3b631e2d4)	2010-10-21 11:09:46 +11:00
Ronnie Sahlberg	b4e3a95039	try to restart NFS LOCKD if it failed to start (This used to be ctdb commit 2913cc93a9a172caf9e0d6675cfa4de4cc957b13)	2010-10-14 08:13:09 +11:00
Ronnie Sahlberg	0de79c12ba	Make sure the statd directory exist before trying to access the "update trigger" file. CQ 1020344 (This used to be ctdb commit 171f98f6f7ce7d01f47c44043ad599702711b12d)	2010-10-12 08:02:18 +11:00
Ronnie Sahlberg	842d9aab4e	move extracting the config from config.tdb for public addresses into its own function (This used to be ctdb commit 2d478a39ed8303b0371112d61630660d12b7db2c)	2010-10-12 02:57:53 +11:00
Ronnie Sahlberg	f7febd28af	dont stop checking interfaces after the first bond device continue the loop to process all other interfaces too (This used to be ctdb commit 500ade4e6a58ea786a665f6be7cf30f43c882570)	2010-10-09 10:55:43 +11:00
Ronnie Sahlberg	51a38dc4a4	Spotted by rusty. Add a missing $ so we delete $_ip and not _ip (This used to be ctdb commit e9d04c5f419eaa0338a3beefba32c52be00242a8)	2010-10-08 15:53:36 +11:00
Ronnie Sahlberg	f5c0539dc6	Change how NATGW is configured to allow special nodes that do not have network connectivity outside of the cluster to still be able to participate in a natgw group. These nodes can not become natgw master since they lack external network connectivity. These nodes are configured just the same way as for any other node with NATGW, with the following two exceptions : * we do NOT set CTDB_NATGW_PUBLIC_IFACE at all on these nodes. since these nodes lack external network we should not check the interface for link. * we must set CTDB_NATGW_SLAVE_ONLY=yes to flag that this is a node that can not become natgw master. (This used to be ctdb commit ab7b00a37e55beffc074be95b55d8a5c7cb9eef2)	2010-09-08 09:20:16 +10:00
Ronnie Sahlberg	dc2f87737d	Dont store temporary runtime data in $CTDB_BASE/state since that will usually be /etc/ctdb/state and storing this under /etc is just wrong. Add a new variable CTDB_VARDIR that defaults to /var/ctdb and store the data there instead. (This used to be ctdb commit 516423c25afa9861d9988096efa8a4a2b12b31b1)	2010-09-03 12:43:28 +10:00
Ronnie Sahlberg	c7df27e32d	make sure all statd state directories exist before we try to reference them or else tar and friends will throw an error in the log (This used to be ctdb commit 96cbd2c0aa9a4641a42b3c33374675fa732ed1e5)	2010-09-01 15:49:57 +10:00
Ronnie Sahlberg	8be5bf1567	dont print a lot of log information about shutting down vsftpd (This used to be ctdb commit 1a41cd7332703629001201eea8ae9b94f1341c9d)	2010-09-01 13:29:38 +10:00
Ronnie Sahlberg	9ef21f1c07	ouch, remove a dummy debug printout that snuck in there somehow (This used to be ctdb commit 14c4d99513b4bdb94f60c3e9c4823e04b0833e60)	2010-08-30 19:48:41 +10:00
Ronnie Sahlberg	2b4d9170c2	Merge commit 'martins/master' (This used to be ctdb commit cc8c851e2e0b46f00b18a6dc61fd2774e97850dd)	2010-08-30 18:22:05 +10:00
Ronnie Sahlberg	12cc826231	Remove the dependency on the underlying cluster filesystem for handling the clusterwide persistent data associated with the lock manager and statd notifications. Use persistent databases to store this data instead of a shared directory. (This used to be ctdb commit fc0678d351187cfa4c71123f97c0f493aacd5d16)	2010-08-30 18:14:41 +10:00
Ronnie Sahlberg	c95f4258d8	Add a new event "ipreallocated" This is called every time a reallocation is performed. While STARTRECOVERY/RECOVERED events are only called when we do ipreallocation as part of a full database/cluster recovery, this new event can be used to trigger on when we just do a light failover due to a node becoming unhealthy. I.e. situations where we do a failover but we do not perform a full cluster recovery. Use this to trigger for natgw so we select a new natgw master node when failover happens and not just when cluster rebuilds happen. (This used to be ctdb commit 7f4c591388adae20e98984001385cba26598ec67)	2010-08-30 18:09:30 +10:00
Martin Schwenke	a104d1d823	NFS tickles: use addtickle/deltickle instead of shared tickle directory. This adds a new function update_tickles() that tracks tickles for a given port using the new ctdb addtickle/deltickle commands. This function is used in events.d/60.nfs to handle NFS tickles. events.d/61.nfstickle is removed. The /proc/sys/net/ipv4/tcp_tw_recycle setup is also moved to events.d/60.nfs. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit dca4c4ebf3c35f8db3ae208efb7a83abbf726ed6)	2010-08-26 14:59:59 +10:00
Ronnie Sahlberg	3edec07807	Add a configuration database, implemented as a persistent database. This database can be used, as an option, to store the public address assignment instead of editing the /etc/ctdb/public-addresses file manually. This configuration is stored in one record per key, with a key-name of public-addresses:node#<pnn> where <pnn> is the node number. The content of this record is the same syntax as the /etc/ctdb/public-addresses file. When ctdbd starts, if this key exists and contains data, it is extracted from the database and compared with the normal file /etc/ctdb/public-addresses. If the content differs, the config database "wins" and is used to overwrite/update the /etc/ctdb/public-addresses file, after which ctdbd is restarted. The main benefit with this option is that it can be used to update the public address configuration for nodes that are offline/unreachable by updating their configuration in the persistent database. Once the offline node is available again, it will resync its databases with the rest of the cluster, find out that the config has changed, apply the changes and restart ctdbd automatically. The command to store the public address configuration for a node into the persistent database is : ctdb pstore config.tdb public-addresses:node#<pnn> <filename> where <pnn> is the node# we wish to update the config for, and <filename> is a file containing the new content for that node's public address configuration. (This used to be ctdb commit 292d7435a360efd7f15a7a99f658a605e07c0a81)	2010-08-25 11:49:56 +10:00
Ronnie Sahlberg	2e8aac6689	Merge commit 'rusty/ports-from-1.0.112' into foo (This used to be ctdb commit 13e58d92f5f1723e850a82ae030d0ca57e89b1ee)	2010-08-19 13:17:56 +10:00
Ronnie Sahlberg	729f1ddea0	On RHEL, "service nfs stop;service nfs start" and "service nfs restart" sometimes (very rarely) fails to restart the service. Add a function to restart NFSd on SLES and RHEL-like systems. If we detect the system is unhealthy due to kNFSd not running, try to restart the service again "service nfs restart" and hope for the best. CQ1019372 (This used to be ctdb commit 25c4ce7e919f13226219f036bcffd2be76b2f06c)	2010-08-19 07:18:22 +10:00
Martin Schwenke	6ce1501aa1	Move NAT gateway firewall rules to recovered|updatenatgw events. The existing code wasn't working as designed in the start event. It should work here. BZ: 62613 Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit aeb70c7e7822854eb87873a5c7783e27e6e72318)	2010-08-18 11:40:07 +09:30
Martin Schwenke	b930c885b3	initscript: wait until we can ping ctdbd before setting tunables. Currently we do a "sleep 1" after starting and before running set_ctdb_variables to set the tunables. This is too arbitrary and might fail if the system is heavily loaded. This, for example, could result in some nodes running with DeterministicIPs and some without, in which case a different IP allocation algorithm would run depending on who is the recmaster! This makes the start function wait until "ctdb ping" succeeds (with 10 second timeout) before trying to run set_ctdb_variables. If a timeout occurs then the start function attempts to kill ctdbd before exiting with a failure. It also cleans up the status reporting code for Red Hat and SUSE so that the final status code is reported. Currently there are cases where a correct status is prematurely reported before a failure occurs. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit cdcd05662a30b51caaeeab4ac44138cac2474e0a)	2010-08-05 15:29:40 +10:00
Martin Schwenke	fe64a8f87a	Optimise 61.nfstickle to write the tickles more efficiently. Currently the file for each IP address is reopened to append the details of each source socket. This optimisation puts all the logic into awk, including the matching of output lines from netstat. The source sockets for each destination IP are written into an array entry and then each array entry is written to the corresponding file in a single operation. Signed-off-by: Martin Schwenke <martin@meltin.net> (This used to be ctdb commit 6549e9b01538998d51a5f72bfc569776d232b024)	2010-07-30 16:50:18 +10:00
Stefan Metzmacher	794230775c	events/10.interface: we need to mark interfaces as "up" if we don't know how to monitor them metze (This used to be ctdb commit 1e08d1578d1960fcfc5fdd85492fbd6d194e5e94)	2010-07-30 16:33:27 +10:00
Stefan Metzmacher	7b1345d446	config/interface_modify.sh: do the echo before running the script metze Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit bb1d2bd31073304fc203868517144f61d12b7fc2)	2010-07-15 15:06:51 +09:30
Stefan Metzmacher	3b9eeb1049	config/interface_modify.sh: before calling a script check if it exists and is executable For non bash shells $_s_script might end with '/*'. We do the workaround this way, because it makes sense to check that a script is executable, before trying to execute it. metze [ This actually applies to any shell -- Rusty Russell ] Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit e665cfde03fc9ec2264e99512ed5470872a2fd04)	2010-07-15 15:06:39 +09:30
Rusty Russell	34ce8a4f02	config: wrap iptables in flock to avoid concurrency. When doing a releaseip event, we do them in parallel for all the separate IPs. This creates a problem for iptables, which isn't reentrant, giving the strange message: iptables encountered unknown error "18446744073709551615" while initializing table "filter" The worst possible symptom of this is that releaseip won't remove the rule which prevents us listening to clients during releaseip, and the node will be healthy but non-responsive. The simple workaround is to flock-wrap iptables. Better would be to rework the code so we didn't need to use iptables in these paths. CQ:S1018353 Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (This used to be ctdb commit 72d6914ee913272312d7b68f1be5ad05ad06587d)	2010-07-15 10:45:24 +09:30

818 Commits