mirror of git://sourceware.org/git/lvm2.git synced 2024-12-22 17:35:59 +03:00

1516 Commits

Author SHA1 Message Date
David Teigland
bbaaf4f1d3 lvmlockd: override unknown lock manager error numbers
When sanlock or dlm lock managers return an error number
that we don't recognize, replace it with a generic -ELMERR
which is defined in the set of special lvmlockd error
numbers.  Otherwise, an unknown lock manager error number
could be mistaken for something else if it happened
to overlap another set of error numbers (which they have
not thus far).
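
The override described above can be sketched as follows. This is only an illustration in Python, not the lvmlockd C code: the numeric value of ELMERR and the contents of KNOWN_LM_ERRORS are assumptions made for the example.

```python
import errno

# Assumption: placeholder value for illustration, not taken from the lvmlockd headers.
ELMERR = 221

# Assumption: an illustrative set of lock manager results the caller recognizes.
KNOWN_LM_ERRORS = {0, -errno.EAGAIN, -errno.ENOENT, -errno.EINVAL}


def normalize_lm_error(rv):
    """Return rv unchanged when it is recognized, otherwise the generic -ELMERR."""
    if rv in KNOWN_LM_ERRORS:
        return rv
    return -ELMERR
```

With this shape, an unexpected value such as -9999 from a lock manager can never be confused with an error number from another set.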
2017-11-17 10:59:12 -06:00
David Teigland
e52d2e3bd8 lvmlockd: retry on other sanlock errors
These less common errors returned from sanlock should
also cause sanlock to retry the lock acquire:

- i/o timeout occurs during sanlock_acquire().
  other i/o on the same disk as the leases can cause
  sanlock i/o timeouts.

- low level disk paxos contention between hosts naturally
  causes one host to not acquire the lease.  There are a
  couple special error numbers associated with these cases
  that should just be recognized as a normal failure to
  acquire the lease.
2017-11-17 10:59:12 -06:00
Zdenek Kabelac
0c9e3e8df2 coverity: add some initializers
Coverity cannot do a deeper analysis, so let's just make the reports
go away and initialize them to 0.
2017-11-07 21:26:11 +01:00
Eric Ren
14d0b0bbdd clvmd: suppress ENOENT error on testing connection
In an HA cluster, we have the "clvm" resource agent to manage the clvmd daemon.
The agent invokes clvmd like: "clvmd -T90 -d0", which always prints
a scary error message:

"""
local socket: connect failed: No such file or directory
"""

When specified with the "-d" option, clvmd tries to check whether an instance
of the clvmd daemon is already running through a testing connection.
The connect() will fail with this ENOENT error in that case, so suppress
the error message.

TODO: add missing error reaction code, since after log_error the program
is not supposed to continue running (log_error() is for reporting
stopping problems).

Signed-off-by: Eric Ren <zren@suse.com>
2017-11-07 21:24:39 +01:00
David Teigland
1b319f39d6 lvmlockd: check error for sanlock access to lvmlock LV
When the sanlock daemon does not have permission to access
the lvmlock LV, make the error messages more helpful.
2017-10-17 13:45:53 -05:00
Zdenek Kabelac
9940c2f754 dmeventd: schedule exit on break
When dmeventd receives SIGTERM/INT/HUP/QUIT it checks whether exit is possible.
If any device was still monitored, such an exit request used to
be ignored/refused. This 'usually' worked reasonably well, however if there
was a very short time period between the last device being unmonitored and signal
reception, the EXIT could be ignored, as dmeventd had
not yet got into the idle state even though commands like 'vgchange -an' had already
finished.

This patch changes logic towards scheduling EXIT to the nearest
point when there is no monitored device.

EXIT is never forgotten.

NOTE: if there is only a single monitored device and someone sends
SIGTERM, and later someone uses e.g. 'lvchange --refresh', then after
unmonitoring dmeventd will exit and a new instance needs to be
started.
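
The scheduling logic can be sketched as a small state machine. This is a hypothetical Python model, not the dmeventd C code; the class and method names are invented for illustration.

```python
class ExitScheduler:
    """Remember an exit request and honor it once nothing is monitored."""

    def __init__(self):
        self.monitored = set()
        self.exit_requested = False

    def monitor(self, dev):
        self.monitored.add(dev)

    def unmonitor(self, dev):
        self.monitored.discard(dev)

    def request_exit(self):
        # Called from signal handling: never refused, only deferred.
        self.exit_requested = True

    def should_exit(self):
        # Checked by the main loop after every state change.
        return self.exit_requested and not self.monitored
```

Because the request is stored instead of being evaluated once at signal time, the race between unmonitoring and signal delivery no longer matters.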
2017-10-05 10:19:21 +02:00
Tony Asleson
32c87d56b1 lvmdbusd: thread stacks dump support
If you send a SIGUSR1 (10) to the daemon it will dump the current
stacks of all threads to stdout.  This will be useful when the
daemon is apparently hung and not processing requests.

eg.
$ sudo kill -10 <daemon pid>
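
A minimal sketch of such a handler, using only the Python standard library. The function names are hypothetical and this is not necessarily how lvmdbusd implements it.

```python
import signal
import sys
import threading
import traceback


def format_thread_stacks():
    """Return the current stack of every thread as one printable string."""
    names = {t.ident: t.name for t in threading.enumerate()}
    lines = []
    for ident, frame in sys._current_frames().items():
        lines.append("Thread %s (%s):" % (ident, names.get(ident, "unknown")))
        lines.extend(entry.rstrip("\n") for entry in traceback.format_stack(frame))
    return "\n".join(lines)


def install_stack_dump_handler(signum=signal.SIGUSR1):
    """Dump all thread stacks to stdout when signum arrives (Unix only).

    SIGUSR1 is number 10 on common Linux architectures, but not everywhere.
    Must be called from the main thread, as signal.signal() requires.
    """
    def handler(_signum, _frame):
        print(format_thread_stacks(), flush=True)

    signal.signal(signum, handler)
```

The standard library's faulthandler.register(signal.SIGUSR1, all_threads=True) offers similar behavior, writing to stderr by default.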
2017-09-27 07:45:00 -05:00
Tony Asleson
60e3dbd6d5 lvmdbusd: Give threads names
This makes debugging easier.
2017-09-27 07:45:00 -05:00
Tony Asleson
2074094e77 lvmdbusd: Main thread exception logging
Make sure that any and all code that executes in the main thread is
wrapped with a try/except block to ensure that at the very least
we log when things are going wrong.
2017-09-27 07:45:00 -05:00
Tony Asleson
bdccab07f9 lvmdbusd: Improve args comparison 2017-09-21 14:35:36 -05:00
Tony Asleson
7a6e438df8 lvmdbusd: Ensure vg_uuid is present
In some cases where there are no VGs, the data returned from
lvm shows that the PVs have the following for the VG:

"vg_name":"[unknown]", "vg_uuid":""

The code was only checking for the existence of the VG name and we called into
the function get_object_path_by_uuid_lvm_id which requires both the VG name and
the LV name to exist (asserts this) which results in the following stack trace:

Traceback (most recent call last):
  File "/home/tasleson/lvm2/daemons/lvmdbusd/utils.py", line 563, in runner
    obj._run()
  File "/home/tasleson/lvm2/daemons/lvmdbusd/utils.py", line 584, in _run
    self.rc = self.f(*self.args)
  File "/home/tasleson/lvm2/daemons/lvmdbusd/fetch.py", line 26, in
		_main_thread_load
    cache_refresh=False)[1]
  File "/home/tasleson/lvm2/daemons/lvmdbusd/pv.py", line 48, in load_pvs
    emit_signal, cache_refresh)
  File "/home/tasleson/lvm2/daemons/lvmdbusd/loader.py", line 37, in common
    objects = retrieve(search_keys, cache_refresh=False)
  File "/home/tasleson/lvm2/daemons/lvmdbusd/pv.py", line 40, in
		pvs_state_retrieve
    p["pv_attr"], p["pv_tags"], p["vg_name"], p["vg_uuid"]))
  File "/home/tasleson/lvm2/daemons/lvmdbusd/pv.py", line 84, in __init__
    vg_uuid, vg_name, vg_obj_path_generate)
  File "/home/tasleson/lvm2/daemons/lvmdbusd/objectmanager.py", line 318,
		in get_object_path_by_uuid_lvm_id
    assert uuid
AssertionError
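
A guard of the kind described can be sketched as below. The helper name pv_has_vg is hypothetical; only the "vg_name" and "vg_uuid" keys come from the output quoted above.

```python
def pv_has_vg(pv):
    """True only when the PV record carries both a VG name and a VG uuid."""
    return bool(pv.get("vg_name")) and bool(pv.get("vg_uuid"))


# The problematic record from the report: a name is present, the uuid is empty.
orphan = {"vg_name": "[unknown]", "vg_uuid": ""}
if pv_has_vg(orphan):
    print("look up the VG object path")
else:
    print("treat the PV as having no VG")
```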
2017-09-21 14:35:36 -05:00
Tony Asleson
e3965d392c lvmdbusd: Fix hang in MThreadRunner
When executing in the main thread, if we encounter an exception we
will bypass the notify_all call on the condition and the calling thread
never wakes up.

    @staticmethod
    def runner(obj):
        # noinspection PyProtectedMember
        # Exception thrown here
        obj._run()
        # So the following code doesn't run, which causes the calling thread to hang
        with obj.cond:
            obj.function_complete = True
            obj.cond.notify_all()

Additionally for some unknown reason the stderr is lost.
Best guess is it's something to do with scheduling a python function
into the GLib.idle_add.  That made finding the issue quite difficult.
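
One way to guarantee the wake-up is a try/finally around the call. The sketch below is a simplified stand-in for MThreadRunner, assuming a plain threading.Condition and no GLib; the class name and its attributes are illustrative, not the actual lvmdbusd code.

```python
import threading


class Runner:
    """Run f(*args) on another thread and always wake up the waiter."""

    def __init__(self, f, *args):
        self.f = f
        self.args = args
        self.rc = None
        self.exception = None
        self.function_complete = False
        self.cond = threading.Condition()

    @staticmethod
    def runner(obj):
        try:
            obj.rc = obj.f(*obj.args)
        except Exception as e:
            # Keep the failure so the caller can log or re-raise it.
            obj.exception = e
        finally:
            # Runs on success and on failure, so the waiter never hangs.
            with obj.cond:
                obj.function_complete = True
                obj.cond.notify_all()

    def wait(self, timeout=None):
        with self.cond:
            return self.cond.wait_for(lambda: self.function_complete, timeout)
```

Storing the exception on the object also avoids depending on stderr, which the message above notes was lost.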
2017-09-21 14:35:36 -05:00
Zdenek Kabelac
03efec2712 daemonize: restore detection of errors
Keep the forked environment for the daemon stricter and check even
for errors that are nearly impossible.
2017-09-06 11:47:53 +02:00
David Teigland
09c792c206 lvmlockd: fix check for no running lock manager
In some cases it was reporting there was no running
lock manager when there was one.
2017-08-29 15:18:12 -05:00
David Teigland
f847fcd31a lvmlockd: print error about starting lock manager
In the case where lvmlockd is running, but no lock manager
is running, we should print a specific error message about
that situation.
2017-08-28 16:24:00 -05:00
Zdenek Kabelac
c8fdc5c087 cleanup: easier to read code
Split into lines for better reading.
2017-08-25 14:20:59 +02:00
Zdenek Kabelac
d79d919329 lvmlockd: log pthread_join errno code
Log possible errno with pthread_join (and one close() instance).
2017-08-25 14:20:59 +02:00
Zdenek Kabelac
da9a8fdedc lvmlockctl: fix check for failing close
close() returns -1 on failure.
2017-08-25 14:20:59 +02:00
Zdenek Kabelac
288e10cf8b lvmlockd: avoid double unlock of client_mutex
Avoid double unlocking of client_mutex:
unlock client_mutex in the 'else' branch
since it's already unlocked in the 'if (cl->dead)' branch.
2017-08-25 14:20:59 +02:00
Zdenek Kabelac
b3b1e788e1 daemonize: more unified code
ATM we have several instances of daemonizing code.
Each has its 'special' logic, so it is not completely easy
to unify them all into a single routine.

Start to unify them and use one strategy for redirecting
all inputs/outputs to /dev/null: use the 'dup2' function for this
and open /dev/null before fork to make sure it's available.
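
The redirection strategy can be sketched in Python with os.open and os.dup2. The function name and the fds parameter are hypothetical, and the fork itself is left out; a daemon would open /dev/null before forking and apply dup2 in the child.

```python
import os


def redirect_to_devnull(fds=(0, 1, 2)):
    """Point each descriptor in fds at /dev/null using dup2."""
    # Open first: if /dev/null is unavailable we fail before touching any fd.
    devnull = os.open(os.devnull, os.O_RDWR)
    try:
        for fd in fds:
            os.dup2(devnull, fd)
    finally:
        # Do not close the descriptor if it is itself one of the targets.
        if devnull not in fds:
            os.close(devnull)
```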
2017-08-25 14:20:57 +02:00
David Teigland
e71c3ff187 lvmlockd: fix mutex unlock
Remove an unwanted pthread_mutex_unlock which would
lead to a double unlock.
2017-08-23 11:30:55 -05:00
David Teigland
46ddd5520c lvmlockd: add comment about temp ls name 2017-08-23 11:25:18 -05:00
Zdenek Kabelac
d4ce98de4d lvmlockd: shorter code
gcc warns here about storing 69 bytes in a 64 byte array (potentially
losing 4 bytes from 'ls->name').

lvmlockd-core.c:2657:36: warning: ‘%s’ directive output may be truncated writing up to 64 bytes into a region of size 60 [-Wformat-truncation=]
  snprintf(tmp_name, MAX_NAME, "REM:%s", ls->name);
                                    ^~
lvmlockd-core.c:2657:2: note: ‘snprintf’ output between 5 and 69 bytes into a destination of size 64
  snprintf(tmp_name, MAX_NAME, "REM:%s", ls->name);
  ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Replaced with slightly better code, but it still lacks an error path for
what to do if the name would be truncated, so added a FIXME.

Also using all bytes for snprintf() buffer size
(as the size is with \0 included)
2017-08-22 10:23:31 +02:00
Zhang Huan
43305ae8da lvmlockd: reduce io impact for finding sanlock lv free lock offset
Currently, lvcreate for sanlock searches for the free lock offset
from the beginning of the lvmlock LV every time.
After thousands of LVs have been created, each lvcreate issues thousands of read
I/Os to find the free lock offset.
Remembering the last free lock offset greatly reduces the impact.
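
The idea can be sketched with slot indexes standing in for on-disk lock offsets. Everything here is hypothetical (the class name, the is_free callback that models one read I/O, and the wrap-around policy); it is not the lvmlockd implementation.

```python
class LockSlotFinder:
    """Find a free slot, starting from where the previous search ended."""

    def __init__(self, num_slots):
        self.num_slots = num_slots
        self.next_hint = 0   # remembered position of the next likely free slot
        self.reads = 0       # counts is_free() calls, i.e. simulated read I/Os

    def find_free(self, is_free):
        # Scan at most num_slots entries, wrapping around past the end.
        for step in range(self.num_slots):
            slot = (self.next_hint + step) % self.num_slots
            self.reads += 1
            if is_free(slot):
                self.next_hint = (slot + 1) % self.num_slots
                return slot
        return None  # every slot is in use
```

When slots are allocated in order, each search costs one read instead of one read per existing LV.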

Signed-off-by: Zhang Huan <zhanghuan@huayun.com>
2017-08-15 11:56:31 -05:00
David Teigland
8ecb5817c7 lvmlockd: global name doesn't apply to sanlock
When adopting locks, we shouldn't skip the special
dlm global lockspace name when using sanlock.
2017-08-07 10:46:03 -05:00
David Teigland
568c7ed6f1 lvmlockd: fix lm running check during adoption
When trying to adopt locks in startup, we want to ignore
a lock manager that isn't running, not fail.
2017-08-07 10:45:59 -05:00
Zdenek Kabelac
92b53a8077 configure: improve test for realtime clock
First check whether we even need to link -lrt, since clock functions
are normally embedded in recent glibc (>=2.17).
Use the standard RT_LIBS name.
Avoid a duplicate test for the realtime clock with lvmlockd.
Show a better error message when realtime clock support is missing or
disabled.
Link RT_LIBS explicitly with lvmlockd and lvmetad.
2017-08-01 14:03:54 +02:00
Zdenek Kabelac
00fdf01d9d makefiles: cleanups 2017-08-01 11:53:32 +02:00
Zdenek Kabelac
2232e82d25 makefiles: fixing linking
Avoid adding -g more than once for debug builds.
Avoid enabling DEBUG_MEM when we build multithreaded tools.
Link executables with -fPIE -pie and --export-dynamic LDFLAGS.
Introduce PROGS_FLAGS to add an option to pass flags for external libs.
Link the lvm2 internal library only when really used.
Link DAEMON_LIBS with daemons.
Pass VALGRIND_CFLAGS internally.
Set shell failure mode in a couple of places.
2017-08-01 11:53:30 +02:00
Zdenek Kabelac
1fd8785ff3 tidy: drop unneeded return 2017-07-20 11:20:22 +02:00
Zdenek Kabelac
0bf836aa14 tidy: prefer not using else after return
clang-tidy: avoid using 'else' after return; it gives more readable code
and also saves an indentation level.
2017-07-20 11:18:29 +02:00
Zdenek Kabelac
567aa60fa1 lvmetad: cleanup
Avoid hash insertion when an object with the same content is already there.
2017-07-17 12:33:17 +02:00
Zdenek Kabelac
f7e62bc55c cleanup: drop extra compare
dm_free() already validates for NULL itself.
2017-07-17 12:32:18 +02:00
Zdenek Kabelac
55f9e2f399 cleanup: shorten dump output code
Save a couple of lines with simpler code.
2017-07-17 12:32:18 +02:00
Zdenek Kabelac
f6c2ee57fa cleanup: drop const from allocated value
Avoid using const for casting to non-const.
2017-07-17 12:32:18 +02:00
Zdenek Kabelac
ba9820b142 numbers: strtod or strtoul need reset of errno
The API for strtod() or strtoul() needs errno to be reset before it's
called. So add the missing resets and also some
errno validation for out-of-range numbers.
2017-07-17 12:32:18 +02:00
Zdenek Kabelac
28e319ddc0 clvmd: fix valgrind memory report
Avoid reading already released memory and do a continue directly.

Invalid read of size 1
   at 0x1201B0: main_loop (clvmd.c:931)
   by 0x11F640: main (clvmd.c:666)
 Address 0x72ddef0 is 32 bytes inside a block of size 224 free'd
   at 0x4C30D18: free (vg_replace_malloc.c:530)
   by 0x54D6FD1: dm_free_wrapper (dbg_malloc.c:357)
   by 0x122E6E: process_work_item (clvmd.c:2034)
   by 0x123003: lvm_thread_fn (clvmd.c:2085)
   by 0x590A3A8: start_thread (pthread_create.c:465)
   by 0x5C3C7FE: clone (in /usr/lib64/libc-2.25.90.so)
 Block was alloc'd at
   at 0x4C2FB6B: malloc (vg_replace_malloc.c:299)
   by 0x54D6EF1: dm_malloc_aux (dbg_malloc.c:286)
   by 0x54D6F1C: dm_zalloc_aux (dbg_malloc.c:291)
   by 0x54D6F96: dm_zalloc_wrapper (dbg_malloc.c:345)
   by 0x11F89C: local_rendezvous_callback (clvmd.c:731)
   by 0x1203D2: main_loop (clvmd.c:964)
   by 0x11F640: main (clvmd.c:666)
2017-07-17 12:30:01 +02:00
Zdenek Kabelac
d7f92ea8ee clvmd: fix valgrind warning
Initialize the mutex before any debugging and fix this report:

Mutex reinitialization: mutex 0x485d20, recursion count 0, owner 1.
   at 0x4C38480: pthread_mutex_init_intercept (drd_pthread_intercepts.c:821)
   by 0x4C38480: pthread_mutex_init (drd_pthread_intercepts.c:830)
   by 0x11F359: main (clvmd.c:562)
mutex 0x485d20 was first observed at:
   at 0x4C38F63: pthread_mutex_lock_intercept (drd_pthread_intercepts.c:885)
   by 0x4C38F63: pthread_mutex_lock (drd_pthread_intercepts.c:898)
   by 0x11E920: debuglog (clvmd.c:254)
   by 0x11F1D8: main (clvmd.c:527)
2017-07-17 12:29:57 +02:00
Zdenek Kabelac
919fa89482 lvmetad: fix memory leaks
Hash tables need to release no longer needed inserted data.
2017-07-17 12:27:53 +02:00
David Teigland
c995e40b63 lvmlockd: use DM_UUID_LEN for buffer size 2017-07-07 15:00:15 -05:00
Huan Zhang
bffae6c985 lvmlockd: miss adopt orphaned resources
1. dm_uuid is 68 bytes long, but buf is 64, which
   causes a uuid mismatch from the lv lock manager
2. no lv lock_type path in dm config, use lock_args instead

Signed-off-by: Zhang Huan <zhanghuan@chinac.com>
2017-07-07 14:58:14 -05:00
Alasdair G Kergon
f2eda36cfa clvmd: Fix client list corruption
Centralise editing of the client list into _add_client() and
_del_client().  Introduce _local_client_count to track the size of the
list for debugging purposes.  Simplify and standardise the various ways
the list gets walked.

While processing one element of the list in main_loop(),
cleanup_zombie() may be called and remove a different element, so make
sure main_loop() refreshes its list state on return.  Prior to this
patch, the list edits for clients disappearing could race against the
list edits for new clients connecting and corrupt the list and cause a
variety of segfaults.

An easy way to trigger such failures was by repeatedly running shell
commands such as:
  lvs &; lvs &; lvs &;...;killall -9 lvs; lvs &; lvs &;...

Situations that occasionally lead to the failures can be spotted by
looking for 'EOF' with 'inprogress=1' in the clvmd debug logs.
2017-07-01 01:34:38 +01:00
Alasdair G Kergon
af789fd6d0 clvmd: add client id to debug log messages
Use standard format to make it easier to find the client to which each
debug log message refers.
2017-07-01 01:17:40 +01:00
Alasdair G Kergon
17ed254091 clvmd: add debuglog mutex
Log messages issued by different threads occasionally got intertwined.
2017-07-01 00:58:39 +01:00
Zdenek Kabelac
a533892cd3 coverity: checked_return of close
Check (or make quiet) close() ret code.
NOTE: there is another duplicated code of daemonize function which
should be converted to libdaemon.
2017-06-28 14:42:11 +02:00
Zdenek Kabelac
664e947726 coverity: add some error path for failed allocs
Coverity reports some unchecked allocations.
2017-06-27 00:27:36 +02:00
Zdenek Kabelac
feed61f3fa libdm: use rounded float for percent print
Use the newly added dm_percent_to_round_float to enhance printing
of percentage values.
2017-06-24 17:44:42 +02:00
Zdenek Kabelac
0016b79c8b dmeventd: improve more raid status reporting
When we want to report primary leg failure, check for the initial 'a',
since otherwise 'Aa idle' is normally visible.

Also reset array of bit flags marking dead devices, once
plugin detects raid is in sync.
2017-06-24 00:06:12 +02:00
Zdenek Kabelac
653bdedb83 raid: plugin does not to use --config
Ignoring suspended devices is already ensured by:

lvm2_disable_dmeventd_monitoring() -> init_run_by_dmeventd() ->
init_ignore_suspended_devices().

In fact plugins should never use --config because it has
some unpleasant technical issues.
2017-06-23 23:32:40 +02:00
Jonathan Brassow
4c0e908b0a RAID (lvconvert/dmeventd): Cleanly handle primary failure during 'recover' op
Add the checks necessary to distinguish the state of a RAID when the primary
source for syncing fails during the "recover" process.

It has been possible to hit this condition before (like when converting from
2-way RAID1 to 3-way and having the first two devices die during the "recover"
process).  However, this condition is now more likely since we treat linear ->
RAID1 conversions as "recover" now - so it is especially important we cleanly
handle this condition.
2017-06-14 08:39:50 -05:00