IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
Add a global timeout value to be used for the threads to end waiting for
whatever it is they are blocked on. The values varied from 2-5 seconds,
which is way longer than needed. Value of 0.5 shows no CPU load when
service is running and is idle.
When the daemon is starting we do an initial fetch of lvm state. If we
happened to get some type of failure with lvm during this time we would
exit. During error injection testing this happened enough that
the unit tests were unable to finish. Add retries to ensure we can get
started during error injection testing.
Historically we have seen a few different errors which occur when we call
fullreport. Failing exit code and JSON which is missing one or more keys.
Instruct lvm to dump the debug to a file during fullreport calls when we
fork & exec lvm. If we encounter an error, ouput the debug data.
The reason this isn't being done when lvmshell is used is because we
don't have an easy way to test the error paths.
This change is complicated by the following:
1. We don't know if fullreport was good until we evaluate all the JSON.
This is done a bit after we have called into lvm and returned.
2. We don't want to orphan the debug file used by lvm if the daemon is
killed. Thus we try to minimize the window where the debug file hasn't
already been unlinked. A RFE to pass an open FD to lvm for this
purpose is outstanding.
The temp. file is:
-rw------. 1 root root /tmp/lvmdbusd.lvm.debug.XXXXXXXX.log
Introduce an exception which is used for known existing issues with lvm.
This is used to distinguish between errors between lvm itself and lvmdbusd.
In the case of lvm bugs, when we simply retry the operation we will log
very little. Otherwise, we will dump a full traceback for investigation
when we do the retry.
Lvm occasionally fails to return all the request JSON keys in the output of
"fullreport". This happens very rarely. When it does the daemon was reporting
the resulting informational exception:
MThreadRunner: exception
Traceback (most recent call last):
File "/usr/lib/python3.9/site-packages/lvmdbusd/utils.py", line 667, in _run
self.rc = self.f(*self.args)
File "/usr/lib/python3.9/site-packages/lvmdbusd/fetch.py", line 40, in _main_thread_load
(lv_changes, remove) = load_lvs(
File "/usr/lib/python3.9/site-packages/lvmdbusd/lv.py", line 143, in load_lvs
return common(
File "/usr/lib/python3.9/site-packages/lvmdbusd/loader.py", line 37, in common
objects = retrieve(search_keys, cache_refresh=False)
File "/usr/lib/python3.9/site-packages/lvmdbusd/lv.py", line 95, in lvs_state_retrieve
l['vdo_operating_mode'],
KeyError: 'vdo_operating_mode'
The daemon retries the operation, which usually works and the daemon continues.
However, simply reporting this informational stack trace is causing CI and other
automated tests to fail as they expect no tracebacks in the log output.
Remove the reporting of this code path unless it persists and causes the daemon
to give up and exit.
Ref: https://bugzilla.redhat.com/show_bug.cgi?id=2120267
When the daemon isn't started with --debug we will keep a circular
buffer of the past N number of debug messages which we will output
when we encounter an issue.
Rather than trying to bubble up return codes that get us to exit cleanly
it's better to just raise an exception to bail. In some cases functions
don't have return codes, so they cannot be checked.
When we are walking the new lvm state comparing it to the old state we can
run into an issue where we remove a VG that is no longer present from the
object manager, but is still needed by LVs that are left to be processed.
When we try to process existing LVs to see if their state needs to be
updated, or if they need to be removed, we need to be able to reference the
VG that was associated with it. However, if it's been removed from the
object manager we fail to find it which results in:
Traceback (most recent call last):
File "/usr/lib/python3.6/site-packages/lvmdbusd/utils.py", line 666, in _run
self.rc = self.f(*self.args)
File "/usr/lib/python3.6/site-packages/lvmdbusd/fetch.py", line 36, in _main_thread_load
cache_refresh=False)[1]
File "/usr/lib/python3.6/site-packages/lvmdbusd/lv.py", line 146, in load_lvs
lv_name, object_path, refresh, emit_signal, cache_refresh)
File "/usr/lib/python3.6/site-packages/lvmdbusd/loader.py", line 68, in common
num_changes += dbus_object.refresh(object_state=o)
File "/usr/lib/python3.6/site-packages/lvmdbusd/automatedproperties.py", line 160, in refresh
search = self.lvm_id
File "/usr/lib/python3.6/site-packages/lvmdbusd/lv.py", line 483, in lvm_id
return self.state.lvm_id
File "/usr/lib/python3.6/site-packages/lvmdbusd/lv.py", line 173, in lvm_id
return "%s/%s" % (self.vg_name_lookup(), self.Name)
File "/usr/lib/python3.6/site-packages/lvmdbusd/lv.py", line 169, in vg_name_lookup
return cfg.om.get_object_by_path(self.Vg).Name
Instead of removing objects from the object manager immediately, we will
keep them in a list and remove them once we have processed all of the state.
Ref:
https://bugzilla.redhat.com/show_bug.cgi?id=1968752
When a LV loses an interface it ends up getting removed and recreated.
This happens after the VGs have been processed and updated. Thus when
this happens we need to re-check the VGs.
In some cases we get stuck where we are unable to retrieve the current
state of lvm as we are encountering an error. When the error is
persistent we will log and exit the daemon instead of consuming vast
amounts of resources.
Signed-off-by: Tony Asleson <tasleson@redhat.com>
If during the process of fetching current lvm state we experience an
exception we fail to call set_result on the queued_requests we were
processing. When this happens those threads block forever which causes
the service to stall infinitely. Only clear the queued_requests after
we have called set_result.
In preparation to have more than one thread issuing commands to lvm
at the same time we need to serialize updates to the dbus state and
retrieving the global lvm state. To achieve this we have one thread
handling this with a thread safe queue taking and coalescing requests.
We will fetch the lvm state in non-main thread and only process the new
data with the main thread to prevent hanging the main thread event loop.
ref. https://bugs.freedesktop.org/show_bug.cgi?id=98521