IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
The DM_EVENT_GET_PARAMETERS requests the parameters under which
the running dmeventd is run and the it sends them to caller.
The parameters sent:
- the pid of the running dmeventd
- foreground state
- exec_method (currently either "direct" or "systemd")
The exact message sent back:
pid=<pid> daemon=<no/yes> exec_method=<direct/systemd>
Trying to restart dmeventd as a reload action is causing problems
under systemd environment. The systemd loses track of new dmeventd
this way. See also https://bugzilla.redhat.com/show_bug.cgi?id=1060134
for more info.
We need to call dmeventd -R directly instead of "systemctl reload dm-event.service"
that was used before (the reload is aimed at configuration reload anyway,
not stateful restart of the daemon - we did this before just because
there's no ExecRestart in systemd and there's only ExecStart and
ExecStop with which we'd lose the state).
Also, use ExecStart="dmeventd -f" to run dmeventd in foreground
(and let's rely on systemd to daemonize it) and change the
service type from "forking" to "simple".
The PIE and RELRO compiler/linker options can be used to produce a code
some techniques applied that makes the code more immune to some attacks:
- PIE (Position Independent Executable). It can make use of the ASLR
(Address Space Layout Randomization) provided by kernel to avoid
static locations for .text regions of executables (this is the 'pie'
compiler and linker option)
- RELRO (Relocation Read-Only). This prevents overwrite attacks of
the GOT (Global Offset Table) and PLT (Procedure Lookup Table)
used for relocations by making it read-only after all relocations
are resolved (these are the 'relro' and 'now' linker options) -
hence all symbols are resolved at the very start so there's no
need for those tables to be writeable later.
These compiler/linker options are now used by default for daemons
if the compiler/linker supports it.
Make it easier to run a live lvmetad in debugging mode and
to avoid conflicts if multiple test instances need to be run
alongside a live one.
No longer require -s when -f is used: use built-in default.
Add -p to lvmetad to specify the pid file.
No longer disable pidfile if -f used to run in foreground.
If specified socket file appears to be genuine but stale, remove it
before use.
On error, only remove lvmetad socket file if created by the same
process. (Previous code removes socket even while a running instance
is using it!)
If using lv/vgchange --sysinit -aay and lvmetad is enabled, we'd like to
avoid the direct activation and rely on autoactivation instead so
it fits system initialization scripts.
But if we're calling lv/vgchange --sysinit -aay too early when even
lvmetad service is not started yet, we just need to do the direct
activation instead without printing any error messages (while
trying to connect to lvmetad and not finding its socket).
This patch adds two helper functions - "lvmetad_socket_present" and
"lvmetad_used" which can be used to check for this condition properly
and avoid these lvmetad connections when the socket is not present
(and hence lvmetad is not yet running).
Failures in the temporary mirror used when up-converting cause dmeventd
to issue 'lvconvert --repair' on the sub-LV, <lv_name>_mimagetmp_?. The
'lvconvert' command refuses to deal with this sub-LV outright - it
expects to be given the name of the top-level LV. So, just like we do
with mirrored logs, we strip-off the portion of the name that is not
the top-level LV and issue the command on the top-level LV instead.
This fixes a bug in commit 19baf842 where verify_message
was rejecting the CLVMD_FLAG_REMOTE flag. It was missed
since the patch was ported from an lvm version where that
flag does not exist.
Add LV_TEMPORARY flag for LVs with limited existence during command
execution. Such LVs are temporary in way that they need to be activated,
some action done and then removed immediately. Such LVs are just like
any normal LV - the only difference is that they are removed during
LVM command execution. This is also the case for LVs representing
future pool metadata spare LVs which we need to initialize by using
the usual LV before they are declared as pool metadata spare.
We can optimize some other parts like udev to do a better job if
it knows that the LV is temporary and any processing on it is just
useless.
This flag is orthogonal to LV_NOSCAN flag introduced recently
as LV_NOSCAN flag is primarily used to mark an LV for the scanning
to be avoided before the zeroing of the device happens. The LV_TEMPORARY
flag makes a difference between a full-fledged LV visible in the system
and the LV just used as a temporary overlay for some action that needs to
be done on underlying PVs.
For example: lvcreate --thinpool POOL --zero n -L 1G vg
- first, the usual LV is created to do a clean up for pool metadata
spare. The LV is activated, zeroed, deactivated.
- between "activated" and "zeroed" stage, the LV_NOSCAN flag is used
to avoid any scanning in udev
- betwen "zeroed" and "deactivated" stage, we need to avoid the WATCH
udev rule, but since the LV is just a usual LV, we can't make a
difference. The LV_TEMPORARY internal LV flag helps here. If we
create the LV with this flag, the DM_UDEV_DISABLE_DISK_RULES
and DM_UDEV_DISABLE_OTHER_RULES flag are set (just like as it is
with "invisible" and non-top-level LVs) - udev is directed to
skip WATCH rule use.
- if the LV_TEMPORARY flag was not used, there would normally be
a WATCH event generated once the LV is closed after "zeroed"
stage. This will make problems with immediated deactivation that
follows.
A common scenario is during new LV creation when we need to wipe the
newly created LV and avoid any udev scanning before this stage otherwise
it could cause the device (the LV) to be claimed by some other subsystem
for which there were stale metadata within LV data.
This patch adds possibility to mark the LV we're just about to wipe with
a flag that gets passed to udev via DM_COOKIE as a subsystem specific
flag - DM_SUBSYSTEM_UDEV_FLAG0 (in this case the subsystem is "LVM")
so LVM udev rules will take care of handling that.
The corosync cluster interface for clvmd did not correctly
deal with node up/down events so that when a node was removed
from the cluster clvmd would prevent remote operations
from happening, as it thought the node was up but not
running clvmd.
This patch fixes that code by simplifying the case to node
being up or down - which was the original intention
and is supported by pacemaker and CPG in the higher layers.
Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Add more 'realistic' simulation of dlm locking.
Previous version was not capable to maintain multiple locks.
Current version doesn't handle multiqueues for locks,
so the ordering is different.
The bug addressed by this patch manifested itself during testing
by showing a mirror that never became 'in-sync' after creation.
The bug is isolated to distributions that do not have support
for openAIS checkpointing (i.e. > RHEL6, > F16).
When a node joins a group that is managing a mirror log, the other
machines in the group send it a checkpoint representing the current
state of the bitmap. More than one machine can send a checkpoint,
but only the initial one should be imported. Once the bitmap state
has been imported from the initial checkpoint, operations (such
as resync, mark, and clear operations) can begin. When subsequent
checkpoints are allowed to be imported, it has the effect of erasing
all the log operations between the initial checkpoint and the ones
that follow.
When cmirrord was updated to handle the absence of openAIS
checkpointing (commit 62e38da133),
the new import_checkpoint() function failed to honor the 'no_read'
parameter. This parameter was designed to avoid reading all but
the initial checkpoint. Honoring this parameter has solved the
issue of corrupting bitmap data with secondary checkpoints.
- null_fd resource leak on error path in _reopen_fd_null fn
- dead code in verify_message in clvmd code
- dead code in _init_filter_components in toolcontext code
- null dereference in dm_prepare_selinux_context on error path if
setfscreatecon fails while resetting SELinux context
Check that fields in clvm_header are valid when
local or remote messages are received. If not,
log an error, dump the message data and ignore
the message.
When creating a timeout thread for snapshots, the thread is not
tracked and thus never joined. This means that the exit status
of the timeout thread is held indefinitely. Saves a bit of
memory to set PTHREAD_CREATE_DETACHED when creating this thread.
I've also added pthread_attr_init|destroy to setup the creation
pthread_attr_t.
Reported-by: NeilBrown <neilb@suse.de>
Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
Start separating the validation from the action in the basic lvresize
code moved to the library.
Remove incorrect use of command line error codes from lvresize library
functions. Move errors.h to tools directory to reinforce this,
exporting public versions of the error codes in lvm2cmd.h for dmeventd
plugins to use.
Just to make it more clear and also not to confuse
config_valid with check against config definition
(and its 'valid' flag within the config defintion tree).
Instead of calling syslog() from signal event handler,
run all logging code in the main loop.
Also it needs to take the lock and check for list
only when really needed.
Previously, we have relied on UUIDs alone, and on lvmcache to make getting a
"new copy" of VG metadata fast. If the code which triggers the activation has
the correct VG metadata at hand (the version which is currently on disk), it can
now hand it to the activation code directly.
Switch to use libdm dm_get_status_snapshot() function for
reading status info.
This fixes bug, where the code was using 32bit integers,
while the snapshot target is able to return 64bit sizes.
However this also means, someone is using >1TB snapshot
cow devices, which is actually very bad idea anyway, since the
perfomance and memory usage in this case is very bad.
Patch fixes hidden problem with lvm metadata caching.
When the pretest was made, only the commited data have been cached back
since the call lv_info_by_lvid() triggers mda read operation.
However call of lv_suspend_if_active() also reads precommited metadata.
The problem is visible in this sequence of calls:
vg_write(), suspend_lv(), vg_commit(), resume_lv()
which may end with leaving outdated mda in lvm cache, since vg_write()
drops cached metadata and vg_commit() only transforms precommited
to commited metadata, but in the case of pretesting we have
no precommited mda available so the cache will continue to use
old metadata. This happens, when suspend LV is inactive.
Sharing char* with field has a problem in error path,
when we allocate event, but fail to allocate timeout string.
Instead of creating complicated error paths to resolve
it individually stop using unions, and let the resource
to be released in a simple _free_message().
For example, the old call and reference:
find_config_tree_str(cmd, "devices/dir", DEFAULT_DEV_DIR)
...now becomes:
find_config_tree_str(cmd, devices_dir_CFG)
So we're referring to the named configuration ID instead
of passing the configuration path and the default value
is taken from central config definition in config_settings.h
automatically.
On glibc, those are erroneously (namespace pollution) pulled in via
other headers. this doesn't work with conformant libcs (musl libc in
this case), we simply need to include all needed headers.
Signed-Off-By: John Spencer <maillist-lvm@barfooze.de>
For reseting locale environment into significantly less memory
consuming version 'C' - use LC_ALL instead of LANG since it has
higher priority in locale settings.
Otherwise we may observe whole locale-archive which might be
over 100MB on i.e. Fedora systems locked in memory with
some daemons.
The idea is to avoid a period when an existing VG is not mapped to either the
old or the new name. (Note that the brief "blackout" was present even if the
name did not actually change.) We instead allow a brief overlap of a VG existing
under both names, i.e. a query for a VG might succeed but before a lock is
acquired the VG disappears.
This reverts commit ed23da95b6.
Hash table device_to_pvid seems to contain references to
already deleted pvids and so revert to the older
behaviour using allocated memory.
All operations on shared hash tables need to be protected by mutexes. Moreover,
lookup and subsequent key removal need to happen atomically, to avoid races (and
possible double free-ing) between multiple threads trying to manipulate the same
VG.
If we fail to get memory for mutex, hash the mutex
or fail somewhere along pthread function calls
return allocated resources back and unlock vg_lock_map mutex.
Since we are doing just dump and function doesn't report
any error, explicitely ignore return values from
dm_config_write_node and dm_asprintf.
Same applies for the logging function.
clvmd -d option parsing was not working properly.
clvmd -d 2 (with space) has been ignored because of
'::' used in getopt string, and as failsafe it's been used '1'.
Later this debug_arg has been ignored and debug_opt was used
instead which happend to have value '1'.
Submitted-by: Robert Milasan <rmilasan at suse.com>
Reported-by: Robert Milasan <rmilasan at suse.com>
Try to bring the lvmetad usage text and man page closer to the code.
There seem to be 3 useful ways to use -d with lvmetad at the moment:
-d all
-d wire
-d debug
(They can also be comma-separated like -d wire,debug.)
Prior to the last release, -d, -dd and -ddd were supported.
Fail if an unrecognised debug arg is supplied on the command line.
Change -V to report the same version as the lvm binary: previously it
just reported version 0.
Reset counter after thin pool resize failure.
If the pool goes above threshold, support unmounting
of all thin volumes if the lvextend fails to avoid
overfilling of the pool.
- move common dm_config_tree manipulation functions from lvmetad-core to
daemon-shared
- add config-tree-based request manipulation APIs to daemon-client
- factor out _v (va_list) variants of most variadic functions in libdaemon
If we were defining a section (which is a node without a value) and
the value was created automatically on dm_config_create_node call,
we were wasting resources as the next step after creating the config
node itself was assigning NULL for the node's value.
The dm_config_node_create + dm_config_create_value sequence should be
used instead for settings and dm_config_node_create alone for sections.
The majority of the code already used the correct sequence. Though
with dm_config_node_create fn creating the value as well, the pool
memory was being trashed this way.
This patch removes the node value initialization on dm_config_create_node
fn call and keeps it for the direct dm_config_create_value fn call.
- logging is not controlled by "levels" but by "types"; types are
independent of each other... implementation of the usual "log level"
user-level semantics can be simply done on top; the immediate
application is enabling/disabling wire traffic logging independently
of other debug data, since the former is rather bulky and can easily
obscure almost everything else
- all logs go to "outlets", of which we currently have 2: syslog and
stderr; which "types" go to which "outlets" is entirely configurable
There were several hard-coded values for run directory around the code.
Also, some tools are DM specific only, others are LVM specific and there
was no distinction made here before. With this patch applied, we have
this cleaned up a bit (subsystem in brackets, defaults in parentheses):
[common] configurable PID_DIR (/var/run)
lvm [lvm] configurable RUN_DIR (/var/run/lvm)
configurable locking dir (/var/lock/lvm)
clvmd [lvm] configurable pid file (PID_DIR/clvmd.pid)
socket (RUN_DIR/clvmd.sock)
lvmetad [lvm] configurable pid file (PID_DIR/lvmetad.pid)
socket (RUN_DIR/lvmetad.socket)
dm [dm] configurable DM_RUN_DIR (/var/run)
cmirrord [dm] configurable pid file (PID_DIR/cmirrord.pid)
dmeventd [dm] configurable pid file (PID_DIR/dmeventd.pid)
server fifo (DM_RUN_DIR/dmeventd-server)
client fifo (DM_RUN_DIR/dmeventd-client)
The changes briefly:
- added configure --with-default-pid-dir
- added configure --with-default-dm-run-dir
- added configure --with-lvmetad-pidfile
- by default, using one common pid directory for everything
(only lvmetad was not following this before)
Looking at the code in cmirrord/local.c, we can see the various different
request types handled in different ways. Some information that is non-changing
does not need to go around the cluster and can be short-circuited. For
example, once the cluster mirror is in-sync, it is pointless to continue
sending that query around the cluster. We can save network bandwidth and reply
directly back to the kernel. When it comes to status information, there are
two types 'TABLE' and 'INFO'. The 'TABLE' information never changes and
belongs to the group of requests that can be safely short-circuited. The
'STATUS' information can change - and will change if a device fails. Thus it
cannot be short-circuited, but this is exactly what was found. The 'STATUS'
information request was being short-circuited and therefore never reporting the
failure condition to anyone other than the "server" that experienced it
directly.
In some occasional case dmevent restart was experiencing problems
with obtaining pid lockfile. So this patch tries to send several more kill
message until daemon kills itself so there is would reponse.
With this small loop the restart seems to work reliable,
although the loopsize and usleep are just randomly picked for now.
various dmeventd plug-ins into a new function called 'dmeventd_lvm2_command',
but the new function did not strip off the "_mlog" extentions that the
mirror plug-in had been doing. This created bug 794904 - failure to replace
devices in a redundant log.
The test suite did catch this scenario because it performs repair tests (mainly)
through the CLI and not dmeventd. It's also not easy to test because the test
itself will hang if the bug is encountered.
LISTEN_PID and LISTEN_FDS environment variables are defined only during systemd
"start" action. But we still need to know whether we're activated during
"reload" action as well - we use the reload action to call "dmeventd -R"/"lvmetad -R"
for statefull daemon restart. We can't use normal "restart" as that is simply
composed of "stop" and "start" and we would lose any state the daemon has.
Add 3rd daemon return state "unknown" for lookups that are carried out
successfully but don't find the item requested.
Avoid issuing error messages when it's expected that a device that's
being looked up in lvmetad might not be there.
F17 is getting rid of OpenAIS libraries (and checkpointing). While the
CPG stuff is staying, some if its constants are being removed. So, we
must adjust and use the remaining constants which the CPG constants were based on.
[~]# egrep 'CPG_DISPATCH_ALL|CPG_OK' /usr/include/*/*
corosync/corotypes.h:#define CPG_DISPATCH_ALL CS_DISPATCH_ALL
corosync/corotypes.h:#define CPG_OK CS_OK
The OpenAIS checkpoint library is going away; therefore, cmirrord must
operate without it. The algorithms the handle the timing of when to send
a checkpoint, the determination of what to send, and which ongoing cluster
requests are relevent with respect to the checkpoints are unaffected. We
need only replace the functions that actually perform the storing/transmitting
and retrieving/receiving of the checkpoint data. Rather than store the
checkpoint data in an OpenAIS checkpoint file, we simply transmit it along
with the message that notifies the incoming node that the checkpoint is
ready.
Drop whole buffer clearing (most messages at <100 bytes).
Just make sure we have always \0 terminated string for strlen() operations.
(before for PIPE_BUF sized messages this was not set).
This could be seen as some sort of simple validation - it's not easy to
recognize a valid message for now - but we definitely do not want to
allocate a lot of megabytes in clvmd memory locked daemon when broken
message gets in.
Size of 8000 is just selected for now - possibly there could be much
lower value put in.
As API is passing structures by value, do not leave
the function which created buffer and keeps valid pointer
look like it would be some memory leak and move
free of buffer from inner function - makes more obvious,
how is the memory management handled.
Code cannot proceed if oldname would be NULL.
Since lvmetad currently doesn't use logging mechanism of lvm to report
internal errors - stay with current code style of lvmetad which uses
plain asserts for cases like this.
config tree per PV which is mostly provided by the client, so it can be used to
keep track of things like label_sector, PV format, mda count / offsets and so
on.
- rename the hashes to be explicit about the mapping
- add VG/PV listing calls to the protocol
- cache slightly more of the per-PV state
- filter cached metadata
- compare the metadata upon metadata_update
As atoi may return negative value - test for both limits.
Test log_args for limits before calling alloca().
Code from dmeventd mirror plugin should probably share same code as
we have in mirrored.c.
This patch to the suspend code - like the similar change for resume -
queries the lock mode of a cluster volume and records whether it is active
exclusively. This is necessary for suspend due to the possibility of
preloading targets. Failure to check to exclusivity causes the cluster target
of an exclusively activated mirror to be used when converting - rather than
the single machine target.
used to be a few mis-ordered memory accesses (release and access in the next
block). Fix that set_flag could have sometimes corrupted the flags being
modified.
A few issues with metadata tracking are sorted out as well now, and there are
only a few problems remaining before we can integrate lvmetad, mostly on the
client side:
- metadata areas need to be tracked in lvmetad (most likely to be addressed
through an extension of metadata, meaning no special support in lvmetad would
be needed)
- non-udev scanning code needs to be taught about telling lvmetad about device
disappearance (pvscan most importantly)
- this last item also needs to mesh with metadata inconsistencies and
suddenly-incomplete volume groups (aux disable_dev in tests); udev-based
scanning should address this separately and more elegantly
For snapshot, prepare whole command in front into private buffer.
Add also some missing '\n' for syslog messages.
For raid and mirror only convert creation of command line string.
This should avoid any unbound growth of mempool for dm_split_names.
Fix some minor outstading issue from thin plugin introduction -
Call dmeventd_lvm2_exit() in failpath for registration.
Add some missing '\n' in syslog messages.
The RAID plug-in for dmeventd now calls 'lvconvert --repair' to address failures
of devices in a RAID logical volume. The action taken can be either to "warn"
or "allocate" a new device from any spares that may be available in the
volume group. The action is designated by setting 'raid_fault_policy' in
lvm.conf - the default being "warn".
udev may also need to be disabled if you didn't build it statically too.
dmeventd.static could be fixed with some more work but I don't really see the
point: without dlopen() it's useless, and if you have dlopen(), why not support
normal shared libraries too?
that we can also test clustered volume groups (vgcreate -c y) in the test
suite. Unfortunately we can't make this the testing default since cluster
mirrors require further infrastructure, and snapshots probably don't work at
all. I'll eventually add a few test cases that create clustered VGs
specifically.
monitoring state of the logical volumes they are currently acting on.
Until now, every time a logical volume has been changed by a dmeventd plugin,
this plugin would have called back to dmeventd through the external FIFO
mechanism. I am fairly sure this was superfluous, inefficient and possibly even
dangerous.
Version 2 of the userspace log protocol accepts return information during the
DM_ULOG_CTR exchange. The return information contains the name of the log
device that is being used (if there is one). The kernel can then register the
device via 'dm_get_device'. Amoung other things, this allows for userspace to
assemble a correct dependency tree of devices - critical for LVM handling of
suspend/resume calls.
Also, update dm-log-userspace.h to match the kernel header associated with
this protocol change. (Includes a version inc.)
Code here is using thread write protected variable without locking.
So add locking, for proper synchronization and a FIXME, since the
code needs closer look.
Barrier is supposed to be used in situation like this
and replace tricky mutex usage, where mutex has been unlocked
by a different thread than the locking thread.
Usage of thread unprotected init_test is not correct and needs probably lvm lock
since it part of lvm library. Current implementation may probably fail with
test mode and actually create something unexpectedly (and vice versa).
Since default thread stack size is around 8MB and clvmd creates for now thread
for message, clvmd may easily reach multi GB size of in-memory locked pages
(runs with mlockall()).
This patch significantly reduces memory usage to just tens of MB,
and now different reasons are the cause of server overloading.
Now we are running out of free file descriptors mostly.
Replace usleep with pthread condition to increase speed testing
(for simplicity just 1 condition for all locks).
Use thread mutex also for unlock resource (so it wakes up awaiting
threads)
Better check some error states and return error in fail case with
unlocked mutex.
Since execve passed only NULL as environ, we had lost all environment vars on
restart - thus actually running 'different' clvmd then the one at start.
Preserving environ allows to restart clvmd with the same settings
(i.e. LD_LIBRARY_PATH)
Add test for second restart.
Add named cluster_ops to easily learn the name of the active cluster manager,
so we are able to restart singlenode manager in testing.
Add simple test for clvmd -S (restart) and -R (refresh)
(though it needs some extensions).
Next iteration for better fit of lvmetad compilation.
Move build of libdaemon.a into common subdir Makefile.
libdaemon.a is device-mapper target.
Build and install lvmetad as lvm2 target.
Read 2 environmental vars to learn about overide position for
CLVMD and LVM binaries.
We support LVM_BINARY in other script - and this way we could easily
test restart in our test-suite.
Bugfix:
Add (most probably unfinished) support for -E arg with list of exclusive
locks. (During clvmd restart all exclusive locks would have been lost and
in fact, if there would have been an exclusive lock, usage text would be
printed and clvmd exits.)
Instead of parsing list options multiple times every time some lock UUID is
checked - put them straight into the hash table - make the code easier to
understand as well.
Remove was_ex_lock() function (replaced with dm_hash_lookup()).
Swap return value for get_initial_state() (1 means success).
Update man pages and usage info for -E option.
Patch fixes Clang warnings about possible access via lv_name NULL pointer.
Replaces allocation of memory (strdup) with just pointer assignment
(since execve is being called anyway).
Checks for !*lv_name only when lv_name is defined.
(and as I'm not quite sure what state this really is - putting a FIXME
around - as this rather looks suspicios ??).
Add debug print of passed clvmd args.
(since data in statbuf are invalid).
Check whether sysconf managed to find _SC_PAGESIZE.
Report at least debug warning about failing unlink
(logging scheme here seems to be a different then in lvm).
Duplicate terminal FDs and use similar code as is made in clvmd
and cleanup warns about missing open/close tests.
FIXME: Looks like we already have 3 instancies of the same code in lvm repo.
daemon/common code in a single libdaemon.a, which is completely private. This
is currently linked into the lvmetad binary, and will be linked into LVM (the
client part, since static linking only picks up only symbols that are actually
used). I have also added --enable/disable-lvmetad to ./configure; although the
current default is off, I expect this to be flipped to on shortly. There's no
LVM-side support yet, but when there is, even when built, it'll still need to
be enabled by an lvm.conf option.
Move the free_vg() to vg.c and replace free_vg with release_vg
and make the _free_vg internal.
Patch is needed for sharing VG in vginfo cache so the release_vg function name
is a better fit here.
Systemd preloads file descriptors for us and passes them in for
newly spawned daemon when using on-demand fifo (or socket)
based activation.
This patch adds checks for file descriptors preloaded by
systemd and uses them instead of opening the FIFOs again
to properly support on-demand FIFO-based activation.
(We'll change FIFOs to sockets soon - but still this
part of the code will stay almost the same.)
The filename to adjust the oom score was changed in 2.6.36.
We should use oom_score_adj instead of oom_adj (which is still
there under /proc, but it's scheduled for removal in August 2012).
New oom_score_adj uses a range from -1000 (OOM_SCORE_ADJ_MIN,
disable oom killing) to 1000 (OOM_SCORE_ADJ_MAX).
very reasonable amount of parallel access, although the hash tables may become
a point of contention under heavy loads. Nevertheless, there should be orders
of magnitude less contention on the hash table locks than we currently have on
block device scanning.
are affected by the move. (Currently it's possible for I/O to become
trapped between suspended devices amongst other problems.
The current fix was selected so as to minimise the testing surface. I
hope eventually to replace it with a cleaner one that extends the
deptree code.
Some lvconvert scenarios still suffer from related problems.
allocates these buffers in such way it adds memory page for each such buffer
and size of unlock memory check will mismatch by 1 or 2 pages.
This happens when we print or read lines without '\n' so these buffers are
used. To avoid this extra allocation, use setvbuf to set these bufffers ahead.
Signed-off-by: Zdenek Kabelac <zkabelac@redhat.com>
Reviewed-by: Milan Broz <mbroz@redhat.com>
Reviewed-by: Petr Rockai <prockai@redhat.com>
'a small step' towards cleaner shutdown sequence.
Normally clvmd doens't care about unreleased memory on exit -
but for valgrind testing it's better to have them cleaned all.
So - few things are left on exit path - this patch starts to remove
just some of them.
1. lvm_thread_fs is made as a thread which could be joined on exit()
2. memory allocated to local_clien_head list is released.
(this part is somewhat more complex if the proper reaction is
needed - and as it requires some heavier code moving - it will
be resolved later.
Fix 2 more functions sending cluster messages to avoid passing uninitilised bytes
and compensate 1 extra byte attached to the message from the clvm_header.args[1]
member variable.
Fixing few issues:
struct clvm_header contains 'char args[1]' - so adding '+ 1' here
for message length calculation is 1 byte off.
Message with last byte uninitialized is then passed to write function.
Update also related arglen.
Initialise xid and clintid to 0.
Memory allocation is checked for NULL
- returned char not needed to be explicitly const
- warn if pipe() fails in clvmd (more fixes here needed for error paths...)
- assign (and ignore) read() output in drain buffer
Fixing some const warnings - with API change in:
int vg_extend(struct volume_group *vg, int pv_count, const char *const *pv_names,
Change is needed - as lvm2api expects const behaviour here.
So vg_extend() is doing local strdup for unescaping.
skip_dev_dir return const char* from const char* vg_name.
Rest of the patch is cleanup of related warnings.
Also using dm_report_filed_string() API change to simplify
casting in _string_disp and _lvname_disp.
New strategy for memory locking to decrease the number of call to
to un/lock memory when processing critical lvm functions.
Introducing functions for critical section.
Inside the critical section - memory is always locked.
When leaving the critical section, the memory stays locked
until memlock_unlock() is called - this happens with
sync_local_dev_names() and sync_dev_names() function call.
memlock_reset() is needed to reset locking numbers after fork
(polldaemon).
The patch itself is mostly rename:
memlock_inc -> critical_section_inc
memlock_dec -> critical_section_dec
memlock -> critical_section
Daemons (clmvd, dmevent) are using memlock_daemon_inc&dec
(mlockall()) thus they will never release or relock memory they've
already locked memory.
Macros sync_local_dev_names() and sync_dev_names() are functions.
It's better for debugging - and also we do not need to add memlock.h
to locking.h header (for memlock_unlock() prototyp).
results in clvmd deadlock
When a logical volume is activated exclusively in a cluster, the
local (non-cluster-aware) target is used. However, when creating
a snapshot on the exclusive LV, the resulting suspend/resume fails
to load the appropriate device-mapper table - instead loading the
cluster-aware target.
This patch adds an 'exclusive' parameter to the pertinent resume
functions to allow for the right target type to be loaded.
activated.
In order to achieve this, we need to be able to query whether
the origin is active exclusively (a condition of being able to
add an exclusive snapshot).
Once we are able to query the exclusive activation of an LV, we
can safely create/activate the snapshot.
A change to 'hold_lock' was also made so that a request to aquire
a WRITE lock did not replace an EX lock, which is already a form
of write lock.
Remove temporaly added fs_unlock() calls to fix clmvd usablity.
Now when the message passing is properly working - they are no longer needed.
Simplify no_locking check for VG unlock - as message is always send
for all targets - clustered & non-clustered.
Thanks to CLVMD_CMD_SYNC_NAMES propagation fix the message passing started
to work. So starts to send a message before the VG is unlocked.
Removing also implicit sync in VG unlock from clmvd as now the message
is delievered and processed in do_command().
Also add support for this new message into external locking
and mask this event from further processing.
This is better way how to fix clustered synchronization with udev.
As the code for message passing needs fixed - put currently
fs_unlock() after every active/deactive command in clvmd to
ensure nodes are properly created in time.
Instead of implicitly syncing udev operation in clustered and
file locking code - call synchronization directly in lock_vol() when
the operation unlocks VG
The problem is missing implicit fs_unlock() in the no_locking code.
This is used with --sysinit on read-only filesystem locking dir.
In this case vgchange -ay could exit before all udev nodes are properly
synchronised and may cause problems with accessing such node right after
vgchange --sysinint command is finished.
Add test case for vgchange --sysinit.
return lockspace reference (even if lockspace already exists)
and thus increases DLM lockspace count. It means that after
clvmd restart the lockspace is still in use.
(The only way to clean environment to enable clean cluster
shutdown is call "dlm_tool leave clvmd" several times.)
Because only one clvmd can run in time, we can use simpler logic,
try to open lockspace with dlm_open_lockspace() and only if it fails
try to create new one. This way the lockspace reference doesn not
increase.
Very easily reproducible with "clvmd -S" command.
Patch also fixes return code when clvmd_restart fails and fixes
double free if debug option was specified during restart.
Fixes https://bugzilla.redhat.com/show_bug.cgi?id=612862