IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
Previously there have been necessary explicit call of backup (often
either forgotten or over-used). With this patch the necessity to
store backup is remember at vg_commit and once the VG is unlocked,
the committed metadata are automatically store in backup file.
This may possibly alter some printed messages from command when the
backup is now taken later.
When pvcreate/pvremove prompt the user, they first release
the global lock, then acquire it again after the prompt,
to avoid blocking other commands while waiting for a user
response. This release/reacquire changes the locking
order with respect to the hints flock (and potentially other
locks). So, to avoid deadlock, use a nonblocking request
when reacquiring the global lock.
These two flags may be not reset at the end of
the command when the unlock is implicit, which
is a problem if the cmd struct is reused.
Clear the flags in the general fin_locking.
There have been two file locks used to protect lvm
"global state": "ORPHANS" and "GLOBAL".
Commands that used the ORPHAN flock in exclusive mode:
pvcreate, pvremove, vgcreate, vgextend, vgremove,
vgcfgrestore
Commands that used the ORPHAN flock in shared mode:
vgimportclone, pvs, pvscan, pvresize, pvmove,
pvdisplay, pvchange, fullreport
Commands that used the GLOBAL flock in exclusive mode:
pvchange, pvscan, vgimportclone, vgscan
Commands that used the GLOBAL flock in shared mode:
pvscan --cache, pvs
The ORPHAN lock covers the important cases of serializing
the use of orphan PVs. It also partially covers the
reporting of orphan PVs (although not correctly as
explained below.)
The GLOBAL lock doesn't seem to have a clear purpose
(it may have eroded over time.)
Neither lock correctly protects the VG namespace, or
orphan PV properties.
To simplify and correct these issues, the two separate
flocks are combined into the one GLOBAL flock, and this flock
is used from the locking sites that are in place for the
lvmlockd global lock.
The logic behind the lvmlockd (distributed) global lock is
that any command that changes "global state" needs to take
the global lock in ex mode. Global state in lvm is: the list
of VG names, the set of orphan PVs, and any properties of
orphan PVs. Reading this global state can use the global lock
in sh mode to ensure it doesn't change while being reported.
The locking of global state now looks like:
lockd_global()
previously named lockd_gl(), acquires the distributed
global lock through lvmlockd. This is unchanged.
It serializes distributed lvm commands that are changing
global state. This is a no-op when lvmlockd is not in use.
lockf_global()
acquires an flock on a local file. It serializes local lvm
commands that are changing global state.
lock_global()
first calls lockf_global() to acquire the local flock for
global state, and if this succeeds, it calls lockd_global()
to acquire the distributed lock for global state.
Replace instances of lockd_gl() with lock_global(), so that the
existing sites for lvmlockd global state locking are now also
used for local file locking of global state. Remove the previous
file locking calls lock_vol(GLOBAL) and lock_vol(ORPHAN).
The following commands which change global state are now
serialized with the exclusive global flock:
pvchange (of orphan), pvresize (of orphan), pvcreate, pvremove,
vgcreate, vgextend, vgremove, vgreduce, vgrename,
vgcfgrestore, vgimportclone, vgmerge, vgsplit
Commands that use a shared flock to read global state (and will
be serialized against the prior list) are those that use
process_each functions that are based on processing a list of
all VG names, or all PVs. The list of all VGs or all PVs is
global state and the shared lock prevents those lists from
changing while the command is processing them.
The ORPHAN lock previously attempted to produce an accurate
listing of orphan PVs, but it was only acquired at the end of
the command during the fake vg_read of the fake orphan vg.
This is not when orphan PVs were determined; they were
determined by elimination beforehand by processing all real
VGs, and subtracting the PVs in the real VGs from the list
of all PVs that had been identified during the initial scan.
This is fixed by holding the single global lock in shared mode
while processing all VGs to determine the list of orphan PVs.
Native disk scanning is now both reduced and
async/parallel, which makes it comparable in
performance (and often faster) when compared
to lvm using lvmetad.
Autoactivation now uses local temp files to record
online PVs, and no longer requires lvmetad.
There should be no apparent command-level change
in behavior.
The last commit related to this was incomplete:
"Implement lock-override options without locking type"
This is further reworking and reduction of the locking.[ch]
layer which handled all clustering, but is now only used
for file locking. The "locking types" that this layer
implemented were removed previously, leaving only the
standard file locking. (Some cluster-related artifacts
remain to be cleared out later.)
Command options to override or modify locking behavior
are reimplemented here without using the locking types.
Also, deprecated locking_type values are recognized,
and implemented as if one of the equivalent override
options was set.
Options that override file locking are:
. --nolocking disables all file locking.
. --readonly grants read lock requests without actually
taking a file lock, and refuses write lock requests.
. --ignorelockingfailure tries to set up file locks and
uses them normally if possible. When not possible, it
behaves like --readonly, but allows activation.
. --sysinit is the same as ignorelockingfailure.
. global/metadata_read_only acquires actual read file
locks, and refuses write lock requests.
(Some of these options could probably be deprecated
because they were added as workarounds to various
locking_type behaviors that are now deprecated.)
The locking_type setting now has one valid value: 1 which
refers to standard file locking. Configs that contain
deprecated values are recognized and still work in
largely the same way:
. 0 disabled all locking, now implemented like --nolocking
is set. Allow the nolocking option in all commands.
. 1 is the normal file locking setting and is unchanged.
. 2 was for external locking which was not used, and
reverts to normal file locking.
. 3 was for cluster/clvm. This reverts to normal file
locking, and prints messages about lvmlockd.
. 4 was equivalent to readonly, now implemented like
--readonly is set.
. 5 disabled all locking, now implemented like
--nolocking is set.
The options: --nolocking, --readonly, --sysinit
override, or make exceptions to, the normal file locking
behavior. Implement these by just checking for the
options in the file locking path instead of using
special locking types.
Basic LV functions:
activate_lv(), deactivate_lv(),
suspend_lv(), resume_lv()
were routed through the locking infrastruture on the way to:
lv_activate_with_filter(), lv_deactivate(),
lv_suspend_if_active(), lv_resume_if_active()
This commit removes the locking infrastructure from the
middle and calls the later functions directly from the former.
There were a couple of ancillary steps that the locking
infrastructure added along the way which are still included:
- critical section inc/dec during suspend/resume
- checking for active component LVs during activate
The "activation" file lock (serializing activation) has not
been kept because activation commands have been changed to
take the VG file lock exclusively which makes the activation
lock unused and unnecessary.
Different flavors of activate_lv() and lv_is_active()
which are meaningful in a clustered VG can be eliminated
and replaced with whatever that flavor already falls back
to in a local VG.
e.g. lv_is_active_exclusive_locally() is distinct from
lv_is_active() in a clustered VG, but in a local VG they
are equivalent. So, all instances of the variant are
replaced with the basic local equivalent.
For local VGs, the same behavior remains as before.
For shared VGs, lvmlockd was written with the explicit
requirement of local behavior from these functions
(lvmlockd requires locking_type 1), so the behavior
in shared VGs also remains the same.
As we start refactoring the code to break dependencies (see doc/refactoring.txt),
I want us to use full paths in the includes (eg, #include "base/data-struct/list.h").
This makes it more obvious when we're breaking abstraction boundaries, eg, including a file in
metadata/ from base/
In fact pvmove does support 'clustered-core' target for clustered
pvmove of LVs activated on multiple nodes.
This patch restores support for activation of pvmove on all nodes
for LVs that are also activate on all nodes.
There is no need to differentiation between clustered VG and normal VG.
As the activation depends on locking type.
Use unconditionally locally exclusive activation for pvmove.
Activation lock has a primary purpose to serialize locking of individual
LV in case there is no other protecting mechanism for parallel
execution.
However in the case an activated LV is composed from several other LVs,
noone should be able to manipulate with those LVs as well.
This patch add a very 'naive' global VG activation locking in this case.
In the future we may introduce smarter function detecting minimal closed
graph components if this will appear as bottleneck
Patch checks if the VG Write lock is held - in this case we do not
need any more locking - command has exclusive access to VG.
In case we have clustered VG and we are activating an LV which does not
need other LVs - we also do not need any more locks.
In all other cases take respective lock - for single LV - use lvid,
for complex LVs use vgname.
Previously, a command sent lvmetad new VG metadata in vg_commit().
In vg_commit(), devices are suspended, so any memory allocation
done by the command while sending to lvmetad, or by lvmetad while
updating its cache could deadlock if memory reclaim was triggered.
Now lvmetad is updated in unlock_vg(), after devices are resumed.
The new method for updating VG metadata in lvmetad is in two phases:
1. In vg_write(), before devices are suspended, the command sends
lvmetad a short message ("set_vg_info") telling it what the new
VG seqno will be. lvmetad sees that the seqno is newer than
the seqno of its cached VG, so it sets the INVALID flag for the
cached VG. If sending the message to lvmetad fails, the command
fails before the metadata is committed and the change is not made.
If sending the message succeeds, vg_commit() is called.
2. In unlock_vg(), after devices are resumed, the command sends
lvmetad the standard vg_update message with the new metadata.
lvmetad sees that the seqno in the new metadata matches the
seqno it saved from set_vg_info, and knows it has the latest
copy, so it clears the INVALID flag for the cached VG.
If a command fails between 1 and 2 (after committing the VG on disk,
but before sending lvmetad the new metadata), the cached VG retains
the INVALID flag in lvmetad. A subsequent command will read the
cached VG from lvmetad, see the INVALID flag, ignore the cached
copy, read the VG from disk instead, update the lvmetad copy
with the latest copy from disk, (this clears the INVALID flag
in lvmetad), and use the correct VG metadata for the command.
(This INVALID mechanism already existed for use by lvmlockd.)
Take a local file lock to prevent concurrent activation/deactivation of LVs.
Thin/cache types and an extension for cluster support are excluded for
now.
'lvchange -ay $lv' and 'lvchange -an $lv' should no longer cause trouble
if issued concurrently: the new lock should make sure they
activate/deactivate $lv one-after-the-other, instead of overlapping.
(If anyone wants to experiment with the cluster patch, please get in touch.)
Previously, we have relied on UUIDs alone, and on lvmcache to make getting a
"new copy" of VG metadata fast. If the code which triggers the activation has
the correct VG metadata at hand (the version which is currently on disk), it can
now hand it to the activation code directly.
Move the free_vg() to vg.c and replace free_vg with release_vg
and make the _free_vg internal.
Patch is needed for sharing VG in vginfo cache so the release_vg function name
is a better fit here.
New strategy for memory locking to decrease the number of call to
to un/lock memory when processing critical lvm functions.
Introducing functions for critical section.
Inside the critical section - memory is always locked.
When leaving the critical section, the memory stays locked
until memlock_unlock() is called - this happens with
sync_local_dev_names() and sync_dev_names() function call.
memlock_reset() is needed to reset locking numbers after fork
(polldaemon).
The patch itself is mostly rename:
memlock_inc -> critical_section_inc
memlock_dec -> critical_section_dec
memlock -> critical_section
Daemons (clmvd, dmevent) are using memlock_daemon_inc&dec
(mlockall()) thus they will never release or relock memory they've
already locked memory.
Macros sync_local_dev_names() and sync_dev_names() are functions.
It's better for debugging - and also we do not need to add memlock.h
to locking.h header (for memlock_unlock() prototyp).
activated.
In order to achieve this, we need to be able to query whether
the origin is active exclusively (a condition of being able to
add an exclusive snapshot).
Once we are able to query the exclusive activation of an LV, we
can safely create/activate the snapshot.
A change to 'hold_lock' was also made so that a request to aquire
a WRITE lock did not replace an EX lock, which is already a form
of write lock.
Thanks to CLVMD_CMD_SYNC_NAMES propagation fix the message passing started
to work. So starts to send a message before the VG is unlocked.
Removing also implicit sync in VG unlock from clmvd as now the message
is delievered and processed in do_command().
Also add support for this new message into external locking
and mask this event from further processing.
Ignore snapshots when performing mirror recovery beneath an origin.
Pass LCK_ORIGIN_ONLY flag around cluster.
Add suspend_lv_origin and resume_lv_origin using LCK_ORIGIN_ONLY.