dev_name(dev) returns "[unknown]" if there are no names
on dev->aliases. It's meant mainly for log messages.
Many places assume a valid path name is returned, and
use it directly. A caller that wants to use the path
from dev_name() must first check if the dev has any
paths with dm_list_empty(&dev->aliases).
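A minimal sketch of the required caller pattern; the struct layout and
helpers below are simplified stand-ins for the real lvm2/libdevmapper
definitions, not the actual code:

#include <stddef.h>

/* Simplified stand-ins for the real lvm2/libdevmapper definitions. */
struct dm_list { struct dm_list *n, *p; };
struct device  { struct dm_list aliases; };

static int dm_list_empty(const struct dm_list *head)
{
    return head->n == head;
}

/* Placeholder mirroring the documented behavior: "[unknown]" when the
 * device has no names on dev->aliases. */
static const char *dev_name(const struct device *dev)
{
    return dm_list_empty(&dev->aliases) ? "[unknown]" : "/dev/sda";
}

static const char *dev_path_or_null(struct device *dev)
{
    /* Only treat dev_name() output as a usable path when the device
     * actually has at least one alias. */
    if (dm_list_empty(&dev->aliases))
        return NULL;
    return dev_name(dev);
}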
Since we check for present DM devices, cache the result for further
checks of whether such a device is present.
lvm2 uses the cached result for label scan, but also when it tries to
activate or deactivate an LV; however, only the simple target 'striped'
is reasonably supported.
Use disable_dm_devs to be able to control when lv_info() gets cached
or uncached results.
TODO: support more types, however this is getting very complicated.
The error path in _lockd_retrive_vg_pv_list() did not zero the released
path pointers, causing a possible double-free later in the code.
Fix it by using one single function to free the lock_pvs structure.
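A minimal sketch of that pattern, with hypothetical struct and function
names standing in for the real lock_pvs code: one helper frees the
structure and zeroes the released pointers, so a later call from an
error path cannot double-free.

#include <stdlib.h>

/* Hypothetical simplified layout of the PV list structure. */
struct lock_pvs {
    unsigned pv_num;
    char **path;
};

/* Single function that frees the list and zeroes the released pointers;
 * a second call (e.g. from an error path) is then a harmless no-op. */
static void lock_pvs_free(struct lock_pvs *pvs)
{
    unsigned i;

    if (!pvs || !pvs->path)
        return;

    for (i = 0; i < pvs->pv_num; i++)
        free(pvs->path[i]);

    free(pvs->path);
    pvs->path = NULL;
    pvs->pv_num = 0;
}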
Previously an explicit call to backup was necessary (and was often
either forgotten or over-used). With this patch the need to store a
backup is remembered at vg_commit, and once the VG is unlocked, the
committed metadata is automatically stored in the backup file.
This may alter some messages printed by a command, since the backup is
now taken later.
For shared VG or LV locking, the IDM locking scheme needs to use the PV
list associated with the VG or LV for sending SCSI commands, thus some
place in the code needs to generate that PV list.
In reviewing the flow of LVM commands, the best place to generate the
PV list is in the locking lib, so this patch parses the PV list as
shown. It iterates over all the PV nodes one by one and compares each
against the VG name or the LV prefix string. If a PV matches, it is
added to the PV list. Finally the PV list is sent to the lvmlockd
daemon.
As mentioned, it compares the LV prefix string in the format
"lv_name_"; the reason is that it needs to find all relevant PVs, e.g.
a thin pool has LVs for metadata, pool, error, and the raw LV, so the
prefix string can be used to find all PVs belonging to the thin pool.
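A rough sketch of that matching pass, using hypothetical node and list
types (the real code walks lvm's internal PV structures): PVs are
collected either by exact VG name or by an LV-name prefix such as
"lv_name_", which also picks up a thin pool's metadata/pool/error
sub-LVs.

#include <string.h>

/* Hypothetical stand-ins for the real PV metadata nodes. */
struct pv_node {
    const char *pv_path;    /* e.g. "/dev/sda" */
    const char *vg_name;    /* VG the PV belongs to */
    const char *lv_name;    /* LV placed on this PV, if any */
    struct pv_node *next;
};

/* Collect PVs matching the VG name, or whose LV name starts with the
 * given prefix (e.g. "lv_name_").  The resulting list is what gets
 * sent to the lvmlockd daemon. */
static unsigned build_pv_list(const struct pv_node *head,
                              const char *vg_name, const char *lv_prefix,
                              const char **out, unsigned out_max)
{
    unsigned n = 0;

    for (; head && n < out_max; head = head->next) {
        int vg_match = vg_name && !strcmp(head->vg_name, vg_name);
        int lv_match = lv_prefix && head->lv_name &&
                       !strncmp(head->lv_name, lv_prefix, strlen(lv_prefix));

        if (vg_match || lv_match)
            out[n++] = head->pv_path;
    }
    return n;
}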
The global lock is not covered in this patch. To avoid the chicken and
egg issue, we need to prepare the global lock before any locking can be
used. So the global lock's PV list is established in the lvmlockd
daemon by iterating over all drives with a partition labeled
"propeller".
Signed-off-by: Leo Yan <leo.yan@linaro.org>
We can consider the drive firmware a server handling the locking
requests from nodes, so this is essentially a client-server model.
DLM uses the kernel as a central place to manage locks, so it also
follows the client-server model for locking operations. This is
why the IDM and DLM wrappers are similar to each other.
This patch largely works by generalizing the DLM code paths and then
providing degeneralized functions as wrappers for both IDM and DLM.
Signed-off-by: Leo Yan <leo.yan@linaro.org>
During removal of a lot of locking code, the signal blocking got lost
and signal processing got broken, leading to unpredictable behavior of
e.g. activation code that can get interrupted in the middle of DM table
processing.
lvm2 code always expects signals to be blocked while a lock is held,
unless it is explicitly placed into a section of:
sigint_allow();....;sigint_restore();
For checking a caught interrupt there is sigint_caught();
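The expected usage pattern looks roughly like this; the sigint_*()
prototypes are written from memory of lvm2's lib/misc/lvm-signal.h, and
the surrounding function is purely illustrative:

void sigint_allow(void);
void sigint_restore(void);
int  sigint_caught(void);

/* Illustrative caller: signals stay blocked while the lock is held,
 * except inside an explicit allow/restore window around interruptible
 * work. */
static int interruptible_step_while_locked(void)
{
    int interrupted;

    sigint_allow();
    /* ... interruptible work, e.g. waiting on a prompt ... */
    interrupted = sigint_caught();
    sigint_restore();

    return !interrupted;    /* 0 tells the caller to abort */
}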
When either logical block size or physical block size is 4K,
then lvmlockd creates sanlock leases based on 4K sectors,
but the lvm client side would create the internal lvmlock LV
based on the first logical block size it saw in the VG,
which could be 512. This could cause the lvmlock LV to be
too small to hold all the sanlock leases. Make the lvm client
side use the same sizing logic as lvmlockd.
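A sketch of the sizing rule at issue, assuming sanlock's usual lease
alignment of 1MB per lease area for 512-byte sectors and 8MB for 4K
sectors; the helper name and the rounding details are assumptions, not
the actual lvm/lvmlockd code:

#include <stdint.h>

/* Assumed alignment: 1MB lease areas with 512-byte sectors, 8MB with
 * 4K sectors.  The point of the fix is that the client side must size
 * the lvmlock LV from the same sector-size rule lvmlockd uses, not
 * from whichever logical block size it happened to see first. */
static uint64_t lvmlock_lv_bytes(unsigned sector_size, unsigned lease_count)
{
    uint64_t align = (sector_size == 4096) ? (8ULL << 20) : (1ULL << 20);

    return align * lease_count;
}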
Creating a snapshot was using a persistent LV lock
on the origin, so if the origin LV was inactive at
the time of the snapshot the LV lock would remain.
(Running lvchange -an on the inactive LV would
clear the LV lock.) Use a transient LV lock so it
will be dropped if it was not locked previously.
When pvcreate/pvremove prompt the user, they first release
the global lock, then acquire it again after the prompt,
to avoid blocking other commands while waiting for a user
response. This release/reacquire changes the locking
order with respect to the hints flock (and potentially other
locks). So, to avoid deadlock, use a nonblocking request
when reacquiring the global lock.
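Conceptually the reacquire is just a nonblocking flock(); a generic
sketch of the release/prompt/reacquire sequence (this is not lvm2's
actual helper):

#include <sys/file.h>
#include <errno.h>

/* Generic illustration only: drop the lock before prompting, then take
 * it again with LOCK_NB, because reacquiring out of the normal lock
 * order (e.g. after the hints flock) could otherwise deadlock. */
static int prompt_with_global_lock_dropped(int lock_fd)
{
    if (flock(lock_fd, LOCK_UN) < 0)
        return -1;

    /* ... prompt the user and wait for the answer ... */

    if (flock(lock_fd, LOCK_EX | LOCK_NB) < 0)
        return (errno == EWOULDBLOCK) ? 0 : -1;  /* 0: busy, fail command */

    return 1;
}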
When a cachevol LV is attached, have the LV keep its lock
allocated. The lock on the cachevol won't be used while
it's attached. When the cachevol is split a new lock does
not need to be allocated. (Applies to cachevol usage by
both dm-cache and dm-writecache.)
If a VG is forcibly changed from lock_type sanlock to
lock_type none, the internal lvmlock LV is left behind.
If that LV is not removed before vgremove is run on the
VG, then an internal check will be triggered by the
hidden lvmlock LV. So, check for and remove a left over
lvmlock LV during vgremove.
These two flags may not be reset at the end of
the command when the unlock is implicit, which
is a problem if the cmd struct is reused.
Clear the flags in the general fin_locking.
There have been two file locks used to protect lvm
"global state": "ORPHANS" and "GLOBAL".
Commands that used the ORPHAN flock in exclusive mode:
pvcreate, pvremove, vgcreate, vgextend, vgremove,
vgcfgrestore
Commands that used the ORPHAN flock in shared mode:
vgimportclone, pvs, pvscan, pvresize, pvmove,
pvdisplay, pvchange, fullreport
Commands that used the GLOBAL flock in exclusive mode:
pvchange, pvscan, vgimportclone, vgscan
Commands that used the GLOBAL flock in shared mode:
pvscan --cache, pvs
The ORPHAN lock covers the important cases of serializing
the use of orphan PVs. It also partially covers the
reporting of orphan PVs (although not correctly as
explained below.)
The GLOBAL lock doesn't seem to have a clear purpose
(it may have eroded over time.)
Neither lock correctly protects the VG namespace, or
orphan PV properties.
To simplify and correct these issues, the two separate
flocks are combined into the one GLOBAL flock, and this flock
is used from the locking sites that are in place for the
lvmlockd global lock.
The logic behind the lvmlockd (distributed) global lock is
that any command that changes "global state" needs to take
the global lock in ex mode. Global state in lvm is: the list
of VG names, the set of orphan PVs, and any properties of
orphan PVs. Reading this global state can use the global lock
in sh mode to ensure it doesn't change while being reported.
The locking of global state now looks like:
lockd_global()
previously named lockd_gl(), acquires the distributed
global lock through lvmlockd. This is unchanged.
It serializes distributed lvm commands that are changing
global state. This is a no-op when lvmlockd is not in use.
lockf_global()
acquires an flock on a local file. It serializes local lvm
commands that are changing global state.
lock_global()
first calls lockf_global() to acquire the local flock for
global state, and if this succeeds, it calls lockd_global()
to acquire the distributed lock for global state.
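A simplified sketch of that composition; the real functions take lvm's
cmd_context and a mode string, so these signatures are approximations,
and error handling (dropping the file lock again when the distributed
lock fails) is reduced to a comment:

int lockf_global(void *cmd, const char *mode);   /* local file flock */
int lockd_global(void *cmd, const char *mode);   /* lvmlockd lock, no-op
                                                  * without lvmlockd */

static int lock_global_sketch(void *cmd, const char *mode)
{
    /* Serialize local commands first with the file lock ... */
    if (!lockf_global(cmd, mode))
        return 0;

    /* ... then serialize commands across hosts through lvmlockd. */
    if (!lockd_global(cmd, mode)) {
        /* (real code would release the file lock again here) */
        return 0;
    }

    return 1;
}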
Replace instances of lockd_gl() with lock_global(), so that the
existing sites for lvmlockd global state locking are now also
used for local file locking of global state. Remove the previous
file locking calls lock_vol(GLOBAL) and lock_vol(ORPHAN).
The following commands which change global state are now
serialized with the exclusive global flock:
pvchange (of orphan), pvresize (of orphan), pvcreate, pvremove,
vgcreate, vgextend, vgremove, vgreduce, vgrename,
vgcfgrestore, vgimportclone, vgmerge, vgsplit
Commands that use a shared flock to read global state (and will
be serialized against the prior list) are those that use
process_each functions that are based on processing a list of
all VG names, or all PVs. The list of all VGs or all PVs is
global state and the shared lock prevents those lists from
changing while the command is processing them.
The ORPHAN lock previously attempted to produce an accurate
listing of orphan PVs, but it was only acquired at the end of
the command during the fake vg_read of the fake orphan vg.
This is not when orphan PVs were determined; they were
determined by elimination beforehand by processing all real
VGs, and subtracting the PVs in the real VGs from the list
of all PVs that had been identified during the initial scan.
This is fixed by holding the single global lock in shared mode
while processing all VGs to determine the list of orphan PVs.
This reverts 518a8e8cfb
"lvmlockd: activate mirror LVs in shared mode with cmirrord"
because while activating a mirror LV with cmirrord worked,
changes to the active cmirror did not work.
When lvextend extends an LV that is active with a shared
lock, use this as a signal that other hosts may also have
the LV active, with gfs2 mounted, and should have the LV
refreshed to reflect the new size. Use the libdlmcontrol
run api, which uses dlm_controld/corosync to run an
lvchange --refresh command on other cluster nodes.
When an LV is active with a shared lock, a command can be
run to change the LV with --lockopt skiplv (to override the
exclusive lock the command ordinarily requires which is not
compatible with the outstanding shared lock.)
In this case, other commands may have the LV active and may
need to refresh the LV, so print a warning stating this.
and "cachepool" to refer to a cache on a cache pool object.
The problem was that the --cachepool option was being used
to refer to both a cache pool object, and to a standard LV
used for caching. This could be somewhat confusing, and it
made it less clear when each kind would be used. By
separating them, it's clear when a cachepool or a cachevol
should be used.
Previously:
- lvm would use the cache pool approach when the user passed
a cache-pool LV to the --cachepool option.
- lvm would use the cache vol approach when the user passed
a standard LV in the --cachepool option.
Now:
- lvm will always use the cache pool approach when the user
uses the --cachepool option.
- lvm will always use the cache vol approach when the user
uses the --cachevol option.
If there are two independent scripts doing:
vgchange --lockstart vg
lvchange -ay vg/lv
The first vgchange to do the lockstart will wait for
the lockstart to complete before returning.
The second vgchange to do the lockstart will see that
the start is already in progress (from the first) and
will do nothing. This means the second does not wait
for any lockstart to complete, and moves on to the
lvchange which may find the lockspace still starting
and fail.
To fix this, make the vgchange lockstart command
wait for any lockstarts in progress to complete.
In a few cases, error paths from initialization were returned as
'success == 1'.
Also assign num_mb with a single compare checking for a valid sector_size.
For a dumb compiler, make num_mb always defined.
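A sketch of what that looks like; the MB values are placeholders and
the helper name is hypothetical, not the actual code:

#include <stdint.h>

/* num_mb is always assigned (so the compiler never sees a "maybe
 * uninitialized" path) and the sector_size check doubles as the
 * validity test.  The sizes here are placeholders. */
static int pick_num_mb(unsigned sector_size, uint32_t *num_mb_out)
{
    uint32_t num_mb = (sector_size == 4096) ? 512 : 256;

    if (sector_size != 512 && sector_size != 4096)
        return 0;    /* invalid sector size: return failure, not success */

    *num_mb_out = num_mb;
    return 1;
}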
If a single, standard LV is specified as the cache, use
it directly instead of converting it into a cache-pool
object with two separate LVs (for data and metadata).
With a single LV as the cache, lvm will use blocks at the
beginning for metadata, and the rest for data. Separate
dm linear devices are set up to point at the metadata and
data areas of the LV. These dm devs are given to the
dm-cache target to use.
The single LV cache cannot be resized without recreating it.
If the --poolmetadata option is used to specify an LV for
metadata, then a cache pool will be created (with separate
LVs for data and metadata.)
Usage:
$ lvcreate -n main -L 128M vg /dev/loop0
$ lvcreate -n fast -L 64M vg /dev/loop1
$ lvs -a vg
LV VG Attr LSize Type Devices
main vg -wi-a----- 128.00m linear /dev/loop0(0)
fast vg -wi-a----- 64.00m linear /dev/loop1(0)
$ lvconvert --type cache --cachepool fast vg/main
$ lvs -a vg
LV VG Attr LSize Origin Pool Type Devices
[fast] vg Cwi---C--- 64.00m linear /dev/loop1(0)
main vg Cwi---C--- 128.00m [main_corig] [fast] cache main_corig(0)
[main_corig] vg owi---C--- 128.00m linear /dev/loop0(0)
$ lvchange -ay vg/main
$ dmsetup ls
vg-fast_cdata (253:4)
vg-fast_cmeta (253:5)
vg-main_corig (253:6)
vg-main (253:24)
vg-fast (253:3)
$ dmsetup table
vg-fast_cdata: 0 98304 linear 253:3 32768
vg-fast_cmeta: 0 32768 linear 253:3 0
vg-main_corig: 0 262144 linear 7:0 2048
vg-main: 0 262144 cache 253:5 253:4 253:6 128 2 metadata2 writethrough mq 0
vg-fast: 0 131072 linear 7:1 2048
$ lvchange -an vg/main
$ lvconvert --splitcache vg/main
$ lvs -a vg
LV VG Attr LSize Type Devices
fast vg -wi------- 64.00m linear /dev/loop1(0)
main vg -wi------- 128.00m linear /dev/loop0(0)
The lvmlock LV size was not adjusted correctly for 512 vs 4K
sector sizes which influence the lease size used by sanlock.
When lvmlock was automatically extended, the zeroing through
bcache wasn't working.
This fixes a problem in commit e6bb780d24, in which the
back compat handling for the old locking_type=4 was
incorrectly translated to mean the same thing as --readonly,
which prevented activation because activation uses an
exclusive vg lock. Previously, locking_type=4 allowed
activation.
If we see locking_type 4 in an old config, translate it to
the new combination of --readonly and --sysinit, which we
now define to mean the --readonly behavior with an exception
to allow activation.
Native disk scanning is now both reduced and
async/parallel, which makes it comparable in
performance (and often faster) when compared
to lvm using lvmetad.
Autoactivation now uses local temp files to record
online PVs, and no longer requires lvmetad.
There should be no apparent command-level change
in behavior.
The last commit related to this was incomplete:
"Implement lock-override options without locking type"
This is further reworking and reduction of the locking.[ch]
layer which handled all clustering, but is now only used
for file locking. The "locking types" that this layer
implemented were removed previously, leaving only the
standard file locking. (Some cluster-related artifacts
remain to be cleared out later.)
Command options to override or modify locking behavior
are reimplemented here without using the locking types.
Also, deprecated locking_type values are recognized,
and implemented as if one of the equivalent override
options was set.
Options that override file locking are:
. --nolocking disables all file locking.
. --readonly grants read lock requests without actually
taking a file lock, and refuses write lock requests.
. --ignorelockingfailure tries to set up file locks and
uses them normally if possible. When not possible, it
behaves like --readonly, but allows activation.
. --sysinit is the same as ignorelockingfailure.
. global/metadata_read_only acquires actual read file
locks, and refuses write lock requests.
(Some of these options could probably be deprecated
because they were added as workarounds to various
locking_type behaviors that are now deprecated.)
The locking_type setting now has one valid value: 1 which
refers to standard file locking. Configs that contain
deprecated values are recognized and still work in
largely the same way:
. 0 disabled all locking, now implemented like --nolocking
is set. Allow the nolocking option in all commands.
. 1 is the normal file locking setting and is unchanged.
. 2 was for external locking which was not used, and
reverts to normal file locking.
. 3 was for cluster/clvm. This reverts to normal file
locking, and prints messages about lvmlockd.
. 4 was equivalent to readonly, now implemented like
--readonly is set.
. 5 disabled all locking, now implemented like
--nolocking is set.
The options: --nolocking, --readonly, --sysinit
override, or make exceptions to, the normal file locking
behavior. Implement these by just checking for the
options in the file locking path instead of using
special locking types.
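A sketch of how such checks can sit directly in the file-locking path;
the flag and function names here are hypothetical, not lvm2's actual
ones:

/* Hypothetical flag/function names illustrating the option checks that
 * replace the old special locking types. */
struct lock_override_opts {
    int nolocking;    /* --nolocking */
    int readonly;     /* --readonly (and back-compat locking_type=4) */
    int sysinit;      /* --sysinit / --ignorelockingfailure */
};

enum lock_mode { LCK_READ, LCK_WRITE };

static int file_lock_request(const struct lock_override_opts *o,
                             enum lock_mode mode,
                             int (*take_flock)(enum lock_mode mode))
{
    if (o->nolocking)
        return 1;                    /* skip file locking entirely */

    if (o->readonly)
        return mode == LCK_READ;     /* grant reads, refuse writes */

    if (!take_flock(mode))
        /* --sysinit / --ignorelockingfailure: when locking cannot be
         * set up, behave like --readonly (activation still allowed). */
        return o->sysinit && mode == LCK_READ;

    return 1;
}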
Basic LV functions:
activate_lv(), deactivate_lv(),
suspend_lv(), resume_lv()
were routed through the locking infrastructure on the way to:
lv_activate_with_filter(), lv_deactivate(),
lv_suspend_if_active(), lv_resume_if_active()
This commit removes the locking infrastructure from the
middle and calls the latter functions directly from the former.
There were a couple of ancillary steps that the locking
infrastructure added along the way which are still included:
- critical section inc/dec during suspend/resume
- checking for active component LVs during activate
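Roughly what one of the thinned-out wrappers looks like; the function
names follow the commit text, but the signatures and the
critical-section helper call are approximations:

/* Approximate shapes only; the real lvm2 functions take a cmd_context
 * and identify the LV differently. */
struct cmd_context;
struct logical_volume;

int  lv_suspend_if_active(struct cmd_context *cmd, struct logical_volume *lv);
void critical_section_inc(struct cmd_context *cmd, const char *reason);

static int suspend_lv_sketch(struct cmd_context *cmd,
                             struct logical_volume *lv)
{
    /* The call no longer detours through the locking layer, but the
     * critical-section accounting the old layer did is kept. */
    critical_section_inc(cmd, "suspending");
    return lv_suspend_if_active(cmd, lv);
}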
The "activation" file lock (serializing activation) has not
been kept because activation commands have been changed to
take the VG file lock exclusively which makes the activation
lock unused and unnecessary.