IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
When converting a VG to locktype sanlock, a new
lease is allocated for each existing lv. Finding
a new lease location involved searching the lvmlock
LV from the start for an unused location, which
would be very slow with many LVs. Improve this by
starting each search from the last used location.
With existing code, the cache was working only to the 2nd. locking.
So i.e. when 'lvs' scans system with more then one VG, the caching
was effectively not working.
Update the code, so the label invalidate code is able to update DM
cache - so whenever we take a new lock - we will refresh the cache.
TODO: the refresh ATM does a very simple compare of old a new list
of cached DM device, and with the first spotted difference, it just
fallback to the full rebuild of DM cache - with large amount of active
devices this might not the most efficient way....
commit a125a3bb50 "lv_remove: reduce commits for removed LVs"
changed "lvremove <vgname>" from removing one LV at a time,
to removing all LVs in one vg write/commit. It also changed
the behavior if some of the LVs could not be removed, from
removing those LVs that could be removed, to removing nothing
if any LV could not be removed. This caused a regression in
shared VGs using sanlock, in which the on-disk lease was
removed for any LV that could be removed, even if the command
decided to remove nothing. This would leave LVs without a
valid ondisk lease, and "lock failed: error -221" would be
returned for any command attempting to lock the LV.
Fix this by not freeing the on-disk leases until after the
command has decided to go ahead and remove everything, and
has written the VG metadata.
Before the fix:
node1: lvchange -ay vg/lv1
node2: lvchange -ay vg/lv2
node1: lvs
lv1 test -wi-a----- 4.00m
lv2 test -wi------- 4.00m
node2: lvs
lv1 test -wi------- 4.00m
lv2 test -wi-a----- 4.00m
node1: lvremove -y vg/lv1 vg/lv2
LV locked by other host: vg/lv2
(lvremove removed neither of the LVs, but it freed
the lock for lv1, which could have been removed
except for the proper locking failure on lv2.)
node1: lvs
lv1 test -wi------- 4.00m
lv2 test -wi------- 4.00m
node1: lvremove -y vg/lv1
LV vg/lv1 lock failed: error -221
(The lock for lv1 is gone, so nothing can be done with it.)
New config setting sanlock_align_size can be used to configure
the sanlock lease size that lvmlockd will use on 4K disks.
By default, lvmlockd and sanlock use 8MiB align_size (lease size)
on 4K disks, which supports up to 2000 hosts (and max host_id.)
This can be reduced to 1, 2 or 4 (in MiB), to reduce lease i/o.
The reduced sizes correspond to smaller max hosts/host_id:
1 MiB = 250 hosts
2 MiB = 500 hosts
4 MiB = 1000 hosts
8 MiB = 2000 hosts (default)
(Disks with 512 byte sectors always use 1MiB leases and support
2000 hosts/host_id, and are not affected by this.)
Previously, lvmlockd detected the end of the lvmlock LV
by doing i/o to it until an i/o error was returned.
This triggered sanlock warning messages, so use the LV
size to avoid accessing beyond the end of the device.
Previously, every lvcreate would refresh the lvmlock LV
in case another machine had extended it. This involves
a lot of unnecessary work in most cases, so now compare
the LV size and device size to detect when a refresh is
needed.
lvremove of a thin lv while the pool is inactive would
leave the pool locked but inactive.
lvcreate of a thin snapshot while the pool is inactive
would leave the pool locked but inactive.
lvcreate of a thin lv could activate the pool to check
a threshold before the pool lock was acquired in lvmlockd.
The cmd struct is now required in many more functions, and
it's added as a function arg for most direct dev-cache function
calls. The cmd struct is added to struct device (dev->cmd) so
that it can be accessed in many other cases where dev-cache
functions are being called from places where getting the cmd
struct is too difficult.
The dm devs cache is separate from the ordinary dev cache,
so give the function names distinct prefixes, using
"dm_devs_cache" to prefix dm devs cache functions.
The list of dm devs was in the cmd struct and had a
different lifetime than the radix trees referencing
those dm devs. Now the list and radix trees are
created and destroyed together.
vgchange -an vg is permitted when the vg lockspace
is not available, because LVs could still be active
for some reason, and they should be inactive when not
properly locked. In case lvmlockd was not running, or
the lockspace was not started, the command was
unnecessarily trying and failing to unlock every LV,
printing errors for every LV. We can skip this when
the lockspace is known to not be available.
Lock adoption is not part of standard command behavior, but can
be used for manual recovery or cleanup from unexpected failure
cases. Like other lockopt values, they are hidden options for
--lockopt. Different lock managers will behave differently.
Adopting locks with lvmlockd -A1 is more accurate and automatic.
--lockopt adoptls
. for vgchange --lockstart
. adopt existing ls, or fail if no existing lockspace is found
--lockopt adoptgl | adoptvg | adoptlv
. for commands using lvmlockd locks
. adopt orphan gl/vg/lv lock, or fail the lock request if
no orphan lock is found
. will fail if orphan lock exists with a different lock mode
. command may still continue with a failed shared lock request
--lockopt adopt
. for lockstart or any command using lvmlockd locks
. adopt existing lockspace, or start lockspace if none exists
. adopt orphan gl/vg/lv lock, or acquire new lock if no orphan found
. will fail if orphan lock exists with a different lock mode
. command may still continue with a failed shared lock request
. with dlm this option only works for ls
Stop printing "Skipping global lock: lockspace not found or started"
for vgchange --lockstart, since it's generally an inherent limitation
that the global lock isn't available until after locking is started.
Update the start delay warning to "a few seconds".
The lvb is used to hold lock versions, but lock verions are
no longer used (since the removal of lvmetad), so the lvb
is not actually useful. Disable their use for sanlock to
avoid the extra i/o required to maintain the lvb.
vgremove with --lockopt force should skip lvmlockd-related
steps and allow a forced vg cleanup, in addition to using
--nolocking to skip normal locking calls.
Previously, a command would call lockd_vg() for a local VG,
which would go to lvmlockd, which would send back ENOLS,
and the command would not care when it saw the VG was local.
The pointless back-and-forth to lvmlockd for local VGs can
be avoided by checking the VG lock_type in lvmcache (which
label_scan now saves there; this wasn't the case back when
the original lockd_vg logic was added.) If the lock_type
saved during label_scan indicates a local VG, then the
lockd_vg step is skipped.
Set the lock_args string in addition to doing initialization.
lvconvert calls lockd_init_lv_args() directly, skipping
the normal lockd_init_lv() which usually sets lock_args.
Cover a case missed by the recent commit e0ea0706d
"report: query lvmlockd for lv_active_exclusively"
Fix the lv_active_exclusively value reported for thin LVs.
It's the thin pool that is locked in lvmlockd, and the thin
LV state was mistakenly being queried and not found.
Certain LV types like thin can only be activated exclusively, so
always report lv_active_exclusively true for these when active.
Query LV lock state in lvmlockd to report lv_active_exclusively
for active LVs in a shared VGs. As with all lvmlockd state,
it is from the perspective of the local node.
Signed-off-by: corubba <corubba@gmx.de>
The number of extents for the sanlock lvmlock lv is calculated using
integer division, which rounds towards zero. With a physical extent size
of 129M, instead of the requested 256M the lv is only 129M (1 extent).
With any physical extent size greater than 256M the lv creation fails
because the number of extents is zero.
This is fixed by replacing the integer division with a division macro
that rounds up and thus guarantees that the size of the lv will always
be equal or greater than the requested size. Using the examples above, a
pes of 129M will result in a 258M lv (2 extents), pes of 300M in a 300M
lv (1 extent).
The re-calculation of the lv size in bytes and megabytes is only so the
debug output shows the correct values. The size in mb there is still
not byte-perfect-accurate, but good enough for a human-readable estimate;
and the exact size in bytes and extents is right next to it.
Signed-off-by: corubba <corubba@gmx.de>
Names matching internal code layout.
Functionc in thin_manip.c uses thin_pool in its name.
Keep 'pool' only for function working for both cache and thin pools.
No change of functionality.
dev_name(dev) returns "[unknown]" if there are no names
on dev->aliases. It's meant mainly for log messages.
Many places assume a valid path name is returned, and
use it directly. A caller that wants to use the path
from dev_name() must first check if the dev has any
paths with dm_list_empty(&dev->aliases).
Since we check for present DM devices - cache result for
futher use of checking presence of such device.
lvm2 uses cache result for label scan, but also when
it tries to activate or deactivate LV - however only simple
target 'striped' is reasonably supported.
Use disable_dm_devs to be able to control when lv_info()
get cache or uncached results.
TODO: support more type, however this is getting very complicated.
Error path in _lockd_retrive_vg_pv_list() has not zeroed released path
caussing possible double-free later in the code.
Fix it by using one single function freeing lock_pvs structure.
Previously there have been necessary explicit call of backup (often
either forgotten or over-used). With this patch the necessity to
store backup is remember at vg_commit and once the VG is unlocked,
the committed metadata are automatically store in backup file.
This may possibly alter some printed messages from command when the
backup is now taken later.
For shared VG or LV locking, IDM locking scheme needs to use the PV
list assocated with VG or LV for sending SCSI commands, thus it requires
to use some places to generate PV list.
In reviewing the flow for LVM commands, the best place to generate PV
list is in the locking lib. So this is why this patch parses PV list as
shown. It iterates over all the PV nodes one by one, and compare with
the VG name or LV prefix string. If any PV matches, then the PV is
added into the PV list. Finally the PV list is sent to lvmlockd daemon.
Here as mentioned, it compares LV prefix string with the format
"lv_name_", the reason is it needs to find out all relevant PVs, e.g.
for the thin pool, it has LVs for metadata, pool, error, and raw LV, so
we can use the prefix string to find out all PVs belonging to the thin
pool.
For the global lock, it's not covered in this patch. To avoid the egg
and chicken issue, we need to prepare the global lock ahead before any
locking can be used. So the global lock's PV list is established in
lvmlockd daemon by iterating all drives with partition labeled with
"propeller".
Signed-off-by: Leo Yan <leo.yan@linaro.org>
We can consider the drive firmware a server to handle the locking
request from nodes, this essentially is a client-server model.
DLM uses the kernel as a central place to manage locks, so it also
complies with client-server model for locking operations. This is
why IDM and DLM are similar with each other for their wrappers.
This patch largely works by generalizing the DLM code paths and then
providing degeneralized functions as wrappers for both IDM and DLM.
Signed-off-by: Leo Yan <leo.yan@linaro.org>