IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
When lvm2 is activating layered pool LV (to basically keep pool opened,
the other function used to be 'locking' be in sync with DM table)
use this LV in read-only mode - this prevents 'write' access into
data volume content of thin-pool.
Note: since EMPTY/unused thin-pool is created as 'public LV' for generic
use by any user who i.e. wish to maintain thin-pool and thins himself.
At this moment, thin-pool appears as writable LV. As soon as the 1st.
thinLV is created, layer volume will appear is 'read-only' LV from this moment.
For a long time there has been a bug in the activation
done by the initial pvscan (which scans all devs to
initialize the lvmetad cache.) It was attempting to
activate all VGs, even those that were not complete.
lvmetad tells pvscan when a VG is complete, and pvscan
needs to use this information to decide which VGs to
activate.
When there are problems that prevent lvmetad from being
used (e.g. lvmetad is disabled or not running), pvscan
activation cannot use lvmetad to determine when a VG
is complete, so it now checks if devices are present
for all PVs in the VG before activating.
(The recent commit "pvscan: avoid redundant activation"
could make this bug more apparent because redundant
activations can cover up the effect of activating an
incomplete VG and missing some LV activations.)
Since we need to preserve allocated strings across 2 separate
activation calls of '_tree_action()' we need to use other mem
pool them dm->mem - but since cmd->mem is released between
individual lvm2 locking calls, we rather introduce a new separate
mem pool just for pending deletes with easy to see life-span.
(not using 'libmem' as it would basicaly keep allocations over
the whole lifetime of clvmd)
This patch is fixing previous commmit where the memory was
improperly used after pool release.
Use temp files in /run/lvm/vgs_online/ to keep track of when
a VG has been autoactivated by pvscan. When pvscan autoactivates
a VG, it creates a temp file with the VG's name. Before a
subsequent pvscan tries to autoactivate the same VG, it checks
if a temp file exists for the VG name, and if so it skips it.
This can commonly happen when many devices appear on the system
at once, which generates several concurrent pvscans. In this case
the first pvscan does initialization by scanning all devices and
activating any complete VGs. The other pvscans would attempt to
activate the same complete VGs again. This extra work could
create a bottleneck of pvscan commands.
If a VG is deactivated by vgchange, the vg online file is removed.
If PVs are then disconnected/reconnected, pvscan will again
autoactivate the VG.
Also, this patch disables the VG refresh that could be called from
pvscan --cache -aay if lvmetad detects metadata inconsistencies.
The role of pvscan should be limited to basic autoactivation, and
any refresh scenarios are special cases that are not appropriate
for automation.
The warning printed by commands retrying an lvmetad connection
has been reduced to once every 10 seconds. New output messages
have been added to pvscan to record when pvscan is falling back
to direct activation of all VGs.
New udev in rawhide seems to be 'dropping' udev rule operations for devices
that are no longer existing - while this is 'probably' a bug - it's
revealing moments in lvm2 that likely should not run in a single
transaction and we should wait for a cookie before submitting more work.
TODO: it seem more 'error' paths should always include synchronization
before starting deactivating 'just activated' devices.
We should probably figure out some 'automatic' solution for this instead
of placing sync_local_dev_name() all over the place...
Support internal removal of 'cache origin' volume - which we
do not normally expose to a user - however internal processing
loops may hit this condition (depending on order of list LVs).
So when this operation is internally requested - we automatically
try to remove it's 'holding' LV (cache LV) - which will also
remove the origin.
Drop the 'cluster-only' optimization so we do resume ALL device
before we try to wait on cookie before 'removal' operation.
It's more correct order of operation - alhtough possibly slightly
less efficient - but until we have correct list of operations
'in-progress' we can't do anything better.
With previous patch 30a98e4d6710a543692d40d11428ae4baea11b7b we
started to put devices one pending_delete list instead
of directly scheduling their removal.
However we have operations like 'snapshot merge' where we are
resuming device tree in 2 subsequent activation calls - so
1st such call will still have suspened devices and no chance
to push 'remove' ioctl.
Since we curently cannot easily solve this by doing just single
activation call (which would be preferred solution) - we introduce
a preservation of pending_delete via command structure and
then restore it on next activation call.
This way we keep to remove devices later - although it might be
not the best moment - this may need futher tunning.
Also we don't keep the list of operation in 1 trasaction
(unless we do verify udev symlinks) - this could probably
also make it more correct in terms of which 'remove' can
be combined we already running 'resume'.
Resuming of 'error' table entry followed with it's dirrect removal
is now troublesame with latest udev as it may skip processing of
udev rules for already 'dropped' device nodes.
As we cannot 'synchronize' with udev while we know we have devices
in suspended state - rework 'cleanup' so it collects nodes
for removal into pending_delete list and process the list with
synchronization once we are without any suspended nodes.
When pvmove is finished, we do a tricky operation since we try to
resume multiple different device that were all joined into 1 big tree.
Currently we use the infromation from existing live DM table,
where we can get list of all holders of pvmove device.
We look for these nodes (by uuid) in new metadata, and we do now a full
regular device add into dm tree structure. All devices should be
already PRELOAD with correct table before entering suspend state,
however for correctly working readahead we need to put correct info
also into RESUME tree. Since table are preloaded, the same table
is skip and resume, but correct read ahead is now set.
Do this at two levels, although one would be enough to
fix the problem seen recently:
- Ignore any reported sector size other than 512 of 4096.
If either sector size (physical or logical) is reported
as 512, then use 512. If neither are reported as 512,
and one or the other is reported as 4096, then use 4096.
If neither is reported as either 512 or 4096, then use 512.
- When rounding up a limited write in bcache to be a multiple
of the sector size, check that the resulting write size is
not larger than the bcache block itself. (This shouldn't
happen if the sector size is 512 or 4096.)
Commit a8921be641afe865c177e11b8859f4b937f76995 was supposedly a fix
for unwanted table reload - however before final commit, the tiny
change has been made that was believed to be an enhancment
of original prepared patch. Unfortunatelly the function
locking_is_clustered is not meant to be an equivalent test
of original function. Drop this change and using
original patch.
Save the previous duplicate PVs in a global list instead
of a list on the cmd struct. dmeventd reuses the cmd struct
for multiple commands, and the list entries between commands
were being freed (apparently), causing a segfault in dmeventd
when it tried to use items in cmd->unused_duplicate_devs
that had been saved there by the previous command.
Recent kernel version from kernel commit:
de7180ff908b2bc0342e832dbdaa9a5f1ecaa33a
started to report in cache status line new flag:
no_discard_passdown
Whenever lvm spots unknown status it reports:
Unknown feature in status:
So add reconginzing this feature flag and also report this with
'lvs -o+kernel_discards'
When no_discard_passdown is found in status 'nopassdown' gets reported
for this field (roughly matching what we report for thin-pools).
This reverts 518a8e8cfbb672c2bf5e3455f1fe7cd8d94eb5b0
"lvmlockd: activate mirror LVs in shared mode with cmirrord"
because while activating a mirror LV with cmirrord worked,
changes to the active cmirror did not work.
Fix bug in table reload for clustered VG.
Function used in lv/vgchange --monitor y|n is using 'somewhat' hackish
shortcut and accesses activation function monitor_dev_for_events()
directly rather through full device refresh to avoid table reload
in case user want to only change monitoring state of device.
However since with old 'mirrors' there is table change dropping
handle_error there was in some cases needed table update.
This was put into monitor_dev_for_events() with assumption
vg_write_lock_held() could be used to decide if the actual monitoring
change comes from read-only lock-holding commands lvchange/vgchange.
However the clustered locking part (clvmd) actually doesn't differentiate
about this concept of having VG openned in read-only or write mode.
So it caused unwanted reloads of mirror tables even in commands like lvconvert.
Thus for clustered locking there is full table reload put into the code and
shortcut applies only for non-clustered locking.
Also the patch tries to avoid calling repeated LV refresh in case the
lv/vgchange uses --refresh & --monitor.
When using 'lvcreate -l100%VG' and there is big disproportion between
real available space and requested setting - automatically fallback
to 100%FREE.
Difference can be seen when VG is big and already most space was
allocated, so the requestion 100%VG can end (and by spec for % modifier
it's correct) as LV with size of 1%VG. Usually this is not a big
problem - buit in some cases - like cache-pool allocation, this
can result a big difference for chunksize selection.
With this patch it's more closely match common-sense logic without
the need of reitteration of too big changes in lvm2 core ATM.
TODO: in the future there should be allocator solving all allocations
in a single call.
Whenever thin-pool chunk size is unspecified and left for lvm calculation
try to select the size as nearest highest power-of-2 instead of
just being a multiple of 64KiB. When multiple is bigger then 1MiB,
keep using 1MiB multiple.
Udev is running udev-rule action upon 'resume'.
However lvm2 in special case is doing replacement of
'soon-to-be-removed' device with 'error' target for resuming
and then follows actual removal - the sequence is usually quick,
so when udev start action - it can result in 'strange' error
message in kernel log like:
Process '/usr/sbin/dmsetup info -j 253 -m 17 -c --nameprefixes --noheadings --rows -o name,uuid,suspended' failed with exit code 1.
To avoid this - we need to ensure there is synchronization wait for udev
between 'resume' and 'remove' part of this process.
However existing code put strict requirement to avoid synchronizing with
udev inside critical section - but this originally came from requirement
to not do anything special while there could be devices in
suspend-state. Now we are able to see differnce between critical section
with or without suspended devices. For udev synchronization only
suspended devices are prohibited to be there - so slightly relax
condition and allow calling and using 'fs_sync()' even inside critical
section - but there must not be any suspended device.
When data are growing, adapt also size of metadata.
As we get way too many reports from users doing huge growths of
data portion while keep metadata small and avoiding using monitoring.
So to enhance the user-experience in case user requests grown of
thin-pool (without passing PV list for growth) - lvm2 will automaticaly
grown also the metadata part of thin-pool (if possible).
Add function for estimation of thin-pool metadata size for given size of
data. Function is using already existing internal API so it can
be reused for resize of thin-pool data.
The master branch uses activate_lv only, but on the stable branch
activate_lv_excl_local was used. This patch restores the constraints.
Fix regression from bad cherry pick: 9b04851fc574ce9cffd30a51d2b750955239f316
There's a small window during creation of a new RaidLV when
rmeta SubLVs are made visible to wipe them in order to prevent
erroneous discovery of stale RAID metadata. In case a crash
prevents the SubLVs from being committed hidden after such
wiping, the RaidLV can still be activated with the SubLVs visible.
During deactivation though, a deadlock occurs because the visible
SubLVs are deactivated before the RaidLV.
The patch adds _check_raid_sublvs to the raid validation in merge.c,
an activation check to activate.c (paranoid, because the merge.c check
will prevent activation in case of visible SubLVs) and shares the
existing wiping function _clear_lvs in raid_manip.c moved to lv_manip.c
and renamed to activate_and_wipe_lvlist to remove code duplication.
Whilst on it, introduce activate_and_wipe_lv to share with
(lvconvert|lvchange).c.
Resolves: rhbz1633167
(cherry picked from commit dd5716ddf258c4a44819fa90d3356833ccf767b4)
Conflicts:
WHATS_NEW
lib/activate/activate.c
lib/metadata/lv_manip.c
lib/metadata/raid_manip.c
tools/lvchange.c
tools/lvconvert.c
When pvscan needs to initialize lvmetad (e.g. lvmetad has just
started and is empty), it should set the lvmetad state to "updating"
before it scans any devices. Otherwise, many parallel pvscans
will try to initialize lvmetad, and in some cases earlier pvscans
with fewer devices information may replace newer pvscans with
more recent information.
This is the default bcache size that is created at the
start of the command. It needs to be large enough to
hold a single copy of metadata for a given VG, or the
VG cannot be read or written (since the entire VG would
not fit into available memory.)
Increasing the default reduces the chances of anyone
needing to increase the default to use their VG.
The size can be set in lvm.conf global/io_memory_size;
the lower limit is 4 MiB and the upper limit is 128 MiB.
When a single copy of metadata gets within 1MB of the
current io_memory_size value, begin printing a warning
that the io_memory_size should be increased.
which defines the amount of memory that lvm will allocate
for bcache. Increasing this setting is required if it is
smaller than a single copy of VG metadata.