IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
Fix for commit 79c4971210.
This bug leads lvm to attempt a bogus recovery of the
orphan VG whenever the orphan VG is read.
No recovery code exists for the orphan VG, but lvm still
attempts it when the "consistent" variable is not set to 1.
When lvm attempts the bogus/no-op orphan recovery, it tries
to get a write lock. Usually the write lock succeeds, and
"recovery" does nothing, so fairly harmless. But, with
locking_type=4, the write lock fails, which bubbles up to
cause the whole command failure.
Do this at two levels, although one would be enough to
fix the problem seen recently:
- Ignore any reported sector size other than 512 of 4096.
If either sector size (physical or logical) is reported
as 512, then use 512. If neither are reported as 512,
and one or the other is reported as 4096, then use 4096.
If neither is reported as either 512 or 4096, then use 512.
- When rounding up a limited write in bcache to be a multiple
of the sector size, check that the resulting write size is
not larger than the bcache block itself. (This shouldn't
happen if the sector size is 512 or 4096.)
(cherry picked from commit 7550665ba4)
Conflicts:
lib/device/dev-io.c
Still the place can be better to block only particular reshape
operations which ATM cause kernel problems.
We check if the new number of images is higher - and prevent to take
conversion if the volume is in use (i.e. thin-pool's data LV).
(cherry picked from commit 96985b1373)
The return value from bcache_invalidate_fd() was not being checked.
So I've introduced a little function, _invalidate_fd() that always
calls bcache_abort_fd() if the write fails.
Until we resolve reshape for 'stacked' devices, we need to disable it.
So users can no longer reshape i.e. thin-pool data volumes, causing
ATM bad thin-pool problems.
ev_unset_last_byte() must be called while the fd is still valid.
After a write error, dev_unset_last_byte() must be called before
closing the dev and resetting the fd.
In the write error path, dev_unset_last_byte() was being called
after label_scan_invalidate() which meant that it would not unset
the last_byte values.
After a write error, dev_unset_last_byte() is now called in
dev_write_bytes() before label_scan_invalidate(), instead of by
the caller of dev_write_bytes().
In the common case of a successful write, the sequence is still:
dev_set_last_byte(); dev_write_bytes(); dev_unset_last_byte();
Signed-off-by: Zhao Heming <heming.zhao@suse.com>
When resizing 2 volumes like thin-pool and it's metadata and they
would be of a different type - command would be actually expecting
both LVs being of a same segtype - and would throw an error in
case they are different.
This patch fixes is by setting a new segtype from last segment of
2nd. extented device.
Also it fixes the possible 'percentage' extension setup that
might have been used for 'primary' volume - while the 'secondary'
LV always goes with direct size - as we do not support 'percentage'
setup for them
This affects maily usage of thin-pool where the extension of
thin-pool data size may also lead to extension of metadata size.
Instead of checking all LVs in a VG - do just a direct copy of LVs
from the existing list ->segs_using_thin_lv.
TODO: maybe it could be better to expose seg_list to /tools...
Enhance lv_info with lv_info_with_name_check.
This 'variant' not only check existance if UUID in DM table
but also compares its DM name whether it's matching expected LV name.
Otherwise activation may 'skip' activation with rename in case the
DM UUID already exists, just device is different name.
This change make fairly easier manipulation with i.e. detached mirror
leg which ATM is using same UUID - just the LV name have been changed.
Used code was not able to run 'activation' (and do a rename) and just
skipped the call. So the code used to do a workaround and 'tried'
to deactivate such LV firts - this however work only in non-clvmd case,
as cluster was not having the lock for deactivated LV.
With this extended lv_info code will run 'activation' and will
synchronize the name to match expected LV name.
Patch extends _lv_info() with new paramter 'with_name_check',
which is later translated into 'name_check' argument for
_info_run() which in case of name mismatch evaluates the
check as if device does not exists.
Such call is only used in one place _lv_activate() which then
let activation run. All other invocation of _info() calls
are left intact.
TODO: fix mirror table manipulation (and raid)....
The resume of 'released' 'COW' should preceed the resume of origin.
The fact we need to do the sequence differently for merge was
cause by bugs fixed in 2 previous commits - so we no longer need
to recognize 'merging' and we should always go with single
sequence.
The importance of this order is - to properly remove '-real' device
from origin LV. When COW is activated as 2nd. '-real' device is
kept in table as it cannot be removed during 1st. resume of origin,
and later activation of COW LV no longer builds tree associated
with origin LV.
When checking device id of a thin device that is just being
merged - the snapshot actually could have been already finished
which means '-real' suffix for the LV is already gone and just LV
is there - so check explicitely for this condition and use
correct UUID for this case.
to use the old behavior by default, which allows
using mixed block sizes. The user will need to
set it to 0 in lvm.conf to turn on the new restrictions.
error could be reproduced follow those steps:
#!/bin/bash
vgcreate vgtest /dev/sdb
lvcreate -L 100M -n lv1 vgtest
while :
do
service lvm2-lvmetad restart
vgs &
pvscan &
lvcreate -L 100M -n lv2 vgtest &
lvchange /dev/vgtest/lv1 --addtag xxxxx &
wait
if ! lvs|grep lv2;then
echo "err create"
break
fi
sleep 1
lvremove -y /dev/vgtest/lv2
lvchange /dev/vgtest/lv1 --deltag xxxxx
done
and then fail to create vgtest/lv2, actually lv2 was created, while
the metadata written on disk is replaced by lvchange. It could look
up lv2 by calling dmsetup table, while lvs could not.
This is because, when lvmetad restarted, several lvm commands update
token concurrently, when lvcreate recieve "token_mismatch", it cancle
communicating with lvmetad, which leads to that lvmetad cache is not
sync with the metadata on disk, then lv2 is not committed to lvmetad
cache. The metadata of vgtest which lvchange query from lvmetad is
out of date. After lvchange, it use the old metadata cover the new one.
This patch let lvm process update token synchronously, only one command
update lvmetad token at a time.
lvmetad_pvscan_single send the metadata on a pv by sending "pv_found"
to lvmetad, while the metadata maybe out of date after waiting for the
chance to update lvmetad token. Call label_read to read metadata again.
Token mismatch may lead to problems, increase log level.
Signed-off-by: wangjufeng<wangjufeng@huawei.com>
Avoid having PVs with different logical block sizes in the same VG.
This prevents LVs from having mixed block sizes, which can produce
file system errors.
The new config setting devices/allow_mixed_block_sizes (default 0)
can be changed to 1 to return to the unrestricted mode.
(cherry picked from commit 0404539edb)
Conflicts:
tools/lvmcmdline.c
tools/toollib.c
When lvm2 is activating layered pool LV (to basically keep pool opened,
the other function used to be 'locking' be in sync with DM table)
use this LV in read-only mode - this prevents 'write' access into
data volume content of thin-pool.
Note: since EMPTY/unused thin-pool is created as 'public LV' for generic
use by any user who i.e. wish to maintain thin-pool and thins himself.
At this moment, thin-pool appears as writable LV. As soon as the 1st.
thinLV is created, layer volume will appear is 'read-only' LV from this moment.
For a long time there has been a bug in the activation
done by the initial pvscan (which scans all devs to
initialize the lvmetad cache.) It was attempting to
activate all VGs, even those that were not complete.
lvmetad tells pvscan when a VG is complete, and pvscan
needs to use this information to decide which VGs to
activate.
When there are problems that prevent lvmetad from being
used (e.g. lvmetad is disabled or not running), pvscan
activation cannot use lvmetad to determine when a VG
is complete, so it now checks if devices are present
for all PVs in the VG before activating.
(The recent commit "pvscan: avoid redundant activation"
could make this bug more apparent because redundant
activations can cover up the effect of activating an
incomplete VG and missing some LV activations.)
Since we need to preserve allocated strings across 2 separate
activation calls of '_tree_action()' we need to use other mem
pool them dm->mem - but since cmd->mem is released between
individual lvm2 locking calls, we rather introduce a new separate
mem pool just for pending deletes with easy to see life-span.
(not using 'libmem' as it would basicaly keep allocations over
the whole lifetime of clvmd)
This patch is fixing previous commmit where the memory was
improperly used after pool release.
Use temp files in /run/lvm/vgs_online/ to keep track of when
a VG has been autoactivated by pvscan. When pvscan autoactivates
a VG, it creates a temp file with the VG's name. Before a
subsequent pvscan tries to autoactivate the same VG, it checks
if a temp file exists for the VG name, and if so it skips it.
This can commonly happen when many devices appear on the system
at once, which generates several concurrent pvscans. In this case
the first pvscan does initialization by scanning all devices and
activating any complete VGs. The other pvscans would attempt to
activate the same complete VGs again. This extra work could
create a bottleneck of pvscan commands.
If a VG is deactivated by vgchange, the vg online file is removed.
If PVs are then disconnected/reconnected, pvscan will again
autoactivate the VG.
Also, this patch disables the VG refresh that could be called from
pvscan --cache -aay if lvmetad detects metadata inconsistencies.
The role of pvscan should be limited to basic autoactivation, and
any refresh scenarios are special cases that are not appropriate
for automation.
The warning printed by commands retrying an lvmetad connection
has been reduced to once every 10 seconds. New output messages
have been added to pvscan to record when pvscan is falling back
to direct activation of all VGs.
New udev in rawhide seems to be 'dropping' udev rule operations for devices
that are no longer existing - while this is 'probably' a bug - it's
revealing moments in lvm2 that likely should not run in a single
transaction and we should wait for a cookie before submitting more work.
TODO: it seem more 'error' paths should always include synchronization
before starting deactivating 'just activated' devices.
We should probably figure out some 'automatic' solution for this instead
of placing sync_local_dev_name() all over the place...
Support internal removal of 'cache origin' volume - which we
do not normally expose to a user - however internal processing
loops may hit this condition (depending on order of list LVs).
So when this operation is internally requested - we automatically
try to remove it's 'holding' LV (cache LV) - which will also
remove the origin.
Drop the 'cluster-only' optimization so we do resume ALL device
before we try to wait on cookie before 'removal' operation.
It's more correct order of operation - alhtough possibly slightly
less efficient - but until we have correct list of operations
'in-progress' we can't do anything better.
With previous patch 30a98e4d67 we
started to put devices one pending_delete list instead
of directly scheduling their removal.
However we have operations like 'snapshot merge' where we are
resuming device tree in 2 subsequent activation calls - so
1st such call will still have suspened devices and no chance
to push 'remove' ioctl.
Since we curently cannot easily solve this by doing just single
activation call (which would be preferred solution) - we introduce
a preservation of pending_delete via command structure and
then restore it on next activation call.
This way we keep to remove devices later - although it might be
not the best moment - this may need futher tunning.
Also we don't keep the list of operation in 1 trasaction
(unless we do verify udev symlinks) - this could probably
also make it more correct in terms of which 'remove' can
be combined we already running 'resume'.
Resuming of 'error' table entry followed with it's dirrect removal
is now troublesame with latest udev as it may skip processing of
udev rules for already 'dropped' device nodes.
As we cannot 'synchronize' with udev while we know we have devices
in suspended state - rework 'cleanup' so it collects nodes
for removal into pending_delete list and process the list with
synchronization once we are without any suspended nodes.
When pvmove is finished, we do a tricky operation since we try to
resume multiple different device that were all joined into 1 big tree.
Currently we use the infromation from existing live DM table,
where we can get list of all holders of pvmove device.
We look for these nodes (by uuid) in new metadata, and we do now a full
regular device add into dm tree structure. All devices should be
already PRELOAD with correct table before entering suspend state,
however for correctly working readahead we need to put correct info
also into RESUME tree. Since table are preloaded, the same table
is skip and resume, but correct read ahead is now set.
Do this at two levels, although one would be enough to
fix the problem seen recently:
- Ignore any reported sector size other than 512 of 4096.
If either sector size (physical or logical) is reported
as 512, then use 512. If neither are reported as 512,
and one or the other is reported as 4096, then use 4096.
If neither is reported as either 512 or 4096, then use 512.
- When rounding up a limited write in bcache to be a multiple
of the sector size, check that the resulting write size is
not larger than the bcache block itself. (This shouldn't
happen if the sector size is 512 or 4096.)