When the user configured lvm2 to NOT use monitoring, an activated mirror
actually hung upon error, which left it quite unusable.
So instead warn those 'brave' non-monitoring users about the possible
problem and activate the mirror without blocking error handling.
This also makes it a bit simpler for the test suite to handle trouble
cases when a test is running without dmeventd.
When adjusting the region size for a clustered VG, it always needs to fit
2 full bitsets into 1MB due to old limits of CPG.
This is a relatively big amount of bits, but we still have the limitation
that the region size must fit into 32 bits (0x8000000).
So for too big mirrors this operation needs to fail - whenever the
function now returns 0, it means we can't find a matching region_size.
Since returning 0 is now an 'error', we also need to pass a proper
region_size when creating a pvmove mirror.
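A rough sketch of the adjustment described above, in Python rather than
lvm2's C. The function name, the doubling strategy and the bit budget
constant are assumptions made for illustration; only the '2 full bitsets
in 1MB' rule, the upper limit on region size and the 'return 0 on failure'
convention come from the message. The caller supplies the upper limit.

```python
# Assumption: 1MB holds 8 * 1024 * 1024 bits, shared by two full bitsets.
CPG_BITSET_BUDGET_BITS = (1024 * 1024 * 8) // 2


def adjust_clustered_region_size(lv_size, region_size, max_region_size,
                                 max_bits=CPG_BITSET_BUDGET_BITS):
    """Return a region size whose bitset fits into max_bits, or 0 on failure.

    lv_size, region_size and max_region_size share one unit (e.g. sectors).
    One bit is needed per region, so doubling region_size halves the bitset.
    """
    if lv_size <= 0 or region_size <= 0:
        return 0
    while region_size <= max_region_size:
        regions = -(-lv_size // region_size)  # ceiling division
        if regions <= max_bits:
            return region_size
        region_size *= 2
    return 0  # mirror too big: no matching region size exists
```

A caller such as the pvmove mirror creation would then treat 0 as an
error instead of passing it on as a region size.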
Since extent_size is no longer a power of 2, this max region size
evaluation was producing a rather random bit size as a combination
of the lowest bit from the number of extents and the extent size itself.
Correct the calculation to use the whole LV size and pick the biggest
possible power of 2 value smaller than UINT32_MAX.
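To illustrate the difference, here is a small Python sketch (not the
lvm2 code). Both function names are hypothetical, and reading 'biggest
possible power of 2' as 'round the capped LV size down to a power of 2'
is an assumption.

```python
UINT32_MAX = 0xFFFFFFFF


def lowest_set_bit(value):
    """Old-style result: the lowest set bit of the size."""
    return value & -value


def max_region_size(extents, extent_size):
    """Largest power of 2 not above min(LV size, UINT32_MAX); 0 if empty."""
    capped = min(extents * extent_size, UINT32_MAX)
    if capped < 1:
        return 0
    return 1 << (capped.bit_length() - 1)
```

With 5 extents of size 3 the LV size is 15: the lowest set bit is 1,
while rounding down yields 8.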
Drop the mirrored mirror log limitation that applies only in a very
limited use-case; mirrored mirror log is actually deprecated anyway.
So a 'disk' mirror log now selects the correct minimal size, and a
bigger size is only enforced with a real mirrored mirror log.
Also for a mirrored mirror log we allow a 'smaller' region size if needed,
so if the user uses a 1G region size, we still keep a small mirror log
with a much smaller region size when needed.
Also the mirror log extent calculation now properly detects an error
with too big mirrors, where previously a trimmed uint32_t was applied
unintentionally.
Whenever we make a visible LV out of a previously invisible one,
reload its table - this is mandatory for proper udev rule
processing as well as to ensure the content of the dm table is correct.
TODO: this new generic rule probably makes extra raid rules unnecessary.
Only policy 'smq' is meant to be used with format version 2.
The code used to let the 'mq' policy pass with format 2 as well. But 'mq'
is obsoleted by smq and the kernel currently maps it to smq. This
is incompatible with the older original mq logic - so disallow creation
of this rather useless combination.
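The check amounts to something like the following Python sketch. The
function name and the string/int representation of policy and format are
hypothetical; only the rejected 'mq' + format 2 combination is taken
from the message.

```python
def cache_policy_allowed(policy, metadata_format):
    """Reject the 'mq' policy together with cache metadata format 2."""
    if metadata_format == 2 and policy == "mq":
        return False
    return True
```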
If the tools for checking thin_pool or cache metadata are missing,
issue just a WARNING and let the activation continue.
This has the advantage that if the user is missing those tools
but has already started to use a thin-pool or caching, they can still
access these volumes, with a WARNING.
Also, if the user is using too old tools, e.g. for the CacheV2 format
dmpd tools 0.7 are required, provide an informative WARNING and
skip the failure from the older tool version, which can't understand
the new V2 format.
In case a newly created RaidLV is blacklisted using config
"activation { volume_list = [ ... ] }" (i.e. its SubLVs stay inactive),
the metadata SubLVs can't get wiped, thus failing the creation.
As a result, the RaidLV together with its SubLVs
is left behind in an inconsistent state.
Fix by removing the RaidLV and providing a hint about the volume_list reasoning.
Resolves: rhbz1161347
While prioritized_section() based on raised priority works
nicely for standard lvm commands, a separate counter is actually needed
when it's used in daemons like clvmd/dmeventd where priority
stays raised all the time.
Detect that we are in a prioritized section instead of a critical one,
since these operations were supposed to NOT be happening during
the whole set of operations.
This patch fixes verification of udev operations.
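Why a counter is needed can be shown with a tiny Python model. The class
and its method names are hypothetical and do not mirror lvm2's actual
functions; the point is only that 'priority is raised' and 'we are inside
a prioritized section' are different questions in a daemon.

```python
class PrioritySection:
    """Toy model: a daemon keeps its priority raised permanently."""

    def __init__(self, daemon=False):
        self.daemon = daemon
        self.priority_raised = daemon
        self.count = 0  # separate counter of nested prioritized sections

    def enter(self):
        self.count += 1
        self.priority_raised = True

    def leave(self):
        if self.count > 0:
            self.count -= 1
        if self.count == 0 and not self.daemon:
            self.priority_raised = False

    def in_prioritized_section(self):
        # Looking at priority_raised here would always be true in a daemon.
        return self.count > 0
```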
Introduce prioritized_section() as a closer match to the previous logic
of critical_section(), which was held over a longer sequence of
ioctl commands - essentially it matches the operations on a single
cookie.
While 'critical_section()' now corresponds to locked memory, we hold
this memory only between suspend/resume, thus the notion of 'cookie' was
lost.
This patch restores some logic unintentionally lost when memory
locking was dropped for plain activation/deactivation calls.
With these read errors it's useful to know the reason.
Also avoid logging the error just once, so we know exactly
how many times a read failed.
On the other hand, reduce repeated log_error() on the code 'backtrace'
path and change the severity of the message to just log_debug(), so the
actual read error is printed once for one read.
Just like lvm2 has internal devices such as _tdata, which use a UUID with
a suffix, there is a similar private type of device for crypto devices,
which use the CRYPT-TEMP uuid prefix.
Also ignore stratis.
Some kernel versions suffer from a bad state transition where a device
steps into 'frozen' mode. Any application that tries to read such a
raid unfortunately gets blocked.
As some sort of protection, try to skip such a raid device when
scanning, to minimize the chance of blocking an lvm2 command on such a scan.
When such a device is found, a warning gets printed.
RaidLVs on read_only_volume_list have their SubLVs
activated readonly thus disabling metadata updates
or image resynchronization/recovery. Bug also causes
automatic repairs to fail.
Fix by always activating the RAID SubLVs readwrite.
Resolves: rhbz1208269
Just like with lvcreate, this lvconvert case also needs to properly
check which LV actually holds the lock for the cached origin - as it might
be e.g. a thin-pool tdata subLV.
When a snapshot is created in read-only mode with 'lvcreate -s -pr...',
lvm2 still needs to be able to write to the layered -cow volume
to store metadata and exception blocks.
TODO: in some cases we might be able to do the full tree with read-only
volumes, but this probably needs further validation:
1. checking the snapshot header already exists
2. origin & snapshot are both in read-only mode.
Occasionally users may need to peek into 'component' devices.
Normally lvm2 does not let users activate components.
This patch adds a special mode where the user can activate
a component LV in a 'read-only' mode, i.e.:
lvchange -ay vg/pool_tdata
All devices can be deactivated with:
lvchange -an vg | vgchange -an....
If component devices can be activated alone, ensure they are not breaking
common commands.
TODO: most likely this is not a definitive list of all needed checks
and more will come later.
This is the 'last' place where a LV is present in metadata.
Any removed device should not be left active in dm table.
So this check is an extra validation protection to capture any
forgotten deactivation (adding 1 extra ioctl into lvremove path)
Introduce:
lv_is_component() checks if the LV is actually a component device.
lv_component_is_active() checks if any component device is active.
lv_holder_is_active() checks if any component-holding device is active.
Instead of checking against the existing size of the external origin LV,
correctly use the new 'wanted' size of this LV to check whether it fits
the limitation requirements of older thin-pool targets.
Otherwise the code started the resize, updated metadata and then
just failed during 'resize' in case the LV was active. For an
inactive LV the operation could actually have passed.
Checking here for cache_pool is not necessary and in effect
the check is not even right - since there are internal
states that do allow activating such an LV.
Fix missing 'externalLV' traversing for thins with external origins.
Replace the extra for_each_sub_lv_except_pools() with better
internal logic allowing to selectively cut off the processed subLV tree.
Extend the error code for function 'fn()': when it returns -1, it will
stop further tree scan for the given LV.
Also simplify the code a bit to have only one place that
calls 'fn()' and use a level counter to know the
depth of traversing.
Update the remaining traversing to skip trees for pools
and external origins.
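The traversal contract can be modelled roughly as below. This is a
Python sketch, not the lvm2 code: the dict-based LV representation and
the callback signature fn(lv, level) are assumptions; the return value
convention (1 continue, 0 error, -1 do not descend) and the single
fn() call site follow the message.

```python
def for_each_sub_lv(lv, fn, level=0):
    """Depth-first walk over lv['sub_lvs'].

    fn(sub_lv, level) returns 1 to continue, 0 to abort with an error,
    or -1 to skip the subtree below sub_lv.
    """
    for sub in lv.get("sub_lvs", []):
        ret = fn(sub, level)  # the only place where fn() is called
        if ret == -1:
            continue  # do not descend into this subLV's tree
        if not ret:
            return 0
        if not for_each_sub_lv(sub, fn, level + 1):
            return 0
    return 1
```

A callback that wants the old 'except pools' behaviour would simply
return -1 for pool LVs.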
While the 'file-locking' code always dropped the cached VG before the
lock was taken, other locking types actually missed this.
So while the cache dropping has been implemented for e.g. clvmd,
a command actually running in a cluster kept using the cache even
when the lock had been e.g. dropped and taken again.
This rather 'hard-to-hit' error was noticeable in some
tests running in a cluster where the content of a PV has been
changed (metadata-balance.sh).
Fix the code by moving the cache dropping directly into the lock_vol() function.
TODO: it's kind of strange we should ever need drop_cached_metadata()
used in several places - this all should happen automatically,
so some further thinking here is likely needed.
So this is a bit more complex and possibly worth further checking.
ATM clvmd drops the cmd->mem mempool AFTER the refresh of cmd.
So anything allocating from cmd->mem during toolcontext init
will likely die at some point in time.
As a quick fix, just use regular malloc/free for the 'dso' allocation.
It's worth noting that cmd->libmem seems to be often misused,
causing hidden memory leaks in clvmd.
Build the dso plugin name during segtype initialisation and just
use the string during the command life-time.
Also slightly update message verbosity and make it very_verbose
when the operation is going to be made and 'verbose' when it's done.
Avoid using the same return code for reporting 2 different things:
strictly report the error code by return value and add a new
parameter for reporting monitoring status.
This makes it easier to recognize which error we got from dm_event
and continue only with ENOENT.
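A hedged Python sketch of the separation (the real code is C and uses an
extra output parameter; here a tuple plays that role). Both function
names, the dict standing in for the dm_event registry and the EIO case
are made up for illustration.

```python
import errno


def query_monitoring(registry, name):
    """Return (err, monitored); monitored is meaningful only if err == 0."""
    if registry is None:
        return errno.EIO, False  # hypothetical: the daemon query failed
    if name not in registry:
        return errno.ENOENT, False  # device is not registered at all
    return 0, bool(registry[name])


def target_monitored(registry, name):
    """Return (ok, monitored): ENOENT is not an error, other codes are."""
    err, monitored = query_monitoring(registry, name)
    if err == errno.ENOENT:
        return 1, False  # continue: simply not monitored
    if err:
        return 0, False  # a real failure, reported via the return value
    return 1, monitored
```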
With pthreaded daemons like 'dmeventd' using liblvm via plugin,
lvm2 actually should not 'play' with streams at all - as there
could be parallel outputs running.
As a current quick workaround just disable change for pthreaded
program (gettid() != getpid()).
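The same test can be expressed in Python for illustration. This is only
a sketch of the gettid() != getpid() idea: the helper names are
hypothetical, and threading.get_native_id() (Python 3.8+) stands in for
gettid(). The comparison assumes Linux semantics, where the main thread's
id equals the process id.

```python
import os
import threading


def is_threaded_caller(tid, pid):
    """On Linux only the main thread has a thread id equal to the pid."""
    return tid != pid


def may_touch_std_streams():
    """Allow stream changes only when called from the main thread."""
    return not is_threaded_caller(threading.get_native_id(), os.getpid())
```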
TODO: it's possible the change of buffering actually doesn't bring us
any measurable benefit and could be dropped as a whole later...
Meanwhile this patch is fixing this occasional valgrind race report:
Invalid read of size 4
at 0x571892C: vfprintf (in /usr/lib64/libc-2.26.9000.so)
by 0x57216B3: fprintf (in /usr/lib64/libc-2.26.9000.so)
by 0x5042886: dm_event_log (libdevmapper-event.c:925)
by 0x10B015: _dmeventd_log (dmeventd.c:125)
by 0x10D289: _unregister_for_event (dmeventd.c:1146)
by 0x10E52E: _handle_request (dmeventd.c:1583)
by 0x10E6D7: _do_process_request (dmeventd.c:1631)
by 0x10E7C6: _process_request (dmeventd.c:1660)
by 0x1101A4: main (dmeventd.c:2285)
Address 0x6264d30 is 192 bytes inside a block of size 552 free'd
at 0x4C2ED68: free (vg_replace_malloc.c:530)
by 0x573907D: fclose@@GLIBC_2.2.5 (in /usr/lib64/libc-2.26.9000.so)
by 0x6AC5C00: reopen_standard_stream (log.c:189)
by 0x6A8E62C: destroy_toolcontext (toolcontext.c:2271)
by 0x6BA5C22: lvm_fin (lvmcmdline.c:3339)
by 0x6BD5EF3: lvm2_exit (lvmcmdlib.c:123)
by 0x6856013: dmeventd_lvm2_exit (dmeventd_lvm.c:103)
by 0x66535B8: unregister_device (dmeventd_thin.c:432)
by 0x10CBBC: _do_unregister_device (dmeventd.c:926)
by 0x10CD74: _monitor_unregister (dmeventd.c:979)
by 0x10D094: _monitor_thread (dmeventd.c:1066)
by 0x54B35E0: start_thread (in /usr/lib64/libpthread-2.26.9000.so)
by 0x57C30EE: clone (in /usr/lib64/libc-2.26.9000.so)
Block was alloc'd at
at 0x4C2DBBB: malloc (vg_replace_malloc.c:299)
by 0x573932B: fdopen@@GLIBC_2.2.5 (in /usr/lib64/libc-2.26.9000.so)
by 0x6AC5DC2: reopen_standard_stream (log.c:200)
by 0x6A8D11D: create_toolcontext (toolcontext.c:1898)
by 0x6BA5B6B: init_lvm (lvmcmdline.c:3319)
by 0x6BD5BC8: cmdlib_lvm2_init (lvmcmdlib.c:34)
by 0x6BD5F04: lvm2_init (lvm2cmd.c:20)
by 0x6855EA7: dmeventd_lvm2_init (dmeventd_lvm.c:67)
by 0x665305F: register_device (dmeventd_thin.c:352)
by 0x10CB7A: _do_register_device (dmeventd.c:916)
by 0x10CEE4: _monitor_thread (dmeventd.c:1006)
by 0x54B35E0: start_thread (in /usr/lib64/libpthread-2.26.9000.so)
by 0x57C30EE: clone (in /usr/lib64/libc-2.26.9000.so)
....
Process terminating with default action of signal 6 (SIGABRT): dumping core
at 0x570016B: raise (in /usr/lib64/libc-2.26.9000.so)
by 0x5701520: abort (in /usr/lib64/libc-2.26.9000.so)
by 0x57437D8: __libc_message (in /usr/lib64/libc-2.26.9000.so)
by 0x5743831: __libc_fatal (in /usr/lib64/libc-2.26.9000.so)
by 0x5744056: _IO_vtable_check (in /usr/lib64/libc-2.26.9000.so)
by 0x574751C: __overflow (in /usr/lib64/libc-2.26.9000.so)
by 0x574191A: fputc (in /usr/lib64/libc-2.26.9000.so)
by 0x50428E3: dm_event_log (libdevmapper-event.c:934)
by 0x10B015: _dmeventd_log (dmeventd.c:125)
by 0x10D289: _unregister_for_event (dmeventd.c:1146)
by 0x10E52E: _handle_request (dmeventd.c:1583)
by 0x10E6D7: _do_process_request (dmeventd.c:1631)
by 0x10E7C6: _process_request (dmeventd.c:1660)
by 0x1101A4: main (dmeventd.c:2285)
In fact pvmove does support the 'clustered-core' target for clustered
pvmove of LVs activated on multiple nodes.
This patch restores support for activation of pvmove on all nodes
for LVs that are also active on all nodes.
Actually the removed code is necessary, since not all writes are
getting an aligned buffer - older compilers seem to be unable
to create 4K aligned buffers on the stack - thus the aligning code still
needs to be present for the write path.
Add a protective internal error whenever we spot activation
of 'exclusive'-only segments in 'non-exclusive' mode.
TODO: possibly the activation locking could be enhanced to handle
this fully behind the scenes - as for now this works purely for
lvchange/vgchange activation.
Properly use exclusive activation when reactivating the origin after
a snapshot merge (since the origin must have been previously also
exclusively activated).
The same applies when converting volumes to thin-pool or cache.
The previously used 'only' local activation incorrectly allowed local
activation of some targets (e.g. raid) - thus 'leaking' the chance to
activate the same device on another node - which can be a problem
for device types like raid.
No longer use the external 'result' pointer internally to set up the
cached label. The callback _set_label_read_result() is now given the
internal label pointer directly.
Callers that don't need the result are no longer required to pass a
label pointer into label_read().
If the data being requested is present in last_[extra_]devbuf,
return that directly instead of reading it from disk again.
Typical LVM2 access patterns request data within two adjacent 4k blocks
so we eliminate some read() system calls by always reading at least 8k.
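The caching idea can be sketched in Python as follows. This is a
simplified model with a single buffer (the text mentions two,
last_[extra_]devbuf); the class name, the counter and the 4k-aligned
window start are assumptions.

```python
BLOCK = 4096
MIN_READ = 2 * BLOCK  # always read at least 8k


class CachedReader:
    """Serve reads from the last buffer when the range is already held."""

    def __init__(self, fileobj):
        self.f = fileobj
        self.start = 0
        self.buf = b""
        self.reads = 0  # number of real read() calls issued

    def read(self, offset, length):
        end = offset + length
        if not (self.start <= offset and end <= self.start + len(self.buf)):
            # Cache miss: read a block-aligned window of at least 8k.
            self.start = offset - offset % BLOCK
            self.f.seek(self.start)
            self.buf = self.f.read(max(MIN_READ, end - self.start))
            self.reads += 1
        rel = offset - self.start
        return self.buf[rel:rel + length]
```

Two requests that fall into adjacent 4k blocks then cost one read()
instead of two.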
Callers that read larger amounts of data now get a pointer to read-only
data directly without copying it through an intermediate buffer. This
data is owned by the device layer so the callers no longer free it.
If it obtains the data, it passes it into the supplied callback function
and returns 1. Otherwise the callback receives failed = 1.
Updated config_file_read_fd to use this and similarly return the data
via a callback fn of its own.
Dedicated functions are now used to process each piece of data obtained,
so the refactoring in this file gives us one for the vgsummary and one
for the metadata header. This new type of function takes two parameters
(for now), the obtained data plus a single struct (that must not
reference any data on the stack) that wraps up the entire context needed
to process it.
Rename dev_read() to dev_read_buf() - the function that reads data
into a supplied buffer.
Introduce a new dev_read() that allocates the buffer it returns and
switch the important users over to this. No caller may change the
returned data. (For now, callers are responsible for freeing it after
use, but later the device layer will take full ownership.)
dev_read_buf() should only be used for tiny buffers or unimportant code
(such as the old disk formats).
The creation of wrapped around metadata - where the start of metadata is
written up to the end of the buffer and the remainder follows back at
the start of the buffer - is now restricted to cases where writing the
metadata in one piece wouldn't fit. This shouldn't happen in 'normal'
usage so let's begin treating the code for this as a special case that
can be ignored when optimising 'normal' cases.
If there is sufficient space in the metadata area, align the next
metadata to a disk offset that is a multiple of 4096 bytes and
don't write it circularly. If it doesn't all fit at the end
of the metadata area, go back to the start and write it all there
contiguously.
If there is insufficient space to use the new stricter rules, revert to
the original behaviour, aligning on 512-byte boundaries wrapping around
the circular buffer as required.
Even after writing some metadata encountered problems, some commands
continue (rightly or wrongly) and attempt to make further changes.
Once an mda is marked MDA_FAILED, don't try to use it again.
This also applies when reverting, where one loop already skips
failed mdas but the other doesn't.
This fixes some device open_count warnings on relevant failure paths.
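In outline, both loops follow the same pattern. The Python sketch below
is illustrative only: the dict-based mda, the flag value and the
function names are hypothetical.

```python
MDA_FAILED = 0x1  # hypothetical flag bit


def write_all(mdas, write_fn):
    """Write to every usable mda; mark failures. Returns successes."""
    written = 0
    for mda in mdas:
        if mda["status"] & MDA_FAILED:
            continue  # already failed once: don't try to use it again
        if write_fn(mda):
            written += 1
        else:
            mda["status"] |= MDA_FAILED
    return written


def revert_all(mdas, revert_fn):
    """Revert only mdas that have not failed."""
    for mda in mdas:
        if mda["status"] & MDA_FAILED:
            continue
        revert_fn(mda)
```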
Use new ALIGN_ABSOLUTE macro when calculating the start location
of new metadata and adjust the end of buffer detection so that
there is no longer an imposed gap between old and new metadata.
Currently both start and offset should always be divisible by alignment,
so this should have no effect, but a later patch will increase alignment
so these variables can no longer be optimised out.
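The definition of ALIGN_ABSOLUTE is not shown here, so the following
Python function is only a guess at its semantics: round an offset
inside a buffer up so that the absolute position (buffer start plus
offset) is a multiple of the alignment. The name align_absolute and its
parameter order are assumptions.

```python
def align_absolute(offset, buffer_start, alignment):
    """Round offset up so that buffer_start + offset is aligned."""
    absolute = buffer_start + offset
    return absolute + (-absolute % alignment) - buffer_start
```

With a start and offset that are both multiples of 512, a 512-byte
alignment leaves the offset unchanged, e.g. align_absolute(1024, 4096, 512)
stays 1024, while a 4096-byte alignment moves it to 4096.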
Although it doesn't look like it can be a measurable problem,
it costs some time to flip priorities outside of the activation window.
So, just like with memory locking, preserve the priority until the
memlock_unlock() call appears.
(addition to commit c086dfadc3).
Expand out the metadata wrapping calculations to prepare
to support a larger alignment.
The current alignment is 512 bytes so
(mdac_area_start + rlocn->offset) % alignment is zero.
Mark the first metadata area on each text format PV as MDA_PRIMARY.
Pass this information down to the device layer so that when
there are two metadata areas on a block device, we can easily
distinguish two independent streams of I/O.
In case of failed legs, raid replaces those with
e.g. "vg-lv_rimage_0-missing_0_0" mapped to an error target.
Those erroneously remain on deactivation.
Fix by removing them on deactivation/removal of the RaidLV.