1
0
mirror of git://sourceware.org/git/lvm2.git synced 2025-01-18 10:04:20 +03:00

6205 Commits

Author SHA1 Message Date
Zdenek Kabelac
2e51535b18 thin: activate layer pool aas read-only LV
When lvm2 is activating layered pool LV (to basically keep pool opened,
the other function used to be 'locking' be in sync with DM table)
use this LV in read-only mode - this prevents 'write' access into
data volume content of thin-pool.

Note: since EMPTY/unused thin-pool is created as 'public LV' for generic
use by any user who i.e. wish to maintain thin-pool and thins himself.
At this moment, thin-pool appears as writable LV.  As soon as the 1st.
thinLV is created, layer volume will appear is 'read-only' LV from this moment.
2019-09-17 15:29:11 +02:00
Zdenek Kabelac
2caae8a572 devices: crypto skip
Devices with UUID signature CRYPT-SUBDEV are internal crypto devices.
2019-09-17 15:29:11 +02:00
David Teigland
6b12930860 pvscan: fix activation of incomplete VGs
For a long time there has been a bug in the activation
done by the initial pvscan (which scans all devs to
initialize the lvmetad cache.)  It was attempting to
activate all VGs, even those that were not complete.

lvmetad tells pvscan when a VG is complete, and pvscan
needs to use this information to decide which VGs to
activate.

When there are problems that prevent lvmetad from being
used (e.g. lvmetad is disabled or not running), pvscan
activation cannot use lvmetad to determine when a VG
is complete, so it now checks if devices are present
for all PVs in the VG before activating.

(The recent commit "pvscan: avoid redundant activation"
could make this bug more apparent because redundant
activations can cover up the effect of activating an
incomplete VG and missing some LV activations.)
2019-09-03 15:53:07 -05:00
Zdenek Kabelac
ad86cda4d7 activation: use cmd pending mem for pending_delete
Since we need to preserve allocated strings across 2 separate
activation calls of '_tree_action()' we need to use other mem
pool them dm->mem - but since cmd->mem is released between
individual lvm2 locking calls, we rather introduce a new separate
mem pool just for pending deletes with easy to see life-span.
(not using 'libmem' as it would basicaly keep allocations over
the whole lifetime of clvmd)

This patch is fixing previous commmit where the memory was
improperly used after pool release.
2019-08-27 15:59:33 +02:00
Marian Csontos
36523a398d cov: Fix a leak 2019-08-27 12:23:13 +02:00
David Teigland
8bcd482cc5 pvscan: avoid redundant activation
Use temp files in /run/lvm/vgs_online/ to keep track of when
a VG has been autoactivated by pvscan.  When pvscan autoactivates
a VG, it creates a temp file with the VG's name.  Before a
subsequent pvscan tries to autoactivate the same VG, it checks
if a temp file exists for the VG name, and if so it skips it.

This can commonly happen when many devices appear on the system
at once, which generates several concurrent pvscans.  In this case
the first pvscan does initialization by scanning all devices and
activating any complete VGs.  The other pvscans would attempt to
activate the same complete VGs again.  This extra work could
create a bottleneck of pvscan commands.

If a VG is deactivated by vgchange, the vg online file is removed.
If PVs are then disconnected/reconnected, pvscan will again
autoactivate the VG.

Also, this patch disables the VG refresh that could be called from
pvscan --cache -aay if lvmetad detects metadata inconsistencies.
The role of pvscan should be limited to basic autoactivation, and
any refresh scenarios are special cases that are not appropriate
for automation.

The warning printed by commands retrying an lvmetad connection
has been reduced to once every 10 seconds.  New output messages
have been added to pvscan to record when pvscan is falling back
to direct activation of all VGs.
2019-08-26 16:25:18 -05:00
Zdenek Kabelac
e03bdd7556 lv_manip: add synchronizations
New udev in rawhide seems to be 'dropping' udev rule operations for devices
that are no longer existing - while this is 'probably' a bug - it's
revealing moments in lvm2 that likely should not run in a single
transaction and we should wait for a cookie before submitting more work.

TODO: it seem more 'error' paths should always include synchronization
before starting deactivating 'just activated' devices.
We should probably figure out some 'automatic' solution for this instead
of placing sync_local_dev_name() all over the place...
2019-08-26 15:36:41 +02:00
Zdenek Kabelac
c4a6b9ded0 cache: improve vgremove loop
Support internal removal of 'cache origin' volume - which we
do not normally expose to a user - however internal processing
loops may hit this condition (depending on order of list LVs).

So when this operation is internally requested - we automatically
try to remove it's 'holding' LV (cache LV) - which will also
remove the origin.
2019-08-26 15:36:41 +02:00
Zdenek Kabelac
4743c4900d snapshot: always activate
Drop the 'cluster-only' optimization so we do resume ALL device
before we try to wait on cookie before 'removal' operation.

It's more correct order of operation - alhtough possibly slightly
less efficient - but until we have correct list of operations
'in-progress' we can't do anything better.
2019-08-26 15:36:41 +02:00
Zdenek Kabelac
c6e079cda3 activation: extend handling of pending_delete
With previous patch 30a98e4d6710a543692d40d11428ae4baea11b7b we
started to put devices one pending_delete list instead
of directly scheduling their removal.

However we have operations like 'snapshot merge' where we are
resuming device tree in 2 subsequent activation calls - so
1st such call will still have suspened devices and no chance
to push 'remove' ioctl.

Since we curently cannot easily solve this by doing just single
activation call (which would be preferred solution) - we introduce
a preservation of pending_delete via command structure and
then restore it on next activation call.

This way we keep to remove devices later - although it might be
not the best moment - this may need futher tunning.

Also we don't keep the list of operation in 1 trasaction
(unless we do verify udev symlinks) - this could probably
also make it more correct in terms of which 'remove' can
be combined we already running 'resume'.
2019-08-26 15:36:41 +02:00
David Teigland
f55b8e387f devices: put ifdef around BLKPBSZGET
BLKPBSZGET is not defined before kernel version 2.6.32
(e.g. rhel5)
2019-08-20 09:32:26 -05:00
Zdenek Kabelac
ba629ceea1 activation: add synchronization point
Resuming of 'error' table entry followed with it's dirrect removal
is now troublesame with latest udev as it may skip processing of
udev rules for already 'dropped' device nodes.

As we cannot 'synchronize' with udev while we know we have devices
in suspended state - rework 'cleanup' so it collects nodes
for removal into pending_delete list and process the list with
synchronization once we are without any suspended nodes.
2019-08-20 12:59:05 +02:00
Zdenek Kabelac
73d1646a00 pvmove: correcting read_ahead setting
When pvmove is finished, we do a tricky operation since we try to
resume multiple different device that were all joined into 1 big tree.

Currently we use the infromation from existing live DM table,
where we can get list of all holders of pvmove device.
We look for these nodes (by uuid) in new metadata, and we do now a full
regular device add into dm tree structure.  All devices should be
already PRELOAD with correct table before entering suspend state,
however for correctly working readahead we need to put correct info
also into RESUME tree.  Since table are preloaded, the same table
is skip and resume, but correct read ahead is now set.
2019-08-20 12:59:05 +02:00
David Teigland
7550665ba4 Fix rounding writes up to sector size
Do this at two levels, although one would be enough to
fix the problem seen recently:

- Ignore any reported sector size other than 512 of 4096.
  If either sector size (physical or logical) is reported
  as 512, then use 512.  If neither are reported as 512,
  and one or the other is reported as 4096, then use 4096.
  If neither is reported as either 512 or 4096, then use 512.

- When rounding up a limited write in bcache to be a multiple
  of the sector size, check that the resulting write size is
  not larger than the bcache block itself.  (This shouldn't
  happen if the sector size is 512 or 4096.)
2019-07-25 17:06:43 -05:00
Zdenek Kabelac
721a172edf cov: avoid recursive self-inclusion
Include: toolcontext.h -> dev-type.h -> label.h -> toolcontext.h
Replace with struct predeclaration.
2019-07-06 01:24:28 +02:00
Zdenek Kabelac
23478d9d21 cov: release iterator on error path
Another missed release on error path.
2019-07-06 01:24:28 +02:00
Zdenek Kabelac
b0e1019add gcc: clean uninitialized var warning
Some older gcc versions shows this (FP) warning:
label/label.c:360: warning: â€sector’ may be used uninitialized in this function
2019-07-06 01:24:28 +02:00
Zdenek Kabelac
9aedf68c32 gcc: cleanup warning of shawhing a global dclr
misc/lvm-globals.c:76: warning: declaration of â€use_aio’ shadows a global declaration
misc/lvm-globals.h:63: warning: shadowed declaration is here
2019-07-06 01:24:28 +02:00
Zdenek Kabelac
b62c0787de cov: remove unused headers 2019-06-25 17:34:56 +02:00
Zdenek Kabelac
7232458b6c cov: validate pagesize is not negative
As _init_free_list() cannot accept negative numbers
2019-06-25 17:33:47 +02:00
Zdenek Kabelac
8bea252a63 cov: clearer condition check
Make the code more obvious.
2019-06-25 17:33:47 +02:00
Zdenek Kabelac
66665f5e42 cov: add stack tracing for error paths
Add missing stack reports on error paths.
2019-06-25 17:33:25 +02:00
Zdenek Kabelac
2cd6cd3439 cov: check lv_info
Use lv_info results only when valid.
2019-06-25 17:32:44 +02:00
Zdenek Kabelac
82e7426028 cov: check result of dev_get_block_size 2019-06-25 17:32:44 +02:00
Zdenek Kabelac
862899cc88 cov: check result of dev_read_bytes 2019-06-25 17:32:44 +02:00
Zdenek Kabelac
09aafb61e3 cov: release iterator on error path 2019-06-25 17:32:44 +02:00
Zdenek Kabelac
d6bce03615 mirror: fix monitoring change
Commit a8921be641afe865c177e11b8859f4b937f76995 was supposedly a fix
for unwanted table reload - however before final commit, the tiny
change has been made that was believed to be an enhancment
of original prepared patch. Unfortunatelly the function
locking_is_clustered is not meant to be an equivalent test
of original function.  Drop this change and using
original patch.
2019-06-19 13:33:53 +02:00
David Teigland
c31e6b0aca lvmcache: remove unused_duplicate_devs list from cmd
Save the previous duplicate PVs in a global list instead
of a list on the cmd struct.  dmeventd reuses the cmd struct
for multiple commands, and the list entries between commands
were being freed (apparently), causing a segfault in dmeventd
when it tried to use items in cmd->unused_duplicate_devs
that had been saved there by the previous command.
2019-06-06 14:12:19 -05:00
Zdenek Kabelac
adf9bf80a3 cache: support no_discard_passdown
Recent kernel version from kernel commit:
de7180ff908b2bc0342e832dbdaa9a5f1ecaa33a
started to report in cache status line new flag:
no_discard_passdown

Whenever lvm spots unknown status it reports:
Unknown feature in status:

So add reconginzing this feature flag and also report this with

'lvs -o+kernel_discards'

When no_discard_passdown is found in status 'nopassdown' gets reported
for this field  (roughly matching what we report for thin-pools).
2019-06-05 15:54:58 +02:00
David Teigland
3669c33af4 lvmlockd: do not allow mirror LV to be activated shared
This reverts 518a8e8cfbb672c2bf5e3455f1fe7cd8d94eb5b0
  "lvmlockd: activate mirror LVs in shared mode with cmirrord"

because while activating a mirror LV with cmirrord worked,
changes to the active cmirror did not work.
2019-05-21 15:06:37 -05:00
Zdenek Kabelac
a8921be641 monitoring: fix monitoring change for cluster
Fix bug in table reload for clustered VG.
Function used in lv/vgchange --monitor y|n is using 'somewhat' hackish
shortcut and accesses activation function monitor_dev_for_events()
directly rather through full device refresh to avoid table reload
in case user want to only change monitoring state of device.
However since with old 'mirrors' there is table change dropping
handle_error there was in some cases needed table update.

This was put into monitor_dev_for_events() with assumption
vg_write_lock_held() could be used to decide if the actual monitoring
change comes from read-only lock-holding commands lvchange/vgchange.
However the clustered locking part (clvmd) actually doesn't differentiate
about this concept of having VG openned in read-only or write mode.
So it caused unwanted reloads of mirror tables even in commands like lvconvert.

Thus for clustered locking there is full table reload put into the code and
shortcut applies only for non-clustered locking.

Also the patch tries to avoid calling repeated LV refresh in case the
lv/vgchange uses  --refresh & --monitor.
2019-05-06 15:57:52 +02:00
Zdenek Kabelac
e79e092f8b lv_manip: better work with PERCENT_VG modifier
When using 'lvcreate -l100%VG' and there is big disproportion between
real available space and requested setting - automatically fallback
to 100%FREE.

Difference can be seen when VG is big and already most space was
allocated, so the requestion 100%VG can end (and by spec for % modifier
it's correct) as LV with size of 1%VG.  Usually this is not a big
problem - buit in some cases - like cache-pool allocation, this
can result a big difference for chunksize selection.

With this patch it's more closely match common-sense logic without
the need of reitteration of too big changes in lvm2 core ATM.

TODO: in the future there should be allocator solving all allocations
in a single call.
2019-04-30 12:24:53 +02:00
Zdenek Kabelac
4729b4af0b thin: select chunk size as power of 2
Whenever thin-pool chunk size is unspecified and left for lvm calculation
try to select the size as nearest highest power-of-2 instead of
just being a multiple of 64KiB. When multiple is bigger then 1MiB,
keep using 1MiB multiple.
2019-04-30 12:11:50 +02:00
Zdenek Kabelac
f3be66c002 lv_manip: insert remove layer skips pools
Fixing renaming of subLVs when removing and inserting layers - this
got visible when using stacked VDO pools.
2019-04-30 12:08:36 +02:00
Zdenek Kabelac
2047d405af activation: synchronize before removing devices
Udev is running udev-rule action upon 'resume'.

However lvm2 in special case is doing replacement of
'soon-to-be-removed' device with 'error' target for resuming
and then follows actual removal - the sequence is usually quick,
so when udev start action - it can result in 'strange' error
message in kernel log like:

Process '/usr/sbin/dmsetup info -j 253 -m 17 -c --nameprefixes --noheadings --rows -o name,uuid,suspended' failed with exit code 1.

To avoid this - we need to ensure there is synchronization wait for udev
between 'resume'  and 'remove' part of this process.

However existing code put strict requirement to avoid synchronizing with
udev inside critical section - but this originally came from requirement
to not do anything special while there could be devices in
suspend-state. Now we are able to see differnce between critical section
with or without suspended devices.  For udev synchronization only
suspended devices are prohibited to be there - so slightly relax
condition and allow calling and using 'fs_sync()' even inside critical
section - but there must not be any suspended device.
2019-04-30 12:07:42 +02:00
Zdenek Kabelac
f38cfd09c4 thin: fix maintenance of _pmspare
When metadata grows lvm2 may need to extend also _pmspare volume.
2019-04-30 12:02:49 +02:00
Zdenek Kabelac
515867bbad thin: resize metadata with data
When data are growing, adapt also size of metadata.
As we get way too many reports from users doing huge growths of
data portion while keep metadata small and avoiding using monitoring.

So to enhance the user-experience in case user requests grown of
thin-pool (without passing PV list for growth) - lvm2 will automaticaly
grown also the metadata part of thin-pool (if possible).
2019-04-30 12:02:49 +02:00
Zdenek Kabelac
bcf1aa99e6 thin: introduce estimate_thin_pool_metadata_size
Add function for estimation of thin-pool metadata size for given size of
data. Function is using already existing internal API so it can
be reused for resize of thin-pool data.
2019-04-30 12:02:49 +02:00
David Teigland
559cf0cd1e devices: drop open error message
This open error is being printed in more common,
non-error circumstances than expected.  After a
number of complaints make it only a debug message.
2019-04-23 09:42:25 -05:00
Marian Csontos
f5d1f4f086 Wiping require exclusive actvation
The master branch uses activate_lv only, but on the stable branch
activate_lv_excl_local was used. This patch restores the constraints.

Fix regression from bad cherry pick: 9b04851fc574ce9cffd30a51d2b750955239f316
2019-04-16 08:26:34 +02:00
Zdenek Kabelac
6064b9f1b2 gcc: cleanup const warning 2019-04-10 13:30:34 +02:00
Marian Csontos
b79f1e176f bcache: Fix memory leak 2019-04-04 10:19:15 +02:00
Heinz Mauelshagen
9b04851fc5 raid: fix (de)activation of RaidLVs with visible SubLVs
There's a small window during creation of a new RaidLV when
rmeta SubLVs are made visible to wipe them in order to prevent
erroneous discovery of stale RAID metadata.  In case a crash
prevents the SubLVs from being committed hidden after such
wiping, the RaidLV can still be activated with the SubLVs visible.
During deactivation though, a deadlock occurs because the visible
SubLVs are deactivated before the RaidLV.

The patch adds _check_raid_sublvs to the raid validation in merge.c,
an activation check to activate.c (paranoid, because the merge.c check
will prevent activation in case of visible SubLVs) and shares the
existing wiping function _clear_lvs in raid_manip.c moved to lv_manip.c
and renamed to activate_and_wipe_lvlist to remove code duplication.
Whilst on it, introduce activate_and_wipe_lv to share with
(lvconvert|lvchange).c.

Resolves: rhbz1633167
(cherry picked from commit dd5716ddf258c4a44819fa90d3356833ccf767b4)

Conflicts:
	WHATS_NEW
	lib/activate/activate.c
	lib/metadata/lv_manip.c
	lib/metadata/raid_manip.c
	tools/lvchange.c
	tools/lvconvert.c
2019-03-21 08:05:23 +01:00
David Teigland
dcf8f3111a pvscan: lvmetad init should set updating before scanning
When pvscan needs to initialize lvmetad (e.g. lvmetad has just
started and is empty), it should set the lvmetad state to "updating"
before it scans any devices.  Otherwise, many parallel pvscans
will try to initialize lvmetad, and in some cases earlier pvscans
with fewer devices information may replace newer pvscans with
more recent information.
2019-03-07 11:07:27 -06:00
David Teigland
ece0b131e5 config: improve scan_lvs description 2019-03-06 13:38:33 -06:00
Zdenek Kabelac
e974f6866a cleanup: move cast to det_t into MKDEV macro
(cherry picked from commit aa8b2d6a0feb91bb5ea4364cdc53a00dfa233dca)

Conflicts:
	daemons/clvmd/clvmd-common.h
	device_mapper/ioctl/libdm-iface.c
	device_mapper/libdm-common.c
	device_mapper/libdm-deptree.c
2019-03-05 12:39:17 +01:00
Zdenek Kabelac
a93699ece9 cov: remove unused assigns
(cherry picked from commit 70e3d0a613fb53e52f7a7cb31d65bcc2fa7ab738)

Conflicts:
	tools/pvscan.c
	tools/vgchange.c
2019-03-05 12:28:31 +01:00
David Teigland
590a1ebcf7 io: increase the default io memory from 4 to 8 MiB
This is the default bcache size that is created at the
start of the command.  It needs to be large enough to
hold a single copy of metadata for a given VG, or the
VG cannot be read or written (since the entire VG would
not fit into available memory.)

Increasing the default reduces the chances of anyone
needing to increase the default to use their VG.

The size can be set in lvm.conf global/io_memory_size;
the lower limit is 4 MiB and the upper limit is 128 MiB.
2019-03-04 11:18:34 -06:00
David Teigland
863a2e693e io: warn when metadata size approaches io memory size
When a single copy of metadata gets within 1MB of the
current io_memory_size value, begin printing a warning
that the io_memory_size should be increased.
2019-03-04 10:57:52 -06:00
David Teigland
8dbfdb5b73 config: add new setting io_memory_size
which defines the amount of memory that lvm will allocate
for bcache.  Increasing this setting is required if it is
smaller than a single copy of VG metadata.
2019-03-04 10:31:47 -06:00