1
0
mirror of git://sourceware.org/git/lvm2.git synced 2025-12-15 08:23:51 +03:00
Commit Graph

5736 Commits

Author SHA1 Message Date
David Teigland
a761f9fc9d label_scan/vg_read: use label_read_data to avoid disk reads
The new label_scan() function reads a large buffer of data
from the start of the disk, and saves it so that multiple
structs can be read from it.  Previously, only the label_header
was read from this buffer, and the code which needed data
structures that immediately followed the label_header would
read those from disk separately.  This created a large
number of small, unnecessary disk reads.

In each place that the two read paths (label_scan and vg_read)
need to read data from disk, first check if that data is
already available from the label_read_data buffer, and if
so just copy it from the buffer instead of reading from disk.

Code changes
------------

- passing the label_read_data struct down through
  both read paths to make it available.

- before every disk read, first check if the location
  and size of the desired piece of data exists fully
  in the label_read_data buffer, and if so copy it
  from there.  Otherwise, use the existing code to
  read the data from disk.

- adding some log_error messages on existing error paths
  that were already being updated for the reasons above.

- using similar naming for parallel functions on the two
  parallel read paths that are being updated above.

  label_scan path calls:
  read_metadata_location_summary, text_read_metadata_summary

  vg_read path calls:
  read_metadata_location_vg, text_read_metadata_file

  Previously, those functions were named:

  label_scan path calls:
  vgname_from_mda, text_vgsummary_import

  vg_read path calls:
  _find_vg_rlocn, text_vg_import_fd

I/O changes
-----------

In the label_scan path, the following data is either copied
from label_read_data or read from disk for each PV:

- label_header and pv_header
- mda_header (in _raw_read_mda_header)
- vg metadata name (in read_metadata_location_summary)
- vg metadata (in config_file_read_fd)

Total of 4 reads per PV in the label_scan path.

In the vg_read path, the following data is either copied from
label_read_data or read from disk for each PV:

- mda_header (in _raw_read_mda_header)
- vg metadata name (in read_metadata_location_vg)
- vg metadata (in config_file_read_fd)

Total of 3 reads per PV in the vg_read path.

For a common read/reporting command, each PV will be:

- read by the command's initial lvmcache_label_scan()
- read by lvmcache_label_rescan_vg() at the start of vg_read()
- read by vg_read()

Previously, this would cause 11 synchronous disk reads per PV:
4 from lvmcache_label_scan(), 4 from lvmcache_label_rescan_vg()
and 3 from vg_read().

With this commit's optimization, there are now 2 async disk reads
per PV: 1 from lvmcache_label_scan() and 1 from
lvmcache_label_rescan_vg().

When a second mda is used on a PV, it is located at the
end of the PV.  This second mda and copy of metadata will
not be found in the label_read_data buffer, and will always
require separate disk reads.
2017-10-27 12:32:44 -05:00
David Teigland
654525b33d independent metadata areas: fix bogus code
Fix mixing bitwise & and logical && which was
always 1 in any case.
2017-10-27 12:13:25 -05:00
David Teigland
2773767924 label_scan: fix independent metadata areas
This fixes the use of lvmcache_label_rescan_vg() in the previous
commit for the special case of independent metadata areas.

label scan is about discovering VG name to device associations
using information from disks, but devices in VGs with
independent metadata areas have no information on disk, so
the label scan does nothing for these VGs/devices.
With independent metadata areas, only the VG metadata found
in files is used.  This metadata is found and read in
vg_read in the processing phase.

lvmcache_label_rescan_vg() drops lvmcache info for the VG devices
before repeating the label scan on them.  In the case of
independent metadata areas, there is no metadata on devices, so the
label scan of the devices will find nothing, so will not recreate
the necessary vginfo/info data in lvmcache for the VG.  Fix this
by setting a flag in the lvmcache vginfo struct indicating that
the VG uses independent metadata areas, and label rescanning should
be skipped.

In the case of independent metadata areas, it is the metadata
processing in the vg_read phase that sets up the lvmcache
vginfo/info information, and label scan has no role.
2017-10-27 12:13:25 -05:00
David Teigland
606dc06e43 label_scan: move to start of command
LVM's general design for scanning/reading of metadata from disks is
that a command begins with a discovery phase, called "label scan",
in which it discovers which devices belong to lvm, what VGs exist on
those devices, and which devices are associated with each VG.
After this comes the processing phase, which is based around
processing specific VGs.  In this phase, lvm acquires a lock on
the VG, and rescans the devices associated with that VG, i.e.
it repeats the label scan steps on the devices in the VG in case
something has changed between the initial label scan and taking
the VG lock.  This ensures that the command is processing the
lastest, unchanging data on disk.

This commit moves the location of these label scans to make them
clearer and avoid unnecessary repeated calls to them.

Previously, the initial label scan was called as a side effect
from various utility functions.  This would lead to it being called
unnecessarily.  It is an expensive operation, and should only be
called when necessary.  Also, this is a primary step in the
function of the command, and as such it should be called prominently
at the top level of command processing, not as a hidden side effect
of a utility function.  lvm knows exactly where and when the
label scan needs to be done.  Because of this, move the label scan
calls from the internal functions to the top level of processing.

Other specific instances of lvmcache_label_scan() are still called
unnecessarily or unclearly by specific commands that do not use
the common process_each functions.  These will be improved in
future commits.

During the processing phase, rescanning labels for devices in a VG
needs to be done after the VG lock is acquired in case things have
changed since the initial label scan.  This was being done by way
of rescanning devices that had the INVALID flag set in lvmcache.
This usually approximated the right set of devices, but it was not
exact, and obfuscated the real requirement.  Correct this by using
a new function that rescans the devices in the VG:
lvmcache_label_rescan_vg().

Apart from being inexact, the rescanning was extremely well hidden.
_vg_read() would call ->create_instance(), _text_create_text_instance(),
_create_vg_text_instance() which would call lvmcache_label_scan()
which would call _scan_invalid() which repeats the label scan on
devices flagged INVALID.  lvmcache_label_rescan_vg() is now called
prominently by _vg_read() directly.
2017-10-27 12:13:25 -05:00
David Teigland
cfe35ec877 label_scan: call new label_scan from lvmcache_label_scan
To do label scanning, lvm code calls lvmcache_label_scan().
Change lvmcache_label_scan() to use the new label_scan()
which can use async io, rather than implementing its own
dev iter loop and calling the synchronous label_read() on
each device.

Also add lvmcache_label_rescan_vg() which calls the new
label_scan_devs() which does label scanning on only the
specified devices.  This is for a subsequent commit and
is not yet used.
2017-10-27 12:13:25 -05:00
David Teigland
2530dc4679 label_scan: add new implementation for async and sync
This adds the implementation without using it in the code.
The code still calls label_read() on each individual device
to do scanning.

Calling the new label_scan() will use async io if async io is
enabled in config settings.  If not enabled, or if async io fails,
label_scan() will fall back to using synchronous io.  If only some
aio ops fail, the code will attempt to perform synchronous io on just
the ios that failed.

Uses linux native aio system calls, not the posix wrappers which
are messier and may not have all the latest linux capabilities.

Internally, the same functionality is used before:

- iterate through each visible device on the system,
  provided from from dev-cache

- call _find_label_header on the dev to find the sector
  containing the label_header

- call _text_read to look at the pv_header and mda locations
  after the pv_header

- for each mda location, read the mda_header and the vg metadata

- add info/vginfo structs to lvmcache which associate the
  device name (info) with the VG name (vginfo) so that vg_read
  can know which devices to read for a given VG name

The new label scanning issues a "large" read beginning at the start
of the device, where large is configurable, but intended to cover
all the labels/headers/metadata that is located at the start of
the device.  This large data buffer from each device is saved in a
global list using a new 'label_read_data' struct.  Currently, this
buffer is only used to find the label_header from the first four
sectors of the device.  In subsequent commits, other functions
that read other structs/metadata will first try to find that data in
the saved label_read_data buffer.  In most common cases, the data
they need can simply be copied out of the existing buffer, and
they can avoid issuing another disk read to get it.
2017-10-27 12:13:25 -05:00
David Teigland
8c08b509fa command: add settings to enable async io
There are config settings to enable aio, and to configure
the concurrency and read size.
2017-10-27 12:13:21 -05:00
David Teigland
a41ffd52d6 io: add low level async io support
The interface consists of:

- A context struct, one for the entire command.

- An io struct, one per io operation (read).

- dev_async_context_setup() creates an aio context.

- dev_async_context_destroy() destroys an aio context.

- dev_async_alloc_ios() allocates a specified number of io structs,
  along with an associated buffer for the data.

- dev_async_free_ios() frees all the allocated io structs+buffers.

- dev_async_io_get() gets an available io struct from those
  allocated in alloc_ios.  If none are available, it will allocate
  a new io struct if under limit.

- dev_async_io_put() puts a used io struct back into the set
  of unused io structs, making it available for get.

- dev_async_read_submit() start an async read io.

- dev_async_getevents() collect async io completions.
2017-10-27 10:11:17 -05:00
Heinz Mauelshagen
4a3884245d raid: ignore --stripes/--stripesize on takeover
Converting from one raid level to another, no changes
of stripes or stripesize can be requested because those
are subject to reshaping.  I.e. the process requires to
takeover first and secondly request raid algorithm,
stripe or stripesize changes.

Ignore any related changes display warninngs
and proceed with the takeover.

Without this patch, a takeover requesting
stripesize change causes data corruption!
2017-10-26 17:16:23 +02:00
Zdenek Kabelac
837bfab75c log: better message when reached log limit
Add explaining message, when command was aborted due to the reach
of configure line number count (LVM_LOG_FILE_MAX_LINES)
for logging (used mainly with testing).
2017-10-26 14:04:58 +02:00
Zdenek Kabelac
0e7edd1d24 snapshot: improve validation
Do not allow to take snapshot of mirror/raid leg or log or metadata LV.
This was actually never supported, but user was able to create it,
and this put device stack in hardly fixable state (needs manual work).

This prevents such creation to pass.

Also improve validation when recreating snapshot volume type
from origin and COW volume.
2017-10-25 21:58:01 +02:00
Zdenek Kabelac
d6fcab900b lvextend: detect stacked cache lv used for thinpool
Ensure, that cacheLV is not tried to be resize until full support is
added.
2017-10-23 12:00:43 +02:00
Alasdair G Kergon
f3ae99dcc0 liblvm: Move lib code used exclusively into metadata-liblvm.c
Also remove some redundant function definitions from metadata.h.
2017-10-18 19:29:32 +01:00
Alasdair G Kergon
f1cc5b12fd tidy: Add missing underscores to statics. 2017-10-18 15:58:13 +01:00
David Teigland
1b319f39d6 lvmlockd: check error for sanlock access to lvmlock LV
When the sanlock daemon does not have permission to access
the lvmlock LV, make the error messages more helpful.
2017-10-17 13:45:53 -05:00
Alasdair G Kergon
146745ad88 device: Separate errors for dev not found and filtered.
Replaced the confusing device error message "not found (or ignored by
filtering)" by either "not found" or "excluded by a filter".
(Later we should be able to say which filter.)

Left the the liblvm code paths alone.
2017-10-17 02:12:41 +01:00
Zdenek Kabelac
186a3da998 thin: monitor also external origin
Add missing monitoring for external origin LVs and add -real suffix
for UUID used for monitoring of external origin.
2017-10-16 15:47:46 +02:00
David Teigland
6ac1e04b3a replicator: remove the code
It has not been used in a long time and is not
expected to be used further.
2017-10-13 16:20:42 -05:00
Heinz Mauelshagen
cf13a30eaa lvcreate: allow 100%FREE creation of "--type mirror" to work
Fixes the following case with 3PVs and 3 legs "mirror" LV:

# lvcreate -l100%FREE --type mirror -m2 vg3
  Insufficient free space for log allocation for logical volume .
  Unable to allocate extents for mirror log.

Related: rhbz1269533
2017-10-12 17:43:24 +02:00
Zdenek Kabelac
e02e5b0c5b activation: fix activation lock
Activation lock has a primary purpose to serialize locking of individual
LV in case there is no other protecting mechanism for parallel
execution.

However in the case an activated LV is composed from several other LVs,
noone should be able to manipulate with those LVs as well.

This patch add a very 'naive' global VG activation locking in this case.
In the future we may introduce smarter function detecting minimal closed
graph components if this will appear as bottleneck

Patch checks if the  VG Write lock is held - in this case we do not
need any more locking - command has exclusive access to VG.

In case we have clustered VG and we are activating an LV which does not
need other LVs - we also do not need any more locks.

In all other cases take respective lock - for single LV - use lvid,
for complex LVs  use vgname.
2017-10-11 14:24:28 +02:00
Zdenek Kabelac
9bd7615fef activation: fix locking resource name for activation lock
Avoid cutting away 1st. character for activation lock.
Unlike with VG name locks like #orphan we should not cut-off 1st.
characted from resource name.
2017-10-11 14:24:28 +02:00
Alasdair G Kergon
22789563de thin: Improve overprovisioning and repair warnings. 2017-10-09 19:48:00 +01:00
Heinz Mauelshagen
3a639d8144 raid: cleanup raid4/5/6/10 validation check 2017-10-09 16:13:45 +02:00
Heinz Mauelshagen
44275c763c raid: fix validation check for raid0 segment data_offset member
Commit 2f754b73ff missed one.
2017-10-09 16:03:35 +02:00
Heinz Mauelshagen
5f13e33d54 lvcreate: fix region size on striped RaidLVs
Creating striped RaidLVs with lv size not divisible by region size
caused the region size to be adjusted:

# lvcreate   --type raid5 -n region_check.32.00m_3 -i 3 -L 1g --nosync -R 32.00m raid_sanity
  Using default stripesize 64.00 KiB.
  Rounding size 1.00 GiB (256 extents) up to stripe boundary size <1.01 GiB(258 extents).
  WARNING: New raid5 won't be synchronised. Don't read what you didn't write!
  Using reduced mirror region size of 8.00 MiB
  Logical volume region_check.32.00m_3 created.

Fix by not imposing "mirror" constraints on "raid".

Resolves: rhbz1404007
2017-10-09 14:35:06 +02:00
Heinz Mauelshagen
2f754b73ff raid: fix validation checks for segment data_offset member
Commit 222e1e3ace was not
valuing special case of data_ofset member equal to 1.
2017-10-09 14:01:23 +02:00
Heinz Mauelshagen
554a761db2 raid: return previous reshape space allocation properly
Fix returning previous allocation of reshape space.
2017-10-09 13:55:01 +02:00
Alasdair G Kergon
486ed10848 vgmerge: Fix intermediate metadata corruption
vgmerge suffers from a similar problem to the one fixed in commit
8146548d25 ("vgsplit: Fix intermediate
metadata corruption.")

When merging, splitting or renaming VGs, use a new PV status flag
PV_MOVED_VG to mark the PVs that hold metadata with the old VG name and
use this to provide PV-level granularity instead of incorrectly assuming
all PVs in the VG are the same.
2017-10-06 02:20:45 +01:00
Heinz Mauelshagen
a95f656d0d raid: enhance conversion rejection message
Related: rhbz1439399
2017-10-04 17:05:59 +02:00
Alasdair G Kergon
8146548d25 vgsplit: Fix intermediate metadata corruption.
Changing the VG of a PV uses the same on-disk mechanism as vgrename.
This relies on recognising both the old and new VG names.  Prior to this
patch the vgsplit code incorrectly provided the new VG name twice
instead of the old and new ones.  This lead the low-level mechanism not
to recognise the device as already belonging to a VG and so paying no
attention to the location of its existing metadata, sometimes partly
overwriting it and then later trying to read the corrupt metadata and
issuing a checksum error.
2017-09-22 18:34:34 +01:00
David Teigland
f2ee0e7aca pvmove: require LV name in a shared VG
In a shared VG, only allow pvmove with a named LV,
so that only PE's used by the LV will be moved.
The LV is then activated exclusively, ensuring that
the PE's being moved are not used from another host.

Previously, pvmove was mistakenly allowed on a full PV.
This won't work when LVs using that PV are active on
other hosts.
2017-09-20 09:56:51 -05:00
David Teigland
518a8e8cfb lvmlockd: activate mirror LVs in shared mode with cmirrord
Previously lvmlockd disallowed mirror LVs to be activated
in shared mode.
2017-09-20 09:55:34 -05:00
David Teigland
8e8755319c lvcreate: use cmd defs to deny unspported lockd cases
In a shared VG, lvconvert must be used to create thin pools
and cache pools, not the lvcreate variants of those commands.
Deny these cases early in lvcreate using the new command defs.
Denying these cases deeper in the code was missing some
cleanup of the partially completed command.
2017-09-14 12:28:48 -05:00
David Teigland
d93a2bb741 revert tidy: prefer not using else after return
Revert the lvmlockd.c changes from:
  commit 0bf836aa14
  "tidy: prefer not using else after return"

The commit introduced at least one regression, which broke
lvcreate of a thin pool in a shared VG.
2017-09-14 12:28:48 -05:00
David Teigland
3071837e21 lvmlockd: always disallow mirror splitting
lv_raid_split() was correctly prevented in a shared VG,
but lv_raid_split_and_track() was missing that check.
2017-09-05 10:28:33 -05:00
David Teigland
f847fcd31a lvmlockd: print error about starting lock manager
In the case where lvmlockd is running, but no lock manager
is running, we should print a specific error message about
that situation.
2017-08-28 16:24:00 -05:00
Zdenek Kabelac
26d97f179f reporting: validate time parsing with strtol
Check for out-of-range numbers being result of strtol parsing.
2017-08-25 14:20:59 +02:00
Zdenek Kabelac
5de9444202 locking: avoid descriptor leak for nonblocking mode
When file-locking mode failed on locking, such description was leaked
(typically not an issue since command usually exists afterwards).
So shirt close() at the end of function and use it in all error paths.

Also make sure, when interrrupt is detected, it's really not holding
lock and returns 0.
2017-08-25 14:12:55 +02:00
Zdenek Kabelac
539a48a328 debug: add stack trace point 2017-08-22 10:23:31 +02:00
Zdenek Kabelac
c1e3f96c97 lvmcache: check for lvmcache_foreach_mda return code
lvmcache_foreach_mda() can fail for numerous reasons
and failing error code cannot be ignored (out-of-memory...)

TODO: might need more error handling tunning.
2017-08-22 10:23:31 +02:00
David Teigland
df5c296426 lvmlockd: zero extended lvmlock LV
After the internal lvmlock LV (holding sanlock leases) is
extended to hold more leases, it needs to be zeroed.
sanlock expects to see either zeroed blocks or blocks
initialized with leases.
2017-08-15 11:56:31 -05:00
Peter Rajnoha
3c978f7bcc pvcreate: fix check for 2nd mda at end of disk fits if using pvcreate --restorefile
Fix code checking that the 2nd mda which is at the end of disk really
fits the available free space and avoid any DA and MDA interleaving when
we already have DA preallocated. This mainly applies when we're restoring
a PV from VG backup using pvcreate --restorefile where we may already have
some DA preallocated - this means the PV was in a VG before with already
allocated space from it (the LVs were created). Hence we need to avoid
stepping into DA - the MDA can never ever be inside in such case!

The code responsible for this calculation was already in
_text_pv_add_metadata_area fn, but it had a bug in the calculation where
we subtracted one more sector by mistake and then the code could still
incorrectly allocate the MDA inside existing DA. The patch also renames
the variable in the code so it doesn't confuse us in future.

Also, if the 2nd mda doesn't fit, don't silently continue with just 1
MDA (at the start of the disk). If 2nd mda was requested and we can't
create that due to unavailable space, error out correctly (the patch
also adds a test to shell/pvcreate-operation.sh for this case).
2017-08-15 13:40:25 +02:00
Heinz Mauelshagen
222e1e3ace raid: more validation checks for segment data_offset member
Upgrade commit fb641c3423 with additional checks.
2017-08-14 15:00:15 +02:00
Alasdair G Kergon
4fa5add6b1 pvcreate: Wipe cached bootloaderarea when wiping label.
Previously the cache remembered an existing bootloaderarea and
reinstated it (without even checking for overlap) when asked to
write out the PV.  pvcreate could write out an incorrect layout.
2017-08-11 20:32:04 +01:00
Alasdair G Kergon
fe423ef583 lvmconfig: Add options to produce file preamble
Use --withgeneralpreamble and --withlocalpreamble instead of
concatenating files.
2017-08-05 16:23:34 +01:00
Zdenek Kabelac
00fdf01d9d makefiles: cleanups 2017-08-01 11:53:32 +02:00
Zdenek Kabelac
2232e82d25 makefiles: fixing linking
Avoid adding -g more then once for debug builds.
Avoid enabling  DEBUG_MEM when we build multithreaded tools.
Link executables with -fPIE -pie and --export-dynamic LDFLAGS
Introduce PROGS_FLAGS to add option to pass flags for external libs.
Link  lvm2 internally library only when really used.
Link DAEMON_LIBS with daemons.
Pass VALGRIND_CFLAGS internally
Set shell failure mode on couple places.
2017-08-01 11:53:30 +02:00
Zdenek Kabelac
8256170e6a thin: warn about too big chunks size
lvm2 warned about zeroing and too big chunksize (>=512KiB), but
only during lvconvert, so lvcreate was creating thin-pools
without any warning about possible slowness of thin provisioning
because of zeroing.
2017-08-01 11:52:27 +02:00
Alasdair G Kergon
3654f478e1 toolcontext: Improve invalid units error message. 2017-07-27 00:51:50 +01:00
Zdenek Kabelac
876c4a1b3b tidy: declaration names match implementation
Put in sync some naming used for function declaration and
actual in-code implementation.
2017-07-20 19:16:41 +02:00