mirror of git://sourceware.org/git/lvm2.git synced 2025-09-24 21:44:22 +03:00

Compare commits


13 Commits

Author SHA1 Message Date
David Teigland
d34484556d lvmcache: simplify metadata cache
The copy of VG metadata stored in lvmcache was not being used
in general.  It pretended to be a generic VG metadata cache,
but was not being used except for clvmd activation.  There
it was used to avoid reading from disk while devices were
suspended, i.e. in resume.

This removes the code that attempted to make this look
like a generic metadata cache, and replaces it with
something narrowly targeted to what it's actually used for.

This is a way of passing the VG from suspend to resume in
clvmd.  Since in the case of clvmd one caller can't simply
pass the same VG to both suspend and resume, suspend needs
to stash the VG somewhere that resume can grab it from.
(resume doesn't want to read it from disk since devices
are suspended.)  The lvmcache vginfo struct is used as a
convenient place to stash the VG to pass it from suspend
to resume, even though it isn't related to the lvmcache
or vginfo.  These suspended_vg* vginfo fields should
not be used or touched anywhere else, they are only to
be used for passing the VG data from suspend to resume
in clvmd.  The VG data being passed between suspend and
resume is never modified, and will only exist in the
brief period between suspend and resume in clvmd.

suspend has both old (current) and new (precommitted)
copies of the VG metadata.  It stashes both of these in
the vginfo prior to suspending devices.  When vg_commit
is successful, it sets a flag in vginfo as before,
signaling the transition from old to new metadata.

resume grabs the VG stashed by suspend.  If the vg_commit
happened, it grabs the new VG, and if the vg_commit didn't
happen it grabs the old VG.  The VG is then used to resume
LVs.

This isolates clvmd-specific code and usage from the
normal lvm vg_read code, making the code simpler and
the behavior easier to verify.

Sequence of operations:

- lv_suspend() has both vg_old and vg_new
  and stashes a copy of each onto the vginfo:
  lvmcache_save_suspended_vg(vg_old);
  lvmcache_save_suspended_vg(vg_new);

- vg_commit() happens, which causes all clvmd
  instances to call lvmcache_commit_metadata(vg).
  A flag is set in the vginfo indicating the
  transition from the old to new VG:
  vginfo->suspended_vg_committed = 1;

- lv_resume() needs either vg_old or vg_new
  to use in resuming LVs.  It doesn't want to
  read the VG from disk since devices are
  suspended, so it gets the VG stashed by
  lv_suspend:
  vg = lvmcache_get_suspended_vg(vgid);

If the vg_commit did not happen, suspended_vg_committed
will not be set, and in this case, lvmcache_get_suspended_vg()
will return the old VG instead of the new VG, and it will
resume LVs based on the old metadata.
2017-11-10 10:53:57 -06:00
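The suspend/commit/resume handoff described above can be sketched with a toy model. The struct and function names below are simplified stand-ins for the lvmcache vginfo fields and calls the commit describes, not the actual lvm2 API:

```c
#include <assert.h>
#include <stddef.h>

/* Toy model of the vginfo fields this commit adds: suspend stashes
 * both the old (current) and new (precommitted) VG, vg_commit sets
 * a flag, and resume picks one copy based on that flag. */
struct vg { int seqno; };

struct vginfo {
    struct vg *suspended_vg_old;   /* stashed by lv_suspend() */
    struct vg *suspended_vg_new;   /* stashed by lv_suspend() */
    int suspended_vg_committed;    /* set when vg_commit succeeds */
};

/* lv_suspend(): stash both copies before suspending devices. */
void save_suspended_vg(struct vginfo *vi, struct vg *vg, int precommitted)
{
    if (precommitted)
        vi->suspended_vg_new = vg;
    else
        vi->suspended_vg_old = vg;
}

/* vg_commit(): signal the transition from old to new metadata. */
void commit_metadata(struct vginfo *vi)
{
    vi->suspended_vg_committed = 1;
}

/* lv_resume(): use the new metadata only if the commit happened. */
struct vg *get_suspended_vg(struct vginfo *vi)
{
    return vi->suspended_vg_committed ? vi->suspended_vg_new
                                      : vi->suspended_vg_old;
}
```

The point of the flag is visible here: if vg_commit never runs, resume silently falls back to the old metadata without any disk read.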
David Teigland
62ec6c3163 label_scan: remove extra label scan and read for orphan PVs
When process_each_pv() calls vg_read() on the orphan VG, the
internal implementation was doing an unnecessary
lvmcache_label_scan() and two unnecessary label_read() calls
on each orphan.  Some of those unnecessary label scans/reads
would sometimes be skipped due to caching, but the code was
always doing at least one unnecessary read on each orphan.

The common format_text case was also unnecessarily calling into
the format-specific pv_read() function which actually did nothing.

By analyzing each case in which vg_read() was being called on
the orphan VG, we can say that all of the label scans/reads
in vg_read_orphans are unnecessary:

1. reporting commands: the information saved in lvmcache by
the original label scan can be reported.  There is no advantage
to repeating the label scan on the orphans a second time before
reporting it.

2. pvcreate/vgcreate/vgextend: these all share a common
implementation in pvcreate_each_device().  That function
already rescans labels after acquiring the orphan VG lock,
which ensures that the command is using valid lvmcache
information.
2017-11-10 10:53:57 -06:00
David Teigland
e30315db97 vgcreate: improve the use of label_scan
The old code was doing unnecessary label scans when
checking to see if the new VG name exists.  A single
label_scan is sufficient if it is done after the
new VG lock is held.
2017-11-10 10:53:57 -06:00
David Teigland
0f29fd8538 lvmetad: use new label_scan for update from pvscan
Take advantage of the common implementation with aio
and reduced disk reads.
2017-11-10 10:53:57 -06:00
David Teigland
7ad6c21194 lvmetad: use new label_scan for update from lvmlockd
When lvmlockd indicates that the lvmetad cache is out of
date because of changes by another node, lvmetad_pvscan_vg()
rescans the devices in the VG to update lvmetad.  Use the
new label_scan in this function to use the common code and
take advantage of the new aio and reduced reads.
2017-11-10 10:53:57 -06:00
David Teigland
0e15dfe2b6 label_scan/vg_read: use label_read_data to avoid disk reads
The new label_scan() function reads a large buffer of data
from the start of the disk, and saves it so that multiple
structs can be read from it.  Previously, only the label_header
was read from this buffer, and the code which needed data
structures that immediately followed the label_header would
read those from disk separately.  This created a large
number of small, unnecessary disk reads.

In each place that the two read paths (label_scan and vg_read)
need to read data from disk, first check if that data is
already available from the label_read_data buffer, and if
so just copy it from the buffer instead of reading from disk.

Code changes
------------

- passing the label_read_data struct down through
  both read paths to make it available.

- before every disk read, first check if the location
  and size of the desired piece of data exists fully
  in the label_read_data buffer, and if so copy it
  from there.  Otherwise, use the existing code to
  read the data from disk.

- adding some log_error messages on existing error paths
  that were already being updated for the reasons above.

- using similar naming for parallel functions on the two
  parallel read paths that are being updated above.

  label_scan path calls:
  read_metadata_location_summary, text_read_metadata_summary

  vg_read path calls:
  read_metadata_location_vg, text_read_metadata_file

  Previously, those functions were named:

  label_scan path calls:
  vgname_from_mda, text_vgsummary_import

  vg_read path calls:
  _find_vg_rlocn, text_vg_import_fd

I/O changes
-----------

In the label_scan path, the following data is either copied
from label_read_data or read from disk for each PV:

- label_header and pv_header
- mda_header (in _raw_read_mda_header)
- vg metadata name (in read_metadata_location_summary)
- vg metadata (in config_file_read_fd)

Total of 4 reads per PV in the label_scan path.

In the vg_read path, the following data is either copied from
label_read_data or read from disk for each PV:

- mda_header (in _raw_read_mda_header)
- vg metadata name (in read_metadata_location_vg)
- vg metadata (in config_file_read_fd)

Total of 3 reads per PV in the vg_read path.

For a common read/reporting command, each PV will be:

- read by the command's initial lvmcache_label_scan()
- read by lvmcache_label_rescan_vg() at the start of vg_read()
- read by vg_read()

Previously, this would cause 11 synchronous disk reads per PV:
4 from lvmcache_label_scan(), 4 from lvmcache_label_rescan_vg()
and 3 from vg_read().

With this commit's optimization, there are now 2 async disk reads
per PV: 1 from lvmcache_label_scan() and 1 from
lvmcache_label_rescan_vg().

When a second mda is used on a PV, it is located at the
end of the PV.  This second mda and copy of metadata will
not be found in the label_read_data buffer, and will always
require separate disk reads.
2017-11-10 10:53:57 -06:00
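The buffer-first pattern this commit applies at each read site can be sketched as follows. This is a minimal model: `label_read_data` here is a simplified stand-in for the real struct, and the disk-read fallback is left to the caller:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* One large read taken from the start of the device, saved so that
 * multiple structs (pv_header, mda_header, metadata) can be served
 * from it without further disk io. */
struct label_read_data {
    const uint8_t *buf;  /* data from device offset 0 */
    size_t buf_len;      /* how much was read */
};

/* Returns 1 and copies from the buffer when the requested range fits
 * entirely inside it; returns 0 so the caller falls back to a real
 * disk read (e.g. for a second mda at the end of the PV). */
int read_from_buffer(const struct label_read_data *ld,
                     uint64_t offset, size_t len, void *out)
{
    if (offset + len > ld->buf_len)
        return 0;
    memcpy(out, ld->buf + offset, len);
    return 1;
}
```

Every read site in both paths makes this same check before issuing io, which is how the per-PV read count drops from 11 synchronous reads to 1-2 async reads.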
David Teigland
83f25c98de independent metadata areas: fix bogus code
Fix a condition mixing bitwise & and logical &&,
which evaluated to 1 in every case.
2017-11-10 10:53:57 -06:00
David Teigland
667ce84c56 label_scan: fix independent metadata areas
This fixes the use of lvmcache_label_rescan_vg() in the previous
commit for the special case of independent metadata areas.

label scan is about discovering VG name to device associations
using information from disks, but devices in VGs with
independent metadata areas have no information on disk, so
the label scan does nothing for these VGs/devices.
With independent metadata areas, only the VG metadata found
in files is used.  This metadata is found and read in
vg_read in the processing phase.

lvmcache_label_rescan_vg() drops lvmcache info for the VG devices
before repeating the label scan on them.  In the case of
independent metadata areas, there is no metadata on devices, so the
label scan of the devices will find nothing, so will not recreate
the necessary vginfo/info data in lvmcache for the VG.  Fix this
by setting a flag in the lvmcache vginfo struct indicating that
the VG uses independent metadata areas, and label rescanning should
be skipped.

In the case of independent metadata areas, it is the metadata
processing in the vg_read phase that sets up the lvmcache
vginfo/info information, and label scan has no role.
2017-11-10 10:53:57 -06:00
David Teigland
0a9abe6128 label_scan: move to start of command
LVM's general design for scanning/reading of metadata from disks is
that a command begins with a discovery phase, called "label scan",
in which it discovers which devices belong to lvm, what VGs exist on
those devices, and which devices are associated with each VG.
After this comes the processing phase, which is based around
processing specific VGs.  In this phase, lvm acquires a lock on
the VG, and rescans the devices associated with that VG, i.e.
it repeats the label scan steps on the devices in the VG in case
something has changed between the initial label scan and taking
the VG lock.  This ensures that the command is processing the
latest, unchanging data on disk.

This commit moves the location of these label scans to make them
clearer and avoid unnecessary repeated calls to them.

Previously, the initial label scan was called as a side effect
from various utility functions.  This would lead to it being called
unnecessarily.  It is an expensive operation, and should only be
called when necessary.  Also, this is a primary step in the
function of the command, and as such it should be called prominently
at the top level of command processing, not as a hidden side effect
of a utility function.  lvm knows exactly where and when the
label scan needs to be done.  Because of this, move the label scan
calls from the internal functions to the top level of processing.

Other specific instances of lvmcache_label_scan() are still called
unnecessarily or unclearly by specific commands that do not use
the common process_each functions.  These will be improved in
future commits.

During the processing phase, rescanning labels for devices in a VG
needs to be done after the VG lock is acquired in case things have
changed since the initial label scan.  This was being done by way
of rescanning devices that had the INVALID flag set in lvmcache.
This usually approximated the right set of devices, but it was not
exact, and obfuscated the real requirement.  Correct this by using
a new function that rescans the devices in the VG:
lvmcache_label_rescan_vg().

Apart from being inexact, the rescanning was extremely well hidden.
_vg_read() would call ->create_instance(), _text_create_text_instance(),
_create_vg_text_instance() which would call lvmcache_label_scan()
which would call _scan_invalid() which repeats the label scan on
devices flagged INVALID.  lvmcache_label_rescan_vg() is now called
prominently by _vg_read() directly.
2017-11-10 10:53:57 -06:00
David Teigland
c3fb5c75f4 label_scan: call new label_scan from lvmcache_label_scan
To do label scanning, lvm code calls lvmcache_label_scan().
Change lvmcache_label_scan() to use the new label_scan()
which can use async io, rather than implementing its own
dev iter loop and calling the synchronous label_read() on
each device.

Also add lvmcache_label_rescan_vg() which calls the new
label_scan_devs() which does label scanning on only the
specified devices.  This is for a subsequent commit and
is not yet used.
2017-11-10 10:53:57 -06:00
David Teigland
60e707806f label_scan: add new implementation for async and sync
This adds the implementation without using it in the code.
The code still calls label_read() on each individual device
to do scanning.

Calling the new label_scan() will use async io if async io is
enabled in config settings.  If not enabled, or if async io fails,
label_scan() will fall back to using synchronous io.  If only some
aio ops fail, the code will attempt to perform synchronous io on just
the ios that failed.

Uses linux native aio system calls, not the posix wrappers which
are messier and may not have all the latest linux capabilities.

Internally, the same functionality is used as before:

- iterate through each visible device on the system,
  provided from dev-cache

- call _find_label_header on the dev to find the sector
  containing the label_header

- call _text_read to look at the pv_header and mda locations
  after the pv_header

- for each mda location, read the mda_header and the vg metadata

- add info/vginfo structs to lvmcache which associate the
  device name (info) with the VG name (vginfo) so that vg_read
  can know which devices to read for a given VG name

The new label scanning issues a "large" read beginning at the start
of the device, where large is configurable, but intended to cover
all the labels/headers/metadata that is located at the start of
the device.  This large data buffer from each device is saved in a
global list using a new 'label_read_data' struct.  Currently, this
buffer is only used to find the label_header from the first four
sectors of the device.  In subsequent commits, other functions
that read other structs/metadata will first try to find that data in
the saved label_read_data buffer.  In most common cases, the data
they need can simply be copied out of the existing buffer, and
they can avoid issuing another disk read to get it.
2017-11-10 10:53:51 -06:00
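A minimal sketch of the one use this commit makes of the big buffer: finding which of the first four 512-byte sectors holds the label_header, identified by the "LABELONE" signature. This is a simplified illustration (no label type or CRC checks), not the actual _find_label_header code:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SECTOR_SIZE        512
#define LABEL_SCAN_SECTORS 4
#define LABEL_ID           "LABELONE"  /* lvm2 label_header signature */

/* Given one large read from the start of a device, return the sector
 * number (0-3) containing the label header, or -1 if the device does
 * not carry an lvm label. */
int find_label_sector(const uint8_t *buf, size_t len)
{
    for (int s = 0; s < LABEL_SCAN_SECTORS; s++) {
        size_t off = (size_t)s * SECTOR_SIZE;
        if (off + sizeof(LABEL_ID) - 1 > len)
            break;  /* buffer too short to check this sector */
        if (!memcmp(buf + off, LABEL_ID, sizeof(LABEL_ID) - 1))
            return s;
    }
    return -1;
}
```

In later commits the same saved buffer also serves the pv_header, mda_header, and metadata reads, but at this point only the label lookup consumes it.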
David Teigland
cf22b4c9f5 command: add settings to enable async io
There are config settings to enable aio, and to configure
the concurrency and read size.
2017-11-09 15:38:49 -06:00
David Teigland
18400a7e16 io: add low level async io support
The interface consists of:

- A context struct, one for the entire command.

- An io struct, one per io operation (read).

- dev_async_context_setup() creates an aio context.

- dev_async_context_destroy() destroys an aio context.

- dev_async_alloc_ios() allocates a specified number of io structs,
  along with an associated buffer for the data.

- dev_async_free_ios() frees all the allocated io structs+buffers.

- dev_async_io_get() gets an available io struct from those
  allocated in alloc_ios.  If none are available, it will allocate
  a new io struct if under limit.

- dev_async_io_put() puts a used io struct back into the set
  of unused io structs, making it available for get.

- dev_async_read_submit() starts an async read io.

- dev_async_getevents() collects async io completions.
2017-11-09 15:29:20 -06:00
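The get/put semantics described above can be sketched as a free-list pool that grows on demand up to a limit. Names and layout here are hypothetical simplifications, not the lvm2 implementation:

```c
#include <assert.h>
#include <stdlib.h>

/* Toy model of dev_async_io_get()/dev_async_io_put(): a set of
 * preallocated io structs; get takes a free one (allocating a new
 * one if the set is exhausted and the limit allows), put returns
 * it for reuse. */
struct aio_io {
    struct aio_io *next;
    int in_use;
};

struct aio_pool {
    struct aio_io *free_list;
    int allocated;
    int limit;
};

struct aio_io *io_get(struct aio_pool *p)
{
    struct aio_io *io = p->free_list;
    if (io) {
        p->free_list = io->next;        /* reuse an idle io struct */
    } else if (p->allocated < p->limit) {
        io = calloc(1, sizeof(*io));    /* grow on demand, under limit */
        if (io)
            p->allocated++;
    }
    if (io)
        io->in_use = 1;
    return io;  /* NULL: pool exhausted; caller falls back to sync io */
}

void io_put(struct aio_pool *p, struct aio_io *io)
{
    io->in_use = 0;
    io->next = p->free_list;
    p->free_list = io;
}
```

Capping the pool bounds memory use during a large scan; a NULL from get is the natural trigger for the synchronous fallback the later commits describe.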
50 changed files with 3398 additions and 1040 deletions

configure vendored

@@ -707,7 +707,9 @@ FSADM
ELDFLAGS
DM_LIB_PATCHLEVEL
DMEVENTD_PATH
AIO_LIBS
DL_LIBS
AIO
DEVMAPPER
DEFAULT_USE_LVMLOCKD
DEFAULT_USE_LVMPOLLD
@@ -954,6 +956,7 @@ enable_profiling
enable_testing
enable_valgrind_pool
enable_devmapper
enable_aio
enable_lvmetad
enable_lvmpolld
enable_lvmlockd_sanlock
@@ -1692,6 +1695,7 @@ Optional Features:
--enable-testing enable testing targets in the makefile
--enable-valgrind-pool enable valgrind awareness of pools
--disable-devmapper disable LVM2 device-mapper interaction
--disable-aio disable async i/o
--enable-lvmetad enable the LVM Metadata Daemon
--enable-lvmpolld enable the LVM Polling Daemon
--enable-lvmlockd-sanlock
@@ -3179,6 +3183,7 @@ case "$host_os" in
LDDEPS="$LDDEPS .export.sym"
LIB_SUFFIX=so
DEVMAPPER=yes
AIO=yes
BUILD_LVMETAD=no
BUILD_LVMPOLLD=no
LOCKDSANLOCK=no
@@ -3198,6 +3203,7 @@ case "$host_os" in
CLDNOWHOLEARCHIVE=
LIB_SUFFIX=dylib
DEVMAPPER=yes
AIO=no
ODIRECT=no
DM_IOCTLS=no
SELINUX=no
@@ -11825,6 +11831,67 @@ $as_echo "#define DEVMAPPER_SUPPORT 1" >>confdefs.h
fi
################################################################################
{ $as_echo "$as_me:${as_lineno-$LINENO}: checking whether to use aio" >&5
$as_echo_n "checking whether to use aio... " >&6; }
# Check whether --enable-aio was given.
if test "${enable_aio+set}" = set; then :
enableval=$enable_aio; AIO=$enableval
fi
{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $AIO" >&5
$as_echo "$AIO" >&6; }
if test "$AIO" = yes; then
{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for io_setup in -laio" >&5
$as_echo_n "checking for io_setup in -laio... " >&6; }
if ${ac_cv_lib_aio_io_setup+:} false; then :
$as_echo_n "(cached) " >&6
else
ac_check_lib_save_LIBS=$LIBS
LIBS="-laio $LIBS"
cat confdefs.h - <<_ACEOF >conftest.$ac_ext
/* end confdefs.h. */
/* Override any GCC internal prototype to avoid an error.
Use char because int might match the return type of a GCC
builtin and then its argument prototype would still apply. */
#ifdef __cplusplus
extern "C"
#endif
char io_setup ();
int
main ()
{
return io_setup ();
;
return 0;
}
_ACEOF
if ac_fn_c_try_link "$LINENO"; then :
ac_cv_lib_aio_io_setup=yes
else
ac_cv_lib_aio_io_setup=no
fi
rm -f core conftest.err conftest.$ac_objext \
conftest$ac_exeext conftest.$ac_ext
LIBS=$ac_check_lib_save_LIBS
fi
{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $ac_cv_lib_aio_io_setup" >&5
$as_echo "$ac_cv_lib_aio_io_setup" >&6; }
if test "x$ac_cv_lib_aio_io_setup" = xyes; then :
$as_echo "#define AIO_SUPPORT 1" >>confdefs.h
AIO_LIBS="-laio"
AIO_SUPPORT=yes
else
AIO_LIBS=
AIO_SUPPORT=no
fi
fi
################################################################################
{ $as_echo "$as_me:${as_lineno-$LINENO}: checking whether to build LVMetaD" >&5
$as_echo_n "checking whether to build LVMetaD... " >&6; }
@@ -15774,6 +15841,8 @@ _ACEOF

View File

@@ -39,6 +39,7 @@ case "$host_os" in
LDDEPS="$LDDEPS .export.sym"
LIB_SUFFIX=so
DEVMAPPER=yes
AIO=yes
BUILD_LVMETAD=no
BUILD_LVMPOLLD=no
LOCKDSANLOCK=no
@@ -58,6 +59,7 @@ case "$host_os" in
CLDNOWHOLEARCHIVE=
LIB_SUFFIX=dylib
DEVMAPPER=yes
AIO=no
ODIRECT=no
DM_IOCTLS=no
SELINUX=no
@@ -1121,6 +1123,24 @@ if test "$DEVMAPPER" = yes; then
AC_DEFINE([DEVMAPPER_SUPPORT], 1, [Define to 1 to enable LVM2 device-mapper interaction.])
fi
################################################################################
dnl -- Disable aio
AC_MSG_CHECKING(whether to use aio)
AC_ARG_ENABLE(aio,
AC_HELP_STRING([--disable-aio],
[disable async i/o]),
AIO=$enableval)
AC_MSG_RESULT($AIO)
if test "$AIO" = yes; then
AC_CHECK_LIB(aio, io_setup,
[AC_DEFINE([AIO_SUPPORT], 1, [Define to 1 if aio is available.])
AIO_LIBS="-laio"
AIO_SUPPORT=yes],
[AIO_LIBS=
AIO_SUPPORT=no ])
fi
################################################################################
dnl -- Build lvmetad
AC_MSG_CHECKING(whether to build LVMetaD)
@@ -2060,9 +2080,11 @@ AC_SUBST(DEFAULT_USE_LVMETAD)
AC_SUBST(DEFAULT_USE_LVMPOLLD)
AC_SUBST(DEFAULT_USE_LVMLOCKD)
AC_SUBST(DEVMAPPER)
AC_SUBST(AIO)
AC_SUBST(DLM_CFLAGS)
AC_SUBST(DLM_LIBS)
AC_SUBST(DL_LIBS)
AC_SUBST(AIO_LIBS)
AC_SUBST(DMEVENTD_PATH)
AC_SUBST(DM_LIB_PATCHLEVEL)
AC_SUBST(ELDFLAGS)

View File

@@ -77,6 +77,10 @@ include $(top_builddir)/make.tmpl
LIBS += $(LVMINTERNAL_LIBS) -ldevmapper $(PTHREAD_LIBS)
CFLAGS += -fno-strict-aliasing $(EXTRA_EXEC_CFLAGS)
ifeq ("@AIO@", "yes")
LIBS += $(AIO_LIBS)
endif
INSTALL_TARGETS = \
install_clvmd

View File

@@ -661,6 +661,7 @@ int do_refresh_cache(void)
return -1;
}
cmd->use_aio = 0;
init_full_scan_done(0);
init_ignore_suspended_devices(1);
lvmcache_force_next_label_scan();
@@ -920,6 +921,7 @@ int init_clvm(struct dm_hash_table *excl_uuid)
/* Check lvm.conf is setup for cluster-LVM */
check_config();
init_ignore_suspended_devices(1);
cmd->use_aio = 0;
/* Trap log messages so we can pass them back to the user */
init_log_fn(lvm2_log_fn);

View File

@@ -1,5 +1,8 @@
/* include/configure.h.in. Generated from configure.in by autoheader. */
/* Define to 1 if aio is available. */
#undef AIO_SUPPORT
/* Define to 1 to use libblkid detection of signatures when wiping. */
#undef BLKID_WIPING_SUPPORT

View File

@@ -28,6 +28,7 @@
#include "config.h"
#include "segtype.h"
#include "sharedlib.h"
#include "lvmcache.h"
#include <limits.h>
#include <fcntl.h>
@@ -2123,6 +2124,17 @@ static int _lv_suspend(struct cmd_context *cmd, const char *lvid_s,
if (!lv_info(cmd, lv, laopts->origin_only, &info, 0, 0))
goto_out;
/*
* Save old and new (current and precommitted) versions of the
* VG metadata for lv_resume() to use, since lv_resume can't
* read metadata given that devices are suspended. lv_resume()
* will resume LVs using the old/current metadata if the vg_commit
* did not happen (or failed), and it will resume LVs using the
* new/precommitted metadata if the vg_commit succeeded.
*/
lvmcache_save_suspended_vg(lv->vg, 0);
lvmcache_save_suspended_vg(lv_pre->vg, 1);
if (!info.exists || info.suspended) {
if (!error_if_not_suspended) {
r = 1;
@@ -2281,15 +2293,54 @@ static int _lv_resume(struct cmd_context *cmd, const char *lvid_s,
struct lv_activate_opts *laopts, int error_if_not_active,
const struct logical_volume *lv)
{
const struct logical_volume *lv_to_free = NULL;
struct volume_group *vg = NULL;
struct logical_volume *lv_found = NULL;
const union lvid *lvid;
const char *vgid;
struct lvinfo info;
int r = 0;
if (!activation())
return 1;
if (!lv && !(lv_to_free = lv = lv_from_lvid(cmd, lvid_s, 0)))
goto_out;
/*
* When called in clvmd, lvid_s is set and lv is not. We need to
* get the VG metadata without reading disks because devs are
* suspended. lv_suspend() saved old and new VG metadata for us
* to use here. If vg_commit() happened, lvmcache_get_suspended_vg
* will return the new metadata for us to use in resuming LVs.
* If vg_commit() did not happen, lvmcache_get_suspended_vg
* returns the old metadata which we use to resume LVs.
*/
if (!lv && lvid_s) {
lvid = (const union lvid *) lvid_s;
vgid = (const char *)lvid->id[0].uuid;
if ((vg = lvmcache_get_suspended_vg(vgid))) {
log_debug_activation("Resuming LVID %s found saved vg seqno %d %s", lvid_s, vg->seqno, vg->name);
if ((lv_found = find_lv_in_vg_by_lvid(vg, lvid))) {
log_debug_activation("Resuming LVID %s found saved LV %s", lvid_s, display_lvname(lv_found));
lv = lv_found;
} else
log_debug_activation("Resuming LVID %s did not find saved LV", lvid_s);
} else
log_debug_activation("Resuming LVID %s did not find saved VG", lvid_s);
/*
* resume must have been called without a preceding suspend,
* so we need to read the vg.
*/
if (!lv) {
log_debug_activation("Resuming LVID %s reading VG", lvid_s);
if (!(lv_found = lv_from_lvid(cmd, lvid_s, 0))) {
log_debug_activation("Resuming LVID %s failed to read VG", lvid_s);
goto out;
}
lv = lv_found;
}
}
if (!lv_is_origin(lv) && !lv_is_thin_volume(lv) && !lv_is_thin_pool(lv))
laopts->origin_only = 0;
@@ -2334,9 +2385,6 @@ static int _lv_resume(struct cmd_context *cmd, const char *lvid_s,
r = 1;
out:
if (lv_to_free)
release_vg(lv_to_free->vg);
return r;
}
@@ -2463,6 +2511,10 @@ int lv_activation_filter(struct cmd_context *cmd, const char *lvid_s,
int *activate_lv, const struct logical_volume *lv)
{
const struct logical_volume *lv_to_free = NULL;
struct volume_group *vg = NULL;
struct logical_volume *lv_found = NULL;
const union lvid *lvid;
const char *vgid;
int r = 0;
if (!activation()) {
@@ -2470,6 +2522,24 @@ int lv_activation_filter(struct cmd_context *cmd, const char *lvid_s,
return 1;
}
/*
* This function is called while devices are suspended,
* so try to use the copy of the vg that was saved in
* lv_suspend.
*/
if (!lv && lvid_s) {
lvid = (const union lvid *) lvid_s;
vgid = (const char *)lvid->id[0].uuid;
if ((vg = lvmcache_get_suspended_vg(vgid))) {
log_debug_activation("activation_filter for %s found saved VG seqno %d %s", lvid_s, vg->seqno, vg->name);
if ((lv_found = find_lv_in_vg_by_lvid(vg, lvid))) {
log_debug_activation("activation_filter for %s found saved LV %s", lvid_s, display_lvname(lv_found));
lv = lv_found;
}
}
}
if (!lv && !(lv_to_free = lv = lv_from_lvid(cmd, lvid_s, 0)))
goto_out;

lib/cache/lvmcache.c vendored

@@ -63,15 +63,42 @@ struct lvmcache_vginfo {
char *lock_type;
uint32_t mda_checksum;
size_t mda_size;
size_t vgmetadata_size;
char *vgmetadata; /* Copy of VG metadata as format_text string */
struct dm_config_tree *cft; /* Config tree created from vgmetadata */
/* Lifetime is directly tied to vgmetadata */
struct volume_group *cached_vg;
unsigned holders;
unsigned vg_use_count; /* Counter of vg reusage */
unsigned precommitted; /* Is vgmetadata live or precommitted? */
unsigned cached_vg_invalidated; /* Signal to regenerate cached_vg */
int independent_metadata_location; /* metadata read from independent areas */
/*
* The following are not related to lvmcache or vginfo,
* but are borrowing the vginfo to store the data.
*
* suspended_vg_* are used only by clvmd suspend/resume.
* In suspend, both old (current) and new (precommitted)
* metadata is saved. (Each in three forms: buffer, cft,
* and vg). In resume, if the vg was committed
* (suspended_vg_committed is set), then LVs are resumed
* using the new metadata, but if the vg wasn't committed,
* then LVs are resumed using the old metadata.
*
* suspended_vg_committed is set to 1 when clvmd gets
* LCK_VG_COMMIT from vg_commit().
*
* These fields are only used between suspend and resume
* in clvmd, and should never be used in any other way.
* The contents of this data are never changed. This
* data does not really belong in lvmcache, it's unrelated
* to lvmcache or vginfo, but it's just a convenient place
* for clvmd to stash the VG between suspend and resume
* (since the same caller isn't present to pass the VG to
* both suspend and resume in the case of clvmd.)
*
* This data is not really a "cache" of the VG, it is just
* a location to pass the VG between suspend and resume.
*/
int suspended_vg_committed;
char *suspended_vg_old_buf;
struct dm_config_tree *suspended_vg_old_cft;
struct volume_group *suspended_vg_old;
char *suspended_vg_new_buf;
struct dm_config_tree *suspended_vg_new_cft;
struct volume_group *suspended_vg_new;
};
static struct dm_hash_table *_pvid_hash = NULL;
@@ -138,73 +165,7 @@ void lvmcache_seed_infos_from_lvmetad(struct cmd_context *cmd)
_has_scanned = 1;
}
/* Volume Group metadata cache functions */
static void _free_cached_vgmetadata(struct lvmcache_vginfo *vginfo)
{
if (!vginfo || !vginfo->vgmetadata)
return;
dm_free(vginfo->vgmetadata);
vginfo->vgmetadata = NULL;
/* Release also cached config tree */
if (vginfo->cft) {
dm_config_destroy(vginfo->cft);
vginfo->cft = NULL;
}
log_debug_cache("Metadata cache: VG %s wiped.", vginfo->vgname);
release_vg(vginfo->cached_vg);
}
/*
* Cache VG metadata against the vginfo with matching vgid.
*/
static void _store_metadata(struct volume_group *vg, unsigned precommitted)
{
char uuid[64] __attribute__((aligned(8)));
struct lvmcache_vginfo *vginfo;
char *data;
size_t size;
if (!(vginfo = lvmcache_vginfo_from_vgid((const char *)&vg->id))) {
stack;
return;
}
if (!(size = export_vg_to_buffer(vg, &data))) {
stack;
_free_cached_vgmetadata(vginfo);
return;
}
/* Avoid reparsing of the same data string */
if (vginfo->vgmetadata && vginfo->vgmetadata_size == size &&
strcmp(vginfo->vgmetadata, data) == 0)
dm_free(data);
else {
_free_cached_vgmetadata(vginfo);
vginfo->vgmetadata_size = size;
vginfo->vgmetadata = data;
}
vginfo->precommitted = precommitted;
if (!id_write_format((const struct id *)vginfo->vgid, uuid, sizeof(uuid))) {
stack;
return;
}
log_debug_cache("Metadata cache: VG %s (%s) stored (%" PRIsize_t " bytes%s).",
vginfo->vgname, uuid, size,
precommitted ? ", precommitted" : "");
}
static void _update_cache_info_lock_state(struct lvmcache_info *info,
int locked,
int *cached_vgmetadata_valid)
static void _update_cache_info_lock_state(struct lvmcache_info *info, int locked)
{
int was_locked = (info->status & CACHE_LOCKED) ? 1 : 0;
@@ -212,10 +173,8 @@ static void _update_cache_info_lock_state(struct lvmcache_info *info,
* Cache becomes invalid whenever lock state changes unless
* exclusive VG_GLOBAL is held (i.e. while scanning).
*/
if (!lvmcache_vgname_is_locked(VG_GLOBAL) && (was_locked != locked)) {
if (!lvmcache_vgname_is_locked(VG_GLOBAL) && (was_locked != locked))
info->status |= CACHE_INVALID;
*cached_vgmetadata_valid = 0;
}
if (locked)
info->status |= CACHE_LOCKED;
@@ -227,14 +186,9 @@ static void _update_cache_vginfo_lock_state(struct lvmcache_vginfo *vginfo,
int locked)
{
struct lvmcache_info *info;
int cached_vgmetadata_valid = 1;
dm_list_iterate_items(info, &vginfo->infos)
_update_cache_info_lock_state(info, locked,
&cached_vgmetadata_valid);
if (!cached_vgmetadata_valid)
_free_cached_vgmetadata(vginfo);
_update_cache_info_lock_state(info, locked);
}
static void _update_cache_lock_state(const char *vgname, int locked)
@@ -247,6 +201,35 @@ static void _update_cache_lock_state(const char *vgname, int locked)
_update_cache_vginfo_lock_state(vginfo, locked);
}
static void _suspended_vg_free(struct lvmcache_vginfo *vginfo, int free_old, int free_new)
{
if (free_old) {
if (vginfo->suspended_vg_old_buf)
dm_free(vginfo->suspended_vg_old_buf);
if (vginfo->suspended_vg_old_cft)
dm_config_destroy(vginfo->suspended_vg_old_cft);
if (vginfo->suspended_vg_old)
release_vg(vginfo->suspended_vg_old);
vginfo->suspended_vg_old_buf = NULL;
vginfo->suspended_vg_old_cft = NULL;
vginfo->suspended_vg_old = NULL;
}
if (free_new) {
if (vginfo->suspended_vg_new_buf)
dm_free(vginfo->suspended_vg_new_buf);
if (vginfo->suspended_vg_new_cft)
dm_config_destroy(vginfo->suspended_vg_new_cft);
if (vginfo->suspended_vg_new)
release_vg(vginfo->suspended_vg_new);
vginfo->suspended_vg_new_buf = NULL;
vginfo->suspended_vg_new_cft = NULL;
vginfo->suspended_vg_new = NULL;
}
}
static void _drop_metadata(const char *vgname, int drop_precommitted)
{
struct lvmcache_vginfo *vginfo;
@@ -255,25 +238,98 @@ static void _drop_metadata(const char *vgname, int drop_precommitted)
if (!(vginfo = lvmcache_vginfo_from_vgname(vgname, NULL)))
return;
/*
* Invalidate cached PV labels.
* If cached precommitted metadata exists that means we
* already invalidated the PV labels (before caching it)
* and we must not do it again.
*/
if (!drop_precommitted && vginfo->precommitted && !vginfo->vgmetadata)
log_error(INTERNAL_ERROR "metadata commit (or revert) missing before "
"dropping metadata from cache.");
if (drop_precommitted || !vginfo->precommitted)
if (drop_precommitted)
dm_list_iterate_items(info, &vginfo->infos)
info->status |= CACHE_INVALID;
_free_cached_vgmetadata(vginfo);
/* VG revert */
if (drop_precommitted)
vginfo->precommitted = 0;
_suspended_vg_free(vginfo, 0, 1);
else
_suspended_vg_free(vginfo, 1, 1);
}
void lvmcache_save_suspended_vg(struct volume_group *vg, int precommitted)
{
struct lvmcache_vginfo *vginfo;
struct format_instance *fid;
struct format_instance_ctx fic;
struct volume_group *susp_vg = NULL;
struct dm_config_tree *susp_cft = NULL;
char *susp_buf = NULL;
size_t size;
int new = precommitted;
int old = !precommitted;
if (!(vginfo = lvmcache_vginfo_from_vgid((const char *)&vg->id)))
goto_bad;
/* already saved */
if (old && vginfo->suspended_vg_old &&
(vginfo->suspended_vg_old->seqno == vg->seqno))
return;
/* already saved */
if (new && vginfo->suspended_vg_new &&
(vginfo->suspended_vg_new->seqno == vg->seqno))
return;
_suspended_vg_free(vginfo, old, new);
if (!(size = export_vg_to_buffer(vg, &susp_buf)))
goto_bad;
fic.type = FMT_INSTANCE_MDAS | FMT_INSTANCE_AUX_MDAS;
fic.context.vg_ref.vg_name = vginfo->vgname;
fic.context.vg_ref.vg_id = vginfo->vgid;
if (!(fid = vginfo->fmt->ops->create_instance(vginfo->fmt, &fic)))
goto_bad;
if (!(susp_cft = config_tree_from_string_without_dup_node_check(susp_buf)))
goto_bad;
if (!(susp_vg = import_vg_from_config_tree(susp_cft, fid)))
goto_bad;
if (old) {
vginfo->suspended_vg_old_buf = susp_buf;
vginfo->suspended_vg_old_cft = susp_cft;
vginfo->suspended_vg_old = susp_vg;
log_debug_cache("lvmcache saved suspended vg old seqno %d %s", vg->seqno, vg->name);
} else {
vginfo->suspended_vg_new_buf = susp_buf;
vginfo->suspended_vg_new_cft = susp_cft;
vginfo->suspended_vg_new = susp_vg;
log_debug_cache("lvmcache saved suspended vg new seqno %d %s", vg->seqno, vg->name);
}
return;
bad:
_suspended_vg_free(vginfo, old, new);
log_debug_cache("lvmcache failed to save suspended pre %d vg %s", precommitted, vg->name);
}
struct volume_group *lvmcache_get_suspended_vg(const char *vgid)
{
struct lvmcache_vginfo *vginfo;
if (!(vginfo = lvmcache_vginfo_from_vgid(vgid)))
return_NULL;
if (vginfo->suspended_vg_committed)
return vginfo->suspended_vg_new;
else
return vginfo->suspended_vg_old;
}
void lvmcache_drop_suspended_vg(struct volume_group *vg)
{
struct lvmcache_vginfo *vginfo;
if (!(vginfo = lvmcache_vginfo_from_vgid((const char *)&vg->id)))
return;
_suspended_vg_free(vginfo, 1, 1);
}
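The suspend-to-resume handoff implemented by the three functions above (stash both the old and the precommitted copy before suspending, flip a committed flag in vg_commit, pick one copy in resume) can be sketched as a self-contained model. The struct and function names below are illustrative stand-ins, not the real lvm2 API; seqno integers stand in for the full VG structures:

```c
#include <assert.h>

/* Hypothetical stand-in for struct lvmcache_vginfo's suspended_vg_* fields. */
struct vginfo_model {
	int suspended_vg_committed;
	int suspended_old_seqno;	/* 0 = not stashed */
	int suspended_new_seqno;
};

/* suspend stashes both copies before devices are suspended */
static void save_suspended(struct vginfo_model *vi, int seqno, int precommitted)
{
	if (precommitted)
		vi->suspended_new_seqno = seqno;
	else
		vi->suspended_old_seqno = seqno;
}

/* vg_commit signals the old -> new transition */
static void commit_metadata(struct vginfo_model *vi)
{
	vi->suspended_vg_committed = 1;
}

/* resume picks the new copy if the commit happened, else the old one */
static int get_suspended(const struct vginfo_model *vi)
{
	return vi->suspended_vg_committed ? vi->suspended_new_seqno
					  : vi->suspended_old_seqno;
}
```

So if suspend stashes old seqno 5 and precommitted seqno 6, resume sees 5 until vg_commit runs and 6 afterwards, without ever reading suspended devices.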
/*
@@ -288,11 +344,7 @@ void lvmcache_commit_metadata(const char *vgname)
if (!(vginfo = lvmcache_vginfo_from_vgname(vgname, NULL)))
return;
if (vginfo->precommitted) {
log_debug_cache("Precommitted metadata cache: VG %s upgraded to committed.",
vginfo->vgname);
vginfo->precommitted = 0;
}
vginfo->suspended_vg_committed = 1;
}
void lvmcache_drop_metadata(const char *vgname, int drop_precommitted)
@@ -542,7 +594,6 @@ const struct format_type *lvmcache_fmt_from_vgname(struct cmd_context *cmd,
{
struct lvmcache_vginfo *vginfo;
struct lvmcache_info *info;
struct label *label;
struct dm_list *devh, *tmp;
struct dm_list devs;
struct device_list *devl;
@@ -587,7 +638,7 @@ const struct format_type *lvmcache_fmt_from_vgname(struct cmd_context *cmd,
dm_list_iterate_safe(devh, tmp, &devs) {
devl = dm_list_item(devh, struct device_list);
(void) label_read(devl->dev, &label, UINT64_C(0));
label_read(devl->dev, NULL, UINT64_C(0));
dm_list_del(&devl->list);
dm_free(devl);
}
@@ -675,18 +726,6 @@ static int _info_is_valid(struct lvmcache_info *info)
return 1;
}
static int _vginfo_is_valid(struct lvmcache_vginfo *vginfo)
{
struct lvmcache_info *info;
/* Invalid if any info is invalid */
dm_list_iterate_items(info, &vginfo->infos)
if (!_info_is_valid(info))
return 0;
return 1;
}
/* vginfo is invalid if it does not contain at least one valid info */
static int _vginfo_is_invalid(struct lvmcache_vginfo *vginfo)
{
@@ -752,7 +791,7 @@ char *lvmcache_vgname_from_pvid(struct cmd_context *cmd, const char *pvid)
struct lvmcache_info *info;
char *vgname;
if (!lvmcache_device_from_pvid(cmd, (const struct id *)pvid, NULL, NULL)) {
if (!lvmcache_device_from_pvid(cmd, (const struct id *)pvid, NULL)) {
log_error("Couldn't find device with uuid %s.", pvid);
return NULL;
}
@@ -768,19 +807,42 @@ char *lvmcache_vgname_from_pvid(struct cmd_context *cmd, const char *pvid)
return vgname;
}
static void _rescan_entry(struct lvmcache_info *info)
/*
* FIXME: get rid of the CACHE_INVALID state and rescanning
* infos with that flag. The code should just know which devices
* need scanning and when.
*/
static int _label_scan_invalid(struct cmd_context *cmd)
{
struct label *label;
struct dm_list devs;
struct dm_hash_node *n;
struct device_list *devl;
struct lvmcache_info *info;
int dev_count = 0;
int ret;
if (info->status & CACHE_INVALID)
(void) label_read(info->dev, &label, UINT64_C(0));
}
dm_list_init(&devs);
static int _scan_invalid(void)
{
dm_hash_iter(_pvid_hash, (dm_hash_iterate_fn) _rescan_entry);
dm_hash_iterate(n, _pvid_hash) {
if (!(info = dm_hash_get_data(_pvid_hash, n)))
continue;
return 1;
if (!(info->status & CACHE_INVALID))
continue;
if (!(devl = dm_pool_zalloc(cmd->mem, sizeof(*devl))))
return_0;
devl->dev = info->dev;
dm_list_add(&devs, &devl->list);
dev_count++;
}
log_debug_cache("Scanning %d devs with invalid info.", dev_count);
ret = label_scan_devs(cmd, &devs);
return ret;
}
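_label_scan_invalid above walks the pvid hash and rescans only the devices whose cached info carries CACHE_INVALID. The flag-filtering core of that loop, with made-up types in place of the lvmcache structs, reduces to:

```c
#include <assert.h>

#define CACHE_INVALID 0x01	/* same role as lvmcache's flag; the value here is illustrative */

struct info_model {
	int status;
	int dev_id;
};

/* Copy the dev ids of invalid infos into out[]; return how many need rescanning. */
static int collect_invalid(const struct info_model *infos, int n, int *out)
{
	int count = 0;
	int i;

	for (i = 0; i < n; i++)
		if (infos[i].status & CACHE_INVALID)
			out[count++] = infos[i].dev_id;
	return count;
}
```

Everything collected this way is then handed to label_scan_devs in one batch, after which the scan clears the flag via lvmcache_make_valid.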
/*
@@ -1095,17 +1157,89 @@ next:
goto next;
}
/*
* The initial label_scan at the start of the command is done without
* holding VG locks. Then for each VG identified during the label_scan,
* vg_read(vgname) is called while holding the VG lock. The labels
* and metadata on this VG's devices could have changed between the
* initial unlocked label_scan and the current vg_read(). So, we reread
* the labels/metadata for each device in the VG now that we hold the
* lock, and use this for processing the VG.
*
* FIXME: In some cases, the data read by label_scan may be fine, and not
* need to be reread here. e.g. a reporting command, possibly with a
* special option, could skip this second reread. Or, we could look
* at the VG seqno in each copy of the metadata read in the first label
* scan, and if they all match, consider it good enough to use for
* reporting without rereading it. (A command modifying the VG would
* always want to reread while the lock is held before modifying.)
*
* A label scan is ultimately creating associations between devices
* and VGs so that when vg_read wants to get VG metadata, it knows
* which devices to read. In the special case where VG metadata is
* stored in files on the file system (configured in lvm.conf), the
* vginfo->independent_metadata_location flag is set during label scan.
* When we get here to rescan, we are revalidating the device to VG
* mapping from label scan by repeating the label scan on a subset of
* devices. If we see independent_metadata_location is set from the
* initial label scan, we know that there is nothing to do because
* there is no device to VG mapping to revalidate, since the VG metadata
* comes directly from files.
*/
int lvmcache_label_rescan_vg(struct cmd_context *cmd, const char *vgname, const char *vgid)
{
struct dm_list devs;
struct device_list *devl;
struct lvmcache_vginfo *vginfo;
struct lvmcache_info *info;
if (lvmetad_used())
return 1;
dm_list_init(&devs);
if (!(vginfo = lvmcache_vginfo_from_vgname(vgname, vgid)))
return_0;
/*
* When the VG metadata is from an independent location,
* then rescanning the devices in the VG won't find the
* metadata, and will destroy the vginfo/info associations
* that were created during label scan when the
* independent locations were read.
*/
if (vginfo->independent_metadata_location)
return 1;
dm_list_iterate_items(info, &vginfo->infos) {
if (!(devl = dm_malloc(sizeof(*devl)))) {
log_error("device_list element allocation failed");
return 0;
}
devl->dev = info->dev;
dm_list_add(&devs, &devl->list);
}
label_scan_devs(cmd, &devs);
/*
* TODO: grab vginfo again, and compare vginfo->infos
* to what was found above before rereading labels.
* If there are any info->devs now that were not in the
* first devs list, then do label_read on those also.
*/
return 1;
}
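The control flow of lvmcache_label_rescan_vg is: skip entirely when the VG's metadata lives in an independent (file-based) location, otherwise rescan exactly the devices already associated with the VG. A minimal sketch of that decision, with a stubbed scan in place of label_scan_devs:

```c
#include <assert.h>

struct vginfo_model {
	int independent_metadata_location;
	int ndevs;	/* stand-in for the vginfo->infos device list */
};

static int scanned;	/* counts devices handed to the (stubbed) label scan */

static void label_scan_devs_stub(int ndevs)
{
	scanned += ndevs;
}

/* Returns 1 on success; scans nothing when metadata is file-backed. */
static int label_rescan_vg(const struct vginfo_model *vi)
{
	if (vi->independent_metadata_location)
		return 1;	/* no device-to-VG mapping to revalidate */
	label_scan_devs_stub(vi->ndevs);
	return 1;
}
```

This mirrors why the real function returns early: rescanning file-backed metadata would find nothing on the devices and would destroy the vginfo/info associations built when the independent location was read.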
int lvmcache_label_scan(struct cmd_context *cmd)
{
struct dm_list del_cache_devs;
struct dm_list add_cache_devs;
struct lvmcache_info *info;
struct device_list *devl;
struct label *label;
struct dev_iter *iter;
struct device *dev;
struct format_type *fmt;
int dev_count = 0;
int r = 0;
@@ -1123,34 +1257,40 @@ int lvmcache_label_scan(struct cmd_context *cmd)
goto out;
}
/*
* Scan devices whose info struct has the INVALID flag set.
* When scanning has read the pv_header, mda_header and
* mda locations, it will clear the INVALID flag (via
* lvmcache_make_valid).
*/
if (_has_scanned && !_force_label_scan) {
r = _scan_invalid();
r = _label_scan_invalid(cmd);
goto out;
}
if (_force_label_scan && (cmd->full_filter && !cmd->full_filter->use_count) && !refresh_filters(cmd))
goto_out;
if (!cmd->full_filter || !(iter = dev_iter_create(cmd->full_filter, _force_label_scan))) {
log_error("dev_iter creation failed");
if (!cmd->full_filter) {
log_error("label scan is missing full filter");
goto out;
}
log_very_verbose("Scanning device labels");
/*
* Duplicates found during this label scan are added to _found_duplicate_devs().
*/
_destroy_duplicate_device_list(&_found_duplicate_devs);
while ((dev = dev_iter_get(iter))) {
(void) label_read(dev, &label, UINT64_C(0));
dev_count++;
}
dev_iter_destroy(iter);
log_very_verbose("Scanned %d device labels", dev_count);
/*
* Do the actual scanning. This populates lvmcache
* with infos/vginfos based on reading headers from
* each device, and a vg summary from each mda.
*
* Note that this will *skip* scanning a device if
* an info struct already exists in lvmcache for
* the device.
*/
label_scan(cmd);
/*
* _choose_preferred_devs() returns:
@@ -1184,7 +1324,7 @@ int lvmcache_label_scan(struct cmd_context *cmd)
dm_list_iterate_items(devl, &add_cache_devs) {
log_debug_cache("Rescan preferred device %s for lvmcache", dev_name(devl->dev));
(void) label_read(devl->dev, &label, UINT64_C(0));
label_read(devl->dev, NULL, UINT64_C(0));
}
dm_list_splice(&_unused_duplicate_devs, &del_cache_devs);
@@ -1216,129 +1356,12 @@ int lvmcache_label_scan(struct cmd_context *cmd)
return r;
}
struct volume_group *lvmcache_get_vg(struct cmd_context *cmd, const char *vgname,
const char *vgid, unsigned precommitted)
{
struct lvmcache_vginfo *vginfo;
struct volume_group *vg = NULL;
struct format_instance *fid;
struct format_instance_ctx fic;
/*
* We currently do not store precommitted metadata in lvmetad at
* all. This means that any request for precommitted metadata is served
* using the classic scanning mechanics, and read from disk or from
* lvmcache.
*/
if (lvmetad_used() && !precommitted) {
/* Still serve the locally cached VG if available */
if (vgid && (vginfo = lvmcache_vginfo_from_vgid(vgid)) &&
vginfo->vgmetadata && (vg = vginfo->cached_vg))
goto out;
return lvmetad_vg_lookup(cmd, vgname, vgid);
}
if (!vgid || !(vginfo = lvmcache_vginfo_from_vgid(vgid)) || !vginfo->vgmetadata)
return NULL;
if (!_vginfo_is_valid(vginfo))
return NULL;
/*
* Don't return cached data if either:
* (i) precommitted metadata is requested but we don't have it cached
* - caller should read it off disk;
* (ii) live metadata is requested but we have precommitted metadata cached
* and no devices are suspended so caller may read it off disk.
*
* If live metadata is requested but we have precommitted metadata cached
* and devices are suspended, we assume this precommitted metadata has
* already been preloaded and committed so it's OK to return it as live.
* Note that we do not clear the PRECOMMITTED flag.
*/
if ((precommitted && !vginfo->precommitted) ||
(!precommitted && vginfo->precommitted && !critical_section()))
return NULL;
/* Use already-cached VG struct when available */
if ((vg = vginfo->cached_vg) && !vginfo->cached_vg_invalidated)
goto out;
release_vg(vginfo->cached_vg);
fic.type = FMT_INSTANCE_MDAS | FMT_INSTANCE_AUX_MDAS;
fic.context.vg_ref.vg_name = vginfo->vgname;
fic.context.vg_ref.vg_id = vgid;
if (!(fid = vginfo->fmt->ops->create_instance(vginfo->fmt, &fic)))
return_NULL;
/* Build config tree from vgmetadata, if not yet cached */
if (!vginfo->cft &&
!(vginfo->cft =
config_tree_from_string_without_dup_node_check(vginfo->vgmetadata)))
goto_bad;
if (!(vg = import_vg_from_config_tree(vginfo->cft, fid)))
goto_bad;
/* Cache VG struct for reuse */
vginfo->cached_vg = vg;
vginfo->holders = 1;
vginfo->vg_use_count = 0;
vginfo->cached_vg_invalidated = 0;
vg->vginfo = vginfo;
if (!dm_pool_lock(vg->vgmem, detect_internal_vg_cache_corruption()))
goto_bad;
out:
vginfo->holders++;
vginfo->vg_use_count++;
log_debug_cache("Using cached %smetadata for VG %s with %u holder(s).",
vginfo->precommitted ? "pre-committed " : "",
vginfo->vgname, vginfo->holders);
return vg;
bad:
_free_cached_vgmetadata(vginfo);
return NULL;
}
// #if 0
int lvmcache_vginfo_holders_dec_and_test_for_zero(struct lvmcache_vginfo *vginfo)
{
log_debug_cache("VG %s decrementing %d holder(s) at %p.",
vginfo->cached_vg->name, vginfo->holders, vginfo->cached_vg);
if (--vginfo->holders)
return 0;
if (vginfo->vg_use_count > 1)
log_debug_cache("VG %s reused %d times.",
vginfo->cached_vg->name, vginfo->vg_use_count);
	/* Debug: perform crc check only when it's been used more than once */
if (!dm_pool_unlock(vginfo->cached_vg->vgmem,
detect_internal_vg_cache_corruption() &&
(vginfo->vg_use_count > 1)))
stack;
vginfo->cached_vg->vginfo = NULL;
vginfo->cached_vg = NULL;
return 1;
}
// #endif
int lvmcache_get_vgnameids(struct cmd_context *cmd, int include_internal,
struct dm_list *vgnameids)
{
struct vgnameid_list *vgnl;
struct lvmcache_vginfo *vginfo;
lvmcache_label_scan(cmd);
dm_list_iterate_items(vginfo, &_vginfos) {
if (!include_internal && is_orphan_vg(vginfo->vgname))
continue;
@@ -1443,61 +1466,45 @@ struct dm_list *lvmcache_get_pvids(struct cmd_context *cmd, const char *vgname,
return pvids;
}
static struct device *_device_from_pvid(const struct id *pvid,
uint64_t *label_sector)
int lvmcache_get_vg_devs(struct cmd_context *cmd,
struct lvmcache_vginfo *vginfo,
struct dm_list *devs)
{
struct lvmcache_info *info;
struct device_list *devl;
dm_list_iterate_items(info, &vginfo->infos) {
if (!(devl = dm_pool_zalloc(cmd->mem, sizeof(*devl))))
return_0;
devl->dev = info->dev;
dm_list_add(devs, &devl->list);
}
return 1;
}
static struct device *_device_from_pvid(const struct id *pvid, uint64_t *label_sector)
{
struct lvmcache_info *info;
struct label *label;
if ((info = lvmcache_info_from_pvid((const char *) pvid, NULL, 0))) {
if (lvmetad_used()) {
if (info->label && label_sector)
*label_sector = info->label->sector;
return info->dev;
}
if (label_read(info->dev, &label, UINT64_C(0))) {
info = (struct lvmcache_info *) label->info;
if (id_equal(pvid, (struct id *) &info->dev->pvid)) {
if (label_sector)
*label_sector = label->sector;
return info->dev;
}
}
if (info->label && label_sector)
*label_sector = info->label->sector;
return info->dev;
}
return NULL;
}
struct device *lvmcache_device_from_pvid(struct cmd_context *cmd, const struct id *pvid,
unsigned *scan_done_once, uint64_t *label_sector)
struct device *lvmcache_device_from_pvid(struct cmd_context *cmd, const struct id *pvid, uint64_t *label_sector)
{
struct device *dev;
/* Already cached ? */
dev = _device_from_pvid(pvid, label_sector);
if (dev)
return dev;
lvmcache_label_scan(cmd);
/* Try again */
dev = _device_from_pvid(pvid, label_sector);
if (dev)
return dev;
if (critical_section() || (scan_done_once && *scan_done_once))
return NULL;
lvmcache_force_next_label_scan();
lvmcache_label_scan(cmd);
if (scan_done_once)
*scan_done_once = 1;
/* Try again */
dev = _device_from_pvid(pvid, label_sector);
if (dev)
return dev;
log_debug_devs("No device with uuid %s.", (const char *)pvid);
return NULL;
}
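After this change, lvmcache_device_from_pvid follows a simple lookup, scan, retry shape: try the cache, run one label scan, try once more, then give up; the old forced second scan and the scan_done_once bookkeeping are gone. The shape, with stand-in lookup and scan stubs (not the real lvm2 functions):

```c
#include <assert.h>

static int cache[8];	/* cache[pvid] != 0 means the device is cached */

/* Stubbed label scan: pretend scanning discovers the device for pvid 3. */
static void label_scan_stub(void)
{
	cache[3] = 1;
}

/* Mirror of the lookup-scan-retry flow; returns the dev id or -1. */
static int device_from_pvid(int pvid)
{
	if (cache[pvid])	/* already cached? */
		return pvid;
	label_scan_stub();	/* one full label scan */
	if (cache[pvid])	/* try again */
		return pvid;
	return -1;		/* no device with this uuid */
}
```

One scan is the most a single lookup can trigger, which is the behavioral point of dropping the scan_done_once parameter from the public signature.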
@@ -1505,7 +1512,6 @@ const char *lvmcache_pvid_from_devname(struct cmd_context *cmd,
const char *devname)
{
struct device *dev;
struct label *label;
if (!(dev = dev_cache_get(devname, cmd->filter))) {
log_error("%s: Couldn't find device. Check your filters?",
@@ -1513,7 +1519,7 @@ const char *lvmcache_pvid_from_devname(struct cmd_context *cmd,
return NULL;
}
if (!(label_read(dev, &label, UINT64_C(0))))
if (!label_read(dev, NULL, UINT64_C(0)))
return NULL;
return dev->pvid;
@@ -1535,8 +1541,6 @@ static int _free_vginfo(struct lvmcache_vginfo *vginfo)
struct lvmcache_vginfo *primary_vginfo, *vginfo2;
int r = 1;
_free_cached_vgmetadata(vginfo);
vginfo2 = primary_vginfo = lvmcache_vginfo_from_vgname(vginfo->vgname, NULL);
if (vginfo == primary_vginfo) {
@@ -1559,6 +1563,7 @@ static int _free_vginfo(struct lvmcache_vginfo *vginfo)
dm_free(vginfo->system_id);
dm_free(vginfo->vgname);
dm_free(vginfo->creation_host);
_suspended_vg_free(vginfo, 1, 1);
if (*vginfo->vgid && _vgid_hash &&
lvmcache_vginfo_from_vgid(vginfo->vgid) == vginfo)
@@ -1997,12 +2002,6 @@ int lvmcache_update_vgname_and_id(struct lvmcache_info *info, struct lvmcache_vg
!is_orphan_vg(info->vginfo->vgname) && critical_section())
return 1;
/* If making a PV into an orphan, any cached VG metadata may become
* invalid, incorrectly still referencing device structs.
* (Example: pvcreate -ff) */
if (is_orphan_vg(vgname) && info->vginfo && !is_orphan_vg(info->vginfo->vgname))
info->vginfo->cached_vg_invalidated = 1;
/* If moving PV from orphan to real VG, always mark it valid */
if (!is_orphan_vg(vgname))
info->status &= ~CACHE_INVALID;
@@ -2040,10 +2039,6 @@ int lvmcache_update_vg(struct volume_group *vg, unsigned precommitted)
return_0;
}
/* store text representation of vg to cache */
if (vg->cmd->current_settings.cache_vgmetadata)
_store_metadata(vg, precommitted);
return 1;
}
@@ -2377,56 +2372,29 @@ int lvmcache_fid_add_mdas_vg(struct lvmcache_vginfo *vginfo, struct format_insta
return 1;
}
static int _get_pv_if_in_vg(struct lvmcache_info *info,
struct physical_volume *pv)
{
char vgname[NAME_LEN + 1];
char vgid[ID_LEN + 1];
if (info->vginfo && info->vginfo->vgname &&
!is_orphan_vg(info->vginfo->vgname)) {
/*
* get_pv_from_vg_by_id() may call
* lvmcache_label_scan() and drop cached
* vginfo so make a local copy of string.
*/
(void) dm_strncpy(vgname, info->vginfo->vgname, sizeof(vgname));
memcpy(vgid, info->vginfo->vgid, sizeof(vgid));
if (get_pv_from_vg_by_id(info->fmt, vgname, vgid,
info->dev->pvid, pv))
return 1;
}
return 0;
}
int lvmcache_populate_pv_fields(struct lvmcache_info *info,
struct physical_volume *pv,
int scan_label_only)
struct volume_group *vg,
struct physical_volume *pv)
{
struct data_area_list *da;
/* Have we already cached vgname? */
if (!scan_label_only && _get_pv_if_in_vg(info, pv))
return 1;
/* Perform full scan (just the first time) and try again */
if (!scan_label_only && !critical_section() && !full_scan_done()) {
lvmcache_force_next_label_scan();
lvmcache_label_scan(info->fmt->cmd);
if (_get_pv_if_in_vg(info, pv))
return 1;
if (!info->label) {
log_error("No cached label for orphan PV %s", pv_dev_name(pv));
return 0;
}
/* Orphan */
pv->label_sector = info->label->sector;
pv->dev = info->dev;
pv->fmt = info->fmt;
pv->size = info->device_size >> SECTOR_SHIFT;
pv->vg_name = FMT_TEXT_ORPHAN_VG_NAME;
memcpy(&pv->id, &info->dev->pvid, sizeof(pv->id));
if (!pv->size) {
log_error("PV %s size is zero.", dev_name(info->dev));
return 0;
}
/* Currently only support exactly one data area */
if (dm_list_size(&info->das) != 1) {
log_error("Must be exactly one data area (found %d) on PV %s",
@@ -2607,6 +2575,10 @@ struct label *lvmcache_get_label(struct lvmcache_info *info) {
return info->label;
}
/*
* After label_scan reads pv_header, mda_header and mda locations
* from a PV, it clears the INVALID flag.
*/
void lvmcache_make_valid(struct lvmcache_info *info) {
info->status &= ~CACHE_INVALID;
}
@@ -2662,6 +2634,14 @@ int lvmcache_vgid_is_cached(const char *vgid) {
return 1;
}
void lvmcache_set_independent_location(const char *vgname)
{
struct lvmcache_vginfo *vginfo;
if ((vginfo = lvmcache_vginfo_from_vgname(vgname, NULL)))
vginfo->independent_metadata_location = 1;
}
/*
* Return true iff it is impossible to find out from this info alone whether the
* PV in question is or is not an orphan.

lib/cache/lvmcache.h
@@ -74,6 +74,7 @@ void lvmcache_destroy(struct cmd_context *cmd, int retain_orphans, int reset);
*/
void lvmcache_force_next_label_scan(void);
int lvmcache_label_scan(struct cmd_context *cmd);
int lvmcache_label_rescan_vg(struct cmd_context *cmd, const char *vgname, const char *vgid);
/* Add/delete a device */
struct lvmcache_info *lvmcache_add(struct labeller *labeller, const char *pvid,
@@ -105,10 +106,8 @@ struct lvmcache_vginfo *lvmcache_vginfo_from_vgid(const char *vgid);
struct lvmcache_info *lvmcache_info_from_pvid(const char *pvid, struct device *dev, int valid_only);
const char *lvmcache_vgname_from_vgid(struct dm_pool *mem, const char *vgid);
const char *lvmcache_vgid_from_vgname(struct cmd_context *cmd, const char *vgname);
struct device *lvmcache_device_from_pvid(struct cmd_context *cmd, const struct id *pvid,
unsigned *scan_done_once, uint64_t *label_sector);
const char *lvmcache_pvid_from_devname(struct cmd_context *cmd,
const char *devname);
struct device *lvmcache_device_from_pvid(struct cmd_context *cmd, const struct id *pvid, uint64_t *label_sector);
const char *lvmcache_pvid_from_devname(struct cmd_context *cmd, const char *devname);
char *lvmcache_vgname_from_pvid(struct cmd_context *cmd, const char *pvid);
const char *lvmcache_vgname_from_info(struct lvmcache_info *info);
const struct format_type *lvmcache_fmt_from_info(struct lvmcache_info *info);
@@ -134,9 +133,6 @@ int lvmcache_get_vgnameids(struct cmd_context *cmd, int include_internal,
struct dm_list *lvmcache_get_pvids(struct cmd_context *cmd, const char *vgname,
const char *vgid);
/* Returns cached volume group metadata. */
struct volume_group *lvmcache_get_vg(struct cmd_context *cmd, const char *vgname,
const char *vgid, unsigned precommitted);
void lvmcache_drop_metadata(const char *vgname, int drop_precommitted);
void lvmcache_commit_metadata(const char *vgname);
@@ -146,8 +142,8 @@ int lvmcache_fid_add_mdas(struct lvmcache_info *info, struct format_instance *fi
int lvmcache_fid_add_mdas_pv(struct lvmcache_info *info, struct format_instance *fid);
int lvmcache_fid_add_mdas_vg(struct lvmcache_vginfo *vginfo, struct format_instance *fid);
int lvmcache_populate_pv_fields(struct lvmcache_info *info,
struct physical_volume *pv,
int scan_label_only);
struct volume_group *vg,
struct physical_volume *pv);
int lvmcache_check_format(struct lvmcache_info *info, const struct format_type *fmt);
void lvmcache_del_mdas(struct lvmcache_info *info);
void lvmcache_del_das(struct lvmcache_info *info);
@@ -215,4 +211,12 @@ void lvmcache_remove_unchosen_duplicate(struct device *dev);
int lvmcache_pvid_in_unchosen_duplicates(const char *pvid);
void lvmcache_save_suspended_vg(struct volume_group *vg, int precommitted);
struct volume_group *lvmcache_get_suspended_vg(const char *vgid);
void lvmcache_drop_suspended_vg(struct volume_group *vg);
int lvmcache_get_vg_devs(struct cmd_context *cmd,
struct lvmcache_vginfo *vginfo,
struct dm_list *devs);
void lvmcache_set_independent_location(const char *vgname);
#endif

lib/cache/lvmetad.c
@@ -39,7 +39,7 @@ static int64_t _lvmetad_update_timeout;
static int _found_lvm1_metadata = 0;
static struct volume_group *_lvmetad_pvscan_vg(struct cmd_context *cmd, struct volume_group *vg);
static struct volume_group *_lvmetad_pvscan_vg(struct cmd_context *cmd, struct volume_group *vg, const char *vgid, struct format_type *fmt);
static uint64_t _monotonic_seconds(void)
{
@@ -1090,14 +1090,17 @@ struct volume_group *lvmetad_vg_lookup(struct cmd_context *cmd, const char *vgna
* invalidated the cached vg.
*/
if (rescan) {
if (!(vg2 = _lvmetad_pvscan_vg(cmd, vg))) {
if (!(vg2 = _lvmetad_pvscan_vg(cmd, vg, vgid, fmt))) {
log_debug_lvmetad("VG %s from lvmetad not found during rescan.", vgname);
fid = NULL;
release_vg(vg);
vg = NULL;
goto out;
}
fid->ref_count++;
release_vg(vg);
fid->ref_count--;
fmt->ops->destroy_instance(fid);
vg = vg2;
fid = vg2->fid;
}
@@ -1105,14 +1108,14 @@ struct volume_group *lvmetad_vg_lookup(struct cmd_context *cmd, const char *vgna
dm_list_iterate_items(pvl, &vg->pvs) {
if (!_pv_update_struct_pv(pvl->pv, fid)) {
vg = NULL;
goto_out; /* FIXME error path */
goto_out; /* FIXME: use an error path that disables lvmetad */
}
}
dm_list_iterate_items(pvl, &vg->pvs_outdated) {
if (!_pv_update_struct_pv(pvl->pv, fid)) {
vg = NULL;
goto_out; /* FIXME error path */
goto_out; /* FIXME: use an error path that disables lvmetad */
}
}
@@ -1756,6 +1759,7 @@ int lvmetad_pv_gone_by_dev(struct device *dev)
*/
struct _lvmetad_pvscan_baton {
struct cmd_context *cmd;
struct volume_group *vg;
struct format_instance *fid;
};
@@ -1763,10 +1767,14 @@ struct _lvmetad_pvscan_baton {
static int _lvmetad_pvscan_single(struct metadata_area *mda, void *baton)
{
struct _lvmetad_pvscan_baton *b = baton;
struct device *mda_dev = mda_get_device(mda);
struct label_read_data *ld;
struct volume_group *vg;
ld = get_label_read_data(b->cmd, mda_dev);
if (mda_is_ignored(mda) ||
!(vg = mda->ops->vg_read(b->fid, "", mda, NULL, NULL, 1)))
!(vg = mda->ops->vg_read(b->fid, "", mda, ld, NULL, NULL)))
return 1;
/* FIXME Also ensure contents match etc. */
@@ -1778,6 +1786,37 @@ static int _lvmetad_pvscan_single(struct metadata_area *mda, void *baton)
return 1;
}
/*
* FIXME: handle errors and do proper comparison of metadata from each area
* like vg_read and fall back to real vg_read from disk if there's any problem.
*/
static int _lvmetad_pvscan_vg_single(struct metadata_area *mda, void *baton)
{
struct _lvmetad_pvscan_baton *b = baton;
struct device *mda_dev = mda_get_device(mda);
struct label_read_data *ld;
struct volume_group *vg = NULL;
if (mda_is_ignored(mda))
return 1;
ld = get_label_read_data(b->cmd, mda_dev);
if (!(vg = mda->ops->vg_read(b->fid, "", mda, ld, NULL, NULL)))
return 1;
if (!b->vg)
b->vg = vg;
else if (vg->seqno > b->vg->seqno) {
release_vg(b->vg);
b->vg = vg;
} else
release_vg(vg);
return 1;
}
/*
* The lock manager may detect that the vg cached in lvmetad is out of date,
* due to something like an lvcreate from another host.
@@ -1787,41 +1826,41 @@ static int _lvmetad_pvscan_single(struct metadata_area *mda, void *baton)
* the VG, and that PV may have been reused for another VG.
*/
static struct volume_group *_lvmetad_pvscan_vg(struct cmd_context *cmd, struct volume_group *vg)
static struct volume_group *_lvmetad_pvscan_vg(struct cmd_context *cmd, struct volume_group *vg,
const char *vgid, struct format_type *fmt)
{
char pvid_s[ID_LEN + 1] __attribute__((aligned(8)));
char uuid[64] __attribute__((aligned(8)));
struct label *label;
struct volume_group *vg_ret = NULL;
struct dm_config_tree *vgmeta_ret = NULL;
struct dm_config_tree *vgmeta;
struct pv_list *pvl, *pvl_new;
struct device_list *devl, *devl_new, *devlsafe;
struct device_list *devl, *devlsafe;
struct dm_list pvs_scan;
struct dm_list pvs_drop;
struct dm_list pvs_new;
struct lvmcache_vginfo *vginfo = NULL;
struct lvmcache_info *info = NULL;
struct format_instance *fid;
struct format_instance_ctx fic = { .type = 0 };
struct _lvmetad_pvscan_baton baton;
struct volume_group *save_vg;
struct dm_config_tree *save_meta;
struct device *save_dev = NULL;
uint32_t save_seqno = 0;
int missing_devs = 0;
int check_new_pvs = 0;
int found_new_pvs = 0;
int retried_reads = 0;
int found;
save_vg = NULL;
save_meta = NULL;
save_dev = NULL;
save_seqno = 0;
dm_list_init(&pvs_scan);
dm_list_init(&pvs_drop);
dm_list_init(&pvs_new);
log_debug_lvmetad("Rescanning VG %s (seqno %u).", vg->name, vg->seqno);
log_debug_lvmetad("Rescan VG %s to update lvmetad (seqno %u).", vg->name, vg->seqno);
/*
* Another host may have added a PV to the VG, and some
* commands do not always populate their lvmcache with
* all devs from lvmetad, so they would fail to find
* the new PV when scanning the VG. So make sure this
* command knows about all PVs from lvmetad.
* Make sure this command knows about all PVs from lvmetad.
*/
lvmcache_seed_infos_from_lvmetad(cmd);
@@ -1836,54 +1875,111 @@ static struct volume_group *_lvmetad_pvscan_vg(struct cmd_context *cmd, struct v
dm_list_add(&pvs_scan, &devl->list);
}
scan_more:
/*
* Rescan labels/metadata only from devs that we previously
* saw in the VG. If we find below that there are new PVs
* in the VG, we'll have to rescan all devices to find which
* device(s) are now being used.
*/
log_debug_lvmetad("Rescan VG %s scanning data from devs in previous metadata.", vg->name);
label_scan_devs(cmd, &pvs_scan);
/*
* Run the equivalent of lvmetad_pvscan_single on each dev in the VG.
* Check if any pvs_scan entries are no longer PVs.
* In that case, label_read/_find_label_header will have
* found no label_header, and would have dropped the
* info struct for the device from lvmcache. So, if
* we look up the info struct here and don't find it,
* we can infer it's no longer a PV.
*
* FIXME: we should record specific results from the
* label_read and then check specifically for whatever
* result means "no label was found", rather than going
* about this indirectly via the lvmcache side effects.
*/
dm_list_iterate_items_safe(devl, devlsafe, &pvs_scan) {
if (!(info = lvmcache_info_from_pvid(devl->dev->pvid, devl->dev, 0))) {
/* Another host removed this PV from the VG. */
log_debug_lvmetad("Rescan VG %s from %s dropping dev (no label).",
vg->name, dev_name(devl->dev));
dm_list_move(&pvs_drop, &devl->list);
}
}
fic.type = FMT_INSTANCE_MDAS | FMT_INSTANCE_AUX_MDAS;
fic.context.vg_ref.vg_name = vg->name;
fic.context.vg_ref.vg_id = vgid;
retry_reads:
if (!(fid = fmt->ops->create_instance(fmt, &fic))) {
/* FIXME: are there only internal reasons for failures here? */
log_error("Reading VG %s failed to create format instance.", vg->name);
return NULL;
}
/* FIXME: not sure if this is necessary */
fid->ref_count++;
baton.fid = fid;
baton.cmd = cmd;
/*
* FIXME: this vg_read path does not have the ability to repair
* any problems with the VG, e.g. VG on one dev has an older
* seqno. When vg_read() is reworked, we need to fall back
* to using that from here (and vg_read's from lvmetad) when
* there is a problem. Perhaps by disabling lvmetad when a
* VG problem is detected, causing commands to fully fall
* back to disk, which will repair the VG. Then lvmetad can
* be repopulated and re-enabled (possibly automatically.)
*/
/*
* Do a low level vg_read on each dev, verify the vg returned
* from metadata on each device is for the VG being read
* (the PV may have been removed from the VG being read and
* added to a different one), and return this vg to the caller
* as the current vg to use.
*
* The label scan above will have saved in lvmcache which
* vg each device is used in, so we could figure that part
* out without doing the vg_read.
*/
dm_list_iterate_items_safe(devl, devlsafe, &pvs_scan) {
if (!devl->dev)
continue;
log_debug_lvmetad("Rescan VG %s scanning %s.", vg->name, dev_name(devl->dev));
if (!label_read(devl->dev, &label, 0)) {
/* Another host removed this PV from the VG. */
log_debug_lvmetad("Rescan VG %s found %s was removed.", vg->name, dev_name(devl->dev));
if ((info = lvmcache_info_from_pvid(devl->dev->pvid, NULL, 0)))
lvmcache_del(info);
log_debug_lvmetad("Rescan VG %s getting metadata from %s.",
vg->name, dev_name(devl->dev));
/*
* The info struct for this dev knows what and where
* the mdas are for this dev (the label scan saved
* the mda locations for this dev on the lvmcache info struct).
*/
if (!(info = lvmcache_info_from_pvid(devl->dev->pvid, devl->dev, 0))) {
log_debug_lvmetad("Rescan VG %s from %s dropping dev (no info).",
vg->name, dev_name(devl->dev));
dm_list_move(&pvs_drop, &devl->list);
continue;
}
info = (struct lvmcache_info *) label->info;
baton.vg = NULL;
baton.fid = lvmcache_fmt(info)->ops->create_instance(lvmcache_fmt(info), &fic);
if (!baton.fid)
return_NULL;
if (baton.fid->fmt->features & FMT_OBSOLETE) {
log_debug_lvmetad("Ignoring obsolete format on PV %s in VG %s.", dev_name(devl->dev), vg->name);
lvmcache_fmt(info)->ops->destroy_instance(baton.fid);
dm_list_move(&pvs_drop, &devl->list);
continue;
}
/*
* Read VG metadata from this dev's mdas.
*/
lvmcache_foreach_mda(info, _lvmetad_pvscan_single, &baton);
lvmcache_foreach_mda(info, _lvmetad_pvscan_vg_single, &baton);
/*
* The PV may have been removed from the VG by another host
* since we last read the VG.
*/
if (!baton.vg) {
log_debug_lvmetad("Rescan VG %s did not find %s.", vg->name, dev_name(devl->dev));
lvmcache_fmt(info)->ops->destroy_instance(baton.fid);
log_debug_lvmetad("Rescan VG %s from %s dropping dev (no metadata).",
vg->name, dev_name(devl->dev));
dm_list_move(&pvs_drop, &devl->list);
continue;
}
@@ -1893,10 +1989,15 @@ scan_more:
* different VG since we last read the VG.
*/
if (strcmp(baton.vg->name, vg->name)) {
log_debug_lvmetad("Rescan VG %s found different VG %s on PV %s.",
vg->name, baton.vg->name, dev_name(devl->dev));
log_debug_lvmetad("Rescan VG %s from %s dropping dev (other VG %s).",
vg->name, dev_name(devl->dev), baton.vg->name);
release_vg(baton.vg);
continue;
}
if (!(vgmeta = export_vg_to_config_tree(baton.vg))) {
log_error("VG export to config tree failed");
release_vg(baton.vg);
dm_list_move(&pvs_drop, &devl->list);
continue;
}
@@ -1906,20 +2007,35 @@ scan_more:
* read from each other dev.
*/
if (!save_seqno)
save_seqno = baton.vg->seqno;
if (save_vg && (save_seqno != baton.vg->seqno)) {
/* FIXME: fall back to vg_read to correct this. */
log_warn("WARNING: inconsistent metadata for VG %s on devices %s seqno %u and %s seqno %u.",
vg->name, dev_name(save_dev), save_seqno,
dev_name(devl->dev), baton.vg->seqno);
log_warn("WARNING: temporarily disable lvmetad to repair metadata.");
if (!(vgmeta = export_vg_to_config_tree(baton.vg))) {
log_error("VG export to config tree failed");
release_vg(baton.vg);
return NULL;
/* Use the most recent */
if (save_seqno < baton.vg->seqno) {
release_vg(save_vg);
dm_config_destroy(save_meta);
save_vg = baton.vg;
save_meta = vgmeta;
save_seqno = baton.vg->seqno;
save_dev = devl->dev;
} else {
release_vg(baton.vg);
dm_config_destroy(vgmeta);
}
continue;
}
if (!vgmeta_ret) {
vgmeta_ret = vgmeta;
if (!save_vg) {
save_vg = baton.vg;
save_meta = vgmeta;
save_seqno = baton.vg->seqno;
save_dev = devl->dev;
} else {
struct dm_config_node *meta1 = vgmeta_ret->root;
struct dm_config_node *meta1 = save_meta->root;
struct dm_config_node *meta2 = vgmeta->root;
struct dm_config_node *sib1 = meta1->sib;
struct dm_config_node *sib2 = meta2->sib;
@@ -1944,73 +2060,128 @@ scan_more:
meta2->sib = NULL;
if (compare_config(meta1, meta2)) {
/* FIXME: fall back to vg_read to correct this. */
log_warn("WARNING: inconsistent metadata for VG %s on devices %s seqno %u and %s seqno %u.",
vg->name, dev_name(save_dev), save_seqno,
dev_name(devl->dev), baton.vg->seqno);
log_warn("WARNING: temporarily disable lvmetad to repair metadata.");
log_error("VG %s metadata comparison failed for device %s vs %s",
vg->name, dev_name(devl->dev), save_dev ? dev_name(save_dev) : "none");
_log_debug_inequality(vg->name, vgmeta_ret->root, vgmeta->root);
_log_debug_inequality(vg->name, save_meta->root, vgmeta->root);
meta1->sib = sib1;
meta2->sib = sib2;
dm_config_destroy(vgmeta);
dm_config_destroy(vgmeta_ret);
/* no right choice, just use the previous copy */
release_vg(baton.vg);
return NULL;
dm_config_destroy(vgmeta);
}
meta1->sib = sib1;
meta2->sib = sib2;
release_vg(baton.vg);
dm_config_destroy(vgmeta);
}
}
/*
* Look for any new PVs in the VG metadata that were not in our
* previous version of the VG. Add them to pvs_new to be
* scanned in this loop just like the old PVs.
*/
if (!check_new_pvs) {
check_new_pvs = 1;
dm_list_iterate_items(pvl_new, &baton.vg->pvs) {
found = 0;
dm_list_iterate_items(pvl, &vg->pvs) {
if (pvl_new->pv->dev != pvl->pv->dev)
continue;
found = 1;
break;
}
if (found)
/* FIXME: see above */
fid->ref_count--;
/*
* Look for any new PVs in the VG metadata that were not in our
* previous version of the VG.
*
* (Don't look for new PVs after a rescan and retry.)
*/
found_new_pvs = 0;
if (save_vg && !retried_reads) {
dm_list_iterate_items(pvl_new, &save_vg->pvs) {
found = 0;
dm_list_iterate_items(pvl, &vg->pvs) {
if (pvl_new->pv->dev != pvl->pv->dev)
continue;
if (!pvl_new->pv->dev) {
strncpy(pvid_s, (char *) &pvl_new->pv->id, sizeof(pvid_s) - 1);
if (!id_write_format((const struct id *)&pvid_s, uuid, sizeof(uuid)))
stack;
log_error("Device not found for PV %s in VG %s", uuid, vg->name);
missing_devs++;
continue;
}
if (!(devl_new = dm_pool_zalloc(cmd->mem, sizeof(*devl_new))))
return_NULL;
devl_new->dev = pvl_new->pv->dev;
dm_list_add(&pvs_new, &devl_new->list);
log_debug_lvmetad("Rescan VG %s found %s was added.", vg->name, dev_name(devl_new->dev));
found = 1;
break;
}
/*
* PV in new VG metadata not found in old VG metadata.
* There's a good chance we don't know about this new
* PV or what device it's on; a label scan is needed
* of all devices so we know which device the VG is
* now using.
*/
if (!found) {
found_new_pvs++;
strncpy(pvid_s, (char *) &pvl_new->pv->id, sizeof(pvid_s) - 1);
if (!id_write_format((const struct id *)&pvid_s, uuid, sizeof(uuid)))
stack;
log_debug_lvmetad("Rescan VG %s found new PV %s.", vg->name, uuid);
}
}
}
release_vg(baton.vg);
if (!save_vg && retried_reads) {
log_error("VG %s not found after rescanning devices.", vg->name);
goto out;
}
/*
* Do the same scanning above for any new PVs.
* Do a full rescan of devices, then look up which devices the
* scan found for this VG name, and select those devices to
* read metadata from in the loop above (rather than the list
* of devices we created from our last copy of the vg metadata.)
*
* Case 1: VG we knew is no longer on any of the devices we knew it
* to be on (save_vg is NULL, which means the metadata wasn't found
* when reading mdas on each of the initial pvs_scan devices).
* Rescan all devs and then retry reading metadata from the devs that
* the scan finds associated with this VG.
*
* Case 2: VG has new PVs but we don't know what devices they are
* so rescan all devs and then retry reading metadata from the devs
* that the scan finds associated with this VG.
*
* (N.B. after a retry, we don't check for found_new_pvs.)
*/
if (!dm_list_empty(&pvs_new)) {
dm_list_init(&pvs_scan);
dm_list_splice(&pvs_scan, &pvs_new);
dm_list_init(&pvs_new);
log_debug_lvmetad("Rescan VG %s found new PVs to scan.", vg->name);
goto scan_more;
}
if (!save_vg || found_new_pvs) {
if (!save_vg)
log_debug_lvmetad("Rescan VG %s did not find VG on previous devs.", vg->name);
if (found_new_pvs)
log_debug_lvmetad("Rescan VG %s scanning all devs to find new PVs.", vg->name);
if (missing_devs) {
if (vgmeta_ret)
dm_config_destroy(vgmeta_ret);
return_NULL;
label_scan_force(cmd);
if (!(vginfo = lvmcache_vginfo_from_vgname(vg->name, NULL))) {
log_error("VG %s vg info not found after rescanning devices.", vg->name);
goto out;
}
/*
* Set pvs_scan to devs that the label scan found
* in the VG and retry the metadata reading loop.
*/
dm_list_init(&pvs_scan);
if (!lvmcache_get_vg_devs(cmd, vginfo, &pvs_scan)) {
log_error("VG %s info devs not found after rescanning devices.", vg->name);
goto out;
}
log_debug_lvmetad("Rescan VG %s has %d PVs after label scan.",
vg->name, dm_list_size(&pvs_scan));
if (save_vg)
release_vg(save_vg);
if (save_meta)
dm_config_destroy(save_meta);
save_vg = NULL;
save_meta = NULL;
save_dev = NULL;
save_seqno = 0;
found_new_pvs = 0;
retried_reads = 1;
goto retry_reads;
}
/*
@@ -2019,52 +2190,50 @@ scan_more:
dm_list_iterate_items(devl, &pvs_drop) {
if (!devl->dev)
continue;
log_debug_lvmetad("Rescan VG %s dropping %s.", vg->name, dev_name(devl->dev));
if (!lvmetad_pv_gone_by_dev(devl->dev))
return_NULL;
log_debug_lvmetad("Rescan VG %s removing %s from lvmetad.", vg->name, dev_name(devl->dev));
if (!lvmetad_pv_gone_by_dev(devl->dev)) {
/* FIXME: use an error path that disables lvmetad */
log_error("Failed to remove %s from lvmetad.", dev_name(devl->dev));
}
}
/*
* Update the VG in lvmetad.
* Update lvmetad with the newly read version of the VG.
* When the seqno is unchanged the cached VG can be left.
*/
if (vgmeta_ret) {
fid = lvmcache_fmt(info)->ops->create_instance(lvmcache_fmt(info), &fic);
if (!(vg_ret = import_vg_from_config_tree(vgmeta_ret, fid))) {
log_error("VG import from config tree failed");
lvmcache_fmt(info)->ops->destroy_instance(fid);
goto out;
if (save_vg && (save_seqno != vg->seqno)) {
dm_list_iterate_items(devl, &pvs_scan) {
if (!devl->dev)
continue;
log_debug_lvmetad("Rescan VG %s removing %s from lvmetad to replace.",
vg->name, dev_name(devl->dev));
if (!lvmetad_pv_gone_by_dev(devl->dev)) {
/* FIXME: use an error path that disables lvmetad */
log_error("Failed to remove %s from lvmetad.", dev_name(devl->dev));
}
}
log_debug_lvmetad("Rescan VG %s updating lvmetad from seqno %u to seqno %u.",
vg->name, vg->seqno, save_seqno);
/*
* Update lvmetad with the newly read version of the VG.
* When the seqno is unchanged the cached VG can be left.
* If this vg_update fails the cached metadata in
* lvmetad will remain invalid.
*/
if (save_seqno != vg->seqno) {
dm_list_iterate_items(devl, &pvs_scan) {
if (!devl->dev)
continue;
log_debug_lvmetad("Rescan VG %s dropping to replace %s.", vg->name, dev_name(devl->dev));
if (!lvmetad_pv_gone_by_dev(devl->dev))
return_NULL;
}
log_debug_lvmetad("Rescan VG %s updating lvmetad from seqno %u to seqno %u.",
vg->name, vg->seqno, save_seqno);
/*
* If this vg_update fails the cached metadata in
* lvmetad will remain invalid.
*/
vg_ret->lvmetad_update_pending = 1;
if (!lvmetad_vg_update_finish(vg_ret))
log_error("Failed to update lvmetad with new VG meta");
save_vg->lvmetad_update_pending = 1;
if (!lvmetad_vg_update_finish(save_vg)) {
/* FIXME: use an error path that disables lvmetad */
log_error("Failed to update lvmetad with new VG meta");
}
dm_config_destroy(vgmeta_ret);
}
out:
if (vg_ret)
log_debug_lvmetad("Rescan VG %s done (seqno %u).", vg_ret->name, vg_ret->seqno);
return vg_ret;
if (!save_vg && fid)
fmt->ops->destroy_instance(fid);
if (save_meta)
dm_config_destroy(save_meta);
if (save_vg)
log_debug_lvmetad("Rescan VG %s done (new seqno %u).", save_vg->name, save_vg->seqno);
return save_vg;
}
int lvmetad_pvscan_single(struct cmd_context *cmd, struct device *dev,
@@ -2074,9 +2243,12 @@ int lvmetad_pvscan_single(struct cmd_context *cmd, struct device *dev,
struct label *label;
struct lvmcache_info *info;
struct _lvmetad_pvscan_baton baton;
const struct format_type *fmt;
/* Create a dummy instance. */
struct format_instance_ctx fic = { .type = 0 };
log_debug_lvmetad("Scan metadata from dev %s", dev_name(dev));
if (!lvmetad_used()) {
log_error("Cannot proceed since lvmetad is not active.");
return 0;
@@ -2087,23 +2259,31 @@ int lvmetad_pvscan_single(struct cmd_context *cmd, struct device *dev,
return 1;
}
if (!label_read(dev, &label, 0)) {
log_print_unless_silent("No PV label found on %s.", dev_name(dev));
if (!(info = lvmcache_info_from_pvid(dev->pvid, dev, 0))) {
log_print_unless_silent("No PV info found on %s for PVID %s.", dev_name(dev), dev->pvid);
if (!lvmetad_pv_gone_by_dev(dev))
goto_bad;
return 1;
}
info = (struct lvmcache_info *) label->info;
if (!(label = lvmcache_get_label(info))) {
log_print_unless_silent("No PV label found for %s.", dev_name(dev));
if (!lvmetad_pv_gone_by_dev(dev))
goto_bad;
return 1;
}
fmt = lvmcache_fmt(info);
baton.cmd = cmd;
baton.vg = NULL;
baton.fid = lvmcache_fmt(info)->ops->create_instance(lvmcache_fmt(info), &fic);
baton.fid = fmt->ops->create_instance(fmt, &fic);
if (!baton.fid)
goto_bad;
if (baton.fid->fmt->features & FMT_OBSOLETE) {
lvmcache_fmt(info)->ops->destroy_instance(baton.fid);
if (fmt->features & FMT_OBSOLETE) {
fmt->ops->destroy_instance(baton.fid);
log_warn("WARNING: Disabling lvmetad cache which does not support obsolete (lvm1) metadata.");
lvmetad_set_disabled(cmd, LVMETAD_DISABLE_REASON_LVM1);
_found_lvm1_metadata = 1;
@@ -2117,9 +2297,9 @@ int lvmetad_pvscan_single(struct cmd_context *cmd, struct device *dev,
lvmcache_foreach_mda(info, _lvmetad_pvscan_single, &baton);
if (!baton.vg)
lvmcache_fmt(info)->ops->destroy_instance(baton.fid);
fmt->ops->destroy_instance(baton.fid);
if (!lvmetad_pv_found(cmd, (const struct id *) &dev->pvid, dev, lvmcache_fmt(info),
if (!lvmetad_pv_found(cmd, (const struct id *) &dev->pvid, dev, fmt,
label->sector, baton.vg, found_vgnames, changed_vgnames)) {
release_vg(baton.vg);
goto_bad;
@@ -2185,6 +2365,13 @@ int lvmetad_pvscan_all_devs(struct cmd_context *cmd, int do_wait)
replacing_other_update = 1;
}
label_scan(cmd);
if (lvmcache_found_duplicate_pvs()) {
log_warn("WARNING: Scan found duplicate PVs.");
return 0;
}
log_verbose("Scanning all devices to update lvmetad.");
if (!(iter = dev_iter_create(cmd->lvmetad_filter, 1))) {
@@ -2555,6 +2742,8 @@ void lvmetad_validate_global_cache(struct cmd_context *cmd, int force)
*/
_lvmetad_get_pv_cache_list(cmd, &pvc_before);
log_debug_lvmetad("Rescan all devices to validate global cache.");
/*
* Update the local lvmetad cache so it correctly reflects any
* changes made on remote hosts. (It's possible that this command
@@ -2623,7 +2812,7 @@ void lvmetad_validate_global_cache(struct cmd_context *cmd, int force)
_update_changed_pvs_in_udev(cmd, &pvc_before, &pvc_after);
}
log_debug_lvmetad("Validating global lvmetad cache finished");
log_debug_lvmetad("Rescanned all devices");
}
int lvmetad_vg_is_foreign(struct cmd_context *cmd, const char *vgname, const char *vgid)


@@ -542,6 +542,7 @@ static int _process_config(struct cmd_context *cmd)
const struct dm_config_value *cv;
int64_t pv_min_kb;
int udev_disabled = 0;
int scan_size_kb;
char sysfs_dir[PATH_MAX];
if (!_check_config(cmd))
@@ -625,6 +626,29 @@ static int _process_config(struct cmd_context *cmd)
cmd->default_settings.udev_sync = udev_disabled ? 0 :
find_config_tree_bool(cmd, activation_udev_sync_CFG, NULL);
#ifdef AIO_SUPPORT
cmd->use_aio = find_config_tree_bool(cmd, devices_scan_async_CFG, NULL);
#else
cmd->use_aio = 0;
if (find_config_tree_bool(cmd, devices_scan_async_CFG, NULL))
log_verbose("Ignoring scan_async, no async I/O support.");
#endif
scan_size_kb = find_config_tree_int(cmd, devices_scan_size_CFG, NULL);
if (!scan_size_kb || (scan_size_kb < 0) || (scan_size_kb % 4)) {
log_warn("WARNING: Ignoring invalid scan_size %d KB, using default %u KB.",
scan_size_kb, DEFAULT_SCAN_SIZE_KB);
log_warn("scan_size has units of KB and must be a multiple of 4 KB.");
scan_size_kb = DEFAULT_SCAN_SIZE_KB;
}
cmd->default_settings.scan_size_kb = scan_size_kb;
if (cmd->use_aio)
log_debug("Using async io with scan_size %u KB.", scan_size_kb);
else
log_debug("Using sync io with scan_size %u KB.", scan_size_kb);
/*
* Set udev_fallback lazily on first use since it requires
* checking DM driver version which is an extra ioctl!
@@ -685,9 +709,6 @@ static int _process_config(struct cmd_context *cmd)
if (find_config_tree_bool(cmd, report_two_word_unknown_device_CFG, NULL))
init_unknown_device_name("unknown device");
init_detect_internal_vg_cache_corruption
(find_config_tree_bool(cmd, global_detect_internal_vg_cache_corruption_CFG, NULL));
if (!_init_system_id(cmd))
return_0;
@@ -1996,7 +2017,6 @@ struct cmd_context *create_toolcontext(unsigned is_long_lived,
if (set_filters && !init_filters(cmd, 1))
goto_out;
cmd->default_settings.cache_vgmetadata = 1;
cmd->current_settings = cmd->default_settings;
cmd->initialized.config = 1;
@@ -2226,6 +2246,8 @@ void destroy_toolcontext(struct cmd_context *cmd)
!cmd->filter->dump(cmd->filter, 1))
stack;
label_scan_destroy(cmd);
archive_exit(cmd);
backup_exit(cmd);
lvmcache_destroy(cmd, 0, 0);


@@ -39,7 +39,7 @@ struct config_info {
int udev_rules;
int udev_sync;
int udev_fallback;
int cache_vgmetadata;
int scan_size_kb;
const char *msg_prefix;
const char *fmt_name;
uint64_t unit_factor;
@@ -164,6 +164,8 @@ struct cmd_context {
unsigned vg_notify:1;
unsigned lv_notify:1;
unsigned pv_notify:1;
unsigned use_aio:1;
unsigned pvscan_cache_single:1;
/*
* Filtering.


@@ -494,7 +494,7 @@ int override_config_tree_from_profile(struct cmd_context *cmd,
* and function avoids parsing of mda into config tree which
* remains unmodified and should not be used.
*/
int config_file_read_fd(struct dm_config_tree *cft, struct device *dev,
int config_file_read_fd(struct dm_config_tree *cft, struct device *dev, char *buf_async,
off_t offset, size_t size, off_t offset2, size_t size2,
checksum_fn_t checksum_fn, uint32_t checksum,
int checksum_only, int no_dup_node_check)
@@ -517,7 +517,18 @@ int config_file_read_fd(struct dm_config_tree *cft, struct device *dev,
if (!(dev->flags & DEV_REGULAR) || size2)
use_mmap = 0;
if (use_mmap) {
if (buf_async) {
if (!(buf = dm_malloc(size + size2))) {
log_error("Failed to allocate circular buffer.");
return 0;
}
memcpy(buf, buf_async + offset, size);
if (size2)
memcpy(buf + size, buf_async + offset2, size2);
fb = buf;
} else if (use_mmap) {
mmap_offset = offset % lvm_getpagesize();
/* memory map the file */
fb = mmap((caddr_t) 0, size + mmap_offset, PROT_READ,
@@ -532,6 +543,7 @@ int config_file_read_fd(struct dm_config_tree *cft, struct device *dev,
log_error("Failed to allocate circular buffer.");
return 0;
}
if (!dev_read_circular(dev, (uint64_t) offset, size,
(uint64_t) offset2, size2, buf)) {
goto out;
@@ -601,7 +613,7 @@ int config_file_read(struct dm_config_tree *cft)
}
}
r = config_file_read_fd(cft, cf->dev, 0, (size_t) info.st_size, 0, 0,
r = config_file_read_fd(cft, cf->dev, NULL, 0, (size_t) info.st_size, 0, 0,
(checksum_fn_t) NULL, 0, 0, 0);
if (!cf->keep_open) {


@@ -239,7 +239,7 @@ config_source_t config_get_source_type(struct dm_config_tree *cft);
typedef uint32_t (*checksum_fn_t) (uint32_t initial, const uint8_t *buf, uint32_t size);
struct dm_config_tree *config_open(config_source_t source, const char *filename, int keep_open);
int config_file_read_fd(struct dm_config_tree *cft, struct device *dev,
int config_file_read_fd(struct dm_config_tree *cft, struct device *dev, char *buf_async,
off_t offset, size_t size, off_t offset2, size_t size2,
checksum_fn_t checksum_fn, uint32_t checksum,
int skip_parse, int no_dup_node_check);


@@ -457,6 +457,23 @@ cfg(devices_allow_changes_with_duplicate_pvs_CFG, "allow_changes_with_duplicate_
"Enabling this setting allows the VG to be used as usual even with\n"
"uncertain devices.\n")
cfg(devices_scan_async_CFG, "scan_async", devices_CFG_SECTION, CFG_DEFAULT_COMMENTED, CFG_TYPE_BOOL, DEFAULT_SCAN_ASYNC, vsn(2, 2, 176), NULL, 0, NULL,
"Use async I/O to read headers and metadata from disks in parallel.\n")
cfg(devices_scan_size_CFG, "scan_size", devices_CFG_SECTION, CFG_DEFAULT_COMMENTED, CFG_TYPE_INT, DEFAULT_SCAN_SIZE_KB, vsn(2, 2, 176), NULL, 0, NULL,
"Number of KiB to read from each disk when scanning disks.\n"
"The initial scan size is intended to cover all the headers\n"
"and metadata that LVM places at the start of each disk so\n"
"that a single read operation can retrieve them all.\n"
"Any headers or metadata that lie beyond this size require\n"
"an additional disk read. Must be a multiple of 4KiB.\n")
cfg(devices_async_events_CFG, "async_events", devices_CFG_SECTION, CFG_DEFAULT_COMMENTED, CFG_TYPE_INT, DEFAULT_ASYNC_EVENTS, vsn(2, 2, 176), NULL, 0, NULL,
"Max number of concurrent async reads when scanning disks.\n"
"Up to this many disks can be read concurrently when scanning\n"
"disks with async I/O. This setting may be limited by the system\n"
"aio configuration. This should not exceed the open files limit.\n")
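Taken together, the settings documented above could appear in lvm.conf roughly as follows. The values shown are the defaults introduced by this series (DEFAULT_SCAN_ASYNC, DEFAULT_SCAN_SIZE_KB, DEFAULT_ASYNC_EVENTS) and are illustrative only:

```
devices {
    # Use async I/O to read headers and metadata in parallel.
    scan_async = 1

    # KiB read from the start of each disk; must be a multiple of 4.
    scan_size = 128

    # Max number of concurrent async reads during scanning.
    async_events = 100
}
```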
cfg_array(allocation_cling_tag_list_CFG, "cling_tag_list", allocation_CFG_SECTION, CFG_DEFAULT_UNDEFINED, CFG_TYPE_STRING, NULL, vsn(2, 2, 77), NULL, 0, NULL,
"Advise LVM which PVs to use when searching for new space.\n"
"When searching for free space to extend an LV, the 'cling' allocation\n"
@@ -868,11 +885,8 @@ cfg(global_abort_on_internal_errors_CFG, "abort_on_internal_errors", global_CFG_
"Treat any internal errors as fatal errors, aborting the process that\n"
"encountered the internal error. Please only enable for debugging.\n")
cfg(global_detect_internal_vg_cache_corruption_CFG, "detect_internal_vg_cache_corruption", global_CFG_SECTION, 0, CFG_TYPE_BOOL, DEFAULT_DETECT_INTERNAL_VG_CACHE_CORRUPTION, vsn(2, 2, 96), NULL, 0, NULL,
"Internal verification of VG structures.\n"
"Check if CRC matches when a parsed VG is used multiple times. This\n"
"is useful to catch unexpected changes to cached VG structures.\n"
"Please only enable for debugging.\n")
cfg(global_detect_internal_vg_cache_corruption_CFG, "detect_internal_vg_cache_corruption", global_CFG_SECTION, 0, CFG_TYPE_BOOL, 0, vsn(2, 2, 96), NULL, vsn(2, 2, 174), NULL,
"No longer used.\n")
cfg(global_metadata_read_only_CFG, "metadata_read_only", global_CFG_SECTION, 0, CFG_TYPE_BOOL, DEFAULT_METADATA_READ_ONLY, vsn(2, 2, 75), NULL, 0, NULL,
"No operations that change on-disk metadata are permitted.\n"


@@ -179,7 +179,6 @@
#define DEFAULT_LOGLEVEL 0
#define DEFAULT_INDENT 1
#define DEFAULT_ABORT_ON_INTERNAL_ERRORS 0
#define DEFAULT_DETECT_INTERNAL_VG_CACHE_CORRUPTION 0
#define DEFAULT_UNITS "r"
#define DEFAULT_SUFFIX 1
#define DEFAULT_HOSTTAGS 0
@@ -267,4 +266,8 @@
#define DEFAULT_THIN_POOL_AUTOEXTEND_THRESHOLD 100
#define DEFAULT_THIN_POOL_AUTOEXTEND_PERCENT 20
#define DEFAULT_ASYNC_EVENTS 100
#define DEFAULT_SCAN_ASYNC 1
#define DEFAULT_SCAN_SIZE_KB 128
#endif /* _LVM_DEFAULTS_H */


@@ -1081,6 +1081,8 @@ static void _full_scan(int dev_scan)
if (_cache.has_scanned && !dev_scan)
return;
log_debug_devs("Adding device paths to dev cache");
_insert_dirs(&_cache.dirs);
(void) dev_cache_index_devs();
@@ -1090,6 +1092,8 @@ static void _full_scan(int dev_scan)
_cache.has_scanned = 1;
init_full_scan_done(1);
log_debug_devs("Added %d device paths to dev cache", dm_hash_get_num_entries(_cache.names));
}
int dev_cache_has_scanned(void)


@@ -827,3 +827,324 @@ int dev_set(struct device *dev, uint64_t offset, size_t len, int value)
return (len == 0);
}
#ifdef AIO_SUPPORT
/*
* io_setup() wrapper:
* async_event_count is the max number of concurrent async
* i/os, i.e. the number of devices that can be read at once
*
* max_io_alloc_count: max number of aio structs to allocate,
* each with a buf_len size buffer.
*
* max_buf_alloc_bytes: max number of bytes to use for buffers
* attached to all aio structs; each aio struct gets a
* buf_len size buffer.
*
* When only max_io_alloc_count is set, it is used directly.
*
* When only max_buf_alloc_bytes is set, the number of aio
* structs is determined by this number divided by buf_len.
*
* When both are set, max_io_alloc_count is reduced, if needed,
* to whatever value max_buf_alloc_bytes would allow.
*
* When both are zero, there is no limit on the number of aio
* structs. If allocation fails for an aio struct or its buffer,
* the code should revert to synchronous io.
*/
struct dev_async_context *dev_async_context_setup(unsigned async_event_count,
unsigned max_io_alloc_count,
unsigned max_buf_alloc_bytes,
int buf_len)
{
struct dev_async_context *ac;
unsigned nr_events = DEFAULT_ASYNC_EVENTS;
int count;
int error;
if (async_event_count)
nr_events = async_event_count;
if (!(ac = malloc(sizeof(struct dev_async_context))))
return_NULL;
memset(ac, 0, sizeof(struct dev_async_context));
dm_list_init(&ac->unused_ios);
error = io_setup(nr_events, &ac->aio_ctx);
if (error < 0) {
log_warn("WARNING: async io setup error %d with %u events.", error, nr_events);
free(ac);
return_NULL;
}
if (!max_io_alloc_count && !max_buf_alloc_bytes)
count = 0;
else if (!max_io_alloc_count && max_buf_alloc_bytes)
count = max_buf_alloc_bytes / buf_len;
else if (max_io_alloc_count && max_buf_alloc_bytes) {
if (max_io_alloc_count * buf_len > max_buf_alloc_bytes)
count = max_buf_alloc_bytes / buf_len;
else
count = max_io_alloc_count;
} else
count = max_io_alloc_count;
ac->max_ios = count;
return ac;
}
void dev_async_context_destroy(struct dev_async_context *ac)
{
io_destroy(ac->aio_ctx);
free(ac);
}
static struct dev_async_io *_async_io_alloc(int buf_len)
{
struct dev_async_io *aio;
char *buf;
char **p_buf;
/*
* mem pool doesn't seem to work for this, probably because
* of the memalign that follows.
*/
if (!(aio = malloc(sizeof(struct dev_async_io))))
return_NULL;
memset(aio, 0, sizeof(struct dev_async_io));
buf = NULL;
p_buf = &buf;
if (posix_memalign((void *)p_buf, getpagesize(), buf_len)) {
free(aio);
return_NULL;
}
memset(buf, 0, buf_len);
aio->buf = buf;
aio->buf_len = buf_len;
return aio;
}
static void _async_io_free(struct dev_async_io *aio)
{
if (aio->buf)
free(aio->buf);
free(aio);
}
int dev_async_alloc_ios(struct dev_async_context *ac, int num, int buf_len, int *available)
{
struct dev_async_io *aio;
int count;
int i;
/* FIXME: check if num wants more pre allocated? */
if (!dm_list_empty(&ac->unused_ios))
return 1;
/*
* When no limit is used and no pre-alloc number is set,
* then no ios are allocated up front, but they are
* allocated as needed in get().
*/
if (!ac->max_ios && !num) {
*available = 0;
return 1;
}
if (num && !ac->max_ios)
count = num;
else if (!num && ac->max_ios)
count = ac->max_ios;
else if (num > ac->max_ios)
count = ac->max_ios;
else if (num < ac->max_ios)
count = num;
else
count = ac->max_ios;
for (i = 0; i < count; i++) {
if (!(aio = _async_io_alloc(buf_len))) {
ac->num_ios = i;
*available = i;
return 1;
}
dm_list_add(&ac->unused_ios, &aio->list);
}
ac->num_ios = count;
*available = count;
return 1;
}
void dev_async_free_ios(struct dev_async_context *ac)
{
struct dev_async_io *aio, *aio2;
dm_list_iterate_items_safe(aio, aio2, &ac->unused_ios) {
dm_list_del(&aio->list);
_async_io_free(aio);
}
}
struct dev_async_io *dev_async_io_get(struct dev_async_context *ac, int buf_len)
{
struct dm_list *aio_item;
struct dev_async_io *aio;
if (!(aio_item = dm_list_first(&ac->unused_ios)))
goto alloc_new;
aio = dm_list_item(aio_item, struct dev_async_io);
dm_list_del(&aio->list);
return aio;
alloc_new:
/* alloc on demand if there is no max or we have used less than max */
if (!ac->max_ios || (ac->num_ios < ac->max_ios)) {
if ((aio = _async_io_alloc(buf_len))) {
ac->num_ios++;
return aio;
}
}
return NULL;
}
void dev_async_io_put(struct dev_async_context *ac, struct dev_async_io *aio)
{
if (!ac)
_async_io_free(aio);
else {
memset(aio->buf, 0, aio->buf_len);
aio->dev = NULL;
aio->len = 0;
aio->done = 0;
aio->result = 0;
dm_list_add(&ac->unused_ios, &aio->list);
}
}
/* io_submit() wrapper */
int dev_async_read_submit(struct dev_async_context *ac, struct dev_async_io *aio,
struct device *dev, uint32_t len, uint64_t offset, int *nospace)
{
struct iocb *iocb = &aio->iocb;
int error;
*nospace = 0;
if (len > aio->buf_len)
return_0;
aio->len = len;
iocb->data = aio;
iocb->aio_fildes = dev_fd(dev);
iocb->aio_lio_opcode = IO_CMD_PREAD;
iocb->u.c.buf = aio->buf;
iocb->u.c.nbytes = len;
iocb->u.c.offset = offset;
error = io_submit(ac->aio_ctx, 1, &iocb);
if (error == -EAGAIN)
*nospace = 1;
if (error < 0)
return 0;
return 1;
}
/* io_getevents() wrapper */
int dev_async_getevents(struct dev_async_context *ac, int wait_count, struct timespec *timeout,
int *done_count)
{
int wait_nr;
int rv;
int i;
*done_count = 0;
retry:
memset(&ac->events, 0, sizeof(ac->events));
if (wait_count >= MAX_GET_EVENTS)
wait_nr = MAX_GET_EVENTS;
else
wait_nr = wait_count;
rv = io_getevents(ac->aio_ctx, 1, wait_nr, (struct io_event *)&ac->events, timeout);
if (rv == -EINTR)
goto retry;
if (rv < 0)
return 0;
if (!rv)
return 1;
for (i = 0; i < rv; i++) {
struct iocb *iocb = ac->events[i].obj;
struct dev_async_io *aio = iocb->data;
aio->result = ac->events[i].res;
aio->done = 1;
}
*done_count = rv;
return 1;
}
#else /* AIO_SUPPORT */
struct dev_async_context *dev_async_context_setup(unsigned async_event_count,
unsigned max_io_alloc_count,
unsigned max_buf_alloc_bytes,
int buf_len)
{
return NULL;
}
void dev_async_context_destroy(struct dev_async_context *ac)
{
}
int dev_async_alloc_ios(struct dev_async_context *ac, int num, int buf_len, int *available)
{
return 0;
}
void dev_async_free_ios(struct dev_async_context *ac)
{
}
struct dev_async_io *dev_async_io_get(struct dev_async_context *ac, int buf_len)
{
return NULL;
}
void dev_async_io_put(struct dev_async_context *ac, struct dev_async_io *aio)
{
}
int dev_async_read_submit(struct dev_async_context *ac, struct dev_async_io *aio,
struct device *dev, uint32_t len, uint64_t offset, int *nospace)
{
return 0;
}
int dev_async_getevents(struct dev_async_context *ac, int wait_count, struct timespec *timeout,
int *done_count)
{
return 0;
}
#endif /* AIO_SUPPORT */


@@ -19,6 +19,7 @@
#include "uuid.h"
#include <fcntl.h>
#include <libaio.h>
#define DEV_ACCESSED_W 0x00000001 /* Device written to? */
#define DEV_REGULAR 0x00000002 /* Regular file? */
@@ -90,6 +91,32 @@ struct device_area {
uint64_t size; /* Bytes */
};
/*
* We'll collect the results of this many async reads
* in one system call. It shouldn't matter much what
* number is used here.
*/
#define MAX_GET_EVENTS 16
struct dev_async_context {
io_context_t aio_ctx;
struct io_event events[MAX_GET_EVENTS]; /* for processing completions */
struct dm_list unused_ios; /* unused/available aio structs */
int num_ios; /* number of allocated aio structs */
int max_ios; /* max number of aio structs to allocate */
};
struct dev_async_io {
struct dm_list list;
struct iocb iocb;
struct device *dev;
char *buf;
uint32_t buf_len; /* size of buf */
uint32_t len; /* size of submitted io */
int done;
int result;
};
/*
* Support for external device info.
*/
@@ -144,4 +171,27 @@ void dev_destroy_file(struct device *dev);
/* Return a valid device name from the alias list; NULL otherwise */
const char *dev_name_confirmed(struct device *dev, int quiet);
struct dev_async_context *dev_async_context_setup(unsigned async_event_count,
unsigned max_io_alloc_count,
unsigned max_buf_alloc_bytes,
int buf_len);
void dev_async_context_destroy(struct dev_async_context *ac);
/* allocate aio structs (with buffers), up to the max specified during context setup */
int dev_async_alloc_ios(struct dev_async_context *ac, int num, int buf_len, int *available);
/* free aio structs (and buffers) */
void dev_async_free_ios(struct dev_async_context *ac);
/* get an available aio struct (with buffer) */
struct dev_async_io *dev_async_io_get(struct dev_async_context *ac, int buf_len);
/* make an aio struct (with buffer) available for use (by another get) */
void dev_async_io_put(struct dev_async_context *ac, struct dev_async_io *aio);
int dev_async_read_submit(struct dev_async_context *ac, struct dev_async_io *aio,
struct device *dev, uint32_t len, uint64_t offset, int *nospace);
int dev_async_getevents(struct dev_async_context *ac, int wait_count, struct timespec *timeout,
int *done_count);
#endif


@@ -180,9 +180,9 @@ out:
static struct volume_group *_format1_vg_read(struct format_instance *fid,
const char *vg_name,
struct metadata_area *mda __attribute__((unused)),
struct label_read_data *ld __attribute__((unused)),
struct cached_vg_fmtdata **vg_fmtdata __attribute__((unused)),
unsigned *use_previous_vg __attribute__((unused)),
int single_device __attribute__((unused)))
unsigned *use_previous_vg __attribute__((unused)))
{
struct volume_group *vg;
struct disk_list *dl;


@@ -55,6 +55,7 @@ static int _lvm1_write(struct label *label __attribute__((unused)), void *buf __
}
static int _lvm1_read(struct labeller *l, struct device *dev, void *buf,
struct label_read_data *ld,
struct label **label)
{
struct pv_disk *pvd = (struct pv_disk *) buf;


@@ -101,9 +101,9 @@ static int _check_usp(const char *vgname, struct user_subpool *usp, int sp_count
static struct volume_group *_pool_vg_read(struct format_instance *fid,
const char *vg_name,
struct metadata_area *mda __attribute__((unused)),
struct label_read_data *ld __attribute__((unused)),
struct cached_vg_fmtdata **vg_fmtdata __attribute__((unused)),
unsigned *use_previous_vg __attribute__((unused)),
int single_device __attribute__((unused)))
unsigned *use_previous_vg __attribute__((unused)))
{
struct volume_group *vg;
struct user_subpool *usp;


@@ -56,6 +56,7 @@ static int _pool_write(struct label *label __attribute__((unused)), void *buf __
}
static int _pool_read(struct labeller *l, struct device *dev, void *buf,
struct label_read_data *ld,
struct label **label)
{
struct pool_list pl;


@@ -321,7 +321,7 @@ static void _display_archive(struct cmd_context *cmd, struct archive_file *af)
* retrieve the archive time and description.
*/
/* FIXME Use variation on _vg_read */
if (!(vg = text_vg_import_file(tf, af->path, &when, &desc))) {
if (!(vg = text_read_metadata_file(tf, af->path, &when, &desc))) {
log_error("Unable to read archive file.");
tf->fmt->ops->destroy_instance(tf);
return;


@@ -320,7 +320,7 @@ struct volume_group *backup_read_vg(struct cmd_context *cmd,
}
dm_list_iterate_items(mda, &tf->metadata_areas_in_use) {
if (!(vg = mda->ops->vg_read(tf, vg_name, mda, NULL, NULL, 0)))
if (!(vg = mda->ops->vg_read(tf, vg_name, mda, NULL, NULL, NULL)))
stack;
break;
}


@@ -190,7 +190,7 @@ static int _pv_analyze_mda_raw (const struct format_type * fmt,
if (!dev_open_readonly(area->dev))
return_0;
if (!(mdah = raw_read_mda_header(fmt, area)))
if (!(mdah = raw_read_mda_header(fmt, area, NULL)))
goto_out;
rlocn = mdah->raw_locns;
@@ -316,15 +316,26 @@ static void _xlate_mdah(struct mda_header *mdah)
}
}
static int _raw_read_mda_header(struct mda_header *mdah, struct device_area *dev_area)
static int _raw_read_mda_header(struct mda_header *mdah, struct device_area *dev_area,
struct label_read_data *ld)
{
if (!dev_open_readonly(dev_area->dev))
return_0;
if (!dev_read(dev_area->dev, dev_area->start, MDA_HEADER_SIZE, mdah)) {
if (!dev_close(dev_area->dev))
stack;
return_0;
if (!ld || (ld->buf_len < dev_area->start + MDA_HEADER_SIZE)) {
log_debug_metadata("Reading mda header sector from %s at %llu",
dev_name(dev_area->dev), (unsigned long long)dev_area->start);
if (!dev_read(dev_area->dev, dev_area->start, MDA_HEADER_SIZE, mdah)) {
if (!dev_close(dev_area->dev))
stack;
return_0;
}
} else {
log_debug_metadata("Copying mda header sector from %s buffer at %llu",
dev_name(dev_area->dev), (unsigned long long)dev_area->start);
memcpy(mdah, ld->buf + dev_area->start, MDA_HEADER_SIZE);
}
if (!dev_close(dev_area->dev))
@@ -366,7 +377,8 @@ static int _raw_read_mda_header(struct mda_header *mdah, struct device_area *dev
}
struct mda_header *raw_read_mda_header(const struct format_type *fmt,
struct device_area *dev_area)
struct device_area *dev_area,
struct label_read_data *ld)
{
struct mda_header *mdah;
@@ -375,7 +387,7 @@ struct mda_header *raw_read_mda_header(const struct format_type *fmt,
return NULL;
}
if (!_raw_read_mda_header(mdah, dev_area)) {
if (!_raw_read_mda_header(mdah, dev_area, ld)) {
dm_pool_free(fmt->cmd->mem, mdah);
return NULL;
}
@@ -402,8 +414,14 @@ static int _raw_write_mda_header(const struct format_type *fmt,
return 1;
}
static struct raw_locn *_find_vg_rlocn(struct device_area *dev_area,
/*
* FIXME: unify this with read_metadata_location() which is used
* in the label scanning path.
*/
static struct raw_locn *_read_metadata_location_vg(struct device_area *dev_area,
struct mda_header *mdah,
struct label_read_data *ld,
const char *vgname,
int *precommitted)
{
@@ -438,11 +456,20 @@ static struct raw_locn *_find_vg_rlocn(struct device_area *dev_area,
if (!*vgname)
return rlocn;
/* FIXME Loop through rlocns two-at-a-time. List null-terminated. */
/* FIXME Ignore if checksum incorrect!!! */
if (!dev_read(dev_area->dev, dev_area->start + rlocn->offset,
sizeof(vgnamebuf), vgnamebuf))
goto_bad;
/*
* Verify that the VG metadata pointed to by the rlocn
* begins with a valid vgname.
*/
if (!ld || (ld->buf_len < dev_area->start + rlocn->offset + NAME_LEN)) {
/* FIXME Loop through rlocns two-at-a-time. List null-terminated. */
/* FIXME Ignore if checksum incorrect!!! */
if (!dev_read(dev_area->dev, dev_area->start + rlocn->offset,
sizeof(vgnamebuf), vgnamebuf))
goto_bad;
} else {
memset(vgnamebuf, 0, sizeof(vgnamebuf));
memcpy(vgnamebuf, ld->buf + dev_area->start + rlocn->offset, NAME_LEN);
}
if (!strncmp(vgnamebuf, vgname, len = strlen(vgname)) &&
(isspace(vgnamebuf[len]) || vgnamebuf[len] == '{'))
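
The acceptance test in the hunk above — the metadata blob is taken to belong to the VG only if it begins with the expected VG name followed by whitespace or '{' — can be sketched as a standalone helper (hypothetical function name, not the lvm2 API):

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/*
 * Return 1 if 'buf' (the start of on-disk text metadata) begins
 * with 'vgname' followed by whitespace or '{', i.e. "vgname {...".
 */
static int begins_with_vgname(const char *buf, const char *vgname)
{
	size_t len = strlen(vgname);

	if (strncmp(buf, vgname, len))
		return 0;
	return isspace((unsigned char)buf[len]) || buf[len] == '{';
}
```

Note that "vg01" must not match an expected name of "vg0", which is why the terminator character is checked rather than the prefix alone.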
@@ -488,10 +515,10 @@ static int _raw_holds_vgname(struct format_instance *fid,
if (!dev_open_readonly(dev_area->dev))
return_0;
if (!(mdah = raw_read_mda_header(fid->fmt, dev_area)))
if (!(mdah = raw_read_mda_header(fid->fmt, dev_area, NULL)))
return_0;
if (_find_vg_rlocn(dev_area, mdah, vgname, &noprecommit))
if (_read_metadata_location_vg(dev_area, mdah, NULL, vgname, &noprecommit))
r = 1;
if (!dev_close(dev_area->dev))
@@ -503,10 +530,10 @@ static int _raw_holds_vgname(struct format_instance *fid,
static struct volume_group *_vg_read_raw_area(struct format_instance *fid,
const char *vgname,
struct device_area *area,
struct label_read_data *ld,
struct cached_vg_fmtdata **vg_fmtdata,
unsigned *use_previous_vg,
int precommitted,
int single_device)
int precommitted)
{
struct volume_group *vg = NULL;
struct raw_locn *rlocn;
@@ -515,10 +542,10 @@ static struct volume_group *_vg_read_raw_area(struct format_instance *fid,
char *desc;
uint32_t wrap = 0;
if (!(mdah = raw_read_mda_header(fid->fmt, area)))
if (!(mdah = raw_read_mda_header(fid->fmt, area, ld)))
goto_out;
if (!(rlocn = _find_vg_rlocn(area, mdah, vgname, &precommitted))) {
if (!(rlocn = _read_metadata_location_vg(area, mdah, ld, vgname, &precommitted))) {
log_debug_metadata("VG %s not found on %s", vgname, dev_name(area->dev));
goto out;
}
@@ -532,25 +559,25 @@ static struct volume_group *_vg_read_raw_area(struct format_instance *fid,
goto out;
}
/* FIXME 64-bit */
if (!(vg = text_vg_import_fd(fid, NULL, vg_fmtdata, use_previous_vg, single_device, area->dev,
(off_t) (area->start + rlocn->offset),
(uint32_t) (rlocn->size - wrap),
(off_t) (area->start + MDA_HEADER_SIZE),
wrap, calc_crc, rlocn->checksum, &when,
&desc)) && (!use_previous_vg || !*use_previous_vg))
goto_out;
vg = text_read_metadata(fid, area->dev, NULL, ld, vg_fmtdata, use_previous_vg,
(off_t) (area->start + rlocn->offset),
(uint32_t) (rlocn->size - wrap),
(off_t) (area->start + MDA_HEADER_SIZE),
wrap,
calc_crc,
rlocn->checksum,
&when, &desc);
if (vg)
log_debug_metadata("Read %s %smetadata (%u) from %s at %" PRIu64 " size %"
PRIu64, vg->name, precommitted ? "pre-commit " : "",
vg->seqno, dev_name(area->dev),
area->start + rlocn->offset, rlocn->size);
else
log_debug_metadata("Skipped reading %smetadata from %s at %" PRIu64 " size %"
PRIu64 " with matching checksum.", precommitted ? "pre-commit " : "",
dev_name(area->dev),
area->start + rlocn->offset, rlocn->size);
if (!vg) {
/* FIXME: detect and handle errors, and distinguish from the optimization
that skips parsing the metadata which also returns NULL. */
}
log_debug_metadata("Found metadata on %s at %"PRIu64" size %"PRIu64" for VG %s",
dev_name(area->dev),
area->start + rlocn->offset,
rlocn->size,
vgname);
if (vg && precommitted)
vg->status |= PRECOMMITTED;
@@ -562,9 +589,9 @@ static struct volume_group *_vg_read_raw_area(struct format_instance *fid,
static struct volume_group *_vg_read_raw(struct format_instance *fid,
const char *vgname,
struct metadata_area *mda,
struct label_read_data *ld,
struct cached_vg_fmtdata **vg_fmtdata,
unsigned *use_previous_vg,
int single_device)
unsigned *use_previous_vg)
{
struct mda_context *mdac = (struct mda_context *) mda->metadata_locn;
struct volume_group *vg;
@@ -572,7 +599,7 @@ static struct volume_group *_vg_read_raw(struct format_instance *fid,
if (!dev_open_readonly(mdac->area.dev))
return_NULL;
vg = _vg_read_raw_area(fid, vgname, &mdac->area, vg_fmtdata, use_previous_vg, 0, single_device);
vg = _vg_read_raw_area(fid, vgname, &mdac->area, ld, vg_fmtdata, use_previous_vg, 0);
if (!dev_close(mdac->area.dev))
stack;
@@ -583,6 +610,7 @@ static struct volume_group *_vg_read_raw(struct format_instance *fid,
static struct volume_group *_vg_read_precommit_raw(struct format_instance *fid,
const char *vgname,
struct metadata_area *mda,
struct label_read_data *ld,
struct cached_vg_fmtdata **vg_fmtdata,
unsigned *use_previous_vg)
{
@@ -592,7 +620,7 @@ static struct volume_group *_vg_read_precommit_raw(struct format_instance *fid,
if (!dev_open_readonly(mdac->area.dev))
return_NULL;
vg = _vg_read_raw_area(fid, vgname, &mdac->area, vg_fmtdata, use_previous_vg, 1, 0);
vg = _vg_read_raw_area(fid, vgname, &mdac->area, ld, vg_fmtdata, use_previous_vg, 1);
if (!dev_close(mdac->area.dev))
stack;
@@ -630,10 +658,10 @@ static int _vg_write_raw(struct format_instance *fid, struct volume_group *vg,
if (!dev_open(mdac->area.dev))
return_0;
if (!(mdah = raw_read_mda_header(fid->fmt, &mdac->area)))
if (!(mdah = raw_read_mda_header(fid->fmt, &mdac->area, NULL)))
goto_out;
rlocn = _find_vg_rlocn(&mdac->area, mdah, old_vg_name ? : vg->name, &noprecommit);
rlocn = _read_metadata_location_vg(&mdac->area, mdah, NULL, old_vg_name ? : vg->name, &noprecommit);
mdac->rlocn.offset = _next_rlocn_offset(rlocn, mdah);
if (!fidtc->raw_metadata_buf &&
@@ -736,10 +764,10 @@ static int _vg_commit_raw_rlocn(struct format_instance *fid,
if (!found)
return 1;
if (!(mdah = raw_read_mda_header(fid->fmt, &mdac->area)))
if (!(mdah = raw_read_mda_header(fid->fmt, &mdac->area, NULL)))
goto_out;
if (!(rlocn = _find_vg_rlocn(&mdac->area, mdah, old_vg_name ? : vg->name, &noprecommit))) {
if (!(rlocn = _read_metadata_location_vg(&mdac->area, mdah, NULL, old_vg_name ? : vg->name, &noprecommit))) {
mdah->raw_locns[0].offset = 0;
mdah->raw_locns[0].size = 0;
mdah->raw_locns[0].checksum = 0;
@@ -846,10 +874,10 @@ static int _vg_remove_raw(struct format_instance *fid, struct volume_group *vg,
if (!dev_open(mdac->area.dev))
return_0;
if (!(mdah = raw_read_mda_header(fid->fmt, &mdac->area)))
if (!(mdah = raw_read_mda_header(fid->fmt, &mdac->area, NULL)))
goto_out;
if (!(rlocn = _find_vg_rlocn(&mdac->area, mdah, vg->name, &noprecommit))) {
if (!(rlocn = _read_metadata_location_vg(&mdac->area, mdah, NULL, vg->name, &noprecommit))) {
rlocn = &mdah->raw_locns[0];
mdah->raw_locns[1].offset = 0;
}
@@ -883,8 +911,10 @@ static struct volume_group *_vg_read_file_name(struct format_instance *fid,
time_t when;
char *desc;
if (!(vg = text_vg_import_file(fid, read_path, &when, &desc)))
return_NULL;
if (!(vg = text_read_metadata_file(fid, read_path, &when, &desc))) {
log_error("Failed to read VG %s from %s", vgname, read_path);
return NULL;
}
/*
* Currently you can only have a single volume group per
@@ -907,9 +937,9 @@ static struct volume_group *_vg_read_file_name(struct format_instance *fid,
static struct volume_group *_vg_read_file(struct format_instance *fid,
const char *vgname,
struct metadata_area *mda,
struct label_read_data *ld,
struct cached_vg_fmtdata **vg_fmtdata,
unsigned *use_previous_vg __attribute__((unused)),
int single_device __attribute__((unused)))
unsigned *use_previous_vg __attribute__((unused)))
{
struct text_context *tc = (struct text_context *) mda->metadata_locn;
@@ -919,6 +949,7 @@ static struct volume_group *_vg_read_file(struct format_instance *fid,
static struct volume_group *_vg_read_precommit_file(struct format_instance *fid,
const char *vgname,
struct metadata_area *mda,
struct label_read_data *ld,
struct cached_vg_fmtdata **vg_fmtdata,
unsigned *use_previous_vg __attribute__((unused)))
{
@@ -1107,6 +1138,9 @@ static int _scan_file(const struct format_type *fmt, const char *vgname)
dir_list = &((struct mda_lists *) fmt->private)->dirs;
if (!dm_list_empty(dir_list))
log_debug_metadata("Scanning independent files for %s", vgname ? vgname : "VGs");
dm_list_iterate_items(dl, dir_list) {
if (!(d = opendir(dl->dir))) {
log_sys_error("opendir", dl->dir);
@@ -1139,10 +1173,14 @@ static int _scan_file(const struct format_type *fmt, const char *vgname)
stack;
break;
}
log_debug_metadata("Scanning independent file %s for VG %s", path, scanned_vgname);
if ((vg = _vg_read_file_name(fid, scanned_vgname,
path))) {
/* FIXME Store creation host in vg */
lvmcache_update_vg(vg, 0);
lvmcache_set_independent_location(vg->name);
release_vg(vg);
}
}
@@ -1154,8 +1192,9 @@ static int _scan_file(const struct format_type *fmt, const char *vgname)
return 1;
}
int vgname_from_mda(const struct format_type *fmt,
struct mda_header *mdah, struct device_area *dev_area,
int read_metadata_location_summary(const struct format_type *fmt,
struct mda_header *mdah, struct label_read_data *ld,
struct device_area *dev_area,
struct lvmcache_vgsummary *vgsummary, uint64_t *mda_free_sectors)
{
struct raw_locn *rlocn;
@@ -1163,13 +1202,12 @@ int vgname_from_mda(const struct format_type *fmt,
unsigned int len = 0;
char buf[NAME_LEN + 1] __attribute__((aligned(8)));
uint64_t buffer_size, current_usage;
unsigned used_cached_metadata = 0;
if (mda_free_sectors)
*mda_free_sectors = ((dev_area->size - MDA_HEADER_SIZE) / 2) >> SECTOR_SHIFT;
if (!mdah) {
log_error(INTERNAL_ERROR "vgname_from_mda called with NULL pointer for mda_header");
log_error(INTERNAL_ERROR "read_metadata_location_summary called with NULL pointer for mda_header");
return 0;
}
@@ -1180,15 +1218,21 @@ int vgname_from_mda(const struct format_type *fmt,
* If no valid offset, do not try to search for vgname
*/
if (!rlocn->offset) {
log_debug("%s: found metadata with offset 0.",
dev_name(dev_area->dev));
log_debug_metadata("Metadata location on %s at %"PRIu64" has offset 0.",
dev_name(dev_area->dev), dev_area->start + rlocn->offset);
return 0;
}
/* Do quick check for a vgname */
if (!dev_read(dev_area->dev, dev_area->start + rlocn->offset,
NAME_LEN, buf))
return_0;
/*
* Verify that the VG metadata pointed to by the rlocn
* begins with a valid vgname.
*/
if (!ld || (ld->buf_len < dev_area->start + rlocn->offset + NAME_LEN)) {
if (!dev_read(dev_area->dev, dev_area->start + rlocn->offset, NAME_LEN, buf))
return_0;
} else {
memcpy(buf, ld->buf + dev_area->start + rlocn->offset, NAME_LEN);
}
while (buf[len] && !isspace(buf[len]) && buf[len] != '{' &&
len < (NAME_LEN - 1))
@@ -1197,47 +1241,65 @@ int vgname_from_mda(const struct format_type *fmt,
buf[len] = '\0';
/* Ignore this entry if the characters aren't permissible */
if (!validate_name(buf))
if (!validate_name(buf)) {
log_error("Metadata location on %s at %"PRIu64" begins with invalid VG name.",
dev_name(dev_area->dev), dev_area->start + rlocn->offset);
return_0;
}
/* We found a VG - now check the metadata */
if (rlocn->offset + rlocn->size > mdah->size)
wrap = (uint32_t) ((rlocn->offset + rlocn->size) - mdah->size);
if (wrap > rlocn->offset) {
log_error("%s: metadata too large for circular buffer",
dev_name(dev_area->dev));
log_error("Metadata location on %s at %"PRIu64" is too large for circular buffer.",
dev_name(dev_area->dev), dev_area->start + rlocn->offset);
return 0;
}
/* Did we see this metadata before? */
/*
* Did we see this metadata before?
* Look in lvmcache to see if there is vg info matching
* the checksum/size that we see in the mda_header (rlocn)
* on this device. If so, then vgsummary->name is set
* and controls if the "checksum_only" flag passed to
* text_read_metadata_summary() is 1 or 0.
*
* If checksum_only = 1, then text_read_metadata_summary()
* will read the metadata from this device, and run the
* checksum function on it. If the calculated checksum
* of the metadata matches the checksum in the mda_header,
* which also matches the checksum saved in vginfo from
* another device, then it skips parsing the metadata into
* a config tree, which saves considerable cpu time.
*/
vgsummary->mda_checksum = rlocn->checksum;
vgsummary->mda_size = rlocn->size;
lvmcache_lookup_mda(vgsummary);
if (lvmcache_lookup_mda(vgsummary))
used_cached_metadata = 1;
/* FIXME 64-bit */
if (!text_vgsummary_import(fmt, dev_area->dev,
if (!text_read_metadata_summary(fmt, dev_area->dev, ld,
(off_t) (dev_area->start + rlocn->offset),
(uint32_t) (rlocn->size - wrap),
(off_t) (dev_area->start + MDA_HEADER_SIZE),
wrap, calc_crc, vgsummary->vgname ? 1 : 0,
vgsummary))
vgsummary)) {
log_error("Metadata location on %s at %"PRIu64" has invalid summary for VG.",
dev_name(dev_area->dev), dev_area->start + rlocn->offset);
return_0;
}
/* Ignore this entry if the characters aren't permissible */
if (!validate_name(vgsummary->vgname))
if (!validate_name(vgsummary->vgname)) {
log_error("Metadata location on %s at %"PRIu64" has invalid VG name.",
dev_name(dev_area->dev), dev_area->start + rlocn->offset);
return_0;
}
log_debug_metadata("%s: %s metadata at %" PRIu64 " size %" PRIu64
" (in area at %" PRIu64 " size %" PRIu64
") for %s (" FMTVGID ")",
log_debug_metadata("Found metadata summary on %s at %"PRIu64" size %"PRIu64" for VG %s",
dev_name(dev_area->dev),
used_cached_metadata ? "Using cached" : "Found",
dev_area->start + rlocn->offset,
rlocn->size, dev_area->start, dev_area->size, vgsummary->vgname,
(char *)&vgsummary->vgid);
rlocn->size,
vgsummary->vgname);
if (mda_free_sectors) {
current_usage = (rlocn->size + SECTOR_SIZE - UINT64_C(1)) -
@@ -1253,6 +1315,8 @@ int vgname_from_mda(const struct format_type *fmt,
return 1;
}
/* used for independent_metadata_areas */
static int _scan_raw(const struct format_type *fmt, const char *vgname __attribute__((unused)))
{
struct raw_list *rl;
@@ -1264,27 +1328,34 @@ static int _scan_raw(const struct format_type *fmt, const char *vgname __attribu
raw_list = &((struct mda_lists *) fmt->private)->raws;
if (!dm_list_empty(raw_list))
log_debug_metadata("Scanning independent raw locations for %s", vgname ? vgname : "VGs");
fid.fmt = fmt;
dm_list_init(&fid.metadata_areas_in_use);
dm_list_init(&fid.metadata_areas_ignored);
dm_list_iterate_items(rl, raw_list) {
log_debug_metadata("Scanning independent dev %s", dev_name(rl->dev_area.dev));
/* FIXME We're reading mdah twice here... */
if (!dev_open_readonly(rl->dev_area.dev)) {
stack;
continue;
}
if (!(mdah = raw_read_mda_header(fmt, &rl->dev_area))) {
if (!(mdah = raw_read_mda_header(fmt, &rl->dev_area, NULL))) {
stack;
goto close_dev;
}
/* TODO: caching as in vgname_from_mda() (trigger this code?) */
if (vgname_from_mda(fmt, mdah, &rl->dev_area, &vgsummary, NULL)) {
vg = _vg_read_raw_area(&fid, vgsummary.vgname, &rl->dev_area, NULL, NULL, 0, 0);
if (vg)
/* TODO: caching as in read_metadata_location() (trigger this code?) */
if (read_metadata_location_summary(fmt, mdah, NULL, &rl->dev_area, &vgsummary, NULL)) {
vg = _vg_read_raw_area(&fid, vgsummary.vgname, &rl->dev_area, NULL, NULL, NULL, 0);
if (vg) {
lvmcache_update_vg(vg, 0);
lvmcache_set_independent_location(vg->name);
}
}
close_dev:
if (!dev_close(rl->dev_area.dev))
@@ -1294,9 +1365,13 @@ static int _scan_raw(const struct format_type *fmt, const char *vgname __attribu
return 1;
}
/* used for independent_metadata_areas */
static int _text_scan(const struct format_type *fmt, const char *vgname)
{
return (_scan_file(fmt, vgname) & _scan_raw(fmt, vgname));
_scan_file(fmt, vgname);
_scan_raw(fmt, vgname);
return 1;
}
struct _write_single_mda_baton {
@@ -1525,36 +1600,6 @@ static uint64_t _metadata_locn_offset_raw(void *metadata_locn)
return mdac->area.start;
}
static int _text_pv_read(const struct format_type *fmt, const char *pv_name,
struct physical_volume *pv, int scan_label_only)
{
struct lvmcache_info *info;
struct device *dev;
if (!(dev = dev_cache_get(pv_name, fmt->cmd->filter)))
return_0;
if (lvmetad_used()) {
info = lvmcache_info_from_pvid(dev->pvid, dev, 0);
if (!info && !lvmetad_pv_lookup_by_dev(fmt->cmd, dev, NULL))
return 0;
info = lvmcache_info_from_pvid(dev->pvid, dev, 0);
} else {
struct label *label;
if (!(label_read(dev, &label, UINT64_C(0))))
return_0;
info = label->info;
}
if (!info)
return_0;
if (!lvmcache_populate_pv_fields(info, pv, scan_label_only))
return 0;
return 1;
}
static int _text_pv_initialise(const struct format_type *fmt,
struct pv_create_args *pva,
struct physical_volume *pv)
@@ -1748,6 +1793,8 @@ static struct metadata_area_ops _metadata_text_raw_ops = {
.mda_import_text = _mda_import_text_raw
};
/* used only for sending info to lvmetad */
static int _mda_export_text_raw(struct metadata_area *mda,
struct dm_config_tree *cft,
struct dm_config_node *parent)
@@ -1755,7 +1802,13 @@ static int _mda_export_text_raw(struct metadata_area *mda,
struct mda_context *mdc = (struct mda_context *) mda->metadata_locn;
char mdah[MDA_HEADER_SIZE]; /* temporary */
if (!mdc || !_raw_read_mda_header((struct mda_header *)mdah, &mdc->area))
if (!mdc) {
log_error(INTERNAL_ERROR "mda_export_text_raw no mdc");
return 1; /* pretend the MDA does not exist */
}
/* FIXME: why aren't ignore,start,size,free_sectors available? */
if (!_raw_read_mda_header((struct mda_header *)mdah, &mdc->area, NULL))
return 1; /* pretend the MDA does not exist */
return config_make_nodes(cft, parent, NULL,
@@ -1766,6 +1819,8 @@ static int _mda_export_text_raw(struct metadata_area *mda,
NULL) ? 1 : 0;
}
/* used only for receiving info from lvmetad */
static int _mda_import_text_raw(struct lvmcache_info *info, const struct dm_config_node *cn)
{
struct device *device;
@@ -1995,22 +2050,6 @@ static int _create_vg_text_instance(struct format_instance *fid,
}
if (type & FMT_INSTANCE_MDAS) {
/*
* TODO in theory, this function should be never reached
* while in critical_section(), because lvmcache's
* cached_vg should be valid. However, this assumption
* sometimes fails (possibly due to inconsistent
* (precommit) metadata and/or missing devices), and
* calling lvmcache_label_scan inside the critical
* section may be fatal (i.e. deadlock).
*/
if (!critical_section())
/* Scan PVs in VG for any further MDAs */
/*
* FIXME Only scan PVs believed to be in the VG.
*/
lvmcache_label_scan(fid->fmt->cmd);
if (!(vginfo = lvmcache_vginfo_from_vgname(vg_name, vg_id)))
goto_out;
if (!lvmcache_fid_add_mdas_vg(vginfo, fid))
@@ -2409,7 +2448,6 @@ static struct format_instance *_text_create_text_instance(const struct format_ty
static struct format_handler _text_handler = {
.scan = _text_scan,
.pv_read = _text_pv_read,
.pv_initialise = _text_pv_initialise,
.pv_setup = _text_pv_setup,
.pv_add_metadata_area = _text_pv_add_metadata_area,
@@ -2480,7 +2518,7 @@ static int _get_config_disk_area(struct cmd_context *cmd,
return 0;
}
if (!(dev_area.dev = lvmcache_device_from_pvid(cmd, &id, NULL, NULL))) {
if (!(dev_area.dev = lvmcache_device_from_pvid(cmd, &id, NULL))) {
char buffer[64] __attribute__((aligned(8)));
if (!id_write_format(&id, buffer, sizeof(buffer)))

View File
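A pattern repeated throughout the format_text changes above: when the label scan left behind a `label_read_data` buffer that already covers the requested device range, the data is copied from it; otherwise the code falls back to a synchronous `dev_read()`. A minimal standalone sketch of that decision (hypothetical types and a stub read, not the lvm2 functions):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for struct label_read_data. */
struct label_read_data {
	char *buf;	  /* data captured during the label scan */
	uint32_t buf_len; /* number of valid bytes in buf */
};

/* Stub for a synchronous dev_read(); fills a fixed pattern. */
static int dev_read_stub(uint64_t start, size_t len, void *out)
{
	(void)start;
	memset(out, 'D', len);
	return 1;
}

/*
 * Fetch 'len' bytes at device offset 'start': served from the
 * label-scan buffer when it covers the range (returns 2), else
 * via a fresh synchronous read (returns 1).
 */
static int read_area(const struct label_read_data *ld, uint64_t start,
		     size_t len, void *out)
{
	if (!ld || ld->buf_len < start + len)
		return dev_read_stub(start, len, out);
	memcpy(out, ld->buf + start, len);
	return 2;
}
```

The same `!ld || (ld->buf_len < start + size)` guard appears in `_raw_read_mda_header`, `_read_metadata_location_vg`, and `read_metadata_location_summary` above; callers that run outside the scan path simply pass a NULL `ld` and always take the read branch.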

@@ -49,7 +49,6 @@ struct text_vg_version_ops {
int (*check_version) (const struct dm_config_tree * cf);
struct volume_group *(*read_vg) (struct format_instance * fid,
const struct dm_config_tree *cf,
unsigned use_cached_pvs,
unsigned allow_lvmetad_extensions);
void (*read_desc) (struct dm_pool * mem, const struct dm_config_tree *cf,
time_t *when, char **desc);
@@ -68,23 +67,24 @@ int read_segtype_lvflags(uint64_t *status, char *segtype_str);
int text_vg_export_file(struct volume_group *vg, const char *desc, FILE *fp);
size_t text_vg_export_raw(struct volume_group *vg, const char *desc, char **buf);
struct volume_group *text_vg_import_file(struct format_instance *fid,
struct volume_group *text_read_metadata_file(struct format_instance *fid,
const char *file,
time_t *when, char **desc);
struct volume_group *text_vg_import_fd(struct format_instance *fid,
struct volume_group *text_read_metadata(struct format_instance *fid,
struct device *dev,
const char *file,
struct label_read_data *ld,
struct cached_vg_fmtdata **vg_fmtdata,
unsigned *use_previous_vg,
int single_device,
struct device *dev,
off_t offset, uint32_t size,
off_t offset2, uint32_t size2,
checksum_fn_t checksum_fn,
uint32_t checksum,
time_t *when, char **desc);
int text_vgsummary_import(const struct format_type *fmt,
int text_read_metadata_summary(const struct format_type *fmt,
struct device *dev,
struct label_read_data *ld,
off_t offset, uint32_t size,
off_t offset2, uint32_t size2,
checksum_fn_t checksum_fn,

View File

@@ -35,8 +35,9 @@ static void _init_text_import(void)
/*
* Find out vgname on a given device.
*/
int text_vgsummary_import(const struct format_type *fmt,
int text_read_metadata_summary(const struct format_type *fmt,
struct device *dev,
struct label_read_data *ld,
off_t offset, uint32_t size,
off_t offset2, uint32_t size2,
checksum_fn_t checksum_fn,
@@ -45,24 +46,57 @@ int text_vgsummary_import(const struct format_type *fmt,
{
struct dm_config_tree *cft;
struct text_vg_version_ops **vsn;
char *buf = NULL;
int r = 0;
if (ld) {
if (ld->buf_len >= (offset + size))
buf = ld->buf;
else {
/*
* Needs data beyond the end of the ld buffer.
* Will do a new synchronous read to get the data.
* (scan_size could also be made larger.)
*/
log_debug_metadata("label scan buffer for %s len %u does not include metadata at %llu size %u",
dev_name(dev), ld->buf_len, (unsigned long long)offset, size);
buf = NULL;
}
}
_init_text_import();
if (!(cft = config_open(CONFIG_FILE_SPECIAL, NULL, 0)))
return_0;
if ((!dev && !config_file_read(cft)) ||
(dev && !config_file_read_fd(cft, dev, offset, size,
if (dev) {
if (buf)
log_debug_metadata("Copying metadata summary for %s at %llu size %d (+%d)",
dev_name(dev), (unsigned long long)offset,
size, size2);
else
log_debug_metadata("Reading metadata summary from %s at %llu size %d (+%d)",
dev_name(dev), (unsigned long long)offset,
size, size2);
if (!config_file_read_fd(cft, dev, buf, offset, size,
offset2, size2, checksum_fn,
vgsummary->mda_checksum,
checksum_only, 1))) {
log_error("Couldn't read volume group metadata.");
goto out;
checksum_only, 1)) {
/* FIXME: handle errors */
log_error("Couldn't read volume group metadata from %s.", dev_name(dev));
goto out;
}
} else {
if (!config_file_read(cft)) {
log_error("Couldn't read volume group metadata from file.");
goto out;
}
}
if (checksum_only) {
/* Checksum matches already-cached content - no need to reparse. */
log_debug_metadata("Skipped parsing metadata on %s", dev_name(dev));
r = 1;
goto out;
}
@@ -91,12 +125,12 @@ struct cached_vg_fmtdata {
size_t cached_mda_size;
};
struct volume_group *text_vg_import_fd(struct format_instance *fid,
struct volume_group *text_read_metadata(struct format_instance *fid,
struct device *dev,
const char *file,
struct label_read_data *ld,
struct cached_vg_fmtdata **vg_fmtdata,
unsigned *use_previous_vg,
int single_device,
struct device *dev,
off_t offset, uint32_t size,
off_t offset2, uint32_t size2,
checksum_fn_t checksum_fn,
@@ -106,8 +140,18 @@ struct volume_group *text_vg_import_fd(struct format_instance *fid,
struct volume_group *vg = NULL;
struct dm_config_tree *cft;
struct text_vg_version_ops **vsn;
char *buf = NULL;
int skip_parse;
/*
* This struct holds the checksum and size of the VG metadata
* that was read from a previous device. When we read the VG
* metadata from this device, we can skip parsing it into a
* cft (saving time) if the checksum of the metadata buffer
* we read from this device matches the size/checksum saved in
* the mda_header/rlocn struct on this device, and matches the
* size/checksum from the previous device.
*/
if (vg_fmtdata && !*vg_fmtdata &&
!(*vg_fmtdata = dm_pool_zalloc(fid->mem, sizeof(**vg_fmtdata)))) {
log_error("Failed to allocate VG fmtdata for text format.");
@@ -127,15 +171,49 @@ struct volume_group *text_vg_import_fd(struct format_instance *fid,
((*vg_fmtdata)->cached_mda_checksum == checksum) &&
((*vg_fmtdata)->cached_mda_size == (size + size2));
if ((!dev && !config_file_read(cft)) ||
(dev && !config_file_read_fd(cft, dev, offset, size,
if (ld) {
if (ld->buf_len >= (offset + size))
buf = ld->buf;
else {
/*
* Needs data beyond the end of the ld buffer.
* Will do a new synchronous read to get the data.
* (scan_size could also be made larger.)
*/
log_debug_metadata("label scan buffer for %s len %u does not include metadata at %llu size %u",
dev_name(dev), ld->buf_len, (unsigned long long)offset, size);
buf = NULL;
}
}
if (dev) {
if (buf)
log_debug_metadata("Copying metadata for %s at %llu size %d (+%d)",
dev_name(dev), (unsigned long long)offset,
size, size2);
else
log_debug_metadata("Reading metadata from %s at %llu size %d (+%d)",
dev_name(dev), (unsigned long long)offset,
size, size2);
if (!config_file_read_fd(cft, dev, buf, offset, size,
offset2, size2, checksum_fn, checksum,
skip_parse, 1)))
goto_out;
skip_parse, 1)) {
/* FIXME: handle errors */
log_error("Couldn't read volume group metadata from %s.", dev_name(dev));
goto out;
}
} else {
if (!config_file_read(cft)) {
log_error("Couldn't read volume group metadata from file.");
goto out;
}
}
if (skip_parse) {
if (use_previous_vg)
*use_previous_vg = 1;
log_debug_metadata("Skipped parsing metadata on %s", dev_name(dev));
goto out;
}
@@ -146,7 +224,7 @@ struct volume_group *text_vg_import_fd(struct format_instance *fid,
if (!(*vsn)->check_version(cft))
continue;
if (!(vg = (*vsn)->read_vg(fid, cft, single_device, 0)))
if (!(vg = (*vsn)->read_vg(fid, cft, 0)))
goto_out;
(*vsn)->read_desc(vg->vgmem, cft, when, desc);
@@ -166,17 +244,20 @@ struct volume_group *text_vg_import_fd(struct format_instance *fid,
return vg;
}
struct volume_group *text_vg_import_file(struct format_instance *fid,
struct volume_group *text_read_metadata_file(struct format_instance *fid,
const char *file,
time_t *when, char **desc)
{
return text_vg_import_fd(fid, file, NULL, NULL, 0, NULL, (off_t)0, 0, (off_t)0, 0, NULL, 0,
return text_read_metadata(fid, NULL, file, NULL, NULL, NULL,
(off_t)0, 0, (off_t)0, 0,
NULL,
0,
when, desc);
}
static struct volume_group *_import_vg_from_config_tree(const struct dm_config_tree *cft,
struct format_instance *fid,
unsigned allow_lvmetad_extensions)
unsigned for_lvmetad)
{
struct volume_group *vg = NULL;
struct text_vg_version_ops **vsn;
@@ -191,7 +272,7 @@ static struct volume_group *_import_vg_from_config_tree(const struct dm_config_t
* The only path to this point uses cached vgmetadata,
* so it can use cached PV state too.
*/
if (!(vg = (*vsn)->read_vg(fid, cft, 1, allow_lvmetad_extensions)))
if (!(vg = (*vsn)->read_vg(fid, cft, for_lvmetad)))
stack;
else if ((vg_missing = vg_missing_pv_count(vg))) {
log_verbose("There are %d physical volumes missing.",

View File
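The `skip_parse`/`checksum_only` optimization in `text_read_metadata()` above avoids re-parsing a metadata copy whose checksum and size match what was already parsed from another device in the same VG. A self-contained sketch of the idea (hypothetical struct and a trivial stand-in checksum; lvm2 itself uses `calc_crc()`):

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>

/* Hypothetical stand-in for the cached_vg_fmtdata fields. */
struct cached_fmtdata {
	uint32_t cached_checksum;
	size_t cached_size;
	int valid;
};

/* Trivial stand-in checksum (lvm2 uses a CRC32, calc_crc()). */
static uint32_t checksum32(const char *buf, size_t len)
{
	uint32_t sum = 0;
	size_t i;

	for (i = 0; i < len; i++)
		sum = sum * 31u + (unsigned char)buf[i];
	return sum;
}

/*
 * Return 1 and set *use_previous when the buffer matches the copy
 * already parsed from another device, so the caller can reuse the
 * previous VG instead of parsing again; otherwise record the new
 * checksum/size and return 0 (full parse required).
 */
static int read_or_skip(struct cached_fmtdata *fd, const char *buf,
			size_t size, unsigned *use_previous)
{
	uint32_t csum = checksum32(buf, size);

	if (fd->valid && fd->cached_checksum == csum &&
	    fd->cached_size == size) {
		*use_previous = 1;	/* identical copy: skip the parse */
		return 1;
	}
	/* ...a real caller would parse the metadata into a config tree... */
	fd->cached_checksum = csum;
	fd->cached_size = size;
	fd->valid = 1;
	*use_previous = 0;
	return 0;
}
```

This is why `text_read_metadata()` can legitimately return NULL without an error: a NULL VG with `*use_previous` set means the parse was skipped, which the FIXME in `_vg_read_raw_area` notes is not yet distinguishable from a genuine read failure.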

@@ -32,9 +32,7 @@ typedef int (*section_fn) (struct format_instance * fid,
struct volume_group * vg, const struct dm_config_node * pvn,
const struct dm_config_node * vgn,
struct dm_hash_table * pv_hash,
struct dm_hash_table * lv_hash,
unsigned *scan_done_once,
unsigned report_missing_devices);
struct dm_hash_table * lv_hash);
#define _read_int32(root, path, result) \
dm_config_get_uint32(root, path, (uint32_t *) (result))
@@ -180,9 +178,7 @@ static int _read_pv(struct format_instance *fid,
struct volume_group *vg, const struct dm_config_node *pvn,
const struct dm_config_node *vgn __attribute__((unused)),
struct dm_hash_table *pv_hash,
struct dm_hash_table *lv_hash __attribute__((unused)),
unsigned *scan_done_once,
unsigned report_missing_devices)
struct dm_hash_table *lv_hash __attribute__((unused)))
{
struct dm_pool *mem = vg->vgmem;
struct physical_volume *pv;
@@ -220,16 +216,16 @@ static int _read_pv(struct format_instance *fid,
/*
* Convert the uuid into a device.
*/
if (!(pv->dev = lvmcache_device_from_pvid(fid->fmt->cmd, &pv->id, scan_done_once,
&pv->label_sector))) {
if (!(pv->dev = lvmcache_device_from_pvid(fid->fmt->cmd, &pv->id, &pv->label_sector))) {
char buffer[64] __attribute__((aligned(8)));
if (!id_write_format(&pv->id, buffer, sizeof(buffer)))
buffer[0] = '\0';
if (report_missing_devices)
if (fid->fmt->cmd && !fid->fmt->cmd->pvscan_cache_single)
log_error_once("Couldn't find device with uuid %s.", buffer);
else
log_very_verbose("Couldn't find device with uuid %s.", buffer);
log_debug_metadata("Couldn't find device with uuid %s.", buffer);
}
if (!(pv->vg_name = dm_pool_strdup(mem, vg->name)))
@@ -574,9 +570,7 @@ static int _read_lvnames(struct format_instance *fid __attribute__((unused)),
struct volume_group *vg, const struct dm_config_node *lvn,
const struct dm_config_node *vgn __attribute__((unused)),
struct dm_hash_table *pv_hash __attribute__((unused)),
struct dm_hash_table *lv_hash,
unsigned *scan_done_once __attribute__((unused)),
unsigned report_missing_devices __attribute__((unused)))
struct dm_hash_table *lv_hash)
{
struct dm_pool *mem = vg->vgmem;
struct logical_volume *lv;
@@ -731,9 +725,7 @@ static int _read_historical_lvnames(struct format_instance *fid __attribute__((u
struct volume_group *vg, const struct dm_config_node *hlvn,
const struct dm_config_node *vgn __attribute__((unused)),
struct dm_hash_table *pv_hash __attribute__((unused)),
struct dm_hash_table *lv_hash __attribute__((unused)),
unsigned *scan_done_once __attribute__((unused)),
unsigned report_missing_devices __attribute__((unused)))
struct dm_hash_table *lv_hash __attribute__((unused)))
{
struct dm_pool *mem = vg->vgmem;
struct generic_logical_volume *glv;
@@ -802,9 +794,7 @@ static int _read_historical_lvnames_interconnections(struct format_instance *fid
struct volume_group *vg, const struct dm_config_node *hlvn,
const struct dm_config_node *vgn __attribute__((unused)),
struct dm_hash_table *pv_hash __attribute__((unused)),
struct dm_hash_table *lv_hash __attribute__((unused)),
unsigned *scan_done_once __attribute__((unused)),
unsigned report_missing_devices __attribute__((unused)))
struct dm_hash_table *lv_hash __attribute__((unused)))
{
struct dm_pool *mem = vg->vgmem;
const char *historical_lv_name, *origin_name = NULL;
@@ -914,9 +904,7 @@ static int _read_lvsegs(struct format_instance *fid,
struct volume_group *vg, const struct dm_config_node *lvn,
const struct dm_config_node *vgn __attribute__((unused)),
struct dm_hash_table *pv_hash,
struct dm_hash_table *lv_hash,
unsigned *scan_done_once __attribute__((unused)),
unsigned report_missing_devices __attribute__((unused)))
struct dm_hash_table *lv_hash)
{
struct logical_volume *lv;
@@ -977,12 +965,9 @@ static int _read_sections(struct format_instance *fid,
struct volume_group *vg, const struct dm_config_node *vgn,
struct dm_hash_table *pv_hash,
struct dm_hash_table *lv_hash,
int optional,
unsigned *scan_done_once)
int optional)
{
const struct dm_config_node *n;
/* Only report missing devices when doing a scan */
unsigned report_missing_devices = scan_done_once ? !*scan_done_once : 1;
if (!dm_config_get_section(vgn, section, &n)) {
if (!optional) {
@@ -994,8 +979,7 @@ static int _read_sections(struct format_instance *fid,
}
for (n = n->child; n; n = n->sib) {
if (!fn(fid, vg, n, vgn, pv_hash, lv_hash,
scan_done_once, report_missing_devices))
if (!fn(fid, vg, n, vgn, pv_hash, lv_hash))
return_0;
}
@@ -1004,15 +988,13 @@ static int _read_sections(struct format_instance *fid,
static struct volume_group *_read_vg(struct format_instance *fid,
const struct dm_config_tree *cft,
unsigned use_cached_pvs,
unsigned allow_lvmetad_extensions)
unsigned for_lvmetad)
{
const struct dm_config_node *vgn;
const struct dm_config_value *cv;
const char *str, *format_str, *system_id;
struct volume_group *vg;
struct dm_hash_table *pv_hash = NULL, *lv_hash = NULL;
unsigned scan_done_once = use_cached_pvs;
uint64_t vgstatus;
/* skip any top-level values */
@@ -1167,15 +1149,15 @@ static struct volume_group *_read_vg(struct format_instance *fid,
}
if (!_read_sections(fid, "physical_volumes", _read_pv, vg,
vgn, pv_hash, lv_hash, 0, &scan_done_once)) {
vgn, pv_hash, lv_hash, 0)) {
log_error("Couldn't find all physical volumes for volume "
"group %s.", vg->name);
goto bad;
}
if (allow_lvmetad_extensions)
if (for_lvmetad)
_read_sections(fid, "outdated_pvs", _read_pv, vg,
vgn, pv_hash, lv_hash, 1, &scan_done_once);
vgn, pv_hash, lv_hash, 1);
else if (dm_config_has_node(vgn, "outdated_pvs"))
log_error(INTERNAL_ERROR "Unexpected outdated_pvs section in metadata of VG %s.", vg->name);
@@ -1187,28 +1169,28 @@ static struct volume_group *_read_vg(struct format_instance *fid,
}
if (!_read_sections(fid, "logical_volumes", _read_lvnames, vg,
vgn, pv_hash, lv_hash, 1, NULL)) {
vgn, pv_hash, lv_hash, 1)) {
log_error("Couldn't read all logical volume names for volume "
"group %s.", vg->name);
goto bad;
}
if (!_read_sections(fid, "historical_logical_volumes", _read_historical_lvnames, vg,
vgn, pv_hash, lv_hash, 1, NULL)) {
vgn, pv_hash, lv_hash, 1)) {
log_error("Couldn't read all historical logical volumes for volume "
"group %s.", vg->name);
goto bad;
}
if (!_read_sections(fid, "logical_volumes", _read_lvsegs, vg,
vgn, pv_hash, lv_hash, 1, NULL)) {
vgn, pv_hash, lv_hash, 1)) {
log_error("Couldn't read all logical volumes for "
"volume group %s.", vg->name);
goto bad;
}
if (!_read_sections(fid, "historical_logical_volumes", _read_historical_lvnames_interconnections,
vg, vgn, pv_hash, lv_hash, 1, NULL)) {
vg, vgn, pv_hash, lv_hash, 1)) {
log_error("Couldn't read all removed logical volume interconnections "
"for volume group %s.", vg->name);
goto bad;


@@ -81,7 +81,8 @@ struct mda_header {
} __attribute__ ((packed));
struct mda_header *raw_read_mda_header(const struct format_type *fmt,
struct device_area *dev_area);
struct device_area *dev_area,
struct label_read_data *ld);
struct mda_lists {
struct dm_list dirs;
@@ -103,7 +104,8 @@ struct mda_context {
#define LVM2_LABEL "LVM2 001"
#define MDA_SIZE_MIN (8 * (unsigned) lvm_getpagesize())
int vgname_from_mda(const struct format_type *fmt, struct mda_header *mdah,
int read_metadata_location_summary(const struct format_type *fmt, struct mda_header *mdah,
struct label_read_data *ld,
struct device_area *dev_area, struct lvmcache_vgsummary *vgsummary,
uint64_t *mda_free_sectors);


@@ -308,14 +308,22 @@ static int _text_initialise_label(struct labeller *l __attribute__((unused)),
return 1;
}
struct _update_mda_baton {
struct _mda_baton {
struct lvmcache_info *info;
struct label *label;
struct label_read_data *ld;
};
static int _update_mda(struct metadata_area *mda, void *baton)
/*
* FIXME: optimize reads when there is a second mda at the end
* of the PV. For the second mda we should also have a single
* large read covering mda_header and metadata, and we should
* be able to reuse it in vg_read.
*/
static int _read_mda_header_and_metadata(struct metadata_area *mda, void *baton)
{
struct _update_mda_baton *p = baton;
struct _mda_baton *p = baton;
const struct format_type *fmt = p->label->labeller->fmt;
struct mda_context *mdac = (struct mda_context *) mda->metadata_locn;
struct mda_header *mdah;
@@ -334,7 +342,7 @@ static int _update_mda(struct metadata_area *mda, void *baton)
return 1;
}
if (!(mdah = raw_read_mda_header(fmt, &mdac->area))) {
if (!(mdah = raw_read_mda_header(fmt, &mdac->area, p->ld))) {
stack;
goto close_dev;
}
@@ -350,7 +358,7 @@ static int _update_mda(struct metadata_area *mda, void *baton)
return 1;
}
if (vgname_from_mda(fmt, mdah, &mdac->area, &vgsummary,
if (read_metadata_location_summary(fmt, mdah, p->ld, &mdac->area, &vgsummary,
&mdac->free_sectors) &&
!lvmcache_update_vgname_and_id(p->info, &vgsummary)) {
if (!dev_close(mdac->area.dev))
@@ -365,22 +373,29 @@ close_dev:
return 1;
}
static int _text_read(struct labeller *l, struct device *dev, void *buf,
struct label **label)
/*
* When label_read_data *ld is set, it means that we have read the first
* ld->buf_len bytes of the device and already have that data, so we don't need
* to do any dev_read's (as long as the desired dev_read offset+size is less
than ld->buf_len).
*/
static int _text_read(struct labeller *l, struct device *dev, void *label_buf,
struct label_read_data *ld, struct label **label)
{
struct label_header *lh = (struct label_header *) buf;
struct label_header *lh = (struct label_header *) label_buf;
struct pv_header *pvhdr;
struct pv_header_extension *pvhdr_ext;
struct lvmcache_info *info;
struct disk_locn *dlocn_xl;
uint64_t offset;
uint32_t ext_version;
struct _update_mda_baton baton;
struct _mda_baton baton;
/*
* PV header base
*/
pvhdr = (struct pv_header *) ((char *) buf + xlate32(lh->offset_xl));
pvhdr = (struct pv_header *) ((char *) label_buf + xlate32(lh->offset_xl));
if (!(info = lvmcache_add(l, (char *)pvhdr->pv_uuid, dev,
FMT_TEXT_ORPHAN_VG_NAME,
@@ -436,9 +451,9 @@ static int _text_read(struct labeller *l, struct device *dev, void *buf,
out:
baton.info = info;
baton.label = *label;
baton.ld = ld;
if (!lvmcache_foreach_mda(info, _update_mda, &baton))
return_0;
lvmcache_foreach_mda(info, _read_mda_header_and_metadata, &baton);
lvmcache_make_valid(info);
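The hunk above renames `_update_mda` to `_read_mda_header_and_metadata` but keeps the same baton pattern: an opaque struct is threaded through `lvmcache_foreach_mda` so the callback can carry context across mdas. A minimal stand-alone sketch of that shape (all names here are hypothetical stand-ins, not the real lvm2 API):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the lvm2 types, for illustration only. */
struct metadata_area { int id; };
struct mda_baton { int count; int last_id; };

/* Shape of lvmcache_foreach_mda: walk each mda, passing an opaque
 * baton so the callback can accumulate state across calls. */
static int foreach_mda(struct metadata_area *mdas, size_t n,
                       int (*fn)(struct metadata_area *, void *),
                       void *baton)
{
	for (size_t i = 0; i < n; i++)
		if (!fn(&mdas[i], baton))
			return 0;
	return 1;
}

/* Analogue of _read_mda_header_and_metadata: record what was seen. */
static int read_one_mda(struct metadata_area *mda, void *baton)
{
	struct mda_baton *b = baton;
	b->count++;
	b->last_id = mda->id;
	return 1;
}
```

The baton avoids globals: `_text_read` fills it with the `info`, `label`, and `ld` pointers the per-mda callback needs.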

File diff suppressed because it is too large.


@@ -18,6 +18,7 @@
#include "uuid.h"
#include "device.h"
#include "toolcontext.h"
#define LABEL_ID "LABELONE"
#define LABEL_SIZE SECTOR_SIZE /* Think very carefully before changing this */
@@ -28,6 +29,22 @@ struct labeller;
void allow_reads_with_lvmetad(void);
struct label_read_data {
struct dev_async_io *aio;
char *buf; /* points to aio->buf */
struct device *dev;
struct dm_list list;
int buf_len; /* same as aio->buf_len */
int result; /* same as aio->result */
unsigned io_started:1;
unsigned io_done:1;
unsigned process_done:1;
unsigned try_sync:1;
unsigned mem_limit:1;
unsigned event_limit:1;
unsigned common_buf:1;
};
/* On disk - 32 bytes */
struct label_header {
int8_t id[8]; /* LABELONE */
@@ -63,7 +80,8 @@ struct label_ops {
* Read a label from a volume.
*/
int (*read) (struct labeller * l, struct device * dev,
void *buf, struct label ** label);
void *label_buf,
struct label_read_data *ld, struct label ** label);
/*
* Additional consistency checks for the paranoid.
@@ -99,11 +117,16 @@ int label_register_handler(struct labeller *handler);
struct labeller *label_get_handler(const char *name);
int label_remove(struct device *dev);
int label_read(struct device *dev, struct label **result,
uint64_t scan_sector);
int label_read(struct device *dev, struct label **label, uint64_t scan_sector);
int label_write(struct device *dev, struct label *label);
int label_verify(struct device *dev);
struct label *label_create(struct labeller *labeller);
void label_destroy(struct label *label);
int label_scan_force(struct cmd_context *cmd);
int label_scan(struct cmd_context *cmd);
int label_scan_devs(struct cmd_context *cmd, struct dm_list *devs);
struct label_read_data *get_label_read_data(struct cmd_context *cmd, struct device *dev);
void label_scan_destroy(struct cmd_context *cmd);
#endif
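The new `struct label_read_data` above holds one large up-front read of a device, so later metadata reads can be served from the buffer instead of issuing fresh `dev_read`s (as the comment before `_text_read` describes). A simplified sketch of that read-avoidance check, assuming a stripped-down two-field version of the struct:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical simplification of struct label_read_data: the first
 * buf_len bytes of the device, read once up front. */
struct label_read_data {
	char *buf;
	int buf_len;
};

/* If the requested range already lies inside ld->buf, serve it from
 * the buffer; otherwise report a miss so the caller falls back to a
 * real dev_read(). */
static int ld_cached_read(struct label_read_data *ld, uint64_t offset,
			  size_t len, char *out)
{
	if (!ld || offset + len > (uint64_t) ld->buf_len)
		return 0;
	memcpy(out, ld->buf + offset, len);
	return 1;
}
```

Callers like `raw_read_mda_header` receive `ld` (possibly NULL) and only touch the device when the range falls past `buf_len`.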


@@ -366,6 +366,20 @@ struct pv_segment {
*/
#define FMT_INSTANCE_PRIVATE_MDAS 0x00000008U
/*
* Each VG has its own fid struct. The fid for a VG describes where
* the metadata for that VG can be found. The lists hold mda locations.
*
* label scan finds the metadata locations (devs and offsets) for a VG,
* and saves this info in lvmcache vginfo/info lists.
*
* vg_read() then creates an fid for a given VG, and the mda locations
* from lvmcache are copied onto the fid lists. Those mda locations
* are read again by vg_read() to get VG metadata that is used to
* create the 'vg' struct.
*/
struct format_instance {
unsigned ref_count; /* Refs to this fid from VG and PV structs */
struct dm_pool *mem;
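The comment added above says the fid's job is to hold the mda locations for one VG, copied from lvmcache after the label scan. A toy sketch of that bookkeeping, using a hand-rolled singly linked list in place of the real `dm_list`/`struct metadata_area` (all names hypothetical):

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins: an mda location (device + offset) and the
 * list a fid keeps of them. */
struct mda_loc {
	int dev;
	uint64_t offset;
	struct mda_loc *next;
};

struct fid_sketch {
	struct mda_loc *metadata_areas_in_use;
};

/* Copying a scanned mda location onto the fid list, roughly what
 * create_instance() does from the lvmcache info structs. */
static void fid_add_mda(struct fid_sketch *fid, struct mda_loc *m)
{
	m->next = fid->metadata_areas_in_use;
	fid->metadata_areas_in_use = m;
}
```

vg_read() then iterates `metadata_areas_in_use` and reads exactly those areas to assemble the `vg` struct.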


@@ -314,7 +314,7 @@ struct physical_volume *pvcreate_vol(struct cmd_context *cmd, const char *pv_nam
}
if (pp->pva.idp) {
if ((dev = lvmcache_device_from_pvid(cmd, pp->pva.idp, NULL, NULL)) &&
if ((dev = lvmcache_device_from_pvid(cmd, pp->pva.idp, NULL)) &&
(dev != dev_cache_get(pv_name, cmd->full_filter))) {
if (!id_write_format((const struct id*)&pp->pva.idp->uuid,
buffer, sizeof(buffer)))


@@ -34,15 +34,15 @@
#include "lvmlockd.h"
#include "time.h"
#include "lvmnotify.h"
#include "label.h"
#include <math.h>
#include <sys/param.h>
static struct physical_volume *_pv_read(struct cmd_context *cmd,
struct dm_pool *pvmem,
const char *pv_name,
struct format_instance *fid,
uint32_t warn_flags, int scan_label_only);
const struct format_type *fmt,
struct volume_group *vg,
struct lvmcache_info *info);
static int _alignment_overrides_default(unsigned long data_alignment,
unsigned long default_pe_align)
@@ -331,37 +331,6 @@ bad:
return NULL;
}
int get_pv_from_vg_by_id(const struct format_type *fmt, const char *vg_name,
const char *vgid, const char *pvid,
struct physical_volume *pv)
{
struct volume_group *vg;
struct pv_list *pvl;
uint32_t warn_flags = WARN_PV_READ | WARN_INCONSISTENT;
int r = 0, consistent = 0;
if (!(vg = vg_read_internal(fmt->cmd, vg_name, vgid, warn_flags, &consistent))) {
log_error("get_pv_from_vg_by_id: vg_read_internal failed to read VG %s",
vg_name);
return 0;
}
dm_list_iterate_items(pvl, &vg->pvs) {
if (id_equal(&pvl->pv->id, (const struct id *) pvid)) {
if (!_copy_pv(fmt->cmd->mem, pv, pvl->pv)) {
log_error("internal PV duplication failed");
r = 0;
goto out;
}
r = 1;
goto out;
}
}
out:
release_vg(vg);
return r;
}
static int _move_pv(struct volume_group *vg_from, struct volume_group *vg_to,
const char *pv_name, int enforce_pv_from_source)
{
@@ -714,6 +683,10 @@ int check_pv_dev_sizes(struct volume_group *vg)
* source file. All the following and more are only used by liblvm:
*
* . get_pvs()
* . get_vgids()
* . get_vgnames()
* . lvmcache_get_vgids()
* . lvmcache_get_vgnames()
* . the vg->pvs_to_write list and pv_to_write struct
*/
@@ -3249,9 +3222,7 @@ static int _check_mda_in_use(struct metadata_area *mda, void *_in_use)
struct _vg_read_orphan_baton {
struct cmd_context *cmd;
struct volume_group *vg;
uint32_t warn_flags;
int consistent;
int repair;
const struct format_type *fmt;
};
/*
@@ -3348,8 +3319,7 @@ static int _vg_read_orphan_pv(struct lvmcache_info *info, void *baton)
uint32_t ext_version;
uint32_t ext_flags;
if (!(pv = _pv_read(b->vg->cmd, b->vg->vgmem, dev_name(lvmcache_device(info)),
b->vg->fid, b->warn_flags, 0))) {
if (!(pv = _pv_read(b->cmd, b->fmt, b->vg, info))) {
stack;
return 1;
}
@@ -3456,10 +3426,22 @@ static struct volume_group *_vg_read_orphans(struct cmd_context *cmd,
vg->free_count = 0;
baton.cmd = cmd;
baton.warn_flags = warn_flags;
baton.fmt = fmt;
baton.vg = vg;
baton.consistent = 1;
baton.repair = *consistent;
/*
* vg_read for a normal VG will rescan labels for all the devices
* in the VG, in case something changed on disk between the initial
* label scan and acquiring the VG lock. We don't rescan labels
* here because this is only called in two ways:
*
* 1. for reporting, in which case it doesn't matter if something
* changed between the label scan and printing the PVs here
*
* 2. pvcreate_each_device() for pvcreate/vgcreate/vgextend,
* which already does the label rescan after taking the
* orphan lock.
*/
while ((pvl = (struct pv_list *) dm_list_first(&head.list))) {
dm_list_del(&pvl->list);
@@ -3471,7 +3453,6 @@ static struct volume_group *_vg_read_orphans(struct cmd_context *cmd,
if (!lvmcache_foreach_pv(vginfo, _vg_read_orphan_pv, &baton))
return_NULL;
*consistent = baton.consistent;
return vg;
}
@@ -3792,7 +3773,6 @@ static struct volume_group *_vg_read(struct cmd_context *cmd,
struct dm_list *pvids;
struct pv_list *pvl;
struct dm_list all_pvs;
unsigned seqno = 0;
int reappeared = 0;
struct cached_vg_fmtdata *vg_fmtdata = NULL; /* Additional format-specific data about the vg */
unsigned use_previous_vg;
@@ -3809,7 +3789,7 @@ static struct volume_group *_vg_read(struct cmd_context *cmd,
}
if (lvmetad_used() && !use_precommitted) {
if ((correct_vg = lvmcache_get_vg(cmd, vgname, vgid, precommitted))) {
if ((correct_vg = lvmetad_vg_lookup(cmd, vgname, vgid))) {
dm_list_iterate_items(pvl, &correct_vg->pvs)
reappeared += _check_reappeared_pv(correct_vg, pvl->pv, *consistent);
if (reappeared && *consistent)
@@ -3840,36 +3820,27 @@ static struct volume_group *_vg_read(struct cmd_context *cmd,
}
/*
* If cached metadata was inconsistent and *consistent is set
* then repair it now. Otherwise just return it.
* Also return if use_precommitted is set due to the FIXME in
* the missing PV logic below.
* Rescan the devices that are associated with this vg in lvmcache.
* This repeats what was done by the command's initial label scan,
* but only the devices associated with this VG.
*
* The lvmcache info about these devs is from the initial label scan
* performed by the command before the vg lock was held. Now the VG
* lock is held, so we rescan all the info from the devs in case
* something changed between the initial scan and now that the lock
* is held.
*/
if ((correct_vg = lvmcache_get_vg(cmd, vgname, vgid, precommitted)) &&
(use_precommitted || !*consistent)) {
*consistent = 1;
return correct_vg;
} else {
if (correct_vg && correct_vg->seqno > seqno)
seqno = correct_vg->seqno;
release_vg(correct_vg);
correct_vg = NULL;
log_debug_metadata("Reading VG rereading labels for %s", vgname);
if (!lvmcache_label_rescan_vg(cmd, vgname, vgid)) {
/* The VG wasn't found, so force a full label scan. */
lvmcache_force_next_label_scan();
lvmcache_label_scan(cmd);
}
/* Find the vgname in the cache */
/* If it's not there we must do full scan to be completely sure */
if (!(fmt = lvmcache_fmt_from_vgname(cmd, vgname, vgid, 1))) {
lvmcache_label_scan(cmd);
if (!(fmt = lvmcache_fmt_from_vgname(cmd, vgname, vgid, 1))) {
/* Independent MDAs aren't supported under low memory */
if (!cmd->independent_metadata_areas && critical_section())
return_NULL;
lvmcache_force_next_label_scan();
lvmcache_label_scan(cmd);
if (!(fmt = lvmcache_fmt_from_vgname(cmd, vgname, vgid, 0)))
return_NULL;
}
if (!(fmt = lvmcache_fmt_from_vgname(cmd, vgname, vgid, 0))) {
log_debug_metadata("Cache did not find fmt for vgname %s", vgname);
return_NULL;
}
/* Now determine the correct vgname if none was supplied */
@@ -3887,6 +3858,36 @@ static struct volume_group *_vg_read(struct cmd_context *cmd,
if (use_precommitted && !(fmt->features & FMT_PRECOMMIT))
use_precommitted = 0;
/*
* A "format instance" is an abstraction for a VG location,
* i.e. where a VG's metadata exists on disk.
*
* An fic (format_instance_ctx) is a temporary struct used
* to create an fid (format_instance). The fid hangs around
* and is used to create a 'vg' to which it is connected (vg->fid).
*
* The 'fic' describes a VG in terms of fmt/name/id.
*
* The 'fid' describes a VG in more detail than the fic,
* holding information about where to find the VG metadata.
*
* The 'vg' describes the VG in the most detail representing
* all the VG metadata.
*
* The fic and fid are set up by create_instance() to describe
* the VG location. This happens before the VG metadata is
* assembled into the more familiar struct volume_group "vg".
*
* The fid has one main purpose: to keep track of the metadata
* locations for a given VG. It does this by putting 'mda'
* structs on fid->metadata_areas_in_use, which specify where
* metadata is located on disk. It gets this information
* (metadata locations for a specific VG) from the command's
* initial label scan. The info is passed indirectly via
* lvmcache info/vginfo structs, which are created by the
* label scan and then copied into fid by create_instance().
*/
/* create format instance with appropriate metadata area */
fic.type = FMT_INSTANCE_MDAS | FMT_INSTANCE_AUX_MDAS;
fic.context.vg_ref.vg_name = vgname;
@@ -3910,12 +3911,17 @@ static struct volume_group *_vg_read(struct cmd_context *cmd,
/* Ensure contents of all metadata areas match - else do recovery */
inconsistent_mda_count=0;
dm_list_iterate_items(mda, &fid->metadata_areas_in_use) {
struct device *mda_dev = mda_get_device(mda);
struct label_read_data *ld;
use_previous_vg = 0;
if ((use_precommitted &&
!(vg = mda->ops->vg_read_precommit(fid, vgname, mda, &vg_fmtdata, &use_previous_vg)) && !use_previous_vg) ||
(!use_precommitted &&
!(vg = mda->ops->vg_read(fid, vgname, mda, &vg_fmtdata, &use_previous_vg, 0)) && !use_previous_vg)) {
log_debug_metadata("Reading VG %s from %s", vgname, dev_name(mda_dev));
ld = get_label_read_data(cmd, mda_dev);
if ((use_precommitted && !(vg = mda->ops->vg_read_precommit(fid, vgname, mda, ld, &vg_fmtdata, &use_previous_vg)) && !use_previous_vg) ||
(!use_precommitted && !(vg = mda->ops->vg_read(fid, vgname, mda, ld, &vg_fmtdata, &use_previous_vg)) && !use_previous_vg)) {
inconsistent = 1;
vg_fmtdata = NULL;
continue;
@@ -4105,9 +4111,9 @@ static struct volume_group *_vg_read(struct cmd_context *cmd,
use_previous_vg = 0;
if ((use_precommitted &&
!(vg = mda->ops->vg_read_precommit(fid, vgname, mda, &vg_fmtdata, &use_previous_vg)) && !use_previous_vg) ||
!(vg = mda->ops->vg_read_precommit(fid, vgname, mda, NULL, &vg_fmtdata, &use_previous_vg)) && !use_previous_vg) ||
(!use_precommitted &&
!(vg = mda->ops->vg_read(fid, vgname, mda, &vg_fmtdata, &use_previous_vg, 0)) && !use_previous_vg)) {
!(vg = mda->ops->vg_read(fid, vgname, mda, NULL, &vg_fmtdata, &use_previous_vg)) && !use_previous_vg)) {
inconsistent = 1;
vg_fmtdata = NULL;
continue;
@@ -4500,21 +4506,10 @@ static struct volume_group *_vg_read_by_vgid(struct cmd_context *cmd,
unsigned precommitted)
{
const char *vgname;
struct dm_list *vgnames;
struct volume_group *vg;
struct dm_str_list *strl;
uint32_t warn_flags = WARN_PV_READ | WARN_INCONSISTENT;
int consistent = 0;
/* Is corresponding vgname already cached? */
if (lvmcache_vgid_is_cached(vgid)) {
if ((vg = _vg_read(cmd, NULL, vgid, warn_flags, &consistent, precommitted)) &&
id_equal(&vg->id, (const struct id *)vgid)) {
return vg;
}
release_vg(vg);
}
/*
* When using lvmlockd we should never reach this point.
* The VG is locked, then vg_read() is done, which gets
@@ -4527,36 +4522,28 @@ static struct volume_group *_vg_read_by_vgid(struct cmd_context *cmd,
/* Mustn't scan if memory locked: ensure cache gets pre-populated! */
if (critical_section())
return_NULL;
log_debug_metadata("Reading VG by vgid in critical section pre %d vgid %.8s", precommitted, vgid);
/* FIXME Need a genuine read by ID here - don't vg_read_internal by name! */
/* FIXME Disabled vgrenames while active for now because we aren't
* allowed to do a full scan here any more. */
if (!(vgname = lvmcache_vgname_from_vgid(cmd->mem, vgid))) {
log_debug_metadata("Reading VG by vgid %.8s no VG name found, retrying.", vgid);
lvmcache_destroy(cmd, 0, 0);
lvmcache_force_next_label_scan();
lvmcache_label_scan(cmd);
}
// The slow way - full scan required to cope with vgrename
lvmcache_force_next_label_scan();
lvmcache_label_scan(cmd);
if (!(vgnames = get_vgnames(cmd, 0))) {
log_error("vg_read_by_vgid: get_vgnames failed");
if (!(vgname = lvmcache_vgname_from_vgid(cmd->mem, vgid))) {
log_debug_metadata("Reading VG by vgid %.8s no VG name found.", vgid);
return NULL;
}
dm_list_iterate_items(strl, vgnames) {
vgname = strl->str;
if (!vgname)
continue; // FIXME Unnecessary?
consistent = 0;
if ((vg = _vg_read(cmd, vgname, vgid, warn_flags, &consistent, precommitted)) &&
id_equal(&vg->id, (const struct id *)vgid)) {
if (!consistent) {
release_vg(vg);
return NULL;
}
return vg;
}
release_vg(vg);
consistent = 0;
if ((vg = _vg_read(cmd, vgname, vgid, warn_flags, &consistent, precommitted))) {
/* Does it matter if consistent is 0 or 1? */
return vg;
}
log_debug_metadata("Reading VG by vgid %.8s not found.", vgid);
return NULL;
}
@@ -4572,7 +4559,7 @@ struct logical_volume *lv_from_lvid(struct cmd_context *cmd, const char *lvid_s,
log_very_verbose("Finding %svolume group for uuid %s", precommitted ? "precommitted " : "", lvid_s);
if (!(vg = _vg_read_by_vgid(cmd, (const char *)lvid->id[0].uuid, precommitted))) {
log_error("Volume group for uuid not found: %s", lvid_s);
log_error("Reading VG not found for LVID %s", lvid_s);
return NULL;
}
@@ -4641,86 +4628,40 @@ const char *find_vgname_from_pvname(struct cmd_context *cmd,
return find_vgname_from_pvid(cmd, pvid);
}
/* FIXME Use label functions instead of PV functions */
static struct physical_volume *_pv_read(struct cmd_context *cmd,
struct dm_pool *pvmem,
const char *pv_name,
struct format_instance *fid,
uint32_t warn_flags, int scan_label_only)
const struct format_type *fmt,
struct volume_group *vg,
struct lvmcache_info *info)
{
struct physical_volume *pv;
struct label *label;
struct lvmcache_info *info;
struct device *dev;
const struct format_type *fmt;
int found;
struct device *dev = lvmcache_device(info);
if (!(dev = dev_cache_get(pv_name, cmd->filter)))
return_NULL;
if (lvmetad_used()) {
info = lvmcache_info_from_pvid(dev->pvid, dev, 0);
if (!info) {
if (!lvmetad_pv_lookup_by_dev(cmd, dev, &found))
return_NULL;
if (!found) {
if (warn_flags & WARN_PV_READ)
log_error("No physical volume found in lvmetad cache for %s",
pv_name);
return NULL;
}
if (!(info = lvmcache_info_from_pvid(dev->pvid, dev, 0))) {
if (warn_flags & WARN_PV_READ)
log_error("No cache info in lvmetad cache for %s.",
pv_name);
return NULL;
}
}
label = lvmcache_get_label(info);
} else {
if (!(label_read(dev, &label, UINT64_C(0)))) {
if (warn_flags & WARN_PV_READ)
log_error("No physical volume label read from %s",
pv_name);
return NULL;
}
info = (struct lvmcache_info *) label->info;
}
fmt = lvmcache_fmt(info);
pv = _alloc_pv(pvmem, dev);
if (!pv) {
log_error("pv allocation for '%s' failed", pv_name);
if (!(pv = _alloc_pv(vg->vgmem, NULL))) {
log_error("pv allocation failed");
return NULL;
}
pv->label_sector = label->sector;
/* FIXME Move more common code up here */
if (!(lvmcache_fmt(info)->ops->pv_read(lvmcache_fmt(info), pv_name, pv, scan_label_only))) {
log_error("Failed to read existing physical volume '%s'",
pv_name);
goto bad;
if (fmt->ops->pv_read) {
/* format1 and pool */
if (!(fmt->ops->pv_read(fmt, dev_name(dev), pv, 0))) {
log_error("Failed to read existing physical volume '%s'", dev_name(dev));
goto bad;
}
} else {
/* format text */
if (!lvmcache_populate_pv_fields(info, vg, pv))
goto_bad;
}
if (!pv->size)
goto bad;
if (!alloc_pv_segment_whole_pv(pvmem, pv))
if (!alloc_pv_segment_whole_pv(vg->vgmem, pv))
goto_bad;
if (fid)
lvmcache_fid_add_mdas(info, fid, (const char *) &pv->id, ID_LEN);
else {
lvmcache_fid_add_mdas(info, fmt->orphan_vg->fid, (const char *) &pv->id, ID_LEN);
pv_set_fid(pv, fmt->orphan_vg->fid);
}
lvmcache_fid_add_mdas(info, vg->fid, (const char *) &pv->id, ID_LEN);
pv_set_fid(pv, vg->fid);
return pv;
bad:
free_pv_fid(pv);
dm_pool_free(pvmem, pv);
dm_pool_free(vg->vgmem, pv);
return NULL;
}


@@ -25,6 +25,8 @@
#include "dev-cache.h"
#include "lvm-string.h"
#include "metadata-exported.h"
#include "lvm-logging.h"
#include "label.h"
//#define MAX_STRIPES 128U
//#define SECTOR_SHIFT 9L
@@ -79,12 +81,13 @@ struct metadata_area_ops {
struct volume_group *(*vg_read) (struct format_instance * fi,
const char *vg_name,
struct metadata_area * mda,
struct label_read_data *ld,
struct cached_vg_fmtdata **vg_fmtdata,
unsigned *use_previous_vg,
int single_device);
unsigned *use_previous_vg);
struct volume_group *(*vg_read_precommit) (struct format_instance * fi,
const char *vg_name,
struct metadata_area * mda,
struct label_read_data *ld,
struct cached_vg_fmtdata **vg_fmtdata,
unsigned *use_previous_vg);
/*
@@ -176,6 +179,11 @@ void mda_set_ignored(struct metadata_area *mda, unsigned mda_ignored);
unsigned mda_locns_match(struct metadata_area *mda1, struct metadata_area *mda2);
struct device *mda_get_device(struct metadata_area *mda);
/*
* fic is used to create an fid. It's used to pass fmt/vgname/vgid args
* to create_instance() which creates an fid for the specified vg.
*/
struct format_instance_ctx {
uint32_t type;
union {
@@ -360,12 +368,6 @@ uint32_t vg_bad_status_bits(const struct volume_group *vg, uint64_t status);
int add_pv_to_vg(struct volume_group *vg, const char *pv_name,
struct physical_volume *pv, int new_pv);
/* Find a PV within a given VG */
int get_pv_from_vg_by_id(const struct format_type *fmt, const char *vg_name,
const char *vgid, const char *pvid,
struct physical_volume *pv);
struct logical_volume *find_lv_in_vg_by_lvid(struct volume_group *vg,
const union lvid *lvid);


@@ -97,11 +97,6 @@ void release_vg(struct volume_group *vg)
if (!vg || (vg->fid && vg == vg->fid->fmt->orphan_vg))
return;
/* Check if there are any vginfo holders */
if (vg->vginfo &&
!lvmcache_vginfo_holders_dec_and_test_for_zero(vg->vginfo))
return;
release_vg(vg->vg_committed);
release_vg(vg->vg_precommitted);
if (vg->cft_precommitted)


@@ -54,8 +54,6 @@ static int _activation_checks = 0;
static char _sysfs_dir_path[PATH_MAX] = "";
static int _dev_disable_after_error_count = DEFAULT_DISABLE_AFTER_ERROR_COUNT;
static uint64_t _pv_min_size = (DEFAULT_PV_MIN_SIZE_KB * 1024L >> SECTOR_SHIFT);
static int _detect_internal_vg_cache_corruption =
DEFAULT_DETECT_INTERNAL_VG_CACHE_CORRUPTION;
static const char *_unknown_device_name = DEFAULT_UNKNOWN_DEVICE_NAME;
void init_verbose(int level)
@@ -198,11 +196,6 @@ void init_pv_min_size(uint64_t sectors)
_pv_min_size = sectors;
}
void init_detect_internal_vg_cache_corruption(int detect)
{
_detect_internal_vg_cache_corruption = detect;
}
void set_cmd_name(const char *cmd)
{
strncpy(_cmd_name, cmd, sizeof(_cmd_name) - 1);
@@ -387,11 +380,6 @@ uint64_t pv_min_size(void)
return _pv_min_size;
}
int detect_internal_vg_cache_corruption(void)
{
return _detect_internal_vg_cache_corruption;
}
const char *unknown_device_name(void)
{
return _unknown_device_name;


@@ -51,7 +51,6 @@ void init_udev_checking(int checking);
void init_dev_disable_after_error_count(int value);
void init_pv_min_size(uint64_t sectors);
void init_activation_checks(int checks);
void init_detect_internal_vg_cache_corruption(int detect);
void init_retry_deactivation(int retry);
void init_unknown_device_name(const char *name);
@@ -85,7 +84,6 @@ int udev_checking(void);
const char *sysfs_dir_path(void);
uint64_t pv_min_size(void);
int activation_checks(void);
int detect_internal_vg_cache_corruption(void);
int retry_deactivation(void);
const char *unknown_device_name(void);


@@ -45,6 +45,10 @@ include $(top_builddir)/make.tmpl
LDFLAGS += -L$(top_builddir)/lib -L$(top_builddir)/daemons/dmeventd
LIBS += $(LVMINTERNAL_LIBS) -ldevmapper
ifeq ("@AIO@", "yes")
LIBS += $(AIO_LIBS)
endif
.PHONY: install_dynamic install_static install_include install_pkgconfig
INSTALL_TYPE = install_dynamic


@@ -64,6 +64,7 @@ LDDEPS += @LDDEPS@
LIB_SUFFIX = @LIB_SUFFIX@
LVMINTERNAL_LIBS = -llvm-internal $(DMEVENT_LIBS) $(DAEMON_LIBS) $(SYSTEMD_LIBS) $(UDEV_LIBS) $(DL_LIBS) $(BLKID_LIBS)
DL_LIBS = @DL_LIBS@
AIO_LIBS = @AIO_LIBS@
RT_LIBS = @RT_LIBS@
M_LIBS = @M_LIBS@
PTHREAD_LIBS = @PTHREAD_LIBS@


@@ -31,6 +31,10 @@ endif
LVMLIBS = @LVM2APP_LIB@ -ldevmapper
endif
ifeq ("@AIO@", "yes")
LVMLIBS += $(AIO_LIBS)
endif
LVM_SCRIPTS = lvmdump.sh lvmconf.sh
DM_SCRIPTS =


@@ -109,6 +109,10 @@ ifeq ("@CMDLIB@", "yes")
INSTALL_LVM_TARGETS += $(INSTALL_CMDLIB_TARGETS)
endif
ifeq ("@AIO@", "yes")
LVMLIBS += $(AIO_LIBS)
endif
EXPORTED_HEADER = $(srcdir)/lvm2cmd.h
EXPORTED_FN_PREFIX = lvm2


@@ -43,7 +43,7 @@ xx(lastlog,
xx(lvchange,
"Change the attributes of logical volume(s)",
CACHE_VGMETADATA | PERMITTED_READ_ONLY)
PERMITTED_READ_ONLY)
xx(lvconvert,
"Change logical volume layout",
@@ -127,7 +127,7 @@ xx(pvdata,
xx(pvdisplay,
"Display various attributes of physical volume(s)",
CACHE_VGMETADATA | PERMITTED_READ_ONLY | ENABLE_ALL_DEVS | ENABLE_DUPLICATE_DEVS | LOCKD_VG_SH)
PERMITTED_READ_ONLY | ENABLE_ALL_DEVS | ENABLE_DUPLICATE_DEVS | LOCKD_VG_SH)
/* ALL_VGS_IS_DEFAULT is for polldaemon to find pvmoves in-progress using process_each_vg. */
@@ -145,7 +145,7 @@ xx(pvremove,
xx(pvs,
"Display information about physical volumes",
CACHE_VGMETADATA | PERMITTED_READ_ONLY | ALL_VGS_IS_DEFAULT | ENABLE_ALL_DEVS | ENABLE_DUPLICATE_DEVS | LOCKD_VG_SH)
PERMITTED_READ_ONLY | ALL_VGS_IS_DEFAULT | ENABLE_ALL_DEVS | ENABLE_DUPLICATE_DEVS | LOCKD_VG_SH)
xx(pvscan,
"List all physical volumes",
@@ -173,7 +173,7 @@ xx(vgcfgrestore,
xx(vgchange,
"Change volume group attributes",
CACHE_VGMETADATA | PERMITTED_READ_ONLY | ALL_VGS_IS_DEFAULT)
PERMITTED_READ_ONLY | ALL_VGS_IS_DEFAULT)
xx(vgck,
"Check the consistency of volume group(s)",


@@ -2280,7 +2280,6 @@ static int _get_current_settings(struct cmd_context *cmd)
cmd->current_settings.archive = arg_int_value(cmd, autobackup_ARG, cmd->current_settings.archive);
cmd->current_settings.backup = arg_int_value(cmd, autobackup_ARG, cmd->current_settings.backup);
cmd->current_settings.cache_vgmetadata = cmd->cname->flags & CACHE_VGMETADATA ? 1 : 0;
if (arg_is_set(cmd, readonly_ARG)) {
cmd->current_settings.activation = 0;


@@ -300,8 +300,10 @@ static int _pvscan_autoactivate(struct cmd_context *cmd, struct pvscan_aa_params
static int _pvscan_cache(struct cmd_context *cmd, int argc, char **argv)
{
struct pvscan_aa_params pp = { 0 };
struct dm_list single_devs;
struct dm_list found_vgnames;
struct device *dev;
struct device_list *devl;
const char *pv_name;
const char *reason = NULL;
int32_t major = -1;
@@ -434,8 +436,12 @@ static int _pvscan_cache(struct cmd_context *cmd, int argc, char **argv)
* to drop any devices that have left.)
*/
if (argc || devno_args)
if (argc || devno_args) {
log_verbose("Scanning devices on command line.");
cmd->pvscan_cache_single = 1;
}
dm_list_init(&single_devs);
while (argc--) {
pv_name = *argv++;
@@ -453,8 +459,11 @@ static int _pvscan_cache(struct cmd_context *cmd, int argc, char **argv)
} else {
/* Add device path to lvmetad. */
log_debug("Scanning dev %s for lvmetad cache.", pv_name);
if (!lvmetad_pvscan_single(cmd, dev, &found_vgnames, &pp.changed_vgnames))
add_errors++;
if (!(devl = dm_pool_zalloc(cmd->mem, sizeof(*devl))))
return_0;
devl->dev = dev;
dm_list_add(&single_devs, &devl->list);
}
} else {
if (sscanf(pv_name, "%d:%d", &major, &minor) != 2) {
@@ -471,8 +480,11 @@ static int _pvscan_cache(struct cmd_context *cmd, int argc, char **argv)
} else {
/* Add major:minor to lvmetad. */
log_debug("Scanning dev %d:%d for lvmetad cache.", major, minor);
-if (!lvmetad_pvscan_single(cmd, dev, &found_vgnames, &pp.changed_vgnames))
-add_errors++;
+if (!(devl = dm_pool_zalloc(cmd->mem, sizeof(*devl))))
+return_0;
+devl->dev = dev;
+dm_list_add(&single_devs, &devl->list);
}
}
@@ -482,9 +494,20 @@ static int _pvscan_cache(struct cmd_context *cmd, int argc, char **argv)
}
}
+if (!dm_list_empty(&single_devs)) {
+label_scan_devs(cmd, &single_devs);
+dm_list_iterate_items(devl, &single_devs) {
+if (!lvmetad_pvscan_single(cmd, devl->dev, &found_vgnames, &pp.changed_vgnames))
+add_errors++;
+}
+}
if (!devno_args)
goto activate;
+dm_list_init(&single_devs);
/* Process any grouped --major --minor args */
dm_list_iterate_items(current_group, &cmd->arg_value_groups) {
major = grouped_arg_int_value(current_group->arg_values, major_ARG, major);
@@ -503,8 +526,11 @@ static int _pvscan_cache(struct cmd_context *cmd, int argc, char **argv)
} else {
/* Add major:minor to lvmetad. */
log_debug("Scanning dev %d:%d for lvmetad cache.", major, minor);
-if (!lvmetad_pvscan_single(cmd, dev, &found_vgnames, &pp.changed_vgnames))
-add_errors++;
+if (!(devl = dm_pool_zalloc(cmd->mem, sizeof(*devl))))
+return_0;
+devl->dev = dev;
+dm_list_add(&single_devs, &devl->list);
}
if (sigint_caught()) {
@@ -513,6 +539,15 @@ static int _pvscan_cache(struct cmd_context *cmd, int argc, char **argv)
}
}
+if (!dm_list_empty(&single_devs)) {
+label_scan_devs(cmd, &single_devs);
+dm_list_iterate_items(devl, &single_devs) {
+if (!lvmetad_pvscan_single(cmd, devl->dev, &found_vgnames, &pp.changed_vgnames))
+add_errors++;
+}
+}
/*
* In the process of scanning devices, lvmetad may have become
* disabled. If so, revert to scanning for the autoactivation step.

View File

@@ -2216,14 +2216,10 @@ int process_each_vg(struct cmd_context *cmd,
}
/*
-* First rescan for available devices, then force the next
-* label scan to be done. get_vgnameids() will scan labels
-* (when not using lvmetad).
+* Scan all devices to populate lvmcache with initial
+* list of PVs and VGs.
*/
-if (cmd->cname->flags & REQUIRES_FULL_LABEL_SCAN) {
-dev_cache_full_scan(cmd->full_filter);
-lvmcache_force_next_label_scan();
-}
+lvmcache_label_scan(cmd);
/*
* A list of all VGs on the system is needed when:
@@ -3727,6 +3723,12 @@ int process_each_lv(struct cmd_context *cmd,
goto_out;
}
+/*
+* Scan all devices to populate lvmcache with initial
+* list of PVs and VGs.
+*/
+lvmcache_label_scan(cmd);
/*
* A list of all VGs on the system is needed when:
* . processing all VGs on the system
@@ -4436,7 +4438,12 @@ int process_each_pv(struct cmd_context *cmd,
if (!trust_cache() && !orphans_locked) {
log_debug("Scanning for available devices");
lvmcache_destroy(cmd, 1, 0);
-dev_cache_full_scan(cmd->full_filter);
+/*
+* Scan all devices to populate lvmcache with initial
+* list of PVs and VGs.
+*/
+lvmcache_label_scan(cmd);
}
if (!get_vgnameids(cmd, &all_vgnameids, only_this_vgname, 1)) {
@@ -5450,6 +5457,8 @@ int pvcreate_each_device(struct cmd_context *cmd,
dev_cache_full_scan(cmd->full_filter);
+lvmcache_label_scan(cmd);
/*
* Translate arg names into struct device's.
*/
@@ -5604,6 +5613,8 @@ int pvcreate_each_device(struct cmd_context *cmd,
goto out;
}
+lvmcache_label_scan(cmd);
/*
* The device args began on the arg_devices list, then the first check
* loop moved those entries to arg_process as they were found. Devices

View File

@@ -113,7 +113,6 @@ struct arg_value_group_list {
uint32_t prio;
};
-#define CACHE_VGMETADATA 0x00000001
#define PERMITTED_READ_ONLY 0x00000002
/* Process all VGs if none specified on the command line. */
#define ALL_VGS_IS_DEFAULT 0x00000004

View File

@@ -74,6 +74,8 @@ int vgcfgrestore(struct cmd_context *cmd, int argc, char **argv)
return ECMD_FAILED;
}
+lvmcache_label_scan(cmd);
cmd->handles_unknown_segments = 1;
if (!(arg_is_set(cmd, file_ARG) ?

View File

@@ -26,7 +26,6 @@ int vgcreate(struct cmd_context *cmd, int argc, char **argv)
const char *clustered_message = "";
char *vg_name;
struct arg_value_group_list *current_group;
-uint32_t rc;
if (!argc) {
log_error("Please provide volume group name and "
@@ -66,17 +65,30 @@ int vgcreate(struct cmd_context *cmd, int argc, char **argv)
return_ECMD_FAILED;
cmd->lockd_gl_disable = 1;
-lvmcache_seed_infos_from_lvmetad(cmd);
/*
* Check if the VG name already exists. This should be done before
* creating PVs on any of the devices.
+*
+* When searching if a VG name exists, acquire the VG lock,
+* then do the initial label scan which reads all devices and
+* populates lvmcache with any VG name it finds. If the VG name
+* we want to use exists, then the label scan will find it,
+* and the fmt_from_vgname call (used to check if the name exists)
+* will return non-NULL.
*/
-if ((rc = vg_lock_newname(cmd, vp_new.vg_name)) != SUCCESS) {
-if (rc == FAILED_EXIST)
-log_error("A volume group called %s already exists.", vp_new.vg_name);
-else
-log_error("Can't get lock for %s.", vp_new.vg_name);
+if (!lock_vol(cmd, vp_new.vg_name, LCK_VG_WRITE, NULL)) {
+log_error("Can't get lock for %s.", vp_new.vg_name);
return ECMD_FAILED;
}
+lvmcache_force_next_label_scan();
+lvmcache_label_scan(cmd); /* Does nothing when using lvmetad. */
+lvmcache_seed_infos_from_lvmetad(cmd); /* Does nothing unless using lvmetad. */
+if (lvmcache_fmt_from_vgname(cmd, vp_new.vg_name, NULL, 0)) {
+unlock_vg(cmd, NULL, vp_new.vg_name);
+log_error("A volume group called %s already exists.", vp_new.vg_name);
+return ECMD_FAILED;
+}