mirror: Add deprecation warning for mirrored log

test: mirrored mirrorlog is not supposed to work in cluster
make: Fix typo
2025-09-15 13:44:18 +03:00 · 2018-02-14 12:53:51 +01:00 · 2018-02-14 12:33:56 +01:00 · 2018-02-13 17:26:39 +01:00 · 2018-02-13 17:26:39 +01:00 · 2018-02-09 11:00:18 +01:00
701 changed files with 21938 additions and 17186 deletions
--- a/13
+++ b/13
@@ -8,10 +8,15 @@ There is no warranty - see COPYING and COPYING.LIB.
 Tarballs are available from:
  ftp://sourceware.org/pub/lvm2/
  ftp://sources.redhat.com/pub/lvm2/
+  https://github.com/lvmteam/lvm2/releases

 The source code is stored in git:
  https://sourceware.org/git/?p=lvm2.git
  git clone git://sourceware.org/git/lvm2.git
+mirrored to:
+  https://github.com/lvmteam/lvm2
+  git clone https://github.com/lvmteam/lvm2.git
+  git clone git@github.com:lvmteam/lvm2.git

 Mailing list for general discussion related to LVM2:
  linux-lvm@redhat.com
@@ -29,6 +34,14 @@ and multipath-tools:
  dm-devel@redhat.com
  Subscribe from https://www.redhat.com/mailman/listinfo/dm-devel

+Website:
+  https://sourceware.org/lvm2/
+
+Report upstream bugs at:
+  https://bugzilla.redhat.com/enter_bug.cgi?product=LVM%20and%20device-mapper
+or open issues at:
+  https://github.com/lvmteam/lvm2/issues
+
 The source code repository used until 7th June 2012 is accessible here:
  http://sources.redhat.com/cgi-bin/cvsweb.cgi/LVM2/?cvsroot=lvm2.

--- a/62
+++ b/62
@@ -0,0 +1,62 @@
+LVM2 Test Suite
+===============
+
+The codebase contains many tests in the test subdirectory.
+
+Before running tests
+--------------------
+
+Keep in mind that the testsuite MUST be run as the root user.
+
+It is recommended not to use LVM on the test machine, especially when running
+tests with udev (`make check_system`).
+
+You MUST disable (or mask) any LVM daemons:
+
+- lvmetad
+- dmeventd
+- lvmpolld
+- lvmdbusd
+- lvmlockd
+- clvmd
+- cmirrord
+
+For running cluster tests, we are using singlenode locking. Pass
+`--with-clvmd=singlenode` to configure.
+
+NOTE: This is useful only for testing and should not be used in production
+code.
+
+To run D-Bus daemon tests, an existing D-Bus session is required.
+
+Running tests
+-------------
+
+As root run:
+
+    make check
+
+To run only tests matching a string:
+
+    make check T=test
+
+To skip tests matching a string:
+
+    make check S=test
+
+Other targets and many environment variables can be used to tweak the
+testsuite. For a full list with descriptions, run `make -C test help`.
+
+Installing testsuite
+--------------------
+
+It is possible to install the testsuite and run it against an installed LVM.
+Run the following:
+
+    make -C test install
+
+Then the lvm2-testsuite binary can be executed to test the installed binaries.
+
+See `lvm2-testsuite --help` for options. The same environment variables can be
+used as with `make check`.
+
--- a/2
+++ b/2
@@ -1 +1 @@
-2.02.169(2)-git (2016-11-30)
+2.02.178(2)-git (2017-12-18)
--- a/2
+++ b/2
@@ -1 +1 @@
-1.02.138-git (2016-11-30)
+1.02.147-git (2017-12-18)
--- a/223
+++ b/223
@@ -1,9 +1,218 @@
-Version 2.02.169 - 
+Version 2.02.178 - 
 =====================================
-  Lvdisplay [-m] shows more informations for cached volumes.
+  Do not reopen output streams for multithreaded users of liblvm.
+  Use versionsort to fix archive file expiry beyond 100000 files.
+  Add devices/use_aio, aio_max, aio_memory to configure AIO limits.
+  Support asynchronous I/O when scanning devices.
+  Detect asynchronous I/O capability in configure or accept --disable-aio.
+  Add AIO_SUPPORTED_CODE_PATH to indicate whether AIO may be used.
+  Configure ensures /usr/bin dir is checked for dmpd tools.
+  Restore pvmove support for wide-clustered active volumes (2.02.177).
+  Avoid non-exclusive activation of exclusive segment types.
+  Fix trimming sibling PVs when doing a pvmove of raid subLVs.
+  Preserve exclusive activation during thin snapshot merge.
+  Suppress some repeated reads of the same disk data at the device layer.
+  Avoid exceeding array bounds in allocation tag processing.
+  Refactor metadata reading code to use callback functions.
+  Move memory allocation for the key dev_reads into the device layer.
+
+Version 2.02.177 - 18th December 2017
+=====================================
+  When writing text metadata content, use complete 4096 byte blocks.
+  Change text format metadata alignment from 512 to 4096 bytes.
+  When writing metadata, consistently skip mdas marked as failed.
+  Refactor and adjust text format metadata alignment calculation.
+  Fix python3 path in lvmdbusd to use value detected by configure.
+  Reduce checks for active LVs in vgchange before background polling.
+  Ensure _node_send_message always uses clean status of thin pool.
+  Fix lvmlockd to use pool lock when accessing _tmeta volume.
+  Report expected sanlock_convert errors only when retries fail.
+  Avoid blocking in sanlock_convert on SH to EX lock conversion.
+  Deactivate missing raid LV legs (_rimage_X-missing_Y_Z) on deactivation.
+  Skip read-modify-write when entire block is replaced.
+  Categorise I/O with reason annotations in debug messages.
+  Allow extending of raid LVs created with --nosync after a failed repair.
+  Command will lock memory only when suspending volumes.
+  Merge segments when pvmove is finished.
+  Remove label_verify that has never been used.
+  Ensure very large numbers used as arguments are not cast to lower values.
+  Enhance reading and validation of options stripes and stripes_size.
+  Fix printing of default stripe size when user is not using stripes.
+  Activation code for pvmove automatically discovers holding LVs for resume.
+  Make a pvmove LV locking holder.
+  Do not change critical section counter on resume path without real resume.
+  Enhance activation code to automatically suspend pvmove participants.
+  Prevent conversion of thin volumes to snapshot origin when lvmlockd is used.
+  Correct the steps to change lock type in lvmlockd man page.
+  Retry lock acquisition on recognized sanlock errors.
+  Fix lock manager error codes in lvmlockd.
+  Remove unnecessary single read from lvmdiskscan.
+  Check raid reshape flags in vg_validate().
+  Add support for pvmove of cache and snapshot origins.
+  Avoid using precommitted metadata for suspending pvmove tree.
+  Enhance pvmove locking.
+  Deactivate activated LVs on error path when pvmove activation fails.
+  Add "io" to log/debug_classes for logging low-level I/O.
+  Eliminate redundant nested VG metadata in VG struct.
+  Avoid importing persistent filter in vgscan/pvscan/vgrename.
+  Fix memleak of string buffer when vgcfgbackup runs in secure mode.
+  Do not print error when clvmd cannot find running clvmd.
+  Prevent start of new merge of snapshot if origin is already being merged.
+  Fix offered type for raid6_n_6 to raid5 conversion (raid5_n).
+  Deactivate sub LVs when removing unused cache-pool.
+  Do not take backup with suspended devices.
+  Avoid RAID4 activation on incompatible kernels under all circumstances.
+  Reject conversion request to striped/raid0 on 2-legged raid4/5.
+
+Version 2.02.176 - 3rd November 2017
+====================================
+  Keep Install section only in lvm2-{lvmetad,lvmpolld}.socket systemd unit.
+  Fix segfault in lvm_pv_remove in liblvm. (2.02.173)
+  Do not allow storing VG metadata with LV without any segment.
+  Fix printed message when thin snapshot was already merged.
+  Remove created spare LV when creation of thin-pool failed.
+  Avoid reading ignored metadata when mda gets used again.
+  Fix detection of moved PVs in vgsplit. (2.02.175)
+  Ignore --stripes/--stripesize on RAID takeover.
+  Improve used paths for generated systemd units and init shells.
+  Disallow creation of snapshot of mirror/raid subLV (was never supported).
+  Fix regression in more advanced vgname extraction in lvconvert (2.02.169).
+  Allow lvcreate to be used for caching of _tdata LV.
+  Avoid internal error when resizing cache type _tdata LV (not yet supported).
+  Show original converted names when lvconverting LV to pool volume.
+  Move lib code used only by liblvm into metadata-liblvm.c.
+  Distinguish between device not found and excluded by filter.
+  Monitor external origin LVs.
+  Remove the replicator code, including configure --with-replicators.
+  Allow lvcreate --type mirror to work with 100%FREE.
+  Improve selection of resource name for complex volume activation lock.
+  Avoid cutting first character of resource name for activation lock.
+  Support for encrypted devices in fsadm.
+  Improve thin pool overprovisioning and repair warning messages.
+  Fix incorrect adjustment of region size on striped RaidLVs.
+
+Version 2.02.175 - 6th October 2017
+===================================
+  Use --help with blockdev when checking for --getsize64 support in fsadm.
+  Dump lvmdbusd debug information with SIGUSR1.
+  Fix metadata corruption in vgsplit and vgmerge intermediate states.
+  Add PV_MOVED_VG PV status flag to mark PVs moving between VGs.
+  Fix lvmdbus hang and recognise unknown VG correctly.
+  Improve error messages when command rules fail.
+  Require LV name with pvmove in a shared VG.
+  Allow shared active mirror LVs with lvmlockd, dlm, and cmirrord.
+  Support lvconvert --repair with cache and cachepool volumes.
+  lvconvert --repair respects --poolmetadataspare option.
+  Mark that we don't plan to develop liblvm2app and python bindings any further.
+  Fix thin pool creation in shared VG. (2.02.173)
+
+Version 2.02.174 - 13th September 2017
+======================================
+  Prevent raid1 split with trackchanges in a shared VG.
+  Avoid double unlocking of client & lockspace mutexes in lvmlockd.
+  Fix leaking of file descriptor for non-blocking filebased locking.
+  Fix check that 2nd mda at end of disk fits when using pvcreate --restorefile.
+  Use maximum metadataarea size that fits with pvcreate --restorefile.
+  Always clear cached bootloaderarea when wiping label e.g. in pvcreate.
+  Disallow --bootloaderareasize with pvcreate --restorefile.
+  Fix lvmlockd check for running lock managers during lock adoption.
+  Add --withgeneralpreamble and --withlocalpreamble to lvmconfig.
+  Improve makefiles' linking.
+  Fix some paths in generated makefiles to respect configured settings.
+  Add warning when creating thin-pool with zeroing and chunk size >= 512KiB.
+  Introduce exit code 4 EINIT_FAILED to replace -1 when initialisation fails.
+  Add synchronization points with udev during reshape of raid LVs.
+
+Version 2.02.173 - 20th July 2017
+=================================
+  Add synchronization points with udev during conversion of raid LVs.
+  Improve --size args validation and report more detailed error message.
+  Initialize debugging mutex before any debug message in clvmd.
+  Log error instead of warn when noticing connection problem with lvmetad.
+  Fix memory leak in lvmetad when working with duplicates.
+  Remove restrictions on reshaping open and clustered raid devices.
+  Add incompatible data_offset to raid metadata to fix reshape activation.
+  Accept 'lvm -h' and 'lvm --help' as well as 'lvm help' for help.
+  Suppress error message from accept() on clean lvmetad shutdown.
+  Tidy clvmd client list processing and fix segfaults.
+  Protect clvmd debug log messages with mutex and add client id.
+  Fix shellcheck reported issues for script files.
+
+Version 2.02.172 - 28th June 2017
+=================================
+  Add missing NULL to argv array when splitting cmdline arguments.
+  Add display_percent helper function for printing percent values.
+  lvconvert --repair handles failing raid legs (present but marked 'D'ead).
+  Do not lvdisplay --maps unset settings of cache pool.
+  Fix lvdisplay --maps for cache pool without policy settings.
+  Support aborting of flushing cache LV.
+  Reenable conversion of data and metadata thin-pool volumes to raid.
+  Improve raid status reporting with lvs.
+  No longer necessary to '--force' a repair for RAID1.
+  Linear to RAID1 upconverts now use "recover" sync action, not "resync".
+  Improve lvcreate --cachepool arg validation.
+  Limit maximum size of thin-pool for specific chunk size.
+  Print a warning about in-use PVs with no VG using them.
+  Disable automatic clearing of PVs that look like in-use orphans.
+  Cache format2 flag is now using segment name type field.
+  Support storing status flags via segtype name field.
+  Stop using '--yes' mode when fsadm runs without terminal.
+  Extend validation of filesystems resized by fsadm.
+  Enhance lvconvert automatic settings of possible (raid) LV types.
+  Allow lvchange to change properties on a thin pool data sub LV.
+  Fix lvcreate extent percentage calculation for mirrors.
+  Don't reinstate still-missing devices when correcting inconsistent metadata.
+  Properly handle subshell return codes in fsadm.
+  Disallow cachepool creation with policy cleaner and mode writeback.
+
+Version 2.02.171 - 3rd May 2017
+===============================
+  Fix memory warnings by using mempools for command definition processing.
+  Fix running commands from a script file.
+  Add pvcreate prompt when device size doesn't match setphysicalvolumesize.
+  lvconvert - preserve region size on raid1 image count changes.
+  Adjust pvresize/pvcreate messages and prompt if underlying dev size differs.
+  raid - sanely handle insufficient space on takeover.
+  Fix configure --enable-notify-dbus status message.
+  Change configure option name prefix from --enable-lockd to --enable-lvmlockd.
+  lvcreate - raise mirror/raid default regionsize to 2MiB.
+  Add missing configurable prefix to configuration file installation directory.
+
+Version 2.02.170 - 13th April 2017
+==================================
+  Introduce global/fsadm_executable to make fsadm path configurable.
+  Look for limited thin pool metadata size when using 16G metadata.
+  Add lvconvert pool creation rule disallowing options with poolmetadata.
+  Fix lvconvert when the same LV is incorrectly reused in options.
+  Fix lvconvert VG name validation in option values.
+  Fix missing lvmlockd LV locks in lvchange and lvconvert.
+  Fix dmeventd setup for lvchange --poll.
+  Fix use of --poll and --monitor with lvchange and vgchange.
+  Disallow lvconvert of hidden LV to a pool.
+  Ignore --partial option when not used for activation.
+  Allow --activationmode option with lvchange --refresh.
+  Better message on lvconvert --regionsize.
+  Allow valid lvconvert --regionsize change.
+  Add raid10 alias raid10_near.
+  Handle insufficient PVs on lvconvert takeover.
+  Fix SIGINT blocking to prevent corrupted metadata.
+  Fix systemd unit existence check for lvmconf --services --startstopservices.
+  Check and use PATH_MAX buffers when creating vgrename device paths.
+
+Version 2.02.169 - 28th March 2017
+==================================
+  Automatically decide whether '-' in a man page is a hyphen or a minus sign.
+  Add build-time configuration command line to 'lvm version' output.
+  Handle known table line parameter order change in specific raid target vsns.
+  Conditionally reject raid convert to striped/raid0* after reshape.
+  Ensure raid6 upconversion restrictions.
+  Adjust mirror & raid dmeventd plugins for new lvconvert --repair behaviour.
+  Disable lvmetad when lvconvert --repair is run.
+  Remove obsolete lvmchange binary - convert to built-in command.
+  Show more information for cached volumes in lvdisplay [-m].
  Add option for lvcreate/lvconvert --cachemetadataformat auto|1|2.
  Support cache segment with configurable metadata format.
-  Add allocation/cache_metadata_format profilable setttings.
+  Add allocation/cache_metadata_format profilable settings.
  Use function cache_set_params() for both lvcreate and lvconvert.
  Skip rounding on cache chunk size boudary when create cache LV.
  Improve cache_set_params support for chunk_size selection.
@@ -13,7 +222,7 @@ Version 2.02.169 -
  Support conversion of raid type, stripesize and number of disks
  Reject writemostly/writebehind in lvchange during resynchronization.
  Deactivate active origin first before removal for improved workflow.
-  Fix regression of accepting options --type and -m with lvresize (2.02.158).
+  Fix regression of accepting both --type and -m with lvresize. (2.02.158)
  Add lvconvert --swapmetadata, new specific way to swap pool metadata LVs.
  Add lvconvert --startpoll, new specific way to start polling conversions.
  Add lvconvert --mergethin, new specific way to merge thin snapshots.
@@ -26,9 +235,9 @@ Version 2.02.169 -
  Match every command run to one command definition.
  Specify every allowed command definition/syntax in command-lines.in.
  Add extra memory page when limiting pthread stack size in clvmd.
-  Support striped/raid0* <-> raid10_near conversions
-  Support shrinking of RaidLvs
-  Support region size changes on existing RaidLVs
+  Support striped/raid0* <-> raid10_near conversions.
+  Support shrinking of RaidLVs.
+  Support region size changes on existing RaidLVs.
  Avoid parallel usage of cpg_mcast_joined() in clvmd with corosync.
  Support raid6_{ls,rs,la,ra}_6 segment types and conversions from/to it.
  Support raid6_n_6 segment type and conversions from/to it.
--- a/78
+++ b/78
@@ -1,23 +1,87 @@
-Version 1.02.138 - 
+Version 1.02.147 - 
 =====================================
+  Parsing mirror status accepts 'userspace' keyword in status.
+  Introduce dm_malloc_aligned for page alignment of buffers.
+
+Version 1.02.146 - 18th December 2017
+=====================================
+  Activation tree of thin pool skips duplicated check of pool status.
+  Remove code supporting replicator target.
+  Do not ignore failure of _info_by_dev().
+  Propagate delayed resume for pvmove subvolumes.
+  Suppress integrity encryption keys in 'table' output unless --showkeys supplied.
+
+Version 1.02.145 - 3rd November 2017
+====================================
+  Keep Install section only in dm-event.socket systemd unit.
+  Issue a specific error with dmsetup status if device is unknown.
+  Fix RT_LIBS reference in generated libdevmapper.pc for pkg-config.
+
+Version 1.02.144 - 6th October 2017
+===================================
+  Schedule exit when received SIGTERM in dmeventd.
+  Also try to unmount /boot on blkdeactivate -u if on top of supported device.
+  Use blkdeactivate -r wait in blk-availability systemd service/initscript.
+  Add blkdeactivate -r wait option to wait for MD resync/recovery/reshape.
+  Fix blkdeactivate regression with failing DM/MD devs deactivation (1.02.142).
+  Fix typo in blkdeactivate's '--{dm,lvm,mpath}options' option name.
+  Correct return value testing when getting reserved values for reporting.
+  Take -S with dmsetup suspend/resume/clear/wipe_table/remove/deps/status/table.
+
+Version 1.02.143 - 13th September 2017
+======================================
+  Restore umask when creation of node fails.
+  Add --concise to dmsetup create for many devices with tables in one command.
+  Accept minor number without major in library when it knows dm major number.
+  Introduce single-line concise table output format: dmsetup table --concise.
+
+Version 1.02.142 - 20th July 2017
+=================================
+  Create /dev/disk/by-part{uuid,label} and gpt-auto-root symlinks with udev.
+
+Version 1.02.141 - 28th June 2017
+=================================
+  Fix reusing of dm_task structure for status reading (used by dmeventd).
+  Add dm_percent_to_round_float for adjusted percentage rounding.
+  Reset array with dead rimage devices once raid gets in sync.
+  Drop unneeded --config option from raid dmeventd plugin.
+  dm_get_status_raid() handles some inconsistent md statuses better.
+  Accept truncated files in calls to dm_stats_update_regions_from_fd().
+  Restore Warning by 5% increment when thin-pool is over 80% (1.02.138).
+
+Version 1.02.140 - 3rd May 2017
+===============================
+  Add missing configure --enable-dmfilemapd status message and fix --disable.
+
+Version 1.02.139 - 13th April 2017
+==================================
+  Fix assignment in _target_version() when dm task can't run.
+  Flush stdout on each iteration when using --count or --interval.
+  Show detailed error message when execvp fails while starting dmfilemapd.
+  Fix segmentation fault when dmfilemapd is run with no arguments.
+  Numerous minor dmfilemapd fixes from coverity.
+
+Version 1.02.138 - 28th March 2017
+==================================
+  Support additional raid5/6 configurations.
  Provide dm_tree_node_add_cache_target@base compatible symbol.
  Support DM_CACHE_FEATURE_METADATA2, new cache metadata format 2.
  Improve code to handle mode mask for cache nodes.
  Cache status check for passthrough also require trailing space.
  Add extra memory page when limiting pthread stack size in dmeventd.
  Avoids immediate resume when preloaded device is smaller.
-  Do not suppress kernel key description in dmsetup table output.
+  Do not suppress kernel key description in dmsetup table output for dm-crypt.
  Support configurable command executed from dmeventd thin plugin.
  Support new R|r human readable units output format.
  Thin dmeventd plugin reacts faster on lvextend failure path with umount.
  Add dm_stats_bind_from_fd() to bind a stats handle from a file descriptor.
  Do not try call callback when reverting activation on error path.
-  Fix file mapping for extents with physically adjacent extents.
+  Fix file mapping for extents with physically adjacent extents in dmstats.
  Validation vsnprintf result in runtime translate of dm_log (1.02.136).
-  Separate filemap extent allocation from region table.
-  Fix segmentation fault when filemap region creation fails.
-  Fix performance of region cleanup for failed filemap creation.
-  Fix very slow region deletion with many regions.
+  Separate filemap extent allocation from region table in dmstats.
+  Fix segmentation fault when filemap region creation fails in dmstats.
+  Fix performance of region cleanup for failed filemap creation in dmstats.
+  Fix very slow region deletion with many regions in dmstats.

 Version 1.02.137 - 30th November 2016
 =====================================
--- a/conf/Makefile.in
+++ b/conf/Makefile.in
@@ -32,8 +32,8 @@ include $(top_builddir)/make.tmpl
 .PHONY: install_conf install_localconf install_profiles

 generate:
-	(cat $(top_srcdir)/conf/example.conf.base && LD_LIBRARY_PATH=$(top_builddir)/libdm:$(LD_LIBRARY_PATH) $(top_builddir)/tools/lvm dumpconfig --type default --unconfigured --withcomments --ignorelocal --withspaces) > example.conf.in
-	(cat $(top_srcdir)/conf/lvmlocal.conf.base && LD_LIBRARY_PATH=$(top_builddir)/libdm:$(LD_LIBRARY_PATH) $(top_builddir)/tools/lvm dumpconfig --type default --unconfigured --withcomments --withspaces local) > lvmlocal.conf.in
+	LD_LIBRARY_PATH=$(top_builddir)/libdm:$(LD_LIBRARY_PATH) $(top_builddir)/tools/lvm dumpconfig --type default --unconfigured --withgeneralpreamble --withcomments --ignorelocal --withspaces > example.conf.in
+	LD_LIBRARY_PATH=$(top_builddir)/libdm:$(LD_LIBRARY_PATH) $(top_builddir)/tools/lvm dumpconfig --type default --unconfigured --withlocalpreamble --withcomments --withspaces local > lvmlocal.conf.in

 install_conf: $(CONFSRC)
 	@if [ ! -e $(confdir)/$(CONFDEST) ]; then \
@@ -48,8 +48,8 @@ install_localconf: $(CONFLOCAL)
 	fi

 install_profiles: $(PROFILES)
-	$(INSTALL_DIR) $(DESTDIR)$(DEFAULT_PROFILE_DIR)
-	$(INSTALL_DATA) $(PROFILES) $(DESTDIR)$(DEFAULT_PROFILE_DIR)/
+	$(INSTALL_DIR) $(profiledir)
+	$(INSTALL_DATA) $(PROFILES) $(profiledir)/

 install_lvm2: install_conf install_localconf install_profiles

--- a/conf/example.conf.base
+++ b/conf/example.conf.base
@@ -1,23 +0,0 @@
-# This is an example configuration file for the LVM2 system.
-# It contains the default settings that would be used if there was no
-# @DEFAULT_SYS_DIR@/lvm.conf file.
-#
-# Refer to 'man lvm.conf' for further information including the file layout.
-#
-# Refer to 'man lvm.conf' for information about how settings configured in
-# this file are combined with built-in values and command line options to
-# arrive at the final values used by LVM.
-#
-# Refer to 'man lvmconfig' for information about displaying the built-in
-# and configured values used by LVM.
-#
-# If a default value is set in this file (not commented out), then a
-# new version of LVM using this file will continue using that value,
-# even if the new version of LVM changes the built-in default value.
-#
-# To put this file in a different directory and override @DEFAULT_SYS_DIR@ set
-# the environment variable LVM_SYSTEM_DIR before running the tools.
-#
-# N.B. Take care that each setting only appears once if uncommenting
-# example settings in this file.
-
--- a/conf/example.conf.in
+++ b/conf/example.conf.in
@@ -59,6 +59,22 @@ devices {
 	# This configuration option is advanced.
 	scan = [ "/dev" ]

+	# Configuration option devices/use_aio.
+	# Use Linux asynchronous I/O for parallel device access where possible.
+	# This configuration option has an automatic default value.
+	# use_aio = 1
+
+	# Configuration option devices/aio_max.
+	# Maximum number of asynchronous I/Os to issue concurrently.
+	# This configuration option has an automatic default value.
+	# aio_max = 128
+
+	# Configuration option devices/aio_memory.
+	# Approximate maximum total amount of memory (in MB) used
+	# for asynchronous I/O buffers.
+	# This configuration option has an automatic default value.
+	# aio_memory = 10
+
 	# Configuration option devices/obtain_device_list_from_udev.
 	# Obtain the list of available devices from udev.
 	# This avoids opening or using any inapplicable non-block devices or
@@ -114,8 +130,8 @@ devices {
 	# device path names. Each regex is delimited by a vertical bar '|'
 	# (or any character) and is preceded by 'a' to accept the path, or
 	# by 'r' to reject the path. The first regex in the list to match the
-	# path is used, producing the 'a' or 'r' result for the device.
-	# When multiple path names exist for a block device, if any path name
+	# path is used, producing the 'a' or 'r' result for that path.
+	# If any of multiple existing path names for a block device
 	# matches an 'a' pattern before an 'r' pattern, then the device is
 	# accepted. If all the path names match an 'r' pattern first, then the
 	# device is rejected. Unmatching path names do not affect the accept
@@ -379,8 +395,9 @@ allocation {

 	# Configuration option allocation/raid_stripe_all_devices.
 	# Stripe across all PVs when RAID stripes are not specified.
-	# If enabled, all PVs in the VG or on the command line are used for raid0/4/5/6/10
-	# when the command does not specify the number of stripes to use.
+	# If enabled, all PVs in the VG or on the command line are used for
+	# raid0/4/5/6/10 when the command does not specify the number of
+	# stripes to use.
 	# This was the default behaviour until release 2.02.162.
 	# This configuration option has an automatic default value.
 	# raid_stripe_all_devices = 0
@@ -389,6 +406,17 @@ allocation {
 	# Cache pool metadata and data will always use different PVs.
 	cache_pool_metadata_require_separate_pvs = 0

+	# Configuration option allocation/cache_metadata_format.
+	# Sets default metadata format for new cache.
+	# 
+	# Accepted values:
+	#   0  Automatically detected best available format
+	#   1  Original format
+	#   2  Improved 2nd. generation format
+	# 
+	# This configuration option has an automatic default value.
+	# cache_metadata_format = 0
+
 	# Configuration option allocation/cache_mode.
 	# The default cache mode used for new cache.
 	# 
@@ -403,17 +431,6 @@ allocation {
 	# This configuration option has an automatic default value.
 	# cache_mode = "writethrough"

-	# Configuration option allocation/cache_metadata_format.
-	# Sets default metadata format for new cache.
-	# 
-	# Accepted values:
-	#   0  Automatically detected best available format
-	#   1  Original format
-	#   2  Improved 2nd. generation format
-	# 
-	# This configuration option has an automatic default value.
-	# cache_metadata_format = 0
-
 	# Configuration option allocation/cache_policy.
 	# The default cache policy used for new cache volume.
 	# Since kernel 4.2 the default policy is smq (Stochastic multiqueue),
@@ -610,9 +627,9 @@ log {
 	# Select log messages by class.
 	# Some debugging messages are assigned to a class and only appear in
 	# debug output if the class is listed here. Classes currently
-	# available: memory, devices, activation, allocation, lvmetad,
+	# available: memory, devices, io, activation, allocation, lvmetad,
 	# metadata, cache, locking, lvmpolld. Use "all" to see everything.
-	debug_classes = [ "memory", "devices", "activation", "allocation", "lvmetad", "metadata", "cache", "locking", "lvmpolld", "dbus" ]
+	debug_classes = [ "memory", "devices", "io", "activation", "allocation", "lvmetad", "metadata", "cache", "locking", "lvmpolld", "dbus" ]
 }

 # Configuration section backup.
@@ -939,7 +956,7 @@ global {
 	use_lvmetad = @DEFAULT_USE_LVMETAD@

 	# Configuration option global/lvmetad_update_wait_time.
-	# The number of seconds a command will wait for lvmetad update to finish.
+	# Number of seconds a command will wait for lvmetad update to finish.
 	# After waiting for this period, a command will not use lvmetad, and
 	# will revert to disk scanning.
 	# This configuration option has an automatic default value.
@@ -1069,6 +1086,12 @@ global {
 	# This configuration option has an automatic default value.
 	# cache_repair_options = [ "" ]

+	# Configuration option global/fsadm_executable.
+	# The full path to the fsadm command.
+	# LVM uses this command to help with lvresize -r operations.
+	# This configuration option has an automatic default value.
+	# fsadm_executable = "@FSADM_PATH@"
+
 	# Configuration option global/system_id_source.
 	# The method LVM uses to set the local system ID.
 	# Volume Groups can also be given a system ID (by vgcreate, vgchange,
@@ -1291,7 +1314,7 @@ activation {
 	# The clean/dirty state of data is tracked for each region.
 	# The value is rounded down to a power of two if necessary, and
 	# is ignored if it is not a multiple of the machine memory page size.
-	raid_region_size = 512
+	raid_region_size = 2048

 	# Configuration option activation/error_when_full.
 	# Return errors if a thin pool runs out of space.
--- a/conf/lvmlocal.conf.base
+++ b/conf/lvmlocal.conf.base
@@ -1,19 +0,0 @@
-# This is a local configuration file template for the LVM2 system
-# which should be installed as @DEFAULT_SYS_DIR@/lvmlocal.conf .
-#
-# Refer to 'man lvm.conf' for information about the file layout.
-#
-# To put this file in a different directory and override
-# @DEFAULT_SYS_DIR@ set the environment variable LVM_SYSTEM_DIR before
-# running the tools.
-#
-# The lvmlocal.conf file is normally expected to contain only the
-# "local" section which contains settings that should not be shared or
-# repeated among different hosts.  (But if other sections are present,
-# they *will* get processed.  Settings in this file override equivalent
-# ones in lvm.conf and are in turn overridden by ones in any enabled
-# lvm_<tag>.conf files.)
-#
-# Please take care that each setting only appears once if uncommenting
-# example settings in this file and never copy this file between hosts.
-
--- a/505
+++ b/505
--- a/configure.in
+++ b/configure.in
@@ -15,6 +15,7 @@ AC_PREREQ(2.69)
 ################################################################################
 dnl -- Process this file with autoconf to produce a configure script.
 AC_INIT
+CONFIGURE_LINE="$0 $@"
 AC_CONFIG_SRCDIR([lib/device/dev-cache.h])
 AC_CONFIG_HEADERS([include/configure.h])

@@ -30,6 +31,7 @@ AS_IF([test -z "$CFLAGS"], [COPTIMISE_FLAG="-O2"])
 case "$host_os" in
 	linux*)
 		CLDFLAGS="$CLDFLAGS -Wl,--version-script,.export.sym"
+		# equivalent to -rdynamic
 		ELDFLAGS="-Wl,--export-dynamic"
 		# FIXME Generate list and use --dynamic-list=.dlopen.sym
 		CLDWHOLEARCHIVE="-Wl,-whole-archive"
@@ -37,6 +39,7 @@ case "$host_os" in
 		LDDEPS="$LDDEPS .export.sym"
 		LIB_SUFFIX=so
 		DEVMAPPER=yes
+		AIO=yes
 		BUILD_LVMETAD=no
 		BUILD_LVMPOLLD=no
 		LOCKDSANLOCK=no
@@ -56,6 +59,7 @@ case "$host_os" in
 		CLDNOWHOLEARCHIVE=
 		LIB_SUFFIX=dylib
 		DEVMAPPER=yes
+		AIO=no
 		ODIRECT=no
 		DM_IOCTLS=no
 		SELINUX=no
@@ -75,6 +79,7 @@ AC_PROG_CC
 AC_PROG_CXX
 CFLAGS=$save_CFLAGS
 CXXFLAGS=$save_CXXFLAGS
+PATH_SBIN="$PATH:/usr/sbin:/sbin"

 dnl probably no longer needed in 2008, but...
 AC_PROG_GCC_TRADITIONAL
@@ -83,6 +88,7 @@ AC_PROG_LN_S
 AC_PROG_MAKE_SET
 AC_PROG_MKDIR_P
 AC_PROG_RANLIB
+AC_CHECK_TOOL(AR, ar)
 AC_PATH_TOOL(CFLOW_CMD, cflow)
 AC_PATH_TOOL(CSCOPE_CMD, cscope)
 AC_PATH_TOOL(CHMOD, chmod)
@@ -105,7 +111,7 @@ AC_CHECK_HEADERS([assert.h ctype.h dirent.h errno.h fcntl.h float.h \
  sys/time.h sys/types.h sys/utsname.h sys/wait.h time.h \
  unistd.h], , [AC_MSG_ERROR(bailing out)])

-AC_CHECK_HEADERS(termios.h sys/statvfs.h sys/timerfd.h linux/magic.h linux/fiemap.h)
+AC_CHECK_HEADERS(termios.h sys/statvfs.h sys/timerfd.h sys/vfs.h linux/magic.h linux/fiemap.h)

 case "$host_os" in
 	linux*)
@@ -120,6 +126,7 @@ AC_C_CONST
 AC_C_INLINE
 AC_CHECK_MEMBERS([struct stat.st_rdev])
 AC_CHECK_TYPES([ptrdiff_t])
+AC_STRUCT_ST_BLOCKS
 AC_STRUCT_TM
 AC_TYPE_OFF_T
 AC_TYPE_PID_T
@@ -187,9 +194,15 @@ AC_SUBST(HAVE_FULL_RELRO)
 ################################################################################
 dnl -- Prefix is /usr by default, the exec_prefix default is setup later
 AC_PREFIX_DEFAULT(/usr)
-if test "$prefix" = NONE; then
-  datarootdir=${ac_default_prefix}/share
-fi
+
+################################################################################
+dnl -- Clear default exec_prefix - install into /sbin rather than /usr/sbin
+test "$exec_prefix" = NONE -a "$prefix" = NONE && exec_prefix=""
+
+test "x$prefix" = xNONE && prefix=$ac_default_prefix
+# Let make expand exec_prefix.
+test "x$exec_prefix" = xNONE && exec_prefix='${prefix}'
+

 ################################################################################
 dnl -- Setup the ownership of the files
@@ -404,22 +417,6 @@ AC_DEFINE_UNQUOTED([DEFAULT_RAID10_SEGTYPE], ["$DEFAULT_RAID10_SEGTYPE"],
 		   [Default segtype used for raid10 volumes.])

 ################################################################################
-dnl -- asynchronous volume replicator inclusion type
-AC_MSG_CHECKING(whether to include replicators)
-AC_ARG_WITH(replicators,
-	    AC_HELP_STRING([--with-replicators=TYPE],
-			   [replicator support: internal/shared/none [none]]),
-	    REPLICATORS=$withval, REPLICATORS=none)
-AC_MSG_RESULT($REPLICATORS)
-
-case "$REPLICATORS" in
-  none|shared) ;;
-  internal) AC_DEFINE([REPLICATOR_INTERNAL], 1,
-		[Define to 1 to include built-in support for replicators.]) ;;
-  *) AC_MSG_ERROR([--with-replicators parameter invalid ($REPLICATORS)]) ;;
-esac
-
-
 AC_ARG_WITH(default-sparse-segtype,
 	    AC_HELP_STRING([--with-default-sparse-segtype=TYPE],
 			   [default sparse segtype: thin/snapshot [thin]]),
@@ -474,7 +471,7 @@ case "$THIN" in
  internal|shared)
 	# Empty means a config way to ignore thin checking
 	if test "$THIN_CHECK_CMD" = "autodetect"; then
-		AC_PATH_TOOL(THIN_CHECK_CMD, thin_check)
+		AC_PATH_TOOL(THIN_CHECK_CMD, thin_check, [], [$PATH_SBIN])
 		if test -z "$THIN_CHECK_CMD"; then
 			AC_MSG_WARN([thin_check not found in path $PATH])
 			THIN_CHECK_CMD=/usr/sbin/thin_check
@@ -498,7 +495,7 @@ case "$THIN" in
 	fi
 	# Empty means a config way to ignore thin dumping
 	if test "$THIN_DUMP_CMD" = "autodetect"; then
-		AC_PATH_TOOL(THIN_DUMP_CMD, thin_dump)
+		AC_PATH_TOOL(THIN_DUMP_CMD, thin_dump, [], [$PATH_SBIN])
 		test -z "$THIN_DUMP_CMD" && {
 			AC_MSG_WARN(thin_dump not found in path $PATH)
 			THIN_DUMP_CMD=/usr/sbin/thin_dump
@@ -507,7 +504,7 @@ case "$THIN" in
 	fi
 	# Empty means a config way to ignore thin repairing
 	if test "$THIN_REPAIR_CMD" = "autodetect"; then
-		AC_PATH_TOOL(THIN_REPAIR_CMD, thin_repair)
+		AC_PATH_TOOL(THIN_REPAIR_CMD, thin_repair, [], [$PATH_SBIN])
 		test -z "$THIN_REPAIR_CMD" && {
 			AC_MSG_WARN(thin_repair not found in path $PATH)
 			THIN_REPAIR_CMD=/usr/sbin/thin_repair
@@ -516,7 +513,7 @@ case "$THIN" in
 	fi
 	# Empty means a config way to ignore thin restoring
 	if test "$THIN_RESTORE_CMD" = "autodetect"; then
-		AC_PATH_TOOL(THIN_RESTORE_CMD, thin_restore)
+		AC_PATH_TOOL(THIN_RESTORE_CMD, thin_restore, [], [$PATH_SBIN])
 		test -z "$THIN_RESTORE_CMD" && {
 			AC_MSG_WARN(thin_restore not found in path $PATH)
 			THIN_RESTORE_CMD=/usr/sbin/thin_restore
@@ -588,7 +585,7 @@ case "$CACHE" in
  internal|shared)
 	# Empty means a config way to ignore cache checking
 	if test "$CACHE_CHECK_CMD" = "autodetect"; then
-		AC_PATH_TOOL(CACHE_CHECK_CMD, cache_check)
+		AC_PATH_TOOL(CACHE_CHECK_CMD, cache_check, [], [$PATH_SBIN])
 		if test -z "$CACHE_CHECK_CMD"; then
 			AC_MSG_WARN([cache_check not found in path $PATH])
 			CACHE_CHECK_CMD=/usr/sbin/cache_check
@@ -615,11 +612,15 @@ case "$CACHE" in
 				CACHE_CHECK_VERSION_WARN=y
 				CACHE_CHECK_NEEDS_CHECK=no
 			fi
+			if test "$CACHE_CHECK_VSN_MINOR" -lt 7 ; then
+				AC_MSG_WARN([$CACHE_CHECK_CMD: Old version "$CACHE_CHECK_VSN" does not support new cache format V2])
+				CACHE_CHECK_VERSION_WARN=y
+			fi
 		fi
 	fi
 	# Empty means a config way to ignore cache dumping
 	if test "$CACHE_DUMP_CMD" = "autodetect"; then
-		AC_PATH_TOOL(CACHE_DUMP_CMD, cache_dump)
+		AC_PATH_TOOL(CACHE_DUMP_CMD, cache_dump, [], [$PATH_SBIN])
 		test -z "$CACHE_DUMP_CMD" && {
 			AC_MSG_WARN(cache_dump not found in path $PATH)
 			CACHE_DUMP_CMD=/usr/sbin/cache_dump
@@ -628,7 +629,7 @@ case "$CACHE" in
 	fi
 	# Empty means a config way to ignore cache repairing
 	if test "$CACHE_REPAIR_CMD" = "autodetect"; then
-		AC_PATH_TOOL(CACHE_REPAIR_CMD, cache_repair)
+		AC_PATH_TOOL(CACHE_REPAIR_CMD, cache_repair, [], [$PATH_SBIN])
 		test -z "$CACHE_REPAIR_CMD" && {
 			AC_MSG_WARN(cache_repair not found in path $PATH)
 			CACHE_REPAIR_CMD=/usr/sbin/cache_repair
@@ -637,7 +638,7 @@ case "$CACHE" in
 	fi
 	# Empty means a config way to ignore cache restoring
 	if test "$CACHE_RESTORE_CMD" = "autodetect"; then
-		AC_PATH_TOOL(CACHE_RESTORE_CMD, cache_restore)
+		AC_PATH_TOOL(CACHE_RESTORE_CMD, cache_restore, [], [$PATH_SBIN])
 		test -z "$CACHE_RESTORE_CMD" && {
 			AC_MSG_WARN(cache_restore not found in path $PATH)
 			CACHE_RESTORE_CMD=/usr/sbin/cache_restore
@@ -668,11 +669,9 @@ AC_DEFINE_UNQUOTED([CACHE_RESTORE_CMD], ["$CACHE_RESTORE_CMD"],

 ################################################################################
 dnl -- Disable readline
-AC_MSG_CHECKING(whether to enable readline)
 AC_ARG_ENABLE([readline],
 	      AC_HELP_STRING([--disable-readline], [disable readline support]),
 	      READLINE=$enableval, READLINE=maybe)
-AC_MSG_RESULT($READLINE)

 ################################################################################
 dnl -- Disable realtime clock support
@@ -1125,6 +1124,24 @@ if test "$DEVMAPPER" = yes; then
 	AC_DEFINE([DEVMAPPER_SUPPORT], 1, [Define to 1 to enable LVM2 device-mapper interaction.])
 fi

+################################################################################
+dnl -- Disable aio
+AC_MSG_CHECKING(whether to use asynchronous I/O)
+AC_ARG_ENABLE(aio,
+	      AC_HELP_STRING([--disable-aio],
+			     [disable asynchronous I/O]),
+	      AIO=$enableval)
+AC_MSG_RESULT($AIO)
+
+if test "$AIO" = yes; then
+	AC_CHECK_LIB(aio, io_setup,
+		[AC_DEFINE([AIO_SUPPORT], 1, [Define to 1 if aio is available.])
+		AIO_LIBS="-laio"
+		AIO_SUPPORT=yes],
+		[AIO_LIBS=
+		AIO_SUPPORT=no ])
+fi
+
 ################################################################################
 dnl -- Build lvmetad
 AC_MSG_CHECKING(whether to build LVMetaD)
@@ -1148,10 +1165,10 @@ AC_MSG_RESULT($BUILD_LVMPOLLD)
 ################################################################################
 BUILD_LVMLOCKD=no

-dnl -- Build lockdsanlock
-AC_MSG_CHECKING(whether to build lockdsanlock)
-AC_ARG_ENABLE(lockd-sanlock,
-	      AC_HELP_STRING([--enable-lockd-sanlock],
+dnl -- Build lvmlockdsanlock
+AC_MSG_CHECKING(whether to build lvmlockdsanlock)
+AC_ARG_ENABLE(lvmlockd-sanlock,
+	      AC_HELP_STRING([--enable-lvmlockd-sanlock],
 			     [enable the LVM lock daemon using sanlock]),
 	      LOCKDSANLOCK=$enableval)
 AC_MSG_RESULT($LOCKDSANLOCK)
@@ -1166,10 +1183,10 @@ if test "$BUILD_LOCKDSANLOCK" = yes; then
 fi

 ################################################################################
-dnl -- Build lockddlm
-AC_MSG_CHECKING(whether to build lockddlm)
-AC_ARG_ENABLE(lockd-dlm,
-	      AC_HELP_STRING([--enable-lockd-dlm],
+dnl -- Build lvmlockddlm
+AC_MSG_CHECKING(whether to build lvmlockddlm)
+AC_ARG_ENABLE(lvmlockd-dlm,
+	      AC_HELP_STRING([--enable-lvmlockd-dlm],
 			     [enable the LVM lock daemon using dlm]),
 	      LOCKDDLM=$enableval)
 AC_MSG_RESULT($LOCKDDLM)
@@ -1276,13 +1293,12 @@ dnl -- Check dmfilemapd
 AC_MSG_CHECKING(whether to build dmfilemapd)
 AC_ARG_ENABLE(dmfilemapd, AC_HELP_STRING([--enable-dmfilemapd],
 					 [enable the dmstats filemap daemon]),
-	      DMFILEMAPD=$enableval)
-AC_MSG_RESULT($DMFILEMAPD)
-BUILD_DMFILEMAPD=$DMFILEMAPD
-AC_DEFINE([DMFILEMAPD], 1, [Define to 1 to enable the device-mapper filemap daemon.])
+	      BUILD_DMFILEMAPD=$enableval, BUILD_DMFILEMAPD=no)
+AC_MSG_RESULT($BUILD_DMFILEMAPD)
+AC_DEFINE([DMFILEMAPD], $BUILD_DMFILEMAPD, [Define to 1 to enable the device-mapper filemap daemon.])

 dnl -- dmfilemapd requires FIEMAP
-if test "$DMFILEMAPD" = yes; then
+if test "$BUILD_DMFILEMAPD" = yes; then
   AC_CHECK_HEADER([linux/fiemap.h], , [AC_MSG_ERROR(--enable-dmfilemapd requires fiemap.h)])
 fi

@@ -1292,69 +1308,60 @@ AC_MSG_CHECKING(whether to build notifydbus)
 AC_ARG_ENABLE(notify-dbus,
 	      AC_HELP_STRING([--enable-notify-dbus],
 			     [enable LVM notification using dbus]),
-	      NOTIFYDBUS=$enableval)
-AC_MSG_RESULT($NOTIFYDBUS)
+	      NOTIFYDBUS_SUPPORT=$enableval, NOTIFYDBUS_SUPPORT=no)
+AC_MSG_RESULT($NOTIFYDBUS_SUPPORT)

-BUILD_NOTIFYDBUS=$NOTIFYDBUS
-
-if test "$BUILD_NOTIFYDBUS" = yes; then
+if test "$NOTIFYDBUS_SUPPORT" = yes; then
 	AC_DEFINE([NOTIFYDBUS_SUPPORT], 1, [Define to 1 to include code that uses dbus notification.])
-	LIBS="-lsystemd $LIBS"
+	SYSTEMD_LIBS="-lsystemd"
 fi

 ################################################################################
 dnl -- Look for dbus libraries
-if test "$BUILD_NOTIFYDBUS" = yes; then
+if test "$NOTIFYDBUS_SUPPORT" = yes; then
 	PKG_CHECK_MODULES(NOTIFY_DBUS, systemd >= 221, [HAVE_NOTIFY_DBUS=yes], $bailout)
 fi

 ################################################################################

 dnl -- Enable blkid wiping functionality
-AC_MSG_CHECKING(whether to enable libblkid detection of signatures when wiping)
 AC_ARG_ENABLE(blkid_wiping,
 	      AC_HELP_STRING([--disable-blkid_wiping],
 			     [disable libblkid detection of signatures when wiping and use native code instead]),
 	      BLKID_WIPING=$enableval, BLKID_WIPING=maybe)
-AC_MSG_RESULT($BLKID_WIPING)

+DEFAULT_USE_BLKID_WIPING=0
 if test "$BLKID_WIPING" != no; then
 	pkg_config_init
 	PKG_CHECK_MODULES(BLKID, blkid >= 2.24,
-			  [test "$BLKID_WIPING" = maybe && BLKID_WIPING=yes],
-			  [if test "$BLKID_WIPING" = maybe; then
+			  [ BLKID_WIPING=yes
+			    BLKID_PC="blkid"
+			    DEFAULT_USE_BLKID_WIPING=1
+			    AC_DEFINE([BLKID_WIPING_SUPPORT], 1, [Define to 1 to use libblkid detection of signatures when wiping.])
+			  ], [if test "$BLKID_WIPING" = maybe; then
 				BLKID_WIPING=no
 			   else
 			        AC_MSG_ERROR([bailing out... blkid library >= 2.24 is required])
 			   fi])
-	if test "$BLKID_WIPING" = yes; then
-		BLKID_PC="blkid"
-		DEFAULT_USE_BLKID_WIPING=1
-		AC_DEFINE([BLKID_WIPING_SUPPORT], 1, [Define to 1 to use libblkid detection of signatures when wiping.])
-	else
-		DEFAULT_USE_BLKID_WIPING=0
-	fi
-else
-	DEFAULT_USE_BLKID_WIPING=0
 fi
+AC_MSG_CHECKING([whether to enable libblkid detection of signatures when wiping])
+AC_MSG_RESULT($BLKID_WIPING)
 AC_DEFINE_UNQUOTED(DEFAULT_USE_BLKID_WIPING, [$DEFAULT_USE_BLKID_WIPING],
 		   [Use blkid wiping by default.])

 ################################################################################
 dnl -- Enable udev-systemd protocol to instantiate a service for background jobs
 dnl -- Requires systemd version 205 at least (including support for systemd-run)
-AC_MSG_CHECKING(whether to use udev-systemd protocol for jobs in background)
 AC_ARG_ENABLE(udev-systemd-background-jobs,
 	      AC_HELP_STRING([--disable-udev-systemd-background-jobs],
 			     [disable udev-systemd protocol to instantiate a service for background job]),
 	      UDEV_SYSTEMD_BACKGROUND_JOBS=$enableval,
 	      UDEV_SYSTEMD_BACKGROUND_JOBS=maybe)
-AC_MSG_RESULT($UDEV_SYSTEMD_BACKGROUND_JOBS)

 if test "$UDEV_SYSTEMD_BACKGROUND_JOBS" != no; then
 	pkg_config_init
 	PKG_CHECK_MODULES(SYSTEMD, systemd >= 205,
-			  [test "$UDEV_SYSTEMD_BACKGROUND_JOBS" = maybe && UDEV_SYSTEMD_BACKGROUND_JOBS=yes],
+			  [UDEV_SYSTEMD_BACKGROUND_JOBS=yes],
 			  [if test "$UDEV_SYSTEMD_BACKGROUND_JOBS" = maybe; then
 				UDEV_SYSTEMD_BACKGROUND_JOBS=no
 			   else
@@ -1362,6 +1369,9 @@ if test "$UDEV_SYSTEMD_BACKGROUND_JOBS" != no; then
 			   fi])
 fi

+AC_MSG_CHECKING(whether to use udev-systemd protocol for jobs in background)
+AC_MSG_RESULT($UDEV_SYSTEMD_BACKGROUND_JOBS)
+
 ################################################################################
 dnl -- Enable udev synchronisation
 AC_MSG_CHECKING(whether to enable synchronisation with udev processing)
@@ -1465,6 +1475,8 @@ AC_SUBST([LVM2APP_LIB])
 test "$APPLIB" = yes \
  && LVM2APP_LIB=-llvm2app \
  || LVM2APP_LIB=
+AS_IF([test "$APPLIB" = yes],
+      [AC_MSG_WARN([liblvm2app is deprecated. Use D-Bus API])])

 ################################################################################
 dnl -- Enable cmdlib
@@ -1485,6 +1497,8 @@ AC_ARG_ENABLE(dbus-service,
 	      AC_HELP_STRING([--enable-dbus-service], [install D-Bus support]),
 	      BUILD_LVMDBUSD=$enableval, BUILD_LVMDBUSD=no)
 AC_MSG_RESULT($BUILD_LVMDBUSD)
+AS_IF([test "$NOTIFYDBUS_SUPPORT" != yes && test "$BUILD_LVMDBUSD" = yes],
+      [AC_MSG_WARN([Building D-Bus support without D-Bus notifications.])])

 ################################################################################
 dnl -- Enable Python liblvm2app bindings
@@ -1537,7 +1551,7 @@ if test "$PYTHON3_BINDINGS" = yes -o "$BUILD_LVMDBUSD" = yes; then
 	PYTHON3_INCDIRS=`"$PYTHON3_CONFIG" --includes`
 	PYTHON3_LIBDIRS=`"$PYTHON3_CONFIG" --libs`
 	PYTHON3DIR=$pythondir
-	PYTHON_BINDINGS=yes
+	test "$PYTHON3_BINDINGS" = yes && PYTHON_BINDINGS=yes
 fi

 if test "$BUILD_LVMDBUSD" = yes; then
@@ -1547,6 +1561,7 @@ if test "$BUILD_LVMDBUSD" = yes; then
 fi

 if test "$PYTHON_BINDINGS" = yes -o "$PYTHON2_BINDINGS" = yes -o "$PYTHON3_BINDINGS" = yes; then
+	AC_MSG_WARN([Python bindings are deprecated. Use D-Bus API])
 	test "$APPLIB" != yes && AC_MSG_ERROR([Python_bindings require --enable-applib])
 fi

@@ -1582,13 +1597,11 @@ dnl -- enable dmeventd handling
 AC_MSG_CHECKING(whether to use dmeventd)
 AC_ARG_ENABLE(dmeventd, AC_HELP_STRING([--enable-dmeventd],
 				       [enable the device-mapper event daemon]),
-	      DMEVENTD=$enableval)
-AC_MSG_RESULT($DMEVENTD)
-
-BUILD_DMEVENTD=$DMEVENTD
+	      BUILD_DMEVENTD=$enableval, BUILD_DMEVENTD=no)
+AC_MSG_RESULT($BUILD_DMEVENTD)

 dnl -- dmeventd currently requires internal mirror support
-if test "$DMEVENTD" = yes; then
+if test "$BUILD_DMEVENTD" = yes; then
   if test "$MIRRORS" != internal; then
      AC_MSG_ERROR([--enable-dmeventd currently requires --with-mirrors=internal])
   fi
@@ -1612,10 +1625,6 @@ AC_CHECK_LIB(c, canonicalize_file_name,
  AC_DEFINE([HAVE_CANONICALIZE_FILE_NAME], 1,
    [Define to 1 if canonicalize_file_name is available.]))

-################################################################################
-dnl -- Clear default exec_prefix - install into /sbin rather than /usr/sbin
-test "$exec_prefix" = NONE -a "$prefix" = NONE && exec_prefix=""
-
 ################################################################################
 dnl -- Check for dlopen
 AC_CHECK_LIB(dl, dlopen,
@@ -1672,13 +1681,16 @@ fi

 ################################################################################
 dnl -- Check for realtime clock support
+RT_LIBS=
+HAVE_REALTIME=no
 if test "$REALTIME" = yes; then
-	AC_CHECK_LIB(rt, clock_gettime, HAVE_REALTIME=yes, HAVE_REALTIME=no)
+	AC_CHECK_FUNCS([clock_gettime], HAVE_REALTIME=yes)
+
+	AS_IF([test "$HAVE_REALTIME" != yes], [ # try again with -lrt
+	      AC_CHECK_LIB([rt], [clock_gettime], RT_LIBS="-lrt"; HAVE_REALTIME=yes)])

 	if test "$HAVE_REALTIME" = yes; then
 		AC_DEFINE([HAVE_REALTIME], 1, [Define to 1 to include support for realtime clock.])
-		LIBS="-lrt $LIBS"
-		RT_LIB="-lrt"
 	else
 		AC_MSG_WARN(Disabling realtime clock)
 	fi
@@ -1722,6 +1734,7 @@ Note: (n)curses also seems to work as a substitute for termcap.  This was
 		AC_DEFINE([READLINE_SUPPORT], 1,
 			[Define to 1 to include the LVM readline shell.])
 		dnl -- Try only with -lreadline and check for different symbol
+		READLINE=yes
 		LIBS=$lvm_saved_libs
 		AC_CHECK_LIB([readline], [rl_line_buffer],
 			[ READLINE_LIBS="-lreadline" ], [
@@ -1828,13 +1841,16 @@ dnl -- Ensure additional headers required
 if test "$READLINE" = yes; then
 	AC_CHECK_HEADERS(readline/readline.h readline/history.h,,hard_bailout)
 fi
+AC_MSG_CHECKING(whether to enable readline)
+AC_MSG_RESULT($READLINE)

 if test "$BUILD_CMIRRORD" = yes; then
 	AC_CHECK_FUNCS(atexit,,hard_bailout)
 fi

 if test "$BUILD_LVMLOCKD" = yes; then
-	AC_CHECK_FUNCS(clock_gettime strtoull,,hard_bailout)
+	AS_IF([test "$HAVE_REALTIME" != yes], [AC_MSG_ERROR([Realtime clock support is mandatory for lvmlockd.])])
+	AC_CHECK_FUNCS(strtoull,,hard_bailout)
 fi

 if test "$BUILD_LVMPOLLD" = yes; then
@@ -1854,7 +1870,7 @@ if test "$CLUSTER" != none; then
 	AC_CHECK_FUNCS(socket,,hard_bailout)
 fi

-if test "$DMEVENTD" = yes; then
+if test "$BUILD_DMEVENTD" = yes; then
 	AC_CHECK_HEADERS(arpa/inet.h,,hard_bailout)
 fi

@@ -1870,29 +1886,30 @@ if test "$UDEV_SYNC" = yes; then
 	AC_CHECK_HEADERS(sys/ipc.h sys/sem.h,,hard_bailout)
 fi

-if test "$DMFILEMAPD" = yes; then
+if test "$BUILD_DMFILEMAPD" = yes; then
 	AC_CHECK_HEADERS([sys/inotify.h],,hard_bailout)
 fi

 ################################################################################
-AC_PATH_TOOL(MODPROBE_CMD, modprobe)
+AC_PATH_TOOL(MODPROBE_CMD, modprobe, [], [$PATH_SBIN])

 if test -n "$MODPROBE_CMD"; then
 	AC_DEFINE_UNQUOTED([MODPROBE_CMD], ["$MODPROBE_CMD"], [The path to 'modprobe', if available.])
 fi

+SYSCONFDIR="$(eval echo $(eval echo $sysconfdir))"

-lvm_exec_prefix=$exec_prefix
-test "$lvm_exec_prefix" = NONE && lvm_exec_prefix=$prefix
-test "$lvm_exec_prefix" = NONE && lvm_exec_prefix=$ac_default_prefix
-LVM_PATH="$lvm_exec_prefix/sbin/lvm"
+SBINDIR="$(eval echo $(eval echo $sbindir))"
+LVM_PATH="$SBINDIR/lvm"
 AC_DEFINE_UNQUOTED(LVM_PATH, ["$LVM_PATH"], [Path to lvm binary.])

-clvmd_prefix=$ac_default_prefix
-test "$prefix" != NONE && clvmd_prefix=$prefix
-CLVMD_PATH="$clvmd_prefix/sbin/clvmd"
+USRSBINDIR="$(eval echo $(eval echo $usrsbindir))"
+CLVMD_PATH="$USRSBINDIR/clvmd"
 AC_DEFINE_UNQUOTED(CLVMD_PATH, ["$CLVMD_PATH"], [Path to clvmd binary.])

+FSADM_PATH="$SBINDIR/fsadm"
+AC_DEFINE_UNQUOTED(FSADM_PATH, ["$FSADM_PATH"], [Path to fsadm binary.])
+
 ################################################################################
 dnl -- dmeventd pidfile and executable path
 if test "$BUILD_DMEVENTD" = yes; then
@@ -1910,7 +1927,7 @@ if test "$BUILD_DMEVENTD" = yes; then
 		    AC_HELP_STRING([--with-dmeventd-path=PATH],
 				   [dmeventd path [EPREFIX/sbin/dmeventd]]),
 		    DMEVENTD_PATH=$withval,
-		    DMEVENTD_PATH="$lvm_exec_prefix/sbin/dmeventd")
+		    DMEVENTD_PATH="$SBINDIR/dmeventd")
 	AC_DEFINE_UNQUOTED(DMEVENTD_PATH, ["$DMEVENTD_PATH"],
 			   [Path to dmeventd binary.])
 fi
@@ -1953,13 +1970,17 @@ AC_ARG_WITH(default-cache-subdir,
 AC_DEFINE_UNQUOTED(DEFAULT_CACHE_SUBDIR, ["$DEFAULT_CACHE_SUBDIR"],
 		   [Name of default metadata cache subdirectory.])

+# Select default system locking dir, prefer /run/lock over /var/lock
+DEFAULT_SYS_LOCK_DIR="$RUN_DIR/lock"
+test -d "$DEFAULT_SYS_LOCK_DIR" || DEFAULT_SYS_LOCK_DIR="/var/lock"
+
+# Support configurable locking subdir for lvm
 AC_ARG_WITH(default-locking-dir,
 	    AC_HELP_STRING([--with-default-locking-dir=DIR],
 			   [default locking directory [autodetect_lock_dir/lvm]]),
 	    DEFAULT_LOCK_DIR=$withval,
 	    [AC_MSG_CHECKING(for default lock directory)
-	     DEFAULT_LOCK_DIR="$RUN_DIR/lock/lvm"
-	     test -d "$RUN_DIR/lock" || DEFAULT_LOCK_DIR="/var/lock/lvm"
+	     DEFAULT_LOCK_DIR="$DEFAULT_SYS_LOCK_DIR/lvm"
 	     AC_MSG_RESULT($DEFAULT_LOCK_DIR)])
 AC_DEFINE_UNQUOTED(DEFAULT_LOCK_DIR, ["$DEFAULT_LOCK_DIR"],
 		   [Name of default locking directory.])
@@ -2001,6 +2022,8 @@ LVM_MINOR=`echo "$VER" | $AWK -F '.' '{print $2}'`
 LVM_PATCHLEVEL=`echo "$VER" | $AWK -F '[[(.]]' '{print $3}'`
 LVM_LIBAPI=`echo "$VER" | $AWK -F '[[()]]' '{print $2}'`

+AC_DEFINE_UNQUOTED(LVM_CONFIGURE_LINE, "$CONFIGURE_LINE", [configure command line used])
+
 ################################################################################
 AC_SUBST(APPLIB)
 AC_SUBST(AWK)
@@ -2014,7 +2037,6 @@ AC_SUBST(BUILD_LVMLOCKD)
 AC_SUBST(BUILD_LOCKDSANLOCK)
 AC_SUBST(BUILD_LOCKDDLM)
 AC_SUBST(BUILD_DMFILEMAPD)
-AC_SUBST(BUILD_NOTIFYDBUS)
 AC_SUBST(CACHE)
 AC_SUBST(CFLAGS)
 AC_SUBST(CFLOW_CMD)
@@ -2053,20 +2075,22 @@ AC_SUBST(DEFAULT_RAID10_SEGTYPE)
 AC_SUBST(DEFAULT_RUN_DIR)
 AC_SUBST(DEFAULT_SPARSE_SEGTYPE)
 AC_SUBST(DEFAULT_SYS_DIR)
+AC_SUBST(DEFAULT_SYS_LOCK_DIR)
 AC_SUBST(DEFAULT_USE_BLKID_WIPING)
 AC_SUBST(DEFAULT_USE_LVMETAD)
 AC_SUBST(DEFAULT_USE_LVMPOLLD)
 AC_SUBST(DEFAULT_USE_LVMLOCKD)
 AC_SUBST(DEVMAPPER)
+AC_SUBST(AIO)
 AC_SUBST(DLM_CFLAGS)
 AC_SUBST(DLM_LIBS)
 AC_SUBST(DL_LIBS)
-AC_SUBST(DMEVENTD)
+AC_SUBST(AIO_LIBS)
 AC_SUBST(DMEVENTD_PATH)
-AC_SUBST(DMFILEMAPD)
 AC_SUBST(DM_LIB_PATCHLEVEL)
 AC_SUBST(ELDFLAGS)
 AC_SUBST(FSADM)
+AC_SUBST(FSADM_PATH)
 AC_SUBST(BLKDEACTIVATE)
 AC_SUBST(HAVE_LIBDL)
 AC_SUBST(HAVE_REALTIME)
@@ -2111,15 +2135,18 @@ AC_SUBST(PYTHON3DIR)
 AC_SUBST(QUORUM_CFLAGS)
 AC_SUBST(QUORUM_LIBS)
 AC_SUBST(RAID)
-AC_SUBST(RT_LIB)
+AC_SUBST(RT_LIBS)
 AC_SUBST(READLINE_LIBS)
 AC_SUBST(REPLICATORS)
 AC_SUBST(SACKPT_CFLAGS)
 AC_SUBST(SACKPT_LIBS)
 AC_SUBST(SALCK_CFLAGS)
 AC_SUBST(SALCK_LIBS)
+AC_SUBST(SBINDIR)
 AC_SUBST(SELINUX_LIBS)
 AC_SUBST(SELINUX_PC)
+AC_SUBST(SYSCONFDIR)
+AC_SUBST(SYSTEMD_LIBS)
 AC_SUBST(SNAPSHOTS)
 AC_SUBST(STATICDIR)
 AC_SUBST(STATIC_LINK)
@@ -2141,6 +2168,7 @@ AC_SUBST(UDEV_SYSTEMD_BACKGROUND_JOBS)
 AC_SUBST(UDEV_RULE_EXEC_DETECTION)
 AC_SUBST(UDEV_HAS_BUILTIN_BLKID)
 AC_SUBST(USE_TRACKING)
+AC_SUBST(USRSBINDIR)
 AC_SUBST(VALGRIND_POOL)
 AC_SUBST(WRITE_INSTALL)
 AC_SUBST(DMEVENTD_PIDFILE)
@@ -2181,6 +2209,9 @@ daemons/dmeventd/plugins/snapshot/Makefile
 daemons/dmeventd/plugins/thin/Makefile
 daemons/dmfilemapd/Makefile
 daemons/lvmdbusd/Makefile
+daemons/lvmdbusd/lvmdbusd
+daemons/lvmdbusd/lvmdb.py
+daemons/lvmdbusd/lvm_shell_proxy.py
 daemons/lvmdbusd/path.py
 daemons/lvmetad/Makefile
 daemons/lvmpolld/Makefile
@@ -2197,7 +2228,6 @@ lib/format1/Makefile
 lib/format_pool/Makefile
 lib/locking/Makefile
 lib/mirror/Makefile
-lib/replicator/Makefile
 include/lvm-version.h
 lib/raid/Makefile
 lib/snapshot/Makefile
@@ -2256,10 +2286,14 @@ AS_IF([test -n "$THIN_CONFIGURE_WARN"],
      [AC_MSG_WARN([Support for thin provisioning is limited since some thin provisioning tools are missing!])])

 AS_IF([test -n "$THIN_CHECK_VERSION_WARN"],
-      [AC_MSG_WARN([You should also install thin_check vsn 0.3.2 (or later) to use lvm2 thin provisioning])])
+      [AC_MSG_WARN([You should also install thin_check vsn 0.7.0 (or later) to use lvm2 thin provisioning])])

 AS_IF([test -n "$CACHE_CONFIGURE_WARN"],
      [AC_MSG_WARN([Support for cache is limited since some cache tools are missing!])])

+AS_IF([test -n "$CACHE_CHECK_VERSION_WARN"],
+      [AC_MSG_WARN([You should install cache_check vsn 0.7.0 (or later) to use lvm2 cache metadata format 2])])
+
+
 AS_IF([test "$ODIRECT" != yes],
      [AC_MSG_WARN([O_DIRECT disabled: low-memory pvmove may lock up])])
--- a/coverity/coverity_model.c
+++ b/coverity/coverity_model.c
@@ -41,6 +41,19 @@ struct lv_segment *last_seg(const struct logical_volume *lv)
 	return ((struct lv_segment **)lv)[0];
 }

+const char *find_config_tree_str(struct cmd_context *cmd, int id, struct profile *profile)
+{
+	return "STRING";
+}
+
+struct logical_volume *origin_from_cow(const struct logical_volume *lv)
+{
+	if (lv)
+		return lv;
+
+	__coverity_panic__();
+}
+
 /* simple_memccpy() from glibc */
 void *memccpy(void *dest, const void *src, int c, size_t n)
 {
@@ -71,6 +84,17 @@ void model_FD_ZERO(void *fdset)
 		((long*)fdset)[i] = 0;
 }

+
+/* Recent Coverity reports quite weird errors... */
+int *__errno_location(void)
+{
+}
+const unsigned short **__ctype_b_loc (void)
+{
+}
+
+
+
 /*
 * Added extra pointer check to not need these models,
 * for now just keep then in file
--- a/daemons/clvmd/Makefile.in
+++ b/daemons/clvmd/Makefile.in
@@ -31,9 +31,9 @@ SALCK_LIBS = @SALCK_LIBS@
 SALCK_CFLAGS = @SALCK_CFLAGS@

 SOURCES = \
-	clvmd-command.c  \
-	clvmd.c          \
-	lvm-functions.c  \
+	clvmd-command.c\
+	clvmd.c\
+	lvm-functions.c\
 	refresh_clvmd.c

 ifneq (,$(findstring cman,, "@CLVMD@,"))
@@ -72,26 +72,17 @@ endif
 TARGETS = \
 	clvmd

-LVMLIBS = $(LVMINTERNAL_LIBS)
-
-ifeq ("@DMEVENTD@", "yes")
-	LVMLIBS += -ldevmapper-event
-endif
- 
 include $(top_builddir)/make.tmpl

-LVMLIBS += -ldevmapper
-LIBS += $(PTHREAD_LIBS)
-
+LIBS += $(LVMINTERNAL_LIBS) -ldevmapper $(PTHREAD_LIBS)
 CFLAGS += -fno-strict-aliasing $(EXTRA_EXEC_CFLAGS)
-LDFLAGS += $(EXTRA_EXEC_LDFLAGS)

 INSTALL_TARGETS = \
 	install_clvmd

 clvmd: $(OBJECTS) $(top_builddir)/lib/liblvm-internal.a
-	$(CC) $(CFLAGS) $(LDFLAGS) -o clvmd $(OBJECTS) \
-		$(LVMLIBS) $(LMLIBS) $(LIBS)
+	$(CC) $(CFLAGS) $(LDFLAGS) $(EXTRA_EXEC_LDFLAGS) $(ELDFLAGS) \
+	      -o clvmd $(OBJECTS) $(LMLIBS) $(LIBS)

 .PHONY: install_clvmd

--- a/daemons/clvmd/clvmd-command.c
+++ b/daemons/clvmd/clvmd-command.c
@@ -171,8 +171,10 @@ int do_command(struct local_client *client, struct clvm_header *msg, int msglen,

 	/* Check the status of the command and return the error text */
 	if (status) {
-		*retlen = 1 + ((*buf) ? dm_snprintf(*buf, buflen, "%s",
-						    strerror(status)) : -1);
+		if (*buf)
+			*retlen = dm_snprintf(*buf, buflen, "%s", strerror(status)) + 1;
+		else
+			*retlen = 0;
 	}

 	return status;
@@ -206,7 +208,7 @@ static int lock_vg(struct local_client *client)
 	lock_mode = ((int) lock_cmd & LCK_TYPE_MASK);
 	/* lock_flags = args[1]; */
 	lockname = &args[2];
-	DEBUGLOG("doing PRE command LOCK_VG '%s' at %x (client=%p)\n", lockname, lock_cmd, client);
+	DEBUGLOG("(%p) doing PRE command LOCK_VG '%s' at %x\n", client, lockname, lock_cmd);

 	if (lock_mode == LCK_UNLOCK) {
 		if (!(lkid = (int) (long) dm_hash_lookup(lock_hash, lockname)))
@@ -323,7 +325,7 @@ void cmd_client_cleanup(struct local_client *client)
 	int lkid;
 	char *lockname;

-	DEBUGLOG("Client thread cleanup (%p)\n", client);
+	DEBUGLOG("(%p) Client thread cleanup\n", client);
 	if (!client->bits.localsock.private)
 		return;

@@ -332,7 +334,7 @@ void cmd_client_cleanup(struct local_client *client)
 	dm_hash_iterate(v, lock_hash) {
 		lkid = (int)(long)dm_hash_get_data(lock_hash, v);
 		lockname = dm_hash_get_key(lock_hash, v);
-		DEBUGLOG("Cleanup (%p): Unlocking lock %s %x\n", client, lockname, lkid);
+		DEBUGLOG("(%p) Cleanup: Unlocking lock %s %x\n", client, lockname, lkid);
 		(void) sync_unlock(lockname, lkid);
 	}

--- a/daemons/clvmd/clvmd-openais.c
+++ b/daemons/clvmd/clvmd-openais.c
@@ -425,8 +425,6 @@ static void _add_up_node(const char *csid)
 	DEBUGLOG("openais_add_up_node %d\n", ninfo->nodeid);

 	ninfo->state = NODE_CLVMD;
-
-	return;
 }

 /* Call a callback for each node, so the caller knows whether it's up or down */
--- a/daemons/clvmd/clvmd.c
+++ b/daemons/clvmd/clvmd.c
@@ -58,6 +58,7 @@
 /* Head of the fd list. Also contains
   the cluster_socket details */
 static struct local_client local_client_head;
+static int _local_client_count = 0;

 static unsigned short global_xid = 0;	/* Last transaction ID issued */

@@ -68,6 +69,37 @@ static unsigned max_csid_len;
 static unsigned max_cluster_message;
 static unsigned max_cluster_member_name_len;

+static void _add_client(struct local_client *new_client, struct local_client *existing_client)
+{
+	_local_client_count++;
+	DEBUGLOG("(%p) Adding listener for fd %d. (Now %d monitored fds.)\n", new_client, new_client->fd, _local_client_count);
+	new_client->next = existing_client->next;
+	existing_client->next = new_client;
+}
+
+int add_client(struct local_client *new_client)
+{
+	_add_client(new_client, &local_client_head);
+
+	return 0;
+}
+
+/* Returns 0 if delfd is found and removed from list */
+static int _del_client(struct local_client *delfd)
+{
+	struct local_client *lastfd, *thisfd;
+
+	for (lastfd = &local_client_head; (thisfd = lastfd->next); lastfd = thisfd)
+		if (thisfd == delfd) {
+			DEBUGLOG("(%p) Removing listener for fd %d\n", thisfd, thisfd->fd);
+			lastfd->next = delfd->next;
+			_local_client_count--;
+			return 0;
+		}
+
+	return 1;
+}
+
 /* Structure of items on the LVM thread list */
 struct lvm_thread_cmd {
 	struct dm_list list;
@@ -92,6 +124,7 @@ static const size_t STACK_SIZE = 128 * 1024;
 static pthread_attr_t stack_attr;
 static int lvm_thread_exit = 0;
 static pthread_mutex_t lvm_thread_mutex;
+static pthread_mutex_t _debuglog_mutex = PTHREAD_MUTEX_INITIALIZER;
 static pthread_cond_t lvm_thread_cond;
 static pthread_barrier_t lvm_start_barrier;
 static struct dm_list lvm_cmd_head;
@@ -218,14 +251,17 @@ void debuglog(const char *fmt, ...)

 	switch (clvmd_get_debug()) {
 	case DEBUG_STDERR:
+		pthread_mutex_lock(&_debuglog_mutex);
 		va_start(ap,fmt);
 		time(&P);
 		fprintf(stderr, "CLVMD[%x]: %.15s ", (int)pthread_self(), ctime_r(&P, buf_ctime) + 4);
 		vfprintf(stderr, fmt, ap);
 		va_end(ap);
 		fflush(stderr);
+		pthread_mutex_unlock(&_debuglog_mutex);
 		break;
 	case DEBUG_SYSLOG:
+		pthread_mutex_lock(&_debuglog_mutex);
 		if (!syslog_init) {
 			openlog("clvmd", LOG_PID, LOG_DAEMON);
 			syslog_init = 1;
@@ -234,6 +270,7 @@ void debuglog(const char *fmt, ...)
 		va_start(ap,fmt);
 		vsyslog(LOG_DEBUG, fmt, ap);
 		va_end(ap);
+		pthread_mutex_unlock(&_debuglog_mutex);
 		break;
 	case DEBUG_OFF:
 		break;
@@ -584,6 +621,7 @@ int main(int argc, char *argv[])
 	local_client_head.fd = clops->get_main_cluster_fd();
 	local_client_head.type = CLUSTER_MAIN_SOCK;
 	local_client_head.callback = clops->cluster_fd_callback;
+	_local_client_count++;

 	/* Add the local socket to the list */
 	if (!(newfd = dm_zalloc(sizeof(struct local_client)))) {
@@ -594,14 +632,14 @@ int main(int argc, char *argv[])
 	newfd->fd = local_sock;
 	newfd->type = LOCAL_RENDEZVOUS;
 	newfd->callback = local_rendezvous_callback;
-	newfd->next = local_client_head.next;
-	local_client_head.next = newfd;
+
+	(void) add_client(newfd);

 	/* This needs to be started after cluster initialisation
 	   as it may need to take out locks */
 	DEBUGLOG("Starting LVM thread\n");
-	DEBUGLOG("Main cluster socket fd %d (%p) with local socket %d (%p)\n",
-		 local_client_head.fd, &local_client_head, newfd->fd, newfd);
+	DEBUGLOG("(%p) Main cluster socket fd %d with local socket %d (%p)\n",
+		 &local_client_head, local_client_head.fd, newfd->fd, newfd);

 	/* Don't let anyone else to do work until we are started */
 	if (pthread_create(&lvm_thread, &stack_attr, lvm_thread_fn, &lvm_params)) {
@@ -637,6 +675,7 @@ int main(int argc, char *argv[])

 	while ((delfd = local_client_head.next)) {
 		local_client_head.next = delfd->next;
+		_local_client_count--;
 		/* Failing cleanup_zombie leaks... */
 		if (delfd->type == LOCAL_SOCK && !cleanup_zombie(delfd))
 			cmd_client_cleanup(delfd); /* calls sync_unlock */
@@ -698,13 +737,13 @@ static int local_rendezvous_callback(struct local_client *thisfd, char *buf,
 		pthread_mutex_init(&newfd->bits.localsock.mutex, NULL);

 		if (fcntl(client_fd, F_SETFD, 1))
-			DEBUGLOG("Setting CLOEXEC on client fd failed: %s\n", strerror(errno));
+			DEBUGLOG("(%p) Setting CLOEXEC on client fd %d failed: %s\n", thisfd, client_fd, strerror(errno));

 		newfd->fd = client_fd;
 		newfd->type = LOCAL_SOCK;
 		newfd->callback = local_sock_callback;
 		newfd->bits.localsock.all_success = 1;
-		DEBUGLOG("Got new connection on fd %d (%p)\n", newfd->fd, newfd);
+		DEBUGLOG("(%p) Got new connection on fd %d\n", newfd, newfd->fd);
 		*new_client = newfd;
 	}
 	return 1;
@@ -726,8 +765,8 @@ static int local_pipe_callback(struct local_client *thisfd, char *buf,
 	if (len == sizeof(int))
 		memcpy(&status, buffer, sizeof(int));

-	DEBUGLOG("Read on pipe %d, %d bytes, status %d\n",
-		 thisfd->fd, len, status);
+	DEBUGLOG("(%p) Read on pipe %d, %d bytes, status %d\n",
+		 thisfd, thisfd->fd, len, status);

 	/* EOF on pipe or an error, close it */
 	if (len <= 0) {
@@ -750,11 +789,11 @@ static int local_pipe_callback(struct local_client *thisfd, char *buf,
 		}
 		return -1;
 	} else {
-		DEBUGLOG("Background routine status was %d, sock_client (%p)\n",
-			 status, sock_client);
+		DEBUGLOG("(%p) Background routine status was %d, sock_client %p\n",
+			 thisfd, status, sock_client);
 		/* But has the client gone away ?? */
 		if (!sock_client) {
-			DEBUGLOG("Got pipe response for dead client, ignoring it\n");
+			DEBUGLOG("(%p) Got pipe response for dead client, ignoring it\n", thisfd);
 		} else {
 			/* If error then just return that code */
 			if (status)
@@ -794,7 +833,7 @@ static void timedout_callback(struct local_client *client, const char *csid,
 		return;

 	clops->name_from_csid(csid, nodename);
-	DEBUGLOG("Checking for a reply from %s\n", nodename);
+	DEBUGLOG("(%p) Checking for a reply from %s\n", client, nodename);
 	pthread_mutex_lock(&client->bits.localsock.mutex);

 	reply = client->bits.localsock.replies;
@@ -804,7 +843,7 @@ static void timedout_callback(struct local_client *client, const char *csid,
 	pthread_mutex_unlock(&client->bits.localsock.mutex);

 	if (!reply) {
-		DEBUGLOG("Node %s timed-out\n", nodename);
+		DEBUGLOG("(%p) Node %s timed-out\n", client, nodename);
 		add_reply_to_list(client, ETIMEDOUT, csid,
 				  "Command timed out", 18);
 	}
@@ -819,7 +858,7 @@ static void timedout_callback(struct local_client *client, const char *csid,
 */
 static void request_timed_out(struct local_client *client)
 {
-	DEBUGLOG("Request timed-out. padding\n");
+	DEBUGLOG("(%p) Request timed-out. padding\n", client);
 	clops->cluster_do_node_callback(client, timedout_callback);

 	if (!client->bits.localsock.threadid)
@@ -853,13 +892,11 @@ static void main_loop(int cmd_timeout)
 	while (!quit) {
 		fd_set in;
 		int select_status;
-		struct local_client *thisfd;
+		struct local_client *thisfd, *nextfd;
 		struct timeval tv = { cmd_timeout, 0 };
 		int quorate = clops->is_quorate();
 		int client_count = 0;
 		int max_fd = 0;
-		struct local_client *lastfd = &local_client_head;
-		struct local_client *nextfd = local_client_head.next;

 		/* Wait on the cluster FD and all local sockets/pipes */
 		local_client_head.fd = clops->get_main_cluster_fd();
@@ -875,21 +912,22 @@ static void main_loop(int cmd_timeout)
 			fprintf(stderr, "WARNING: Your cluster may freeze up if the number of clvmd file descriptors (%d) exceeds %d.\n", max_fd + 1, FD_SETSIZE);
 		}

-		for (thisfd = &local_client_head; thisfd; thisfd = nextfd, nextfd = thisfd ? thisfd->next : NULL) {
+		for (thisfd = &local_client_head; thisfd; thisfd = nextfd) {
+			nextfd = thisfd->next;

 			if (thisfd->removeme && !cleanup_zombie(thisfd)) {
-				struct local_client *free_fd = thisfd;
-				lastfd->next = nextfd;
-				DEBUGLOG("removeme set for %p with %d monitored fds remaining\n", free_fd, client_count - 1);
+				/* cleanup_zombie might have removed the next list element */
+				nextfd = thisfd->next;
+
+				(void) _del_client(thisfd);
+
+				DEBUGLOG("(%p) removeme set with %d monitored fds remaining\n", thisfd, _local_client_count);

 				/* Queue cleanup, this also frees the client struct */
-				add_to_lvmqueue(free_fd, NULL, 0, NULL);
-
+				add_to_lvmqueue(thisfd, NULL, 0, NULL);
 				continue;
 			}

-			lastfd = thisfd;
-
 			if (thisfd->removeme)
 				continue;

@@ -939,16 +977,15 @@ static void main_loop(int cmd_timeout)
 						    type == CLUSTER_INTERNAL)
 							goto closedown;

-						DEBUGLOG("ret == %d, errno = %d. removing client\n",
-							 ret, errno);
+						DEBUGLOG("(%p) ret == %d, errno = %d. removing client\n",
+							 thisfd, ret, errno);
 						thisfd->removeme = 1;
 						continue;
 					}

 					/* New client...simply add it to the list */
 					if (newfd) {
-						newfd->next = thisfd->next;
-						thisfd->next = newfd;
+						_add_client(newfd, thisfd);
 						thisfd = newfd;
 					}
 				}
@@ -966,8 +1003,8 @@ static void main_loop(int cmd_timeout)
 				    thisfd->bits.localsock.expected_replies !=
 				    thisfd->bits.localsock.num_replies) {
 					/* Send timed out message + replies we already have */
-					DEBUGLOG("Request timed-out (send: %ld, now: %ld)\n",
-						 thisfd->bits.localsock.sent_time, the_time);
+					DEBUGLOG("Request to client %p timed-out (send: %ld, now: %ld)\n",
+						 thisfd, thisfd->bits.localsock.sent_time, the_time);

 					thisfd->bits.localsock.all_success = 0;

@@ -1068,31 +1105,31 @@ static void be_daemon(int timeout)
 		break;

 	default:       /* Parent */
+		(void) close(devnull);
 		(void) close(child_pipe[1]);
-		wait_for_child(child_pipe[0], timeout);
+		wait_for_child(child_pipe[0], timeout); /* noreturn */
 	}

 	/* Detach ourself from the calling environment */
-	if (close(0) || close(1) || close(2)) {
-		perror("Error closing terminal FDs");
-		exit(4);
-	}
-	setsid();
-
-	if (dup2(devnull, 0) < 0 || dup2(devnull, 1) < 0
-	    || dup2(devnull, 2) < 0) {
+	if ((dup2(devnull, STDIN_FILENO) == -1) ||
+	    (dup2(devnull, STDOUT_FILENO) == -1) ||
+	    (dup2(devnull, STDERR_FILENO) == -1)) {
 		perror("Error setting terminal FDs to /dev/null");
 		log_error("Error setting terminal FDs to /dev/null: %m");
 		exit(5);
 	}
+
 	if ((devnull > STDERR_FILENO) && close(devnull)) {
 		log_sys_error("close", "/dev/null");
 		exit(7);
 	}
+
 	if (chdir("/")) {
 		log_error("Error setting current directory to /: %m");
 		exit(6);
 	}
+
+	setsid();
 }

 static int verify_message(char *buf, int len)
@@ -1179,8 +1216,8 @@ static int cleanup_zombie(struct local_client *thisfd)
 	if (!thisfd->bits.localsock.cleanup_needed)
 		return 0;

-	DEBUGLOG("EOF on local socket: inprogress=%d\n",
-		 thisfd->bits.localsock.in_progress);
+	DEBUGLOG("(%p) EOF on local socket %d: inprogress=%d\n",
+		 thisfd, thisfd->fd, thisfd->bits.localsock.in_progress);

 	if ((pipe_client = thisfd->bits.localsock.pipe_client))
 		pipe_client = pipe_client->bits.pipe.client;
@@ -1202,7 +1239,7 @@ static int cleanup_zombie(struct local_client *thisfd)

 	/* Kill the subthread & free resources */
 	if (thisfd->bits.localsock.threadid) {
-		DEBUGLOG("Waiting for pre&post thread (%p)\n", pipe_client);
+		DEBUGLOG("(%p) Waiting for pre&post thread\n", pipe_client);
 		pthread_mutex_lock(&thisfd->bits.localsock.mutex);
 		thisfd->bits.localsock.state = PRE_COMMAND;
 		thisfd->bits.localsock.finished = 1;
@@ -1213,26 +1250,22 @@ static int cleanup_zombie(struct local_client *thisfd)
 					  (void **) &status)))
 			log_sys_error("pthread_join", "");

-		DEBUGLOG("Joined pre&post thread\n");
+		DEBUGLOG("(%p) Joined pre&post thread\n", pipe_client);

 		thisfd->bits.localsock.threadid = 0;

 		/* Remove the pipe client */
 		if (thisfd->bits.localsock.pipe_client) {
-			struct local_client *delfd;
-			struct local_client *lastfd;
+			struct local_client *delfd = thisfd->bits.localsock.pipe_client;

-			(void) close(thisfd->bits.localsock.pipe_client->fd);	/* Close pipe */
+			(void) close(delfd->fd);	/* Close pipe */
 			(void) close(thisfd->bits.localsock.pipe);

 			/* Remove pipe client */
-			for (lastfd = &local_client_head; (delfd = lastfd->next); lastfd = delfd)
-				if (thisfd->bits.localsock.pipe_client == delfd) {
-					thisfd->bits.localsock.pipe_client = NULL;
-					lastfd->next = delfd->next;
-					dm_free(delfd);
-					break;
-				}
+			if (!_del_client(delfd)) {
+				dm_free(delfd);
+				thisfd->bits.localsock.pipe_client = NULL;
+			}
 		}
 	}

@@ -1263,7 +1296,7 @@ static int read_from_local_sock(struct local_client *thisfd)
 	if (len == -1 && errno == EINTR)
 		return 1;

-	DEBUGLOG("Read on local socket %d, len = %d\n", thisfd->fd, len);
+	DEBUGLOG("(%p) Read on local socket %d, len = %d\n", thisfd, thisfd->fd, len);

 	if (len && verify_message(buffer, len) < 0) {
 		log_error("read_from_local_sock from %d len %d bad verify.",
@@ -1337,15 +1370,15 @@ static int read_from_local_sock(struct local_client *thisfd)
 		char *argptr = inheader->node + strlen(inheader->node) + 1;

 		while (missing_len > 0) {
-			DEBUGLOG("got %d bytes, need another %d (total %d)\n",
-				 argslen, missing_len, inheader->arglen);
+			DEBUGLOG("(%p) got %d bytes, need another %d (total %d)\n",
+				 thisfd, argslen, missing_len, inheader->arglen);
 			len = read(thisfd->fd, argptr + argslen, missing_len);
 			if (len == -1 && errno == EINTR)
 				continue;

 			if (len <= 0) {
 				/* EOF or error on socket */
-				DEBUGLOG("EOF on local socket\n");
+				DEBUGLOG("(%p) EOF on local socket\n", thisfd);
 				dm_free(thisfd->bits.localsock.cmd);
 				thisfd->bits.localsock.cmd = NULL;
 				return 0;
@@ -1373,7 +1406,7 @@ static int read_from_local_sock(struct local_client *thisfd)
 			.status = ENOENT
 		};

-		DEBUGLOG("Unknown node: '%s'\n", inheader->node);
+		DEBUGLOG("(%p) Unknown node: '%s'\n", thisfd, inheader->node);
 		send_message(&reply, sizeof(reply), our_csid, thisfd->fd,
 			     "Error sending ENOENT reply to local user");
 		thisfd->bits.localsock.expected_replies = 0;
@@ -1399,7 +1432,7 @@ static int read_from_local_sock(struct local_client *thisfd)
 			.status = EBUSY
 		};

-		DEBUGLOG("Creating pipe failed: %s\n", strerror(errno));
+		DEBUGLOG("(%p) Creating pipe failed: %s\n", thisfd, strerror(errno));
 		send_message(&reply, sizeof(reply), our_csid, thisfd->fd,
 			     "Error sending EBUSY reply to local user");
 		return len;
@@ -1419,7 +1452,7 @@ static int read_from_local_sock(struct local_client *thisfd)
 		return len;
 	}

-	DEBUGLOG("Creating pipe, [%d, %d]\n", comms_pipe[0], comms_pipe[1]);
+	DEBUGLOG("(%p) Creating pipe, [%d, %d]\n", thisfd, comms_pipe[0], comms_pipe[1]);

 	if (fcntl(comms_pipe[0], F_SETFD, 1))
 		DEBUGLOG("setting CLOEXEC on pipe[0] failed: %s\n", strerror(errno));
@@ -1430,8 +1463,8 @@ static int read_from_local_sock(struct local_client *thisfd)
 	newfd->type = THREAD_PIPE;
 	newfd->callback = local_pipe_callback;
 	newfd->bits.pipe.client = thisfd;
-	newfd->next = thisfd->next;
-	thisfd->next = newfd;
+
+	_add_client(newfd, thisfd);

 	/* Store a cross link to the pipe */
 	thisfd->bits.localsock.pipe_client = newfd;
@@ -1444,10 +1477,10 @@ static int read_from_local_sock(struct local_client *thisfd)
 	thisfd->bits.localsock.in_progress = TRUE;
 	thisfd->bits.localsock.state = PRE_COMMAND;
 	thisfd->bits.localsock.cleanup_needed = 1;
-	DEBUGLOG("Creating pre&post thread for pipe fd %d (%p)\n", newfd->fd, newfd);
+	DEBUGLOG("(%p) Creating pre&post thread for pipe fd %d\n", newfd, newfd->fd);
 	status = pthread_create(&thisfd->bits.localsock.threadid,
 				&stack_attr, pre_and_post_thread, thisfd);
-	DEBUGLOG("Created pre&post thread, state = %d\n", status);
+	DEBUGLOG("(%p) Created pre&post thread, state = %d\n", newfd, status);

 	return len;
 }
@@ -1455,13 +1488,6 @@ static int read_from_local_sock(struct local_client *thisfd)
 /* Add a file descriptor from the cluster or comms interface to
   our list of FDs for select
 */
-int add_client(struct local_client *new_client)
-{
-	new_client->next = local_client_head.next;
-	local_client_head.next = new_client;
-
-	return 0;
-}

 /* Called when the pre-command has completed successfully - we
   now execute the real command on all the requested nodes */
@@ -1472,8 +1498,8 @@ static int distribute_command(struct local_client *thisfd)
 	int len = thisfd->bits.localsock.cmd_len;

 	thisfd->xid = global_xid++;
-	DEBUGLOG("distribute command: XID = %d, flags=0x%x (%s%s)\n",
-		 thisfd->xid, inheader->flags,
+	DEBUGLOG("(%p) distribute command: XID = %d, flags=0x%x (%s%s)\n",
+		 thisfd, thisfd->xid, inheader->flags,
 		(inheader->flags & CLVMD_FLAG_LOCAL) ? "LOCAL" : "",
 		(inheader->flags & CLVMD_FLAG_REMOTE) ? "REMOTE" : "");

@@ -1495,7 +1521,7 @@ static int distribute_command(struct local_client *thisfd)
 			 */
 			add_to_lvmqueue(thisfd, inheader, len, NULL);

-			DEBUGLOG("Sending message to all cluster nodes\n");
+			DEBUGLOG("(%p) Sending message to all cluster nodes\n", thisfd);
 			inheader->xid = thisfd->xid;
 			send_message(inheader, len, NULL, -1,
 				     "Error forwarding message to cluster");
@@ -1514,11 +1540,11 @@ static int distribute_command(struct local_client *thisfd)

 			/* Are we the requested node ?? */
 			if (memcmp(csid, our_csid, max_csid_len) == 0) {
-				DEBUGLOG("Doing command on local node only\n");
+				DEBUGLOG("(%p) Doing command on local node only\n", thisfd);
 				add_to_lvmqueue(thisfd, inheader, len, NULL);
 			} else {
-				DEBUGLOG("Sending message to single node: %s\n",
-					 inheader->node);
+				DEBUGLOG("(%p) Sending message to single node: %s\n",
+					 thisfd, inheader->node);
 				inheader->xid = thisfd->xid;
 				send_message(inheader, len, csid, -1,
 					     "Error forwarding message to cluster node");
@@ -1529,7 +1555,7 @@ static int distribute_command(struct local_client *thisfd)
 		thisfd->bits.localsock.in_progress = TRUE;
 		thisfd->bits.localsock.expected_replies = 1;
 		thisfd->bits.localsock.num_replies = 0;
-		DEBUGLOG("Doing command explicitly on local node only\n");
+		DEBUGLOG("(%p) Doing command explicitly on local node only\n", thisfd);
 		add_to_lvmqueue(thisfd, inheader, len, NULL);
 	}

@@ -1655,7 +1681,7 @@ static void add_reply_to_list(struct local_client *client, int status,

 	reply->status = status;
 	clops->name_from_csid(csid, reply->node);
-	DEBUGLOG("Reply from node %s: %d bytes\n", reply->node, len);
+	DEBUGLOG("(%p) Reply from node %s: %d bytes\n", client, reply->node, len);

 	if (len > 0) {
 		if (!(reply->replymsg = dm_malloc(len)))
@@ -1682,8 +1708,8 @@ static void add_reply_to_list(struct local_client *client, int status,
 			client->bits.localsock.state = POST_COMMAND;
 			pthread_cond_signal(&client->bits.localsock.cond);
 		}
-		DEBUGLOG("Got %d replies, expecting: %d\n",
-			 client->bits.localsock.num_replies,
+		DEBUGLOG("(%p) Got %d replies, expecting: %d\n",
+			 client, client->bits.localsock.num_replies,
 			 client->bits.localsock.expected_replies);
 	}
 	pthread_mutex_unlock(&client->bits.localsock.mutex);
@@ -1698,7 +1724,7 @@ static __attribute__ ((noreturn)) void *pre_and_post_thread(void *arg)
 	sigset_t ss;
 	int pipe_fd = client->bits.localsock.pipe;

-	DEBUGLOG("Pre&post thread (%p), pipe fd %d\n", client, pipe_fd);
+	DEBUGLOG("(%p) Pre&post thread pipe fd %d\n", client, pipe_fd);
 	pthread_mutex_lock(&client->bits.localsock.mutex);

 	/* Ignore SIGUSR1 (handled by master process) but enable
@@ -1718,7 +1744,7 @@ static __attribute__ ((noreturn)) void *pre_and_post_thread(void *arg)
 		if ((status = do_pre_command(client)))
 			client->bits.localsock.all_success = 0;

-		DEBUGLOG("Pre&post thread (%p) writes status %d down to pipe fd %d\n",
+		DEBUGLOG("(%p) Pre&post thread writes status %d down to pipe fd %d\n",
 			 client, status, pipe_fd);

 		/* Tell the parent process we have finished this bit */
@@ -1736,13 +1762,13 @@ static __attribute__ ((noreturn)) void *pre_and_post_thread(void *arg)
 		/* We may need to wait for the condition variable before running the post command */
 		if (client->bits.localsock.state != POST_COMMAND &&
 		    !client->bits.localsock.finished) {
-			DEBUGLOG("Pre&post thread (%p) waiting to do post command, state = %d\n",
+			DEBUGLOG("(%p) Pre&post thread waiting to do post command, state = %d\n",
 				 client, client->bits.localsock.state);
 			pthread_cond_wait(&client->bits.localsock.cond,
 					  &client->bits.localsock.mutex);
 		}

-		DEBUGLOG("Pre&post thread (%p) got post command condition...\n", client);
+		DEBUGLOG("(%p) Pre&post thread got post command condition...\n", client);

 		/* POST function must always run, even if the client aborts */
 		status = 0;
@@ -1756,15 +1782,15 @@ static __attribute__ ((noreturn)) void *pre_and_post_thread(void *arg)
 next_pre:
 		if (client->bits.localsock.state != PRE_COMMAND &&
 		    !client->bits.localsock.finished) {
-			DEBUGLOG("Pre&post thread (%p) waiting for next pre command\n", client);
+			DEBUGLOG("(%p) Pre&post thread waiting for next pre command\n", client);
 			pthread_cond_wait(&client->bits.localsock.cond,
 					  &client->bits.localsock.mutex);
 		}

-		DEBUGLOG("Pre&post thread (%p) got pre command condition...\n", client);
+		DEBUGLOG("(%p) Pre&post thread got pre command condition...\n", client);
 	}
 	pthread_mutex_unlock(&client->bits.localsock.mutex);
-	DEBUGLOG("Pre&post thread (%p) finished\n", client);
+	DEBUGLOG("(%p) Pre&post thread finished\n", client);

 	pthread_exit(NULL);
 }
@@ -1782,8 +1808,8 @@ static int process_local_command(struct clvm_header *msg, int msglen,
 	if (!(replybuf = dm_malloc(max_cluster_message)))
 		return -1;

-	DEBUGLOG("process_local_command: %s msg=%p, msglen =%d, client=%p\n",
-		 decode_cmd(msg->cmd), msg, msglen, client);
+	DEBUGLOG("(%p) process_local_command: %s msg=%p, msglen =%d\n",
+		 client, decode_cmd(msg->cmd), msg, msglen);

 	/* If remote flag is set, just set a successful status code. */
 	if (msg->flags & CLVMD_FLAG_REMOTE)
@@ -1798,8 +1824,8 @@ static int process_local_command(struct clvm_header *msg, int msglen,
 	if (xid == client->xid)
 		add_reply_to_list(client, status, our_csid, replybuf, replylen);
 	else
-		DEBUGLOG("Local command took too long, discarding xid %d, current is %d\n",
-			 xid, client->xid);
+		DEBUGLOG("(%p) Local command took too long, discarding xid %d, current is %d\n",
+			 client, xid, client->xid);

 	dm_free(replybuf);

@@ -1841,7 +1867,7 @@ static void send_local_reply(struct local_client *client, int status, int fd)
 	char *ptr;
 	int message_len = 0;

-	DEBUGLOG("Send local reply\n");
+	DEBUGLOG("(%p) Send local reply\n", client);

 	/* Work out the total size of the reply */
 	while (thisreply) {
@@ -1858,7 +1884,7 @@ static void send_local_reply(struct local_client *client, int status, int fd)
 	/* Add in the size of our header */
 	message_len = message_len + sizeof(struct clvm_header);
 	if (!(replybuf = dm_malloc(message_len))) {
-		DEBUGLOG("Memory allocation fails\n");
+		DEBUGLOG("(%p) Memory allocation fails\n", client);
 		return;
 	}

@@ -1987,6 +2013,7 @@ static int send_message(void *buf, int msglen, const char *csid, int fd,
 				(void) nanosleep (&delay, &remtime);
 				continue;
 			}
+			DEBUGLOG("%s", errtext);
 			log_error("%s", errtext);
 			break;
 		}
@@ -2000,7 +2027,7 @@ static int process_work_item(struct lvm_thread_cmd *cmd)
 {
 	/* If msg is NULL then this is a cleanup request */
 	if (cmd->msg == NULL) {
-		DEBUGLOG("process_work_item: free %p\n", cmd->client);
+		DEBUGLOG("(%p) process_work_item: free\n", cmd->client);
 		cmd_client_cleanup(cmd->client);
 		pthread_mutex_destroy(&cmd->client->bits.localsock.mutex);
 		pthread_cond_destroy(&cmd->client->bits.localsock.cond);
@@ -2009,11 +2036,11 @@ static int process_work_item(struct lvm_thread_cmd *cmd)
 	}

 	if (!cmd->remote) {
-		DEBUGLOG("process_work_item: local\n");
+		DEBUGLOG("(%p) process_work_item: local\n", cmd->client);
 		process_local_command(cmd->msg, cmd->msglen, cmd->client,
 				      cmd->xid);
 	} else {
-		DEBUGLOG("process_work_item: remote\n");
+		DEBUGLOG("(%p) process_work_item: remote\n", cmd->client);
 		process_remote_command(cmd->msg, cmd->msglen, cmd->client->fd,
 				       cmd->csid);
 	}
@@ -2107,8 +2134,8 @@ static int add_to_lvmqueue(struct local_client *client, struct clvm_header *msg,
 	} else
 		cmd->remote = 0;

-	DEBUGLOG("add_to_lvmqueue: cmd=%p. client=%p, msg=%p, len=%d, csid=%p, xid=%d\n",
-		 cmd, client, msg, msglen, csid, cmd->xid);
+	DEBUGLOG("(%p) add_to_lvmqueue: cmd=%p, msg=%p, len=%d, csid=%p, xid=%d\n",
+		 client, cmd, msg, msglen, csid, cmd->xid);
 	pthread_mutex_lock(&lvm_thread_mutex);
 	if (lvm_thread_exit) {
 		pthread_mutex_unlock(&lvm_thread_mutex);
@@ -2124,6 +2151,14 @@ static int add_to_lvmqueue(struct local_client *client, struct clvm_header *msg,
 }

 /* Return 0 if we can talk to an existing clvmd */
+/*
+ * FIXME:
+ *
+ * This function returns only -1 or 0, but there are
+ * different levels of errors: some of them should stop
+ * further execution of clvmd, so another state is needed,
+ * and some error messages need to be only informational.
+ */
 static int check_local_clvmd(void)
 {
 	int local_socket;
@@ -2143,7 +2178,11 @@ static int check_local_clvmd(void)

 	if (connect(local_socket,(struct sockaddr *) &sockaddr,
 		    sizeof(sockaddr))) {
-		log_sys_error("connect", "local socket");
+		/* connection failure is an expected state */
+		if (errno == ENOENT)
+			log_sys_debug("connect", "local socket");
+		else
+			log_sys_error("connect", "local socket");
 		ret = -1;
 	}

@@ -2244,7 +2283,8 @@ static void check_all_callback(struct local_client *client, const char *csid,
   If not, returns -1 and prints out a list of errant nodes */
 static int check_all_clvmds_running(struct local_client *client)
 {
-	DEBUGLOG("check_all_clvmds_running\n");
+	DEBUGLOG("(%p) check_all_clvmds_running\n", client);
+
 	return clops->cluster_do_node_callback(client, check_all_callback);
 }

@@ -2283,13 +2323,11 @@ static void ntoh_clvm(struct clvm_header *hdr)
 static void sigusr2_handler(int sig)
 {
 	DEBUGLOG("SIGUSR2 received\n");
-	return;
 }

 static void sigterm_handler(int sig)
 {
 	quit = 1;
-	return;
 }

 static void sighup_handler(int sig)
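
Note on the clvmd.c hunks above: they replace hand-written list splicing with calls to `_add_client()` and `_del_client()` and keep `_local_client_count` in step, but the helper definitions fall outside this excerpt. The following is only a minimal sketch of what such helpers could look like, assuming a singly linked list with a sentinel head. The names `struct client`, `list_head`, `client_count`, `add_client_after` and `del_client` are hypothetical stand-ins, not the actual clvmd code; unlike clvmd, this sketch does not count the sentinel itself.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, trimmed-down stand-in for clvmd's struct local_client. */
struct client {
	int fd;
	struct client *next;
};

static struct client list_head;	/* sentinel, plays the role of local_client_head */
static int client_count;	/* plays the role of _local_client_count */

/* Link new_client directly after existing_client and bump the counter. */
static void add_client_after(struct client *new_client,
			     struct client *existing_client)
{
	assert(new_client && existing_client);

	new_client->next = existing_client->next;
	existing_client->next = new_client;
	client_count++;
}

/* Unlink client from the list. Returns 0 on success, -1 if it was not linked. */
static int del_client(struct client *client)
{
	struct client *prev;

	for (prev = &list_head; prev->next; prev = prev->next)
		if (prev->next == client) {
			prev->next = client->next;
			client->next = NULL;
			client_count--;
			return 0;
		}

	return -1;
}
```

Returning 0 on success matches how the diff calls `if (!_del_client(delfd))` before freeing the pipe client. Because the sketch clears `client->next` on removal, a caller walking the list has to read the next pointer before deleting, which is what the reworked `main_loop` does with `nextfd`.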
--- a/daemons/cmirrord/Makefile.in
+++ b/daemons/cmirrord/Makefile.in
@@ -29,7 +29,7 @@ include $(top_builddir)/make.tmpl
 LIBS += -ldevmapper
 LMLIBS += $(CPG_LIBS) $(SACKPT_LIBS)
 CFLAGS += $(CPG_CFLAGS) $(SACKPT_CFLAGS) $(EXTRA_EXEC_CFLAGS)
-LDFLAGS += $(EXTRA_EXEC_LDFLAGS)
+LDFLAGS += $(EXTRA_EXEC_LDFLAGS) $(ELDFLAGS)

 cmirrord: $(OBJECTS) $(top_builddir)/lib/liblvm-internal.a
 	$(CC) $(CFLAGS) $(LDFLAGS) -o $@ $(OBJECTS) \
--- a/daemons/cmirrord/cluster.c
+++ b/daemons/cmirrord/cluster.c
@@ -182,7 +182,7 @@ int cluster_send(struct clog_request *rq)
 	}

 	/*
-	 * Once the request heads for the cluster, the luid looses
+	 * Once the request heads for the cluster, the luid loses
 	 * all its meaning.
 	 */
 	rq->u_rq.luid = 0;
--- a/daemons/cmirrord/functions.c
+++ b/daemons/cmirrord/functions.c
@@ -377,7 +377,7 @@ static int _clog_ctr(char *uuid, uint64_t luid,
 	uint32_t block_on_error = 0;

 	int disk_log;
-	char disk_path[128];
+	char disk_path[PATH_MAX];
 	int unlink_path = 0;
 	long page_size;
 	int pages;
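
Note on the `disk_path[PATH_MAX]` change above: a larger buffer only helps if the code that fills it still checks for truncation. The sketch below shows one way to build a path into a `PATH_MAX` buffer and reject results that do not fit. `build_path` is a hypothetical helper, not a function from cmirrord, and the fallback definition of `PATH_MAX` is an assumption for platforms where `<limits.h>` does not provide it.

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>

#ifndef PATH_MAX
#define PATH_MAX 4096	/* assumed fallback; Linux normally defines this in <limits.h> */
#endif

/*
 * Build "<dir>/<name>" into a PATH_MAX-sized buffer.
 * Returns 0 on success, -1 if the result would be truncated.
 */
static int build_path(char *path, const char *dir, const char *name)
{
	int n;

	assert(path && dir && name);

	n = snprintf(path, PATH_MAX, "%s/%s", dir, name);

	/* snprintf reports the length it wanted to write, so n >= PATH_MAX means truncation */
	return (n < 0 || n >= PATH_MAX) ? -1 : 0;
}
```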
--- a/daemons/dmeventd/Makefile.in
+++ b/daemons/dmeventd/Makefile.in
@@ -56,18 +56,16 @@ include $(top_builddir)/make.tmpl
 all: device-mapper
 device-mapper: $(TARGETS)

-LIBS += -ldevmapper
-LVMLIBS += -ldevmapper-event $(PTHREAD_LIBS)
-
 CFLAGS_dmeventd.o += $(EXTRA_EXEC_CFLAGS)
+LIBS += -ldevmapper $(PTHREAD_LIBS)

 dmeventd: $(LIB_SHARED) dmeventd.o
-	$(CC) $(CFLAGS) $(LDFLAGS) $(EXTRA_EXEC_LDFLAGS) $(ELDFLAGS) -L. -o $@ dmeventd.o \
-	$(DL_LIBS) $(LVMLIBS) $(LIBS) -rdynamic
+	$(CC) $(CFLAGS) -L. $(LDFLAGS) $(EXTRA_EXEC_LDFLAGS) $(ELDFLAGS) dmeventd.o \
+		-o $@ $(DL_LIBS) $(DMEVENT_LIBS) $(LIBS)

 dmeventd.static: $(LIB_STATIC) dmeventd.o $(interfacebuilddir)/libdevmapper.a
-	$(CC) $(CFLAGS) $(LDFLAGS) $(ELDFLAGS) -static -L. -L$(interfacebuilddir) -o $@ \
-	dmeventd.o $(DL_LIBS) $(LVMLIBS) $(LIBS) $(STATIC_LIBS)
+	$(CC) $(CFLAGS) $(LDFLAGS) -static -L. -L$(interfacebuilddir) dmeventd.o \
+		-o $@ $(DL_LIBS) $(DMEVENT_LIBS) $(LIBS) $(STATIC_LIBS)

 ifeq ("@PKGCONFIG@", "yes")
  INSTALL_LIB_TARGETS += install_pkgconfig
--- a/daemons/dmeventd/dmeventd.c
+++ b/daemons/dmeventd/dmeventd.c
@@ -62,6 +62,8 @@

 #include <syslog.h>

+#define DM_SIGNALED_EXIT  1
+#define DM_SCHEDULED_EXIT 2
 static volatile sig_atomic_t _exit_now = 0;	/* set to '1' when signal is given to exit */

 /* List (un)link macros. */
@@ -752,6 +754,7 @@ static void *_timeout_thread(void *unused __attribute__((unused)))
 	struct thread_status *thread;
 	struct timespec timeout;
 	time_t curr_time;
+	int ret;

 	DEBUGLOG("Timeout thread starting.");
 	pthread_cleanup_push(_exit_timeout, NULL);
@@ -773,7 +776,10 @@ static void *_timeout_thread(void *unused __attribute__((unused)))
 				} else {
 					DEBUGLOG("Sending SIGALRM to Thr %x for timeout.",
 						 (int) thread->thread);
-					pthread_kill(thread->thread, SIGALRM);
+					ret = pthread_kill(thread->thread, SIGALRM);
+					if (ret && (ret != ESRCH))
+						log_error("Unable to wake up Thr %x for timeout: %s.",
+							  (int) thread->thread, strerror(ret));
 				}
 				_unlock_mutex();
 			}
@@ -863,6 +869,7 @@ static int _event_wait(struct thread_status *thread)
 	 * This is so that you can break out of waiting on an event,
 	 * either for a timeout event, or to cancel the thread.
 	 */
+	sigemptyset(&old);
 	sigemptyset(&set);
 	sigaddset(&set, SIGALRM);
 	if (pthread_sigmask(SIG_UNBLOCK, &set, &old) != 0) {
@@ -1750,7 +1757,7 @@ static void _init_thread_signals(void)
 */
 static void _exit_handler(int sig __attribute__((unused)))
 {
-	_exit_now = 1;
+	_exit_now = DM_SIGNALED_EXIT;
 }

 #ifdef __linux__
@@ -2248,11 +2255,14 @@ int main(int argc, char *argv[])
 	for (;;) {
 		if (_idle_since) {
 			if (_exit_now) {
+				if (_exit_now == DM_SCHEDULED_EXIT)
+					break; /* Only prints shutdown message */
 				log_info("dmeventd detected break while being idle "
 					 "for %ld second(s), exiting.",
 					 (long) (time(NULL) - _idle_since));
 				break;
-			} else if (idle_exit_timeout) {
+			}
+			if (idle_exit_timeout) {
 				now = time(NULL);
 				if (now < _idle_since)
 					_idle_since = now; /* clock change? */
@@ -2263,15 +2273,14 @@ int main(int argc, char *argv[])
 					break;
 				}
 			}
-		} else if (_exit_now) {
-			_exit_now = 0;
+		} else if (_exit_now == DM_SIGNALED_EXIT) {
+			_exit_now = DM_SCHEDULED_EXIT;
 			/*
 			 * When '_exit_now' is set, signal has been received,
 			 * but can not simply exit unless all
 			 * threads are done processing.
 			 */
-			log_warn("WARNING: There are still devices being monitored.");
-			log_warn("WARNING: Refusing to exit.");
+			log_info("dmeventd received break, scheduling exit.");
 		}
 		_process_request(&fifos);
 		_cleanup_unused_threads();
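
Note on the dmeventd.c main-loop change above: `_exit_now` stops being a plain flag and becomes a small state machine. A signal sets `DM_SIGNALED_EXIT`; if devices are still monitored, the loop records the request as `DM_SCHEDULED_EXIT` and keeps serving; once the daemon goes idle, either state ends the loop. The sketch below models only that transition logic. The names `exit_now`, `exit_handler` and `should_exit` are hypothetical, and the idle-timeout branch and logging are left out.

```c
#include <assert.h>
#include <signal.h>

#define SIGNALED_EXIT  1	/* a signal asked the daemon to exit */
#define SCHEDULED_EXIT 2	/* request noted, exit once idle */

static volatile sig_atomic_t exit_now;

static void exit_handler(int sig)
{
	(void) sig;
	exit_now = SIGNALED_EXIT;
}

/*
 * One pass of the main-loop check.
 * idle is nonzero when nothing is being monitored any more.
 * Returns 1 when the loop should break.
 */
static int should_exit(int idle)
{
	if (idle)
		return exit_now != 0;

	if (exit_now == SIGNALED_EXIT)
		exit_now = SCHEDULED_EXIT;	/* remember the request, keep serving */

	return 0;
}
```

Compared with the old behaviour, which reset the flag to 0 and warned "Refusing to exit", the request is no longer dropped: the daemon exits as soon as the last monitored device is gone.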
--- a/daemons/dmeventd/libdevmapper-event.c
+++ b/daemons/dmeventd/libdevmapper-event.c
@@ -250,10 +250,9 @@ static int _daemon_read(struct dm_event_fifos *fifos,
 		if (ret < 0) {
 			if ((errno == EINTR) || (errno == EAGAIN))
 				continue;
-			else {
-				log_error("Unable to read from event server.");
-				return 0;
-			}
+
+			log_error("Unable to read from event server.");
+			return 0;
 		}

 		bytes += ret;
@@ -329,10 +328,9 @@ static int _daemon_write(struct dm_event_fifos *fifos,
 		if (ret < 0) {
 			if ((errno == EINTR) || (errno == EAGAIN))
 				continue;
-			else {
-				log_error("Unable to talk to event daemon.");
-				return 0;
-			}
+
+			log_error("Unable to talk to event daemon.");
+			return 0;
 		}

 		bytes += ret;
@@ -454,7 +452,8 @@ static int _start_daemon(char *dmeventd_path, struct dm_event_fifos *fifos)
 		if (close(fifos->client))
 			log_sys_debug("close", fifos->client_path);
 		return 1;
-	} else if (errno != ENXIO && errno != ENOENT)  {
+	}
+	if (errno != ENXIO && errno != ENOENT)  {
 		/* problem */
 		log_sys_error("open", fifos->client_path);
 		return 0;
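
Note on the `_daemon_read()` and `_daemon_write()` cleanups above: both keep the same retry rule, where `EINTR` and `EAGAIN` mean "try again" and anything else is a real failure. The sketch below shows that rule in a standalone read loop. `read_fully` is a hypothetical helper, not part of libdevmapper-event, and it omits the `select()` timeout handling the real function performs, so on a non-blocking descriptor it would spin on `EAGAIN`.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Read exactly size bytes from fd into buf.
 * Returns 1 on success, 0 on a real error or premature EOF.
 */
static int read_fully(int fd, void *buf, size_t size)
{
	char *p = buf;
	size_t bytes = 0;
	ssize_t ret;

	assert(buf);

	while (bytes < size) {
		ret = read(fd, p + bytes, size - bytes);
		if (ret < 0) {
			if ((errno == EINTR) || (errno == EAGAIN))
				continue;	/* transient condition, retry */

			return 0;		/* real error */
		}

		if (ret == 0)
			return 0;		/* EOF before all data arrived */

		bytes += (size_t) ret;
	}

	return 1;
}
```

Dropping the `else` after `continue`, as the diff does, changes nothing at run time; it only removes one level of nesting around the error path.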
--- a/daemons/dmeventd/plugins/mirror/dmeventd_mirror.c
+++ b/daemons/dmeventd/plugins/mirror/dmeventd_mirror.c
@@ -1,5 +1,5 @@
 /*
- * Copyright (C) 2005-2015 Red Hat, Inc. All rights reserved.
+ * Copyright (C) 2005-2017 Red Hat, Inc. All rights reserved.
 *
 * This file is part of LVM2.
 *
@@ -25,7 +25,6 @@

 struct dso_state {
 	struct dm_pool *mem;
-	char cmd_lvscan[512];
 	char cmd_lvconvert[512];
 };

@@ -99,21 +98,14 @@ static int _get_mirror_event(struct dso_state *state, char *params)
 	return r;
 }

-static int _remove_failed_devices(const char *cmd_lvscan, const char *cmd_lvconvert,
-				  const char *device)
+static int _remove_failed_devices(const char *cmd_lvconvert, const char *device)
 {
-	if (!dmeventd_lvm2_run_with_lock(cmd_lvscan))
-		log_warn("WARNING: Re-scan of mirrored device %s failed.", device);
-
 	/* if repair goes OK, report success even if lvscan has failed */
 	if (!dmeventd_lvm2_run_with_lock(cmd_lvconvert)) {
 		log_error("Repair of mirrored device %s failed.", device);
 		return 0;
 	}

-	if (!dmeventd_lvm2_run_with_lock(cmd_lvscan))
-		log_warn("WARNING: Re-scan of mirrored device %s failed.", device);
-
 	log_info("Repair of mirrored device %s finished successfully.", device);

 	return 1;
@@ -154,9 +146,7 @@ void process_event(struct dm_task *dmt,
 			break;
 		case ME_FAILURE:
 			log_error("Device failure in %s.", device);
-			if (!_remove_failed_devices(state->cmd_lvscan,
-						    state->cmd_lvconvert,
-						    device))
+			if (!_remove_failed_devices(state->cmd_lvconvert, device))
 				/* FIXME Why are all the error return codes unused? Get rid of them? */
 				log_error("Failed to remove faulty devices in %s.",
 					  device);
@@ -186,12 +176,9 @@ int register_device(const char *device,
 	if (!dmeventd_lvm2_init_with_pool("mirror_state", state))
 		goto_bad;

-	if (!dmeventd_lvm2_command(state->mem, state->cmd_lvscan, sizeof(state->cmd_lvscan),
-				   "lvscan --cache", device))
-		goto_bad;
-
+	/* CANNOT use --config as this disables cached content */
 	if (!dmeventd_lvm2_command(state->mem, state->cmd_lvconvert, sizeof(state->cmd_lvconvert),
-				   "lvconvert --config global{use_lvmetad = 0}' --repair --use-policies", device))
+				   "lvconvert --repair --use-policies", device))
 		goto_bad;

 	*user = state;
--- a/daemons/dmeventd/plugins/raid/dmeventd_raid.c
+++ b/daemons/dmeventd/plugins/raid/dmeventd_raid.c
@@ -1,5 +1,5 @@
 /*
- * Copyright (C) 2005-2016 Red Hat, Inc. All rights reserved.
+ * Copyright (C) 2005-2017 Red Hat, Inc. All rights reserved.
 *
 * This file is part of LVM2.
 *
@@ -22,7 +22,6 @@

 struct dso_state {
 	struct dm_pool *mem;
-	char cmd_lvscan[512];
 	char cmd_lvconvert[512];
 	uint64_t raid_devs[RAID_DEVS_ELEMS];
 	int failed;
@@ -59,6 +58,23 @@ static int _process_raid_event(struct dso_state *state, char *params, const char
 		dead = 1;
 	}

+	/*
+	 * if we are converting from non-RAID to RAID (e.g. linear -> raid1)
+	 * and too many original devices die, such that we cannot continue
+	 * the "recover" operation, the sync action will go to "idle", the
+	 * unsynced devs will remain at 'a', and the original devices will
+	 * NOT SWITCH TO 'D', but will remain at 'A' - hoping to be revived.
+	 *
+	 * This is simply the way the kernel works...
+	 */
+	if (!strcmp(status->sync_action, "idle") &&
+	    (status->dev_health[0] == 'a') &&
+	    (status->insync_regions < status->total_regions)) {
+		log_error("Primary sources for new RAID, %s, have failed.",
+			  device);
+		dead = 1; /* run it through LVM repair */
+	}
+
 	if (dead) {
 		if (status->insync_regions < status->total_regions) {
 			if (!state->warned) {
@@ -74,8 +90,6 @@ static int _process_raid_event(struct dso_state *state, char *params, const char
 			goto out; /* already reported */

 		state->failed = 1;
-		if (!dmeventd_lvm2_run_with_lock(state->cmd_lvscan))
-			log_warn("WARNING: Re-scan of RAID device %s failed.", device);

 		/* if repair goes OK, report success even if lvscan has failed */
 		if (!dmeventd_lvm2_run_with_lock(state->cmd_lvconvert)) {
@@ -84,6 +98,8 @@ static int _process_raid_event(struct dso_state *state, char *params, const char
 		}
 	} else {
 		state->failed = 0;
+		if (status->insync_regions == status->total_regions)
+			memset(&state->raid_devs, 0, sizeof(state->raid_devs));
 		log_info("%s array, %s, is %s in-sync.",
 			 status->raid_type, device,
 			 (status->insync_regions == status->total_regions) ? "now" : "not");
@@ -136,11 +152,8 @@ int register_device(const char *device,
 	if (!dmeventd_lvm2_init_with_pool("raid_state", state))
 		goto_bad;

-	if (!dmeventd_lvm2_command(state->mem, state->cmd_lvscan, sizeof(state->cmd_lvscan),
-				   "lvscan --cache", device) ||
-	    !dmeventd_lvm2_command(state->mem, state->cmd_lvconvert, sizeof(state->cmd_lvconvert),
-				   "lvconvert --config devices{ignore_suspended_devices=1} "
-				   "--repair --use-policies", device))
+	if (!dmeventd_lvm2_command(state->mem, state->cmd_lvconvert, sizeof(state->cmd_lvconvert),
+				   "lvconvert --repair --use-policies", device))
 		goto_bad;

 	*user = state;
--- a/daemons/dmeventd/plugins/snapshot/dmeventd_snapshot.c
+++ b/daemons/dmeventd/plugins/snapshot/dmeventd_snapshot.c
@@ -231,7 +231,7 @@ void process_event(struct dm_task *dmt,

 		if (percent >= WARNING_THRESH) /* Print a warning to syslog. */
 			log_warn("WARNING: Snapshot %s is now %.2f%% full.",
-				 device, dm_percent_to_float(percent));
+				 device, dm_percent_to_round_float(percent, 2));

 		/* Try to extend the snapshot, in accord with user-set policies */
 		if (!_extend(state->cmd_lvextend))
--- a/daemons/dmeventd/plugins/thin/dmeventd_thin.c
+++ b/daemons/dmeventd/plugins/thin/dmeventd_thin.c
@@ -47,10 +47,8 @@ struct dso_state {
 	struct dm_pool *mem;
 	int metadata_percent_check;
 	int metadata_percent;
-	int metadata_warn_once;
 	int data_percent_check;
 	int data_percent;
-	int data_warn_once;
 	uint64_t known_metadata_size;
 	uint64_t known_data_size;
 	unsigned fails;
@@ -64,8 +62,6 @@ struct dso_state {

 DM_EVENT_LOG_FN("thin")

-#define UUID_PREFIX "LVM-"
-
 static int _run_command(struct dso_state *state)
 {
 	char val[3][36];
@@ -174,8 +170,8 @@ void process_event(struct dm_task *dmt,

 #if THIN_DEBUG
 	log_debug("Watch for tp-data:%.2f%%  tp-metadata:%.2f%%.",
-		  dm_percent_to_float(state->data_percent_check),
-		  dm_percent_to_float(state->metadata_percent_check));
+		  dm_percent_to_round_float(state->data_percent_check, 2),
+		  dm_percent_to_round_float(state->metadata_percent_check, 2));
 #endif
 	if (!_wait_for_pid(state)) {
 		log_warn("WARNING: Skipping event, child %d is still running (%s).",
@@ -253,11 +249,10 @@ void process_event(struct dm_task *dmt,
 	 * action is called for:  >50%, >55% ... >95%, 100%
 	 */
 	state->metadata_percent = dm_make_percent(tps->used_metadata_blocks, tps->total_metadata_blocks);
-	if (state->metadata_percent <= WARNING_THRESH)
-		state->metadata_warn_once = 0; /* Dropped bellow threshold, reset warn once */
-	else if (!state->metadata_warn_once++) /* Warn once when raised above threshold */
+	if ((state->metadata_percent > WARNING_THRESH) &&
+	    (state->metadata_percent > state->metadata_percent_check))
 		log_warn("WARNING: Thin pool %s metadata is now %.2f%% full.",
-			 device, dm_percent_to_float(state->metadata_percent));
+			 device, dm_percent_to_round_float(state->metadata_percent, 2));
 	if (state->metadata_percent > CHECK_MINIMUM) {
 		/* Run action when usage raised more than CHECK_STEP since the last time */
 		if (state->metadata_percent > state->metadata_percent_check)
@@ -269,11 +264,10 @@ void process_event(struct dm_task *dmt,
 		state->metadata_percent_check = CHECK_MINIMUM;

 	state->data_percent = dm_make_percent(tps->used_data_blocks, tps->total_data_blocks);
-	if (state->data_percent <= WARNING_THRESH)
-		state->data_warn_once = 0;
-	else if (!state->data_warn_once++)
+	if ((state->data_percent > WARNING_THRESH) &&
+	    (state->data_percent > state->data_percent_check))
 		log_warn("WARNING: Thin pool %s data is now %.2f%% full.",
-			 device, dm_percent_to_float(state->data_percent));
+			 device, dm_percent_to_round_float(state->data_percent, 2));
 	if (state->data_percent > CHECK_MINIMUM) {
 		/* Run action when usage raised more than CHECK_STEP since the last time */
 		if (state->data_percent > state->data_percent_check)
--- a/daemons/dmfilemapd/Makefile.in
+++ b/daemons/dmfilemapd/Makefile.in
@@ -34,17 +34,16 @@ include $(top_builddir)/make.tmpl
 all: device-mapper
 device-mapper: $(TARGETS)

+CFLAGS_dmfilemapd.o += $(EXTRA_EXEC_CFLAGS)
 LIBS += -ldevmapper

-CFLAGS_dmfilemapd.o += $(EXTRA_EXEC_CFLAGS)
-
 dmfilemapd: $(LIB_SHARED) dmfilemapd.o
-	$(CC) $(CFLAGS) $(LDFLAGS) $(EXTRA_EXEC_LDFLAGS) $(ELDFLAGS) -L. -o $@ dmfilemapd.o \
-	$(DL_LIBS) $(LVMLIBS) $(LIBS) -rdynamic
+	$(CC) $(CFLAGS) $(LDFLAGS) $(EXTRA_EXEC_LDFLAGS) $(ELDFLAGS) \
+		-o $@ dmfilemapd.o $(DL_LIBS) $(LIBS)

 dmfilemapd.static: $(LIB_STATIC) dmfilemapd.o $(interfacebuilddir)/libdevmapper.a
-	$(CC) $(CFLAGS) $(LDFLAGS) $(ELDFLAGS) -static -L. -L$(interfacebuilddir) -o $@ \
-	dmfilemapd.o $(DL_LIBS) $(LVMLIBS) $(LIBS) $(STATIC_LIBS)
+	$(CC) $(CFLAGS) $(LDFLAGS) $(ELDFLAGS) -static -L$(interfacebuilddir) \
+		-o $@ dmfilemapd.o $(DL_LIBS) $(LIBS) $(STATIC_LIBS)

 ifneq ("$(CFLOW_CMD)", "")
 CFLOW_SOURCES = $(addprefix $(srcdir)/, $(SOURCES))
@@ -60,9 +59,8 @@ install_dmfilemapd_dynamic: dmfilemapd
 install_dmfilemapd_static: dmfilemapd.static
 	$(INSTALL_PROGRAM) -D $< $(staticdir)/$(<F)

-install_dmfilemapd: $(INSTALL_DMEVENTD_TARGETS)
+install_dmfilemapd: $(INSTALL_DMFILEMAPD_TARGETS)

 install: install_dmfilemapd

 install_device-mapper: install_dmfilemapd
-
--- a/daemons/dmfilemapd/dmfilemapd.c
+++ b/daemons/dmfilemapd/dmfilemapd.c
@@ -55,7 +55,7 @@ struct filemap_monitor {

 	/* monitoring heuristics */
 	int64_t blocks; /* allocated blocks, from stat.st_blocks */
-	int64_t nr_regions;
+	uint64_t nr_regions;
 	int deleted;
 };

@@ -151,7 +151,7 @@ static int _is_open_in_pid(pid_t pid, const char *path)
 	if (dm_snprintf(path_buf, sizeof(path_buf),
 			DEFAULT_PROC_DIR "%d/fd", pid) < 0) {
 		log_error("Could not format pid path.");
-		goto bad;
+		return 0;
 	}

 	/*
@@ -160,12 +160,13 @@ static int _is_open_in_pid(pid_t pid, const char *path)
 	if (dm_snprintf(deleted_path, sizeof(deleted_path), "%s %s",
 			path, PROC_FD_DELETED_STR) < 0) {
 		log_error("Could not format check path.");
+		return 0;
 	}

 	pid_d = opendir(path_buf);
 	if (!pid_d) {
 		log_error("Could not open proc path: %s.", path_buf);
-		goto bad;
+		return 0;
 	}

 	while ((pid_dp = readdir(pid_d)) != NULL) {
@@ -179,13 +180,16 @@ static int _is_open_in_pid(pid_t pid, const char *path)
 		}
 		link_buf[len] = '\0';
 		if (!strcmp(deleted_path, link_buf)) {
-			closedir(pid_d);
+			if (closedir(pid_d))
+				log_sys_error("closedir", path_buf);
 			return 1;
 		}
 	}

 bad:
-	closedir(pid_d);
+	if (closedir(pid_d))
+		log_sys_error("closedir", path_buf);
+
 	return 0;
 }

@@ -221,15 +225,20 @@ static int _is_open(const char *path)
 	while ((proc_dp = readdir(proc_d)) != NULL) {
 		if (!isdigit(proc_dp->d_name[0]))
 			continue;
-		pid = strtol(proc_dp->d_name, NULL, 10);
-		if (!pid)
+		errno = 0;
+		pid = (pid_t) strtol(proc_dp->d_name, NULL, 10);
+		if (errno || !pid)
 			continue;
 		if (_is_open_in_pid(pid, path)) {
-			closedir(proc_d);
+			if (closedir(proc_d))
+				log_sys_error("closedir", DEFAULT_PROC_DIR);
 			return 1;
 		}
 	}
-	closedir(proc_d);
+
+	if (closedir(proc_d))
+		log_sys_error("closedir", DEFAULT_PROC_DIR);
+
 	return 0;
 }

@@ -258,8 +267,6 @@ static int _parse_args(int argc, char **argv, struct filemap_monitor *fm)
 		return 0;
 	}

-	memset(fm, 0, sizeof(*fm));
-
 	/*
 	 * We don't know the true nr_regions at daemon start time,
 	 * and it is not worth a dm_stats_list()/group walk to count:
@@ -272,8 +279,9 @@ static int _parse_args(int argc, char **argv, struct filemap_monitor *fm)
 	fm->nr_regions = 1;

 	/* parse <fd> */
-	fm->fd = strtol(argv[0], &endptr, 10);
-	if (*endptr) {
+	errno = 0;
+	fm->fd = (int) strtol(argv[0], &endptr, 10);
+	if (errno || *endptr) {
 		_early_log("Could not parse file descriptor: %s", argv[0]);
 		return 0;
 	}
@@ -282,8 +290,9 @@ static int _parse_args(int argc, char **argv, struct filemap_monitor *fm)
 	argv++;

 	/* parse <group_id> */
+	errno = 0;
 	fm->group_id = strtoull(argv[0], &endptr, 10);
-	if (*endptr) {
+	if (*endptr || errno) {
 		_early_log("Could not parse group identifier: %s", argv[0]);
 		return 0;
 	}
@@ -297,7 +306,7 @@ static int _parse_args(int argc, char **argv, struct filemap_monitor *fm)
 		return 0;
 	}

-	if (argv[0] != '/') {
+	if (*argv[0] != '/') {
 		_early_log("Path argument must specify an absolute path.");
 		return 0;
 	}
@@ -326,8 +335,9 @@ static int _parse_args(int argc, char **argv, struct filemap_monitor *fm)

 	/* parse [<foreground>[<verbose>]] */
 	if (argc) {
-		_foreground = strtol(argv[0], &endptr, 10);
-		if (*endptr) {
+		errno = 0;
+		_foreground = (int) strtol(argv[0], &endptr, 10);
+		if (errno || *endptr) {
 			_early_log("Could not parse debug argument: %s.",
 				   argv[0]);
 			return 0;
@@ -335,8 +345,9 @@ static int _parse_args(int argc, char **argv, struct filemap_monitor *fm)
 		argc--;
 		argv++;
 		if (argc) {
-			_verbose = strtol(argv[0], &endptr, 10);
-			if (*endptr) {
+			errno = 0;
+			_verbose = (int) strtol(argv[0], &endptr, 10);
+			if (errno || *endptr) {
 				_early_log("Could not parse verbose "
 					   "argument: %s", argv[0]);
 				return 0;
@@ -351,30 +362,33 @@ static int _parse_args(int argc, char **argv, struct filemap_monitor *fm)
 	return 1;
 }

-static int _filemap_fd_check_changed(struct filemap_monitor *fm)
+static int _filemap_fd_update_blocks(struct filemap_monitor *fm)
 {
-	int64_t blocks, old_blocks;
 	struct stat buf;

 	if (fm->fd < 0) {
 		log_error("Filemap fd is not open.");
-		return -1;
+		return 0;
 	}

 	if (fstat(fm->fd, &buf)) {
 		log_error("Failed to fstat filemap file descriptor.");
-		return -1;
+		return 0;
 	}

-	blocks = buf.st_blocks;
+	fm->blocks = buf.st_blocks;

-	/* first check? */
-	if (fm->blocks < 0)
-		old_blocks = buf.st_blocks;
-	else
-		old_blocks = fm->blocks;
+	return 1;
+}

-	fm->blocks = blocks;
+static int _filemap_fd_check_changed(struct filemap_monitor *fm)
+{
+	int64_t old_blocks;
+
+	old_blocks = fm->blocks;
+
+	if (!_filemap_fd_update_blocks(fm))
+		return -1;

 	return (fm->blocks != old_blocks);
 }
@@ -401,13 +415,13 @@ static int _filemap_monitor_set_notify(struct filemap_monitor *fm)
 	 * and does not fork or exec.
 	 */
 	if ((inotify_fd = inotify_init1(IN_NONBLOCK)) < 0) {
-		_early_log("Failed to initialise inotify.");
+		log_sys_error("inotify_init1", "IN_NONBLOCK");
 		return 0;
 	}

 	if ((watch_fd = inotify_add_watch(inotify_fd, fm->path,
 					  IN_MODIFY | IN_DELETE_SELF)) < 0) {
-		_early_log("Failed to add inotify watch.");
+		log_sys_error("inotify_add_watch", fm->path);
 		return 0;
 	}
 	fm->inotify_fd = inotify_fd;
@@ -525,6 +539,7 @@ static void _filemap_monitor_destroy(struct filemap_monitor *fm)
 		_filemap_monitor_close_fd(fm);
 	}
 	dm_free((void *) fm->program_id);
+	dm_free(fm->path);
 }

 static int _filemap_monitor_check_same_file(int fd1, int fd2)
@@ -551,19 +566,23 @@ static int _filemap_monitor_check_file_unlinked(struct filemap_monitor *fm)
 {
 	char path_buf[PATH_MAX];
 	char link_buf[PATH_MAX];
-	int same, fd, len;
+	int same, fd;
+	ssize_t len;

 	fm->deleted = 0;
+	same = 0;

 	if ((fd = open(fm->path, O_RDONLY)) < 0)
 		goto check_unlinked;

-	if ((same = _filemap_monitor_check_same_file(fm->fd, fd)) < 0)
-		return 0;
+	same = _filemap_monitor_check_same_file(fm->fd, fd);

 	if (close(fd))
 		log_error("Error closing fd %d", fd);

+	if (same < 0)
+		return 0;
+
 	if (same)
 		return 1;

@@ -578,24 +597,27 @@ check_unlinked:
 		log_error("Could not format pid path.");
 		return 0;
 	}
-	if ((len = readlink(path_buf, link_buf, sizeof(link_buf))) < 0) {
+	if ((len = readlink(path_buf, link_buf, sizeof(link_buf) - 1)) < 0) {
 		log_error("readlink failed for " DEFAULT_PROC_DIR "/%d/fd/%d.",
 			  getpid(), fm->fd);
 		return 0;
 	}
+	link_buf[len] = '\0';

 	/*
 	 * Try to re-open the file, from the path now reported in /proc/pid/fd.
 	 */
 	if ((fd = open(link_buf, O_RDONLY)) < 0)
 		fm->deleted = 1;
+	else
+		same = _filemap_monitor_check_same_file(fm->fd, fd);

-	if ((same = _filemap_monitor_check_same_file(fm->fd, fd)) < 0)
-		return 0;
-
-	if ((fd > 0) && close(fd))
+	if ((fd >= 0) && close(fd))
 		log_error("Error closing fd %d", fd);

+	if (same < 0)
+		return 0;
+
 	/* Should not happen with normal /proc. */
 	if ((fd > 0) && !same) {
 		log_error("File descriptor mismatch: %d and %s (read from %s) "
@@ -645,11 +667,11 @@ static int _daemonise(struct filemap_monitor *fm)
 			return 0;
 		}
 	}
-
-	for (fd = sysconf(_SC_OPEN_MAX) - 1; fd > STDERR_FILENO; fd--) {
+	/* TODO: Use libdaemon/server/daemon-server.c _daemonise() */
+	for (fd = (int) sysconf(_SC_OPEN_MAX) - 1; fd > STDERR_FILENO; fd--) {
 		if (fd == fm->fd)
 			continue;
-		close(fd);
+		(void) close(fd);
 	}

 	return 1;
@@ -669,12 +691,15 @@ static int _update_regions(struct dm_stats *dms, struct filemap_monitor *fm)
 	for (region = regions; *region != DM_STATS_REGIONS_ALL; region++)
 		nr_regions++;

-	if (regions[0] != fm->group_id) {
+	if (!nr_regions)
+		log_warn("File contains no extents: exiting.");
+
+	if (nr_regions && (regions[0] != fm->group_id)) {
 		log_warn("group_id changed from " FMTu64 " to " FMTu64,
 			 fm->group_id, regions[0]);
 		fm->group_id = regions[0];
 	}
-
+	dm_free(regions);
 	fm->nr_regions = nr_regions;
 	return 1;
 }
@@ -689,7 +714,8 @@ static int _dmfilemapd(struct filemap_monitor *fm)
 	 * The correct program_id is retrieved from the group leader
 	 * following the call to dm_stats_list().
 	 */
-	dms = dm_stats_create(NULL);
+	if (!(dms = dm_stats_create(NULL)))
+		goto_bad;

 	if (!dm_stats_bind_from_fd(dms, fm->fd)) {
 		log_error("Could not bind dm_stats handle to file descriptor "
@@ -700,6 +726,9 @@ static int _dmfilemapd(struct filemap_monitor *fm)
 	if (!_filemap_monitor_set_notify(fm))
 		goto bad;

+	if (!_filemap_fd_update_blocks(fm))
+		goto bad;
+
 	if (!dm_stats_list(dms, DM_STATS_ALL_PROGRAMS)) {
 		log_error("Failed to list stats handle.");
 		goto bad;
@@ -733,17 +762,16 @@ static int _dmfilemapd(struct filemap_monitor *fm)
 		if ((check = _filemap_fd_check_changed(fm)) < 0)
 			goto bad;

-		if (!check)
-			goto wait;
-
-		if (!_update_regions(dms, fm))
+		if (check && !_update_regions(dms, fm))
 			goto bad;

+		running = !!fm->nr_regions;
+		if (!running)
+			continue;
+
 wait:
 		_filemap_monitor_wait(FILEMAPD_WAIT_USECS);

-		running = !!fm->nr_regions;
-
 		/* mode=inode termination condions */
 		if (fm->mode == DM_FILEMAPD_FOLLOW_INODE) {
 			if (!_filemap_monitor_check_file_unlinked(fm))
@@ -786,8 +814,12 @@ int main(int argc, char **argv)
 {
 	struct filemap_monitor fm;

-	if (!_parse_args(argc, argv, &fm))
+	memset(&fm, 0, sizeof(fm));
+
+	if (!_parse_args(argc, argv, &fm)) {
+		dm_free(fm.path);
 		return 1;
+	}

 	_setup_logging();

--- a/daemons/lvmdbusd/.gitignore
+++ b/daemons/lvmdbusd/.gitignore
@@ -1 +1,4 @@
 path.py
+lvmdbusd
+lvmdb.py
+lvm_shell_proxy.py
--- a/daemons/lvmdbusd/Makefile.in
+++ b/daemons/lvmdbusd/Makefile.in
@@ -26,9 +26,7 @@ LVMDBUS_SRCDIR_FILES = \
 	__init__.py \
 	job.py \
 	loader.py \
-	lvmdb.py \
 	main.py \
-	lvm_shell_proxy.py \
 	lv.py \
 	manager.py \
 	objectmanager.py \
@@ -40,14 +38,19 @@ LVMDBUS_SRCDIR_FILES = \
 	vg.py

 LVMDBUS_BUILDDIR_FILES = \
+	lvmdb.py \
+	lvm_shell_proxy.py \
 	path.py

-LVMDBUSD = $(srcdir)/lvmdbusd
+LVMDBUSD = lvmdbusd

 include $(top_builddir)/make.tmpl

 .PHONY: install_lvmdbusd

+all:
+	test -x $(LVMDBUSD) || chmod 755 $(LVMDBUSD)
+
 install_lvmdbusd:
 	$(INSTALL_DIR) $(sbindir)
 	$(INSTALL_SCRIPT) $(LVMDBUSD) $(sbindir)
@@ -63,4 +66,5 @@ install_lvm2: install_lvmdbusd
 install: install_lvm2

 DISTCLEAN_TARGETS+= \
-	$(LVMDBUS_BUILDDIR_FILES)
+	$(LVMDBUS_BUILDDIR_FILES) \
+	$(LVMDBUSD)
--- a/daemons/lvmdbusd/automatedproperties.py
+++ b/daemons/lvmdbusd/automatedproperties.py
@@ -100,7 +100,7 @@ class AutomatedProperties(dbus.service.Object):
 		raise dbus.exceptions.DBusException(
 			obj._ap_interface,
 			'The object %s does not implement the %s interface'
-			% (self.__class__, interface_name))
+			% (obj.__class__, interface_name))

 	@dbus.service.method(dbus_interface=dbus.PROPERTIES_IFACE,
 							in_signature='s', out_signature='a{sv}',
--- a/daemons/lvmdbusd/background.py
+++ b/daemons/lvmdbusd/background.py
@@ -9,11 +9,13 @@

 import subprocess
 from . import cfg
-from .cmdhandler import options_to_cli_args
+from .cmdhandler import options_to_cli_args, LvmExecutionMeta
 import dbus
-from .utils import pv_range_append, pv_dest_ranges, log_error, log_debug
+from .utils import pv_range_append, pv_dest_ranges, log_error, log_debug,\
+	add_no_notify
 import os
 import threading
+import time


 def pv_move_lv_cmd(move_options, lv_full_name,
@@ -42,6 +44,15 @@ def _move_merge(interface_name, command, job_state):
 	# the command always as we will be getting periodic output from them on
 	# the status of the long running operation.
 	command.insert(0, cfg.LVM_CMD)
+
+	# Instruct lvm to not register an event with us
+	command = add_no_notify(command)
+
+	# LvmExecutionMeta(start, ended, cmd, ec, stdout_txt, stderr_txt)
+	meta = LvmExecutionMeta(time.time(), 0, command, -1000, None, None)
+
+	cfg.blackbox.add(meta)
+
 	process = subprocess.Popen(command, stdout=subprocess.PIPE,
 								env=os.environ,
 								stderr=subprocess.PIPE, close_fds=True)
@@ -59,12 +70,21 @@ def _move_merge(interface_name, command, job_state):
 				(device, ignore, percentage) = line_str.split(':')
 				job_state.Percent = round(
 					float(percentage.strip()[:-1]), 1)
+
+				# While the move is in progress we need to periodically update
+				# the state to reflect where everything is at.
+				cfg.load()
 		except ValueError:
 			log_error("Trying to parse percentage which failed for %s" %
 				line_str)

 	out = process.communicate()

+	with meta.lock:
+		meta.ended = time.time()
+		meta.ec = process.returncode
+		meta.stderr_txt = out[1]
+
 	if process.returncode == 0:
 		job_state.Percent = 100
 	else:
@@ -138,5 +158,6 @@ def _run_cmd(req):


 def cmd_runner(request):
-	t = threading.Thread(target=_run_cmd, args=(request,))
+	t = threading.Thread(target=_run_cmd, args=(request,),
+							name="cmd_runner %s" % str(request.method))
 	t.start()
--- a/daemons/lvmdbusd/cfg.py
+++ b/daemons/lvmdbusd/cfg.py
@@ -26,7 +26,7 @@ bus = None
 args = None

 # Set to true if we are depending on external events for updates
-ee = False
+got_external_event = False

 # Shared state variable across all processes
 run = multiprocessing.Value('i', 1)
--- a/daemons/lvmdbusd/cmdhandler.py
+++ b/daemons/lvmdbusd/cmdhandler.py
@@ -37,6 +37,7 @@ cmd_lock = threading.RLock()
 class LvmExecutionMeta(object):

 	def __init__(self, start, ended, cmd, ec, stdout_txt, stderr_txt):
+		self.lock = threading.RLock()
 		self.start = start
 		self.ended = ended
 		self.cmd = cmd
@@ -45,12 +46,13 @@ class LvmExecutionMeta(object):
 		self.stderr_txt = stderr_txt

 	def __str__(self):
-		return "EC= %d for %s\n" \
-			"STARTED: %f, ENDED: %f\n" \
-			"STDOUT=%s\n" \
-			"STDERR=%s\n" % \
-			(self.ec, str(self.cmd), self.start, self.ended, self.stdout_txt,
-			self.stderr_txt)
+		with self.lock:
+			return "EC= %d for %s\n" \
+				"STARTED: %f, ENDED: %f\n" \
+				"STDOUT=%s\n" \
+				"STDERR=%s\n" % \
+				(self.ec, str(self.cmd), self.start, self.ended, self.stdout_txt,
+				self.stderr_txt)


 class LvmFlightRecorder(object):
@@ -279,7 +281,7 @@ def vg_lv_create(vg_name, create_options, name, size_bytes, pv_dests):
 	cmd = ['lvcreate']
 	cmd.extend(options_to_cli_args(create_options))
 	cmd.extend(['--size', str(size_bytes) + 'B'])
-	cmd.extend(['--name', name, vg_name])
+	cmd.extend(['--name', name, vg_name, '--yes'])
 	pv_dest_ranges(cmd, pv_dests)
 	return call(cmd)

@@ -304,6 +306,8 @@ def _vg_lv_create_common_cmd(create_options, size_bytes, thin_pool):
 		cmd.extend(['--size', str(size_bytes) + 'B'])
 	else:
 		cmd.extend(['--thin', '--size', str(size_bytes) + 'B'])
+
+	cmd.extend(['--yes'])
 	return cmd


@@ -340,7 +344,7 @@ def _vg_lv_create_raid(vg_name, create_options, name, raid_type, size_bytes,
 	if stripe_size_kb != 0:
 		cmd.extend(['--stripesize', str(stripe_size_kb)])

-	cmd.extend(['--name', name, vg_name])
+	cmd.extend(['--name', name, vg_name, '--yes'])
 	return call(cmd)


@@ -361,7 +365,7 @@ def vg_lv_create_mirror(
 	cmd.extend(['--type', 'mirror'])
 	cmd.extend(['--mirrors', str(num_copies)])
 	cmd.extend(['--size', str(size_bytes) + 'B'])
-	cmd.extend(['--name', name, vg_name])
+	cmd.extend(['--name', name, vg_name, '--yes'])
 	return call(cmd)


@@ -415,7 +419,7 @@ def lv_lv_create(lv_full_name, create_options, name, size_bytes):
 	cmd = ['lvcreate']
 	cmd.extend(options_to_cli_args(create_options))
 	cmd.extend(['--virtualsize', str(size_bytes) + 'B', '-T'])
-	cmd.extend(['--name', name, lv_full_name])
+	cmd.extend(['--name', name, lv_full_name, '--yes'])
 	return call(cmd)


@@ -551,7 +555,7 @@ def pv_resize(device, size_bytes, create_options):
 	cmd.extend(options_to_cli_args(create_options))

 	if size_bytes != 0:
-		cmd.extend(['--setphysicalvolumesize', str(size_bytes) + 'B'])
+		cmd.extend(['--yes', '--setphysicalvolumesize', str(size_bytes) + 'B'])

 	cmd.extend([device])
 	return call(cmd)
@@ -616,10 +620,10 @@ def vg_reduce(vg_name, missing, pv_devices, reduce_options):
 	cmd = ['vgreduce']
 	cmd.extend(options_to_cli_args(reduce_options))

-	if len(pv_devices) == 0:
-		cmd.append('--all')
 	if missing:
 		cmd.append('--removemissing')
+	elif len(pv_devices) == 0:
+		cmd.append('--all')

 	cmd.append(vg_name)
 	cmd.extend(pv_devices)
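
Reviewer note on the `vg_reduce` hunk above: after the reordering, `--removemissing` takes precedence and `--all` is only appended when `missing` is false and no PVs were named, so the two flags are never passed together. A minimal standalone sketch of that branch order follows. `build_vg_reduce_cmd` is a hypothetical name used only for illustration, and the `options_to_cli_args(reduce_options)` step and the final `call(cmd)` are deliberately left out.

```python
def build_vg_reduce_cmd(vg_name, missing, pv_devices):
    """Hypothetical helper mirroring the branch order in the hunk above."""
    cmd = ['vgreduce']

    if missing:
        # Removing missing PVs wins; '--all' is not added in this case.
        cmd.append('--removemissing')
    elif len(pv_devices) == 0:
        # No explicit PVs and nothing missing: reduce by all unused PVs.
        cmd.append('--all')

    cmd.append(vg_name)
    cmd.extend(pv_devices)
    return cmd
```

With `missing=True` and an empty device list, the old ordering would have produced both `--all` and `--removemissing`; the sketch produces only `--removemissing`.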
--- a/daemons/lvmdbusd/fetch.py
+++ b/daemons/lvmdbusd/fetch.py
@@ -82,10 +82,10 @@ class StateUpdate(object):

 	@staticmethod
 	def update_thread(obj):
+		queued_requests = []
 		while cfg.run.value != 0:
 			# noinspection PyBroadException
 			try:
-				queued_requests = []
 				refresh = True
 				emit_signal = True
 				cache_refresh = True
@@ -96,7 +96,7 @@ class StateUpdate(object):
 					wait = not obj.deferred
 					obj.deferred = False

-				if wait:
+				if len(queued_requests) == 0 and wait:
 					queued_requests.append(obj.queue.get(True, 2))

 				# Ok we have one or the deferred queue has some,
@@ -131,11 +131,17 @@ class StateUpdate(object):
 				for i in queued_requests:
 					i.set_result(num_changes)

+				# Only clear out the requests after we have given them a result;
+				# otherwise we can orphan the waiting threads, which never
+				# wake up if we get an exception.
+				queued_requests = []
+
 			except queue.Empty:
 				pass
 			except Exception:
 				st = traceback.format_exc()
 				log_error("update_thread exception: \n%s" % st)
+				cfg.blackbox.dump()

 	def __init__(self):
 		self.lock = threading.RLock()
@@ -146,7 +152,8 @@ class StateUpdate(object):
 		load(refresh=False, emit_signal=False, need_main_thread=False)

 		self.thread = threading.Thread(target=StateUpdate.update_thread,
-										args=(self,))
+										args=(self,),
+										name="StateUpdate.update_thread")

 	def load(self, refresh=True, emit_signal=True, cache_refresh=True,
 					log=True, need_main_thread=True):
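
Reviewer note on the `update_thread` change above: the request list now lives outside the loop body and is only cleared after every waiter has received a result, so an exception during the refresh leaves the requests queued for the next pass. The sketch below shows that pattern in isolation, under stated assumptions: `FakeRequest`, `drain`, `work`, and `max_iterations` are hypothetical stand-ins, not lvmdbusd names, and the real loop also handles deferred requests and refresh flags that are omitted here.

```python
import queue


class FakeRequest:
    """Hypothetical stand-in for a request whose caller waits on a result."""

    def __init__(self):
        self.result = None

    def set_result(self, value):
        self.result = value


def drain(q, work, max_iterations):
    # Kept outside the loop so requests survive a failed iteration.
    queued_requests = []

    for _ in range(max_iterations):
        try:
            # Only block for a new request if nothing is carried over.
            if len(queued_requests) == 0:
                queued_requests.append(q.get(True, 0.01))

            # Pick up anything else that is already waiting.
            try:
                while True:
                    queued_requests.append(q.get(False))
            except queue.Empty:
                pass

            num_changes = work()  # may raise

            for r in queued_requests:
                r.set_result(num_changes)

            # Clear only after every waiter has been answered.
            queued_requests = []
        except queue.Empty:
            pass
        except Exception:
            # Requests stay in queued_requests and are retried next pass.
            pass
```

If `work` fails on the first pass and succeeds on the second, the request queued before the failure still gets its result on the second pass rather than being dropped.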
--- a/daemons/lvmdbusd/job.py
+++ b/daemons/lvmdbusd/job.py
@@ -8,7 +8,7 @@
 # along with this program. If not, see <http://www.gnu.org/licenses/>.

 from .automatedproperties import AutomatedProperties
-from .utils import job_obj_path_generate, mt_async_result, mt_run_no_wait
+from .utils import job_obj_path_generate, mt_async_call
 from . import cfg
 from .cfg import JOB_INTERFACE
 import dbus
@@ -30,7 +30,7 @@ class WaitingClient(object):
 				# Remove ourselves from waiting client
 				wc.job_state.remove_waiting_client(wc)
 				wc.timer_id = -1
-				mt_async_result(wc.cb, wc.job_state.Complete)
+				mt_async_call(wc.cb, wc.job_state.Complete)
 				wc.job_state = None

 	def __init__(self, job_state, tmo, cb, cbe):
@@ -55,7 +55,7 @@ class WaitingClient(object):
 					GLib.source_remove(self.timer_id)
 					self.timer_id = -1

-				mt_async_result(self.cb, self.job_state.Complete)
+				mt_async_call(self.cb, self.job_state.Complete)
 				self.job_state = None


@@ -188,7 +188,7 @@ class Job(AutomatedProperties):
 	@Complete.setter
 	def Complete(self, value):
 		self.state.Complete = value
-		mt_run_no_wait(Job._signal_complete, self)
+		mt_async_call(Job._signal_complete, self)

 	@property
 	def GetError(self):
--- a/daemons/lvmdbusd/lv.py
+++ b/daemons/lvmdbusd/lv.py
@@ -232,7 +232,6 @@ class LvState(State):
@utils.dbus_property(LV_COMMON_INTERFACE, 'Attr', 's')
@utils.dbus_property(LV_COMMON_INTERFACE, 'DataPercent', 'u')
@utils.dbus_property(LV_COMMON_INTERFACE, 'SnapPercent', 'u')
-@utils.dbus_property(LV_COMMON_INTERFACE, 'DataPercent', 'u')
@utils.dbus_property(LV_COMMON_INTERFACE, 'MetaDataPercent', 'u')
@utils.dbus_property(LV_COMMON_INTERFACE, 'CopyPercent', 'u')
@utils.dbus_property(LV_COMMON_INTERFACE, 'SyncPercent', 'u')
--- a/daemons/lvmdbusd/lvm_shell_proxy.py.in
+++ b/daemons/lvmdbusd/lvm_shell_proxy.py.in
@@ -1,4 +1,4 @@
-#!/usr/bin/env python3
+#!@PYTHON3@

 # Copyright (C) 2015-2016 Red Hat, Inc. All rights reserved.
 #
--- a/daemons/lvmdbusd/lvmdb.py.in
+++ b/daemons/lvmdbusd/lvmdb.py.in
@@ -1,4 +1,4 @@
-#!/usr/bin/env python3
+#!@PYTHON3@

 # Copyright (C) 2015-2016 Red Hat, Inc. All rights reserved.
 #
--- a/daemons/lvmdbusd/lvmdbusd.in
+++ b/daemons/lvmdbusd/lvmdbusd.in
@@ -1,4 +1,4 @@
-#!/usr/bin/env python3
+#!@PYTHON3@

 # Copyright (C) 2015-2016 Red Hat, Inc. All rights reserved.
 #
--- a/daemons/lvmdbusd/main.py
+++ b/daemons/lvmdbusd/main.py
@@ -63,6 +63,24 @@ def check_bb_size(value):
 	return v


+def install_signal_handlers():
+	# Python's signal handler code is apparently not usable under the GLib
+	# main loop, so we need to use the GLib calls instead.
+	signal_add = None
+
+	if hasattr(GLib, 'unix_signal_add'):
+		signal_add = GLib.unix_signal_add
+	elif hasattr(GLib, 'unix_signal_add_full'):
+		signal_add = GLib.unix_signal_add_full
+
+	if signal_add:
+		signal_add(GLib.PRIORITY_HIGH, signal.SIGHUP, utils.handler, signal.SIGHUP)
+		signal_add(GLib.PRIORITY_HIGH, signal.SIGINT, utils.handler, signal.SIGINT)
+		signal_add(GLib.PRIORITY_HIGH, signal.SIGUSR1, utils.handler, signal.SIGUSR1)
+	else:
+		log_error("GLib.unix_signal_[add|add_full] are NOT available!")
+
+
 def main():
 	start = time.time()
 	# Add simple command line handling
@@ -112,12 +130,7 @@ def main():
 	# List of threads that we start up
 	thread_list = []

-	# Install signal handlers
-	for s in [signal.SIGHUP, signal.SIGINT]:
-		try:
-			signal.signal(s, utils.handler)
-		except RuntimeError:
-			pass
+	install_signal_handlers()

 	dbus.mainloop.glib.DBusGMainLoop(set_as_default=True)
 	dbus.mainloop.glib.threads_init()
@@ -138,7 +151,8 @@ def main():

 	# Using a thread to process requests, we cannot hang the dbus library
 	# thread that is handling the dbus interface
-	thread_list.append(threading.Thread(target=process_request))
+	thread_list.append(threading.Thread(target=process_request,
+										name='process_request'))

 	# Have a single thread handling updating lvm and the dbus model so we
 	# don't have multiple threads doing this as the same time
@@ -176,5 +190,7 @@ def main():
 			for thread in thread_list:
 				thread.join()
 	except KeyboardInterrupt:
-		utils.handler(signal.SIGINT, None)
+		# If we are unable to register the signal handlers, we will end up here
+		# when the service gets a ^C or a kill -2 <parent pid>.
+		utils.handler(signal.SIGINT)
 	return 0
--- a/daemons/lvmdbusd/manager.py
+++ b/daemons/lvmdbusd/manager.py
@@ -6,7 +6,6 @@
 #
 # You should have received a copy of the GNU General Public License
 # along with this program. If not, see <http://www.gnu.org/licenses/>.
-from utils import log_debug
 from .automatedproperties import AutomatedProperties

 from . import utils
@@ -48,7 +47,7 @@ class Manager(AutomatedProperties):
 		pv = cfg.om.get_object_path_by_uuid_lvm_id(device, device)
 		if pv:
 			raise dbus.exceptions.DBusException(
-				MANAGER_INTERFACE, "PV Already exists!")
+				MANAGER_INTERFACE, "PV %s Already exists!" % device)

 		rc, out, err = cmdhandler.pv_create(create_options, [device])
 		Manager.handle_execute(rc, out, err)
@@ -145,7 +144,7 @@ class Manager(AutomatedProperties):
 		p = cfg.om.get_object_path_by_uuid_lvm_id(key, key)
 		if not p:
 			p = '/'
-		log_debug('LookUpByLvmId: key = %s, result = %s' % (key, p))
+		utils.log_debug('LookUpByLvmId: key = %s, result = %s' % (key, p))
 		return p

 	@dbus.service.method(
@@ -206,7 +205,7 @@ class Manager(AutomatedProperties):
 				utils.log_debug("ExternalEvent received, disabling "
 								"udev monitoring")
 				# We are dependent on external events now to stay current!
-				cfg.ee = True
+				cfg.got_external_event = True

 		r = RequestEntry(
 			-1, Manager._external_event, (command,), None, None, False)
--- a/daemons/lvmdbusd/objectmanager.py
+++ b/daemons/lvmdbusd/objectmanager.py
@@ -223,8 +223,9 @@ class ObjectManager(AutomatedProperties):
 		:param lvm_id: The lvm identifier
 		"""
 		with self.rlock:
-			if lvm_id in self._id_to_object_path:
-				return self.get_object_by_path(self._id_to_object_path[lvm_id])
+			lookup_rc = self._id_lookup(lvm_id)
+			if lookup_rc:
+				return self.get_object_by_path(lookup_rc)
 			return None

 	def get_object_path_by_lvm_id(self, lvm_id):
@@ -234,8 +235,9 @@ class ObjectManager(AutomatedProperties):
 		:return: Object path or '/' if not found
 		"""
 		with self.rlock:
-			if lvm_id in self._id_to_object_path:
-				return self._id_to_object_path[lvm_id]
+			lookup_rc = self._id_lookup(lvm_id)
+			if lookup_rc:
+				return lookup_rc
 			return '/'

 	def _uuid_verify(self, path, uuid, lvm_id):
--- a/daemons/lvmdbusd/pv.py
+++ b/daemons/lvmdbusd/pv.py
@@ -79,7 +79,9 @@ class PvState(State):

 		self.lv = self._lv_object_list(vg_name)

-		if vg_name:
+		# It's possible to have a vg_name and no uuid, the main example
+		# being when vg_name == '[unknown]'.
+		if vg_uuid and vg_name:
 			self.vg_path = cfg.om.get_object_path_by_uuid_lvm_id(
 				vg_uuid, vg_name, vg_obj_path_generate)
 		else:
--- a/daemons/lvmdbusd/request.py
+++ b/daemons/lvmdbusd/request.py
@@ -13,7 +13,7 @@ from gi.repository import GLib
 from .job import Job
 from . import cfg
 import traceback
-from .utils import log_error, mt_async_result
+from .utils import log_error, mt_async_call


 class RequestEntry(object):
@@ -116,9 +116,9 @@ class RequestEntry(object):
 				if error_rc == 0:
 					if self.cb:
 						if self._return_tuple:
-							mt_async_result(self.cb, (result, '/'))
+							mt_async_call(self.cb, (result, '/'))
 						else:
-							mt_async_result(self.cb, result)
+							mt_async_call(self.cb, result)
 				else:
 					if self.cb_error:
 						if not error_exception:
@@ -129,7 +129,7 @@ class RequestEntry(object):
 							else:
 								error_exception = Exception(error_msg)

-						mt_async_result(self.cb_error, error_exception)
+						mt_async_call(self.cb_error, error_exception)
 			else:
 				# We have a job and it's complete, indicate that it's done.
 				self._job.Complete = True
--- a/daemons/lvmdbusd/udevwatch.py
+++ b/daemons/lvmdbusd/udevwatch.py
@@ -16,9 +16,33 @@ from . import utils
 observer = None
 observer_lock = threading.RLock()

+_udev_lock = threading.RLock()
+_udev_count = 0
+
+
+def udev_add():
+	global _udev_count
+	with _udev_lock:
+		if _udev_count == 0:
+			_udev_count += 1
+
+			# Place this on the queue so any other operations will sequence
+			# behind it
+			r = RequestEntry(
+				-1, _udev_event, (), None, None, False)
+			cfg.worker_q.put(r)
+
+
+def udev_complete():
+	global _udev_count
+	with _udev_lock:
+		if _udev_count > 0:
+			_udev_count -= 1
+

 def _udev_event():
 	utils.log_debug("Processing udev event")
+	udev_complete()
 	cfg.load()


@@ -44,10 +68,7 @@ def filter_event(action, device):
 		refresh = True

 	if refresh:
-		# Place this on the queue so any other operations will sequence behind it
-		r = RequestEntry(
-			-1, _udev_event, (), None, None, False)
-		cfg.worker_q.put(r)
+		udev_add()


 def add():
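
The udev_add()/udev_complete() pair in the udevwatch.py hunks above coalesces a burst of udev events into a single queued refresh. The following is a minimal standalone sketch of that pattern, not lvmdbusd code: `work_q`, `reloads`, `refresh` and `event_add` are hypothetical stand-ins for cfg.worker_q, cfg.load(), _udev_event() and udev_add().

```python
import queue
import threading

# Hypothetical stand-ins for cfg.worker_q and the effect of cfg.load().
work_q = queue.Queue()
reloads = []

_lock = threading.RLock()
_pending = 0


def refresh():
    # Clear the pending count *before* reloading, so an event that arrives
    # during the reload queues another refresh instead of being dropped.
    global _pending
    with _lock:
        if _pending > 0:
            _pending -= 1
    reloads.append("reload")


def event_add():
    # Queue a refresh only if none is already waiting on the queue.
    global _pending
    with _lock:
        if _pending == 0:
            _pending += 1
            work_q.put(refresh)


# Five events arrive in a burst before the worker runs.
for _ in range(5):
    event_add()
queued_after_burst = work_q.qsize()

# The worker drains the queue.
while not work_q.empty():
    work_q.get()()

# A later event queues a fresh refresh.
event_add()
queued_after_drain = work_q.qsize()
```

In this sketch the burst of five events produces one queue entry and one reload. Decrementing before the reload is the design choice that matters: it trades an occasional redundant refresh for never missing a change.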
--- a/daemons/lvmdbusd/utils.py
+++ b/daemons/lvmdbusd/utils.py
@@ -20,7 +20,8 @@ from lvmdbusd import cfg
 # noinspection PyUnresolvedReferences
 from gi.repository import GLib
 import threading
-
+import traceback
+import signal

 STDOUT_TTY = os.isatty(sys.stdout.fileno())

@@ -281,12 +282,47 @@ def log_error(msg, *attributes):
 	_common_log(msg, *attributes)


+def dump_threads_stackframe():
+	ident_to_name = {}
+
+	for thread_object in threading.enumerate():
+		ident_to_name[thread_object.ident] = thread_object
+
+	stacks = []
+	for thread_ident, frame in sys._current_frames().items():
+		stack = traceback.format_list(traceback.extract_stack(frame))
+
+		# A thread may be created after we have enumerated all threads, so
+		# this lookup table may be incomplete; fall back to a placeholder
+		# name when the ident is missing
+		if thread_ident in ident_to_name:
+			thread_name = ident_to_name[thread_ident].name
+		else:
+			thread_name = "unknown"
+
+		stacks.append("Thread: %s" % (thread_name))
+		stacks.append("".join(stack))
+
+	log_error("Dumping thread stack frames!\n" + "\n".join(stacks))
+
+
 # noinspection PyUnusedLocal
-def handler(signum, frame):
-	cfg.run.value = 0
-	log_debug('Signal handler called with signal %d' % signum)
-	if cfg.loop is not None:
-		cfg.loop.quit()
+def handler(signum):
+	try:
+		if signum == signal.SIGUSR1:
+			dump_threads_stackframe()
+		else:
+			cfg.run.value = 0
+			log_debug('Exiting daemon with signal %d' % signum)
+			if cfg.loop is not None:
+				cfg.loop.quit()
+	except:
+		st = traceback.format_exc()
+		log_error("signal handler: exception (logged, not reported!) \n %s" % st)
+
+	# It's important we report that we handled the exception for the exception
+	# handler to continue to work, especially for signal 10 (SIGUSR1)
+	return True


 def pv_obj_path_generate():
@@ -510,16 +546,21 @@ def add_no_notify(cmdline):
 	:rtype: list
 	"""

-	if 'help' in cmdline:
-		return cmdline
+	# Only after we have seen an external event will we disable lvm from sending
+	# us one when we call lvm
+	if cfg.got_external_event:
+		if 'help' in cmdline:
+			return cmdline

-	if '--config' in cmdline:
-		for i, arg in enumerate(cmdline):
-			if arg == '--config':
-				cmdline[i] += "global/notify_dbus=0"
-				break
-	else:
-		cmdline.extend(['--config', 'global/notify_dbus=0'])
+		if '--config' in cmdline:
+			for i, arg in enumerate(cmdline):
+				if arg == '--config':
+					if len(cmdline) <= i+1:
+						raise dbus.exceptions.DBusException("Missing value for --config option.")
+					cmdline[i+1] += " global/notify_dbus=0"
+					break
+		else:
+			cmdline.extend(['--config', 'global/notify_dbus=0'])
 	return cmdline


@@ -529,21 +570,27 @@ def add_no_notify(cmdline):
 # ensure all dbus library interaction is done from the same thread!


-def _async_result(call_back, results):
-	log_debug('Results = %s' % str(results))
-	call_back(results)
+def _async_handler(call_back, parameters):
+	params_str = ", ".join(str(x) for x in parameters)
+	log_debug('Main thread execution, callback = %s, parameters = (%s)' %
+				(str(call_back), params_str))
+
+	try:
+		if parameters:
+			call_back(*parameters)
+		else:
+			call_back()
+	except:
+		st = traceback.format_exc()
+		log_error("mt_async_call: exception (logged, not reported!) \n %s" % st)


-# Return result in main thread
-def mt_async_result(call_back, results):
-	GLib.idle_add(_async_result, call_back, results)
+# Execute the function on the main thread with the provided parameters, do
+# not return *any* value or wait for the execution to complete!
+def mt_async_call(function_call_back, *parameters):
+	GLib.idle_add(_async_handler, function_call_back, parameters)


-# Take the supplied function and run it on the main thread and not wait for
-# a result!
-def mt_run_no_wait(function, param):
-	GLib.idle_add(function, param)
-
 # Run the supplied function and arguments on the main thread and wait for them
 # to complete while allowing the ability to get the return value too.
 #
@@ -563,6 +610,7 @@ class MThreadRunner(object):
 	def __init__(self, function, *args):
 		self.f = function
 		self.rc = None
+		self.exception = None
 		self.args = args
 		self.function_complete = False
 		self.cond = threading.Condition(threading.Lock())
@@ -572,13 +620,21 @@ class MThreadRunner(object):
 		with self.cond:
 			if not self.function_complete:
 				self.cond.wait()
+		if self.exception:
+			raise self.exception
 		return self.rc

 	def _run(self):
-		if len(self.args):
-			self.rc = self.f(*self.args)
-		else:
-			self.rc = self.f()
+		try:
+			if self.args:
+				self.rc = self.f(*self.args)
+			else:
+				self.rc = self.f()
+		except BaseException as be:
+			self.exception = be
+			st = traceback.format_exc()
+			log_error("MThreadRunner: exception \n %s" % st)
+			log_error("Exception will be raised in calling thread!")


 def _remove_objects(dbus_objects_rm):
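
The MThreadRunner change above captures an exception raised on the main thread and re-raises it in the calling thread. The following is a rough sketch of that hand-off under stated assumptions: a plain worker thread and `main_q` stand in for the GLib main loop, and the `Runner` class, `main_loop` and `result` names are hypothetical, not the daemon's API. Unlike the original, the sketch waits in a `while` loop to guard against spurious wakeups.

```python
import queue
import threading

# Hypothetical stand-in for the GLib main loop: a thread running queued jobs.
main_q = queue.Queue()


def main_loop():
    while True:
        job = main_q.get()
        if job is None:
            break
        job()


class Runner:
    def __init__(self, function, *args):
        self.f = function
        self.args = args
        self.rc = None
        self.exception = None
        self.done = False
        self.cond = threading.Condition(threading.Lock())

    def _run(self):
        # Runs on the "main" thread; never let the exception escape there.
        try:
            self.rc = self.f(*self.args)
        except BaseException as be:
            self.exception = be
        finally:
            with self.cond:
                self.done = True
                self.cond.notify_all()

    def result(self):
        # Called from another thread: queue the work, wait, then re-raise.
        main_q.put(self._run)
        with self.cond:
            while not self.done:
                self.cond.wait()
        if self.exception:
            raise self.exception
        return self.rc


loop_thread = threading.Thread(target=main_loop)
loop_thread.start()

ok = Runner(lambda a, b: a + b, 2, 3).result()
try:
    Runner(lambda: 1 // 0).result()
    caught = None
except ZeroDivisionError as exc:
    caught = type(exc).__name__

main_q.put(None)
loop_thread.join()
```

The caller sees either the return value or the original exception, while the loop thread keeps running after a failed job.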
--- a/daemons/lvmetad/Makefile.in
+++ b/daemons/lvmetad/Makefile.in
@@ -16,7 +16,7 @@ top_srcdir = @top_srcdir@
 top_builddir = @top_builddir@

 SOURCES = lvmetad-core.c
-SOURCES2 = testclient.c
+SOURCES2 = lvmetactl.c

 TARGETS = lvmetad lvmetactl

@@ -28,22 +28,19 @@ CFLOW_TARGET = lvmetad

 include $(top_builddir)/make.tmpl

+CFLAGS_lvmetactl.o += $(EXTRA_EXEC_CFLAGS)
+CFLAGS_lvmetad-core.o += $(EXTRA_EXEC_CFLAGS)
 INCLUDES += -I$(top_srcdir)/libdaemon/server
-LVMLIBS = -ldaemonserver $(LVMINTERNAL_LIBS) -ldevmapper
-
-LIBS += $(PTHREAD_LIBS)
-
-LDFLAGS += -L$(top_builddir)/libdaemon/server $(EXTRA_EXEC_LDFLAGS)
-CLDFLAGS += -L$(top_builddir)/libdaemon/server
-CFLAGS += $(EXTRA_EXEC_CFLAGS)
+LDFLAGS += -L$(top_builddir)/libdaemon/server $(EXTRA_EXEC_LDFLAGS) $(ELDFLAGS)
+LIBS += $(RT_LIBS) $(DAEMON_LIBS) -ldevmapper $(PTHREAD_LIBS)

 lvmetad: $(OBJECTS) $(top_builddir)/libdaemon/client/libdaemonclient.a \
 		    $(top_builddir)/libdaemon/server/libdaemonserver.a
-	$(CC) $(CFLAGS) $(LDFLAGS) -o $@ $(OBJECTS) $(LVMLIBS) $(LIBS)
+	$(CC) $(CFLAGS) $(LDFLAGS) -o $@ $(OBJECTS) -ldaemonserver $(LIBS)

 lvmetactl: lvmetactl.o $(top_builddir)/libdaemon/client/libdaemonclient.a \
 	$(top_builddir)/libdaemon/server/libdaemonserver.a
-	$(CC) $(CFLAGS) $(LDFLAGS) -o $@ lvmetactl.o $(LVMLIBS)
+	$(CC) $(CFLAGS) $(LDFLAGS) -o $@ lvmetactl.o $(LIBS)

 CLEAN_TARGETS += lvmetactl.o

--- a/daemons/lvmetad/lvmetad-client.h
+++ b/daemons/lvmetad/lvmetad-client.h
@@ -25,6 +25,7 @@
 #define LVMETAD_DISABLE_REASON_LVM1		"LVM1"
 #define LVMETAD_DISABLE_REASON_DUPLICATES	"DUPLICATES"
 #define LVMETAD_DISABLE_REASON_VGRESTORE	"VGRESTORE"
+#define LVMETAD_DISABLE_REASON_REPAIR		"REPAIR"

 struct volume_group;

--- a/daemons/lvmetad/lvmetad-core.c
+++ b/daemons/lvmetad/lvmetad-core.c
@@ -203,8 +203,9 @@ struct vg_info {
 #define GLFL_DISABLE_REASON_LVM1       0x00000008
 #define GLFL_DISABLE_REASON_DUPLICATES 0x00000010
 #define GLFL_DISABLE_REASON_VGRESTORE  0x00000020
+#define GLFL_DISABLE_REASON_REPAIR     0x00000040

-#define GLFL_DISABLE_REASON_ALL (GLFL_DISABLE_REASON_DIRECT | GLFL_DISABLE_REASON_LVM1 | GLFL_DISABLE_REASON_DUPLICATES | GLFL_DISABLE_REASON_VGRESTORE)
+#define GLFL_DISABLE_REASON_ALL (GLFL_DISABLE_REASON_DIRECT | GLFL_DISABLE_REASON_REPAIR | GLFL_DISABLE_REASON_LVM1 | GLFL_DISABLE_REASON_DUPLICATES | GLFL_DISABLE_REASON_VGRESTORE)

 #define VGFL_INVALID 0x00000001

@@ -257,6 +258,21 @@ static void destroy_metadata_hashes(lvmetad_state *s)
 	dm_hash_iterate(n, s->pvid_to_pvmeta)
 		dm_config_destroy(dm_hash_get_data(s->pvid_to_pvmeta, n));

+	dm_hash_iterate(n, s->vgid_to_vgname)
+		dm_free(dm_hash_get_data(s->vgid_to_vgname, n));
+
+	dm_hash_iterate(n, s->vgname_to_vgid)
+		dm_free(dm_hash_get_data(s->vgname_to_vgid, n));
+
+	dm_hash_iterate(n, s->vgid_to_info)
+		dm_free(dm_hash_get_data(s->vgid_to_info, n));
+
+	dm_hash_iterate(n, s->device_to_pvid)
+		dm_free(dm_hash_get_data(s->device_to_pvid, n));
+
+	dm_hash_iterate(n, s->pvid_to_vgid)
+		dm_free(dm_hash_get_data(s->pvid_to_vgid, n));
+
 	dm_hash_destroy(s->pvid_to_pvmeta);
 	dm_hash_destroy(s->vgid_to_metadata);
 	dm_hash_destroy(s->vgid_to_vgname);
@@ -792,7 +808,8 @@ static int _update_pvid_to_vgid(lvmetad_state *s, struct dm_config_tree *vg,

 		if ((mode == REMOVE_EMPTY) && vgid_old) {
 			/* This copies the vgid_old string, doesn't reference it. */
-			if (!dm_hash_insert(to_check, vgid_old, (void*) 1)) {
+			if ((dm_hash_lookup(to_check, vgid_old) != (void*) 1) &&
+			    !dm_hash_insert(to_check, vgid_old, (void*) 1)) {
 				ERROR(s, "update_pvid_to_vgid out of memory for hash insert vgid_old %s", vgid_old);
 				goto abort_daemon;
 			}
@@ -868,16 +885,13 @@ static int remove_metadata(lvmetad_state *s, const char *vgid, int update_pvids)

 	/* free the unmapped data */

-	if (info_lookup)
-		dm_free(info_lookup);
 	if (meta_lookup)
 		dm_config_destroy(meta_lookup);
-	if (name_lookup)
-		dm_free(name_lookup);
 	if (outdated_pvs_lookup)
 		dm_config_destroy(outdated_pvs_lookup);
-	if (vgid_lookup)
-		dm_free(vgid_lookup);
+	dm_free(info_lookup);
+	dm_free(name_lookup);
+	dm_free(vgid_lookup);
 	return 1;
 }

@@ -1204,10 +1218,8 @@ static int _update_metadata_add_new(lvmetad_state *s, const char *new_name, cons
 out:
 out_free:
 	if (!new_name_dup || !new_vgid_dup || abort_daemon) {
-		if (new_name_dup)
-			dm_free(new_name_dup);
-		if (new_vgid_dup)
-			dm_free(new_vgid_dup);
+		dm_free(new_name_dup);
+		dm_free(new_vgid_dup);
 		ERROR(s, "lvmetad could not be updated and is aborting.");
 		exit(EXIT_FAILURE);
 	}
@@ -1797,8 +1809,7 @@ static response pv_gone(lvmetad_state *s, request r)
 	}

 	dm_config_destroy(pvmeta);
-	if (old_pvid)
-		dm_free(old_pvid);
+	dm_free(old_pvid);

 	return daemon_reply_simple("OK", NULL );
 }
@@ -1911,7 +1922,7 @@ static response pv_found(lvmetad_state *s, request r)
 	const char *arg_pvid = NULL;
 	const char *arg_pvid_lookup = NULL;
 	const char *new_pvid = NULL;
-	const char *new_pvid_dup = NULL;
+	char *new_pvid_dup = NULL;
 	const char *arg_name = NULL;
 	const char *arg_vgid = NULL;
 	const char *arg_vgid_lookup = NULL;
@@ -2074,7 +2085,7 @@ static response pv_found(lvmetad_state *s, request r)
 		if (!(new_pvid_dup = dm_strdup(new_pvid)))
 			goto nomem_free1;

-		if (!dm_hash_insert_binary(s->device_to_pvid, &new_device, sizeof(new_device), (char *)new_pvid_dup))
+		if (!dm_hash_insert_binary(s->device_to_pvid, &new_device, sizeof(new_device), new_pvid_dup))
 			goto nomem_free2;

 		if (!dm_hash_insert(s->pvid_to_pvmeta, new_pvid, new_pvmeta))
@@ -2110,6 +2121,8 @@ static response pv_found(lvmetad_state *s, request r)
 				DEBUGLOG(s, "pv_found ignore duplicate device %" PRIu64 " of existing PV for pvid %s",
 				         arg_device, arg_pvid);
 				dm_config_destroy(new_pvmeta);
+				/* device_to_pvid no longer references prev_pvid_lookup */
+				dm_free((void*)prev_pvid_on_dev);
 				s->flags |= GLFL_DISABLE;
 				s->flags |= GLFL_DISABLE_REASON_DUPLICATES;
 				return reply_fail("Ignore duplicate PV");
@@ -2120,7 +2133,7 @@ static response pv_found(lvmetad_state *s, request r)
 		if (!(new_pvid_dup = dm_strdup(new_pvid)))
 			goto nomem_free1;

-		if (!dm_hash_insert_binary(s->device_to_pvid, &arg_device, sizeof(arg_device), (char *)new_pvid_dup))
+		if (!dm_hash_insert_binary(s->device_to_pvid, &arg_device, sizeof(arg_device), new_pvid_dup))
 			goto nomem_free2;

 		if (!dm_hash_insert(s->pvid_to_pvmeta, new_pvid, new_pvmeta))
@@ -2220,8 +2233,7 @@ static response pv_found(lvmetad_state *s, request r)
 	}

 	/* This was unhashed from device_to_pvid above. */
-	if (prev_pvid_on_dev)
-		dm_free((void *)prev_pvid_on_dev);
+	dm_free((void *)prev_pvid_on_dev);

 	return daemon_reply_simple("OK",
 				   "status = %s", vg_status,
@@ -2233,7 +2245,7 @@ static response pv_found(lvmetad_state *s, request r)
 				   NULL);

 nomem_free2:
-	dm_free((char *)new_pvid_dup);
+	dm_free(new_pvid_dup);
 nomem_free1:
 	dm_config_destroy(new_pvmeta);
 nomem:
@@ -2355,6 +2367,8 @@ static response set_global_info(lvmetad_state *s, request r)
 	if ((reason = daemon_request_str(r, "disable_reason", NULL))) {
 		if (strstr(reason, LVMETAD_DISABLE_REASON_DIRECT))
 			reason_flags |= GLFL_DISABLE_REASON_DIRECT;
+		if (strstr(reason, LVMETAD_DISABLE_REASON_REPAIR))
+			reason_flags |= GLFL_DISABLE_REASON_REPAIR;
 		if (strstr(reason, LVMETAD_DISABLE_REASON_LVM1))
 			reason_flags |= GLFL_DISABLE_REASON_LVM1;
 		if (strstr(reason, LVMETAD_DISABLE_REASON_DUPLICATES))
@@ -2418,8 +2432,9 @@ static response get_global_info(lvmetad_state *s, request r)
 	pid = (int)daemon_request_int(r, "pid", 0);

 	if (s->flags & GLFL_DISABLE) {
-		snprintf(reason, REASON_BUF_SIZE - 1, "%s%s%s%s",
+		snprintf(reason, REASON_BUF_SIZE - 1, "%s%s%s%s%s",
 			 (s->flags & GLFL_DISABLE_REASON_DIRECT)     ? LVMETAD_DISABLE_REASON_DIRECT "," : "",
+			 (s->flags & GLFL_DISABLE_REASON_REPAIR)     ? LVMETAD_DISABLE_REASON_REPAIR "," : "",
 			 (s->flags & GLFL_DISABLE_REASON_LVM1)       ? LVMETAD_DISABLE_REASON_LVM1 "," : "",
 			 (s->flags & GLFL_DISABLE_REASON_DUPLICATES) ? LVMETAD_DISABLE_REASON_DUPLICATES "," : "",
 			 (s->flags & GLFL_DISABLE_REASON_VGRESTORE)  ? LVMETAD_DISABLE_REASON_VGRESTORE "," : "");
@@ -2557,14 +2572,12 @@ static void _dump_pairs(struct buffer *buf, struct dm_hash_table *ht, const char
 	dm_hash_iterate(n, ht) {
 		const char *key = dm_hash_get_key(ht, n),
 			   *val = dm_hash_get_data(ht, n);
-		buffer_append(buf, "    ");
 		if (int_key)
-			(void) dm_asprintf(&append, "%d = \"%s\"", *(const int*)key, val);
+			(void) dm_asprintf(&append, "    %d = \"%s\"\n", *(const int*)key, val);
 		else
-			(void) dm_asprintf(&append, "%s = \"%s\"", key, val);
+			(void) dm_asprintf(&append, "    %s = \"%s\"\n", key, val);
 		if (append)
 			buffer_append(buf, append);
-		buffer_append(buf, "\n");
 		dm_free(append);
 	}
 	buffer_append(buf, "}\n");
@@ -2582,11 +2595,9 @@ static void _dump_info_version(struct buffer *buf, struct dm_hash_table *ht, con
 	while (n) {
 		const char *key = dm_hash_get_key(ht, n);
 		info = dm_hash_get_data(ht, n);
-		buffer_append(buf, "    ");
-		(void) dm_asprintf(&append, "%s = %lld", key, (long long)info->external_version);
+		(void) dm_asprintf(&append, "    %s = %lld\n", key, (long long)info->external_version);
 		if (append)
 			buffer_append(buf, append);
-		buffer_append(buf, "\n");
 		dm_free(append);
 		n = dm_hash_get_next(ht, n);
 	}
@@ -2605,11 +2616,9 @@ static void _dump_info_flags(struct buffer *buf, struct dm_hash_table *ht, const
 	while (n) {
 		const char *key = dm_hash_get_key(ht, n);
 		info = dm_hash_get_data(ht, n);
-		buffer_append(buf, "    ");
-		(void) dm_asprintf(&append, "%s = %llx", key, (long long)info->flags);
+		(void) dm_asprintf(&append, "    %s = %llx\n", key, (long long)info->flags);
 		if (append)
 			buffer_append(buf, append);
-		buffer_append(buf, "\n");
 		dm_free(append);
 		n = dm_hash_get_next(ht, n);
 	}
--- a/daemons/lvmlockd/Makefile.in
+++ b/daemons/lvmlockd/Makefile.in
@@ -19,10 +19,12 @@ SOURCES = lvmlockd-core.c

 ifeq ("@BUILD_LOCKDSANLOCK@", "yes")
  SOURCES += lvmlockd-sanlock.c
+  LOCK_LIBS += -lsanlock_client
 endif

 ifeq ("@BUILD_LOCKDDLM@", "yes")
  SOURCES += lvmlockd-dlm.c
+  LOCK_LIBS += -ldlm_lt
 endif

 TARGETS = lvmlockd lvmlockctl
@@ -31,29 +33,17 @@ TARGETS = lvmlockd lvmlockctl

 include $(top_builddir)/make.tmpl

+CFLAGS += $(EXTRA_EXEC_CFLAGS)
 INCLUDES += -I$(top_srcdir)/libdaemon/server
-LVMLIBS = -ldaemonserver $(LVMINTERNAL_LIBS) -ldevmapper
-
-LIBS += $(PTHREAD_LIBS)
-
-ifeq ("@BUILD_LOCKDSANLOCK@", "yes")
-  LIBS += -lsanlock_client
-endif
-
-ifeq ("@BUILD_LOCKDDLM@", "yes")
-  LIBS += -ldlm_lt
-endif
-
-LDFLAGS += -L$(top_builddir)/libdaemon/server
-CLDFLAGS += -L$(top_builddir)/libdaemon/server
+LDFLAGS += -L$(top_builddir)/libdaemon/server $(EXTRA_EXEC_LDFLAGS) $(ELDFLAGS)
+LIBS += $(RT_LIBS) $(DAEMON_LIBS) -ldevmapper $(PTHREAD_LIBS)

 lvmlockd: $(OBJECTS) $(top_builddir)/libdaemon/client/libdaemonclient.a \
 		    $(top_builddir)/libdaemon/server/libdaemonserver.a
-	$(CC) $(CFLAGS) $(LDFLAGS) -o $@ $(OBJECTS) $(LVMLIBS) $(LIBS)
+	$(CC) $(CFLAGS) $(LDFLAGS) -o $@ $(OBJECTS) $(LOCK_LIBS) -ldaemonserver $(LIBS)

-lvmlockctl: lvmlockctl.o $(top_builddir)/libdaemon/client/libdaemonclient.a \
-		    $(top_builddir)/libdaemon/server/libdaemonserver.a
-	$(CC) $(CFLAGS) $(LDFLAGS) -o $@ lvmlockctl.o $(LVMLIBS)
+lvmlockctl: lvmlockctl.o $(top_builddir)/libdaemon/client/libdaemonclient.a
+	$(CC) $(CFLAGS) $(LDFLAGS) -o $@ lvmlockctl.o $(LIBS)

 install_lvmlockd: lvmlockd
 	$(INSTALL_PROGRAM) -D $< $(sbindir)/$(<F)
--- a/daemons/lvmlockd/lvmlockctl.c
+++ b/daemons/lvmlockd/lvmlockctl.c
@@ -379,7 +379,7 @@ static int setup_dump_socket(void)
 	rv = bind(s, (struct sockaddr *) &dump_addr, dump_addrlen);
 	if (rv < 0) {
 		rv = -errno;
-		if (!close(s))
+		if (close(s))
 			log_error("failed to close dump socket");
 		return rv;
 	}
--- a/daemons/lvmlockd/lvmlockd-client.h
+++ b/daemons/lvmlockd/lvmlockd-client.h
@@ -48,5 +48,7 @@ static inline void lvmlockd_close(daemon_handle h)
 #define EVGKILLED 217 /* sanlock lost access to leases and VG is killed. */
 #define ELOCKIO   218 /* sanlock io errors during lock op, may be transient. */
 #define EREMOVED  219
+#define EDEVOPEN  220 /* sanlock failed to open lvmlock LV */
+#define ELMERR    221

 #endif	/* _LVM_LVMLOCKD_CLIENT_H */
--- a/daemons/lvmlockd/lvmlockd-core.c
+++ b/daemons/lvmlockd/lvmlockd-core.c
@@ -19,6 +19,7 @@
 #include "lvm-version.h"
 #include "lvmetad-client.h"
 #include "lvmlockd-client.h"
+#include "dm-ioctl.h" /* for DM_UUID_LEN */

 /* #include <assert.h> */
 #include <errno.h>
@@ -1388,12 +1389,11 @@ static int res_convert(struct lockspace *ls, struct resource *r,
 	}

 	rv = lm_convert(ls, r, act->mode, act, r_version);
-	if (rv < 0) {
-		log_error("S %s R %s res_convert lm error %d", ls->name, r->name, rv);
-		return rv;
-	}

-	log_debug("S %s R %s res_convert lm done", ls->name, r->name);
+	log_debug("S %s R %s res_convert rv %d", ls->name, r->name, rv);
+
+	if (rv < 0)
+		return rv;

 	if (lk->mode == LD_LK_EX && act->mode == LD_LK_SH) {
 		r->sh_count = 1;
@@ -2651,10 +2651,16 @@ out_act:
 	ls->drop_vg = drop_vg;
 	if (ls->lm_type == LD_LM_DLM && !strcmp(ls->name, gl_lsname_dlm))
 		global_dlm_lockspace_exists = 0;
-	/* Avoid a name collision of the same lockspace is added again before this thread is cleaned up. */
-	memset(tmp_name, 0, sizeof(tmp_name));
-	snprintf(tmp_name, MAX_NAME, "REM:%s", ls->name);
-	memcpy(ls->name, tmp_name, MAX_NAME);
+
+	/*
+	 * Avoid a name collision if the same lockspace is added again before
+	 * this thread is cleaned up.  We just set ls->name to a "junk" value
+	 * for the short period until the struct is freed.  We could make it
+	 * blank or fill it with garbage, but instead set it to REM:<name>
+	 * to make it easier to follow the progress of freeing via log_debug.
+	 */
+	dm_strncpy(tmp_name, ls->name, sizeof(tmp_name));
+	snprintf(ls->name, sizeof(ls->name), "REM:%s", tmp_name);
 	pthread_mutex_unlock(&lockspaces_mutex);

 	/* worker_thread will join this thread, and free the ls */
@@ -3303,7 +3309,6 @@ static int work_init_lv(struct action *act)
 		lm_type = ls->lm_type;
 		memcpy(vg_args, ls->vg_args, MAX_ARGS);
 		free_offset = ls->free_lock_offset;
-		ls->free_lock_offset = 0;
 	}
 	pthread_mutex_unlock(&lockspaces_mutex);

@@ -3533,11 +3538,15 @@ static int setup_worker_thread(void)

 static void close_worker_thread(void)
 {
+	int perrno;
+
 	pthread_mutex_lock(&worker_mutex);
 	worker_stop = 1;
 	pthread_cond_signal(&worker_cond);
 	pthread_mutex_unlock(&worker_mutex);
-	pthread_join(worker_thread, NULL);
+
+	if ((perrno = pthread_join(worker_thread, NULL)))
+		log_error("pthread_join worker_thread error %d", perrno);
 }

 /* client_mutex is locked */
@@ -3666,7 +3675,17 @@ static int client_send_result(struct client *cl, struct action *act)
 			if (!gl_lsname_dlm[0])
 				strcat(result_flags, "NO_GL_LS,");
 		} else {
-			strcat(result_flags, "NO_GL_LS,");
+			int found_lm = 0;
+
+			if (lm_support_dlm() && lm_is_running_dlm())
+				found_lm++;
+			if (lm_support_sanlock() && lm_is_running_sanlock())
+				found_lm++;
+
+			if (!found_lm)
+				strcat(result_flags, "NO_GL_LS,NO_LM");
+			else
+				strcat(result_flags, "NO_GL_LS");
 		}
 	}

@@ -3763,7 +3782,8 @@ static int client_send_result(struct client *cl, struct action *act)
 	if (dump_fd >= 0) {
 		/* To avoid deadlock, send data here after the reply. */
 		send_dump_buf(dump_fd, dump_len);
-		close(dump_fd);
+		if (close(dump_fd))
+			log_error("failed to close dump socket %d", dump_fd);
 	}

 	return rv;
@@ -3836,8 +3856,9 @@ static int add_lock_action(struct action *act)
 	pthread_mutex_lock(&lockspaces_mutex);
 	if (ls_name[0])
 		ls = find_lockspace_name(ls_name);
-	pthread_mutex_unlock(&lockspaces_mutex);
 	if (!ls) {
+		pthread_mutex_unlock(&lockspaces_mutex);
+
 		if (act->op == LD_OP_UPDATE && act->rt == LD_RT_VG) {
 			log_debug("lockspace \"%s\" not found ignored for vg update", ls_name);
 			return -ENOLS;
@@ -4754,8 +4775,8 @@ static void *client_thread_main(void *arg_in)
 			} else {
 				pthread_mutex_unlock(&cl->mutex);
 			}
-		}
-		pthread_mutex_unlock(&client_mutex);
+		} else
+			pthread_mutex_unlock(&client_mutex);
 	}
 out:
 	return NULL;
@@ -4779,11 +4800,15 @@ static int setup_client_thread(void)

 static void close_client_thread(void)
 {
+	int perrno;
+
 	pthread_mutex_lock(&client_mutex);
 	client_stop = 1;
 	pthread_cond_signal(&client_cond);
 	pthread_mutex_unlock(&client_mutex);
-	pthread_join(client_thread, NULL);
+
+	if ((perrno = pthread_join(client_thread, NULL)))
+		log_error("pthread_join client_thread error %d", perrno);
 }

 /*
@@ -4907,14 +4932,10 @@ static int get_lockd_vgs(struct list_head *vg_lockd)
 				continue;

 			for (lv_cn = md_cn->child; lv_cn; lv_cn = lv_cn->sib) {
-				snprintf(find_str_path, PATH_MAX, "%s/lock_type", lv_cn->key);
-				lock_type = dm_config_find_str(lv_cn, find_str_path, NULL);
-
-				if (!lock_type)
-					continue;
-
 				snprintf(find_str_path, PATH_MAX, "%s/lock_args", lv_cn->key);
 				lock_args = dm_config_find_str(lv_cn, find_str_path, NULL);
+				if (!lock_args)
+					continue;

 				snprintf(find_str_path, PATH_MAX, "%s/id", lv_cn->key);
 				lv_uuid = dm_config_find_str(lv_cn, find_str_path, NULL);
@@ -4960,7 +4981,7 @@ out:
 	return rv;
 }

-static char _dm_uuid[64];
+static char _dm_uuid[DM_UUID_LEN];

 static char *get_dm_uuid(char *dm_name)
 {
@@ -5179,20 +5200,17 @@ static void adopt_locks(void)
 	 * Get list of lockspaces from lock managers.
 	 * Get list of VGs from lvmetad with a lockd type.
 	 * Get list of active lockd type LVs from /dev.
-	 *
-	 * ECONNREFUSED means the lock manager is not running.
-	 * This is expected for at least one of them.
 	 */

-	if (lm_support_dlm()) {
+	if (lm_support_dlm() && lm_is_running_dlm()) {
 		rv = lm_get_lockspaces_dlm(&ls_found);
-		if ((rv < 0) && (rv != -ECONNREFUSED))
+		if (rv < 0)
 			goto fail;
 	}

-	if (lm_support_sanlock()) {
+	if (lm_support_sanlock() && lm_is_running_sanlock()) {
 		rv = lm_get_lockspaces_sanlock(&ls_found);
-		if ((rv < 0) && (rv != -ECONNREFUSED))
+		if (rv < 0)
 			goto fail;
 	}

@@ -5269,7 +5287,7 @@ static void adopt_locks(void)
 	list_for_each_entry_safe(ls1, l1safe, &ls_found, list) {

 		/* The dlm global lockspace is special and doesn't match a VG. */
-		if (!strcmp(ls1->name, gl_lsname_dlm)) {
+		if ((ls1->lm_type == LD_LM_DLM) && !strcmp(ls1->name, gl_lsname_dlm)) {
 			list_del(&ls1->list);
 			free(ls1);
 			continue;
--- a/daemons/lvmlockd/lvmlockd-dlm.c
+++ b/daemons/lvmlockd/lvmlockd-dlm.c
@@ -508,7 +508,7 @@ lockrv:
 	}
 	if (rv < 0) {
 		log_error("S %s R %s lock_dlm acquire error %d errno %d", ls->name, r->name, rv, errno);
-		return rv;
+		return -ELMERR;
 	}

 	if (rdd->vb) {
@@ -581,6 +581,7 @@ int lm_convert_dlm(struct lockspace *ls, struct resource *r,
 	}
 	if (rv < 0) {
 		log_error("S %s R %s convert_dlm error %d", ls->name, r->name, rv);
+		rv = -ELMERR;
 	}
 	return rv;
 }
@@ -654,6 +655,7 @@ int lm_unlock_dlm(struct lockspace *ls, struct resource *r,
 			      0, NULL, NULL, NULL);
 	if (rv < 0) {
 		log_error("S %s R %s unlock_dlm error %d", ls->name, r->name, rv);
+		rv = -ELMERR;
 	}

 	return rv;
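
The dlm hunks above, like the sanlock ones in this patch, collapse lock-manager-specific failures into the generic ELMERR, while expected contention is reported as EAGAIN. The following is a minimal sketch of that translation, written in Python for brevity. ELMERR's value comes from lvmlockd-client.h as shown in this diff; `BACKEND_IO_TIMEOUT`, `BACKEND_LOST_RACE` and `translate_lm_error` are hypothetical names with made-up values, standing in for codes such as SANLK_AIO_TIMEOUT whose real numbers are not assumed here.

```python
import errno

# Value taken from lvmlockd-client.h as shown in this diff.
ELMERR = 221

# Hypothetical backend-specific codes; the real numbers are not assumed.
BACKEND_IO_TIMEOUT = -9001
BACKEND_LOST_RACE = -9002

# Failures that are normal under contention and can simply be retried.
EXPECTED_RETRYABLE = {-errno.EAGAIN, BACKEND_IO_TIMEOUT, BACKEND_LOST_RACE}


def translate_lm_error(rv):
    """Map a lock manager return value to a daemon-level result."""
    if rv == 0:
        return 0
    if rv in EXPECTED_RETRYABLE:
        return -errno.EAGAIN
    # Anything else is reported with one generic error number, so callers
    # never have to understand backend-specific codes.
    return -ELMERR


results = [
    translate_lm_error(rv)
    for rv in (0, -errno.EAGAIN, BACKEND_LOST_RACE, -5)
]
```

Here the four inputs map to success, retry, retry and the generic error respectively.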
--- a/daemons/lvmlockd/lvmlockd-sanlock.c
+++ b/daemons/lvmlockd/lvmlockd-sanlock.c
@@ -224,7 +224,10 @@ static int lock_lv_offset_from_args(char *lv_args, uint64_t *lock_lv_offset)
 	if (rv < 0)
 		return rv;

+	errno = 0;
 	*lock_lv_offset = strtoull(offset_str, NULL, 10);
+	if (errno)
+		return -1;
 	return 0;
 }

@@ -353,12 +356,19 @@ int lm_init_vg_sanlock(char *ls_name, char *vg_name, uint32_t flags, char *vg_ar
 	log_debug("sanlock daemon version %08x proto %08x",
 		  daemon_version, daemon_proto);

-	align_size = sanlock_align(&disk);
-	if (align_size <= 0) {
-		log_error("S %s init_vg_san bad disk align size %d %s",
-			  ls_name, align_size, disk.path);
-		return -EARGS;
-	}
+	rv = sanlock_align(&disk);
+	if (rv <= 0) {
+		if (rv == -EACCES) {
+			log_error("S %s init_vg_san sanlock error -EACCES: no permission to access %s",
+				  ls_name, disk.path);
+			return -EDEVOPEN;
+		} else {
+			log_error("S %s init_vg_san sanlock error %d trying to get align size of %s",
+				  ls_name, rv, disk.path);
+			return -EARGS;
+		}
+	} else
+		align_size = rv;

 	strncpy(ss.name, ls_name, SANLK_NAME_LEN);
 	memcpy(ss.host_id_disk.path, disk.path, SANLK_PATH_LEN);
@@ -935,7 +945,9 @@ int lm_find_free_lock_sanlock(struct lockspace *ls, uint64_t *free_offset)
 	struct lm_sanlock *lms = (struct lm_sanlock *)ls->lm_data;
 	struct sanlk_resourced rd;
 	uint64_t offset;
+	uint64_t start_offset;
 	int rv;
+	int round = 0;

 	if (daemon_test) {
 		*free_offset = (1048576 * LV_LOCK_BEGIN) + (1048576 * (daemon_test_lv_count + 1));
@@ -948,9 +960,22 @@ int lm_find_free_lock_sanlock(struct lockspace *ls, uint64_t *free_offset)
 	rd.rs.num_disks = 1;
 	strncpy(rd.rs.disks[0].path, lms->ss.host_id_disk.path, SANLK_PATH_LEN-1);

-	offset = lms->align_size * LV_LOCK_BEGIN;
+	if (ls->free_lock_offset)
+		offset = ls->free_lock_offset;
+	else
+		offset = lms->align_size * LV_LOCK_BEGIN;
+
+	start_offset = offset;

 	while (1) {
+		if (offset >= start_offset && round) {
+			/* This indicates that all space is allocated. */
+			log_debug("S %s init_lv_san read back to start offset %llu",
+				ls->name, (unsigned long long)offset);
+			rv = -EMSGSIZE;
+			return rv;
+		}
+
 		rd.rs.disks[0].offset = offset;

 		memset(rd.rs.name, 0, SANLK_NAME_LEN);
@@ -960,7 +985,14 @@ int lm_find_free_lock_sanlock(struct lockspace *ls, uint64_t *free_offset)
 			/* This indicates the end of the device is reached. */
 			log_debug("S %s find_free_lock_san read limit offset %llu",
 				  ls->name, (unsigned long long)offset);
-			return -EMSGSIZE;
+
+			/* Remember the NO SPACE offset; if no free area is left,
+			 * resume the search from this offset after extending. */
+			*free_offset = offset;
+
+			offset = lms->align_size * LV_LOCK_BEGIN;
+			round = 1;
+			continue;
 		}

 		/*
@@ -1428,6 +1460,12 @@ int lm_lock_sanlock(struct lockspace *ls, struct resource *r, int ld_mode,

 	rv = sanlock_acquire(lms->sock, -1, flags, 1, &rs, &opt);

+	/*
+	 * errors: translate the sanlock error number to an lvmlockd error.
+	 * We don't want to return a sanlock-specific error number from
+	 * this function to code that doesn't recognize sanlock error numbers.
+	 */
+
 	if (rv == -EAGAIN) {
 		/*
 		 * It appears that sanlock_acquire returns EAGAIN when we request
@@ -1496,6 +1534,26 @@ int lm_lock_sanlock(struct lockspace *ls, struct resource *r, int ld_mode,
 		return -EAGAIN;
 	}

+	if (rv == SANLK_AIO_TIMEOUT) {
+		/*
+		 * sanlock got an i/o timeout when trying to acquire the
+		 * lease on disk.
+		 */
+		log_debug("S %s R %s lock_san acquire mode %d rv %d", ls->name, r->name, ld_mode, rv);
+		*retry = 0;
+		return -EAGAIN;
+	}
+
+	if (rv == SANLK_DBLOCK_LVER || rv == SANLK_DBLOCK_MBAL) {
+		/*
+		 * There was contention with another host for the lease,
+		 * and we lost.
+		 */
+		log_debug("S %s R %s lock_san acquire mode %d rv %d", ls->name, r->name, ld_mode, rv);
+		*retry = 0;
+		return -EAGAIN;
+	}
+
 	if (rv == SANLK_ACQUIRE_OWNED_RETRY) {
 		/*
 		 * The lock is held by a failed host, and will eventually
@@ -1546,15 +1604,25 @@ int lm_lock_sanlock(struct lockspace *ls, struct resource *r, int ld_mode,
 		if (rv == -ENOSPC)
 			rv = -ELOCKIO;

-		return rv;
+		/*
+		 * generic error number for sanlock errors that we are not
+		 * catching above.
+		 */
+		return -ELMERR;
 	}

+	/*
+	 * sanlock acquire success (rv 0)
+	 */
+
 	if (rds->vb) {
 		rv = sanlock_get_lvb(0, rs, (char *)&vb, sizeof(vb));
 		if (rv < 0) {
 			log_error("S %s R %s lock_san get_lvb error %d", ls->name, r->name, rv);
 			memset(rds->vb, 0, sizeof(struct val_blk));
 			memset(vb_out, 0, sizeof(struct val_blk));
+			/* the lock is still acquired, the vb values considered invalid */
+			rv = 0;
 			goto out;
 		}

@@ -1607,6 +1675,7 @@ int lm_convert_sanlock(struct lockspace *ls, struct resource *r,
 		if (rv < 0) {
 			log_error("S %s R %s convert_san set_lvb error %d",
 				  ls->name, r->name, rv);
+			return -ELMERR;
 		}
 	}

@@ -1619,14 +1688,35 @@ int lm_convert_sanlock(struct lockspace *ls, struct resource *r,
 	if (daemon_test)
 		return 0;

+	/*
+	 * Don't block waiting for a failed lease to expire since it causes
+	 * sanlock_convert to block for a long time, which would prevent this
+	 * thread from processing other lock requests.
+	 *
+	 * FIXME: SANLK_CONVERT_OWNER_NOWAIT is the same as SANLK_ACQUIRE_OWNER_NOWAIT.
+	 * Change to use the CONVERT define when the latest sanlock version has it.
+	 */
+	flags |= SANLK_ACQUIRE_OWNER_NOWAIT;
+
 	rv = sanlock_convert(lms->sock, -1, flags, rs);
-	if (rv == -EAGAIN) {
-		/* FIXME: When could this happen?  Should something different be done? */
-		log_error("S %s R %s convert_san EAGAIN", ls->name, r->name);
+	if (!rv)
+		return 0;
+
+	switch (rv) {
+	case -EAGAIN:
+	case SANLK_ACQUIRE_IDLIVE:
+	case SANLK_ACQUIRE_OWNED:
+	case SANLK_ACQUIRE_OWNED_RETRY:
+	case SANLK_ACQUIRE_OTHER:
+	case SANLK_AIO_TIMEOUT:
+	case SANLK_DBLOCK_LVER:
+	case SANLK_DBLOCK_MBAL:
+		/* expected errors from known/normal cases like lock contention or io timeouts */
+		log_debug("S %s R %s convert_san error %d", ls->name, r->name, rv);
 		return -EAGAIN;
-	}
-	if (rv < 0) {
+	default:
 		log_error("S %s R %s convert_san convert error %d", ls->name, r->name, rv);
+		rv = -ELMERR;
 	}

 	return rv;
@@ -1663,6 +1753,7 @@ static int release_rename(struct lockspace *ls, struct resource *r)
 	rv = sanlock_release(lms->sock, -1, SANLK_REL_RENAME, 2, res_args);
 	if (rv < 0) {
 		log_error("S %s R %s unlock_san release rename error %d", ls->name, r->name, rv);
+		rv = -ELMERR;
 	}

 	free(res_args);
@@ -1719,6 +1810,7 @@ int lm_unlock_sanlock(struct lockspace *ls, struct resource *r,
 		if (rv < 0) {
 			log_error("S %s R %s unlock_san set_lvb error %d",
 				  ls->name, r->name, rv);
+			return -ELMERR;
 		}
 	}

@@ -1737,6 +1829,8 @@ int lm_unlock_sanlock(struct lockspace *ls, struct resource *r,

 	if (rv == -EIO)
 		rv = -ELOCKIO;
+	else if (rv < 0)
+		rv = -ELMERR;

 	return rv;
 }
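
The lm_find_free_lock_sanlock() hunks above change the free-lease search so that it starts from a remembered offset, wraps to the beginning when it reaches the end of the device, and gives up once it is back at its starting point. The following is a simplified model of that loop, in Python for brevity. It uses list indices instead of byte offsets, and a hypothetical `used` list stands in for reading leases from disk; `find_free_slot` and its parameters are illustrative names only.

```python
def find_free_slot(used, begin, hint=0):
    """Return (index, end_seen).

    index is the first free slot found, or None when every slot is used.
    end_seen is the index at which the end of the device was hit, if any.
    """
    offset = hint if hint else begin
    start = offset
    wrapped = False
    end_seen = None

    while True:
        if wrapped and offset >= start:
            # Back at the starting point after wrapping: everything is used.
            return None, end_seen

        if offset >= len(used):
            # End of device: remember where it was, restart from the beginning.
            end_seen = offset
            offset = begin
            wrapped = True
            continue

        if not used[offset]:
            return offset, end_seen

        offset += 1


# Slots 0-1 are reserved (begin=2); searching from hint 4 wraps and finds 3.
found = find_free_slot([True, True, True, False, True, True], 2, 4)

# Every slot used: the search returns None and reports the end at index 6.
full = find_free_slot([True] * 6, 2, 4)

# After the device grows by two slots, resuming at index 6 succeeds at once.
extended = find_free_slot([True] * 6 + [False, False], 2, 6)
```

Remembering the end index is what makes the third call cheap: after the device grows, the search resumes in the new space instead of rescanning slots already known to be used.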
--- a/daemons/lvmpolld/Makefile.in
+++ b/daemons/lvmpolld/Makefile.in
@@ -27,18 +27,14 @@ CFLOW_TARGET = lvmpolld

 include $(top_builddir)/make.tmpl

+CFLAGS += $(EXTRA_EXEC_CFLAGS)
 INCLUDES += -I$(top_srcdir)/libdaemon/server
-LVMLIBS = -ldaemonserver $(LVMINTERNAL_LIBS) -ldevmapper
-
-LIBS += $(PTHREAD_LIBS)
-
-LDFLAGS += -L$(top_builddir)/libdaemon/server $(DAEMON_LDFLAGS)
-CLDFLAGS += -L$(top_builddir)/libdaemon/server
-CFLAGS += $(DAEMON_CFLAGS)
+LDFLAGS += -L$(top_builddir)/libdaemon/server $(EXTRA_EXEC_LDFLAGS) $(ELDFLAGS)
+LIBS += $(DAEMON_LIBS) -ldaemonserver -ldevmapper $(PTHREAD_LIBS)

 lvmpolld: $(OBJECTS) $(top_builddir)/libdaemon/client/libdaemonclient.a \
 		    $(top_builddir)/libdaemon/server/libdaemonserver.a
-	$(CC) $(CFLAGS) $(LDFLAGS) -o $@ $(OBJECTS) $(LVMLIBS) $(LIBS)
+	$(CC) $(CFLAGS) $(LDFLAGS) -o $@ $(OBJECTS) $(LIBS)

 install_lvmpolld: lvmpolld
 	$(INSTALL_PROGRAM) -D $< $(sbindir)/$(<F)
--- a/doc/aio_design.txt
+++ b/doc/aio_design.txt
@@ -0,0 +1,215 @@
+Introducing asynchronous I/O to LVM
+===================================
+
+Issuing I/O asynchronously means instructing the kernel to perform specific
+I/O and return immediately without waiting for it to complete.  The data
+is collected from the kernel later.
+
+Advantages
+----------
+
+A1. While waiting for the I/O to happen, the program could perform other
+operations.
+
+A2. When LVM is searching for its Physical Volumes, it issues a small amount of
+I/O to a large number of disks.  If this was issued in parallel the overall
+runtime might be shorter while there should be little effect on the cpu time.
+
+A3. If more than one timeout occurs when accessing any devices, these can be
+taken in parallel, again reducing the runtime.  This applies globally,
+not just while the code is searching for Physical Volumes, so reading,
+writing and committing the metadata may occasionally benefit to some
+extent too, and there are probably maintenance advantages in using the same
+method of I/O throughout the main body of the code.
+
+A4. By introducing a simple callback function mechanism, the conversion can be
+performed largely incrementally by first refactoring and continuing to
+use synchronous I/O with the callbacks performed immediately.  This allows the
+callbacks to be introduced without changing the running sequence of the code
+initially.  Future projects could refactor some of the calling sites to
+simplify the code structure and even eliminate some of the nesting.
+This allows each part of what might ultimately amount to a large change to be
+introduced and tested independently.
+
+
+Disadvantages
+-------------
+
+D1. The resulting code may be more complex with more failure modes to
+handle.  Mitigate by thorough auditing and testing, rolling out
+gradually, and offering a simple switch to revert to the old behaviour.
+
+D2. The Linux asynchronous I/O implementation is less mature than
+its synchronous I/O implementation and might show up problems that
+depend on the version of the kernel or library used.  Fixes or
+workarounds for some of these might require kernel changes.  For
+example, there are suggestions that despite being supposedly async,
+there are still cases where system calls can block.  There might be
+resource dependencies on other processes running on the system that make
+it unsuitable for use while any devices are suspended.  Mitigation
+as for D1.
+
+D3. The error handling within callbacks becomes more complicated.
+However we know that existing call paths can already sometimes discard
+errors, sometimes deliberately, sometimes not, so this aspect is in need
+of a complete review anyway and the new approach will make the error
+handling more transparent.  Aim initially for overall behaviour that is
+no worse than that of the existing code, then work on improving it
+later.
+
+D4. The work will take a few weeks to code and test.  This leads to a
+significant opportunity cost when compared against other enhancements
+that could be achieved in that time.  However, the proof-of-concept work
+performed while writing this design has satisfied me that the work could
+proceed and be committed incrementally as a background task.
+
+
+Observations regarding LVM's I/O Architecture
+---------------------------------------------
+
+H1. All device, metadata and config file I/O is constrained to pass through a
+single route in lib/device.
+
+H2. The first step of the analysis was to instrument this code path with
+log_debug messages.  I/O is split into the following categories:
+
+        "dev signatures",
+        "PV labels",
+        "VG metadata header",
+        "VG metadata content",
+        "extra VG metadata header",
+        "extra VG metadata content",
+        "LVM1 metadata",
+        "pool metadata",
+        "LV content",
+        "logging",
+
+H3. A bounce buffer is used for most I/O.
+
+H4. Most callers finish using the supplied data before any further I/O is
+issued.  The few that don't could be converted trivially to do so.
+
+H5. There is one stream of I/O per metadata area on each device.
+
+H6. Some reads fall at offsets close to immediately preceding reads, so it's
+possible to avoid these by caching one "block" per metadata area I/O stream.
+
+H7. Simple analysis suggests a minimum aligned read size of 8k would deliver
+immediate gains from this caching.  A larger size might perform worse because
+almost all the time the extra data read would not be used, but this can be
+re-examined and tuned after the code is in place.
+
+
+Proposal
+--------
+
+P1. Retain the "single I/O path" but offer an asynchronous option.
+
+P2. Eliminate the bounce buffer in most cases by improving alignment.
+
+P3. Reduce the number of reads by always reading a minimum of an aligned
+8k block.
+
+P4. Eliminate repeated reads by caching the last block read and changing
+the lib/device interface to return a pointer to read-only data within
+this block.
+
+P5. Only perform these interface changes for code on the critical path
+for now by converting other code sites to use wrappers around the new
+interface.
+
+P6. Treat asynchronous I/O as the interface of choice and optimise only
+for this case.
+
+P7. Convert the callers on the critical path to pass callback functions
+to the device layer.  These functions will be called later with the
+read-only data, a context pointer and a success/failure indicator.
+Where an existing function performs a sequence of I/O, this has the
+advantage of breaking up the large function into smaller ones and
+wrapping the parameters used into structures.  While this might look
+rather messy and ad-hoc in the short-term, it's a first step towards
+breaking up confusingly long functions into component parts and wrapping
+the existing long parameter lists into more appropriate structures and
+refactoring these parts of the code.
+
+P8. Limit the resources used by the asynchronous I/O by using two
+tunable parameters, one limiting the number of outstanding I/Os issued
+and another limiting the total amount of memory used.
+
+P9. Provide a fallback option if asynchronous I/O is unavailable by
+sharing the code paths but issuing the I/O synchronously and calling the
+callback immediately.
+
+P10. Only allocate the buffer for the I/O at the point where the I/O is
+about to be issued.
+
+P11. If the thresholds are exceeded, add the request to a simple queue,
+and process it later after some I/O has completed.
+
+
+Future work
+-----------
+F1. Perform a complete review of the error tracking so that device
+failures are handled and reported more cleanly, extending the existing
+basic error counting mechanism.
+
+F2. Consider whether some of the nested callbacks can be eliminated,
+which would allow for additional simplifications.
+
+F3. Adjust the contents of the ad-hoc context structs into more logical
+arrangements and use them more widely.
+
+F4. Perform wider refactoring of these areas of code.
+
+
+Testing considerations
+----------------------
+T1. The changes touch code on the device path, so a thorough re-test of
+the device layer is required.  The new code needs a full audit down
+through the library layer into the kernel to check that all the error
+conditions that are currently implemented (such as EAGAIN) are handled
+sensibly. (LVM's I/O layer needs to remain as solid as we can make it.)
+
+T2. The current test suite provides a reasonably broad range of coverage
+of this area but is far from comprehensive.
+
+
+Acceptance criteria
+-------------------
+A1. The current test suite should pass to the same extent as before the
+changes.
+
+A2. When all debugging and logging is disabled, strace -c must show
+improvements, e.g. the expected reduction in the number of reads.
+
+A3. Running a range of commands under valgrind must not reveal any
+new leaks due to the changes.
+
+A4. All new coverity reports from the change must be addressed.
+
+A5. CPU time should be similar to that before, as the same work
+is being done overall, just in a different order.
+
+A6. Tests need to show improved behaviour in targeted areas.  For example,
+if several devices are slow and time out, the delays should occur
+in parallel and the elapsed time should be less than before.
+
+
+Release considerations
+----------------------
+R1. Async I/O should be widely available and largely reliable on Linux
+nowadays (even though parts of its interface and implementation remain a
+matter of controversy) so we should try to make its use the default
+wherever it is supported.  If certain types of systems have problems we
+should try to detect those cases and disable it automatically there.
+
+R2. Because the implications of an unexpected problem in the new code
+could be severe for the people affected, the roll out needs to be gentle
+without a deadline to allow us plenty of time to gain confidence in the
+new code.  Our own testing will only be able to cover a tiny fraction of
+the different setups our users have, so we need to look out for problems
+caused by this proactively and encourage people to test it on their own
+systems and report back.  It must go into the tree near the start of a
+release cycle rather than at the end to provide time for our confidence
+in it to grow.
+
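Reviewer note on doc/aio_design.txt: proposals P3, P7 and P9 can be illustrated together with a small sketch. It rounds a request out to 8k-aligned boundaries and shows the synchronous fallback, where the "I/O" is performed at once and the callback is invoked immediately with read-only data, a context pointer and a failure indicator. Everything here is hypothetical: `struct mem_dev` is an in-memory stand-in for a device, and the names `io_cb_fn`, `aligned_window` and `dev_read_callback` are invented for illustration, not taken from lib/device.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BLOCK_SIZE 8192u /* minimum aligned read size suggested in H7/P3 */

/* Callback: failure indicator, caller's context, read-only data (P7). */
typedef void (*io_cb_fn)(int failed, void *context, const void *data);

/* Hypothetical in-memory "device" standing in for a real block device. */
struct mem_dev {
	const uint8_t *data;
	uint64_t size;
};

/* Round [offset, offset + len) out to BLOCK_SIZE-aligned boundaries (P3). */
static void aligned_window(uint64_t offset, size_t len,
			   uint64_t *start, uint64_t *end)
{
	assert(len > 0);
	*start = offset - (offset % BLOCK_SIZE);
	*end = offset + len;
	if (*end % BLOCK_SIZE)
		*end += BLOCK_SIZE - (*end % BLOCK_SIZE);
}

/* Synchronous fallback (P9): do the read now, call the callback immediately. */
static int dev_read_callback(const struct mem_dev *dev, uint64_t offset,
			     size_t len, io_cb_fn cb, void *context)
{
	int failed = (len == 0 || offset > dev->size ||
		      len > dev->size - offset);

	cb(failed, context, failed ? NULL : dev->data + offset);
	return !failed;
}

/* Example caller state and callback: remember the first byte that was read. */
struct first_byte_ctx {
	int failed;
	uint8_t value;
};

static void first_byte_cb(int failed, void *context, const void *data)
{
	struct first_byte_ctx *ctx = context;

	ctx->failed = failed;
	if (!failed)
		ctx->value = *(const uint8_t *)data;
}
```

Because the caller only ever sees its callback being invoked, an asynchronous backend could later replace `dev_read_callback` without changing the calling sites, which is the incremental path described in A4.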
--- a/doc/kernel/cache.txt
+++ b/doc/kernel/cache.txt
@@ -207,6 +207,10 @@ Optional feature arguments are:
 		   block, then the cache block is invalidated.
 		   To enable passthrough mode the cache must be clean.

+   metadata2	: use version 2 of the metadata.  This stores the dirty bits
+                  in a separate btree, which improves speed of shutting
+		  down the cache.
+
 A policy called 'default' is always registered.  This is an alias for
 the policy we currently think is giving best all round performance.

@@ -286,7 +290,7 @@ message, which takes an arbitrary number of cblock ranges.  Each cblock
 range's end value is "one past the end", meaning 5-10 expresses a range
 of values from 5 to 9.  Each cblock must be expressed as a decimal
 value, in the future a variant message that takes cblock ranges
-expressed in hexidecimal may be needed to better support efficient
+expressed in hexadecimal may be needed to better support efficient
 invalidation of larger caches.  The cache must be in passthrough mode
 when invalidate_cblocks is used.

--- a/doc/kernel/crypt.txt
+++ b/doc/kernel/crypt.txt
@@ -11,23 +11,57 @@ Parameters: <cipher> <key> <iv_offset> <device path> \
 	      <offset> [<#opt_params> <opt_params>]

 <cipher>
-    Encryption cipher and an optional IV generation mode.
-    (In format cipher[:keycount]-chainmode-ivmode[:ivopts]).
-    Examples:
-       des
-       aes-cbc-essiv:sha256
-       twofish-ecb
+    Encryption cipher, encryption mode and Initial Vector (IV) generator.

-    /proc/crypto contains supported crypto modes
+    The cipher specifications format is:
+       cipher[:keycount]-chainmode-ivmode[:ivopts]
+    Examples:
+       aes-cbc-essiv:sha256
+       aes-xts-plain64
+       serpent-xts-plain64
+
+    Cipher format also supports direct specification with kernel crypt API
+    format (selected by capi: prefix). The IV specification is the same
+    as for the first format type.
+    This format is mainly used for specification of authenticated modes.
+
+    The crypto API cipher specifications format is:
+        capi:cipher_api_spec-ivmode[:ivopts]
+    Examples:
+        capi:cbc(aes)-essiv:sha256
+        capi:xts(aes)-plain64
+    Examples of authenticated modes:
+        capi:gcm(aes)-random
+        capi:authenc(hmac(sha256),xts(aes))-random
+        capi:rfc7539(chacha20,poly1305)-random
+
+    /proc/crypto contains a list of currently loaded crypto modes.

 <key>
-    Key used for encryption. It is encoded as a hexadecimal number.
+    Key used for encryption. It is encoded either as a hexadecimal number
+    or it can be passed as <key_string> prefixed with a single colon
+    character (':') for keys residing in the kernel keyring service.
    You can only use key sizes that are valid for the selected cipher
    in combination with the selected iv mode.
    Note that for some iv modes the key string can contain additional
    keys (for example IV seed) so the key contains more parts concatenated
    into a single string.

+<key_string>
+    The kernel keyring key is identified by a string in the following format:
+    <key_size>:<key_type>:<key_description>.
+
+<key_size>
+    The encryption key size in bytes. The kernel key payload size must match
+    the value passed in <key_size>.
+
+<key_type>
+    Either 'logon' or 'user' kernel key type.
+
+<key_description>
+    The kernel keyring key description that the crypt target should look for
+    when loading a key of <key_type>.
+
 <keycount>
    Multi-key compatibility mode. You can define <keycount> keys and
    then sectors are encrypted according to their offsets (sector 0 uses key0;
@@ -76,6 +110,32 @@ submit_from_crypt_cpus
    thread because it benefits CFQ to have writes submitted using the
    same context.

+integrity:<bytes>:<type>
+    The device requires an additional <bytes> of metadata per sector, stored
+    in the per-bio integrity structure. This metadata must be provided
+    by the underlying dm-integrity target.
+
+    The <type> can be "none" if metadata is used only for persistent IV.
+
+    For Authenticated Encryption with Additional Data (AEAD)
+    the <type> is "aead". An AEAD mode additionally calculates and verifies
+    integrity for the encrypted device. The additional space is then
+    used for storing the authentication tag (and persistent IV if needed).
+
+sector_size:<bytes>
+    Use <bytes> as the encryption unit instead of 512-byte sectors.
+    This option can be in the range 512 - 4096 bytes and must be a power of two.
+    The virtual device will announce this size as its minimal IO and logical sector size.
+
+iv_large_sectors
+   IV generators will use the sector number counted in <sector_size> units
+   instead of the default 512-byte sectors.
+
+   For example, if <sector_size> is 4096 bytes, the plain64 IV for the second
+   sector will be 8 without the flag and 1 if iv_large_sectors is present.
+   The <iv_offset> must be a multiple of <sector_size> (in 512-byte units)
+   if this flag is specified.
+
 Example scripts
 ===============
 LUKS (Linux Unified Key Setup) is now the preferred way to set up disk
@@ -85,7 +145,13 @@ https://gitlab.com/cryptsetup/cryptsetup
 [[
 #!/bin/sh
 # Create a crypt device using dmsetup
-dmsetup create crypt1 --table "0 `blockdev --getsize $1` crypt aes-cbc-essiv:sha256 babebabebabebabebabebabebabebabe 0 $1 0"
+dmsetup create crypt1 --table "0 `blockdev --getsz $1` crypt aes-cbc-essiv:sha256 babebabebabebabebabebabebabebabe 0 $1 0"
+]]
+
+[[
+#!/bin/sh
+# Create a crypt device using dmsetup when encryption key is stored in keyring service
+dmsetup create crypt2 --table "0 `blockdev --getsz $1` crypt aes-cbc-essiv:sha256 :32:logon:my_prefix:my_key 0 $1 0"
 ]]

 [[
--- a/doc/kernel/delay.txt
+++ b/doc/kernel/delay.txt
@@ -16,12 +16,12 @@ Example scripts
 [[
 #!/bin/sh
 # Create device delaying rw operation for 500ms
-echo "0 `blockdev --getsize $1` delay $1 0 500" | dmsetup create delayed
+echo "0 `blockdev --getsz $1` delay $1 0 500" | dmsetup create delayed
 ]]

 [[
 #!/bin/sh
 # Create device delaying only write operation for 500ms and
 # splitting reads and writes to different devices $1 $2
-echo "0 `blockdev --getsize $1` delay $1 0 0 $2 0 500" | dmsetup create delayed
+echo "0 `blockdev --getsz $1` delay $1 0 0 $2 0 500" | dmsetup create delayed
 ]]
--- a/doc/kernel/flakey.txt
+++ b/doc/kernel/flakey.txt
@@ -42,7 +42,7 @@ Optional feature parameters:
    <direction>: Either 'r' to corrupt reads or 'w' to corrupt writes.
 		 'w' is incompatible with drop_writes.
    <value>: The value (from 0-255) to write.
-    <flags>: Perform the replacement only if bio->bi_rw has all the
+    <flags>: Perform the replacement only if bio->bi_opf has all the
 	     selected flags set.

 Examples:
--- a/doc/kernel/integrity.txt
+++ b/doc/kernel/integrity.txt
@@ -0,0 +1,199 @@
+The dm-integrity target emulates a block device that has additional
+per-sector tags that can be used for storing integrity information.
+
+A general problem with storing integrity tags with every sector is that
+writing the sector and the integrity tag must be atomic - i.e. in case of a
+crash, either both the sector and the integrity tag are written or neither is.
+
+To guarantee write atomicity, the dm-integrity target uses a journal: it
+writes sector data and integrity tags into the journal, commits the journal
+and then copies the data and integrity tags to their respective locations.
+
+The dm-integrity target can be used with the dm-crypt target - in this
+situation the dm-crypt target creates the integrity data and passes them
+to the dm-integrity target via bio_integrity_payload attached to the bio.
+In this mode, the dm-crypt and dm-integrity targets provide authenticated
+disk encryption - if the attacker modifies the encrypted device, an I/O
+error is returned instead of random data.
+
+The dm-integrity target can also be used as a standalone target; in this
+mode it calculates and verifies the integrity tag internally. In this
+mode, the dm-integrity target can be used to detect silent data
+corruption on the disk or in the I/O path.
+
+
+When loading the target for the first time, the kernel driver will format
+the device. But it will only format the device if the superblock contains
+zeroes. If the superblock is neither valid nor zeroed, the dm-integrity
+target can't be loaded.
+
+To use the target for the first time:
+1. overwrite the superblock with zeroes
+2. load the dm-integrity target with one-sector size, the kernel driver
+	will format the device
+3. unload the dm-integrity target
+4. read the "provided_data_sectors" value from the superblock
+5. load the dm-integrity target with the target size
+	"provided_data_sectors"
+6. if you want to use dm-integrity with dm-crypt, load the dm-crypt target
+	with the size "provided_data_sectors"
+
+
+Target arguments:
+
+1. the underlying block device
+
+2. the number of reserved sectors at the beginning of the device - the
+	dm-integrity target won't read or write these sectors
+
+3. the size of the integrity tag (if "-" is used, the size is taken from
+	the internal-hash algorithm)
+
+4. mode:
+	D - direct writes (without journal) - in this mode, journaling is
+		not used and data sectors and integrity tags are written
+		separately. In case of crash, it is possible that the data
+		and integrity tag don't match.
+	J - journaled writes - data and integrity tags are written to the
+		journal and atomicity is guaranteed. In case of crash,
+		either both data and tag or none of them are written. The
+		journaled mode halves write throughput because the
+		data have to be written twice.
+	R - recovery mode - in this mode, journal is not replayed,
+		checksums are not checked and writes to the device are not
+		allowed. This mode is useful for data recovery if the
+		device cannot be activated in any of the other standard
+		modes.
+
+5. the number of additional arguments
+
+Additional arguments:
+
+journal_sectors:number
+	The size of the journal. This argument is used only when formatting the
+	device. If the device is already formatted, the value from the
+	superblock is used.
+
+interleave_sectors:number
+	The number of interleaved sectors. This value is rounded down to
+	a power of two. If the device is already formatted, the value from
+	the superblock is used.
+
+buffer_sectors:number
+	The number of sectors in one buffer. The value is rounded down to
+	a power of two.
+
+	The tag area is accessed using buffers; the buffer size is
+	configurable. A larger buffer size means that each I/O will
+	be larger, but fewer I/Os may be issued.
+
+journal_watermark:number
+	The journal watermark in percent. When the size of the journal
+	exceeds this watermark, the thread that flushes the journal will
+	be started.
+
+commit_time:number
+	Commit time in milliseconds. When this time passes, the journal is
+	written. The journal is also written immediately if a FLUSH
+	request is received.
+
+internal_hash:algorithm(:key)	(the key is optional)
+	Use internal hash or crc.
+	When this argument is used, the dm-integrity target won't accept
+	integrity tags from the upper target, but it will automatically
+	generate and verify the integrity tags.
+
+	You can use a crc algorithm (such as crc32), then the integrity target
+	will protect the data against accidental corruption.
+	You can also use a hmac algorithm (for example
+	"hmac(sha256):0123456789abcdef"), in this mode it will provide
+	cryptographic authentication of the data without encryption.
+
+	When this argument is not used, the integrity tags are accepted
+	from an upper layer target, such as dm-crypt. The upper layer
+	target should check the validity of the integrity tags.
+
+journal_crypt:algorithm(:key)	(the key is optional)
+	Encrypt the journal using given algorithm to make sure that the
+	attacker can't read the journal. You can use a block cipher here
+	(such as "cbc(aes)") or a stream cipher (for example "chacha20",
+	"salsa20", "ctr(aes)" or "ecb(arc4)").
+
+	The journal contains a history of the last writes to the block device;
+	an attacker reading the journal could see the last sector numbers
+	that were written. From the sector numbers, the attacker can infer
+	the size of files that were written. To protect against this
+	situation, you can encrypt the journal.
+
+journal_mac:algorithm(:key)	(the key is optional)
+	Protect sector numbers in the journal from accidental or malicious
+	modification. To protect against accidental modification, use a
+	crc algorithm, to protect against malicious modification, use a
+	hmac algorithm with a key.
+
+	This option is not needed when using internal-hash because in this
+	mode, the integrity of journal entries is checked when replaying
+	the journal. Thus, a modified sector number would be detected at
+	this stage.
+
+block_size:number
+	The size of a data block in bytes.  The larger the block size the
+	less overhead there is for per-block integrity metadata.
+	Supported values are 512, 1024, 2048 and 4096 bytes.  If not
+	specified the default block size is 512 bytes.
+
+The journal mode (D/J), buffer_sectors, journal_watermark, commit_time can
+be changed when reloading the target (load an inactive table and swap the
+tables with suspend and resume). The other arguments should not be changed
+when reloading the target because the layout of disk data depends on them
+and the reloaded target would be non-functional.
+
+
+The layout of the formatted block device:
+* reserved sectors (they are not used by this target, they can be used for
+  storing LUKS metadata or for other purposes), the size of the reserved
+  area is specified in the target arguments
+* superblock (4kiB)
+	* magic string - identifies that the device was formatted
+	* version
+	* log2(interleave sectors)
+	* integrity tag size
+	* the number of journal sections
+	* provided data sectors - the number of sectors that this target
+	  provides (i.e. the size of the device minus the size of all
+	  metadata and padding). The user of this target should not send
+	  bios that access data beyond the "provided data sectors" limit.
+	* flags - a flag is set if journal_mac is used
+* journal
+	The journal is divided into sections, each section contains:
+	* metadata area (4kiB), it contains journal entries
+	  every journal entry contains:
+		* logical sector (specifies where the data and tag should
+		  be written)
+		* last 8 bytes of data
+		* integrity tag (the size is specified in the superblock)
+	    every metadata sector ends with
+		* mac (8-bytes), all the macs in 8 metadata sectors form a
+		  64-byte value. It is used to store hmac of sector
+		  numbers in the journal section, to protect against a
+		  possibility that the attacker tampers with sector
+		  numbers in the journal.
+		* commit id
+	* data area (the size is variable; it depends on how many journal
+	  entries fit into the metadata area)
+	    every sector in the data area contains:
+		* data (504 bytes of data, the last 8 bytes are stored in
+		  the journal entry)
+		* commit id
+	To test if the whole journal section was written correctly, every
+	512-byte sector of the journal ends with 8-byte commit id. If the
+	commit id matches on all sectors in a journal section, then it is
+	assumed that the section was written correctly. If the commit id
+	doesn't match, the section was written partially and it should not
+	be replayed.
+* one or more runs of interleaved tags and data. Each run contains:
+	* tag area - it contains integrity tags. There is one tag for each
+	  sector in the data area
+	* data area - it contains data sectors. The number of data sectors
+	  in one run must be a power of two. log2 of this value is stored
+	  in the superblock.
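Reviewer note on doc/kernel/integrity.txt: the journal layout described above stores 504 bytes of each data sector in a 512-byte journal sector, moves the last 8 bytes into the journal entry, and ends every journal sector with an 8-byte commit id. The sketch below illustrates that packing and the "all commit ids match" completeness check. It is a simplified model, not the kernel's code: it assumes a single commit id value shared by every sector of a section, and the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SECTOR_SIZE   512
#define COMMIT_ID_LEN 8
#define PAYLOAD_LEN   (SECTOR_SIZE - COMMIT_ID_LEN) /* 504 bytes of data */

/* Pack one data sector into a journal sector; the last 8 data bytes are
 * returned in entry_tail, to be stored in the journal entry instead. */
static void journal_pack(const uint8_t *data, const uint8_t *commit_id,
			 uint8_t *journal_sector, uint8_t *entry_tail)
{
	memcpy(journal_sector, data, PAYLOAD_LEN);
	memcpy(journal_sector + PAYLOAD_LEN, commit_id, COMMIT_ID_LEN);
	memcpy(entry_tail, data + PAYLOAD_LEN, COMMIT_ID_LEN);
}

/* Rebuild the original 512-byte data sector when replaying the journal. */
static void journal_unpack(const uint8_t *journal_sector,
			   const uint8_t *entry_tail, uint8_t *data)
{
	memcpy(data, journal_sector, PAYLOAD_LEN);
	memcpy(data + PAYLOAD_LEN, entry_tail, COMMIT_ID_LEN);
}

/* Return 1 if every sector of the section ends with the expected commit id,
 * i.e. the section was written completely and may be replayed. */
static int section_complete(const uint8_t *section, size_t n_sectors,
			    const uint8_t *commit_id)
{
	size_t i;

	assert(n_sectors > 0);
	for (i = 0; i < n_sectors; i++) {
		const uint8_t *tail =
			section + i * SECTOR_SIZE + PAYLOAD_LEN;

		if (memcmp(tail, commit_id, COMMIT_ID_LEN))
			return 0;
	}
	return 1;
}
```

A section that was only partially written before a crash fails `section_complete` and is skipped on replay, which is what gives the journaled mode its atomicity.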
--- a/doc/kernel/linear.txt
+++ b/doc/kernel/linear.txt
@@ -16,15 +16,15 @@ Example scripts
 [[
 #!/bin/sh
 # Create an identity mapping for a device
-echo "0 `blockdev --getsize $1` linear $1 0" | dmsetup create identity
+echo "0 `blockdev --getsz $1` linear $1 0" | dmsetup create identity
 ]]


 [[
 #!/bin/sh
 # Join 2 devices together
-size1=`blockdev --getsize $1`
-size2=`blockdev --getsize $2`
+size1=`blockdev --getsz $1`
+size2=`blockdev --getsz $2`
 echo "0 $size1 linear $1 0
 $size1 $size2 linear $2 0" | dmsetup create joined
 ]]
@@ -44,7 +44,7 @@ if (!defined($dev)) {
        die("Please specify a device.\n");
 }

-my $dev_size = `blockdev --getsize $dev`;
+my $dev_size = `blockdev --getsz $dev`;
 my $extents = int($dev_size / $extent_size) -
              (($dev_size % $extent_size) ? 1 : 0);

--- a/doc/kernel/log-writes.txt
+++ b/doc/kernel/log-writes.txt
@@ -14,14 +14,14 @@ Log Ordering

 We log things in order of completion once we are sure the write is no longer in
 cache.  This means that normal WRITE requests are not actually logged until the
-next REQ_FLUSH request.  This is to make it easier for userspace to replay the
-log in a way that correlates to what is on disk and not what is in cache, to
-make it easier to detect improper waiting/flushing.
+next REQ_PREFLUSH request.  This is to make it easier for userspace to replay
+the log in a way that correlates to what is on disk and not what is in cache,
+to make it easier to detect improper waiting/flushing.

 This works by attaching all WRITE requests to a list once the write completes.
-Once we see a REQ_FLUSH request we splice this list onto the request and once
+Once we see a REQ_PREFLUSH request we splice this list onto the request and once
 the FLUSH request completes we log all of the WRITEs and then the FLUSH.  Only
-completed WRITEs, at the time the REQ_FLUSH is issued, are added in order to
+completed WRITEs, at the time the REQ_PREFLUSH is issued, are added in order to
 simulate the worst case scenario with regard to power failures.  Consider the
 following example (W means write, C means complete):

--- a/doc/kernel/raid.txt
+++ b/doc/kernel/raid.txt
@@ -14,8 +14,12 @@ The target is named "raid" and it accepts the following parameters:
    <#raid_devs> <metadata_dev0> <dev0> [.. <metadata_devN> <devN>]

 <raid_type>:
+  raid0		RAID0 striping (no resilience)
  raid1		RAID1 mirroring
-  raid4		RAID4 dedicated parity disk
+  raid4		RAID4 with dedicated last parity disk
+  raid5_n	RAID5 with dedicated last parity disk supporting takeover
+		Same as raid4
+		- Transitory layout
  raid5_la	RAID5 left asymmetric
 		- rotating parity 0 with data continuation
  raid5_ra	RAID5 right asymmetric
@@ -30,7 +34,19 @@ The target is named "raid" and it accepts the following parameters:
 		- rotating parity N (right-to-left) with data restart
  raid6_nc	RAID6 N continue
 		- rotating parity N (right-to-left) with data continuation
+  raid6_n_6	RAID6 with dedicated parity disks
+		- parity and Q-syndrome on the last 2 disks;
+		  layout for takeover from/to raid4/raid5_n
+  raid6_la_6	Same as "raid5_la" plus dedicated last Q-syndrome disk
+		- layout for takeover from raid5_la from/to raid6
+  raid6_ra_6	Same as "raid5_ra" plus dedicated last Q-syndrome disk
+		- layout for takeover from raid5_ra from/to raid6
+  raid6_ls_6	Same as "raid5_ls" plus dedicated last Q-syndrome disk
+		- layout for takeover from raid5_ls from/to raid6
+  raid6_rs_6	Same as "raid5_rs" plus dedicated last Q-syndrome disk
+		- layout for takeover from raid5_rs from/to raid6
  raid10        Various RAID10 inspired algorithms chosen by additional params
+		(see raid10_format and raid10_copies below)
 		- RAID10: Striped Mirrors (aka 'Striping on top of mirrors')
 		- RAID1E: Integrated Adjacent Stripe Mirroring
 		- RAID1E: Integrated Offset Stripe Mirroring
@@ -116,10 +132,57 @@ The target is named "raid" and it accepts the following parameters:
 		Here we see layouts closely akin to 'RAID1E - Integrated
 		Offset Stripe Mirroring'.

+        [delta_disks <N>]
+		The delta_disks option value (-251 < N < +251) triggers
+		device removal (negative value) or device addition (positive
+		value) to any reshape supporting raid levels 4/5/6 and 10.
+		RAID levels 4/5/6 allow for addition of devices (metadata
+		and data device tuple), raid10_near and raid10_offset only
+		allow for device addition. raid10_far does not support any
+		reshaping at all.
+		A minimum number of devices has to be kept to enforce resilience,
+		which is 3 devices for raid4/5 and 4 devices for raid6.
+
+        [data_offset <sectors>]
+		This option value defines the offset into each data device
+		where the data starts. This is used to provide out-of-place
+		reshaping space to avoid writing over data whilst
+		changing the layout of stripes, hence an interruption/crash
+		may happen at any time without the risk of losing data.
+		E.g. when adding devices to an existing raid set during
+		forward reshaping, the out-of-place space will be allocated
+		at the beginning of each raid device. The kernel raid4/5/6/10
+		MD personalities supporting such device addition will read the data from
+		the existing first stripes (those with smaller number of stripes)
+		starting at data_offset to fill up a new stripe with the larger
+		number of stripes, calculate the redundancy blocks (CRC/Q-syndrome)
+		and write that new stripe to offset 0. Same will be applied to all
+		N-1 other new stripes. This out-of-place scheme is used to change
+		the RAID type (i.e. the allocation algorithm) as well, e.g.
+		changing from raid5_ls to raid5_n.
+
+	[journal_dev <dev>]
+		This option adds a journal device to raid4/5/6 raid sets and
+		uses it to close the 'write hole' caused by the non-atomic updates
+		to the component devices which can cause data loss during recovery.
+		The journal device is used as writethrough thus causing writes to
+		be throttled versus non-journaled raid4/5/6 sets.
+		Takeover/reshape is not possible with a raid4/5/6 journal device;
+		it has to be deconfigured before requesting these.
+
+	[journal_mode <mode>]
+		This option sets the caching mode on journaled raid4/5/6 raid sets
+		(see 'journal_dev <dev>' above) to 'writethrough' or 'writeback'.
+		If 'writeback' is selected the journal device has to be resilient
+		and must not suffer from the 'write hole' problem itself (e.g. use
+		raid1 or raid10) to avoid a single point of failure.
+
 <#raid_devs>: The number of devices composing the array.
 	Each device consists of two entries.  The first is the device
 	containing the metadata (if any); the second is the one containing the
-	data.
+	data. A maximum of 64 metadata/data device entries are supported
+	up to target version 1.8.0.
+	1.9.0 supports up to 253, which is enforced by the MD kernel runtime in use.

 	If a drive has failed or is missing at creation time, a '-' can be
 	given for both the metadata and data drives for a given position.
@@ -195,6 +258,14 @@ recovery.  Here is a fuller description of the individual fields:
 			in RAID1/10 or wrong parity values found in RAID4/5/6.
 			This value is valid only after a "check" of the array
 			is performed.  A healthy array has a 'mismatch_cnt' of 0.
+	<data_offset>   The current data offset to the start of the user data on
+			each component device of a raid set (see the respective
+			raid parameter to support out-of-place reshaping).
+	<journal_char>	'A' - active write-through journal device.
+			'a' - active write-back journal device.
+			'D' - dead journal device.
+			'-' - no journal device.
+

 Message Interface
 -----------------
@@ -207,7 +278,6 @@ include:
 	"recover"- Initiate/continue a recover process.
 	"check"  - Initiate a check (i.e. a "scrub") of the array.
 	"repair" - Initiate a repair of the array.
-	"reshape"- Currently unsupported (-EINVAL).


 Discard Support
@@ -257,3 +327,19 @@ Version History
 1.5.2   'mismatch_cnt' is zero unless [last_]sync_action is "check".
 1.6.0   Add discard support (and devices_handle_discard_safely module param).
 1.7.0   Add support for MD RAID0 mappings.
+1.8.0   Explicitly check for compatible flags in the superblock metadata
+	and refuse to start the raid set if any are set by a newer
+	target version, thus avoiding data corruption on a raid set
+	with a reshape in progress.
+1.9.0   Add support for RAID level takeover/reshape/region size
+	and set size reduction.
+1.9.1   Fix activation of existing RAID 4/10 mapped devices
+1.9.2   Don't emit '- -' on the status table line in case the constructor
+	fails reading a superblock. Correctly emit 'maj:min1 maj:min2' and
+	'D' on the status line.  If '- -' is passed into the constructor, emit
+	'- -' on the table line and '-' as the status line health character.
+1.10.0  Add support for raid4/5/6 journal device
+1.10.1  Fix data corruption on reshape request
+1.11.0  Fix table line argument order
+	(wrong raid10_copies/raid10_format sequence)
+1.11.1  Add raid4/5/6 journal write-back support via journal_mode option
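The `<journal_char>` status values and the device-entry limits per target version documented in raid.txt above can be summarised as two small lookups. This is an illustrative Python sketch only; `describe_journal_char` and `max_raid_devs` are hypothetical helper names, not part of dm-raid or LVM2.

```python
# Illustrative only: these helpers are hypothetical, not dm-raid APIs.
JOURNAL_STATES = {
    "A": "active write-through journal device",
    "a": "active write-back journal device",
    "D": "dead journal device",
    "-": "no journal device",
}


def describe_journal_char(ch):
    """Map the <journal_char> status field to its documented meaning."""
    try:
        return JOURNAL_STATES[ch]
    except KeyError:
        raise ValueError("unknown journal_char: %r" % (ch,))


def max_raid_devs(target_version):
    """Return the documented metadata/data device-entry limit.

    target_version is a dotted string such as "1.9.0".
    Up to 1.8.0 the limit is 64; from 1.9.0 on it is 253.
    """
    version = tuple(int(part) for part in target_version.split("."))
    return 253 if version >= (1, 9, 0) else 64
```

Comparing version tuples rather than strings keeps "1.11.1" ordered after "1.9.0".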
--- a/doc/kernel/striped.txt
+++ b/doc/kernel/striped.txt
@@ -37,9 +37,9 @@ if (!$num_devs) {
        die("Specify at least one device\n");
 }

-$min_dev_size = `blockdev --getsize $devs[0]`;
+$min_dev_size = `blockdev --getsz $devs[0]`;
 for ($i = 1; $i < $num_devs; $i++) {
-        my $this_size = `blockdev --getsize $devs[$i]`;
+        my $this_size = `blockdev --getsz $devs[$i]`;
        $min_dev_size = ($min_dev_size < $this_size) ?
                        $min_dev_size : $this_size;
 }
--- a/doc/kernel/switch.txt
+++ b/doc/kernel/switch.txt
@@ -123,7 +123,7 @@ Assume that you have volumes vg1/switch0 vg1/switch1 vg1/switch2 with
 the same size.

 Create a switch device with 64kB region size:
-    dmsetup create switch --table "0 `blockdev --getsize /dev/vg1/switch0`
+    dmsetup create switch --table "0 `blockdev --getsz /dev/vg1/switch0`
 	switch 3 128 0 /dev/vg1/switch0 0 /dev/vg1/switch1 0 /dev/vg1/switch2 0"

 Set mappings for the first 7 entries to point to devices switch0, switch1,
--- a/doc/kernel/zoned.txt
+++ b/doc/kernel/zoned.txt
@@ -0,0 +1,144 @@
+dm-zoned
+========
+
+The dm-zoned device mapper target exposes a zoned block device (ZBC and
+ZAC compliant devices) as a regular block device without any write
+pattern constraints. In effect, it implements a drive-managed zoned
+block device which hides from the user (a file system or an application
+doing raw block device accesses) the sequential write constraints of
+host-managed zoned block devices and can mitigate the potential
+device-side performance degradation due to excessive random writes on
+host-aware zoned block devices.
+
+For a more detailed description of the zoned block device models and
+their constraints see (for SCSI devices):
+
+http://www.t10.org/drafts.htm#ZBC_Family
+
+and (for ATA devices):
+
+http://www.t13.org/Documents/UploadedDocuments/docs2015/di537r05-Zoned_Device_ATA_Command_Set_ZAC.pdf
+
+The dm-zoned implementation is simple and minimizes system overhead (CPU
+and memory usage as well as storage capacity loss). For a 10TB
+host-managed disk with 256 MB zones, dm-zoned memory usage per disk
+instance is at most 4.5 MB and as little as 5 zones will be used
+internally for storing metadata and performing reclaim operations.
+
+dm-zoned target devices are formatted and checked using the dmzadm
+utility available at:
+
+https://github.com/hgst/dm-zoned-tools
+
+Algorithm
+=========
+
+dm-zoned implements an on-disk buffering scheme to handle non-sequential
+write accesses to the sequential zones of a zoned block device.
+Conventional zones are used for caching as well as for storing internal
+metadata.
+
+The zones of the device are separated into 2 types:
+
+1) Metadata zones: these are conventional zones used to store metadata.
+Metadata zones are not reported as useable capacity to the user.
+
+2) Data zones: all remaining zones, the vast majority of which will be
+sequential zones used exclusively to store user data. The conventional
+zones of the device may be used also for buffering user random writes.
+Data in these zones may be directly mapped to the conventional zone, but
+later moved to a sequential zone so that the conventional zone can be
+reused for buffering incoming random writes.
+
+dm-zoned exposes a logical device with a sector size of 4096 bytes,
+irrespective of the physical sector size of the backend zoned block
+device being used. This allows reducing the amount of metadata needed to
+manage valid blocks (blocks written).
+
+The on-disk metadata format is as follows:
+
+1) The first block of the first conventional zone found contains the
+super block which describes the on disk amount and position of metadata
+blocks.
+
+2) Following the super block, a set of blocks is used to describe the
+mapping of the logical device blocks. The mapping is done per chunk of
+blocks, with the chunk size equal to the zoned block device zone size. The
+mapping table is indexed by chunk number and each mapping entry
+indicates the zone number of the device storing the chunk of data. Each
+mapping entry may also indicate the zone number of a conventional
+zone used to buffer random modifications to the data zone.
+
+3) A set of blocks used to store bitmaps indicating the validity of
+blocks in the data zones follows the mapping table. A valid block is
+defined as a block that was written and not discarded. For a buffered
+data chunk, a block is always valid only in the data zone mapping the
+chunk or in the buffer zone of the chunk.
+
+For a logical chunk mapped to a conventional zone, all write operations
+are processed by directly writing to the zone. If the mapping zone is a
+sequential zone, the write operation is processed directly only if the
+write offset within the logical chunk is equal to the write pointer
+offset within the sequential data zone (i.e. the write operation is
+aligned on the zone write pointer). Otherwise, write operations are
+processed indirectly using a buffer zone. In that case, an unused
+conventional zone is allocated and assigned to the chunk being
+accessed. Writing a block to the buffer zone of a chunk will
+automatically invalidate the same block in the sequential zone mapping
+the chunk. If all blocks of the sequential zone become invalid, the zone
+is freed and the chunk buffer zone becomes the primary zone mapping the
+chunk, resulting in native random write performance similar to a regular
+block device.
+
+Read operations are processed according to the block validity
+information provided by the bitmaps. Valid blocks are read either from
+the sequential zone mapping a chunk, or if the chunk is buffered, from
+the buffer zone assigned. If the accessed chunk has no mapping, or the
+accessed blocks are invalid, the read buffer is zeroed and the read
+operation terminated.
+
+After some time, the limited number of conventional zones available may
+be exhausted (all used to map chunks or buffer sequential zones) and
+unaligned writes to unbuffered chunks become impossible. To avoid this
+situation, a reclaim process regularly scans used conventional zones and
+tries to reclaim the least recently used zones by copying the valid
+blocks of the buffer zone to a free sequential zone. Once the copy
+completes, the chunk mapping is updated to point to the sequential zone
+and the buffer zone freed for reuse.
+
+Metadata Protection
+===================
+
+To protect metadata against corruption in case of sudden power loss or
+system crash, 2 sets of metadata zones are used. One set, the primary
+set, is used as the main metadata region, while the secondary set is
+used as a staging area. Modified metadata is first written to the
+secondary set and validated by updating the super block in the secondary
+set; a generation counter is used to indicate that this set contains the
+newest metadata. Once this operation completes, in-place metadata
+block updates can be done in the primary metadata set. This ensures that
+one of the sets is always consistent (all modifications committed or none
+at all). Flush operations are used as a commit point. Upon reception of
+a flush request, metadata modification activity is temporarily blocked
+(for both incoming BIO processing and reclaim process) and all dirty
+metadata blocks are staged and updated. Normal operation is then
+resumed. Flushing metadata thus only temporarily delays write and
+discard requests. Read requests can be processed concurrently while
+metadata flush is being executed.
+
+Usage
+=====
+
+A zoned block device must first be formatted using the dmzadm tool. This
+will analyze the device zone configuration, determine where to place the
+metadata sets on the device and initialize the metadata sets.
+
+Ex:
+
+dmzadm --format /dev/sdxx
+
+For a formatted device, the target can be created normally with the
+dmsetup utility. The only parameter that dm-zoned requires is the
+underlying zoned block device name. Ex:
+
+echo "0 `blockdev --getsz ${dev}` zoned ${dev}" | dmsetup create dmz-`basename ${dev}`
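The write-routing rule from the Algorithm section of zoned.txt and the table line from its Usage section can be restated as a short Python sketch. It is a simplified model for illustration, not the dm-zoned implementation; `route_write` and `zoned_table_line` are hypothetical names, and the device size is assumed to be given in 512-byte sectors as reported by blockdev.

```python
# Simplified model of the documented behaviour; not dm-zoned code.

def route_write(zone_type, write_offset, write_pointer):
    """Decide how a write to a mapped chunk is processed.

    zone_type      -- "conventional" or "sequential" (zone mapping the chunk)
    write_offset   -- offset of the write within the logical chunk
    write_pointer  -- write pointer offset within the sequential zone
    Returns "direct" or "buffered".
    """
    if zone_type == "conventional":
        # Conventional zones accept writes at any offset.
        return "direct"
    if zone_type == "sequential":
        # Only writes aligned on the zone write pointer go straight through;
        # everything else is redirected to a conventional buffer zone.
        return "direct" if write_offset == write_pointer else "buffered"
    raise ValueError("unknown zone type: %r" % (zone_type,))


def zoned_table_line(dev, size_sectors):
    """Build the dmsetup table line: '0 <size> zoned <dev>'."""
    return "0 %d zoned %s" % (size_sectors, dev)
```

For example, `zoned_table_line("/dev/sdb", 1000)` yields the string that the usage example pipes into `dmsetup create`.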
--- a/doc/vdo.md
+++ b/doc/vdo.md
@@ -0,0 +1,85 @@
+# VDO - Compression and deduplication.
+
+Currently device stacking looks like this:
+
+    Physical x [multipath] x [partition] x [mdadm] x [LUKS] x [LVS] x [LUKS] x [FS|Database|...]
+
+Adding VDO:
+
+    Physical x [multipath] x [partition] x [mdadm] x [LUKS] x [LVS] x [LUKS] x VDO x [LVS] x [FS|Database|...]
+
+## Where VDO fits (and where it does not):
+
+### Backing devices for VDO volumes:
+
+1. Physical x [multipath] x [partition] x [mdadm],
+2. LUKS over (1) - full disk encryption.
+3. LVs (raids|mirror|stripe|linear) x [cache] over (1).
+4. LUKS over (3) - especially when using raids.
+
+Usual limitations apply:
+
+- Never layer LUKS over another LUKS - it makes no sense.
+- LUKS is better over the raids, than under.
+
+### Using VDO as a PV:
+
+1. under tpool
+    - The best fit - it will deduplicate additional redundancies among all
+      snapshots and will reduce the footprint.
+    - Risks: Resize! dmeventd will not be able to handle resizing of tpool ATM.
+2. under corig
+    - Cache fits better under VDO device - it will reduce amount of data, and
+      deduplicate, so there should be more hits.
+    - This is useful to keep the most frequently used data in cache
+      uncompressed (if that happens to be a bottleneck.)
+3. under (multiple) linear LVs - e.g. used for VMs.
+
+### And where VDO does not fit:
+
+- *never* use VDO under LUKS volumes
+    - these are random data and do not compress nor deduplicate well,
+- *never* use VDO under cmeta and tmeta LVs
+    - these are random data and do not compress nor deduplicate well,
+- under raids
+    - raid{4,5,6} scrambles data, so they do not deduplicate well,
+    - raid{1,4,5,6,10} also cause the amount of data to grow, so more (duplicated in
+      case of raid{1,10}) work has to be done in order to find fewer duplicates.
+
+### And where it could be useful:
+
+- under snapshot CoW device - when there are multiple of those it could deduplicate
+
+### Things to decide
+
+- under integrity devices - it should work - mostly for data
+    - hash is not compressible and unique - it makes sense to have separate imeta and idata volumes for integrity devices
+
+### Future Integration of VDO into LVM:
+
+One issue is using both LUKS and RAID under VDO. We have two options:
+
+- use mdadm x LUKS x VDO+LV
+- use LV RAID x LUKS x VDO+LV - still requiring recursive LVs.
+
+Another issue is duality of VDO - it is a top level LV but it can be seen as a "pool" for multiple devices.
+
+- This is one usecase which could not be handled by LVM at the moment.
+- Size of the VDO is its physical size and virtual size - just like tpool.
+      - same problems with virtual vs physical size - it can get full, without exposing it to a FS
+
+Another possible RFE is to split data and metadata:
+
+- e.g. keep data on HDD and metadata on SSD
+
+## Issues / Testing
+
+- fstrim/discard pass down - does it work with VDO?
+- VDO can run in synchronous vs. asynchronous mode
+    - synchronous for devices where write is safe after it is confirmed. Some devices are lying.
+    - asynchronous for devices requiring flush
+- multiple devices under VDO - need to find common options
+- pvmove - changing characteristics of underlying device
+- autoactivation during boot
+    - Q: can we use VDO for RootFS?
+
--- a/include/configure.h.in
+++ b/include/configure.h.in
@@ -1,5 +1,8 @@
 /* include/configure.h.in.  Generated from configure.in by autoheader.  */

+/* Define to 1 if aio is available. */
+#undef AIO_SUPPORT
+
 /* Define to 1 to use libblkid detection of signatures when wiping. */
 #undef BLKID_WIPING_SUPPORT

@@ -148,6 +151,9 @@
 /* Library version */
 #undef DM_LIB_VERSION

+/* Path to fsadm binary. */
+#undef FSADM_PATH
+
 /* Define to 1 if you have the `alarm' function. */
 #undef HAVE_ALARM

@@ -491,6 +497,9 @@
 /* Define to 1 if you have the <sys/file.h> header file. */
 #undef HAVE_SYS_FILE_H

+/* Define to 1 if you have the <sys/inotify.h> header file. */
+#undef HAVE_SYS_INOTIFY_H
+
 /* Define to 1 if you have the <sys/ioctl.h> header file. */
 #undef HAVE_SYS_IOCTL_H

@@ -626,6 +635,9 @@
 /* Define to 1 to include code that uses lvmpolld. */
 #undef LVMPOLLD_SUPPORT

+/* configure command line used */
+#undef LVM_CONFIGURE_LINE
+
 /* Path to lvm binary. */
 #undef LVM_PATH

@@ -676,9 +688,6 @@
 /* Define to 1 to include the LVM readline shell. */
 #undef READLINE_SUPPORT

-/* Define to 1 to include built-in support for replicators. */
-#undef REPLICATOR_INTERNAL
-
 /* Define as the return type of signal handlers (`int' or `void'). */
 #undef RETSIGTYPE

--- a/lib/Makefile.in
+++ b/lib/Makefile.in
@@ -36,10 +36,6 @@ ifeq ("@RAID@", "shared")
  SUBDIRS += raid
 endif

-ifeq ("@REPLICATORS@", "shared")
-  SUBDIRS += replicator
-endif
-
 ifeq ("@THIN@", "shared")
  SUBDIRS += thin
 endif
@@ -48,6 +44,10 @@ ifeq ("@CACHE@", "shared")
  SUBDIRS += cache_segtype
 endif

+ifeq ("@CLUSTER@", "shared")
+  SUBDIRS += locking
+endif
+
 SOURCES =\
 	activate/activate.c \
 	cache/lvmcache.c \
@@ -96,13 +96,13 @@ SOURCES =\
 	metadata/lv_manip.c \
 	metadata/merge.c \
 	metadata/metadata.c \
+	metadata/metadata-liblvm.c \
 	metadata/mirror.c \
 	metadata/pool_manip.c \
 	metadata/pv.c \
 	metadata/pv_manip.c \
 	metadata/pv_map.c \
 	metadata/raid_manip.c \
-	metadata/replicator_manip.c \
 	metadata/segtype.c \
 	metadata/snapshot_manip.c \
 	metadata/thin_manip.c \
@@ -149,10 +149,6 @@ ifeq ("@CLUSTER@", "internal")
  SOURCES += locking/cluster_locking.c
 endif

-ifeq ("@CLUSTER@", "shared")
-  SUBDIRS += locking
-endif
-
 ifeq ("@SNAPSHOTS@", "internal")
  SOURCES += snapshot/snapshot.c
 endif
@@ -165,10 +161,6 @@ ifeq ("@RAID@", "internal")
  SOURCES += raid/raid.c
 endif

-ifeq ("@REPLICATORS@", "internal")
-  SOURCES += replicator/replicator.c
-endif
-
 ifeq ("@THIN@", "internal")
  SOURCES += thin/thin.c
 endif
@@ -204,11 +196,6 @@ ifeq ("@BUILD_LVMLOCKD@", "yes")
 	locking/lvmlockd.c
 endif

-ifeq ("@DMEVENTD@", "yes")
-  CLDFLAGS += -L$(top_builddir)/daemons/dmeventd
-  LIBS += -ldevmapper-event
-endif
-
 LIB_NAME = liblvm-internal
 LIB_STATIC = $(LIB_NAME).a

@@ -220,7 +207,6 @@ ifeq ($(MAKECMDGOALS),distclean)
 	mirror \
 	notify \
 	raid \
-	replicator \
 	thin \
 	cache_segtype \
 	locking
@@ -229,9 +215,9 @@ endif
 CFLOW_LIST = $(SOURCES)
 CFLOW_LIST_TARGET = $(LIB_NAME).cflow

-include $(top_builddir)/make.tmpl
+PROGS_CFLAGS = $(BLKID_CFLAGS) $(UDEV_CFLAGS)

-CFLAGS += $(BLKID_CFLAGS) $(UDEV_CFLAGS) $(VALGRIND_CFLAGS)
+include $(top_builddir)/make.tmpl

 $(SUBDIRS): $(LIB_STATIC)

--- a/lib/activate/activate.c
+++ b/lib/activate/activate.c
@@ -150,15 +150,15 @@ static int _lv_passes_volumes_filter(struct cmd_context *cmd, const struct logic
 				    || str_list_match_list(&cmd->tags,
 							   &lv->vg->tags, NULL))
 					    return 1;
-				else
-					continue;
+
+				continue;
 			}
 			/* If supplied tag matches LV or VG tag, activate */
 			if (str_list_match_item(&lv->tags, str) ||
 			    str_list_match_item(&lv->vg->tags, str))
 				return 1;
-			else
-				continue;
+
+			continue;
 		}

 		/* If supplied name is vgname[/lvname] */
@@ -323,12 +323,6 @@ int lvs_in_vg_opened(const struct volume_group *vg)
 {
 	return 0;
 }
-/******
-int lv_suspend(struct cmd_context *cmd, const char *lvid_s)
-{
-	return 1;
-}
-*******/
 int lv_suspend_if_active(struct cmd_context *cmd, const char *lvid_s, unsigned origin_only, unsigned exclusive,
 			 const struct logical_volume *lv, const struct logical_volume *lv_pre)
 {
@@ -781,7 +775,8 @@ int lv_info_with_seg_status(struct cmd_context *cmd,
 	if (lv_is_used_cache_pool(lv)) {
 		/* INFO is not set as cache-pool cannot be active.
 		 * STATUS is collected from cache LV */
-		lv_seg = get_only_segment_using_this_lv(lv);
+		if (!(lv_seg = get_only_segment_using_this_lv(lv)))
+			return_0;
 		(void) _lv_info(cmd, lv_seg->lv, 1, NULL, lv_seg, &status->seg_status, 0, 0);
 		return 1;
 	}
@@ -796,14 +791,18 @@ int lv_info_with_seg_status(struct cmd_context *cmd,
 				status->info.exists = 0; /* So pool LV is not active */
 		}
 		return 1;
-	} else if (lv_is_external_origin(lv)) {
+	}
+
+	if (lv_is_external_origin(lv)) {
 		if (!_lv_info(cmd, lv, 0, &status->info, NULL, NULL,
 			      with_open_count, with_read_ahead))
 			return_0;

 		(void) _lv_info(cmd, lv, 1, NULL, lv_seg, &status->seg_status, 0, 0);
 		return 1;
-	} else if (lv_is_origin(lv)) {
+	}
+
+	if (lv_is_origin(lv)) {
 		/* Query segment status for 'layered' (-real) device most of the time,
 		 * only for merging snapshot, query its progress.
 		 * TODO: single LV may need couple status to be exposed at once....
@@ -820,7 +819,9 @@ int lv_info_with_seg_status(struct cmd_context *cmd,
 			/* Grab STATUS from layered -real */
 			(void) _lv_info(cmd, lv, 1, NULL, lv_seg, &status->seg_status, 0, 0);
 		return 1;
-	} else if (lv_is_cow(lv)) {
+	}
+
+	if (lv_is_cow(lv)) {
 		if (lv_is_merging_cow(lv)) {
 			olv = origin_from_cow(lv);

@@ -835,7 +836,6 @@ int lv_info_with_seg_status(struct cmd_context *cmd,
 				 * When merge is in progress, query merging origin LV instead.
 				 * COW volume is already mapped as error target in this case.
 				 */
-				status->lv = olv;
 				return 1;
 			}

@@ -1701,7 +1701,7 @@ static char *_build_target_uuid(struct cmd_context *cmd, const struct logical_vo

 	if (lv_is_thin_pool(lv))
 		layer = "tpool"; /* Monitor "tpool" for the "thin pool". */
-	else if (lv_is_origin(lv))
+	else if (lv_is_origin(lv) || lv_is_external_origin(lv))
 		layer = "real"; /* Monitor "real" for "snapshot-origin". */
 	else
 		layer = NULL;
@@ -1849,12 +1849,15 @@ int monitor_dev_for_events(struct cmd_context *cmd, const struct logical_volume
 	 *  However in case command would have crashed, such LV is
 	 *  left unmonitored and may potentially require dmeventd.
 	 */
-	if ((lv_is_cache_pool_data(lv) || lv_is_cache_pool_metadata(lv)) &&
-	    !lv_is_used_cache_pool((find_pool_seg(first_seg(lv))->lv))) {
-		log_debug_activation("Skipping %smonitor of %s.%s",
-				     (monitor) ? "" : "un", display_lvname(lv),
-				     (monitor) ? " Cache pool activation for clearing only." : "");
-		return 1;
+	if (lv_is_cache_pool_data(lv) || lv_is_cache_pool_metadata(lv)) {
+		if (!(seg = find_pool_seg(first_seg(lv))))
+			return_0;
+		if (!lv_is_used_cache_pool(seg->lv)) {
+			log_debug_activation("Skipping %smonitor of %s.%s",
+					     (monitor) ? "" : "un", display_lvname(lv),
+					     (monitor) ? " Cache pool activation for clearing only." : "");
+			return 1;
+		}
 	}

 	/*
@@ -1940,6 +1943,13 @@ int monitor_dev_for_events(struct cmd_context *cmd, const struct logical_volume
 			r = 0;
 		}

+		if (seg->external_lv &&
+		    !monitor_dev_for_events(cmd, seg->external_lv,
+					    (!monitor) ? laopts : NULL, monitor)) {
+			stack;
+			r = 0;
+		}
+
 		if (seg->metadata_lv &&
 		    !monitor_dev_for_events(cmd, seg->metadata_lv, NULL, monitor)) {
 			stack;
@@ -2073,12 +2083,16 @@ static int _lv_suspend(struct cmd_context *cmd, const char *lvid_s,
 	const struct logical_volume *pvmove_lv = NULL;
 	const struct logical_volume *lv_to_free = NULL;
 	const struct logical_volume *lv_pre_to_free = NULL;
-	struct logical_volume *lv_pre_tmp;
+	struct logical_volume *lv_pre_tmp, *lv_tmp;
 	struct seg_list *sl;
 	struct lv_segment *snap_seg;
 	struct lvinfo info;
 	int r = 0, lockfs = 0, flush_required = 0;
 	struct detached_lv_data detached;
+	struct dm_pool *mem = NULL;
+	struct dm_list suspend_lvs;
+	struct lv_list *lvl;
+	int found;

 	if (!activation())
 		return 1;
@@ -2116,9 +2130,6 @@ static int _lv_suspend(struct cmd_context *cmd, const char *lvid_s,
 		goto out;
 	}

-	if (!lv_read_replicator_vgs(lv))
-		goto_out;
-
 	lv_calculate_readahead(lv, NULL);

 	/*
@@ -2148,6 +2159,12 @@ static int _lv_suspend(struct cmd_context *cmd, const char *lvid_s,
 		}
 		if (!_lv_preload(lv_pre_tmp, laopts, &flush_required))
 			goto_out;
+
+		/* Suspending 1st. LV above PVMOVE suspends whole tree */
+		dm_list_iterate_items(sl, &pvmove_lv->segs_using_this_lv) {
+			lv = sl->seg->lv;
+			break;
+		}
 	} else {
 		if (!_lv_preload(lv_pre, laopts, &flush_required))
 			/* FIXME Revert preloading */
@@ -2185,7 +2202,7 @@ static int _lv_suspend(struct cmd_context *cmd, const char *lvid_s,
 	 * NOTE: Mirror repair requires noflush for proper repair!
 	 * TODO: Relax this limiting condition further */
 	if (!flush_required &&
-	    (lv_is_pvmove(lv) ||
+	    (lv_is_pvmove(lv) || pvmove_lv ||
 	     (!lv_is_mirror(lv) && !lv_is_thin_pool(lv) && !lv_is_thin_volume(lv)))) {
 		log_debug("Requiring flush for LV %s.", display_lvname(lv));
 		flush_required = 1;
@@ -2195,10 +2212,6 @@ static int _lv_suspend(struct cmd_context *cmd, const char *lvid_s,
 		/* FIXME Consider aborting here */
 		stack;

-	critical_section_inc(cmd, "suspending");
-	if (pvmove_lv)
-		critical_section_inc(cmd, "suspending pvmove LV");
-
 	if (!laopts->origin_only &&
 	    (lv_is_origin(lv_pre) || lv_is_cow(lv_pre)))
 		lockfs = 1;
@@ -2210,40 +2223,68 @@ static int _lv_suspend(struct cmd_context *cmd, const char *lvid_s,
 	if (laopts->origin_only && lv_is_thin_volume(lv) && lv_is_thin_volume(lv_pre))
 		lockfs = 1;

-	/*
-	 * Suspending an LV directly above a PVMOVE LV also
- 	 * suspends other LVs using that same PVMOVE LV.
-	 * FIXME Remove this and delay the 'clear node' until
- 	 * after the code knows whether there's a different
- 	 * inactive table to load or not instead so lv_suspend
- 	 * can be called separately for each LV safely.
- 	 */
-	if ((lv_pre->vg->status & PRECOMMITTED) &&
-	    lv_is_locked(lv_pre) && find_pvmove_lv_in_lv(lv_pre)) {
-		if (!_lv_suspend_lv(lv_pre, laopts, lockfs, flush_required)) {
-			critical_section_dec(cmd, "failed precommitted suspend");
-			if (pvmove_lv)
-				critical_section_dec(cmd, "failed precommitted suspend (pvmove)");
+	critical_section_inc(cmd, "suspending");
+
+	if (!lv_is_locked(lv) && lv_is_locked(lv_pre) &&
+	    (pvmove_lv = find_pvmove_lv_in_lv(lv_pre))) {
+		/*
+		 * When starting PVMOVE, suspend participating LVs first
+		 * with committed metadata by looking at precommitted pvmove list.
+		 * In committed metadata these LVs are not connected in any way.
+		 *
+		 * TODO: prepare list of LVs needed to be suspended and pass them
+		 *       via 'struct laopts' directly to _lv_suspend_lv() and handle this
+		 *       with a single 'dmtree' call.
+		 */
+		if (!(mem = dm_pool_create("suspend_lvs", 128)))
 			goto_out;
+
+		/* Prepare list of all LVs for suspend ahead */
+		dm_list_init(&suspend_lvs);
+		dm_list_iterate_items(sl, &pvmove_lv->segs_using_this_lv) {
+			lv_tmp = sl->seg->lv;
+			if (lv_is_cow(lv_tmp))
+				/* Never suspend COW, always has to be origin */
+				lv_tmp = origin_from_cow(lv_tmp);
+			found = 0;
+			dm_list_iterate_items(lvl, &suspend_lvs)
+				if (strcmp(lvl->lv->name, lv_tmp->name) == 0) {
+					found = 1;
+					break;
+				}
+			if (found)
+				continue; /* LV is already in the list */
+			if (!(lvl = dm_pool_alloc(mem, sizeof(*lvl)))) {
+				log_error("lv_list alloc failed.");
+				goto out;
+			}
+			/* Look for precommitted LV name in committed VG */
+			if (!(lvl->lv = find_lv(lv->vg, lv_tmp->name))) {
+				log_error(INTERNAL_ERROR "LV %s missing from preload metadata.",
+					  display_lvname(lv_tmp));
+				goto out;
+			}
+			dm_list_add(&suspend_lvs, &lvl->list);
 		}
-	} else {
-		/* Normal suspend */
+		dm_list_iterate_items(lvl, &suspend_lvs)
+			if (!_lv_suspend_lv(lvl->lv, laopts, lockfs, 1)) {
+				critical_section_dec(cmd, "failed suspend");
+				goto_out; /* FIXME: resume on recovery path? */
+			}
+	} else  /* Standard suspend */
 		if (!_lv_suspend_lv(lv, laopts, lockfs, flush_required)) {
 			critical_section_dec(cmd, "failed suspend");
-			if (pvmove_lv)
-				critical_section_dec(cmd, "failed suspend (pvmove)");
 			goto_out;
 		}
-	}

 	r = 1;
 out:
+	if (mem)
+		dm_pool_destroy(mem);
 	if (lv_pre_to_free)
 		release_vg(lv_pre_to_free->vg);
-	if (lv_to_free) {
-		lv_release_replicator_vgs(lv_to_free);
+	if (lv_to_free)
 		release_vg(lv_to_free->vg);
-	}

 	return r;
 }
@@ -2265,12 +2306,29 @@ int lv_suspend_if_active(struct cmd_context *cmd, const char *lvid_s, unsigned o
 	return _lv_suspend(cmd, lvid_s, &laopts, 0, lv, lv_pre);
 }

+static int _check_suspended_lv(struct logical_volume *lv, void *data)
+{
+	struct lvinfo info;
+
+	if (lv_info(lv->vg->cmd, lv, 0, &info, 0, 0) && info.exists && info.suspended) {
+		log_debug("Found suspended LV %s in critical section().", display_lvname(lv));
+		return 0; /* There is suspended subLV in the tree */
+	}
+
+	if (lv_layer(lv) && lv_info(lv->vg->cmd, lv, 1, &info, 0, 0) && info.exists && info.suspended) {
+		log_debug("Found suspended layered LV %s in critical section().", display_lvname(lv));
+		return 0; /* There is suspended subLV in the tree */
+	}
+
+	return 1;
+}

 static int _lv_resume(struct cmd_context *cmd, const char *lvid_s,
 		      struct lv_activate_opts *laopts, int error_if_not_active,
 	              const struct logical_volume *lv)
 {
 	const struct logical_volume *lv_to_free = NULL;
+	struct dm_list *snh;
 	struct lvinfo info;
 	int r = 0;

@@ -2304,12 +2362,28 @@ static int _lv_resume(struct cmd_context *cmd, const char *lvid_s,
 	if (!info.exists || !info.suspended) {
 		if (error_if_not_active)
 			goto_out;
-		r = 1;
-		if (!info.suspended)
-			critical_section_dec(cmd, "already resumed");
-		goto out;
-	}

+		/* ATM only thin-pool with origin-only suspend does not really suspend anything
+		 * it's used only for message passing to thin-pool */
+		if (laopts->origin_only && lv_is_thin_pool(lv))
+			critical_section_dec(cmd, "resumed");
+
+		if (!info.suspended && critical_section()) {
+			/* Validation check if any subLV is suspended */
+			if (!laopts->origin_only && lv_is_origin(lv)) {
+				/* Check all snapshots for this origin LV */
+				dm_list_iterate(snh, &lv->snapshot_segs)
+					if (!_check_suspended_lv(dm_list_struct_base(snh, struct lv_segment, origin_list)->cow, NULL))
+						goto needs_resume; /* Found suspended snapshot */
+			}
+			if ((r = for_each_sub_lv((struct logical_volume *)lv, &_check_suspended_lv, NULL)))
+				goto out; /* Nothing was found suspended */
+		} else {
+			r = 1;
+			goto out;
+		}
+	}
+needs_resume:
 	laopts->read_only = _passes_readonly_filter(cmd, lv);
 	laopts->resuming = 1;

@@ -2427,14 +2501,21 @@ int lv_deactivate(struct cmd_context *cmd, const char *lvid_s, const struct logi
 			goto_out;
 	}

-	if (!lv_read_replicator_vgs(lv))
-		goto_out;
-
 	if (!monitor_dev_for_events(cmd, lv, &laopts, 0))
 		stack;

 	critical_section_inc(cmd, "deactivating");
 	r = _lv_deactivate(lv);
+
+	/*
+	 * Remove any transiently activated error
+	 * devices which aren't used any more.
+	 */
+	if (r && lv_is_raid(lv) && !lv_deactivate_any_missing_subdevs(lv)) {
+		log_error("Failed to remove temporary SubLVs from %s",
+			  display_lvname(lv));
+		r = 0;
+	}
 	critical_section_dec(cmd, "deactivated");

 	if (!lv_info(cmd, lv, 0, &info, 0, 0) || info.exists) {
@@ -2444,10 +2525,8 @@ int lv_deactivate(struct cmd_context *cmd, const char *lvid_s, const struct logi
 		r = 0;
 	}
 out:
-	if (lv_to_free) {
-		lv_release_replicator_vgs(lv_to_free);
+	if (lv_to_free)
 		release_vg(lv_to_free->vg);
-	}

 	return r;
 }
@@ -2495,6 +2574,15 @@ static int _lv_activate(struct cmd_context *cmd, const char *lvid_s,
 	if (!lv && !(lv_to_free = lv = lv_from_lvid(cmd, lvid_s, 0)))
 		goto out;

+	if (!laopts->exclusive &&
+	    (lv_is_origin(lv) ||
+	     seg_only_exclusive(first_seg(lv))))  {
+		log_error(INTERNAL_ERROR "Trying non-exclusive activation of %s with "
+			  "a volume type %s requiring exclusive activation.",
+			  display_lvname(lv), lvseg_name(first_seg(lv)));
+		return 0;
+	}
+
 	if (filter && !_passes_activation_filter(cmd, lv)) {
 		log_verbose("Not activating %s since it does not pass "
 			    "activation filter.", display_lvname(lv));
@@ -2562,9 +2650,6 @@ static int _lv_activate(struct cmd_context *cmd, const char *lvid_s,
 		goto out;
 	}

-	if (!lv_read_replicator_vgs(lv))
-		goto_out;
-
 	lv_calculate_readahead(lv, NULL);

 	critical_section_inc(cmd, "activating");
@@ -2576,10 +2661,8 @@ static int _lv_activate(struct cmd_context *cmd, const char *lvid_s,
 		stack;

 out:
-	if (lv_to_free) {
-		lv_release_replicator_vgs(lv_to_free);
+	if (lv_to_free)
 		release_vg(lv_to_free->vg);
-	}

 	return r;
 }
@@ -2671,10 +2754,8 @@ static int _lv_remove_any_missing_subdevs(struct logical_volume *lv)
 		struct lv_segment *seg;

 		dm_list_iterate_items(seg, &lv->segments) {
-			if (seg->area_count != 1)
-				return_0;
 			if (dm_snprintf(name, sizeof(name), "%s-%s-missing_%u_0", seg->lv->vg->name, seg->lv->name, seg_no) < 0)
-				return 0;
+				return_0;
 			if (!_remove_dm_dev_by_name(name))
 				return 0;

--- a/lib/activate/activate.h
+++ b/lib/activate/activate.h
@@ -174,7 +174,7 @@ int lv_raid_dev_health(const struct logical_volume *lv, char **dev_health);
 int lv_raid_mismatch_count(const struct logical_volume *lv, uint64_t *cnt);
 int lv_raid_sync_action(const struct logical_volume *lv, char **sync_action);
 int lv_raid_message(const struct logical_volume *lv, const char *msg);
-int lv_cache_status(const struct logical_volume *lv,
+int lv_cache_status(const struct logical_volume *cache_lv,
 		    struct lv_status_cache **status);
 int lv_thin_pool_percent(const struct logical_volume *lv, int metadata,
 			 dm_percent_t *percent);
@@ -202,12 +202,12 @@ int lv_has_target_type(struct dm_pool *mem, const struct logical_volume *lv,
 		       const char *layer, const char *target_type);

 int monitor_dev_for_events(struct cmd_context *cmd, const struct logical_volume *lv,
-			   const struct lv_activate_opts *laopts, int do_reg);
+			   const struct lv_activate_opts *laopts, int monitor);

 #ifdef DMEVENTD
 #  include "libdevmapper-event.h"
 char *get_monitor_dso_path(struct cmd_context *cmd, const char *libpath);
-int target_registered_with_dmeventd(struct cmd_context *cmd, const char *libpath,
+int target_registered_with_dmeventd(struct cmd_context *cmd, const char *dso,
 				    const struct logical_volume *lv, int *pending);
 int target_register_events(struct cmd_context *cmd, const char *dso, const struct logical_volume *lv,
 			    int evmask __attribute__((unused)), int set, int timeout);
--- a/lib/activate/dev_manager.c
+++ b/lib/activate/dev_manager.c
@@ -260,6 +260,11 @@ static int _info_run(const char *dlid, struct dm_info *dminfo,
 		start *= seg_status->seg->le;
 		length *= _seg_len(seg_status->seg);

+		/* The metadata device uses at most DM_THIN_MAX_METADATA_SIZE sectors */
+		if (lv_is_thin_pool_metadata(seg_status->seg->lv) &&
+		    (length > DM_THIN_MAX_METADATA_SIZE))
+			length = DM_THIN_MAX_METADATA_SIZE;
+
 		do {
 			target = dm_get_next_target(dmt, target, &target_start,
 						    &target_length, &target_name, &target_params);
@@ -270,7 +275,8 @@ static int _info_run(const char *dlid, struct dm_info *dminfo,
 			target_params = NULL; /* Marking this target_params unusable */
 		} while (target);

-		if (!_get_segment_status_from_target_params(target_name, target_params, seg_status))
+		if (!target_name ||
+		    !_get_segment_status_from_target_params(target_name, target_params, seg_status))
 			stack;
 	}

@@ -1032,7 +1038,8 @@ static int _percent_run(struct dev_manager *dm, const char *name,
 			goto_out;
 	}

-	log_debug_activation("LV percent: %.2f", dm_percent_to_float(*overall_percent));
+	log_debug_activation("LV percent: %s",
+			     display_percent(dm->cmd, *overall_percent));
 	r = 1;

      out:
@@ -1049,10 +1056,11 @@ static int _percent(struct dev_manager *dm, const char *name, const char *dlid,
 		if (_percent_run(dm, NULL, dlid, target_type, wait, lv, percent,
 				 event_nr, fail_if_percent_unsupported))
 			return 1;
-		else if (_original_uuid_format_check_required(dm->cmd) &&
-			 _percent_run(dm, NULL, dlid + sizeof(UUID_PREFIX) - 1,
-				      target_type, wait, lv, percent,
-				      event_nr, fail_if_percent_unsupported))
+
+		if (_original_uuid_format_check_required(dm->cmd) &&
+		    _percent_run(dm, NULL, dlid + sizeof(UUID_PREFIX) - 1,
+				 target_type, wait, lv, percent,
+				 event_nr, fail_if_percent_unsupported))
 			return 1;
 	}

@@ -1709,6 +1717,114 @@ static uint16_t _get_udev_flags(struct dev_manager *dm, const struct logical_vol
 	return udev_flags;
 }

+static int _add_lv_to_dtree(struct dev_manager *dm, struct dm_tree *dtree,
+			    const struct logical_volume *lv, int origin_only);
+
+static int _check_holder(struct dev_manager *dm, struct dm_tree *dtree,
+			 const struct logical_volume *lv, uint32_t major,
+			 const char *d_name)
+{
+	const char *default_uuid_prefix = dm_uuid_prefix();
+	const size_t default_uuid_prefix_len = strlen(default_uuid_prefix);
+	const char *name;
+	const char *uuid;
+	struct dm_info info;
+	struct dm_task *dmt;
+	struct logical_volume *lv_det;
+	union lvid id;
+	int dev, r = 0;
+
+	errno = 0;
+	dev = strtoll(d_name + 3, NULL, 10);
+	if (errno) {
+		log_error("Failed to parse dm device minor number from %s.", d_name);
+		return 0;
+	}
+
+	if (!(dmt = _setup_task_run(DM_DEVICE_INFO, &info, NULL, NULL, NULL,
+				    major, dev, 0, 0, 0)))
+		return_0;
+
+	if (info.exists) {
+		uuid = dm_task_get_uuid(dmt);
+		name = dm_task_get_name(dmt);
+
+		log_debug_activation("Checking holder of %s  %s (" FMTu32 ":" FMTu32 ") %s.",
+				     display_lvname(lv), uuid, info.major, info.minor,
+				     name);
+
+		/* Skip common uuid prefix */
+		if (!strncmp(default_uuid_prefix, uuid, default_uuid_prefix_len))
+			uuid += default_uuid_prefix_len;
+
+		if (!strncmp(uuid, (char*)&lv->vg->id, sizeof(lv->vg->id)) &&
+		    !dm_tree_find_node_by_uuid(dtree, uuid)) {
+			dm_strncpy((char*)&id, uuid, 2 * sizeof(struct id) + 1);
+
+			/* If UUID is not yet in dtree, look for matching LV */
+			if (!(lv_det = find_lv_in_vg_by_lvid(lv->vg, &id))) {
+				log_error("Cannot find holder with device name %s in VG %s.",
+					  name, lv->vg->name);
+				goto out;
+			}
+
+			if (lv_is_cow(lv_det))
+				lv_det = origin_from_cow(lv_det);
+			log_debug_activation("Found holder %s of %s.",
+					     display_lvname(lv_det),
+					     display_lvname(lv));
+			if (!_add_lv_to_dtree(dm, dtree, lv_det, 0))
+				goto_out;
+		}
+	}
+
+	r = 1;
+out:
+	dm_task_destroy(dmt);
+
+	return r;
+}
+
+/*
+ * Add existing devices which hold the given LV device open.
+ * This is used when the metadata no longer contains this information,
+ * e.g. when PVMOVE is finishing and the final table is about to be resumed.
+ */
+static int _add_holders_to_dtree(struct dev_manager *dm, struct dm_tree *dtree,
+				 const struct logical_volume *lv, struct dm_info *info)
+{
+	const char *sysfs_dir = dm_sysfs_dir();
+	char sysfs_path[PATH_MAX];
+	struct dirent *dirent;
+	DIR *d;
+	int r = 0;
+
+	/* Sysfs path of holders */
+	if (dm_snprintf(sysfs_path, sizeof(sysfs_path), "%sblock/dm-" FMTu32
+			"/holders", sysfs_dir, info->minor) < 0) {
+		log_error("sysfs_path dm_snprintf failed.");
+		return 0;
+	}
+
+	if (!(d = opendir(sysfs_path))) {
+		log_sys_error("opendir", sysfs_path);
+		return 0;
+	}
+
+	while ((dirent = readdir(d)))
+		/* Expects the minor number to follow the 'dm-' prefix */
+		if (!strncmp(dirent->d_name, "dm-", 3) &&
+		    !_check_holder(dm, dtree, lv, info->major, dirent->d_name))
+			goto_out;
+
+	r = 1;
+out:
+	if (closedir(d))
+		log_sys_debug("closedir", "holders");
+
+	return r;
+}
+
 static int _add_dev_to_dtree(struct dev_manager *dm, struct dm_tree *dtree,
 			     const struct logical_volume *lv, const char *layer)
 {
@@ -1763,83 +1879,14 @@ static int _add_dev_to_dtree(struct dev_manager *dm, struct dm_tree *dtree,
 			return_0;
 	}

-	return 1;
-}
-
-/*
- * Add replicator devices
- *
- * Using _add_dev_to_dtree() directly instead of _add_lv_to_dtree()
- * to avoid extra checks with extensions.
- */
-static int _add_partial_replicator_to_dtree(struct dev_manager *dm,
-					    struct dm_tree *dtree,
-					    const struct logical_volume *lv)
-{
-	struct logical_volume *rlv = first_seg(lv)->replicator;
-	struct replicator_device *rdev;
-	struct replicator_site *rsite;
-	struct dm_tree_node *rep_node, *rdev_node;
-	const char *uuid;
-
-	if (!lv_is_active_replicator_dev(lv)) {
-		if (!_add_dev_to_dtree(dm, dtree, lv->rdevice->lv,
-				      NULL))
+	/*
+	 * Find holders of an existing active LV whose name starts with 'pvmove',
+	 * but which is no longer a PVMOVE LV and is not a PVMOVE _mimage.
+	 */
+	if (info.exists && !lv_is_pvmove(lv) &&
+	    !strchr(lv->name, '_') && !strncmp(lv->name, "pvmove", 6))
+		if (!_add_holders_to_dtree(dm, dtree, lv, &info))
 			return_0;
-		return 1;
-	}
-
-	/* Add _rlog and replicator device */
-	if (!_add_dev_to_dtree(dm, dtree, first_seg(rlv)->rlog_lv, NULL))
-		return_0;
-
-	if (!_add_dev_to_dtree(dm, dtree, rlv, NULL))
-		return_0;
-
-	if (!(uuid = build_dm_uuid(dm->mem, rlv, NULL)))
-		return_0;
-
-	rep_node = dm_tree_find_node_by_uuid(dtree, uuid);
-
-	/* Add all related devices for replicator */
-	dm_list_iterate_items(rsite, &rlv->rsites)
-		dm_list_iterate_items(rdev, &rsite->rdevices) {
-			if (rsite->state == REPLICATOR_STATE_ACTIVE) {
-				/* Add _rimage LV */
-				if (!_add_dev_to_dtree(dm, dtree, rdev->lv, NULL))
-					return_0;
-
-				/* Add replicator-dev LV, except of the already added one */
-				if ((lv != rdev->replicator_dev->lv) &&
-				    !_add_dev_to_dtree(dm, dtree,
-						       rdev->replicator_dev->lv, NULL))
-					return_0;
-
-				/* If replicator exists - try connect existing heads */
-				if (rep_node) {
-					uuid = build_dm_uuid(dm->mem,
-							     rdev->replicator_dev->lv,
-							     NULL);
-					if (!uuid)
-						return_0;
-
-					rdev_node = dm_tree_find_node_by_uuid(dtree, uuid);
-					if (rdev_node)
-						dm_tree_node_set_presuspend_node(rdev_node,
-										 rep_node);
-				}
-			}
-
-			if (!rdev->rsite->vg_name)
-				continue;
-
-			if (!_add_dev_to_dtree(dm, dtree, rdev->lv, NULL))
-				return_0;
-
-			if (rdev->slog &&
-			    !_add_dev_to_dtree(dm, dtree, rdev->slog, NULL))
-				return_0;
-		}

 	return 1;
 }
@@ -1857,7 +1904,7 @@ struct pool_cb_data {
 static int _pool_callback(struct dm_tree_node *node,
 			  dm_node_callback_t type, void *cb_data)
 {
-	int ret, status, fd;
+	int ret, status = 0, fd;
 	const struct dm_config_node *cn;
 	const struct dm_config_value *cv;
 	const struct pool_cb_data *data = cb_data;
@@ -1865,12 +1912,45 @@ static int _pool_callback(struct dm_tree_node *node,
 	const struct logical_volume *mlv = first_seg(pool_lv)->metadata_lv;
 	long buf[64 / sizeof(long)]; /* buffer for short disk header (64B) */
 	int args = 0;
+	char *mpath;
 	const char *argv[19] = { /* Max supported 15 args */
-		find_config_tree_str_allow_empty(pool_lv->vg->cmd, data->exec, NULL) /* argv[0] */
+		find_config_tree_str_allow_empty(pool_lv->vg->cmd, data->exec, NULL)
 	};

-	if (!*argv[0])
-		return 1; /* Checking disabled */
+	if (!*argv[0]) /* *_check tool is unconfigured/disabled with "" setting */
+		return 1;
+
+	if (!(mpath = lv_dmpath_dup(data->dm->mem, mlv))) {
+		log_error("Failed to build device path for checking pool metadata %s.",
+			  display_lvname(mlv));
+		return 0;
+	}
+
+	if (data->skip_zero) {
+		if ((fd = open(mpath, O_RDONLY)) < 0) {
+			log_sys_error("open", mpath);
+			return 0;
+		}
+		/* let's assume there is no problem to read 64 bytes */
+		if (read(fd, buf, sizeof(buf)) < (int)sizeof(buf)) {
+			log_sys_error("read", mpath);
+			if (close(fd))
+				log_sys_error("close", mpath);
+			return 0;
+		}
+		for (ret = 0; ret < (int) DM_ARRAY_SIZE(buf); ++ret)
+			if (buf[ret])
+				break;
+
+		if (close(fd))
+			log_sys_error("close", mpath);
+
+		if (ret == (int) DM_ARRAY_SIZE(buf)) {
+			log_debug_activation("Metadata checking skipped, detected empty disk header on %s.",
+					     mpath);
+			return 1;
+		}
+	}

 	if (!(cn = find_config_tree_array(mlv->vg->cmd, data->opts, NULL))) {
 		log_error(INTERNAL_ERROR "Unable to find configuration for pool check options.");
@@ -1892,36 +1972,7 @@ static int _pool_callback(struct dm_tree_node *node,
 		return 0;
 	}

-	if (!(argv[++args] = lv_dmpath_dup(data->dm->mem, mlv))) {
-		log_error("Failed to build pool metadata path.");
-		return 0;
-	}
-
-	if (data->skip_zero) {
-		if ((fd = open(argv[args], O_RDONLY)) < 0) {
-			log_sys_error("open", argv[args]);
-			return 0;
-		}
-		/* let's assume there is no problem to read 64 bytes */
-		if (read(fd, buf, sizeof(buf)) < (int)sizeof(buf)) {
-			log_sys_error("read", argv[args]);
-			if (close(fd))
-				log_sys_error("close", argv[args]);
-			return 0;
-		}
-		for (ret = 0; ret < (int) DM_ARRAY_SIZE(buf); ++ret)
-			if (buf[ret])
-				break;
-
-		if (close(fd))
-			log_sys_error("close", argv[args]);
-
-		if (ret == (int) DM_ARRAY_SIZE(buf)) {
-			log_debug_activation("%s skipped, detect empty disk header on %s.",
-					     argv[0], argv[args]);
-			return 1;
-		}
-	}
+	argv[++args] = mpath;

 	if (!(ret = exec_cmd(pool_lv->vg->cmd, (const char * const *)argv,
 			     &status, 0))) {
@@ -2009,6 +2060,10 @@ static int _add_lv_to_dtree(struct dev_manager *dm, struct dm_tree *dtree,
 	struct lv_segment *seg;
 	struct dm_tree_node *node;
 	const char *uuid;
+	const struct logical_volume *plv;
+
+	if (lv_is_pvmove(lv) && (dm->track_pvmove_deps == 2))
+		return 1; /* Avoid rechecking of already seen pvmove LV */

 	if (lv_is_cache_pool(lv)) {
 		if (!dm_list_empty(&lv->segs_using_this_lv)) {
@@ -2129,11 +2184,14 @@ static int _add_lv_to_dtree(struct dev_manager *dm, struct dm_tree *dtree,
 		return_0;

 	/* Add any LVs referencing a PVMOVE LV unless told not to. */
-	if (dm->track_pvmove_deps && lv_is_pvmove(lv)) {
-		dm->track_pvmove_deps = 0;
-		dm_list_iterate_items(sl, &lv->segs_using_this_lv)
-			if (!_add_lv_to_dtree(dm, dtree, sl->seg->lv, origin_only))
+	if ((dm->track_pvmove_deps == 1) && lv_is_pvmove(lv)) {
+		dm->track_pvmove_deps = 2; /* Mark as already seen */
+		dm_list_iterate_items(sl, &lv->segs_using_this_lv) {
+			/* If LV is snapshot COW - whole snapshot needs reload */
+			plv = lv_is_cow(sl->seg->lv) ? origin_from_cow(sl->seg->lv) : sl->seg->lv;
+			if (!_add_lv_to_dtree(dm, dtree, plv, 0))
 				return_0;
+		}
 		dm->track_pvmove_deps = 1;
 	}

@@ -2148,11 +2206,6 @@ static int _add_lv_to_dtree(struct dev_manager *dm, struct dm_tree *dtree,
 			}
 		}

-	/* Adding LV head of replicator adds all other related devs */
-	if (lv_is_replicator_dev(lv) &&
-	    !_add_partial_replicator_to_dtree(dm, dtree, lv))
-		return_0;
-
 	/* Add any LVs used by segments in this LV */
 	dm_list_iterate_items(seg, &lv->segments) {
 		if (seg->external_lv && dm->track_external_lv_deps &&
@@ -2516,64 +2569,6 @@ static int _add_new_lv_to_dtree(struct dev_manager *dm, struct dm_tree *dtree,
 				struct lv_activate_opts *laopts,
 				const char *layer);

-/* Add all replicators' LVs */
-static int _add_replicator_dev_target_to_dtree(struct dev_manager *dm,
-					       struct dm_tree *dtree,
-					       struct lv_segment *seg,
-					       struct lv_activate_opts *laopts)
-{
-	struct replicator_device *rdev;
-	struct replicator_site *rsite;
-
-	/* For inactive replicator add linear mapping */
-	if (!lv_is_active_replicator_dev(seg->lv)) {
-		if (!_add_new_lv_to_dtree(dm, dtree, seg->lv->rdevice->lv, laopts, NULL))
-			return_0;
-		return 1;
-	}
-
-	/* Add rlog and replicator nodes */
-	if (!seg->replicator ||
-	    !first_seg(seg->replicator)->rlog_lv ||
-	    !_add_new_lv_to_dtree(dm, dtree,
-				  first_seg(seg->replicator)->rlog_lv,
-				  laopts, NULL) ||
-	    !_add_new_lv_to_dtree(dm, dtree, seg->replicator, laopts, NULL))
-	    return_0;
-
-	/* Activation of one replicator_dev node activates all other nodes */
-	dm_list_iterate_items(rsite, &seg->replicator->rsites) {
-		dm_list_iterate_items(rdev, &rsite->rdevices) {
-			if (rdev->lv &&
-			    !_add_new_lv_to_dtree(dm, dtree, rdev->lv,
-						  laopts, NULL))
-				return_0;
-
-			if (rdev->slog &&
-			    !_add_new_lv_to_dtree(dm, dtree, rdev->slog,
-						  laopts, NULL))
-				return_0;
-		}
-	}
-	/* Add remaining replicator-dev nodes in the second loop
-	 * to avoid multiple retries for inserting all elements */
-	dm_list_iterate_items(rsite, &seg->replicator->rsites) {
-		if (rsite->state != REPLICATOR_STATE_ACTIVE)
-			continue;
-		dm_list_iterate_items(rdev, &rsite->rdevices) {
-			if (rdev->replicator_dev->lv == seg->lv)
-				continue;
-			if (!rdev->replicator_dev->lv ||
-			    !_add_new_lv_to_dtree(dm, dtree,
-						  rdev->replicator_dev->lv,
-						  laopts, NULL))
-				return_0;
-		}
-	}
-
-	return 1;
-}
-
 static int _add_new_external_lv_to_dtree(struct dev_manager *dm,
 					 struct dm_tree *dtree,
 					 struct logical_volume *external_lv,
@@ -2674,11 +2669,6 @@ static int _add_segment_to_dtree(struct dev_manager *dm,
 				  lv_layer(seg->pool_lv)))
 		return_0;

-	if (seg_is_replicator_dev(seg)) {
-		if (!_add_replicator_dev_target_to_dtree(dm, dtree, seg, laopts))
-			return_0;
-	}
-
 	/* Add any LVs used by this segment */
 	for (s = 0; s < seg->area_count; ++s) {
 		if ((seg_type(seg, s) == AREA_LV) &&
@@ -3148,8 +3138,6 @@ static int _tree_action(struct dev_manager *dm, const struct logical_volume *lv,
 		if (!dm_tree_preload_children(root, dlid, DLID_SIZE))
 			goto_out;

-		//if (action == PRELOAD) { log_debug("SLEEP"); sleep(7); }
-
 		if ((dm_tree_node_size_changed(root) < 0))
 			dm->flush_required = 1;
 		/* Currently keep the code require flush for any
--- a/lib/activate/fs.c
+++ b/lib/activate/fs.c
@@ -186,11 +186,11 @@ static int _mk_link(const char *dev_dir, const char *vg_name,
 			    !stat(lv_path, &buf)) {
 				if (buf_lp.st_rdev == buf.st_rdev)
 					return 1;
-				else
-					log_warn("Symlink %s that should have been "
-						 "created by udev does not have "
-						 "correct target. Falling back to "
-						 "direct link creation", lv_path);
+
+				log_warn("Symlink %s that should have been "
+					 "created by udev does not have "
+					 "correct target. Falling back to "
+					 "direct link creation", lv_path);
 			} else
 				log_warn("Symlink %s that should have been "
 					 "created by udev could not be checked "
@@ -239,7 +239,9 @@ static int _rm_link(const char *dev_dir, const char *vg_name,
 			return 1;
 		log_sys_error("lstat", lv_path);
 		return 0;
-	} else if (dm_udev_get_sync_support() && udev_checking() && check_udev)
+	}
+
+	if (dm_udev_get_sync_support() && udev_checking() && check_udev)
 		log_warn("The link %s should have been removed by udev "
 			 "but it is still present. Falling back to "
 			 "direct link removal.", lv_path);
@@ -478,9 +480,9 @@ int fs_rename_lv(const struct logical_volume *lv, const char *dev,
 			 _fs_op(FS_ADD, lv->vg->cmd->dev_dir, lv->vg->name,
 				lv->name, dev, "", lv->vg->cmd->current_settings.udev_rules));
 	}
-	else 
-		return _fs_op(FS_RENAME, lv->vg->cmd->dev_dir, lv->vg->name, lv->name,
-			      dev, old_lvname, lv->vg->cmd->current_settings.udev_rules);
+
+	return _fs_op(FS_RENAME, lv->vg->cmd->dev_dir, lv->vg->name, lv->name,
+		      dev, old_lvname, lv->vg->cmd->current_settings.udev_rules);
 }

 void fs_unlock(void)
--- a/lib/cache/lvmcache.c
+++ b/lib/cache/lvmcache.c
@@ -141,6 +141,8 @@ void lvmcache_seed_infos_from_lvmetad(struct cmd_context *cmd)
 /* Volume Group metadata cache functions */
 static void _free_cached_vgmetadata(struct lvmcache_vginfo *vginfo)
 {
+	struct lvmcache_info *info;
+
 	if (!vginfo || !vginfo->vgmetadata)
 		return;

@@ -154,7 +156,11 @@ static void _free_cached_vgmetadata(struct lvmcache_vginfo *vginfo)
 		vginfo->cft = NULL;
 	}

-	log_debug_cache("Metadata cache: VG %s wiped.", vginfo->vgname);
+	/* Invalidate any cached device buffers */
+	dm_list_iterate_items(info, &vginfo->infos)
+		devbufs_release(info->dev);
+
+	log_debug_cache("lvmcache: VG %s wiped.", vginfo->vgname);

 	release_vg(vginfo->cached_vg);
 }
@@ -197,7 +203,7 @@ static void _store_metadata(struct volume_group *vg, unsigned precommitted)
 		return;
 	}

-	log_debug_cache("Metadata cache: VG %s (%s) stored (%" PRIsize_t " bytes%s).",
+	log_debug_cache("lvmcache: VG %s (%s) stored (%" PRIsize_t " bytes%s).",
 			vginfo->vgname, uuid, size,
 			precommitted ? ", precommitted" : "");
 }
@@ -289,7 +295,7 @@ void lvmcache_commit_metadata(const char *vgname)
 		return;

 	if (vginfo->precommitted) {
-		log_debug_cache("Precommitted metadata cache: VG %s upgraded to committed.",
+		log_debug_cache("lvmcache: Upgraded pre-committed VG %s metadata to committed.",
 				vginfo->vgname);
 		vginfo->precommitted = 0;
 	}
@@ -542,7 +548,6 @@ const struct format_type *lvmcache_fmt_from_vgname(struct cmd_context *cmd,
 {
 	struct lvmcache_vginfo *vginfo;
 	struct lvmcache_info *info;
-	struct label *label;
 	struct dm_list *devh, *tmp;
 	struct dm_list devs;
 	struct device_list *devl;
@@ -587,7 +592,7 @@ const struct format_type *lvmcache_fmt_from_vgname(struct cmd_context *cmd,

 	dm_list_iterate_safe(devh, tmp, &devs) {
 		devl = dm_list_item(devh, struct device_list);
-		(void) label_read(devl->dev, &label, UINT64_C(0));
+		(void) label_read(devl->dev, NULL, UINT64_C(0));
 		dm_list_del(&devl->list);
 		dm_free(devl);
 	}
@@ -616,7 +621,7 @@ struct lvmcache_vginfo *lvmcache_vginfo_from_vgid(const char *vgid)
 	id[ID_LEN] = '\0';

 	if (!(vginfo = dm_hash_lookup(_vgid_hash, id))) {
-		log_debug_cache("Metadata cache has no info for vgid \"%s\"", id);
+		log_debug_cache("lvmcache has no info for vgid \"%s\"", id);
 		return NULL;
 	}

@@ -770,10 +775,8 @@ char *lvmcache_vgname_from_pvid(struct cmd_context *cmd, const char *pvid)

 static void _rescan_entry(struct lvmcache_info *info)
 {
-	struct label *label;
-
 	if (info->status & CACHE_INVALID)
-		(void) label_read(info->dev, &label, UINT64_C(0));
+		(void) label_read(info->dev, NULL, UINT64_C(0));
 }

 static int _scan_invalid(void)
@@ -1095,17 +1098,31 @@ next:
 	goto next;
 }

+/* Track the number of outstanding label reads */
+/* FIXME Switch to struct and also track failed */
+static void _process_label_data(int failed, unsigned ioflags, void *context, const void *data)
+{
+	int *nr_labels_outstanding = context;
+
+	if (!*nr_labels_outstanding) {
+		log_error(INTERNAL_ERROR "_process_label_data called too many times");
+		return;
+	}
+
+	(*nr_labels_outstanding)--;
+}
+
 int lvmcache_label_scan(struct cmd_context *cmd)
 {
 	struct dm_list del_cache_devs;
 	struct dm_list add_cache_devs;
 	struct lvmcache_info *info;
 	struct device_list *devl;
-	struct label *label;
 	struct dev_iter *iter;
 	struct device *dev;
 	struct format_type *fmt;
 	int dev_count = 0;
+	int nr_labels_outstanding = 0;

 	int r = 0;

@@ -1144,13 +1161,22 @@ int lvmcache_label_scan(struct cmd_context *cmd)
 	_destroy_duplicate_device_list(&_found_duplicate_devs);

 	while ((dev = dev_iter_get(iter))) {
-		(void) label_read(dev, &label, UINT64_C(0));
+		log_debug_io("Scanning device %s", dev_name(dev));
+		nr_labels_outstanding++;
+		if (!label_read_callback(dev, UINT64_C(0), AIO_SUPPORTED_CODE_PATH, _process_label_data, &nr_labels_outstanding))
+			nr_labels_outstanding--;
 		dev_count++;
 	}

 	dev_iter_destroy(iter);

-	log_very_verbose("Scanned %d device labels", dev_count);
+	while (nr_labels_outstanding) {
+		log_very_verbose("Scanned %d device labels (%d outstanding)", dev_count, nr_labels_outstanding);
+		if (!dev_async_getevents())
+			return_0;
+	}
+
+	log_very_verbose("Scanned %d device labels (%d outstanding)", dev_count, nr_labels_outstanding);

 	/*
 	 * _choose_preferred_devs() returns:
@@ -1184,7 +1210,7 @@ int lvmcache_label_scan(struct cmd_context *cmd)

 		dm_list_iterate_items(devl, &add_cache_devs) {
 			log_debug_cache("Rescan preferred device %s for lvmcache", dev_name(devl->dev));
-			(void) label_read(devl->dev, &label, UINT64_C(0));
+			(void) label_read(devl->dev, NULL, UINT64_C(0));
 		}

 		dm_list_splice(&_unused_duplicate_devs, &del_cache_devs);
@@ -1204,7 +1230,7 @@ int lvmcache_label_scan(struct cmd_context *cmd)
 	 */
 	if (_force_label_scan && cmd->is_long_lived &&
 	    cmd->dump_filter && cmd->full_filter && cmd->full_filter->dump &&
-	    !cmd->full_filter->dump(cmd->full_filter, 0))
+	    !cmd->full_filter->dump(cmd->full_filter, cmd->mem, 0))
 		stack;

 	r = 1;
@@ -1505,7 +1531,6 @@ const char *lvmcache_pvid_from_devname(struct cmd_context *cmd,
 				       const char *devname)
 {
 	struct device *dev;
-	struct label *label;

 	if (!(dev = dev_cache_get(devname, cmd->filter))) {
 		log_error("%s: Couldn't find device.  Check your filters?",
@@ -1513,7 +1538,7 @@ const char *lvmcache_pvid_from_devname(struct cmd_context *cmd,
 		return NULL;
 	}

-	if (!(label_read(dev, &label, UINT64_C(0))))
+	if (!(label_read(dev, NULL, UINT64_C(0))))
 		return NULL;

 	return dev->pvid;
@@ -1600,8 +1625,6 @@ void lvmcache_del(struct lvmcache_info *info)
 	info->label->labeller->ops->destroy_label(info->label->labeller,
 						  info->label);
 	dm_free(info);
-
-	return;
 }

 /*
@@ -1872,7 +1895,7 @@ static int _lvmcache_update_vgname(struct lvmcache_info *info,
 				vginfo->vgid[0] ? vginfo->vgid : "",
 				vginfo->vgid[0] ? ")" : "", mdabuf);
 	} else
-		log_debug_cache("lvmcache initialised VG %s.", vgname);
+		log_debug_cache("lvmcache: Initialised VG %s.", vgname);

 	return 1;
 }
@@ -1898,8 +1921,7 @@ static int _lvmcache_update_vgstatus(struct lvmcache_info *info, uint32_t vgstat
 						   info->vginfo->creation_host))
 		goto set_lock_type;

-	if (info->vginfo->creation_host)
-		dm_free(info->vginfo->creation_host);
+	dm_free(info->vginfo->creation_host);

 	if (!(info->vginfo->creation_host = dm_strdup(creation_host))) {
 		log_error("cache creation host alloc failed for %s.",
@@ -1918,8 +1940,7 @@ set_lock_type:
 	if (info->vginfo->lock_type && !strcmp(lock_type, info->vginfo->lock_type))
 		goto set_system_id;

-	if (info->vginfo->lock_type)
-		dm_free(info->vginfo->lock_type);
+	dm_free(info->vginfo->lock_type);

 	if (!(info->vginfo->lock_type = dm_strdup(lock_type))) {
 		log_error("cache lock_type alloc failed for %s", lock_type);
@@ -1937,8 +1958,7 @@ set_system_id:
 	if (info->vginfo->system_id && !strcmp(system_id, info->vginfo->system_id))
 		goto out;

-	if (info->vginfo->system_id)
-		dm_free(info->vginfo->system_id);
+	dm_free(info->vginfo->system_id);

 	if (!(info->vginfo->system_id = dm_strdup(system_id))) {
 		log_error("cache system_id alloc failed for %s", system_id);
@@ -1984,7 +2004,7 @@ int lvmcache_add_orphan_vginfo(const char *vgname, struct format_type *fmt)
 	return _lvmcache_update_vgname(NULL, vgname, vgname, 0, "", fmt);
 }

-int lvmcache_update_vgname_and_id(struct lvmcache_info *info, struct lvmcache_vgsummary *vgsummary)
+int lvmcache_update_vgname_and_id(struct lvmcache_info *info, const struct lvmcache_vgsummary *vgsummary)
 {
 	const char *vgname = vgsummary->vgname;
 	const char *vgid = (char *)&vgsummary->vgid;
--- a/lib/cache/lvmcache.h
+++ b/lib/cache/lvmcache.h
@@ -85,7 +85,7 @@ void lvmcache_del(struct lvmcache_info *info);

 /* Update things */
 int lvmcache_update_vgname_and_id(struct lvmcache_info *info,
-				  struct lvmcache_vgsummary *vgsummary);
+				  const struct lvmcache_vgsummary *vgsummary);
 int lvmcache_update_vg(struct volume_group *vg, unsigned precommitted);

 void lvmcache_lock_vgname(const char *vgname, int read_only);
@@ -181,7 +181,7 @@ int lvmcache_foreach_ba(struct lvmcache_info *info,
 			int (*fun)(struct disk_locn *, void *),
 			void *baton);

-int lvmcache_foreach_pv(struct lvmcache_vginfo *vg,
+int lvmcache_foreach_pv(struct lvmcache_vginfo *vginfo,
 			int (*fun)(struct lvmcache_info *, void *), void * baton);

 uint64_t lvmcache_device_size(struct lvmcache_info *info);
--- a/lib/cache/lvmetad.c
+++ b/lib/cache/lvmetad.c
@@ -39,7 +39,7 @@ static int64_t _lvmetad_update_timeout;

 static int _found_lvm1_metadata = 0;

-static struct volume_group *lvmetad_pvscan_vg(struct cmd_context *cmd, struct volume_group *vg);
+static struct volume_group *_lvmetad_pvscan_vg(struct cmd_context *cmd, struct volume_group *vg);

 static uint64_t _monotonic_seconds(void)
 {
@@ -66,7 +66,7 @@ static int _log_debug_inequality(const char *name, struct dm_config_node *a, str
 					log_debug_lvmetad("VG %s metadata inequality at %s / %s: %s / %s",
 							  name, a->key, b->key, av->v.str, bv->v.str);
 				else if (a->v->type == DM_CFG_INT && b->v->type == DM_CFG_INT)
-					log_debug_lvmetad("VG %s metadata inequality at %s / %s: " FMTi64 " / " FMTi64,
+					log_debug_lvmetad("VG %s metadata inequality at %s / %s: " FMTd64 " / " FMTd64,
 							  name, a->key, b->key, av->v.i, bv->v.i);
 				else
 					log_debug_lvmetad("VG %s metadata inequality at %s / %s: type %d / type %d",
@@ -145,13 +145,14 @@ int lvmetad_connect(struct cmd_context *cmd)
 		_lvmetad_use = 1;
 		_lvmetad_cmd = cmd;
 		return 1;
-	} else {
-		log_debug_lvmetad("Failed to connect to lvmetad: %s", strerror(_lvmetad.error));
-		_lvmetad_connected = 0;
-		_lvmetad_use = 0;
-		_lvmetad_cmd = NULL;
-		return 0;
 	}
+
+	log_debug_lvmetad("Failed to connect to lvmetad: %s", strerror(_lvmetad.error));
+	_lvmetad_connected = 0;
+	_lvmetad_use = 0;
+	_lvmetad_cmd = NULL;
+
+	return 0;
 }

 int lvmetad_used(void)
@@ -550,6 +551,7 @@ static int _token_update(int *replaced_update)
 	daemon_reply reply;
 	const char *token_expected;
 	const char *prev_token;
+	const char *reply_str;
 	int update_pid;
 	int ending_our_update;

@@ -566,13 +568,14 @@ static int _token_update(int *replaced_update)
 	}

 	update_pid = (int)daemon_reply_int(reply, "update_pid", 0);
+	reply_str = daemon_reply_str(reply, "response", "");

 	/*
 	 * A mismatch can only happen when this command attempts to set the
 	 * token to filter:<hash> at the end of its update, but the update has
 	 * been preempted in lvmetad by a new one (from update_pid).
 	 */
-	if (!strcmp(daemon_reply_str(reply, "response", ""), "token_mismatch")) {
+	if (!strcmp(reply_str, "token_mismatch")) {
 		token_expected = daemon_reply_str(reply, "expected", "");

 		ending_our_update = strcmp(_lvmetad_token, LVMETAD_TOKEN_UPDATE_IN_PROGRESS);
@@ -598,7 +601,7 @@ static int _token_update(int *replaced_update)
 		return 0;
 	}

-	if (strcmp(daemon_reply_str(reply, "response", ""), "OK")) {
+	if (strcmp(reply_str, "OK")) {
 		log_error("Failed response from lvmetad for token update.");
 		daemon_reply_destroy(reply);
 		return 0;
@@ -625,6 +628,7 @@ static int _lvmetad_handle_reply(daemon_reply reply, const char *id, const char
 {
 	const char *token_expected;
 	const char *action;
+	const char *reply_str;
 	int action_modifies = 0;
 	int daemon_in_update;
 	int we_are_in_update;
@@ -662,15 +666,15 @@ static int _lvmetad_handle_reply(daemon_reply reply, const char *id, const char
 	}

 	if (reply.error) {
-		log_warn("WARNING: lvmetad cannot be used due to error: %s", strerror(reply.error));
+		log_error("lvmetad cannot be used due to error: %s", strerror(reply.error));
 		goto fail;
 	}

 	/*
 	 * Errors related to token mismatch.
 	 */
-
-	if (!strcmp(daemon_reply_str(reply, "response", ""), "token_mismatch")) {
+	reply_str = daemon_reply_str(reply, "response", "");
+	if (!strcmp(reply_str, "token_mismatch")) {

 		token_expected = daemon_reply_str(reply, "expected", "");
 		update_pid = (int)daemon_reply_int(reply, "update_pid", 0);
@@ -768,14 +772,14 @@ static int _lvmetad_handle_reply(daemon_reply reply, const char *id, const char
 	 */

 	/* All OK? */
-	if (!strcmp(daemon_reply_str(reply, "response", ""), "OK")) {
+	if (!strcmp(reply_str, "OK")) {
 		if (found)
 			*found = 1;
 		return 1;
 	}

 	/* Unknown device permitted? */
-	if (found && !strcmp(daemon_reply_str(reply, "response", ""), "unknown")) {
+	if (found && !strcmp(reply_str, "unknown")) {
 		log_very_verbose("Request to %s %s%sin lvmetad did not find any matching object.",
 				 action, object, *object ? " " : "");
 		*found = 0;
@@ -783,7 +787,7 @@ static int _lvmetad_handle_reply(daemon_reply reply, const char *id, const char
 	}

 	/* Multiple VGs with the same name were found. */
-	if (found && !strcmp(daemon_reply_str(reply, "response", ""), "multiple")) {
+	if (found && !strcmp(reply_str, "multiple")) {
 		log_very_verbose("Request to %s %s%sin lvmetad found multiple matching objects.",
 				 action, object, *object ? " " : "");
 		if (found)
@@ -1089,7 +1093,7 @@ struct volume_group *lvmetad_vg_lookup(struct cmd_context *cmd, const char *vgna
 		 * invalidated the cached vg.
 		 */
 		if (rescan) {
-			if (!(vg2 = lvmetad_pvscan_vg(cmd, vg))) {
+			if (!(vg2 = _lvmetad_pvscan_vg(cmd, vg))) {
 				log_debug_lvmetad("VG %s from lvmetad not found during rescan.", vgname);
 				fid = NULL;
 				release_vg(vg);
@@ -1280,7 +1284,7 @@ int lvmetad_vg_update_finish(struct volume_group *vg)
 		if (pvl->pv->dev && !lvmetad_pv_found(vg->cmd, &pvl->pv->id, pvl->pv->dev,
 						      vgu->fid ? vgu->fid->fmt : pvl->pv->fmt,
 						      pvl->pv->label_sector, NULL, NULL, NULL))
-			return 0;
+			return_0;
 	}

 	vg->lvmetad_update_pending = 0;
@@ -1515,7 +1519,7 @@ int lvmetad_vg_list_to_lvmcache(struct cmd_context *cmd)
 	return 1;
 }

-struct _extract_dl_baton {
+struct extract_dl_baton {
 	int i;
 	struct dm_config_tree *cft;
 	struct dm_config_node *pre_sib;
@@ -1523,7 +1527,7 @@ struct _extract_dl_baton {

 static int _extract_mda(struct metadata_area *mda, void *baton)
 {
-	struct _extract_dl_baton *b = baton;
+	struct extract_dl_baton *b = baton;
 	struct dm_config_node *cn;
 	char id[32];

@@ -1544,7 +1548,7 @@ static int _extract_mda(struct metadata_area *mda, void *baton)

 static int _extract_disk_location(const char *name, struct disk_locn *dl, void *baton)
 {
-	struct _extract_dl_baton *b = baton;
+	struct extract_dl_baton *b = baton;
 	struct dm_config_node *cn;
 	char id[32];

@@ -1579,7 +1583,7 @@ static int _extract_ba(struct disk_locn *ba, void *baton)
 static int _extract_mdas(struct lvmcache_info *info, struct dm_config_tree *cft,
 			 struct dm_config_node *pre_sib)
 {
-	struct _extract_dl_baton baton = { .cft = cft };
+	struct extract_dl_baton baton = { .cft = cft };

 	if (!lvmcache_foreach_mda(info, &_extract_mda, &baton))
 		return 0;
@@ -1606,7 +1610,7 @@ int lvmetad_pv_found(struct cmd_context *cmd, const struct id *pvid, struct devi
 	struct dm_config_tree *pvmeta, *vgmeta;
 	const char *status = NULL, *vgname = NULL;
 	int64_t changed = 0;
-	int result;
+	int result, seqno_after;

 	if (!lvmetad_used() || test_mode())
 		return 1;
@@ -1671,10 +1675,12 @@ int lvmetad_pv_found(struct cmd_context *cmd, const struct id *pvid, struct devi

 	result = _lvmetad_handle_reply(reply, "pv_found", uuid, NULL);

-	if (vg && result &&
-	    (daemon_reply_int(reply, "seqno_after", -1) != vg->seqno ||
-	     daemon_reply_int(reply, "seqno_after", -1) != daemon_reply_int(reply, "seqno_before", -1)))
-		log_warn("WARNING: Inconsistent metadata found for VG %s", vg->name);
+	if (vg && result) {
+		seqno_after = daemon_reply_int(reply, "seqno_after", -1);
+		if ((seqno_after != vg->seqno) ||
+		    (seqno_after != daemon_reply_int(reply, "seqno_before", -1)))
+			log_warn("WARNING: Inconsistent metadata found for VG %s", vg->name);
+	}

 	if (result && found_vgnames) {
 		status = daemon_reply_str(reply, "status", NULL);
@@ -1765,7 +1771,7 @@ static int _lvmetad_pvscan_single(struct metadata_area *mda, void *baton)
 	struct volume_group *vg;

 	if (mda_is_ignored(mda) ||
-	    !(vg = mda->ops->vg_read(b->fid, "", mda, NULL, NULL, 1)))
+	    !(vg = mda->ops->vg_read(b->fid, "", mda, NULL, NULL, 1, 0)))
 		return 1;

 	/* FIXME Also ensure contents match etc. */
@@ -1786,7 +1792,7 @@ static int _lvmetad_pvscan_single(struct metadata_area *mda, void *baton)
 * the VG, and that PV may have been reused for another VG.
 */

-static struct volume_group *lvmetad_pvscan_vg(struct cmd_context *cmd, struct volume_group *vg)
+static struct volume_group *_lvmetad_pvscan_vg(struct cmd_context *cmd, struct volume_group *vg)
 {
 	char pvid_s[ID_LEN + 1] __attribute__((aligned(8)));
 	char uuid[64] __attribute__((aligned(8)));
@@ -2874,6 +2880,9 @@ int lvmetad_is_disabled(struct cmd_context *cmd, const char **reason)
 		} else if (strstr(reply_reason, LVMETAD_DISABLE_REASON_DIRECT)) {
 			*reason = "the disable flag was set directly";

+		} else if (strstr(reply_reason, LVMETAD_DISABLE_REASON_REPAIR)) {
+			*reason = "a repair command was run";
+
 		} else if (strstr(reply_reason, LVMETAD_DISABLE_REASON_LVM1)) {
 			*reason = "LVM1 metadata was found";

--- a/lib/cache_segtype/cache.c
+++ b/lib/cache_segtype/cache.c
@@ -51,11 +51,18 @@ static void _cache_display(const struct lv_segment *seg)

 	log_print("  Chunk size\t\t%s",
 		  display_size(seg->lv->vg->cmd, pool_seg->chunk_size));
-	log_print("  Metadata format\t%u", pool_seg->cache_metadata_format);
-	log_print("  Mode\t\t%s", get_cache_mode_name(pool_seg));
-	log_print("  Policy\t\t%s", pool_seg->policy_name);

-	if ((n = pool_seg->policy_settings->child))
+	if (pool_seg->cache_metadata_format != CACHE_METADATA_FORMAT_UNSELECTED)
+		log_print("  Metadata format\t%u", pool_seg->cache_metadata_format);
+
+	if (pool_seg->cache_mode != CACHE_MODE_UNSELECTED)
+		log_print("  Mode\t\t%s", get_cache_mode_name(pool_seg));
+
+	if (pool_seg->policy_name)
+		log_print("  Policy\t\t%s", pool_seg->policy_name);
+
+	if (pool_seg->policy_settings &&
+	    (n = pool_seg->policy_settings->child))
 		dm_config_write_node(n, _cache_out_line, NULL);

 	log_print(" ");
@@ -258,6 +265,39 @@ static void _destroy(struct segment_type *segtype)
 }

 #ifdef DEVMAPPER_SUPPORT
+/*
+ * Parse /proc/kallsyms and look for a kernel symbol.
+ * This could be our only chance to figure out that a
+ * cache policy symbol is already built into a monolithic kernel,
+ * where 'modprobe dm-cache-smq' will simply not work.
+ */
+static int _lookup_kallsyms(const char *symbol)
+{
+	static const char _syms[] = "/proc/kallsyms";
+	int ret = 0;
+	char *line = NULL;
+	size_t len = 0;
+	FILE *s;
+
+	if (!(s = fopen(_syms, "r")))
+		log_sys_debug("fopen", _syms);
+	else {
+		while (getline(&line, &len, s) != -1)
+			if (strstr(line, symbol)) {
+				ret = 1; /* Found symbol */
+				log_debug("Found kernel symbol%s.", symbol); /* space is in symbol */
+				break;
+			}
+
+		free(line);
+		if (fclose(s))
+			log_sys_debug("fclose", _syms);
+	}
+
+	return ret;
+}
+
+
 static int _target_present(struct cmd_context *cmd,
 			   const struct lv_segment *seg __attribute__((unused)),
 			   unsigned *attributes __attribute__((unused)))
@@ -270,14 +310,15 @@ static int _target_present(struct cmd_context *cmd,
 		unsigned cache_alias;
 		const char feature[12];
 		const char module[12]; /* check dm-%s */
+		const char ksymbol[12]; /* check for kernel symbol */
 		const char *aliasing;
 	} _features[] = {
-		/* Assumption: cache >=1.9 always aliases MQ policy */
 		{ 1, 10, CACHE_FEATURE_METADATA2, 0, "metadata2" },
+		/* Assumption: cache >=1.9 always aliases MQ policy */
 		{ 1, 9, CACHE_FEATURE_POLICY_SMQ, CACHE_FEATURE_POLICY_MQ, "policy_smq", "cache-smq",
-		" and aliases cache-mq" },
-		{ 1, 8, CACHE_FEATURE_POLICY_SMQ, 0, "policy_smq", "cache-smq" },
-		{ 1, 3, CACHE_FEATURE_POLICY_MQ, 0, "policy_mq", "cache-mq" },
+		 " smq_exit", " and aliases cache-mq" },
+		{ 1, 8, CACHE_FEATURE_POLICY_SMQ, 0, "policy_smq", "cache-smq", " smq_exit" },
+		{ 1, 3, CACHE_FEATURE_POLICY_MQ, 0, "policy_mq", "cache-mq", " mq_init" },
 	};
 	static const char _lvmconf[] = "global/cache_disabled_features";
 	static unsigned _attrs = 0;
@@ -323,7 +364,8 @@ static int _target_present(struct cmd_context *cmd,
 			}
 			if (((maj > _features[i].maj) ||
 			     (maj == _features[i].maj && min >= _features[i].min)) &&
-			    module_present(cmd, _features[i].module)) {
+			    ((_features[i].ksymbol[0] && _lookup_kallsyms(_features[i].ksymbol)) ||
+			     module_present(cmd, _features[i].module))) {
 				log_debug_activation("Cache policy %s is available%s.",
 						     _features[i].module,
 						     _features[i].aliasing ? : "");
--- a/lib/commands/toolcontext.c
+++ b/lib/commands/toolcontext.c
@@ -46,6 +46,7 @@

 #include <locale.h>
 #include <sys/stat.h>
+#include <sys/syscall.h>
 #include <sys/utsname.h>
 #include <syslog.h>
 #include <time.h>
@@ -54,7 +55,7 @@
 #  include <malloc.h>
 #endif

-static const size_t linebuffer_size = 4096;
+static const size_t _linebuffer_size = 4096;

 /*
 * Copy the input string, removing invalid characters.
@@ -283,6 +284,8 @@ static int _parse_debug_classes(struct cmd_context *cmd)
 			debug_classes |= LOG_CLASS_LVMPOLLD;
 		else if (!strcasecmp(cv->v.str, "dbus"))
 			debug_classes |= LOG_CLASS_DBUS;
+		else if (!strcasecmp(cv->v.str, "io"))
+			debug_classes |= LOG_CLASS_IO;
 		else
 			log_verbose("Unrecognised value for log/debug_classes: %s", cv->v.str);
 	}
@@ -474,10 +477,12 @@ bad:

 int process_profilable_config(struct cmd_context *cmd)
 {
+	const char *units;
+
 	if (!(cmd->default_settings.unit_factor =
-	      dm_units_to_factor(find_config_tree_str(cmd, global_units_CFG, NULL),
+	      dm_units_to_factor(units = find_config_tree_str(cmd, global_units_CFG, NULL),
 				 &cmd->default_settings.unit_type, 1, NULL))) {
-		log_error("Invalid units specification");
+		log_error("Unrecognised configuration setting for global/units: %s", units);
 		return 0;
 	}

@@ -562,7 +567,7 @@ static int _process_config(struct cmd_context *cmd)
 #ifdef DEVMAPPER_SUPPORT
 	dm_set_dev_dir(cmd->dev_dir);

-	if (!dm_set_uuid_prefix("LVM-"))
+	if (!dm_set_uuid_prefix(UUID_PREFIX))
 		return_0;
 #endif

@@ -631,6 +636,16 @@ static int _process_config(struct cmd_context *cmd)
 	 */
 	cmd->default_settings.udev_fallback = udev_disabled ? 1 : -1;

+#ifdef AIO_SUPPORT
+	cmd->use_aio = find_config_tree_bool(cmd, devices_use_aio_CFG, NULL);
+#else
+	cmd->use_aio = 0;
+#endif
+	if (cmd->use_aio && !dev_async_setup(cmd))
+		cmd->use_aio = 0;
+
+	log_debug_io("%s asynchronous I/O.", cmd->use_aio ? "Using" : "Not using");
+
 	init_retry_deactivation(find_config_tree_bool(cmd, activation_retry_deactivation_CFG, NULL));

 	init_activation_checks(find_config_tree_bool(cmd, activation_checks_CFG, NULL));
@@ -1283,7 +1298,7 @@ int init_filters(struct cmd_context *cmd, unsigned load_persistent_cache)
 		lvm_stat_ctim(&ts, &st);
 		cts = config_file_timestamp(cmd->cft);
 		if (timespeccmp(&ts, &cts, >) &&
-		    !persistent_filter_load(cmd->filter, NULL))
+		    !persistent_filter_load(cmd->mem, cmd->filter, NULL))
 			log_verbose("Failed to load existing device cache from %s",
 				    dev_cache);
 	}
@@ -1499,11 +1514,6 @@ static int _init_segtypes(struct cmd_context *cmd)
 		dm_list_add(&cmd->segtypes, &segtype->list);
 	}

-#ifdef REPLICATOR_INTERNAL
-	if (!init_replicator_segtype(cmd, &seglib))
-		return 0;
-#endif
-
 #ifdef RAID_INTERNAL
 	if (!init_raid_segtypes(cmd, &seglib))
 		return 0;
@@ -1874,9 +1884,15 @@ struct cmd_context *create_toolcontext(unsigned is_long_lived,

 #ifndef VALGRIND_POOL
 	/* Set in/out stream buffering before glibc */
-	if (set_buffering) {
+	if (set_buffering
+#ifdef SYS_gettid
+	    /* Do not change stream buffering in threaded programs. */
+	    /* On Linux, gettid() is implemented only via syscall. */
+	    && (syscall(SYS_gettid) == getpid())
+#endif
+	   ) {
 		/* Allocate 2 buffers */
-		if (!(cmd->linebuffer = dm_malloc(2 * linebuffer_size))) {
+		if (!(cmd->linebuffer = dm_malloc(2 * _linebuffer_size))) {
 			log_error("Failed to allocate line buffer.");
 			goto out;
 		}
@@ -1887,7 +1903,7 @@ struct cmd_context *create_toolcontext(unsigned is_long_lived,
 		    (flags & O_ACCMODE) != O_WRONLY) {
 			if (!reopen_standard_stream(&stdin, "r"))
 				goto_out;
-			if (setvbuf(stdin, cmd->linebuffer, _IOLBF, linebuffer_size)) {
+			if (setvbuf(stdin, cmd->linebuffer, _IOLBF, _linebuffer_size)) {
 				log_sys_error("setvbuf", "");
 				goto out;
 			}
@@ -1898,14 +1914,14 @@ struct cmd_context *create_toolcontext(unsigned is_long_lived,
 		    (flags & O_ACCMODE) != O_RDONLY) {
 			if (!reopen_standard_stream(&stdout, "w"))
 				goto_out;
-			if (setvbuf(stdout, cmd->linebuffer + linebuffer_size,
-				     _IOLBF, linebuffer_size)) {
+			if (setvbuf(stdout, cmd->linebuffer + _linebuffer_size,
+				     _IOLBF, _linebuffer_size)) {
 				log_sys_error("setvbuf", "");
 				goto out;
 			}
 		}
 		/* Buffers are used for lines without '\n' */
-	} else
+	} else if (!set_buffering)
 		/* Without buffering, must not use stdin/stdout */
 		init_silent(1);
 #endif
@@ -2009,7 +2025,6 @@ out:
 		cmd = NULL;
 	}

-
 	return cmd;
 }

@@ -2140,6 +2155,8 @@ int refresh_toolcontext(struct cmd_context *cmd)

 	cmd->lib_dir = NULL;

+	label_init();
+
 	if (!_init_lvm_conf(cmd))
 		return_0;

@@ -2227,7 +2244,7 @@ void destroy_toolcontext(struct cmd_context *cmd)
 	int flags;

 	if (cmd->dump_filter && cmd->filter && cmd->filter->dump &&
-	    !cmd->filter->dump(cmd->filter, 1))
+	    !cmd->filter->dump(cmd->filter, cmd->mem, 1))
 		stack;

 	archive_exit(cmd);
--- a/lib/commands/toolcontext.h
+++ b/lib/commands/toolcontext.h
@@ -160,9 +160,11 @@ struct cmd_context {
 	unsigned lockd_vg_rescan:1;
 	unsigned lockd_vg_default_sh:1;
 	unsigned lockd_vg_enforce_sh:1;
+	unsigned lockd_lv_sh:1;
 	unsigned vg_notify:1;
 	unsigned lv_notify:1;
 	unsigned pv_notify:1;
+	unsigned use_aio:1;

 	/*
 	 * Filtering.
--- a/lib/config/config.c
+++ b/lib/config/config.c
@@ -1,6 +1,6 @@
 /*
 * Copyright (C) 2001-2004 Sistina Software, Inc. All rights reserved.
- * Copyright (C) 2004-2011 Red Hat, Inc. All rights reserved.
+ * Copyright (C) 2004-2018 Red Hat, Inc. All rights reserved.
 *
 * This file is part of LVM2.
 *
@@ -65,11 +65,11 @@ struct config_source {
 * Map each ID to respective definition of the configuration item.
 */
 static struct cfg_def_item _cfg_def_items[CFG_COUNT + 1] = {
-#define cfg_section(id, name, parent, flags, since_version, deprecated_since_version, deprecation_comment, comment) {id, parent, name, CFG_TYPE_SECTION, {0}, flags, since_version, {0}, deprecated_since_version, deprecation_comment, comment},
-#define cfg(id, name, parent, flags, type, default_value, since_version, unconfigured_value, deprecated_since_version, deprecation_comment, comment) {id, parent, name, type, {.v_##type = default_value}, flags, since_version, {.v_UNCONFIGURED = unconfigured_value}, deprecated_since_version, deprecation_comment, comment},
-#define cfg_runtime(id, name, parent, flags, type, since_version, deprecated_since_version, deprecation_comment, comment) {id, parent, name, type, {.fn_##type = get_default_##id}, flags | CFG_DEFAULT_RUN_TIME, since_version, {.fn_UNCONFIGURED = get_default_unconfigured_##id}, deprecated_since_version, deprecation_comment, comment},
-#define cfg_array(id, name, parent, flags, types, default_value, since_version, unconfigured_value, deprecated_since_version, deprecation_comment, comment) {id, parent, name, CFG_TYPE_ARRAY | types, {.v_CFG_TYPE_STRING = default_value}, flags, since_version, {.v_UNCONFIGURED = unconfigured_value}, deprecated_since_version, deprecation_comment, comment},
-#define cfg_array_runtime(id, name, parent, flags, types, since_version, deprecated_since_version, deprecation_comment, comment) {id, parent, name, CFG_TYPE_ARRAY | types, {.fn_CFG_TYPE_STRING = get_default_##id}, flags | CFG_DEFAULT_RUN_TIME, since_version, {.fn_UNCONFIGURED = get_default_unconfigured_##id}, deprecated_since_version, deprecation_comment, comment},
+#define cfg_section(id, name, parent, flags, since_version, deprecated_since_version, deprecation_comment, comment) {id, parent, name, CFG_TYPE_SECTION, {0}, (flags), since_version, {0}, deprecated_since_version, deprecation_comment, comment},
+#define cfg(id, name, parent, flags, type, default_value, since_version, unconfigured_value, deprecated_since_version, deprecation_comment, comment) {id, parent, name, type, {.v_##type = (default_value)}, (flags), since_version, {.v_UNCONFIGURED = (unconfigured_value)}, deprecated_since_version, deprecation_comment, comment},
+#define cfg_runtime(id, name, parent, flags, type, since_version, deprecated_since_version, deprecation_comment, comment) {id, parent, name, type, {.fn_##type = get_default_##id}, (flags) | CFG_DEFAULT_RUN_TIME, since_version, {.fn_UNCONFIGURED = get_default_unconfigured_##id}, deprecated_since_version, (deprecation_comment), comment},
+#define cfg_array(id, name, parent, flags, types, default_value, since_version, unconfigured_value, deprecated_since_version, deprecation_comment, comment) {id, parent, name, CFG_TYPE_ARRAY | (types), {.v_CFG_TYPE_STRING = (default_value)}, (flags), (since_version), {.v_UNCONFIGURED = (unconfigured_value)}, deprecated_since_version, deprecation_comment, comment},
+#define cfg_array_runtime(id, name, parent, flags, types, since_version, deprecated_since_version, deprecation_comment, comment) {id, parent, name, CFG_TYPE_ARRAY | (types), {.fn_CFG_TYPE_STRING = get_default_##id}, (flags) | CFG_DEFAULT_RUN_TIME, (since_version), {.fn_UNCONFIGURED = get_default_unconfigured_##id}, deprecated_since_version, deprecation_comment, comment},
 #include "config_settings.h"
 #undef cfg_section
 #undef cfg
@@ -279,7 +279,7 @@ struct dm_config_tree *config_file_open_and_read(const char *config_file,
 	}

 	log_very_verbose("Loading config file: %s", config_file);
-	if (!config_file_read(cft)) {
+	if (!config_file_read(cmd->mem, cft)) {
 		log_error("Failed to load config file %s", config_file);
 		goto bad;
 	}
@@ -481,39 +481,110 @@ int override_config_tree_from_profile(struct cmd_context *cmd,

 	if (profile->source == CONFIG_PROFILE_COMMAND)
 		return _override_config_tree_from_command_profile(cmd, profile);
-	else if (profile->source == CONFIG_PROFILE_METADATA)
+
+	if (profile->source == CONFIG_PROFILE_METADATA)
 		return _override_config_tree_from_metadata_profile(cmd, profile);

 	log_error(INTERNAL_ERROR "override_config_tree_from_profile: incorrect profile source type");
 	return 0;
 }

+struct process_config_file_params {
+	struct dm_config_tree *cft;
+	struct device *dev;
+	off_t offset;
+	size_t size;
+	off_t offset2;
+	size_t size2;
+	checksum_fn_t checksum_fn;
+	uint32_t checksum;
+	int checksum_only;
+	int no_dup_node_check;
+	lvm_callback_fn_t config_file_read_fd_callback;
+	void *config_file_read_fd_context;
+	int ret;
+};
+
+static void _process_config_file_buffer(int failed, unsigned ioflags, void *context, const void *data)
+{
+	struct process_config_file_params *pcfp = context;
+	const char *fb = data, *fe;
+
+	if (failed) {
+		pcfp->ret = 0;
+		goto_out;
+	}
+
+	if (pcfp->checksum_fn && pcfp->checksum !=
+	    (pcfp->checksum_fn(pcfp->checksum_fn(INITIAL_CRC, (const uint8_t *)fb, pcfp->size),
+			 (const uint8_t *)(fb + pcfp->size), pcfp->size2))) {
+		log_error("%s: Checksum error at offset %" PRIu64, dev_name(pcfp->dev), (uint64_t) pcfp->offset);
+		pcfp->ret = 0;
+		goto out;
+	}
+
+	if (!pcfp->checksum_only) {
+		fe = fb + pcfp->size + pcfp->size2;
+		if (pcfp->no_dup_node_check) {
+			if (!dm_config_parse_without_dup_node_check(pcfp->cft, fb, fe))
+				pcfp->ret = 0;
+		} else if (!dm_config_parse(pcfp->cft, fb, fe))
+			pcfp->ret = 0;
+	}
+
+out:
+	if (pcfp->config_file_read_fd_callback)
+		pcfp->config_file_read_fd_callback(!pcfp->ret, ioflags, pcfp->config_file_read_fd_context, NULL);
+}
+
 /*
 * When checksum_only is set, the checksum of buffer is only matched
 * and function avoids parsing of mda into config tree which
 * remains unmodified and should not be used.
 */
-int config_file_read_fd(struct dm_config_tree *cft, struct device *dev,
+int config_file_read_fd(struct dm_pool *mem, struct dm_config_tree *cft, struct device *dev, dev_io_reason_t reason,
 			off_t offset, size_t size, off_t offset2, size_t size2,
 			checksum_fn_t checksum_fn, uint32_t checksum,
-			int checksum_only, int no_dup_node_check)
+			int checksum_only, int no_dup_node_check, unsigned ioflags,
+			lvm_callback_fn_t config_file_read_fd_callback, void *config_file_read_fd_context)
 {
-	char *fb, *fe;
+	char *fb;
 	int r = 0;
-	int use_mmap = 1;
 	off_t mmap_offset = 0;
-	char *buf = NULL;
+	int use_mmap = 1;
+	const char *buf = NULL;
+	unsigned circular = size2 ? 1 : 0;	/* Wrapped around end of disk metadata buffer? */
 	struct config_source *cs = dm_config_get_custom(cft);
+	struct process_config_file_params *pcfp;

 	if (!_is_file_based_config_source(cs->type)) {
 		log_error(INTERNAL_ERROR "config_file_read_fd: expected file, special file "
 					 "or profile config source, found %s config source.",
 					 _config_source_names[cs->type]);
-		return 0;
+		goto bad;
 	}

+	if (!(pcfp = dm_pool_zalloc(mem, sizeof(*pcfp)))) {
+		log_debug("config_file_read_fd: process_config_file_params struct allocation failed");
+		goto bad;
+	}
+
+	pcfp->cft = cft;
+	pcfp->dev = dev;
+	pcfp->offset = offset;
+	pcfp->size = size;
+	pcfp->offset2 = offset2;
+	pcfp->size2 = size2;
+	pcfp->checksum_fn = checksum_fn;
+	pcfp->checksum = checksum;
+	pcfp->checksum_only = checksum_only;
+	pcfp->no_dup_node_check = no_dup_node_check;
+	pcfp->config_file_read_fd_callback = config_file_read_fd_callback;
+	pcfp->config_file_read_fd_context = config_file_read_fd_context;
+	pcfp->ret = 1;
+
 	/* Only use mmap with regular files */
-	if (!(dev->flags & DEV_REGULAR) || size2)
+	if (!(dev->flags & DEV_REGULAR) || circular)
 		use_mmap = 0;

 	if (use_mmap) {
@@ -523,56 +594,40 @@ int config_file_read_fd(struct dm_config_tree *cft, struct device *dev,
 			  MAP_PRIVATE, dev_fd(dev), offset - mmap_offset);
 		if (fb == (caddr_t) (-1)) {
 			log_sys_error("mmap", dev_name(dev));
-			goto out;
+			goto bad;
 		}
-		fb = fb + mmap_offset;
-	} else {
-		if (!(buf = dm_malloc(size + size2))) {
-			log_error("Failed to allocate circular buffer.");
-			return 0;
-		}
-		if (!dev_read_circular(dev, (uint64_t) offset, size,
-				       (uint64_t) offset2, size2, buf)) {
-			goto out;
-		}
-		fb = buf;
-	}
-
-	if (checksum_fn && checksum !=
-	    (checksum_fn(checksum_fn(INITIAL_CRC, (const uint8_t *)fb, size),
-			 (const uint8_t *)(fb + size), size2))) {
-		log_error("%s: Checksum error", dev_name(dev));
-		goto out;
-	}
-
-	if (!checksum_only) {
-		fe = fb + size + size2;
-		if (no_dup_node_check) {
-			if (!dm_config_parse_without_dup_node_check(cft, fb, fe))
-				goto_out;
-		} else {
-			if (!dm_config_parse(cft, fb, fe))
-				goto_out;
-		}
-	}
-
-	r = 1;
-
-      out:
-	if (!use_mmap)
-		dm_free(buf);
-	else {
+		_process_config_file_buffer(0, ioflags, pcfp, fb + mmap_offset);
+		r = pcfp->ret;
 		/* unmap the file */
-		if (munmap(fb - mmap_offset, size + mmap_offset)) {
+		if (munmap(fb, size + mmap_offset)) {
 			log_sys_error("munmap", dev_name(dev));
 			r = 0;
 		}
+	} else {
+		if (circular) {
+			if (!(buf = dev_read_circular(dev, (uint64_t) offset, size, (uint64_t) offset2, size2, reason)))
+				goto_out;
+			_process_config_file_buffer(0, ioflags, pcfp, buf);
+			dm_free((void *)buf);
+		} else {
+			dev_read_callback(dev, (uint64_t) offset, size, reason, ioflags, _process_config_file_buffer, pcfp);
+			if (config_file_read_fd_callback)
+				return 1;
+		}
+		r = pcfp->ret;
 	}

+out:
 	return r;
+
+bad:
+	if (config_file_read_fd_callback)
+		config_file_read_fd_callback(1, ioflags, config_file_read_fd_context, NULL);
+
+	return 0;
 }

-int config_file_read(struct dm_config_tree *cft)
+int config_file_read(struct dm_pool *mem, struct dm_config_tree *cft)
 {
 	const char *filename = NULL;
 	struct config_source *cs = dm_config_get_custom(cft);
@@ -600,8 +655,8 @@ int config_file_read(struct dm_config_tree *cft)
 		}
 	}

-	r = config_file_read_fd(cft, cf->dev, 0, (size_t) info.st_size, 0, 0,
-				(checksum_fn_t) NULL, 0, 0, 0);
+	r = config_file_read_fd(mem, cft, cf->dev, DEV_IO_MDA_CONTENT, 0, (size_t) info.st_size, 0, 0,
+				(checksum_fn_t) NULL, 0, 0, 0, 0, NULL, NULL);

 	if (!cf->keep_open) {
 		if (!dev_close(cf->dev))
@@ -619,9 +674,9 @@ struct timespec config_file_timestamp(struct dm_config_tree *cft)
 }

 #define cfg_def_get_item_p(id) (&_cfg_def_items[id])
-#define cfg_def_get_default_unconfigured_value_hint(cmd,item) ((item->flags & CFG_DEFAULT_RUN_TIME) ? item->default_unconfigured_value.fn_UNCONFIGURED(cmd) : item->default_unconfigured_value.v_UNCONFIGURED)
-#define cfg_def_get_default_value_hint(cmd,item,type,profile) ((item->flags & CFG_DEFAULT_RUN_TIME) ? item->default_value.fn_##type(cmd,profile) : item->default_value.v_##type)
-#define cfg_def_get_default_value(cmd,item,type,profile) (item->flags & CFG_DEFAULT_UNDEFINED ? 0 : cfg_def_get_default_value_hint(cmd,item,type,profile))
+#define cfg_def_get_default_unconfigured_value_hint(cmd,item) (((item)->flags & CFG_DEFAULT_RUN_TIME) ? (item)->default_unconfigured_value.fn_UNCONFIGURED(cmd) : (item)->default_unconfigured_value.v_UNCONFIGURED)
+#define cfg_def_get_default_value_hint(cmd,item,type,profile) (((item)->flags & CFG_DEFAULT_RUN_TIME) ? (item)->default_value.fn_##type(cmd,profile) : (item)->default_value.v_##type)
+#define cfg_def_get_default_value(cmd,item,type,profile) ((item)->flags & CFG_DEFAULT_UNDEFINED ? 0 : cfg_def_get_default_value_hint(cmd,item,type,profile))

 static int _cfg_def_make_path(char *buf, size_t buf_size, int id, cfg_def_item_t *item, int xlate)
 {
@@ -742,14 +797,16 @@ static struct dm_config_value *_get_def_array_values(struct cmd_context *cmd,
 		switch (toupper(token[0])) {
 			case 'I':
 			case 'B':
+				errno = 0;
 				v->v.i = strtoll(token + 1, &r, 10);
-				if (*r)
+				if (errno || *r)
 					goto bad;
 				v->type = DM_CFG_INT;
 				break;
 			case 'F':
+				errno = 0;
 				v->v.f = strtod(token + 1, &r);
-				if (*r)
+				if (errno || *r)
 					goto bad;
 				v->type = DM_CFG_FLOAT;
 				break;
@@ -1877,6 +1934,12 @@ int config_write(struct dm_config_tree *cft,
 	}

 	log_verbose("Dumping configuration to %s", file);
+
+	if (tree_spec->withgeneralpreamble)
+		fprintf(baton.fp, CFG_PREAMBLE_GENERAL);
+	if (tree_spec->withlocalpreamble)
+		fprintf(baton.fp, CFG_PREAMBLE_LOCAL);
+
 	if (!argc) {
 		if (!dm_config_write_node_out(cft->root, &_out_spec, &baton)) {
 			log_error("Failure while writing to %s", file);
--- a/lib/config/config.h
+++ b/lib/config/config.h
@@ -17,12 +17,12 @@
 #define _LVM_CONFIG_H

 #include "libdevmapper.h"
+#include "device.h"

 /* 16 bits: 3 bits for major, 4 bits for minor, 9 bits for patchlevel */
 /* FIXME Max LVM version supported: 7.15.511. Extend bits when needed. */
 #define vsn(major, minor, patchlevel) (major << 13 | minor << 9 | patchlevel)

-struct device;
 struct cmd_context;

 typedef enum {
@@ -141,6 +141,7 @@ typedef struct cfg_def_item {
 	uint16_t deprecated_since_version;				/* version since this item is deprecated */
 	const char *deprecation_comment;				/* comment about reasons for deprecation and settings that supersede this one */
 	const char *comment;						/* comment */
+	const char *file_premable;					/* comment text to use at the start of the file */
 } cfg_def_item_t;

 /* configuration definition tree types */
@@ -173,6 +174,8 @@ struct config_def_tree_spec {
 	unsigned withversions:1;		/* include versions */
 	unsigned withspaces:1;			/* add more spaces in output for better readability */
 	unsigned unconfigured:1;		/* use unconfigured path strings */
+	unsigned withgeneralpreamble:1;		/* include preamble for a general config file */
+	unsigned withlocalpreamble:1;		/* include preamble for a local config file */
 	uint8_t *check_status;			/* status of last tree check (currently needed for CFG_DEF_TREE_MISSING only) */
 };

@@ -236,11 +239,13 @@ config_source_t config_get_source_type(struct dm_config_tree *cft);
 typedef uint32_t (*checksum_fn_t) (uint32_t initial, const uint8_t *buf, uint32_t size);

 struct dm_config_tree *config_open(config_source_t source, const char *filename, int keep_open);
-int config_file_read_fd(struct dm_config_tree *cft, struct device *dev,
+int config_file_read_fd(struct dm_pool *mem, struct dm_config_tree *cft, struct device *dev, dev_io_reason_t reason,
 			off_t offset, size_t size, off_t offset2, size_t size2,
 			checksum_fn_t checksum_fn, uint32_t checksum,
-			int skip_parse, int no_dup_node_check);
-int config_file_read(struct dm_config_tree *cft);
+			int checksum_only, int no_dup_node_check, unsigned ioflags,
+			lvm_callback_fn_t config_file_read_fd_callback, void *config_file_read_fd_context);
+
+int config_file_read(struct dm_pool *mem, struct dm_config_tree *cft);
 struct dm_config_tree *config_file_open_and_read(const char *config_file, config_source_t source,
 						 struct cmd_context *cmd);
 int config_write(struct dm_config_tree *cft, struct config_def_tree_spec *tree_spec,
--- a/lib/config/config_settings.h
+++ b/lib/config/config_settings.h
@@ -121,6 +121,30 @@

 cfg_section(root_CFG_SECTION, "(root)", root_CFG_SECTION, 0, vsn(0, 0, 0), 0, NULL, NULL)

+#define CFG_PREAMBLE_GENERAL \
+	"# This is an example configuration file for the LVM2 system.\n" \
+	"# It contains the default settings that would be used if there was no\n" \
+	"# @DEFAULT_SYS_DIR@/lvm.conf file.\n" \
+	"#\n" \
+	"# Refer to 'man lvm.conf' for further information including the file layout.\n" \
+	"#\n" \
+	"# Refer to 'man lvm.conf' for information about how settings configured in\n" \
+	"# this file are combined with built-in values and command line options to\n" \
+	"# arrive at the final values used by LVM.\n" \
+	"#\n" \
+	"# Refer to 'man lvmconfig' for information about displaying the built-in\n" \
+	"# and configured values used by LVM.\n" \
+	"#\n" \
+	"# If a default value is set in this file (not commented out), then a\n" \
+	"# new version of LVM using this file will continue using that value,\n" \
+	"# even if the new version of LVM changes the built-in default value.\n" \
+	"#\n" \
+	"# To put this file in a different directory and override @DEFAULT_SYS_DIR@ set\n" \
+	"# the environment variable LVM_SYSTEM_DIR before running the tools.\n" \
+	"#\n" \
+	"# N.B. Take care that each setting only appears once if uncommenting\n" \
+	"# example settings in this file.\n\n"
+
 cfg_section(config_CFG_SECTION, "config", root_CFG_SECTION, 0, vsn(2, 2, 99), 0, NULL,
 	"How LVM configuration settings are handled.\n")

@@ -161,6 +185,26 @@ cfg_section(tags_CFG_SECTION, "tags", root_CFG_SECTION, CFG_DEFAULT_COMMENTED, v
 cfg_section(local_CFG_SECTION, "local", root_CFG_SECTION, 0, vsn(2, 2, 117), 0, NULL,
 	"LVM settings that are specific to the local host.\n")

+#define CFG_PREAMBLE_LOCAL \
+	"# This is a local configuration file template for the LVM2 system\n" \
+	"# which should be installed as @DEFAULT_SYS_DIR@/lvmlocal.conf .\n" \
+	"#\n" \
+	"# Refer to 'man lvm.conf' for information about the file layout.\n" \
+	"#\n" \
+	"# To put this file in a different directory and override\n" \
+	"# @DEFAULT_SYS_DIR@ set the environment variable LVM_SYSTEM_DIR before\n" \
+	"# running the tools.\n" \
+	"#\n" \
+	"# The lvmlocal.conf file is normally expected to contain only the\n" \
+	"# \"local\" section which contains settings that should not be shared or\n" \
+	"# repeated among different hosts.  (But if other sections are present,\n" \
+	"# they *will* get processed.  Settings in this file override equivalent\n" \
+	"# ones in lvm.conf and are in turn overridden by ones in any enabled\n" \
+	"# lvm_<tag>.conf files.)\n" \
+	"#\n" \
+	"# Please take care that each setting only appears once if uncommenting\n" \
+	"# example settings in this file and never copy this file between hosts.\n\n"
+
 cfg(config_checks_CFG, "checks", config_CFG_SECTION, 0, CFG_TYPE_BOOL, 1, vsn(2, 2, 99), NULL, 0, NULL,
 	"If enabled, any LVM configuration mismatch is reported.\n"
 	"This implies checking that the configuration key is understood by\n"
@@ -182,6 +226,16 @@ cfg(devices_dir_CFG, "dir", devices_CFG_SECTION, CFG_ADVANCED, CFG_TYPE_STRING,
 cfg_array(devices_scan_CFG, "scan", devices_CFG_SECTION, CFG_ADVANCED, CFG_TYPE_STRING, "#S/dev", vsn(1, 0, 0), NULL, 0, NULL,
 	"Directories containing device nodes to use with LVM.\n")

+cfg(devices_use_aio_CFG, "use_aio", devices_CFG_SECTION, CFG_DEFAULT_COMMENTED, CFG_TYPE_BOOL, DEFAULT_USE_AIO, vsn(2, 2, 178), NULL, 0, NULL,
+	"Use Linux asynchronous I/O for parallel device access where possible.\n")
+
+cfg(devices_aio_max_CFG, "aio_max", devices_CFG_SECTION, CFG_DEFAULT_COMMENTED, CFG_TYPE_INT, DEFAULT_AIO_MAX, vsn(2, 2, 178), NULL, 0, NULL,
+	"Maximum number of asynchronous I/Os to issue concurrently.\n")
+
+cfg(devices_aio_memory_CFG, "aio_memory", devices_CFG_SECTION, CFG_DEFAULT_COMMENTED, CFG_TYPE_INT, DEFAULT_AIO_MEMORY, vsn(2, 2, 178), NULL, 0, NULL,
+	"Approximate maximum total amount of memory (in MB) used\n"
+	"for asynchronous I/O buffers.\n")
+
 cfg_array(devices_loopfiles_CFG, "loopfiles", devices_CFG_SECTION, CFG_DEFAULT_UNDEFINED | CFG_UNSUPPORTED, CFG_TYPE_STRING, NULL, vsn(1, 2, 0), NULL, 0, NULL, NULL)

 cfg(devices_obtain_device_list_from_udev_CFG, "obtain_device_list_from_udev", devices_CFG_SECTION, 0, CFG_TYPE_BOOL, DEFAULT_OBTAIN_DEVICE_LIST_FROM_UDEV, vsn(2, 2, 85), NULL, 0, NULL,
@@ -236,8 +290,8 @@ cfg_array(devices_filter_CFG, "filter", devices_CFG_SECTION, CFG_DEFAULT_COMMENT
 	"device path names. Each regex is delimited by a vertical bar '|'\n"
 	"(or any character) and is preceded by 'a' to accept the path, or\n"
 	"by 'r' to reject the path. The first regex in the list to match the\n"
-	"path is used, producing the 'a' or 'r' result for the device.\n"
-	"When multiple path names exist for a block device, if any path name\n"
+	"path is used, producing the 'a' or 'r' result for that path.\n"
+	"If any of multiple existing path names for a block device\n"
 	"matches an 'a' pattern before an 'r' pattern, then the device is\n"
 	"accepted. If all the path names match an 'r' pattern first, then the\n"
 	"device is rejected. Unmatching path names do not affect the accept\n"
@@ -469,8 +523,9 @@ cfg(allocation_mirror_logs_require_separate_pvs_CFG, "mirror_logs_require_separa

 cfg(allocation_raid_stripe_all_devices_CFG, "raid_stripe_all_devices", allocation_CFG_SECTION, CFG_DEFAULT_COMMENTED, CFG_TYPE_BOOL, DEFAULT_ALLOCATION_STRIPE_ALL_DEVICES, vsn(2, 2, 162), NULL, 0, NULL,
 	"Stripe across all PVs when RAID stripes are not specified.\n"
-	"If enabled, all PVs in the VG or on the command line are used for raid0/4/5/6/10\n"
-	"when the command does not specify the number of stripes to use.\n"
+	"If enabled, all PVs in the VG or on the command line are used for\n"
+	"raid0/4/5/6/10 when the command does not specify the number of\n"
+	"stripes to use.\n"
 	"This was the default behaviour until release 2.02.162.\n")

 cfg(allocation_cache_pool_metadata_require_separate_pvs_CFG, "cache_pool_metadata_require_separate_pvs", allocation_CFG_SECTION, CFG_PROFILABLE | CFG_PROFILABLE_METADATA, CFG_TYPE_BOOL, DEFAULT_CACHE_POOL_METADATA_REQUIRE_SEPARATE_PVS, vsn(2, 2, 106), NULL, 0, NULL,
@@ -660,11 +715,11 @@ cfg(log_activation_CFG, "activation", log_CFG_SECTION, 0, CFG_TYPE_BOOL, 0, vsn(

 cfg(log_activate_file_CFG, "activate_file", log_CFG_SECTION, CFG_DEFAULT_UNDEFINED | CFG_UNSUPPORTED, CFG_TYPE_STRING, NULL, vsn(1, 0, 0), NULL, 0, NULL, NULL)

-cfg_array(log_debug_classes_CFG, "debug_classes", log_CFG_SECTION, CFG_ALLOW_EMPTY, CFG_TYPE_STRING, "#Smemory#Sdevices#Sactivation#Sallocation#Slvmetad#Smetadata#Scache#Slocking#Slvmpolld#Sdbus", vsn(2, 2, 99), NULL, 0, NULL,
+cfg_array(log_debug_classes_CFG, "debug_classes", log_CFG_SECTION, CFG_ALLOW_EMPTY, CFG_TYPE_STRING, "#Smemory#Sdevices#Sio#Sactivation#Sallocation#Slvmetad#Smetadata#Scache#Slocking#Slvmpolld#Sdbus", vsn(2, 2, 99), NULL, 0, NULL,
 	"Select log messages by class.\n"
 	"Some debugging messages are assigned to a class and only appear in\n"
 	"debug output if the class is listed here. Classes currently\n"
-	"available: memory, devices, activation, allocation, lvmetad,\n"
+	"available: memory, devices, io, activation, allocation, lvmetad,\n"
 	"metadata, cache, locking, lvmpolld. Use \"all\" to see everything.\n")

 cfg(backup_backup_CFG, "backup", backup_CFG_SECTION, 0, CFG_TYPE_BOOL, DEFAULT_BACKUP_ENABLED, vsn(1, 0, 0), NULL, 0, NULL,
@@ -934,7 +989,7 @@ cfg(global_use_lvmetad_CFG, "use_lvmetad", global_CFG_SECTION, 0, CFG_TYPE_BOOL,
 	"devices/global_filter.\n")

 cfg(global_lvmetad_update_wait_time_CFG, "lvmetad_update_wait_time", global_CFG_SECTION, CFG_DEFAULT_COMMENTED, CFG_TYPE_INT, DEFAULT_LVMETAD_UPDATE_WAIT_TIME, vsn(2, 2, 151), NULL, 0, NULL,
-	"The number of seconds a command will wait for lvmetad update to finish.\n"
+	"Number of seconds a command will wait for lvmetad update to finish.\n"
 	"After waiting for this period, a command will not use lvmetad, and\n"
 	"will revert to disk scanning.\n")

@@ -1035,6 +1090,10 @@ cfg_array(global_cache_check_options_CFG, "cache_check_options", global_CFG_SECT
 cfg_array(global_cache_repair_options_CFG, "cache_repair_options", global_CFG_SECTION, CFG_ALLOW_EMPTY | CFG_DEFAULT_COMMENTED, CFG_TYPE_STRING, DEFAULT_CACHE_REPAIR_OPTIONS_CONFIG, vsn(2, 2, 108), NULL, 0, NULL,
 	"List of options passed to the cache_repair command.\n")

+cfg(global_fsadm_executable_CFG, "fsadm_executable", global_CFG_SECTION, CFG_DEFAULT_COMMENTED, CFG_TYPE_STRING, DEFAULT_FSADM_PATH, vsn(2, 2, 170), "@FSADM_PATH@", 0, NULL,
+	"The full path to the fsadm command.\n"
+	"LVM uses this command to help with lvresize -r operations.\n")
+
 cfg(global_system_id_source_CFG, "system_id_source", global_CFG_SECTION, 0, CFG_TYPE_STRING, DEFAULT_SYSTEM_ID_SOURCE, vsn(2, 2, 117), NULL, 0, NULL,
 	"The method LVM uses to set the local system ID.\n"
 	"Volume Groups can also be given a system ID (by vgcreate, vgchange,\n"
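Note on the devices/filter wording changed above: the decision is made per path first, then per device. The following is a minimal sketch of that rule, not the LVM implementation. The names `filter_rule`, `first_match` and `device_accepted` are hypothetical, POSIX `regcomp`/`regexec` stand in for LVM's own matcher, and the sketch assumes that a device with no matching path at all is accepted.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* One filter entry: 'a' (accept) or 'r' (reject) plus a POSIX extended regex. */
struct filter_rule {
	char action;
	const char *pattern;
};

/* Return the action of the first rule that matches path, or 0 if none match. */
char first_match(const struct filter_rule *rules, size_t nrules, const char *path)
{
	regex_t re;
	size_t i;
	int matched;

	for (i = 0; i < nrules; i++) {
		if (regcomp(&re, rules[i].pattern, REG_EXTENDED | REG_NOSUB))
			continue;	/* Sketch only: skip patterns that fail to compile */
		matched = !regexec(&re, path, 0, NULL, 0);
		regfree(&re);
		if (matched)
			return rules[i].action;
	}

	return 0;
}

/*
 * Accept the device if any of its paths hits an 'a' rule first.
 * Reject it if no path is accepted and at least one hits an 'r' rule first.
 * Assumption: a device whose paths match nothing is accepted.
 */
int device_accepted(const struct filter_rule *rules, size_t nrules,
		    const char *const *paths, size_t npaths)
{
	size_t i;
	int rejected = 0;

	assert(rules && paths);

	for (i = 0; i < npaths; i++) {
		switch (first_match(rules, nrules, paths[i])) {
		case 'a':
			return 1;
		case 'r':
			rejected = 1;
			break;
		default:
			break;	/* Unmatched paths do not affect the decision */
		}
	}

	return !rejected;
}
```

With the rules `a|^/dev/sda|` followed by `r|.*|`, a device known as both /dev/disk/by-id/x and /dev/sda1 is accepted, because one of its paths reaches the 'a' rule before any 'r' rule.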
--- a/lib/config/defaults.h
+++ b/lib/config/defaults.h
@@ -32,6 +32,9 @@
 #define DEFAULT_SYSTEM_ID_SOURCE "none"
 #define DEFAULT_OBTAIN_DEVICE_LIST_FROM_UDEV 1
 #define DEFAULT_EXTERNAL_DEVICE_INFO_SOURCE "none"
+#define DEFAULT_USE_AIO 1
+#define DEFAULT_AIO_MAX 128
+#define DEFAULT_AIO_MEMORY 10
 #define DEFAULT_SYSFS_SCAN 1
 #define DEFAULT_MD_COMPONENT_DETECTION 1
 #define DEFAULT_FW_RAID_COMPONENT_DETECTION 0
@@ -104,9 +107,9 @@
 #define DEFAULT_THIN_REPAIR_OPTION1 ""
 #define DEFAULT_THIN_REPAIR_OPTIONS_CONFIG "#S" DEFAULT_THIN_REPAIR_OPTION1
 #define DEFAULT_THIN_POOL_METADATA_REQUIRE_SEPARATE_PVS 0
-#define DEFAULT_THIN_POOL_MAX_METADATA_SIZE (16 * 1024 * 1024)  /* KB */
+#define DEFAULT_THIN_POOL_MAX_METADATA_SIZE (DM_THIN_MAX_METADATA_SIZE / 2)  /* KB */
 #define DEFAULT_THIN_POOL_MIN_METADATA_SIZE 2048  /* KB */
-#define DEFAULT_THIN_POOL_OPTIMAL_SIZE     (128 * 1024 * 1024)	/* KB */
+#define DEFAULT_THIN_POOL_OPTIMAL_METADATA_SIZE (128 * 1024) /* KB */
 #define DEFAULT_THIN_POOL_CHUNK_SIZE_POLICY "generic"
 #define DEFAULT_THIN_POOL_CHUNK_SIZE	    64	  /* KB */
 #define DEFAULT_THIN_POOL_CHUNK_SIZE_PERFORMANCE 512 /* KB */
@@ -135,6 +138,8 @@
 #define DEFAULT_CACHE_METADATA_FORMAT CACHE_METADATA_FORMAT_UNSELECTED /* Autodetect */
 #define DEFAULT_CACHE_MODE "writethrough"

+#define DEFAULT_FSADM_PATH FSADM_PATH
+
 #define DEFAULT_UMASK 0077

 #define DEFAULT_FORMAT "lvm2"
@@ -200,7 +205,7 @@
 #define DEFAULT_ACTIVATION_MODE "degraded"
 #define DEFAULT_USE_LINEAR_TARGET 1
 #define DEFAULT_STRIPE_FILLER "error"
-#define DEFAULT_RAID_REGION_SIZE   512	/* KB */
+#define DEFAULT_RAID_REGION_SIZE   2048	/* KB */
 #define DEFAULT_INTERVAL 15

 #define DEFAULT_MAX_HISTORY 100
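Note on DEFAULT_AIO_MAX and DEFAULT_AIO_MEMORY: these two limits appear to act together as a throttle on in-flight asynchronous reads. The following sketch, under that assumption, shows how a count limit and a byte limit can be combined. `struct aio_throttle`, `throttle_init` and `throttle_update` are hypothetical names; the patch itself keeps the counters in file-scope statics in dev-io.c.

```c
#include <assert.h>
#include <stdint.h>

#define DEFAULT_AIO_MAX 128	/* events, value taken from this patch */
#define DEFAULT_AIO_MEMORY 10	/* MB, value taken from this patch */

/* Hypothetical container for the limits and the running totals. */
struct aio_throttle {
	int max_events;
	int64_t max_bytes;
	int in_flight;
	int64_t bytes_in_flight;
};

void throttle_init(struct aio_throttle *t, int max_events, int memory_mb)
{
	assert(t && max_events >= 0 && memory_mb >= 0);

	t->max_events = max_events;
	/* Widen before multiplying so that large MB values cannot overflow int */
	t->max_bytes = (int64_t) memory_mb * 1024 * 1024;
	t->in_flight = 0;
	t->bytes_in_flight = 0;
}

/*
 * Apply a delta (positive on submission, negative on completion).
 * Returns 1 if further requests must be queued, 0 if they may be issued.
 */
int throttle_update(struct aio_throttle *t, int nr, int64_t bytes)
{
	t->in_flight += nr;
	t->bytes_in_flight += bytes;

	return t->in_flight >= t->max_events ||
	       t->bytes_in_flight > t->max_bytes;
}
```

The event limit trips when the count reaches the maximum, while the memory limit trips only once the total exceeds it, which mirrors the comparison operators used in the patch.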
--- a/lib/device/dev-cache.c
+++ b/lib/device/dev-cache.c
@@ -320,8 +320,8 @@ static int _compare_paths(const char *path0, const char *path1)
 	/* ASCII comparison */
 	if (strcmp(path0, path1) < 0)
 		return 0;
-	else
-		return 1;
+
+	return 1;
 }

 static int _add_alias(struct device *dev, const char *path)
@@ -706,6 +706,12 @@ static int _insert_dev(const char *path, dev_t d)
 		}
 	}

+	if (dm_hash_lookup(_cache.names, path) == dev) {
+		/* Hash already has matching entry present */
+		log_debug("%s: Path already cached.", path);
+		return 1;
+	}
+
 	if (!(path_copy = dm_pool_strdup(_cache.mem, path))) {
 		log_error("Failed to duplicate path string.");
 		return 0;
@@ -892,10 +898,10 @@ int dev_cache_index_devs(void)
 			if (errno == ENOENT) {
 				sysfs_has_dev_block = 0;
 				return 1;
-			} else {
-				log_sys_error("stat", path);
-				return 0;
 			}
+
+			log_sys_error("stat", path);
+			return 0;
 		}
 	} else if (!sysfs_has_dev_block)
 		return 1;
@@ -933,12 +939,20 @@ static int _insert_udev_dir(struct udev *udev, const char *dir)
 	struct udev_device *device;
 	int r = 1;

-	if (!(udev_enum = udev_enumerate_new(udev)))
-		goto bad;
+	if (!(udev_enum = udev_enumerate_new(udev))) {
+		log_error("Failed to udev_enumerate_new.");
+		return 0;
+	}

-	if (udev_enumerate_add_match_subsystem(udev_enum, "block") ||
-	    udev_enumerate_scan_devices(udev_enum))
-		goto bad;
+	if (udev_enumerate_add_match_subsystem(udev_enum, "block")) {
+		log_error("Failed to udev_enumerate_add_match_subsystem.");
+		goto out;
+	}
+
+	if (udev_enumerate_scan_devices(udev_enum)) {
+		log_error("Failed to udev_enumerate_scan_devices.");
+		goto out;
+	}

 	/*
 	 * Report any missing information as "log_very_verbose" only, do not
@@ -975,13 +989,10 @@ static int _insert_udev_dir(struct udev *udev, const char *dir)
 		udev_device_unref(device);
 	}

+out:
 	udev_enumerate_unref(udev_enum);
-	return r;

-bad:
-	log_error("Failed to enumerate udev device list.");
-	udev_enumerate_unref(udev_enum);
-	return 0;
+	return r;
 }

 static void _insert_dirs(struct dm_list *dirs)
@@ -1236,12 +1247,24 @@ int dev_cache_check_for_open_devices(void)

 int dev_cache_exit(void)
 {
+	struct btree_iter *b;
 	int num_open = 0;

+	dev_async_exit();
+
 	if (_cache.names)
 		if ((num_open = _check_for_open_devices(1)) > 0)
 			log_error(INTERNAL_ERROR "%d device(s) were left open and have been closed.", num_open);

+	if (_cache.devices) {
+		/* FIXME Replace with structured devbuf cache */
+		b = btree_first(_cache.devices);
+		while (b) {
+			devbufs_release(btree_get_data(b));
+			b = btree_next(b);
+		}
+	}
+
 	if (_cache.mem)
 		dm_pool_destroy(_cache.mem);

@@ -1364,6 +1387,19 @@ const char *dev_name_confirmed(struct device *dev, int quiet)
 	return dev_name(dev);
 }

+/* Provide a custom reason when a device is ignored */
+const char *dev_cache_filtered_reason(const char *name)
+{
+	const char *reason = "not found";
+	struct device *d = (struct device *) dm_hash_lookup(_cache.names, name);
+
+	if (d)
+		/* FIXME Record which filter caused the exclusion */
+		reason = "excluded by a filter";
+
+	return reason;
+}
+
 struct device *dev_cache_get(const char *name, struct dev_filter *f)
 {
 	struct stat buf;
@@ -1399,7 +1435,7 @@ struct device *dev_cache_get(const char *name, struct dev_filter *f)
 	if (!d || (f && !(d->flags & DEV_REGULAR) && !(f->passes_filter(f, d))))
 		return NULL;

-	log_debug_devs("Using %s", dev_name(d));
+	log_debug_devs("%s: Using device (%d:%d)", dev_name(d), (int) MAJOR(d->dev), (int) MINOR(d->dev));
 	return d;
 }

@@ -1509,7 +1545,7 @@ struct device *dev_iter_get(struct dev_iter *iter)
 		struct device *d = _iter_next(iter);
 		if (!iter->filter || (d->flags & DEV_REGULAR) ||
 		    iter->filter->passes_filter(iter->filter, d)) {
-			log_debug_devs("Using %s", dev_name(d));
+			log_debug_devs("%s: Using device (%d:%d)", dev_name(d), (int) MAJOR(d->dev), (int) MINOR(d->dev));
 			return d;
 		}
 	}
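Note on dev_cache_filtered_reason(): it lets a caller distinguish a name that is unknown to the cache from a device that is known but was rejected by a filter. A minimal sketch of that distinction follows. `reason_for` is a hypothetical name, and a NULL-terminated array is only a stand-in for the `_cache.names` hash used in the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Return a human-readable reason why a lookup of name produced no device.
 * known_names is a NULL-terminated array standing in for the device cache.
 */
const char *reason_for(const char *const *known_names, const char *name)
{
	size_t i;

	assert(known_names && name);

	for (i = 0; known_names[i]; i++)
		if (!strcmp(known_names[i], name))
			/* The device exists, so a filter must have excluded it */
			return "excluded by a filter";

	return "not found";
}
```

A caller would typically consult this only after a filtered lookup has already failed, and embed the returned text in its error message.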
--- a/lib/device/dev-cache.h
+++ b/lib/device/dev-cache.h
@@ -23,10 +23,10 @@
 * predicate for devices.
 */
 struct dev_filter {
-	int (*passes_filter) (struct dev_filter * f, struct device * dev);
-	void (*destroy) (struct dev_filter * f);
-	void (*wipe) (struct dev_filter * f);
-	int (*dump) (struct dev_filter * f, int merge_existing);
+	int (*passes_filter) (struct dev_filter *f, struct device *dev);
+	void (*destroy) (struct dev_filter *f);
+	void (*wipe) (struct dev_filter *f);
+	int (*dump) (struct dev_filter *f, struct dm_pool *mem, int merge_existing);
 	void *private;
 	unsigned use_count;
 };
@@ -55,6 +55,7 @@ int dev_cache_add_dir(const char *path);
 int dev_cache_add_loopfile(const char *path);
 __attribute__((nonnull(1)))
 struct device *dev_cache_get(const char *name, struct dev_filter *f);
+const char *dev_cache_filtered_reason(const char *name);

 // TODO
 struct device *dev_cache_get_by_devt(dev_t device, struct dev_filter *f);
--- a/lib/device/dev-ext.c
+++ b/lib/device/dev-ext.c
@@ -100,8 +100,6 @@ const char *dev_ext_name(struct device *dev)
 	return _ext_registry[dev->ext.src].name;
 }

-static const char *_ext_attached_msg = "External handle attached to device";
-
 struct dev_ext *dev_ext_get(struct device *dev)
 {
 	struct dev_ext *ext;
@@ -110,10 +108,10 @@ struct dev_ext *dev_ext_get(struct device *dev)
 	handle_ptr = dev->ext.handle;

 	if (!(ext = _ext_registry[dev->ext.src].dev_ext_get(dev)))
-		log_error("Failed to get external handle for device %s [%s].",
+		log_error("%s: Failed to get external handle [%s].",
 			   dev_name(dev), dev_ext_name(dev));
 	else if (handle_ptr != dev->ext.handle)
-		log_debug_devs("%s %s [%s:%p]", _ext_attached_msg, dev_name(dev),
+		log_debug_devs("%s: External handle [%s:%p] attached", dev_name(dev),
 				dev_ext_name(dev), dev->ext.handle);

 	return ext;
@@ -131,10 +129,10 @@ int dev_ext_release(struct device *dev)
 	handle_ptr = dev->ext.handle;

 	if (!(r = _ext_registry[dev->ext.src].dev_ext_release(dev)))
-		log_error("Failed to release external handle for device %s [%s:%p].",
+		log_error("%s: Failed to release external handle [%s:%p]",
 			  dev_name(dev), dev_ext_name(dev), dev->ext.handle);
 	else
-		log_debug_devs("External handle detached from device %s [%s:%p]",
+		log_debug_devs("%s: External handle [%s:%p] detached",
 				dev_name(dev), dev_ext_name(dev), handle_ptr);

 	return r;
@@ -143,7 +141,7 @@ int dev_ext_release(struct device *dev)
 int dev_ext_enable(struct device *dev, dev_ext_t src)
 {
 	if (dev->ext.enabled && (dev->ext.src != src) && !dev_ext_release(dev)) {
-		log_error("Failed to enable external handle for device %s [%s].",
+		log_error("%s: Failed to enable external handle [%s].",
 			   dev_name(dev), _ext_registry[src].name); 
 		return 0;
 	}
@@ -160,7 +158,7 @@ int dev_ext_disable(struct device *dev)
 		return 1;

 	if (!dev_ext_release(dev)) {
-		log_error("Failed to disable external handle for device %s [%s].",
+		log_error("%s: Failed to disable external handle [%s].",
 			   dev_name(dev), dev_ext_name(dev));
 		return 0;
 	}
--- a/lib/device/dev-io.c
+++ b/lib/device/dev-io.c
@@ -53,36 +53,345 @@
 #  endif
 #endif

+/*
+ * Always read at least 8k from disk.
+ * This seems to be a good compromise for the existing LVM2 metadata layout.
+ */
+#define MIN_READ_SIZE (8 * 1024)
+
 static DM_LIST_INIT(_open_devices);
 static unsigned _dev_size_seqno = 1;

+static const char *_reasons[] = {
+	"dev signatures",
+	"PV labels",
+	"VG metadata header",
+	"VG metadata content",
+	"extra VG metadata header",
+	"extra VG metadata content",
+	"LVM1 metadata",
+	"pool metadata",
+	"LV content",
+	"logging",
+};
+
+static const char *_reason_text(dev_io_reason_t reason)
+{
+	return _reasons[(unsigned) reason];
+}
+
+/*
+ * Release the memory holding the last data we read
+ */
+static void _release_devbuf(struct device_buffer *devbuf)
+{
+	dm_free(devbuf->malloc_address);
+	devbuf->malloc_address = NULL;
+}
+
+void devbufs_release(struct device *dev)
+{
+	if ((dev->flags & DEV_REGULAR))
+		return;
+
+	_release_devbuf(&dev->last_devbuf);
+	_release_devbuf(&dev->last_extra_devbuf);
+}
+
+#ifdef AIO_SUPPORT
+
+#  include <libaio.h>
+
+static io_context_t _aio_ctx = 0;
+static struct io_event *_aio_events = NULL;
+static int _aio_max = 0;
+static int64_t _aio_memory_max = 0;
+static int _aio_must_queue = 0;		/* Have we reached AIO capacity? */
+
+static DM_LIST_INIT(_aio_queue);
+
+#define DEFAULT_AIO_COLLECTION_EVENTS 32
+
+int dev_async_setup(struct cmd_context *cmd)
+{
+	int r;
+
+	_aio_max = find_config_tree_int(cmd, devices_aio_max_CFG, NULL);
+	_aio_memory_max = find_config_tree_int(cmd, devices_aio_memory_CFG, NULL) * 1024 * 1024;
+
+	/* Threshold is zero? */
+	if (!_aio_max || !_aio_memory_max) {
+		if (_aio_ctx)
+			dev_async_exit();
+		return 1;
+	}
+
+	/* Already set up? */
+	if (_aio_ctx)
+		return 1;
+
+	log_debug_io("Setting up aio context for up to %" PRId64 " bytes across %d events.", _aio_memory_max, _aio_max);
+
+	if (!_aio_events && !(_aio_events = dm_zalloc(sizeof(*_aio_events) * DEFAULT_AIO_COLLECTION_EVENTS))) {
+		log_error("Failed to allocate io_event array for asynchronous I/O.");
+		return 0;
+	}
+
+	if ((r = io_setup(_aio_max, &_aio_ctx)) < 0) {
+		/*
+		 * Possible errors:
+		 *   ENOSYS - aio not available in current kernel
+		 *   EAGAIN - _aio_max is too big
+		 *   EFAULT - invalid pointer
+		 *   EINVAL - _aio_ctx != 0 or kernel aio limits exceeded
+		 *   ENOMEM
+		 */
+		log_warn("WARNING: Asynchronous I/O setup for %d events failed: %s", _aio_max, strerror(-r));
+		log_warn("WARNING: Using only synchronous I/O.");
+		dm_free(_aio_events);
+		_aio_events = NULL;
+		_aio_ctx = 0;
+		return 0;
+	}
+
+	return 1;
+}
+
+/* Reset aio context after fork */
+int dev_async_reset(struct cmd_context *cmd)
+{
+	log_debug_io("Resetting asynchronous I/O context.");
+	_aio_ctx = 0;
+	dm_free(_aio_events);
+	_aio_events = NULL;
+
+	return dev_async_setup(cmd);
+}
+
+/*
+ * Track the amount of in-flight async I/O.
+ * If it exceeds the defined threshold set _aio_must_queue.
+ */
+static void _update_aio_counters(int nr, ssize_t bytes)
+{
+	static int64_t aio_bytes = 0;
+	static int aio_count = 0;
+
+	aio_bytes += bytes;
+	aio_count += nr;
+
+	if (aio_count >= _aio_max || aio_bytes > _aio_memory_max)
+		_aio_must_queue = 1;
+	else
+		_aio_must_queue = 0;
+}
+
+static int _io(struct device_buffer *devbuf, unsigned ioflags);
+
+int dev_async_getevents(void)
+{
+	struct device_buffer *devbuf, *tmp;
+	lvm_callback_fn_t dev_read_callback_fn;
+	void *dev_read_callback_context;
+	int r, event_nr;
+
+	if (!_aio_ctx)
+		return 1;
+
+	do {
+		/* FIXME Add timeout - currently NULL - waits for ever for at least 1 item */
+		r = io_getevents(_aio_ctx, 1, DEFAULT_AIO_COLLECTION_EVENTS, _aio_events, NULL);
+		if (r > 0)
+			break;
+		if (!r)
+			return 1; /* Timeout elapsed */
+		if (r == -EINTR)
+			continue;
+		if (r == -EAGAIN) {
+			usleep(100);
+			return 1; /* Give the caller the opportunity to do other work before repeating */
+		}
+		/*
+		 * ENOSYS - not supported by kernel
+		 * EFAULT - memory invalid
+		 * EINVAL - _aio_ctx invalid or min_nr/nr/timeout out of range
+		 */
+		log_error("Asynchronous event collection failed: %s", strerror(-r));
+		return 0;
+	} while (1);
+
+	for (event_nr = 0; event_nr < r; event_nr++) {
+		devbuf = _aio_events[event_nr].obj->data;
+		dm_free(_aio_events[event_nr].obj);
+
+		_update_aio_counters(-1, -devbuf->where.size);
+
+		dev_read_callback_fn = devbuf->dev_read_callback_fn;
+		dev_read_callback_context = devbuf->dev_read_callback_context;
+
+		/* Clear the callbacks as a precaution */
+		devbuf->dev_read_callback_context = NULL;
+		devbuf->dev_read_callback_fn = NULL;
+
+		if (_aio_events[event_nr].res == devbuf->where.size) {
+			if (dev_read_callback_fn)
+				dev_read_callback_fn(0, AIO_SUPPORTED_CODE_PATH, dev_read_callback_context, (char *)devbuf->buf + devbuf->data_offset);
+		} else {
+			/* FIXME If partial read is possible, resubmit remainder */
+			log_error_once("%s: Asynchronous I/O failed: read only %" PRIu64 " of %" PRIu64 " bytes at %" PRIu64,
+				       dev_name(devbuf->where.dev),
+				       (uint64_t) _aio_events[event_nr].res, (uint64_t) devbuf->where.size,
+				       (uint64_t) devbuf->where.start);
+			_release_devbuf(devbuf);
+			if (dev_read_callback_fn)
+				dev_read_callback_fn(1, AIO_SUPPORTED_CODE_PATH, dev_read_callback_context, NULL);
+			else
+				r = 0;
+		}
+	}
+
+	/* Submit further queued events if we can */
+	dm_list_iterate_items_gen_safe(devbuf, tmp, &_aio_queue, aio_queued) {
+		if (_aio_must_queue)
+			break;
+		dm_list_del(&devbuf->aio_queued);
+		_io(devbuf, 1);
+	}
+
+	return 1;
+}
+
+static int _io_async(struct device_buffer *devbuf)
+{
+	struct device_area *where = &devbuf->where;
+	struct iocb *iocb;
+	int r;
+
+	_update_aio_counters(1, devbuf->where.size);
+
+	if (!(iocb = dm_malloc(sizeof(*iocb)))) {
+		log_error("Failed to allocate I/O control block for asynchronous I/O.");
+		return 0;
+	}
+
+	io_prep_pread(iocb, dev_fd(where->dev), devbuf->buf, where->size, where->start);
+	iocb->data = devbuf;
+
+	do {
+		r = io_submit(_aio_ctx, 1L, &iocb);
+		if (r == 1)
+			break;	/* Success */
+		if (r == -EAGAIN) {
+			/* Try to release some resources then retry */
+			usleep(100);
+			if (!dev_async_getevents())
+				return_0;
+			/* FIXME Add counter/timeout so we can't get stuck here for ever */
+			continue;
+		}
+		/*
+		 * Possible errors:
+		 *   EFAULT - invalid data
+		 *   ENOSYS - no aio support in kernel
+		 *   EBADF  - bad file descriptor in iocb
+		 *   EINVAL - invalid _aio_ctx / iocb not initialised / invalid operation for this fd
+		 */
+		log_error("Asynchronous event submission failed: %s", strerror(-r));
+		return 0;
+	} while (1);
+
+	return 1;
+}
+
+void dev_async_exit(void)
+{
+	struct device_buffer *devbuf, *tmp;
+	lvm_callback_fn_t dev_read_callback_fn;
+	void *dev_read_callback_context;
+	int r;
+
+	if (!_aio_ctx)
+		return;
+
+	/* Discard any queued requests */
+	dm_list_iterate_items_gen_safe(devbuf, tmp, &_aio_queue, aio_queued) {
+		dm_list_del(&devbuf->aio_queued);
+
+		_update_aio_counters(-1, -devbuf->where.size);
+
+		dev_read_callback_fn = devbuf->dev_read_callback_fn;
+		dev_read_callback_context = devbuf->dev_read_callback_context;
+
+		_release_devbuf(devbuf);
+
+		if (dev_read_callback_fn)
+			dev_read_callback_fn(1, AIO_SUPPORTED_CODE_PATH, dev_read_callback_context, NULL);
+	}
+
+	log_debug_io("Destroying aio context.");
+	if ((r = io_destroy(_aio_ctx)) < 0)
+		/* Returns -ENOSYS if aio not in kernel or -EINVAL if _aio_ctx invalid */
+		log_error("Failed to destroy asynchronous I/O context: %s", strerror(-r));
+
+	dm_free(_aio_events);
+	_aio_events = NULL;
+
+	_aio_ctx = 0;
+}
+
+static void _queue_aio(struct device_buffer *devbuf)
+{
+	dm_list_add(&_aio_queue, &devbuf->aio_queued);
+	log_debug_io("Queueing aio.");
+}
+
+#else
+
+static int _aio_ctx = 0;
+static int _aio_must_queue = 0;
+
+int dev_async_setup(struct cmd_context *cmd)
+{
+	return 1;
+}
+
+int dev_async_reset(struct cmd_context *cmd)
+{
+	return 1;
+}
+
+int dev_async_getevents(void)
+{
+	return 1;
+}
+
+void dev_async_exit(void)
+{
+}
+
+static int _io_async(struct device_buffer *devbuf)
+{
+	return 0;
+}
+
+static void _queue_aio(struct device_buffer *devbuf)
+{
+}
+
+#endif /* AIO_SUPPORT */
+
 /*-----------------------------------------------------------------
 * The standard io loop that keeps submitting an io until it's
 * all gone.
 *---------------------------------------------------------------*/
-static int _io(struct device_area *where, char *buffer, int should_write)
+static int _io_sync(struct device_buffer *devbuf)
 {
+	struct device_area *where = &devbuf->where;
 	int fd = dev_fd(where->dev);
+	char *buffer = devbuf->buf;
 	ssize_t n = 0;
 	size_t total = 0;

-	if (fd < 0) {
-		log_error("Attempt to read an unopened device (%s).",
-			  dev_name(where->dev));
-		return 0;
-	}
-
-	/*
-	 * Skip all writes in test mode.
-	 */
-	if (should_write && test_mode())
-		return 1;
-
-	if (where->size > SSIZE_MAX) {
-		log_error("Read size too large: %" PRIu64, where->size);
-		return 0;
-	}
-
 	if (lseek(fd, (off_t) where->start, SEEK_SET) == (off_t) -1) {
 		log_error("%s: lseek %" PRIu64 " failed: %s",
 			  dev_name(where->dev), (uint64_t) where->start,
@@ -92,7 +401,7 @@ static int _io(struct device_area *where, char *buffer, int should_write)

 	while (total < (size_t) where->size) {
 		do
-			n = should_write ?
+			n = devbuf->write ?
 			    write(fd, buffer, (size_t) where->size - total) :
 			    read(fd, buffer, (size_t) where->size - total);
 		while ((n < 0) && ((errno == EINTR) || (errno == EAGAIN)));
@@ -100,7 +409,7 @@ static int _io(struct device_area *where, char *buffer, int should_write)
 		if (n < 0)
 			log_error_once("%s: %s failed after %" PRIu64 " of %" PRIu64
 				       " at %" PRIu64 ": %s", dev_name(where->dev),
-				       should_write ? "write" : "read",
+				       devbuf->write ? "write" : "read",
 				       (uint64_t) total,
 				       (uint64_t) where->size,
 				       (uint64_t) where->start, strerror(errno));
@@ -115,6 +424,42 @@ static int _io(struct device_area *where, char *buffer, int should_write)
 	return (total == (size_t) where->size);
 }

+static int _io(struct device_buffer *devbuf, unsigned ioflags)
+{
+	struct device_area *where = &devbuf->where;
+	int fd = dev_fd(where->dev);
+	int async = (!devbuf->write && _aio_ctx && aio_supported_code_path(ioflags) && devbuf->dev_read_callback_fn) ? 1 : 0;
+
+	if (fd < 0) {
+		log_error("Attempt to read an unopened device (%s).",
+			  dev_name(where->dev));
+		return 0;
+	}
+
+	if (!devbuf->buf && !(devbuf->malloc_address = devbuf->buf = dm_malloc_aligned((size_t) devbuf->where.size, 0))) {
+		log_error("I/O buffer malloc failed");
+		return 0;
+	}
+
+	log_debug_io("%s %s(fd %d):%8" PRIu64 " bytes (%ssync) at %" PRIu64 "%s (for %s)",
+		     devbuf->write ? "Write" : "Read ", dev_name(where->dev), fd,
+		     where->size, async ? "a" : "", (uint64_t) where->start,
+		     (devbuf->write && test_mode()) ? " (test mode - suppressed)" : "", _reason_text(devbuf->reason));
+
+	/*
+	 * Skip all writes in test mode.
+	 */
+	if (devbuf->write && test_mode())
+		return 1;
+
+	if (where->size > SSIZE_MAX) {
+		log_error("Read size too large: %" PRIu64, where->size);
+		return 0;
+	}
+
+	return async ? _io_async(devbuf) : _io_sync(devbuf);
+}
+
 /*-----------------------------------------------------------------
 * LVM2 uses O_DIRECT when performing metadata io, which requires
 * block size aligned accesses.  If any io is not aligned we have
@@ -142,7 +487,7 @@ int dev_get_block_size(struct device *dev, unsigned int *physical_block_size, un
 			r = 0;
 			goto out;
 		}
-		log_debug_devs("%s: block size is %u bytes", name, dev->block_size);
+		log_debug_devs("%s: Block size is %u bytes", name, dev->block_size);
 	}

 #ifdef BLKPBSZGET
@@ -153,7 +498,7 @@ int dev_get_block_size(struct device *dev, unsigned int *physical_block_size, un
 			r = 0;
 			goto out;
 		}
-		log_debug_devs("%s: physical block size is %u bytes", name, dev->phys_block_size);
+		log_debug_devs("%s: Physical block size is %u bytes", name, dev->phys_block_size);
 	}
 #elif defined (BLKSSZGET)
 	/* if we can't get physical block size, just use logical block size instead */
@@ -163,15 +508,13 @@ int dev_get_block_size(struct device *dev, unsigned int *physical_block_size, un
 			r = 0;
 			goto out;
 		}
-		log_debug_devs("%s: physical block size can't be determined, using logical "
-			       "block size of %u bytes", name, dev->phys_block_size);
+		log_debug_devs("%s: Physical block size can't be determined: Using logical block size of %u bytes", name, dev->phys_block_size);
 	}
 #else
 	/* if even BLKSSZGET is not available, use default 512b */
 	if (dev->phys_block_size == -1) {
 		dev->phys_block_size = 512;
-		log_debug_devs("%s: physical block size can't be determined, using block "
-			       "size of %u bytes instead", name, dev->phys_block_size);
+		log_debug_devs("%s: Physical block size can't be determined: Using block size of %u bytes instead", name, dev->phys_block_size);
 	}
 #endif

@@ -206,14 +549,16 @@ static void _widen_region(unsigned int block_size, struct device_area *region,
 		result->size += block_size - delta;
 }

-static int _aligned_io(struct device_area *where, char *buffer,
-		       int should_write)
+static int _aligned_io(struct device_area *where, char *write_buffer,
+		       int should_write, dev_io_reason_t reason,
+		       unsigned ioflags, lvm_callback_fn_t dev_read_callback_fn, void *dev_read_callback_context)
 {
-	char *bounce, *bounce_buf;
 	unsigned int physical_block_size = 0;
 	unsigned int block_size = 0;
+	unsigned buffer_was_widened = 0;
 	uintptr_t mask;
 	struct device_area widened;
+	struct device_buffer *devbuf;
 	int r = 0;

 	if (!(where->dev->flags & DEV_REGULAR) &&
@@ -223,53 +568,93 @@ static int _aligned_io(struct device_area *where, char *buffer,
 	if (!block_size)
 		block_size = lvm_getpagesize();

+	/* Apply minimum read size */
+	if (!should_write && block_size < MIN_READ_SIZE)
+		block_size = MIN_READ_SIZE;
+
+	mask = block_size - 1;
+
 	_widen_region(block_size, where, &widened);

-	/* Do we need to use a bounce buffer? */
-	mask = block_size - 1;
-	if (!memcmp(where, &widened, sizeof(widened)) &&
-	    !((uintptr_t) buffer & mask))
-		return _io(where, buffer, should_write);
+	/* Did we widen the buffer?  When writing, this means read-modify-write. */
+	if (where->size != widened.size || where->start != widened.start) {
+		buffer_was_widened = 1;
+		log_debug_io("Widening request for %" PRIu64 " bytes at %" PRIu64 " to %" PRIu64 " bytes at %" PRIu64 " on %s (for %s)",
+			     where->size, (uint64_t) where->start, widened.size, (uint64_t) widened.start, dev_name(where->dev), _reason_text(reason));
+	}

-	/* Allocate a bounce buffer with an extra block */
-	if (!(bounce_buf = bounce = dm_malloc((size_t) widened.size + block_size))) {
-		log_error("Bounce buffer malloc failed");
-		return 0;
+	devbuf = DEV_DEVBUF(where->dev, reason);
+	_release_devbuf(devbuf);
+	devbuf->where.dev = where->dev;
+	devbuf->where.start = widened.start;
+	devbuf->where.size = widened.size;
+	devbuf->write = should_write;
+	devbuf->reason = reason;
+	devbuf->dev_read_callback_fn = dev_read_callback_fn;
+	devbuf->dev_read_callback_context = dev_read_callback_context;
+
+	/* Store location of requested data relative to start of buf */
+	devbuf->data_offset = where->start - devbuf->where.start;
+
+	if (should_write && !buffer_was_widened && !((uintptr_t) write_buffer & mask))
+		/* Perform the I/O directly. */
+		devbuf->buf = write_buffer;
+	else if (!should_write)
+		/* Postpone buffer allocation until we're about to issue the I/O */
+		devbuf->buf = NULL;
+	else {
+		/* Allocate a bounce buffer with an extra block */
+		if (!(devbuf->malloc_address = devbuf->buf = dm_malloc((size_t) devbuf->where.size + block_size))) {
+			log_error("Bounce buffer malloc failed");
+			return 0;
+		}
+
+		/*
+		 * Realign start of bounce buffer (using the extra sector)
+		 */
+		if (((uintptr_t) devbuf->buf) & mask)
+			devbuf->buf = (char *) ((((uintptr_t) devbuf->buf) + mask) & ~mask);
 	}

-	/*
-	 * Realign start of bounce buffer (using the extra sector)
-	 */
-	if (((uintptr_t) bounce) & mask)
-		bounce = (char *) ((((uintptr_t) bounce) + mask) & ~mask);
+	/* If we've reached our concurrent AIO limit, add this request to the queue */
+	if (!devbuf->write && _aio_ctx && aio_supported_code_path(ioflags) && dev_read_callback_fn && _aio_must_queue) {
+		_queue_aio(devbuf);
+		return 1;
+	}

-	/* channel the io through the bounce buffer */
-	if (!_io(&widened, bounce, 0)) {
+	devbuf->write = 0;
+
+	/* Do we need to read into the bounce buffer? */
+	if ((!should_write || buffer_was_widened) && !_io(devbuf, ioflags)) {
 		if (!should_write)
-			goto_out;
+			goto_bad;
+		/* FIXME Handle errors properly! */
 		/* FIXME pre-extend the file */
-		memset(bounce, '\n', widened.size);
+		memset(devbuf->buf, '\n', devbuf->where.size);
 	}

-	if (should_write) {
-		memcpy(bounce + (where->start - widened.start), buffer,
-		       (size_t) where->size);
+	if (!should_write)
+		return 1;

-		/* ... then we write */
-		if (!(r = _io(&widened, bounce, 1)))
-			stack;
-			
-		goto out;
+	/* writes */
+
+	if (devbuf->malloc_address) {
+		memcpy((char *) devbuf->buf + devbuf->data_offset, write_buffer, (size_t) where->size);
+		log_debug_io("Overwriting %" PRIu64 " bytes at %" PRIu64 " (for %s)", where->size,
+			     (uint64_t) where->start, _reason_text(devbuf->reason));
 	}

-	memcpy(buffer, bounce + (where->start - widened.start),
-	       (size_t) where->size);
+	/* ... then we write */
+	devbuf->write = 1;
+	if (!(r = _io(devbuf, 0)))
+		goto_bad;

-	r = 1;
+	_release_devbuf(devbuf);
+	return 1;

-out:
-	dm_free(bounce_buf);
-	return r;
+bad:
+	_release_devbuf(devbuf);
+	return 0;
 }

 static int _dev_get_size_file(struct device *dev, uint64_t *size)
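Note on the widening and realignment in _aligned_io(): a request is expanded so that both ends fall on block boundaries, and a bounce buffer is realigned by rounding its address up with a mask. The sketch below illustrates both calculations under the assumption that the block size is a power of two. `struct region`, `widen` and `align_up` are hypothetical names and are not the helpers used in the patch.

```c
#include <assert.h>
#include <stdint.h>

struct region {
	uint64_t start;	/* byte offset on the device */
	uint64_t size;	/* length in bytes */
};

/* Expand r outward so that both ends fall on block_size boundaries. */
struct region widen(struct region r, uint64_t block_size)
{
	uint64_t delta;

	/* The mask arithmetic below is only valid for powers of two */
	assert(block_size && !(block_size & (block_size - 1)));

	/* Move the start back to the previous boundary */
	delta = r.start % block_size;
	r.start -= delta;
	r.size += delta;

	/* Push the end forward to the next boundary */
	delta = r.size % block_size;
	if (delta)
		r.size += block_size - delta;

	return r;
}

/* Round addr up to the next multiple of block_size (a power of two). */
uintptr_t align_up(uintptr_t addr, uintptr_t block_size)
{
	uintptr_t mask = block_size - 1;

	return (addr + mask) & ~mask;
}
```

For a 512-byte read at offset 4608 with an 8192-byte minimum read size, the widened request covers bytes 0 to 8191, and the requested data starts 4608 bytes into the buffer, which is the value the patch stores in data_offset.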
@@ -405,8 +790,8 @@ int dev_get_size(struct device *dev, uint64_t *size)

 	if ((dev->flags & DEV_REGULAR))
 		return _dev_get_size_file(dev, size);
-	else
-		return _dev_get_size_dev(dev, size);
+
+	return _dev_get_size_dev(dev, size);
 }

 int dev_get_read_ahead(struct device *dev, uint32_t *read_ahead)
@@ -463,11 +848,12 @@ int dev_open_flags(struct device *dev, int flags, int direct, int quiet)
 			return 1;
 		}

-		if (dev->open_count && !need_excl) {
-			log_debug_devs("%s already opened read-only. Upgrading "
+		if (dev->open_count && !need_excl)
+			log_debug_devs("%s: Already opened read-only. Upgrading "
 				       "to read-write.", dev_name(dev));
-			dev->open_count++;
-		}
+
+		/* dev_close_immediate will decrement this */
+		dev->open_count++;

 		dev_close_immediate(dev);
 		// FIXME: dev with DEV_ALLOCED is released
@@ -621,6 +1007,7 @@ static void _close(struct device *dev)
 	dev->phys_block_size = -1;
 	dev->block_size = -1;
 	dm_list_del(&dev->open_list);
+	devbufs_release(dev);

 	log_debug_devs("Closed %s", dev_name(dev));

@@ -693,72 +1080,138 @@ static void _dev_inc_error_count(struct device *dev)
 			 dev->max_error_count, dev_name(dev));
 }

-int dev_read(struct device *dev, uint64_t offset, size_t len, void *buffer)
+/*
+ * Data is returned (read-only) at DEV_DEVBUF_DATA(dev, reason).
+ * If dev_read_callback_fn is supplied, we always return 1 and take
+ * responsibility for calling it exactly once.  This might happen before the
+ * function returns (if there's an error or the I/O is synchronous) or after.
+ * Any error is passed to that function, which must track it if required.
+ */
+static int _dev_read_callback(struct device *dev, uint64_t offset, size_t len, dev_io_reason_t reason,
+			      unsigned ioflags, lvm_callback_fn_t dev_read_callback_fn, void *callback_context)
 {
 	struct device_area where;
-	int ret;
+	struct device_buffer *devbuf;
+	uint64_t buf_end;
+	int cached = 0;
+	int ret = 0;

-	if (!dev->open_count)
-		return_0;
+	if (!dev->open_count) {
+		log_error(INTERNAL_ERROR "Attempt to access device %s while closed.", dev_name(dev));
+		goto out;
+	}

 	if (!_dev_is_valid(dev))
-		return 0;
+		goto_out;
+
+	/*
+	 * Can we satisfy this from data we stored last time we read?
+	 */
+	if ((devbuf = DEV_DEVBUF(dev, reason)) && devbuf->malloc_address) {
+		buf_end = devbuf->where.start + devbuf->where.size - 1;
+		if (offset >= devbuf->where.start && offset <= buf_end && offset + len - 1 <= buf_end) {
+			/* Reuse this buffer */
+			cached = 1;
+			devbuf->data_offset = offset - devbuf->where.start;
+			log_debug_io("Cached read for %" PRIu64 " bytes at %" PRIu64 " on %s (for %s)",
+				     (uint64_t) len, (uint64_t) offset, dev_name(dev), _reason_text(reason));
+			ret = 1;
+			goto out;
+		}
+	}

 	where.dev = dev;
 	where.start = offset;
 	where.size = len;

-	// fprintf(stderr, "READ: %s, %lld, %d\n", dev_name(dev), offset, len);
-
-	ret = _aligned_io(&where, buffer, 0);
-	if (!ret)
+	ret = _aligned_io(&where, NULL, 0, reason, ioflags, dev_read_callback_fn, callback_context);
+	if (!ret) {
+		log_error("Read from %s failed", dev_name(dev));
 		_dev_inc_error_count(dev);
+	}
+
+out:
+	/* If we had an error or this was sync I/O, pass the result to any callback fn */
+	if ((!ret || !_aio_ctx || !aio_supported_code_path(ioflags) || cached) && dev_read_callback_fn) {
+		dev_read_callback_fn(!ret, ioflags, callback_context, DEV_DEVBUF_DATA(dev, reason));
+		return 1;
+	}

 	return ret;
 }

-/*
- * Read from 'dev' into 'buf', possibly in 2 distinct regions, denoted
- * by (offset,len) and (offset2,len2).  Thus, the total size of
- * 'buf' should be len+len2.
- */
-int dev_read_circular(struct device *dev, uint64_t offset, size_t len,
-		      uint64_t offset2, size_t len2, char *buf)
+void dev_read_callback(struct device *dev, uint64_t offset, size_t len, dev_io_reason_t reason,
+		      unsigned ioflags, lvm_callback_fn_t dev_read_callback_fn, void *callback_context)
 {
-	if (!dev_read(dev, offset, len, buf)) {
+	/* Always returns 1 if callback fn is supplied */
+	if (!_dev_read_callback(dev, offset, len, reason, ioflags, dev_read_callback_fn, callback_context))
+		log_error(INTERNAL_ERROR "_dev_read_callback failed");
+}
+
+/* Returns pointer to read-only buffer. Caller does not free it. */
+const char *dev_read(struct device *dev, uint64_t offset, size_t len, dev_io_reason_t reason)
+{
+	if (!_dev_read_callback(dev, offset, len, reason, 0, NULL, NULL))
+		return_NULL;
+
+	return DEV_DEVBUF_DATA(dev, reason);
+}
+
+/* Read into supplied retbuf owned by the caller. */
+int dev_read_buf(struct device *dev, uint64_t offset, size_t len, dev_io_reason_t reason, void *retbuf)
+{
+	if (!_dev_read_callback(dev, offset, len, reason, 0, NULL, NULL)) {
 		log_error("Read from %s failed", dev_name(dev));
 		return 0;
 	}
-
-	/*
-	 * The second region is optional, and allows for
-	 * a circular buffer on the device.
-	 */
-	if (!len2)
-		return 1;
-
-	if (!dev_read(dev, offset2, len2, buf + len)) {
-		log_error("Circular read from %s failed",
-			  dev_name(dev));
-		return 0;
-	}
+
+	memcpy(retbuf, DEV_DEVBUF_DATA(dev, reason), len);

 	return 1;
 }

+/*
+ * Read two regions of 'dev', (offset,len) then (offset2,len2), into one new buffer.
+ * Returns NULL on failure. Caller is responsible for dm_free() of the result.
+ */
+const char *dev_read_circular(struct device *dev, uint64_t offset, size_t len,
+			uint64_t offset2, size_t len2, dev_io_reason_t reason)
+{
+	char *buf = NULL;
+
+	if (!(buf = dm_malloc(len + len2))) {
+		log_error("Buffer allocation failed for split metadata.");
+		return NULL;
+	}
+
+	if (!dev_read_buf(dev, offset, len, reason, buf)) {
+		log_error("Read from %s failed", dev_name(dev));
+		dm_free(buf);
+		return NULL;
+	}
+
+	if (!dev_read_buf(dev, offset2, len2, reason, buf + len)) {
+		log_error("Circular read from %s failed", dev_name(dev));
+		dm_free(buf);
+		return NULL;
+	}
+
+	return buf;
+}
+
 /* FIXME If O_DIRECT can't extend file, dev_extend first; dev_truncate after.
 *       But fails if concurrent processes writing
 */

 /* FIXME pre-extend the file */
-int dev_append(struct device *dev, size_t len, char *buffer)
+int dev_append(struct device *dev, size_t len, dev_io_reason_t reason, char *buffer)
 {
 	int r;

 	if (!dev->open_count)
 		return_0;

-	r = dev_write(dev, dev->end, len, buffer);
+	r = dev_write(dev, dev->end, len, reason, buffer);
 	dev->end += (uint64_t) len;

 #ifndef O_DIRECT_SUPPORT
@@ -767,7 +1220,7 @@ int dev_append(struct device *dev, size_t len, char *buffer)
 	return r;
 }

-int dev_write(struct device *dev, uint64_t offset, size_t len, void *buffer)
+int dev_write(struct device *dev, uint64_t offset, size_t len, dev_io_reason_t reason, void *buffer)
 {
 	struct device_area where;
 	int ret;
@@ -778,23 +1231,28 @@ int dev_write(struct device *dev, uint64_t offset, size_t len, void *buffer)
 	if (!_dev_is_valid(dev))
 		return 0;

+	if (!len) {
+		log_error(INTERNAL_ERROR "Attempted to write 0 bytes to %s at " FMTu64, dev_name(dev), offset);
+		return 0;
+	}
+
 	where.dev = dev;
 	where.start = offset;
 	where.size = len;

 	dev->flags |= DEV_ACCESSED_W;

-	ret = _aligned_io(&where, buffer, 1);
+	ret = _aligned_io(&where, buffer, 1, reason, 0, NULL, NULL);
 	if (!ret)
 		_dev_inc_error_count(dev);

 	return ret;
 }

-int dev_set(struct device *dev, uint64_t offset, size_t len, int value)
+int dev_set(struct device *dev, uint64_t offset, size_t len, dev_io_reason_t reason, int value)
 {
 	size_t s;
-	char buffer[4096] __attribute__((aligned(8)));
+	char buffer[4096] __attribute__((aligned(4096)));

 	if (!dev_open(dev))
 		return_0;
@@ -810,7 +1268,7 @@ int dev_set(struct device *dev, uint64_t offset, size_t len, int value)
 	memset(buffer, value, sizeof(buffer));
 	while (1) {
 		s = len > sizeof(buffer) ? sizeof(buffer) : len;
-		if (!dev_write(dev, offset, s, buffer))
+		if (!dev_write(dev, offset, s, reason, buffer))
 			break;

 		len -= s;
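The cached-read test added to _dev_read_callback() above can be sketched in isolation as follows. This is a minimal illustration, not the LVM2 code: `struct cached_area` and `cached_read_hit()` are hypothetical names, and the guard against zero-length requests is an added assumption that the patch itself does not make.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the 'where' member of struct device_buffer. */
struct cached_area {
	uint64_t start;	/* Bytes */
	uint64_t size;	/* Bytes */
};

/*
 * Returns 1 and sets *data_offset if the request [offset, offset + len)
 * lies entirely inside the cached area; returns 0 otherwise.
 */
static int cached_read_hit(const struct cached_area *area, uint64_t offset,
			   size_t len, uint64_t *data_offset)
{
	uint64_t buf_end;

	/* Assumption for this sketch: empty areas and requests never hit. */
	if (!area->size || !len)
		return 0;

	/* Last byte held in the cached buffer */
	buf_end = area->start + area->size - 1;

	if (offset < area->start || offset > buf_end ||
	    offset + len - 1 > buf_end)
		return 0;

	/* Where the requested data begins within the cached buffer */
	*data_offset = offset - area->start;

	return 1;
}
```

With a 4096-byte buffer cached at byte 4096, a 512-byte request at 4608 hits with a data offset of 512, while the same request at 8000 misses because it runs past the last cached byte.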
--- a/lib/device/dev-luks.c
+++ b/lib/device/dev-luks.c
@@ -31,7 +31,7 @@ int dev_is_luks(struct device *dev, uint64_t *offset_found)
 	if (offset_found)
 		*offset_found = 0;

-	if (!dev_read(dev, 0, LUKS_SIGNATURE_SIZE, buf))
+	if (!dev_read_buf(dev, 0, LUKS_SIGNATURE_SIZE, DEV_IO_SIGNATURES, buf))
 		goto_out;

 	ret = memcmp(buf, LUKS_SIGNATURE, LUKS_SIGNATURE_SIZE) ? 0 : 1;
--- a/lib/device/dev-md.c
+++ b/lib/device/dev-md.c
@@ -28,7 +28,7 @@
 #define MD_SB_MAGIC 0xa92b4efc
 #define MD_RESERVED_BYTES (64 * 1024ULL)
 #define MD_RESERVED_SECTORS (MD_RESERVED_BYTES / 512)
-#define MD_NEW_SIZE_SECTORS(x) ((x & ~(MD_RESERVED_SECTORS - 1)) \
+#define MD_NEW_SIZE_SECTORS(x) (((x) & ~(MD_RESERVED_SECTORS - 1)) \
 				- MD_RESERVED_SECTORS)
 #define MD_MAX_SYSFS_SIZE 64

@@ -37,7 +37,7 @@ static int _dev_has_md_magic(struct device *dev, uint64_t sb_offset)
 	uint32_t md_magic;

 	/* Version 1 is little endian; version 0.90.0 is machine endian */
-	if (dev_read(dev, sb_offset, sizeof(uint32_t), &md_magic) &&
+	if (dev_read_buf(dev, sb_offset, sizeof(uint32_t), DEV_IO_SIGNATURES, &md_magic) &&
 	    ((md_magic == MD_SB_MAGIC) ||
 	     ((MD_SB_MAGIC != xlate32(MD_SB_MAGIC)) && (md_magic == xlate32(MD_SB_MAGIC)))))
 		return 1;
@@ -261,8 +261,7 @@ out:
 /*
 * Retrieve chunk size from md device using sysfs.
 */
-static unsigned long dev_md_chunk_size(struct dev_types *dt,
-				       struct device *dev)
+static unsigned long _dev_md_chunk_size(struct dev_types *dt, struct device *dev)
 {
 	const char *attribute = "chunk_size";
 	unsigned long chunk_size_bytes = 0UL;
@@ -280,7 +279,7 @@ static unsigned long dev_md_chunk_size(struct dev_types *dt,
 /*
 * Retrieve level from md device using sysfs.
 */
-static int dev_md_level(struct dev_types *dt, struct device *dev)
+static int _dev_md_level(struct dev_types *dt, struct device *dev)
 {
 	char level_string[MD_MAX_SYSFS_SIZE];
 	const char *attribute = "level";
@@ -303,7 +302,7 @@ static int dev_md_level(struct dev_types *dt, struct device *dev)
 /*
 * Retrieve raid_disks from md device using sysfs.
 */
-static int dev_md_raid_disks(struct dev_types *dt, struct device *dev)
+static int _dev_md_raid_disks(struct dev_types *dt, struct device *dev)
 {
 	const char *attribute = "raid_disks";
 	int raid_disks = 0;
@@ -327,15 +326,15 @@ unsigned long dev_md_stripe_width(struct dev_types *dt, struct device *dev)
 	unsigned long stripe_width_sectors = 0UL;
 	int level, raid_disks, data_disks;

-	chunk_size_sectors = dev_md_chunk_size(dt, dev);
+	chunk_size_sectors = _dev_md_chunk_size(dt, dev);
 	if (!chunk_size_sectors)
 		return 0;

-	level = dev_md_level(dt, dev);
+	level = _dev_md_level(dt, dev);
 	if (level < 0)
 		return 0;

-	raid_disks = dev_md_raid_disks(dt, dev);
+	raid_disks = _dev_md_raid_disks(dt, dev);
 	if (!raid_disks)
 		return 0;

--- a/lib/device/dev-swap.c
+++ b/lib/device/dev-swap.c
@@ -20,8 +20,7 @@
 #define MAX_PAGESIZE	(64 * 1024)
 #define SIGNATURE_SIZE  10

-static int
-_swap_detect_signature(const char *buf)
+static int _swap_detect_signature(const char *buf)
 {
 	if (memcmp(buf, "SWAP-SPACE", 10) == 0 ||
            memcmp(buf, "SWAPSPACE2", 10) == 0)
@@ -61,8 +60,7 @@ int dev_is_swap(struct device *dev, uint64_t *offset_found)
 			continue;
 		if (size < (page >> SECTOR_SHIFT))
 			break;
-		if (!dev_read(dev, page - SIGNATURE_SIZE,
-			      SIGNATURE_SIZE, buf)) {
+		if (!dev_read_buf(dev, page - SIGNATURE_SIZE, SIGNATURE_SIZE, DEV_IO_SIGNATURES, buf)) {
 			ret = -1;
 			break;
 		}
--- a/lib/device/dev-type.c
+++ b/lib/device/dev-type.c
@@ -76,7 +76,7 @@ struct dev_types *create_dev_types(const char *proc_dir,
 			i++;

 		/* If it's not a number it may be name of section */
-		line_maj = atoi(((char *) (line + i)));
+		line_maj = atoi(line + i);

 		if (line_maj < 0 || line_maj >= NUMBER_OF_MAJORS) {
 			/*
@@ -363,7 +363,7 @@ static int _has_partition_table(struct device *dev)
 		uint16_t magic;
 	} __attribute__((packed)) buf; /* sizeof() == SECTOR_SIZE */

-	if (!dev_read(dev, UINT64_C(0), sizeof(buf), &buf))
+	if (!dev_read_buf(dev, UINT64_C(0), sizeof(buf), DEV_IO_SIGNATURES, &buf))
 		return_0;

 	/* FIXME Check for other types of partition table too */
@@ -615,38 +615,38 @@ static int _blkid_wipe(blkid_probe probe, struct device *dev, const char *name,
 			if (force < DONT_PROMPT) {
 				log_error(MSG_FAILED_SIG_OFFSET, type, name);
 				return 0;
-			} else {
-				log_error("WARNING: " MSG_FAILED_SIG_OFFSET MSG_WIPING_SKIPPED, type, name);
-				return 2;
 			}
+
+			log_error("WARNING: " MSG_FAILED_SIG_OFFSET MSG_WIPING_SKIPPED, type, name);
+			return 2;
 		}
 		if (blkid_probe_lookup_value(probe, "SBMAGIC", &magic, &len)) {
 			if (force < DONT_PROMPT) {
 				log_error(MSG_FAILED_SIG_LENGTH, type, name);
 				return 0;
-			} else {
-				log_warn("WARNING: " MSG_FAILED_SIG_LENGTH MSG_WIPING_SKIPPED, type, name);
-				return 2;
 			}
+
+			log_warn("WARNING: " MSG_FAILED_SIG_LENGTH MSG_WIPING_SKIPPED, type, name);
+			return 2;
 		}
 	} else if (!blkid_probe_lookup_value(probe, "PTTYPE", &type, NULL)) {
 		if (blkid_probe_lookup_value(probe, "PTMAGIC_OFFSET", &offset, NULL)) {
 			if (force < DONT_PROMPT) {
 				log_error(MSG_FAILED_SIG_OFFSET, type, name);
 				return 0;
-			} else {
-				log_warn("WARNING: " MSG_FAILED_SIG_OFFSET MSG_WIPING_SKIPPED, type, name);
-				return 2;
 			}
+
+			log_warn("WARNING: " MSG_FAILED_SIG_OFFSET MSG_WIPING_SKIPPED, type, name);
+			return 2;
 		}
 		if (blkid_probe_lookup_value(probe, "PTMAGIC", &magic, &len)) {
 			if (force < DONT_PROMPT) {
 				log_error(MSG_FAILED_SIG_LENGTH, type, name);
 				return 0;
-			} else {
-				log_warn("WARNING: " MSG_FAILED_SIG_LENGTH MSG_WIPING_SKIPPED, type, name);
-				return 2;
 			}
+
+			log_warn("WARNING: " MSG_FAILED_SIG_LENGTH MSG_WIPING_SKIPPED, type, name);
+			return 2;
 		}
 		usage = "partition table";
 	} else
@@ -675,7 +675,7 @@ static int _blkid_wipe(blkid_probe probe, struct device *dev, const char *name,
 	} else
 		log_verbose(_msg_wiping, type, name);

-	if (!dev_set(dev, offset_value, len, 0)) {
+	if (!dev_set(dev, offset_value, len, DEV_IO_SIGNATURES, 0)) {
 		log_error("Failed to wipe %s signature on %s.", type, name);
 		return 0;
 	}
@@ -772,7 +772,7 @@ static int _wipe_signature(struct device *dev, const char *type, const char *nam
 	}

 	log_print_unless_silent("Wiping %s on %s.", type, name);
-	if (!dev_set(dev, offset_found, wipe_len, 0)) {
+	if (!dev_set(dev, offset_found, wipe_len, DEV_IO_SIGNATURES, 0)) {
 		log_error("Failed to wipe %s on %s.", type, name);
 		return 0;
 	}
--- a/lib/device/device.h
+++ b/lib/device/device.h
@@ -32,6 +32,18 @@
 #define DEV_ASSUMED_FOR_LV	0x00000200	/* Is device assumed for an LV */
 #define DEV_NOT_O_NOATIME	0x00000400	/* Don't use O_NOATIME */

+/* ioflags */
+#define AIO_SUPPORTED_CODE_PATH	0x00000001	/* Set if the code path supports AIO */
+
+#define aio_supported_code_path(ioflags)       (((ioflags) & AIO_SUPPORTED_CODE_PATH) ? 1 : 0)
+
+/*
+ * Standard format for callback functions.
+ * When provided, callback functions are called exactly once.
+ * If failed is set, data cannot be accessed.
+ */
+typedef void (*lvm_callback_fn_t)(int failed, unsigned ioflags, void *context, const void *data);
+
 /*
 * Support for external device info.
 * Any new external device info source needs to be
@@ -49,6 +61,48 @@ struct dev_ext {
 	void *handle;
 };

+/*
+ * All I/O is annotated with the reason it is performed.
+ */
+typedef enum dev_io_reason {
+	DEV_IO_SIGNATURES = 0,	/* Scanning device signatures */
+	DEV_IO_LABEL,		/* LVM PV disk label */
+	DEV_IO_MDA_HEADER,	/* Text format metadata area header */
+	DEV_IO_MDA_CONTENT,	/* Text format metadata area content */
+	DEV_IO_MDA_EXTRA_HEADER,	/* Header of any extra metadata areas on device */
+	DEV_IO_MDA_EXTRA_CONTENT,	/* Content of any extra metadata areas on device */
+	DEV_IO_FMT1,		/* Original LVM1 metadata format */
+	DEV_IO_POOL,		/* Pool metadata format */
+	DEV_IO_LV,		/* Content written to an LV */
+	DEV_IO_LOG		/* Logging messages */
+} dev_io_reason_t;
+
+/*
+ * Is this I/O for a device's extra metadata area?
+ */
+#define EXTRA_IO(reason) ((reason) == DEV_IO_MDA_EXTRA_HEADER || (reason) == DEV_IO_MDA_EXTRA_CONTENT)
+#define DEV_DEVBUF(dev, reason) (EXTRA_IO((reason)) ? &(dev)->last_extra_devbuf : &(dev)->last_devbuf)
+#define DEV_DEVBUF_DATA(dev, reason) ((char *) DEV_DEVBUF((dev), (reason))->buf + DEV_DEVBUF((dev), (reason))->data_offset)
+
+struct device_area {
+	struct device *dev;
+	uint64_t start;		/* Bytes */
+	uint64_t size;		/* Bytes */
+};
+
+struct device_buffer {
+	uint64_t data_offset;	/* Offset to start of requested data within buf */
+	void *malloc_address;	/* Start of allocated memory */
+	void *buf;		/* Aligned buffer that contains data within it */
+	struct device_area where;	/* Location of buf */
+	dev_io_reason_t reason;
+	unsigned write:1;	/* 1 if write; 0 if read */
+
+	lvm_callback_fn_t dev_read_callback_fn;
+	void *dev_read_callback_context;
+	struct dm_list aio_queued;	/* Queue of async I/O waiting to be issued */
+};
+
 /*
 * All devices in LVM will be represented by one of these.
 * pointer comparisons are valid.
@@ -71,6 +125,8 @@ struct device {
 	uint64_t end;
 	struct dm_list open_list;
 	struct dev_ext ext;
+	struct device_buffer last_devbuf;       /* Last data buffer read from the device */
+	struct device_buffer last_extra_devbuf; /* Last data buffer read from the device for extra metadata area */

 	const char *vgid; /* if device is an LV */
 	const char *lvid; /* if device is an LV */
@@ -84,12 +140,6 @@ struct device_list {
 	struct device *dev;
 };

-struct device_area {
-	struct device *dev;
-	uint64_t start;		/* Bytes */
-	uint64_t size;		/* Bytes */
-};
-
 /*
 * Support for external device info.
 */
@@ -129,19 +179,37 @@ int dev_test_excl(struct device *dev);
 int dev_fd(struct device *dev);
 const char *dev_name(const struct device *dev);

-int dev_read(struct device *dev, uint64_t offset, size_t len, void *buffer);
-int dev_read_circular(struct device *dev, uint64_t offset, size_t len,
-		      uint64_t offset2, size_t len2, char *buf);
-int dev_write(struct device *dev, uint64_t offset, size_t len, void *buffer);
-int dev_append(struct device *dev, size_t len, char *buffer);
-int dev_set(struct device *dev, uint64_t offset, size_t len, int value);
+/* Returns a read-only buffer */
+const char *dev_read(struct device *dev, uint64_t offset, size_t len, dev_io_reason_t reason);
+const char *dev_read_circular(struct device *dev, uint64_t offset, size_t len,
+			      uint64_t offset2, size_t len2, dev_io_reason_t reason);
+
+/* Passes the data (or error) to dev_read_callback_fn */
+void dev_read_callback(struct device *dev, uint64_t offset, size_t len, dev_io_reason_t reason,
+		       unsigned ioflags, lvm_callback_fn_t dev_read_callback_fn, void *callback_context);
+
+/* Read data and copy it into a supplied private buffer. */
+/* Only use for tiny reads or on unimportant code paths. */
+int dev_read_buf(struct device *dev, uint64_t offset, size_t len, dev_io_reason_t reason, void *retbuf);
+
+int dev_write(struct device *dev, uint64_t offset, size_t len, dev_io_reason_t reason, void *buffer);
+int dev_append(struct device *dev, size_t len, dev_io_reason_t reason, char *buffer);
+int dev_set(struct device *dev, uint64_t offset, size_t len, dev_io_reason_t reason, int value);
 void dev_flush(struct device *dev);

 struct device *dev_create_file(const char *filename, struct device *dev,
 			       struct dm_str_list *alias, int use_malloc);
 void dev_destroy_file(struct device *dev);

+void devbufs_release(struct device *dev);
+
 /* Return a valid device name from the alias list; NULL otherwise */
 const char *dev_name_confirmed(struct device *dev, int quiet);

+struct cmd_context;
+int dev_async_getevents(void);
+int dev_async_setup(struct cmd_context *cmd);
+void dev_async_exit(void);
+int dev_async_reset(struct cmd_context *cmd);
+
 #endif
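The callback contract declared in device.h above (called exactly once, data unusable when failed is set) might be exercised as in this sketch. Only the callback signature is taken from the header. `mock_read()`, `struct read_result` and `on_read()` are hypothetical and model the synchronous path only, with a string standing in for the device.

```c
#include <assert.h>
#include <stddef.h>

/* Same shape as lvm_callback_fn_t in device.h */
typedef void (*callback_fn_t)(int failed, unsigned ioflags, void *context, const void *data);

/* Hypothetical context used to record what the callback saw */
struct read_result {
	int calls;
	int failed;
	char first_byte;
};

static void on_read(int failed, unsigned ioflags, void *context, const void *data)
{
	struct read_result *res = context;

	(void) ioflags;
	res->calls++;
	res->failed = failed;

	/* data must not be touched when failed is set */
	if (!failed)
		res->first_byte = *(const char *) data;
}

/*
 * Hypothetical synchronous reader: reports the outcome through fn exactly
 * once, whether or not the read succeeded.
 */
static void mock_read(const char *src, size_t src_len, size_t offset,
		      callback_fn_t fn, void *context)
{
	if (offset >= src_len) {
		fn(1, 0, context, NULL);
		return;
	}

	fn(0, 0, context, src + offset);
}
```

Because the error is delivered through the callback rather than a return value, the caller has to track it in its own context, as the comment on _dev_read_callback() requires.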
--- a/lib/display/display.c
+++ b/lib/display/display.c
@@ -1,6 +1,6 @@
 /*
 * Copyright (C) 2001-2004 Sistina Software, Inc. All rights reserved.
- * Copyright (C) 2004-2007 Red Hat, Inc. All rights reserved.
+ * Copyright (C) 2004-2017 Red Hat, Inc. All rights reserved.
 *
 * This file is part of LVM2.
 *
@@ -152,6 +152,30 @@ const char *display_lvname(const struct logical_volume *lv)
 	return name;
 }

+/* Display percentage with (TODO) configurable precision */
+const char *display_percent(struct cmd_context *cmd, dm_percent_t percent)
+{
+	char *buf;
+	int r;
+
+	/* Reuse the same ring buffer we use for displaying LV names */
+	if ((cmd->display_lvname_idx + NAME_LEN) >= sizeof((cmd->display_buffer)))
+		cmd->display_lvname_idx = 0;
+
+	buf = cmd->display_buffer + cmd->display_lvname_idx;
+	/* TODO: Make the hardcoded 2-digit precision configurable */
+	r = dm_snprintf(buf, NAME_LEN, "%.2f", dm_percent_to_round_float(percent, 2));
+
+	if (r < 0) {
+		log_error("Percentage %d does not fit.", percent);
+		return NULL;
+	}
+
+	cmd->display_lvname_idx += r + 1;
+
+	return buf;
+}
+
 /* Size supplied in sectors */
 static const char *_display_size(const struct cmd_context *cmd,
 				 uint64_t size, dm_size_suffix_t suffix_type)
@@ -493,8 +517,8 @@ int lvdisplay_full(struct cmd_context *cmd,
 		log_print("LV Pool metadata       %s", seg->metadata_lv->name);
 		log_print("LV Pool data           %s", seg_lv(seg, 0)->name);
 	} else if (lv_is_cache_origin(lv)) {
-		log_print("LV origin of Cache LV  %s",
-			  get_only_segment_using_this_lv(lv)->lv->name);
+		if ((seg = get_only_segment_using_this_lv(lv)))
+			log_print("LV origin of Cache LV  %s", seg->lv->name);
 	} else if (lv_is_cache(lv)) {
 		seg = first_seg(lv);
 		if (inkernel && !lv_cache_status(lv, &cache_status))
@@ -525,12 +549,12 @@ int lvdisplay_full(struct cmd_context *cmd,
 			       snap_seg ? snap_seg->origin->size : lv->size));

 	if (cache_status) {
-		log_print("Cache used blocks      %.2f%%",
-			  dm_percent_to_float(cache_status->data_usage));
-		log_print("Cache metadata blocks  %.2f%%",
-			  dm_percent_to_float(cache_status->metadata_usage));
-		log_print("Cache dirty blocks     %.2f%%",
-			  dm_percent_to_float(cache_status->dirty_usage));
+		log_print("Cache used blocks      %s%%",
+			  display_percent(cmd, cache_status->data_usage));
+		log_print("Cache metadata blocks  %s%%",
+			  display_percent(cmd, cache_status->metadata_usage));
+		log_print("Cache dirty blocks     %s%%",
+			  display_percent(cmd, cache_status->dirty_usage));
 		log_print("Cache read hits/misses " FMTu64 " / " FMTu64,
 			  cache_status->cache->read_hits,
 			  cache_status->cache->read_misses);
@@ -546,16 +570,16 @@ int lvdisplay_full(struct cmd_context *cmd,
 	}

 	if (thin_data_active)
-		log_print("Allocated pool data    %.2f%%",
-			  dm_percent_to_float(thin_data_percent));
+		log_print("Allocated pool data    %s%%",
+			  display_percent(cmd, thin_data_percent));

 	if (thin_metadata_active)
-		log_print("Allocated metadata     %.2f%%",
-			  dm_percent_to_float(thin_metadata_percent));
+		log_print("Allocated metadata     %s%%",
+			  display_percent(cmd, thin_metadata_percent));

 	if (thin_active)
-		log_print("Mapped size            %.2f%%",
-			  dm_percent_to_float(thin_percent));
+		log_print("Mapped size            %s%%",
+			  display_percent(cmd, thin_percent));

 	log_print("Current LE             %u",
 		  snap_seg ? snap_seg->origin->le_count : lv->le_count);
@@ -566,8 +590,8 @@ int lvdisplay_full(struct cmd_context *cmd,
 		log_print("COW-table LE           %u", lv->le_count);

 		if (snap_active)
-			log_print("Allocated to snapshot  %.2f%%",
-				  dm_percent_to_float(snap_percent));
+			log_print("Allocated to snapshot  %s%%",
+				  display_percent(cmd, snap_percent));

 		log_print("Snapshot chunk size    %s",
 			  display_size(cmd, (uint64_t) snap_seg->chunk_size));
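The ring-buffer approach used by display_percent() above can be sketched without the LVM2 types. The names here are assumptions for illustration: `struct display_ctx`, `SLOT_LEN`, `RING_LEN` and `format_percent()` stand in for cmd_context, NAME_LEN, the display buffer and the real function, and plain snprintf() replaces dm_snprintf().

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define SLOT_LEN 16	/* hypothetical maximum length of one formatted value */
#define RING_LEN 64	/* hypothetical size of the shared ring buffer */

struct display_ctx {
	char ring[RING_LEN];
	size_t idx;	/* Next free position in ring */
};

/* Formats percent with 2 decimals into the ring; returns NULL if it does not fit. */
static const char *format_percent(struct display_ctx *ctx, double percent)
{
	char *buf;
	int r;

	/* Wrap to the start when a full slot no longer fits */
	if (ctx->idx + SLOT_LEN >= sizeof(ctx->ring))
		ctx->idx = 0;

	buf = ctx->ring + ctx->idx;
	r = snprintf(buf, SLOT_LEN, "%.2f", percent);
	if (r < 0 || r >= SLOT_LEN)
		return NULL;

	/* Keep the terminating NUL so earlier results stay valid strings */
	ctx->idx += (size_t) r + 1;

	return buf;
}
```

Each call returns its own region of the buffer, so a caller can hold several results at once until the index wraps and older strings are overwritten.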
--- a/lib/display/display.h
+++ b/lib/display/display.h
@@ -1,6 +1,6 @@
 /*
 * Copyright (C) 2001-2004 Sistina Software, Inc. All rights reserved.
- * Copyright (C) 2004-2007 Red Hat, Inc. All rights reserved.
+ * Copyright (C) 2004-2017 Red Hat, Inc. All rights reserved.
 *
 * This file is part of LVM2.
 *
@@ -24,6 +24,8 @@

 const char *display_lvname(const struct logical_volume *lv);

+const char *display_percent(struct cmd_context *cmd, dm_percent_t percent);
+
 /* Specify size in KB */
 const char *display_size(const struct cmd_context *cmd, uint64_t size);
 const char *display_size_long(const struct cmd_context *cmd, uint64_t size);
--- a/lib/filters/filter-composite.c
+++ b/lib/filters/filter-composite.c
@@ -52,13 +52,13 @@ static void _composite_destroy(struct dev_filter *f)
 	dm_free(f);
 }

-static int _dump(struct dev_filter *f, int merge_existing)
+static int _dump(struct dev_filter *f, struct dm_pool *mem, int merge_existing)
 {
 	struct dev_filter **filters;

 	for (filters = (struct dev_filter **) f->private; *filters; ++filters)
 		if ((*filters)->dump &&
-		    !(*filters)->dump(*filters, merge_existing))
+		    !(*filters)->dump(*filters, mem, merge_existing))
 			return_0;

 	return 1;
--- a/lib/filters/filter-internal.c
+++ b/lib/filters/filter-internal.c
@@ -74,7 +74,7 @@ struct dev_filter *internal_filter_create(void)
 	f->destroy = _destroy;
 	f->use_count = 0;

-	log_debug_devs("internal filter initialised.");
+	log_debug_devs("Internal filter initialised.");

 	return f;
 }