mirror of git://sourceware.org/git/lvm2.git synced 2024-12-21 13:34:40 +03:00
Commit Graph

703 Commits

Author SHA1 Message Date
Zdenek Kabelac
395ce6c2bb cov: explicitly ignore return value 2021-04-23 23:00:55 +02:00
Zdenek Kabelac
7e13586837 cov: check _insert_dev return value
Although we try later to validate device was inserted,
we can validate return value and early-exit.
2021-04-23 23:00:55 +02:00
Zdenek Kabelac
d7237ca63a cov: add checks to prevent NULL dereference 2021-04-23 23:00:55 +02:00
Zdenek Kabelac
cfe26470e3 dev-cache: change message level to debug
This case happens when e.g. we convert an LV to another type,
changing an existing LV into a different type - so change
to debug level and avoid confusing users with a message about
Device path  not match.

We may eventually enhance the caching code to drop cached info
after taking the lock and reading the VG.
2021-04-23 22:58:45 +02:00
Zdenek Kabelac
2b90466f78 devicesfile: use pool memory
Switch to use the command mempool instead of zalloc(), as a release
part would be required otherwise.
2021-04-23 22:58:45 +02:00
Zdenek Kabelac
80ef913872 device_id: fix memleak and free idname
Remove extra code path used only for 'free()'
and free(idname) on all paths that do not add it to list
and avoid memleak in few cases.
2021-04-23 22:57:08 +02:00
David Teigland
b94f2a8b55 remove unused flag DEV_UDEV_INFO_MISSING 2021-04-16 16:01:19 -05:00
Zdenek Kabelac
09621725d0 gcc: declaration of tmpfile shadows a global
Rename tmpfile to tmppath to avoid declaration shadowing of:

/usr/include/stdio.h:174: warning: shadowed declaration is here
2021-03-22 22:35:56 +01:00
Zdenek Kabelac
8a92f70709 cov: void unused result 2021-03-15 11:13:24 +01:00
Zdenek Kabelac
8a03675241 cov: variable initialization 2021-03-10 01:34:58 +01:00
Zdenek Kabelac
bee9b5c1d8 cov: mask uninitialized value
Coverity doesn't track ioctl() too well, so let's just make it quiet.
2021-03-10 01:34:27 +01:00
Zdenek Kabelac
d95c0e977c cov: remove unnecessary headers 2021-03-10 01:29:44 +01:00
Zdenek Kabelac
a6075fe2f2 cov: memleak on error path 2021-03-10 01:29:44 +01:00
Zdenek Kabelac
75037bee5d debug: more tracing
Check result of device_ids_write() and at least provide stack;
2021-03-10 01:27:13 +01:00
Zdenek Kabelac
7342ab06fc debug: change sys_error to sys_debug
These messages do not cause command error - so changing logging level
to just 'sys_debug' (so visible only with -vvvv)
2021-03-10 01:11:52 +01:00
Zdenek Kabelac
17802084c9 bcache: fix incorrect pointer check
Since commit b44db5d1a7, the allocated pointer needs to be
checked for a failed malloc().

The existing check was not actually checking anything, so a failing
malloc here would result in a segfault (although with a very
low chance of ever happening).
2021-03-10 00:59:05 +01:00
Zdenek Kabelac
2d64ffaee5 hash: use individual hint sizes
Use different 'hint' size for dm_hash_create() call - so
when debug info about hash is printed we can recognize which
hash was in use.

This patch doesn't change actual used size since that is always
rounded to be power of 2 and >=16 - so as such is only a
help to developer.

We could eventually use 'name' arg, but since this would have changed
API and this patchset will be routed to libdm & stable - we will
just use this small trick.
2021-03-08 15:33:15 +01:00
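As a rough illustration of why the hint in 2d64ffaee5 only labels a hash and does not change its size, the rounding rule can be sketched as below. This is not the libdm code: round_hint_size is a hypothetical helper, and only the "power of 2 and >=16" rule is taken from the commit message.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not a libdm function): round a size hint up to
 * the next power of two, never returning less than 16.
 */
size_t round_hint_size(size_t hint)
{
	size_t size = 16;

	/* Stop doubling before the shift could overflow size_t. */
	while (size < hint && size <= SIZE_MAX / 2)
		size <<= 1;

	return size;
}
```

Under this sketch, hints of 17, 24 and 31 would all end up as 32, so distinct hint values can act as labels in debug output without affecting the table size.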
Zdenek Kabelac
1042cd9a61 cleanup: simplify condition 2021-03-02 22:57:35 +01:00
Zdenek Kabelac
eb3dcc72eb cleanup: free already checks for NULL 2021-03-02 22:57:35 +01:00
Zdenek Kabelac
fa64c51428 dev-cache: optimize dir scanning
Use 'C' for alphasort - there is no need to use localized and slower
sorting for internal directory scanning.

Ensure on all code paths allocated dirent entries are released.

Optimize full path construction.
2021-03-02 22:54:40 +01:00
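One way to get the locale-independent ordering described in fa64c51428 is to hand scandir() a comparator built on strcmp() rather than alphasort(), which compares with strcoll(). The sketch below is an assumption about the general approach, not the lvm2 code; list_dir_sorted and cmp_names are hypothetical names. It also frees every dirent entry on all paths, as the commit message asks.

```c
#define _POSIX_C_SOURCE 200809L

#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Byte-wise comparison; unlike alphasort() it does not consult the locale. */
static int cmp_names(const struct dirent **a, const struct dirent **b)
{
	return strcmp((*a)->d_name, (*b)->d_name);
}

/*
 * Hypothetical helper: print the entries of dir in byte order.
 * Returns the number of entries printed, or -1 if the scan failed.
 */
int list_dir_sorted(const char *dir)
{
	struct dirent **names;
	int i, n, count = 0;

	n = scandir(dir, &names, NULL, cmp_names);
	if (n < 0)
		return -1;

	for (i = 0; i < n; i++) {
		/* Skip "." and "..", but still release the entry below. */
		if (strcmp(names[i]->d_name, ".") &&
		    strcmp(names[i]->d_name, "..")) {
			printf("%s/%s\n", dir, names[i]->d_name);
			count++;
		}
		free(names[i]);
	}
	free(names);

	return count;
}
```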
Zdenek Kabelac
9dd759c6b1 dev-cache: replace inefficient looking for dev
Use btree lookup to find the dev structure by major:minor.
The previous search could slow down lvm2 commands significantly with
a higher number of LVs.
2021-03-02 22:54:40 +01:00
David Teigland
83fe6e720f device usage based on devices file
The LVM devices file lists devices that lvm can use.  The default
file is /etc/lvm/devices/system.devices, and the lvmdevices(8)
command is used to add or remove device entries.  If the file
does not exist, or if lvm.conf includes use_devicesfile=0, then
lvm will not use a devices file.  When the devices file is in use,
the regex filter is not used, and the filter settings in lvm.conf
or on the command line are ignored.

LVM records devices in the devices file using hardware-specific
IDs, such as the WWID, and attempts to use subsystem-specific
IDs for virtual device types.  These device IDs are also written
in the VG metadata.  When no hardware or virtual ID is available,
lvm falls back to using the unstable device name as the device ID.
When devnames are used, lvm performs extra scanning to find
devices if their devname changes, e.g. after reboot.

When proper device IDs are used, an lvm command will not look
at devices outside the devices file, but when devnames are used
as a fallback, lvm will scan devices outside the devices file
to locate PVs on renamed devices.  A config setting
search_for_devnames can be used to control the scanning for
renamed devname entries.

Related to the devices file, the new command option
--devices <devnames> allows a list of devices to be specified for
the command to use, overriding the devices file.  The listed
devices act as a sort of devices file in terms of limiting which
devices lvm will see and use.  Devices that are not listed will
appear to be missing to the lvm command.

Multiple devices files can be kept in /etc/lvm/devices, which
allows lvm to be used with different sets of devices, e.g.
system devices do not need to be exposed to a specific application,
and the application can use lvm on its own set of devices that are
not exposed to the system.  The option --devicesfile <filename> is
used to select the devices file to use with the command.  Without
the option set, the default system devices file is used.

Setting --devicesfile "" causes lvm to not use a devices file.

An existing, empty devices file means lvm will see no devices.

The new command vgimportdevices adds PVs from a VG to the devices
file and updates the VG metadata to include the device IDs.
vgimportdevices -a will import all VGs into the system devices file.

LVM commands run by dmeventd do not use a devices file by default,
and will look at all devices on the system.  A devices file can
be created for dmeventd (/etc/lvm/devices/dmeventd.devices).  If
this file exists, lvm commands run by dmeventd will use it.

Internal implementation:

- device_ids_read - read the devices file
  . add struct dev_use (du) to cmd->use_devices for each devices file entry
- dev_cache_scan - get /dev entries
  . add struct device (dev) to dev_cache for each device on the system
- device_ids_match - match devices file entries to /dev entries
  . match each du on cmd->use_devices to a dev in dev_cache, using device ID
  . on match, set du->dev, dev->id, dev->flags MATCHED_USE_ID
- label_scan - read lvm headers and metadata from devices
  . filters are applied, those that do not need data from the device
  . filter-deviceid skips devs without MATCHED_USE_ID, i.e.
    skips /dev entries that are not listed in the devices file
  . read lvm label from dev
  . filters are applied, those that use data from the device
  . read lvm metadata from dev
  . add info/vginfo structs for PVs/VGs (info is "lvmcache")
- device_ids_find_renamed_devs - handle devices with unstable devname ID
  where devname changed
  . this step only needed when devs do not have proper device IDs,
    and their dev names change, e.g. after reboot sdb becomes sdc.
  . detect incorrect match because PVID in the devices file entry
    does not match the PVID found when the device was read above
  . undo incorrect match between du and dev above
  . search system devices for new location of PVID
  . update devices file with new devnames for PVIDs on renamed devices
  . label_scan the renamed devs
- continue with command processing
2021-02-23 16:43:32 -06:00
David Teigland
12667e9897 fix check for md raid imsm signature on 4k devices
On devices with 4k logical block size, the imsm signature
is located 8k from the end of the device, not 1k as is
the case for devices with 512 LBS.
2021-02-18 11:42:32 -06:00
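The two cases named in 12667e9897 can be written down as a small offset calculation. This is only a sketch under the assumption that the device size and logical block size are already known; imsm_signature_offset is a hypothetical helper and not an lvm2 function.

```c
#include <stdint.h>

/*
 * Hypothetical helper: compute the byte offset at which to look for the
 * imsm signature. Per the commit message, it sits 8k from the end of the
 * device with a 4k logical block size, and 1k from the end with 512.
 * Returns 0 on success, -1 if the device is too small.
 */
int imsm_signature_offset(uint64_t dev_size_bytes,
			  unsigned logical_block_size,
			  uint64_t *offset)
{
	uint64_t from_end = (logical_block_size == 4096) ? 8192 : 1024;

	if (dev_size_bytes < from_end)
		return -1;

	*offset = dev_size_bytes - from_end;
	return 0;
}
```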
Zdenek Kabelac
96910de4c7 dev-cache: remove duplicated allocation
A merge mistake failed to remove an allocation that is now postponed
until it's really needed.
2021-02-10 15:38:18 +01:00
David Teigland
f74f94c2dd dev_get_primary_dev: fix invalid path check
Fix commit bee9f4efdd "filter-mpath: work with nvme devices"
which removed setting the path for readlink.
2021-02-09 09:52:53 -06:00
Zdenek Kabelac
427121efc7 dev-type: sysfs attrs without sectors
Split function for reading attrs in sectors.
2021-02-09 00:49:14 +01:00
Zdenek Kabelac
d422aa7924 dev-type: convert to use log_warn
Keep log_error designated only for 'erroring' condition of command
and replace these errors with log_warn() WARNING.

Also do some indent changes.
2021-02-08 23:43:38 +01:00
Zdenek Kabelac
3bf2ca11d9 dev-type: use fopen for sysfs file
Directly open sysfs files and save extra stat() call which
is not adding any extra safety in sysfs dir.
2021-02-08 23:43:38 +01:00
Zdenek Kabelac
e429e69b65 dev-type: dev_is_pmem reuses topology read code 2021-02-08 23:43:38 +01:00
Zdenek Kabelac
2c597c73a8 dev-cache: better code reuse for _add_alias
Move path copying into _add_alias together with hashing.
Remove duplicated code.
2021-02-08 23:43:38 +01:00
Zdenek Kabelac
be9b731f44 dev-cache: check for nvme name while adding alias
Instead of repeated list retest, compare name once during add of alias.
2021-02-08 23:43:38 +01:00
David Teigland
bee9f4efdd filter-mpath: work with nvme devices
Recognize when a device is nvme, and apply filter-mpath to
nvme devices in addition to scsi devices.
2021-02-02 13:01:20 -06:00
David Teigland
37227b8ad6 devs: remove invalid path name aliases
Make dev_cache_get() verify aliases and drop any
that are invalid before returning a dev for a given
name.
2021-01-15 16:31:50 -06:00
David Teigland
c601ec0d6e filters: allow filter wipe for one device
as passes_filter already does
2020-10-21 16:24:16 -05:00
Zdenek Kabelac
dd8212365d debug: update messages 2020-10-02 21:04:16 +02:00
Zdenek Kabelac
b44db5d1a7 bcache: use flexible arrays
Cleanup, allocate whole struct with a single malloc call.
2020-10-02 21:00:26 +02:00
Zdenek Kabelac
b3c7a2b3f0 bcache: support interrupts when waiting on IO
lvm2 normally blocks signals during a protected
phase where it does not want to be interrupted.
Support interruptible processing when allowed
(in the section between sigint_allow() ... sigint_restore())
and let io_getevents() finish with EINTR.
2020-10-02 20:57:50 +02:00
Zdenek Kabelac
0fe58fc54f bcache: fix busy loop with too many errors
When bcache tries to write data to a faulty device,
it may run out of cache blocks and then just busy-loop
on a CPU - so this check guards against that by testing
whether there are already max_io (~64) errored blocks.
2020-10-02 20:56:55 +02:00
Zdenek Kabelac
41f9e372c0 bcache: fix waiting problem for completed IO
Call _wait_all(), which checks whether there is still
some pending IO before sleeping. Otherwise it may happen that
our submitted IO operations have already been dispatched
and this call then waits endlessly for IO that is all done.
This can be reproduced when a device quickly returns errors
on write requests.
2020-10-02 20:53:41 +02:00
David Teigland
450f272b31 devices: support printing the filter that rejects a device
Use of this new message function needs to be added
to various commands to improve the output.
2020-10-01 12:00:09 -05:00
Zdenek Kabelac
6728788bf5 debug: remove stacktrace on regular path
Here _insert is expected to also fail, so just regular 'return 0'.
2020-09-29 10:43:56 +02:00
Zdenek Kabelac
6c769eb460 bcache: fix error return value
Return 0 as failure (as checked for).
Also add INTERNAL_ERROR if 'DI' would be -1.
2020-09-19 23:00:50 +02:00
David Teigland
1570e76233 bcache: use indirection table for fd
Add a "device index" (di) for each device, and use this
in the bcache api to the rest of lvm.  This replaces the
file descriptor (fd) in the api.  The rest of lvm uses
new functions bcache_set_fd(), bcache_clear_fd(), and
bcache_change_fd() to control which fd bcache uses for
io to a particular device.

. lvm opens a dev and gets an fd.
  fd = open(dev);

. lvm passes fd to the bcache layer and gets a di
  to use in the bcache api for the dev.
  di = bcache_set_fd(fd);

. lvm uses bcache functions, passing di for the dev.
  bcache_write_bytes(di, ...), etc.

. bcache translates di to fd to do io.

. lvm closes the device and clears the di/fd bcache state.
  close(fd);
  bcache_clear_fd(di);

In the bcache layer, a di-to-fd translation table
(int *_fd_table) is added.  When bcache needs to
perform io on a di, it uses _fd_table[di].

In the following commit, lvm will make use of the new
bcache_change_fd() function to change the fd that
bcache uses for the dev, without dropping cached blocks.
2020-09-18 15:10:11 -05:00
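The di-to-fd indirection described in 1570e76233 can be pictured with a toy table. The sketch below is not the bcache implementation: the toy_ prefix marks every function as hypothetical, the fixed table size is an assumption made to keep it short, and the two-argument form of the change function is a guess, since the commit message does not give its signature.

```c
#define TOY_TABLE_SIZE 64	/* assumption: fixed size keeps the sketch short */

static int toy_fd_table[TOY_TABLE_SIZE];
static int toy_table_ready;

/* Mark every slot as unused (-1) on first use. */
static void toy_table_init(void)
{
	int i;

	if (toy_table_ready)
		return;
	for (i = 0; i < TOY_TABLE_SIZE; i++)
		toy_fd_table[i] = -1;
	toy_table_ready = 1;
}

/* Register fd in the first free slot; return its device index or -1. */
int toy_set_fd(int fd)
{
	int di;

	toy_table_init();
	for (di = 0; di < TOY_TABLE_SIZE; di++) {
		if (toy_fd_table[di] < 0) {
			toy_fd_table[di] = fd;
			return di;
		}
	}
	return -1;
}

/* Translate a device index to its fd; -1 if the index is unused or invalid. */
int toy_get_fd(int di)
{
	toy_table_init();
	if (di < 0 || di >= TOY_TABLE_SIZE)
		return -1;
	return toy_fd_table[di];
}

/* Swap the fd behind an index that is in use; 1 on success, 0 on failure. */
int toy_change_fd(int di, int fd)
{
	if (toy_get_fd(di) < 0)
		return 0;
	toy_fd_table[di] = fd;
	return 1;
}

/* Forget the fd behind an index so the slot can be reused. */
void toy_clear_fd(int di)
{
	toy_table_init();
	if (di >= 0 && di < TOY_TABLE_SIZE)
		toy_fd_table[di] = -1;
}
```

The point of the indirection is that callers keep holding the same index while the fd behind it changes, which is what would let cached blocks stay associated with the device across a reopen.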
Zdenek Kabelac
a481f42630 cov: always initialized values
Make sure values are initialized for all possible paths.
2020-09-01 17:57:50 +02:00
Zdenek Kabelac
fd96f1014b gcc: zero-sized array to flexible array C99
Switch remaining zero-sized arrays in structs to flexible arrays to be C99
compliant.

These simple rules should apply:

- The incomplete array type must be the last element within the structure.
- There cannot be an array of structures that contain a flexible array member.
- Structures that contain a flexible array member cannot be used as a member of another structure.
- The structure must contain at least one named member in addition to the flexible array member.

Although some of the code pieces should be still improved.
2020-09-01 17:57:50 +02:00
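For readers unfamiliar with the construct, a C99 flexible array member allocated with a single malloc() call (the pattern also mentioned in b44db5d1a7) looks roughly like this. The struct and function names are hypothetical and do not come from the lvm2 sources.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical example struct; the flexible array member must come last. */
struct name_entry {
	size_t len;
	char name[];	/* C99 flexible array member, replaces "char name[0]" */
};

/*
 * Allocate the header and the string storage in one malloc() call.
 * Returns NULL if the allocation fails; release the result with free().
 */
struct name_entry *name_entry_create(const char *name)
{
	size_t len = strlen(name);
	struct name_entry *e = malloc(sizeof(*e) + len + 1);

	if (!e)
		return NULL;

	e->len = len;
	memcpy(e->name, name, len + 1);	/* copies the trailing NUL too */

	return e;
}
```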
Zdenek Kabelac
7880896f0d gcc: calc size in compile time 2020-08-28 21:43:02 +02:00
Zdenek Kabelac
ce202c3b1c gcc: keep unsigned arithmetic
Avoid conversion to int.
2020-08-28 21:43:02 +02:00
David Teigland
00c9a788cc devices: simplify md superblock checking code 2020-07-09 10:48:34 -05:00
David Teigland
23774f997e devices: detect md ddf and imsm superblocks 2020-07-09 10:48:21 -05:00
Zdenek Kabelac
6eb9eba59b bcache: support longer writes
When a larger write request was initiated, bcache could run
out of free chunks - fix the loop that is supposed to wait
until the next free chunk becomes available.
2020-06-24 15:01:03 +02:00
David Teigland
8e2938c963 improve get_fs_block_size string to number 2020-06-11 15:05:47 -05:00
David Teigland
9fbad5bb0f fix libblkid BLOCK_SIZE check 2020-06-11 12:43:07 -05:00
Zhao Heming
b59127a838 Change dev->bcache_fd default value from 0 to -1
This fix prevents bcache_fd from being mistakenly opened/closed later.

Signed-off-by: Zhao Heming <heming.zhao@suse.com>
2020-06-01 12:22:15 -05:00
David Teigland
2f29765e7f devs: add some checks for a dev with no path name
It's possible for a dev-cache entry to remain after all
paths for it have been removed, and other parts of the
code expect that a dev always has a name.  A better fix
may be to remove a device from dev-cache after all paths
to it have been removed.
2020-05-13 16:26:26 -05:00
David Teigland
d9e8895a96 Allow dm-integrity to be used for raid images
dm-integrity stores checksums of the data written to an
LV, and returns an error if data read from the LV does
not match the previously saved checksum.  When used on
raid images, dm-raid will correct the error by reading
the block from another image, and the device user sees
no error.  The integrity metadata (checksums) are stored
on an internal LV allocated by lvm for each linear image.
The internal LV is allocated on the same PV as the image.

Create a raid LV with an integrity layer over each
raid image (for raid levels 1,4,5,6,10):

lvcreate --type raidN --raidintegrity y [options]

Add an integrity layer to images of an existing raid LV:

lvconvert --raidintegrity y LV

Remove the integrity layer from images of a raid LV:

lvconvert --raidintegrity n LV

Settings

Use --raidintegritymode journal|bitmap (journal is default)
to configure the method used by dm-integrity to ensure
crash consistency.

Initialization

When integrity is added to an LV, the kernel needs to
initialize the integrity metadata/checksums for all blocks
in the LV.  The data corruption checking performed by
dm-integrity will only operate on areas of the LV that
are already initialized.  The progress of integrity
initialization is reported by the "syncpercent" LV
reporting field (and under the Cpy%Sync lvs column.)

Example: create a raid1 LV with integrity:

$ lvcreate --type raid1 -m1 --raidintegrity y -n rr -L1G foo
  Creating integrity metadata LV rr_rimage_0_imeta with size 12.00 MiB.
  Logical volume "rr_rimage_0_imeta" created.
  Creating integrity metadata LV rr_rimage_1_imeta with size 12.00 MiB.
  Logical volume "rr_rimage_1_imeta" created.
  Logical volume "rr" created.
$ lvs -a foo
  LV                  VG  Attr       LSize  Origin              Cpy%Sync
  rr                  foo rwi-a-r---  1.00g                     4.93
  [rr_rimage_0]       foo gwi-aor---  1.00g [rr_rimage_0_iorig] 41.02
  [rr_rimage_0_imeta] foo ewi-ao---- 12.00m
  [rr_rimage_0_iorig] foo -wi-ao----  1.00g
  [rr_rimage_1]       foo gwi-aor---  1.00g [rr_rimage_1_iorig] 39.45
  [rr_rimage_1_imeta] foo ewi-ao---- 12.00m
  [rr_rimage_1_iorig] foo -wi-ao----  1.00g
  [rr_rmeta_0]        foo ewi-aor---  4.00m
  [rr_rmeta_1]        foo ewi-aor---  4.00m
2020-04-15 12:10:32 -05:00
David Teigland
957904933b reduce device path error messages
When /dev entries or sysfs entries are changing
due to concurrent lvm commands, it can cause
warning/error messages about missing paths.
2020-03-12 10:18:51 -05:00
Zdenek Kabelac
f439716b75 container_of: use offsetof from stddef
Use standardized offsetof() macro from stddef.
Helps to build valid code with latest gcc10 with -O2.
2020-03-05 17:38:55 +01:00
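A container_of macro built on the standard offsetof() from stddef.h, as f439716b75 describes, is commonly written along these lines. This is a generic sketch, not the macro from the lvm2 tree; the two structs and dev_item_from_node are hypothetical.

```c
#include <stddef.h>

/* Step back from a member pointer to the start of the enclosing struct. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical types used only to exercise the macro. */
struct list_node {
	struct list_node *next;
};

struct dev_item {
	int id;
	struct list_node list;	/* embedded list linkage */
};

/* Recover the dev_item that embeds the given list node. */
struct dev_item *dev_item_from_node(struct list_node *n)
{
	return container_of(n, struct dev_item, list);
}
```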
Zdenek Kabelac
c5e5ae4c95 bcache: fix memleak on error path
clang: free io on error path.
2020-02-04 17:22:06 +01:00
Zdenek Kabelac
cff16b062b debug: avoid two slashes in debug message 2019-12-10 15:44:16 +01:00
David Teigland
56a295f78c bcache: add invalidate_bytes function 2019-11-26 16:52:28 -06:00
Zdenek Kabelac
43f149526d devtype: simplify code
Update code with simpler form and check for fclose().
2019-11-14 18:06:14 +01:00
Joe Thornber
25e7bf021a [bcache] bcache_invalidate_fd, only remove prefixes on success. 2019-10-29 15:21:11 +00:00
Joe Thornber
7e8296f478 [bcache] reverse earlier patch.
It broke some unit tests, for v. little benefit
2019-10-29 15:14:07 +00:00
Joe Thornber
2b3c39e402 [bcache] pass up the error from io_submit rather than using generic -EIO
Author: Heming Zhao
2019-10-29 10:39:20 +00:00
Joe Thornber
2938b4dcca [bcache] add bcache_abort()
This gives us a way to cope with write failures.
2019-10-28 15:00:53 +00:00
Zdenek Kabelac
a7563dc6a1 gcc: older version can't see udev is always set 2019-10-22 13:39:22 +02:00
David Teigland
fcbffbdbc0 bcache: change log level for prefetch message
The "new new blocks" message was printed as an error
but it's not an error condition.
2019-09-03 12:02:09 -05:00
David Teigland
0534cd9cd4 pvscan: disable sleeping and retrying for udev
When systemd is running pvscans, udev may not be
entirely initialized, so the pvscan should not
sleep and retry waiting for udev info.
2019-08-16 14:41:26 -05:00
David Teigland
eb6aa5fefe devices: put ifdef around BLKPBSZGET
BLKPBSZGET is not defined before kernel version 2.6.32
(e.g. rhel5)
2019-08-08 15:45:03 -05:00
David Teigland
09bc2d0fd1 devices: clean up block size functions
Replace calls to the old dev_get_block_size function
with calls to the new dev_get_direct_block_size function,
and remove the old function.
2019-08-07 11:48:10 -05:00
David Teigland
7f347698e3 Fix rounding writes up to sector size
Do this at two levels, although one would be enough to
fix the problem seen recently:

- Ignore any reported sector size other than 512 or 4096.
  If either sector size (physical or logical) is reported
  as 512, then use 512.  If neither are reported as 512,
  and one or the other is reported as 4096, then use 4096.
  If neither is reported as either 512 or 4096, then use 512.

- When rounding up a limited write in bcache to be a multiple
  of the sector size, check that the resulting write size is
  not larger than the bcache block itself.  (This shouldn't
  happen if the sector size is 512 or 4096.)
2019-07-26 14:21:08 -05:00
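The two rules from 7f347698e3 translate into a few lines of C. The sketch below follows the wording of the commit message; the function names are hypothetical, and clamping the rounded size to the block size is an assumption about how the "not larger than the bcache block" check could be resolved, not a description of the actual fix.

```c
#include <stddef.h>

/*
 * Hypothetical helper: choose the sector size from the reported physical
 * and logical sizes, accepting only 512 or 4096 and preferring 512.
 */
unsigned pick_sector_size(unsigned physical, unsigned logical)
{
	if (physical == 512 || logical == 512)
		return 512;
	if (physical == 4096 || logical == 4096)
		return 4096;
	return 512;	/* neither value is usable: fall back to 512 */
}

/*
 * Hypothetical helper: round a write length up to a multiple of
 * sector_size, but never beyond block_size (assumed clamp).
 */
size_t round_up_write(size_t len, unsigned sector_size, size_t block_size)
{
	size_t rounded;

	if (!sector_size)
		return len;

	rounded = ((len + sector_size - 1) / sector_size) * sector_size;

	return (rounded > block_size) ? block_size : rounded;
}
```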
David Teigland
4567c6a2b2 enable full md component detection at the right time
An active md device with an end superblock causes lvm to
enable full md component detection.  This was being done
within the filter loop instead of before, so the full
filtering of some devs could be missed.

Also incorporate the recently added config setting that
controls the md component detection.
2019-07-10 13:30:50 -05:00
David Teigland
db98a6e362 Additional MD component checking
If udev info is missing for a device, (which would indicate
if it's an MD component), then do an end-of-device read to
check if a PV is an MD component.  (This is skipped when
using hints since we already know devs in hints are good.)

A new config setting md_component_checks can be used to
disable the additional end-of-device MD checks, or to
always enable end-of-device MD checks.

When both hints and udev info are disabled/unavailable,
the end of PVs will now be scanned by default.  If md
devices with end-of-device superblocks are not being
used, the extra I/O overhead can be avoided by setting
md_component_checks="start".
2019-06-07 13:27:16 -05:00
David Teigland
60bf9c9f33 hints: exclude md components
In some cases md components could be included in
the hints, so add a check to hint creation to make
sure they are excluded.
2019-05-21 11:58:01 -05:00
David Teigland
19ef399ea7 devs: rename dev_is_md dev_is_md_component
The naming was confusing and misleading since
it's testing if a device is an md component,
not an md device.
2019-05-21 11:44:39 -05:00
David Teigland
6f18186bfd pvscan: print more reasons for ignoring devices 2019-04-05 15:48:12 -05:00
David Teigland
3ed9256985 remove unused io functions 2019-02-28 10:58:00 -06:00
David Teigland
3ebce8dbd2 apply obtain_device_list_from_udev to all libudev usage
udev_dev_is_md_component and udev_dev_is_mpath_component
are not used for obtaining the device list, but they still
use libudev for device info.  When there are problems with
udev, these functions can get stuck. So, use the existing
obtain_device_list_from_udev config setting to also control
whether these "is component" functions are used, which gives
us a way to avoid using libudev entirely when it's causing
problems.
2019-02-05 10:15:40 -06:00
David Teigland
6620dc9475 add device hints to reduce scanning
Save the list of PVs in /run/lvm/hints.  These hints
are used to reduce scanning in a number of commands
to only the PVs on the system, or only the PVs in a
requested VG (rather than all devices on the system.)
2019-01-15 10:23:47 -06:00
Zdenek Kabelac
2724a09e58 debug: tracing close errors 2018-12-21 21:45:08 +01:00
Zdenek Kabelac
82f66834ef bcache: fix memory leak on error path
Coverity noticed missing free of io struct on error path.
2018-12-21 21:45:03 +01:00
Zdenek Kabelac
cc5cfb88d7 cleanup: some local headers first 2018-12-14 15:14:48 +01:00
Zdenek Kabelac
0b19387dae headers: use configure.h as 1st. header
Ensure configure.h is always 1st. included header.
Maybe we could eventually introduce gcc -include option, but for now
this better uses dependency tracking.

Also move _REENTRANT and _GNU_SOURCE into configure.h so it
doesn't need to be present in various source files.
This ensures consistent compilation of headers like stdio.h since
it may produce different declaration.
2018-12-14 15:09:13 +01:00
David Teigland
a063d2d123 devs: use udev info to improve md component detection
Use udev info to supplement native md component detection.
2018-12-03 12:58:28 -06:00
Peter Rajnoha
cb04b84c79 scan: md metadata version 0.90 is at the end of disk
commit de28637
  scan: use full md filter when md 1.0 devices are present

missed the fact that md superblock version 0.90 also puts
metadata at the end of the device, so the full md filter
needs to be used when either 0.90 or 1.0 is present.
2018-11-29 12:35:54 -06:00
David Teigland
7e721ca048 bcache: sync io fixes
fix lseek error check
fix read/write error checks
handle zero return from read and write
don't return an error for short io
fix partial read/write loop
2018-11-20 09:19:18 -06:00
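The items listed in 7e721ca048 are the usual pitfalls of a synchronous read loop. The sketch below shows one way to handle them; it is not the bcache code, read_full is a hypothetical name, and it uses pread() instead of the lseek() plus read() pair mentioned in the commit message.

```c
#define _POSIX_C_SOURCE 200809L

#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: read up to len bytes at offset, retrying after
 * partial reads and EINTR. Returns the number of bytes read, which is
 * short only at end of file, or -1 on a real error.
 */
ssize_t read_full(int fd, void *buf, size_t len, off_t offset)
{
	char *p = buf;
	size_t done = 0;
	ssize_t r;

	while (done < len) {
		r = pread(fd, p + done, len - done, offset + (off_t)done);
		if (r < 0) {
			if (errno == EINTR)
				continue;	/* interrupted: retry */
			return -1;
		}
		if (r == 0)
			break;			/* end of file: short io, not an error */
		done += (size_t)r;
	}

	return (ssize_t)done;
}
```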
David Teigland
ca66d52032 io: use sync io if aio fails
io_setup() for aio may fail if a system has reached the
aio request limit.  In this case, fall back to using
sync io.  Also, lvm use of aio can be disabled entirely
with config setting global/use_aio=0.

The system limit for aio requests can be seen from
  /proc/sys/fs/aio-max-nr

The current usage of aio requests can be seen from
  /proc/sys/fs/aio-nr

The system limit for aio requests can be increased by
setting fs.aio-max-nr using sysctl.

Also add last-byte limit to the sync io code.
2018-11-20 09:13:20 -06:00
David Teigland
1dc5603f73 devices: reuse bcache fd when getting block size
This avoids an unnecessary open() on the device.
2018-11-06 16:36:18 -06:00
David Teigland
3ae5569570 Add dm-writecache support
dm-writecache is used like dm-cache with a standard LV
as the cache.

$ lvcreate -n main -L 128M -an foo /dev/loop0

$ lvcreate -n fast -L 32M -an foo /dev/pmem0

$ lvconvert --type writecache --cachepool fast foo/main

$ lvs -a foo -o+devices
  LV            VG  Attr       LSize   Origin        Devices
  [fast]        foo -wi-------  32.00m               /dev/pmem0(0)
  main          foo Cwi------- 128.00m [main_wcorig] main_wcorig(0)
  [main_wcorig] foo -wi------- 128.00m               /dev/loop0(0)

$ lvchange -ay foo/main

$ dmsetup table
foo-main_wcorig: 0 262144 linear 7:0 2048
foo-main: 0 262144 writecache p 253:4 253:3 4096 0
foo-fast: 0 65536 linear 259:0 2048

$ lvchange -an foo/main

$ lvconvert --splitcache foo/main

$ lvs -a foo -o+devices
  LV   VG  Attr       LSize   Devices
  fast foo -wi-------  32.00m /dev/pmem0(0)
  main foo -wi------- 128.00m /dev/loop0(0)
2018-11-06 14:18:41 -06:00
Zdenek Kabelac
9a6f0e64f9 debug: missing backtrace 2018-11-05 17:25:11 +01:00
Zdenek Kabelac
aa8b2d6a0f cleanup: move cast to dev_t into MKDEV macro 2018-11-05 17:25:11 +01:00
Zdenek Kabelac
70e3d0a613 cov: remove unused assigns 2018-11-05 17:25:11 +01:00
David Teigland
aecf542126 metadata: prevent writing beyond metadata area
lvm uses a bcache block size of 128K.  A bcache block
at the end of the metadata area will overlap the PEs
from which LVs are allocated.  How much depends on
alignments.  When lvm reads and writes one of these
bcache blocks to update VG metadata, it can also be
reading and writing PEs that belong to an LV.

If these overlapping PEs are being written to by the
LV user (e.g. filesystem) at the same time that lvm
is modifying VG metadata in the overlapping bcache
block, then the user's updates to the PEs can be lost.

This patch is a quick hack to prevent lvm from writing
past the end of the metadata area.
2018-10-29 16:53:17 -05:00
Zdenek Kabelac
fdd76da33d cov: drop unneeded header files 2018-10-15 17:49:44 +02:00
Zdenek Kabelac
b1ff52ca14 cov: check dev_close_immediate
Function can report log_error() on fail path.
2018-10-15 17:49:44 +02:00
Zdenek Kabelac
eb566e034f cov: add check for positive value
As pgsize parameter for _init_free_list() can't be negative,
report problem in case for any reason we would get negative number.
2018-10-15 17:49:44 +02:00
Zdenek Kabelac
9b85ecb85b cov: fix memleak on bcache io error path
Drop allocated IO.

merge free bache
2018-10-15 17:49:44 +02:00
Joe Thornber
3255e384db [bcache] Remove unused 'hash' field from blocks.
We use a radix tree these days rather than a hash table.
2018-09-11 13:17:29 +01:00
David Teigland
fade9ca3b6 bcache: reduce MAX_IO to 256
This is the number of concurrent async io requests that
the scan layer will submit to the bcache layer.  There
will be an open fd for each of these, so it is best to
keep this well below the default limit for max open files
(1024), otherwise lvm may get EMFILE from open(2) when
there are around 1024 devices to scan on the system.
2018-08-24 14:55:12 -05:00
Zdenek Kabelac
c8b4f9414c dev_io: no discard in testmode
When an lvm2 command is executed in test mode, the discard ioctl is now skipped.
Otherwise this could even cause data loss when issuing discards for released
areas was enabled and the user 'tested' lvreduce.
2018-07-09 00:19:30 +02:00
Marian Csontos
a14f21bf1d bcache: Fix null pointer dereferencing 2018-06-26 17:04:18 +02:00
Zdenek Kabelac
c728d88e11 build: include configure.h
It's important to consistently include configure.h as the 1st. header.
It contains #defines influencing behavior of other included header
files.
2018-06-22 23:11:44 +02:00
David Teigland
15826214f9 Remove code for using files as devices
It appears this has not been used in a long time,
and it seems to have no point since loop devices exist.
2018-06-21 09:33:21 -05:00
David Teigland
42f7caf1c2 scan: work around udev problems by avoiding open RDWR
udev creates a train wreck of events if we open devices
with RDWR.  Until we can fix/disable/scrap udev, work around
this by opening RDONLY and then closing/reopening RDWR when
a write is needed.  This invalidates the bcache blocks for
the device before writing so it can trigger unnecessary
rereading.
2018-06-20 14:08:12 -05:00
David Teigland
f85a010a6b bcache: remove extraneous error message
an error from io_submit is already recognized by
the caller like errors during completion.
2018-06-18 12:02:22 -05:00
David Teigland
328303d4d4 Remove unused device error counting 2018-06-15 14:04:39 -05:00
David Teigland
3fd75d1bcd scan: use full md filter when md 1.0 devices are present
The md filter can operate in two native modes:
- normal: reads only the start of each device
- full: reads both the start and end of each device

md 1.0 devices place the superblock at the end of the device,
so components of this version will only be identified and
excluded when lvm uses the full md filter.

Previously, the full md filter was only used in commands
that could write to the device.  Now, the full md filter
is also applied when there is an md 1.0 device present
on the system.  This means the 'pvs' command can avoid
displaying md 1.0 components (at the cost of doubling
the i/o to every device on the system.)

(The md filter can operate in a third mode, using udev,
but this is disabled by default because there have been
problems with reliability of the info returned from udev.)
2018-06-15 12:21:25 -05:00
David Teigland
8eab37593e Add cmd arg to more functions
so that it can be used in the filter code
2018-06-15 11:03:55 -05:00
Joe Thornber
d5da55ed85 device_mapper: remove dbg_malloc.
I wrote dbg_malloc before we had valgrind.  These days there's just
no need.
2018-06-08 13:40:53 +01:00
Joe Thornber
286c1ba336 device_mapper: rename libdevmapper.h -> all.h
I'm paranoid a file will include the global one in /usr/include
by accident.
2018-06-08 12:31:45 +01:00
David Teigland
1539e51721 devices: clean up io error messages
Remove the io error message from bcache.c since it is not
very useful without the device path.

Make the io error messages from dev_read_bytes/dev_write_bytes
more user friendly.
2018-06-07 16:17:04 +01:00
Joe Thornber
dbba1e9b93 Merge branch 'master' into 2018-05-11-fork-libdm 2018-06-01 13:04:12 +01:00
Joe Thornber
d4d39d0f90 Merge branch 'master' into 2018-05-30-bcache-radix-tree 2018-05-31 16:36:04 +01:00
David Teigland
6d14d5d16b scan: removed failed paths for devices
Drop a device path when the scan fails to open it.
2018-05-30 09:05:18 -05:00
Joe Thornber
7635df8cce bcache: switch to storing blocks in a radix tree.
Rather than a hash table.  This will make invalidate_fd() more
efficient since we can iterate just those blocks that are on
a particular dev.
2018-05-30 14:17:26 +01:00
David Teigland
28c8e95d19 scan: refresh paths and retry open
If scanning fails to open any devices, refresh the
device paths in dev cache, and retry the opens.
2018-05-25 13:09:07 -05:00
David Teigland
3c9ed33f83 scan: move warnings about duplicate devices
We have been warning about duplicate devices (and disabling lvmetad)
immediately when the dup was detected (during label_scan).  Move the
warnings (and the disabling) to happen later, after label_scan is
finished.

This lets us avoid an unwanted warning message about duplicates
in the special case where md components are eliminated during the
duplicate device resolution.
2018-05-21 16:48:02 -05:00
Joe Thornber
5052970da3 bcache: Don't call sysconf for every io 2018-05-17 10:05:10 +01:00
Alex Bennée
c6ca81a38d bcache: don't use PAGE_SIZE compile const
PAGE_SIZE is not a compile time constant. Use sysconf instead like
elsewhere in the code.

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
2018-05-17 10:38:16 +02:00
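Taken together, the two bcache commits above amount to "query the page size at run time, but only once". A minimal sketch of that pattern, assuming a hypothetical helper name (the real bcache code may store the value differently):

```c
#include <unistd.h>

/*
 * Look up the page size at run time and remember it, so that
 * sysconf() is called once rather than on every io.
 */
static long cached_page_size(void)
{
	static long page_size;	/* 0 until the first call */

	if (!page_size)
		page_size = sysconf(_SC_PAGESIZE);
	return page_size;
}
```

The static variable is not protected against concurrent first calls; that is acceptable here only because every caller would store the same value.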
Joe Thornber
89fdc0b588 Merge branch 'master' into 2018-05-11-fork-libdm 2018-05-16 13:43:02 +01:00
Joe Thornber
ccc35e2647 device-mapper: Fork libdm internally.
The device-mapper directory now holds a copy of libdm source.  At
the moment this code is identical to libdm.  Over time code will
migrate out to appropriate places (see doc/refactoring.txt).

The libdm directory still exists, and contains the source for the
libdevmapper shared library, which we will continue to ship (though
not necessarily update).

All code using libdm should now use the version in device-mapper.
2018-05-16 13:00:50 +01:00
Joe Thornber
e296f784c9 Merge branch 'master' of git://sourceware.org/git/lvm2 2018-05-16 10:11:58 +01:00
Joe Thornber
df2acbbb97 bcache: nr_ios_pending wasn't being incremented
... but it was being decremented on completion.  Which meant
it wrapped, and no prefetches were ever issued after the
first completion.
2018-05-16 10:09:17 +01:00
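The wrap described in the commit above is ordinary unsigned underflow. The sketch below uses a hypothetical struct and function names, not the real bcache internals, to show how one unmatched decrement closes the prefetch gate:

```c
#include <stdbool.h>

/* Hypothetical stand-in for the cache's io accounting. */
struct io_stats {
	unsigned nr_ios_pending;
	unsigned max_ios;
};

/* Called when an io is submitted (the step that was missing). */
static void io_issued(struct io_stats *s)
{
	s->nr_ios_pending++;
}

/* Called when an io completes. */
static void io_completed(struct io_stats *s)
{
	s->nr_ios_pending--;	/* wraps to UINT_MAX if the counter is 0 */
}

/* Prefetches are only issued while there is room for more io. */
static bool can_prefetch(const struct io_stats *s)
{
	return s->nr_ios_pending < s->max_ios;
}
```

With the increment in place the counter returns to zero after each io; without it, the first completion leaves the counter at its maximum value and `can_prefetch()` stays false.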
Joe Thornber
7f97c7ea9a build: Don't generate symlinks in include/ dir
As we start refactoring the code to break dependencies (see doc/refactoring.txt),
I want us to use full paths in the includes (eg, #include "base/data-struct/list.h").
This makes it more obvious when we're breaking abstraction boundaries, eg, including a file in
metadata/ from base/
2018-05-14 10:30:20 +01:00
Zdenek Kabelac
ac768a9d2b bcache: do not use libdm header files
Logging for libdm differs from lvm logging - keep using consistent
logging function calls.
2018-05-12 18:18:23 +02:00
David Teigland
09fcc8eaa8 scan: ignore duplicates that are md component devs
md devices using an older superblock version have
superblocks at the end of the md device.  For commands
that skip reading the end of devices during filtering,
the md component devs will be scanned, and will appear
as duplicate PVs to the original md device.  Remove
these md components from the list of unused duplicate
devices, so they are treated as if they had been
ignored during filtering.  This avoids the restrictions
that are placed on using PVs with duplicates.
2018-05-11 15:52:22 -05:00
David Teigland
73578e36fa dev_cache: remove the lvmcache check when closing fd
This is no longer used since devices are not held
open in dev_cache.
2018-05-11 14:30:10 -05:00
David Teigland
3e3cb22f2a dev_cache: fix close in utility functions
All these functions are now used as utilities,
e.g. for ioctl (not for io), and need to
open/close the device each time they are called.
(Many of the opens can probably be eliminated by
just using the bcache fd for the ioctl.)
2018-05-11 14:25:08 -05:00
David Teigland
b5d9914628 devs: recognize md devices in subsystem check
If md components appear as duplicate PVs, let the
existing subsystem check recognize the md device.
2018-05-11 14:00:19 -05:00
David Teigland
ccab54677c dev_cache: fix close in dev_get_block_size 2018-05-11 13:53:19 -05:00
David Teigland
bbb8040456 dev_cache: drop open_list
devices are now held open only in bcache,
so drop the dev_cache list of open devices
which is unused.
2018-05-11 12:47:56 -05:00
Joe Thornber
3b02b35c3e Merge branch 'master' of git+ssh://sourceware.org/git/lvm2 2018-05-11 05:39:27 +01:00
Joe Thornber
5f780813f2 bcache/sync io engine: handle short ios 2018-05-11 05:37:47 +01:00
David Teigland
57bb46c5e7 filter: use bcache for filter reads
Filters are still applied before any device reading or
the label scan, but any filter checks that want to read
the device are skipped and the device is flagged.

After bcache is populated, but before lvm looks for
devices (i.e. before label scan), the filters are
reapplied to the devices that were flagged above.
The filters will then find the data they need in
bcache.
2018-05-10 16:03:19 -05:00
Joe Thornber
ae50374811 bcache: Add sync io engine
Something to fall back to when testing.
2018-05-10 14:29:26 +01:00
Joe Thornber
67b80e2d9d bcache: knock out err param.
Dave used this for debugging.  Not needed in general.
2018-05-10 13:26:08 +01:00
Joe Thornber
1c5c99afce bcache-utils: bcache_set_bytes() 2018-05-09 11:05:29 +01:00
Joe Thornber
dfc320f5b8 bcache-utils: rewrite
They take care to avoid redundant reads now.
2018-05-03 11:36:29 +01:00
Joe Thornber
2688aafefb bcache: rename bcache_write_zeroes() -> bcache_zero_bytes()
Now matches the other util functions:

bcache_{prefetch,read,write,zero}_bytes()
2018-05-03 10:21:14 +01:00
Joe Thornber
8b755f1e04 bcache: rewrite bcache_write_zeros()
It now uses GF_ZERO to avoid reading blocks that are going to be
completely zeroed.
2018-05-03 10:14:56 +01:00
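The saving described in the commit above comes from separating blocks that are entirely covered by the zeroed range from the partial blocks at either end. A rough sketch of that split, with hypothetical names (the real code works through the bcache block interface and its GF_ZERO flag):

```c
#include <stdint.h>

struct zero_plan {
	uint64_t first_full;	/* index of the first fully covered block */
	uint64_t nr_full;	/* number of fully covered blocks */
};

/*
 * Split the byte range [start, start + len) into blocks that are
 * completely covered (no read needed before zeroing) and, implicitly,
 * partial blocks at the head and tail that must be read first.
 */
static struct zero_plan plan_zero(uint64_t start, uint64_t len,
				  uint64_t block_size)
{
	struct zero_plan p;
	uint64_t end = start + len;
	uint64_t first = (start + block_size - 1) / block_size; /* round up */
	uint64_t last = end / block_size;                       /* round down */

	p.first_full = first;
	p.nr_full = last > first ? last - first : 0;
	return p;
}
```

For example, with 4096-byte blocks, zeroing 12288 bytes starting at offset 100 fully covers blocks 1 and 2; only blocks 0 and 3 need their existing contents.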
Joe Thornber
dc30d4b2f2 bcache: switch off_t -> uint64_t
We always want it to be 64bit
2018-05-03 09:37:43 +01:00
Joe Thornber
efad84ebc2 bcache: Move the utils to a separate file.
This makes it clearer that they don't access the cache internals.
2018-05-03 09:34:41 +01:00
Joe Thornber
b3c41bce3d bcache: add bcache_block_sectors() query fn 2018-05-03 09:33:55 +01:00
Joe Thornber
65912ce44d bcache: add a comment 2018-05-03 09:21:10 +01:00
Joe Thornber
90d0ff6636 bcache: reorder includes in .c file too 2018-05-02 19:45:06 +01:00
Joe Thornber
8fd300f7df device/bcache: reorder includes 2018-05-02 18:59:43 +01:00
David Teigland
24e7745d7a devices: ignore lvm1 and pool devices 2018-05-01 15:18:47 -05:00
Joe Thornber
bfc61a9543 bcache: squash some warnings on rhel6 2018-05-01 13:21:53 +01:00
Joe Thornber
f564e78d98 bcache: rewrite bcache_{write,zero}_bytes
These are utility functions so should only use the public interface.

Also write_bytes was flushing, which will kill performance.
2018-05-01 12:07:33 +01:00
Joe Thornber
e890c37704 [bcache] Some work on bcache_invalidate()
bcache_invalidate() now returns a bool to indicate success.  It fails
if the block is currently held, or the block is dirty and writeback
fails.

Added a bunch of unit tests for the invalidate functions.

Fixed some bugs to do with invalidating errored blocks.
2018-04-27 10:56:13 +01:00
Joe Thornber
1c97fda425 [bcache] get all unit tests passing again 2018-04-26 13:13:27 +01:00
Zdenek Kabelac
fcdac700f9 gcc: remove duplicate typedef 2018-04-23 22:42:18 +02:00
David Teigland
c0973e70a5 dev_cache: clean up scan
Pull out all of the twisted logic and simply call dev_cache_scan
at the start of the command prior to label scan.
2018-04-20 11:22:48 -05:00
David Teigland
6d05859862 bcache: let caller see an error 2018-04-20 11:22:48 -05:00
David Teigland
570c6239ee bcache: fix error handling
The error handling code wasn't working, but it
appears that just removing it is what we need.
The caller doesn't really need any different behavior
related to bcache blocks on an io error, it just
wants to know if there was an error.
2018-04-20 11:22:47 -05:00
David Teigland
4331182964 bcache: add some error messages for debugging 2018-04-20 11:22:47 -05:00
David Teigland
e49b114f7e bcache: use wrappers for bcache read write in lvm
Using a wrapper makes it easier to disable bcache if needed.
2018-04-20 11:22:47 -05:00
David Teigland
8065492046 bcache: do all writes through bcache 2018-04-20 11:22:47 -05:00
David Teigland
8b26a007b1 misc bcache fixes from ejt 2018-04-20 11:22:47 -05:00
David Teigland
6c67c7557c scan: use separate fd for bcache
Create a new dev->bcache_fd that the scanning code owns
and is in charge of opening/closing.  This prevents other
parts of lvm code (which do various open/close) from
interfering with the bcache fd.  A number of dev_open
and dev_close are removed from the reading path since
the read path now uses the bcache.

With that in place, open(O_EXCL) for pvcreate/pvremove
can then be fixed.  That wouldn't work previously because
of other open fds.
2018-04-20 11:22:46 -05:00
David Teigland
a7cb76ae94 scan: use bcache for label scan and vg read
New label_scan function populates bcache for each device
on the system.

The two read paths are updated to get data from bcache.

The bcache is not yet used for writing.  bcache blocks
for a device are invalidated when the device is written.
2018-04-20 11:19:24 -05:00
David Teigland
93fc937429 [device/bcache] bcache_read_bytes should put blocks 2018-04-20 11:12:50 -05:00
David Teigland
7be54bd687 [device/bcache] fix min() function 2018-04-20 11:12:50 -05:00
David Teigland
d9e6298edb [device/bcache] fix missing max_io fn in bcache async engine 2018-04-20 11:12:50 -05:00
Joe Thornber
dc8034f5eb [device/bcache] more work on bcache 2018-04-20 11:12:50 -05:00
Joe Thornber
6a57ed17a2 [device/bcache] add bcache_prefetch_bytes() and bcache_read_bytes()
Not tested yet.
2018-04-20 11:12:50 -05:00
Joe Thornber
467adfa082 [device/bcache] More tests and some bug fixes 2018-04-20 11:12:50 -05:00
Joe Thornber
19647d1cd4 [device/bcache] fix bug in _alloc_block 2018-04-20 11:12:50 -05:00
Joe Thornber
1563b93691 [device/bcache] Add bcache_max_prefetches()
Ignore prefetches if max io is in flight.
2018-04-20 11:12:50 -05:00
Joe Thornber
c4c4acfd42 [device/bcache] Add a couple of invalidate methods 2018-04-20 11:12:50 -05:00
Joe Thornber
0f0eb04edb [device/bcache] some more work on bcache 2018-04-20 11:12:50 -05:00
Joe Thornber
46867a45d2 [device/bcache] stub a unit test 2018-04-20 11:12:50 -05:00
Joe Thornber
da7e13ef88 [lib/device/bcache] Tweaks after Kabi's review 2018-04-20 11:10:45 -05:00
Joe Thornber
acb42ec465 [device/bcache] Initial code drop.
Compiles.  No tests written yet.
2018-04-20 11:10:45 -05:00
Joe Thornber
00f1b208a1 [io paths] Unpick agk's aio stuff 2018-04-20 11:03:58 -05:00
Zdenek Kabelac
e878c3fc32 cleanup: correct casting 2018-04-20 12:17:01 +02:00
Zdenek Kabelac
f2d0eefa77 coverity: make use of defined variable
Since we declare 'r', let's use the value for something.
2018-03-17 23:33:58 +01:00
Zdenek Kabelac
285413b502 cleanup: missing dots and indent 2018-03-15 11:01:04 +01:00
Zdenek Kabelac
70ad633638 devcache: add reason and always log_error
With these read errors it's useful to know the reason.
Also avoid logging the error only once, so we know exactly
how many times a read failed.

On the other hand reduce repeated log_error() on code 'backtrace'
path and change severity of message to just log_debug() so the
actual read error is printed once for one read.
2018-03-15 10:50:28 +01:00
Zdenek Kabelac
b6e7a0b490 cleanup: more usage of dm_strncpy
Use existing wrapper function around strncpy + buf[] = 0;
2018-03-06 15:40:34 +01:00
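The "strncpy + buf[] = 0" idiom that the commit above replaces can be folded into one helper. The sketch below is a hypothetical equivalent written for illustration; it is not the libdm implementation of dm_strncpy, and its return convention is an assumption.

```c
#include <string.h>

/*
 * Copy src into dest (capacity n bytes) and always NUL-terminate.
 * Hypothetical helper: returns 1 if src fit completely, 0 if it
 * was truncated or n is 0.
 */
static int copy_terminated(char *dest, const char *src, size_t n)
{
	if (!n)
		return 0;
	strncpy(dest, src, n - 1);
	dest[n - 1] = '\0';	/* strncpy does not terminate on truncation */
	return strlen(src) < n;
}
```

Keeping the termination inside the helper removes the chance that a call site copies the string but forgets the trailing `buf[...] = 0`.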
Zdenek Kabelac
6b48868cf0 io: keep 64b arithmetic
Widen to 64b arithmetic from start.
2018-02-28 21:05:18 +01:00
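Why widening "from start" matters can be shown with a sector-to-byte conversion. This is a generic illustration with hypothetical function names, not the code the commit above touched:

```c
#include <stdint.h>

#define SECTOR_SHIFT 9	/* 512-byte sectors */

/* Narrow: the shift is evaluated in 32 bits and only then widened. */
static uint64_t byte_offset_narrow(uint32_t sector)
{
	return sector << SECTOR_SHIFT;
}

/* Wide: the operand is widened first, so no high bits are lost. */
static uint64_t byte_offset_wide(uint32_t sector)
{
	return (uint64_t)sector << SECTOR_SHIFT;
}
```

For sector 8388608 (the 4 GiB boundary) the narrow version yields 0 while the wide version yields 4294967296, even though both return a 64-bit type.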
Alasdair G Kergon
d6cabbbc53 device: Fix basic async I/O error handling 2018-02-08 20:19:21 +00:00
Alasdair G Kergon
3e29c80122 device: Queue any aio beyond defined limits. 2018-02-08 20:15:37 +00:00
Alasdair G Kergon
db41fe6c5d lvmcache: Use asynchronous I/O when scanning devices. 2018-02-08 20:15:29 +00:00
Alasdair G Kergon
8c7bbcfb0f device: Basic config and setup to support async I/O. 2018-02-08 20:15:14 +00:00
Alasdair G Kergon
7a9af3cd0e device: Add flag to indicate that a code path can support AIO
Until the whole source supports AIO, library code can check for
AIO_SUPPORTED_CODE_PATH to determine whether or not it is OK
to use AIO.
2018-02-06 01:11:00 +00:00
Alasdair G Kergon
e869a52cc4 callbacks: Miscellaneous fixes for recent changes 2018-02-06 01:09:39 +00:00
Zdenek Kabelac
a1cfef9f26 dev_io: fix writes for unaligned buffers
Actually the removed code is necessary, since not all writes get
an aligned buffer. Older compilers seem unable to create
4K-aligned buffers on the stack, so the aligning code still
needs to be present for the write path.
2018-01-23 13:36:12 +01:00
Zdenek Kabelac
6e9148e7ab debug: drop DEBUG_MEM path
Memory is not allocated so no DEBUG_MEM part is needed.
2018-01-23 11:45:18 +01:00
Alasdair G Kergon
9194610f42 device: Add ioflags parameter to transfer additional state.
Flags are set on the initial I/O and passed to any callbacks that
may in turn issue further I/O using the inherited flags.
2018-01-21 21:10:23 +00:00
Alasdair G Kergon
c26458339e device: Move buffer allocation nearer to the I/O.
Don't allocate memory until it's needed - later we'll add
some of the I/O to an internal queue instead of issuing it
immediately.
2018-01-16 01:12:08 +00:00
Alasdair G Kergon
081902b4c1 device: Merge _dev_read and dev_read_callback. 2018-01-16 00:41:42 +00:00
Alasdair G Kergon
b825987b2f device: Rearrange _aligned_io(). 2018-01-15 20:10:54 +00:00
Alasdair G Kergon
c90582344d device: Add reason to devbuf. 2018-01-15 19:38:18 +00:00
Alasdair G Kergon
1f01eaa612 device: Store offset to data instead of pointer.
We want to save the relative offset before we've allocated the
buffer's memory.
2018-01-15 19:32:59 +00:00
Alasdair G Kergon
61d3296f2a device: Reorder device.h before change. 2018-01-15 19:24:01 +00:00
Alasdair G Kergon
6210c1ec28 device: Mark read-only device buffers const. 2018-01-10 19:57:10 +00:00
Alasdair G Kergon
c350f96c09 device: Eliminate unnecessary buffer from dev_read. 2018-01-10 18:48:01 +00:00
Alasdair G Kergon
366493a1d1 device: Suppress repeated reads of the same data.
If the data being requested is present in last_[extra_]devbuf,
return that directly instead of reading it from disk again.

Typical LVM2 access patterns request data within two adjacent 4k blocks
so we eliminate some read() system calls by always reading at least 8k.
2018-01-10 15:52:03 +00:00
Alasdair G Kergon
dcb2a5a611 device: Remove some data copying between buffers.
Callers that read larger amounts of data now get a pointer to read-only
data directly without copying it through an intermediate buffer.  This
data is owned by the device layer so the callers no longer free it.
2018-01-10 15:48:03 +00:00