Commit Graph

2767 Commits

Author SHA1 Message Date
Zdenek Kabelac
d60d59a5f3 cleanup: use unsigned type 2019-05-03 13:17:22 +02:00
David Teigland
8c87dda195 locking: unify global lock for flock and lockd
There have been two file locks used to protect lvm
"global state": "ORPHANS" and "GLOBAL".

Commands that used the ORPHAN flock in exclusive mode:
  pvcreate, pvremove, vgcreate, vgextend, vgremove,
  vgcfgrestore

Commands that used the ORPHAN flock in shared mode:
  vgimportclone, pvs, pvscan, pvresize, pvmove,
  pvdisplay, pvchange, fullreport

Commands that used the GLOBAL flock in exclusive mode:
  pvchange, pvscan, vgimportclone, vgscan

Commands that used the GLOBAL flock in shared mode:
  pvscan --cache, pvs

The ORPHAN lock covers the important cases of serializing
the use of orphan PVs.  It also partially covers the
reporting of orphan PVs (although not correctly as
explained below.)

The GLOBAL lock doesn't seem to have a clear purpose
(it may have eroded over time.)

Neither lock correctly protects the VG namespace, or
orphan PV properties.

To simplify and correct these issues, the two separate
flocks are combined into the one GLOBAL flock, and this flock
is used from the locking sites that are in place for the
lvmlockd global lock.

The logic behind the lvmlockd (distributed) global lock is
that any command that changes "global state" needs to take
the global lock in ex mode.  Global state in lvm is: the list
of VG names, the set of orphan PVs, and any properties of
orphan PVs.  Reading this global state can use the global lock
in sh mode to ensure it doesn't change while being reported.

The locking of global state now looks like:

lockd_global()
  previously named lockd_gl(), acquires the distributed
  global lock through lvmlockd.  This is unchanged.
  It serializes distributed lvm commands that are changing
  global state.  This is a no-op when lvmlockd is not in use.

lockf_global()
  acquires an flock on a local file.  It serializes local lvm
  commands that are changing global state.

lock_global()
  first calls lockf_global() to acquire the local flock for
  global state, and if this succeeds, it calls lockd_global()
  to acquire the distributed lock for global state.

Replace instances of lockd_gl() with lock_global(), so that the
existing sites for lvmlockd global state locking are now also
used for local file locking of global state.  Remove the previous
file locking calls lock_vol(GLOBAL) and lock_vol(ORPHAN).

The following commands which change global state are now
serialized with the exclusive global flock:

pvchange (of orphan), pvresize (of orphan), pvcreate, pvremove,
vgcreate, vgextend, vgremove, vgreduce, vgrename,
vgcfgrestore, vgimportclone, vgmerge, vgsplit

Commands that use a shared flock to read global state (and will
be serialized against the prior list) are those that use
process_each functions that are based on processing a list of
all VG names, or all PVs.  The list of all VGs or all PVs is
global state and the shared lock prevents those lists from
changing while the command is processing them.

The ORPHAN lock previously attempted to produce an accurate
listing of orphan PVs, but it was only acquired at the end of
the command during the fake vg_read of the fake orphan vg.
This is not when orphan PVs were determined; they were
determined by elimination beforehand by processing all real
VGs, and subtracting the PVs in the real VGs from the list
of all PVs that had been identified during the initial scan.
This is fixed by holding the single global lock in shared mode
while processing all VGs to determine the list of orphan PVs.
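
As an illustrative sketch (device and VG names are examples, not from
this commit), two commands that change global state now serialize on
the same exclusive flock instead of racing:

$ vgcreate vgA /dev/sdb /dev/sdc &
$ pvcreate /dev/sdd    # waits for the global flock held by vgcreate

while reporting commands such as 'pvs' take the same flock shared, so
the lists of VG names and orphan PVs cannot change mid-report.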
2019-04-29 13:01:05 -05:00
David Teigland
ccd1386070 wipe_lv: initially open LV in writable mode
wipe_lv knows it's going to write the device, so it
can open rw from the start.  It was opening readonly,
and then dev_write needed to reopen it readwrite.
2019-04-26 14:49:27 -05:00
David Teigland
c33770c02d lvmlockd: do not allow mirror LV to be activated shared
This reverts 518a8e8cfb
  "lvmlockd: activate mirror LVs in shared mode with cmirrord"

because while activating a mirror LV with cmirrord worked,
changes to the active cmirror did not work.
2019-04-04 13:21:38 -05:00
Zdenek Kabelac
fcec6691f0 thin: fix maintenance of _pmspare
When metadata grows, lvm2 may also need to extend the _pmspare volume.
2019-04-03 13:28:54 +02:00
Zdenek Kabelac
e27d027155 thin: resize metadata with data
When data is growing, also adapt the size of metadata.
We get far too many reports from users who grow the data portion
hugely while keeping metadata small and avoiding monitoring.

So to improve the user experience, when a user requests growth of a
thin-pool (without passing a PV list for the growth), lvm2 will now
automatically grow the metadata part of the thin-pool as well (if possible).
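
Illustrative usage (names and sizes are examples):

$ lvextend -L+100G vg/pool
$ lvs -o name,size,metadata_size vg/pool   # metadata grows along when needed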
2019-04-03 13:28:22 +02:00
Zdenek Kabelac
7c3de2fd93 thin: introduce estimate_thin_pool_metadata_size
Add a function to estimate thin-pool metadata size for a given size of
data. The function uses the already existing internal API, so it can
be reused when resizing thin-pool data.
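
As a rough rule of thumb (our approximation, not the exact formula in
the code), thin metadata costs about 64 bytes per data chunk:

$ echo $(( 100 * 2**30 / (64 * 2**10) * 64 / 2**20 ))MiB  # 100GiB data, 64KiB chunks
100MiB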
2019-04-03 13:27:17 +02:00
David Teigland
85e68a8333 lvextend: refresh shared LV remotely using dlm/corosync
When lvextend extends an LV that is active with a shared
lock, use this as a signal that other hosts may also have
the LV active, with gfs2 mounted, and should have the LV
refreshed to reflect the new size.  Use the libdlmcontrol
run api, which uses dlm_controld/corosync to run an
lvchange --refresh command on other cluster nodes.
2019-03-21 12:38:20 -05:00
David Teigland
d369de8399 lvextend: allow on LV active with a shared lock
Detect when a shared lock exists, don't require the
normal exclusive lock, and allow the lvextend.
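
Illustrative usage (VG/LV names are examples):

$ lvchange -asy vg/gfs2lv     # LV active with a shared lvmlockd lock
$ lvextend -L+10G vg/gfs2lv   # now allowed; other nodes get a refresh
                              # via the companion dlm/corosync change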
2019-03-21 12:38:20 -05:00
Zdenek Kabelac
677aa84be3 vdo: enable caching for vdopool LV and vdo LV
Allow using caching with VDO.
The user can cache either a single vdopool or a vdo LV;
where the caching layer is inserted depends on the use case,
and it's up to the user to decide which kind of speedup is expected.
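
Illustrative usage showing both placements (LV names are examples):

$ lvconvert --type cache --cachevol fast vg/vdopool0   # cache the vdopool
$ lvconvert --type cache --cachevol fast vg/vdolv      # cache the vdo LV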
2019-03-20 14:38:31 +01:00
Zdenek Kabelac
0db22c5f81 lv_manip: insert remove layer skips pools
Fix renaming of subLVs when removing and inserting layers; this
became visible when using stacked VDO pools.
2019-03-20 14:38:05 +01:00
Zdenek Kabelac
1cc690e911 thin: max thin 2019-03-20 14:37:44 +01:00
David Teigland
4e20ebd6a1 pvscan: ignore online for shared and foreign PVs
Activation would not be allowed anyway, but we can
check for these cases early and avoid wasted time in
pvscan managing online files and attempting activation.
2019-03-05 15:19:05 -06:00
David Teigland
a9eaab6beb Use "cachevol" to refer to cache on a single LV
and "cachepool" to refer to a cache on a cache pool object.

The problem was that the --cachepool option was being used
to refer to both a cache pool object, and to a standard LV
used for caching.  This could be somewhat confusing, and it
made it less clear when each kind would be used.  By
separating them, it's clear when a cachepool or a cachevol
should be used.

Previously:

- lvm would use the cache pool approach when the user passed
  a cache-pool LV to the --cachepool option.

- lvm would use the cache vol approach when the user passed
  a standard LV in the --cachepool option.

Now:

- lvm will always use the cache pool approach when the user
  uses the --cachepool option.

- lvm will always use the cache vol approach when the user
  uses the --cachevol option.
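
Illustrative usage of the two now-distinct options (LV names are examples):

$ lvconvert --type cache --cachepool fastpool vg/main   # cache-pool object
$ lvconvert --type cache --cachevol fastlv vg/main      # standard LV as cache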
2019-02-27 08:52:34 -06:00
Zdenek Kabelac
d19e372795 cleanup: indent 2019-01-28 22:39:10 +01:00
Zdenek Kabelac
78dd9d820d thin: select chunk size as power of 2
Whenever the thin-pool chunk size is unspecified and left for lvm to
calculate, try to select the size as the nearest higher power of 2
instead of just a multiple of 64KiB.
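
Illustrative check (names and sizes are examples):

$ lvcreate --type thin-pool -L10G -n pool vg
$ lvs -o name,chunk_size vg/pool   # now e.g. 256.00k rather than 192.00k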
2019-01-28 22:17:25 +01:00
Zdenek Kabelac
58ad831c72 cache: select chunk size as power of 2
When the cache chunk size is not configured and left for lvm to deduce,
select a value that is a power of 2.
2019-01-28 22:17:14 +01:00
Zdenek Kabelac
105a8edea1 lv_manip: better work with PERCENT_VG modifier with lvresize
Fix recent commit 022ebb0cfe.
Resize already has an existing size that needs to be accounted for,
otherwise an upsizing operation could turn into a size reduction.
2019-01-21 15:39:24 +01:00
Zdenek Kabelac
f3c52a515b vdo: enable dmeventd resize 2019-01-21 12:53:16 +01:00
Zdenek Kabelac
a16d914d34 cleanup: better naming 2019-01-21 12:53:16 +01:00
Zdenek Kabelac
08cabe9b83 vdo: allow resize of VDO and VDO pool volumes
Now, with the newer VDO kvdo target, we can start to use the standard
mechanism to enable resize of VDO volumes.

The VDO pool can be grown.

The virtual volume grows on top of the VDO pool when it is not big
enough. Reducing a VDO LV calls discard for the reduced areas - this
can take a long time!
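
Illustrative usage (names and sizes are examples):

$ lvextend -L+10G vg/vdopool0   # grow the VDO pool
$ lvextend -L+20G vg/vdolv      # grow the virtual volume on top of it
$ lvreduce -L-5G vg/vdolv       # discards the reduced area; can take long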

TODO: implement some pollable mechanism for out-of-lock TRIM.
2019-01-21 12:53:16 +01:00
Zdenek Kabelac
bd6709cec6 vdo: size reduction requires VDO to be active
To be able to send discards to the reduced areas, the VDO LV needs to
be active.
2019-01-21 12:53:16 +01:00
Zdenek Kabelac
f1ad4b0679 vdo: discard reduced area
Implement sending discards to the reduced LV area.
2019-01-21 12:53:16 +01:00
Zdenek Kabelac
ca72d19691 vdo: estimate virtual size after resize 2019-01-21 12:53:16 +01:00
Zdenek Kabelac
ab031d673d vdo: introduce function for estimation of virtual size 2019-01-21 12:53:16 +01:00
Zdenek Kabelac
022ebb0cfe lv_manip: better work with PERCENT_VG modifier
When using 'lvcreate -l100%VG' and there is a big disproportion between
the really available space and the requested setting, automatically
fall back to 100%FREE.

The difference can be seen when the VG is big and most of its space is
already allocated, so requesting 100%VG can end up (and per the spec
for the % modifier this is correct) as an LV with a size of 1%VG.
Usually this is not a big problem - but in some cases, like cache-pool
allocation, this can result in a big difference for chunksize selection.

With this patch the behavior more closely matches common-sense logic,
without the need for reiteration of too-big changes in the lvm2 core ATM.
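
Illustrative example (names are ours) on a VG with most space already
allocated:

$ lvcreate -l100%VG -n pool vg   # used to yield only the leftover ~1%VG;
                                 # now falls back to 100%FREE behavior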

TODO: in the future there should be allocator solving all allocations
in a single call.
2019-01-21 12:53:15 +01:00
Zdenek Kabelac
26ead4bf45 cov: extent_size cannot be 0
Make this obvious to coverity.
2018-12-21 21:45:08 +01:00
Zdenek Kabelac
9dfb1a11b7 cov: drop unneeded header file
MAX macro no longer needed in pe_align.
2018-12-21 21:45:08 +01:00
Zdenek Kabelac
3320ab8334 lib: move towards v2 version of VDO format
Drop the very old original format of the VDO target and focus on the V2
version, so some variables were renamed or replaced.
No compatibility is preserved (on the assumption that so far this is an
experimental feature and there are no real users).

Note - VDO currently calls this version 6.2.
2018-12-20 13:26:55 +01:00
Heinz Mauelshagen
e82303fd6a lvcreate/lvconvert: optionally reenable mirrored mirror log for testing purposes only
This is a followup patch to commit edb72cb70c
to support related lvm2 test suite tests.

A 'global/support_mirrored_mirror_log' bool configuration variable gets
introduced allowing the creation of, or conversion to mirrored 'mirror'
logs if set.  The capability to create these in turn allows the rest of
the tests to perform activation of such existing LVs and their conversions
to disk/core 'mirror' logs.

Display a disclaimer warning if enabled that this is not for regular use.

Add definition of the enabled config option to respective test scripts.

Related: rhbz1643562
2018-12-17 19:28:54 +01:00
Ming-Hung Tsai
859feb81e5 lvmanip: uninitialized members in struct pv_list (#10)
Scenario: Given an existing LV `lvol0`, I want to create another LV
on the PVs used by `lvol0`.

I use `build_parallel_areas_from_lv()` to obtain the `pv_list` of each
segment. However, the returned `pv_list` is not properly initialized,
which causes a segfault in subsequent operations.
2018-12-14 15:23:18 +01:00
Heinz Mauelshagen
dd5716ddf2 raid: fix (de)activation of RaidLVs with visible SubLVs
There's a small window during creation of a new RaidLV when
rmeta SubLVs are made visible to wipe them in order to prevent
erroneous discovery of stale RAID metadata.  In case a crash
prevents the SubLVs from being committed hidden after such
wiping, the RaidLV can still be activated with the SubLVs visible.
During deactivation though, a deadlock occurs because the visible
SubLVs are deactivated before the RaidLV.

The patch adds _check_raid_sublvs to the raid validation in merge.c,
an activation check to activate.c (paranoid, because the merge.c check
will prevent activation in case of visible SubLVs) and shares the
existing wiping function _clear_lvs in raid_manip.c moved to lv_manip.c
and renamed to activate_and_wipe_lvlist to remove code duplication.
Whilst on it, introduce activate_and_wipe_lv to share with
(lvconvert|lvchange).c.

Resolves: rhbz1633167
2018-12-11 16:35:34 +01:00
Heinz Mauelshagen
edb72cb70c lvcreate/lvconvert: prohibit creation of/conversion to mirrored mirror logs
In RHEL7 we marked mirrored mirror logs as deprecated and
added a related message.  This patch prohibits creating new
'mirror' LVs with that log type or converting existing LVs
to have one.

Existing LVs with mirrored mirror log can be activated
and converted to disk/core logs.

Avoid double deprecation message when running lvconvert.

Resolves: rhbz1643562
2018-12-08 02:52:50 +01:00
David Teigland
904e1e3d26 Place the first PE at 1 MiB for all defaults
. When using default settings, this commit should change
  nothing.  The first PE continues to be placed at 1 MiB
  resulting in a metadata area size of 1020 KiB (for
  4K page sizes; slightly smaller for larger page sizes.)

. When default_data_alignment is disabled in lvm.conf,
  align pe_start at 1 MiB, based on a default metadata area
  size that adapts to the page size.  Previously, disabling
  this option would result in mda_size that was too small
  for common use, and produced a 64 KiB aligned pe_start.

. Customized pe_start and mda_size values continue to be
  set as before in lvm.conf and command line.

. Remove the configure option for setting default_data_alignment
  at build time.

. Improve alignment related option descriptions.

. Add section about alignment to pvcreate man page.

Previously, DEFAULT_PVMETADATASIZE was 255 sectors.
However, the fact that the config setting named
"default_data_alignment" has a default value of 1 (MiB)
meant that DEFAULT_PVMETADATASIZE was having no effect.

The metadata area size is the space between the start of
the metadata area (page size offset from the start of the
device) and the first PE (1 MiB by default due to
default_data_alignment 1.)  The result is a 1020 KiB metadata
area on machines with 4KiB page size (1024 KiB - 4 KiB),
and smaller on machines with larger page size.

If default_data_alignment was set to 0 (disabled), then
DEFAULT_PVMETADATASIZE 255 would take effect, and produce a
metadata area that was 188 KiB and pe_start of 192 KiB.
This was too small for common use.

This is fixed by making the default metadata area size a
computed value that matches the value produced by
default_data_alignment.
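
Illustrative check of the resulting default layout on a machine with
a 4KiB page size (device name is an example):

$ pvcreate /dev/sdb
$ pvs -o name,pe_start,mda_size /dev/sdb
  PV         1st PE  PMdaSize
  /dev/sdb   1.00m   1020.00k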
2018-11-26 16:36:50 -06:00
David Teigland
3ae5569570 Add dm-writecache support
dm-writecache is used like dm-cache with a standard LV
as the cache.

$ lvcreate -n main -L 128M -an foo /dev/loop0

$ lvcreate -n fast -L 32M -an foo /dev/pmem0

$ lvconvert --type writecache --cachepool fast foo/main

$ lvs -a foo -o+devices
  LV            VG  Attr       LSize   Origin        Devices
  [fast]        foo -wi-------  32.00m               /dev/pmem0(0)
  main          foo Cwi------- 128.00m [main_wcorig] main_wcorig(0)
  [main_wcorig] foo -wi------- 128.00m               /dev/loop0(0)

$ lvchange -ay foo/main

$ dmsetup table
foo-main_wcorig: 0 262144 linear 7:0 2048
foo-main: 0 262144 writecache p 253:4 253:3 4096 0
foo-fast: 0 65536 linear 259:0 2048

$ lvchange -an foo/main

$ lvconvert --splitcache foo/main

$ lvs -a foo -o+devices
  LV   VG  Attr       LSize   Devices
  fast foo -wi-------  32.00m /dev/pmem0(0)
  main foo -wi------- 128.00m /dev/loop0(0)
2018-11-06 14:18:41 -06:00
David Teigland
cac4a9743a Allow dm-cache cache device to be standard LV
If a single, standard LV is specified as the cache, use
it directly instead of converting it into a cache-pool
object with two separate LVs (for data and metadata).

With a single LV as the cache, lvm will use blocks at the
beginning for metadata, and the rest for data.  Separate
dm linear devices are set up to point at the metadata and
data areas of the LV.  These dm devs are given to the
dm-cache target to use.

The single LV cache cannot be resized without recreating it.

If the --poolmetadata option is used to specify an LV for
metadata, then a cache pool will be created (with separate
LVs for data and metadata.)

Usage:

$ lvcreate -n main -L 128M vg /dev/loop0

$ lvcreate -n fast -L 64M vg /dev/loop1

$ lvs -a vg
  LV   VG Attr       LSize   Type   Devices
  main vg -wi-a----- 128.00m linear /dev/loop0(0)
  fast vg -wi-a-----  64.00m linear /dev/loop1(0)

$ lvconvert --type cache --cachepool fast vg/main

$ lvs -a vg
  LV           VG Attr       LSize   Origin       Pool  Type   Devices
  [fast]       vg Cwi---C---  64.00m                     linear /dev/loop1(0)
  main         vg Cwi---C--- 128.00m [main_corig] [fast] cache  main_corig(0)
  [main_corig] vg owi---C--- 128.00m                     linear /dev/loop0(0)

$ lvchange -ay vg/main

$ dmsetup ls
vg-fast_cdata   (253:4)
vg-fast_cmeta   (253:5)
vg-main_corig   (253:6)
vg-main (253:24)
vg-fast (253:3)

$ dmsetup table
vg-fast_cdata: 0 98304 linear 253:3 32768
vg-fast_cmeta: 0 32768 linear 253:3 0
vg-main_corig: 0 262144 linear 7:0 2048
vg-main: 0 262144 cache 253:5 253:4 253:6 128 2 metadata2 writethrough mq 0
vg-fast: 0 131072 linear 7:1 2048

$ lvchange -an vg/main

$ lvconvert --splitcache vg/main

$ lvs -a vg
  LV   VG Attr       LSize   Type   Devices
  fast vg -wi-------  64.00m linear /dev/loop1(0)
  main vg -wi------- 128.00m linear /dev/loop0(0)
2018-11-06 13:44:54 -06:00
David Teigland
a686391eca cache: reorganize cache_set_policy
to prepare for future addition
2018-11-06 11:36:29 -06:00
David Teigland
23948e99b3 cache: improve error message about flush 2018-11-06 11:36:29 -06:00
David Teigland
3e547fa952 cache: improve warning message about cached thin data 2018-11-06 11:36:28 -06:00
David Teigland
e26dacf30a cache: factor getting cache mode
so part can be called separately
2018-11-06 11:36:28 -06:00
David Teigland
8d7075528f cache: add cache_mode_num_to_str
Requires only string and number, no specific lv/seg type.
2018-11-06 11:36:28 -06:00
Zdenek Kabelac
70e3d0a613 cov: remove unused assigns 2018-11-05 17:25:11 +01:00
David Teigland
aecf542126 metadata: prevent writing beyond metadata area
lvm uses a bcache block size of 128K.  A bcache block
at the end of the metadata area will overlap the PEs
from which LVs are allocated.  How much depends on
alignments.  When lvm reads and writes one of these
bcache blocks to update VG metadata, it can also be
reading and writing PEs that belong to an LV.

If these overlapping PEs are being written to by the
LV user (e.g. filesystem) at the same time that lvm
is modifying VG metadata in the overlapping bcache
block, then the user's updates to the PEs can be lost.

This patch is a quick hack to prevent lvm from writing
past the end of the metadata area.
2018-10-29 16:53:17 -05:00
Heinz Mauelshagen
8df2dd66ce Revert "raid: fix left behind SubLVs"
This reverts commit 16ae968d24.

We need to come up with a better fix, because we fall short
wiping all known signatures when not using the wipe_lv API.
2018-10-25 14:35:56 +02:00
Heinz Mauelshagen
16ae968d24 raid: fix left behind SubLVs
lvm metadata writes, commits and activations are performed
for (newly) allocated RAID metadata SubLVs to wipe any preexisting
data, thus avoiding false raid superblock positives on RaidLV activation.

This process can be interrupted by command or system crashes,
thus leaving stale SubLVs in the lvm metadata as a problem.

Because we hold an exclusive lock in this metadata SubLV wiping
process, we can address this problem by avoiding the aforementioned
commits/writes/activations altogether, wiping the respective first
sector of the first physical extent allocated to any metadata SubLV
directly via the existing dev_set() API.

Succeeds all LVM RAID tests.

Related: rhbz1633167
2018-10-24 16:35:30 +02:00
Zdenek Kabelac
fdd76da33d cov: drop unneeded header files 2018-10-15 17:49:44 +02:00
Zdenek Kabelac
253989ecd9 cov: fix error path
Avoid jumping to the 'bad:' section, since we have not set 'fd' yet;
instead directly return the failing 0 value.
2018-10-15 17:49:44 +02:00
Zdenek Kabelac
fbfbbf6d6a cov: drop check for pointer
The pointer must always be set, and it has already been dereferenced.
2018-10-15 14:24:28 +02:00
Heinz Mauelshagen
989626926c lvconvert: allow raid4 -> linear conversion request
Allow "lvconvert --type linear RaidLV" on a raid4 LV
providing convenient interim steps to convert to linear.

Add respective new test
   lvconvert-raid-takeover-raid4_to_linear.sh
and
   lvconvert-raid-takeover-linear_to_raid4.sh
for linear to raid4 once on it.
2018-09-10 18:43:21 +02:00
Heinz Mauelshagen
e2e30a64ab lvconvert: fix interim segtype regression on raid6 conversions
When converting from striped/raid0/raid0_meta
to raid6 with > 2 stripes, allow possible
direct conversion (to raid6_n_6).

In case of 2 stripes, first convert to raid5_n to restripe
to at least 3 data stripes (the raid6 minimum in lvm2) in
a second conversion before finally converting to raid6_n_6.

As before, raid6_n_6 then can be converted
to any other raid6 layout.

Enhance lvconvert-raid-takeover.sh to test the
2 stripes conversions to raid6.

Resolves: rhbz1624038
2018-09-07 13:48:19 +02:00