2001-09-25 12:49:28 +00:00
/*
2008-01-30 14:00:02 +00:00
 * Copyright (C) 2001-2004 Sistina Software, Inc. All rights reserved.
2012-02-23 00:11:01 +00:00
 * Copyright (C) 2004-2012 Red Hat, Inc. All rights reserved.
2001-09-25 12:49:28 +00:00
 *
2004-03-30 19:35:44 +00:00
 * This file is part of LVM2.
 *
 * This copyrighted material is made available to anyone wishing to use,
 * modify, copy, or redistribute it subject to the terms and conditions
2007-08-20 20:55:30 +00:00
 * of the GNU Lesser General Public License v.2.1.
2004-03-30 19:35:44 +00:00
 *
2007-08-20 20:55:30 +00:00
 * You should have received a copy of the GNU Lesser General Public License
2004-03-30 19:35:44 +00:00
 * along with this program; if not, write to the Free Software Foundation,
2016-01-21 11:49:46 +01:00
 * Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
2001-09-25 12:49:28 +00:00
 */
2018-05-14 10:30:20 +01:00
#include "lib/misc/lib.h"
#include "lib/device/device.h"
#include "lib/metadata/metadata.h"
#include "lib/commands/toolcontext.h"
#include "lib/misc/lvm-string.h"
#include "lib/misc/lvm-file.h"
#include "lib/cache/lvmcache.h"
#include "lib/mm/memlock.h"
#include "lib/datastruct/str_list.h"
#include "lib/metadata/pv_alloc.h"
#include "lib/metadata/segtype.h"
#include "lib/activate/activate.h"
#include "lib/display/display.h"
#include "lib/locking/locking.h"
#include "lib/format_text/archiver.h"
2019-02-06 12:32:26 -06:00
#include "lib/format_text/format-text.h"
#include "lib/format_text/layout.h"
#include "lib/format_text/import-export.h"
2018-05-14 10:30:20 +01:00
#include "lib/config/defaults.h"
#include "lib/locking/lvmlockd.h"
#include "lib/notify/lvmnotify.h"
2024-10-15 15:28:10 +02:00
#include "base/data-struct/radix-tree.h"
2001-09-25 12:49:28 +00:00
2019-02-06 12:32:26 -06:00
#include <time.h>
2010-07-05 22:23:15 +00:00
#include <math.h>
2006-08-17 19:53:36 +00:00
2008-01-30 14:00:02 +00:00
static struct physical_volume *_pv_read(struct cmd_context *cmd,
2017-11-06 12:09:52 -06:00
					const struct format_type *fmt,
					struct volume_group *vg,
					struct lvmcache_info *info);
2007-06-11 18:29:30 +00:00
improve reading and repairing vg metadata
The fact that vg repair is implemented as a part of vg read
has led to a messy and complicated implementation of vg_read,
and limited and uncontrolled repair capability. This splits
read and repair apart.
Summary
-------
- take all kinds of various repairs out of vg_read
- vg_read no longer writes anything
- vg_read now simply reads and returns vg metadata
- vg_read ignores bad or old copies of metadata
- vg_read proceeds with a single good copy of metadata
- improve error checks and handling when reading
- keep track of bad (corrupt) copies of metadata in lvmcache
- keep track of old (seqno) copies of metadata in lvmcache
- keep track of outdated PVs in lvmcache
- vg_write will do basic repairs
- new command vgck --updatemetadata will do all repairs
Details
-------
- In scan, do not delete dev from lvmcache if reading/processing fails;
the dev is still present, and removing it makes it look like the dev
is not there. Records are now kept about the problems with each PV
so they can be fixed/repaired in the appropriate places.
- In scan, record a bad mda on failure, and delete the mda from the
in-use mda list so it will not be used by vg_read or vg_write,
only by repair.
- In scan, succeed if any good mda on a device is found, instead of
failing if any is bad. The bad/old copies of metadata should not
interfere with normal usage while good copies can be used.
- In scan, add a record of old mdas in lvmcache for later, do not repair
them while reading, and do not let them prevent us from finding and
using a good copy of metadata from elsewhere. One result is that
"inconsistent metadata" is no longer a read error, but instead a
record in lvmcache that can be addressed separately from the read.
- Treat a dev with no good mdas like a dev with no mdas, which is an
existing case we already handle.
- Don't use a fake vg "handle" for returning an error from vg_read,
or the vg_read_error function for getting that error number;
just return null if the vg cannot be read or used, and an error_flags
arg with flags set for the specific kind of error (which can be used
later for determining the kind of repair.)
- Saving an original copy of the vg metadata, for purposes of reverting
a write, is now done explicitly in vg_read instead of being hidden in
the vg_make_handle function.
- When a vg is not accessible due to "access restrictions" but is
otherwise fine, return the vg through the new error_vg arg so that
process_each_pv can skip the PVs in the VG while processing.
(This is a temporary accommodation for the way process_each_pv
tracks which devs have been looked at, and can be dropped later
when the process_each_pv dev tracking implementation is changed.)
- vg_read does not try to fix or recover a vg, but now just reads the
metadata, checks access restrictions and returns it.
(Checking access restrictions might be better done outside of vg_read,
but this is a later improvement.)
- _vg_read now simply makes one attempt to read metadata from
each mda, and uses the most recent copy to return to the caller
in the form of a 'vg' struct.
(bad mdas were excluded during the scan and are not retried)
(old mdas were not excluded during scan and are retried here)
- vg_read uses _vg_read to get the latest copy of metadata from mdas,
and then makes various checks against it to produce warnings,
and to check if VG access is allowed (access restrictions include:
writable, foreign, shared, clustered, missing pvs).
- Things that were previously silently/automatically written by vg_read
that are now done by vg_write, based on the records made in lvmcache
during the scan and read:
. clearing the missing flag
. updating old copies of metadata
. clearing outdated pvs
. updating pv header flags
- Bad/corrupt metadata can now be repaired (by vgck --updatemetadata); previously it was not.
Test changes
------------
- A read command no longer writes the VG to repair it, so add a write
command to do a repair.
(inconsistent-metadata, unlost-pv)
- When a missing PV is removed from a VG, and then the device is
enabled again, vgck --updatemetadata is needed to clear the
outdated PV before it can be used again, where it wasn't before.
(lvconvert-repair-policy, lvconvert-repair-raid, lvconvert-repair,
mirror-vgreduce-removemissing, pv-ext-flags, unlost-pv)
Reading bad/old metadata
------------------------
- "bad metadata": the mda_header or metadata text has invalid fields
or can't be parsed by lvm. This is a form of corruption that would
not be caused by known failure scenarios. A checksum error is
typically included among the errors reported.
- "old metadata": a valid copy of the metadata that has a smaller seqno
than other copies of the metadata. This can happen if the device
failed, or io failed, or lvm failed while committing new metadata
to all the metadata areas. Old metadata on a PV that has been
removed from the VG is the "outdated" case below.
When a VG has some PVs with bad/old metadata, lvm can simply ignore
the bad/old copies, and use a good copy. This is why there are
multiple copies of the metadata -- so it's available even when some
of the copies cannot be used. The bad/old copies do not have to be
repaired before the VG can be used (the repair can happen later.)
A PV with no good copies of the metadata simply falls back to being
treated like a PV with no mdas; a common and harmless configuration.
When bad/old metadata exists, lvm warns the user about it, and
suggests repairing it using a new metadata repair command.
Bad metadata in particular is something that users will want to
investigate and repair themselves, since it should not happen and
may indicate some other problem that needs to be fixed.
PVs with bad/old metadata are not the same as missing devices.
Missing devices will block various kinds of VG modification or
activation, but bad/old metadata will not.
Previously, lvm would attempt to repair bad/old metadata whenever
it was read. This was unnecessary since lvm does not require every
copy of the metadata to be used. It would also hide potential
problems that should be investigated by the user. It was also
dangerous in cases where the VG was on shared storage. The user
is now allowed to investigate potential problems and decide how
and when to repair them.
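To make the bad/old distinction concrete, here is a minimal C sketch. The `struct mda_copy` type, its fields and `classify_mdas()` are hypothetical and do not mirror lvmcache's real structures; the sketch only assumes what the text states: a copy with a parse or checksum error is bad, a valid copy whose seqno is smaller than the newest valid copy is old, and the newest valid copy is the one used.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-copy state; not an lvmcache type. */
enum mda_state { MDA_GOOD, MDA_BAD, MDA_OLD };

struct mda_copy {
	int parse_ok;         /* header and metadata text parsed cleanly */
	int checksum_ok;      /* stored checksum matched the computed one */
	uint32_t seqno;       /* sequence number (meaningful only if not bad) */
	enum mda_state state; /* output of classify_mdas() */
};

/*
 * Mark each copy GOOD, BAD or OLD and return the index of the copy to
 * use (highest seqno among non-bad copies), or -1 if none is usable.
 */
static int classify_mdas(struct mda_copy *copies, size_t n)
{
	uint32_t max_seqno = 0;
	int best = -1;
	size_t i;

	assert(copies != NULL || n == 0);

	/* Pass 1: bad copies are set aside; track the newest valid one. */
	for (i = 0; i < n; i++) {
		if (!copies[i].parse_ok || !copies[i].checksum_ok) {
			copies[i].state = MDA_BAD;
			continue;
		}
		copies[i].state = MDA_GOOD;
		if (best < 0 || copies[i].seqno > max_seqno) {
			max_seqno = copies[i].seqno;
			best = (int)i;
		}
	}

	/* Pass 2: valid copies behind the newest seqno are "old". */
	for (i = 0; i < n; i++)
		if (copies[i].state == MDA_GOOD && copies[i].seqno < max_seqno)
			copies[i].state = MDA_OLD;

	return best;
}
```

A return value of -1 corresponds to the "PV with no good copies" case, which the text says is treated like a PV with no mdas.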
Repairing bad/old metadata
--------------------------
When label scan sees bad metadata in an mda, that mda is removed
from the lvmcache info->mdas list. This means that vg_read will
skip it, and not attempt to read/process it again. If it was
the only in-use mda on a PV, that PV is treated like a PV with
no mdas. It also means that vg_write will skip the bad mda,
and not attempt to write new metadata to it. The only way to
repair bad metadata is with the metadata repair command.
When label scan sees old metadata in an mda, that mda is kept
in the lvmcache info->mdas list. This means that vg_read will
read/process it again, and likely see the same mismatch with
the other copies of the metadata. Like label_scan,
vg_read will simply ignore the old copy of the metadata and
use the latest copy. If the command is modifying the vg
(e.g. lvcreate), then vg_write, which writes new metadata to
every mda on info->mdas, will write the new metadata to the
mda that had the old version. If successful, this will resolve
the old metadata problem (without needing to run a metadata
repair command.)
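The two repair paths above can be sketched in C under the simplifying assumption that each mda reduces to a "bad" bit, an "in use" bit and an on-disk seqno. `struct mda`, `scan_drop_bad()` and `write_new_metadata()` are hypothetical names, not LVM functions; the point is only that a bad mda drops off the in-use list (and so is never rewritten), while an old mda stays on it and is overwritten by the next write.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for one entry on a PV's info->mdas list. */
struct mda {
	int bad;        /* scan found a corrupt header or metadata text */
	int in_use;     /* still on the in-use list used by read and write */
	uint32_t seqno; /* seqno of the metadata currently on disk */
};

/* Scan step: bad mdas leave the in-use list; only a repair command touches them. */
static void scan_drop_bad(struct mda *m, size_t n)
{
	for (size_t i = 0; i < n; i++)
		m[i].in_use = !m[i].bad;
}

/* Write step: new metadata goes to every in-use mda, old ones included. */
static size_t write_new_metadata(struct mda *m, size_t n, uint32_t new_seqno)
{
	size_t written = 0;

	assert(new_seqno > 0);
	for (size_t i = 0; i < n; i++) {
		if (!m[i].in_use)
			continue; /* bad mda: skipped, left for the repair command */
		m[i].seqno = new_seqno;
		written++;
	}
	return written;
}
```

After one successful write, the old copy (seqno 8 in the test) carries the new seqno, which is how the old-metadata case resolves without a repair command.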
Outdated PVs
------------
An outdated PV is a PV that has an old copy of VG metadata
that shows it is a member of the VG, but the latest copy of
the VG metadata does not include this PV. This happens if
the PV is disconnected, vgreduce --removemissing is run to
remove the PV from the VG, then the PV is reconnected.
In this case, the outdated PV needs to have its outdated metadata
removed and the PV used flag needs to be cleared. This repair
will be done by the subsequent repair command. It is also done
if vgremove is run on the VG.
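The outdated-PV condition is a simple membership test. The C sketch below is illustrative only: PV ids are plain strings here and `pv_is_outdated()` is a hypothetical helper, not an LVM function. It assumes the PV's own (old) metadata names a VG, and that the latest VG metadata provides the list of member PV ids.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical: PV ids as plain strings (LVM uses its own id type). */
static int pv_in_list(const char *pvid, const char *const *pvids, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (strcmp(pvids[i], pvid) == 0)
			return 1;
	return 0;
}

/*
 * A PV is "outdated" when the old metadata it carries claims membership
 * in the VG, but the latest VG metadata no longer lists it.
 */
static int pv_is_outdated(const char *pvid, const char *pv_vgname,
			  const char *vgname, const char *const *latest_pvids,
			  size_t n_latest)
{
	assert(pvid && vgname);
	if (!pv_vgname || strcmp(pv_vgname, vgname) != 0)
		return 0; /* the PV does not claim to belong to this VG */
	return !pv_in_list(pvid, latest_pvids, n_latest);
}
```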
MISSING PVs
-----------
When a device is missing, most commands will refuse to modify
the VG. This is the simple case. More complicated is when
a command is allowed to modify the VG while it is missing a
device.
When a VG is written while a device is missing for one of its PVs,
the VG metadata is written to disk with the MISSING flag on the PV
with the missing device. When the VG is next used, it is treated
as if the PV with the MISSING flag still has a missing device, even
if that device has reappeared.
If all LVs that were using a PV with the MISSING flag are removed
or repaired so that the MISSING PV is no longer used, then the
next time the VG metadata is written, the MISSING flag will be
dropped.
Alternative methods of clearing the MISSING flag are:
vgreduce --removemissing will remove PVs with missing devices,
or PVs with the MISSING flag where the device has reappeared.
vgextend --restoremissing will clear the MISSING flag on PVs
where the device has reappeared, allowing the VG to be used
normally. This must be done with caution since the reappeared
device may have old data that is inconsistent with data on other PVs.
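The MISSING flag rules above can be modelled as a small state update. This C sketch uses a hypothetical `struct pv_state` and helper names; it also assumes, which the text does not state, that the flag is only dropped once the device is present again. It shows that a reappeared device alone does not clear the flag, but a write after the PV's last LV user is gone does.

```c
#include <assert.h>

/* Hypothetical simplified PV state; not an LVM structure. */
struct pv_state {
	int missing_flag;   /* MISSING flag recorded in the VG metadata */
	int device_present; /* the device is visible right now */
	unsigned lv_users;  /* LV segments still allocated on this PV */
};

/* Treated as missing if the device is absent OR the metadata has the flag. */
static int pv_treated_as_missing(const struct pv_state *pv)
{
	return pv->missing_flag || !pv->device_present;
}

/* On VG write: flag absent devices; drop the flag once no LV uses the PV
 * (assumption: only after the device has reappeared). */
static void update_missing_flag_on_write(struct pv_state *pv)
{
	assert(pv);
	if (!pv->device_present)
		pv->missing_flag = 1;
	else if (pv->missing_flag && pv->lv_users == 0)
		pv->missing_flag = 0;
}
```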
Bad mda repair
--------------
The new command:
vgck --updatemetadata VG
first uses vg_write to repair the basic issues mentioned
above (old metadata, outdated PVs, pv_header flags,
MISSING_PV flags). It will also go further and repair
bad metadata:
. text metadata that has a bad checksum
. text metadata that is not parsable
. corrupt mda_header checksum and version fields
(To keep a clean diff, #if 0 is added around functions that
are replaced by new code. These commented functions are
removed by the following commit.)
2019-05-24 12:04:37 -05:00
static int _check_pv_ext(struct cmd_context *cmd, struct volume_group *vg)
{
	struct lvmcache_info *info;
	uint32_t ext_version, ext_flags;
	struct pv_list *pvl;

	if (vg_is_foreign(vg))
		return 1;

	if (vg_is_shared(vg))
		return 1;

	dm_list_iterate_items(pvl, &vg->pvs) {
		if (is_missing_pv(pvl->pv))
			continue;

		/* is_missing_pv doesn't catch NULL dev */
		if (!pvl->pv->dev)
			continue;

		if (!(info = lvmcache_info_from_pvid(pvl->pv->dev->pvid, pvl->pv->dev, 0)))
			continue;

		ext_version = lvmcache_ext_version(info);
		if (ext_version < PV_HEADER_EXTENSION_VSN) {
			log_warn("WARNING: PV %s in VG %s is using an old PV header, modify the VG to update.",
				 dev_name(pvl->pv->dev), vg->name);
			continue;
		}

		ext_flags = lvmcache_ext_flags(info);
		if (!(ext_flags & PV_EXT_USED)) {
			log_warn("WARNING: PV %s in VG %s is missing the used flag in PV header.",
				 dev_name(pvl->pv->dev), vg->name);
		}
	}

	return 1;
}
Place the first PE at 1 MiB for all defaults
. When using default settings, this commit should change
nothing. The first PE continues to be placed at 1 MiB
resulting in a metadata area size of 1020 KiB (for
4K page sizes; slightly smaller for larger page sizes.)
. When default_data_alignment is disabled in lvm.conf,
align pe_start at 1 MiB, based on a default metadata area
size that adapts to the page size. Previously, disabling
this option would result in an mda_size that was too small
for common use, and produced a 64 KiB aligned pe_start.
. Customized pe_start and mda_size values continue to be
set as before in lvm.conf and command line.
. Remove the configure option for setting default_data_alignment
at build time.
. Improve alignment related option descriptions.
. Add section about alignment to pvcreate man page.
Previously, DEFAULT_PVMETADATASIZE was 255 sectors.
However, the fact that the config setting named
"default_data_alignment" has a default value of 1 (MiB)
meant that DEFAULT_PVMETADATASIZE had no effect.
The metadata area size is the space between the start of
the metadata area (page size offset from the start of the
device) and the first PE (1 MiB by default due to
default_data_alignment 1.) The result is a 1020 KiB metadata
area on machines with 4KiB page size (1024 KiB - 4 KiB),
and smaller on machines with larger page size.
If default_data_alignment was set to 0 (disabled), then
DEFAULT_PVMETADATASIZE 255 would take effect, and produce a
metadata area that was 188 KiB and pe_start of 192 KiB.
This was too small for common use.
This is fixed by making the default metadata area size a
computed value that matches the value produced by
default_data_alignment.
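The arithmetic in this message can be checked with a few lines of C (sizes in 512-byte sectors). The helper names are illustrative only; the sketch assumes, as the text says, that the mda starts one page into the device, that the old path requested 255 sectors and rounded pe_start up to 64 KiB (128 sectors), and that the new default fixes pe_start at 1 MiB (2048 sectors).

```c
#include <assert.h>
#include <stdint.h>

#define SECTOR_SIZE 512u

/* Round x up to the next multiple of align (align must be > 0). */
static uint64_t round_up(uint64_t x, uint64_t align)
{
	assert(align > 0);
	return (x + align - 1) / align * align;
}

/* Old behaviour with default_data_alignment disabled: pe_start in sectors. */
static uint64_t old_pe_start_sectors(uint64_t page_size)
{
	uint64_t mda_start = page_size / SECTOR_SIZE;

	return round_up(mda_start + 255, 128);
}

/* New default: the mda fills the gap between the first page and 1 MiB. */
static uint64_t new_mda_size_sectors(uint64_t page_size)
{
	return 2048 - page_size / SECTOR_SIZE;
}
```

With a 4096-byte page, 8 + 255 = 263 sectors rounds up to 384 sectors (192 KiB), leaving 376 sectors (188 KiB) of metadata area; the new default gives 2040 sectors (1020 KiB).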
2018-11-13 15:00:11 -06:00
/*
 * Historically, DEFAULT_PVMETADATASIZE was 255 sectors for many years,
 * but that value was only used if default_data_alignment was
 * disabled. Using DEFAULT_PVMETADATASIZE 255, pe_start was
 * rounded up to 192 KB from aligning it with 64K
 * (DEFAULT_PE_ALIGN_OLD 128 sectors). Given a 4 KB mda_start
 * and 192 KB pe_start, the mda_size between the two was 188 KB.
 * This metadata area size was too small to be a good default,
 * and disabling default_data_alignment, with no other change,
 * does not imply that the default mda_size or pe_start should
 * change.
 */
int get_default_pvmetadatasize_sectors(void)
2010-08-12 04:11:48 +00:00
{
2018-11-13 15:00:11 -06:00
	int pagesize = lvm_getpagesize();

	/*
	 * This returns the default size of the metadata area in units of
	 * 512 byte sectors.
	 *
	 * We want the default pe_start to consistently be 1 MiB (1024 KiB),
	 * (even if default_data_alignment is disabled.)
	 *
	 * The mda start is at pagesize offset from the start of the device.
	 *
	 * The metadata size is the space between mda start and pe_start.
	 *
	 * So, if we set the default metadata size to 1024 KiB - <pagesize> KiB,
	 * it will consistently produce pe_start of 1 MiB.
	 *
	 * pe_start 1024 KiB = 2048 sectors.
	 *
	 * pagesizes:
	 * 4096 = 8 sectors.
	 * 8192 = 16 sectors.
2023-04-07 09:05:07 -05:00
	 * 16384 = 32 sectors.
2018-11-13 15:00:11 -06:00
	 * 65536 = 128 sectors.
	 */
	switch (pagesize) {
	case 4096:
		return 2040;
	case 8192:
		return 2032;
2023-04-07 09:05:07 -05:00
	case 16384:
		return 2016;
2018-11-13 15:00:11 -06:00
	case 65536:
		return 1920;
	}

	log_warn("Using metadata size 960 KiB for non-standard page size %d.", pagesize);
	return 1920;
2010-08-12 04:11:48 +00:00
}
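The alignment choice made by set_pe_align, as far as the portion shown in this file goes, can be summarized as a pure function. This is a sketch, not LVM code: `choose_pe_align()` is a hypothetical name and it ignores the early return when pv->pe_align is already set. An explicitly requested alignment wins; otherwise default_data_alignment N (MiB) gives N * 1 MiB; a value of 0 (disabled) falls back to 1 MiB.

```c
#include <assert.h>
#include <stdint.h>

#define FIRST_PE_AT_ONE_MB_IN_SECTORS 2048 /* 2048 * 512 = 1048576 */

/* Returns the PE alignment in 512-byte sectors. */
static uint64_t choose_pe_align(uint64_t requested_sectors,
				uint64_t default_data_alignment_mb)
{
	/* 1. An explicitly requested alignment is always used. */
	if (requested_sectors)
		return requested_sectors;

	/* 2. default_data_alignment N places the first PE at N * 1 MiB. */
	if (default_data_alignment_mb) {
		assert(default_data_alignment_mb <= UINT64_MAX / FIRST_PE_AT_ONE_MB_IN_SECTORS);
		return default_data_alignment_mb * FIRST_PE_AT_ONE_MB_IN_SECTORS;
	}

	/* 3. Disabled (0): still place the first PE at 1 MiB. */
	return FIRST_PE_AT_ONE_MB_IN_SECTORS;
}
```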
2018-11-13 15:00:11 -06:00
#define FIRST_PE_AT_ONE_MB_IN_SECTORS 2048 /* 2048 * 512 = 1048576 */

void set_pe_align(struct physical_volume *pv, uint64_t data_alignment_sectors)
2006-08-17 19:30:59 +00:00
{
2018-11-13 15:00:11 -06:00
	uint64_t default_data_alignment_mb;
	uint64_t pe_align_sectors;
	uint64_t temp_pe_align_sectors;
	uint32_t page_size_sectors;
2010-08-12 04:11:48 +00:00
2008-09-19 05:19:09 +00:00
	if (pv->pe_align)
		goto out;
2008-09-19 04:28:58 +00:00
2018-11-13 15:00:11 -06:00
	if (data_alignment_sectors) {
		/* Always use specified alignment */
		log_debug("Requested PE alignment is %llu sectors", (unsigned long long) data_alignment_sectors);
		pe_align_sectors = data_alignment_sectors;
		pv->pe_align = data_alignment_sectors;
2010-08-20 20:59:05 +00:00
		goto out;
	}
2018-11-13 15:00:11 -06:00
	/*
	 * By default the first PE is placed at 1 MiB.
	 *
	 * If default_data_alignment is 2, then the first PE
	 * is placed at 2 * 1 MiB.
	 *
	 * If default_data_alignment is 3, then the first PE
	 * is placed at 3 * 1 MiB.
	 */
2010-08-20 20:59:05 +00:00
2018-11-13 15:00:11 -06:00
	default_data_alignment_mb = find_config_tree_int(pv->fmt->cmd, devices_default_data_alignment_CFG, NULL);
	if (default_data_alignment_mb)
		pe_align_sectors = default_data_alignment_mb * FIRST_PE_AT_ONE_MB_IN_SECTORS;
2009-10-06 16:00:38 +00:00
	else
2018-11-13 15:00:11 -06:00
pe_align_sectors = FIRST_PE_AT_ONE_MB_IN_SECTORS ;
pv - > pe_align = pe_align_sectors ;
log_debug ( " Standard PE alignment is %llu sectors " , ( unsigned long long ) pe_align_sectors ) ;
2010-08-20 20:59:05 +00:00
2018-11-13 15:00:11 -06:00
page_size_sectors = lvm_getpagesize ( ) > > SECTOR_SHIFT ;
if ( page_size_sectors > pe_align_sectors ) {
/* This shouldn't happen */
log_debug ( " Increasing PE alignment to page size %u sectors " , page_size_sectors ) ;
pe_align_sectors = page_size_sectors ;
pv - > pe_align = page_size_sectors ;
}
2008-09-19 05:19:09 +00:00
2008-10-03 14:22:18 +00:00
if ( ! pv - > dev )
goto out ;
2009-02-22 19:00:26 +00:00
/*
2009-07-06 19:04:24 +00:00
* Align to stripe - width of underlying md device if present
2009-02-22 19:00:26 +00:00
*/
2013-06-25 12:31:53 +02:00
if ( find_config_tree_bool ( pv - > fmt - > cmd , devices_md_chunk_alignment_CFG , NULL ) ) {
2018-11-13 15:00:11 -06:00
temp_pe_align_sectors = dev_md_stripe_width ( pv - > fmt - > cmd - > dev_types , pv - > dev ) ;
if ( temp_pe_align_sectors & & ( pe_align_sectors % temp_pe_align_sectors ) ) {
log_debug ( " Adjusting PE alignment from %llu sectors to md stripe width %llu sectors for %s " ,
( unsigned long long ) pe_align_sectors ,
( unsigned long long ) temp_pe_align_sectors ,
dev_name ( pv - > dev ) ) ;
pe_align_sectors = temp_pe_align_sectors ;
pv - > pe_align = temp_pe_align_sectors ;
}
2010-08-12 04:11:48 +00:00
}
2008-09-19 05:19:09 +00:00
2009-08-01 17:08:43 +00:00
/*
* Align to topology ' s minimum_io_size or optimal_io_size if present
* - minimum_io_size - the smallest request the device can perform
* w / o incurring a read - modify - write penalty ( e . g . MD ' s chunk size )
* - optimal_io_size - the device ' s preferred unit of receiving I / O
* ( e . g . MD ' s stripe width )
*/
2013-06-25 12:31:53 +02:00
if ( find_config_tree_bool ( pv - > fmt - > cmd , devices_data_alignment_detection_CFG , NULL ) ) {
2018-11-13 15:00:11 -06:00
temp_pe_align_sectors = dev_minimum_io_size ( pv - > fmt - > cmd - > dev_types , pv - > dev ) ;
if ( temp_pe_align_sectors & & ( pe_align_sectors % temp_pe_align_sectors ) ) {
2024-08-29 23:06:04 +02:00
log_debug ( " Adjusting PE alignment from %llu sectors to minimum io size %llu sectors for %s " ,
2018-11-13 15:00:11 -06:00
( unsigned long long ) pe_align_sectors ,
( unsigned long long ) temp_pe_align_sectors ,
dev_name ( pv - > dev ) ) ;
pe_align_sectors = temp_pe_align_sectors ;
pv - > pe_align = temp_pe_align_sectors ;
}
temp_pe_align_sectors = dev_optimal_io_size ( pv - > fmt - > cmd - > dev_types , pv - > dev ) ;
2009-08-01 17:08:43 +00:00
2018-11-13 15:00:11 -06:00
if ( temp_pe_align_sectors & & ( pe_align_sectors % temp_pe_align_sectors ) ) {
log_debug ( " Adjusting PE alignment from %llu sectors to optimal io size %llu sectors for %s " ,
( unsigned long long ) pe_align_sectors ,
( unsigned long long ) temp_pe_align_sectors ,
dev_name ( pv - > dev ) ) ;
pe_align_sectors = temp_pe_align_sectors ;
pv - > pe_align = temp_pe_align_sectors ;
}
2009-08-01 17:08:43 +00:00
}
2010-08-20 20:59:05 +00:00
out :
2018-11-13 15:00:11 -06:00
log_debug ( " Setting PE alignment to %llu sectors for %s. " ,
( unsigned long long ) pv - > pe_align , dev_name ( pv - > dev ) ) ;
2006-08-17 19:30:59 +00:00
}
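The PE alignment rules above can be modelled as a small pure function. This is a simplified sketch, not the LVM2 code: the `devices_md_chunk_alignment_CFG` and `devices_data_alignment_detection_CFG` lookups are folded in by passing 0 for any hint that is disabled or not reported, the early exit for a missing device is omitted, and `FIRST_PE_AT_ONE_MB_IN_SECTORS` is assumed to equal 2048 (1 MiB in 512-byte sectors). All `sketch_*` names are hypothetical. The second helper reproduces the commit message arithmetic: with the metadata area starting one page into the device and the first PE at 1 MiB, a 4 KiB page leaves 1024 - 4 = 1020 KiB.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed value of FIRST_PE_AT_ONE_MB_IN_SECTORS: 1 MiB / 512 bytes. */
#define SKETCH_ONE_MIB_IN_SECTORS 2048ULL

/* If the current alignment is not a multiple of a device hint, switch to
 * the hint. A hint of 0 means "not reported" (or detection disabled). */
static uint64_t sketch_adjust_to_hint(uint64_t align, uint64_t hint)
{
	if (hint && (align % hint))
		return hint;
	return align;
}

/* Hypothetical model of the default PE alignment, in sectors.
 * default_mb:   devices/default_data_alignment (0 = disabled)
 * page_sectors: system page size in sectors
 * md_stripe, min_io, opt_io: device hints in sectors, 0 if absent */
uint64_t sketch_pe_align(uint64_t default_mb, uint64_t page_sectors,
			 uint64_t md_stripe, uint64_t min_io, uint64_t opt_io)
{
	/* Both the enabled default and the disabled case start at 1 MiB. */
	uint64_t align = default_mb ? default_mb * SKETCH_ONE_MIB_IN_SECTORS
				    : SKETCH_ONE_MIB_IN_SECTORS;

	/* Never align to less than a page. */
	if (page_sectors > align)
		align = page_sectors;

	/* Each later check sees the result of the earlier ones. */
	align = sketch_adjust_to_hint(align, md_stripe);
	align = sketch_adjust_to_hint(align, min_io);
	align = sketch_adjust_to_hint(align, opt_io);
	return align;
}

/* Metadata area size in KiB when the area starts one page into the
 * device and the first PE sits at pe_start_kib. */
uint64_t sketch_mda_size_kib(uint64_t pe_start_kib, uint64_t page_kib)
{
	assert(pe_start_kib > page_kib);
	return pe_start_kib - page_kib;
}
```

Note that a hint which already divides the current alignment (for example a 64 KiB md stripe against 1 MiB) leaves it unchanged; only a non-divisor replaces it.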
2018-11-13 15:00:11 -06:00
void set_pe_align_offset ( struct physical_volume * pv , uint64_t data_alignment_offset_sectors )
2009-07-30 17:45:28 +00:00
{
if ( pv - > pe_align_offset )
goto out ;
2018-11-13 15:00:11 -06:00
if ( data_alignment_offset_sectors ) {
2010-08-20 20:59:05 +00:00
/* Always use specified data_alignment_offset */
2018-11-13 15:00:11 -06:00
pv - > pe_align_offset = data_alignment_offset_sectors ;
2010-08-20 20:59:05 +00:00
goto out ;
}
2009-07-30 17:45:28 +00:00
if ( ! pv - > dev )
goto out ;
2013-06-25 12:31:53 +02:00
if ( find_config_tree_bool ( pv - > fmt - > cmd , devices_data_alignment_offset_detection_CFG , NULL ) ) {
2013-06-12 12:08:56 +02:00
int align_offset = dev_alignment_offset ( pv - > fmt - > cmd - > dev_types , pv - > dev ) ;
2010-03-02 21:56:14 +00:00
/* must handle a -1 alignment_offset; means dev is misaligned */
if ( align_offset < 0 )
align_offset = 0 ;
2018-11-13 15:00:11 -06:00
pv - > pe_align_offset = align_offset ;
2010-03-02 21:56:14 +00:00
}
2009-08-01 17:07:36 +00:00
2010-08-20 20:59:05 +00:00
out :
2018-11-13 15:00:11 -06:00
log_debug ( " Setting PE alignment offset to %llu sectors for %s. " ,
( unsigned long long ) pv - > pe_align_offset , dev_name ( pv - > dev ) ) ;
2009-07-30 17:45:28 +00:00
}
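The precedence in set_pe_align_offset can be summarised as a decision function. This is a hypothetical sketch: it omits the missing-device check, takes the detection setting and the detected kernel offset as plain arguments, and assumes the detected value is already in sectors. An offset already recorded wins, then an explicit request, then detection, with a negative (misaligned) result treated as 0.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the PE alignment offset choice, in sectors.
 * current:   offset already recorded on the PV (non-zero is kept)
 * requested: explicit data_alignment_offset_sectors (0 = none)
 * detection_enabled: devices/data_alignment_offset_detection
 * detected:  device alignment offset, -1 when the device is misaligned */
uint64_t sketch_pe_align_offset(uint64_t current, uint64_t requested,
				int detection_enabled, int detected)
{
	if (current)
		return current;
	if (requested)
		return requested;
	if (detection_enabled)
		return detected < 0 ? 0 : (uint64_t) detected;
	return 0;
}
```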
2010-04-06 14:04:54 +00:00
void add_pvl_to_vgs ( struct volume_group * vg , struct pv_list * pvl )
{
dm_list_add ( & vg - > pvs , & pvl - > list ) ;
vg - > pv_count + + ;
2010-04-13 17:26:36 +00:00
pvl - > pv - > vg = vg ;
2011-03-11 14:50:13 +00:00
pv_set_fid ( pvl - > pv , vg - > fid ) ;
2010-04-06 14:04:54 +00:00
}
2010-04-13 17:25:44 +00:00
void del_pvl_from_vgs ( struct volume_group * vg , struct pv_list * pvl )
{
2021-10-01 14:25:59 +02:00
char pvid [ ID_LEN + 1 ] __attribute__ ( ( aligned ( 8 ) ) ) ;
2012-02-10 02:53:03 +00:00
struct lvmcache_info * info ;
2011-03-11 14:50:13 +00:00
2010-04-13 17:25:44 +00:00
vg - > pv_count - - ;
dm_list_del ( & pvl - > list ) ;
2011-03-11 14:50:13 +00:00
2021-10-01 14:25:59 +02:00
pvid [ ID_LEN ] = 0 ;
2021-08-03 15:32:33 -05:00
memcpy ( pvid , & pvl - > pv - > id . uuid , ID_LEN ) ;
2012-02-10 02:53:03 +00:00
pvl - > pv - > vg = vg - > fid - > fmt - > orphan_vg ; /* orphan */
2021-08-03 15:32:33 -05:00
if ( ( info = lvmcache_info_from_pvid ( pvid , pvl - > pv - > dev , 0 ) ) )
lvmcache_fid_add_mdas ( info , vg - > fid - > fmt - > orphan_vg - > fid , pvid , ID_LEN ) ;
2012-02-10 02:53:03 +00:00
pv_set_fid ( pvl - > pv , vg - > fid - > fmt - > orphan_vg - > fid ) ;
2010-04-13 17:25:44 +00:00
}
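del_pvl_from_vgs copies the PV uuid into a buffer one byte longer than ID_LEN and terminates it before the lvmcache lookup, which appears to be because the raw uuid bytes carry no terminating NUL. A minimal sketch of that pattern, assuming ID_LEN is 32 (the constant below is a hypothetical stand-in):

```c
#include <assert.h>
#include <string.h>

/* Assumed to match LVM's ID_LEN; hypothetical stand-in. */
#define SKETCH_ID_LEN 32

/* Copy a fixed-length, unterminated uuid into a C string buffer. */
void sketch_pvid_to_cstr(char out[SKETCH_ID_LEN + 1],
			 const char uuid[SKETCH_ID_LEN])
{
	memcpy(out, uuid, SKETCH_ID_LEN);
	out[SKETCH_ID_LEN] = '\0';	/* terminate explicitly */
}
```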
2007-07-12 05:04:42 +00:00
/**
* add_pv_to_vg - Add a physical volume to a volume group
* @ vg - volume group to add to
* @ pv_name - name of the pv ( this parameter is to be removed , see FIXME below )
* @ pv - physical volume to add to volume group
*
* Returns :
* 0 - failure
* 1 - success
* FIXME : remove pv_name - obtain safely from pv
*/
2017-10-18 19:29:32 +01:00
int add_pv_to_vg ( struct volume_group * vg , const char * pv_name ,
struct physical_volume * pv , int new_pv )
2001-10-12 14:25:53 +00:00
{
2001-10-15 18:39:40 +00:00
struct pv_list * pvl ;
2007-07-02 21:48:30 +00:00
struct format_instance * fid = vg - > fid ;
2009-04-10 10:01:08 +00:00
struct dm_pool * mem = vg - > vgmem ;
2010-07-09 15:34:40 +00:00
char uuid [ 64 ] __attribute__ ( ( aligned ( 8 ) ) ) ;
2015-03-10 11:25:14 +01:00
int used ;
2001-10-12 14:25:53 +00:00
2001-10-15 18:39:40 +00:00
log_verbose ( " Adding physical volume '%s' to volume group '%s' " ,
2001-11-09 22:01:04 +00:00
pv_name , vg - > name ) ;
2001-10-15 18:39:40 +00:00
2005-10-16 23:03:59 +00:00
if ( ! ( pvl = dm_pool_zalloc ( mem , sizeof ( * pvl ) ) ) ) {
2001-10-15 18:39:40 +00:00
log_error ( " pv_list allocation for '%s' failed " , pv_name ) ;
2001-10-12 14:25:53 +00:00
return 0 ;
}
2007-11-02 13:06:42 +00:00
if ( ! is_orphan_vg ( pv - > vg_name ) ) {
2001-10-15 18:39:40 +00:00
log_error ( " Physical volume '%s' is already in volume group "
" '%s' " , pv_name , pv - > vg_name ) ;
2001-10-15 12:49:58 +00:00
return 0 ;
2017-07-19 16:16:12 +02:00
}
if ( ! new_pv ) {
2015-03-10 11:25:14 +01:00
if ( ( used = is_used_pv ( pv ) ) < 0 )
return_0 ;
if ( used ) {
2016-02-25 14:12:08 -06:00
log_error ( " PV %s is used by a VG but its metadata is missing. " , pv_name ) ;
2015-03-10 11:25:14 +01:00
return 0 ;
}
2001-10-15 12:49:58 +00:00
}
2001-10-12 14:25:53 +00:00
2002-11-18 14:04:08 +00:00
if ( pv - > fmt ! = fid - > fmt ) {
log_error ( " Physical volume %s is of different format type (%s) " ,
pv_name , pv - > fmt - > name ) ;
return 0 ;
}
2005-10-25 19:08:21 +00:00
/* Ensure PV doesn't depend on another PV already in the VG */
2006-05-11 17:58:58 +00:00
if ( pv_uses_vg ( pv , vg ) ) {
2005-10-25 19:08:21 +00:00
log_error ( " Physical volume %s might be constructed from same "
" volume group %s " , pv_name , vg - > name ) ;
return 0 ;
}
2005-10-16 23:03:59 +00:00
if ( ! ( pv - > vg_name = dm_pool_strdup ( mem , vg - > name ) ) ) {
2001-10-15 18:39:40 +00:00
log_error ( " vg->name allocation failed for '%s' " , pv_name ) ;
2001-10-12 14:25:53 +00:00
return 0 ;
}
2021-08-03 15:32:33 -05:00
/* both are struct id */
memcpy ( & pv - > vg_id , & vg - > id , sizeof ( struct id ) ) ;
2006-04-12 21:23:04 +00:00
2001-10-15 20:29:15 +00:00
/* Units of 512-byte sectors */
2001-10-12 14:25:53 +00:00
pv - > pe_size = vg - > extent_size ;
/*
2006-10-07 23:06:18 +00:00
* pe_count must always be calculated by pv_setup
2001-10-12 14:25:53 +00:00
*/
2002-04-24 18:20:51 +00:00
pv - > pe_alloc_count = 0 ;
2001-10-12 14:25:53 +00:00
2016-01-14 00:46:45 +00:00
/* LVM1 stores this outside a VG; LVM2 only stores it inside */
/* FIXME Default from config file? vgextend cmdline flag? */
pv - > status | = ALLOCATABLE_PV ;
2011-02-21 12:24:15 +00:00
if ( ! fid - > fmt - > ops - > pv_setup ( fid - > fmt , pv , vg ) ) {
2002-01-27 21:30:47 +00:00
log_error ( " Format-specific setup of physical volume '%s' "
2001-10-15 18:39:40 +00:00
" failed. " , pv_name ) ;
return 0 ;
}
2013-03-19 13:58:02 +01:00
if ( find_pv_in_vg ( vg , pv_name ) | |
find_pv_in_vg_by_uuid ( vg , & pv - > id ) ) {
2010-04-08 15:18:35 +00:00
if ( ! id_write_format ( & pv - > id , uuid , sizeof ( uuid ) ) ) {
stack ;
uuid [ 0 ] = ' \0 ' ;
}
2012-02-23 13:11:07 +00:00
log_error ( " Physical volume '%s (%s)' already in the VG. " ,
2010-04-08 15:18:35 +00:00
pv_name , uuid ) ;
2001-10-12 14:25:53 +00:00
return 0 ;
}
2003-11-06 20:33:34 +00:00
if ( vg - > pv_count & & ( vg - > pv_count = = vg - > max_pv ) ) {
2001-10-15 18:39:40 +00:00
log_error ( " No space for '%s' - volume group '%s' "
" holds max %d physical volume(s). " , pv_name ,
vg - > name , vg - > max_pv ) ;
return 0 ;
}
2008-01-30 13:19:47 +00:00
if ( ! alloc_pv_segment_whole_pv ( mem , pv ) )
return_0 ;
2001-10-15 18:39:40 +00:00
2011-11-04 22:49:53 +00:00
if ( ( uint64_t ) vg - > extent_count + pv - > pe_count > MAX_EXTENT_COUNT ) {
2006-11-10 18:24:11 +00:00
log_error ( " Unable to add %s to %s: new extent count (% "
PRIu64 " ) exceeds limit (% " PRIu32 " ). " ,
pv_name , vg - > name ,
( uint64_t ) vg - > extent_count + pv - > pe_count ,
2011-11-04 22:49:53 +00:00
MAX_EXTENT_COUNT ) ;
2006-11-10 18:24:11 +00:00
return 0 ;
}
2010-04-06 14:03:43 +00:00
pvl - > pv = pv ;
2010-04-06 14:04:54 +00:00
add_pvl_to_vgs ( vg , pvl ) ;
2001-11-06 19:02:26 +00:00
vg - > extent_count + = pv - > pe_count ;
vg - > free_count + = pv - > pe_count ;
2001-10-12 14:25:53 +00:00
2013-02-19 03:13:59 +01:00
dm_list_iterate_items ( pvl , & fid - > fmt - > orphan_vg - > pvs )
if ( pv = = pvl - > pv ) { /* unlink from orphan */
dm_list_del ( & pvl - > list ) ;
break ;
}
2001-10-12 14:25:53 +00:00
return 1 ;
}
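Two of the limit checks in add_pv_to_vg are easy to get subtly wrong, so here they are in isolation. This sketch assumes MAX_EXTENT_COUNT is the 32-bit maximum (the constant below is a hypothetical stand-in). Widening to 64 bits before adding means the sum cannot wrap. The max_pv test, as written in the function, never blocks an add when max_pv is 0.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed stand-in for MAX_EXTENT_COUNT. */
#define SKETCH_MAX_EXTENT_COUNT UINT32_MAX

/* Returns 1 if adding pv_pe_count extents keeps the VG within the limit.
 * The cast happens before the addition, so the sum cannot overflow. */
int sketch_extents_fit(uint32_t vg_extent_count, uint32_t pv_pe_count)
{
	return (uint64_t) vg_extent_count + pv_pe_count <=
	       SKETCH_MAX_EXTENT_COUNT;
}

/* Mirrors "vg->pv_count && (vg->pv_count == vg->max_pv)": returns 1 if
 * another PV may be added. */
int sketch_pv_slot_free(uint32_t pv_count, uint32_t max_pv)
{
	return !(pv_count && pv_count == max_pv);
}
```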
2014-04-28 12:11:44 +02:00
static int _move_pv ( struct volume_group * vg_from , struct volume_group * vg_to ,
const char * pv_name , int enforce_pv_from_source )
2009-07-14 02:15:21 +00:00
{
struct physical_volume * pv ;
struct pv_list * pvl ;
/* FIXME: handle tags */
if ( ! ( pvl = find_pv_in_vg ( vg_from , pv_name ) ) ) {
2014-04-25 14:53:34 -05:00
if ( ! enforce_pv_from_source & &
2014-06-24 14:58:53 +02:00
find_pv_in_vg ( vg_to , pv_name ) )
2014-04-25 14:53:34 -05:00
/*
* PV has already been moved . This can happen if an
* LV is being moved that has multiple sub - LVs on the
* same PV .
*/
return 1 ;
2009-07-14 02:15:21 +00:00
log_error ( " Physical volume %s not in volume group %s " ,
pv_name , vg_from - > name ) ;
return 0 ;
}
2017-10-18 19:29:32 +01:00
if ( vg_bad_status_bits ( vg_from , RESIZEABLE_VG ) | |
vg_bad_status_bits ( vg_to , RESIZEABLE_VG ) )
2009-07-14 02:16:05 +00:00
return 0 ;
2010-04-13 17:26:03 +00:00
del_pvl_from_vgs ( vg_from , pvl ) ;
add_pvl_to_vgs ( vg_to , pvl ) ;
2009-07-14 02:15:21 +00:00
pv = pvl - > pv ;
vg_from - > extent_count - = pv_pe_count ( pv ) ;
vg_to - > extent_count + = pv_pe_count ( pv ) ;
vg_from - > free_count - = pv_pe_count ( pv ) - pv_pe_alloc_count ( pv ) ;
vg_to - > free_count + = pv_pe_count ( pv ) - pv_pe_alloc_count ( pv ) ;
return 1 ;
}
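The counter updates at the end of _move_pv preserve the combined totals of both VGs: every extent of the PV moves, but only its unallocated extents count as free space. A sketch with a hypothetical two-field struct standing in for struct volume_group:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the VG counters touched by _move_pv. */
struct sketch_vg {
	uint32_t extent_count;
	uint32_t free_count;
};

void sketch_move_pv_counts(struct sketch_vg *from, struct sketch_vg *to,
			   uint32_t pe_count, uint32_t pe_alloc_count)
{
	assert(pe_alloc_count <= pe_count);
	from->extent_count -= pe_count;
	to->extent_count += pe_count;
	/* Only the PV's unallocated extents move as free space. */
	from->free_count -= pe_count - pe_alloc_count;
	to->free_count += pe_count - pe_alloc_count;
}
```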
int move_pv(struct volume_group *vg_from, struct volume_group *vg_to,
	    const char *pv_name)
{
	return _move_pv(vg_from, vg_to, pv_name, 1);
}
struct vg_from_to {
	struct volume_group *from;
	struct volume_group *to;
};

static int _move_pvs_used_by_lv_cb(struct logical_volume *lv, void *data)
{
	struct vg_from_to *v = (struct vg_from_to *) data;
	struct lv_segment *lvseg;
	unsigned s;

	dm_list_iterate_items(lvseg, &lv->segments)
		for (s = 0; s < lvseg->area_count; s++)
			if (seg_type(lvseg, s) == AREA_PV)
				if (!_move_pv(v->from, v->to,
					      pv_dev_name(seg_pv(lvseg, s)), 0))
					return_0;

	return 1;
}
int move_pvs_used_by_lv(struct volume_group *vg_from,
			struct volume_group *vg_to,
			const char *lv_name)
{
	struct vg_from_to data = { .from = vg_from, .to = vg_to };
	struct lv_list *lvl;

	/* FIXME: handle tags */
	if (!(lvl = find_lv_in_vg(vg_from, lv_name))) {
		log_error("Logical volume %s not in volume group %s",
			  lv_name, vg_from->name);
		return 0;
	}

	if (vg_bad_status_bits(vg_from, RESIZEABLE_VG)) {
		log_error("Cannot move PV(s) from non resize volume group %s.", vg_from->name);
		return 0;
	}

	if (vg_bad_status_bits(vg_to, RESIZEABLE_VG)) {
		log_error("Cannot move PV(s) to non resize volume group %s.", vg_to->name);
		return 0;
	}

	if (!for_each_sub_lv(lvl->lv, _move_pvs_used_by_lv_cb, &data))
		return_0;

	if (!_move_pvs_used_by_lv_cb(lvl->lv, &data))
		return_0;

	return 1;
}
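/*
 * Illustrative trace of why the callback passes enforce_pv_from_source == 0
 * (hypothetical layout). LV "lv0" has two segments, and each has an
 * AREA_PV area on /dev/sdb. Running _move_pvs_used_by_lv_cb() on lv0
 * proceeds as follows:
 *
 *	segment 1: _move_pv(from, to, "/dev/sdb", 0)
 *		found in vg_from, moved to vg_to, returns 1
 *	segment 2: _move_pv(from, to, "/dev/sdb", 0)
 *		not in vg_from but found in vg_to, returns 1
 *
 * move_pv() passes 1 instead. With that value, the second lookup would
 * log "Physical volume ... not in volume group ..." and fail.
 */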
int validate_new_vg_name(struct cmd_context *cmd, const char *vg_name)
{
	static char vg_path[PATH_MAX];
	name_error_t name_error;

	name_error = validate_name_detailed(vg_name);
	if (NAME_VALID != name_error) {
		display_name_error(name_error);
		log_error("New volume group name \"%s\" is invalid.", vg_name);
		return 0;
	}

	snprintf(vg_path, sizeof(vg_path), "%s%s", cmd->dev_dir, vg_name);
	if (path_exists(vg_path)) {
		log_error("%s: already exists in filesystem", vg_path);
		return 0;
	}

	return 1;
}
int validate_vg_rename_params(struct cmd_context *cmd,
			      const char *vg_name_old,
			      const char *vg_name_new)
{
	unsigned length;
	char *dev_dir;

	dev_dir = cmd->dev_dir;
	length = strlen(dev_dir);

	/* Check sanity of new name */
	if (strlen(vg_name_new) > NAME_LEN - length - 2) {
		log_error("New volume group path exceeds maximum length "
			  "of %d!", NAME_LEN - length - 2);
		return 0;
	}

	if (!validate_new_vg_name(cmd, vg_name_new))
		return_0;

	if (!strcmp(vg_name_old, vg_name_new)) {
		log_error("Old and new volume group names must differ");
		return 0;
	}

	return 1;
}
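/*
 * Worked example of the length limit in validate_vg_rename_params().
 * The values NAME_LEN == 128 and cmd->dev_dir == "/dev/" (5 characters)
 * are assumptions chosen for illustration:
 *
 *	limit = NAME_LEN - strlen(dev_dir) - 2 = 128 - 5 - 2 = 121
 *
 * A new name of 121 characters passes this check. A 122-character name
 * is rejected before validate_new_vg_name() is called.
 */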
int vg_rename(struct cmd_context *cmd, struct volume_group *vg,
	      const char *new_name)
{
	struct dm_pool *mem = vg->vgmem;
	struct pv_list *pvl;

	vg->old_name = vg->name;

	if (!(vg->name = dm_pool_strdup(mem, new_name))) {
		log_error("vg->name allocation failed for '%s'", new_name);
		return 0;
	}

	dm_list_iterate_items(pvl, &vg->pvs) {
		/* Skip if VG didn't change e.g. with vgsplit */
		if (pvl->pv->vg_name && !strcmp(new_name, pvl->pv->vg_name))
			continue;

		if (!(pvl->pv->vg_name = dm_pool_strdup(mem, new_name))) {
			log_error("pv->vg_name allocation failed for '%s'",
				  pv_dev_name(pvl->pv));
			return 0;
		}

		/* Mark the PVs that still hold metadata with the old VG name */
		log_debug_metadata("Marking PV %s as moved to VG %s", dev_name(pvl->pv->dev), new_name);
		pvl->pv->status |= PV_MOVED_VG;
	}

	return 1;
}
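/*
 * Illustrative trace of vg_rename() (hypothetical names). Renaming VG
 * "vg0", with PVs /dev/sda and /dev/sdb, to "vg1" gives:
 *
 *	vg->old_name	= "vg0" (the previous name pointer)
 *	vg->name	= "vg1" (copied into vg->vgmem)
 *	pv->vg_name	= "vg1" for each PV (copied into vg->vgmem)
 *	pv->status	|= PV_MOVED_VG for each PV
 *
 * The loop skips any PV whose vg_name already equals "vg1", as can
 * happen during vgsplit. That PV gets neither a new name copy nor
 * PV_MOVED_VG.
 */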
int vg_remove_check(struct volume_group *vg)
{
	unsigned lv_count;
	if (vg_missing_pv_count(vg)) {
		log_error("Volume group \"%s\" not found, is inconsistent "
			  "or has PVs missing.", vg ? vg->name : "");
		log_error("Consider vgreduce --removemissing if metadata "
			  "is inconsistent.");
		return 0;
	}

	lv_count = vg_visible_lvs(vg);

	if (lv_count) {
		log_error("Volume group \"%s\" still contains %u "
			  "logical volume(s)", vg->name, lv_count);
		return 0;
	}

	return 1;
}
void vg_remove_pvs(struct volume_group *vg)
{
	struct pv_list *pvl, *tpvl;

	dm_list_iterate_items_safe(pvl, tpvl, &vg->pvs) {
		del_pvl_from_vgs(vg, pvl);
		dm_list_add(&vg->removed_pvs, &pvl->list);
	}
}
int vg_remove_direct(struct volume_group *vg)
{
	struct physical_volume *pv;
	struct pv_list *pvl;
	int ret = 1;

	if (!vg_remove_mdas(vg)) {
		log_error("vg_remove_mdas %s failed", vg->name);
		return 0;
	}

	/* init physical volumes */
	dm_list_iterate_items(pvl, &vg->removed_pvs) {
		pv = pvl->pv;
		if (is_missing_pv(pv))
			continue;

		log_verbose("Removing physical volume \"%s\" from "
			    "volume group \"%s\"", pv_dev_name(pv), vg->name);
		pv->vg_name = vg->fid->fmt->orphan_vg_name;
		pv->status &= ~ALLOCATABLE_PV;

		if (!dev_get_size(pv_dev(pv), &pv->size)) {
			log_error("%s: Couldn't get size.", pv_dev_name(pv));
			ret = 0;
			continue;
		}

		/* FIXME Write to same sector label was read from */
		if (!pv_write(vg->cmd, pv, 0)) {
			log_error("Failed to remove physical volume \"%s\""
				  " from volume group \"%s\"",
				  pv_dev_name(pv), vg->name);
			ret = 0;
		}
	}

	lockd_vg_update(vg);

	set_vg_notify(vg->cmd);

	if (!backup_remove(vg->cmd, vg->name))
		stack;

	if (ret)
		log_print_unless_silent("Volume group \"%s\" successfully removed", vg->name);
	else
		log_error("Volume group \"%s\" not properly removed", vg->name);

	return ret;
}

int vg_remove(struct volume_group *vg)
{
	int ret;

	ret = vg_remove_direct(vg);

	return ret;
}
int check_dev_block_size_for_vg(struct device *dev, const struct volume_group *vg,
				unsigned int *max_logical_block_size_found)
{
	unsigned int physical_block_size, logical_block_size;

	if (!(dev_get_direct_block_sizes(dev, &physical_block_size, &logical_block_size)))
		return_0;

	/* FIXME: max_logical_block_size_found does not seem to be used anywhere */
	if (logical_block_size > *max_logical_block_size_found)
		*max_logical_block_size_found = logical_block_size;

	if (logical_block_size >> SECTOR_SHIFT > vg->extent_size) {
		log_error("Physical extent size used for volume group %s "
			  "is less than logical block size (%u bytes) that %s uses.",
			  vg->name, logical_block_size, dev_name(dev));
		return 0;
	}

	return 1;
}
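/*
 * Worked example of the check in check_dev_block_size_for_vg(). It
 * assumes SECTOR_SHIFT is 9 (512-byte sectors) and that vg->extent_size
 * is counted in sectors:
 *
 *	logical_block_size = 4096 bytes:  4096 >> 9 = 8 sectors
 *	extent_size = 8192 sectors (4 MiB):  8 > 8192 is false, accepted
 *	extent_size =    4 sectors (2 KiB):  8 >    4 is true,  rejected
 *
 * A device is refused only when its logical block is larger than one
 * physical extent.
 */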
int vg_check_pv_dev_block_sizes(const struct volume_group *vg)
{
	struct pv_list *pvl;
	unsigned int max_logical_block_size_found = 0;

	dm_list_iterate_items(pvl, &vg->pvs) {
		if (!check_dev_block_size_for_vg(pvl->pv->dev, vg, &max_logical_block_size_found))
			return 0;
	}

	return 1;
}
int check_pv_dev_sizes(struct volume_group *vg)
{
	struct pv_list *pvl;
	uint64_t dev_size, size;
	int r = 1;

	if (!vg->cmd->check_pv_dev_sizes ||
	    is_orphan_vg(vg->name))
		return 1;

	dm_list_iterate_items(pvl, &vg->pvs) {
		if (is_missing_pv(pvl->pv))
			continue;
		/*
		 * Don't compare the sizes if we're not able
		 * to determine the real dev_size. This may
		 * happen if the device has disappeared since
		 * the VG was read.
		 */
		if (!dev_get_size(pvl->pv->dev, &dev_size))
			continue;
		size = pv_size(pvl->pv);

		if (dev_size < size) {
			log_warn("WARNING: Device %s has size of %" PRIu64 " sectors which "
				 "is smaller than corresponding PV size of %" PRIu64
				 " sectors. Was device resized?",
				 pv_dev_name(pvl->pv), dev_size, size);
			r = 0;
		}
	}

	return r;
}
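/*
 * Worked example for check_pv_dev_sizes() (illustrative numbers). Both
 * sizes are in 512-byte sectors, as the warning text states. A PV has a
 * recorded size of 20971520 sectors (10 GiB), and its device now reports
 * 16777216 sectors (8 GiB). Because 16777216 < 20971520, the warning is
 * logged and the function returns 0.
 *
 * The whole check is skipped (returns 1) when cmd->check_pv_dev_sizes is
 * unset or the VG is an orphan VG. Individual PVs are skipped when they
 * are missing or their device size cannot be read. vg_extend_each_pv()
 * discards the result, so there a mismatch produces only the warning.
 */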
int vg_extend_each_pv(struct volume_group *vg, struct pvcreate_params *pp)
{
	struct pv_list *pvl;
	unsigned int max_logical_block_size = 0;
	unsigned int physical_block_size, logical_block_size;
	unsigned int prev_lbs = 0;
	int inconsistent_existing_lbs = 0;

	log_debug_metadata("Adding PVs to VG %s.", vg->name);

	if (vg_bad_status_bits(vg, RESIZEABLE_VG))
		return_0;

	/*
	 * Check if existing PVs have inconsistent block sizes.
	 * If so, do not require new devices to be consistent.
	 */
	dm_list_iterate_items(pvl, &vg->pvs) {
		logical_block_size = 0;
		physical_block_size = 0;

		if (!pvl->pv->dev)
			continue;

		if (!dev_get_direct_block_sizes(pvl->pv->dev, &physical_block_size, &logical_block_size))
			continue;

		if (!logical_block_size)
			continue;

		if (!prev_lbs) {
			prev_lbs = logical_block_size;
			continue;
		}

		if (prev_lbs != logical_block_size) {
			inconsistent_existing_lbs = 1;
			break;
		}
	}

	dm_list_iterate_items(pvl, &pp->pvs) {
		log_debug_metadata("Adding PV %s to VG %s.", pv_dev_name(pvl->pv), vg->name);

		if (!(check_dev_block_size_for_vg(pvl->pv->dev,
						  (const struct volume_group *) vg,
						  &max_logical_block_size))) {
			log_error("PV %s has wrong block size.", pv_dev_name(pvl->pv));
			return 0;
		}

		logical_block_size = 0;
		physical_block_size = 0;

		if (!dev_get_direct_block_sizes(pvl->pv->dev, &physical_block_size, &logical_block_size))
			log_warn("WARNING: PV %s has unknown block size.", pv_dev_name(pvl->pv));
		else if (prev_lbs && logical_block_size && (logical_block_size != prev_lbs)) {
			if (vg->cmd->allow_mixed_block_sizes || inconsistent_existing_lbs)
				log_debug("Devices have inconsistent block sizes (%u and %u)", prev_lbs, logical_block_size);
			else {
				log_error("Devices have inconsistent logical block sizes (%u and %u).",
					  prev_lbs, logical_block_size);
				return 0;
			}
		}

		if (!add_pv_to_vg(vg, pv_dev_name(pvl->pv), pvl->pv, 0)) {
			log_error("PV %s cannot be added to VG %s.",
				  pv_dev_name(pvl->pv), vg->name);
			return 0;
		}
	}

	(void) check_pv_dev_sizes(vg);

	dm_list_splice(&vg->pv_write_list, &pp->pvs);

	return 1;
}
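/*
 * Summary of the per-PV block-size decisions in vg_extend_each_pv(), as
 * read from the code above. "lbs" means logical block size. prev_lbs is
 * the lbs of the first existing PV whose block size could be read, and
 * it is not updated while new PVs are added.
 *
 *	check_dev_block_size_for_vg() fails
 *	  (lbs larger than one extent, or sizes unreadable)	error, return 0
 *	prev_lbs == 0, or new lbs reported as 0		no comparison
 *	new lbs == prev_lbs					no message
 *	new lbs != prev_lbs, and allow_mixed_block_sizes
 *	  is set or existing PVs are already inconsistent	debug message
 *	new lbs != prev_lbs otherwise				error, return 0
 *
 * If only the second block-size read fails, a warning is logged. Every
 * PV that is not rejected is then passed to add_pv_to_vg().
 */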
int lv_change_tag(struct logical_volume *lv, const char *tag, int add_tag)
{
	char *tag_new;

	if (!(lv->vg->fid->fmt->features & FMT_TAGS)) {
		log_error("Logical volume %s/%s does not support tags",
			  lv->vg->name, lv->name);
		return 0;
	}

	if (add_tag) {
		if (!(tag_new = dm_pool_strdup(lv->vg->vgmem, tag))) {
			log_error("Failed to duplicate tag %s from %s/%s",
				  tag, lv->vg->name, lv->name);
			return 0;
		}
		if (!str_list_add(lv->vg->vgmem, &lv->tags, tag_new)) {
			log_error("Failed to add tag %s to %s/%s",
				  tag, lv->vg->name, lv->name);
			return 0;
		}
	} else
		str_list_del(&lv->tags, tag);

	return 1;
}
int vg_change_tag(struct volume_group *vg, const char *tag, int add_tag)
{
	char *tag_new;

	if (!(vg->fid->fmt->features & FMT_TAGS)) {
		log_error("Volume group %s does not support tags", vg->name);
		return 0;
	}

	if (add_tag) {
		if (!(tag_new = dm_pool_strdup(vg->vgmem, tag))) {
			log_error("Failed to duplicate tag %s from %s",
				  tag, vg->name);
			return 0;
		}
		if (!str_list_add(vg->vgmem, &vg->tags, tag_new)) {
			log_error("Failed to add tag %s to volume group %s",
				  tag, vg->name);
			return 0;
		}
	} else
		str_list_del(&vg->tags, tag);

	return 1;
}
const char *strip_dir(const char *vg_name, const char *dev_dir)
{
	size_t len = strlen(dev_dir);
	if (!strncmp(vg_name, dev_dir, len))
		vg_name += len;

	return vg_name;
}
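/*
 * Usage sketch for strip_dir() (illustrative values):
 *
 *	strip_dir("/dev/vg00", "/dev/")	returns "vg00" (a pointer into vg_name)
 *	strip_dir("vg00", "/dev/")	returns "vg00" unchanged
 *	strip_dir("/dev/vg00", "/dev")	returns "/vg00"
 *
 * The prefix match is a literal strncmp(), so whether dev_dir has a
 * trailing slash changes the result. Nothing is allocated or copied.
 */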
/*
 * Validates major and minor numbers.
 * On kernels newer than 2.4, only dynamic major numbers are supported.
 */
int validate_major_minor(const struct cmd_context *cmd,
			 const struct format_type *fmt,
			 int32_t major, int32_t minor)
{
	int r = 1;

	if (!strncmp(cmd->kernel_vsn, "2.4.", 4) ||
	    (fmt->features & FMT_RESTRICTED_LVIDS)) {
		if (major < 0 || major > 255) {
			log_error("Major number %d outside range 0-255.", major);
			r = 0;
		}
		if (minor < 0 || minor > 255) {
			log_error("Minor number %d outside range 0-255.", minor);
			r = 0;
		}
	} else {
		/* 12 bits for major number */
		if ((major != -1) &&
		    (major != (int)cmd->dev_types->device_mapper_major)) {
			/* User supplied some major number */
			if (major < 0 || major > 4095) {
				log_error("Major number %d outside range 0-4095.", major);
				r = 0;
			} else
				log_print_unless_silent("Ignoring supplied major %d number - "
							"kernel assigns major numbers dynamically.",
							major);
		}
		/* 20 bits for minor number */
		if (minor < 0 || minor > 1048575) {
			log_error("Minor number %d outside range 0-1048575.", minor);
			r = 0;
		}
	}

	return r;
}
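/*
 * Where the numeric limits in validate_major_minor() come from:
 *
 *	0-255		2^8  - 1 =     255	(2.4 kernels, FMT_RESTRICTED_LVIDS)
 *	0-4095		2^12 - 1 =    4095	(12-bit major)
 *	0-1048575	2^20 - 1 = 1048575	(20-bit minor)
 *
 * Illustrative calls on a newer kernel, assuming 300 and 5000 are not the
 * device-mapper major:
 *
 *	(major 300, minor 5) returns 1 and logs that the major is ignored.
 *	(major 5000, minor 5) returns 0.
 *	(major -1, minor 1048576) returns 0.
 */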
/*
 * Validate parameters to vg_create() before calling.
 * FIXME: Move inside vg_create library function.
 * FIXME: Change vgcreate_params struct to individual gets/sets
 */
int vgcreate_params_validate(struct cmd_context *cmd,
			     struct vgcreate_params *vp)
{
	if (!validate_new_vg_name(cmd, vp->vg_name))
		return_0;

	if (vp->alloc == ALLOC_INHERIT) {
		log_error("Volume Group allocation policy cannot inherit "
			  "from anything");
		return 0;
	}

	if (!vp->extent_size) {
		log_error("Physical extent size may not be zero");
		return 0;
	}

	if (!(cmd->fmt->features & FMT_UNLIMITED_VOLS)) {
		if (!vp->max_lv)
			vp->max_lv = 255;
		if (!vp->max_pv)
			vp->max_pv = 255;
		if (vp->max_lv > 255 || vp->max_pv > 255) {
			log_error("Number of volumes may not exceed 255");
			return 0;
		}
	}

	return 1;
}
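/*
 * Illustrative outcomes of the volume-count checks in
 * vgcreate_params_validate() for a format without FMT_UNLIMITED_VOLS.
 * These assume the name, allocation policy and extent size checks pass.
 *
 *	max_lv = 0,   max_pv = 0	both become 255, returns 1
 *	max_lv = 100, max_pv = 0	max_pv becomes 255, returns 1
 *	max_lv = 256, max_pv = 10	"Number of volumes may not exceed 255"
 *
 * Formats with FMT_UNLIMITED_VOLS skip this block, so zero limits are
 * left unchanged.
 */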
static void _vg_wipe_cached_precommitted(struct volume_group *vg)
{
	release_vg(vg->vg_precommitted);
	vg->vg_precommitted = NULL;
}

static void _vg_move_cached_precommitted_to_committed(struct volume_group *vg)
{
	release_vg(vg->vg_committed);
	vg->vg_committed = vg->vg_precommitted;
	vg->vg_precommitted = NULL;
	vg->needs_backup = 1;
}
int lv_has_unknown_segments(const struct logical_volume *lv)
{
	struct lv_segment *seg;
	/* foreach segment */
	dm_list_iterate_items(seg, &lv->segments)
		if (seg_unknown(seg))
			return 1;
	return 0;
}

int vg_has_unknown_segments(const struct volume_group *vg)
{
	struct lv_list *lvl;

	/* foreach LV */
	dm_list_iterate_items(lvl, &vg->lvs)
		if (lv_has_unknown_segments(lvl->lv))
			return 1;
	return 0;
}
/*
 * Create a VG with default parameters.
 */
struct volume_group *vg_create(struct cmd_context *cmd, const char *vg_name)
{
	struct volume_group *vg;
	struct format_instance_ctx fic = {
		.type = FMT_INSTANCE_MDAS | FMT_INSTANCE_AUX_MDAS,
		.context.vg_ref.vg_name = vg_name
	};
	struct format_instance *fid;

	if (!(vg = alloc_vg("vg_create", cmd, vg_name)))
		goto_bad;

	if (!id_create(&vg->id)) {
		log_error("Couldn't create uuid for volume group '%s'.",
			  vg_name);
		goto bad;
	}

	vg->status = (RESIZEABLE_VG | LVM_READ | LVM_WRITE);
	vg->system_id = NULL;
2009-07-09 10:09:33 +00:00
	vg->extent_size = DEFAULT_EXTENT_SIZE * 2;
	vg->max_lv = DEFAULT_MAX_LV;
	vg->max_pv = DEFAULT_MAX_PV;
	vg->alloc = DEFAULT_ALLOC_POLICY;
2010-06-28 20:36:37 +00:00
	vg->mda_copies = DEFAULT_VGMETADATACOPIES;
2004-05-18 22:12:53 +00:00
2011-03-11 14:50:13 +00:00
	if (!(fid = cmd->fmt->ops->create_instance(cmd->fmt, &fic))) {
2002-04-24 18:20:51 +00:00
		log_error("Failed to create format instance");
		goto bad;
	}
2011-03-11 14:50:13 +00:00
	vg_set_fid(vg, fid);
2002-04-24 18:20:51 +00:00
2003-08-26 21:12:06 +00:00
	if (vg->fid->fmt->ops->vg_setup &&
	    !vg->fid->fmt->ops->vg_setup(vg->fid, vg)) {
2001-10-15 18:39:40 +00:00
		log_error("Format specific setup of volume group '%s' failed.",
			  vg_name);
2001-10-12 14:25:53 +00:00
		goto bad;
	}
improve reading and repairing vg metadata
The fact that vg repair is implemented as a part of vg read
has led to a messy and complicated implementation of vg_read,
and limited and uncontrolled repair capability. This splits
read and repair apart.
Summary
-------
- take the various kinds of repairs out of vg_read
- vg_read no longer writes anything
- vg_read now simply reads and returns vg metadata
- vg_read ignores bad or old copies of metadata
- vg_read proceeds with a single good copy of metadata
- improve error checks and handling when reading
- keep track of bad (corrupt) copies of metadata in lvmcache
- keep track of old (seqno) copies of metadata in lvmcache
- keep track of outdated PVs in lvmcache
- vg_write will do basic repairs
- new command vgck --updatemetadata will do all repairs
Details
-------
- In scan, do not delete dev from lvmcache if reading/processing fails;
the dev is still present, and removing it makes it look like the dev
is not there. Records are now kept about the problems with each PV
so they can be fixed/repaired in the appropriate places.
- In scan, record a bad mda on failure, and delete the mda from the
mda in-use list so it will not be used by vg_read or vg_write,
only by repair.
- In scan, succeed if any good mda on a device is found, instead of
failing if any is bad. The bad/old copies of metadata should not
interfere with normal usage while good copies can be used.
- In scan, add a record of old mdas in lvmcache for later, do not repair
them while reading, and do not let them prevent us from finding and
using a good copy of metadata from elsewhere. One result is that
"inconsistent metadata" is no longer a read error, but instead a
record in lvmcache that can be addressed separately from the read.
- Treat a dev with no good mdas like a dev with no mdas, which is an
existing case we already handle.
- Don't use a fake vg "handle" for returning an error from vg_read,
or the vg_read_error function for getting that error number;
just return null if the vg cannot be read or used, and an error_flags
arg with flags set for the specific kind of error (which can be used
later for determining the kind of repair.)
- Saving an original copy of the vg metadata, for purposes of reverting
a write, is now done explicitly in vg_read instead of being hidden in
the vg_make_handle function.
- When a vg is not accessible due to "access restrictions" but is
otherwise fine, return the vg through the new error_vg arg so that
process_each_pv can skip the PVs in the VG while processing.
(This is a temporary accommodation for the way process_each_pv
tracks which devs have been looked at, and can be dropped later
when the dev tracking in process_each_pv is changed.)
- vg_read does not try to fix or recover a vg, but now just reads the
metadata, checks access restrictions and returns it.
(Checking access restrictions might be better done outside of vg_read,
but this is a later improvement.)
- _vg_read now simply makes one attempt to read metadata from
each mda, and uses the most recent copy to return to the caller
in the form of a 'vg' struct.
(bad mdas were excluded during the scan and are not retried)
(old mdas were not excluded during scan and are retried here)
- vg_read uses _vg_read to get the latest copy of metadata from mdas,
and then makes various checks against it to produce warnings,
and to check if VG access is allowed (access restrictions include:
writable, foreign, shared, clustered, missing pvs).
- Things that were previously silently/automatically written by vg_read
that are now done by vg_write, based on the records made in lvmcache
during the scan and read:
. clearing the missing flag
. updating old copies of metadata
. clearing outdated pvs
. updating pv header flags
- Bad/corrupt metadata are now repaired; they were not before.
Test changes
------------
- A read command no longer writes the VG to repair it, so add a write
command to do a repair.
(inconsistent-metadata, unlost-pv)
- When a missing PV is removed from a VG, and then the device is
enabled again, vgck --updatemetadata is needed to clear the
outdated PV before it can be used again, where it wasn't before.
(lvconvert-repair-policy, lvconvert-repair-raid, lvconvert-repair,
mirror-vgreduce-removemissing, pv-ext-flags, unlost-pv)
Reading bad/old metadata
------------------------
- "bad metadata": the mda_header or metadata text has invalid fields
or can't be parsed by lvm. This is a form of corruption that would
not be caused by known failure scenarios. A checksum error is
typically included among the errors reported.
- "old metadata": a valid copy of the metadata that has a smaller seqno
than other copies of the metadata. This can happen if the device
failed, or io failed, or lvm failed while committing new metadata
to all the metadata areas. Old metadata on a PV that has been
removed from the VG is the "outdated" case below.
When a VG has some PVs with bad/old metadata, lvm can simply ignore
the bad/old copies, and use a good copy. This is why there are
multiple copies of the metadata -- so it's available even when some
of the copies cannot be used. The bad/old copies do not have to be
repaired before the VG can be used (the repair can happen later.)
A PV with no good copies of the metadata simply falls back to being
treated like a PV with no mdas; a common and harmless configuration.
When bad/old metadata exists, lvm warns the user about it, and
suggests repairing it using a new metadata repair command.
Bad metadata in particular is something that users will want to
investigate and repair themselves, since it should not happen and
may indicate some other problem that needs to be fixed.
PVs with bad/old metadata are not the same as missing devices.
Missing devices will block various kinds of VG modification or
activation, but bad/old metadata will not.
Previously, lvm would attempt to repair bad/old metadata whenever
it was read. This was unnecessary since lvm does not require every
copy of the metadata to be used. It would also hide potential
problems that should be investigated by the user. It was also
dangerous in cases where the VG was on shared storage. The user
is now allowed to investigate potential problems and decide how
and when to repair them.
Repairing bad/old metadata
--------------------------
When label scan sees bad metadata in an mda, that mda is removed
from the lvmcache info->mdas list. This means that vg_read will
skip it, and not attempt to read/process it again. If it was
the only in-use mda on a PV, that PV is treated like a PV with
no mdas. It also means that vg_write will skip the bad mda,
and not attempt to write new metadata to it. The only way to
repair bad metadata is with the metadata repair command.
When label scan sees old metadata in an mda, that mda is kept
in the lvmcache info->mdas list. This means that vg_read will
read/process it again, and likely see the same mismatch with
the other copies of the metadata. Like the label_scan, the
vg_read will simply ignore the old copy of the metadata and
use the latest copy. If the command is modifying the vg
(e.g. lvcreate), then vg_write, which writes new metadata to
every mda on info->mdas, will write the new metadata to the
mda that had the old version. If successful, this will resolve
the old metadata problem (without needing to run a metadata
repair command.)
Outdated PVs
------------
An outdated PV is a PV that has an old copy of VG metadata
that shows it is a member of the VG, but the latest copy of
the VG metadata does not include this PV. This happens if
the PV is disconnected, vgreduce --removemissing is run to
remove the PV from the VG, then the PV is reconnected.
In this case, the outdated PV needs to have its outdated metadata
removed and the PV used flag needs to be cleared. This repair
will be done by the subsequent repair command. It is also done
if vgremove is run on the VG.
MISSING PVs
-----------
When a device is missing, most commands will refuse to modify
the VG. This is the simple case. More complicated is when
a command is allowed to modify the VG while it is missing a
device.
When a VG is written while a device is missing for one of its PVs,
the VG metadata is written to disk with the MISSING flag on the PV
with the missing device. When the VG is next used, it is treated
as if the PV with the MISSING flag still has a missing device, even
if that device has reappeared.
If all LVs that were using a PV with the MISSING flag are removed
or repaired so that the MISSING PV is no longer used, then the
next time the VG metadata is written, the MISSING flag will be
dropped.
Alternative methods of clearing the MISSING flag are:
vgreduce --removemissing will remove PVs with missing devices,
or PVs with the MISSING flag where the device has reappeared.
vgextend --restoremissing will clear the MISSING flag on PVs
where the device has reappeared, allowing the VG to be used
normally. This must be done with caution since the reappeared
device may have old data that is inconsistent with data on other PVs.
Bad mda repair
--------------
The new command:
vgck --updatemetadata VG
first uses vg_write to repair old metadata, and other basic
issues mentioned above (old metadata, outdated PVs, pv_header
flags, MISSING_PV flags). It will also go further and repair
bad metadata:
. text metadata that has a bad checksum
. text metadata that is not parsable
. corrupt mda_header checksum and version fields
(To keep a clean diff, #if 0 is added around functions that
are replaced by new code. These commented functions are
removed by the following commit.)
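The read-side rule described above can be shown in a small self-contained sketch. It skips bad copies and returns the good copy with the highest seqno. Good copies with a lower seqno are counted as old instead of being treated as a read error. The `struct mda_copy` type and the `pick_latest()` function are hypothetical. In LVM the corresponding records are kept in lvmcache and applied to real metadata areas.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical summary of one metadata area (mda) after the label scan. */
struct mda_copy {
	int bad;		/* header or text failed to parse or checksum */
	uint32_t seqno;		/* metadata sequence number */
};

/*
 * Return the index of the good copy with the highest seqno, or -1 if no
 * good copy exists. Bad copies are skipped entirely; good copies with a
 * lower seqno than the winner are counted in *old_count.
 */
static int pick_latest(const struct mda_copy *c, size_t n, size_t *old_count)
{
	int best = -1;
	size_t i;

	for (i = 0; i < n; i++)
		if (!c[i].bad && (best < 0 || c[i].seqno > c[best].seqno))
			best = (int) i;

	*old_count = 0;
	if (best >= 0)
		for (i = 0; i < n; i++)
			if (!c[i].bad && c[i].seqno < c[best].seqno)
				(*old_count)++;

	return best;
}
```

Take four copies: seqno 5, a bad copy, seqno 7 and seqno 7. The sketch picks index 2 and reports one old copy. The bad copy is neither used nor counted as old.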
2019-05-24 12:04:37 -05:00
	return vg;
2001-10-12 14:25:53 +00:00
2009-07-09 10:09:33 +00:00
bad:
2011-08-10 20:25:29 +00:00
	unlock_and_release_vg(cmd, vg, vg_name);
2001-10-12 14:25:53 +00:00
	return NULL;
}
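`vg_create()` stores `DEFAULT_EXTENT_SIZE * 2` in `vg->extent_size`. Assume that `DEFAULT_EXTENT_SIZE` is expressed in KiB (4096, the familiar 4 MiB default) and that `extent_size` is counted in 512-byte sectors. Under those assumptions, the factor of 2 is a KiB-to-sector conversion. A minimal check follows; the constant and helper names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical stand-in for DEFAULT_EXTENT_SIZE, assumed to be in KiB. */
#define EXAMPLE_DEFAULT_EXTENT_SIZE_KIB 4096u

/* 1 KiB = 1024 bytes = two 512-byte sectors. */
static uint32_t kib_to_sectors(uint32_t kib)
{
	return kib * 2;
}
```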
2014-10-30 11:38:49 +01:00
/* Rounds up by default */
uint32_t extents_from_size(struct cmd_context *cmd, uint64_t size,
2009-07-26 02:34:09 +00:00
			   uint32_t extent_size)
{
	if (size % extent_size) {
		size += extent_size - size % extent_size;
config: add silent mode
Accept -q as the short form of --quiet.
Suppress non-essential standard output if -q is given twice.
Treat log/silent in lvm.conf as equivalent to -qq.
Review all log_print messages and change some to
log_print_unless_silent.
When silent, the following commands still produce output:
dumpconfig, lvdisplay, lvmdiskscan, lvs, pvck, pvdisplay,
pvs, version, vgcfgrestore -l, vgdisplay, vgs.
[Needs checking.]
Non-essential messages are shifted from log level 4 to log level 5
for syslog and lvm2_log_fn purposes.
2012-08-25 20:35:48 +01:00
		log_print_unless_silent("Rounding up size to full physical extent %s",
					display_size(cmd, size));
2009-07-26 02:34:09 +00:00
	}
2011-11-04 22:49:53 +00:00
	if (size > (uint64_t) MAX_EXTENT_COUNT * extent_size) {
2009-07-26 02:34:09 +00:00
		log_error("Volume too large (%s) for extent size %s. "
2018-05-16 17:53:38 -04:00
			  "Upper limit is less than %s.",
2009-07-26 02:34:09 +00:00
			  display_size(cmd, size),
			  display_size(cmd, (uint64_t) extent_size),
2011-11-04 22:49:53 +00:00
			  display_size(cmd, (uint64_t) MAX_EXTENT_COUNT *
2009-07-26 02:34:09 +00:00
				       extent_size));
		return 0;
	}
2014-10-30 11:38:49 +01:00
	return (uint32_t) (size / extent_size);
2009-07-26 02:34:09 +00:00
}
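The rounding and limit check in `extents_from_size()` can be reproduced in isolation. In this sketch the cap is a hypothetical constant, assumed to equal `UINT32_MAX`, that stands in for `MAX_EXTENT_COUNT`. Logging is omitted, and, as in the original, a return value of 0 signals an error.

```c
#include <stdint.h>

/* Hypothetical cap on extent count; assumed to match MAX_EXTENT_COUNT. */
#define EXAMPLE_MAX_EXTENT_COUNT UINT32_MAX

/*
 * Round a size in sectors up to a whole number of extents.
 * Returns 0 (an error) if the result would exceed the cap.
 */
static uint32_t extents_roundup(uint64_t size, uint32_t extent_size)
{
	if (size % extent_size)
		size += extent_size - size % extent_size;	/* round up */

	if (size > (uint64_t) EXAMPLE_MAX_EXTENT_COUNT * extent_size)
		return 0;	/* too large for this extent size */

	return (uint32_t) (size / extent_size);
}
```

With 4 MiB extents (8192 sectors), a request of 8193 sectors rounds up to 2 extents.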
2014-10-30 23:43:12 +01:00
/*
 * Convert a size, or a percentage of an extent count, into extents
 * using the specified rounding.
 *
 * For PERCENT_NONE, size is in 512-byte sectors.
 * For all other percent types, size is in DM_PERCENT_1 units
 * (which allows fractional percentages).
 *
 * A return value of 0 extents indicates an error.
 */
uint32_t extents_from_percent_size(struct volume_group *vg, const struct dm_list *pvh,
				   uint32_t extents, int roundup,
				   percent_type_t percent, uint64_t size)
{
	uint32_t count;
	switch (percent) {
	case PERCENT_NONE:
		if (!roundup && (size % vg->extent_size)) {
			if (!(size -= size % vg->extent_size)) {
				log_error("Specified size is smaller than physical extent boundary.");
				return 0;
			}
			log_print_unless_silent("Rounding size to boundary between physical extents: %s.",
						display_size(vg->cmd, size));
		}
		return extents_from_size(vg->cmd, size, vg->extent_size);
	case PERCENT_LV:
		break; /* Base extents already passed in. */
	case PERCENT_VG:
		extents = vg->extent_count;
		break;
	case PERCENT_PVS:
		if (pvh != &vg->pvs) {
			/* Physical volumes are specified on cmdline */
			if (!(extents = pv_list_extents_free(pvh))) {
				log_error("No free extents in the list of physical volumes.");
				return 0;
			}
			break;
		}
2023-08-16 15:12:36 +02:00
		/* fall through */ /* to use all PVs in VG like %FREE */
2014-10-30 23:43:12 +01:00
	case PERCENT_FREE:
		if (!(extents = vg->free_count)) {
			log_error("No free extents in Volume group %s.", vg->name);
			return 0;
		}
		break;
	default:
		log_error(INTERNAL_ERROR "Unsupported percent type %u.", percent);
		return 0;
	}
	if (!(count = percent_of_extents(size, extents, roundup)))
2017-06-24 16:22:36 +02:00
		log_error("Converted %s%%%s into 0 extents.",
			  display_percent(vg->cmd, size), get_percent_string(percent));
2014-10-30 23:43:12 +01:00
	else
2017-06-24 16:22:36 +02:00
		log_verbose("Converted %s%%%s into %" PRIu32 " extents.",
			    display_percent(vg->cmd, size), get_percent_string(percent), count);
2014-10-30 23:43:12 +01:00
	return count;
}
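`percent_of_extents()` itself is not shown in this section. The helper below is hypothetical and illustrates the kind of fixed-point conversion the comment above describes. It assumes that one percent is represented as 1000000 units, the value libdevmapper uses for `DM_PERCENT_1`, and that rounding up means a ceiling division. LVM's actual helper may differ in detail.

```c
#include <stdint.h>

/* Assumed fixed-point scale: 1% == 1000000 units. */
#define EX_PERCENT_1   UINT64_C(1000000)
#define EX_PERCENT_100 (100 * EX_PERCENT_1)

/* Convert a fixed-point percentage of `extents` into an extent count. */
static uint32_t percent_to_extents(uint64_t percent, uint32_t extents, int roundup)
{
	uint64_t scaled = percent * extents;

	if (roundup)
		scaled += EX_PERCENT_100 - 1;	/* ceiling division */

	return (uint32_t) (scaled / EX_PERCENT_100);
}
```

0.5% of 10 extents is 0 extents when rounding down, which `extents_from_percent_size()` would report as an error. When rounding up, the same request gives 1 extent.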
2010-07-05 22:23:15 +00:00
static dm_bitset_t _bitset_with_random_bits(struct dm_pool *mem, uint32_t num_bits,
					    uint32_t num_set_bits, unsigned *seed)
{
	dm_bitset_t bs;
	unsigned bit_selected;
	char buf[32];
	uint32_t i = num_bits - num_set_bits;
2017-07-20 09:57:09 +02:00
	if (!(bs = dm_bitset_create(mem, num_bits))) {
2010-07-05 22:23:15 +00:00
		log_error("Failed to allocate bitset for setting random bits.");
		return NULL;
	}
	if (!dm_pool_begin_object(mem, 512)) {
		log_error("dm_pool_begin_object failed for random list of bits.");
		dm_pool_free(mem, bs);
		return NULL;
	}
	/* Perform loop num_set_bits times, selecting one bit each time */
	while (i++ < num_bits) {
		/* Select a random bit between 0 and (i-1) inclusive. */
2014-04-04 01:26:19 +01:00
		bit_selected = lvm_even_rand(seed, i);
2010-07-05 22:23:15 +00:00
		/*
		 * If the bit was already set, set the new bit that became
		 * choosable for the first time during this pass.
		 * This maintains a uniform probability distribution by compensating
		 * for being unable to select it until this pass.
		 */
		if (dm_bit(bs, bit_selected))
			bit_selected = i - 1;
		dm_bit_set(bs, bit_selected);
		if (dm_snprintf(buf, sizeof(buf), "%u ", bit_selected) < 0) {
			log_error("snprintf random bit failed.");
			dm_pool_free(mem, bs);
			return NULL;
		}
		if (!dm_pool_grow_object(mem, buf, strlen(buf))) {
			log_error("Failed to generate list of random bits.");
			dm_pool_free(mem, bs);
			return NULL;
		}
	}
2011-03-14 17:00:57 +00:00
	if (!dm_pool_grow_object(mem, "\0", 1)) {
		log_error("Failed to finish list of random bits.");
		dm_pool_free(mem, bs);
		return NULL;
	}
2013-01-07 22:30:29 +00:00
	log_debug_metadata("Selected %" PRIu32 " random bits from %" PRIu32 ": %s", num_set_bits, num_bits, (char *) dm_pool_end_object(mem));
2010-07-05 22:23:15 +00:00
	return bs;
}
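The loop in `_bitset_with_random_bits()` is Floyd's algorithm for sampling `num_set_bits` distinct positions out of `num_bits`. The sketch below isolates that loop. It uses one byte per bit instead of a `dm_bitset_t`, and a hypothetical linear congruential generator `toy_rand()` in place of `lvm_even_rand()`. The toy generator has modulo bias, so the sketch only demonstrates the mechanics. The uniformity argument in the comment above assumes a uniform source.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical deterministic PRNG returning 0..bound-1 (has modulo bias). */
static uint32_t toy_rand(uint32_t *state, uint32_t bound)
{
	*state = *state * 1103515245u + 12345u;
	return (*state >> 16) % bound;
}

/* Floyd's algorithm: set exactly k distinct entries out of n (k <= n). */
static void floyd_select(unsigned char *bits, uint32_t n, uint32_t k, uint32_t *state)
{
	uint32_t i = n - k;

	memset(bits, 0, n);
	while (i++ < n) {
		/* i runs from n-k+1 to n; pick a position in [0, i-1]. */
		uint32_t pick = toy_rand(state, i);

		/* Position i-1 was unreachable in earlier passes, so it is free. */
		if (bits[pick])
			pick = i - 1;
		bits[pick] = 1;
	}
}
```

Each pass can pick any position below `i`. If that position is already taken, position `i - 1` is guaranteed to be free because no earlier pass could reach it. Every pass therefore sets exactly one new bit.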
2010-06-28 20:37:54 +00:00
static int _vg_ignore_mdas(struct volume_group *vg, uint32_t num_to_ignore)
{
	struct metadata_area *mda;
2010-07-05 22:23:15 +00:00
	uint32_t mda_used_count = vg_mda_used_count(vg);
	dm_bitset_t mda_to_ignore_bs;
	int r = 1;
2010-06-28 20:37:54 +00:00
2013-01-07 22:30:29 +00:00
	log_debug_metadata("Adjusting ignored mdas for %s: %" PRIu32 " of %" PRIu32 " mdas in use "
			   "but %" PRIu32 " required. Changing %" PRIu32 " mda.",
			   vg->name, mda_used_count, vg_mda_count(vg), vg_mda_copies(vg), num_to_ignore);
2010-06-30 13:51:11 +00:00
2010-06-28 20:37:54 +00:00
	if (!num_to_ignore)
		return 1;
2010-06-30 13:51:11 +00:00
2010-07-05 22:23:15 +00:00
	if (!(mda_to_ignore_bs = _bitset_with_random_bits(vg->vgmem, mda_used_count,
							  num_to_ignore, &vg->cmd->rand_seed)))
		return_0;
2010-06-30 19:28:35 +00:00
	dm_list_iterate_items(mda, &vg->fid->metadata_areas_in_use)
2010-07-05 22:23:15 +00:00
		if (!mda_is_ignored(mda) && (--mda_used_count,
		    dm_bit(mda_to_ignore_bs, mda_used_count))) {
2010-06-28 20:37:54 +00:00
			mda_set_ignored(mda, 1);
2010-06-30 19:28:35 +00:00
			if (!--num_to_ignore)
2010-07-05 22:23:15 +00:00
				goto out;
2010-06-28 20:37:54 +00:00
		}
2010-06-30 13:51:11 +00:00
	log_error(INTERNAL_ERROR "Unable to find %" PRIu32 " metadata areas to ignore "
2010-06-28 20:37:54 +00:00
		  "on volume group %s", num_to_ignore, vg->name);
2010-06-30 13:51:11 +00:00
2010-07-05 22:23:15 +00:00
	r = 0;
out:
	dm_pool_free(vg->vgmem, mda_to_ignore_bs);
	return r;
2010-06-28 20:37:54 +00:00
}
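The condition `(--mda_used_count, dm_bit(mda_to_ignore_bs, mda_used_count))` uses the comma operator to give each eligible mda a distinct bitset index, counting down from the number of eligible entries. The array-based version below is hypothetical: the names `mark_selected`, `eligible`, `marked` and `mask` are illustrative only. It shows why a random mask with k bits set selects exactly k entries. Unlike the original, it does not stop early once enough entries are marked.

```c
#include <stdint.h>

/*
 * Walk the array front to back. The j-th eligible element (0-based) is
 * mapped to mask index (eligible_total - 1 - j) and is marked when that
 * mask entry is set. `marked` must be zero-initialized by the caller.
 * Returns the number of elements marked.
 */
static uint32_t mark_selected(const int *eligible, int *marked, uint32_t n,
			      const unsigned char *mask, uint32_t eligible_total)
{
	uint32_t count = eligible_total, done = 0, i;

	for (i = 0; i < n; i++)
		if (eligible[i] && (--count, mask[count])) {
			marked[i] = 1;
			done++;
		}

	return done;
}
```

Take eligible flags {1, 0, 1, 1, 0} and mask {1, 0, 1}. The first eligible entry maps to index 2, the second to index 1 and the third to index 0. Entries 0 and 3 are therefore marked.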
static int _vg_unignore_mdas(struct volume_group *vg, uint32_t num_to_unignore)
{
	struct metadata_area *mda, *tmda;
2010-07-05 22:23:15 +00:00
	uint32_t mda_used_count = vg_mda_used_count(vg);
	uint32_t mda_count = vg_mda_count(vg);
	uint32_t mda_free_count = mda_count - mda_used_count;
	dm_bitset_t mda_to_unignore_bs;
	int r = 1;
2010-06-28 20:37:54 +00:00
	if (!num_to_unignore)
		return 1;
2010-06-30 13:51:11 +00:00
2013-01-07 22:30:29 +00:00
	log_debug_metadata("Adjusting ignored mdas for %s: %" PRIu32 " of %" PRIu32 " mdas in use "
			   "but %" PRIu32 " required. Changing %" PRIu32 " mda.",
			   vg->name, mda_used_count, mda_count, vg_mda_copies(vg), num_to_unignore);
2010-07-05 22:23:15 +00:00
	if (!(mda_to_unignore_bs = _bitset_with_random_bits(vg->vgmem, mda_free_count,
							    num_to_unignore, &vg->cmd->rand_seed)))
		return_0;
2010-06-30 13:51:11 +00:00
2010-06-30 19:28:35 +00:00
	dm_list_iterate_items_safe(mda, tmda, &vg->fid->metadata_areas_ignored)
2010-07-05 22:23:15 +00:00
		if (mda_is_ignored(mda) && (--mda_free_count,
		    dm_bit(mda_to_unignore_bs, mda_free_count))) {
2010-06-28 20:37:54 +00:00
			mda_set_ignored(mda, 0);
			dm_list_move(&vg->fid->metadata_areas_in_use,
				     &mda->list);
2010-06-30 19:28:35 +00:00
			if (!--num_to_unignore)
2010-07-05 22:23:15 +00:00
				goto out;
2010-06-28 20:37:54 +00:00
		}
2010-06-30 13:51:11 +00:00
2010-06-30 19:28:35 +00:00
	dm_list_iterate_items(mda, &vg->fid->metadata_areas_in_use)
2010-07-06 20:09:38 +00:00
		if (mda_is_ignored(mda) && (--mda_free_count,
		    dm_bit(mda_to_unignore_bs, mda_free_count))) {
2010-06-28 20:37:54 +00:00
			mda_set_ignored(mda, 0);
2010-06-30 19:28:35 +00:00
			if (!--num_to_unignore)
2010-07-05 22:23:15 +00:00
				goto out;
2010-06-28 20:37:54 +00:00
		}
2010-06-30 13:51:11 +00:00
	log_error(INTERNAL_ERROR "Unable to find %" PRIu32 " metadata areas to unignore "
		  "on volume group %s", num_to_unignore, vg->name);
2010-07-05 22:23:15 +00:00
	r = 0;
out:
	dm_pool_free(vg->vgmem, mda_to_unignore_bs);
	return r;
2010-06-28 20:37:54 +00:00
}
static int _vg_adjust_ignored_mdas(struct volume_group *vg)
{
2010-06-30 19:28:35 +00:00
	uint32_t mda_copies_used = vg_mda_used_count(vg);
2010-06-28 20:37:54 +00:00
2010-06-30 19:28:35 +00:00
	if (vg->mda_copies == VGMETADATACOPIES_UNMANAGED) {
		/* Ensure at least one mda is in use. */
		if (!mda_copies_used && vg_mda_count(vg) && !_vg_unignore_mdas(vg, 1))
			return_0;
2010-06-28 20:37:54 +00:00
		else
2010-06-30 19:28:35 +00:00
			return 1;
2010-06-28 20:37:54 +00:00
	}
2010-06-30 13:51:11 +00:00
2010-06-30 19:28:35 +00:00
	/* It is not an error for vg->mda_copies to exceed the total number of mdas. */
	if (vg->mda_copies == VGMETADATACOPIES_ALL ||
	    vg->mda_copies >= vg_mda_count(vg)) {
		/* Use all */
		if (!_vg_unignore_mdas(vg, vg_mda_count(vg) - mda_copies_used))
			return_0;
	} else if (mda_copies_used < vg->mda_copies) {
		if (!_vg_unignore_mdas(vg, vg->mda_copies - mda_copies_used))
			return_0;
	} else if (mda_copies_used > vg->mda_copies)
		if (!_vg_ignore_mdas(vg, mda_copies_used - vg->mda_copies))
			return_0;
Allow 'all' and 'unmanaged' values for --vgmetadatacopies.
Allowing an 'all' and 'unmanaged' value is more intuitive, and
provides a simple way for users to get back to original LVM behavior
of metadata written to all PVs in the volume group.
If the user requests "--vgmetadatacopies unmanaged", this instructs
LVM not to manage the ignore bits to achieve a specific number of
metadata copies in the volume group. The user is free to use
"pvchange --metadataignore" to control the mdas on a per-PV basis.
If the user requests "--vgmetadatacopies all", this instructs LVM
to do 2 things: 1) clear all ignore bits, and 2) set the "unmanaged"
policy going forward.
Internally, we use the special MAX_UINT32 value to indicate 'all'.
This 'just' works since it's the largest value possible for the
field and so all 'ignore' bits on all mdas in the VG will get
cleared inside _vg_metadata_balance(). However, after we've
called the _vg_metadata_balance function, we check for the special
'all' value, and if set, we write the "unmanaged" value into the
metadata. As such, the 'all' value is never written to disk.
Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
2010-06-28 20:40:01 +00:00
	/*
	 * The VGMETADATACOPIES_ALL value will never be written to disk.
	 * It is a special cmdline value that means 2 things:
	 * 1. clear all ignore bits in all mdas in this vg
	 * 2. set the "unmanaged" policy going forward for metadata balancing
	 */
	if (vg->mda_copies == VGMETADATACOPIES_ALL)
		vg->mda_copies = VGMETADATACOPIES_UNMANAGED;
2010-06-30 13:51:11 +00:00
	return 1;
2010-06-28 20:37:54 +00:00
}
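The branch structure of `_vg_adjust_ignored_mdas()` reduces to a signed adjustment. The sketch below returns how many mdas to un-ignore (a positive value) or ignore (a negative value). It assumes the sentinel values 0 for "unmanaged" and `UINT32_MAX` for "all". The commit message above names the `UINT32_MAX` value; the 0 value is an assumption. The sketch leaves out two steps: the random choice of which mdas change, and the final rewrite of "all" to "unmanaged".

```c
#include <stdint.h>

/* Assumed sentinels mirroring the special --vgmetadatacopies values. */
#define EX_COPIES_UNMANAGED 0u
#define EX_COPIES_ALL       UINT32_MAX

/*
 * Return how many mdas to un-ignore (> 0) or ignore (< 0) so that the
 * number in use matches the requested copies. `used` and `total` are
 * mda counts for the VG.
 */
static int64_t mda_adjustment(uint32_t copies, uint32_t used, uint32_t total)
{
	if (copies == EX_COPIES_UNMANAGED)
		return (!used && total) ? 1 : 0;	/* keep at least one in use */

	if (copies == EX_COPIES_ALL || copies >= total)
		return (int64_t) total - used;		/* use every mda */

	return (int64_t) copies - used;			/* grow or shrink to match */
}
```

Folding the two `else if` branches into `copies - used` also covers the equal case, which yields 0 and leaves the mdas untouched.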
2010-08-20 12:43:49 +00:00
uint64_t find_min_mda_size(struct dm_list *mdas)
{
	uint64_t min_mda_size = UINT64_MAX, mda_size;
	struct metadata_area *mda;
	dm_list_iterate_items(mda, mdas) {
		if (!mda->ops->mda_total_sectors)
			continue;
		mda_size = mda->ops->mda_total_sectors(mda);
		if (mda_size < min_mda_size)
			min_mda_size = mda_size;
	}
	if (min_mda_size == UINT64_MAX)
		min_mda_size = UINT64_C(0);
	return min_mda_size;
}
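`find_min_mda_size()` skips areas that cannot report their size and maps "no size found" to 0. The hypothetical array version below applies the same rule, with -1 marking an area that has no size callback.

```c
#include <stddef.h>
#include <stdint.h>

/* Smallest non-negative size, or 0 when no entry reports a size. */
static uint64_t min_known_size(const int64_t *sizes, size_t n)
{
	uint64_t min = UINT64_MAX;
	size_t i;

	for (i = 0; i < n; i++)
		if (sizes[i] >= 0 && (uint64_t) sizes[i] < min)
			min = (uint64_t) sizes[i];

	return min == UINT64_MAX ? 0 : min;
}
```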
2010-06-28 20:38:56 +00:00
static int _move_mdas(struct volume_group *vg_from, struct volume_group *vg_to,
		      struct dm_list *mdas_from, struct dm_list *mdas_to)
2007-03-23 12:43:17 +00:00
{
	struct metadata_area *mda, *mda2;
	int common_mda = 0;
2008-11-03 22:14:30 +00:00
	dm_list_iterate_items_safe(mda, mda2, mdas_from) {
2007-03-23 12:43:17 +00:00
		if (!mda->ops->mda_in_vg) {
			common_mda = 1;
			continue;
		}
2008-04-08 12:49:21 +00:00
		if (!mda->ops->mda_in_vg(vg_from->fid, vg_from, mda)) {
			if (is_orphan_vg(vg_to->name))
2008-11-03 22:14:30 +00:00
				dm_list_del(&mda->list);
2008-04-08 12:49:21 +00:00
			else
2008-11-03 22:14:30 +00:00
				dm_list_move(mdas_to, &mda->list);
2008-04-08 12:49:21 +00:00
		}
2007-03-23 12:43:17 +00:00
	}
2010-06-28 20:38:56 +00:00
	return common_mda;
}
/*
 * Separate metadata areas after splitting a VG.
 * Also accepts orphan VG as destination (for vgreduce).
 */
2010-07-09 15:34:40 +00:00
int vg_split_mdas(struct cmd_context *cmd __attribute__((unused)),
2010-06-28 20:38:56 +00:00
		  struct volume_group *vg_from, struct volume_group *vg_to)
{
	struct dm_list *mdas_from_in_use, *mdas_to_in_use;
	struct dm_list *mdas_from_ignored, *mdas_to_ignored;
	int common_mda = 0;
2007-03-23 12:43:17 +00:00
2010-06-28 20:38:56 +00:00
	mdas_from_in_use = &vg_from->fid->metadata_areas_in_use;
	mdas_from_ignored = &vg_from->fid->metadata_areas_ignored;
	mdas_to_in_use = &vg_to->fid->metadata_areas_in_use;
	mdas_to_ignored = &vg_to->fid->metadata_areas_ignored;
	common_mda = _move_mdas(vg_from, vg_to,
				mdas_from_in_use, mdas_to_in_use);
	common_mda = _move_mdas(vg_from, vg_to,
				mdas_from_ignored, mdas_to_ignored);
	if ((dm_list_empty(mdas_from_in_use) &&
	     dm_list_empty(mdas_from_ignored)) ||
	    ((!is_orphan_vg(vg_to->name) &&
	      dm_list_empty(mdas_to_in_use) &&
	      dm_list_empty(mdas_to_ignored))))
2007-03-23 12:43:17 +00:00
		return common_mda;
	return 1;
}
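`_move_mdas()` walks a list and relocates every area that `mda_in_vg()` reports as no longer belonging to the source VG. When the destination is the orphan VG, it drops such areas instead. The sketch below applies the same pattern to a hypothetical singly linked list. `struct node` and `split_list()` are illustrative only. The real code uses `dm_list` and also reports shared areas through `common_mda`, which this sketch omits. The sketch pushes moved nodes onto the front of the destination, so they end up in reverse order.

```c
#include <stddef.h>

/* Hypothetical list node standing in for struct metadata_area. */
struct node {
	int belongs_to_from;	/* would mda_in_vg() report it as still in vg_from? */
	struct node *next;
};

/*
 * Unlink every node that no longer belongs to the source list and push it
 * onto *to, or simply drop it when the destination is the orphan VG.
 */
static void split_list(struct node **from, struct node **to, int to_is_orphan)
{
	struct node **pp = from;

	while (*pp) {
		struct node *n = *pp;

		if (n->belongs_to_from) {
			pp = &n->next;		/* keep it in place */
			continue;
		}
		*pp = n->next;			/* unlink from the source */
		if (!to_is_orphan) {
			n->next = *to;		/* push onto the destination */
			*to = n;
		}
	}
}
```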
2016-02-18 15:38:23 -06:00
void pvcreate_params_set_defaults(struct pvcreate_params *pp)
{
	memset(pp, 0, sizeof(*pp));
	pp->zero = 1;
	pp->force = PROMPT;
	pp->yes = 0;
	pp->restorefile = NULL;
	pp->uuid_str = NULL;
	pp->pva.size = 0;
Place the first PE at 1 MiB for all defaults
. When using default settings, this commit should change
nothing. The first PE continues to be placed at 1 MiB
resulting in a metadata area size of 1020 KiB (for
4K page sizes; slightly smaller for larger page sizes.)
. When default_data_alignment is disabled in lvm.conf,
align pe_start at 1 MiB, based on a default metadata area
size that adapts to the page size. Previously, disabling
this option would result in mda_size that was too small
for common use, and produced a 64 KiB aligned pe_start.
. Customized pe_start and mda_size values continue to be
set as before in lvm.conf and command line.
. Remove the configure option for setting default_data_alignment
at build time.
. Improve alignment related option descriptions.
. Add section about alignment to pvcreate man page.
Previously, DEFAULT_PVMETADATASIZE was 255 sectors.
However, the fact that the config setting named
"default_data_alignment" has a default value of 1 (MiB)
meant that DEFAULT_PVMETADATASIZE was having no effect.
The metadata area size is the space between the start of
the metadata area (page size offset from the start of the
device) and the first PE (1 MiB by default due to
default_data_alignment 1.) The result is a 1020 KiB metadata
area on machines with 4KiB page size (1024 KiB - 4 KiB),
and smaller on machines with larger page size.
If default_data_alignment was set to 0 (disabled), then
DEFAULT_PVMETADATASIZE 255 would take effect, and produce a
metadata area that was 188 KiB and pe_start of 192 KiB.
This was too small for common use.
This is fixed by making the default metadata area size a
computed value that matches the value produced by
default_data_alignment.
2018-11-13 15:00:11 -06:00
	pp->pva.data_alignment = 0;
	pp->pva.data_alignment_offset = 0;
2016-02-18 15:38:23 -06:00
	pp->pva.pvmetadatacopies = DEFAULT_PVMETADATACOPIES;
2018-11-13 15:00:11 -06:00
pp - > pva . pvmetadatasize = get_default_pvmetadatasize_sectors ( ) ;
2016-02-18 15:38:23 -06:00
pp - > pva . label_sector = DEFAULT_LABELSECTOR ;
pp - > pva . metadataignore = DEFAULT_PVMETADATAIGNORE ;
pp - > pva . ba_start = 0 ;
pp - > pva . ba_size = 0 ;
pp - > pva . pe_start = PV_PE_START_CALC ;
pp - > pva . extent_count = 0 ;
pp - > pva . extent_size = 0 ;
dm_list_init ( & pp - > prompts ) ;
dm_list_init ( & pp - > arg_devices ) ;
dm_list_init ( & pp - > arg_process ) ;
dm_list_init ( & pp - > arg_confirm ) ;
dm_list_init ( & pp - > arg_create ) ;
dm_list_init ( & pp - > arg_remove ) ;
dm_list_init ( & pp - > arg_fail ) ;
dm_list_init ( & pp - > pvs ) ;
}
2008-09-19 04:28:58 +00:00
static struct physical_volume * _alloc_pv ( struct dm_pool * mem , struct device * dev )
2007-10-12 18:37:19 +00:00
{
2012-02-13 10:51:52 +00:00
struct physical_volume * pv ;
2007-10-12 18:37:19 +00:00
2012-02-13 10:51:52 +00:00
if ( ! ( pv = dm_pool_zalloc ( mem , sizeof ( * pv ) ) ) ) {
log_error ( " Failed to allocate pv structure. " ) ;
return NULL ;
}
2007-10-12 18:37:19 +00:00
2008-09-19 04:28:58 +00:00
pv - > dev = dev ;
2007-10-12 18:37:19 +00:00
2008-11-03 22:14:30 +00:00
dm_list_init ( & pv - > tags ) ;
dm_list_init ( & pv - > segments ) ;
2007-10-12 18:37:19 +00:00
return pv ;
}
2009-07-26 01:52:19 +00:00
/**
* pv_create - initialize a physical volume for use with a volume group
2012-02-13 11:03:59 +00:00
* The created PV belongs to the orphan VG.
2009-07-26 01:52:19 +00:00
*
* Returns :
* PV handle - physical volume initialized successfully
* NULL - invalid parameter or problem initializing the physical volume
*/
2016-02-18 15:31:27 -06:00
2009-07-26 01:52:19 +00:00
struct physical_volume * pv_create ( const struct cmd_context * cmd ,
2007-06-11 18:29:30 +00:00
struct device * dev ,
2016-02-18 15:31:27 -06:00
struct pv_create_args * pva )
2001-09-25 12:49:28 +00:00
{
2009-07-26 01:52:19 +00:00
const struct format_type * fmt = cmd - > fmt ;
2012-02-13 11:03:59 +00:00
struct dm_pool * mem = fmt - > orphan_vg - > vgmem ;
2008-09-19 04:28:58 +00:00
struct physical_volume * pv = _alloc_pv ( mem , dev ) ;
2011-02-21 12:24:15 +00:00
unsigned mda_index ;
2012-02-13 11:03:59 +00:00
struct pv_list * pvl ;
2016-02-18 15:31:27 -06:00
uint64_t size = pva - > size ;
2018-11-13 15:00:11 -06:00
uint64_t data_alignment = pva - > data_alignment ;
uint64_t data_alignment_offset = pva - > data_alignment_offset ;
2016-02-18 15:31:27 -06:00
unsigned pvmetadatacopies = pva - > pvmetadatacopies ;
uint64_t pvmetadatasize = pva - > pvmetadatasize ;
unsigned metadataignore = pva - > metadataignore ;
2001-10-12 10:32:06 +00:00
2007-10-12 18:37:19 +00:00
if ( ! pv )
2012-02-13 10:51:52 +00:00
return_NULL ;
2001-10-12 10:32:06 +00:00
2016-02-18 15:31:27 -06:00
if ( pva - > idp )
memcpy ( & pv - > id , pva - > idp , sizeof ( * pva - > idp ) ) ;
2005-01-20 18:11:53 +00:00
else if ( ! id_create ( & pv - > id ) ) {
log_error ( " Failed to create random uuid for %s. " ,
dev_name ( dev ) ) ;
2007-10-12 18:37:19 +00:00
goto bad ;
2005-01-20 18:11:53 +00:00
}
2002-01-16 18:10:08 +00:00
2002-02-20 18:29:30 +00:00
if ( ! dev_get_size ( pv - > dev , & pv - > size ) ) {
2007-10-12 14:29:32 +00:00
log_error ( " %s: Couldn't get size. " , pv_dev_name ( pv ) ) ;
2002-02-20 18:29:30 +00:00
goto bad ;
}
if ( size ) {
if ( size > pv - > size )
2007-06-28 17:33:44 +00:00
log_warn ( " WARNING: %s: Overriding real size. "
2007-10-12 14:29:32 +00:00
" You could lose data. " , pv_dev_name ( pv ) ) ;
2002-04-24 18:20:51 +00:00
log_verbose ( " %s: Pretending size is % " PRIu64 " sectors. " ,
2007-10-12 14:29:32 +00:00
pv_dev_name ( pv ) , size ) ;
2002-02-20 18:29:30 +00:00
pv - > size = size ;
}
2002-04-24 18:20:51 +00:00
2011-02-18 14:11:22 +00:00
if ( pv - > size < pv_min_size ( ) ) {
log_error ( " %s: Size must exceed minimum of % " PRIu64 " sectors. " ,
pv_dev_name ( pv ) , pv_min_size ( ) ) ;
2001-10-12 10:32:06 +00:00
goto bad ;
}
2013-02-21 14:47:49 +01:00
if ( pv - > size < data_alignment + data_alignment_offset ) {
2009-02-23 16:53:42 +00:00
log_error ( " %s: Data alignment must not exceed device size. " ,
pv_dev_name ( pv ) ) ;
goto bad ;
}
2012-02-10 02:53:03 +00:00
if ( ! ( pvl = dm_pool_zalloc ( mem , sizeof ( * pvl ) ) ) ) {
log_error ( " pv_list allocation in pv_create failed " ) ;
2011-02-21 12:12:32 +00:00
goto bad ;
}
2012-02-10 02:53:03 +00:00
pvl - > pv = pv ;
add_pvl_to_vgs ( fmt - > orphan_vg , pvl ) ;
fmt - > orphan_vg - > extent_count + = pv - > pe_count ;
fmt - > orphan_vg - > free_count + = pv - > pe_count ;
2011-02-21 12:12:32 +00:00
2002-11-18 14:04:08 +00:00
pv - > fmt = fmt ;
2008-02-06 15:47:28 +00:00
pv - > vg_name = fmt - > orphan_vg_name ;
2002-02-15 14:33:59 +00:00
2018-11-13 15:00:11 -06:00
/*
* Sets pv : pe_align , pe_align_offset , pe_start , pe_size
* Does not write to device .
*/
2016-02-18 15:31:27 -06:00
if ( ! fmt - > ops - > pv_initialise ( fmt , pva , pv ) ) {
2011-02-21 12:24:15 +00:00
log_error ( " Format-specific initialisation of physical "
" volume %s failed. " , pv_dev_name ( pv ) ) ;
2002-02-15 14:33:59 +00:00
goto bad ;
}
2009-02-22 19:00:26 +00:00
2011-02-21 12:24:15 +00:00
for ( mda_index = 0 ; mda_index < pvmetadatacopies ; mda_index + + ) {
if ( pv - > fmt - > ops - > pv_add_metadata_area & &
! pv - > fmt - > ops - > pv_add_metadata_area ( pv - > fmt , pv ,
2016-02-18 15:31:27 -06:00
pva - > pe_start ! = PV_PE_START_CALC ,
2011-02-21 12:24:15 +00:00
mda_index , pvmetadatasize ,
metadataignore ) ) {
log_error ( " Failed to add metadata area for "
" new physical volume %s " , pv_dev_name ( pv ) ) ;
goto bad ;
}
}
2001-10-12 10:32:06 +00:00
return pv ;
2001-10-15 18:39:40 +00:00
bad :
2012-02-13 11:03:59 +00:00
// FIXME: detach from orphan in error path
//free_pv_fid(pv);
//dm_pool_free(mem, pv);
2001-10-12 10:32:06 +00:00
return NULL ;
2001-09-25 12:49:28 +00:00
}
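The size checks in pv_create() run in a fixed order:

1. An optional size override replaces the detected size.
2. The result must reach the minimum PV size.
3. The result must hold data_alignment plus data_alignment_offset.

The sketch below follows that order. It assumes all values are sector counts supplied by the caller. `check_pv_size` and its result codes are hypothetical names, and the minimum is a parameter rather than a call to LVM's `pv_min_size()`.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical result codes for this sketch. */
enum pv_size_check { PV_SIZE_OK = 0, PV_SIZE_TOO_SMALL, PV_SIZE_ALIGN_TOO_BIG };

/* All sizes are in 512-byte sectors; override_sectors == 0 means "no override". */
static enum pv_size_check check_pv_size(uint64_t dev_sectors, uint64_t override_sectors,
					uint64_t min_sectors, uint64_t data_alignment,
					uint64_t data_alignment_offset)
{
	uint64_t size = dev_sectors;

	if (override_sectors) {
		/* Like pv_create(): a larger override only warns, it does not fail. */
		if (override_sectors > dev_sectors)
			fprintf(stderr, "WARNING: overriding real size, you could lose data.\n");
		size = override_sectors;
	}

	if (size < min_sectors)
		return PV_SIZE_TOO_SMALL;

	if (size < data_alignment + data_alignment_offset)
		return PV_SIZE_ALIGN_TOO_BIG;

	return PV_SIZE_OK;
}
```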
2007-06-11 18:29:30 +00:00
/* FIXME: liblvm todo - make into function that returns handle */
2008-03-13 22:51:24 +00:00
struct pv_list * find_pv_in_vg ( const struct volume_group * vg ,
2013-03-19 13:58:02 +01:00
const char * pv_name )
2001-10-15 18:39:40 +00:00
{
2002-01-21 14:28:12 +00:00
struct pv_list * pvl ;
2018-06-15 11:03:55 -05:00
struct device * dev = dev_cache_get ( vg - > cmd , pv_name , vg - > cmd - > filter ) ;
2001-10-25 14:04:18 +00:00
2014-10-07 16:06:21 +02:00
/*
* If the device does not exist or is filtered out , don ' t bother trying
* to find it in the list . This also prevents accidentally finding a
* non - NULL PV which happens to be missing ( i . e . its pv - > dev is NULL )
* for such devices .
*/
if ( ! dev )
return NULL ;
dm_list_iterate_items ( pvl , & vg - > pvs )
if ( pvl - > pv - > dev = = dev )
2002-01-21 14:28:12 +00:00
return pvl ;
2001-09-25 12:49:28 +00:00
2001-10-15 18:39:40 +00:00
return NULL ;
2002-11-18 14:04:08 +00:00
}
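The comment in find_pv_in_vg() explains why a NULL device ends the search early. A missing PV has a NULL dev, so without the guard, a lookup for an absent or filtered device would match that PV. The simplified sketch below shows the guard in place. It uses hypothetical `struct device` and `struct pv` types, not LVM's.

```c
#include <stddef.h>

struct device { int id; };
struct pv { const struct device *dev; /* NULL when the PV is missing */ };

/* Linear lookup by device. Returning early on a NULL dev keeps it from
 * comparing equal to the NULL dev of a missing PV. */
static const struct pv *find_pv_by_dev(const struct pv *pvs, size_t n,
				       const struct device *dev)
{
	if (!dev)
		return NULL;

	for (size_t i = 0; i < n; i++)
		if (pvs[i].dev == dev)
			return &pvs[i];

	return NULL;
}
```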
2008-11-03 22:14:30 +00:00
struct pv_list * find_pv_in_pv_list ( const struct dm_list * pl ,
2008-03-28 19:08:23 +00:00
const struct physical_volume * pv )
{
struct pv_list * pvl ;
2008-11-03 22:14:30 +00:00
dm_list_iterate_items ( pvl , pl )
2008-03-28 19:08:23 +00:00
if ( pvl - > pv = = pv )
return pvl ;
2008-04-10 19:59:43 +00:00
2008-03-28 19:08:23 +00:00
return NULL ;
}
2003-01-17 21:04:26 +00:00
int pv_is_in_vg ( struct volume_group * vg , struct physical_volume * pv )
{
2005-06-01 16:51:55 +00:00
struct pv_list * pvl ;
2003-01-17 21:04:26 +00:00
2008-11-03 22:14:30 +00:00
dm_list_iterate_items ( pvl , & vg - > pvs )
2005-06-01 16:51:55 +00:00
if ( pv = = pvl - > pv )
2003-01-17 21:04:26 +00:00
return 1 ;
return 0 ;
}
2007-06-12 22:41:27 +00:00
/**
* find_pv_in_vg_by_uuid - Find PV in VG by PV UUID
* @ vg : volume group to search
* @ id : UUID of the PV to match
*
* Returns :
2010-03-16 15:30:48 +00:00
* struct pv_list within owning struct volume_group - if UUID of PV found in VG
2007-06-12 22:41:27 +00:00
* NULL - invalid parameter or UUID of PV not found in VG
*
* Note
* FIXME - liblvm todo - make into function that takes VG handle
*/
2010-03-16 15:30:48 +00:00
struct pv_list * find_pv_in_vg_by_uuid ( const struct volume_group * vg ,
const struct id * id )
2007-06-11 18:29:30 +00:00
{
2013-03-19 13:58:02 +01:00
struct pv_list * pvl ;
dm_list_iterate_items ( pvl , & vg - > pvs )
if ( id_equal ( & pvl - > pv - > id , id ) )
return pvl ;
return NULL ;
2007-06-11 18:29:30 +00:00
}
2008-03-13 22:51:24 +00:00
struct lv_list * find_lv_in_vg ( const struct volume_group * vg ,
const char * lv_name )
2001-10-29 13:52:23 +00:00
{
2002-01-21 14:28:12 +00:00
struct lv_list * lvl ;
2001-10-29 13:52:23 +00:00
const char * ptr ;
/* Use last component */
if ( ( ptr = strrchr ( lv_name , ' / ' ) ) )
ptr + + ;
else
ptr = lv_name ;
2001-10-31 12:47:01 +00:00
2008-11-03 22:14:30 +00:00
dm_list_iterate_items ( lvl , & vg - > lvs )
2002-01-21 16:49:32 +00:00
if ( ! strcmp ( lvl - > lv - > name , ptr ) )
2002-01-21 14:28:12 +00:00
return lvl ;
2001-10-29 13:52:23 +00:00
2001-11-09 22:01:04 +00:00
return NULL ;
2001-10-29 13:52:23 +00:00
}
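find_lv_in_vg() matches only the last path component of the name it is given. Below is the same parsing as a standalone helper. The name `last_component` is hypothetical.

```c
#include <string.h>

/* Return the part of a name after the last '/', so "vg0/lvol0" and
 * "/dev/vg0/lvol0" both reduce to "lvol0"; a name without '/' is returned as-is. */
static const char *last_component(const char *name)
{
	const char *ptr = strrchr(name, '/');

	return ptr ? ptr + 1 : name;
}
```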
2021-03-10 21:58:57 +01:00
struct logical_volume * find_lv_in_vg_by_lvid ( const struct volume_group * vg ,
2015-11-21 23:31:44 +01:00
const union lvid * lvid )
2002-02-25 12:56:16 +00:00
{
struct lv_list * lvl ;
2021-08-03 15:32:33 -05:00
if ( memcmp ( & lvid - > id [ 0 ] , & vg - > id , ID_LEN ) )
2021-03-08 08:07:47 +01:00
return NULL ; /* VG UUID does not match */
2008-11-03 22:14:30 +00:00
dm_list_iterate_items ( lvl , & vg - > lvs )
2021-03-08 08:07:47 +01:00
if ( ! memcmp ( & lvid - > id [ 1 ] , & lvl - > lv - > lvid . id [ 1 ] , sizeof ( lvid - > id [ 1 ] ) ) )
return lvl - > lv ; /* LV uuid match */
2002-02-25 12:56:16 +00:00
return NULL ;
}
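find_lv_in_vg_by_lvid() treats the LV identifier as two UUIDs. id[0] must equal the VG's UUID before id[1] is compared against each LV. The sketch below mirrors that two-step comparison with stand-in types. `SKETCH_ID_LEN` of 32 is an assumption about LVM's ID_LEN, and the struct and function names are hypothetical.

```c
#include <stddef.h>
#include <string.h>

#define SKETCH_ID_LEN 32 /* assumed UUID length in bytes */

struct sketch_id { unsigned char uuid[SKETCH_ID_LEN]; };
/* id[0] identifies the VG, id[1] the LV. */
struct sketch_lvid { struct sketch_id id[2]; };

/* Return the index of the matching LV, or -1 if the VG half differs
 * or no LV UUID matches. */
static int find_lv_index_by_lvid(const struct sketch_id *vg_id,
				 const struct sketch_lvid *lvs, size_t n,
				 const struct sketch_lvid *wanted)
{
	if (memcmp(&wanted->id[0], vg_id, sizeof(*vg_id)))
		return -1; /* identifier belongs to a different VG */

	for (size_t i = 0; i < n; i++)
		if (!memcmp(&wanted->id[1], &lvs[i].id[1], sizeof(wanted->id[1])))
			return (int) i;

	return -1;
}
```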
2008-03-13 22:51:24 +00:00
struct logical_volume * find_lv ( const struct volume_group * vg ,
const char * lv_name )
2001-10-29 13:52:23 +00:00
{
2024-10-24 22:41:00 +02:00
if ( ! vg - > lv_names ) {
struct lv_list * lvl = find_lv_in_vg ( vg , lv_name ) ;
return lvl ? lvl - > lv : NULL ;
}
2024-10-24 16:12:18 +02:00
return radix_tree_lookup_ptr ( vg - > lv_names , lv_name , strlen ( lv_name ) ) ;
2001-10-29 13:52:23 +00:00
}
2016-03-01 15:21:21 +01:00
struct generic_logical_volume * find_historical_glv ( const struct volume_group * vg ,
const char * historical_lv_name ,
2016-03-01 15:26:57 +01:00
int check_removed_list ,
2016-03-01 15:21:21 +01:00
struct glv_list * * glvl_found )
{
struct glv_list * glvl ;
const char * ptr ;
2016-03-01 15:26:57 +01:00
const struct dm_list * list = check_removed_list ? & vg - > removed_historical_lvs
: & vg - > historical_lvs ;
2016-03-01 15:21:21 +01:00
/* Use last component */
if ( ( ptr = strrchr ( historical_lv_name , ' / ' ) ) )
ptr + + ;
else
ptr = historical_lv_name ;
2016-03-01 15:26:57 +01:00
dm_list_iterate_items ( glvl , list ) {
2016-03-01 15:21:21 +01:00
if ( ! strcmp ( glvl - > glv - > historical - > name , ptr ) ) {
if ( glvl_found )
* glvl_found = glvl ;
return glvl - > glv ;
}
}
if ( glvl_found )
* glvl_found = NULL ;
return NULL ;
}
2016-03-01 15:31:48 +01:00
int lv_name_is_used_in_vg ( const struct volume_group * vg , const char * name , int * historical )
{
2021-10-01 13:53:28 +02:00
int found ;
2016-03-01 15:31:48 +01:00
2021-10-01 13:53:28 +02:00
if ( historical )
* historical = 0 ;
if ( find_lv ( vg , name ) )
2016-03-01 15:31:48 +01:00
found = 1 ;
2021-10-01 13:53:28 +02:00
else if ( find_historical_glv ( vg , name , 0 , NULL ) ) {
2016-03-01 15:31:48 +01:00
found = 1 ;
if ( historical )
* historical = 1 ;
2021-10-01 13:53:28 +02:00
} else
found = 0 ;
2016-03-01 15:31:48 +01:00
return found ;
}
2001-11-28 13:45:50 +00:00
struct physical_volume * find_pv ( struct volume_group * vg , struct device * dev )
2001-10-29 13:52:23 +00:00
{
2005-06-01 16:51:55 +00:00
struct pv_list * pvl ;
2001-11-09 22:01:04 +00:00
2008-11-03 22:14:30 +00:00
dm_list_iterate_items ( pvl , & vg - > pvs )
2005-06-01 16:51:55 +00:00
if ( dev = = pvl - > pv - > dev )
return pvl - > pv ;
2002-01-21 16:05:23 +00:00
2001-11-09 22:01:04 +00:00
return NULL ;
2001-10-29 13:52:23 +00:00
}
2024-10-24 16:12:18 +02:00
struct physical_volume * find_pv_by_pv_name ( struct volume_group * vg , const char * pv_name )
{
if ( ! vg - > pv_names ) {
log_error ( INTERNAL_ERROR " Cannot find pv name %s outside of _read_vg() " , pv_name ) ;
return NULL ;
}
return radix_tree_lookup_ptr ( vg - > pv_names , pv_name , strlen ( pv_name ) ) ;
}
2002-02-25 12:56:16 +00:00
2003-04-24 22:23:24 +00:00
/* Find segment at a given logical extent in an LV */
2007-12-20 18:55:46 +00:00
struct lv_segment * find_seg_by_le ( const struct logical_volume * lv , uint32_t le )
2003-04-24 22:23:24 +00:00
{
struct lv_segment * seg ;
2008-11-03 22:14:30 +00:00
dm_list_iterate_items ( seg , & lv - > segments )
2003-04-24 22:23:24 +00:00
if ( le > = seg - > le & & le < seg - > le + seg - > len )
return seg ;
return NULL ;
}
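find_seg_by_le() locates a logical extent with a half-open range test. A segment starting at extent `le` with length `len` therefore covers `le` through `le + len - 1`. The sketch below works over an array of hypothetical `struct seg` entries.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified segment covering logical extents [le, le + len). */
struct seg { uint32_t le; uint32_t len; };

/* Linear scan using the same half-open interval test. */
static const struct seg *seg_by_le(const struct seg *segs, size_t n, uint32_t le)
{
	for (size_t i = 0; i < n; i++)
		if (le >= segs[i].le && le < segs[i].le + segs[i].len)
			return &segs[i];

	return NULL;
}
```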
2007-12-20 18:55:46 +00:00
struct lv_segment * first_seg ( const struct logical_volume * lv )
2005-10-28 12:48:50 +00:00
{
2009-05-12 19:09:21 +00:00
struct lv_segment * seg ;
2005-10-28 12:48:50 +00:00
2008-11-03 22:14:30 +00:00
dm_list_iterate_items ( seg , & lv - > segments )
2009-05-12 19:09:21 +00:00
return seg ;
2005-10-28 12:48:50 +00:00
2009-05-12 19:09:21 +00:00
return NULL ;
2005-10-28 12:48:50 +00:00
}
2011-10-28 20:12:54 +00:00
struct lv_segment * last_seg ( const struct logical_volume * lv )
{
2012-02-27 09:51:31 +00:00
struct lv_segment * seg ;
2011-10-28 20:12:54 +00:00
2012-02-27 09:51:31 +00:00
dm_list_iterate_back_items ( seg , & lv - > segments )
return seg ;
2011-10-28 20:12:54 +00:00
2012-02-27 09:51:31 +00:00
return NULL ;
2011-10-28 20:12:54 +00:00
}
2009-09-02 21:39:07 +00:00
int vg_remove_mdas ( struct volume_group * vg )
2002-04-24 18:20:51 +00:00
{
2002-11-18 14:04:08 +00:00
struct metadata_area * mda ;
2002-04-24 18:20:51 +00:00
/* FIXME Improve recovery situation? */
/* Remove each copy of the metadata */
2010-06-28 20:32:44 +00:00
dm_list_iterate_items ( mda , & vg - > fid - > metadata_areas_in_use ) {
2002-11-18 14:04:08 +00:00
if ( mda - > ops - > vg_remove & &
2008-01-30 13:19:47 +00:00
! mda - > ops - > vg_remove ( vg - > fid , vg , mda ) )
return_0 ;
2002-04-24 18:20:51 +00:00
}
return 1 ;
}
2008-01-16 19:54:39 +00:00
/*
* Determine whether two vgs are compatible for merging .
*/
2010-07-09 15:34:40 +00:00
int vgs_are_compatible ( struct cmd_context * cmd __attribute__ ( ( unused ) ) ,
2008-01-16 19:54:39 +00:00
struct volume_group * vg_from ,
struct volume_group * vg_to )
{
struct lv_list * lvl1 , * lvl2 ;
struct pv_list * pvl ;
2011-02-18 14:47:28 +00:00
const char * name1 , * name2 ;
2008-01-16 19:54:39 +00:00
if ( lvs_in_vg_activated ( vg_from ) ) {
log_error ( " Logical volumes in \" %s \" must be inactive " ,
vg_from - > name ) ;
2008-01-17 17:17:09 +00:00
return 0 ;
2008-01-16 19:54:39 +00:00
}
/* Check compatibility */
if ( vg_to - > extent_size ! = vg_from - > extent_size ) {
log_error ( " Extent sizes differ: %d (%s) and %d (%s) " ,
vg_to - > extent_size , vg_to - > name ,
vg_from - > extent_size , vg_from - > name ) ;
2008-01-17 17:17:09 +00:00
return 0 ;
2008-01-16 19:54:39 +00:00
}
if ( vg_to - > max_pv & &
( vg_to - > max_pv < vg_to - > pv_count + vg_from - > pv_count ) ) {
log_error ( " Maximum number of physical volumes (%d) exceeded "
" for \" %s \" and \" %s \" " , vg_to - > max_pv , vg_to - > name ,
vg_from - > name ) ;
2008-01-17 17:17:09 +00:00
return 0 ;
2008-01-16 19:54:39 +00:00
}
if ( vg_to - > max_lv & &
2009-05-13 21:27:43 +00:00
( vg_to - > max_lv < vg_visible_lvs ( vg_to ) + vg_visible_lvs ( vg_from ) ) ) {
2008-01-16 19:54:39 +00:00
log_error ( " Maximum number of logical volumes (%d) exceeded "
" for \" %s \" and \" %s \" " , vg_to - > max_lv , vg_to - > name ,
vg_from - > name ) ;
2008-01-17 17:17:09 +00:00
return 0 ;
2008-01-16 19:54:39 +00:00
}
2008-01-22 02:48:53 +00:00
/* Metadata types must be the same */
if ( vg_to - > fid - > fmt ! = vg_from - > fid - > fmt ) {
log_error ( " Metadata types differ for \" %s \" and \" %s \" " ,
vg_to - > name , vg_from - > name ) ;
return 0 ;
}
2008-01-16 19:54:39 +00:00
/* Check no conflicts with LV names */
2008-11-03 22:14:30 +00:00
dm_list_iterate_items ( lvl1 , & vg_to - > lvs ) {
2008-01-17 17:17:09 +00:00
name1 = lvl1 - > lv - > name ;
2008-01-16 19:54:39 +00:00
2008-11-03 22:14:30 +00:00
dm_list_iterate_items ( lvl2 , & vg_from - > lvs ) {
2008-01-17 17:17:09 +00:00
name2 = lvl2 - > lv - > name ;
2008-01-16 19:54:39 +00:00
if ( ! strcmp ( name1 , name2 ) ) {
log_error ( " Duplicate logical volume "
" name \" %s \" "
" in \" %s \" and \" %s \" " ,
name1 , vg_to - > name , vg_from - > name ) ;
2008-01-17 17:17:09 +00:00
return 0 ;
2008-01-16 19:54:39 +00:00
}
}
}
/* Check no PVs are constructed from either VG */
2008-11-03 22:14:30 +00:00
dm_list_iterate_items ( pvl , & vg_to - > pvs ) {
2008-01-16 19:54:39 +00:00
if ( pv_uses_vg ( pvl - > pv , vg_from ) ) {
log_error ( " Physical volume %s might be constructed "
" from same volume group %s. " ,
pv_dev_name ( pvl - > pv ) , vg_from - > name ) ;
2008-01-17 17:17:09 +00:00
return 0 ;
2008-01-16 19:54:39 +00:00
}
}
2008-11-03 22:14:30 +00:00
dm_list_iterate_items ( pvl , & vg_from - > pvs ) {
2008-01-16 19:54:39 +00:00
if ( pv_uses_vg ( pvl - > pv , vg_to ) ) {
log_error ( " Physical volume %s might be constructed "
" from same volume group %s. " ,
pv_dev_name ( pvl - > pv ) , vg_to - > name ) ;
2008-01-17 17:17:09 +00:00
return 0 ;
2008-01-16 19:54:39 +00:00
}
}
return 1 ;
}
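The numeric limits in vgs_are_compatible() reduce to a few comparisons, where a max_pv or max_lv of 0 means no limit. The sketch below uses a hypothetical `struct vg_limits` that carries only the fields those checks read. It leaves out the activation, metadata-format, LV-name and stacked-PV checks.

```c
#include <stdint.h>

/* Hypothetical subset of VG fields used by the limit checks. */
struct vg_limits {
	uint32_t extent_size;
	uint32_t max_pv;      /* 0 = unlimited */
	uint32_t max_lv;      /* 0 = unlimited */
	uint32_t pv_count;
	uint32_t visible_lvs;
};

/* Returns 1 if 'from' can be merged into 'to' under the extent-size,
 * max_pv and max_lv rules, 0 otherwise. */
static int limits_compatible(const struct vg_limits *to, const struct vg_limits *from)
{
	if (to->extent_size != from->extent_size)
		return 0;
	if (to->max_pv && to->max_pv < to->pv_count + from->pv_count)
		return 0;
	if (to->max_lv && to->max_lv < to->visible_lvs + from->visible_lvs)
		return 0;
	return 1;
}
```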
2008-09-19 06:42:00 +00:00
struct _lv_postorder_baton {
int ( * fn ) ( struct logical_volume * lv , void * data ) ;
void * data ;
} ;
2017-07-20 10:30:00 +02:00
static int _lv_postorder_visit ( struct logical_volume * lv ,
2008-09-19 06:42:00 +00:00
int ( * fn ) ( struct logical_volume * lv , void * data ) ,
void * data ) ;
static int _lv_each_dependency ( struct logical_volume * lv ,
int ( * fn ) ( struct logical_volume * lv , void * data ) ,
void * data )
{
2011-04-08 14:40:18 +00:00
unsigned i , s ;
2008-09-19 06:42:00 +00:00
struct lv_segment * lvseg ;
2013-12-17 13:53:15 +01:00
struct dm_list * snh ;
2008-09-19 06:42:00 +00:00
struct logical_volume * deps [ ] = {
lv - > snapshot ? lv - > snapshot - > origin : 0 ,
lv - > snapshot ? lv - > snapshot - > cow : 0 } ;
2014-04-04 21:10:30 +02:00
for ( i = 0 ; i < DM_ARRAY_SIZE ( deps ) ; + + i ) {
2008-09-19 06:42:00 +00:00
if ( deps [ i ] & & ! fn ( deps [ i ] , data ) )
return_0 ;
}
2008-11-03 22:14:30 +00:00
dm_list_iterate_items ( lvseg , & lv - > segments ) {
2013-12-17 13:53:15 +01:00
if ( lvseg - > external_lv & & ! fn ( lvseg - > external_lv , data ) )
return_0 ;
2008-09-19 06:42:00 +00:00
if ( lvseg - > log_lv & & ! fn ( lvseg - > log_lv , data ) )
return_0 ;
2012-01-25 08:50:10 +00:00
if ( lvseg - > pool_lv & & ! fn ( lvseg - > pool_lv , data ) )
return_0 ;
if ( lvseg - > metadata_lv & & ! fn ( lvseg - > metadata_lv , data ) )
return_0 ;
2020-12-11 15:56:04 -06:00
if ( lvseg - > writecache & & ! fn ( lvseg - > writecache , data ) )
return_0 ;
if ( lvseg - > integrity_meta_dev & & ! fn ( lvseg - > integrity_meta_dev , data ) )
return_0 ;
2008-09-19 06:42:00 +00:00
for ( s = 0 ; s < lvseg - > area_count ; + + s ) {
if ( seg_type ( lvseg , s ) = = AREA_LV & & ! fn ( seg_lv ( lvseg , s ) , data ) )
return_0 ;
}
}
2013-12-17 13:53:15 +01:00
if ( lv_is_origin ( lv ) )
dm_list_iterate ( snh , & lv - > snapshot_segs )
if ( ! fn ( dm_list_struct_base ( snh , struct lv_segment , origin_list ) - > cow , data ) )
return_0 ;
2008-09-19 06:42:00 +00:00
return 1 ;
}
static int _lv_postorder_cleanup ( struct logical_volume * lv , void * data )
{
if ( ! ( lv - > status & POSTORDER_FLAG ) )
return 1 ;
lv - > status & = ~ POSTORDER_FLAG ;
if ( ! _lv_each_dependency ( lv , _lv_postorder_cleanup , data ) )
return_0 ;
return 1 ;
}
2011-02-14 19:27:05 +00:00
static int _lv_postorder_level ( struct logical_volume * lv , void * data )
{
struct _lv_postorder_baton * baton = data ;
2015-11-17 13:21:22 +01:00
return ( data ) ? _lv_postorder_visit ( lv , baton - > fn , baton - > data ) : 0 ;
2020-08-28 21:14:06 +02:00
}
2011-02-14 19:27:05 +00:00
2008-09-19 06:42:00 +00:00
static int _lv_postorder_visit ( struct logical_volume * lv ,
int ( * fn ) ( struct logical_volume * lv , void * data ) ,
void * data )
{
struct _lv_postorder_baton baton ;
int r ;
if ( lv - > status & POSTORDER_FLAG )
return 1 ;
2011-02-14 19:27:05 +00:00
if ( lv - > status & POSTORDER_OPEN_FLAG )
return 1 ; // a data structure loop has closed...
lv - > status | = POSTORDER_OPEN_FLAG ;
2008-09-19 06:42:00 +00:00
baton . fn = fn ;
baton . data = data ;
r = _lv_each_dependency ( lv , _lv_postorder_level , & baton ) ;
2011-02-14 19:27:05 +00:00
2009-05-30 01:54:29 +00:00
if ( r )
2008-09-19 06:42:00 +00:00
r = fn ( lv , data ) ;
2009-05-30 01:54:29 +00:00
2011-02-14 19:27:05 +00:00
lv - > status & = ~ POSTORDER_OPEN_FLAG ;
lv - > status | = POSTORDER_FLAG ;
2008-09-19 06:42:00 +00:00
return r ;
}
/*
 * Walk the LV dependency graph depth-first and call the callback "fn" on
 * each LV in post-order, i.e. only after all the LVs it depends on. The
 * void *data is passed to every call. The callback may return zero to
 * signal an error, which stops the walk; the error is propagated to the
 * return value of _lv_postorder.
 */
static int _lv_postorder ( struct logical_volume * lv ,
int ( * fn ) ( struct logical_volume * lv , void * data ) ,
void * data )
{
int r ;
2011-08-11 17:34:30 +00:00
int pool_locked = dm_pool_locked ( lv - > vg - > vgmem ) ;
if ( pool_locked & & ! dm_pool_unlock ( lv - > vg - > vgmem , 0 ) )
return_0 ;
2008-09-19 06:42:00 +00:00
r = _lv_postorder_visit ( lv , fn , data ) ;
_lv_postorder_cleanup ( lv , 0 ) ;
2011-08-11 17:34:30 +00:00
if ( pool_locked & & ! dm_pool_lock ( lv - > vg - > vgmem , 0 ) )
return_0 ;
2008-09-19 06:42:00 +00:00
return r ;
}
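_lv_postorder_visit() uses two status bits:

- POSTORDER_OPEN_FLAG marks LVs on the current path, so a dependency loop stops instead of recursing forever.
- POSTORDER_FLAG marks LVs that are already finished.

Below is a self-contained sketch of the same traversal on an adjacency matrix. The flag names and `struct graph` are hypothetical. It omits the callback error propagation and the separate flag-clearing pass that _lv_postorder_cleanup() performs.

```c
#include <stddef.h>

#define MAX_NODES 8
#define FLAG_OPEN 1u /* on the current DFS path */
#define FLAG_DONE 2u /* already emitted in post-order */

struct graph {
	size_t n;
	int edge[MAX_NODES][MAX_NODES]; /* edge[a][b] != 0: a depends on b */
	unsigned flags[MAX_NODES];
};

/* Visit every dependency of v before v itself and record the order. */
static void postorder_visit(struct graph *g, size_t v, size_t *order, size_t *count)
{
	if (g->flags[v] & (FLAG_DONE | FLAG_OPEN))
		return; /* finished already, or a loop closed back onto the path */

	g->flags[v] |= FLAG_OPEN;
	for (size_t d = 0; d < g->n; d++)
		if (g->edge[v][d])
			postorder_visit(g, d, order, count);

	g->flags[v] &= ~FLAG_OPEN;
	g->flags[v] |= FLAG_DONE;
	order[(*count)++] = v;
}
```

On the loop 0 -> 1 -> 2 -> 0, starting from node 0 yields the order 2, 1, 0. Each node is visited once.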
2011-03-10 14:40:32 +00:00
/*
 * Calls _lv_postorder_visit() on each LV in the VG, so an LV reachable from
 * several others is visited only once. Clears the flags with
 * _lv_postorder_cleanup() after all LVs have been visited.
 */
static int _lv_postorder_vg ( struct volume_group * vg ,
int ( * fn ) ( struct logical_volume * lv , void * data ) ,
void * data )
{
struct lv_list * lvl ;
int r = 1 ;
2011-08-11 17:34:30 +00:00
int pool_locked = dm_pool_locked ( vg - > vgmem ) ;
if ( pool_locked & & ! dm_pool_unlock ( vg - > vgmem , 0 ) )
return_0 ;
2011-03-10 14:40:32 +00:00
dm_list_iterate_items ( lvl , & vg - > lvs )
if ( ! _lv_postorder_visit ( lvl - > lv , fn , data ) ) {
stack ;
r = 0 ;
}
dm_list_iterate_items ( lvl , & vg - > lvs )
_lv_postorder_cleanup ( lvl - > lv , 0 ) ;
2011-08-11 17:34:30 +00:00
if ( pool_locked & & ! dm_pool_lock ( vg - > vgmem , 0 ) )
return_0 ;
2011-03-10 14:40:32 +00:00
return r ;
}
2008-09-19 06:42:00 +00:00
struct _lv_mark_if_partial_baton {
int partial ;
} ;
static int _lv_mark_if_partial_collect ( struct logical_volume * lv , void * data )
{
struct _lv_mark_if_partial_baton * baton = data ;
2016-03-02 20:59:03 +01:00
if ( baton & & lv_is_partial ( lv ) )
2008-09-19 06:42:00 +00:00
baton - > partial = 1 ;
return 1 ;
}
static int _lv_mark_if_partial_single ( struct logical_volume * lv , void * data )
{
2011-04-08 14:40:18 +00:00
unsigned s ;
2016-12-09 15:08:04 +01:00
struct _lv_mark_if_partial_baton baton = { . partial = 0 } ;
2008-09-19 06:42:00 +00:00
struct lv_segment * lvseg ;
2008-11-03 22:14:30 +00:00
dm_list_iterate_items ( lvseg , & lv - > segments ) {
2008-09-19 06:42:00 +00:00
for ( s = 0 ; s < lvseg - > area_count ; + + s ) {
if ( seg_type ( lvseg , s ) = = AREA_PV ) {
2010-03-16 14:37:38 +00:00
if ( is_missing_pv ( seg_pv ( lvseg , s ) ) )
2008-09-19 06:42:00 +00:00
lv - > status | = PARTIAL_LV ;
}
}
}
2012-02-28 11:10:45 +00:00
if ( ! _lv_each_dependency ( lv , _lv_mark_if_partial_collect , & baton ) )
return_0 ;
2008-09-19 06:42:00 +00:00
if ( baton . partial )
lv - > status | = PARTIAL_LV ;
return 1 ;
}
/*
* Mark LVs with missing PVs using PARTIAL_LV status flag . The flag is
* propagated transitively , so LVs referencing other LVs are marked
* partial as well , if any of their referenced LVs are marked partial .
*/
2011-05-07 13:32:05 +00:00
int vg_mark_partial_lvs ( struct volume_group * vg , int clear )
2008-09-19 06:42:00 +00:00
{
2011-05-07 13:32:05 +00:00
struct lv_list * lvl ;
if ( clear )
dm_list_iterate_items ( lvl , & vg - > lvs )
lvl - > lv - > status & = ~ PARTIAL_LV ;
2011-03-10 14:40:32 +00:00
if ( ! _lv_postorder_vg ( vg , _lv_mark_if_partial_single , NULL ) )
return_0 ;
2008-09-19 06:42:00 +00:00
return 1 ;
}
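Because the walk is post-order, every LV that an LV uses has its partial status settled before that LV is examined. One pass therefore propagates PARTIAL_LV upward. The sketch below uses a hypothetical `struct lvs` model with a missing-PV bit per LV and a dependency matrix. Marking an LV visited before recursing stops the walk on a loop, at the cost of possibly missing a partial status inside that loop.

```c
#include <stddef.h>

#define NLV 4

/* Hypothetical model: deps[a][b] != 0 means LV a uses LV b. */
struct lvs {
	int missing_pv[NLV];
	int deps[NLV][NLV];
	int partial[NLV];
	int done[NLV];
};

/* An LV is partial if it sits on a missing PV or uses a partial LV. */
static void mark_partial(struct lvs *s, size_t v)
{
	if (s->done[v])
		return;
	s->done[v] = 1; /* set before recursing, so a loop terminates */

	for (size_t d = 0; d < NLV; d++)
		if (s->deps[v][d])
			mark_partial(s, d);

	s->partial[v] = s->missing_pv[v];
	for (size_t d = 0; d < NLV; d++)
		if (s->deps[v][d] && s->partial[d])
			s->partial[v] = 1;
}
```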
2009-05-20 11:09:49 +00:00
/*
 * Ensure that the read-ahead value of each PV device is cached in dev-cache.
 * Currently, read_ahead is taken only from the first PV area of the LV's
 * first segment.
 */
static int _lv_read_ahead_single ( struct logical_volume * lv , void * data )
{
struct lv_segment * seg = first_seg ( lv ) ;
uint32_t seg_read_ahead = 0 , * read_ahead = data ;
2012-06-21 12:43:31 +02:00
if ( ! read_ahead ) {
log_error ( INTERNAL_ERROR " Read ahead data missing. " ) ;
return 0 ;
}
2009-06-01 12:43:31 +00:00
if ( seg & & seg - > area_count & & seg_type ( seg , 0 ) = = AREA_PV )
2009-05-20 11:09:49 +00:00
dev_get_read_ahead ( seg_pv ( seg , 0 ) - > dev , & seg_read_ahead ) ;
if ( seg_read_ahead > * read_ahead )
* read_ahead = seg_read_ahead ;
return 1 ;
}
2009-06-01 12:43:31 +00:00
/*
 * Calculate readahead for a logical volume from its underlying PV devices.
 * If read_ahead is NULL, only ensure that the readahead of the PVs is
 * preloaded into each PV's struct device in dev-cache.
 */
void lv_calculate_readahead ( const struct logical_volume * lv , uint32_t * read_ahead )
2009-05-20 11:09:49 +00:00
{
2009-06-01 12:43:31 +00:00
uint32_t _read_ahead = 0 ;
2009-05-20 11:09:49 +00:00
if ( lv - > read_ahead = = DM_READ_AHEAD_AUTO )
2009-06-01 12:43:31 +00:00
_lv_postorder ( ( struct logical_volume * ) lv , _lv_read_ahead_single , & _read_ahead ) ;
2009-05-20 11:09:49 +00:00
2009-06-01 12:43:31 +00:00
if ( read_ahead ) {
2013-01-07 22:30:29 +00:00
log_debug_metadata ( " Calculated readahead of LV %s is %u " , lv - > name , _read_ahead ) ;
2009-06-01 12:43:31 +00:00
* read_ahead = _read_ahead ;
}
2009-05-20 11:09:49 +00:00
}
2011-03-30 13:35:51 +00:00
struct validate_hash {
2024-10-15 15:28:10 +02:00
struct radix_tree * lvname ;
struct radix_tree * historical_lvname ;
struct radix_tree * lvid ;
struct radix_tree * historical_lvid ;
struct radix_tree * pvid ;
struct radix_tree * lv_lock_args ;
2011-03-30 13:35:51 +00:00
} ;
2010-12-14 17:51:09 +00:00
/*
 * Check that an LV and all its PV references are correctly listed in vg->lvs
 * and vg->pvs, respectively. This checks a single LV only, *not* the LVs it
 * uses; to cover those, call _lv_postorder with this function. Cf. vg_validate.
 */
static int _lv_validate_references_single ( struct logical_volume * lv , void * data )
{
struct volume_group * vg = lv - > vg ;
2011-03-30 13:35:51 +00:00
struct validate_hash * vhash = data ;
2010-12-14 17:51:09 +00:00
struct lv_segment * lvseg ;
2011-03-30 13:35:51 +00:00
struct physical_volume * pv ;
2011-04-08 14:40:18 +00:00
unsigned s ;
2010-12-14 17:51:09 +00:00
int r = 1 ;
2024-10-15 15:28:10 +02:00
if ( lv ! = radix_tree_lookup_ptr ( vhash - > lvid , & lv - > lvid . id [ 1 ] ,
2011-03-30 13:35:51 +00:00
sizeof ( lv - > lvid . id [ 1 ] ) ) ) {
2010-12-14 17:51:09 +00:00
log_error ( INTERNAL_ERROR
" Referenced LV %s not listed in VG %s. " ,
lv - > name , vg - > name ) ;
r = 0 ;
}
dm_list_iterate_items ( lvseg , & lv - > segments ) {
for ( s = 0 ; s < lvseg - > area_count ; + + s ) {
2011-03-30 13:35:51 +00:00
if ( seg_type ( lvseg , s ) ! = AREA_PV )
continue ;
pv = seg_pv ( lvseg , s ) ;
/* look up the reference in vg->pvs */
2024-10-15 15:28:10 +02:00
if ( pv ! = radix_tree_lookup_ptr ( vhash - > pvid , & pv - > id ,
2011-03-30 13:35:51 +00:00
sizeof ( pv - > id ) ) ) {
log_error ( INTERNAL_ERROR
" Referenced PV %s not listed in VG %s. " ,
pv_dev_name ( pv ) , vg - > name ) ;
r = 0 ;
2010-12-14 17:51:09 +00:00
}
}
}
return r ;
}
2015-07-09 13:24:28 -05:00
/*
* Format is <version>:<info>, using only alphanumerics and . _ - + :
* with at most one colon.
*/
static int _validate_lock_args_chars ( const char * lock_args )
{
2016-02-23 12:18:48 +01:00
unsigned i ;
2015-07-09 13:24:28 -05:00
char c ;
int found_colon = 0 ;
int r = 1 ;
for (i = 0; lock_args[i]; i++) {
c = lock_args[i];
if (!isalnum((unsigned char) c) && c != '.' && c != '_' && c != '-' && c != '+' && c != ':') {
2016-02-23 12:18:48 +01:00
log_error ( INTERNAL_ERROR " Invalid character at index %u of lock_args \" %s \" " ,
2015-07-09 13:24:28 -05:00
i , lock_args ) ;
r = 0 ;
}
if ( c = = ' : ' & & found_colon ) {
2016-02-23 12:18:48 +01:00
log_error ( INTERNAL_ERROR " Invalid colon at index %u of lock_args \" %s \" " ,
2015-07-09 13:24:28 -05:00
i , lock_args ) ;
r = 0 ;
}
if ( c = = ' : ' )
found_colon = 1 ;
}
return r ;
}
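The character rule enforced by `_validate_lock_args_chars` can be exercised in isolation. The sketch below is a standalone restatement and is not LVM2 code. The name `lock_args_chars_ok` is hypothetical, the `log_error` reporting is dropped, and the sketch assumes the same character set (alphanumerics plus `.`, `_`, `-`, `+`, `:`) with at most one colon.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/*
 * Hypothetical standalone version of the lock_args character check:
 * returns 1 if every character is alphanumeric or one of . _ - + :
 * and at most one ':' separates <version> from <info>, otherwise 0.
 * Unlike the LVM2 function it does not log anything.
 */
static int lock_args_chars_ok(const char *lock_args)
{
	int found_colon = 0;
	int ok = 1;
	size_t i;

	assert(lock_args != NULL);	/* callers reject NULL lock_args first */

	for (i = 0; lock_args[i]; i++) {
		/* isalnum() needs a value representable as unsigned char */
		unsigned char c = (unsigned char) lock_args[i];

		if (!isalnum(c) && c != '.' && c != '_' && c != '-' &&
		    c != '+' && c != ':')
			ok = 0;
		if (c == ':') {
			if (found_colon)
				ok = 0;	/* second colon: not <version>:<info> */
			found_colon = 1;
		}
	}
	return ok;
}
```

For example, `lock_args_chars_ok("1.0.0:70")` returns 1. `lock_args_chars_ok("1.0.0:70:1")` returns 0 because of the second colon.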
static int _validate_vg_lock_args ( struct volume_group * vg )
{
2022-02-08 19:17:30 +01:00
if ( ! vg - > lock_args | | ! _validate_lock_args_chars ( vg - > lock_args ) ) {
2015-07-09 13:24:28 -05:00
log_error ( INTERNAL_ERROR " VG %s has invalid lock_args chars " , vg - > name ) ;
return 0 ;
}
return 1 ;
}
/*
* For lock_type sanlock, LV lock_args are <version>:<info>.
* For lock_types dlm and idm, LV lock_args are not used, and lock_args is
* just set to "dlm" or "idm", respectively.
*/
static int _validate_lv_lock_args ( struct logical_volume * lv )
{
int r = 1 ;
if ( ! strcmp ( lv - > vg - > lock_type , " sanlock " ) ) {
if ( ! _validate_lock_args_chars ( lv - > lock_args ) ) {
log_error(INTERNAL_ERROR "LV %s has invalid lock_args chars",
display_lvname(lv));
return 0 ;
}
} else if ( ! strcmp ( lv - > vg - > lock_type , " dlm " ) ) {
if ( strcmp ( lv - > lock_args , " dlm " ) ) {
log_error(INTERNAL_ERROR "LV %s has invalid lock_args \"%s\"",
display_lvname(lv), lv->lock_args);
r = 0 ;
}
2021-05-07 10:25:14 +08:00
} else if ( ! strcmp ( lv - > vg - > lock_type , " idm " ) ) {
if ( strcmp ( lv - > lock_args , " idm " ) ) {
log_error(INTERNAL_ERROR "LV %s has invalid lock_args \"%s\"",
display_lvname(lv), lv->lock_args);
r = 0 ;
}
2015-07-09 13:24:28 -05:00
}
return r ;
}
2005-07-12 19:40:59 +00:00
int vg_validate ( struct volume_group * vg )
2002-04-24 18:20:51 +00:00
{
2011-03-10 13:11:59 +00:00
struct pv_list * pvl ;
struct lv_list * lvl ;
2016-03-01 15:32:01 +01:00
struct glv_list * glvl ;
struct historical_logical_volume * hlv ;
2010-12-14 17:07:35 +00:00
struct lv_segment * seg ;
2014-05-29 09:41:03 +02:00
struct dm_str_list * sl ;
2010-07-09 15:34:40 +00:00
char uuid [ 64 ] __attribute__ ( ( aligned ( 8 ) ) ) ;
2014-07-03 19:06:04 +01:00
char uuid2 [ 64 ] __attribute__ ( ( aligned ( 8 ) ) ) ;
2024-10-24 15:27:31 +02:00
int r = 1 , rt ;
2013-07-05 17:10:11 +02:00
unsigned hidden_lv_count = 0 , lv_count = 0 , lv_visible_count = 0 ;
unsigned pv_count = 0 ;
unsigned num_snapshots = 0 ;
2013-07-05 17:10:47 +02:00
unsigned spare_count = 0 ;
2014-03-31 11:51:50 +02:00
size_t vg_name_len = strlen ( vg - > name ) ;
size_t dev_name_len ;
2011-03-30 13:35:51 +00:00
struct validate_hash vhash = { NULL } ;
2006-08-09 19:33:25 +00:00
2010-11-09 12:34:40 +00:00
if ( vg - > alloc = = ALLOC_CLING_BY_TAGS ) {
log_error ( INTERNAL_ERROR " VG %s allocation policy set to invalid cling_by_tags. " ,
vg - > name ) ;
r = 0 ;
}
2015-03-09 18:56:24 +00:00
if ( vg - > status & LVM_WRITE_LOCKED ) {
log_error ( INTERNAL_ERROR " VG %s has external flag LVM_WRITE_LOCKED set internally. " ,
vg - > name ) ;
r = 0 ;
}
2006-10-05 22:02:52 +00:00
/* FIXME Also check there's no data/metadata overlap */
2024-10-15 15:28:10 +02:00
if ( ! ( vhash . pvid = radix_tree_create ( NULL , NULL ) ) ) {
2011-03-10 13:11:59 +00:00
log_error ( " Failed to allocate pvid hash. " ) ;
return 0 ;
}
2012-02-23 00:11:01 +00:00
dm_list_iterate_items ( sl , & vg - > tags )
if ( ! validate_tag ( sl - > str ) ) {
log_error ( INTERNAL_ERROR " VG %s tag %s has invalid form. " ,
vg - > name , sl - > str ) ;
r = 0 ;
}
2010-04-01 11:43:24 +00:00
dm_list_iterate_items ( pvl , & vg - > pvs ) {
if ( + + pv_count > vg - > pv_count ) {
log_error ( INTERNAL_ERROR " PV list corruption detected in VG %s. " , vg - > name ) ;
/* FIXME Dump list structure? */
r = 0 ;
}
2011-03-10 13:11:59 +00:00
2010-04-13 17:26:36 +00:00
if ( pvl - > pv - > vg ! = vg ) {
log_error ( INTERNAL_ERROR " VG %s PV list entry points "
2011-03-10 13:11:59 +00:00
" to different VG %s. " , vg - > name ,
2010-04-13 17:26:36 +00:00
pvl - > pv - > vg ? pvl - > pv - > vg - > name : " NULL " ) ;
r = 0 ;
}
2007-03-23 12:43:17 +00:00
if ( strcmp ( pvl - > pv - > vg_name , vg - > name ) ) {
2010-01-07 14:29:53 +00:00
log_error ( INTERNAL_ERROR " VG name for PV %s is corrupted. " ,
2007-10-12 14:29:32 +00:00
pv_dev_name ( pvl - > pv ) ) ;
2007-03-23 12:43:17 +00:00
r = 0 ;
}
2011-03-10 13:11:59 +00:00
2024-10-24 15:27:31 +02:00
if ( 1 ! = ( rt = radix_tree_uniq_insert_ptr ( vhash . pvid , & pvl - > pv - > id ,
sizeof ( pvl - > pv - > id ) , pvl - > pv ) ) ) {
r = 0 ;
if ( ! rt ) {
log_error ( " Failed to store pvid. " ) ;
goto out ;
}
2011-03-10 13:11:59 +00:00
if ( ! id_write_format ( & pvl - > pv - > id , uuid ,
sizeof ( uuid ) ) )
stack ;
log_error ( INTERNAL_ERROR " Duplicate PV id "
" %s detected for %s in %s. " ,
uuid , pv_dev_name ( pvl - > pv ) ,
vg - > name ) ;
}
2012-02-23 00:11:01 +00:00
dm_list_iterate_items ( sl , & pvl - > pv - > tags )
if ( ! validate_tag ( sl - > str ) ) {
log_error ( INTERNAL_ERROR " PV %s tag %s has invalid form. " ,
pv_dev_name ( pvl - > pv ) , sl - > str ) ;
r = 0 ;
}
2006-08-09 19:33:25 +00:00
}
2002-04-24 18:20:51 +00:00
2011-03-10 13:11:59 +00:00
2005-05-03 17:28:23 +00:00
if ( ! check_pv_segments ( vg ) ) {
2009-12-16 19:22:11 +00:00
log_error ( INTERNAL_ERROR " PV segments corrupted in %s. " ,
2005-05-03 17:28:23 +00:00
vg - > name ) ;
2006-08-09 19:33:25 +00:00
r = 0 ;
}
2015-03-23 13:32:00 +01:00
dm_list_iterate_items ( lvl , & vg - > removed_lvs ) {
if ( ! ( lvl - > lv - > status & LV_REMOVED ) ) {
log_error(INTERNAL_ERROR "LV %s is not marked as removed while it is part "
"of the removed LV list for VG %s", lvl->lv->name, vg->name);
r = 0 ;
}
}
2009-05-13 21:27:43 +00:00
/*
* Count all LVs, classifying each as visible, snapshot COW, or other hidden LV
*/
dm_list_iterate_items ( lvl , & vg - > lvs ) {
2010-04-01 11:43:24 +00:00
lv_count + + ;
2015-03-23 13:32:00 +01:00
if ( lvl - > lv - > status & LV_REMOVED ) {
log_error ( INTERNAL_ERROR " LV %s is marked as removed while it's "
" still part of the VG %s " , lvl - > lv - > name , vg - > name ) ;
r = 0 ;
}
2015-03-09 18:56:24 +00:00
if ( lvl - > lv - > status & LVM_WRITE_LOCKED ) {
log_error ( INTERNAL_ERROR " LV %s has external flag LVM_WRITE_LOCKED set internally. " ,
2015-03-17 17:48:56 +00:00
lvl - > lv - > name ) ;
2015-03-09 18:56:24 +00:00
r = 0 ;
}
2014-03-31 11:51:50 +02:00
dev_name_len = strlen ( lvl - > lv - > name ) + vg_name_len + 3 ;
if ( dev_name_len > = NAME_LEN ) {
log_error ( INTERNAL_ERROR " LV name \" %s/%s \" length % "
PRIsize_t " is not supported. " ,
vg - > name , lvl - > lv - > name , dev_name_len ) ;
r = 0 ;
}
2014-07-03 19:06:04 +01:00
if ( ! id_equal ( & lvl - > lv - > lvid . id [ 0 ] , & lvl - > lv - > vg - > id ) ) {
if ( ! id_write_format ( & lvl - > lv - > lvid . id [ 0 ] , uuid ,
sizeof ( uuid ) ) )
stack ;
if ( ! id_write_format ( & lvl - > lv - > vg - > id , uuid2 ,
sizeof ( uuid2 ) ) )
stack ;
log_error ( INTERNAL_ERROR " LV %s has VG UUID %s but its VG %s has UUID %s " ,
lvl - > lv - > name , uuid , lvl - > lv - > vg - > name , uuid2 ) ;
r = 0 ;
}
2014-03-31 11:51:50 +02:00
2014-09-21 11:34:50 +02:00
if ( lv_is_pool_metadata_spare ( lvl - > lv ) ) {
if ( + + spare_count > 1 ) {
2014-11-11 14:13:00 +00:00
log_error(INTERNAL_ERROR "LV %s is an extra pool metadata spare volume: %u found but only 1 allowed.",
2014-09-21 11:34:50 +02:00
lvl - > lv - > name , spare_count ) ;
r = 0 ;
}
if ( vg - > pool_metadata_spare_lv ! = lvl - > lv ) {
2014-11-11 14:13:00 +00:00
log_error ( INTERNAL_ERROR " LV %s is not the VG's pool metadata spare volume. " ,
2014-09-21 11:34:50 +02:00
lvl - > lv - > name ) ;
r = 0 ;
}
}
2024-10-19 00:05:45 +02:00
if ( ! check_lv_segments_incomplete_vg ( lvl - > lv ) ) {
2010-04-01 13:08:06 +00:00
log_error ( INTERNAL_ERROR " LV segments corrupted in %s. " ,
lvl - > lv - > name ) ;
r = 0 ;
}
2010-11-09 12:34:40 +00:00
if ( lvl - > lv - > alloc = = ALLOC_CLING_BY_TAGS ) {
log_error ( INTERNAL_ERROR " LV %s allocation policy set to invalid cling_by_tags. " ,
lvl - > lv - > name ) ;
r = 0 ;
}
2012-02-23 00:11:01 +00:00
if ( ! validate_name ( lvl - > lv - > name ) ) {
log_error ( INTERNAL_ERROR " LV name %s has invalid form. " , lvl - > lv - > name ) ;
r = 0 ;
}
dm_list_iterate_items ( sl , & lvl - > lv - > tags )
if ( ! validate_tag ( sl - > str ) ) {
log_error ( INTERNAL_ERROR " LV %s tag %s has invalid form. " ,
lvl - > lv - > name , sl - > str ) ;
r = 0 ;
}
2021-03-14 11:44:54 +01:00
if ( lv_is_visible ( lvl - > lv ) )
lv_visible_count + + ;
else if ( lv_is_cow ( lvl - > lv ) )
num_snapshots + + ;
else /* count other non-snapshot invisible volumes */
hidden_lv_count + + ;
2009-05-13 21:27:43 +00:00
/*
* FIXME : add check for unreferenced invisible LVs
* - snapshot cow & origin
* - mirror log & images
* - mirror conversion volumes ( _mimagetmp * )
*/
}
/*
* all volumes = visible LVs + snapshot_cows + invisible LVs
*/
2010-04-01 11:43:24 +00:00
if ( lv_count ! = lv_visible_count + num_snapshots + hidden_lv_count ) {
2013-07-05 17:10:11 +02:00
log_error ( INTERNAL_ERROR " #LVs (%u) != #visible LVs (%u) "
" + #snapshots (%u) + #internal LVs (%u) in VG %s " ,
lv_count , lv_visible_count , num_snapshots ,
hidden_lv_count , vg - > name ) ;
2008-06-06 19:28:35 +00:00
r = 0 ;
2008-04-22 12:54:33 +00:00
}
2010-04-01 13:08:06 +00:00
/* Avoid endless loop if lv->segments list is corrupt */
if ( ! r )
2011-03-30 13:35:51 +00:00
goto out ;
2010-04-01 13:08:06 +00:00
2024-10-15 15:28:10 +02:00
if ( ! ( vhash . lvname = radix_tree_create ( NULL , NULL ) ) ) {
2011-03-10 13:11:59 +00:00
log_error ( " Failed to allocate lv_name hash " ) ;
2011-03-30 13:35:51 +00:00
r = 0 ;
goto out ;
2011-03-10 13:11:59 +00:00
}
2024-10-15 15:28:10 +02:00
if ( ! ( vhash . lvid = radix_tree_create ( NULL , NULL ) ) ) {
2011-03-10 13:11:59 +00:00
log_error ( " Failed to allocate uuid hash " ) ;
2011-03-30 13:35:51 +00:00
r = 0 ;
goto out ;
2011-03-10 13:11:59 +00:00
}
2024-10-24 15:27:31 +02:00
/* For best CPU cache utilization do a separate pass for lvname and lvid */
dm_list_iterate_items ( lvl , & vg - > lvs )
if ( 1 ! = ( rt = radix_tree_uniq_insert_ptr ( vhash . lvname , lvl - > lv - > name ,
strlen ( lvl - > lv - > name ) , lvl ) ) ) {
r = 0 ;
if ( ! rt ) {
log_error ( " Failed to store lvname. " ) ;
goto out ;
}
2011-03-10 13:11:59 +00:00
log_error ( INTERNAL_ERROR
" Duplicate LV name %s detected in %s. " ,
lvl - > lv - > name , vg - > name ) ;
}
2024-10-24 15:27:31 +02:00
dm_list_iterate_items ( lvl , & vg - > lvs )
if ( 1 ! = ( rt = radix_tree_uniq_insert_ptr ( vhash . lvid , & lvl - > lv - > lvid . id [ 1 ] ,
sizeof ( lvl - > lv - > lvid . id [ 1 ] ) , lvl - > lv ) ) ) {
r = 0 ;
if ( ! rt ) {
log_error ( " Failed to store lvid. " ) ;
goto out ;
}
2011-03-10 13:11:59 +00:00
if ( ! id_write_format ( & lvl - > lv - > lvid . id [ 1 ] , uuid ,
sizeof ( uuid ) ) )
stack ;
2024-10-24 15:27:31 +02:00
log_error ( INTERNAL_ERROR " Duplicate LV id %s detected for %s in %s. " ,
2011-03-10 13:11:59 +00:00
uuid , lvl - > lv - > name , vg - > name ) ;
2006-08-09 19:33:25 +00:00
}
2005-05-03 17:28:23 +00:00
2024-10-24 15:27:31 +02:00
dm_list_iterate_items ( lvl , & vg - > lvs )
2024-10-19 00:05:45 +02:00
if ( ! check_lv_segments_complete_vg ( lvl - > lv ) ) {
2009-12-16 19:22:11 +00:00
log_error ( INTERNAL_ERROR " LV segments corrupted in %s. " ,
2005-06-01 16:51:55 +00:00
lvl - > lv - > name ) ;
2006-08-09 19:33:25 +00:00
r = 0 ;
2005-06-01 16:51:55 +00:00
}
2011-03-10 13:11:59 +00:00
2011-03-30 13:35:51 +00:00
if ( ! _lv_postorder_vg ( vg , _lv_validate_references_single , & vhash ) ) {
2011-03-10 14:40:32 +00:00
stack ;
r = 0 ;
2010-12-14 17:51:09 +00:00
}
dm_list_iterate_items ( lvl , & vg - > lvs ) {
2014-09-15 21:33:53 +01:00
if ( ! lv_is_pvmove ( lvl - > lv ) )
2010-12-14 17:07:35 +00:00
continue ;
dm_list_iterate_items ( seg , & lvl - > lv - > segments ) {
if ( seg_is_mirrored ( seg ) ) {
if ( seg - > area_count ! = 2 ) {
log_error ( INTERNAL_ERROR
2011-03-10 13:11:59 +00:00
" Segment in %s is not 2-way. " ,
lvl - > lv - > name ) ;
2010-12-14 17:07:35 +00:00
r = 0 ;
}
} else if ( seg - > area_count ! = 1 ) {
log_error ( INTERNAL_ERROR
2011-03-10 13:11:59 +00:00
"Segment in %s has wrong number of areas: %u.",
lvl->lv->name, seg->area_count);
2010-12-14 17:07:35 +00:00
r = 0 ;
}
}
}
2008-08-29 13:41:21 +00:00
if ( ! ( vg - > fid - > fmt - > features & FMT_UNLIMITED_VOLS ) & &
( ! vg - > max_lv | | ! vg - > max_pv ) ) {
2009-12-16 19:22:11 +00:00
log_error ( INTERNAL_ERROR " Volume group %s has limited PV/LV count "
2008-08-29 13:41:21 +00:00
" but limit is not set. " , vg - > name ) ;
r = 0 ;
}
2013-07-05 17:10:47 +02:00
if ( vg - > pool_metadata_spare_lv & &
! lv_is_pool_metadata_spare ( vg - > pool_metadata_spare_lv ) ) {
log_error(INTERNAL_ERROR "VG references LV %s, which is not a pool metadata spare.",
vg->pool_metadata_spare_lv->name);
r = 0 ;
}
2009-05-13 21:29:10 +00:00
if ( vg_max_lv_reached ( vg ) )
stack ;
2015-03-05 14:00:44 -06:00
2024-10-15 15:28:10 +02:00
if ( ! ( vhash . lv_lock_args = radix_tree_create ( NULL , NULL ) ) ) {
2015-03-05 14:00:44 -06:00
log_error ( " Failed to allocate lv_lock_args hash " ) ;
r = 0 ;
goto out ;
}
2018-06-01 10:04:54 -05:00
if ( vg_is_shared ( vg ) ) {
2015-03-05 14:00:44 -06:00
if ( ! vg - > lock_args ) {
log_error ( INTERNAL_ERROR " VG %s with lock_type %s without lock_args " ,
vg - > name , vg - > lock_type ) ;
r = 0 ;
}
if ( vg_is_clustered ( vg ) ) {
log_error ( INTERNAL_ERROR " VG %s with lock_type %s is clustered " ,
vg - > name , vg - > lock_type ) ;
r = 0 ;
}
if ( vg - > system_id & & vg - > system_id [ 0 ] ) {
log_error ( INTERNAL_ERROR " VG %s with lock_type %s has system_id %s " ,
vg - > name , vg - > lock_type , vg - > system_id ) ;
r = 0 ;
}
2021-05-07 10:25:14 +08:00
if ( strcmp ( vg - > lock_type , " sanlock " ) & & strcmp ( vg - > lock_type , " dlm " ) & &
strcmp ( vg - > lock_type , " idm " ) ) {
2015-03-05 14:00:44 -06:00
log_error ( INTERNAL_ERROR " VG %s has unknown lock_type %s " ,
vg - > name , vg - > lock_type ) ;
r = 0 ;
}
2015-07-09 13:24:28 -05:00
2015-07-10 11:41:29 -05:00
if ( ! _validate_vg_lock_args ( vg ) )
2015-07-09 13:24:28 -05:00
r = 0 ;
2015-03-05 14:00:44 -06:00
} else {
if ( vg - > lock_args ) {
log_error ( INTERNAL_ERROR " VG %s has lock_args %s without lock_type " ,
vg - > name , vg - > lock_args ) ;
r = 0 ;
}
}
dm_list_iterate_items ( lvl , & vg - > lvs ) {
2018-06-01 10:04:54 -05:00
if ( vg_is_shared ( vg ) ) {
2015-03-05 14:00:44 -06:00
if ( lockd_lv_uses_lock ( lvl - > lv ) ) {
2015-07-09 13:24:28 -05:00
if ( vg - > skip_validate_lock_args )
2015-03-05 14:00:44 -06:00
continue ;
2015-07-09 13:24:28 -05:00
2015-07-10 11:41:29 -05:00
/*
* FIXME : make missing lock_args an error .
* There are at least two cases where this
* check doesn ' t work correctly :
*
* 1. When creating a cow snapshot ,
* ( lvcreate - s - L1M - n snap1 vg / lv1 ) ,
* lockd_lv_uses_lock ( ) uses lv_is_cow ( )
* which depends on lv - > snapshot being
* set , but it ' s not set at this point ,
* so lockd_lv_uses_lock ( ) cannot identify
* the LV as a cow_lv , and thinks it needs
* a lock when it doesn ' t . To fix this we
* probably need to validate by finding the
* origin LV , then finding all its snapshots
* which will have no lock_args .
*
* 2. When converting an LV to a thin pool
* without using an existing metadata LV ,
* ( lvconvert - - type thin - pool vg / poolX ) ,
* there is an intermediate LV created ,
* probably for the metadata LV , and
* validate is called on the VG in this
* intermediate state , which finds the
* newly created LV which is not yet
* identified as a metadata LV , and
* does not have any lock_args . To fix
* this we might be able to find the place
* where the intermediate LV is created,
* and set a new variable on it,
* lv->skip_validate_lock_args, as is done for VGs.
*/
2015-07-09 13:24:28 -05:00
if ( ! lvl - > lv - > lock_args ) {
2015-07-10 11:41:29 -05:00
/*
log_verbose ( " LV %s/%s missing lock_args " ,
vg - > name , lvl - > lv - > name ) ;
2015-03-05 14:00:44 -06:00
r = 0 ;
2015-07-10 11:41:29 -05:00
*/
2015-07-09 13:24:28 -05:00
continue ;
}
if ( ! _validate_lv_lock_args ( lvl - > lv ) ) {
r = 0 ;
continue ;
}
if ( ! strcmp ( vg - > lock_type , " sanlock " ) ) {
2024-10-15 15:28:10 +02:00
if ( radix_tree_lookup_ptr ( vhash . lv_lock_args , lvl - > lv - > lock_args ,
strlen ( lvl - > lv - > lock_args ) ) ) {
2023-07-12 13:50:21 +02:00
log_error ( INTERNAL_ERROR " LV %s has duplicate lock_args %s. " ,
display_lvname ( lvl - > lv ) , lvl - > lv - > lock_args ) ;
2015-03-05 14:00:44 -06:00
r = 0 ;
}
2024-10-15 15:28:10 +02:00
if ( ! radix_tree_insert_ptr ( vhash . lv_lock_args , lvl - > lv - > lock_args ,
strlen ( lvl - > lv - > lock_args ) , lvl ) ) {
2015-03-05 14:00:44 -06:00
log_error("Failed to hash lock_args.");
r = 0 ;
}
}
} else {
2019-10-24 17:09:00 -05:00
if ( lv_is_cache_vol ( lvl - > lv ) ) {
log_debug ( " lock_args will be ignored on cache vol " ) ;
} else if ( lvl - > lv - > lock_args ) {
2023-07-12 13:50:21 +02:00
log_error ( INTERNAL_ERROR " LV %s shouldn't have lock_args %s. " ,
display_lvname ( lvl - > lv ) , lvl - > lv - > lock_args ) ;
2015-03-05 14:00:44 -06:00
r = 0 ;
}
}
} else {
if ( lvl - > lv - > lock_args ) {
2023-07-12 13:50:21 +02:00
log_error ( INTERNAL_ERROR " LV %s with no lock_type has lock_args %s. " ,
display_lvname ( lvl - > lv ) , lvl - > lv - > lock_args ) ;
2015-03-05 14:00:44 -06:00
r = 0 ;
}
}
}
2024-10-15 15:28:10 +02:00
if ( ! ( vhash . historical_lvname = radix_tree_create ( NULL , NULL ) ) ) {
2016-03-01 15:32:01 +01:00
r = 0 ;
2021-03-09 11:42:29 +01:00
goto_out ;
2016-03-01 15:32:01 +01:00
}
2024-10-24 15:27:31 +02:00
if ( ! ( vhash . historical_lvid = radix_tree_create ( NULL , NULL ) ) ) {
r = 0 ;
goto_out ;
}
2016-03-01 15:32:01 +01:00
dm_list_iterate_items ( glvl , & vg - > historical_lvs ) {
if ( ! glvl - > glv - > is_historical ) {
2024-10-24 15:27:31 +02:00
log_error ( INTERNAL_ERROR " LV %s/%s appearing in VG's historical list is not a historical LV. " ,
2016-03-01 15:32:01 +01:00
vg - > name , glvl - > glv - > live - > name ) ;
r = 0 ;
continue ;
}
hlv = glvl - > glv - > historical ;
if ( hlv - > vg ! = vg ) {
2024-10-24 15:27:31 +02:00
log_error ( INTERNAL_ERROR " Historical LV %s points to different VG %s while it is listed in VG %s. " ,
2016-03-01 15:32:01 +01:00
hlv - > name , hlv - > vg - > name , vg - > name ) ;
r = 0 ;
continue ;
}
if ( ! id_equal ( & hlv - > lvid . id [ 0 ] , & hlv - > vg - > id ) ) {
if ( ! id_write_format ( & hlv - > lvid . id [ 0 ] , uuid , sizeof ( uuid ) ) )
stack ;
if ( ! id_write_format ( & hlv - > vg - > id , uuid2 , sizeof ( uuid2 ) ) )
stack ;
log_error ( INTERNAL_ERROR " Historical LV %s has VG UUID %s but its VG %s has UUID %s " ,
hlv - > name , uuid , hlv - > vg - > name , uuid2 ) ;
r = 0 ;
2024-10-24 15:27:31 +02:00
}
if ( 1 ! = ( rt = radix_tree_uniq_insert_ptr ( vhash . historical_lvname , hlv - > name ,
strlen ( hlv - > name ) , hlv ) ) ) {
r = 0 ;
if ( ! rt ) {
log_error ( " Failed to store historical LV name. " ) ;
goto out ;
}
log_error ( INTERNAL_ERROR " Duplicate historical LV name %s detected in %s. " ,
hlv - > name , vg - > name ) ;
}
2016-03-01 15:32:01 +01:00
2024-10-24 15:27:31 +02:00
if ( 1 ! = ( rt = radix_tree_uniq_insert_ptr ( vhash . historical_lvid , & hlv - > lvid . id [ 1 ] ,
sizeof ( hlv - > lvid . id [ 1 ] ) , hlv ) ) ) {
r = 0 ;
if ( ! rt ) {
log_error ( " Failed to store historical LV id. " ) ;
goto out ;
}
2016-03-01 15:32:01 +01:00
if ( ! id_write_format ( & hlv - > lvid . id [ 1 ] , uuid , sizeof ( uuid ) ) )
stack ;
2024-10-24 15:27:31 +02:00
log_error ( INTERNAL_ERROR " Duplicate historical LV id %s detected for %s in %s. " ,
2016-03-01 15:32:01 +01:00
uuid , hlv - > name , vg - > name ) ;
}
2024-10-15 15:28:10 +02:00
if ( radix_tree_lookup_ptr ( vhash . lvname , hlv - > name , strlen ( hlv - > name ) ) ) {
2024-10-24 15:27:31 +02:00
log_error ( INTERNAL_ERROR " Name %s appears as live and historical LV at the same time in VG %s. " ,
2016-03-01 15:32:01 +01:00
hlv - > name , vg - > name ) ;
r = 0 ;
}
2021-10-01 14:30:49 +02:00
if ( ! hlv - > indirect_origin & & dm_list_empty ( & hlv - > indirect_glvs ) ) {
2024-10-24 15:27:31 +02:00
log_error ( INTERNAL_ERROR " Historical LV %s is not part of any LV chain in VG %s. " ,
hlv - > name , vg - > name ) ;
2016-03-01 15:32:01 +01:00
r = 0 ;
}
}
2011-03-30 13:35:51 +00:00
out :
if ( vhash . lvid )
2024-10-15 15:28:10 +02:00
radix_tree_destroy ( vhash . lvid ) ;
2011-03-30 13:35:51 +00:00
if ( vhash . lvname )
2024-10-15 15:28:10 +02:00
radix_tree_destroy ( vhash . lvname ) ;
2016-03-01 15:32:01 +01:00
if ( vhash . historical_lvid )
2024-10-15 15:28:10 +02:00
radix_tree_destroy ( vhash . historical_lvid ) ;
2016-03-01 15:32:01 +01:00
if ( vhash . historical_lvname )
2024-10-15 15:28:10 +02:00
radix_tree_destroy ( vhash . historical_lvname ) ;
2011-03-30 13:35:51 +00:00
if ( vhash . pvid )
2024-10-15 15:28:10 +02:00
radix_tree_destroy ( vhash . pvid ) ;
2015-03-05 14:00:44 -06:00
if ( vhash . lv_lock_args )
2024-10-15 15:28:10 +02:00
radix_tree_destroy ( vhash . lv_lock_args ) ;
2009-03-16 14:34:57 +00:00
2006-08-09 19:33:25 +00:00
return r ;
2005-07-12 19:40:59 +00:00
}
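vg_validate() relies on a unique-insert pattern. When `radix_tree_uniq_insert_ptr()` returns 1, the key was new. A return of 0 is treated as an allocation failure that aborts validation. Any other value is reported as a duplicate. The sketch below reproduces that control flow with a hypothetical linear key set (`key_set_uniq_insert`, `count_duplicate_names`) instead of LVM2's radix tree. Returning 2 for a duplicate is a convention of the sketch, not a documented value of the LVM2 API.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a unique-key index; a linear array replaces
 * the radix tree purely to keep the sketch short. */
struct key_set {
	const void **keys;	/* key memory stays owned by the caller */
	size_t *lens;
	size_t count;
	size_t cap;
};

/* Returns 1 if the key was inserted, 2 if an identical key is already
 * present, and 0 if memory could not be allocated (sketch convention). */
static int key_set_uniq_insert(struct key_set *set, const void *key, size_t len)
{
	size_t i;

	for (i = 0; i < set->count; i++)
		if (set->lens[i] == len && !memcmp(set->keys[i], key, len))
			return 2;

	if (set->count == set->cap) {
		size_t cap = set->cap ? set->cap * 2 : 8;
		const void **keys = realloc(set->keys, cap * sizeof(*keys));
		size_t *lens;

		if (!keys)
			return 0;
		set->keys = keys;
		if (!(lens = realloc(set->lens, cap * sizeof(*lens))))
			return 0;
		set->lens = lens;
		set->cap = cap;
	}
	set->keys[set->count] = key;
	set->lens[set->count] = len;
	set->count++;
	return 1;
}

/* Mirrors the vg_validate loop: any result other than 1 is an error,
 * and 0 (allocation failure) stops the pass. Returns the number of
 * duplicates found, or -1 on allocation failure. */
static int count_duplicate_names(const char *const *names, size_t n)
{
	struct key_set set = { 0 };
	int dups = 0, rt;
	size_t i;

	for (i = 0; i < n; i++)
		if (1 != (rt = key_set_uniq_insert(&set, names[i], strlen(names[i])))) {
			if (!rt) {
				dups = -1;
				break;
			}
			dups++;
		}

	free(set.keys);
	free(set.lens);
	return dups;
}
```

Distinguishing allocation failure from a duplicate lets the caller keep scanning and report every duplicate, while still bailing out when the index itself cannot be built.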
2016-02-16 13:02:00 -06:00
static int _pv_in_pv_list ( struct physical_volume * pv , struct dm_list * head )
2016-02-12 14:22:02 +01:00
{
2016-02-16 13:02:00 -06:00
struct pv_list * pvl ;
2016-02-12 14:22:02 +01:00
2016-02-16 13:02:00 -06:00
dm_list_iterate_items ( pvl , head ) {
if ( pvl - > pv = = pv )
2016-02-12 14:22:02 +01:00
return 1 ;
}
return 0 ;
}
2016-03-02 12:19:07 +01:00
static int _check_historical_lv_is_valid ( struct historical_logical_volume * hlv )
2016-03-01 15:20:09 +01:00
{
struct glv_list * glvl ;
2016-03-02 12:19:07 +01:00
if ( hlv - > checked )
return hlv - > valid ;
/*
* A historical LV is valid if at least one live LV descends from it
* (via indirect_glvs), directly or through other historical LVs.
*/
hlv - > valid = 0 ;
dm_list_iterate_items ( glvl , & hlv - > indirect_glvs ) {
if ( ! glvl - > glv - > is_historical | |
_check_historical_lv_is_valid ( glvl - > glv - > historical ) ) {
hlv - > valid = 1 ;
break ;
}
}
hlv - > checked = 1 ;
return hlv - > valid ;
}
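The memoised recursion in `_check_historical_lv_is_valid` can be modelled on a toy graph. In this sketch, `struct node` and `historical_is_valid` are hypothetical simplifications of `historical_logical_volume` and `generic_logical_volume`. Like the original, it caches results in `checked` and `valid`. It also assumes the graph has no cycles: `checked` is only set after the loop, so a cycle would recurse without bound.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model: a node is live or historical; `indirect` lists the
 * LVs that descend from it (the counterpart of indirect_glvs). */
struct node {
	int is_historical;
	int checked;		/* result already computed */
	int valid;		/* cached result */
	struct node **indirect;
	size_t n_indirect;
};

/* A historical node is valid if some descendant is live, either directly
 * or through a chain of valid historical nodes. */
static int historical_is_valid(struct node *h)
{
	size_t i;

	if (h->checked)
		return h->valid;

	h->valid = 0;
	for (i = 0; i < h->n_indirect; i++) {
		struct node *g = h->indirect[i];

		if (!g->is_historical || historical_is_valid(g)) {
			h->valid = 1;
			break;
		}
	}
	h->checked = 1;
	return h->valid;
}
```

Resetting `checked` on every node before a pass, as `_handle_historical_lvs` does, prevents results cached by an earlier call from being reused after the graph changes.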
static int _handle_historical_lvs ( struct volume_group * vg )
{
struct glv_list * glvl , * tglvl ;
2016-03-01 15:20:09 +01:00
time_t current_timestamp = 0 ;
struct historical_logical_volume * hlv ;
2016-03-02 12:19:07 +01:00
int valid = 1 ;
dm_list_iterate_items ( glvl , & vg - > historical_lvs )
glvl - > glv - > historical - > checked = 0 ;
2016-03-01 15:20:09 +01:00
dm_list_iterate_items ( glvl , & vg - > historical_lvs ) {
hlv = glvl - > glv - > historical ;
2016-03-02 12:19:07 +01:00
valid & = _check_historical_lv_is_valid ( hlv ) ;
2016-03-01 15:20:09 +01:00
if ( ! hlv - > timestamp_removed ) {
if ( ! current_timestamp )
current_timestamp = time ( NULL ) ;
hlv - > timestamp_removed = ( uint64_t ) current_timestamp ;
}
}
2016-03-02 12:19:07 +01:00
if ( valid )
return 1 ;
dm_list_iterate_items_safe ( glvl , tglvl , & vg - > historical_lvs ) {
hlv = glvl - > glv - > historical ;
if ( hlv - > checked & & hlv - > valid )
continue ;
log_print_unless_silent ( " Automatically removing historical "
" logical volume %s/%s%s. " ,
vg - > name , HISTORICAL_LV_PREFIX , hlv - > name ) ;
if ( ! historical_glv_remove ( glvl - > glv ) )
return_0 ;
}
2016-03-01 15:20:09 +01:00
return 1 ;
}
2019-02-06 12:32:26 -06:00
static void _wipe_outdated_pvs ( struct cmd_context * cmd , struct volume_group * vg )
{
2021-10-01 14:25:59 +02:00
char vgid [ ID_LEN + 1 ] __attribute__ ( ( aligned ( 8 ) ) ) ;
DM_LIST_INIT ( devs ) ;
2019-02-06 12:32:26 -06:00
struct dm_list * mdas = NULL ;
struct device_list * devl ;
struct device * dev ;
struct metadata_area * mda ;
struct label * label ;
struct lvmcache_info * info ;
uint32_t ext_flags ;
/*
* When vg_read selected a good copy of the metadata , it used it to
* update the lvmcache representation of the VG ( lvmcache_update_vg ) .
* At that point outdated PVs were recognized and moved into the
* vginfo - > outdated_infos list . Here we clear the PVs on that list .
*/
2021-10-01 14:25:59 +02:00
vgid [ ID_LEN ] = 0 ;
2021-08-03 15:32:33 -05:00
memcpy ( vgid , & vg - > id . uuid , ID_LEN ) ;
lvmcache_get_outdated_devs ( cmd , vg - > name , vgid , & devs ) ;
2019-02-06 12:32:26 -06:00
dm_list_iterate_items ( devl , & devs ) {
dev = devl - > dev ;
2021-08-03 15:32:33 -05:00
lvmcache_get_outdated_mdas ( cmd , vg - > name , vgid , dev , & mdas ) ;
2019-02-06 12:32:26 -06:00
if ( mdas ) {
dm_list_iterate_items ( mda , mdas ) {
log_warn ( " WARNING: wiping mda on outdated PV %s " , dev_name ( dev ) ) ;
if ( ! text_wipe_outdated_pv_mda ( cmd , dev , mda ) )
log_warn ( " WARNING: failed to wipe mda on outdated PV %s " , dev_name ( dev ) ) ;
}
}
if ( ! ( label = lvmcache_get_dev_label ( dev ) ) ) {
log_error ( " _wipe_outdated_pvs no label for %s " , dev_name ( dev ) ) ;
continue ;
}
info = label - > info ;
ext_flags = lvmcache_ext_flags ( info ) ;
ext_flags & = ~ PV_EXT_USED ;
lvmcache_set_ext_version ( info , PV_HEADER_EXTENSION_VSN ) ;
lvmcache_set_ext_flags ( info , ext_flags ) ;
log_warn ( " WARNING: wiping header on outdated PV %s " , dev_name ( dev ) ) ;
if ( ! label_write ( dev , label ) )
log_warn ( " WARNING: failed to wipe header on outdated PV %s " , dev_name ( dev ) ) ;
lvmcache_del ( info ) ;
}
/*
* A vgremove will involve many vg_write ( ) calls ( one for each lv
* removed ) but we only need to wipe pvs once , so clear the outdated
* list so it won ' t be wiped again .
*/
2021-08-03 15:32:33 -05:00
lvmcache_del_outdated_devs ( cmd , vg - > name , vgid ) ;
2019-02-06 12:32:26 -06:00
}
2005-07-12 19:40:59 +00:00
/*
* After vg_write ( ) returns success ,
* caller MUST call either vg_commit ( ) or vg_revert ( )
*/
int vg_write ( struct volume_group * vg )
{
2021-10-01 14:25:59 +02:00
char vgid [ ID_LEN + 1 ] __attribute__ ( ( aligned ( 8 ) ) ) ;
2008-11-03 22:14:30 +00:00
struct dm_list * mdah ;
2019-02-06 13:18:45 -06:00
struct pv_list * pvl , * pvl_safe , * new_pvl ;
2005-07-12 19:40:59 +00:00
struct metadata_area * mda ;
2015-03-05 14:00:44 -06:00
struct lv_list * lvl ;
2019-02-06 13:51:54 -06:00
struct device * mda_dev ;
2014-05-26 14:23:33 +02:00
int revert = 0 , wrote = 0 ;
2005-07-12 19:40:59 +00:00
2021-10-01 14:25:59 +02:00
vgid [ ID_LEN ] = 0 ;
2021-08-03 15:32:33 -05:00
memcpy ( vgid , & vg - > id . uuid , ID_LEN ) ;
2023-08-15 09:53:39 -05:00
log_debug ( " Writing metadata for VG %s. " , vg - > name ) ;
2018-06-01 10:12:04 -05:00
if ( vg_is_shared ( vg ) ) {
dm_list_iterate_items ( lvl , & vg - > lvs ) {
if ( lvl - > lv - > lock_args & & ! strcmp ( lvl - > lv - > lock_args , " pending " ) ) {
if ( ! lockd_init_lv_args ( vg - > cmd , vg , lvl - > lv , vg - > lock_type , & lvl - > lv - > lock_args ) ) {
log_error ( " Cannot allocate lock for new LV. " ) ;
return 0 ;
}
lvl - > lv - > new_lock_args = 1 ;
2015-03-05 14:00:44 -06:00
}
}
}
2016-03-01 15:20:09 +01:00
if ( ! _handle_historical_lvs ( vg ) ) {
log_error ( " Failed to handle historical LVs in VG %s. " , vg - > name ) ;
return 0 ;
}
2008-01-30 13:19:47 +00:00
if ( ! vg_validate ( vg ) )
return_0 ;
2005-07-12 19:40:59 +00:00
2002-04-30 17:12:37 +00:00
if ( vg - > status & PARTIAL_VG ) {
2008-09-19 06:42:00 +00:00
log_error ( " Cannot update partial volume group %s. " , vg - > name ) ;
return 0 ;
}
if ( vg_missing_pv_count ( vg ) & & ! vg - > cmd - > handles_missing_pvs ) {
log_error ( " Cannot update volume group %s while physical "
" volumes are missing. " , vg - > name ) ;
2002-04-30 17:12:37 +00:00
return 0 ;
}
2019-08-01 13:50:04 -05:00
if ( lvmcache_has_duplicate_devs ( ) & & vg_has_duplicate_pvs ( vg ) & &
2016-02-09 13:06:27 -06:00
! find_config_tree_bool ( vg - > cmd , devices_allow_changes_with_duplicate_pvs_CFG , NULL ) ) {
log_error ( " Cannot update volume group %s with duplicate PV devices. " ,
vg - > name ) ;
return 0 ;
}
2009-10-16 17:41:49 +00:00
if ( vg_has_unknown_segments ( vg ) & & ! vg - > cmd - > handles_unknown_segments ) {
log_error ( " Cannot update volume group %s with unknown segments in it! " ,
vg - > name ) ;
return 0 ;
}
2018-04-27 16:22:46 -05:00
if ( ! _vg_adjust_ignored_mdas ( vg ) )
2010-06-30 13:51:11 +00:00
return_0 ;
2009-10-16 17:41:49 +00:00
2010-06-30 19:28:35 +00:00
if ( ! vg_mda_used_count ( vg ) ) {
2002-11-18 14:04:08 +00:00
log_error ( " Aborting vg_write: No metadata areas to write to! " ) ;
return 0 ;
}
2019-02-06 12:32:26 -06:00
if ( vg - > cmd - > wipe_outdated_pvs )
_wipe_outdated_pvs ( vg - > cmd , vg ) ;
2021-06-08 19:02:07 +02:00
if ( ! vg_is_archived ( vg ) & & vg - > vg_committed & & ! archive ( vg - > vg_committed ) )
return_0 ;
2011-11-18 19:28:00 +00:00
if ( critical_section ( ) )
log_error ( INTERNAL_ERROR
" Writing metadata in critical section. " ) ;
/* Unlock memory if possible */
memlock_unlock ( vg - > cmd ) ;
2002-04-24 18:20:51 +00:00
vg - > seqno + + ;
2019-02-06 13:18:45 -06:00
dm_list_iterate_items ( pvl , & vg - > pvs ) {
int update_pv_header = 0 ;
if ( _pv_in_pv_list ( pvl - > pv , & vg - > pv_write_list ) )
continue ;
if ( ! pvl - > pv - > fmt - > ops - > pv_needs_rewrite ( pvl - > pv - > fmt , pvl - > pv , & update_pv_header ) )
continue ;
if ( ! update_pv_header )
continue ;
if ( ! ( new_pvl = dm_pool_zalloc ( vg - > vgmem , sizeof ( * new_pvl ) ) ) )
continue ;
new_pvl - > pv = pvl - > pv ;
dm_list_add ( & vg - > pv_write_list , & new_pvl - > list ) ;
log_warn ( " WARNING: updating PV header on %s for VG %s. " , pv_dev_name ( pvl - > pv ) , vg - > name ) ;
}
2016-02-16 12:43:24 -06:00
dm_list_iterate_items_safe ( pvl , pvl_safe , & vg - > pv_write_list ) {
if ( ! pv_write ( vg - > cmd , pvl - > pv , 1 ) )
return_0 ;
dm_list_del ( & pvl - > list ) ;
}
2002-04-24 18:20:51 +00:00
/* Write to each copy of the metadata area */
2010-06-28 20:32:44 +00:00
dm_list_iterate_items ( mda , & vg - > fid - > metadata_areas_in_use ) {
2017-12-12 17:49:35 +00:00
if ( mda - > status & MDA_FAILED )
continue ;
2019-02-06 13:51:54 -06:00
2024-10-24 23:00:44 +02:00
if ( ! ( mda_dev = mda_get_device ( mda ) ) ) {
log_warn ( " WARNING: mda without device. " ) ;
continue ;
}
2019-02-06 13:51:54 -06:00
/*
* When the scan and vg_read find old metadata in an mda , they
* leave the info struct in lvmcache , and leave the mda in
* info - > mdas . That means we use the mda here to write new
* metadata into . This means that a command writing a VG will
* automatically update old metadata to the latest .
*
* This can also happen if the metadata was ignored on this
* dev , and then it ' s later changed to not ignored , and
* we see the old metadata .
*/
2021-08-03 15:32:33 -05:00
if ( lvmcache_has_old_metadata ( vg - > cmd , vg - > name , vgid , mda_dev ) ) {
2019-02-06 13:51:54 -06:00
log_warn ( " WARNING: updating old metadata to %u on %s for VG %s. " ,
vg - > seqno , dev_name ( mda_dev ) , vg - > name ) ;
}
2004-03-26 21:07:30 +00:00
if ( ! mda - > ops - > vg_write ) {
2021-03-09 11:42:29 +01:00
log_error ( " Format does not support writing volume group metadata areas. " ) ;
2014-05-26 14:23:33 +02:00
revert = 1 ;
break ;
2003-08-26 21:12:06 +00:00
}
2019-02-06 13:51:54 -06:00
2002-11-18 14:04:08 +00:00
if ( ! mda - > ops - > vg_write ( vg - > fid , vg , mda ) ) {
2014-05-26 14:23:33 +02:00
if ( vg - > cmd - > handles_missing_pvs ) {
2015-01-09 14:04:44 +01:00
log_warn ( " WARNING: Failed to write an MDA of VG %s. " , vg - > name ) ;
2014-05-26 14:23:33 +02:00
mda - > status | = MDA_FAILED ;
} else {
stack ;
revert = 1 ;
break ;
}
} else
+ + wrote ;
}
2005-06-01 16:51:55 +00:00
2014-05-26 14:23:33 +02:00
if ( revert | | ! wrote ) {
2015-01-09 14:04:44 +01:00
log_error ( " Failed to write VG %s. " , vg - > name ) ;
2014-05-26 14:23:33 +02:00
dm_list_uniterate ( mdah , & vg - > fid - > metadata_areas_in_use , & mda - > list ) {
mda = dm_list_item ( mdah , struct metadata_area ) ;
2017-12-12 17:49:35 +00:00
if ( mda - > status & MDA_FAILED )
continue ;
2014-05-26 14:23:33 +02:00
if ( mda - > ops - > vg_revert & &
! mda - > ops - > vg_revert ( vg - > fid , vg , mda ) ) {
stack ;
2003-07-04 22:34:56 +00:00
}
2002-04-24 18:20:51 +00:00
}
2014-05-26 14:23:33 +02:00
return 0 ;
2002-04-24 18:20:51 +00:00
}
2005-04-06 18:59:55 +00:00
/* Now pre-commit each copy of the new metadata */
2010-06-28 20:32:44 +00:00
dm_list_iterate_items ( mda , & vg - > fid - > metadata_areas_in_use ) {
2014-05-26 14:23:33 +02:00
if ( mda - > status & MDA_FAILED )
continue ;
2005-04-06 18:59:55 +00:00
if ( mda - > ops - > vg_precommit & &
! mda - > ops - > vg_precommit ( vg - > fid , vg , mda ) ) {
stack ;
/* Revert */
2010-06-28 20:32:44 +00:00
dm_list_iterate_items ( mda , & vg - > fid - > metadata_areas_in_use ) {
2014-05-26 14:23:33 +02:00
if ( mda - > status & MDA_FAILED )
continue ;
2005-04-06 18:59:55 +00:00
if ( mda - > ops - > vg_revert & &
! mda - > ops - > vg_revert ( vg - > fid , vg , mda ) ) {
stack ;
}
}
return 0 ;
}
}
2015-12-15 16:14:49 -06:00
lockd_vg_update ( vg ) ;
2003-07-04 22:34:56 +00:00
return 1 ;
}
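/*
 * Illustration (not part of LVM): a minimal sketch of the write-then-revert
 * pattern in vg_write above, assuming a hypothetical toy_mda array in place
 * of struct metadata_area and dm_list. Each copy is written in order. A
 * failed write either marks that copy failed (modelling MDA_FAILED when
 * cmd->handles_missing_pvs is set) or stops the loop, after which the
 * copies before the failing one are reverted newest first, which is the
 * order the dm_list_uniterate() walk visits them.
 */

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for one metadata area; not LVM's struct metadata_area. */
struct toy_mda {
	int write_ok;	/* result the simulated write will return */
	int failed;	/* models the MDA_FAILED status bit */
	int written;	/* new metadata is in place */
	int reverted;	/* a write to this mda was rolled back */
};

/*
 * Write each mda in order. Returns 1 if at least one mda was written and
 * no revert was needed, 0 otherwise.
 */
static int toy_vg_write(struct toy_mda *mdas, size_t n, int handles_missing_pvs)
{
	size_t i;
	int wrote = 0;

	assert(mdas != NULL || n == 0);

	for (i = 0; i < n; i++) {
		if (mdas[i].failed)
			continue;
		if (mdas[i].write_ok) {
			mdas[i].written = 1;
			wrote++;
		} else if (handles_missing_pvs) {
			mdas[i].failed = 1;	/* tolerate: skip this copy from now on */
		} else {
			break;			/* i now indexes the failing mda */
		}
	}

	if (i < n || !wrote) {
		/* Walk backwards, starting just before the stopping point. */
		while (i-- > 0) {
			if (mdas[i].failed)
				continue;
			mdas[i].written = 0;
			mdas[i].reverted = 1;
		}
		return 0;
	}

	return 1;
}
```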
2010-06-28 20:35:33 +00:00
static int _vg_commit_mdas ( struct volume_group * vg )
2003-07-04 22:34:56 +00:00
{
/*
 * Before committing each mda, arrange mdas so ignored mdas get committed first.
 *
 * Arrange mdas so mdas that are to be ignored come first. This is an
 * optimization that keeps the on-disk state consistent for the longest
 * period of time. This was noted by agk in review of the v4 patchset of
 * pvchange-based mda balance.
 *
 * The following example explains the background.
 * Assume the initial state on disk is as follows:
 *   PV0 (v1, non-ignored)
 *   PV1 (v1, non-ignored)
 *   PV2 (v1, non-ignored)
 *   PV3 (v1, non-ignored)
 * If we did not sort the list, we would have a commit sequence something
 * like this:
 *   PV0 (v2, non-ignored)
 *   PV1 (v2, ignored)
 *   PV2 (v2, ignored)
 *   PV3 (v2, non-ignored)
 * After the commit of PV0's mdas, we'd have an on-disk state like this:
 *   PV0 (v2, non-ignored)
 *   PV1 (v1, non-ignored)
 *   PV2 (v1, non-ignored)
 *   PV3 (v1, non-ignored)
 * This is an inconsistent state of the disk. If the machine fails, the next
 * time it is brought back up, the auto-correct mechanism in vg_read would
 * update the metadata on PV1-PV3. However, where possible we try to avoid
 * inconsistent on-disk states. Because we did not sort, the disk stays
 * inconsistent from the time the commit of PV0 is complete until the time
 * the commit of PV3 is complete.
 *
 * We can lengthen the time the on-disk state is consistent simply by
 * sorting the commit order as follows:
 *   PV1 (v2, ignored)
 *   PV2 (v2, ignored)
 *   PV0 (v2, non-ignored)
 *   PV3 (v2, non-ignored)
 * Thus, after the first PV is committed (in this case PV1), on disk we
 * would have:
 *   PV0 (v1, non-ignored)
 *   PV1 (v2, ignored)
 *   PV2 (v1, non-ignored)
 *   PV3 (v1, non-ignored)
 * This is clearly a consistent state. PV1 will be read but its mda will be
 * ignored. All other PVs contain v1 metadata, and no auto-correct will be
 * required. In fact, if we commit all PVs with ignored mdas first, we'll
 * only have an inconsistent state once we start writing non-ignored PVs,
 * so the chance of an inconsistent state on disk is much lower with the
 * sorted method.
 *
 * Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
 */
2010-06-28 20:35:49 +00:00
struct metadata_area * mda , * tmda ;
2021-10-01 14:25:59 +02:00
DM_LIST_INIT ( ignored ) ;
2019-02-05 14:02:24 -06:00
int good = 0 ;
2008-04-04 15:41:20 +00:00
2010-06-28 20:35:49 +00:00
/* Rearrange the metadata_areas_in_use so ignored mdas come first. */
2010-06-30 17:13:05 +00:00
dm_list_iterate_items_safe ( mda , tmda , & vg - > fid - > metadata_areas_in_use )
2010-06-28 20:35:49 +00:00
if ( mda_is_ignored ( mda ) )
dm_list_move ( & ignored , & mda - > list ) ;
2010-06-30 17:13:05 +00:00
dm_list_iterate_items_safe ( mda , tmda , & ignored )
2010-06-28 20:35:49 +00:00
dm_list_move ( & vg - > fid - > metadata_areas_in_use , & mda - > list ) ;
2002-04-24 18:20:51 +00:00
/* Commit to each copy of the metadata area */
2010-06-28 20:32:44 +00:00
dm_list_iterate_items ( mda , & vg - > fid - > metadata_areas_in_use ) {
2014-05-26 14:23:33 +02:00
if ( mda - > status & MDA_FAILED )
continue ;
2002-11-18 14:04:08 +00:00
if ( mda - > ops - > vg_commit & &
! mda - > ops - > vg_commit ( vg - > fid , vg , mda ) ) {
2002-04-24 18:20:51 +00:00
stack ;
2019-02-05 14:02:24 -06:00
} else
good + + ;
2003-07-04 22:34:56 +00:00
}
2019-02-05 14:02:24 -06:00
if ( good )
return 1 ;
return 0 ;
2010-06-28 20:35:33 +00:00
}
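/*
 * Illustration (not part of LVM): a sketch of the ignored-first commit order
 * described in the design note at the top of _vg_commit_mdas, written as a
 * stable partition over a hypothetical toy_mda array. It shows the intended
 * ordering only; it does not reproduce the dm_list_move() calls above.
 * Relative order is preserved within the ignored and non-ignored groups.
 */

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an mda: only the fields the ordering needs. */
struct toy_mda {
	int pv;		/* which PV holds this mda */
	int ignored;	/* 1 if the mda is marked ignored */
};

/* Copy ignored mdas first, then the rest; 'out' must hold n entries. */
static void order_ignored_first(const struct toy_mda *in, struct toy_mda *out, size_t n)
{
	size_t i, k = 0;
	int pass;

	assert((in != NULL && out != NULL) || n == 0);

	/* pass 1 collects ignored mdas, pass 0 collects non-ignored ones */
	for (pass = 1; pass >= 0; pass--)
		for (i = 0; i < n; i++)
			if (!!in[i].ignored == pass)
				out[k++] = in[i];
}
```

With the note's example (PV1 and PV2 ignored) the commit order becomes PV1, PV2, PV0, PV3.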
/* Commit pending changes */
int vg_commit ( struct volume_group * vg )
{
2017-10-06 02:12:42 +01:00
struct pv_list * pvl ;
2021-03-05 16:21:50 +01:00
struct dm_str_list * sl ;
2019-02-05 14:02:24 -06:00
int ret ;
2010-06-28 20:35:33 +00:00
2019-02-05 14:02:24 -06:00
ret = _vg_commit_mdas ( vg ) ;
2003-07-04 22:34:56 +00:00
2016-02-22 09:42:03 -06:00
set_vg_notify ( vg - > cmd ) ;
2019-02-05 14:02:24 -06:00
if ( ret ) {
2010-04-14 13:09:16 +00:00
/*
* We need to clear old_name after a successful commit .
* The volume_group structure could be reused later .
*/
vg - > old_name = NULL ;
2017-10-06 02:12:42 +01:00
dm_list_iterate_items ( pvl , & vg - > pvs )
pvl - > pv - > status & = ~ PV_MOVED_VG ;
2013-03-17 16:27:44 +01:00
2024-08-29 23:05:41 +02:00
/* This *is* the original now that it's committed. */
2017-11-14 15:38:55 +00:00
_vg_move_cached_precommitted_to_committed ( vg ) ;
2021-03-05 16:21:50 +01:00
if ( vg - > needs_write_and_commit ) {
/* Print buffered messages that have been finished with this commit. */
dm_list_iterate_items ( sl , & vg - > msg_list )
log_print_unless_silent ( " %s " , sl - > str ) ;
dm_list_init ( & vg - > msg_list ) ;
vg - > needs_write_and_commit = 0 ;
}
2021-06-08 19:39:15 +02:00
}
2010-01-05 16:09:33 +00:00
2003-07-04 22:34:56 +00:00
/* If at least one mda commit succeeded, it was committed */
2019-02-05 14:02:24 -06:00
return ret ;
2003-07-04 22:34:56 +00:00
}
/* Don't commit any pending changes */
2011-09-27 17:09:42 +00:00
void vg_revert ( struct volume_group * vg )
2003-07-04 22:34:56 +00:00
{
struct metadata_area * mda ;
2015-03-05 14:00:44 -06:00
struct lv_list * lvl ;
dm_list_iterate_items ( lvl , & vg - > lvs ) {
if ( lvl - > lv - > new_lock_args ) {
lockd_free_lv ( vg - > cmd , vg , lvl - > lv - > name , & lvl - > lv - > lvid . id [ 1 ] , lvl - > lv - > lock_args ) ;
lvl - > lv - > new_lock_args = 0 ;
}
}
2003-07-04 22:34:56 +00:00
2017-11-14 15:38:55 +00:00
_vg_wipe_cached_precommitted ( vg ) ; /* VG is no longer needed */
2014-02-22 01:44:21 +01:00
2010-06-28 20:32:44 +00:00
dm_list_iterate_items ( mda , & vg - > fid - > metadata_areas_in_use ) {
2003-07-04 22:34:56 +00:00
if ( mda - > ops - > vg_revert & &
! mda - > ops - > vg_revert ( vg - > fid , vg , mda ) ) {
stack ;
2002-04-24 18:20:51 +00:00
}
}
}
2012-02-10 01:28:27 +00:00
struct _vg_read_orphan_baton {
2015-03-11 16:18:42 +01:00
struct cmd_context * cmd ;
2012-02-10 01:28:27 +00:00
struct volume_group * vg ;
2017-11-06 12:09:52 -06:00
const struct format_type * fmt ;
2012-02-10 01:28:27 +00:00
} ;
static int _vg_read_orphan_pv ( struct lvmcache_info * info , void * baton )
{
struct _vg_read_orphan_baton * b = baton ;
struct physical_volume * pv = NULL ;
struct pv_list * pvl ;
2017-06-01 11:10:09 -05:00
uint32_t ext_version ;
uint32_t ext_flags ;
2012-02-10 01:28:27 +00:00
2017-11-06 12:09:52 -06:00
if ( ! ( pv = _pv_read ( b - > cmd , b - > fmt , b - > vg , info ) ) ) {
2012-03-13 21:36:02 +01:00
stack ;
2012-02-10 01:28:27 +00:00
return 1 ;
}
2012-02-23 13:11:07 +00:00
2012-02-10 01:28:27 +00:00
if ( ! ( pvl = dm_pool_zalloc ( b - > vg - > vgmem , sizeof ( * pvl ) ) ) ) {
log_error ( " pv_list allocation failed " ) ;
free_pv_fid ( pv ) ;
return 0 ;
}
pvl - > pv = pv ;
add_pvl_to_vgs ( b - > vg , pvl ) ;
2015-03-11 16:18:42 +01:00
2017-05-26 13:26:09 -05:00
/*
* FIXME : this bit of code that does the auto repair is disabled
* until we can distinguish cases where the repair should not
* happen , i . e . the VG metadata could not be read / parsed .
*
* A PV holding VG metadata that lvm can ' t understand
* ( e . g . damaged , checksum error , unrecognized flag )
* will appear as an in - use orphan , and would be cleared
* by this repair code . Disable this repair until the
* code can keep track of these problematic PVs , and
* distinguish them from actual in - use orphans .
*/
/*
2015-03-11 16:18:42 +01:00
if ( ! _check_or_repair_orphan_pv_ext ( pv , info , baton ) ) {
stack ;
return 0 ;
}
2017-05-26 13:26:09 -05:00
*/
2015-03-11 16:18:42 +01:00
2017-06-01 11:10:09 -05:00
/*
* Nothing to do if PV header extension < 2 :
* - version 0 is PV header without any extensions ,
* - version 1 has bootloader area support only and
* we ' re not checking anything for that one here .
*/
ext_version = lvmcache_ext_version ( info ) ;
ext_flags = lvmcache_ext_flags ( info ) ;
/*
* Warn about a PV that has the in - use flag set , but appears in
* the orphan VG ( no VG was found referencing it . )
* There are a number of conditions that could lead to this :
*
* . The PV was created with no mdas and is used in a VG with
* other PVs ( with metadata ) that have not yet appeared on
* the system . So , no VG metadata is found by lvm which
* references the in - use PV with no mdas .
*
* . vgremove could have failed after clearing mdas but
* before clearing the in - use flag . In this case , the
* in - use flag needs to be manually cleared on the PV .
*
2024-08-29 23:05:41 +02:00
* . The PV may have damaged / unrecognized VG metadata
2017-06-01 11:10:09 -05:00
* that lvm could not read .
*
* . The PV may have no mdas , and the PVs with the metadata
* may have damaged / unrecognized metadata .
*/
if ( ( ext_version > = 2 ) & & ( ext_flags & PV_EXT_USED ) ) {
log_warn ( " WARNING: PV %s is marked in use but no VG was found using it. " , pv_dev_name ( pv ) ) ;
log_warn ( " WARNING: PV %s might need repairing. " , pv_dev_name ( pv ) ) ;
}
2012-02-10 01:28:27 +00:00
return 1 ;
}
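/*
 * Illustration (not part of LVM): the warning condition in _vg_read_orphan_pv
 * isolated as a predicate. TOY_PV_EXT_USED is a hypothetical stand-in for
 * PV_EXT_USED; its bit value here is an assumption, so check the real
 * definition before relying on it.
 */

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: PV_EXT_USED is bit 0 of the PV header extension flags. */
#define TOY_PV_EXT_USED UINT32_C(0x00000001)

/* Return 1 if an orphan PV should get the "marked in use" warning. */
static int orphan_pv_marked_in_use(uint32_t ext_version, uint32_t ext_flags)
{
	/* Extension versions 0 and 1 carry no in-use flag to check. */
	if (ext_version < 2)
		return 0;
	return (ext_flags & TOY_PV_EXT_USED) != 0;
}
```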
2012-03-01 09:46:38 +00:00
/* Make orphan PVs look like a VG. */
/*
 * improve reading and repairing vg metadata
 *
 * The fact that vg repair is implemented as a part of vg read
 * has led to a messy and complicated implementation of vg_read,
 * and limited and uncontrolled repair capability. This splits
 * read and repair apart.
 *
 * Summary
 * -------
 * - take all kinds of various repairs out of vg_read
 * - vg_read no longer writes anything
 * - vg_read now simply reads and returns vg metadata
 * - vg_read ignores bad or old copies of metadata
 * - vg_read proceeds with a single good copy of metadata
 * - improve error checks and handling when reading
 * - keep track of bad (corrupt) copies of metadata in lvmcache
 * - keep track of old (seqno) copies of metadata in lvmcache
 * - keep track of outdated PVs in lvmcache
 * - vg_write will do basic repairs
 * - new command vgck --updatemetadata will do all repairs
 *
 * Details
 * -------
 * - In scan, do not delete dev from lvmcache if reading/processing fails;
 *   the dev is still present, and removing it makes it look like the dev
 *   is not there. Records are now kept about the problems with each PV
 *   so they can be fixed/repaired in the appropriate places.
 * - In scan, record a bad mda on failure, and delete the mda from the
 *   mda in-use list so it will not be used by vg_read or vg_write,
 *   only by repair.
 * - In scan, succeed if any good mda on a device is found, instead of
 *   failing if any is bad. The bad/old copies of metadata should not
 *   interfere with normal usage while good copies can be used.
 * - In scan, add a record of old mdas in lvmcache for later, do not repair
 *   them while reading, and do not let them prevent us from finding and
 *   using a good copy of metadata from elsewhere. One result is that
 *   "inconsistent metadata" is no longer a read error, but instead a
 *   record in lvmcache that can be addressed separately from the read.
 * - Treat a dev with no good mdas like a dev with no mdas, which is an
 *   existing case we already handle.
 * - Don't use a fake vg "handle" for returning an error from vg_read,
 *   or the vg_read_error function for getting that error number;
 *   just return null if the vg cannot be read or used, and an error_flags
 *   arg with flags set for the specific kind of error (which can be used
 *   later for determining the kind of repair.)
 * - Saving an original copy of the vg metadata, for purposes of reverting
 *   a write, is now done explicitly in vg_read instead of being hidden in
 *   the vg_make_handle function.
 * - When a vg is not accessible due to "access restrictions" but is
 *   otherwise fine, return the vg through the new error_vg arg so that
 *   process_each_pv can skip the PVs in the VG while processing.
 *   (This is a temporary accommodation for the way process_each_pv
 *   tracks which devs have been looked at, and can be dropped later
 *   when the dev tracking in process_each_pv is changed.)
 * - vg_read does not try to fix or recover a vg, but now just reads the
 *   metadata, checks access restrictions and returns it.
 *   (Checking access restrictions might be better done outside of vg_read,
 *   but this is a later improvement.)
 * - _vg_read now simply makes one attempt to read metadata from
 *   each mda, and uses the most recent copy to return to the caller
 *   in the form of a 'vg' struct.
 *   (bad mdas were excluded during the scan and are not retried)
 *   (old mdas were not excluded during scan and are retried here)
 * - vg_read uses _vg_read to get the latest copy of metadata from mdas,
 *   and then makes various checks against it to produce warnings,
 *   and to check if VG access is allowed (access restrictions include:
 *   writable, foreign, shared, clustered, missing pvs).
 * - Things that were previously silently/automatically written by vg_read
 *   that are now done by vg_write, based on the records made in lvmcache
 *   during the scan and read:
 *   . clearing the missing flag
 *   . updating old copies of metadata
 *   . clearing outdated pvs
 *   . updating pv header flags
 * - Bad/corrupt metadata are now repaired; they were not before.
 *
 * Test changes
 * ------------
 * - A read command no longer writes the VG to repair it, so add a write
 *   command to do a repair.
 *   (inconsistent-metadata, unlost-pv)
 * - When a missing PV is removed from a VG, and then the device is
 *   enabled again, vgck --updatemetadata is needed to clear the
 *   outdated PV before it can be used again, where it wasn't before.
 *   (lvconvert-repair-policy, lvconvert-repair-raid, lvconvert-repair,
 *   mirror-vgreduce-removemissing, pv-ext-flags, unlost-pv)
 *
 * Reading bad/old metadata
 * ------------------------
 * - "bad metadata": the mda_header or metadata text has invalid fields
 *   or can't be parsed by lvm. This is a form of corruption that would
 *   not be caused by known failure scenarios. A checksum error is
 *   typically included among the errors reported.
 * - "old metadata": a valid copy of the metadata that has a smaller seqno
 *   than other copies of the metadata. This can happen if the device
 *   failed, or io failed, or lvm failed while committing new metadata
 *   to all the metadata areas. Old metadata on a PV that has been
 *   removed from the VG is the "outdated" case below.
 *
 * When a VG has some PVs with bad/old metadata, lvm can simply ignore
 * the bad/old copies, and use a good copy. This is why there are
 * multiple copies of the metadata -- so it's available even when some
 * of the copies cannot be used. The bad/old copies do not have to be
 * repaired before the VG can be used (the repair can happen later.)
 *
 * A PV with no good copies of the metadata simply falls back to being
 * treated like a PV with no mdas; a common and harmless configuration.
 *
 * When bad/old metadata exists, lvm warns the user about it, and
 * suggests repairing it using a new metadata repair command.
 * Bad metadata in particular is something that users will want to
 * investigate and repair themselves, since it should not happen and
 * may indicate some other problem that needs to be fixed.
 *
 * PVs with bad/old metadata are not the same as missing devices.
 * Missing devices will block various kinds of VG modification or
 * activation, but bad/old metadata will not.
 *
 * Previously, lvm would attempt to repair bad/old metadata whenever
 * it was read. This was unnecessary since lvm does not require every
 * copy of the metadata to be used. It would also hide potential
 * problems that should be investigated by the user. It was also
 * dangerous in cases where the VG was on shared storage. The user
 * is now allowed to investigate potential problems and decide how
 * and when to repair them.
 *
 * Repairing bad/old metadata
 * --------------------------
 * When label scan sees bad metadata in an mda, that mda is removed
 * from the lvmcache info->mdas list. This means that vg_read will
 * skip it, and not attempt to read/process it again. If it was
 * the only in-use mda on a PV, that PV is treated like a PV with
 * no mdas. It also means that vg_write will skip the bad mda,
 * and not attempt to write new metadata to it. The only way to
 * repair bad metadata is with the metadata repair command.
 *
 * When label scan sees old metadata in an mda, that mda is kept
 * in the lvmcache info->mdas list. This means that vg_read will
 * read/process it again, and likely see the same mismatch with
 * the other copies of the metadata. Like label_scan, vg_read
 * will simply ignore the old copy of the metadata and use the
 * latest copy. If the command is modifying the vg
 * (e.g. lvcreate), then vg_write, which writes new metadata to
 * every mda on info->mdas, will write the new metadata to the
 * mda that had the old version. If successful, this will resolve
 * the old metadata problem (without needing to run a metadata
 * repair command.)
 *
 * Outdated PVs
 * ------------
 * An outdated PV is a PV that has an old copy of VG metadata
 * that shows it is a member of the VG, but the latest copy of
 * the VG metadata does not include this PV. This happens if
 * the PV is disconnected, vgreduce --removemissing is run to
 * remove the PV from the VG, then the PV is reconnected.
 * In this case, the outdated PV needs to have its outdated metadata
 * removed and the PV used flag needs to be cleared. This repair
 * will be done by the subsequent repair command. It is also done
 * if vgremove is run on the VG.
 *
 * MISSING PVs
 * -----------
 * When a device is missing, most commands will refuse to modify
 * the VG. This is the simple case. More complicated is when
 * a command is allowed to modify the VG while it is missing a
 * device.
 *
 * When a VG is written while a device is missing for one of its PVs,
 * the VG metadata is written to disk with the MISSING flag on the PV
 * with the missing device. When the VG is next used, it is treated
 * as if the PV with the MISSING flag still has a missing device, even
 * if that device has reappeared.
 *
 * If all LVs that were using a PV with the MISSING flag are removed
 * or repaired so that the MISSING PV is no longer used, then the
 * next time the VG metadata is written, the MISSING flag will be
 * dropped.
 *
 * Alternative methods of clearing the MISSING flag are:
 *
 * vgreduce --removemissing will remove PVs with missing devices,
 * or PVs with the MISSING flag where the device has reappeared.
 *
 * vgextend --restoremissing will clear the MISSING flag on PVs
 * where the device has reappeared, allowing the VG to be used
 * normally. This must be done with caution since the reappeared
 * device may have old data that is inconsistent with data on other PVs.
 *
 * Bad mda repair
 * --------------
 * The new command:
 *
 *   vgck --updatemetadata VG
 *
 * first uses vg_write to repair old metadata and the other basic
 * issues mentioned above (old metadata, outdated PVs, pv_header
 * flags, MISSING_PV flags). It will also go further and repair
 * bad metadata:
 *   . text metadata that has a bad checksum
 *   . text metadata that is not parsable
 *   . corrupt mda_header checksum and version fields
 *
 * (To keep a clean diff, #if 0 is added around functions that
 * are replaced by new code. These commented functions are
 * removed by the following commit.)
 */
2019-05-24 12:04:37 -05:00
struct volume_group * vg_read_orphans ( struct cmd_context * cmd , const char * orphan_vgname )
2002-11-18 14:04:08 +00:00
{
2019-06-07 14:30:03 -05:00
const struct format_type * fmt = cmd - > fmt ;
2003-07-04 22:34:56 +00:00
struct lvmcache_vginfo * vginfo ;
2011-03-11 15:08:31 +00:00
struct volume_group * vg = NULL ;
2012-02-10 01:28:27 +00:00
struct _vg_read_orphan_baton baton ;
2013-11-22 13:18:02 +01:00
struct pv_list * pvl , * tpvl ;
2013-02-19 03:13:59 +01:00
struct pv_list head ;
2002-11-18 14:04:08 +00:00
2013-02-19 03:13:59 +01:00
dm_list_init ( & head . list ) ;
2008-04-08 12:49:21 +00:00
2012-02-10 01:28:27 +00:00
if ( ! ( vginfo = lvmcache_vginfo_from_vgname ( orphan_vgname , NULL ) ) )
return_NULL ;
2012-02-10 02:53:03 +00:00
vg = fmt - > orphan_vg ;
2013-11-22 13:18:02 +01:00
dm_list_iterate_items_safe ( pvl , tpvl , & vg - > pvs )
if ( pvl - > pv - > status & UNLABELLED_PV )
dm_list_move ( & head . list , & pvl - > list ) ;
else
2013-02-19 03:13:59 +01:00
pv_set_fid ( pvl - > pv , NULL ) ;
2013-11-22 13:18:02 +01:00
2012-02-29 00:19:14 +00:00
dm_list_init ( & vg - > pvs ) ;
2012-02-29 00:18:27 +00:00
vg - > pv_count = 0 ;
2013-02-19 03:13:59 +01:00
vg - > extent_count = 0 ;
vg - > free_count = 0 ;
2008-04-07 22:12:37 +00:00
2015-03-11 16:18:42 +01:00
baton . cmd = cmd ;
2017-11-06 12:09:52 -06:00
baton . fmt = fmt ;
2012-02-10 01:28:27 +00:00
baton . vg = vg ;
2017-11-06 12:09:52 -06:00
/*
* vg_read for a normal VG will rescan labels for all the devices
* in the VG , in case something changed on disk between the initial
* label scan and acquiring the VG lock . We don ' t rescan labels
* here because this is only called in two ways :
*
* 1. for reporting , in which case it doesn ' t matter if something
* changed between the label scan and printing the PVs here
*
* 2. pvcreate_each_device ( ) for pvcreate / vgcreate / vgextend ,
* which already does the label rescan after taking the
* orphan lock .
*/
2012-02-13 10:58:20 +00:00
2013-11-24 19:03:29 +01:00
while ( ( pvl = ( struct pv_list * ) dm_list_first ( & head . list ) ) ) {
2013-02-19 03:13:59 +01:00
dm_list_del ( & pvl - > list ) ;
add_pvl_to_vgs ( vg , pvl ) ;
vg - > extent_count + = pvl - > pv - > pe_count ;
vg - > free_count + = pvl - > pv - > pe_count ;
}
2012-02-13 10:58:20 +00:00
if ( ! lvmcache_foreach_pv ( vginfo , _vg_read_orphan_pv , & baton ) )
2012-03-01 09:46:38 +00:00
return_NULL ;
2002-11-18 14:04:08 +00:00
return vg ;
}
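/*
 * Illustration (not part of LVM): the notes before vg_read_orphans say bad
 * copies of metadata are excluded, the copy with the highest seqno is used,
 * and good copies with a smaller seqno are "old metadata" that a later
 * vg_write overwrites. A minimal sketch of that selection, assuming a
 * hypothetical toy_copy summary per metadata area.
 */

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical summary of one metadata copy as seen during a scan. */
struct toy_copy {
	uint32_t seqno;
	int bad;	/* unparsable or checksum error: excluded entirely */
};

/*
 * Return the index of the newest good copy, or -1 if none is good.
 * *old_count receives how many good copies have a smaller seqno.
 */
static long pick_latest_copy(const struct toy_copy *c, size_t n, size_t *old_count)
{
	size_t i;
	long best = -1;

	assert(old_count != NULL);
	*old_count = 0;

	for (i = 0; i < n; i++) {
		if (c[i].bad)
			continue;
		if (best < 0 || c[i].seqno > c[best].seqno)
			best = (long)i;
	}
	if (best < 0)
		return -1;

	for (i = 0; i < n; i++)
		if (!c[i].bad && c[i].seqno < c[best].seqno)
			(*old_count)++;

	return best;
}
```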
2011-08-11 16:31:40 +00:00
static void _destroy_fid ( struct format_instance * * fid )
{
if ( * fid ) {
( * fid ) - > fmt - > ops - > destroy_instance ( * fid ) ;
* fid = NULL ;
}
}
2009-07-29 13:26:01 +00:00
int vg_missing_pv_count ( const struct volume_group * vg )
2008-09-19 06:42:00 +00:00
{
int ret = 0 ;
struct pv_list * pvl ;
2008-11-03 22:14:30 +00:00
dm_list_iterate_items ( pvl , & vg - > pvs ) {
2010-03-16 14:37:38 +00:00
if ( is_missing_pv ( pvl - > pv ) )
2008-09-19 06:42:00 +00:00
+ + ret ;
}
return ret ;
}
2016-03-16 14:01:26 +01:00
# define DEV_LIST_DELIM ", "
2024-07-08 15:32:41 -05:00
static int _check_devs_used_correspond_with_lv ( struct dm_pool * mem , struct dm_list * list , struct logical_volume * lv )
2016-03-16 14:01:26 +01:00
{
struct device_list * dl ;
int found_inconsistent = 0 ;
struct device * dev ;
struct lv_segment * seg ;
uint32_t s ;
2016-04-25 11:15:44 +02:00
int warned_about_no_dev = 0 ;
2016-03-16 14:01:26 +01:00
char * used_devnames = NULL , * assumed_devnames = NULL ;
2024-07-08 15:32:41 -05:00
if ( ! ( list = dev_cache_get_dev_list_for_lvid ( lv - > lvid . s + ID_LEN ) ) )
2016-03-16 14:01:26 +01:00
return 1 ;
dm_list_iterate_items ( dl , list ) {
dev = dl - > dev ;
if ( ! ( dev - > flags & DEV_ASSUMED_FOR_LV ) ) {
if ( ! found_inconsistent ) {
2016-07-12 16:43:12 +02:00
if ( ! dm_pool_begin_object ( mem , 32 ) )
return_0 ;
2016-03-16 14:01:26 +01:00
found_inconsistent = 1 ;
2016-03-22 16:03:51 +01:00
} else {
if ( ! dm_pool_grow_object ( mem , DEV_LIST_DELIM , sizeof ( DEV_LIST_DELIM ) - 1 ) )
2016-04-21 20:55:23 +02:00
return_0 ;
2016-03-22 16:03:51 +01:00
}
2016-03-16 14:01:26 +01:00
if ( ! dm_pool_grow_object ( mem , dev_name ( dev ) , 0 ) )
2016-04-21 20:55:23 +02:00
return_0 ;
2016-03-16 14:01:26 +01:00
}
}
if ( ! found_inconsistent )
return 1 ;
2016-03-22 16:03:51 +01:00
if ( ! dm_pool_grow_object ( mem , " \0 " , 1 ) )
2016-04-21 20:55:23 +02:00
return_0 ;
2016-03-16 14:01:26 +01:00
used_devnames = dm_pool_end_object ( mem ) ;
found_inconsistent = 0 ;
dm_list_iterate_items ( seg , & lv - > segments ) {
for ( s = 0 ; s < seg - > area_count ; s + + ) {
if ( seg_type ( seg , s ) = = AREA_PV ) {
if ( ! ( dev = seg_dev ( seg , s ) ) ) {
2016-04-25 11:15:44 +02:00
if ( ! warned_about_no_dev ) {
2016-04-25 11:41:36 +02:00
log_warn ( " WARNING: Couldn't find all devices for LV %s "
" while checking used and assumed devices. " ,
2016-04-25 11:15:44 +02:00
display_lvname ( lv ) ) ;
warned_about_no_dev = 1 ;
}
continue ;
2016-03-16 14:01:26 +01:00
}
if ( ! ( dev - > flags & DEV_USED_FOR_LV ) ) {
if ( ! found_inconsistent ) {
2016-07-12 16:43:12 +02:00
if ( ! dm_pool_begin_object ( mem , 32 ) )
return_0 ;
2016-03-16 14:01:26 +01:00
found_inconsistent = 1 ;
} else {
2016-03-22 16:03:51 +01:00
if ( ! dm_pool_grow_object ( mem , DEV_LIST_DELIM , sizeof ( DEV_LIST_DELIM ) - 1 ) )
2016-04-21 20:55:23 +02:00
return_0 ;
2016-03-16 14:01:26 +01:00
}
if ( ! dm_pool_grow_object ( mem , dev_name ( dev ) , 0 ) )
2016-04-21 20:55:23 +02:00
return_0 ;
2016-03-16 14:01:26 +01:00
}
}
}
}
if ( found_inconsistent ) {
2016-03-22 16:03:51 +01:00
if ( ! dm_pool_grow_object ( mem , " \0 " , 1 ) )
2016-04-21 20:55:23 +02:00
return_0 ;
2016-03-16 14:01:26 +01:00
assumed_devnames = dm_pool_end_object ( mem ) ;
2016-04-25 11:15:44 +02:00
log_warn ( " WARNING: Device mismatch detected for %s which is accessing %s instead of %s. " ,
display_lvname ( lv ) , used_devnames , assumed_devnames ) ;
2016-03-16 14:01:26 +01:00
}
return 1 ;
}
2024-07-08 15:32:41 -05:00
static int _check_devs_used_correspond_with_vg ( struct volume_group * vg )
2016-03-16 14:01:26 +01:00
{
2016-03-21 14:38:49 +01:00
struct dm_pool * mem ;
2021-10-01 14:25:59 +02:00
char vgid [ ID_LEN + 1 ] __attribute__ ( ( aligned ( 8 ) ) ) ;
2016-03-16 14:01:26 +01:00
struct pv_list * pvl ;
struct lv_list * lvl ;
struct dm_list * list ;
struct device_list * dl ;
int found_inconsistent = 0 ;
2021-10-01 14:25:59 +02:00
vgid [ ID_LEN ] = 0 ;
2021-08-03 15:32:33 -05:00
memcpy ( vgid , & vg - > id . uuid , ID_LEN ) ;
2016-03-16 14:01:26 +01:00
/* Mark all PVs in VG as used. */
dm_list_iterate_items ( pvl , & vg - > pvs ) {
2016-04-27 12:13:26 -05:00
/*
* FIXME : It ' s not clear if the meaning
* of " missing " should always include the
* ! pv - > dev case , or if " missing " is the
* more narrow case where VG metadata has
* been written with the MISSING flag .
*/
if ( ! pvl - > pv - > dev )
continue ;
2016-03-16 14:01:26 +01:00
if ( is_missing_pv ( pvl - > pv ) )
continue ;
pvl - > pv - > dev - > flags | = DEV_ASSUMED_FOR_LV ;
}
2024-07-08 15:32:41 -05:00
if ( ! ( list = dev_cache_get_dev_list_for_vgid ( vgid ) ) )
2016-03-16 14:01:26 +01:00
return 1 ;
dm_list_iterate_items ( dl , list ) {
if ( ! ( dl - > dev - > flags & DEV_OPEN_FAILURE ) & &
! ( dl - > dev - > flags & DEV_ASSUMED_FOR_LV ) ) {
found_inconsistent = 1 ;
break ;
}
}
if ( found_inconsistent ) {
2016-03-21 14:38:49 +01:00
if ( ! ( mem = dm_pool_create ( " vg_devs_check " , 1024 ) ) )
return_0 ;
2016-03-16 14:01:26 +01:00
dm_list_iterate_items ( lvl , & vg - > lvs ) {
2024-07-08 15:32:41 -05:00
if ( ! _check_devs_used_correspond_with_lv ( mem , list , lvl - > lv ) ) {
2016-03-21 14:38:49 +01:00
dm_pool_destroy ( mem ) ;
2016-03-16 14:01:26 +01:00
return_0 ;
2016-03-21 14:38:49 +01:00
}
2016-03-16 14:01:26 +01:00
}
2016-03-21 14:38:49 +01:00
dm_pool_destroy ( mem ) ;
2016-03-16 14:01:26 +01:00
}
return 1 ;
}
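The two functions above compare two sets of devices:

- the devices the device cache associates with each LV, which are expected to carry DEV_USED_FOR_LV;
- the PVs that the VG metadata lists, which are flagged DEV_ASSUMED_FOR_LV.

When the sets disagree, the code builds both name lists and prints them in a single warning. Below is a minimal standalone sketch of that comparison. It assumes a hypothetical `struct toy_dev` with two flag bits in place of LVM's `struct device`, and a fixed buffer in place of `dm_pool` string building. All `toy_*` names are illustrative, not LVM API.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-ins for LVM's DEV_USED_FOR_LV / DEV_ASSUMED_FOR_LV bits. */
#define TOY_USED_FOR_LV    0x1u /* device is in use by the active LV */
#define TOY_ASSUMED_FOR_LV 0x2u /* device is a PV listed in the VG metadata */

/* Hypothetical, simplified replacement for struct device. */
struct toy_dev {
	const char *name;
	unsigned flags;
};

/*
 * Write the names of devices in devs[] that lack `flag` into buf as a
 * ", "-separated list (truncated if buf is too small). Returns how many
 * devices lacked the flag.
 */
static int toy_collect_missing(struct toy_dev **devs, int n, unsigned flag,
			       char *buf, size_t buflen)
{
	int count = 0;

	assert(buflen > 0);
	buf[0] = '\0';
	for (int i = 0; i < n; i++) {
		if (devs[i]->flags & flag)
			continue;
		if (count++)
			strncat(buf, ", ", buflen - strlen(buf) - 1);
		strncat(buf, devs[i]->name, buflen - strlen(buf) - 1);
	}
	return count;
}

/*
 * Returns 1 (and fills both name lists) when the LV uses a device the
 * metadata does not list AND the metadata lists a PV the LV is not using.
 * As in the original function, a warning needs both lists to be non-empty.
 */
static int toy_check_mismatch(struct toy_dev **used, int n_used,
			      struct toy_dev **seg_devs, int n_seg,
			      char *used_names, char *assumed_names, size_t len)
{
	if (!toy_collect_missing(used, n_used, TOY_ASSUMED_FOR_LV, used_names, len))
		return 0;
	return toy_collect_missing(seg_devs, n_seg, TOY_USED_FOR_LV,
				   assumed_names, len) > 0;
}
```

A caller would then format `used_names` and `assumed_names` into a message like the original's "Device mismatch detected for %s which is accessing %s instead of %s."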
2011-03-11 14:56:56 +00:00
void free_pv_fid ( struct physical_volume * pv )
{
if ( ! pv )
return ;
2013-05-29 12:42:09 +02:00
pv_set_fid ( pv , NULL ) ;
2011-03-11 14:56:56 +00:00
}
2008-01-30 14:00:02 +00:00
static struct physical_volume * _pv_read ( struct cmd_context * cmd ,
2017-11-06 12:09:52 -06:00
const struct format_type * fmt ,
struct volume_group * vg ,
struct lvmcache_info * info )
2002-04-24 18:20:51 +00:00
{
2021-10-01 14:25:59 +02:00
char pvid [ ID_LEN + 1 ] __attribute__ ( ( aligned ( 8 ) ) ) ;
2002-04-24 18:20:51 +00:00
struct physical_volume * pv ;
2017-11-06 12:09:52 -06:00
struct device * dev = lvmcache_device ( info ) ;
2002-11-18 14:04:08 +00:00
2017-11-06 12:09:52 -06:00
if ( ! ( pv = _alloc_pv ( vg - > vgmem , NULL ) ) ) {
log_error ( " pv allocation failed " ) ;
2005-04-19 20:52:35 +00:00
return NULL ;
2002-11-18 14:04:08 +00:00
}
2017-11-06 12:09:52 -06:00
if ( fmt - > ops - > pv_read ) {
/* format1 and pool */
if ( ! ( fmt - > ops - > pv_read ( fmt , dev_name ( dev ) , pv , 0 ) ) ) {
log_error ( " Failed to read existing physical volume '%s' " , dev_name ( dev ) ) ;
goto bad ;
}
} else {
/* format text */
if ( ! lvmcache_populate_pv_fields ( info , vg , pv ) )
goto_bad ;
2002-04-24 18:20:51 +00:00
}
2017-11-06 12:09:52 -06:00
if ( ! alloc_pv_segment_whole_pv ( vg - > vgmem , pv ) )
2010-01-21 21:09:23 +00:00
goto_bad ;
2005-04-19 20:52:35 +00:00
2021-10-01 14:25:59 +02:00
pvid [ ID_LEN ] = 0 ;
2021-08-03 15:32:33 -05:00
memcpy ( pvid , & pv - > id . uuid , ID_LEN ) ;
lvmcache_fid_add_mdas ( info , vg - > fid , pvid , ID_LEN ) ;
2017-11-06 12:09:52 -06:00
pv_set_fid ( pv , vg - > fid ) ;
2005-04-19 20:52:35 +00:00
return pv ;
2010-01-21 21:09:23 +00:00
bad :
2011-03-11 14:56:56 +00:00
free_pv_fid ( pv ) ;
2017-11-06 12:09:52 -06:00
dm_pool_free ( vg - > vgmem , pv ) ;
2010-01-21 21:09:23 +00:00
return NULL ;
2002-04-24 18:20:51 +00:00
}
2019-02-05 11:32:54 -06:00
/*
* FIXME : we only want to print the warnings when this is called from
* vg_read , not from import_vg_from_metadata , so do the warnings elsewhere
* or avoid calling this from import_vg_from .
*/
static void _set_pv_device ( struct format_instance * fid ,
struct volume_group * vg ,
2021-02-05 16:16:03 -06:00
struct physical_volume * pv )
2019-02-05 11:32:54 -06:00
{
char buffer [ 64 ] __attribute__ ( ( aligned ( 8 ) ) ) ;
2019-07-09 13:32:41 -05:00
struct cmd_context * cmd = fid - > fmt - > cmd ;
struct device * dev ;
2019-02-05 11:32:54 -06:00
uint64_t size ;
2021-08-03 15:32:33 -05:00
if ( ! ( dev = lvmcache_device_from_pv_id ( cmd , & pv - > id , & pv - > label_sector ) ) ) {
2019-02-05 11:32:54 -06:00
if ( ! id_write_format ( & pv - > id , buffer , sizeof ( buffer ) ) )
buffer [ 0 ] = ' \0 ' ;
2021-11-03 09:50:11 -05:00
if ( cmd & & ! cmd - > expect_missing_vg_device & &
2019-12-11 12:56:15 -06:00
( ! vg_is_foreign ( vg ) & & ! cmd - > include_foreign_vgs ) )
improve reading and repairing vg metadata
The fact that vg repair is implemented as a part of vg read
has led to a messy and complicated implementation of vg_read,
and limited and uncontrolled repair capability. This splits
read and repair apart.
Summary
-------
- take all kinds of various repairs out of vg_read
- vg_read no longer writes anything
- vg_read now simply reads and returns vg metadata
- vg_read ignores bad or old copies of metadata
- vg_read proceeds with a single good copy of metadata
- improve error checks and handling when reading
- keep track of bad (corrupt) copies of metadata in lvmcache
- keep track of old (seqno) copies of metadata in lvmcache
- keep track of outdated PVs in lvmcache
- vg_write will do basic repairs
- new command vgck --updatemetadata will do all repairs
Details
-------
- In scan, do not delete dev from lvmcache if reading/processing fails;
the dev is still present, and removing it makes it look like the dev
is not there. Records are now kept about the problems with each PV
so they can be fixed/repaired in the appropriate places.
- In scan, record a bad mda on failure, and delete the mda from
the in-use mda list so it will not be used by vg_read or vg_write,
only by repair.
- In scan, succeed if any good mda on a device is found, instead of
failing if any is bad. The bad/old copies of metadata should not
interfere with normal usage while good copies can be used.
- In scan, add a record of old mdas in lvmcache for later, do not repair
them while reading, and do not let them prevent us from finding and
using a good copy of metadata from elsewhere. One result is that
"inconsistent metadata" is no longer a read error, but instead a
record in lvmcache that can be addressed separate from the read.
- Treat a dev with no good mdas like a dev with no mdas, which is an
existing case we already handle.
- Don't use a fake vg "handle" for returning an error from vg_read,
or the vg_read_error function for getting that error number;
just return null if the vg cannot be read or used, and an error_flags
arg with flags set for the specific kind of error (which can be used
later for determining the kind of repair.)
- Saving an original copy of the vg metadata, for purposes of reverting
a write, is now done explicitly in vg_read instead of being hidden in
the vg_make_handle function.
- When a vg is not accessible due to "access restrictions" but is
otherwise fine, return the vg through the new error_vg arg so that
process_each_pv can skip the PVs in the VG while processing.
(This is a temporary accommodation for the way process_each_pv
tracks which devs have been looked at, and can be dropped later
when the process_each_pv implementation of dev tracking is changed.)
- vg_read does not try to fix or recover a vg, but now just reads the
metadata, checks access restrictions and returns it.
(Checking access restrictions might be better done outside of vg_read,
but this is a later improvement.)
- _vg_read now simply makes one attempt to read metadata from
each mda, and uses the most recent copy to return to the caller
in the form of a 'vg' struct.
(bad mdas were excluded during the scan and are not retried)
(old mdas were not excluded during scan and are retried here)
- vg_read uses _vg_read to get the latest copy of metadata from mdas,
and then makes various checks against it to produce warnings,
and to check if VG access is allowed (access restrictions include:
writable, foreign, shared, clustered, missing pvs).
- Things that were previously silently/automatically written by vg_read
that are now done by vg_write, based on the records made in lvmcache
during the scan and read:
. clearing the missing flag
. updating old copies of metadata
. clearing outdated pvs
. updating pv header flags
- Bad/corrupt metadata are now repaired; they were not before.
Test changes
------------
- A read command no longer writes the VG to repair it, so add a write
command to do a repair.
(inconsistent-metadata, unlost-pv)
- When a missing PV is removed from a VG, and then the device is
enabled again, vgck --updatemetadata is needed to clear the
outdated PV before it can be used again, which was not needed before.
(lvconvert-repair-policy, lvconvert-repair-raid, lvconvert-repair,
mirror-vgreduce-removemissing, pv-ext-flags, unlost-pv)
Reading bad/old metadata
------------------------
- "bad metadata": the mda_header or metadata text has invalid fields
or can't be parsed by lvm. This is a form of corruption that would
not be caused by known failure scenarios. A checksum error is
typically included among the errors reported.
- "old metadata": a valid copy of the metadata that has a smaller seqno
than other copies of the metadata. This can happen if the device
failed, or io failed, or lvm failed while committing new metadata
to all the metadata areas. Old metadata on a PV that has been
removed from the VG is the "outdated" case below.
When a VG has some PVs with bad/old metadata, lvm can simply ignore
the bad/old copies, and use a good copy. This is why there are
multiple copies of the metadata -- so it's available even when some
of the copies cannot be used. The bad/old copies do not have to be
repaired before the VG can be used (the repair can happen later.)
A PV with no good copies of the metadata simply falls back to being
treated like a PV with no mdas; a common and harmless configuration.
When bad/old metadata exists, lvm warns the user about it, and
suggests repairing it using a new metadata repair command.
Bad metadata in particular is something that users will want to
investigate and repair themselves, since it should not happen and
may indicate some other problem that needs to be fixed.
PVs with bad/old metadata are not the same as missing devices.
Missing devices will block various kinds of VG modification or
activation, but bad/old metadata will not.
Previously, lvm would attempt to repair bad/old metadata whenever
it was read. This was unnecessary since lvm does not require every
copy of the metadata to be used. It would also hide potential
problems that should be investigated by the user. It was also
dangerous in cases where the VG was on shared storage. The user
is now allowed to investigate potential problems and decide how
and when to repair them.
Repairing bad/old metadata
--------------------------
When label scan sees bad metadata in an mda, that mda is removed
from the lvmcache info->mdas list. This means that vg_read will
skip it, and not attempt to read/process it again. If it was
the only in-use mda on a PV, that PV is treated like a PV with
no mdas. It also means that vg_write will skip the bad mda,
and not attempt to write new metadata to it. The only way to
repair bad metadata is with the metadata repair command.
When label scan sees old metadata in an mda, that mda is kept
in the lvmcache info->mdas list. This means that vg_read will
read/process it again, and likely see the same mismatch with
the other copies of the metadata. Like label_scan, the
vg_read will simply ignore the old copy of the metadata and
use the latest copy. If the command is modifying the vg
(e.g. lvcreate), then vg_write, which writes new metadata to
every mda on info->mdas, will write the new metadata to the
mda that had the old version. If successful, this will resolve
the old metadata problem (without needing to run a metadata
repair command.)
Outdated PVs
------------
An outdated PV is a PV that has an old copy of VG metadata
that shows it is a member of the VG, but the latest copy of
the VG metadata does not include this PV. This happens if
the PV is disconnected, vgreduce --removemissing is run to
remove the PV from the VG, then the PV is reconnected.
In this case, the outdated PV needs to have its outdated metadata
removed and the PV used flag needs to be cleared. This repair
will be done by the subsequent repair command. It is also done
if vgremove is run on the VG.
MISSING PVs
-----------
When a device is missing, most commands will refuse to modify
the VG. This is the simple case. More complicated is when
a command is allowed to modify the VG while it is missing a
device.
When a VG is written while a device is missing for one of its PVs,
the VG metadata is written to disk with the MISSING flag on the PV
with the missing device. When the VG is next used, it is treated
as if the PV with the MISSING flag still has a missing device, even
if that device has reappeared.
If all LVs that were using a PV with the MISSING flag are removed
or repaired so that the MISSING PV is no longer used, then the
next time the VG metadata is written, the MISSING flag will be
dropped.
Alternative methods of clearing the MISSING flag are:
vgreduce --removemissing will remove PVs with missing devices,
or PVs with the MISSING flag where the device has reappeared.
vgextend --restoremissing will clear the MISSING flag on PVs
where the device has reappeared, allowing the VG to be used
normally. This must be done with caution since the reappeared
device may have old data that is inconsistent with data on other PVs.
Bad mda repair
--------------
The new command:
vgck --updatemetadata VG
first uses vg_write to repair old metadata, and other basic
issues mentioned above (old metadata, outdated PVs, pv_header
flags, MISSING_PV flags). It will also go further and repair
bad metadata:
. text metadata that has a bad checksum
. text metadata that is not parsable
. corrupt mda_header checksum and version fields
(To keep a clean diff, #if 0 is added around functions that
are replaced by new code. These commented functions are
removed by the following commit.)
2019-05-24 12:04:37 -05:00
log_warn ( " WARNING: Couldn't find device with uuid %s. " , buffer ) ;
2019-02-05 11:32:54 -06:00
else
log_debug_metadata ( " Couldn't find device with uuid %s. " , buffer ) ;
}
2019-07-09 13:32:41 -05:00
pv - > dev = dev ;
2019-02-05 11:32:54 -06:00
/*
* A previous command wrote the VG while this dev was missing , so
* the MISSING flag was included in the PV .
*/
if ( ( pv - > status & MISSING_PV ) & & pv - > dev )
log_warn ( " WARNING: VG %s was previously updated while PV %s was missing. " , vg - > name , dev_name ( pv - > dev ) ) ;
/*
* If this command writes the VG , we want the MISSING flag to be
* written for this PV with no device .
*/
if ( ! pv - > dev )
pv - > status | = MISSING_PV ;
/* Fix up pv size if missing or impossibly large */
if ( ( ! pv - > size | | pv - > size > ( 1ULL < < 62 ) ) & & pv - > dev ) {
if ( ! dev_get_size ( pv - > dev , & pv - > size ) ) {
log_error ( " %s: Couldn't get size. " , pv_dev_name ( pv ) ) ;
return ;
}
log_verbose ( " Fixing up missing size (%s) for PV %s " , display_size ( fid - > fmt - > cmd , pv - > size ) ,
pv_dev_name ( pv ) ) ;
size = pv - > pe_count * ( uint64_t ) vg - > extent_size + pv - > pe_start ;
if ( size > pv - > size )
log_warn ( " WARNING: Physical Volume %s is too large "
" for underlying device " , pv_dev_name ( pv ) ) ;
}
}
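The commit message embedded in _set_pv_device() above states the read-side rule:

- bad (corrupt) copies of metadata are excluded;
- copies with a smaller seqno are "old" and are ignored in favour of the newest one;
- one good copy is enough to use the VG.

Below is a minimal sketch of that selection. It assumes a hypothetical `struct toy_mda` that holds only a bad/good marker and a seqno. LVM's actual bookkeeping in lvmcache is more involved and is not reproduced here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified view of one metadata area (mda) copy. */
struct toy_mda {
	int bad;        /* nonzero: header/text failed checksum or parsing */
	uint32_t seqno; /* VG metadata sequence number in this copy */
};

enum toy_mda_state { TOY_MDA_IGNORED_BAD, TOY_MDA_OLD, TOY_MDA_LATEST };

/*
 * Return the index of the newest good copy, or -1 if there is none.
 * Bad copies are skipped here, not repaired.
 */
static int toy_pick_latest(const struct toy_mda *mdas, size_t n)
{
	int best = -1;

	for (size_t i = 0; i < n; i++) {
		if (mdas[i].bad)
			continue;
		if (best < 0 || mdas[i].seqno > mdas[best].seqno)
			best = (int)i;
	}
	return best;
}

/* Classify one copy relative to the newest good copy, e.g. for a later repair step. */
static enum toy_mda_state toy_classify(const struct toy_mda *mdas, size_t n,
				       size_t i)
{
	int best = toy_pick_latest(mdas, n);

	assert(i < n);
	if (mdas[i].bad)
		return TOY_MDA_IGNORED_BAD;
	return (best >= 0 && mdas[i].seqno < mdas[best].seqno) ?
		TOY_MDA_OLD : TOY_MDA_LATEST;
}
```

The read path only needs toy_pick_latest(). The classification is what a separate repair step (here, vg_write or vgck --updatemetadata) would act on later.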
/*
2024-08-29 23:05:41 +02:00
* Finds the ' struct device ' that corresponds to each PV in the metadata ,
2019-02-05 11:32:54 -06:00
* and may make some adjustments to vg fields based on the dev properties .
*/
2021-02-05 16:16:03 -06:00
void set_pv_devices ( struct format_instance * fid , struct volume_group * vg )
2019-02-05 11:32:54 -06:00
{
struct pv_list * pvl ;
dm_list_iterate_items ( pvl , & vg - > pvs )
2021-02-05 16:16:03 -06:00
_set_pv_device ( fid , vg , pvl - > pv ) ;
2019-02-05 11:32:54 -06:00
}
pvscan: use process_each_vg for autoactivate
This refactors the code for autoactivation. Previously,
as each PV was found, it would be sent to lvmetad, and
the VG would be autoactivated using a non-standard VG
processing function (the "activation_handler") called via
a function pointer from within the lvmetad notification path.
Now, any scanning that the command needs to do (scanning
only the named device args, or scanning all devices when
there are no args) is done first, before any activation
is attempted. During the scans, the VG names are saved.
After scanning is complete, process_each_vg is used to do
autoactivation of the saved VG names. This makes pvscan
activation much more similar to activation done with
vgchange or lvchange.
The separate autoactivate phase also means that if lvmetad
is disabled (either before or during the scan), the command
can continue with the activation step by simply not using
lvmetad and reverting to disk scanning to do the
activation.
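The pvscan commit message above describes a two-phase ordering: scan first and record the VG names, then autoactivate each saved VG. Below is a minimal sketch of that ordering. It assumes a hypothetical `struct toy_pv` scan record and an `activate` callback standing in for process_each_vg(). It does not model lvmetad or LVM's real activation path.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define TOY_MAX_VGS 16

/* Hypothetical scan record: the VG (or NULL for an orphan) a device belongs to. */
struct toy_pv {
	const char *dev;
	const char *vg_name;
};

/* Phase 1: after scanning every device, save each VG name once. */
static int toy_collect_vg_names(const struct toy_pv *pvs, int n,
				const char **names, int max)
{
	int count = 0;

	for (int i = 0; i < n; i++) {
		int seen = 0;

		if (!pvs[i].vg_name)
			continue; /* orphan PV: no VG to activate */
		for (int j = 0; j < count && !seen; j++)
			seen = !strcmp(names[j], pvs[i].vg_name);
		if (!seen && count < max)
			names[count++] = pvs[i].vg_name;
	}
	return count;
}

/* Example callback: a real implementation would activate the VG's LVs. */
static int toy_report_activate(const char *vg_name)
{
	printf("autoactivating VG %s\n", vg_name);
	return 1;
}

/* Phase 2: only once scanning is complete, process each saved VG. Returns failures. */
static int toy_autoactivate(const char **names, int count,
			    int (*activate)(const char *))
{
	int failed = 0;

	assert(activate);
	for (int i = 0; i < count; i++)
		if (!activate(names[i]))
			failed++;
	return failed;
}
```

Keeping the phases separate is what allows the activation step to run the same way whether or not a cache daemon was consulted during the scan.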
2016-04-28 09:37:03 -05:00
int pv_write ( struct cmd_context * cmd ,
2011-02-28 13:19:02 +00:00
struct physical_volume * pv , int allow_non_orphan )
2002-04-24 18:20:51 +00:00
{
2003-08-26 21:12:06 +00:00
if ( ! pv - > fmt - > ops - > pv_write ) {
log_error ( " Format does not support writing physical volumes " ) ;
return 0 ;
}
2011-02-25 14:08:54 +00:00
/*
* FIXME : Try to remove this restriction . This requires checking
* that the PV and the VG are in a consistent state . We need
* to provide some revert mechanism since PV label together
* with VG metadata write is not atomic .
*/
2011-02-28 13:19:02 +00:00
if ( ! allow_non_orphan & &
( ! is_orphan_vg ( pv - > vg_name ) | | pv - > pe_alloc_count ) ) {
2002-11-18 14:04:08 +00:00
log_error ( " Assertion failed: can't _pv_write non-orphan PV "
2014-03-18 23:54:46 +01:00
" (in VG %s) " , pv_vg_name ( pv ) ) ;
2002-11-18 14:04:08 +00:00
return 0 ;
2002-04-24 18:20:51 +00:00
}
2020-01-28 10:33:15 -06:00
if ( ! pv - > fmt - > ops - > pv_write ( cmd , pv - > fmt , pv ) )
2008-01-30 13:19:47 +00:00
return_0 ;
2002-04-24 18:20:51 +00:00
2013-03-25 16:21:59 +01:00
pv - > status & = ~ UNLABELLED_PV ;
2002-04-24 18:20:51 +00:00
return 1 ;
}
2007-02-07 13:29:52 +00:00
int pv_write_orphan ( struct cmd_context * cmd , struct physical_volume * pv )
{
const char * old_vg_name = pv - > vg_name ;
2008-02-06 15:47:28 +00:00
pv - > vg_name = cmd - > fmt - > orphan_vg_name ;
2007-02-07 13:29:52 +00:00
pv - > status = ALLOCATABLE_PV ;
2008-09-25 15:59:10 +00:00
pv - > pe_alloc_count = 0 ;
2007-02-07 13:29:52 +00:00
if ( ! dev_get_size ( pv - > dev , & pv - > size ) ) {
2007-10-12 14:29:32 +00:00
log_error ( " %s: Couldn't get size. " , pv_dev_name ( pv ) ) ;
2007-02-07 13:29:52 +00:00
return 0 ;
}
2011-02-28 13:19:02 +00:00
if ( ! pv_write ( cmd , pv , 0 ) ) {
2007-02-07 13:29:52 +00:00
log_error ( " Failed to clear metadata from physical "
" volume \" %s \" after removal from \" %s \" " ,
2007-10-12 14:29:32 +00:00
pv_dev_name ( pv ) , old_vg_name ) ;
2007-02-07 13:29:52 +00:00
return 0 ;
}
return 1 ;
}
2007-06-14 15:48:05 +00:00
2007-11-02 13:06:42 +00:00
/**
* is_orphan_vg - Determine whether a vg_name is an orphan
* @ vg_name : pointer to the vg_name
*/
int is_orphan_vg ( const char * vg_name )
{
2010-05-19 02:36:33 +00:00
return ( vg_name & & ! strncmp ( vg_name , ORPHAN_PREFIX , sizeof ( ORPHAN_PREFIX ) - 1 ) ) ? 1 : 0 ;
2007-11-02 13:06:42 +00:00
}
2011-01-12 20:42:50 +00:00
/*
* Exclude pseudo VG names used for locking .
*/
int is_real_vg ( const char * vg_name )
{
return ( vg_name & & * vg_name ! = ' # ' ) ;
}
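is_orphan_vg() matches only a prefix. `sizeof(ORPHAN_PREFIX) - 1` is the length of the string literal without its terminating NUL, so `strncmp` compares exactly that many characters. is_real_vg() rejects any name starting with '#'. Below is a standalone sketch of both checks. It assumes a hypothetical prefix value `"#orphans"`, since the real ORPHAN_PREFIX is defined elsewhere in LVM and is not shown in this excerpt.

```c
#include <string.h>

/* Hypothetical value; LVM defines ORPHAN_PREFIX itself. */
#define TOY_ORPHAN_PREFIX "#orphans"

/* 1 if vg_name starts with the prefix; sizeof counts the trailing '\0', hence - 1. */
static int toy_is_orphan_vg(const char *vg_name)
{
	return (vg_name && !strncmp(vg_name, TOY_ORPHAN_PREFIX,
				    sizeof(TOY_ORPHAN_PREFIX) - 1)) ? 1 : 0;
}

/* In this sketch, pseudo VG names (orphans, locking names) are those starting with '#'. */
static int toy_is_real_vg(const char *vg_name)
{
	return (vg_name && *vg_name != '#');
}
```

Because only the prefix is compared, any per-format orphan name that begins with the prefix is treated as an orphan.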
2009-07-28 15:14:56 +00:00
/* FIXME: remove / combine this with locking? */
2009-07-29 13:26:01 +00:00
int vg_check_write_mode ( struct volume_group * vg )
2009-07-28 15:14:56 +00:00
{
if ( vg - > open_mode ! = ' w ' ) {
2009-07-28 20:41:41 +00:00
log_errno ( EPERM , " Attempt to modify a read-only VG " ) ;
2009-07-28 15:14:56 +00:00
return 0 ;
}
return 1 ;
}
system_id: make new VGs read-only for old lvm versions
Previous versions of lvm will not obey the restrictions
imposed by the new system_id, and would allow such a VG
to be written. So, a VG with a new system_id is further
changed to force previous lvm versions to treat it as
read-only. This is done by removing the WRITE flag from
the metadata status line of these VGs, and putting a new
WRITE_LOCKED flag in the flags line of the metadata.
Versions of lvm that recognize WRITE_LOCKED also obey the
new system_id. For these lvm versions, WRITE_LOCKED is
identical to WRITE, and the rules associated with matching
system_id's are imposed.
A new VG lock_type field is also added that causes the same
WRITE/WRITE_LOCKED transformation when set. A previous
version of lvm will also see a VG with lock_type as read-only.
Versions of lvm that recognize WRITE_LOCKED must also obey
the lock_type setting. Until the lock_type feature is added,
lvm will fail to read any VG with lock_type set and report an
error about an unsupported lock_type. Once the lock_type
feature is added, lvm will allow VGs with lock_type to be
used according to the rules imposed by the lock_type.
When both system_id and lock_type settings are removed, a VG
is written with the old WRITE status flag, and without the
new WRITE_LOCKED flag. This allows old versions of lvm to
use the VG as before.
2015-03-04 11:30:53 -06:00
/*
* Return 1 if the VG metadata should be written
2015-03-09 18:53:22 +00:00
* * without * the LVM_WRITE flag in the status line , and
* * with * the LVM_WRITE_LOCKED flag in the flags line .
2015-03-04 11:30:53 -06:00
*
* If this is done for a VG , it forces previous versions
2015-03-09 18:53:22 +00:00
* of lvm ( before the LVM_WRITE_LOCKED flag was added ) to view
* the VG and its LVs as read - only ( because the LVM_WRITE flag
2015-03-04 11:30:53 -06:00
* is missing ) . Versions of lvm that understand the
2015-03-09 18:53:22 +00:00
* LVM_WRITE_LOCKED flag know to check the other methods of
2015-03-04 11:30:53 -06:00
* access control for the VG , specifically system_id and lock_type .
*
* So , if a VG has a system_id or lock_type , then the
* system_id and lock_type control access to the VG in
* addition to its basic writable status . Because previous
* lvm versions do not know about system_id or lock_type ,
2015-03-09 18:53:22 +00:00
* VGs depending on either of these should have LVM_WRITE_LOCKED
* instead of LVM_WRITE to prevent the previous lvm versions from
2015-03-04 11:30:53 -06:00
* assuming they can write the VG and its LVs .
*/
int vg_flag_write_locked ( struct volume_group * vg )
{
if ( vg - > system_id & & vg - > system_id [ 0 ] )
return 1 ;
if ( vg - > lock_type & & vg - > lock_type [ 0 ] & & strcmp ( vg - > lock_type , " none " ) )
return 1 ;
return 0 ;
}
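The comment above vg_flag_write_locked() describes a compatibility trick. A VG whose access is governed by system_id or lock_type is written with LVM_WRITE_LOCKED instead of LVM_WRITE, so older lvm versions that only know LVM_WRITE treat it as read-only. Below is a minimal sketch of the write-side choice and of how an old and a new reader interpret the result. It assumes hypothetical flag values `TOY_WRITE` and `TOY_WRITE_LOCKED`, and folds the separate status and flags lines of the real metadata into one flag word.

```c
#include <string.h>

/* Hypothetical flag bits; LVM's LVM_WRITE / LVM_WRITE_LOCKED values differ. */
#define TOY_WRITE        0x1u
#define TOY_WRITE_LOCKED 0x2u

/* Same test as vg_flag_write_locked(): non-empty system_id, or lock_type other than "none". */
static int toy_flag_write_locked(const char *system_id, const char *lock_type)
{
	if (system_id && system_id[0])
		return 1;
	if (lock_type && lock_type[0] && strcmp(lock_type, "none"))
		return 1;
	return 0;
}

/* Flag stored in the metadata for a VG that is writable in memory. */
static unsigned toy_flags_on_disk(const char *system_id, const char *lock_type)
{
	return toy_flag_write_locked(system_id, lock_type) ?
		TOY_WRITE_LOCKED : TOY_WRITE;
}

/* An old reader only knows TOY_WRITE, so a write-locked VG looks read-only. */
static int toy_old_reader_writable(unsigned flags)
{
	return (flags & TOY_WRITE) != 0;
}

/* A new reader treats TOY_WRITE_LOCKED like TOY_WRITE, then applies system_id/lock_type rules. */
static int toy_new_reader_writable(unsigned flags)
{
	return (flags & (TOY_WRITE | TOY_WRITE_LOCKED)) != 0;
}
```

Clearing both system_id and lock_type makes toy_flags_on_disk() return TOY_WRITE again, which matches the commit note that such a VG becomes usable by old lvm versions.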
2016-12-24 23:10:06 +01:00
static int _access_vg_clustered ( struct cmd_context * cmd , const struct volume_group * vg )
{
2018-06-05 10:47:01 -05:00
if ( vg_is_clustered ( vg ) ) {
2018-06-13 15:30:28 -05:00
/*
* force_access_clustered is only set when forcibly
* converting a clustered vg to lock type none .
*/
if ( cmd - > force_access_clustered ) {
log_debug ( " Allowing forced access to clustered vg %s " , vg - > name ) ;
return 1 ;
}
2018-06-15 15:43:59 -05:00
log_verbose ( " Skipping clustered VG %s. " , vg - > name ) ;
2016-12-24 23:10:06 +01:00
return 0 ;
}
return 1 ;
}
2009-01-26 22:42:59 +00:00
/*
* Performs a set of checks against a VG according to bits set in status
* and returns FAILED_ * bits for those that aren ' t acceptable .
*
* FIXME Remove the unnecessary duplicate definitions and return bits directly .
*/
2017-10-18 19:29:32 +01:00
uint32_t vg_bad_status_bits ( const struct volume_group * vg , uint64_t status )
2009-01-26 22:13:22 +00:00
{
uint32_t failure = 0 ;
2007-06-06 19:40:28 +00:00
2024-04-09 11:35:23 +02:00
if ( ! vg ) {
log_error ( INTERNAL_ERROR " Missing volume group. " ) ;
return FAILED_NOTFOUND ;
}
2016-12-24 23:10:06 +01:00
if ( ( status & CLUSTERED ) & & ! _access_vg_clustered ( vg - > cmd , vg ) )
2009-01-26 22:13:22 +00:00
/* Return because other flags are considered undefined. */
return FAILED_CLUSTERED ;
if ( ( status & LVM_WRITE ) & &
! ( vg - > status & LVM_WRITE ) ) {
log_error ( " Volume group %s is read-only " , vg - > name ) ;
failure | = FAILED_READ_ONLY ;
}
if ( ( status & RESIZEABLE_VG ) & &
2009-09-15 18:35:13 +00:00
! vg_is_resizeable ( vg ) ) {
2009-01-26 22:13:22 +00:00
log_error ( " Volume group %s is not resizeable. " , vg - > name ) ;
failure | = FAILED_RESIZEABLE ;
}
return failure ;
}

/**
 * vg_check_status - check volume group status flags and log error
 * @vg - volume group to check status flags
 * @status - specific status flags to check
 */
int vg_check_status(const struct volume_group *vg, uint64_t status)
{
	return !vg_bad_status_bits(vg, status);
}

static int _allow_extra_system_id(struct cmd_context *cmd, const char *system_id)
{
	const struct dm_config_node *cn;
	const struct dm_config_value *cv;
	const char *str;

	if (!(cn = find_config_tree_array(cmd, local_extra_system_ids_CFG, NULL)))
		return 0;

	for (cv = cn->v; cv; cv = cv->next) {
		if (cv->type == DM_CFG_EMPTY_ARRAY)
			break;
		/* Ignore invalid data: Warning message already issued by config.c */
		if (cv->type != DM_CFG_STRING)
			continue;
		str = cv->v.str;
		if (!*str)
			continue;

		if (!strcmp(str, system_id))
			return 1;
	}

	return 0;
}

static int _access_vg_lock_type(struct cmd_context *cmd, struct volume_group *vg,
				uint32_t lockd_state, uint32_t *failure)
{
	if (cmd->lockd_vg_disable)
		return 1;

	/*
	 * Local VG requires no lock from lvmlockd.
	 */
	if (!vg_is_shared(vg))
		return 1;

	/*
	 * When lvmlockd is not used, lockd VGs are ignored by lvm
	 * and cannot be used, with two exceptions:
	 *
	 * . The --shared option allows them to be revealed with
	 *   reporting/display commands.
	 *
	 * . If a command asks to operate on one specifically
	 *   by name, then an error is printed.
	 */
	if (!lvmlockd_use()) {
		/*
		 * Some reporting/display commands have the --shared option
		 * (like --foreign) to allow them to reveal lockd VGs that
		 * are otherwise ignored. The --shared option must only be
		 * permitted in commands that read the VG for report or display,
		 * not any that write the VG or activate LVs.
		 */
		if (cmd->include_shared_vgs)
			return 1;

		/*
		 * Some commands want the error printed by vg_read, others by ignore_vg.
		 * Those using ignore_vg may choose to skip the error.
		 */
		if (cmd->vg_read_print_access_error) {
			log_error("Cannot access VG %s with lock type %s that requires lvmlockd.",
				  vg->name, vg->lock_type);
		}

		*failure |= FAILED_LOCK_TYPE;
		return 0;
	}

	/*
	 * The lock request from lvmlockd failed. If the lock was ex,
	 * we cannot continue. If the lock was sh, we could also fail
	 * to continue but since the lock was sh, it means the VG is
	 * only being read, and it doesn't hurt to allow reading with
	 * no lock.
	 */
	if (lockd_state & LDST_FAIL) {
		if ((lockd_state & LDST_EX) || cmd->lockd_vg_enforce_sh) {
			log_error("Cannot access VG %s due to failed lock.", vg->name);
			*failure |= FAILED_LOCK_MODE;
			return 0;
		}

		if (lockd_state & (LDST_FAIL_NOLS | LDST_FAIL_STARTING))
			vg->lockd_not_started = 1;

		log_warn("Reading VG %s without a lock.", vg->name);
		return 1;
	}

	if (test_mode()) {
		log_error("Test mode is not yet supported with lock type %s.", vg->lock_type);
		*failure |= FAILED_LOCK_TYPE;
		return 0;
	}

	return 1;
}
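
/*
 * Decision order of _access_vg_lock_type(), summarized from the code
 * above (illustrative; the code is authoritative):
 *
 *   1. cmd->lockd_vg_disable               -> allow (return 1)
 *   2. VG is not shared (local VG)         -> allow
 *   3. lvmlockd is not in use:
 *        cmd->include_shared_vgs           -> allow (report/display only)
 *        otherwise                         -> deny, FAILED_LOCK_TYPE
 *   4. lvmlockd lock request failed (LDST_FAIL):
 *        ex lock, or lockd_vg_enforce_sh   -> deny, FAILED_LOCK_MODE
 *        otherwise                         -> allow reading without a lock
 *   5. test mode                           -> deny, FAILED_LOCK_TYPE
 *   6. otherwise                           -> allow
 */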

int is_system_id_allowed(struct cmd_context *cmd, const char *system_id)
{
	/*
	 * A VG without a system_id can be accessed by anyone.
	 */
	if (!system_id || !system_id[0])
		return 1;

	/*
	 * Allowed if the host and VG system_id's match.
	 */
	if (cmd->system_id && !strcmp(cmd->system_id, system_id))
		return 1;

	/*
	 * Allowed if a host's extra system_id matches.
	 */
	if (cmd->system_id && _allow_extra_system_id(cmd, system_id))
		return 1;

	/*
	 * Not allowed if the host does not have a system_id
	 * and the VG does, or if the host and VG's system_id's
	 * do not match.
	 */
	return 0;
}
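
/*
 * Worked examples for is_system_id_allowed(), using hypothetical values:
 * the host's system_id is "hostA", and "hostB" is listed in the
 * local/extra_system_ids setting.
 *
 *   VG system_id    result
 *   ------------    ------
 *   NULL or ""      1  (VG has no system_id)
 *   "hostA"         1  (matches cmd->system_id)
 *   "hostB"         1  (matches an extra system_id)
 *   "hostC"         0  (foreign VG)
 *
 * If the host has no system_id (cmd->system_id == NULL), only the first
 * row returns 1, because both later checks require cmd->system_id.
 */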

static int _access_vg_systemid(struct cmd_context *cmd, struct volume_group *vg)
{
	/*
	 * A few commands allow read-only access to foreign VGs.
	 */
	if (cmd->include_foreign_vgs)
		return 1;

	if (is_system_id_allowed(cmd, vg->system_id))
		return 1;

	/*
	 * Allow VG access if the local host has active LVs in it.
	 */
	if (lvs_in_vg_activated(vg)) {
		log_warn("WARNING: Found LVs active in VG %s with foreign system ID %s. Possible data corruption.",
			 vg->name, vg->system_id);
		if (cmd->include_active_foreign_vgs)
			return 1;
		return 0;
	}

	/*
	 * Print an error when reading a VG that has a system_id
	 * and the host system_id is unknown.
	 */
	if (!cmd->system_id || cmd->unknown_system_id) {
		log_error("Cannot access VG %s with system ID %s with unknown local system ID.",
			  vg->name, vg->system_id);
		return 0;
	}

	/*
	 * Some commands want the error printed by vg_read, others by ignore_vg.
	 * Those using ignore_vg may choose to skip the error.
	 */
	if (cmd->vg_read_print_access_error) {
		log_error("Cannot access VG %s with system ID %s with local system ID %s.",
			  vg->name, vg->system_id, cmd->system_id);
		return 0;
	}

	/* Silently ignore foreign vgs. */
	return 0;
}

static int _access_vg_exported(struct cmd_context *cmd, struct volume_group *vg)
{
	if (!vg_is_exported(vg))
		return 1;

	if (cmd->include_exported_vgs)
		return 1;

	/*
	 * Some commands want the error printed by vg_read, others by ignore_vg.
	 * Those using ignore_vg may choose to skip the error.
	 */
	if (cmd->vg_read_print_access_error) {
		log_error("Volume group %s is exported", vg->name);
		return 0;
	}

	/* Silently ignore exported vgs. */
	return 0;
}

struct format_instance *alloc_fid(const struct format_type *fmt,
				  const struct format_instance_ctx *fic)
{
	struct dm_pool *mem;
	struct format_instance *fid;

	if (!(mem = dm_pool_create("format_instance", 1024)))
		return_NULL;

	if (!(fid = dm_pool_zalloc(mem, sizeof(*fid)))) {
		log_error("Couldn't allocate format_instance object.");
		goto bad;
	}

	fid->ref_count = 1;
	fid->mem = mem;
	fid->type = fic->type;
	fid->fmt = fmt;

	dm_list_init(&fid->metadata_areas_in_use);
	dm_list_init(&fid->metadata_areas_ignored);

	return fid;

bad:
	dm_pool_destroy(mem);
	return NULL;
}

void pv_set_fid(struct physical_volume *pv,
		struct format_instance *fid)
{
	if (fid == pv->fid)
		return;

	if (fid)
		fid->ref_count++;

	if (pv->fid)
		pv->fid->fmt->ops->destroy_instance(pv->fid);

	pv->fid = fid;
}

void vg_set_fid(struct volume_group *vg,
		struct format_instance *fid)
{
	struct pv_list *pvl;

	if (fid == vg->fid)
		return;

	if (fid)
		fid->ref_count++;

	dm_list_iterate_items(pvl, &vg->pvs)
		pv_set_fid(pvl->pv, fid);

	dm_list_iterate_items(pvl, &vg->removed_pvs)
		pv_set_fid(pvl->pv, fid);

	if (vg->fid)
		vg->fid->fmt->ops->destroy_instance(vg->fid);

	vg->fid = fid;
}

static int _convert_key_to_string(const char *key, size_t key_len,
				  unsigned sub_key, char *buf, size_t buf_len)
{
	memcpy(buf, key, key_len);
	buf += key_len;
	buf_len -= key_len;
	if ((dm_snprintf(buf, buf_len, "_%u", sub_key) == -1))
		return_0;

	return 1;
}
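
/*
 * Worked example for _convert_key_to_string() (illustrative values):
 * key "abc" with key_len 3 and sub_key 2 leaves the NUL-terminated index
 * key "abc_2" in buf.  key itself need not be NUL-terminated, since only
 * key_len bytes are copied.  Nothing here checks that key_len < buf_len,
 * so callers are expected to pass a buffer large enough for the key plus
 * the "_<sub_key>" suffix; if only the suffix does not fit, dm_snprintf()
 * reports truncation and the function returns 0.
 */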

int fid_add_mda(struct format_instance *fid, struct metadata_area *mda,
		const char *key, size_t key_len, const unsigned sub_key)
{
	static char full_key[PATH_MAX];

	dm_list_add(mda_is_ignored(mda) ? &fid->metadata_areas_ignored :
					  &fid->metadata_areas_in_use, &mda->list);

	/* Return if the mda is not supposed to be indexed. */
	if (!key)
		return 1;

	if (!fid->metadata_areas_index)
		return_0;

	/* Add metadata area to index. */
	if (!_convert_key_to_string(key, key_len, sub_key,
				    full_key, sizeof(full_key)))
		return_0;

	if (!dm_hash_insert(fid->metadata_areas_index,
			    full_key, mda)) {
		log_error("Failed to hash mda.");
		return 0;
	}

	return 1;
}

int fid_add_mdas(struct format_instance *fid, struct dm_list *mdas,
		 const char *key, size_t key_len)
{
	struct metadata_area *mda, *mda_new;
	unsigned mda_index = 0;

	dm_list_iterate_items(mda, mdas) {
		mda_new = mda_copy(fid->mem, mda);
		if (!mda_new)
			return_0;
		fid_remove_mda(fid, NULL, key, key_len, mda_index);
		fid_add_mda(fid, mda_new, key, key_len, mda_index);
		mda_index++;
	}

	return 1;
}

struct metadata_area *fid_get_mda_indexed(struct format_instance *fid,
					  const char *key, size_t key_len,
					  const unsigned sub_key)
{
	static char full_key[PATH_MAX];
	struct metadata_area *mda = NULL;

	if (!fid->metadata_areas_index)
		return_NULL;

	if (!_convert_key_to_string(key, key_len, sub_key,
				    full_key, sizeof(full_key)))
		return_NULL;

	mda = (struct metadata_area *) dm_hash_lookup(fid->metadata_areas_index,
						       full_key);

	return mda;
}

int fid_remove_mda(struct format_instance *fid, struct metadata_area *mda,
		   const char *key, size_t key_len, const unsigned sub_key)
{
	static char full_key[PATH_MAX];
	struct metadata_area *mda_indexed = NULL;

	/* At least one of mda or key must be specified. */
	if (!mda && !key)
		return 1;

	if (key) {
		/*
		 * If both mda and key specified, check given mda
		 * with what we find using the index and return
		 * immediately if these two do not match.
		 */
		if (!(mda_indexed = fid_get_mda_indexed(fid, key, key_len, sub_key)) ||
		     (mda && mda != mda_indexed))
			return 1;

		mda = mda_indexed;

		if (!_convert_key_to_string(key, key_len, sub_key,
					    full_key, sizeof(full_key)))
			return_0;

		dm_hash_remove(fid->metadata_areas_index, full_key);
	}

	dm_list_del(&mda->list);

	return 1;
}

/*
 * Copy constructor for a metadata_area.
 */
struct metadata_area *mda_copy(struct dm_pool *mem,
			       struct metadata_area *mda)
{
	struct metadata_area *mda_new;

	if (!(mda_new = dm_pool_alloc(mem, sizeof(*mda_new)))) {
		log_error("metadata_area allocation failed");
		return NULL;
	}
	memcpy(mda_new, mda, sizeof(*mda));
	if (mda->ops->mda_metadata_locn_copy && mda->metadata_locn) {
		mda_new->metadata_locn =
			mda->ops->mda_metadata_locn_copy(mem, mda->metadata_locn);
		if (!mda_new->metadata_locn) {
			dm_pool_free(mem, mda_new);
			return NULL;
		}
	}

	dm_list_init(&mda_new->list);

	return mda_new;
}

/*
 * This function provides a way to answer the question on a format specific
 * basis - does the format specific context of these two metadata areas
 * match?
 *
 * A metadata_area is defined to be independent of the underlying context.
 * This has the benefit that we can use the same abstraction to read disks
 * (see _metadata_text_raw_ops) or files (see _metadata_text_file_ops).
 * However, one downside is there is no format-independent way to determine
 * whether a given metadata_area is attached to a specific device - in fact,
 * it may not be attached to a device at all.
 *
 * Thus, LVM is structured such that an mda is not a member of struct
 * physical_volume. The location of the mda depends on whether
 * the PV is in a volume group. A PV not in a VG has an mda on the
 * 'info->mda' list in lvmcache, while a PV in a VG has an mda on
 * the vg->fid->metadata_areas_in_use list. For further details, see _vg_read(),
 * and the sequence of creating the format_instance with fid->metadata_areas_in_use
 * list, as well as the construction of the VG, with list of PVs (comes
 * after the construction of the fid and list of mdas).
 */
unsigned mda_locns_match(struct metadata_area *mda1, struct metadata_area *mda2)
{
	if (!mda1->ops->mda_locns_match || !mda2->ops->mda_locns_match ||
	    mda1->ops->mda_locns_match != mda2->ops->mda_locns_match)
		return 0;

	return mda1->ops->mda_locns_match(mda1, mda2);
}

struct device *mda_get_device(struct metadata_area *mda)
{
	if (!mda->ops->mda_get_device)
		return NULL;
	return mda->ops->mda_get_device(mda);
}

unsigned mda_is_ignored(struct metadata_area *mda)
{
	return (mda->status & MDA_IGNORED);
}

void mda_set_ignored(struct metadata_area *mda, unsigned mda_ignored)
{
	void *locn = mda->metadata_locn;
	unsigned old_mda_ignored = mda_is_ignored(mda);

	if (mda_ignored && !old_mda_ignored)
		mda->status |= MDA_IGNORED;
	else if (!mda_ignored && old_mda_ignored)
		mda->status &= ~MDA_IGNORED;
	else
		return;	/* No change */

	log_debug_metadata("%s ignored flag for mda %s at offset %" PRIu64 ".",
			   mda_ignored ? "Setting" : "Clearing",
			   mda->ops->mda_metadata_locn_name ? mda->ops->mda_metadata_locn_name(locn) : "",
			   mda->ops->mda_metadata_locn_offset ? mda->ops->mda_metadata_locn_offset(locn) : UINT64_C(0));
}

int mdas_empty_or_ignored(struct dm_list *mdas)
{
	struct metadata_area *mda;

	if (dm_list_empty(mdas))
		return 1;

	dm_list_iterate_items(mda, mdas) {
		if (mda_is_ignored(mda))
			return 1;
	}

	return 0;
}

int pv_change_metadataignore(struct physical_volume *pv, uint32_t mda_ignored)
{
	const char *pv_name = pv_dev_name(pv);

	if (mda_ignored && !pv_mda_used_count(pv)) {
		log_error("Metadata areas on physical volume \"%s\" already "
			  "ignored.", pv_name);
		return 0;
	}

	if (!mda_ignored && (pv_mda_used_count(pv) == pv_mda_count(pv))) {
		log_error("Metadata areas on physical volume \"%s\" already "
			  "marked as in-use.", pv_name);
		return 0;
	}

	if (!pv_mda_count(pv)) {
		log_error("Physical volume \"%s\" has no metadata "
			  "areas.", pv_name);
		return 0;
	}

	log_verbose("Marking metadata areas on physical volume \"%s\" "
		    "as %s.", pv_name, mda_ignored ? "ignored" : "in-use");

	if (!pv_mda_set_ignored(pv, mda_ignored))
		return_0;

	/*
	 * Update vg_mda_copies based on the mdas in this PV.
	 * This is most likely what the user would expect - if they
	 * specify a specific PV to be ignored/un-ignored, they will
	 * most likely not want LVM to turn around and change the
	 * ignore/un-ignore value when it writes the VG to disk.
	 * This does not guarantee this PV's ignore bits will be
	 * preserved in future operations.
	 */
	if (!is_orphan(pv) &&
	    vg_mda_copies(pv->vg) != VGMETADATACOPIES_UNMANAGED) {
		log_warn("WARNING: Changing preferred number of copies of VG %s "
			 "metadata from %" PRIu32 " to %" PRIu32, pv_vg_name(pv),
			 vg_mda_copies(pv->vg), vg_mda_used_count(pv->vg));
		vg_set_mda_copies(pv->vg, vg_mda_used_count(pv->vg));
	}

	return 1;
}

char *tags_format_and_copy(struct dm_pool *mem, const struct dm_list *tagsl)
{
	struct dm_str_list *sl;

	if (!dm_pool_begin_object(mem, 256)) {
		log_error("dm_pool_begin_object failed");
		return NULL;
	}

	dm_list_iterate_items(sl, tagsl) {
		if (!dm_pool_grow_object(mem, sl->str, strlen(sl->str)) ||
		    (sl->list.n != tagsl && !dm_pool_grow_object(mem, ",", 1))) {
			log_error("dm_pool_grow_object failed");
			/* Do not leave a half-built object open in the pool. */
			dm_pool_abandon_object(mem);
			return NULL;
		}
	}

	if (!dm_pool_grow_object(mem, "\0", 1)) {
		log_error("dm_pool_grow_object failed");
		dm_pool_abandon_object(mem);
		return NULL;
	}

	return dm_pool_end_object(mem);
}
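
/*
 * Worked example for tags_format_and_copy() (illustrative tags): a tag
 * list holding "a", "b" and "c" produces the pool-allocated string
 * "a,b,c".  A separator is appended only when the next list element is
 * not the list head, so there is no trailing comma.  An empty list
 * produces "".
 */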

const struct logical_volume *lv_committed(const struct logical_volume *lv)
{
	struct volume_group *vg;
	const struct logical_volume *found_lv;

	if (!lv)
		return NULL;

	if (!lv->vg->vg_committed)
		return lv;

	vg = lv->vg->vg_committed;

	if (!(found_lv = find_lv_in_vg_by_lvid(vg, &lv->lvid))) {
		log_error(INTERNAL_ERROR "LV %s (UUID %s) not found in committed metadata.",
			  display_lvname(lv), lv->lvid.s);
		found_lv = lv; /* Use uncommitted LV as best effort */
	}

	return found_lv;
}

/*
 * Check if a lock_type uses lvmlockd.
 * If not (none, clvm), return 0.
 * If so (dlm, sanlock, idm), return 1.
 */
int is_lockd_type(const char *lock_type)
{
	if (!lock_type)
		return 0;
	if (!strcmp(lock_type, "dlm"))
		return 1;
	if (!strcmp(lock_type, "sanlock"))
		return 1;
	if (!strcmp(lock_type, "idm"))
		return 1;
	return 0;
}

int vg_is_shared(const struct volume_group *vg)
{
	return (vg->lock_type && is_lockd_type(vg->lock_type));
}

int vg_strip_outdated_historical_lvs(struct volume_group *vg) {
	struct glv_list *glvl, *tglvl;
	time_t current_time = time(NULL);
	uint64_t threshold = find_config_tree_int(vg->cmd, metadata_lvs_history_retention_time_CFG, NULL);

	if (!threshold)
		return 1;

	dm_list_iterate_items_safe(glvl, tglvl, &vg->historical_lvs) {
		/*
		 * Removal time in the future? Not likely,
		 * but skip this item in any case.
		 */
		if (current_time < (time_t) glvl->glv->historical->timestamp_removed)
			continue;

		if ((current_time - glvl->glv->historical->timestamp_removed) > threshold) {
			if (!historical_glv_remove(glvl->glv)) {
				log_error("Failed to destroy record about historical LV %s/%s.",
					  vg->name, glvl->glv->historical->name);
				return 0;
			}
			log_verbose("Outdated record for historical logical volume \"%s\" "
				    "automatically destroyed.", glvl->glv->historical->name);
		}
	}

	return 1;
}
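
/*
 * Worked example for vg_strip_outdated_historical_lvs(), with
 * hypothetical numbers: a retention threshold of 3600 (seconds, matching
 * the time(NULL) arithmetic above) and current_time = 5000.
 *
 *   timestamp_removed   age    action
 *   -----------------   ----   ---------------------------
 *   1000                4000   4000 > 3600, record destroyed
 *   2000                3000   kept
 *   6000                 -     in the future, skipped
 *
 * A threshold of 0 disables stripping entirely.
 */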

int lv_on_pmem(struct logical_volume *lv)
{
	struct lv_segment *seg;
	struct physical_volume *pv;
	uint32_t s;
	int pmem_devs = 0, other_devs = 0;

	dm_list_iterate_items(seg, &lv->segments) {
		for (s = 0; s < seg->area_count; s++) {
			if (seg_type(seg, s) != AREA_PV)
				continue;

			pv = seg_pv(seg, s);

			if (dev_is_pmem(lv->vg->cmd->dev_types, pv->dev)) {
				log_debug("LV %s dev %s is pmem.", display_lvname(lv), dev_name(pv->dev));
				pmem_devs++;
			} else {
				log_debug("LV %s dev %s not pmem.", display_lvname(lv), dev_name(pv->dev));
				other_devs++;
			}
		}
	}

	if (pmem_devs && other_devs) {
		log_error("Invalid mix of cache device types in %s.", display_lvname(lv));
		return -1;
	}

	if (pmem_devs) {
		log_debug("LV %s on pmem", display_lvname(lv));
		return 1;
	}

	return 0;
}

int vg_is_foreign(struct volume_group *vg)
{
	return vg->cmd->system_id && strcmp(vg->system_id, vg->cmd->system_id);
}

void vg_write_commit_bad_mdas(struct cmd_context *cmd, struct volume_group *vg)
{
	char vgid[ID_LEN + 1] __attribute__((aligned(8)));
	DM_LIST_INIT(bad_mda_list);
	struct mda_list *mdal;
	struct metadata_area *mda;
	struct device *dev;

	vgid[ID_LEN] = 0;
	memcpy(vgid, &vg->id.uuid, ID_LEN);

	lvmcache_get_bad_mdas(cmd, vg->name, vgid, &bad_mda_list);

	dm_list_iterate_items(mdal, &bad_mda_list) {
		mda = mdal->mda;
		dev = mda_get_device(mda);

		/*
		 * bad_fields:
		 *
		 * 0: shouldn't happen
		 *
		 * READ|INTERNAL: there's probably nothing wrong on disk
		 *
		 * MAGIC|START: there's a good chance that we were
		 * reading the mda_header from the wrong location; maybe
		 * the pv_header location was wrong.  We don't want to
		 * write new metadata to the wrong location.  To handle
		 * this we would want to do some further verification that
		 * we have the mda location correct.
		 *
		 * VERSION|CHECKSUM: when the others are correct these
		 * look safe to repair.
		 *
		 * HEADER: general error related to header, covered by fields
		 * above.
		 *
		 * TEXT: general error related to text metadata, we can repair.
		 *
		 * MISMATCH: different values between instances of metadata,
		 * can repair.
		 */
		if (!mda->bad_fields ||
		    (mda->bad_fields & BAD_MDA_READ) ||
		    (mda->bad_fields & BAD_MDA_INTERNAL) ||
		    (mda->bad_fields & BAD_MDA_MAGIC) ||
		    (mda->bad_fields & BAD_MDA_START)) {
			log_warn("WARNING: not repairing bad metadata (0x%x) for mda%d on %s",
				 mda->bad_fields, mda->mda_num, dev_name(dev));
			continue;
		}

		/*
		 * vg_write/vg_commit reread the mda_header which checks the
		 * mda header fields and fails if any are bad, which stops
		 * vg_write/vg_commit from continuing.  Suppress these header
		 * field checks when we know the field is bad and we are going
		 * to replace it.  FIXME: do vg_write/vg_commit really need to
		 * reread and recheck the mda_header again (probably not)?
		 */
		if (mda->bad_fields & BAD_MDA_CHECKSUM)
			mda->ignore_bad_fields |= BAD_MDA_CHECKSUM;
		if (mda->bad_fields & BAD_MDA_VERSION)
			mda->ignore_bad_fields |= BAD_MDA_VERSION;

		log_warn("WARNING: repairing bad metadata (0x%x) in mda%d at %llu on %s.",
			 mda->bad_fields, mda->mda_num, (unsigned long long)mda->header_start, dev_name(dev));

		if (!mda->ops->vg_write(vg->fid, vg, mda)) {
			log_warn("WARNING: failed to write VG %s metadata to bad mda%d at %llu on %s.",
				 vg->name, mda->mda_num, (unsigned long long)mda->header_start, dev_name(dev));
			continue;
		}

		if (!mda->ops->vg_precommit(vg->fid, vg, mda)) {
			log_warn("WARNING: failed to precommit VG %s metadata to bad mda%d at %llu on %s.",
				 vg->name, mda->mda_num, (unsigned long long)mda->header_start, dev_name(dev));
			continue;
		}

		if (!mda->ops->vg_commit(vg->fid, vg, mda)) {
			log_warn("WARNING: failed to commit VG %s metadata to bad mda%d at %llu on %s.",
				 vg->name, mda->mda_num, (unsigned long long)mda->header_start, dev_name(dev));
			continue;
		}
	}
}

/*
 * Reread an mda_header.  If the text offset and checksum are the same as
 * were seen and saved by label scan, the metadata is unchanged and does not
 * need to be reread.  Returns true when the metadata may have changed and a
 * rescan is needed, false when it is unchanged.
 */
bool scan_text_mismatch(struct cmd_context *cmd, const char *vgname, const char *vgid)
{
	DM_LIST_INIT(mda_list);
	struct mda_list *mdal, *safe;
	struct metadata_area *mda;
	struct mda_context *mdac;
	struct device_area *area;
	struct mda_header *mdah;
	struct raw_locn *rlocn;
	struct device *dev;
	uint32_t bad_fields;
	bool ret = true;

	/*
	 * if cmd->can_use_one_scan, check one mda_header is unchanged,
	 * else check that all mda_headers are unchanged.
	 */

	lvmcache_get_mdas(cmd, vgname, vgid, &mda_list);

	dm_list_iterate_items(mdal, &mda_list) {
		mda = mdal->mda;

		if (!mda->scan_text_offset)
			continue;

		if (mda->mda_num != 1)
			continue;

		if (!(dev = mda_get_device(mda))) {
			log_debug("Rescan for text mismatch - no mda dev.");
			goto out;
		}

		bad_fields = 0;

		mdac = mda->metadata_locn;
		area = &mdac->area;

		/*
		 * Invalidate mda_header in bcache so it will be reread from disk.
		 */
		if (!dev_invalidate_bytes(dev, 4096, 512)) {
			log_debug("Rescan for text mismatch - cannot invalidate.");
			goto out;
		}

		if (!(mdah = raw_read_mda_header(cmd->fmt, area, 1, 0, &bad_fields))) {
			log_debug("Rescan for text mismatch - no mda header.");
			goto out;
		}

		rlocn = mdah->raw_locns;

		if (bad_fields) {
			log_debug("Rescan for text mismatch - bad_fields.");
		} else if (rlocn->checksum != mda->scan_text_checksum) {
			log_debug("Rescan for text checksum mismatch - now %x prev %x.",
				  rlocn->checksum, mda->scan_text_checksum);
		} else if (rlocn->offset != mda->scan_text_offset) {
			log_debug("Rescan for text offset mismatch - now %llu prev %llu.",
				  (unsigned long long)rlocn->offset,
				  (unsigned long long)mda->scan_text_offset);
		} else {
			/* the common case where fields match and no rescan needed */
			ret = false;
		}

		dm_pool_free(cmd->mem, mdah);

		/* For can_use_one_scan commands, return result from checking one mda. */
		if (cmd->can_use_one_scan)
			goto out;

		/* For other commands, return mismatch immediately. */
		if (ret)
			goto out;
	}

	if (ret) {
		/* shouldn't happen */
		log_debug("Rescan for text mismatch - no mdas.");
		goto out;
	}
out:
	if (!ret)
		log_debug("Rescan skipped - unchanged offset %llu checksum %x.",
			  (unsigned long long)mda->scan_text_offset,
			  mda->scan_text_checksum);

	dm_list_iterate_items_safe(mdal, safe, &mda_list) {
		dm_list_del(&mdal->list);
		free(mdal);
	}

	return ret;
}
improve reading and repairing vg metadata
The fact that vg repair is implemented as a part of vg read
has led to a messy and complicated implementation of vg_read,
and limited and uncontrolled repair capability. This splits
read and repair apart.
Summary
-------
- take all kinds of various repairs out of vg_read
- vg_read no longer writes anything
- vg_read now simply reads and returns vg metadata
- vg_read ignores bad or old copies of metadata
- vg_read proceeds with a single good copy of metadata
- improve error checks and handling when reading
- keep track of bad (corrupt) copies of metadata in lvmcache
- keep track of old (seqno) copies of metadata in lvmcache
- keep track of outdated PVs in lvmcache
- vg_write will do basic repairs
- new command vgck --updatemetdata will do all repairs
Details
-------
- In scan, do not delete dev from lvmcache if reading/processing fails;
the dev is still present, and removing it makes it look like the dev
is not there. Records are now kept about the problems with each PV
so they be fixed/repaired in the appropriate places.
- In scan, record a bad mda on failure, and delete the mda from
mda in use list so it will not be used by vg_read or vg_write,
only by repair.
- In scan, succeed if any good mda on a device is found, instead of
failing if any is bad. The bad/old copies of metadata should not
interfere with normal usage while good copies can be used.
- In scan, add a record of old mdas in lvmcache for later, do not repair
them while reading, and do not let them prevent us from finding and
using a good copy of metadata from elsewhere. One result is that
"inconsistent metadata" is no longer a read error, but instead a
record in lvmcache that can be addressed separate from the read.
- Treat a dev with no good mdas like a dev with no mdas, which is an
existing case we already handle.
- Don't use a fake vg "handle" for returning an error from vg_read,
or the vg_read_error function for getting that error number;
just return null if the vg cannot be read or used, and an error_flags
arg with flags set for the specific kind of error (which can be used
later for determining the kind of repair.)
- Saving an original copy of the vg metadata, for purposes of reverting
a write, is now done explicitly in vg_read instead of being hidden in
the vg_make_handle function.
- When a vg is not accessible due to "access restrictions" but is
otherwise fine, return the vg through the new error_vg arg so that
process_each_pv can skip the PVs in the VG while processing.
(This is a temporary accomodation for the way process_each_pv
tracks which devs have been looked at, and can be dropped later
when process_each_pv implementation dev tracking is changed.)
- vg_read does not try to fix or recover a vg, but now just reads the
metadata, checks access restrictions and returns it.
(Checking access restrictions might be better done outside of vg_read,
but this is a later improvement.)
- _vg_read now simply makes one attempt to read metadata from
each mda, and uses the most recent copy to return to the caller
in the form of a 'vg' struct.
(bad mdas were excluded during the scan and are not retried)
(old mdas were not excluded during scan and are retried here)
- vg_read uses _vg_read to get the latest copy of metadata from mdas,
and then makes various checks against it to produce warnings,
and to check if VG access is allowed (access restrictions include:
writable, foreign, shared, clustered, missing pvs).
- Things that were previously silently/automatically written by vg_read
that are now done by vg_write, based on the records made in lvmcache
during the scan and read:
. clearing the missing flag
. updating old copies of metadata
. clearing outdated pvs
. updating pv header flags
- Bad/corrupt metadata are now repaired; they were not before.
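The vg_read contract in the list above can be sketched as a small access-check step. Under this contract, vg_read returns NULL plus error_flags instead of a fake handle, hands back a VG that fails only access restrictions through error_vg, and checks the restrictions after the read. This is a minimal illustration, not the LVM code. The struct vg_sketch type, the RD_ERR_* bit names and the exact rule used for each restriction are hypothetical simplifications.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical error bits for this sketch; not LVM's flag names or values. */
#define RD_ERR_NOT_FOUND   0x01u
#define RD_ERR_READ_ONLY   0x02u
#define RD_ERR_FOREIGN     0x04u
#define RD_ERR_SHARED      0x08u
#define RD_ERR_CLUSTERED   0x10u
#define RD_ERR_MISSING_PVS 0x20u

/* Hypothetical, trimmed-down view of a VG that has already been read. */
struct vg_sketch {
	int read_only;    /* VG is not writable */
	int foreign;      /* VG belongs to another host */
	int shared;       /* shared VG whose lock is not held */
	int clustered;    /* clustered VG */
	int missing_pvs;  /* number of PVs whose devices are missing */
};

/*
 * Apply access checks to a VG that was read successfully (or NULL if it
 * could not be read).  Returns the VG when it may be used; otherwise
 * returns NULL with the reasons in *error_flags.  A VG that is fine apart
 * from access restrictions is also passed back through *error_vg.
 */
static struct vg_sketch *vg_read_sketch(struct vg_sketch *vg, int writing,
					unsigned *error_flags,
					struct vg_sketch **error_vg)
{
	unsigned flags = 0;

	assert(error_flags && error_vg);
	*error_vg = NULL;

	if (!vg) {
		*error_flags = RD_ERR_NOT_FOUND;
		return NULL;
	}

	if (writing && vg->read_only)
		flags |= RD_ERR_READ_ONLY;
	if (vg->foreign)
		flags |= RD_ERR_FOREIGN;
	if (vg->shared)
		flags |= RD_ERR_SHARED;
	if (vg->clustered)
		flags |= RD_ERR_CLUSTERED;
	/* Assumption of this sketch: missing devices only restrict writers. */
	if (writing && vg->missing_pvs)
		flags |= RD_ERR_MISSING_PVS;

	*error_flags = flags;
	if (flags) {
		*error_vg = vg;
		return NULL;
	}
	return vg;
}
```

A caller tests the error_flags bits to decide what to report. When only access restrictions failed, it can still walk error_vg, for example to skip that VG's PVs.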
Test changes
------------
- A read command no longer writes the VG to repair it, so add a write
command to do a repair.
(inconsistent-metadata, unlost-pv)
- When a missing PV is removed from a VG, and then the device is
enabled again, vgck --updatemetadata is needed to clear the
outdated PV before it can be used again, where it wasn't before.
(lvconvert-repair-policy, lvconvert-repair-raid, lvconvert-repair,
mirror-vgreduce-removemissing, pv-ext-flags, unlost-pv)
Reading bad/old metadata
------------------------
- "bad metadata": the mda_header or metadata text has invalid fields
or can't be parsed by lvm. This is a form of corruption that would
not be caused by known failure scenarios. A checksum error is
typically included among the errors reported.
- "old metadata": a valid copy of the metadata that has a smaller seqno
than other copies of the metadata. This can happen if the device
failed, or I/O failed, or lvm failed while committing new metadata
to all the metadata areas. Old metadata on a PV that has been
removed from the VG is the "outdated" case below.
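The sketch below shows the distinction between bad and old copies, and the choice of the most recent usable copy. The mda_copy structure and both functions are hypothetical. The sketch assumes a copy is "bad" exactly when its header or text failed the checksum/parse checks, and "old" when it parsed but carries a smaller seqno than the newest copy.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-mda result of a scan; not an LVM structure. */
struct mda_copy {
	int parsed_ok;     /* header and text passed checksum/parse checks */
	uint32_t seqno;    /* VG metadata sequence number (valid if parsed_ok) */
};

enum mda_state { MDA_GOOD, MDA_OLD, MDA_BAD };

/* Index of the parsable copy with the highest seqno, or -1 if none. */
static int latest_copy(const struct mda_copy *c, size_t n)
{
	int best = -1;

	for (size_t i = 0; i < n; i++) {
		if (!c[i].parsed_ok)
			continue;              /* bad copies are never candidates */
		if (best < 0 || c[i].seqno > c[best].seqno)
			best = (int)i;         /* ties keep the first copy seen */
	}
	return best;
}

/* Classify one copy relative to the newest seqno that was found. */
static enum mda_state classify(const struct mda_copy *c, uint32_t newest)
{
	if (!c->parsed_ok)
		return MDA_BAD;
	return c->seqno < newest ? MDA_OLD : MDA_GOOD;
}
```

Neither bad nor old copies stop the read: the caller proceeds with the copy at latest_copy() and only records the others.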
When a VG has some PVs with bad/old metadata, lvm can simply ignore
the bad/old copies, and use a good copy. This is why there are
multiple copies of the metadata -- so it's available even when some
of the copies cannot be used. The bad/old copies do not have to be
repaired before the VG can be used (the repair can happen later.)
A PV with no good copies of the metadata simply falls back to being
treated like a PV with no mdas; a common and harmless configuration.
When bad/old metadata exists, lvm warns the user about it, and
suggests repairing it using a new metadata repair command.
Bad metadata in particular is something that users will want to
investigate and repair themselves, since it should not happen and
may indicate some other problem that needs to be fixed.
PVs with bad/old metadata are not the same as missing devices.
Missing devices will block various kinds of VG modification or
activation, but bad/old metadata will not.
Previously, lvm would attempt to repair bad/old metadata whenever
it was read. This was unnecessary since lvm does not require every
copy of the metadata to be used. It would also hide potential
problems that should be investigated by the user. It was also
dangerous in cases where the VG was on shared storage. The user
is now allowed to investigate potential problems and decide how
and when to repair them.
Repairing bad/old metadata
--------------------------
When label scan sees bad metadata in an mda, that mda is removed
from the lvmcache info->mdas list. This means that vg_read will
skip it, and not attempt to read/process it again. If it was
the only in-use mda on a PV, that PV is treated like a PV with
no mdas. It also means that vg_write will skip the bad mda,
and not attempt to write new metadata to it. The only way to
repair bad metadata is with the metadata repair command.
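Here is a minimal sketch of the scan-time bookkeeping described above, assuming a plain singly linked list. LVM uses its own list type and structures, which are not reproduced here. Bad mdas are moved off the in-use list, and a PV whose in-use list ends up empty is handled like a PV with no mdas. The names mda_node and split_bad_mdas are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical mda list node for this sketch. */
struct mda_node {
	int bad;                 /* set when the scan found corruption */
	struct mda_node *next;
};

/*
 * Move every bad mda from the in-use list to the bad list.  Readers and
 * writers walk only the in-use list, so they never touch bad copies; only
 * a repair step would look at the bad list.  Returns the number of mdas
 * left in use (0 means the PV is treated like a PV with no mdas).
 */
static int split_bad_mdas(struct mda_node **in_use, struct mda_node **bad_list)
{
	struct mda_node **pp = in_use;
	int remaining = 0;

	assert(in_use && bad_list);

	while (*pp) {
		struct mda_node *m = *pp;

		if (m->bad) {
			*pp = m->next;           /* unlink from the in-use list */
			m->next = *bad_list;     /* push onto the bad list */
			*bad_list = m;
		} else {
			remaining++;
			pp = &m->next;           /* keep it and advance */
		}
	}
	return remaining;
}
```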
When label scan sees old metadata in an mda, that mda is kept
in the lvmcache info->mdas list. This means that vg_read will
read/process it again, and likely see the same mismatch with
the other copies of the metadata. Like label_scan,
vg_read will simply ignore the old copy of the metadata and
use the latest copy. If the command is modifying the vg
(e.g. lvcreate), then vg_write, which writes new metadata to
every mda on info->mdas, will write the new metadata to the
mda that had the old version. If successful, this will resolve
the old metadata problem (without needing to run a metadata
repair command.)
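A small model shows why an ordinary write resolves old metadata. The write targets every mda still on the in-use list, so a copy that was behind gets the new seqno if its write succeeds. The mda_rec type, its write_ok field (a stand-in for the result of the device write) and write_all_mdas are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical in-use mda record; seqno is the copy stored on disk. */
struct mda_rec {
	uint32_t seqno;
	int write_ok;            /* simulated result of the device write */
};

/*
 * Write metadata with seqno new_seqno to every in-use mda, including ones
 * that held an old copy.  Returns how many mdas are still behind afterwards;
 * 0 means the old-metadata condition has been resolved by this write.
 */
static size_t write_all_mdas(struct mda_rec *m, size_t n, uint32_t new_seqno)
{
	size_t still_old = 0;

	assert(m || n == 0);

	for (size_t i = 0; i < n; i++) {
		if (m[i].write_ok)
			m[i].seqno = new_seqno;
		if (m[i].seqno < new_seqno)
			still_old++;
	}
	return still_old;
}
```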
Outdated PVs
------------
An outdated PV is a PV that has an old copy of VG metadata
that shows it is a member of the VG, but the latest copy of
the VG metadata does not include this PV. This happens if
the PV is disconnected, vgreduce --removemissing is run to
remove the PV from the VG, then the PV is reconnected.
In this case, the outdated PV needs to have its outdated metadata
removed and the PV used flag needs to be cleared. This repair
will be done by the metadata repair command (vgck --updatemetadata).
It is also done if vgremove is run on the VG.
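The outdated-PV condition compares a PV's own (old) metadata with the latest VG metadata. The sketch below assumes the PV's old copy records a VG id and the latest metadata provides a list of member PV uuids. All names are hypothetical, and the short strings stand in for real uuids.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical check: a PV is outdated when its own (old) metadata names
 * this VG, but its uuid is absent from the latest VG metadata.
 */
static int pv_is_outdated(const char *pv_uuid, const char *pv_vgid,
			  const char *vgid, const char *const *latest_pvs,
			  size_t n_latest)
{
	assert(pv_uuid && pv_vgid && vgid);

	if (strcmp(pv_vgid, vgid) != 0)
		return 0;                 /* PV claims a different VG */

	for (size_t i = 0; i < n_latest; i++)
		if (!strcmp(latest_pvs[i], pv_uuid))
			return 0;             /* still a member */

	return 1;                         /* removed while it was disconnected */
}
```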
MISSING PVs
-----------
When a device is missing, most commands will refuse to modify
the VG. This is the simple case. More complicated is when
a command is allowed to modify the VG while it is missing a
device.
When a VG is written while a device is missing for one of its PVs,
the VG metadata is written to disk with the MISSING flag on the PV
with the missing device. When the VG is next used, it is treated
as if the PV with the MISSING flag still has a missing device, even
if that device has reappeared.
If all LVs that were using a PV with the MISSING flag are removed
or repaired so that the MISSING PV is no longer used, then the
next time the VG metadata is written, the MISSING flag will be
dropped.
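The MISSING-flag rules above can be written as a small decision function, as a sketch under stated assumptions. PV_MISSING, pv_state and pv_flags_for_write are hypothetical. The text does not cover an unused PV whose device is still absent at write time. This sketch assumes such a PV keeps the MISSING flag.

```c
#include <assert.h>
#include <stddef.h>

#define PV_MISSING 0x1u   /* hypothetical flag value for this sketch */

/* Hypothetical PV state used only for this sketch. */
struct pv_state {
	unsigned flags;
	int device_present;   /* device is visible right now */
	int used_by_lvs;      /* number of LV segments allocated on this PV */
};

/*
 * Flags to store for a PV when the VG metadata is written:
 * - a PV whose device is absent gets MISSING (assumed even if unused);
 * - a MISSING PV keeps the flag, even if its device has reappeared,
 *   while any LV still uses it;
 * - once no LV uses it, the flag is dropped on this write.
 */
static unsigned pv_flags_for_write(const struct pv_state *pv)
{
	unsigned flags;

	assert(pv);
	flags = pv->flags;

	if (!pv->device_present)
		return flags | PV_MISSING;
	if ((flags & PV_MISSING) && !pv->used_by_lvs)
		flags &= ~PV_MISSING;
	return flags;
}
```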
Alternative methods of clearing the MISSING flag are:
vgreduce --removemissing will remove PVs with missing devices,
or PVs with the MISSING flag where the device has reappeared.
vgextend --restoremissing will clear the MISSING flag on PVs
where the device has reappeared, allowing the VG to be used
normally. This must be done with caution since the reappeared
device may have old data that is inconsistent with data on other PVs.
Bad mda repair
--------------
The new command:
vgck --updatemetadata VG
first uses vg_write to repair the basic issues mentioned
above (old metadata, outdated PVs, pv_header
flags, MISSING_PV flags). It will also go further and repair
bad metadata:
. text metadata that has a bad checksum
. text metadata that is not parsable
. corrupt mda_header checksum and version fields
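Detecting "text metadata that has a bad checksum" comes down to recomputing a checksum over the stored text and comparing it with the stored value. The sketch below uses the standard reflected CRC-32 (polynomial 0xEDB88320) purely as a stand-in. The exact checksum routine and initial value that LVM uses are not reproduced here, and text_checksum_ok is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in checksum: bitwise reflected CRC-32, not necessarily LVM's. */
static uint32_t crc32_ieee(const unsigned char *buf, size_t len)
{
	uint32_t crc = 0xFFFFFFFFu;

	assert(buf || len == 0);

	for (size_t i = 0; i < len; i++) {
		crc ^= buf[i];
		for (int k = 0; k < 8; k++)
			/* XOR in the polynomial only when the low bit is set. */
			crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
	}
	return crc ^ 0xFFFFFFFFu;
}

/* Hypothetical check: 1 if the stored checksum matches the text, else 0. */
static int text_checksum_ok(const char *text, size_t len, uint32_t stored)
{
	return crc32_ieee((const unsigned char *)text, len) == stored;
}
```

A mismatch marks that copy as bad. Per the text above, a bad copy is then left alone by vg_read and vg_write and is rewritten only by the repair command.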
(To keep a clean diff, #if 0 is added around functions that
are replaced by new code. These commented functions are
removed by the following commit.)
2019-05-24 12:04:37 -05:00
static struct volume_group *_vg_read(struct cmd_context *cmd,
				     const char *vgname,
				     const char *vgid,
2019-06-11 16:17:24 -05:00
				     unsigned precommitted,
				     int writing)
2019-05-24 12:04:37 -05:00
{
	const struct format_type *fmt = cmd->fmt;
	struct format_instance *fid = NULL;
	struct format_instance_ctx fic;
	struct volume_group *vg, *vg_ret = NULL;
	struct metadata_area *mda, *mda2;
	unsigned use_precommitted = precommitted;
2023-02-02 16:15:13 -06:00
	struct device *mda_dev, *dev_ret = NULL;
2019-05-24 12:04:37 -05:00
	struct cached_vg_fmtdata *vg_fmtdata = NULL;	/* Additional format-specific data about the vg */
	int found_old_metadata = 0;
	unsigned use_previous_vg;
	log_debug_metadata("Reading VG %s %s", vgname ?: "<no name>", vgid ?: "<no vgid>");
2020-09-18 14:42:23 -05:00
	/*
	 * Devices are generally open readonly from scanning, and we need to
	 * reopen them rw to update metadata.  We want to reopen them rw
	 * before rescanning and/or writing.  Reopening rw preserves the existing
	 * bcache blocks for the devs.
	 */
	if (writing)
		lvmcache_label_reopen_vg_rw(cmd, vgname, vgid);
improve reading and repairing vg metadata
The fact that vg repair is implemented as a part of vg read
has led to a messy and complicated implementation of vg_read,
and limited and uncontrolled repair capability. This splits
read and repair apart.
Summary
-------
- take all kinds of various repairs out of vg_read
- vg_read no longer writes anything
- vg_read now simply reads and returns vg metadata
- vg_read ignores bad or old copies of metadata
- vg_read proceeds with a single good copy of metadata
- improve error checks and handling when reading
- keep track of bad (corrupt) copies of metadata in lvmcache
- keep track of old (seqno) copies of metadata in lvmcache
- keep track of outdated PVs in lvmcache
- vg_write will do basic repairs
- new command vgck --updatemetdata will do all repairs
Details
-------
- In scan, do not delete dev from lvmcache if reading/processing fails;
the dev is still present, and removing it makes it look like the dev
is not there. Records are now kept about the problems with each PV
so they be fixed/repaired in the appropriate places.
- In scan, record a bad mda on failure, and delete the mda from
mda in use list so it will not be used by vg_read or vg_write,
only by repair.
- In scan, succeed if any good mda on a device is found, instead of
failing if any is bad. The bad/old copies of metadata should not
interfere with normal usage while good copies can be used.
- In scan, add a record of old mdas in lvmcache for later, do not repair
them while reading, and do not let them prevent us from finding and
using a good copy of metadata from elsewhere. One result is that
"inconsistent metadata" is no longer a read error, but instead a
record in lvmcache that can be addressed separate from the read.
- Treat a dev with no good mdas like a dev with no mdas, which is an
existing case we already handle.
- Don't use a fake vg "handle" for returning an error from vg_read,
or the vg_read_error function for getting that error number;
just return null if the vg cannot be read or used, and an error_flags
arg with flags set for the specific kind of error (which can be used
later for determining the kind of repair.)
- Saving an original copy of the vg metadata, for purposes of reverting
a write, is now done explicitly in vg_read instead of being hidden in
the vg_make_handle function.
- When a vg is not accessible due to "access restrictions" but is
otherwise fine, return the vg through the new error_vg arg so that
process_each_pv can skip the PVs in the VG while processing.
(This is a temporary accomodation for the way process_each_pv
tracks which devs have been looked at, and can be dropped later
when process_each_pv implementation dev tracking is changed.)
- vg_read does not try to fix or recover a vg, but now just reads the
metadata, checks access restrictions and returns it.
(Checking access restrictions might be better done outside of vg_read,
but this is a later improvement.)
- _vg_read now simply makes one attempt to read metadata from
each mda, and uses the most recent copy to return to the caller
in the form of a 'vg' struct.
(bad mdas were excluded during the scan and are not retried)
(old mdas were not excluded during scan and are retried here)
- vg_read uses _vg_read to get the latest copy of metadata from mdas,
and then makes various checks against it to produce warnings,
and to check if VG access is allowed (access restrictions include:
writable, foreign, shared, clustered, missing pvs).
- Things that were previously silently/automatically written by vg_read
that are now done by vg_write, based on the records made in lvmcache
during the scan and read:
. clearing the missing flag
. updating old copies of metadata
. clearing outdated pvs
. updating pv header flags
- Bad/corrupt metadata are now repaired; they were not before.
Test changes
------------
- A read command no longer writes the VG to repair it, so add a write
command to do a repair.
(inconsistent-metadata, unlost-pv)
- When a missing PV is removed from a VG, and then the device is
enabled again, vgck --updatemetadata is needed to clear the
outdated PV before it can be used again, where it wasn't before.
(lvconvert-repair-policy, lvconvert-repair-raid, lvconvert-repair,
mirror-vgreduce-removemissing, pv-ext-flags, unlost-pv)
Reading bad/old metadata
------------------------
- "bad metadata": the mda_header or metadata text has invalid fields
or can't be parsed by lvm. This is a form of corruption that would
not be caused by known failure scenarios. A checksum error is
typically included among the errors reported.
- "old metadata": a valid copy of the metadata that has a smaller seqno
than other copies of the metadata. This can happen if the device
failed, or io failed, or lvm failed while commiting new metadata
to all the metadata areas. Old metadata on a PV that has been
removed from the VG is the "outdated" case below.
When a VG has some PVs with bad/old metadata, lvm can simply ignore
the bad/old copies, and use a good copy. This is why there are
multiple copies of the metadata -- so it's available even when some
of the copies cannot be used. The bad/old copies do not have to be
repaired before the VG can be used (the repair can happen later.)
A PV with no good copies of the metadata simply falls back to being
treated like a PV with no mdas; a common and harmless configuration.
When bad/old metadata exists, lvm warns the user about it, and
suggests repairing it using a new metadata repair command.
Bad metadata in particular is something that users will want to
investigate and repair themselves, since it should not happen and
may indicate some other problem that needs to be fixed.
PVs with bad/old metadata are not the same as missing devices.
Missing devices will block various kinds of VG modification or
activation, but bad/old metadata will not.
Previously, lvm would attempt to repair bad/old metadata whenever
it was read. This was unnecessary since lvm does not require every
copy of the metadata to be used. It would also hide potential
problems that should be investigated by the user. It was also
dangerous in cases where the VG was on shared storage. The user
is now allowed to investigate potential problems and decide how
and when to repair them.
Repairing bad/old metadata
--------------------------
When label scan sees bad metadata in an mda, that mda is removed
from the lvmcache info->mdas list. This means that vg_read will
skip it, and not attempt to read/process it again. If it was
the only in-use mda on a PV, that PV is treated like a PV with
no mdas. It also means that vg_write will skip the bad mda,
and not attempt to write new metadata to it. The only way to
repair bad metadata is with the metadata repair command.
When label scan sees old metadata in an mda, that mda is kept
in the lvmcache info->mdas list. This means that vg_read will
read/process it again, and likely see the same mismatch with
the other copies of the metadata. As in label_scan,
vg_read will simply ignore the old copy of the metadata and
use the latest copy. If the command is modifying the vg
(e.g. lvcreate), then vg_write, which writes new metadata to
every mda on info->mdas, will write the new metadata to the
mda that had the old version. If successful, this will resolve
the old metadata problem (without needing to run a metadata
repair command.)
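The different handling of bad and old mdas described above can be modeled with the following sketch. The `mda_entry` array stands in for a PV's `info->mdas` list, and the `in_use` flag stands in for membership of that list. `scan_filter()` and `write_all_in_use()` are hypothetical illustrations, not lvmcache or vg_write code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum mda_state { MDA_GOOD, MDA_OLD, MDA_BAD };

/* Hypothetical stand-in for one entry of a PV's mda list. */
struct mda_entry {
	enum mda_state state;
	int in_use;		/* still on the list used by vg_read/vg_write */
	uint32_t seqno;
};

/* Label scan model: bad mdas leave the in-use list; old ones stay on it. */
static void scan_filter(struct mda_entry *m, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++)
		m[i].in_use = (m[i].state != MDA_BAD);
}

/*
 * vg_write model: write new_seqno to every in-use mda.  Old copies are
 * overwritten and become good; bad copies are skipped and stay bad.
 * Returns the number of mdas written.
 */
static size_t write_all_in_use(struct mda_entry *m, size_t n, uint32_t new_seqno)
{
	size_t i, written = 0;

	assert(m != NULL || n == 0);

	for (i = 0; i < n; i++) {
		if (!m[i].in_use)
			continue;
		m[i].seqno = new_seqno;
		m[i].state = MDA_GOOD;
		written++;
	}
	return written;
}
```

After a write with a new seqno, the old copy is brought up to date while the bad copy is left untouched for the repair command.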
Outdated PVs
------------
An outdated PV is a PV that has an old copy of VG metadata
that shows it is a member of the VG, but the latest copy of
the VG metadata does not include this PV. This happens if
the PV is disconnected, vgreduce --removemissing is run to
remove the PV from the VG, then the PV is reconnected.
In this case, the outdated PV needs to have its outdated metadata
removed and the PV used flag needs to be cleared. This repair
is done by the repair command described below (vgck --updatemetadata). It is also done
if vgremove is run on the VG.
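A sketch of the outdated-PV condition. It assumes PVs are identified by UUID strings and that `pv_claims_vg` records whether the PV's own, older metadata lists it as a VG member. `pv_is_outdated()` is a hypothetical helper, not an lvm function.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical check: pv_claims_vg is nonzero when the PV's own (older)
 * metadata lists it as a member of the VG; latest_pvs holds the PV UUIDs
 * listed in the latest copy of the VG metadata.
 */
static int pv_is_outdated(const char *pv_uuid, int pv_claims_vg,
			  const char *const *latest_pvs, size_t n)
{
	size_t i;

	assert(pv_uuid != NULL);

	if (!pv_claims_vg)
		return 0;		/* does not claim membership */

	for (i = 0; i < n; i++)
		if (!strcmp(latest_pvs[i], pv_uuid))
			return 0;	/* still a member of the latest VG */

	return 1;			/* claims membership, but was removed */
}
```

In the reconnect scenario above, the PV's stale metadata still claims membership but its UUID is no longer in the latest PV list, so it is reported as outdated until it is cleared.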
MISSING PVs
-----------
When a device is missing, most commands will refuse to modify
the VG. This is the simple case. More complicated is when
a command is allowed to modify the VG while it is missing a
device.
When a VG is written while a device is missing for one of its PVs,
the VG metadata is written to disk with the MISSING flag on the PV
with the missing device. When the VG is next used, it is treated
as if the PV with the MISSING flag still has a missing device, even
if that device has reappeared.
If all LVs that were using a PV with the MISSING flag are removed
or repaired so that the MISSING PV is no longer used, then the
next time the VG metadata is written, the MISSING flag will be
dropped.
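One reading of the MISSING flag rules above, expressed as a hypothetical decision function evaluated for each PV when the VG metadata is written. The function and parameter names are assumptions for illustration; this is not the vg_write implementation.

```c
#include <assert.h>

/*
 * Hypothetical model of whether the written metadata should carry the
 * MISSING flag for one PV.
 *   device_missing:   the PV's device is not present now
 *   has_missing_flag: the PV carried MISSING in the metadata that was read
 *   used_by_lvs:      number of LV segments still allocated on the PV
 */
static int write_missing_flag(int device_missing, int has_missing_flag,
			      int used_by_lvs)
{
	assert(used_by_lvs >= 0);

	if (device_missing)
		return 1;		/* written while missing: set the flag */
	if (has_missing_flag && used_by_lvs == 0)
		return 0;		/* no LV uses the PV any more: drop it */
	return has_missing_flag;	/* reappeared but still used: keep it */
}
```

A PV whose device has reappeared keeps the flag while any LV still uses it, which is why vgreduce --removemissing or vgextend --restoremissing may be needed.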
Alternative methods of clearing the MISSING flag are:
vgreduce --removemissing will remove PVs with missing devices,
or PVs with the MISSING flag where the device has reappeared.
vgextend --restoremissing will clear the MISSING flag on PVs
where the device has reappeared, allowing the VG to be used
normally. This must be done with caution since the reappeared
device may have old data that is inconsistent with data on other PVs.
Bad mda repair
--------------
The new command:
vgck --updatemetadata VG
first uses vg_write to repair the basic issues mentioned above
(old metadata, outdated PVs, pv_header flags, MISSING_PV flags).
It will also go further and repair
bad metadata:
. text metadata that has a bad checksum
. text metadata that is not parsable
. corrupt mda_header checksum and version fields
(To keep a clean diff, #if 0 is added around functions that
are replaced by new code. These disabled functions are
removed by the following commit.)
2019-05-24 12:04:37 -05:00
	/*
	 * Rescan the devices that are associated with this vg in lvmcache.
	 * This repeats what was done by the command's initial label scan,
	 * but only the devices associated with this VG.
	 *
	 * The lvmcache info about these devs is from the initial label scan
	 * performed by the command before the vg lock was held. Now the VG
	 * lock is held, so we rescan all the info from the devs in case
	 * something changed between the initial scan and now that the lock
	 * is held.
	 *
	 * Some commands (e.g. reporting) are fine reporting data read by
	 * the label scan. It doesn't matter if the devs changed between
	 * the label scan and here; we can report what was seen in the
	 * scan, even though it is the old state, since we will not be
	 * making any modifications. If the VG was being modified during
	 * the scan, and caused us to see inconsistent metadata on the
	 * different PVs in the VG, then we do want to rescan the devs
	 * here to get a consistent view of the VG. Note that we don't
	 * know if the scan found all the PVs in the VG at this point.
	 * We don't know that until vg_read looks at the list of PVs in
	 * the metadata and compares that to the devices found by the scan.
	 *
	 * It's possible that a change made to the VG during scan was
	 * adding or removing a PV from the VG. In this case, the list
	 * of devices associated with the VG in lvmcache would change
	 * due to the rescan.
	 *
	 * The devs in the VG may be persistently inconsistent due to some
	 * previous problem. In this case, rescanning the labels here will
	 * find the same inconsistency. The VG repair (mistakenly done by
	 * vg_read below) is supposed to fix that.
	 *
2019-11-26 11:56:51 -06:00
	 * If the VG was not modified between the time we scanned the PVs
	 * and now, when we hold the lock, then we don't need to rescan.
	 * We can read the mda_header, and look at the text offset/checksum,
	 * and if the current text offset/checksum matches what was seen during
	 * label scan, we know that metadata is unchanged and doesn't need
	 * to be rescanned. For reporting/display commands (CAN_USE_ONE_SCAN/
	 * can_use_one_scan), we check that the text offset/checksum are unchanged
	 * in just one mda before deciding to skip rescanning. For other commands,
	 * we check that they are unchanged in all mdas. This added checking is
	 * probably unnecessary; all commands could likely just check a single mda.
2019-05-24 12:04:37 -05:00
*/
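The skip-rescan check described in the comment above can be sketched as follows. `struct mda_text_ref` and `rescan_needed()` are hypothetical and do not reproduce `lvmcache_scan_mismatch()` or `scan_text_mismatch()`. The sketch assumes the offset/checksum pairs recorded by the label scan and those read again under the lock are available in the same mda order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record of where an mda's metadata text was seen. */
struct mda_text_ref {
	uint64_t offset;	/* text offset from the mda_header */
	uint32_t checksum;	/* text checksum from the mda_header */
};

/*
 * Return nonzero if a rescan is needed: some checked mda's current
 * offset/checksum differs from what the label scan recorded.
 * With one_scan set (reporting/display commands), only the first mda
 * is compared; otherwise every mda is compared.
 */
static int rescan_needed(const struct mda_text_ref *scanned,
			 const struct mda_text_ref *now, size_t n,
			 int one_scan)
{
	size_t i, limit;

	assert((scanned != NULL && now != NULL) || n == 0);

	if (n == 0)
		return 1;	/* assumption: nothing to compare, so rescan */

	limit = one_scan ? 1 : n;
	for (i = 0; i < limit; i++)
		if (scanned[i].offset != now[i].offset ||
		    scanned[i].checksum != now[i].checksum)
			return 1;
	return 0;
}
```

For example, if the first mda is unchanged but the second has a new offset, the one-mda check skips the rescan while the all-mdas check requests it; this is the trade-off the comment describes.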
bcache: use indirection table for fd
Add a "device index" (di) for each device, and use this
in the bcache api to the rest of lvm. This replaces the
file descriptor (fd) in the api. The rest of lvm uses
new functions bcache_set_fd(), bcache_clear_fd(), and
bcache_change_fd() to control which fd bcache uses for
io to a particular device.
. lvm opens a dev and gets an fd.
fd = open(dev);
. lvm passes fd to the bcache layer and gets a di
to use in the bcache api for the dev.
di = bcache_set_fd(fd);
. lvm uses bcache functions, passing di for the dev.
bcache_write_bytes(di, ...), etc.
. bcache translates di to fd to do io.
. lvm closes the device and clears the di/fd bcache state.
close(fd);
bcache_clear_fd(di);
In the bcache layer, a di-to-fd translation table
(int *_fd_table) is added. When bcache needs to
perform io on a di, it uses _fd_table[di].
In the following commit, lvm will make use of the new
bcache_change_fd() function to change the fd that
bcache uses for the dev, without dropping cached blocks.
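The di-to-fd indirection can be sketched as below. `demo_set_fd()`, `demo_clear_fd()`, `demo_change_fd()`, `demo_fd_for()` and the fixed table size are hypothetical stand-ins modeled on the description of `bcache_set_fd()`, `bcache_clear_fd()` and `bcache_change_fd()`; they are not the bcache source.

```c
#include <assert.h>

#define DEMO_MAX_DEVS 16	/* hypothetical fixed table size */

/* di -> fd translation table and slot usage (static: zero-initialized). */
static int _demo_fd_table[DEMO_MAX_DEVS];
static char _demo_di_used[DEMO_MAX_DEVS];

/* Register an open fd and return a device index (di), or -1 if full. */
static int demo_set_fd(int fd)
{
	int di;

	assert(fd >= 0);
	for (di = 0; di < DEMO_MAX_DEVS; di++) {
		if (!_demo_di_used[di]) {
			_demo_di_used[di] = 1;
			_demo_fd_table[di] = fd;
			return di;
		}
	}
	return -1;
}

/* Forget the fd for di once the caller has closed it. */
static void demo_clear_fd(int di)
{
	assert(di >= 0 && di < DEMO_MAX_DEVS);
	_demo_di_used[di] = 0;
	_demo_fd_table[di] = -1;
}

/*
 * Point an existing di at a different fd.  A cache keyed by di rather
 * than fd would not need to drop its blocks.  Returns 0 if di is unused.
 */
static int demo_change_fd(int di, int fd)
{
	if (di < 0 || di >= DEMO_MAX_DEVS || !_demo_di_used[di])
		return 0;
	_demo_fd_table[di] = fd;
	return 1;
}

/* The translation an io path would do: di -> fd, or -1 if unused. */
static int demo_fd_for(int di)
{
	if (di < 0 || di >= DEMO_MAX_DEVS || !_demo_di_used[di])
		return -1;
	return _demo_fd_table[di];
}
```

A caller would follow the sequence in the message: call `demo_set_fd(fd)` after open(), pass the returned di to io helpers, and call `demo_clear_fd(di)` after close().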
2020-09-17 09:40:18 -05:00
lvresize: add new options and defaults for fs handling
The new option "--fs String" for lvresize/lvreduce/lvextend
controls the handling of file systems before/after resizing
the LV. --resizefs is the same as --fs resize.
The new option "--fsmode String" can be used to control
mounting and unmounting of the fs during resizing.
Possible --fs values:
checksize
Only applies to reducing size; does nothing for extend.
Check the fs size and reduce the LV if the fs is not using
the affected space, i.e. the fs does not need to be shrunk.
Fail the command without reducing the fs or LV if the fs is
using the affected space.
resize
Resize the fs using the fs-specific resize command.
This may include mounting, unmounting, or running fsck.
See --fsmode to control mounting behavior, and --nofsck to
disable fsck.
resize_fsadm
Use the old method of calling fsadm to handle the fs
(deprecated.) Warning: this option does not prevent lvreduce
from destroying file systems that are unmounted (or mounted
if prompts are skipped.)
ignore
Resize the LV without checking for or handling a file system.
Warning: using ignore when reducing the LV size may destroy the
file system.
Possible --fsmode values:
manage
Mount or unmount the fs as needed to resize the fs,
and attempt to restore the original mount state at the end.
nochange
Do not mount or unmount the fs. If mounting or unmounting
is required to resize the fs, then do not resize the fs or
the LV and fail the command.
offline
Unmount the fs if it is mounted, and resize the fs while it
is unmounted. If mounting is required to resize the fs,
then do not resize the fs or the LV and fail the command.
Notes on lvreduce:
When no --fs or --resizefs option is specified:
. lvextend default behavior is fs ignore.
. lvreduce default behavior is fs checksize
(includes activating the LV.)
With the exception of --fs resize_fsadm|ignore, lvreduce requires
the recent libblkid fields FSLASTBLOCK and FSBLOCKSIZE.
FSLASTBLOCK*FSBLOCKSIZE is the last byte used by the fs on the LV,
which determines if reducing the fs is necessary.
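The checksize test described above reduces to one comparison. The sketch below assumes FSLASTBLOCK and FSBLOCKSIZE have already been read from libblkid as integers and that the fs occupies the space up to FSLASTBLOCK*FSBLOCKSIZE. `checksize_allows_reduce()` is a hypothetical helper, not lvresize code.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Sketch of the "--fs checksize" decision for lvreduce.  The fs uses
 * the space up to fslastblock * fsblocksize, so the LV can be reduced
 * without touching the fs only if the new size still covers that.
 * Returns 1 if the reduce may proceed, 0 if the command should fail.
 */
static int checksize_allows_reduce(uint64_t new_lv_bytes,
				   uint64_t fslastblock, uint64_t fsblocksize)
{
	uint64_t fs_end;

	assert(fsblocksize > 0);

	/* Treat an overflowing product as "fs uses the affected space". */
	if (fslastblock > UINT64_MAX / fsblocksize)
		return 0;

	fs_end = fslastblock * fsblocksize;
	return new_lv_bytes >= fs_end;
}
```

For an fs with 262144 blocks of 4096 bytes (1 GiB), reducing the LV to 1 GiB or more passes the check, while reducing it to 512 MiB fails the command without reducing the fs or the LV.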
2022-06-14 15:20:21 -05:00
	if (lvmcache_scan_mismatch(cmd, vgname, vgid) || scan_text_mismatch(cmd, vgname, vgid)) {
2019-06-11 16:17:24 -05:00
		log_debug_metadata("Rescanning devices for %s %s", vgname, writing ? "rw" : "");
		if (writing)
			lvmcache_label_rescan_vg_rw(cmd, vgname, vgid);
		else
			lvmcache_label_rescan_vg(cmd, vgname, vgid);
2019-05-24 12:04:37 -05:00
}
	/*
	 * A "format instance" is an abstraction for a VG location,
	 * i.e. where a VG's metadata exists on disk.
	 *
	 * An fic (format_instance_ctx) is a temporary struct used
	 * to create an fid (format_instance). The fid hangs around
	 * and is used to create a 'vg' to which it is connected (vg->fid).
	 *
	 * The 'fic' describes a VG in terms of fmt/name/id.
	 *
	 * The 'fid' describes a VG in more detail than the fic,
	 * holding information about where to find the VG metadata.
	 *
	 * The 'vg' describes the VG in the most detail, representing
	 * all the VG metadata.
	 *
	 * The fic and fid are set up by create_instance() to describe
	 * the VG location. This happens before the VG metadata is
	 * assembled into the more familiar struct volume_group "vg".
	 *
	 * The fid has one main purpose: to keep track of the metadata
	 * locations for a given VG. It does this by putting 'mda'
	 * structs on fid->metadata_areas_in_use, which specify where
	 * metadata is located on disk. It gets this information
	 * (metadata locations for a specific VG) from the command's
	 * initial label scan. The info is passed indirectly via
	 * lvmcache info/vginfo structs, which are created by the
	 * label scan and then copied into fid by create_instance().
	 *
	 * FIXME: just use the vginfo/info->mdas lists directly instead
	 * of copying them into the fid list.
	 */
	fic.type = FMT_INSTANCE_MDAS | FMT_INSTANCE_AUX_MDAS;
	fic.context.vg_ref.vg_name = vgname;
	fic.context.vg_ref.vg_id = vgid;

	/*
	 * Sets up the metadata areas that we need to read below.
	 * For each info in vginfo->infos, for each mda in info->mdas
	 * (found during label_scan), copy the mda to fid->metadata_areas_in_use.
	 */
	if (!(fid = fmt->ops->create_instance(fmt, &fic))) {
		log_error("Failed to create format instance");
		return NULL;
	}
	/*
	 * We use the fid globally here, so prevent the release_vg
	 * call from destroying the fid; we may want to reuse it!
	 */
	fid->ref_count++;
	/*
	 * label_scan found PVs for this VG and set up lvmcache to describe the
	 * VG/PVs that we use here to read the VG. It created 'vginfo' for the
	 * VG, and created an 'info' attached to vginfo for each PV. It also
	 * added a metadata_area struct to info->mdas for each metadata area it
	 * found on the PV. The info->mdas structs are copied to
	 * fid->metadata_areas_in_use by create_instance above, and here we
	 * read VG metadata from each of those mdas.
	 */
2021-09-28 14:58:03 -05:00
	dm_list_iterate_items_safe(mda, mda2, &fid->metadata_areas_in_use) {
improve reading and repairing vg metadata
The fact that vg repair is implemented as a part of vg read
has led to a messy and complicated implementation of vg_read,
and limited and uncontrolled repair capability. This splits
read and repair apart.
Summary
-------
- take the various kinds of repair out of vg_read
- vg_read no longer writes anything
- vg_read now simply reads and returns vg metadata
- vg_read ignores bad or old copies of metadata
- vg_read proceeds with a single good copy of metadata
- improve error checks and handling when reading
- keep track of bad (corrupt) copies of metadata in lvmcache
- keep track of old (seqno) copies of metadata in lvmcache
- keep track of outdated PVs in lvmcache
- vg_write will do basic repairs
- new command vgck --updatemetadata will do all repairs
Details
-------
- In scan, do not delete dev from lvmcache if reading/processing fails;
the dev is still present, and removing it makes it look like the dev
is not there. Records are now kept about the problems with each PV
so they can be fixed/repaired in the appropriate places.
- In scan, record a bad mda on failure, and delete the mda from
the mda in-use list so that it will not be used by vg_read or vg_write,
only by repair.
- In scan, succeed if any good mda on a device is found, instead of
failing if any is bad. The bad/old copies of metadata should not
interfere with normal usage while good copies can be used.
- In scan, add a record of old mdas in lvmcache for later, do not repair
them while reading, and do not let them prevent us from finding and
using a good copy of metadata from elsewhere. One result is that
"inconsistent metadata" is no longer a read error, but instead a
record in lvmcache that can be addressed separately from the read.
- Treat a dev with no good mdas like a dev with no mdas, which is an
existing case we already handle.
- Don't use a fake vg "handle" for returning an error from vg_read,
or the vg_read_error function for getting that error number;
just return null if the vg cannot be read or used, and an error_flags
arg with flags set for the specific kind of error (which can be used
later for determining the kind of repair.)
- Saving an original copy of the vg metadata, for purposes of reverting
a write, is now done explicitly in vg_read instead of being hidden in
the vg_make_handle function.
- When a vg is not accessible due to "access restrictions" but is
otherwise fine, return the vg through the new error_vg arg so that
process_each_pv can skip the PVs in the VG while processing.
(This is a temporary accommodation for the way process_each_pv
tracks which devs have been looked at, and can be dropped later
when the dev tracking in process_each_pv is changed.)
- vg_read does not try to fix or recover a vg, but now just reads the
metadata, checks access restrictions and returns it.
(Checking access restrictions might be better done outside of vg_read,
but this is a later improvement.)
- _vg_read now simply makes one attempt to read metadata from
each mda, and uses the most recent copy to return to the caller
in the form of a 'vg' struct.
(bad mdas were excluded during the scan and are not retried)
(old mdas were not excluded during scan and are retried here)
- vg_read uses _vg_read to get the latest copy of metadata from mdas,
and then makes various checks against it to produce warnings,
and to check if VG access is allowed (access restrictions include:
writable, foreign, shared, clustered, missing pvs).
- Things that were previously silently/automatically written by vg_read
that are now done by vg_write, based on the records made in lvmcache
during the scan and read:
. clearing the missing flag
. updating old copies of metadata
. clearing outdated pvs
. updating pv header flags
- Bad/corrupt metadata are now repaired; they were not before.
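The read behaviour in the list above (one read attempt per mda, failed reads skipped, the most recent copy returned) can be illustrated with a small standalone C sketch. It is not the LVM implementation: struct mda_copy, its fields and pick_latest_copy() are hypothetical, and the sketch assumes the seqno alone decides which copy is newest.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical summary of one metadata copy read from one mda. */
struct mda_copy {
	int read_ok;		/* nonzero if the text was read and parsed */
	unsigned int seqno;	/* sequence number from the parsed metadata */
};

/*
 * Return the index of the copy to use (the readable copy with the
 * highest seqno), or -1 if no copy could be read. Copies that failed
 * to read are skipped instead of turning into a VG read error.
 * *old_count receives the number of readable copies whose seqno is
 * lower than the chosen one (the "old metadata" case).
 */
int pick_latest_copy(const struct mda_copy *copies, size_t n, size_t *old_count)
{
	int best = -1;
	size_t i;

	assert(old_count != NULL);
	*old_count = 0;

	for (i = 0; i < n; i++) {
		if (!copies[i].read_ok)
			continue;
		if (best < 0 || copies[i].seqno > copies[best].seqno)
			best = (int)i;
	}

	if (best < 0)
		return -1;

	/* Count the readable but older copies; they are ignored, not repaired. */
	for (i = 0; i < n; i++)
		if (copies[i].read_ok && copies[i].seqno < copies[best].seqno)
			(*old_count)++;

	return best;
}
```

In this model an unreadable copy never blocks the result, and older copies are only counted, mirroring the idea that they are recorded for later repair rather than fixed during the read.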
Test changes
------------
- A read command no longer writes the VG to repair it, so add a write
command to do a repair.
(inconsistent-metadata, unlost-pv)
- When a missing PV is removed from a VG, and then the device is
enabled again, vgck --updatemetadata is needed to clear the
outdated PV before it can be used again; this was not required before.
(lvconvert-repair-policy, lvconvert-repair-raid, lvconvert-repair,
mirror-vgreduce-removemissing, pv-ext-flags, unlost-pv)
Reading bad/old metadata
------------------------
- "bad metadata": the mda_header or metadata text has invalid fields
or can't be parsed by lvm. This is a form of corruption that would
not be caused by known failure scenarios. A checksum error is
typically included among the errors reported.
- "old metadata": a valid copy of the metadata that has a smaller seqno
than other copies of the metadata. This can happen if the device
failed, or io failed, or lvm failed while committing new metadata
to all the metadata areas. Old metadata on a PV that has been
removed from the VG is the "outdated" case below.
When a VG has some PVs with bad/old metadata, lvm can simply ignore
the bad/old copies, and use a good copy. This is why there are
multiple copies of the metadata -- so it's available even when some
of the copies cannot be used. The bad/old copies do not have to be
repaired before the VG can be used (the repair can happen later.)
A PV with no good copies of the metadata simply falls back to being
treated like a PV with no mdas; a common and harmless configuration.
When bad/old metadata exists, lvm warns the user about it, and
suggests repairing it using a new metadata repair command.
Bad metadata in particular is something that users will want to
investigate and repair themselves, since it should not happen and
may indicate some other problem that needs to be fixed.
PVs with bad/old metadata are not the same as missing devices.
Missing devices will block various kinds of VG modification or
activation, but bad/old metadata will not.
Previously, lvm would attempt to repair bad/old metadata whenever
it was read. This was unnecessary since lvm does not require every
copy of the metadata to be used. It would also hide potential
problems that should be investigated by the user. It was also
dangerous in cases where the VG was on shared storage. The user
is now allowed to investigate potential problems and decide how
and when to repair them.
Repairing bad/old metadata
--------------------------
When label scan sees bad metadata in an mda, that mda is removed
from the lvmcache info->mdas list. This means that vg_read will
skip it, and not attempt to read/process it again. If it was
the only in-use mda on a PV, that PV is treated like a PV with
no mdas. It also means that vg_write will skip the bad mda,
and not attempt to write new metadata to it. The only way to
repair bad metadata is with the metadata repair command.
When label scan sees old metadata in an mda, that mda is kept
in the lvmcache info->mdas list. This means that vg_read will
read/process it again, and likely see the same mismatch with
the other copies of the metadata. Like the label_scan, the
vg_read will simply ignore the old copy of the metadata and
use the latest copy. If the command is modifying the vg
(e.g. lvcreate), then vg_write, which writes new metadata to
every mda on info->mdas, will write the new metadata to the
mda that had the old version. If successful, this will resolve
the old metadata problem (without needing to run a metadata
repair command.)
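The asymmetry between bad and old mdas on the write path can be modelled as below. This is a simplified, hypothetical model (struct mda_slot, write_all_in_use()): it only tracks the seqno and whether the mda is still on the in-use list, and ignores the actual on-disk write.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of an mda as seen by the write path. */
struct mda_slot {
	int in_use;		/* still on info->mdas (bad mdas were removed by the scan) */
	unsigned int seqno;	/* seqno of the metadata currently on disk */
};

/*
 * New metadata with new_seqno goes to every mda that is still in use.
 * An old mda therefore catches up, while a bad mda (in_use == 0) is
 * skipped and keeps its stale contents until the repair command handles
 * it. Returns the number of mdas written.
 */
size_t write_all_in_use(struct mda_slot *mdas, size_t n, unsigned int new_seqno)
{
	size_t i, written = 0;

	assert(mdas != NULL || n == 0);

	for (i = 0; i < n; i++) {
		if (!mdas[i].in_use)
			continue;
		mdas[i].seqno = new_seqno;
		written++;
	}
	return written;
}
```

With one current, one old and one bad mda, a write updates the first two and leaves the bad one untouched, which is why only the repair command can fix bad metadata.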
Outdated PVs
------------
An outdated PV is a PV that has an old copy of VG metadata
that shows it is a member of the VG, but the latest copy of
the VG metadata does not include this PV. This happens if
the PV is disconnected, vgreduce --removemissing is run to
remove the PV from the VG, then the PV is reconnected.
In this case, the outdated PV needs to have its outdated metadata
removed and its PV used flag cleared. This repair is done by
the new repair command described below. It is also done
if vgremove is run on the VG.
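The outdated PV condition reduces to a membership test, sketched here with hypothetical names: pv_is_outdated() is not an LVM function, and plain strings stand in for real PV ids.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical check for the "outdated PV" case: the PV's own (old)
 * metadata claims membership in the VG, but the latest VG metadata no
 * longer lists the PV's id.
 */
int pv_is_outdated(const char *pv_id, int old_metadata_claims_vg,
		   const char *const *latest_pv_ids, size_t n_latest)
{
	size_t i;

	assert(pv_id != NULL);

	if (!old_metadata_claims_vg)
		return 0;

	for (i = 0; i < n_latest; i++)
		if (strcmp(latest_pv_ids[i], pv_id) == 0)
			return 0;	/* still a member of the latest VG */

	return 1;
}
```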
MISSING PVs
-----------
When a device is missing, most commands will refuse to modify
the VG. This is the simple case. More complicated is when
a command is allowed to modify the VG while it is missing a
device.
When a VG is written while a device is missing for one of its PVs,
the VG metadata is written to disk with the MISSING flag on the PV
with the missing device. When the VG is next used, it is treated
as if the PV with the MISSING flag still has a missing device, even
if that device has reappeared.
If all LVs that were using a PV with the MISSING flag are removed
or repaired so that the MISSING PV is no longer used, then the
next time the VG metadata is written, the MISSING flag will be
dropped.
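The rule for dropping the MISSING flag on the next write can be sketched as follows, assuming a hypothetical flat list of segment-to-PV references (struct seg_ref) in place of LVM's real LV segment structures; keep_missing_flag() is likewise an illustrative name.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical: one LV segment's reference to the PV it is placed on. */
struct seg_ref {
	int pv_index;
};

/*
 * Decide whether the MISSING flag on PV pv_index survives the next
 * metadata write: it stays as long as any LV segment still uses that
 * PV, even if the device has reappeared.
 */
int keep_missing_flag(int pv_has_missing_flag, int pv_index,
		      const struct seg_ref *segs, size_t n_segs)
{
	size_t i;

	assert(segs != NULL || n_segs == 0);

	if (!pv_has_missing_flag)
		return 0;

	for (i = 0; i < n_segs; i++)
		if (segs[i].pv_index == pv_index)
			return 1;

	return 0;	/* no LV uses the PV any more: drop the flag */
}
```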
Alternative methods of clearing the MISSING flag are:
vgreduce --removemissing will remove PVs with missing devices,
or PVs with the MISSING flag where the device has reappeared.
vgextend --restoremissing will clear the MISSING flag on PVs
where the device has reappeared, allowing the VG to be used
normally. This must be done with caution since the reappeared
device may have old data that is inconsistent with data on other PVs.
Bad mda repair
--------------
The new command:
vgck --updatemetadata VG
first uses vg_write to repair old metadata, and other basic
issues mentioned above (old metadata, outdated PVs, pv_header
flags, MISSING_PV flags). It will also go further and repair
bad metadata:
. text metadata that has a bad checksum
. text metadata that is not parsable
. corrupt mda_header checksum and version fields
(To keep a clean diff, #if 0 is added around functions that
are replaced by new code. These disabled functions are
removed by the following commit.)
2019-05-24 12:04:37 -05:00
mda_dev = mda_get_device(mda);
/* I don't think this can happen */
if (!mda_dev) {
log_warn("Ignoring metadata for VG %s from missing dev.", vgname);
continue;
}
use_previous_vg = 0;
if (use_precommitted) {
log_debug_metadata("Reading VG %s precommit metadata from %s %llu",
vgname, dev_name(mda_dev), (unsigned long long)mda->header_start);
2020-01-28 10:33:15 -06:00
vg = mda->ops->vg_read_precommit(cmd, fid, vgname, mda, &vg_fmtdata, &use_previous_vg);
2019-05-24 12:04:37 -05:00
if (!vg && !use_previous_vg) {
log_warn("WARNING: Reading VG %s precommit on %s failed.", vgname, dev_name(mda_dev));
vg_fmtdata = NULL;
continue;
}
} else {
log_debug_metadata("Reading VG %s metadata from %s %llu",
vgname, dev_name(mda_dev), (unsigned long long)mda->header_start);
2020-01-28 10:33:15 -06:00
vg = mda->ops->vg_read(cmd, fid, vgname, mda, &vg_fmtdata, &use_previous_vg);
2019-05-24 12:04:37 -05:00
if (!vg && !use_previous_vg) {
log_warn("WARNING: Reading VG %s on %s failed.", vgname, dev_name(mda_dev));
vg_fmtdata = NULL;
continue;
}
}
if (!vg)
continue;
2021-03-16 14:16:36 +01:00
if (!vg_ret) {
improve reading and repairing vg metadata
The fact that vg repair is implemented as a part of vg read
has led to a messy and complicated implementation of vg_read,
and limited and uncontrolled repair capability. This splits
read and repair apart.
Summary
-------
- take all kinds of various repairs out of vg_read
- vg_read no longer writes anything
- vg_read now simply reads and returns vg metadata
- vg_read ignores bad or old copies of metadata
- vg_read proceeds with a single good copy of metadata
- improve error checks and handling when reading
- keep track of bad (corrupt) copies of metadata in lvmcache
- keep track of old (seqno) copies of metadata in lvmcache
- keep track of outdated PVs in lvmcache
- vg_write will do basic repairs
- new command vgck --updatemetdata will do all repairs
Details
-------
- In scan, do not delete dev from lvmcache if reading/processing fails;
the dev is still present, and removing it makes it look like the dev
is not there. Records are now kept about the problems with each PV
so they be fixed/repaired in the appropriate places.
- In scan, record a bad mda on failure, and delete the mda from
mda in use list so it will not be used by vg_read or vg_write,
only by repair.
- In scan, succeed if any good mda on a device is found, instead of
failing if any is bad. The bad/old copies of metadata should not
interfere with normal usage while good copies can be used.
- In scan, add a record of old mdas in lvmcache for later, do not repair
them while reading, and do not let them prevent us from finding and
using a good copy of metadata from elsewhere. One result is that
"inconsistent metadata" is no longer a read error, but instead a
record in lvmcache that can be addressed separate from the read.
- Treat a dev with no good mdas like a dev with no mdas, which is an
existing case we already handle.
- Don't use a fake vg "handle" for returning an error from vg_read,
or the vg_read_error function for getting that error number;
just return null if the vg cannot be read or used, and an error_flags
arg with flags set for the specific kind of error (which can be used
later for determining the kind of repair.)
- Saving an original copy of the vg metadata, for purposes of reverting
a write, is now done explicitly in vg_read instead of being hidden in
the vg_make_handle function.
- When a vg is not accessible due to "access restrictions" but is
otherwise fine, return the vg through the new error_vg arg so that
process_each_pv can skip the PVs in the VG while processing.
(This is a temporary accommodation for the way process_each_pv
tracks which devs have been looked at, and can be dropped later
when dev tracking in the process_each_pv implementation is changed.)
- vg_read does not try to fix or recover a vg, but now just reads the
metadata, checks access restrictions and returns it.
(Checking access restrictions might be better done outside of vg_read,
but that is left as a later improvement.)
- _vg_read now simply makes one attempt to read metadata from
each mda, and uses the most recent copy to return to the caller
in the form of a 'vg' struct.
(bad mdas were excluded during the scan and are not retried)
(old mdas were not excluded during scan and are retried here)
- vg_read uses _vg_read to get the latest copy of metadata from mdas,
and then makes various checks against it to produce warnings,
and to check if VG access is allowed (access restrictions include:
writable, foreign, shared, clustered, missing pvs).
- Things that were previously silently/automatically written by vg_read
that are now done by vg_write, based on the records made in lvmcache
during the scan and read:
. clearing the missing flag
. updating old copies of metadata
. clearing outdated pvs
. updating pv header flags
- Bad/corrupt metadata are now repaired; they were not before.
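The error-reporting convention described above (return NULL plus error_flags, and hand back an otherwise readable but access-restricted VG through error_vg) can be sketched in C as follows. This is a minimal, hypothetical sketch: read_vg, struct vg and the READ_ERR_* bits are illustrative names, not LVM's actual API, and only two of the access restrictions (foreign, missing PVs) are modelled.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical error bits; LVM's real flag names may differ. */
#define READ_ERR_NOT_FOUND   (1u << 0)
#define READ_ERR_FOREIGN     (1u << 1)
#define READ_ERR_MISSING_PVS (1u << 2)

/* Simplified stand-in for a parsed VG. */
struct vg {
	const char *name;
	int foreign;      /* VG owned by another host */
	int missing_pvs;  /* one or more PV devices not found */
};

/*
 * Sketch of the calling convention: on failure return NULL and describe
 * why through *error_flags. When the VG itself was read fine but access
 * is restricted, it is also handed back through *error_vg so a caller
 * (like process_each_pv) can still see which PVs belong to it.
 */
static struct vg *read_vg(struct vg *on_disk, int allow_missing,
			  uint32_t *error_flags, struct vg **error_vg)
{
	*error_flags = 0;
	*error_vg = NULL;

	if (!on_disk) {
		*error_flags |= READ_ERR_NOT_FOUND;
		return NULL;
	}
	if (on_disk->foreign)
		*error_flags |= READ_ERR_FOREIGN;
	if (on_disk->missing_pvs && !allow_missing)
		*error_flags |= READ_ERR_MISSING_PVS;

	if (*error_flags) {
		*error_vg = on_disk; /* readable, but access-restricted */
		return NULL;
	}
	return on_disk;
}
```

The flags are a bitmask so that several restrictions can be reported at once, and a later repair step can decide what to do from the specific bits rather than from a single error number.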
Test changes
------------
- A read command no longer writes the VG to repair it, so add a write
command to do a repair.
(inconsistent-metadata, unlost-pv)
- When a missing PV is removed from a VG, and then the device is
enabled again, vgck --updatemetadata is now needed to clear the
outdated PV before it can be used again; this was not needed before.
(lvconvert-repair-policy, lvconvert-repair-raid, lvconvert-repair,
mirror-vgreduce-removemissing, pv-ext-flags, unlost-pv)
Reading bad/old metadata
------------------------
- "bad metadata": the mda_header or metadata text has invalid fields
or can't be parsed by lvm. This is a form of corruption that would
not be caused by known failure scenarios. A checksum error is
typically included among the errors reported.
- "old metadata": a valid copy of the metadata that has a smaller seqno
than other copies of the metadata. This can happen if the device
failed, or io failed, or lvm failed while committing new metadata
to all the metadata areas. Old metadata on a PV that has been
removed from the VG is the "outdated" case below.
When a VG has some PVs with bad/old metadata, lvm can simply ignore
the bad/old copies, and use a good copy. This is why there are
multiple copies of the metadata -- so it's available even when some
of the copies cannot be used. The bad/old copies do not have to be
repaired before the VG can be used (the repair can happen later.)
A PV with no good copies of the metadata simply falls back to being
treated like a PV with no mdas; a common and harmless configuration.
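A minimal sketch of the "ignore bad copies, use the newest good copy" rule, assuming each mda copy has already been classified by the scan. The names (struct mda_copy, pick_newest_copy) are hypothetical; the real selection happens inside _vg_read on parsed vg structs. A return value of -1 corresponds to a PV with no good mdas, which is then handled like a PV with no mdas.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical classification of one metadata area (mda) copy. */
enum mda_state { MDA_GOOD, MDA_BAD };

struct mda_copy {
	enum mda_state state; /* MDA_BAD: checksum/parse failure during scan */
	uint32_t seqno;       /* VG metadata sequence number in this copy */
};

/*
 * Return the index of the usable copy with the highest seqno, or -1 if
 * every copy is bad. *old_found is set when a usable copy with a lower
 * seqno than the winner was seen ("old metadata").
 */
static int pick_newest_copy(const struct mda_copy *copies, size_t n,
			    int *old_found)
{
	int best = -1;
	size_t i;

	*old_found = 0;
	for (i = 0; i < n; i++) {
		if (copies[i].state == MDA_BAD)
			continue; /* bad copies are ignored, not repaired */
		if (best < 0) {
			best = (int)i;
		} else if (copies[i].seqno > copies[best].seqno) {
			*old_found = 1; /* previous winner is the old copy */
			best = (int)i;
		} else if (copies[i].seqno < copies[best].seqno) {
			*old_found = 1; /* this copy is old; keep the winner */
		}
	}
	return best;
}
```

Equal seqnos are treated as matching copies, so only a strictly lower seqno marks a copy as old.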
When bad/old metadata exists, lvm warns the user about it, and
suggests repairing it using a new metadata repair command.
Bad metadata in particular is something that users will want to
investigate and repair themselves, since it should not happen and
may indicate some other problem that needs to be fixed.
PVs with bad/old metadata are not the same as missing devices.
Missing devices will block various kinds of VG modification or
activation, but bad/old metadata will not.
Previously, lvm would attempt to repair bad/old metadata whenever
it was read. This was unnecessary since lvm does not require every
copy of the metadata to be used. It would also hide potential
problems that should be investigated by the user. It was also
dangerous in cases where the VG was on shared storage. The user
is now allowed to investigate potential problems and decide how
and when to repair them.
Repairing bad/old metadata
--------------------------
When label scan sees bad metadata in an mda, that mda is removed
from the lvmcache info->mdas list. This means that vg_read will
skip it, and not attempt to read/process it again. If it was
the only in-use mda on a PV, that PV is treated like a PV with
no mdas. It also means that vg_write will skip the bad mda,
and not attempt to write new metadata to it. The only way to
repair bad metadata is with the metadata repair command.
When label scan sees old metadata in an mda, that mda is kept
in the lvmcache info->mdas list. This means that vg_read will
read/process it again, and likely see the same mismatch with
the other copies of the metadata. As in label scan,
vg_read will simply ignore the old copy of the metadata and
use the latest copy. If the command is modifying the vg
(e.g. lvcreate), then vg_write, which writes new metadata to
every mda on info->mdas, will write the new metadata to the
mda that had the old version. If successful, this will resolve
the old metadata problem (without needing to run a metadata
repair command.)
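The different treatment of bad and old mdas can be modelled with a per-mda in_use bit standing in for membership in info->mdas. This is a hypothetical sketch (struct mda, scan_filter and write_all are illustrative names, not LVM functions) showing why an ordinary vg_write repairs old copies but never touches bad ones.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-PV view of metadata areas after label scan. */
struct mda {
	int bad;        /* set by scan: checksum or parse failure */
	int in_use;     /* stands in for being on info->mdas */
	uint32_t seqno; /* seqno of the metadata currently on disk */
};

/* Scan step: bad mdas leave the in-use list; old ones stay on it. */
static void scan_filter(struct mda *m, size_t n)
{
	for (size_t i = 0; i < n; i++)
		m[i].in_use = !m[i].bad;
}

/*
 * Write step: new metadata goes to every in-use mda, so an old copy
 * is overwritten (repaired) as a side effect, while a bad mda is left
 * alone for the explicit repair command. Returns mdas written.
 */
static size_t write_all(struct mda *m, size_t n, uint32_t new_seqno)
{
	size_t written = 0;

	for (size_t i = 0; i < n; i++) {
		if (!m[i].in_use)
			continue;
		m[i].seqno = new_seqno;
		written++;
	}
	return written;
}
```

In this model, an mda with seqno 9 next to one with seqno 10 ends up at the new seqno after any modifying command, while a bad mda keeps its stale contents until repaired.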
Outdated PVs
------------
An outdated PV is a PV that has an old copy of VG metadata
that shows it is a member of the VG, but the latest copy of
the VG metadata does not include this PV. This happens if
the PV is disconnected, vgreduce --removemissing is run to
remove the PV from the VG, then the PV is reconnected.
In this case, the outdated PV needs to have its outdated metadata
removed and the PV used flag needs to be cleared. This repair
will be done by the new repair command, vgck --updatemetadata.
It is also done if vgremove is run on the VG.
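A hypothetical sketch of the outdated-PV test described above: the PV's own (old) metadata lists it as a VG member, but its UUID is absent from the latest copy of the VG metadata. The function name and the use of plain UUID strings are assumptions for illustration.

```c
#include <stddef.h>
#include <string.h>

/*
 * Return 1 if the PV is "outdated": it claims membership in the VG,
 * but no PV with its UUID appears in the latest VG metadata.
 */
static int pv_is_outdated(const char *pv_uuid, int pv_claims_membership,
			  const char *const latest_pv_uuids[], size_t n)
{
	if (!pv_claims_membership)
		return 0; /* not claiming the VG at all */
	for (size_t i = 0; i < n; i++)
		if (strcmp(latest_pv_uuids[i], pv_uuid) == 0)
			return 0; /* still a member in the latest metadata */
	return 1;
}
```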
MISSING PVs
-----------
When a device is missing, most commands will refuse to modify
the VG. This is the simple case. More complicated is when
a command is allowed to modify the VG while it is missing a
device.
When a VG is written while a device is missing for one of its PVs,
the VG metadata is written to disk with the MISSING flag on the PV
with the missing device. When the VG is next used, it is treated
as if the PV with the MISSING flag still has a missing device, even
if that device has reappeared.
If all LVs that were using a PV with the MISSING flag are removed
or repaired so that the MISSING PV is no longer used, then the
next time the VG metadata is written, the MISSING flag will be
dropped.
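The rule for dropping the MISSING flag at write time might look like the following. This is a simplified, hypothetical model: PV_FLAG_MISSING stands in for LVM's MISSING_PV flag, and LV usage is reduced to a list of PV indexes, one per LV segment.

```c
#include <stddef.h>

/* Hypothetical flag bit standing in for MISSING_PV. */
#define PV_FLAG_MISSING 0x1u

/*
 * Keep the MISSING flag on a PV while any LV segment still uses it;
 * drop it once no segment does. seg_pv_index[i] is the index of the
 * PV used by segment i.
 */
static unsigned pv_flags_for_write(unsigned flags, size_t pv_index,
				   const size_t *seg_pv_index, size_t nsegs)
{
	if (!(flags & PV_FLAG_MISSING))
		return flags;
	for (size_t i = 0; i < nsegs; i++)
		if (seg_pv_index[i] == pv_index)
			return flags; /* still used: keep MISSING */
	return flags & ~PV_FLAG_MISSING;
}
```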
Alternative methods of clearing the MISSING flag are:
vgreduce --removemissing will remove PVs with missing devices,
or PVs with the MISSING flag where the device has reappeared.
vgextend --restoremissing will clear the MISSING flag on PVs
where the device has reappeared, allowing the VG to be used
normally. This must be done with caution since the reappeared
device may have old data that is inconsistent with data on other PVs.
Bad mda repair
--------------
The new command:
vgck --updatemetadata VG
first uses vg_write to repair the basic issues mentioned above
(old metadata, outdated PVs, pv_header flags, MISSING_PV flags).
It will also go further and repair bad metadata:
. text metadata that has a bad checksum
. text metadata that is not parsable
. corrupt mda_header checksum and version fields
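Detecting a bad checksum amounts to recomputing a CRC over the stored bytes and comparing it with the stored value. The sketch below uses the standard reflected CRC-32 (polynomial 0xEDB88320) as a stand-in; LVM's own on-disk checksum routine and its initial value are not reproduced here, and metadata_text_is_bad is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Standard bitwise reflected CRC-32 (polynomial 0xEDB88320, initial
 * value and final XOR 0xFFFFFFFF). A stand-in for LVM's own checksum.
 */
static uint32_t crc32_ieee(const unsigned char *buf, size_t len)
{
	uint32_t crc = 0xFFFFFFFFu;

	for (size_t i = 0; i < len; i++) {
		crc ^= buf[i];
		for (int b = 0; b < 8; b++)
			/* XOR in the polynomial only when the low bit is set */
			crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
	}
	return crc ^ 0xFFFFFFFFu;
}

/* A copy whose stored checksum does not match is "bad metadata". */
static int metadata_text_is_bad(const char *text, size_t len,
				uint32_t stored_crc)
{
	return crc32_ieee((const unsigned char *)text, len) != stored_crc;
}
```

As a sanity check, this CRC-32 variant maps the ASCII string "123456789" to 0xCBF43926.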
(To keep a clean diff, #if 0 is added around functions that
are replaced by new code. These disabled functions are
removed by the following commit.)
2019-05-24 12:04:37 -05:00
vg_ret = vg;
dev_ret = mda_dev;
continue;
}
/*
 * Use the newest copy of the metadata found on any mdas.
 * Above, we could check if the scan found an old metadata
 * seqno in this mda and just skip reading it again; then these
 * seqno checks would just be sanity checks.
 */
if (vg->seqno == vg_ret->seqno) {
release_vg(vg);
2019-11-28 15:09:27 +01:00
} else if (vg->seqno > vg_ret->seqno) {
2019-05-24 12:04:37 -05:00
log_warn("WARNING: ignoring metadata seqno %u on %s for seqno %u on %s for VG %s.",
vg_ret->seqno, dev_name(dev_ret),
vg->seqno, dev_name(mda_dev), vg->name);
found_old_metadata = 1;
release_vg(vg_ret);
vg_ret = vg;
dev_ret = mda_dev;
vg_fmtdata = NULL;
2019-11-28 15:09:27 +01:00
} else { /* vg->seqno < vg_ret->seqno */
2019-05-24 12:04:37 -05:00
log_warn("WARNING: ignoring metadata seqno %u on %s for seqno %u on %s for VG %s.",
vg->seqno, dev_name(mda_dev),
vg_ret->seqno, dev_name(dev_ret), vg->name);
found_old_metadata = 1;
release_vg(vg);
vg_fmtdata = NULL;
}
}
2020-06-03 12:38:27 -05:00
if (found_old_metadata) {
2019-11-28 13:19:44 +01:00
log_warn("WARNING: Inconsistent metadata found for VG %s.", vgname);
2020-06-03 12:38:27 -05:00
log_warn("See vgck --updatemetadata to correct inconsistency.");
}
improve reading and repairing vg metadata
The fact that vg repair is implemented as a part of vg read
has led to a messy and complicated implementation of vg_read,
and limited and uncontrolled repair capability. This splits
read and repair apart.
Summary
-------
- take all kinds of various repairs out of vg_read
- vg_read no longer writes anything
- vg_read now simply reads and returns vg metadata
- vg_read ignores bad or old copies of metadata
- vg_read proceeds with a single good copy of metadata
- improve error checks and handling when reading
- keep track of bad (corrupt) copies of metadata in lvmcache
- keep track of old (seqno) copies of metadata in lvmcache
- keep track of outdated PVs in lvmcache
- vg_write will do basic repairs
- new command vgck --updatemetdata will do all repairs
Details
-------
- In scan, do not delete dev from lvmcache if reading/processing fails;
the dev is still present, and removing it makes it look like the dev
is not there. Records are now kept about the problems with each PV
so they be fixed/repaired in the appropriate places.
- In scan, record a bad mda on failure, and delete the mda from
mda in use list so it will not be used by vg_read or vg_write,
only by repair.
- In scan, succeed if any good mda on a device is found, instead of
failing if any is bad. The bad/old copies of metadata should not
interfere with normal usage while good copies can be used.
- In scan, add a record of old mdas in lvmcache for later, do not repair
them while reading, and do not let them prevent us from finding and
using a good copy of metadata from elsewhere. One result is that
"inconsistent metadata" is no longer a read error, but instead a
record in lvmcache that can be addressed separate from the read.
- Treat a dev with no good mdas like a dev with no mdas, which is an
existing case we already handle.
- Don't use a fake vg "handle" for returning an error from vg_read,
or the vg_read_error function for getting that error number;
just return null if the vg cannot be read or used, and an error_flags
arg with flags set for the specific kind of error (which can be used
later for determining the kind of repair.)
- Saving an original copy of the vg metadata, for purposes of reverting
a write, is now done explicitly in vg_read instead of being hidden in
the vg_make_handle function.
- When a vg is not accessible due to "access restrictions" but is
otherwise fine, return the vg through the new error_vg arg so that
process_each_pv can skip the PVs in the VG while processing.
(This is a temporary accomodation for the way process_each_pv
tracks which devs have been looked at, and can be dropped later
when process_each_pv implementation dev tracking is changed.)
- vg_read does not try to fix or recover a vg, but now just reads the
metadata, checks access restrictions and returns it.
(Checking access restrictions might be better done outside of vg_read,
but this is a later improvement.)
- _vg_read now simply makes one attempt to read metadata from
each mda, and uses the most recent copy to return to the caller
in the form of a 'vg' struct.
(bad mdas were excluded during the scan and are not retried)
(old mdas were not excluded during scan and are retried here)
- vg_read uses _vg_read to get the latest copy of metadata from mdas,
and then makes various checks against it to produce warnings,
and to check if VG access is allowed (access restrictions include:
writable, foreign, shared, clustered, missing pvs).
- Things that were previously silently/automatically written by vg_read
that are now done by vg_write, based on the records made in lvmcache
during the scan and read:
. clearing the missing flag
. updating old copies of metadata
. clearing outdated pvs
. updating pv header flags
- Bad/corrupt metadata are now repaired; they were not before.
Test changes
------------
- A read command no longer writes the VG to repair it, so add a write
command to do a repair.
(inconsistent-metadata, unlost-pv)
- When a missing PV is removed from a VG, and then the device is
enabled again, vgck --updatemetadata is needed to clear the
outdated PV before it can be used again, where it wasn't before.
(lvconvert-repair-policy, lvconvert-repair-raid, lvconvert-repair,
mirror-vgreduce-removemissing, pv-ext-flags, unlost-pv)
Reading bad/old metadata
------------------------
- "bad metadata": the mda_header or metadata text has invalid fields
or can't be parsed by lvm. This is a form of corruption that would
not be caused by known failure scenarios. A checksum error is
typically included among the errors reported.
- "old metadata": a valid copy of the metadata that has a smaller seqno
than other copies of the metadata. This can happen if the device
failed, or io failed, or lvm failed while committing new metadata
to all the metadata areas. Old metadata on a PV that has been
removed from the VG is the "outdated" case below.
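The two definitions can be sketched as a classification step. This assumes each copy already carries "header valid" and "text valid" results from the checksum and parse checks. The types and sk_classify() are hypothetical, not LVM code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical states following the "bad" / "old" definitions above. */
enum sk_state { SK_GOOD, SK_OLD, SK_BAD };

struct sk_mda {
	int header_ok;   /* mda_header fields and checksum are valid */
	int text_ok;     /* metadata text checksum is valid and it parses */
	uint32_t seqno;  /* meaningful only when both flags are set */
};

/*
 * Writes one state per copy into out[] and returns the highest seqno
 * among the non-bad copies (0 if every copy is bad).
 */
static uint32_t sk_classify(const struct sk_mda *m, size_t n, enum sk_state *out)
{
	uint32_t max = 0;
	size_t i;

	assert(n == 0 || (m && out));

	for (i = 0; i < n; i++)
		if (m[i].header_ok && m[i].text_ok && m[i].seqno > max)
			max = m[i].seqno;

	for (i = 0; i < n; i++) {
		if (!m[i].header_ok || !m[i].text_ok)
			out[i] = SK_BAD;   /* corruption: left for the repair command */
		else if (m[i].seqno < max)
			out[i] = SK_OLD;   /* valid but stale: fixed by the next write */
		else
			out[i] = SK_GOOD;
	}
	return max;
}
```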
When a VG has some PVs with bad/old metadata, lvm can simply ignore
the bad/old copies, and use a good copy. This is why there are
multiple copies of the metadata -- so it's available even when some
of the copies cannot be used. The bad/old copies do not have to be
repaired before the VG can be used (the repair can happen later.)
A PV with no good copies of the metadata simply falls back to being
treated like a PV with no mdas; a common and harmless configuration.
When bad/old metadata exists, lvm warns the user about it, and
suggests repairing it using a new metadata repair command.
Bad metadata in particular is something that users will want to
investigate and repair themselves, since it should not happen and
may indicate some other problem that needs to be fixed.
PVs with bad/old metadata are not the same as missing devices.
Missing devices will block various kinds of VG modification or
activation, but bad/old metadata will not.
Previously, lvm would attempt to repair bad/old metadata whenever
it was read. This was unnecessary since lvm does not require every
copy of the metadata to be used. It would also hide potential
problems that should be investigated by the user. It was also
dangerous in cases where the VG was on shared storage. The user
is now allowed to investigate potential problems and decide how
and when to repair them.
Repairing bad/old metadata
--------------------------
When label scan sees bad metadata in an mda, that mda is removed
from the lvmcache info->mdas list. This means that vg_read will
skip it, and not attempt to read/process it again. If it was
the only in-use mda on a PV, that PV is treated like a PV with
no mdas. It also means that vg_write will skip the bad mda,
and not attempt to write new metadata to it. The only way to
repair bad metadata is with the metadata repair command.
When label scan sees old metadata in an mda, that mda is kept
in the lvmcache info->mdas list. This means that vg_read will
read/process it again, and likely see the same mismatch with
the other copies of the metadata. Like label_scan,
vg_read will simply ignore the old copy of the metadata and
use the latest copy. If the command is modifying the vg
(e.g. lvcreate), then vg_write, which writes new metadata to
every mda on info->mdas, will write the new metadata to the
mda that had the old version. If successful, this will resolve
the old metadata problem (without needing to run a metadata
repair command.)
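The following minimal sketch shows how the in-use list produces this behavior. It assumes a fixed-size array in place of the lvmcache info->mdas list. sk_info, sk_scan_drop_bad() and sk_vg_write() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SK_MAX_MDAS 8

/* Hypothetical per-PV record, loosely modeled on lvmcache's info->mdas. */
struct sk_info {
	uint32_t seqno[SK_MAX_MDAS];  /* seqno currently on each in-use mda */
	size_t n_in_use;
	size_t n_bad;                 /* bad mdas recorded for the repair command */
};

/* Scan step: drop a bad mda from the in-use array (order is not kept). */
static void sk_scan_drop_bad(struct sk_info *info, size_t idx)
{
	assert(idx < info->n_in_use);
	info->seqno[idx] = info->seqno[--info->n_in_use];
	info->n_bad++;
}

/*
 * Write step: every in-use mda gets the new metadata, so an old copy that
 * stayed on the list is brought up to date as a side effect.  A bad mda
 * is no longer on the list and is not touched.  Returns mdas written.
 */
static size_t sk_vg_write(struct sk_info *info, uint32_t new_seqno)
{
	size_t i;

	for (i = 0; i < info->n_in_use; i++)
		info->seqno[i] = new_seqno;
	return info->n_in_use;
}
```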
Outdated PVs
------------
An outdated PV is a PV that has an old copy of VG metadata
that shows it is a member of the VG, but the latest copy of
the VG metadata does not include this PV. This happens if
the PV is disconnected, vgreduce --removemissing is run to
remove the PV from the VG, then the PV is reconnected.
In this case, the outdated PV needs to have its outdated metadata
removed and its PV "used" flag cleared. This repair is done by
the metadata repair command (vgck --updatemetadata, below). It is
also done if vgremove is run on the VG.
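The outdated test compares what the PV's own metadata claims with what the latest VG metadata lists. The following sketch uses hypothetical string ids. In the code, the real check goes through lvmcache_is_outdated_dev(), and the ids are not plain C strings.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical: does the latest VG metadata list this PV uuid? */
static int sk_vg_has_pv(const char *const *vg_pv_uuids, size_t n, const char *uuid)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (!strcmp(vg_pv_uuids[i], uuid))
			return 1;
	return 0;
}

/*
 * A PV is outdated when its own (older) metadata names this VG but the
 * latest VG metadata no longer lists the PV.
 */
static int sk_pv_is_outdated(const char *pv_vgid, const char *pv_uuid,
			     const char *latest_vgid,
			     const char *const *latest_pv_uuids, size_t n)
{
	assert(pv_vgid && pv_uuid && latest_vgid);
	return !strcmp(pv_vgid, latest_vgid) &&
	       !sk_vg_has_pv(latest_pv_uuids, n, pv_uuid);
}
```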
MISSING PVs
-----------
When a device is missing, most commands will refuse to modify
the VG. This is the simple case. More complicated is when
a command is allowed to modify the VG while it is missing a
device.
When a VG is written while a device is missing for one of its PVs,
the VG metadata is written to disk with the MISSING flag on the PV
with the missing device. When the VG is next used, it is treated
as if the PV with the MISSING flag still has a missing device, even
if that device has reappeared.
If all LVs that were using a PV with the MISSING flag are removed
or repaired so that the MISSING PV is no longer used, then the
next time the VG metadata is written, the MISSING flag will be
dropped.
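This drop rule amounts to asking whether any remaining LV segment still maps to the PV. The following minimal sketch assumes a flat array of segments, each recording a PV index. sk_seg and sk_can_drop_missing() are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model: each LV segment records the index of the PV it uses. */
struct sk_seg {
	size_t pv_index;
};

/*
 * Returns 1 if the MISSING flag on pv_index can be dropped at the next
 * metadata write, i.e. no remaining segment uses that PV.
 */
static int sk_can_drop_missing(const struct sk_seg *segs, size_t n, size_t pv_index)
{
	size_t i;

	assert(n == 0 || segs);
	for (i = 0; i < n; i++)
		if (segs[i].pv_index == pv_index)
			return 0;
	return 1;
}
```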
Alternative methods of clearing the MISSING flag are:
vgreduce --removemissing will remove PVs with missing devices,
or PVs with the MISSING flag where the device has reappeared.
vgextend --restoremissing will clear the MISSING flag on PVs
where the device has reappeared, allowing the VG to be used
normally. This must be done with caution since the reappeared
device may have old data that is inconsistent with data on other PVs.
Bad mda repair
--------------
The new command:
vgck --updatemetadata VG
first uses vg_write to repair the basic issues mentioned above
(old metadata, outdated PVs, pv_header flags, MISSING_PV flags).
It then goes further and repairs
bad metadata:
. text metadata that has a bad checksum
. text metadata that is not parsable
. corrupt mda_header checksum and version fields
(To keep a clean diff, #if 0 is added around functions that
are replaced by new code. These disabled functions are
removed by the following commit.)
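The following sketch illustrates the checksum part of this repair: it detects a bad text copy and rewrites it from a good one. It uses the standard CRC-32 purely as a stand-in, because LVM's own metadata checksum routine is not reproduced here. sk_text_mda, sk_mda_is_bad() and sk_repair_mda() are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Standard reflected CRC-32 (poly 0xEDB88320); a stand-in, not LVM's checksum. */
static uint32_t sk_crc32(const void *buf, size_t len)
{
	const unsigned char *p = buf;
	uint32_t crc = 0xFFFFFFFFu;
	size_t i;
	int bit;

	for (i = 0; i < len; i++) {
		crc ^= p[i];
		for (bit = 0; bit < 8; bit++)
			crc = (crc & 1u) ? (crc >> 1) ^ 0xEDB88320u : crc >> 1;
	}
	return crc ^ 0xFFFFFFFFu;
}

/* Hypothetical in-memory mda: metadata text plus its stored checksum. */
struct sk_text_mda {
	char text[256];
	uint32_t stored_crc;
};

/* A copy is "bad" when the stored checksum does not match the text. */
static int sk_mda_is_bad(const struct sk_text_mda *m)
{
	return sk_crc32(m->text, strlen(m->text)) != m->stored_crc;
}

/*
 * Repair: overwrite the bad copy with the chosen good text and a freshly
 * computed checksum.  Returns 0 if the text does not fit.
 */
static int sk_repair_mda(struct sk_text_mda *m, const char *good_text)
{
	size_t len;

	assert(m && good_text);
	len = strlen(good_text);
	if (len >= sizeof(m->text))
		return 0;
	memcpy(m->text, good_text, len + 1);
	m->stored_crc = sk_crc32(m->text, len);
	return 1;
}
```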
2019-05-24 12:04:37 -05:00
	vg = NULL;
	if (vg_ret)
2021-02-05 16:16:03 -06:00
		set_pv_devices(fid, vg_ret);
2019-05-24 12:04:37 -05:00
	fid->ref_count--;
	if (!vg_ret) {
		_destroy_fid(&fid);
		goto_out;
	}
	/*
	 * Correct the lvmcache representation of the VG using the metadata
	 * that we have chosen above (vg_ret).
	 *
	 * The vginfo/info representation created by label_scan was not
	 * entirely correct since it did not use the full or final metadata.
	 *
	 * In lvmcache, PVs with no mdas were not attached to the vginfo during
	 * label_scan because label_scan didn't know where they should go. Now
	 * that we have the VG metadata we can tell, so use that to attach those
	 * infos to the vginfo.
	 *
	 * Also, outdated PVs that have been removed from the VG were incorrectly
	 * attached to the vginfo during label_scan, and now need to be detached.
	 */
	lvmcache_update_vg_from_read(vg_ret, vg_ret->status & PRECOMMITTED);
	/*
	 * lvmcache_update_vg_from_read() identified outdated mdas that we read
	 * above that are not actually part of the VG. Remove those outdated
	 * mdas from the fid's list of mdas.
	 */
	dm_list_iterate_items_safe(mda, mda2, &fid->metadata_areas_in_use) {
		mda_dev = mda_get_device(mda);
		if (lvmcache_is_outdated_dev(cmd, vg_ret->name, (const char *)&vg_ret->id, mda_dev)) {
			log_debug_metadata("vg_read %s ignore mda for outdated dev %s",
					   vg_ret->name, dev_name(mda_dev));
			dm_list_del(&mda->list);
		}
	}
out:
	return vg_ret;
}
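The loop above deletes entries while walking the list, which is why the _safe iterator is used: the next pointer must be saved before the current entry is unlinked. The following self-contained sketch shows that pattern with a minimal circular list. The names are hypothetical and this is not the libdevmapper dm_list API.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal circular doubly linked list, in the spirit of dm_list. */
struct sk_list {
	struct sk_list *prev, *next;
};

static void sk_list_init(struct sk_list *h)
{
	h->prev = h->next = h;
}

/* Append e at the tail of the list headed by h. */
static void sk_list_add(struct sk_list *h, struct sk_list *e)
{
	e->next = h;
	e->prev = h->prev;
	h->prev->next = e;
	h->prev = e;
}

static void sk_list_del(struct sk_list *e)
{
	e->prev->next = e->next;
	e->next->prev = e->prev;
}

struct sk_mda_entry {
	struct sk_list list;  /* first member, so a cast recovers the entry */
	int outdated_dev;     /* stands in for the lvmcache_is_outdated_dev() result */
};

/*
 * Unlink outdated entries; 'next' is saved before 'pos' may be deleted,
 * which is what the _safe iterator provides.  Returns the count removed.
 */
static int sk_drop_outdated(struct sk_list *head)
{
	struct sk_list *pos, *next;
	int removed = 0;

	for (pos = head->next, next = pos->next; pos != head;
	     pos = next, next = pos->next) {
		struct sk_mda_entry *m = (struct sk_mda_entry *) pos;

		if (m->outdated_dev) {
			sk_list_del(pos);
			removed++;
		}
	}
	return removed;
}
```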
struct volume_group *vg_read(struct cmd_context *cmd, const char *vg_name, const char *vgid,
2019-10-21 12:32:11 +02:00
			     uint32_t vg_read_flags, uint32_t lockd_state,
2019-05-24 12:04:37 -05:00
			     uint32_t *error_flags, struct volume_group **error_vg)
{
	char uuidstr[64] __attribute__((aligned(8)));
	struct volume_group *vg = NULL;
	struct lv_list *lvl;
	struct pv_list *pvl;
	int missing_pv_dev = 0;
	int missing_pv_flag = 0;
	uint32_t failure = 0;
2020-01-28 11:47:37 -06:00
	int original_vgid_set = vgid ? 1 : 0;
2019-10-21 12:32:11 +02:00
	int writing = (vg_read_flags & READ_FOR_UPDATE);
	int activating = (vg_read_flags & READ_FOR_ACTIVATE);
improve reading and repairing vg metadata
The fact that vg repair is implemented as a part of vg read
has led to a messy and complicated implementation of vg_read,
and limited and uncontrolled repair capability. This splits
read and repair apart.
Summary
-------
- take all kinds of various repairs out of vg_read
- vg_read no longer writes anything
- vg_read now simply reads and returns vg metadata
- vg_read ignores bad or old copies of metadata
- vg_read proceeds with a single good copy of metadata
- improve error checks and handling when reading
- keep track of bad (corrupt) copies of metadata in lvmcache
- keep track of old (seqno) copies of metadata in lvmcache
- keep track of outdated PVs in lvmcache
- vg_write will do basic repairs
- new command vgck --updatemetdata will do all repairs
Details
-------
- In scan, do not delete dev from lvmcache if reading/processing fails;
the dev is still present, and removing it makes it look like the dev
is not there. Records are now kept about the problems with each PV
so they be fixed/repaired in the appropriate places.
- In scan, record a bad mda on failure, and delete the mda from
mda in use list so it will not be used by vg_read or vg_write,
only by repair.
- In scan, succeed if any good mda on a device is found, instead of
failing if any is bad. The bad/old copies of metadata should not
interfere with normal usage while good copies can be used.
- In scan, add a record of old mdas in lvmcache for later, do not repair
them while reading, and do not let them prevent us from finding and
using a good copy of metadata from elsewhere. One result is that
"inconsistent metadata" is no longer a read error, but instead a
record in lvmcache that can be addressed separate from the read.
- Treat a dev with no good mdas like a dev with no mdas, which is an
existing case we already handle.
- Don't use a fake vg "handle" for returning an error from vg_read,
or the vg_read_error function for getting that error number;
just return null if the vg cannot be read or used, and an error_flags
arg with flags set for the specific kind of error (which can be used
later for determining the kind of repair.)
- Saving an original copy of the vg metadata, for purposes of reverting
a write, is now done explicitly in vg_read instead of being hidden in
the vg_make_handle function.
- When a vg is not accessible due to "access restrictions" but is
otherwise fine, return the vg through the new error_vg arg so that
process_each_pv can skip the PVs in the VG while processing.
(This is a temporary accomodation for the way process_each_pv
tracks which devs have been looked at, and can be dropped later
when process_each_pv implementation dev tracking is changed.)
- vg_read does not try to fix or recover a vg, but now just reads the
metadata, checks access restrictions and returns it.
(Checking access restrictions might be better done outside of vg_read,
but this is a later improvement.)
- _vg_read now simply makes one attempt to read metadata from
each mda, and uses the most recent copy to return to the caller
in the form of a 'vg' struct.
(bad mdas were excluded during the scan and are not retried)
(old mdas were not excluded during scan and are retried here)
- vg_read uses _vg_read to get the latest copy of metadata from mdas,
and then makes various checks against it to produce warnings,
and to check if VG access is allowed (access restrictions include:
writable, foreign, shared, clustered, missing pvs).
- Things that were previously silently/automatically written by vg_read
that are now done by vg_write, based on the records made in lvmcache
during the scan and read:
. clearing the missing flag
. updating old copies of metadata
. clearing outdated pvs
. updating pv header flags
- Bad/corrupt metadata are now repaired; they were not before.
Test changes
------------
- A read command no longer writes the VG to repair it, so add a write
command to do a repair.
(inconsistent-metadata, unlost-pv)
- When a missing PV is removed from a VG, and then the device is
enabled again, vgck --updatemetadata is needed to clear the
outdated PV before it can be used again, where it wasn't before.
(lvconvert-repair-policy, lvconvert-repair-raid, lvconvert-repair,
mirror-vgreduce-removemissing, pv-ext-flags, unlost-pv)
Reading bad/old metadata
------------------------
- "bad metadata": the mda_header or metadata text has invalid fields
or can't be parsed by lvm. This is a form of corruption that would
not be caused by known failure scenarios. A checksum error is
typically included among the errors reported.
- "old metadata": a valid copy of the metadata that has a smaller seqno
than other copies of the metadata. This can happen if the device
failed, or io failed, or lvm failed while commiting new metadata
to all the metadata areas. Old metadata on a PV that has been
removed from the VG is the "outdated" case below.
When a VG has some PVs with bad/old metadata, lvm can simply ignore
the bad/old copies, and use a good copy. This is why there are
multiple copies of the metadata -- so it's available even when some
of the copies cannot be used. The bad/old copies do not have to be
repaired before the VG can be used (the repair can happen later.)
A PV with no good copies of the metadata simply falls back to being
treated like a PV with no mdas; a common and harmless configuration.
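A minimal sketch of this selection rule, assuming each copy has already been parsed and checksummed: bad copies are skipped, the copy with the highest seqno is used, and good copies with a lower seqno are marked old rather than treated as a read error. The types and the function name are hypothetical, not LVM2's.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum sketch_mda_state { SKETCH_MDA_GOOD, SKETCH_MDA_BAD, SKETCH_MDA_OLD };

struct sketch_mda_copy {
	bool parse_ok;                /* header/text parsed and checksums matched */
	uint32_t seqno;               /* VG metadata sequence number, if parse_ok */
	enum sketch_mda_state state;  /* output: filled in by sketch_pick_latest() */
};

/*
 * Classify every copy and return the index of the copy to use,
 * or -1 if no copy is usable.
 */
static int sketch_pick_latest(struct sketch_mda_copy *copies, size_t n)
{
	int best = -1;

	for (size_t i = 0; i < n; i++) {
		if (!copies[i].parse_ok) {
			copies[i].state = SKETCH_MDA_BAD;  /* ignored, not fatal */
			continue;
		}
		copies[i].state = SKETCH_MDA_GOOD;
		if (best < 0 || copies[i].seqno > copies[best].seqno)
			best = (int)i;
	}

	/* Good copies that lag behind the newest seqno are recorded as old. */
	for (size_t i = 0; best >= 0 && i < n; i++)
		if (copies[i].state == SKETCH_MDA_GOOD &&
		    copies[i].seqno < copies[best].seqno)
			copies[i].state = SKETCH_MDA_OLD;

	return best;
}
```

A return of -1 for a single PV corresponds to the "PV with no good copies" case, which the text treats like a PV with no mdas.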
When bad/old metadata exists, lvm warns the user about it, and
suggests repairing it using a new metadata repair command.
Bad metadata in particular is something that users will want to
investigate and repair themselves, since it should not happen and
may indicate some other problem that needs to be fixed.
PVs with bad/old metadata are not the same as missing devices.
Missing devices will block various kinds of VG modification or
activation, but bad/old metadata will not.
Previously, lvm would attempt to repair bad/old metadata whenever
it was read. This was unnecessary since lvm does not require every
copy of the metadata to be used. It would also hide potential
problems that should be investigated by the user. It was also
dangerous in cases where the VG was on shared storage. The user
is now allowed to investigate potential problems and decide how
and when to repair them.
Repairing bad/old metadata
--------------------------
When label scan sees bad metadata in an mda, that mda is removed
from the lvmcache info->mdas list. This means that vg_read will
skip it, and not attempt to read/process it again. If it was
the only in-use mda on a PV, that PV is treated like a PV with
no mdas. It also means that vg_write will skip the bad mda,
and not attempt to write new metadata to it. The only way to
repair bad metadata is with the metadata repair command.
When label scan sees old metadata in an mda, that mda is kept
in the lvmcache info->mdas list. This means that vg_read will
read/process it again, and likely see the same mismatch with
the other copies of the metadata. Like label_scan,
vg_read will simply ignore the old copy of the metadata and
use the latest copy. If the command is modifying the vg
(e.g. lvcreate), then vg_write, which writes new metadata to
every mda on info->mdas, will write the new metadata to the
mda that had the old version. If successful, this will resolve
the old metadata problem (without needing to run a metadata
repair command.)
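The different treatment of bad and old mdas can be sketched with a singly linked in-use list, a hypothetical stand-in for lvmcache's info->mdas: bad mdas are unlinked and parked on a separate list for the repair command, while old mdas stay linked and are therefore overwritten by the next write. The struct and function names are illustrative only.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct sketch_mda {
	struct sketch_mda *next;
	bool bad;         /* set by the scan when header/text checks failed */
	uint32_t seqno;   /* seqno of the metadata currently on disk */
};

/* Unlink bad mdas from the in-use list and return them as a separate list. */
static struct sketch_mda *sketch_drop_bad(struct sketch_mda **in_use)
{
	struct sketch_mda *bad_list = NULL;
	struct sketch_mda **pp = in_use;

	while (*pp) {
		struct sketch_mda *m = *pp;

		if (m->bad) {
			*pp = m->next;       /* no longer read or written */
			m->next = bad_list;  /* kept only for the repair command */
			bad_list = m;
		} else {
			pp = &m->next;
		}
	}
	return bad_list;
}

/* Simulated write: every in-use mda gets the new seqno, so old copies catch up. */
static size_t sketch_vg_write(struct sketch_mda *in_use, uint32_t new_seqno)
{
	size_t written = 0;

	for (struct sketch_mda *m = in_use; m; m = m->next) {
		m->seqno = new_seqno;
		written++;
	}
	return written;
}
```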
Outdated PVs
------------
An outdated PV is a PV that has an old copy of VG metadata
that shows it is a member of the VG, but the latest copy of
the VG metadata does not include this PV. This happens if
the PV is disconnected, vgreduce --removemissing is run to
remove the PV from the VG, then the PV is reconnected.
In this case, the outdated PV needs to have its outdated metadata
removed and the PV used flag needs to be cleared. This repair
will be done by the subsequent repair command. It is also done
if vgremove is run on the VG.
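The outdated-PV condition reduces to a membership test. In this sketch PVs are identified by a UUID string and VGs by name; both representations, and the function names, are assumptions made for illustration rather than LVM2's internal interfaces.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical view of one scanned PV. */
struct sketch_pv_seen {
	const char *pv_uuid;
	const char *vg_in_own_mda;  /* VG named in the PV's own, possibly old, metadata */
};

static bool sketch_uuid_listed(const char *uuid, const char *const *list, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (strcmp(uuid, list[i]) == 0)
			return true;
	return false;
}

/*
 * Outdated: the PV's own metadata claims membership in vg_name,
 * but the latest VG metadata (latest_pvs) no longer lists the PV.
 */
static bool sketch_pv_is_outdated(const struct sketch_pv_seen *pv,
				  const char *vg_name,
				  const char *const *latest_pvs, size_t n)
{
	if (!pv->vg_in_own_mda || strcmp(pv->vg_in_own_mda, vg_name) != 0)
		return false;
	return !sketch_uuid_listed(pv->pv_uuid, latest_pvs, n);
}
```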
MISSING PVs
-----------
When a device is missing, most commands will refuse to modify
the VG. This is the simple case. More complicated is when
a command is allowed to modify the VG while it is missing a
device.
When a VG is written while a device is missing for one of its PVs,
the VG metadata is written to disk with the MISSING flag on the PV
with the missing device. When the VG is next used, it is treated
as if the PV with the MISSING flag still has a missing device, even
if that device has reappeared.
If all LVs that were using a PV with the MISSING flag are removed
or repaired so that the MISSING PV is no longer used, then the
next time the VG metadata is written, the MISSING flag will be
dropped.
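The rule for dropping the MISSING flag on write could look like the sketch below. The flag value, the struct fields, and the extra requirement that the device be present again are assumptions; the last one reflects the scenario described above, where a PV keeps the flag even after its device reappears.

```c
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_PV_MISSING 0x1u  /* hypothetical stand-in for the MISSING PV flag */

struct sketch_pv {
	unsigned flags;
	bool device_present;     /* device seen again by the current scan */
	size_t segments_in_use;  /* LV segments still allocated on this PV */
};

/*
 * Run just before the VG metadata is written. A PV keeps MISSING while
 * any LV still uses it; once unused (and, by assumption, present again),
 * the flag is dropped. Returns how many PVs were cleared.
 */
static size_t sketch_drop_unused_missing(struct sketch_pv *pvs, size_t n)
{
	size_t cleared = 0;

	for (size_t i = 0; i < n; i++) {
		struct sketch_pv *pv = &pvs[i];

		if (!(pv->flags & SKETCH_PV_MISSING))
			continue;
		if (pv->segments_in_use || !pv->device_present)
			continue;
		pv->flags &= ~SKETCH_PV_MISSING;
		cleared++;
	}
	return cleared;
}
```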
Alternative methods of clearing the MISSING flag are:
vgreduce --removemissing will remove PVs with missing devices,
or PVs with the MISSING flag where the device has reappeared.
vgextend --restoremissing will clear the MISSING flag on PVs
where the device has reappeared, allowing the VG to be used
normally. This must be done with caution since the reappeared
device may have old data that is inconsistent with data on other PVs.
Bad mda repair
--------------
The new command:
vgck --updatemetadata VG
first uses vg_write to repair the basic issues mentioned above
(old metadata, outdated PVs, pv_header
flags, MISSING_PV flags). It will also go further and repair
bad metadata:
. text metadata that has a bad checksum
. text metadata that is not parsable
. corrupt mda_header checksum and version fields
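Detecting a corrupt mda_header amounts to recomputing its checksum and comparing the version field. The sketch below uses a generic bitwise CRC-32 (reflected polynomial 0xEDB88320) with a caller-supplied seed and no final XOR, over a made-up header layout; LVM2 has its own checksum routine, seed and on-disk layout, which are not reproduced here.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Bitwise reflected CRC-32 (polynomial 0xEDB88320); caller supplies the seed, no final XOR. */
static uint32_t sketch_crc32(uint32_t crc, const uint8_t *buf, size_t len)
{
	for (size_t i = 0; i < len; i++) {
		crc ^= buf[i];
		for (int bit = 0; bit < 8; bit++)
			crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
	}
	return crc;
}

/* Hypothetical header: the checksum covers every byte that follows it. */
struct sketch_mda_header {
	uint32_t checksum;
	uint32_t version;
	uint8_t body[24];
};

static uint32_t sketch_header_crc(const struct sketch_mda_header *h, uint32_t seed)
{
	const uint8_t *start = (const uint8_t *)h + sizeof(h->checksum);

	return sketch_crc32(seed, start, sizeof(*h) - sizeof(h->checksum));
}

/* A header counts as bad if the stored checksum or the version does not match. */
static bool sketch_header_ok(const struct sketch_mda_header *h, uint32_t seed,
			     uint32_t expected_version)
{
	return h->checksum == sketch_header_crc(h, seed) &&
	       h->version == expected_version;
}
```

With seed 0xFFFFFFFF and a final XOR of 0xFFFFFFFF applied by the caller, sketch_crc32 yields the common CRC-32 check value 0xCBF43926 for the ASCII string "123456789".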
(To keep a clean diff, #if 0 is added around functions that
are replaced by new code. These commented functions are
removed by the following commit.)
2019-05-24 12:04:37 -05:00
2021-10-01 13:43:46 +02:00
*error_flags = SUCCESS;
if (error_vg)
*error_vg = NULL;
2019-05-24 12:04:37 -05:00
if (is_orphan_vg(vg_name)) {
2019-11-28 13:19:44 +01:00
log_very_verbose("Reading orphan VG %s.", vg_name);
2019-05-24 12:04:37 -05:00
vg = vg_read_orphans(cmd, vg_name);
return vg;
}
if (!validate_name(vg_name)) {
log_error("Volume group name \"%s\" has invalid characters.", vg_name);
2019-08-29 11:35:46 -05:00
failure |= FAILED_NOTFOUND;
2019-11-28 13:19:44 +01:00
goto bad;
improve reading and repairing vg metadata
The fact that vg repair is implemented as a part of vg read
has led to a messy and complicated implementation of vg_read,
and limited and uncontrolled repair capability. This splits
read and repair apart.
Summary
-------
- take all kinds of various repairs out of vg_read
- vg_read no longer writes anything
- vg_read now simply reads and returns vg metadata
- vg_read ignores bad or old copies of metadata
- vg_read proceeds with a single good copy of metadata
- improve error checks and handling when reading
- keep track of bad (corrupt) copies of metadata in lvmcache
- keep track of old (seqno) copies of metadata in lvmcache
- keep track of outdated PVs in lvmcache
- vg_write will do basic repairs
- new command vgck --updatemetdata will do all repairs
Details
-------
- In scan, do not delete dev from lvmcache if reading/processing fails;
the dev is still present, and removing it makes it look like the dev
is not there. Records are now kept about the problems with each PV
so they be fixed/repaired in the appropriate places.
- In scan, record a bad mda on failure, and delete the mda from
mda in use list so it will not be used by vg_read or vg_write,
only by repair.
- In scan, succeed if any good mda on a device is found, instead of
failing if any is bad. The bad/old copies of metadata should not
interfere with normal usage while good copies can be used.
- In scan, add a record of old mdas in lvmcache for later, do not repair
them while reading, and do not let them prevent us from finding and
using a good copy of metadata from elsewhere. One result is that
"inconsistent metadata" is no longer a read error, but instead a
record in lvmcache that can be addressed separate from the read.
- Treat a dev with no good mdas like a dev with no mdas, which is an
existing case we already handle.
- Don't use a fake vg "handle" for returning an error from vg_read,
or the vg_read_error function for getting that error number;
just return null if the vg cannot be read or used, and an error_flags
arg with flags set for the specific kind of error (which can be used
later for determining the kind of repair.)
- Saving an original copy of the vg metadata, for purposes of reverting
a write, is now done explicitly in vg_read instead of being hidden in
the vg_make_handle function.
- When a vg is not accessible due to "access restrictions" but is
otherwise fine, return the vg through the new error_vg arg so that
process_each_pv can skip the PVs in the VG while processing.
(This is a temporary accomodation for the way process_each_pv
tracks which devs have been looked at, and can be dropped later
when process_each_pv implementation dev tracking is changed.)
- vg_read does not try to fix or recover a vg, but now just reads the
metadata, checks access restrictions and returns it.
(Checking access restrictions might be better done outside of vg_read,
but this is a later improvement.)
- _vg_read now simply makes one attempt to read metadata from
each mda, and uses the most recent copy to return to the caller
in the form of a 'vg' struct.
(bad mdas were excluded during the scan and are not retried)
(old mdas were not excluded during scan and are retried here)
- vg_read uses _vg_read to get the latest copy of metadata from mdas,
and then makes various checks against it to produce warnings,
and to check if VG access is allowed (access restrictions include:
writable, foreign, shared, clustered, missing pvs).
- Things that were previously silently/automatically written by vg_read
that are now done by vg_write, based on the records made in lvmcache
during the scan and read:
. clearing the missing flag
. updating old copies of metadata
. clearing outdated pvs
. updating pv header flags
- Bad/corrupt metadata are now repaired; they were not before.
Test changes
------------
- A read command no longer writes the VG to repair it, so add a write
command to do a repair.
(inconsistent-metadata, unlost-pv)
- When a missing PV is removed from a VG, and then the device is
enabled again, vgck --updatemetadata is needed to clear the
outdated PV before it can be used again, where it wasn't before.
(lvconvert-repair-policy, lvconvert-repair-raid, lvconvert-repair,
mirror-vgreduce-removemissing, pv-ext-flags, unlost-pv)
Reading bad/old metadata
------------------------
- "bad metadata": the mda_header or metadata text has invalid fields
or can't be parsed by lvm. This is a form of corruption that would
not be caused by known failure scenarios. A checksum error is
typically included among the errors reported.
- "old metadata": a valid copy of the metadata that has a smaller seqno
than other copies of the metadata. This can happen if the device
failed, or io failed, or lvm failed while commiting new metadata
to all the metadata areas. Old metadata on a PV that has been
removed from the VG is the "outdated" case below.
When a VG has some PVs with bad/old metadata, lvm can simply ignore
the bad/old copies, and use a good copy. This is why there are
multiple copies of the metadata -- so it's available even when some
of the copies cannot be used. The bad/old copies do not have to be
repaired before the VG can be used (the repair can happen later.)
A PV with no good copies of the metadata simply falls back to being
treated like a PV with no mdas; a common and harmless configuration.
When bad/old metadata exists, lvm warns the user about it, and
suggests repairing it using a new metadata repair command.
Bad metadata in particular is something that users will want to
investigate and repair themselves, since it should not happen and
may indicate some other problem that needs to be fixed.
PVs with bad/old metadata are not the same as missing devices.
Missing devices will block various kinds of VG modification or
activation, but bad/old metadata will not.
Previously, lvm would attempt to repair bad/old metadata whenever
it was read. This was unnecessary since lvm does not require every
copy of the metadata to be used. It would also hide potential
problems that should be investigated by the user. It was also
dangerous in cases where the VG was on shared storage. The user
is now allowed to investigate potential problems and decide how
and when to repair them.
Repairing bad/old metadata
--------------------------
When label scan sees bad metadata in an mda, that mda is removed
from the lvmcache info->mdas list. This means that vg_read will
skip it, and not attempt to read/process it again. If it was
the only in-use mda on a PV, that PV is treated like a PV with
no mdas. It also means that vg_write will also skip the bad mda,
and not attempt to write new metadata to it. The only way to
repair bad metadata is with the metadata repair command.
When label scan sees old metadata in an mda, that mda is kept
in the lvmcache info->mdas list. This means that vg_read will
read/process it again, and likely see the same mismatch with
the other copies of the metadata. Like the label_scan, the
vg_read will simply ignore the old copy of the metadata and
use the latest copy. If the command is modifying the vg
(e.g. lvcreate), then vg_write, which writes new metadata to
every mda on info->mdas, will write the new metadata to the
mda that had the old version. If successful, this will resolve
the old metadata problem (without needing to run a metadata
repair command.)
Outdated PVs
------------
An outdated PV is a PV that has an old copy of VG metadata
that shows it is a member of the VG, but the latest copy of
the VG metadata does not include this PV. This happens if
the PV is disconnected, vgreduce --removemissing is run to
remove the PV from the VG, then the PV is reconnected.
In this case, the outdated PV needs have its outdated metadata
removed and the PV used flag needs to be cleared. This repair
will be done by the subsequent repair command. It is also done
if vgremove is run on the VG.
MISSING PVs
-----------
When a device is missing, most commands will refuse to modify
the VG. This is the simple case. More complicated is when
a command is allowed to modify the VG while it is missing a
device.
When a VG is written while a device is missing for one of its PVs,
the VG metadata is written to disk with the MISSING flag on the PV
with the missing device. When the VG is next used, it is treated
as if the PV with the MISSING flag still has a missing device, even
if that device has reappeared.
If all LVs that were using a PV with the MISSING flag are removed
or repaired so that the MISSING PV is no longer used, then the
next time the VG metadata is written, the MISSING flag will be
dropped.
Alternative methods of clearing the MISSING flag are:
vgreduce --removemissing will remove PVs with missing devices,
or PVs with the MISSING flag where the device has reappeared.
vgextend --restoremissing will clear the MISSING flag on PVs
where the device has reappeared, allowing the VG to be used
normally. This must be done with caution since the reappeared
device may have old data that is inconsistent with data on other PVs.
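The MISSING flag rules above can be sketched as a small state model. This is an illustration only: struct pv_model and the three functions are hypothetical, and the sketch assumes the flag is only dropped on write once the device has reappeared (while the device is still absent, a write records the flag again). vgreduce --removemissing is not modeled.

```c
#include <stdbool.h>

/* Hypothetical per-PV state; not LVM's struct physical_volume. */
struct pv_model {
	bool dev_present;   /* device is currently visible */
	bool missing_flag;  /* MISSING flag stored in the VG metadata */
	unsigned lv_users;  /* number of LVs still using this PV */
};

/* The PV is treated as missing if its device is absent, or if the
 * metadata still carries the MISSING flag even though the device is back. */
static bool pv_treated_missing(const struct pv_model *pv)
{
	return !pv->dev_present || pv->missing_flag;
}

/* Flag update applied when the VG metadata is written. */
static void pv_update_on_vg_write(struct pv_model *pv)
{
	if (!pv->dev_present)
		pv->missing_flag = true;   /* record the missing device */
	else if (pv->missing_flag && pv->lv_users == 0)
		pv->missing_flag = false;  /* no LV uses it any more: drop flag */
}

/* Models vgextend --restoremissing: clear the flag on a reappeared
 * device, regardless of LV users (hence the caution about stale data). */
static bool pv_restore_missing(struct pv_model *pv)
{
	if (!pv->dev_present)
		return false;
	pv->missing_flag = false;
	return true;
}
```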
Bad mda repair
--------------
The new command:
vgck --updatemetadata VG
first uses vg_write to repair old metadata, and other basic
issues mentioned above (old metadata, outdated PVs, pv_header
flags, MISSING_PV flags). It will also go further and repair
bad metadata:
. text metadata that has a bad checksum
. text metadata that is not parsable
. corrupt mda_header checksum and version fields
(To keep a clean diff, #if 0 is added around functions that
are replaced by new code. These commented functions are
removed by the following commit.)
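The split between ordinary writes and the repair command can be summarized as a decision function. This is a hedged sketch: enum mda_state and mda_gets_rewritten() are hypothetical names, and how each kind of corruption is detected (checksums, parsing, header version checks) is not modeled. Good and old mdas receive new metadata on any write; bad mdas, having been removed from the in-use list by the scan, are only rewritten by vgck --updatemetadata.

```c
#include <stdbool.h>

/* Hypothetical classification of one mda's state found during scan. */
enum mda_state {
	MDA_GOOD,
	MDA_OLD,           /* valid text, smaller seqno than the latest copy */
	MDA_BAD_CHECKSUM,  /* text metadata that has a bad checksum */
	MDA_BAD_PARSE,     /* text metadata that is not parsable */
	MDA_BAD_HEADER     /* corrupt mda_header checksum or version fields */
};

static bool mda_is_bad(enum mda_state s)
{
	return s == MDA_BAD_CHECKSUM || s == MDA_BAD_PARSE || s == MDA_BAD_HEADER;
}

/* Does this mda get new metadata written to it?  An ordinary write
 * (e.g. from lvcreate) skips bad mdas; the repair command also
 * rewrites them. */
static bool mda_gets_rewritten(enum mda_state s, bool repair_command)
{
	if (mda_is_bad(s))
		return repair_command;
	return true;  /* good and old mdas are rewritten by any write */
}
```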
2019-05-24 12:04:37 -05:00
}
2019-06-11 16:17:24 -05:00
/*
 * When a command is reading the VG with the intention of eventually
 * writing it, it passes the READ_FOR_UPDATE flag. This causes vg_read
 * to acquire an exclusive VG lock, and causes vg_read to do some more
 * checks, e.g. that the VG is writable and not exported. It also
 * means that when the label scan is repeated on the VG's devices, the
 * VG's PVs can be reopened read-write when rescanning in anticipation
 * of needing to write to them.
 */
2019-10-21 12:32:11 +02:00
if (!(vg_read_flags & READ_WITHOUT_LOCK) &&
2019-08-26 17:07:18 -05:00
    !lock_vol(cmd, vg_name, (writing || activating) ? LCK_VG_WRITE : LCK_VG_READ, NULL)) {
2019-11-28 13:19:44 +01:00
	log_error("Can't get lock for %s.", vg_name);
2019-05-24 12:04:37 -05:00
	failure |= FAILED_LOCKING;
2019-11-28 13:19:44 +01:00
	goto bad;
2019-05-24 12:04:37 -05:00
}
2024-08-29 23:05:41 +02:00
/* I believe this is unused, the name is always set. */
2020-01-28 11:47:37 -06:00
if (!vg_name && !(vg_name = lvmcache_vgname_from_vgid(cmd->mem, vgid))) {
	unlock_vg(cmd, NULL, vg_name);
	log_error("VG name not found for vgid %s", vgid);
	failure |= FAILED_NOTFOUND;
2021-10-01 13:44:17 +02:00
	goto bad;
2020-01-28 11:47:37 -06:00
}
/*
 * If the command is processing all vgs, process_each will get a list of vgname+vgid
2024-08-29 23:05:41 +02:00
 * pairs, and then call vg_read() for each vgname+vgid. In this case we know
2020-01-28 11:47:37 -06:00
 * which VG to read even if there are duplicate names, and we don't fail.
 *
 * If the user has requested one VG by name, process_each passes only the vgname
 * to vg_read(), and we look up the vgid from lvmcache. lvmcache finds duplicate
 * vgnames, doesn't know which is intended, returns a NULL vgid, and we fail.
 */
if (!vgid)
	vgid = lvmcache_vgid_from_vgname(cmd, vg_name);
if (!vgid) {
	unlock_vg(cmd, NULL, vg_name);
	/* Some callers don't care if the VG doesn't exist and don't want an error message. */
	if (!(vg_read_flags & READ_OK_NOTFOUND))
		log_error("Volume group \"%s\" not found", vg_name);
	failure |= FAILED_NOTFOUND;
2021-10-01 13:44:17 +02:00
	goto bad;
2020-01-28 11:47:37 -06:00
}
/*
 * vgchange -ay (no vgname arg) will activate multiple local VGs with the same
 * name, but if the vgs have the same lv name, activating those lvs will fail.
 */
if (activating && original_vgid_set && lvmcache_has_duplicate_local_vgname(vgid, vg_name))
	log_warn("WARNING: activating multiple VGs with the same name is dangerous and may fail.");
2019-06-11 16:17:24 -05:00
if (!(vg = _vg_read(cmd, vg_name, vgid, 0, writing))) {
2020-01-28 11:47:37 -06:00
	unlock_vg(cmd, NULL, vg_name);
improve reading and repairing vg metadata
The fact that vg repair is implemented as a part of vg read
has led to a messy and complicated implementation of vg_read,
and limited and uncontrolled repair capability. This splits
read and repair apart.
Summary
-------
- take all kinds of various repairs out of vg_read
- vg_read no longer writes anything
- vg_read now simply reads and returns vg metadata
- vg_read ignores bad or old copies of metadata
- vg_read proceeds with a single good copy of metadata
- improve error checks and handling when reading
- keep track of bad (corrupt) copies of metadata in lvmcache
- keep track of old (seqno) copies of metadata in lvmcache
- keep track of outdated PVs in lvmcache
- vg_write will do basic repairs
- new command vgck --updatemetdata will do all repairs
Details
-------
- In scan, do not delete dev from lvmcache if reading/processing fails;
the dev is still present, and removing it makes it look like the dev
is not there. Records are now kept about the problems with each PV
so they be fixed/repaired in the appropriate places.
- In scan, record a bad mda on failure, and delete the mda from
mda in use list so it will not be used by vg_read or vg_write,
only by repair.
- In scan, succeed if any good mda on a device is found, instead of
failing if any is bad. The bad/old copies of metadata should not
interfere with normal usage while good copies can be used.
- In scan, add a record of old mdas in lvmcache for later, do not repair
them while reading, and do not let them prevent us from finding and
using a good copy of metadata from elsewhere. One result is that
"inconsistent metadata" is no longer a read error, but instead a
record in lvmcache that can be addressed separate from the read.
- Treat a dev with no good mdas like a dev with no mdas, which is an
existing case we already handle.
- Don't use a fake vg "handle" for returning an error from vg_read,
or the vg_read_error function for getting that error number;
just return null if the vg cannot be read or used, and an error_flags
arg with flags set for the specific kind of error (which can be used
later for determining the kind of repair.)
- Saving an original copy of the vg metadata, for purposes of reverting
a write, is now done explicitly in vg_read instead of being hidden in
the vg_make_handle function.
- When a vg is not accessible due to "access restrictions" but is
otherwise fine, return the vg through the new error_vg arg so that
process_each_pv can skip the PVs in the VG while processing.
(This is a temporary accomodation for the way process_each_pv
tracks which devs have been looked at, and can be dropped later
when process_each_pv implementation dev tracking is changed.)
- vg_read does not try to fix or recover a vg, but now just reads the
metadata, checks access restrictions and returns it.
(Checking access restrictions might be better done outside of vg_read,
but this is a later improvement.)
- _vg_read now simply makes one attempt to read metadata from
each mda, and uses the most recent copy to return to the caller
in the form of a 'vg' struct.
(bad mdas were excluded during the scan and are not retried)
(old mdas were not excluded during scan and are retried here)
- vg_read uses _vg_read to get the latest copy of metadata from mdas,
and then makes various checks against it to produce warnings,
and to check if VG access is allowed (access restrictions include:
writable, foreign, shared, clustered, missing pvs).
- Things that were previously silently/automatically written by vg_read
that are now done by vg_write, based on the records made in lvmcache
during the scan and read:
. clearing the missing flag
. updating old copies of metadata
. clearing outdated pvs
. updating pv header flags
- Bad/corrupt metadata are now repaired; they were not before.
Test changes
------------
- A read command no longer writes the VG to repair it, so add a write
command to do a repair.
(inconsistent-metadata, unlost-pv)
- When a missing PV is removed from a VG, and then the device is
enabled again, vgck --updatemetadata is needed to clear the
outdated PV before it can be used again, where it wasn't before.
(lvconvert-repair-policy, lvconvert-repair-raid, lvconvert-repair,
mirror-vgreduce-removemissing, pv-ext-flags, unlost-pv)
Reading bad/old metadata
------------------------
- "bad metadata": the mda_header or metadata text has invalid fields
or can't be parsed by lvm. This is a form of corruption that would
not be caused by known failure scenarios. A checksum error is
typically included among the errors reported.
- "old metadata": a valid copy of the metadata that has a smaller seqno
than other copies of the metadata. This can happen if the device
failed, or io failed, or lvm failed while commiting new metadata
to all the metadata areas. Old metadata on a PV that has been
removed from the VG is the "outdated" case below.
When a VG has some PVs with bad/old metadata, lvm can simply ignore
the bad/old copies, and use a good copy. This is why there are
multiple copies of the metadata -- so it's available even when some
of the copies cannot be used. The bad/old copies do not have to be
repaired before the VG can be used (the repair can happen later.)
A PV with no good copies of the metadata simply falls back to being
treated like a PV with no mdas; a common and harmless configuration.
When bad/old metadata exists, lvm warns the user about it, and
suggests repairing it using a new metadata repair command.
Bad metadata in particular is something that users will want to
investigate and repair themselves, since it should not happen and
may indicate some other problem that needs to be fixed.
PVs with bad/old metadata are not the same as missing devices.
Missing devices will block various kinds of VG modification or
activation, but bad/old metadata will not.
Previously, lvm would attempt to repair bad/old metadata whenever
it was read. This was unnecessary since lvm does not require every
copy of the metadata to be used. It would also hide potential
problems that should be investigated by the user. It was also
dangerous in cases where the VG was on shared storage. The user
is now allowed to investigate potential problems and decide how
and when to repair them.
Repairing bad/old metadata
--------------------------
When label scan sees bad metadata in an mda, that mda is removed
from the lvmcache info->mdas list. This means that vg_read will
skip it, and not attempt to read/process it again. If it was
the only in-use mda on a PV, that PV is treated like a PV with
no mdas. It also means that vg_write will also skip the bad mda,
and not attempt to write new metadata to it. The only way to
repair bad metadata is with the metadata repair command.
When label scan sees old metadata in an mda, that mda is kept
in the lvmcache info->mdas list. This means that vg_read will
read/process it again, and likely see the same mismatch with
the other copies of the metadata. Like the label_scan, the
vg_read will simply ignore the old copy of the metadata and
use the latest copy. If the command is modifying the vg
(e.g. lvcreate), then vg_write, which writes new metadata to
every mda on info->mdas, will write the new metadata to the
mda that had the old version. If successful, this will resolve
the old metadata problem (without needing to run a metadata
repair command.)
Outdated PVs
------------
An outdated PV is a PV that has an old copy of VG metadata
that shows it is a member of the VG, but the latest copy of
the VG metadata does not include this PV. This happens if
the PV is disconnected, vgreduce --removemissing is run to
remove the PV from the VG, then the PV is reconnected.
In this case, the outdated PV needs have its outdated metadata
removed and the PV used flag needs to be cleared. This repair
will be done by the subsequent repair command. It is also done
if vgremove is run on the VG.
MISSING PVs
-----------
When a device is missing, most commands will refuse to modify
the VG. This is the simple case. More complicated is when
a command is allowed to modify the VG while it is missing a
device.
When a VG is written while a device is missing for one of it's PVs,
the VG metadata is written to disk with the MISSING flag on the PV
with the missing device. When the VG is next used, it is treated
as if the PV with the MISSING flag still has a missing device, even
if that device has reappeared.
If all LVs that were using a PV with the MISSING flag are removed
or repaired so that the MISSING PV is no longer used, then the
next time the VG metadata is written, the MISSING flag will be
dropped.
Alternative methods of clearing the MISSING flag are:
vgreduce --removemissing will remove PVs with missing devices,
or PVs with the MISSING flag where the device has reappeared.
vgextend --restoremissing will clear the MISSING flag on PVs
where the device has reappeared, allowing the VG to be used
normally. This must be done with caution since the reappeared
device may have old data that is inconsistent with data on other PVs.
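The MISSING flag rules above can be sketched as a small classifier, combined with the reappeared-device handling in vg_read. The struct, the `SKETCH_MISSING_PV` bit and `classify_pv()` are hypothetical stand-ins for `struct physical_volume`, `MISSING_PV` and the per-PV loop in vg_read. Only the decision logic follows the source: clear the flag when no extents are allocated on the PV, otherwise keep treating the PV as missing.

```c
#include <stdint.h>

#define SKETCH_MISSING_PV 0x1u	/* hypothetical bit, not lvm's actual value */

enum pv_state { PV_OK, PV_MISSING_DEV, PV_MISSING_FLAG };

/* Hypothetical, reduced view of a PV entry in the VG metadata. */
struct pv_sketch {
	int dev_present;		/* device found by the scan */
	uint32_t status;		/* flags from the metadata */
	uint32_t pe_alloc_count;	/* extents used by LVs */
};

static enum pv_state classify_pv(struct pv_sketch *pv)
{
	if (!pv->dev_present)
		return PV_MISSING_DEV;		/* the plain missing-device case */
	if (pv->status & SKETCH_MISSING_PV) {
		if (pv->pe_alloc_count == 0) {
			/* Reappeared but unused: no stale LV data, safe to clear. */
			pv->status &= ~SKETCH_MISSING_PV;
			return PV_OK;
		}
		/* Reappeared with used extents: data may be outdated. */
		return PV_MISSING_FLAG;
	}
	return PV_OK;
}
```

A PV left in the `PV_MISSING_FLAG` state still needs vgreduce --removemissing or a cautious vgextend --restoremissing, as described above.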
Bad mda repair
--------------
The new command:
vgck --updatemetadata VG
first uses vg_write to repair old metadata, and other basic
issues mentioned above (old metadata, outdated PVs, pv_header
flags, MISSING_PV flags). It will also go further and repair
bad metadata:
. text metadata that has a bad checksum
. text metadata that is not parsable
. corrupt mda_header checksum and version fields
(To keep a clean diff, #if 0 is added around functions that
are replaced by new code. These commented functions are
removed by the following commit.)
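The sketch below illustrates the first kind of bad metadata listed above: text with a bad checksum. It recomputes a checksum over the metadata text and compares it with the stored value. It uses a standard bitwise CRC-32 (reflected polynomial 0xEDB88320) purely as an assumption. lvm's own checksum routine and seed are not shown here and may differ, and `crc32_sketch()` and `metadata_text_is_bad()` are hypothetical helpers.

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32 (reflected, polynomial 0xEDB88320). Illustrative only. */
static uint32_t crc32_sketch(uint32_t crc, const void *buf, size_t len)
{
	const unsigned char *p = buf;
	size_t i;
	int k;

	crc = ~crc;
	for (i = 0; i < len; i++) {
		crc ^= p[i];
		/* XOR in the polynomial only when the low bit is set. */
		for (k = 0; k < 8; k++)
			crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
	}
	return ~crc;
}

/* Treat a metadata text copy as bad when its stored checksum mismatches. */
static int metadata_text_is_bad(const char *text, size_t len, uint32_t stored_crc)
{
	return crc32_sketch(0, text, len) != stored_crc;
}
```

In the scheme described above, a copy flagged this way is removed from info->mdas during the scan. Only vgck --updatemetadata rewrites it.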
2019-05-24 12:04:37 -05:00
		/* Some callers don't care if the VG doesn't exist and don't want an error message. */
2019-10-21 12:32:11 +02:00
		if (!(vg_read_flags & READ_OK_NOTFOUND))
2019-11-28 13:19:44 +01:00
			log_error("Volume group \"%s\" not found.", vg_name);
2019-05-24 12:04:37 -05:00
		failure |= FAILED_NOTFOUND;
2021-10-01 13:44:17 +02:00
		goto bad;
2019-05-24 12:04:37 -05:00
	}
	/*
	 * Check and warn if PV ext info is not in sync with VG metadata
	 * (vg_write fixes.)
	 */
	_check_pv_ext(cmd, vg);
	if (!vg_strip_outdated_historical_lvs(vg))
		log_warn("WARNING: failed to strip outdated historical lvs.");
	/*
	 * Check for missing devices in the VG. In most cases a VG cannot be
	 * changed while it is missing devices. This restriction is implemented
	 * here in vg_read. Below we return an error from vg_read if the
	 * vg_read flag indicates that the command is going to modify the VG.
	 * (We should probably implement this restriction elsewhere instead of
	 * returning an error from vg_read.)
	 *
	 * The PV's device may be present while the PV for the device has the
	 * MISSING_PV flag set in the metadata. This happens because the VG
	 * was written while this dev was missing, so the MISSING flag was
	 * written in the metadata for the PV. Now the device has reappeared.
	 * However, the VG has changed since the device was last present, and
	 * if the device has outdated data it may not be safe to just start
	 * using it again.
	 *
	 * If no PEs were used on the PV, we can just clear the MISSING flag,
	 * but if PEs were used we need to continue to treat the PV as if the
	 * device is missing, limiting operations as if the VG has a missing
	 * device, and requiring the user to remove the reappeared device
	 * from the VG, like a missing device, with vgreduce
	 * --removemissing.
	 */
	dm_list_iterate_items(pvl, &vg->pvs) {
		if (!id_write_format(&pvl->pv->id, uuidstr, sizeof(uuidstr)))
			uuidstr[0] = '\0';
		if (!pvl->pv->dev) {
			/* The obvious and common case of a missing device. */
2021-11-03 09:50:11 -05:00
			if ((vg_is_foreign(vg) && !cmd->include_foreign_vgs) || cmd->expect_missing_vg_device)
2019-12-11 12:56:15 -06:00
				log_debug("VG %s is missing PV %s (last written to %s)", vg_name, uuidstr, pvl->pv->device_hint ?: "na");
			else if (pvl->pv->device_hint)
2019-09-04 14:13:14 -05:00
				log_warn("WARNING: VG %s is missing PV %s (last written to %s).", vg_name, uuidstr, pvl->pv->device_hint);
			else
				log_warn("WARNING: VG %s is missing PV %s.", vg_name, uuidstr);
improve reading and repairing vg metadata
The fact that vg repair is implemented as a part of vg read
has led to a messy and complicated implementation of vg_read,
and limited and uncontrolled repair capability. This splits
read and repair apart.
Summary
-------
- take all kinds of various repairs out of vg_read
- vg_read no longer writes anything
- vg_read now simply reads and returns vg metadata
- vg_read ignores bad or old copies of metadata
- vg_read proceeds with a single good copy of metadata
- improve error checks and handling when reading
- keep track of bad (corrupt) copies of metadata in lvmcache
- keep track of old (seqno) copies of metadata in lvmcache
- keep track of outdated PVs in lvmcache
- vg_write will do basic repairs
- new command vgck --updatemetdata will do all repairs
Details
-------
- In scan, do not delete dev from lvmcache if reading/processing fails;
the dev is still present, and removing it makes it look like the dev
is not there. Records are now kept about the problems with each PV
so they be fixed/repaired in the appropriate places.
- In scan, record a bad mda on failure, and delete the mda from
mda in use list so it will not be used by vg_read or vg_write,
only by repair.
- In scan, succeed if any good mda on a device is found, instead of
failing if any is bad. The bad/old copies of metadata should not
interfere with normal usage while good copies can be used.
- In scan, add a record of old mdas in lvmcache for later, do not repair
them while reading, and do not let them prevent us from finding and
using a good copy of metadata from elsewhere. One result is that
"inconsistent metadata" is no longer a read error, but instead a
record in lvmcache that can be addressed separate from the read.
- Treat a dev with no good mdas like a dev with no mdas, which is an
existing case we already handle.
- Don't use a fake vg "handle" for returning an error from vg_read,
or the vg_read_error function for getting that error number;
just return null if the vg cannot be read or used, and an error_flags
arg with flags set for the specific kind of error (which can be used
later for determining the kind of repair.)
- Saving an original copy of the vg metadata, for purposes of reverting
a write, is now done explicitly in vg_read instead of being hidden in
the vg_make_handle function.
- When a vg is not accessible due to "access restrictions" but is
otherwise fine, return the vg through the new error_vg arg so that
process_each_pv can skip the PVs in the VG while processing.
(This is a temporary accomodation for the way process_each_pv
tracks which devs have been looked at, and can be dropped later
when process_each_pv implementation dev tracking is changed.)
- vg_read does not try to fix or recover a vg, but now just reads the
metadata, checks access restrictions and returns it.
(Checking access restrictions might be better done outside of vg_read,
but this is a later improvement.)
- _vg_read now simply makes one attempt to read metadata from
each mda, and uses the most recent copy to return to the caller
in the form of a 'vg' struct.
(bad mdas were excluded during the scan and are not retried)
(old mdas were not excluded during scan and are retried here)
- vg_read uses _vg_read to get the latest copy of metadata from mdas,
and then makes various checks against it to produce warnings,
and to check if VG access is allowed (access restrictions include:
writable, foreign, shared, clustered, missing pvs).
- Things that were previously silently/automatically written by vg_read
that are now done by vg_write, based on the records made in lvmcache
during the scan and read:
. clearing the missing flag
. updating old copies of metadata
. clearing outdated pvs
. updating pv header flags
- Bad/corrupt metadata are now repaired; they were not before.
Test changes
------------
- A read command no longer writes the VG to repair it, so add a write
command to do a repair.
(inconsistent-metadata, unlost-pv)
- When a missing PV is removed from a VG, and then the device is
enabled again, vgck --updatemetadata is needed to clear the
outdated PV before it can be used again, where it wasn't before.
(lvconvert-repair-policy, lvconvert-repair-raid, lvconvert-repair,
mirror-vgreduce-removemissing, pv-ext-flags, unlost-pv)
Reading bad/old metadata
------------------------
- "bad metadata": the mda_header or metadata text has invalid fields
or can't be parsed by lvm. This is a form of corruption that would
not be caused by known failure scenarios. A checksum error is
typically included among the errors reported.
- "old metadata": a valid copy of the metadata that has a smaller seqno
than other copies of the metadata. This can happen if the device
failed, or io failed, or lvm failed while commiting new metadata
to all the metadata areas. Old metadata on a PV that has been
removed from the VG is the "outdated" case below.
When a VG has some PVs with bad/old metadata, lvm can simply ignore
the bad/old copies, and use a good copy. This is why there are
multiple copies of the metadata -- so it's available even when some
of the copies cannot be used. The bad/old copies do not have to be
repaired before the VG can be used (the repair can happen later.)
A PV with no good copies of the metadata simply falls back to being
treated like a PV with no mdas; a common and harmless configuration.
When bad/old metadata exists, lvm warns the user about it, and
suggests repairing it using a new metadata repair command.
Bad metadata in particular is something that users will want to
investigate and repair themselves, since it should not happen and
may indicate some other problem that needs to be fixed.
PVs with bad/old metadata are not the same as missing devices.
Missing devices will block various kinds of VG modification or
activation, but bad/old metadata will not.
Previously, lvm would attempt to repair bad/old metadata whenever
it was read. This was unnecessary since lvm does not require every
copy of the metadata to be used. It would also hide potential
problems that should be investigated by the user. It was also
dangerous in cases where the VG was on shared storage. The user
is now allowed to investigate potential problems and decide how
and when to repair them.
Repairing bad/old metadata
--------------------------
When label scan sees bad metadata in an mda, that mda is removed
from the lvmcache info->mdas list. This means that vg_read will
skip it, and not attempt to read/process it again. If it was
the only in-use mda on a PV, that PV is treated like a PV with
no mdas. It also means that vg_write will also skip the bad mda,
and not attempt to write new metadata to it. The only way to
repair bad metadata is with the metadata repair command.
When label scan sees old metadata in an mda, that mda is kept
in the lvmcache info->mdas list. This means that vg_read will
read/process it again, and likely see the same mismatch with
the other copies of the metadata. Like the label_scan, the
vg_read will simply ignore the old copy of the metadata and
use the latest copy. If the command is modifying the vg
(e.g. lvcreate), then vg_write, which writes new metadata to
every mda on info->mdas, will write the new metadata to the
mda that had the old version. If successful, this will resolve
the old metadata problem (without needing to run a metadata
repair command.)
Outdated PVs
------------
An outdated PV is a PV that has an old copy of VG metadata
that shows it is a member of the VG, but the latest copy of
the VG metadata does not include this PV. This happens if
the PV is disconnected, vgreduce --removemissing is run to
remove the PV from the VG, then the PV is reconnected.
In this case, the outdated PV needs have its outdated metadata
removed and the PV used flag needs to be cleared. This repair
will be done by the subsequent repair command. It is also done
if vgremove is run on the VG.
MISSING PVs
-----------
When a device is missing, most commands will refuse to modify
the VG. This is the simple case. More complicated is when
a command is allowed to modify the VG while it is missing a
device.
When a VG is written while a device is missing for one of it's PVs,
the VG metadata is written to disk with the MISSING flag on the PV
with the missing device. When the VG is next used, it is treated
as if the PV with the MISSING flag still has a missing device, even
if that device has reappeared.
If all LVs that were using a PV with the MISSING flag are removed
or repaired so that the MISSING PV is no longer used, then the
next time the VG metadata is written, the MISSING flag will be
dropped.
Alternative methods of clearing the MISSING flag are:
vgreduce --removemissing will remove PVs with missing devices,
or PVs with the MISSING flag where the device has reappeared.
vgextend --restoremissing will clear the MISSING flag on PVs
where the device has reappeared, allowing the VG to be used
normally. This must be done with caution since the reappeared
device may have old data that is inconsistent with data on other PVs.
Bad mda repair
--------------
The new command:
vgck --updatemetadata VG
first uses vg_write to repair the basic issues mentioned above
(old metadata, outdated PVs, pv_header flags, MISSING_PV flags).
It then goes further and repairs
bad metadata:
. text metadata that has a bad checksum
. text metadata that is not parsable
. corrupt mda_header checksum and version fields
(To keep a clean diff, #if 0 is added around functions that
are replaced by new code. These disabled functions are
removed by the following commit.)
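The split between what any VG-writing command fixes and what only the repair command fixes can be summarized as a lookup. Old copies remain on the in-use mda list, so any vg_write replaces them; bad copies were removed from that list during the scan, so only vgck --updatemetadata rewrites them. The enum values and `repaired_by` are hypothetical labels for the problem kinds listed above, not LVM2 identifiers.

```c
#include <assert.h>

/* Hypothetical labels for the per-mda problems described above. */
enum mda_problem {
	MDA_OLD_SEQNO,    /* valid copy with a smaller seqno */
	MDA_BAD_CHECKSUM, /* text metadata with a bad checksum */
	MDA_UNPARSABLE,   /* text metadata that cannot be parsed */
	MDA_BAD_HEADER    /* corrupt mda_header checksum/version fields */
};

enum repair_cmd {
	REPAIR_ANY_VG_WRITE, /* e.g. lvcreate, or vgck --updatemetadata */
	REPAIR_VGCK_ONLY     /* only vgck --updatemetadata */
};

static enum repair_cmd repaired_by(enum mda_problem p)
{
	switch (p) {
	case MDA_OLD_SEQNO:
		return REPAIR_ANY_VG_WRITE;
	case MDA_BAD_CHECKSUM:
	case MDA_UNPARSABLE:
	case MDA_BAD_HEADER:
		return REPAIR_VGCK_ONLY;
	}
	return REPAIR_VGCK_ONLY; /* unreachable for valid enum values */
}
```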
2019-05-24 12:04:37 -05:00
missing_pv_dev++;
} else if (pvl->pv->status & MISSING_PV) {
/* A device that was missing but has reappeared. */
if (pvl->pv->pe_alloc_count == 0) {
log_warn("WARNING: VG %s has unused reappeared PV %s %s.", vg_name, dev_name(pvl->pv->dev), uuidstr);
pvl->pv->status &= ~MISSING_PV;
/* tell vgextend restoremissing that MISSING flag was cleared here */
pvl->pv->unused_missing_cleared = 1;
} else {
log_warn("WARNING: VG %s was missing PV %s %s.", vg_name, dev_name(pvl->pv->dev), uuidstr);
missing_pv_flag++;
}
}
}
if (missing_pv_dev || missing_pv_flag)
vg_mark_partial_lvs(vg, 1);
if (!check_pv_segments(vg)) {
log_error(INTERNAL_ERROR "PV segments corrupted in %s.", vg->name);
failure |= FAILED_INTERNAL_ERROR;
2019-11-28 13:19:44 +01:00
goto bad;
2019-05-24 12:04:37 -05:00
}
dm_list_iterate_items(lvl, &vg->lvs) {
/* Checks that cross-reference other LVs. */
2024-10-19 00:05:45 +02:00
if (!check_lv_segments_complete_vg(lvl->lv)) {
2019-05-24 12:04:37 -05:00
log_error(INTERNAL_ERROR "LV segments corrupted in %s.", lvl->lv->name);
failure |= FAILED_INTERNAL_ERROR;
2019-11-28 13:19:44 +01:00
goto bad;
2019-05-24 12:04:37 -05:00
}
}
if (!check_pv_dev_sizes(vg))
log_warn("WARNING: One or more devices used as PVs in VG %s have changed sizes.", vg->name);
2021-07-01 17:25:43 -05:00
if (cmd->check_devs_used)
2024-07-08 15:32:41 -05:00
_check_devs_used_correspond_with_vg(vg);
2019-05-24 12:04:37 -05:00
if (!_access_vg_lock_type(cmd, vg, lockd_state, &failure)) {
/* Either FAILED_LOCK_TYPE or FAILED_LOCK_MODE were set. */
goto_bad;
}
if (!_access_vg_systemid(cmd, vg)) {
failure |= FAILED_SYSTEMID;
goto_bad;
}
if (!_access_vg_clustered(cmd, vg)) {
failure |= FAILED_CLUSTERED;
goto_bad;
}
2019-06-21 13:37:11 -05:00
if (!_access_vg_exported(cmd, vg)) {
failure |= FAILED_EXPORTED;
goto_bad;
}
2019-06-11 16:17:24 -05:00
/*
 * If the command intends to write or activate the VG, there are
 * additional restrictions. FIXME: These restrictions should
 * probably be checked/applied after vg_read returns.
 */
if (writing || activating) {
if (!(vg->status & LVM_WRITE)) {
2019-11-28 13:19:44 +01:00
log_error("Volume group %s is read-only.", vg->name);
2019-06-11 16:17:24 -05:00
failure |= FAILED_READ_ONLY;
2019-11-28 13:19:44 +01:00
goto bad;
2019-06-11 16:17:24 -05:00
}
2019-05-24 12:04:37 -05:00
2019-06-11 16:17:24 -05:00
if (!cmd->handles_missing_pvs && (missing_pv_dev || missing_pv_flag)) {
log_error("Cannot change VG %s while PVs are missing.", vg->name);
log_error("See vgreduce --removemissing and vgextend --restoremissing.");
failure |= FAILED_NOT_ENABLED;
2019-11-28 13:19:44 +01:00
goto bad;
2019-06-11 16:17:24 -05:00
}
2019-10-14 15:51:35 -05:00
}
2019-05-24 12:04:37 -05:00
2019-10-14 15:51:35 -05:00
if (writing && !cmd->handles_unknown_segments && vg_has_unknown_segments(vg)) {
log_error("Cannot change VG %s with unknown segments in it!", vg->name);
failure |= FAILED_NOT_ENABLED; /* FIXME new failure code here? */
2019-11-28 13:19:44 +01:00
goto bad;
improve reading and repairing vg metadata
The fact that vg repair is implemented as a part of vg read
has led to a messy and complicated implementation of vg_read,
and limited and uncontrolled repair capability. This splits
read and repair apart.
Summary
-------
- take the various kinds of repair out of vg_read
- vg_read no longer writes anything
- vg_read now simply reads and returns vg metadata
- vg_read ignores bad or old copies of metadata
- vg_read proceeds with a single good copy of metadata
- improve error checks and handling when reading
- keep track of bad (corrupt) copies of metadata in lvmcache
- keep track of old (seqno) copies of metadata in lvmcache
- keep track of outdated PVs in lvmcache
- vg_write will do basic repairs
- new command vgck --updatemetadata will do all repairs
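The "single good copy" rule in the summary above can be sketched as follows. This is a minimal illustration under stated assumptions: struct mda_copy and pick_latest_copy() are hypothetical names, not lvmcache or _vg_read code, and the real implementation tracks far more state than a seqno and a bad flag.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one copy of VG metadata in an mda; not an LVM type. */
struct mda_copy {
	const char *dev;  /* device holding this copy */
	int bad;          /* 1 if the scan found a bad checksum or unparsable text */
	uint32_t seqno;   /* metadata sequence number */
};

/*
 * Return the index of the newest good copy, or -1 if no copy is usable.
 * Bad copies are skipped entirely. Good copies whose seqno is lower than
 * the chosen one are counted in *old_count so they can be reported and
 * rewritten later, instead of causing the read to fail.
 */
static int pick_latest_copy(const struct mda_copy *copies, size_t n, size_t *old_count)
{
	int best = -1;
	size_t i;

	assert(old_count != NULL);
	*old_count = 0;

	for (i = 0; i < n; i++) {
		if (copies[i].bad)
			continue;
		if (best < 0 || copies[i].seqno > copies[best].seqno)
			best = (int)i;
	}

	if (best < 0)
		return -1;

	for (i = 0; i < n; i++)
		if (!copies[i].bad && copies[i].seqno < copies[best].seqno)
			(*old_count)++;

	return best;
}
```

Given one good copy at seqno 41, one bad copy, and one good copy at seqno 42, the sketch selects the seqno 42 copy and reports one old copy. The read still succeeds, which matches the intent that bad/old copies only produce warnings and are left for vg_write or the repair command.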
Details
-------
- In scan, do not delete dev from lvmcache if reading/processing fails;
the dev is still present, and removing it makes it look like the dev
is not there. Records are now kept about the problems with each PV
so they can be fixed/repaired in the appropriate places.
- In scan, record a bad mda on failure, and delete the mda from
the in-use mda list so it will not be used by vg_read or vg_write,
only by repair.
- In scan, succeed if any good mda on a device is found, instead of
failing if any is bad. The bad/old copies of metadata should not
interfere with normal usage while good copies can be used.
- In scan, record old mdas in lvmcache for later; do not repair
them while reading, and do not let them prevent us from finding and
using a good copy of metadata from elsewhere. One result is that
"inconsistent metadata" is no longer a read error, but instead a
record in lvmcache that can be addressed separately from the read.
- Treat a dev with no good mdas like a dev with no mdas, which is an
existing case we already handle.
- Don't use a fake vg "handle" for returning an error from vg_read,
or the vg_read_error function for getting that error number;
just return null if the vg cannot be read or used, and an error_flags
arg with flags set for the specific kind of error (which can be used
later for determining the kind of repair.)
- Saving an original copy of the vg metadata, for purposes of reverting
a write, is now done explicitly in vg_read instead of being hidden in
the vg_make_handle function.
- When a vg is not accessible due to "access restrictions" but is
otherwise fine, return the vg through the new error_vg arg so that
process_each_pv can skip the PVs in the VG while processing.
(This is a temporary accommodation for the way process_each_pv
tracks which devs have been looked at, and can be dropped later
when the process_each_pv dev tracking implementation is changed.)
- vg_read no longer tries to fix or recover a vg; it now just reads the
metadata, checks access restrictions and returns it.
(Checking access restrictions might be better done outside of vg_read,
but that is left as a later improvement.)
- _vg_read now simply makes one attempt to read metadata from
each mda, and uses the most recent copy to return to the caller
in the form of a 'vg' struct.
(bad mdas were excluded during the scan and are not retried)
(old mdas were not excluded during scan and are retried here)
- vg_read uses _vg_read to get the latest copy of metadata from mdas,
and then makes various checks against it to produce warnings,
and to check if VG access is allowed (access restrictions include:
writable, foreign, shared, clustered, missing pvs).
- Things that were previously written silently/automatically by vg_read,
and are now done by vg_write based on the records made in lvmcache
during the scan and read:
. clearing the missing flag
. updating old copies of metadata
. clearing outdated pvs
. updating pv header flags
- Bad/corrupt metadata can now be repaired (by the new repair command); it could not be before.
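The read-side contract described above (return NULL plus an error_flags bitmask instead of a fake handle, and hand a readable but restricted VG back through error_vg) can be sketched as follows. This is a minimal illustration under stated assumptions: every name here (struct sketch_vg, the READ_FAILED_* bits, sketch_vg_read) is hypothetical and is not LVM's actual vg_read() signature or flag set.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical error bits; LVM's real flag names and values differ. */
#define READ_FAILED_NOTFOUND (1u << 0)
#define READ_FAILED_READONLY (1u << 1) /* VG not writable */
#define READ_FAILED_FOREIGN  (1u << 2) /* VG owned by another host */
#define READ_FAILED_MISSING  (1u << 3) /* VG has missing PVs */

/* Hypothetical, simplified VG state used only for this sketch. */
struct sketch_vg {
	const char *name;
	int writable;
	int foreign;
	int missing_pvs;
};

/*
 * 'latest' is the most recent good copy already chosen from the mdas.
 * Return it if it may be used; otherwise return NULL, set *error_flags,
 * and pass a readable but restricted VG back through *error_vg.
 * Nothing is written or repaired here.
 */
static struct sketch_vg *sketch_vg_read(struct sketch_vg *latest, int for_write,
					uint32_t *error_flags,
					struct sketch_vg **error_vg)
{
	*error_flags = 0;
	*error_vg = NULL;

	if (!latest) {
		*error_flags |= READ_FAILED_NOTFOUND;
		return NULL;
	}

	if (latest->foreign)
		*error_flags |= READ_FAILED_FOREIGN;
	if (for_write && !latest->writable)
		*error_flags |= READ_FAILED_READONLY;
	if (for_write && latest->missing_pvs)
		*error_flags |= READ_FAILED_MISSING;

	if (*error_flags) {
		/* Access restriction: report it, do not try to fix it. */
		*error_vg = latest;
		return NULL;
	}
	return latest;
}
```

A caller inspects the returned pointer first and only consults error_flags when it is NULL, which lets a later repair step decide what kind of repair applies.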
Test changes
------------
- A read command no longer writes the VG to repair it, so the tests
add a write command to do a repair.
(inconsistent-metadata, unlost-pv)
- When a missing PV is removed from a VG and the device is then
enabled again, vgck --updatemetadata is now needed to clear the
outdated PV before it can be used again; this was not needed before.
(lvconvert-repair-policy, lvconvert-repair-raid, lvconvert-repair,
mirror-vgreduce-removemissing, pv-ext-flags, unlost-pv)
Reading bad/old metadata
------------------------
- "bad metadata": the mda_header or metadata text has invalid fields
or can't be parsed by lvm. This is a form of corruption that would
not be caused by known failure scenarios. A checksum error is
typically included among the errors reported.
- "old metadata": a valid copy of the metadata that has a smaller seqno
than other copies of the metadata. This can happen if the device
failed, or io failed, or lvm failed while committing new metadata
to all the metadata areas. Old metadata on a PV that has been
removed from the VG is the "outdated" case below.
When a VG has some PVs with bad/old metadata, lvm can simply ignore
the bad/old copies, and use a good copy. This is why there are
multiple copies of the metadata -- so it's available even when some
of the copies cannot be used. The bad/old copies do not have to be
repaired before the VG can be used (the repair can happen later.)
A PV with no good copies of the metadata simply falls back to being
treated like a PV with no mdas; a common and harmless configuration.
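The bad/old distinction and the "use the latest good copy" rule can be illustrated with a small classifier. This is a sketch only: struct sketch_mda and the helper functions are hypothetical, and using seqno 0 as "no usable copy" is an assumption made for brevity.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-mda scan result; not LVM's real structures. */
struct sketch_mda {
	int checksum_ok; /* mda_header and text checksums matched */
	int parsed_ok;   /* metadata text parsed successfully */
	uint32_t seqno;  /* VG metadata sequence number */
};

enum sketch_mda_state { MDA_BAD, MDA_OLD, MDA_GOOD };

/* Highest seqno among copies that are not bad; 0 if none are usable. */
static uint32_t sketch_latest_seqno(const struct sketch_mda *mdas, size_t n)
{
	uint32_t max = 0;

	for (size_t i = 0; i < n; i++)
		if (mdas[i].checksum_ok && mdas[i].parsed_ok && mdas[i].seqno > max)
			max = mdas[i].seqno;
	return max;
}

/* Bad: corrupt or unparsable. Old: valid but behind the latest seqno. */
static enum sketch_mda_state sketch_classify(const struct sketch_mda *m,
					     uint32_t latest)
{
	if (!m->checksum_ok || !m->parsed_ok)
		return MDA_BAD;
	return m->seqno < latest ? MDA_OLD : MDA_GOOD;
}

/* Count copies still in use; a PV left with 0 is like a PV with no mdas. */
static size_t sketch_usable_mdas(const struct sketch_mda *pv_mdas, size_t n)
{
	size_t usable = 0;

	for (size_t i = 0; i < n; i++)
		if (pv_mdas[i].checksum_ok && pv_mdas[i].parsed_ok)
			usable++;
	return usable;
}
```

Note that a bad copy never contributes its seqno, even if that seqno is the highest, so corruption cannot make a damaged copy look like the latest one.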
When bad/old metadata exists, lvm warns the user about it, and
suggests repairing it using a new metadata repair command.
Bad metadata in particular is something that users will want to
investigate and repair themselves, since it should not happen and
may indicate some other problem that needs to be fixed.
PVs with bad/old metadata are not the same as missing devices.
Missing devices will block various kinds of VG modification or
activation, but bad/old metadata will not.
Previously, lvm would attempt to repair bad/old metadata whenever
it was read. This was unnecessary since lvm does not require every
copy of the metadata to be used. It would also hide potential
problems that should be investigated by the user. It was also
dangerous in cases where the VG was on shared storage. The user
is now allowed to investigate potential problems and decide how
and when to repair them.
Repairing bad/old metadata
--------------------------
When label scan sees bad metadata in an mda, that mda is removed
from the lvmcache info->mdas list. This means that vg_read will
skip it, and not attempt to read/process it again. If it was
the only in-use mda on a PV, that PV is treated like a PV with
no mdas. It also means that vg_write will skip the bad mda,
and not attempt to write new metadata to it. The only way to
repair bad metadata is with the metadata repair command.
When label scan sees old metadata in an mda, that mda is kept
in the lvmcache info->mdas list. This means that vg_read will
read/process it again, and likely see the same mismatch with
the other copies of the metadata. Like label_scan,
vg_read will simply ignore the old copy of the metadata and
use the latest copy. If the command is modifying the vg
(e.g. lvcreate), then vg_write, which writes new metadata to
every mda on info->mdas, will write the new metadata to the
mda that had the old version. If successful, this will resolve
the old metadata problem (without needing to run a metadata
repair command.)
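The different fates of bad and old mdas can be modelled as follows: the scan drops bad areas from the in-use list, an ordinary vg_write rewrites every in-use area (which fixes old copies as a side effect), and only the repair path puts bad areas back. This is a hypothetical sketch (struct sketch_area and the sketch_* functions are invented for illustration) and does not mirror LVM's struct metadata_area or lvmcache internals.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one metadata area. */
struct sketch_area {
	int bad;        /* recorded as bad during label scan */
	int in_use;     /* still on the info->mdas list */
	uint32_t seqno; /* seqno of the copy currently on disk */
};

/* Label scan: bad areas leave the in-use list; old areas stay on it. */
static void sketch_scan(struct sketch_area *a, size_t n)
{
	for (size_t i = 0; i < n; i++)
		a[i].in_use = !a[i].bad;
}

/* vg_write: write the new metadata to every in-use area. */
static void sketch_vg_write(struct sketch_area *a, size_t n, uint32_t new_seqno)
{
	for (size_t i = 0; i < n; i++)
		if (a[i].in_use)
			a[i].seqno = new_seqno;
}

/* Repair command: restore bad areas to use, then write them all. */
static void sketch_repair(struct sketch_area *a, size_t n, uint32_t new_seqno)
{
	for (size_t i = 0; i < n; i++) {
		if (a[i].bad) {
			a[i].bad = 0;
			a[i].in_use = 1;
		}
	}
	sketch_vg_write(a, n, new_seqno);
}
```

After a normal write the old area matches the others while the bad area is untouched; only sketch_repair brings the bad area back in line.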
Outdated PVs
------------
An outdated PV is a PV that has an old copy of VG metadata
that shows it is a member of the VG, but the latest copy of
the VG metadata does not include this PV. This happens if
the PV is disconnected, vgreduce --removemissing is run to
remove the PV from the VG, then the PV is reconnected.
In this case, the outdated PV needs to have its outdated metadata
removed and its PV used flag cleared. This repair is done by
the metadata repair command (vgck --updatemetadata). It is also
done if vgremove is run on the VG.
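The outdated-PV condition amounts to a membership test: the PV's own (old) metadata claims the VG, but the latest VG metadata does not list the PV. The sketch below is hypothetical. It matches the VG by name and the PV by a uuid string for brevity, while real metadata identifies VGs and PVs by their ids.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical check: return 1 if the metadata found on the PV names
 * vg_name, but the latest copy of that VG does not include pv_uuid.
 */
static int sketch_pv_is_outdated(const char *pv_uuid, const char *pv_claims_vg,
				 const char *vg_name,
				 const char *const *latest_pv_uuids, size_t n)
{
	if (!pv_claims_vg || strcmp(pv_claims_vg, vg_name))
		return 0; /* PV does not claim membership in this VG */

	for (size_t i = 0; i < n; i++)
		if (!strcmp(latest_pv_uuids[i], pv_uuid))
			return 0; /* still listed: a current member */

	return 1; /* claims the VG, but was removed from it */
}
```

In the reconnect scenario above, the PV removed by vgreduce --removemissing still carries metadata naming the VG, so this test returns 1 until the outdated metadata is cleared.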
MISSING PVs
-----------
When a device is missing, most commands will refuse to modify
the VG. This is the simple case. The more complicated case is
when a command is allowed to modify the VG while it is missing
a device.
When a VG is written while a device is missing for one of its PVs,
the VG metadata is written to disk with the MISSING flag on the PV
with the missing device. When the VG is next used, it is treated
as if the PV with the MISSING flag still has a missing device, even
if that device has reappeared.
If all LVs that were using a PV with the MISSING flag are removed
or repaired so that the MISSING PV is no longer used, then the
next time the VG metadata is written, the MISSING flag will be
dropped.
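The rule for dropping the MISSING flag at write time can be sketched as follows. The model is hypothetical: LV allocation is reduced to a list of segments, each naming the index of the PV it sits on. The point it illustrates is that the flag, not the presence of the device, decides whether the PV is still treated as missing.

```c
#include <stddef.h>

/* Hypothetical model: the PV index each LV segment is allocated on. */
struct sketch_seg {
	size_t pv_index;
};

/* Return 1 if any segment still references the given PV. */
static int sketch_pv_in_use(const struct sketch_seg *segs, size_t nsegs,
			    size_t pv_index)
{
	for (size_t i = 0; i < nsegs; i++)
		if (segs[i].pv_index == pv_index)
			return 1;
	return 0;
}

/*
 * At VG write time: a PV keeps its MISSING flag while any LV still uses
 * it, and loses the flag once nothing references it. Whether the device
 * has reappeared is deliberately not consulted.
 */
static void sketch_update_missing_flags(int *pv_missing, size_t npvs,
					const struct sketch_seg *segs,
					size_t nsegs)
{
	for (size_t p = 0; p < npvs; p++)
		if (pv_missing[p] && !sketch_pv_in_use(segs, nsegs, p))
			pv_missing[p] = 0;
}
```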
Alternative methods of clearing the MISSING flag are:
vgreduce --removemissing will remove PVs with missing devices,
or PVs with the MISSING flag where the device has reappeared.
vgextend --restoremissing will clear the MISSING flag on PVs
where the device has reappeared, allowing the VG to be used
normally. This must be done with caution since the reappeared
device may have old data that is inconsistent with data on other PVs.
Bad mda repair
--------------
The new command:
vgck --updatemetadata VG
first uses vg_write to repair the basic issues mentioned above
(old metadata, outdated PVs, pv_header flags, MISSING_PV flags).
It then goes further and repairs bad metadata:
. text metadata that has a bad checksum
. text metadata that is not parsable
. corrupt mda_header checksum and version fields
(To keep a clean diff, #if 0 is added around functions that
are replaced by new code. These disabled functions are
removed by the following commit.)
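Detecting the "bad checksum" case comes down to recomputing a checksum over the stored bytes and comparing it with the stored value. The sketch below uses the standard reflected CRC-32 (polynomial 0xEDB88320) purely as a stand-in. LVM has its own CRC routine whose initial value and final step may differ, and sketch_crc32 / sketch_copy_is_bad are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

/* Standard bitwise CRC-32 (IEEE, reflected); a stand-in for LVM's own. */
static uint32_t sketch_crc32(const void *buf, size_t len)
{
	const uint8_t *p = buf;
	uint32_t crc = 0xFFFFFFFFu;

	while (len--) {
		crc ^= *p++;
		for (int k = 0; k < 8; k++)
			crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
	}
	return crc ^ 0xFFFFFFFFu;
}

/* A copy is "bad" if its stored checksum no longer matches its bytes. */
static int sketch_copy_is_bad(const void *text, size_t len, uint32_t stored)
{
	return sketch_crc32(text, len) != stored;
}
```

A mismatch like this is what gets a copy recorded as bad during the scan; rewriting the area with fresh metadata and a fresh checksum is the repair.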
2019-05-24 12:04:37 -05:00
}
/*
 * When we are reading the VG with the intention of writing it,
 * we save a second copy of the VG in vg->vg_committed. This
 * copy remains unmodified by the command operation, and is used
 * later if there is an error and we want to reactivate LVs.
 * FIXME: be specific about exactly when this works correctly.
 */
if (writing) {
if (dm_pool_locked(vg->vgmem)) {
/* FIXME: can this happen? */
2019-11-28 13:19:44 +01:00
log_warn("WARNING: vg_read no vg copy: pool locked.");
improve reading and repairing vg metadata
The fact that vg repair is implemented as a part of vg read
has led to a messy and complicated implementation of vg_read,
and limited and uncontrolled repair capability. This splits
read and repair apart.
Summary
-------
- take all the various repairs out of vg_read
- vg_read no longer writes anything
- vg_read now simply reads and returns vg metadata
- vg_read ignores bad or old copies of metadata
- vg_read proceeds with a single good copy of metadata
- improve error checks and handling when reading
- keep track of bad (corrupt) copies of metadata in lvmcache
- keep track of old (seqno) copies of metadata in lvmcache
- keep track of outdated PVs in lvmcache
- vg_write will do basic repairs
- new command vgck --updatemetadata will do all repairs
Details
-------
2019-05-24 12:04:37 -05:00
goto out;
}
if (vg->vg_committed) {
/* FIXME: can this happen? */
2019-11-28 13:19:44 +01:00
log_warn("WARNING: vg_read no vg copy: copy exists.");
2019-05-24 12:04:37 -05:00
release_vg(vg->vg_committed);
vg->vg_committed = NULL;
}
if (vg->vg_precommitted) {
/* FIXME: can this happen? */
2019-11-28 13:19:44 +01:00
log_warn("WARNING: vg_read no vg copy: pre copy exists.");
improve reading and repairing vg metadata
The fact that vg repair is implemented as a part of vg read
has led to a messy and complicated implementation of vg_read,
and limited and uncontrolled repair capability. This splits
read and repair apart.
Summary
-------
- take all kinds of various repairs out of vg_read
- vg_read no longer writes anything
- vg_read now simply reads and returns vg metadata
- vg_read ignores bad or old copies of metadata
- vg_read proceeds with a single good copy of metadata
- improve error checks and handling when reading
- keep track of bad (corrupt) copies of metadata in lvmcache
- keep track of old (seqno) copies of metadata in lvmcache
- keep track of outdated PVs in lvmcache
- vg_write will do basic repairs
- new command vgck --updatemetdata will do all repairs
Details
-------
- In scan, do not delete dev from lvmcache if reading/processing fails;
the dev is still present, and removing it makes it look like the dev
is not there. Records are now kept about the problems with each PV
so they be fixed/repaired in the appropriate places.
- In scan, record a bad mda on failure, and delete the mda from
mda in use list so it will not be used by vg_read or vg_write,
only by repair.
- In scan, succeed if any good mda on a device is found, instead of
failing if any is bad. The bad/old copies of metadata should not
interfere with normal usage while good copies can be used.
- In scan, add a record of old mdas in lvmcache for later, do not repair
them while reading, and do not let them prevent us from finding and
using a good copy of metadata from elsewhere. One result is that
"inconsistent metadata" is no longer a read error, but instead a
record in lvmcache that can be addressed separate from the read.
- Treat a dev with no good mdas like a dev with no mdas, which is an
existing case we already handle.
- Don't use a fake vg "handle" for returning an error from vg_read,
or the vg_read_error function for getting that error number;
just return null if the vg cannot be read or used, and an error_flags
arg with flags set for the specific kind of error (which can be used
later for determining the kind of repair.)
- Saving an original copy of the vg metadata, for purposes of reverting
a write, is now done explicitly in vg_read instead of being hidden in
the vg_make_handle function.
- When a vg is not accessible due to "access restrictions" but is
otherwise fine, return the vg through the new error_vg arg so that
process_each_pv can skip the PVs in the VG while processing.
(This is a temporary accomodation for the way process_each_pv
tracks which devs have been looked at, and can be dropped later
when process_each_pv implementation dev tracking is changed.)
- vg_read does not try to fix or recover a vg, but now just reads the
metadata, checks access restrictions and returns it.
(Checking access restrictions might be better done outside of vg_read,
but this is a later improvement.)
- _vg_read now simply makes one attempt to read metadata from
each mda, and uses the most recent copy to return to the caller
in the form of a 'vg' struct.
(bad mdas were excluded during the scan and are not retried)
(old mdas were not excluded during scan and are retried here)
- vg_read uses _vg_read to get the latest copy of metadata from mdas,
and then makes various checks against it to produce warnings,
and to check if VG access is allowed (access restrictions include:
writable, foreign, shared, clustered, missing pvs).
- Things that were previously silently/automatically written by vg_read
that are now done by vg_write, based on the records made in lvmcache
during the scan and read:
. clearing the missing flag
. updating old copies of metadata
. clearing outdated pvs
. updating pv header flags
- Bad/corrupt metadata are now repaired; they were not before.
Test changes
------------
- A read command no longer writes the VG to repair it, so add a write
command to do a repair.
(inconsistent-metadata, unlost-pv)
- When a missing PV is removed from a VG, and then the device is
enabled again, vgck --updatemetadata is needed to clear the
outdated PV before it can be used again, where it wasn't before.
(lvconvert-repair-policy, lvconvert-repair-raid, lvconvert-repair,
mirror-vgreduce-removemissing, pv-ext-flags, unlost-pv)
Reading bad/old metadata
------------------------
- "bad metadata": the mda_header or metadata text has invalid fields
or can't be parsed by lvm. This is a form of corruption that would
not be caused by known failure scenarios. A checksum error is
typically included among the errors reported.
- "old metadata": a valid copy of the metadata that has a smaller seqno
than other copies of the metadata. This can happen if the device
failed, or io failed, or lvm failed while commiting new metadata
to all the metadata areas. Old metadata on a PV that has been
removed from the VG is the "outdated" case below.
When a VG has some PVs with bad/old metadata, lvm can simply ignore
the bad/old copies, and use a good copy. This is why there are
multiple copies of the metadata -- so it's available even when some
of the copies cannot be used. The bad/old copies do not have to be
repaired before the VG can be used (the repair can happen later.)
A PV with no good copies of the metadata simply falls back to being
treated like a PV with no mdas; a common and harmless configuration.
When bad/old metadata exists, lvm warns the user about it, and
suggests repairing it using a new metadata repair command.
Bad metadata in particular is something that users will want to
investigate and repair themselves, since it should not happen and
may indicate some other problem that needs to be fixed.
PVs with bad/old metadata are not the same as missing devices.
Missing devices will block various kinds of VG modification or
activation, but bad/old metadata will not.
Previously, lvm would attempt to repair bad/old metadata whenever
it was read. This was unnecessary since lvm does not require every
copy of the metadata to be used. It would also hide potential
problems that should be investigated by the user. It was also
dangerous in cases where the VG was on shared storage. The user
is now allowed to investigate potential problems and decide how
and when to repair them.
Repairing bad/old metadata
--------------------------
When label scan sees bad metadata in an mda, that mda is removed
from the lvmcache info->mdas list. This means that vg_read will
skip it, and not attempt to read/process it again. If it was
the only in-use mda on a PV, that PV is treated like a PV with
no mdas. It also means that vg_write will skip the bad mda,
and not attempt to write new metadata to it. The only way to
repair bad metadata is with the metadata repair command.
When label scan sees old metadata in an mda, that mda is kept
in the lvmcache info->mdas list. This means that vg_read will
read/process it again, and likely see the same mismatch with
the other copies of the metadata. Like label_scan,
vg_read will simply ignore the old copy of the metadata and
use the latest copy. If the command is modifying the vg
(e.g. lvcreate), then vg_write, which writes new metadata to
every mda on info->mdas, will write the new metadata to the
mda that had the old version. If successful, this will resolve
the old metadata problem (without needing to run a metadata
repair command.)
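The following small C model shows the different handling of bad and old mdas. It is a hypothetical sketch, not the lvmcache API. `enum mda_state`, `struct mda_model`, `scan_filter()`, `write_new_metadata()` and `pv_has_usable_mda()` are illustrative names. The `in_use` field stands in for membership of the info->mdas list, and `pv_has_usable_mda()` is meant to be called on the mdas of a single PV.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-mda state, standing in for the records lvmcache keeps. */
enum mda_state { MDA_GOOD, MDA_OLD, MDA_BAD };

struct mda_model {
	enum mda_state state;
	unsigned seqno;
	bool in_use;          /* models membership of the info->mdas list */
};

/* Scan: bad mdas leave the in-use list, old mdas stay on it. */
static void scan_filter(struct mda_model *mdas, size_t n)
{
	size_t i;

	assert(mdas != NULL || n == 0);
	for (i = 0; i < n; i++)
		mdas[i].in_use = (mdas[i].state != MDA_BAD);
}

/* Write: new metadata goes to every in-use mda, which also replaces
 * any old copy. Bad mdas are skipped and left for the repair command.
 * Returns the number of mdas written. */
static size_t write_new_metadata(struct mda_model *mdas, size_t n,
				 unsigned new_seqno)
{
	size_t i, written = 0;

	for (i = 0; i < n; i++) {
		if (!mdas[i].in_use)
			continue;
		mdas[i].seqno = new_seqno;
		mdas[i].state = MDA_GOOD;
		written++;
	}
	return written;
}

/* A PV with nothing left on its in-use list acts like a PV with no mdas. */
static bool pv_has_usable_mda(const struct mda_model *mdas, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (mdas[i].in_use)
			return true;
	return false;
}
```

In this model, an ordinary write clears the old copy as a side effect, and the bad copy remains untouched until a dedicated repair step rewrites it.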
Outdated PVs
------------
An outdated PV is a PV that has an old copy of VG metadata
that shows it is a member of the VG, but the latest copy of
the VG metadata does not include this PV. This happens if
the PV is disconnected, vgreduce --removemissing is run to
remove the PV from the VG, then the PV is reconnected.
In this case, the outdated PV needs to have its outdated metadata
removed and its PV used flag cleared. This repair is done by
the metadata repair command (vgck --updatemetadata). It is
also done if vgremove is run on the VG.
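The following C sketch illustrates the outdated-PV test described above. It assumes that each PV reports the VG name and seqno from its own copy of the metadata. `pv_is_outdated()` and its parameters are hypothetical, and PV identifiers are plain strings for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical check: a PV is outdated when its own (older) copy of the
 * VG metadata names it as a member of vg_name, but the latest copy of
 * that VG's metadata does not list it. */
static bool pv_is_outdated(const char *pv_id, const char *claimed_vg,
			   unsigned pv_seqno, const char *vg_name,
			   unsigned latest_seqno,
			   const char *const *latest_pv_ids, size_t n_latest)
{
	size_t i;

	assert(pv_id && claimed_vg && vg_name);

	/* Only PVs whose own copy says they belong to this VG qualify. */
	if (strcmp(claimed_vg, vg_name) != 0)
		return false;

	/* The PV's copy must be older than the latest VG metadata. */
	if (pv_seqno >= latest_seqno)
		return false;

	/* Still listed in the latest metadata: a member with old
	 * metadata, not an outdated PV. */
	for (i = 0; i < n_latest; i++)
		if (strcmp(latest_pv_ids[i], pv_id) == 0)
			return false;

	return true;
}
```

Clearing the outdated metadata and the PV used flag is not modelled here.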
MISSING PVs
-----------
When a device is missing, most commands will refuse to modify
the VG. This is the simple case. The more complicated case is when
a command is allowed to modify the VG while it is missing a
device.
When a VG is written while a device is missing for one of its PVs,
the VG metadata is written to disk with the MISSING flag on the PV
with the missing device. When the VG is next used, it is treated
as if the PV with the MISSING flag still has a missing device, even
if that device has reappeared.
If all LVs that were using a PV with the MISSING flag are removed
or repaired so that the MISSING PV is no longer used, then the
next time the VG metadata is written, the MISSING flag will be
dropped.
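The MISSING flag rules above can be sketched as two small C predicates. This is a hypothetical model, not LVM2's vg_write logic. `struct pv_model` and both functions are illustrative names, and `lv_segments` stands in for "some LV still uses this PV".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical view of one PV at the time the VG is used or written. */
struct pv_model {
	bool missing_flag;    /* MISSING flag recorded in the VG metadata */
	bool device_present;  /* device currently visible */
	size_t lv_segments;   /* LV segments still allocated on this PV */
};

/* The recorded flag wins even if the device has reappeared. */
static bool pv_treated_as_missing(const struct pv_model *pv)
{
	return pv->missing_flag || !pv->device_present;
}

/* Flag value stored with the next VG metadata write. */
static bool missing_flag_on_write(const struct pv_model *pv)
{
	assert(pv != NULL);

	/* Device absent at write time: the flag is recorded. */
	if (!pv->device_present)
		return true;

	/* Device present: a previously recorded flag is kept while any
	 * LV still uses the PV, and dropped once none does. */
	return pv->missing_flag && pv->lv_segments > 0;
}
```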
Alternative methods of clearing the MISSING flag are:
vgreduce --removemissing will remove PVs with missing devices,
or PVs with the MISSING flag where the device has reappeared.
vgextend --restoremissing will clear the MISSING flag on PVs
where the device has reappeared, allowing the VG to be used
normally. This must be done with caution since the reappeared
device may have old data that is inconsistent with data on other PVs.
Bad mda repair
--------------
The new command:
vgck --updatemetadata VG
first uses vg_write to make the basic repairs mentioned above
(old metadata, outdated PVs, pv_header flags, MISSING_PV flags).
It also goes further and repairs bad metadata:
. text metadata that has a bad checksum
. text metadata that is not parsable
. corrupt mda_header checksum and version fields
(To keep a clean diff, #if 0 is added around functions that
are replaced by new code. These disabled functions are
removed by the following commit.)
2019-05-24 12:04:37 -05:00
release_vg(vg->vg_precommitted);
vg->vg_precommitted = NULL;
}
2021-03-05 23:04:44 +01:00
if (!vg->committed_cft) {
2021-10-01 13:45:34 +02:00
log_error(INTERNAL_ERROR "Missing committed config tree.");
goto out;
2019-05-24 12:04:37 -05:00
}
2021-10-01 13:45:34 +02:00
if (!(vg->vg_committed = import_vg_from_config_tree(cmd, vg->fid, vg->committed_cft))) {
log_error("Failed to import written VG.");
goto out;
}
2019-05-24 12:04:37 -05:00
} else {
if (vg->vg_precommitted)
2020-08-28 19:35:25 +02:00
log_error(INTERNAL_ERROR "vg_read vg %p vg_precommitted %p", (void *)vg, (void *)vg->vg_precommitted);
improve reading and repairing vg metadata
The fact that vg repair is implemented as a part of vg read
has led to a messy and complicated implementation of vg_read,
and limited and uncontrolled repair capability. This splits
read and repair apart.
Summary
-------
- take all kinds of various repairs out of vg_read
- vg_read no longer writes anything
- vg_read now simply reads and returns vg metadata
- vg_read ignores bad or old copies of metadata
- vg_read proceeds with a single good copy of metadata
- improve error checks and handling when reading
- keep track of bad (corrupt) copies of metadata in lvmcache
- keep track of old (seqno) copies of metadata in lvmcache
- keep track of outdated PVs in lvmcache
- vg_write will do basic repairs
- new command vgck --updatemetdata will do all repairs
Details
-------
- In scan, do not delete dev from lvmcache if reading/processing fails;
the dev is still present, and removing it makes it look like the dev
is not there. Records are now kept about the problems with each PV
so they be fixed/repaired in the appropriate places.
- In scan, record a bad mda on failure, and delete the mda from
mda in use list so it will not be used by vg_read or vg_write,
only by repair.
- In scan, succeed if any good mda on a device is found, instead of
failing if any is bad. The bad/old copies of metadata should not
interfere with normal usage while good copies can be used.
- In scan, add a record of old mdas in lvmcache for later, do not repair
them while reading, and do not let them prevent us from finding and
using a good copy of metadata from elsewhere. One result is that
"inconsistent metadata" is no longer a read error, but instead a
record in lvmcache that can be addressed separate from the read.
- Treat a dev with no good mdas like a dev with no mdas, which is an
existing case we already handle.
- Don't use a fake vg "handle" for returning an error from vg_read,
or the vg_read_error function for getting that error number;
just return null if the vg cannot be read or used, and an error_flags
arg with flags set for the specific kind of error (which can be used
later for determining the kind of repair.)
- Saving an original copy of the vg metadata, for purposes of reverting
a write, is now done explicitly in vg_read instead of being hidden in
the vg_make_handle function.
- When a vg is not accessible due to "access restrictions" but is
otherwise fine, return the vg through the new error_vg arg so that
process_each_pv can skip the PVs in the VG while processing.
(This is a temporary accomodation for the way process_each_pv
tracks which devs have been looked at, and can be dropped later
when process_each_pv implementation dev tracking is changed.)
- vg_read does not try to fix or recover a vg, but now just reads the
metadata, checks access restrictions and returns it.
(Checking access restrictions might be better done outside of vg_read,
but this is a later improvement.)
- _vg_read now simply makes one attempt to read metadata from
each mda, and uses the most recent copy to return to the caller
in the form of a 'vg' struct.
(bad mdas were excluded during the scan and are not retried)
(old mdas were not excluded during scan and are retried here)
- vg_read uses _vg_read to get the latest copy of metadata from mdas,
and then makes various checks against it to produce warnings,
and to check if VG access is allowed (access restrictions include:
writable, foreign, shared, clustered, missing pvs).
- Things that were previously silently/automatically written by vg_read
that are now done by vg_write, based on the records made in lvmcache
during the scan and read:
. clearing the missing flag
. updating old copies of metadata
. clearing outdated pvs
. updating pv header flags
- Bad/corrupt metadata are now repaired; they were not before.
Test changes
------------
- A read command no longer writes the VG to repair it, so add a write
command to do a repair.
(inconsistent-metadata, unlost-pv)
- When a missing PV is removed from a VG, and then the device is
enabled again, vgck --updatemetadata is needed to clear the
outdated PV before it can be used again, where it wasn't before.
(lvconvert-repair-policy, lvconvert-repair-raid, lvconvert-repair,
mirror-vgreduce-removemissing, pv-ext-flags, unlost-pv)
Reading bad/old metadata
------------------------
- "bad metadata": the mda_header or metadata text has invalid fields
or can't be parsed by lvm. This is a form of corruption that would
not be caused by known failure scenarios. A checksum error is
typically included among the errors reported.
- "old metadata": a valid copy of the metadata that has a smaller seqno
than other copies of the metadata. This can happen if the device
failed, or io failed, or lvm failed while commiting new metadata
to all the metadata areas. Old metadata on a PV that has been
removed from the VG is the "outdated" case below.
When a VG has some PVs with bad/old metadata, lvm can simply ignore
the bad/old copies, and use a good copy. This is why there are
multiple copies of the metadata -- so it's available even when some
of the copies cannot be used. The bad/old copies do not have to be
repaired before the VG can be used (the repair can happen later.)
A PV with no good copies of the metadata simply falls back to being
treated like a PV with no mdas; a common and harmless configuration.
When bad/old metadata exists, lvm warns the user about it, and
suggests repairing it using a new metadata repair command.
Bad metadata in particular is something that users will want to
investigate and repair themselves, since it should not happen and
may indicate some other problem that needs to be fixed.
PVs with bad/old metadata are not the same as missing devices.
Missing devices will block various kinds of VG modification or
activation, but bad/old metadata will not.
Previously, lvm would attempt to repair bad/old metadata whenever
it was read. This was unnecessary since lvm does not require every
copy of the metadata to be used. It would also hide potential
problems that should be investigated by the user. It was also
dangerous in cases where the VG was on shared storage. The user
is now allowed to investigate potential problems and decide how
and when to repair them.
Repairing bad/old metadata
--------------------------
When label scan sees bad metadata in an mda, that mda is removed
from the lvmcache info->mdas list. This means that vg_read will
skip it, and not attempt to read/process it again. If it was
the only in-use mda on a PV, that PV is treated like a PV with
no mdas. It also means that vg_write will also skip the bad mda,
and not attempt to write new metadata to it. The only way to
repair bad metadata is with the metadata repair command.
When label scan sees old metadata in an mda, that mda is kept
in the lvmcache info->mdas list. This means that vg_read will
read/process it again, and likely see the same mismatch with
the other copies of the metadata. Like the label_scan, the
vg_read will simply ignore the old copy of the metadata and
use the latest copy. If the command is modifying the vg
(e.g. lvcreate), then vg_write, which writes new metadata to
every mda on info->mdas, will write the new metadata to the
mda that had the old version. If successful, this will resolve
the old metadata problem (without needing to run a metadata
repair command.)
Outdated PVs
------------
An outdated PV is a PV that has an old copy of VG metadata
that shows it is a member of the VG, but the latest copy of
the VG metadata does not include this PV. This happens if
the PV is disconnected, vgreduce --removemissing is run to
remove the PV from the VG, then the PV is reconnected.
In this case, the outdated PV needs to have its outdated metadata
removed, and its PV used flag needs to be cleared. This repair is
done by the new metadata repair command (vgck --updatemetadata).
It is also done if vgremove is run on the VG.
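The outdated-PV condition can be stated as a small predicate. This is a hedged sketch: `struct scanned_pv`, `struct latest_vg` and `sketch_pv_is_outdated()` are hypothetical names, not LVM functions. A PV counts as outdated when its own, older metadata names the VG but the latest VG metadata no longer lists it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model: what a scan found in a PV's own metadata copy. */
struct scanned_pv {
	const char *pv_id;
	const char *vg_name;  /* VG named in the PV's own metadata */
	unsigned seqno;       /* seqno of that metadata */
};

/* Hypothetical model of the latest VG metadata. */
struct latest_vg {
	const char *name;
	unsigned seqno;
	const char *const *pv_ids;  /* PVs listed in the latest metadata */
	size_t pv_count;
};

static bool sketch_pv_is_outdated(const struct scanned_pv *pv,
				  const struct latest_vg *vg)
{
	assert(pv != NULL && vg != NULL);

	/* Only an older copy that names this same VG can be outdated. */
	if (strcmp(pv->vg_name, vg->name) != 0 || pv->seqno >= vg->seqno)
		return false;

	/* Still listed in the latest metadata: merely old, not outdated. */
	for (size_t i = 0; i < vg->pv_count; i++)
		if (strcmp(vg->pv_ids[i], pv->pv_id) == 0)
			return false;

	return true;
}
```

In the vgreduce --removemissing scenario, the reconnected PV still carries the pre-removal metadata with a lower seqno, so it matches the VG name but is missing from the latest PV list.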
MISSING PVs
-----------
When a device is missing, most commands will refuse to modify
the VG. This is the simple case. The more complicated case is
when a command is allowed to modify the VG while one of its
devices is missing.
When a VG is written while a device is missing for one of its PVs,
the VG metadata is written to disk with the MISSING flag on the PV
with the missing device. When the VG is next used, it is treated
as if the PV with the MISSING flag still has a missing device, even
if that device has reappeared.
If all LVs that were using a PV with the MISSING flag are removed
or repaired so that the MISSING PV is no longer used, then the
next time the VG metadata is written, the MISSING flag will be
dropped.
Alternative methods of clearing the MISSING flag are:
- vgreduce --removemissing will remove PVs with missing devices,
  or PVs with the MISSING flag where the device has reappeared.
- vgextend --restoremissing will clear the MISSING flag on PVs
  where the device has reappeared, allowing the VG to be used
  normally. This must be done with caution since the reappeared
  device may have old data that is inconsistent with data on other PVs.
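The MISSING-flag lifecycle can be modelled as a step applied just before the VG metadata is written. This is a sketch under assumptions: `SKETCH_PV_MISSING`, `struct sketch_pv` and `sketch_update_missing_flags()` are hypothetical. The rule that a flagged PV whose device is still absent keeps the flag is an assumption drawn from "written to disk with the MISSING flag". The flag sticks while any LV still uses the PV, even after the device reappears, and is dropped only once the PV is unused.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag and fields; not LVM's real names. */
#define SKETCH_PV_MISSING 0x1u

struct sketch_pv {
	unsigned flags;
	unsigned lv_segments;  /* LV segments still allocated on this PV */
	bool device_present;   /* device currently visible */
};

/* Applied just before the VG metadata is written. */
static void sketch_update_missing_flags(struct sketch_pv *pvs, size_t n)
{
	assert(pvs != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		if (!pvs[i].device_present)
			/* Device absent at write time: record MISSING on disk. */
			pvs[i].flags |= SKETCH_PV_MISSING;
		else if ((pvs[i].flags & SKETCH_PV_MISSING) && pvs[i].lv_segments == 0)
			/* Device back and no LV uses the PV: drop the flag. */
			pvs[i].flags &= ~SKETCH_PV_MISSING;
		/* Otherwise a reappeared but still-used PV keeps MISSING. */
	}
}
```

Keeping the flag on a reappeared but still-used PV reflects the caution above: its data may be stale relative to the other PVs, so clearing it is left to an explicit vgextend --restoremissing.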
Bad mda repair
--------------
The new command:
vgck --updatemetadata VG
first uses vg_write to repair the basic issues mentioned above
(old metadata, outdated PVs, pv_header flags, MISSING_PV flags).
It then goes further and repairs bad metadata:
. text metadata that has a bad checksum
. text metadata that is not parsable
. corrupt mda_header checksum and version fields
(To keep a clean diff, #if 0 is added around functions that
are replaced by new code. These disabled functions are
removed by the following commit.)
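Two of the "bad metadata" conditions listed above, an unexpected header version and a text checksum mismatch, can be sketched as follows. Loud assumption: a standard reflected CRC-32 (polynomial 0xEDB88320) stands in for LVM's own checksum routine, whose exact parameters are not shown here. `struct sketch_mda_header`, `SKETCH_EXPECTED_VERSION` and the `sketch_*` functions are hypothetical, and parsability is not checked.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Generic reflected CRC-32; a stand-in, not LVM's checksum routine. */
static uint32_t sketch_crc32(const unsigned char *buf, size_t len)
{
	uint32_t crc = 0xFFFFFFFFu;

	for (size_t i = 0; i < len; i++) {
		crc ^= buf[i];
		for (int b = 0; b < 8; b++)
			crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
	}
	return crc ^ 0xFFFFFFFFu;
}

/* Hypothetical header layout; the real mda_header differs. */
#define SKETCH_EXPECTED_VERSION 1u

struct sketch_mda_header {
	uint32_t version;
	uint32_t text_checksum;  /* recorded when the text was written */
};

enum sketch_mda_verdict {
	SKETCH_MDA_OK,
	SKETCH_MDA_BAD_VERSION,
	SKETCH_MDA_BAD_CHECKSUM,
};

static enum sketch_mda_verdict
sketch_check_mda(const struct sketch_mda_header *hdr, const char *text)
{
	assert(hdr != NULL && text != NULL);

	if (hdr->version != SKETCH_EXPECTED_VERSION)
		return SKETCH_MDA_BAD_VERSION;
	if (sketch_crc32((const unsigned char *)text, strlen(text)) != hdr->text_checksum)
		return SKETCH_MDA_BAD_CHECKSUM;
	return SKETCH_MDA_OK;
}
```

A copy that fails either check would be treated as bad at scan time and left alone by normal reads and writes until the repair command rewrites it.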
2019-05-24 12:04:37 -05:00
if (vg->vg_committed)
2020-08-28 19:35:25 +02:00
log_error(INTERNAL_ERROR "vg_read vg %p vg_committed %p", (void *) vg, (void *) vg->vg_committed);
2019-05-24 12:04:37 -05:00
}
out :
/* We return with the VG lock held when read is successful. */
2021-10-01 13:43:46 +02:00
return vg ;
2019-05-24 12:04:37 -05:00
bad :
* error_flags = failure ;
/*
* FIXME: get rid of this case so we don't have to return the vg when
* there's an error. It is here for process_each_pv() which wants to
* eliminate the VG's devs from the list of devs it is processing, even
* when it can't access the VG because of wrong system id or similar.
2024-08-29 23:05:41 +02:00
* This could be done by looking at lvmcache info structs instead of 'vg'.
2019-05-24 12:04:37 -05:00
* It's also used by process_each_vg/process_each_lv which want to
* include error_vg values (like system_id) in error messages.
* These values could also be found from lvmcache vginfo.
*/
if (error_vg && vg) {
if (vg->vg_precommitted)
2020-08-28 19:35:25 +02:00
log_error(INTERNAL_ERROR "vg_read vg %p vg_precommitted %p", (void *) vg, (void *) vg->vg_precommitted);
2019-05-24 12:04:37 -05:00
if (vg->vg_committed)
2020-08-28 19:35:25 +02:00
log_error(INTERNAL_ERROR "vg_read vg %p vg_committed %p", (void *) vg, (void *) vg->vg_committed);
2019-05-24 12:04:37 -05:00
/* caller must unlock_vg and release_vg */
*error_vg = vg;
2021-10-01 13:44:17 +02:00
return NULL;
2019-05-24 12:04:37 -05:00
}
if (vg) {
unlock_vg(cmd, vg, vg_name);
release_vg(vg);
}
2021-10-01 13:43:46 +02:00
return NULL;
improve reading and repairing vg metadata
The fact that vg repair is implemented as a part of vg read
has led to a messy and complicated implementation of vg_read,
and limited and uncontrolled repair capability. This splits
read and repair apart.
Summary
-------
- take all the various repairs out of vg_read
- vg_read no longer writes anything
- vg_read now simply reads and returns vg metadata
- vg_read ignores bad or old copies of metadata
- vg_read proceeds with a single good copy of metadata
- improve error checks and handling when reading
- keep track of bad (corrupt) copies of metadata in lvmcache
- keep track of old (seqno) copies of metadata in lvmcache
- keep track of outdated PVs in lvmcache
- vg_write will do basic repairs
- new command vgck --updatemetadata will do all repairs
Details
-------
- In scan, do not delete dev from lvmcache if reading/processing fails;
the dev is still present, and removing it makes it look like the dev
is not there. Records are now kept about the problems with each PV
so they can be fixed/repaired in the appropriate places.
- In scan, record a bad mda on failure, and delete the mda from
the in-use mda list so it will not be used by vg_read or vg_write,
only by repair.
- In scan, succeed if any good mda on a device is found, instead of
failing if any is bad. The bad/old copies of metadata should not
interfere with normal usage while good copies can be used.
- In scan, add a record of old mdas in lvmcache for later, do not repair
them while reading, and do not let them prevent us from finding and
using a good copy of metadata from elsewhere. One result is that
"inconsistent metadata" is no longer a read error, but instead a
record in lvmcache that can be addressed separately from the read.
- Treat a dev with no good mdas like a dev with no mdas, which is an
existing case we already handle.
- Don't use a fake vg "handle" for returning an error from vg_read,
or the vg_read_error function for getting that error number;
just return null if the vg cannot be read or used, and an error_flags
arg with flags set for the specific kind of error (which can be used
later for determining the kind of repair.)
- Saving an original copy of the vg metadata, for purposes of reverting
a write, is now done explicitly in vg_read instead of being hidden in
the vg_make_handle function.
- When a vg is not accessible due to "access restrictions" but is
otherwise fine, return the vg through the new error_vg arg so that
process_each_pv can skip the PVs in the VG while processing.
(This is a temporary accommodation for the way process_each_pv
tracks which devs have been looked at, and can be dropped later
when dev tracking in process_each_pv is changed.)
- vg_read does not try to fix or recover a vg, but now just reads the
metadata, checks access restrictions and returns it.
(Checking access restrictions might be better done outside of vg_read,
but this is a later improvement.)
- _vg_read now simply makes one attempt to read metadata from
each mda, and uses the most recent copy to return to the caller
in the form of a 'vg' struct.
(bad mdas were excluded during the scan and are not retried)
(old mdas were not excluded during scan and are retried here)
- vg_read uses _vg_read to get the latest copy of metadata from mdas,
and then makes various checks against it to produce warnings,
and to check if VG access is allowed (access restrictions include:
writable, foreign, shared, clustered, missing pvs).
- Things that were previously silently/automatically written by vg_read
that are now done by vg_write, based on the records made in lvmcache
during the scan and read:
. clearing the missing flag
. updating old copies of metadata
. clearing outdated pvs
. updating pv header flags
- Bad/corrupt metadata are now repaired; they were not before.
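The error-reporting convention above (return NULL when the VG cannot be read or used, set bits in an error_flags argument, and hand back an access-restricted but otherwise readable VG through error_vg) can be sketched on its own. This is a minimal standalone illustration under stated assumptions, not LVM's vg_read(): the function read_vg_sketch(), the struct vg_sketch type and the READ_ERR_* flag names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical error bits; LVM defines its own names and values. */
#define READ_ERR_NOTFOUND 0x1u  /* no metadata found for the VG name */
#define READ_ERR_FOREIGN  0x2u  /* access restriction: VG owned by another host */
#define READ_ERR_MISSING  0x4u  /* access restriction: one or more PVs missing */

/* Hypothetical, heavily reduced VG record produced by a scan. */
struct vg_sketch {
	const char *name;
	int foreign;
	int missing_pvs;
};

/*
 * Return the VG on success.  On failure return NULL and OR the reason(s)
 * into *error_flags.  If the VG was read fine but an access restriction
 * blocks its use, also return it through *error_vg so the caller can
 * still look at it.
 */
static struct vg_sketch *read_vg_sketch(struct vg_sketch *scanned, const char *name,
					uint32_t *error_flags, struct vg_sketch **error_vg)
{
	assert(error_flags != NULL && name != NULL);

	*error_flags = 0;
	if (error_vg)
		*error_vg = NULL;

	if (!scanned || strcmp(scanned->name, name) != 0) {
		*error_flags |= READ_ERR_NOTFOUND;
		return NULL;
	}

	if (scanned->foreign)
		*error_flags |= READ_ERR_FOREIGN;
	if (scanned->missing_pvs)
		*error_flags |= READ_ERR_MISSING;

	if (*error_flags) {
		if (error_vg)
			*error_vg = scanned;  /* readable, but use is restricted */
		return NULL;
	}

	return scanned;
}
```

A caller tests the returned pointer first and only then inspects the individual bits, for example to decide what kind of repair, if any, applies.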
Test changes
------------
- A read command no longer writes the VG to repair it, so the tests
now run a write command to do the repair.
(inconsistent-metadata, unlost-pv)
- When a missing PV is removed from a VG, and the device is then
enabled again, vgck --updatemetadata is now needed to clear the
outdated PV before it can be used again (this was not needed before).
(lvconvert-repair-policy, lvconvert-repair-raid, lvconvert-repair,
mirror-vgreduce-removemissing, pv-ext-flags, unlost-pv)
Reading bad/old metadata
------------------------
- "bad metadata": the mda_header or metadata text has invalid fields
or can't be parsed by lvm. This is a form of corruption that would
not be caused by known failure scenarios. A checksum error is
typically included among the errors reported.
- "old metadata": a valid copy of the metadata that has a smaller seqno
than other copies of the metadata. This can happen if the device
failed, or io failed, or lvm failed while committing new metadata
to all the metadata areas. Old metadata on a PV that has been
removed from the VG is the "outdated" case below.
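The two definitions above suggest a simple selection rule for a reader: skip copies that failed to parse or checksum, use the good copy with the highest seqno, and record the lower-seqno good copies as old. The sketch below assumes each mda has already been summarized into a hypothetical struct mda_copy; pick_latest_copy() is an illustrative stand-in, not LVM's _vg_read().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum mda_state { MDA_GOOD, MDA_BAD };

/* Hypothetical summary of one metadata area after the scan. */
struct mda_copy {
	enum mda_state state;  /* MDA_BAD: unparsable text or bad checksum */
	uint32_t seqno;        /* seqno of the VG metadata it holds */
	int is_old;            /* set by pick_latest_copy() */
};

/*
 * Ignore bad copies, pick the good copy with the highest seqno, and mark
 * every good copy with a smaller seqno as old.  Returns the index of the
 * copy to use, or -1 if there is no good copy at all.
 */
static int pick_latest_copy(struct mda_copy *copies, size_t n)
{
	int best = -1;
	size_t i;

	assert(copies != NULL || n == 0);

	for (i = 0; i < n; i++) {
		if (copies[i].state == MDA_BAD)
			continue;  /* bad copies never win, whatever seqno they claim */
		if (best < 0 || copies[i].seqno > copies[best].seqno)
			best = (int) i;
	}

	for (i = 0; i < n; i++)
		copies[i].is_old = (best >= 0 && copies[i].state == MDA_GOOD &&
				    copies[i].seqno < copies[best].seqno);

	return best;
}
```

Returning -1 corresponds to the case described below where a PV with no good copies is treated like a PV with no mdas.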
When a VG has some PVs with bad/old metadata, lvm can simply ignore
the bad/old copies, and use a good copy. This is why there are
multiple copies of the metadata -- so it's available even when some
of the copies cannot be used. The bad/old copies do not have to be
repaired before the VG can be used (the repair can happen later.)
A PV with no good copies of the metadata simply falls back to being
treated like a PV with no mdas; a common and harmless configuration.
When bad/old metadata exists, lvm warns the user about it, and
suggests repairing it using a new metadata repair command.
Bad metadata in particular is something that users will want to
investigate and repair themselves, since it should not happen and
may indicate some other problem that needs to be fixed.
PVs with bad/old metadata are not the same as missing devices.
Missing devices will block various kinds of VG modification or
activation, but bad/old metadata will not.
Previously, lvm would attempt to repair bad/old metadata whenever
it was read. This was unnecessary since lvm does not require every
copy of the metadata to be used. It would also hide potential
problems that should be investigated by the user. It was also
dangerous in cases where the VG was on shared storage. The user
is now allowed to investigate potential problems and decide how
and when to repair them.
Repairing bad/old metadata
--------------------------
When label scan sees bad metadata in an mda, that mda is removed
from the lvmcache info->mdas list. This means that vg_read will
skip it, and not attempt to read/process it again. If it was
the only in-use mda on a PV, that PV is treated like a PV with
no mdas. It also means that vg_write will skip the bad mda,
and not attempt to write new metadata to it. The only way to
repair bad metadata is with the metadata repair command.
When label scan sees old metadata in an mda, that mda is kept
in the lvmcache info->mdas list. This means that vg_read will
read/process it again, and likely see the same mismatch with
the other copies of the metadata. Like label_scan, vg_read
will simply ignore the old copy of the metadata and
use the latest copy. If the command is modifying the vg
(e.g. lvcreate), then vg_write, which writes new metadata to
every mda on info->mdas, will write the new metadata to the
mda that had the old version. If successful, this will resolve
the old metadata problem (without needing to run a metadata
repair command.)
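The different handling of bad and old mdas comes down to list membership: bad mdas are unlinked during the scan, old mdas stay on the list, and a write goes to every mda still on it. In this sketch a plain singly linked list stands in for the lvmcache info->mdas list; struct mda, drop_bad_mdas() and write_all_mdas() are hypothetical simplifications, and "writing" is reduced to updating a seqno field.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical metadata area on an "in use" list. */
struct mda {
	struct mda *next;
	int bad;         /* found to be bad during the scan */
	unsigned seqno;  /* seqno currently stored in this area */
};

/* Scan step: unlink bad mdas so later reads and writes skip them. */
static void drop_bad_mdas(struct mda **head)
{
	assert(head != NULL);

	while (*head) {
		if ((*head)->bad)
			*head = (*head)->next;  /* unlink; a separate record is kept for repair */
		else
			head = &(*head)->next;
	}
}

/* Write step: every mda still listed, old ones included, gets the new metadata. */
static int write_all_mdas(struct mda *head, unsigned new_seqno)
{
	int written = 0;

	for (; head; head = head->next) {
		head->seqno = new_seqno;
		written++;
	}
	return written;
}
```

Because the old mda is still listed, the next successful write brings it up to date, which is why old metadata needs no separate repair command while bad metadata does.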
Outdated PVs
------------
An outdated PV is a PV that has an old copy of VG metadata
that shows it is a member of the VG, but the latest copy of
the VG metadata does not include this PV. This happens if
the PV is disconnected, vgreduce --removemissing is run to
remove the PV from the VG, then the PV is reconnected.
In this case, the outdated PV needs to have its outdated metadata
removed and its PV used flag cleared. This repair is done by the
metadata repair command (vgck --updatemetadata). It is also done
if vgremove is run on the VG.
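The outdated-PV condition above is a set comparison between two copies of the metadata: the PV appears in the old copy it carries, but not in the latest copy. A minimal sketch, assuming PVs are identified by plain id strings (LVM uses its own PV UUIDs); pv_in_list() and pv_is_outdated() are hypothetical helpers.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Return 1 if pv_id appears among the n ids. */
static int pv_in_list(const char *pv_id, const char *const *ids, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (strcmp(ids[i], pv_id) == 0)
			return 1;
	return 0;
}

/*
 * A PV is outdated when the (old) metadata found on it lists it as a VG
 * member, but the latest VG metadata no longer does.
 */
static int pv_is_outdated(const char *pv_id,
			  const char *const *old_members, size_t n_old,
			  const char *const *latest_members, size_t n_latest)
{
	assert(pv_id != NULL);

	return pv_in_list(pv_id, old_members, n_old) &&
	       !pv_in_list(pv_id, latest_members, n_latest);
}
```

For the scenario described (PV disconnected, vgreduce --removemissing, PV reconnected), the reconnected PV's own metadata still lists it, while the latest metadata from the other PVs does not.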
MISSING PVs
-----------
When a device is missing, most commands will refuse to modify
the VG. This is the simple case. More complicated is when
a command is allowed to modify the VG while it is missing a
device.
When a VG is written while a device is missing for one of its PVs,
the VG metadata is written to disk with the MISSING flag on the PV
with the missing device. When the VG is next used, it is treated
as if the PV with the MISSING flag still has a missing device, even
if that device has reappeared.
If all LVs that were using a PV with the MISSING flag are removed
or repaired so that the MISSING PV is no longer used, then the
next time the VG metadata is written, the MISSING flag will be
dropped.
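The rule for dropping the MISSING flag can be read as a check made at write time: a PV keeps the flag while any LV still has extents on it, and loses it otherwise. The sketch below is a simplified model under that reading; the PV_MISSING bit, struct pv_sketch, struct lv_sketch and clear_unused_missing_flags() are hypothetical and do not reflect LVM's actual data structures.

```c
#include <assert.h>
#include <stddef.h>

#define PV_MISSING 0x1u  /* hypothetical flag bit */

struct pv_sketch {
	unsigned flags;
};

struct lv_sketch {
	const struct pv_sketch *const *pvs;  /* PVs this LV has extents on */
	size_t n_pvs;
};

static int pv_used_by_any_lv(const struct pv_sketch *pv,
			     const struct lv_sketch *lvs, size_t n_lvs)
{
	size_t i, j;

	for (i = 0; i < n_lvs; i++)
		for (j = 0; j < lvs[i].n_pvs; j++)
			if (lvs[i].pvs[j] == pv)
				return 1;
	return 0;
}

/* At write time, clear MISSING on PVs no LV uses; returns how many were cleared. */
static int clear_unused_missing_flags(struct pv_sketch *pvs, size_t n_pvs,
				      const struct lv_sketch *lvs, size_t n_lvs)
{
	size_t i;
	int cleared = 0;

	assert(pvs != NULL || n_pvs == 0);

	for (i = 0; i < n_pvs; i++) {
		if (!(pvs[i].flags & PV_MISSING))
			continue;
		if (pv_used_by_any_lv(&pvs[i], lvs, n_lvs))
			continue;  /* still used: the flag must persist */
		pvs[i].flags &= ~PV_MISSING;
		cleared++;
	}
	return cleared;
}
```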
Alternative methods of clearing the MISSING flag are:
vgreduce --removemissing will remove PVs with missing devices,
or PVs with the MISSING flag where the device has reappeared.
vgextend --restoremissing will clear the MISSING flag on PVs
where the device has reappeared, allowing the VG to be used
normally. This must be done with caution since the reappeared
device may have old data that is inconsistent with data on other PVs.
Bad mda repair
--------------
The new command:
vgck --updatemetadata VG
first uses vg_write to repair the basic issues mentioned above
(old metadata, outdated PVs, pv_header flags, MISSING_PV flags).
It then goes further and repairs bad metadata:
. text metadata that has a bad checksum
. text metadata that is not parsable
. corrupt mda_header checksum and version fields
(To keep a clean diff, #if 0 is added around functions that
are replaced by new code. These disabled functions are
removed by the following commit.)
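Detecting a bad checksum or version field, as in the list above, means recomputing the checksum over the covered bytes and comparing it with the stored value. The sketch uses the standard reflected CRC-32 (polynomial 0xEDB88320) and an invented header layout; both are assumptions for illustration, and LVM's real mda_header layout and CRC variant differ. struct hdr_sketch, HDR_VERSION, hdr_checksum() and hdr_is_good() are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Standard reflected CRC-32, bitwise (not LVM's own variant). */
static uint32_t crc32_std(const void *data, size_t len)
{
	const unsigned char *p = data;
	uint32_t crc = 0xFFFFFFFFu;
	size_t i;
	int bit;

	for (i = 0; i < len; i++) {
		crc ^= p[i];
		for (bit = 0; bit < 8; bit++)
			crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
	}
	return crc ^ 0xFFFFFFFFu;
}

#define HDR_VERSION 1u  /* hypothetical expected version */

/* Hypothetical header: the checksum covers every byte after itself. */
struct hdr_sketch {
	uint32_t checksum;
	uint32_t version;
	unsigned char payload[16];
};

static uint32_t hdr_checksum(const struct hdr_sketch *h)
{
	size_t off = offsetof(struct hdr_sketch, version);

	assert(h != NULL);
	return crc32_std((const unsigned char *) h + off, sizeof(*h) - off);
}

/* 1 if both checks pass, 0 if the header would be treated as bad. */
static int hdr_is_good(const struct hdr_sketch *h)
{
	return h->checksum == hdr_checksum(h) && h->version == HDR_VERSION;
}
```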
2019-05-24 12:04:37 -05:00
}
/*
 * Simply a version of vg_read() that automatically sets the READ_FOR_UPDATE
 * flag, which means the caller intends to write the VG after reading it,
 * so vg_read should acquire an exclusive file lock on the vg.
 */
struct volume_group *vg_read_for_update(struct cmd_context *cmd, const char *vg_name,
2019-10-21 12:32:11 +02:00
					const char *vgid, uint32_t vg_read_flags, uint32_t lockd_state)
2019-05-24 12:04:37 -05:00
{
	struct volume_group *vg;
	uint32_t error_flags = 0;
2019-10-21 12:32:11 +02:00

	vg = vg_read(cmd, vg_name, vgid, vg_read_flags | READ_FOR_UPDATE, lockd_state, &error_flags, NULL);
2019-05-24 12:04:37 -05:00

	return vg;
}
pvscan: add options listlvs listvg checkcomplete
pvscan --cache <dev>
. read only dev
. create online file for dev
pvscan --listvg <dev>
. read only dev
. list VG using dev
pvscan --listlvs <dev>
. read only dev
. list VG using dev
. list LVs using dev
pvscan --cache --listvg [--checkcomplete] <dev>
. read only dev
. create online file for dev
. list VG using dev
. [check online files and report if VG is complete]
pvscan --cache --listlvs [--checkcomplete] <dev>
. read only dev
. create online file for dev
. list VG using dev
. list LVs using dev
. [check online files and report if VG is complete]
. [check online files and report if LVs are complete]
[--vgonline]
can be used with --checkcomplete, to enable use of a vg online
file. With it, only the first pvscan command that sees the
complete VG reports 'VG complete'; the others report
'VG finished'. This allows the caller to easily run a single
activation of the VG.
[--udevoutput]
can be used with --cache --listvg --checkcomplete, to enable
an output mode that prints LVM_VG_NAME_COMPLETE='vgname' that
a udev rule can import, and prevents other output from the
command (other output causes udev to ignore the command.)
The list of complete LVs is meant to be passed to lvchange -aay,
or the complete VG used with vgchange -aay.
When --checkcomplete is used, lvm assumes that the output
will be used to trigger event-based autoactivation, so the pvscan
does nothing if event_activation=0 and --checkcomplete is used.
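One way to get the "only the first pvscan reports VG complete" behaviour described for --vgonline is an atomic exclusive create: whichever process creates the file first wins, and every later attempt fails with EEXIST. The sketch assumes POSIX open() with O_CREAT | O_EXCL. The helper names and the one-file-per-VG-name layout are hypothetical, and the actual location and contents of LVM's online files are not shown here.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical layout: one file per VG name inside an "online" directory. */
static int vg_online_path(char *buf, size_t size, const char *dir, const char *vg_name)
{
	int n = snprintf(buf, size, "%s/%s", dir, vg_name);

	return (n < 0 || (size_t) n >= size) ? -1 : 0;
}

/*
 * Returns 1 if this caller created the file (report "VG complete"),
 * 0 if it already existed (report "VG finished"), -1 on other errors.
 * O_CREAT | O_EXCL makes "check and create" a single atomic step, so
 * only one of several racing processes can get 1.
 */
static int claim_vg_online(const char *path)
{
	int fd;

	assert(path != NULL);

	fd = open(path, O_CREAT | O_EXCL | O_WRONLY, S_IRUSR | S_IWUSR);
	if (fd < 0)
		return errno == EEXIST ? 0 : -1;
	close(fd);
	return 1;
}
```

A caller that gets 1 would go on to activate the VG; one that gets 0 leaves activation to the process that already claimed it.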
Example of listlvs
------------------
$ lvs -a vg -olvname,devices
LV Devices
lv_a /dev/loop0(0)
lv_ab /dev/loop0(1),/dev/loop1(1)
lv_abc /dev/loop0(3),/dev/loop1(3),/dev/loop2(1)
lv_b /dev/loop1(0)
lv_c /dev/loop2(0)
$ pvscan --cache --listlvs --checkcomplete /dev/loop0
pvscan[35680] PV /dev/loop0 online, VG vg incomplete (need 2).
VG vg incomplete
LV vg/lv_a complete
LV vg/lv_ab incomplete
LV vg/lv_abc incomplete
$ pvscan --cache --listlvs --checkcomplete /dev/loop1
pvscan[35681] PV /dev/loop1 online, VG vg incomplete (need 1).
VG vg incomplete
LV vg/lv_b complete
LV vg/lv_ab complete
LV vg/lv_abc incomplete
$ pvscan --cache --listlvs --checkcomplete /dev/loop2
pvscan[35682] PV /dev/loop2 online, VG vg is complete.
VG vg complete
LV vg/lv_c complete
LV vg/lv_abc complete
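The complete/incomplete verdicts in the listlvs example follow from two rules: the VG needs every PV online, and an LV is complete once every PV it uses is online. The sketch encodes the example's LVs as PV bitmasks (bit i for /dev/loop<i>, read from the lvs output above) and reproduces those verdicts. It is a conceptual model with hypothetical names, not pvscan's implementation, and report() lists LVs in array order rather than in pvscan's order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#define NPV 3  /* /dev/loop0, /dev/loop1, /dev/loop2 */

struct lv_use {
	const char *name;
	unsigned pv_mask;  /* bit i set: LV has extents on /dev/loop<i> */
};

/* The LVs from the example above. */
static const struct lv_use example_lvs[] = {
	{ "lv_a",   0x1 },
	{ "lv_ab",  0x3 },
	{ "lv_abc", 0x7 },
	{ "lv_b",   0x2 },
	{ "lv_c",   0x4 },
};

/* Number of PVs still needed before the VG is complete. */
static int vg_pvs_needed(unsigned online_mask)
{
	int i, need = 0;

	for (i = 0; i < NPV; i++)
		if (!(online_mask & (1u << i)))
			need++;
	return need;
}

/* An LV is complete once every PV it uses is online. */
static int lv_complete(const struct lv_use *lv, unsigned online_mask)
{
	return (lv->pv_mask & online_mask) == lv->pv_mask;
}

/* Print verdicts for the VG and for the LVs using the PV that just came online. */
static void report(unsigned online_mask, int new_pv)
{
	size_t i;

	assert(new_pv >= 0 && new_pv < NPV);

	printf("VG vg %s\n", vg_pvs_needed(online_mask) ? "incomplete" : "complete");
	for (i = 0; i < sizeof(example_lvs) / sizeof(example_lvs[0]); i++)
		if (example_lvs[i].pv_mask & (1u << new_pv))
			printf("LV vg/%s %s\n", example_lvs[i].name,
			       lv_complete(&example_lvs[i], online_mask) ? "complete" : "incomplete");
}
```

Calling report(0x1, 0), report(0x3, 1) and report(0x7, 2) in turn gives the same verdicts as the three pvscan runs shown.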
Example of listvg
-----------------
$ pvscan --cache --listvg --checkcomplete /dev/loop0
pvscan[35684] PV /dev/loop0 online, VG vg incomplete (need 2).
VG vg incomplete
$ pvscan --cache --listvg --checkcomplete /dev/loop1
pvscan[35685] PV /dev/loop1 online, VG vg incomplete (need 1).
VG vg incomplete
$ pvscan --cache --listvg --checkcomplete /dev/loop2
pvscan[35686] PV /dev/loop2 online, VG vg is complete.
VG vg complete
2020-12-09 10:59:40 -06:00
int get_visible_lvs_using_pv(struct cmd_context *cmd, struct volume_group *vg, struct device *dev,
			     struct dm_list *lvs_list)
{
	struct pv_list *pvl;
	struct lv_list *lvl, *lvl2;
	struct physical_volume *pv = NULL;

	dm_list_iterate_items(pvl, &vg->pvs) {
		if (pvl->pv->dev == dev) {
			pv = pvl->pv;
			break;
		}
	}

	if (!pv)
		return_0;

	dm_list_iterate_items(lvl, &vg->lvs) {
		if (!lv_is_visible(lvl->lv))
			continue;
		if (!lv_is_on_pv(lvl->lv, pv))
			continue;
		if (!(lvl2 = dm_pool_zalloc(cmd->mem, sizeof(*lvl2))))
			return_0;
		lvl2->lv = lvl->lv;
		dm_list_add(lvs_list, &lvl2->list);
	}

	return 1;
}
2022-09-09 16:07:07 -05:00
int lv_is_linear(struct logical_volume *lv)
{
	struct lv_segment *seg = first_seg(lv);

	return segtype_is_linear(seg->segtype);
}

int lv_is_striped(struct logical_volume *lv)
{
	struct lv_segment *seg = first_seg(lv);

	return segtype_is_striped(seg->segtype);
}
2024-08-05 13:20:58 -05:00
int setting_str_list_add(const char *field, uint64_t val, char *val_str, struct dm_list *result, struct dm_pool *mem)
{
	char buf[128];
	char *list_item;

	/* Format the item as "field=value", using val_str if given, else val. */
	if (val_str) {
		if (dm_snprintf(buf, sizeof(buf), "%s=%s", field, val_str) < 0)
			return_0;
	} else {
		if (dm_snprintf(buf, sizeof(buf), "%s=%llu", field, (unsigned long long) val) < 0)
			return_0;
	}

	if (!(list_item = dm_pool_strdup(mem, buf)))
		return_0;

	if (!str_list_add_no_dup_check(mem, result, list_item))
		return_0;

	return 1;
}
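setting_str_list_add() treats a negative dm_snprintf() result as failure. libdm's dm_snprintf() also returns -1 when the output does not fit the buffer, whereas plain snprintf() returns the length that would have been written, so porting this check to standard C needs an explicit truncation test. The sketch shows the same "field=value" formatting in standard C only; checked_snprintf() and format_setting() are hypothetical names, and the field names in the usage are illustrative.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Like snprintf(), but return -1 on error or truncation, as dm_snprintf() does. */
static int checked_snprintf(char *buf, size_t size, const char *fmt, ...)
{
	va_list ap;
	int n;

	va_start(ap, fmt);
	n = vsnprintf(buf, size, fmt, ap);
	va_end(ap);

	if (n < 0 || (size_t) n >= size)
		return -1;
	return n;
}

/* Build "field=value": use val_str when given, otherwise the numeric val. */
static int format_setting(char *buf, size_t size, const char *field,
			  uint64_t val, const char *val_str)
{
	assert(buf != NULL && field != NULL);

	if (val_str)
		return checked_snprintf(buf, size, "%s=%s", field, val_str);
	return checked_snprintf(buf, size, "%s=%llu", field, (unsigned long long) val);
}
```

For example, format_setting(buf, 32, "chunk_size", 128, NULL) writes "chunk_size=128" and returns 14, while the same call with an 8-byte buffer returns -1 instead of silently truncating.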