/*
 * Copyright (C) 2002-2004 Sistina Software, Inc. All rights reserved.
 * Copyright (C) 2004-2014 Red Hat, Inc. All rights reserved.
 *
 * This file is part of LVM2.
 *
 * This copyrighted material is made available to anyone wishing to use,
 * modify, copy, or redistribute it subject to the terms and conditions
 * of the GNU Lesser General Public License v.2.1.
 *
 * You should have received a copy of the GNU Lesser General Public License
 * along with this program; if not, write to the Free Software Foundation,
 * Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
 */

#include "lib.h"
#include "dev_manager.h"
#include "lvm-string.h"
#include "fs.h"
#include "defaults.h"
#include "segtype.h"
#include "display.h"
#include "toolcontext.h"
#include "targets.h"
#include "config.h"
#include "activate.h"
#include "lvm-exec.h"
#include "str_list.h"

#include <limits.h>
#include <dirent.h>

#define MAX_TARGET_PARAMSIZE 50000

#define LVM_UDEV_NOSCAN_FLAG DM_SUBSYSTEM_UDEV_FLAG0

typedef enum {
        PRELOAD,
        ACTIVATE,
        DEACTIVATE,
        SUSPEND,
        SUSPEND_WITH_LOCKFS,
        CLEAN
} action_t;

/* This list must match lib/misc/lvm-string.c:build_dm_uuid(). */
const char *uuid_suffix_list[] = { "pool", "cdata", "cmeta", "tdata", "tmeta", NULL };

struct dlid_list {
        struct dm_list list;
        const char *dlid;
        const struct logical_volume *lv;
};

struct dev_manager {
        struct dm_pool *mem;

        struct cmd_context *cmd;

        void *target_state;
        uint32_t pvmove_mirror_count;
        int flush_required;
        int activation;                 /* building activation tree */
        int suspend;                    /* building suspend tree */
        int skip_external_lv;

        struct dm_list pending_delete;  /* str_list of dlid(s) with pending delete */
        unsigned track_pending_delete;
        unsigned track_pvmove_deps;

        char *vg_name;
};

struct lv_layer {
        const struct logical_volume *lv;
        const char *old_name;
};

int read_only_lv(const struct logical_volume *lv, const struct lv_activate_opts *laopts)
{
        return (laopts->read_only || !(lv->status & LVM_WRITE));
}

/*
 * Low level device-layer operations.
 */

static struct dm_task *_setup_task(const char *name, const char *uuid,
                                   uint32_t *event_nr, int task,
                                   uint32_t major, uint32_t minor,
                                   int with_open_count)
{
        struct dm_task *dmt;

        if (!(dmt = dm_task_create(task)))
                return_NULL;

        if (name && !dm_task_set_name(dmt, name))
                goto_out;

        if (uuid && *uuid && !dm_task_set_uuid(dmt, uuid))
                goto_out;

        if (event_nr && !dm_task_set_event_nr(dmt, *event_nr))
                goto_out;

        if (major && !dm_task_set_major_minor(dmt, major, minor, 1))
                goto_out;

        if (activation_checks() && !dm_task_enable_checks(dmt))
                goto_out;

        if (!with_open_count && !dm_task_no_open_count(dmt))
                log_warn("WARNING: Failed to disable open_count.");

        return dmt;
out:
        dm_task_destroy(dmt);
        return NULL;
}
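
/*
 * Illustrative sketch only - not part of the original file and never
 * called.  It shows the typical pairing of _setup_task() with
 * dm_task_run()/dm_task_get_info() to query a device by its dm uuid,
 * the same pattern _info_run() below uses.  The helper name is
 * hypothetical.
 */
static int __attribute__((unused))
_example_info_by_dlid(const char *dlid, struct dm_info *info)
{
        struct dm_task *dmt;
        int r = 0;

        /* No name, event number or major:minor - lookup is by uuid only. */
        if (!(dmt = _setup_task(NULL, dlid, 0, DM_DEVICE_INFO, 0, 0, 0)))
                return_0;

        if (dm_task_run(dmt) && dm_task_get_info(dmt, info))
                r = 1;

        dm_task_destroy(dmt);
        return r;
}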

static int _get_segment_status_from_target_params(const char *target_name,
                                                  const char *params,
                                                  struct lv_seg_status *seg_status)
{
        struct segment_type *segtype;

        seg_status->type = SEG_STATUS_UNKNOWN;

        /*
         * TODO: Add support for other segment types too!
         * The segment to report status for must be properly
         * selected for all the other types - mainly make sure
         * linear/striped, old snapshots and raids have proper
         * segment selected for status!
         */
        if (strcmp(target_name, "cache") && strcmp(target_name, "thin-pool"))
                return 1;

        if (!(segtype = get_segtype_from_string(seg_status->seg->lv->vg->cmd, target_name)))
                return_0;

        if (segtype != seg_status->seg->segtype) {
                log_error(INTERNAL_ERROR "_get_segment_status_from_target_params: "
                          "segment type %s found does not match expected segment type %s",
                          segtype->name, seg_status->seg->segtype->name);
                return 0;
        }

        if (segtype_is_cache(segtype)) {
                if (!dm_get_status_cache(seg_status->mem, params, &(seg_status->cache)))
                        return_0;
                seg_status->type = SEG_STATUS_CACHE;
        } else if (segtype_is_raid(segtype)) {
                if (!dm_get_status_raid(seg_status->mem, params, &seg_status->raid))
                        return_0;
                seg_status->type = SEG_STATUS_RAID;
        } else if (segtype_is_thin_volume(segtype)) {
                if (!dm_get_status_thin(seg_status->mem, params, &seg_status->thin))
                        return_0;
                seg_status->type = SEG_STATUS_THIN;
        } else if (segtype_is_thin_pool(segtype)) {
                if (!dm_get_status_thin_pool(seg_status->mem, params, &seg_status->thin_pool))
                        return_0;
                seg_status->type = SEG_STATUS_THIN_POOL;
        } else if (segtype_is_snapshot(segtype)) {
                if (!dm_get_status_snapshot(seg_status->mem, params, &seg_status->snapshot))
                        return_0;
                seg_status->type = SEG_STATUS_SNAPSHOT;
        } else {
                log_error(INTERNAL_ERROR "Unsupported segment type %s.", segtype->name);
                return 0;
        }

        return 1;
}
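
/*
 * Minimal usage sketch (not in the original source): feeding a raw
 * "thin-pool" status line through the dispatcher above.  The params
 * string is only a hypothetical illustration of the reported shape
 * (<transaction id> <used meta>/<total meta> <used data>/<total data> ...),
 * and the caller is assumed to have seg_status->seg pointing at the
 * matching thin-pool segment with seg_status->mem initialized.
 */
static int __attribute__((unused))
_example_parse_thin_pool_status(struct lv_seg_status *seg_status)
{
        char params[] = "0 17/4096 44/65536 - rw discard_passdown";

        return _get_segment_status_from_target_params("thin-pool", params, seg_status);
}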

typedef enum {
        INFO,           /* DM_DEVICE_INFO ioctl */
        STATUS,         /* DM_DEVICE_STATUS ioctl */
        MKNODES
} info_type_t;

static int _info_run(info_type_t type, const char *name, const char *dlid,
                     struct dm_info *dminfo, uint32_t *read_ahead,
                     struct lv_seg_status *seg_status,
                     int with_open_count, int with_read_ahead,
                     uint32_t major, uint32_t minor)
{
        int r = 0;
        struct dm_task *dmt;
        int dmtask;
        void *target = NULL;
        uint64_t target_start, target_length;
        char *target_name, *target_params, *params_to_process = NULL;
        uint32_t extent_size;

        switch (type) {
        case INFO:
                dmtask = DM_DEVICE_INFO;
                break;
        case STATUS:
                dmtask = DM_DEVICE_STATUS;
                break;
        case MKNODES:
                dmtask = DM_DEVICE_MKNODES;
                break;
        default:
                log_error(INTERNAL_ERROR "_info_run: unhandled info type");
                return 0;
        }

        if (!(dmt = _setup_task((type == MKNODES) ? name : NULL, dlid, 0, dmtask,
                                major, minor, with_open_count)))
                return_0;

        if (!dm_task_run(dmt))
                goto_out;

        if (!dm_task_get_info(dmt, dminfo))
                goto_out;

        if (with_read_ahead && dminfo->exists) {
                if (!dm_task_get_read_ahead(dmt, read_ahead))
                        goto_out;
        } else if (read_ahead)
                *read_ahead = DM_READ_AHEAD_NONE;

        if (type == STATUS) {
                extent_size = seg_status->seg->lv->vg->extent_size;
                do {
                        target = dm_get_next_target(dmt, target, &target_start,
                                                    &target_length, &target_name, &target_params);
                        if (((uint64_t) seg_status->seg->le * extent_size == target_start) &&
                            ((uint64_t) seg_status->seg->len * extent_size == target_length)) {
                                params_to_process = target_params;
                                break;
                        }
                } while (target);

                if (params_to_process &&
                    !_get_segment_status_from_target_params(target_name, params_to_process, seg_status))
                        goto_out;
        }

        r = 1;
out:
        dm_task_destroy(dmt);
        return r;
}
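
/*
 * Worked example for the STATUS branch above (illustrative numbers):
 * vg->extent_size is kept in 512-byte sectors, so with 4 MiB extents
 * extent_size = 8192.  A segment with le = 10 and len = 5 then starts
 * at sector 10 * 8192 = 81920 and spans 5 * 8192 = 40960 sectors,
 * which is how an LV segment is matched to its kernel target line.
 */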

/*
 * _parse_mirror_status
 * @mirror_status_string
 * @image_health: return for allocated copy of image health characters
 * @log_device: return for 'dev_t' of log device
 * @log_health: NULL if corelog, otherwise dm_malloc'ed log health char which
 *              the caller must free
 *
 * This function takes the mirror status string, breaks it up and returns
 * its components.  For now, we only return the health characters.  This
 * is an internal function.  If there are more things we want to return
 * later, we can do that then.
 *
 * Returns: 1 on success, 0 on failure
 */
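/*
 * Illustrative status line (hypothetical device numbers): a two-image
 * mirror with a disk log might report
 *   "2 253:4 253:5 1024/1024 1 AA 3 disk 253:3 A"
 * i.e. image count, image devices, sync ratio, per-image health
 * characters ("AA"), then the log argument count and log arguments
 * ("disk", its device and its health character).
 */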
static int _parse_mirror_status(char *mirror_status_str,
                                char **images_health,
                                dev_t *log_dev, char **log_health)
{
        int major, minor;
        char *p = NULL;
        char **args, **log_args;
        unsigned num_devs, log_argc;

        *images_health = NULL;
        *log_health = NULL;
        *log_dev = 0;

        if (!dm_split_words(mirror_status_str, 1, 0, &p) ||
            !(num_devs = (unsigned) atoi(p)))
                /* On errors, we must assume the mirror is to be avoided */
                return_0;

        p += strlen(p) + 1;
        args = alloca((num_devs + 5) * sizeof(char *));

        if ((unsigned) dm_split_words(p, num_devs + 4, 0, args) < num_devs + 4)
                return_0;

        log_argc = (unsigned) atoi(args[3 + num_devs]);
        log_args = alloca(log_argc * sizeof(char *));

        if ((unsigned) dm_split_words(args[3 + num_devs] + strlen(args[3 + num_devs]) + 1,
                                      log_argc, 0, log_args) < log_argc)
                return_0;

        if (!strcmp(log_args[0], "disk")) {
                if (!(*log_health = dm_strdup(log_args[2]))) {
                        log_error("Allocation of log string failed.");
                        return 0;
                }
                if (sscanf(log_args[1], "%d:%d", &major, &minor) != 2) {
                        log_error("Failed to parse log's device number from %s.", log_args[1]);
                        goto out;
                }
                *log_dev = MKDEV((dev_t) major, minor);
        }

        if (!(*images_health = dm_strdup(args[2 + num_devs]))) {
                log_error("Allocation of images string failed.");
                goto out;
        }

        return 1;

out:
        dm_free(*log_health);
        *log_health = NULL;
        *log_dev = 0;

        return 0;
}

/*
 * _ignore_blocked_mirror_devices
 * @dev
 * @start
 * @length
 * @mirror_status_str
 *
 * When a DM 'mirror' target is created with 'block_on_error' or
 * 'handle_errors', it will block I/O if there is a device failure
 * until the mirror is reconfigured.  Thus, LVM should never attempt
 * to read labels from a mirror that has a failed device.  (LVM
 * commands are issued to repair mirrors; and if LVM is blocked
 * attempting to read a mirror, a circular dependency would be created.)
 *
 * This function is a slimmed-down version of lib/mirror/mirrored.c:
 * _mirrored_transient_status().
 *
 * If a failed device is detected in the status string, then it must be
 * determined if 'block_on_error' or 'handle_errors' was used when
 * creating the mirror.  This info can only be determined from the mirror
 * table.  The 'dev', 'start', 'length' trio allow us to correlate the
 * 'mirror_status_str' with the correct device table in order to check
 * for blocking.
 *
 * Returns: 1 if mirror should be ignored, 0 if safe to use
 */
static int _ignore_blocked_mirror_devices(struct device *dev,
                                          uint64_t start, uint64_t length,
                                          char *mirror_status_str)
{
        unsigned i, check_for_blocking = 0;
        dev_t log_dev;
        char *images_health, *log_health;
        uint64_t s, l;
        char *p, *params, *target_type = NULL;
        void *next = NULL;
        struct dm_task *dmt = NULL;
        int r = 0;

        if (!_parse_mirror_status(mirror_status_str,
                                  &images_health, &log_dev, &log_health))
                return_0;

        for (i = 0; images_health[i]; i++)
                if (images_health[i] != 'A') {
                        log_debug_activation("%s: Mirror image %d marked as failed",
                                             dev_name(dev), i);
                        check_for_blocking = 1;
                }

        if (!check_for_blocking && log_dev) {
                if (log_health[0] != 'A') {
                        log_debug_activation("%s: Mirror log device marked as failed",
                                             dev_name(dev));
                        check_for_blocking = 1;
                } else {
                        struct device *tmp_dev;
                        char buf[16];

                        if (dm_snprintf(buf, sizeof(buf), "%d:%d",
                                        (int) MAJOR(log_dev),
                                        (int) MINOR(log_dev)) < 0)
                                goto_out;

                        if (!(tmp_dev = dev_create_file(buf, NULL, NULL, 0)))
                                goto_out;

                        tmp_dev->dev = log_dev;
                        if (device_is_usable(tmp_dev, (struct dev_usable_check_params)
                                             { .check_empty = 1,
                                               .check_blocked = 1,
                                               .check_suspended = ignore_suspended_devices(),
                                               .check_error_target = 1,
                                               .check_reserved = 0 }))
                                goto_out;
                }
        }

        if (!check_for_blocking) {
                r = 1;
                goto out;
        }

        /*
         * We avoid another system call if we can, but if a device is
         * dead, we have no choice but to look up the table too.
         */
        if (!(dmt = dm_task_create(DM_DEVICE_TABLE)))
                goto_out;

        if (!dm_task_set_major_minor(dmt, MAJOR(dev->dev), MINOR(dev->dev), 1))
                goto_out;

        if (activation_checks() && !dm_task_enable_checks(dmt))
                goto_out;

        if (!dm_task_run(dmt))
                goto_out;

        do {
                next = dm_get_next_target(dmt, next, &s, &l,
                                          &target_type, &params);

                if ((s == start) && (l == length)) {
                        if (strcmp(target_type, "mirror"))
                                goto_out;

                        if (((p = strstr(params, " block_on_error")) &&
                             (p[15] == '\0' || p[15] == ' ')) ||
                            ((p = strstr(params, " handle_errors")) &&
                             (p[14] == '\0' || p[14] == ' '))) {
                                log_debug_activation("%s: I/O blocked to mirror device",
                                                     dev_name(dev));
                                goto out;
                        }
                }
        } while (next);

        r = 1;
out:
        if (dmt)
                dm_task_destroy(dmt);

        dm_free(log_health);
        dm_free(images_health);

        return r;
}
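As an aside, the 'D' health-character test described in the commit message above is what _ignore_blocked_mirror_devices() (whose tail is shown above) implements. A minimal editor's sketch of the idea - not the lvm2 implementation - assuming the kernel's dm-mirror status layout "2 253:4 253:5 1024/1024 1 AA 3 disk 253:3 A", with one health character per mirror leg and 'D' marking a dead device:

#include <ctype.h>
#include <stdlib.h>

/* Sketch only: return 1 if a dm-mirror STATUS params string reports
 * a dead ('D') leg, per the assumed field layout above. */
static int _mirror_has_dead_leg(const char *params)
{
	char *end;
	const char *p;
	unsigned long nr_legs, i;

	nr_legs = strtoul(params, &end, 10);
	if (end == params || !nr_legs)
		return 0;
	p = end;

	/* Skip one "major:minor" token per leg, then the sync ratio
	 * and the argument count preceding the health characters. */
	for (i = 0; i < nr_legs + 2; i++) {
		while (isspace((unsigned char) *p))
			p++;
		while (*p && !isspace((unsigned char) *p))
			p++;
	}
	while (isspace((unsigned char) *p))
		p++;

	/* p now points at the per-leg health characters. */
	for (i = 0; i < nr_legs && p[i]; i++)
		if (p[i] == 'D')
			return 1;

	return 0;
}

The real check must also consider the mirror log device and the other health states, so treat this purely as orientation.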
2015-06-17 14:37:53 +03:00
static int _device_is_suspended ( int major , int minor )
{
struct dm_task * dmt ;
struct dm_info info ;
int r = 0 ;
if ( ! ( dmt = dm_task_create ( DM_DEVICE_INFO ) ) )
return 0 ;
if ( ! dm_task_set_major_minor ( dmt , major , minor , 1 ) )
goto_out ;
if ( activation_checks ( ) & & ! dm_task_enable_checks ( dmt ) )
goto_out ;
if ( ! dm_task_run ( dmt ) | |
! dm_task_get_info ( dmt , & info ) ) {
log_error ( " Failed to get info for device %d:%d " , major , minor ) ;
goto out ;
}
r = info . exists & & info . suspended ;
out :
dm_task_destroy ( dmt ) ;
return r ;
}
static int _ignore_suspended_snapshot_component ( struct device * dev )
{
struct dm_task * dmt ;
void * next = NULL ;
char * params , * target_type = NULL ;
uint64_t start , length ;
int major1 , minor1 , major2 , minor2 ;
int r = 0 ;
if ( ! ( dmt = dm_task_create ( DM_DEVICE_TABLE ) ) )
return_0 ;
if ( ! dm_task_set_major_minor ( dmt , MAJOR ( dev - > dev ) , MINOR ( dev - > dev ) , 1 ) )
goto_out ;
if ( activation_checks ( ) & & ! dm_task_enable_checks ( dmt ) )
goto_out ;
if ( ! dm_task_run ( dmt ) ) {
log_error ( " Failed to get state of snapshot or snapshot origin device " ) ;
goto out ;
}
do {
next = dm_get_next_target ( dmt , next , & start , & length , & target_type , & params ) ;
if ( ! strcmp ( target_type , " snapshot " ) ) {
if ( sscanf ( params , " %d:%d %d:%d " , & major1 , & minor1 , & major2 , & minor2 ) ! = 4 ) {
log_error ( " Incorrect snapshot table found " ) ;
goto_out ;
}
2015-06-17 15:12:18 +03:00
r = r | | _device_is_suspended ( major1 , minor1 ) | | _device_is_suspended ( major2 , minor2 ) ;
2015-06-17 14:37:53 +03:00
} else if ( ! strcmp ( target_type , " snapshot-origin " ) ) {
if ( sscanf ( params , " %d:%d " , & major1 , & minor1 ) ! = 2 ) {
log_error ( " Incorrect snapshot-origin table found " ) ;
goto_out ;
}
2015-06-17 15:12:18 +03:00
r = r | | _device_is_suspended ( major1 , minor1 ) ;
2015-06-17 14:37:53 +03:00
}
} while ( next ) ;
out :
dm_task_destroy ( dmt ) ;
return r ;
}
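For reference: a dm "snapshot" table line carries params like "253:0 253:1 P 8" - origin device, COW device, persistence flag, chunk size - while "snapshot-origin" carries just the origin device. That is why the sscanf calls above consume two major:minor pairs and one pair respectively before handing each referenced device to _device_is_suspended().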
2015-09-02 17:13:31 +03:00
static int _ignore_unusable_thins ( struct device * dev )
{
/* TODO make function for thin testing */
struct dm_pool * mem ;
struct dm_status_thin_pool * status ;
struct dm_task * dmt = NULL ;
void * next = NULL ;
uint64_t start , length ;
char * target_type = NULL ;
char * params ;
int minor , major ;
int r = 0 ;
if ( ! ( mem = dm_pool_create ( " unusable_thins " , 128 ) ) )
return_0 ;
if ( ! ( dmt = dm_task_create ( DM_DEVICE_TABLE ) ) )
goto_out ;
if ( ! dm_task_no_open_count ( dmt ) )
goto_out ;
if ( ! dm_task_set_major_minor ( dmt , MAJOR ( dev - > dev ) , MINOR ( dev - > dev ) , 1 ) )
goto_out ;
if ( ! dm_task_run ( dmt ) ) {
log_error ( " Failed to get state of mapped device. " ) ;
goto out ;
}
dm_get_next_target ( dmt , next , & start , & length , & target_type , & params ) ;
if ( sscanf ( params , " %d:%d " , & major , & minor ) ! = 2 ) {
log_error ( " Failed to get thin-pool major:minor for thin device %d:%d. " ,
( int ) MAJOR ( dev - > dev ) , ( int ) MINOR ( dev - > dev ) ) ;
goto out ;
}
dm_task_destroy ( dmt ) ;
if ( ! ( dmt = dm_task_create ( DM_DEVICE_STATUS ) ) )
goto_out ;
if ( ! dm_task_no_flush ( dmt ) )
log_warn ( " Can't set no_flush. " ) ;
if ( ! dm_task_no_open_count ( dmt ) )
goto_out ;
if ( ! dm_task_set_major_minor ( dmt , major , minor , 1 ) )
goto_out ;
if ( ! dm_task_run ( dmt ) ) {
log_error ( " Failed to get state of mapped device. " ) ;
goto out ;
}
dm_get_next_target ( dmt , next , & start , & length , & target_type , & params ) ;
if ( ! dm_get_status_thin_pool ( mem , params , & status ) )
goto_out ;
if ( status - > read_only | | status - > out_of_data_space ) {
log_warn ( " WARNING: %s: Thin's thin-pool needs inspection. " ,
dev_name ( dev ) ) ;
goto out ;
}
r = 1 ;
out :
if ( dmt )
dm_task_destroy ( dmt ) ;
dm_pool_destroy ( mem ) ;
return r ;
}
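The function above therefore issues two queries: a TABLE ioctl on the thin device itself, whose params begin "<pool_major>:<pool_minor> <device_id>" - e.g. "253:2 1" - and a no-flush STATUS ioctl on that pool, from which dm_get_status_thin_pool() fills the read_only and out_of_data_space flags tested here.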
2013-08-08 02:42:26 +04:00
/*
2014-09-23 14:47:11 +04:00
* device_is_usable
2013-08-08 02:42:26 +04:00
* @ dev
* @ check_lv_names
*
* A device is considered not usable if it is :
* 1 ) An empty device ( no targets )
* 2 ) A blocked mirror ( i . e . a mirror with a failure and block_on_error set )
* 3 ) ignore_suspended_devices is set and
* a ) the device is suspended
* b ) it is a snapshot origin
* 4 ) an error target
* 5 ) the LV name is a reserved name .
*
* Returns : 1 if usable , 0 otherwise
*/
2014-09-23 14:47:11 +04:00
int device_is_usable ( struct device * dev , struct dev_usable_check_params check )
2007-01-26 00:22:30 +03:00
{
struct dm_task * dmt ;
struct dm_info info ;
2010-05-13 22:38:38 +04:00
const char * name , * uuid ;
2008-01-30 17:00:02 +03:00
uint64_t start , length ;
char * target_type = NULL ;
2010-05-14 16:03:32 +04:00
char * params , * vgname = NULL , * lvname , * layer ;
2007-01-26 02:03:48 +03:00
void * next = NULL ;
2010-10-25 14:37:34 +04:00
int only_error_target = 1 ;
2007-01-26 00:22:30 +03:00
int r = 0 ;
2011-03-18 15:17:57 +03:00
if ( ! ( dmt = dm_task_create ( DM_DEVICE_STATUS ) ) )
return_0 ;
2007-01-26 00:22:30 +03:00
2010-05-13 22:38:38 +04:00
if ( ! dm_task_set_major_minor ( dmt , MAJOR ( dev - > dev ) , MINOR ( dev - > dev ) , 1 ) )
2007-01-26 00:22:30 +03:00
goto_out ;
2011-07-01 18:09:19 +04:00
if ( activation_checks ( ) & & ! dm_task_enable_checks ( dmt ) )
goto_out ;
2007-01-26 00:22:30 +03:00
if ( ! dm_task_run ( dmt ) ) {
log_error ( " Failed to get state of mapped device " ) ;
goto out ;
}
if ( ! dm_task_get_info ( dmt , & info ) )
goto_out ;
2010-08-09 18:05:16 +04:00
if ( ! info . exists )
2007-01-26 00:22:30 +03:00
goto out ;
2007-01-26 02:03:48 +03:00
name = dm_task_get_name ( dmt ) ;
2010-05-13 22:38:38 +04:00
uuid = dm_task_get_uuid ( dmt ) ;
2007-01-26 02:03:48 +03:00
2014-09-23 14:47:11 +04:00
if ( check . check_empty & & ! info . target_count ) {
2013-01-08 02:30:29 +04:00
log_debug_activation ( " %s: Empty device %s not usable. " , dev_name ( dev ) , name ) ;
2010-10-25 14:37:34 +04:00
goto out ;
}
2014-09-23 14:47:11 +04:00
if ( check . check_suspended & & info . suspended ) {
2013-01-08 02:30:29 +04:00
log_debug_activation ( " %s: Suspended device %s not usable. " , dev_name ( dev ) , name ) ;
2010-08-09 18:05:16 +04:00
goto out ;
}
2014-03-12 22:38:34 +04:00
/* Check internal lvm devices */
2014-09-23 14:47:11 +04:00
if ( check . check_reserved & &
2014-03-12 22:38:34 +04:00
uuid & & ! strncmp ( uuid , UUID_PREFIX , sizeof ( UUID_PREFIX ) - 1 ) ) {
if ( strlen ( uuid ) > ( sizeof ( UUID_PREFIX ) + 2 * ID_LEN ) ) { /* 68 */
log_debug_activation ( " %s: Reserved uuid %s on internal LV device %s not usable. " ,
dev_name ( dev ) , uuid , name ) ;
goto out ;
}
if ( ! ( vgname = dm_strdup ( name ) ) | |
! dm_split_lvm_name ( NULL , NULL , & vgname , & lvname , & layer ) )
goto_out ;
/* FIXME: fails to handle dev aliases i.e. /dev/dm-5, replace with UUID suffix */
if ( lvname & & ( is_reserved_lvname ( lvname ) | | * layer ) ) {
log_debug_activation ( " %s: Reserved internal LV device %s/%s%s%s not usable. " ,
dev_name ( dev ) , vgname , lvname , * layer ? " - " : " " , layer ) ;
goto out ;
}
}
2012-10-24 08:10:33 +04:00
/* FIXME Also check for mpath no paths */
2008-01-30 17:00:02 +03:00
do {
next = dm_get_next_target ( dmt , next , & start , & length ,
& target_type , & params ) ;
2012-10-24 08:10:33 +04:00
2014-09-23 14:47:11 +04:00
if ( check . check_blocked & & target_type & & ! strcmp ( target_type , " mirror " ) ) {
Mirror: Fix hangs and lock-ups caused by attempting label reads of mirrors
There is a problem with the way mirrors have been designed to handle
failures that is resulting in stuck LVM processes and hung I/O. When
mirrors encounter a write failure, they block I/O and notify userspace
to reconfigure the mirror to remove failed devices. This process is
open to a couple races:
1) Any LVM process other than the one that is meant to deal with the
mirror failure can attempt to read the mirror, fail, and block other
LVM commands (including the repair command) from proceeding due to
holding a lock on the volume group.
2) If there are multiple mirrors that suffer a failure in the same
volume group, a repair can block while attempting to read the LVM
label from one mirror while trying to repair the other.
Mitigation of these races has been attempted by disallowing label reading
of mirrors that are either suspended or are indicated as blocking by
the kernel. While this has closed the window of opportunity for hitting
the above problems considerably, it hasn't closed it completely. This is
because it is still possible to start an LVM command, read the status of
the mirror as healthy, and then perform the read for the label at the
moment after the failure is discovered by the kernel.
I can see two solutions to this problem:
1) Allow users to configure whether mirrors can be candidates for LVM
labels (i.e. whether PVs can be created on mirror LVs). If the user
chooses to allow label scanning of mirror LVs, it will be at the expense
of a possible hang in I/O or LVM processes.
2) Instrument a way to allow asynchronous label reading - allowing
blocked label reads to be ignored while continuing to process the LVM
command. This would allow LVM commands to continue even
though they would have otherwise blocked trying to read a mirror. They
can then release their lock and allow a repair command to commence. In
the event of #2 above, the repair command already in progress can continue
and repair the failed mirror.
This patch brings solution #1. If solution #2 is developed later on, the
configuration option created in #1 can be negated - allowing mirrors to
be scanned for labels by default once again.
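Editor's note: the option referred to as solution #1 is read below through ignore_lvm_mirrors(). As an illustration - the section placement is inferred from the accessor name and lvm.conf conventions, so treat it as an assumption - the knob would appear in lvm.conf as:

devices {
	# Skip label scanning of LVM mirror LVs; avoids the hangs
	# described above at the cost of not allowing PVs on mirrors.
	ignore_lvm_mirrors = 1
}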
2013-10-23 04:14:33 +04:00
if ( ignore_lvm_mirrors ( ) ) {
log_debug_activation ( " %s: Scanning mirror devices is disabled. " , dev_name ( dev ) ) ;
goto out ;
}
if ( ! _ignore_blocked_mirror_devices ( dev , start ,
length , params ) ) {
log_debug_activation ( " %s: Mirror device %s not usable. " ,
dev_name ( dev ) , name ) ;
goto out ;
}
2010-05-13 22:38:38 +04:00
}
2010-08-26 18:21:50 +04:00
/*
2015-01-09 13:24:16 +03:00
* FIXME : Snapshot origin could be sitting on top of a mirror
* which could be blocking I / O . We should add a check for the
* stack here and see if there ' s blocked mirror underneath .
* Currently , mirrors used as origins or snapshots are not
* supported anymore and in general using mirrors in a stack
* is disabled by default ( with a warning that if enabled ,
* it could cause various deadlocks ) .
2015-06-17 14:37:53 +03:00
* A similar situation can happen with RAID devices where
* a RAID device can be snapshotted .
* If one of the RAID legs is down and we ' re doing
* lvconvert - - repair , there ' s a time period in which
* snapshot components are ( besides other devs ) suspended .
* See also https : //bugzilla.redhat.com/show_bug.cgi?id=1219222
* for an example where this causes problems .
*
* This is a quick check for now , but replace it with a more
* robust check that inspects the stack correctly - not just
* snapshots but any possible combination in a stack - using
* a proper dm tree instead .
2010-08-26 18:21:50 +04:00
*/
2015-06-17 15:27:48 +03:00
if ( check . check_suspended & & target_type & &
2015-06-17 14:37:53 +03:00
( ! strcmp ( target_type , " snapshot " ) | | ! strcmp ( target_type , " snapshot-origin " ) ) & &
_ignore_suspended_snapshot_component ( dev ) ) {
log_debug_activation ( " %s: %s device %s not usable. " , dev_name ( dev ) , target_type , name ) ;
2010-08-26 18:21:50 +04:00
goto out ;
2015-06-17 14:37:53 +03:00
}
2010-10-24 21:36:58 +04:00
2015-09-02 17:13:31 +03:00
/* TODO: extend check struct ? */
if ( target_type & & ! strcmp ( target_type , " thin " ) & &
! _ignore_unusable_thins ( dev ) ) {
log_debug_activation ( " %s: %s device %s not usable. " , dev_name ( dev ) , target_type , name ) ;
goto out ;
}
2010-10-25 14:37:34 +04:00
if ( target_type & & strcmp ( target_type , " error " ) )
only_error_target = 0 ;
2008-01-30 17:00:02 +03:00
} while ( next ) ;
2007-01-26 00:22:30 +03:00
2010-10-25 14:37:34 +04:00
/* Skip devices consisting entirely of error targets. */
/* FIXME Deal with device stacked above error targets? */
2014-09-23 14:47:11 +04:00
if ( check . check_error_target & & only_error_target ) {
2013-01-08 02:30:29 +04:00
log_debug_activation ( " %s: Error device %s not usable. " ,
dev_name ( dev ) , name ) ;
2010-10-25 14:37:34 +04:00
goto out ;
}
2007-01-26 00:22:30 +03:00
/* FIXME Also check dependencies? */
r = 1 ;
out :
2010-05-14 16:03:32 +04:00
dm_free ( vgname ) ;
2007-01-26 00:22:30 +03:00
dm_task_destroy ( dmt ) ;
return r ;
}
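A caller-side sketch (editor's illustration, not lvm2 code) of how a label-scan path might drive the function above; the field names mirror those tested in device_is_usable(), and ignore_suspended_devices() is assumed to be the existing configuration accessor:

	struct dev_usable_check_params cp = {
		.check_empty = 1,
		.check_blocked = 1,
		.check_suspended = ignore_suspended_devices(),
		.check_error_target = 1,
		.check_reserved = 1,
	};

	if (!device_is_usable(dev, cp))
		log_debug_activation("%s: skipped by usability checks.",
				     dev_name(dev));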
2010-02-24 23:00:56 +03:00
static int _info ( const char * dlid , int with_open_count , int with_read_ahead ,
2014-11-04 17:00:32 +03:00
struct dm_info * dminfo , uint32_t * read_ahead ,
struct lv_seg_status * seg_status )
2002-03-25 21:54:59 +03:00
{
2010-01-26 10:58:23 +03:00
int r = 0 ;
2014-07-31 00:55:11 +04:00
char old_style_dlid [ sizeof ( UUID_PREFIX ) + 2 * ID_LEN ] ;
const char * suffix , * suffix_position ;
unsigned i = 0 ;
2010-01-26 10:58:23 +03:00
2014-07-31 00:55:11 +04:00
/* Check for dlid */
2014-11-04 17:00:32 +03:00
if ( ( r = _info_run ( seg_status ? STATUS : INFO , NULL , dlid , dminfo , read_ahead ,
seg_status , with_open_count , with_read_ahead , 0 , 0 ) ) & & dminfo - > exists )
2010-02-24 23:00:56 +03:00
return 1 ;
2014-07-31 00:55:11 +04:00
/* Check for original version of dlid before the suffixes got added in 2.02.106 */
if ( ( suffix_position = rindex ( dlid , ' - ' ) ) ) {
while ( ( suffix = uuid_suffix_list [ i + + ] ) ) {
if ( strcmp ( suffix_position + 1 , suffix ) )
continue ;
( void ) strncpy ( old_style_dlid , dlid , sizeof ( old_style_dlid ) ) ;
old_style_dlid [ sizeof ( old_style_dlid ) - 1 ] = ' \0 ' ;
2014-11-04 17:00:32 +03:00
if ( ( r = _info_run ( seg_status ? STATUS : INFO , NULL , old_style_dlid , dminfo ,
read_ahead , seg_status , with_open_count ,
with_read_ahead , 0 , 0 ) ) & & dminfo - > exists )
2014-07-31 00:55:11 +04:00
return 1 ;
}
}
/* Check for dlid before UUID_PREFIX was added */
2014-11-04 17:00:32 +03:00
if ( ( r = _info_run ( seg_status ? STATUS : INFO , NULL , dlid + sizeof ( UUID_PREFIX ) - 1 ,
dminfo , read_ahead , seg_status , with_open_count ,
with_read_ahead , 0 , 0 ) ) & & dminfo - > exists )
2010-02-24 23:00:56 +03:00
return 1 ;
2002-03-25 21:54:59 +03:00
2010-01-26 10:58:23 +03:00
return r ;
2002-03-25 21:54:59 +03:00
}
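The three lookups above walk the historical dlid forms from newest to oldest: the current form such as "LVM-<vgid><lvid>-tpool", the pre-2.02.106 form without the suffix, and the original form lacking the "LVM-" prefix; the first lookup that reports an existing device wins.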
2008-12-19 18:23:03 +03:00
static int _info_by_dev ( uint32_t major , uint32_t minor , struct dm_info * info )
{
2014-11-04 17:00:32 +03:00
return _info_run ( INFO , NULL , NULL , info , NULL , 0 , 0 , 0 , major , minor ) ;
2008-12-19 18:23:03 +03:00
}
2010-02-24 23:00:56 +03:00
int dev_manager_info ( struct dm_pool * mem , const struct logical_volume * lv ,
2010-08-17 20:25:32 +04:00
const char * layer ,
2007-11-12 23:51:54 +03:00
int with_open_count , int with_read_ahead ,
2014-11-04 17:00:32 +03:00
struct dm_info * dminfo , uint32_t * read_ahead ,
struct lv_seg_status * seg_status )
2005-10-17 22:00:02 +04:00
{
2010-08-03 17:13:01 +04:00
char * dlid , * name ;
2010-02-24 23:00:56 +03:00
int r ;
2011-08-30 18:55:15 +04:00
if ( ! ( name = dm_build_dm_name ( mem , lv - > vg - > name , lv - > name , layer ) ) ) {
2010-02-24 23:00:56 +03:00
log_error ( " name build failed for %s " , lv - > name ) ;
return 0 ;
}
2005-10-19 17:59:18 +04:00
2014-03-11 20:13:47 +04:00
if ( ! ( dlid = build_dm_uuid ( mem , lv , layer ) ) ) {
2010-08-17 20:25:32 +04:00
log_error ( " dlid build failed for %s " , name ) ;
2013-09-20 00:23:43 +04:00
r = 0 ;
goto out ;
2005-10-19 17:59:18 +04:00
}
2013-01-08 02:30:29 +04:00
log_debug_activation ( " Getting device info for %s [%s] " , name , dlid ) ;
2014-11-04 17:00:32 +03:00
r = _info ( dlid , with_open_count , with_read_ahead ,
dminfo , read_ahead , seg_status ) ;
2013-09-20 00:23:43 +04:00
out :
2010-08-03 17:13:01 +04:00
dm_pool_free ( mem , name ) ;
2013-09-20 00:23:43 +04:00
2010-02-24 23:00:56 +03:00
return r ;
2005-10-17 22:00:02 +04:00
}
2014-11-02 22:59:57 +03:00
static const struct dm_info * _cached_dm_info ( struct dm_pool * mem ,
struct dm_tree * dtree ,
const struct logical_volume * lv ,
const char * layer )
2010-01-22 18:40:31 +03:00
{
2013-07-14 17:08:26 +04:00
char * dlid ;
const struct dm_tree_node * dnode ;
const struct dm_info * dinfo = NULL ;
2010-01-22 18:40:31 +03:00
2014-03-11 20:13:47 +04:00
if ( ! ( dlid = build_dm_uuid ( mem , lv , layer ) ) ) {
2013-07-14 17:08:26 +04:00
log_error ( " Failed to build dlid for %s. " , lv - > name ) ;
2010-01-22 18:40:31 +03:00
return NULL ;
}
2010-02-09 02:28:06 +03:00
if ( ! ( dnode = dm_tree_find_node_by_uuid ( dtree , dlid ) ) )
2014-11-03 14:52:24 +03:00
goto_out ;
2010-01-22 18:40:31 +03:00
if ( ! ( dinfo = dm_tree_node_get_info ( dnode ) ) ) {
2013-07-14 17:08:26 +04:00
log_error ( " Failed to get info from tree node for %s. " , lv - > name ) ;
goto out ;
2010-01-22 18:40:31 +03:00
}
if ( ! dinfo - > exists )
2013-07-14 17:08:26 +04:00
dinfo = NULL ;
out :
dm_pool_free ( mem , dlid ) ;
2010-01-22 18:40:31 +03:00
return dinfo ;
}
2010-12-22 18:32:15 +03:00
#if 0
2002-05-22 18:03:45 +04:00
/* FIXME Interface must cope with multiple targets */
2002-05-10 20:06:06 +04:00
static int _status_run ( const char * name , const char * uuid ,
unsigned long long * s , unsigned long long * l ,
char * * t , uint32_t t_size , char * * p , uint32_t p_size )
2002-05-10 01:17:57 +04:00
{
int r = 0 ;
struct dm_task * dmt ;
2005-10-26 21:56:31 +04:00
struct dm_info info ;
2002-05-10 01:17:57 +04:00
void * next = NULL ;
2002-06-07 12:37:07 +04:00
uint64_t start , length ;
2002-05-10 01:17:57 +04:00
char * type = NULL ;
2002-05-10 20:06:06 +04:00
char * params = NULL ;
2002-05-10 01:17:57 +04:00
2014-11-02 19:22:32 +03:00
if ( ! ( dmt = _setup_task ( name , uuid , 0 , DM_DEVICE_STATUS , 0 , 0 , 0 ) ) )
2008-01-30 16:19:47 +03:00
return_0 ;
2002-05-10 01:17:57 +04:00
2005-11-09 01:52:26 +03:00
if ( ! dm_task_run ( dmt ) )
goto_out ;
2002-05-10 01:17:57 +04:00
2005-11-09 01:52:26 +03:00
if ( ! dm_task_get_info ( dmt , & info ) | | ! info . exists )
goto_out ;
2005-10-26 21:56:31 +04:00
2002-05-10 01:17:57 +04:00
do {
next = dm_get_next_target ( dmt , next , & start , & length ,
& type , & params ) ;
2002-05-22 18:03:45 +04:00
if ( type ) {
2002-05-10 20:06:06 +04:00
* s = start ;
* l = length ;
/* Make sure things are null terminated */
strncpy ( * t , type , t_size ) ;
2002-05-22 18:03:45 +04:00
( * t ) [ t_size - 1 ] = ' \0 ' ;
2002-05-10 20:06:06 +04:00
strncpy ( * p , params , p_size ) ;
2002-05-22 18:03:45 +04:00
( * p ) [ p_size - 1 ] = ' \0 ' ;
2002-05-10 20:06:06 +04:00
r = 1 ;
2002-05-22 18:03:45 +04:00
/* FIXME Cope with multiple targets! */
2002-05-10 20:06:06 +04:00
break ;
2002-05-10 01:17:57 +04:00
}
2002-05-22 18:03:45 +04:00
} while ( next ) ;
2002-05-10 01:17:57 +04:00
out :
dm_task_destroy ( dmt ) ;
return r ;
}
2003-04-25 02:09:13 +04:00
static int _status ( const char * name , const char * uuid ,
unsigned long long * start , unsigned long long * length ,
char * * type , uint32_t type_size , char * * params ,
uint32_t param_size ) __attribute__ ( ( unused ) ) ;
2002-05-10 20:06:06 +04:00
static int _status ( const char * name , const char * uuid ,
unsigned long long * start , unsigned long long * length ,
char * * type , uint32_t type_size , char * * params ,
uint32_t param_size )
2002-05-10 01:17:57 +04:00
{
2005-10-26 19:00:51 +04:00
if ( uuid & & * uuid ) {
if ( _status_run ( NULL , uuid , start , length , type ,
type_size , params , param_size ) & &
* params )
return 1 ;
2005-10-26 21:56:31 +04:00
else if ( _status_run ( NULL , uuid + sizeof ( UUID_PREFIX ) - 1 , start ,
2005-10-26 19:00:51 +04:00
length , type , type_size , params ,
param_size ) & &
* params )
return 1 ;
}
2002-05-10 01:17:57 +04:00
2002-05-10 20:06:06 +04:00
if ( name & & _status_run ( name , NULL , start , length , type , type_size ,
2002-05-22 18:03:45 +04:00
params , param_size ) )
2002-05-10 01:17:57 +04:00
return 1 ;
2002-05-22 18:03:45 +04:00
2002-05-10 01:17:57 +04:00
return 0 ;
}
2010-12-22 18:32:15 +03:00
# endif
2002-05-10 01:17:57 +04:00
2014-09-22 17:50:07 +04:00
int lv_has_target_type ( struct dm_pool * mem , const struct logical_volume * lv ,
2010-04-23 06:57:39 +04:00
const char * layer , const char * target_type )
2010-01-13 04:54:34 +03:00
{
int r = 0 ;
char * dlid ;
struct dm_task * dmt ;
struct dm_info info ;
void * next = NULL ;
uint64_t start , length ;
char * type = NULL ;
char * params = NULL ;
2014-03-11 20:13:47 +04:00
if ( ! ( dlid = build_dm_uuid ( mem , lv , layer ) ) )
2010-01-13 04:54:34 +03:00
return_0 ;
2014-11-02 19:22:32 +03:00
if ( ! ( dmt = _setup_task ( NULL , dlid , 0 , DM_DEVICE_STATUS , 0 , 0 , 0 ) ) )
2011-11-18 23:42:03 +04:00
goto_bad ;
2010-01-13 04:54:34 +03:00
if ( ! dm_task_run ( dmt ) )
goto_out ;
if ( ! dm_task_get_info ( dmt , & info ) | | ! info . exists )
goto_out ;
do {
next = dm_get_next_target ( dmt , next , & start , & length ,
& type , & params ) ;
if ( type & & strncmp ( type , target_type ,
strlen ( target_type ) ) = = 0 ) {
2010-01-22 16:28:54 +03:00
if ( info . live_table )
2010-01-13 04:54:34 +03:00
r = 1 ;
break ;
}
} while ( next ) ;
2011-11-18 23:42:03 +04:00
out :
2010-01-13 04:54:34 +03:00
dm_task_destroy ( dmt ) ;
2011-11-18 23:42:03 +04:00
bad :
dm_pool_free ( mem , dlid ) ;
2010-01-13 04:54:34 +03:00
return r ;
}
2011-11-29 00:37:51 +04:00
int add_linear_area_to_dtree ( struct dm_tree_node * node , uint64_t size , uint32_t extent_size , int use_linear_target , const char * vgname , const char * lvname )
{
uint32_t page_size ;
/*
* Use striped or linear target ?
*/
if ( ! use_linear_target ) {
page_size = lvm_getpagesize ( ) > > SECTOR_SHIFT ;
/*
* We ' ll use the extent size as the stripe size .
* Extent size and page size are always powers of 2.
* The striped target requires that the stripe size is
* divisible by the page size .
*/
if ( extent_size > = page_size ) {
/* Use striped target */
if ( ! dm_tree_node_add_striped_target ( node , size , extent_size ) )
return_0 ;
return 1 ;
} else
/* Some exotic cases are unsupported by striped. */
log_warn ( " WARNING: Using linear target for %s/%s: Striped requires extent size (% " PRIu32 " sectors) >= page size (% " PRIu32 " ). " ,
vgname , lvname , extent_size , page_size ) ;
}
/*
* Use linear target .
*/
if ( ! dm_tree_node_add_linear_target ( node , size ) )
return_0 ;
return 1 ;
}
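Worked example: with the default 4 MiB extents - 8192 sectors - and 4 KiB pages - 8 sectors - the extent size easily clears the page size, so the striped target is used; only page sizes larger than the extent size trigger the linear fallback warned about above.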
2014-06-09 14:08:27 +04:00
static dm_percent_range_t _combine_percent ( dm_percent_t a , dm_percent_t b ,
uint32_t numerator , uint32_t denominator )
2009-10-01 04:35:29 +04:00
{
2014-06-09 14:08:27 +04:00
if ( a = = LVM_PERCENT_MERGE_FAILED | | b = = LVM_PERCENT_MERGE_FAILED )
return LVM_PERCENT_MERGE_FAILED ;
2012-01-21 02:02:04 +04:00
2014-06-09 14:08:27 +04:00
if ( a = = DM_PERCENT_INVALID | | b = = DM_PERCENT_INVALID )
return DM_PERCENT_INVALID ;
2009-10-01 04:35:29 +04:00
2014-06-09 14:08:27 +04:00
if ( a = = DM_PERCENT_100 & & b = = DM_PERCENT_100 )
return DM_PERCENT_100 ;
2009-10-01 04:35:29 +04:00
2014-06-09 14:08:27 +04:00
if ( a = = DM_PERCENT_0 & & b = = DM_PERCENT_0 )
return DM_PERCENT_0 ;
2009-10-01 04:35:29 +04:00
2014-06-09 14:08:27 +04:00
return ( dm_percent_range_t ) dm_make_percent ( numerator , denominator ) ;
2009-10-01 04:35:29 +04:00
}
2003-05-06 16:00:29 +04:00
static int _percent_run ( struct dev_manager * dm , const char * name ,
2005-10-19 17:59:18 +04:00
const char * dlid ,
2003-05-06 16:00:29 +04:00
const char * target_type , int wait ,
2014-06-09 14:08:27 +04:00
const struct logical_volume * lv , dm_percent_t * overall_percent ,
2010-02-10 17:38:24 +03:00
uint32_t * event_nr , int fail_if_percent_unsupported )
2003-04-25 02:09:13 +04:00
{
int r = 0 ;
struct dm_task * dmt ;
2003-04-30 19:26:25 +04:00
struct dm_info info ;
2003-04-25 02:09:13 +04:00
void * next = NULL ;
uint64_t start , length ;
char * type = NULL ;
char * params = NULL ;
2011-12-21 16:59:22 +04:00
const struct dm_list * segh = lv ? & lv - > segments : NULL ;
2003-05-06 16:00:29 +04:00
struct lv_segment * seg = NULL ;
2004-05-05 01:25:57 +04:00
struct segment_type * segtype ;
2009-10-01 04:35:29 +04:00
int first_time = 1 ;
2014-06-09 14:08:27 +04:00
dm_percent_t percent = DM_PERCENT_INVALID ;
2003-04-25 02:09:13 +04:00
uint64_t total_numerator = 0 , total_denominator = 0 ;
2012-06-20 14:05:00 +04:00
* overall_percent = percent ;
2003-04-25 02:09:13 +04:00
2005-10-19 17:59:18 +04:00
if ( ! ( dmt = _setup_task ( name , dlid , event_nr ,
2014-11-02 19:22:32 +03:00
wait ? DM_DEVICE_WAITEVENT : DM_DEVICE_STATUS , 0 , 0 , 0 ) ) )
2008-01-30 16:19:47 +03:00
return_0 ;
2003-04-25 02:09:13 +04:00
2015-09-03 23:57:50 +03:00
/* No freeze on overfilled thin-pool, read existing slightly outdated data */
if ( lv & & lv_is_thin_pool ( lv ) & &
! dm_task_no_flush ( dmt ) )
log_warn ( " Can't set no_flush flag. " ) ; /* Non fatal */
2005-11-09 01:52:26 +03:00
if ( ! dm_task_run ( dmt ) )
goto_out ;
2003-04-25 02:09:13 +04:00
2005-11-09 01:52:26 +03:00
if ( ! dm_task_get_info ( dmt , & info ) | | ! info . exists )
goto_out ;
2003-04-30 19:26:25 +04:00
if ( event_nr )
* event_nr = info . event_nr ;
2003-04-25 02:09:13 +04:00
do {
next = dm_get_next_target ( dmt , next , & start , & length , & type ,
& params ) ;
2003-05-06 16:00:29 +04:00
if ( lv ) {
2008-11-04 01:14:30 +03:00
if ( ! ( segh = dm_list_next ( & lv - > segments , segh ) ) ) {
2003-05-06 16:00:29 +04:00
log_error ( " Number of segments in active LV %s "
" does not match metadata " , lv - > name ) ;
goto out ;
}
2008-11-04 01:14:30 +03:00
seg = dm_list_item ( segh , struct lv_segment ) ;
2003-05-06 16:00:29 +04:00
}
2003-04-25 02:09:13 +04:00
2010-01-15 19:35:26 +03:00
if ( ! type | | ! params )
2003-04-25 02:09:13 +04:00
continue ;
2010-01-13 04:43:32 +03:00
if ( ! ( segtype = get_segtype_from_string ( dm - > cmd , target_type ) ) )
2003-04-30 19:26:25 +04:00
continue ;
2010-01-15 19:35:26 +03:00
if ( strcmp ( type , target_type ) ) {
/* If kernel's type isn't an exact match is it compatible? */
if ( ! segtype - > ops - > target_status_compatible | |
! segtype - > ops - > target_status_compatible ( type ) )
continue ;
}
2010-12-08 22:26:35 +03:00
if ( ! segtype - > ops - > target_percent )
continue ;
if ( ! segtype - > ops - > target_percent ( & dm - > target_state ,
2010-11-30 14:53:31 +03:00
& percent , dm - > mem ,
2006-05-17 00:53:13 +04:00
dm - > cmd , seg , params ,
2004-05-05 01:25:57 +04:00
& total_numerator ,
2008-07-15 04:25:52 +04:00
& total_denominator ) )
2005-11-09 01:52:26 +03:00
goto_out ;
2004-05-05 01:25:57 +04:00
2009-10-01 04:35:29 +04:00
if ( first_time ) {
2010-11-30 14:53:31 +03:00
* overall_percent = percent ;
2009-10-01 04:35:29 +04:00
first_time = 0 ;
} else
2010-11-30 14:53:31 +03:00
* overall_percent =
_combine_percent ( * overall_percent , percent ,
total_numerator , total_denominator ) ;
2003-04-25 02:09:13 +04:00
} while ( next ) ;
2010-12-20 17:04:43 +03:00
if ( lv & & dm_list_next ( & lv - > segments , segh ) ) {
2003-05-06 16:00:29 +04:00
log_error ( " Number of segments in active LV %s does not "
" match metadata " , lv - > name ) ;
goto out ;
}
2010-11-30 14:53:31 +03:00
if ( first_time ) {
/* above ->target_percent() was not executed! */
/* FIXME why return PERCENT_100 et. al. in this case? */
2014-06-09 14:08:27 +04:00
* overall_percent = DM_PERCENT_100 ;
2010-11-30 14:53:31 +03:00
if ( fail_if_percent_unsupported )
goto_out ;
2009-10-01 04:35:29 +04:00
}
2003-04-25 02:09:13 +04:00
2015-09-02 17:13:50 +03:00
log_debug_activation ( " LV percent: %.2f " , dm_percent_to_float ( * overall_percent ) ) ;
2003-04-25 02:09:13 +04:00
r = 1 ;
out :
dm_task_destroy ( dmt ) ;
return r ;
}
2005-10-19 17:59:18 +04:00
static int _percent ( struct dev_manager * dm , const char * name , const char * dlid ,
2003-05-06 16:00:29 +04:00
const char * target_type , int wait ,
2014-06-09 14:08:27 +04:00
const struct logical_volume * lv , dm_percent_t * percent ,
2010-11-30 14:53:31 +03:00
uint32_t * event_nr , int fail_if_percent_unsupported )
2003-04-25 02:09:13 +04:00
{
2005-10-26 19:00:51 +04:00
if ( dlid & & * dlid ) {
if ( _percent_run ( dm , NULL , dlid , target_type , wait , lv , percent ,
2010-11-30 14:53:31 +03:00
event_nr , fail_if_percent_unsupported ) )
2005-10-26 19:00:51 +04:00
return 1 ;
2005-10-26 21:56:31 +04:00
else if ( _percent_run ( dm , NULL , dlid + sizeof ( UUID_PREFIX ) - 1 ,
2005-10-26 19:00:51 +04:00
target_type , wait , lv , percent ,
2010-11-30 14:53:31 +03:00
event_nr , fail_if_percent_unsupported ) )
2005-10-26 19:00:51 +04:00
return 1 ;
}
2003-04-25 02:09:13 +04:00
2003-05-06 16:00:29 +04:00
if ( name & & _percent_run ( dm , name , NULL , target_type , wait , lv , percent ,
2010-11-30 14:53:31 +03:00
event_nr , fail_if_percent_unsupported ) )
2003-04-25 02:09:13 +04:00
return 1 ;
2013-07-17 01:35:56 +04:00
return_0 ;
2003-04-25 02:09:13 +04:00
}
2010-08-17 05:51:12 +04:00
/* FIXME Merge with the percent function */
2014-09-22 17:50:07 +04:00
int dev_manager_transient ( struct dev_manager * dm , const struct logical_volume * lv )
2010-05-24 19:32:20 +04:00
{
int r = 0 ;
struct dm_task * dmt ;
struct dm_info info ;
void * next = NULL ;
uint64_t start , length ;
char * type = NULL ;
char * params = NULL ;
char * dlid = NULL ;
2013-02-02 03:16:36 +04:00
const char * layer = lv_layer ( lv ) ;
2010-05-24 19:32:20 +04:00
const struct dm_list * segh = & lv - > segments ;
struct lv_segment * seg = NULL ;
2014-03-11 20:13:47 +04:00
if ( ! ( dlid = build_dm_uuid ( dm - > mem , lv , layer ) ) )
2010-05-24 19:32:20 +04:00
return_0 ;
2014-11-02 19:22:32 +03:00
if ( ! ( dmt = _setup_task ( NULL , dlid , 0 , DM_DEVICE_STATUS , 0 , 0 , 0 ) ) )
2010-05-24 19:32:20 +04:00
return_0 ;
if ( ! dm_task_run ( dmt ) )
goto_out ;
if ( ! dm_task_get_info ( dmt , & info ) | | ! info . exists )
goto_out ;
do {
next = dm_get_next_target ( dmt , next , & start , & length , & type ,
& params ) ;
2010-12-01 01:28:06 +03:00
if ( ! ( segh = dm_list_next ( & lv - > segments , segh ) ) ) {
log_error ( " Number of segments in active LV %s "
" does not match metadata " , lv - > name ) ;
goto out ;
2010-05-24 19:32:20 +04:00
}
2010-12-01 01:28:06 +03:00
seg = dm_list_item ( segh , struct lv_segment ) ;
2010-05-24 19:32:20 +04:00
if ( ! type | | ! params )
continue ;
2012-06-21 14:43:31 +04:00
if ( ! seg ) {
log_error ( INTERNAL_ERROR " Segment is not selected. " ) ;
goto out ;
}
2010-05-24 19:32:20 +04:00
if ( seg - > segtype - > ops - > check_transient_status & &
! seg - > segtype - > ops - > check_transient_status ( seg , params ) )
goto_out ;
} while ( next ) ;
2010-12-20 17:04:43 +03:00
if ( dm_list_next ( & lv - > segments , segh ) ) {
2010-05-24 19:32:20 +04:00
log_error ( " Number of segments in active LV %s does not "
" match metadata " , lv - > name ) ;
goto out ;
}
r = 1 ;
out :
dm_task_destroy ( dmt ) ;
return r ;
}
2005-11-09 01:52:26 +03:00
/*
* dev_manager implementation .
*/
struct dev_manager * dev_manager_create ( struct cmd_context * cmd ,
2011-06-11 04:03:06 +04:00
const char * vg_name ,
unsigned track_pvmove_deps )
2002-03-19 02:25:50 +03:00
{
2005-11-09 01:52:26 +03:00
struct dm_pool * mem ;
struct dev_manager * dm ;
2002-03-19 02:25:50 +03:00
2008-01-30 16:19:47 +03:00
if ( ! ( mem = dm_pool_create ( " dev_manager " , 16 * 1024 ) ) )
return_NULL ;
2002-03-19 02:25:50 +03:00
2009-05-07 16:01:21 +04:00
if ( ! ( dm = dm_pool_zalloc ( mem , sizeof ( * dm ) ) ) )
2007-04-26 20:44:59 +04:00
goto_bad ;
2002-03-19 02:25:50 +03:00
2005-11-09 01:52:26 +03:00
dm - > cmd = cmd ;
dm - > mem = mem ;
2005-01-13 01:58:21 +03:00
2007-04-26 20:44:59 +04:00
if ( ! ( dm - > vg_name = dm_pool_strdup ( dm - > mem , vg_name ) ) )
goto_bad ;
2004-05-13 00:40:34 +04:00
2011-06-11 04:03:06 +04:00
/*
* When we manipulate ( normally suspend / resume ) the PVMOVE
* device directly , there ' s no need to touch the LVs above .
*/
dm - > track_pvmove_deps = track_pvmove_deps ;
2005-11-09 01:52:26 +03:00
dm - > target_state = NULL ;
2005-01-13 01:58:21 +03:00
2009-08-04 19:36:13 +04:00
dm_udev_set_sync_support ( cmd - > current_settings . udev_sync ) ;
2014-11-13 12:08:40 +03:00
dm_list_init ( & dm - > pending_delete ) ;
2005-11-09 01:52:26 +03:00
return dm ;
2004-05-13 00:40:34 +04:00
2005-11-09 01:52:26 +03:00
bad :
dm_pool_destroy ( mem ) ;
return NULL ;
2004-05-13 00:40:34 +04:00
}
2005-11-09 01:52:26 +03:00
void dev_manager_destroy ( struct dev_manager * dm )
2004-05-13 00:40:34 +04:00
{
2005-11-09 01:52:26 +03:00
dm_pool_destroy ( dm - > mem ) ;
}
2004-05-13 00:40:34 +04:00
2006-05-16 20:48:31 +04:00
void dev_manager_release ( void )
{
dm_lib_release ( ) ;
}
2005-11-09 01:52:26 +03:00
void dev_manager_exit ( void )
{
dm_lib_exit ( ) ;
2004-05-13 00:40:34 +04:00
}
2005-11-09 01:52:26 +03:00
int dev_manager_snapshot_percent ( struct dev_manager * dm ,
2006-04-06 18:06:27 +04:00
const struct logical_volume * lv ,
2014-06-09 14:08:27 +04:00
dm_percent_t * percent )
2004-05-13 00:40:34 +04:00
{
2012-01-21 01:56:01 +04:00
const struct logical_volume * snap_lv ;
2005-11-09 01:52:26 +03:00
char * name ;
const char * dlid ;
2010-02-10 17:38:24 +03:00
int fail_if_percent_unsupported = 0 ;
if ( lv_is_merging_origin ( lv ) ) {
/*
* Set ' fail_if_percent_unsupported ' , otherwise passing
* unsupported LV types to _percent will lead to a default
* successful return with percent_range as PERCENT_100 .
* - For a merging origin , this will result in a polldaemon
* that runs infinitely ( because completion is PERCENT_0 )
* - We unfortunately don ' t yet _know_ if a snapshot - merge
* target is active ( activation is deferred if dev is open ) ;
* so we can ' t short - circuit origin devices based purely on
* existing LVM LV attributes .
*/
fail_if_percent_unsupported = 1 ;
}
2005-11-09 01:52:26 +03:00
2012-01-21 01:56:01 +04:00
if ( lv_is_merging_cow ( lv ) ) {
/* must check percent of origin for a merging snapshot */
snap_lv = origin_from_cow ( lv ) ;
} else
snap_lv = lv ;
2005-11-09 01:52:26 +03:00
/*
* Build a name for the top layer .
*/
2012-01-21 01:56:01 +04:00
if ( ! ( name = dm_build_dm_name ( dm - > mem , snap_lv - > vg - > name , snap_lv - > name , NULL ) ) )
2005-11-09 01:52:26 +03:00
return_0 ;
2014-03-11 20:13:47 +04:00
if ( ! ( dlid = build_dm_uuid ( dm - > mem , snap_lv , NULL ) ) )
2005-11-09 01:52:26 +03:00
return_0 ;
2004-05-13 00:40:34 +04:00
2005-11-09 01:52:26 +03:00
/*
* Try and get some info on this device .
*/
2013-07-17 01:35:56 +04:00
if ( ! _percent ( dm , name , dlid , " snapshot " , 0 , NULL , percent ,
NULL , fail_if_percent_unsupported ) )
2008-01-30 16:19:47 +03:00
return_0 ;
2004-05-13 00:40:34 +04:00
2005-11-09 01:52:26 +03:00
/* If the snapshot isn't available, percent will be -1 */
2004-05-13 00:40:34 +04:00
return 1 ;
}
2005-11-09 01:52:26 +03:00
/* FIXME Merge with snapshot_percent, auto-detecting target type */
/* FIXME Cope with more than one target */
int dev_manager_mirror_percent ( struct dev_manager * dm ,
2010-01-16 01:58:25 +03:00
const struct logical_volume * lv , int wait ,
2014-06-09 14:08:27 +04:00
dm_percent_t * percent , uint32_t * event_nr )
2002-02-25 17:46:57 +03:00
{
2005-11-09 01:52:26 +03:00
char * name ;
const char * dlid ;
2011-08-03 02:07:20 +04:00
const char * target_type = first_seg ( lv ) - > segtype - > name ;
2013-02-02 03:16:36 +04:00
const char * layer = lv_layer ( lv ) ;
2002-02-25 17:46:57 +03:00
2002-03-26 16:41:37 +03:00
/*
2005-11-09 01:52:26 +03:00
* Build a name for the top layer .
2002-03-26 16:41:37 +03:00
*/
2011-08-30 18:55:15 +04:00
if ( ! ( name = dm_build_dm_name ( dm - > mem , lv - > vg - > name , lv - > name , layer ) ) )
2005-11-09 01:52:26 +03:00
return_0 ;
2002-03-08 13:41:48 +03:00
2014-03-11 20:13:47 +04:00
if ( ! ( dlid = build_dm_uuid ( dm - > mem , lv , layer ) ) ) {
2005-11-09 01:52:26 +03:00
log_error ( " dlid build failed for %s " , lv - > name ) ;
return 0 ;
2003-07-05 02:34:56 +04:00
}
2002-03-15 00:17:30 +03:00
2013-01-08 02:30:29 +04:00
log_debug_activation ( " Getting device %s status percentage for %s " ,
target_type , name ) ;
2013-07-17 01:35:56 +04:00
if ( ! _percent ( dm , name , dlid , target_type , wait , lv , percent ,
event_nr , 0 ) )
2008-01-30 16:19:47 +03:00
return_0 ;
2002-03-15 00:17:30 +03:00
2005-11-09 01:52:26 +03:00
return 1 ;
}
2003-07-05 02:34:56 +04:00
2013-02-01 21:31:47 +04:00
int dev_manager_raid_status ( struct dev_manager * dm ,
const struct logical_volume * lv ,
struct dm_status_raid * * status )
{
int r = 0 ;
const char * dlid ;
struct dm_task * dmt ;
struct dm_info info ;
uint64_t start , length ;
char * type = NULL ;
char * params = NULL ;
2013-02-05 03:10:16 +04:00
const char * layer = lv_layer ( lv ) ;
2013-02-01 21:31:47 +04:00
2014-03-11 20:13:47 +04:00
if ( ! ( dlid = build_dm_uuid ( dm - > mem , lv , layer ) ) )
2013-02-01 21:31:47 +04:00
return_0 ;
2014-11-02 19:22:32 +03:00
if ( ! ( dmt = _setup_task ( NULL , dlid , 0 , DM_DEVICE_STATUS , 0 , 0 , 0 ) ) )
2013-02-01 21:31:47 +04:00
return_0 ;
if ( ! dm_task_run ( dmt ) )
goto_out ;
if ( ! dm_task_get_info ( dmt , & info ) | | ! info . exists )
goto_out ;
dm_get_next_target ( dmt , NULL , & start , & length , & type , & params ) ;
2013-04-09 00:04:08 +04:00
if ( ! type | | strcmp ( type , " raid " ) ) {
2014-11-03 14:52:24 +03:00
log_error ( " Expected raid segment type but got %s instead " ,
2013-04-09 00:04:08 +04:00
type ? type : " NULL " ) ;
goto out ;
}
2013-06-15 04:28:54 +04:00
/* FIXME Check there's only one target */
2013-02-01 21:31:47 +04:00
if ( ! dm_get_status_raid ( dm - > mem , params , status ) )
goto_out ;
r = 1 ;
out :
dm_task_destroy ( dmt ) ;
return r ;
}
2013-04-12 00:33:59 +04:00
int dev_manager_raid_message ( struct dev_manager * dm ,
const struct logical_volume * lv ,
const char * msg )
{
int r = 0 ;
const char * dlid ;
struct dm_task * dmt ;
const char * layer = lv_layer ( lv ) ;
2014-09-16 00:33:53 +04:00
if ( ! lv_is_raid ( lv ) ) {
2013-04-12 00:33:59 +04:00
log_error ( INTERNAL_ERROR " %s/%s is not a RAID logical volume " ,
lv - > vg - > name , lv - > name ) ;
return 0 ;
}
/* These are the supported RAID messages for dm-raid v1.5.0 */
if ( strcmp ( msg , " idle " ) & &
strcmp ( msg , " frozen " ) & &
strcmp ( msg , " resync " ) & &
strcmp ( msg , " recover " ) & &
strcmp ( msg , " check " ) & &
strcmp ( msg , " repair " ) & &
strcmp ( msg , " reshape " ) ) {
log_error ( " Unknown RAID message: %s " , msg ) ;
return 0 ;
}
2014-03-11 20:13:47 +04:00
if ( ! ( dlid = build_dm_uuid ( dm - > mem , lv , layer ) ) )
2013-04-12 00:33:59 +04:00
return_0 ;
2014-11-02 19:22:32 +03:00
if ( ! ( dmt = _setup_task ( NULL , dlid , 0 , DM_DEVICE_TARGET_MSG , 0 , 0 , 0 ) ) )
2013-04-12 00:33:59 +04:00
return_0 ;
if ( ! dm_task_set_message ( dmt , msg ) )
goto_out ;
if ( ! dm_task_run ( dmt ) )
goto_out ;
r = 1 ;
out :
dm_task_destroy ( dmt ) ;
return r ;
}
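Usage note: a message such as "check" or "repair" - e.g. as forwarded by lvchange --syncaction - reaches the dm-raid target through this function; anything outside the seven strings listed above is rejected before any ioctl is issued.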
2014-01-28 22:24:51 +04:00
int dev_manager_cache_status ( struct dev_manager * dm ,
const struct logical_volume * lv ,
2014-11-03 14:52:29 +03:00
struct lv_status_cache * * status )
2014-01-28 22:24:51 +04:00
{
int r = 0 ;
const char * dlid ;
struct dm_task * dmt ;
struct dm_info info ;
uint64_t start , length ;
char * type = NULL ;
char * params = NULL ;
2014-11-03 14:52:29 +03:00
struct dm_status_cache * c ;
2014-01-28 22:24:51 +04:00
2014-11-03 14:52:29 +03:00
if ( ! ( dlid = build_dm_uuid ( dm - > mem , lv , lv_layer ( lv ) ) ) )
return_0 ;
if ( ! ( * status = dm_pool_zalloc ( dm - > mem , sizeof ( struct lv_status_cache ) ) ) )
2014-01-28 22:24:51 +04:00
return_0 ;
2014-11-02 19:22:32 +03:00
if ( ! ( dmt = _setup_task ( NULL , dlid , 0 , DM_DEVICE_STATUS , 0 , 0 , 0 ) ) )
2014-01-28 22:24:51 +04:00
return_0 ;
if ( ! dm_task_run ( dmt ) )
goto_out ;
if ( ! dm_task_get_info ( dmt , & info ) | | ! info . exists )
goto_out ;
dm_get_next_target ( dmt , NULL , & start , & length , & type , & params ) ;
if ( ! type | | strcmp ( type , " cache " ) ) {
2014-11-03 14:52:24 +03:00
log_error ( " Expected cache segment type but got %s instead " ,
2014-01-28 22:24:51 +04:00
type ? type : " NULL " ) ;
goto out ;
}
2014-11-03 14:52:29 +03:00
/*
* FIXME :
* - > target_percent ( ) API is able to transfer only a single value .
* Needs to be able to pass whole structure .
*/
if ( ! dm_get_status_cache ( dm - > mem , params , & ( ( * status ) - > cache ) ) )
2014-01-28 22:24:51 +04:00
goto_out ;
2014-11-03 14:52:29 +03:00
c = ( * status ) - > cache ;
2014-11-06 22:36:53 +03:00
( * status ) - > mem = dm - > mem ; /* User has to destroy this mem pool later */
2014-11-03 14:52:29 +03:00
( * status ) - > data_usage = dm_make_percent ( c - > used_blocks ,
c - > total_blocks ) ;
( * status ) - > metadata_usage = dm_make_percent ( c - > metadata_used_blocks ,
c - > metadata_total_blocks ) ;
( * status ) - > dirty_usage = dm_make_percent ( c - > dirty_blocks ,
c - > used_blocks ) ;
2014-01-28 22:24:51 +04:00
r = 1 ;
out :
dm_task_destroy ( dmt ) ;
return r ;
}
//FIXME: Can we get rid of this crap below?
2005-11-09 01:52:26 +03:00
#if 0
log_very_verbose ( " %s %s " , sus ? " Suspending " : " Resuming " , name ) ;
2004-05-13 00:40:34 +04:00
2005-11-09 01:52:26 +03:00
log_verbose ( " Loading %s " , dl - > name ) ;
log_very_verbose ( " Activating %s read-only " , dl - > name ) ;
2003-07-05 02:34:56 +04:00
log_very_verbose ( " Activated %s %s %03u:%03u " , dl - > name ,
dl - > dlid , dl - > info . major , dl - > info . minor ) ;
2002-03-16 01:59:12 +03:00
if ( _get_flag ( dl , VISIBLE ) )
2002-03-18 16:09:27 +03:00
log_verbose ( " Removing %s " , dl - > name ) ;
2002-03-16 01:59:12 +03:00
else
2002-03-18 16:09:27 +03:00
log_very_verbose ( " Removing %s " , dl - > name ) ;
2002-03-16 01:59:12 +03:00
2013-01-08 02:30:29 +04:00
log_debug_activation ( " Adding target: % " PRIu64 " % " PRIu64 " %s %s " ,
2005-11-09 01:52:26 +03:00
extent_size * seg - > le , extent_size * seg - > len , target , params ) ;
2002-02-25 17:46:57 +03:00
2013-01-08 02:30:29 +04:00
log_debug_activation ( " Adding target: 0 % " PRIu64 " snapshot-origin %s " ,
2005-11-09 01:52:26 +03:00
dl - > lv - > size , params ) ;
2013-01-08 02:30:29 +04:00
log_debug_activation ( " Adding target: 0 % " PRIu64 " snapshot %s " , size , params ) ;
log_debug_activation ( " Getting device info for %s " , dl - > name ) ;
2005-01-13 01:58:21 +03:00
2005-11-09 01:52:26 +03:00
/* Rename? */
2007-07-02 15:17:21 +04:00
if ( ( suffix = strrchr ( dl - > dlid + sizeof ( UUID_PREFIX ) - 1 , ' - ' ) ) )
2005-11-09 01:52:26 +03:00
suffix + + ;
2011-08-30 18:55:15 +04:00
new_name = dm_build_dm_name ( dm - > mem , dm - > vg_name , dl - > lv - > name ,
2005-11-09 01:52:26 +03:00
suffix ) ;
2002-03-16 01:59:12 +03:00
2005-11-09 01:52:26 +03:00
static int _belong_to_vg ( const char * vgname , const char * name )
{
const char * v = vgname , * n = name ;
2002-02-25 17:46:57 +03:00
2005-11-09 01:52:26 +03:00
while ( * v ) {
if ( ( * v ! = * n ) | | ( * v = = ' - ' & & * ( + + n ) ! = ' - ' ) )
return 0 ;
v + + , n + + ;
2002-04-16 18:42:20 +04:00
}
2002-03-18 16:09:27 +03:00
2005-11-09 01:52:26 +03:00
if ( * n = = ' - ' & & * ( n + 1 ) ! = ' - ' )
return 1 ;
else
return 0 ;
}
2002-04-16 18:42:20 +04:00
2013-07-03 00:26:03 +04:00
if ( ! ( snap_seg = find_snapshot ( lv ) ) )
2002-04-16 18:42:20 +04:00
return 1 ;
2005-04-07 16:39:44 +04:00
old_origin = snap_seg - > origin ;
2002-04-16 18:42:20 +04:00
/* Was this the last active snapshot with this origin? */
2008-11-04 01:14:30 +03:00
dm_list_iterate_items ( lvl , active_head ) {
2005-06-01 20:51:55 +04:00
active = lvl - > lv ;
2013-07-03 00:26:03 +04:00
if ( ( snap_seg = find_snapshot ( active ) ) & &
2005-04-07 16:39:44 +04:00
snap_seg - > origin = = old_origin ) {
2002-04-16 18:42:20 +04:00
return 1 ;
2002-04-24 22:20:51 +04:00
}
2002-03-18 16:09:27 +03:00
}
2005-11-09 01:52:26 +03:00
# endif
2012-01-25 12:48:42 +04:00
int dev_manager_thin_pool_status ( struct dev_manager * dm ,
const struct logical_volume * lv ,
2013-01-17 13:54:41 +04:00
struct dm_status_thin_pool * * status ,
int noflush )
2012-01-25 12:48:42 +04:00
{
const char * dlid ;
struct dm_task * dmt ;
struct dm_info info ;
uint64_t start , length ;
char * type = NULL ;
char * params = NULL ;
int r = 0 ;
/* Build dlid for the thin pool layer */
2014-03-11 20:13:47 +04:00
if ( ! ( dlid = build_dm_uuid ( dm - > mem , lv , lv_layer ( lv ) ) ) )
2012-01-25 12:48:42 +04:00
return_0 ;
2014-11-02 19:22:32 +03:00
if ( ! ( dmt = _setup_task ( NULL , dlid , 0 , DM_DEVICE_STATUS , 0 , 0 , 0 ) ) )
2012-01-25 12:48:42 +04:00
return_0 ;
2013-01-17 13:54:41 +04:00
if ( noflush & & ! dm_task_no_flush ( dmt ) )
log_warn ( " Can't set no_flush. " ) ;
2012-01-25 12:48:42 +04:00
if ( ! dm_task_run ( dmt ) )
goto_out ;
if ( ! dm_task_get_info ( dmt , & info ) | | ! info . exists )
goto_out ;
dm_get_next_target ( dmt , NULL , & start , & length , & type , & params ) ;
2013-06-15 04:28:54 +04:00
/* FIXME Check for thin and check there's exactly one target */
2012-01-25 12:48:42 +04:00
if ( ! dm_get_status_thin_pool ( dm - > mem , params , status ) )
goto_out ;
r = 1 ;
out :
dm_task_destroy ( dmt ) ;
return r ;
}
2011-12-21 17:09:33 +04:00
int dev_manager_thin_pool_percent ( struct dev_manager * dm ,
const struct logical_volume * lv ,
2014-06-09 14:08:27 +04:00
int metadata , dm_percent_t * percent )
2011-12-21 17:09:33 +04:00
{
char * name ;
const char * dlid ;
2012-01-19 19:19:18 +04:00
/* Build a name for the top layer */
if ( ! ( name = dm_build_dm_name ( dm - > mem , lv - > vg - > name , lv - > name ,
2013-02-18 12:53:05 +04:00
lv_layer ( lv ) ) ) )
2011-12-21 17:09:33 +04:00
return_0 ;
2014-03-11 20:13:47 +04:00
if ( ! ( dlid = build_dm_uuid ( dm - > mem , lv , lv_layer ( lv ) ) ) )
2011-12-21 17:09:33 +04:00
return_0 ;
2013-01-08 02:30:29 +04:00
log_debug_activation ( " Getting device status percentage for %s " , name ) ;
2012-01-19 19:25:37 +04:00
if ( ! ( _percent ( dm , name , dlid , " thin-pool " , 0 ,
( metadata ) ? lv : NULL , percent , NULL , 1 ) ) )
2011-12-21 17:09:33 +04:00
return_0 ;
return 1 ;
}
2012-01-19 19:27:54 +04:00
int dev_manager_thin_percent ( struct dev_manager * dm ,
const struct logical_volume * lv ,
2014-06-09 14:08:27 +04:00
int mapped , dm_percent_t * percent )
2012-01-19 19:27:54 +04:00
{
char * name ;
const char * dlid ;
2013-02-02 03:16:36 +04:00
const char * layer = lv_layer ( lv ) ;
2012-01-19 19:27:54 +04:00
/* Build a name for the top layer */
2012-01-29 00:12:26 +04:00
if ( ! ( name = dm_build_dm_name ( dm - > mem , lv - > vg - > name , lv - > name , layer ) ) )
2012-01-19 19:27:54 +04:00
return_0 ;
2014-03-11 20:13:47 +04:00
if ( ! ( dlid = build_dm_uuid ( dm - > mem , lv , layer ) ) )
2012-01-19 19:27:54 +04:00
return_0 ;
2013-01-08 02:30:29 +04:00
log_debug_activation ( " Getting device status percentage for %s " , name ) ;
2012-01-19 19:27:54 +04:00
if ( ! ( _percent ( dm , name , dlid , " thin " , 0 ,
( mapped ) ? NULL : lv , percent , NULL , 1 ) ) )
return_0 ;
return 1 ;
}
2013-12-04 16:57:27 +04:00
int dev_manager_thin_device_id ( struct dev_manager * dm ,
const struct logical_volume * lv ,
uint32_t * device_id )
{
const char * dlid ;
struct dm_task * dmt ;
struct dm_info info ;
uint64_t start , length ;
char * params , * target_type = NULL ;
int r = 0 ;
/* Build dlid for the thin layer */
2014-03-11 20:13:47 +04:00
if ( ! ( dlid = build_dm_uuid ( dm - > mem , lv , lv_layer ( lv ) ) ) )
2013-12-04 16:57:27 +04:00
return_0 ;
2014-11-02 19:22:32 +03:00
if ( ! ( dmt = _setup_task ( NULL , dlid , 0 , DM_DEVICE_TABLE , 0 , 0 , 0 ) ) )
2013-12-04 16:57:27 +04:00
return_0 ;
if ( ! dm_task_run ( dmt ) )
goto_out ;
if ( ! dm_task_get_info ( dmt , & info ) | | ! info . exists )
goto_out ;
if ( dm_get_next_target ( dmt , NULL , & start , & length ,
& target_type , & params ) ) {
log_error ( " More then one table line found for %s. " , lv - > name ) ;
goto out ;
}
if ( strcmp ( target_type , " thin " ) ) {
log_error ( " Unexpected target type %s found for thin %s. " , target_type , lv - > name ) ;
goto out ;
}
if ( sscanf ( params , " %*u:%*u %u " , device_id ) ! = 1 ) {
log_error ( " Cannot parse table like parameters %s for %s. " , params , lv - > name ) ;
goto out ;
}
r = 1 ;
out :
dm_task_destroy ( dmt ) ;
return r ;
}
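For reference, a thin target table line reads like "0 2097152 thin 253:3 42" - start, length, target type, pool device, device ID - so the "%*u:%*u %u" pattern above skips the pool major:minor and captures 42 into *device_id.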
2005-11-09 01:52:26 +03:00
/*************************/
/* NEW CODE STARTS HERE */
/*************************/
2010-02-24 23:00:56 +03:00
static int _dev_manager_lv_mknodes ( const struct logical_volume * lv )
2005-11-09 01:52:26 +03:00
{
char * name ;
2011-08-30 18:55:15 +04:00
if ( ! ( name = dm_build_dm_name ( lv - > vg - > cmd - > mem , lv - > vg - > name ,
2005-11-09 01:52:26 +03:00
lv - > name , NULL ) ) )
return_0 ;
return fs_add_lv ( lv , name ) ;
2002-03-18 16:09:27 +03:00
}
2010-02-24 23:00:56 +03:00
static int _dev_manager_lv_rmnodes ( const struct logical_volume * lv )
2003-04-25 02:09:13 +04:00
{
2005-11-09 01:52:26 +03:00
return fs_del_lv ( lv ) ;
}
2010-02-24 23:00:56 +03:00
int dev_manager_mknodes ( const struct logical_volume * lv )
{
struct dm_info dminfo ;
2010-08-03 17:13:01 +04:00
char * name ;
2010-02-24 23:00:56 +03:00
int r = 0 ;
2011-08-30 18:55:15 +04:00
if ( ! ( name = dm_build_dm_name ( lv - > vg - > cmd - > mem , lv - > vg - > name , lv - > name , NULL ) ) )
2010-02-24 23:00:56 +03:00
return_0 ;
2014-11-04 17:00:32 +03:00
if ( ( r = _info_run ( MKNODES , name , NULL , & dminfo , NULL , NULL , 0 , 0 , 0 , 0 ) ) ) {
2010-02-24 23:00:56 +03:00
if ( dminfo . exists ) {
if ( lv_is_visible ( lv ) )
r = _dev_manager_lv_mknodes ( lv ) ;
} else
r = _dev_manager_lv_rmnodes ( lv ) ;
}
2010-08-03 17:13:01 +04:00
dm_pool_free ( lv - > vg - > cmd - > mem , name ) ;
2010-02-24 23:00:56 +03:00
return r ;
}
2013-05-13 13:46:24 +04:00
# ifdef UDEV_SYNC_SUPPORT
/*
* Until the DM_UEVENT_GENERATED_FLAG was introduced in kernel patch
* 856a6f1dbd8940e72755af145ebcd806408ecedd
* some operations could not be performed by udev , requiring our fallback code .
*/
static int _dm_driver_has_stable_udev_support ( void )
{
char vsn [ 80 ] ;
unsigned maj , min , patchlevel ;
return driver_version ( vsn , sizeof ( vsn ) ) & &
( sscanf ( vsn , " %u.%u.%u " , & maj , & min , & patchlevel ) = = 3 ) & &
( maj = = 4 ? min > = 18 : maj > 4 ) ;
}
static int _check_udev_fallback ( struct cmd_context * cmd )
{
struct config_info * settings = & cmd - > current_settings ;
if ( settings - > udev_fallback ! = - 1 )
goto out ;
/*
* Use udev fallback automatically in case udev
* is disabled via DM_DISABLE_UDEV environment
* variable or udev rules are switched off .
*/
settings - > udev_fallback = ! settings - > udev_rules ? 1 :
2013-06-25 14:31:53 +04:00
find_config_tree_bool ( cmd , activation_verify_udev_operations_CFG , NULL ) ;
2013-05-13 13:46:24 +04:00
/* Do not rely fully on udev if the udev support is known to be incomplete. */
if ( ! settings - > udev_fallback & & ! _dm_driver_has_stable_udev_support ( ) ) {
log_very_verbose ( " Kernel driver has incomplete udev support so "
" LVM will check and perform some operations itself. " ) ;
settings - > udev_fallback = 1 ;
}
out :
return settings - > udev_fallback ;
}
# else /* UDEV_SYNC_SUPPORT */
static int _check_udev_fallback ( struct cmd_context * cmd )
{
/* We must use old node/symlink creation code if not compiled with udev support at all! */
return cmd - > current_settings . udev_fallback = 1 ;
}
# endif /* UDEV_SYNC_SUPPORT */
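Worked example of the version gate: a driver reporting 4.17.0 parses to maj 4, min 17 and forces the fallback on, while 4.18.0 or any 5.x driver counts as having complete udev support - still subject to the activation/verify_udev_operations override consulted above.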
2014-09-22 17:50:07 +04:00
static uint16_t _get_udev_flags ( struct dev_manager * dm , const struct logical_volume * lv ,
activation: flag temporary LVs internally
Add LV_TEMPORARY flag for LVs with limited existence during command
execution. Such LVs are temporary in the sense that they need to be activated,
have some action done and then be removed immediately. Such LVs are just like
any normal LV - the only difference is that they are removed during
LVM command execution. This is also the case for LVs representing
future pool metadata spare LVs which we need to initialize by using
the usual LV before they are declared as pool metadata spare.
We can optimize some other parts like udev to do a better job if
it knows that the LV is temporary and any processing on it is just
useless.
This flag is orthogonal to LV_NOSCAN flag introduced recently
as LV_NOSCAN flag is primarily used to mark an LV for the scanning
to be avoided before the zeroing of the device happens. The LV_TEMPORARY
flag makes a difference between a full-fledged LV visible in the system
and the LV just used as a temporary overlay for some action that needs to
be done on underlying PVs.
For example: lvcreate --thinpool POOL --zero n -L 1G vg
- first, the usual LV is created to do a clean up for pool metadata
spare. The LV is activated, zeroed, deactivated.
- between "activated" and "zeroed" stage, the LV_NOSCAN flag is used
to avoid any scanning in udev
- betwen "zeroed" and "deactivated" stage, we need to avoid the WATCH
udev rule, but since the LV is just a usual LV, we can't make a
difference. The LV_TEMPORARY internal LV flag helps here. If we
create the LV with this flag, the DM_UDEV_DISABLE_DISK_RULES
and DM_UDEV_DISABLE_OTHER_RULES flag are set (just like as it is
with "invisible" and non-top-level LVs) - udev is directed to
skip WATCH rule use.
- if the LV_TEMPORARY flag was not used, there would normally be
a WATCH event generated once the LV is closed after "zeroed"
stage. This will make problems with immediated deactivation that
follows.
2013-10-23 16:06:39 +04:00
const char * layer , int noscan , int temporary )
2010-04-23 18:16:32 +04:00
{
uint16_t udev_flags = 0 ;
2011-06-17 18:50:53 +04:00
/*
* Instruct also libdevmapper to disable udev
* fallback in accordance to LVM2 settings .
*/
2013-05-13 13:46:24 +04:00
if ( ! _check_udev_fallback ( dm - > cmd ) )
2011-06-17 18:50:53 +04:00
udev_flags | = DM_UDEV_DISABLE_LIBRARY_FALLBACK ;
2010-04-23 18:16:32 +04:00
/*
* Is this top - level and visible device ?
* If not , create just the / dev / mapper content .
*/
2011-10-29 00:34:45 +04:00
/* FIXME: add target's method for this */
2014-11-04 12:33:35 +03:00
if ( lv_is_new_thin_pool ( lv ) )
/* New thin-pool is regular LV with -tpool UUID suffix. */
udev_flags | = DM_UDEV_DISABLE_DISK_RULES_FLAG |
DM_UDEV_DISABLE_OTHER_RULES_FLAG ;
else if ( layer | | ! lv_is_visible ( lv ) | | lv_is_thin_pool ( lv ) )
2010-04-23 18:16:32 +04:00
udev_flags | = DM_UDEV_DISABLE_SUBSYSTEM_RULES_FLAG |
DM_UDEV_DISABLE_DISK_RULES_FLAG |
DM_UDEV_DISABLE_OTHER_RULES_FLAG ;
/*
* There ' s no need for other udev rules to touch special LVs with
* reserved names . We don ' t need to populate / dev / disk here either .
* Even if they happen to be visible and top - level .
*/
else if ( is_reserved_lvname ( lv - > name ) )
udev_flags | = DM_UDEV_DISABLE_DISK_RULES_FLAG |
DM_UDEV_DISABLE_OTHER_RULES_FLAG ;
/*
* Snapshots and origins could have the same rule applied that will
* give symlinks exactly the same name ( e . g . a name based on
* filesystem UUID ) . We give preference to origins to make such
* naming deterministic ( e . g . symlinks in / dev / disk / by - uuid ) .
*/
if ( lv_is_cow ( lv ) )
udev_flags | = DM_UDEV_LOW_PRIORITY_FLAG ;
/*
* Finally , add flags to disable / dev / mapper and / dev / < vgname > content
* to be created by udev if it is requested by user ' s configuration .
* This is basically an explicit fallback to old node / symlink creation
* without udev .
*/
if ( ! dm - > cmd - > current_settings . udev_rules )
udev_flags | = DM_UDEV_DISABLE_DM_RULES_FLAG |
DM_UDEV_DISABLE_SUBSYSTEM_RULES_FLAG ;
2013-10-08 15:27:21 +04:00
/*
activation: flag temporary LVs internally
Add LV_TEMPORARY flag for LVs with limited existence during command
execution. Such LVs are temporary in way that they need to be activated,
some action done and then removed immediately. Such LVs are just like
any normal LV - the only difference is that they are removed during
LVM command execution. This is also the case for LVs representing
future pool metadata spare LVs which we need to initialize by using
the usual LV before they are declared as pool metadata spare.
We can optimize some other parts like udev to do a better job if
it knows that the LV is temporary and any processing on it is just
useless.
This flag is orthogonal to LV_NOSCAN flag introduced recently
as LV_NOSCAN flag is primarily used to mark an LV for the scanning
to be avoided before the zeroing of the device happens. The LV_TEMPORARY
flag makes a difference between a full-fledged LV visible in the system
and the LV just used as a temporary overlay for some action that needs to
be done on underlying PVs.
For example: lvcreate --thinpool POOL --zero n -L 1G vg
- first, the usual LV is created to do a clean up for pool metadata
spare. The LV is activated, zeroed, deactivated.
- between "activated" and "zeroed" stage, the LV_NOSCAN flag is used
to avoid any scanning in udev
- betwen "zeroed" and "deactivated" stage, we need to avoid the WATCH
udev rule, but since the LV is just a usual LV, we can't make a
difference. The LV_TEMPORARY internal LV flag helps here. If we
create the LV with this flag, the DM_UDEV_DISABLE_DISK_RULES
and DM_UDEV_DISABLE_OTHER_RULES flag are set (just like as it is
with "invisible" and non-top-level LVs) - udev is directed to
skip WATCH rule use.
- if the LV_TEMPORARY flag was not used, there would normally be
a WATCH event generated once the LV is closed after "zeroed"
stage. This will make problems with immediated deactivation that
follows.
2013-10-23 16:06:39 +04:00
* LVM subsystem specific flags .
2013-10-08 15:27:21 +04:00
*/
activation: flag temporary LVs internally
Add LV_TEMPORARY flag for LVs with limited existence during command
execution. Such LVs are temporary in way that they need to be activated,
some action done and then removed immediately. Such LVs are just like
any normal LV - the only difference is that they are removed during
LVM command execution. This is also the case for LVs representing
future pool metadata spare LVs which we need to initialize by using
the usual LV before they are declared as pool metadata spare.
We can optimize some other parts like udev to do a better job if
it knows that the LV is temporary and any processing on it is just
useless.
This flag is orthogonal to LV_NOSCAN flag introduced recently
as LV_NOSCAN flag is primarily used to mark an LV for the scanning
to be avoided before the zeroing of the device happens. The LV_TEMPORARY
flag makes a difference between a full-fledged LV visible in the system
and the LV just used as a temporary overlay for some action that needs to
be done on underlying PVs.
For example: lvcreate --thinpool POOL --zero n -L 1G vg
- first, the usual LV is created to do a clean up for pool metadata
spare. The LV is activated, zeroed, deactivated.
- between "activated" and "zeroed" stage, the LV_NOSCAN flag is used
to avoid any scanning in udev
- betwen "zeroed" and "deactivated" stage, we need to avoid the WATCH
udev rule, but since the LV is just a usual LV, we can't make a
difference. The LV_TEMPORARY internal LV flag helps here. If we
create the LV with this flag, the DM_UDEV_DISABLE_DISK_RULES
and DM_UDEV_DISABLE_OTHER_RULES flag are set (just like as it is
with "invisible" and non-top-level LVs) - udev is directed to
skip WATCH rule use.
- if the LV_TEMPORARY flag was not used, there would normally be
a WATCH event generated once the LV is closed after "zeroed"
stage. This will make problems with immediated deactivation that
follows.
2013-10-23 16:06:39 +04:00
if ( noscan )
udev_flags | = DM_SUBSYSTEM_UDEV_FLAG0 ;
if ( temporary )
udev_flags | = DM_UDEV_DISABLE_DISK_RULES_FLAG |
DM_UDEV_DISABLE_OTHER_RULES_FLAG ;
2013-10-08 15:27:21 +04:00
2010-04-23 18:16:32 +04:00
return udev_flags ;
}
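
/*
 * Illustration: a hidden sub-LV or layered device (layer != NULL) gets
 * DM_UDEV_DISABLE_SUBSYSTEM_RULES_FLAG, DM_UDEV_DISABLE_DISK_RULES_FLAG
 * and DM_UDEV_DISABLE_OTHER_RULES_FLAG set, so udev creates only the
 * private /dev/mapper node for it and skips /dev/disk symlink creation.
 */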
static int _add_dev_to_dtree(struct dev_manager *dm, struct dm_tree *dtree,
			     const struct logical_volume *lv, const char *layer)
{
	char *dlid, *name;
	struct dm_info info, info2;

	if (!(name = dm_build_dm_name(dm->mem, lv->vg->name, lv->name, layer)))
		return_0;

	if (!(dlid = build_dm_uuid(dm->mem, lv, layer)))
		return_0;

	log_debug_activation("Getting device info for %s [%s]", name, dlid);
	if (!_info(dlid, 1, 0, &info, NULL, NULL)) {
		log_error("Failed to get info for %s [%s].", name, dlid);
		return 0;
	}

	/*
	 * For top-level volumes verify that the existing device matches the
	 * requested major/minor and that the major/minor pair is available for use.
	 */
	if (!layer && lv->major != -1 && lv->minor != -1) {
		/*
		 * FIXME compare info.major with lv->major if multiple major support
		 */
		if (info.exists && (info.minor != lv->minor)) {
			log_error("Volume %s (%" PRIu32 ":%" PRIu32 ")"
				  " differs from already active device "
				  "(%" PRIu32 ":%" PRIu32 ")",
				  lv->name, lv->major, lv->minor, info.major, info.minor);
			return 0;
		}
		if (!info.exists && _info_by_dev(lv->major, lv->minor, &info2) &&
		    info2.exists) {
			log_error("The requested major:minor pair "
				  "(%" PRIu32 ":%" PRIu32 ") is already used",
				  lv->major, lv->minor);
			return 0;
		}
	}

	if (info.exists && !dm_tree_add_dev_with_udev_flags(dtree, info.major, info.minor,
							    _get_udev_flags(dm, lv, layer, 0, 0))) {
		log_error("Failed to add device (%" PRIu32 ":%" PRIu32 ") to dtree",
			  info.major, info.minor);
		return 0;
	}

	if (info.exists && dm->track_pending_delete) {
		log_debug_activation("Tracking pending delete for %s (%s).", lv->name, dlid);
		if (!str_list_add(dm->mem, &dm->pending_delete, dlid))
			return_0;
	}

	return 1;
}
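
/*
 * Example of the major:minor checks above: an LV created with
 * "--persistent y --major 253 --minor 7" is rejected here if it is
 * already active under a different minor, or if 253:7 is currently
 * occupied by some other device.
 */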
/*
 * Add replicator devices
 *
 * Using _add_dev_to_dtree() directly instead of _add_lv_to_dtree()
 * to avoid extra checks with extensions.
 */
static int _add_partial_replicator_to_dtree(struct dev_manager *dm,
					    struct dm_tree *dtree,
					    const struct logical_volume *lv)
{
	struct logical_volume *rlv = first_seg(lv)->replicator;
	struct replicator_device *rdev;
	struct replicator_site *rsite;
	struct dm_tree_node *rep_node, *rdev_node;
	const char *uuid;

	if (!lv_is_active_replicator_dev(lv)) {
		if (!_add_dev_to_dtree(dm, dtree, lv->rdevice->lv, NULL))
			return_0;
		return 1;
	}

	/* Add _rlog and replicator device */
	if (!_add_dev_to_dtree(dm, dtree, first_seg(rlv)->rlog_lv, NULL))
		return_0;

	if (!_add_dev_to_dtree(dm, dtree, rlv, NULL))
		return_0;

	if (!(uuid = build_dm_uuid(dm->mem, rlv, NULL)))
		return_0;

	rep_node = dm_tree_find_node_by_uuid(dtree, uuid);

	/* Add all related devices for replicator */
	dm_list_iterate_items(rsite, &rlv->rsites)
		dm_list_iterate_items(rdev, &rsite->rdevices) {
			if (rsite->state == REPLICATOR_STATE_ACTIVE) {
				/* Add _rimage LV */
				if (!_add_dev_to_dtree(dm, dtree, rdev->lv, NULL))
					return_0;

				/* Add replicator-dev LV, except for the already added one */
				if ((lv != rdev->replicator_dev->lv) &&
				    !_add_dev_to_dtree(dm, dtree,
						       rdev->replicator_dev->lv, NULL))
					return_0;

				/* If replicator exists - try to connect existing heads */
				if (rep_node) {
					uuid = build_dm_uuid(dm->mem,
							     rdev->replicator_dev->lv,
							     NULL);
					if (!uuid)
						return_0;

					rdev_node = dm_tree_find_node_by_uuid(dtree, uuid);
					if (rdev_node)
						dm_tree_node_set_presuspend_node(rdev_node,
										 rep_node);
				}
			}

			if (!rdev->rsite->vg_name)
				continue;

			if (!_add_dev_to_dtree(dm, dtree, rdev->lv, NULL))
				return_0;

			if (rdev->slog &&
			    !_add_dev_to_dtree(dm, dtree, rdev->slog, NULL))
				return_0;
		}

	return 1;
}
struct pool_cb_data {
	struct dev_manager *dm;
	const struct logical_volume *pool_lv;

	int skip_zero;	/* to skip zeroed device header (check first 64B) */
	int exec;	/* which binary to call */
	int opts;
	const char *global;
};

static int _pool_callback(struct dm_tree_node *node,
			  dm_node_callback_t type, void *cb_data)
{
	int ret, status, fd;
	const struct dm_config_node *cn;
	const struct dm_config_value *cv;
	const struct pool_cb_data *data = cb_data;
	const struct logical_volume *pool_lv = data->pool_lv;
	const struct logical_volume *mlv = first_seg(pool_lv)->metadata_lv;
	long buf[64 / sizeof(long)]; /* buffer for short disk header (64B) */
	int args = 0;
	const char *argv[19] = { /* Max supported 15 args */
		find_config_tree_str_allow_empty(pool_lv->vg->cmd, data->exec, NULL) /* argv[0] */
	};

	if (!*argv[0])
		return 1; /* Checking disabled */

	if (!(cn = find_config_tree_array(mlv->vg->cmd, data->opts, NULL))) {
		log_error(INTERNAL_ERROR "Unable to find configuration for pool check options.");
		return 0;
	}

	for (cv = cn->v; cv && args < 16; cv = cv->next) {
		if (cv->type != DM_CFG_STRING) {
			log_error("Invalid string in config file: "
				  "global/%s_check_options",
				  data->global);
			return 0;
		}
		argv[++args] = cv->v.str;
	}

	if (args == 16) {
		log_error("Too many options for %s command.", argv[0]);
		return 0;
	}

	if (!(argv[++args] = lv_dmpath_dup(data->dm->mem, mlv))) {
		log_error("Failed to build pool metadata path.");
		return 0;
	}

	if (data->skip_zero) {
		if ((fd = open(argv[args], O_RDONLY)) < 0) {
			log_sys_error("open", argv[args]);
			return 0;
		}
		/* let's assume there is no problem to read 64 bytes */
		/* (cast avoids the signed/unsigned compare hiding a short read) */
		if (read(fd, buf, sizeof(buf)) < (ssize_t) sizeof(buf)) {
			log_sys_error("read", argv[args]);
			if (close(fd))
				log_sys_error("close", argv[args]);
			return 0;
		}
		for (ret = 0; ret < (int) DM_ARRAY_SIZE(buf); ++ret)
			if (buf[ret])
				break;

		if (close(fd))
			log_sys_error("close", argv[args]);

		if (ret == (int) DM_ARRAY_SIZE(buf)) {
			log_debug("%s skipped, detected empty disk header on %s.",
				  argv[0], argv[args]);
			return 1;
		}
	}

	if (!(ret = exec_cmd(pool_lv->vg->cmd, (const char * const *) argv,
			     &status, 0))) {
		switch (type) {
		case DM_NODE_CALLBACK_PRELOADED:
			log_err_once("Check of pool %s failed (status:%d). "
				     "Manual repair required!",
				     display_lvname(pool_lv), status);
			break;
		default:
			log_warn("WARNING: Integrity check of metadata for pool "
				 "%s failed.", display_lvname(pool_lv));
		}
		/*
		 * FIXME: What should we do here??
		 *
		 * Maybe mark the node, so it's not activating
		 * as pool but as error/linear and let the
		 * dm tree resolve the issue.
		 */
	}

	return ret;
}
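
/*
 * Illustration (hypothetical paths): with thin_check_executable set to
 * "/usr/sbin/thin_check" and thin_check_options = [ "-q" ] in lvm.conf,
 * the callback builds and execs roughly:
 *     /usr/sbin/thin_check -q /dev/mapper/vg-pool_tmeta
 * the final argument being the pool metadata LV path from lv_dmpath_dup().
 */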
static int _pool_register_callback(struct dev_manager *dm,
				   struct dm_tree_node *node,
				   const struct logical_volume *lv)
{
	struct pool_cb_data *data;

	/* Do not skip metadata testing even for unused thin pools */
#if 0
	/* Skip metadata testing for unused thin pool. */
	if (lv_is_thin_pool(lv) &&
	    (!first_seg(lv)->transaction_id ||
	     ((first_seg(lv)->transaction_id == 1) &&
	      pool_has_message(first_seg(lv), NULL, 0))))
		return 1;
#endif

	if (!(data = dm_pool_zalloc(dm->mem, sizeof(*data)))) {
		log_error("Failed to allocate data for callback.");
		return 0;
	}

	data->dm = dm;

	if (lv_is_thin_pool(lv)) {
		data->pool_lv = lv;
		data->skip_zero = 1;
		data->exec = global_thin_check_executable_CFG;
		data->opts = global_thin_check_options_CFG;
		data->global = "thin";
	} else if (lv_is_cache(lv)) { /* cache pool */
		data->pool_lv = first_seg(lv)->pool_lv;
		data->skip_zero = dm->activation;
		data->exec = global_cache_check_executable_CFG;
		data->opts = global_cache_check_options_CFG;
		data->global = "cache";
	} else {
		log_error(INTERNAL_ERROR "Registering unsupported pool callback.");
		return 0;
	}

	dm_tree_node_set_callback(node, _pool_callback, data);

	return 1;
}
/* Declaration to resolve suspend tree and message passing for thin-pool */
static int _add_target_to_dtree(struct dev_manager *dm,
				struct dm_tree_node *dnode,
				struct lv_segment *seg,
				struct lv_activate_opts *laopts);
/*
 * Add LV and any known dependencies
 */
static int _add_lv_to_dtree(struct dev_manager *dm, struct dm_tree *dtree,
			    const struct logical_volume *lv, int origin_only)
{
	uint32_t s;
	struct seg_list *sl;
	struct dm_list *snh;
	struct lv_segment *seg;
	struct dm_tree_node *node;
	const char *uuid;

	if (lv_is_cache_pool(lv)) {
		if (!dm_list_empty(&lv->segs_using_this_lv)) {
			if (!_add_lv_to_dtree(dm, dtree, seg_lv(first_seg(lv), 0), 0))
				return_0;
			if (!_add_lv_to_dtree(dm, dtree, first_seg(lv)->metadata_lv, 0))
				return_0;
			/* Cache pool does not have a real device node */
			return 1;
		}
		/* Unused cache pool is activated as metadata */
	}

	if (!origin_only && !_add_dev_to_dtree(dm, dtree, lv, NULL))
		return_0;

	/* FIXME Can we avoid doing this every time? */
	/* Reused also for lv_is_external_origin(lv) */
	if (!_add_dev_to_dtree(dm, dtree, lv, "real"))
		return_0;

	if (!origin_only && !_add_dev_to_dtree(dm, dtree, lv, "cow"))
		return_0;

	if (origin_only && lv_is_thin_volume(lv)) {
		if (!_add_dev_to_dtree(dm, dtree, lv, lv_layer(lv)))
			return_0;
#if 0
		/* ? Use origin_only to avoid 'deep' thin pool suspend ? */
		/* FIXME Implement dm_tree_node_skip_childrens optimisation */
		if (!(uuid = build_dm_uuid(dm->mem, lv, lv_layer(lv))))
			return_0;
		if ((node = dm_tree_find_node_by_uuid(dtree, uuid)))
			dm_tree_node_skip_childrens(node, 1);
#endif
	}

	if (origin_only && dm->activation && !dm->skip_external_lv &&
	    lv_is_external_origin(lv)) {
		/* Find possible users of external origin lv */
		dm->skip_external_lv = 1; /* avoid recursion */
		dm_list_iterate_items(sl, &lv->segs_using_this_lv)
			/* Match only external_lv users */
			if ((sl->seg->external_lv == lv) &&
			    !_add_lv_to_dtree(dm, dtree, sl->seg->lv, 1))
				return_0;
		dm->skip_external_lv = 0;
	}

	if (lv_is_thin_pool(lv)) {
		/*
		 * For both origin_only and !origin_only
		 * this skips the test for -tpool-real and tpool-cow.
		 */
		if (!_add_dev_to_dtree(dm, dtree, lv, lv_layer(lv)))
			return_0;

		/*
		 * TODO: change API and move this code
		 * Could be easier to handle this in _add_dev_to_dtree()
		 * and base this according to info.exists ?
		 */
		if (!dm->activation) {
			if (!(uuid = build_dm_uuid(dm->mem, lv, lv_layer(lv))))
				return_0;
			if ((node = dm_tree_find_node_by_uuid(dtree, uuid))) {
				if (origin_only) {
					struct lv_activate_opts laopts = {
						.origin_only = 1,
						.send_messages = 1 /* Node with messages */
					};
					/*
					 * Add messages only when the right node exists in
					 * the table, i.e. when building the SUSPEND tree
					 * for an origin-only thin-pool.
					 *
					 * TODO: Fix call of '_add_target_to_dtree()' to add message
					 * to thin-pool node as we already know the pool node exists
					 * in the table. Any better/cleaner API way?
					 *
					 * Probably some 'new' target method to add messages for any node?
					 */
					if (dm->suspend &&
					    !dm_list_empty(&(first_seg(lv)->thin_messages)) &&
					    !_add_target_to_dtree(dm, node, first_seg(lv), &laopts))
						return_0;
				} else {
					/* Setup callback for non-activation partial tree */
					/* Activation gets own callback when needed */
					/* TODO: extend _cached_dm_info() to return dnode */
					if (!_pool_register_callback(dm, node, lv))
						return_0;
				}
			}
		}
	}

	if (lv_is_cache(lv)) {
		if (!origin_only && !dm->activation && !dm->track_pending_delete) {
			/* Setup callback for non-activation partial tree */
			/* Activation gets own callback when needed */
			/* TODO: extend _cached_dm_info() to return dnode */
			if (!(uuid = build_dm_uuid(dm->mem, lv, lv_layer(lv))))
				return_0;
			if ((node = dm_tree_find_node_by_uuid(dtree, uuid)) &&
			    !_pool_register_callback(dm, node, lv))
				return_0;
		}
	}

	/* Add any snapshots of this LV */
	if (!origin_only && lv_is_origin(lv))
		dm_list_iterate(snh, &lv->snapshot_segs)
			if (!_add_lv_to_dtree(dm, dtree, dm_list_struct_base(snh, struct lv_segment, origin_list)->cow, 0))
				return_0;

	if (dm->activation && !origin_only && lv_is_merging_origin(lv) &&
	    !_add_lv_to_dtree(dm, dtree, find_snapshot(lv)->lv, 1))
		return_0;

	/* Add any LVs referencing a PVMOVE LV unless told not to. */
	if (dm->track_pvmove_deps && lv_is_pvmove(lv)) {
		dm->track_pvmove_deps = 0;
		dm_list_iterate_items(sl, &lv->segs_using_this_lv)
			if (!_add_lv_to_dtree(dm, dtree, sl->seg->lv, origin_only))
				return_0;
		dm->track_pvmove_deps = 1;
	}

	if (!dm->track_pending_delete)
		dm_list_iterate_items(sl, &lv->segs_using_this_lv) {
			if (lv_is_pending_delete(sl->seg->lv)) {
				/* LV is referenced by a cache LV pending delete */
				dm->track_pending_delete = 1;
				if (!_add_lv_to_dtree(dm, dtree, sl->seg->lv, origin_only))
					return_0;
				dm->track_pending_delete = 0;
			}
		}

	/* Adding LV head of replicator adds all other related devs */
	if (lv_is_replicator_dev(lv) &&
	    !_add_partial_replicator_to_dtree(dm, dtree, lv))
		return_0;

	/* Add any LVs used by segments in this LV */
	dm_list_iterate_items(seg, &lv->segments) {
		if (seg->external_lv && !dm->skip_external_lv &&
		    !_add_lv_to_dtree(dm, dtree, seg->external_lv, 1)) /* stack */
			return_0;
		if (seg->log_lv &&
		    !_add_lv_to_dtree(dm, dtree, seg->log_lv, 0))
			return_0;
		if (seg->metadata_lv &&
		    !_add_lv_to_dtree(dm, dtree, seg->metadata_lv, 0))
			return_0;
		if (seg->pool_lv &&
		    (lv_is_cache_pool(seg->pool_lv) || !dm->skip_external_lv) &&
		    !_add_lv_to_dtree(dm, dtree, seg->pool_lv, 1)) /* stack */
			return_0;

		for (s = 0; s < seg->area_count; s++) {
			if (seg_type(seg, s) == AREA_LV && seg_lv(seg, s) &&
			    /* origin only for cache without pending delete */
			    (!dm->track_pending_delete || !lv_is_cache(lv)) &&
			    !_add_lv_to_dtree(dm, dtree, seg_lv(seg, s), 0))
				return_0;
			if (seg_is_raid(seg) &&
			    !_add_lv_to_dtree(dm, dtree, seg_metalv(seg, s), 0))
				return_0;
		}

		/* When activating, detect merging LV presence */
		if (dm->activation && seg->merge_lv &&
		    !_add_lv_to_dtree(dm, dtree, seg->merge_lv, 1))
			return_0;
	}

	return 1;
}
static struct dm_tree *_create_partial_dtree(struct dev_manager *dm, const struct logical_volume *lv, int origin_only)
{
	struct dm_tree *dtree;

	if (!(dtree = dm_tree_create())) {
		log_debug_activation("Partial dtree creation failed for %s.", lv->name);
		return NULL;
	}

	dm_tree_set_optional_uuid_suffixes(dtree, &uuid_suffix_list[0]);

	if (!_add_lv_to_dtree(dm, dtree, lv, (lv_is_origin(lv) || lv_is_thin_volume(lv) || lv_is_thin_pool(lv)) ? origin_only : 0))
		goto_bad;

	return dtree;

bad:
	dm_tree_free(dtree);

	return NULL;
}
static char *_add_error_device(struct dev_manager *dm, struct dm_tree *dtree,
			       struct lv_segment *seg, int s)
{
	char *dlid, *name;
	char errid[32];
	struct dm_tree_node *node;
	struct lv_segment *seg_i;
	struct dm_info info;
	int segno = -1, i = 0;
	uint64_t size = (uint64_t) seg->len * seg->lv->vg->extent_size;

	dm_list_iterate_items(seg_i, &seg->lv->segments) {
		if (seg == seg_i)
			segno = i;
		++i;
	}

	if (segno < 0) {
		log_error("_add_error_device called with bad segment");
		return NULL;
	}

	sprintf(errid, "missing_%d_%d", segno, s);

	if (!(dlid = build_dm_uuid(dm->mem, seg->lv, errid)))
		return_NULL;

	if (!(name = dm_build_dm_name(dm->mem, seg->lv->vg->name,
				      seg->lv->name, errid)))
		return_NULL;

	log_debug_activation("Getting device info for %s [%s]", name, dlid);
	if (!_info(dlid, 1, 0, &info, NULL, NULL)) {
		log_error("Failed to get info for %s [%s].", name, dlid);
		return NULL;
	}

	if (!info.exists) {
		/* Create new node */
		if (!(node = dm_tree_add_new_dev(dtree, name, dlid, 0, 0, 0, 0, 0)))
			return_NULL;
		if (!dm_tree_node_add_error_target(node, size))
			return_NULL;
	} else {
		/* Already exists */
		if (!dm_tree_add_dev(dtree, info.major, info.minor)) {
			log_error("Failed to add device (%" PRIu32 ":%" PRIu32 ") to dtree",
				  info.major, info.minor);
			return_NULL;
		}
	}

	return dlid;
}
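
/*
 * Naming sketch: for area 2 of segment 1 of LV "lvol0" in VG "vg", the
 * errid above is "missing_1_2" and the error device appears as something
 * like "vg-lvol0-missing_1_2", i.e. errid is appended as a layer suffix
 * to the usual <vg>-<lv> dm name.
 */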
static int _add_error_area(struct dev_manager *dm, struct dm_tree_node *node,
			   struct lv_segment *seg, int s)
{
	char *dlid;
	uint64_t extent_size = seg->lv->vg->extent_size;

	if (!strcmp(dm->cmd->stripe_filler, "error")) {
		/*
		 * FIXME, the tree pointer is first field of dm_tree_node, but
		 * we don't have the struct definition available.
		 */
		struct dm_tree **tree = (struct dm_tree **) node;

		if (!(dlid = _add_error_device(dm, *tree, seg, s)))
			return_0;
		if (!dm_tree_node_add_target_area(node, NULL, dlid, extent_size * seg_le(seg, s)))
			return_0;
	} else
		if (!dm_tree_node_add_target_area(node, dm->cmd->stripe_filler, NULL, UINT64_C(0)))
			return_0;

	return 1;
}
int add_areas_line(struct dev_manager *dm, struct lv_segment *seg,
		   struct dm_tree_node *node, uint32_t start_area,
		   uint32_t areas)
{
	uint64_t extent_size = seg->lv->vg->extent_size;
	uint32_t s;
	char *dlid;
	struct stat info;
	const char *name;
	unsigned num_error_areas = 0;
	unsigned num_existing_areas = 0;

	/* FIXME Avoid repeating identical stat in dm_tree_node_add_target_area */
	for (s = start_area; s < areas; s++) {
		if ((seg_type(seg, s) == AREA_PV &&
		     (!seg_pvseg(seg, s) || !seg_pv(seg, s) || !seg_dev(seg, s) ||
		      !(name = dev_name(seg_dev(seg, s))) || !*name ||
		      stat(name, &info) < 0 || !S_ISBLK(info.st_mode))) ||
		    (seg_type(seg, s) == AREA_LV && !seg_lv(seg, s))) {
			if (!seg->lv->vg->cmd->partial_activation) {
				if (!seg->lv->vg->cmd->degraded_activation ||
				    !lv_is_raid_type(seg->lv)) {
					log_error("Aborting. LV %s is now incomplete "
						  "and '--activationmode partial' was not specified.",
						  seg->lv->name);
					return 0;
				}
			}
			if (!_add_error_area(dm, node, seg, s))
				return_0;
			num_error_areas++;
		} else if (seg_type(seg, s) == AREA_PV) {
			if (!dm_tree_node_add_target_area(node, dev_name(seg_dev(seg, s)), NULL,
							  (seg_pv(seg, s)->pe_start + (extent_size * seg_pe(seg, s)))))
				return_0;
			num_existing_areas++;
		} else if (seg_is_raid(seg)) {
			/*
			 * RAID can handle unassigned areas. It simply puts
			 * '- -' in for the metadata/data device pair. This
			 * is a valid way to indicate to the RAID target that
			 * the device is missing.
			 *
			 * If an image is marked as VISIBLE_LV and !LVM_WRITE,
			 * it means the device has temporarily been extracted
			 * from the array. It may come back at a future date,
			 * so the bitmap must track differences. Again, '- -'
			 * is used in the CTR table.
			 */
			if ((seg_type(seg, s) == AREA_UNASSIGNED) ||
			    (lv_is_visible(seg_lv(seg, s)) &&
			     !(seg_lv(seg, s)->status & LVM_WRITE))) {
				/* One each for metadata area and data area */
				if (!dm_tree_node_add_null_area(node, 0) ||
				    !dm_tree_node_add_null_area(node, 0))
					return_0;
				continue;
			}

			if (!(dlid = build_dm_uuid(dm->mem, seg_metalv(seg, s), NULL)))
				return_0;
			if (!dm_tree_node_add_target_area(node, NULL, dlid, extent_size * seg_metale(seg, s)))
				return_0;

			if (!(dlid = build_dm_uuid(dm->mem, seg_lv(seg, s), NULL)))
				return_0;
			if (!dm_tree_node_add_target_area(node, NULL, dlid, extent_size * seg_le(seg, s)))
				return_0;
		} else if (seg_type(seg, s) == AREA_LV) {
			if (!(dlid = build_dm_uuid(dm->mem, seg_lv(seg, s), NULL)))
				return_0;
			if (!dm_tree_node_add_target_area(node, NULL, dlid, extent_size * seg_le(seg, s)))
				return_0;
		} else {
			log_error(INTERNAL_ERROR "Unassigned area found in LV %s.",
				  seg->lv->name);
			return 0;
		}
	}

	if (num_error_areas) {
		/* Thins currently do not support partial activation */
		if (lv_is_thin_type(seg->lv)) {
			log_error("Cannot activate %s/%s: pool incomplete.",
				  seg->lv->vg->name, seg->lv->name);
			return 0;
		}
	}

	return 1;
}
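
/*
 * Illustration: a healthy AREA_PV area becomes a target area of the form
 * "<major>:<minor> <offset>", with the offset in sectors computed as
 * pe_start + extent_size * seg_pe(); a missing area is substituted by
 * the configured stripe_filler target (typically "error") instead.
 */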
static int _add_layer_target_to_dtree(struct dev_manager *dm,
				      struct dm_tree_node *dnode,
				      const struct logical_volume *lv)
{
	const char *layer_dlid;

	if (!(layer_dlid = build_dm_uuid(dm->mem, lv, lv_layer(lv))))
		return_0;

	/* Add linear mapping over layered LV */
	if (!add_linear_area_to_dtree(dnode, lv->size, lv->vg->extent_size,
				      lv->vg->cmd->use_linear_target,
				      lv->vg->name, lv->name) ||
	    !dm_tree_node_add_target_area(dnode, NULL, layer_dlid, 0))
		return_0;

	return 1;
}

static int _add_origin_target_to_dtree(struct dev_manager *dm,
				       struct dm_tree_node *dnode,
				       const struct logical_volume *lv)
{
	const char *real_dlid;

	if (!(real_dlid = build_dm_uuid(dm->mem, lv, "real")))
		return_0;

	if (!dm_tree_node_add_snapshot_origin_target(dnode, lv->size, real_dlid))
		return_0;

	return 1;
}
static int _add_snapshot_merge_target_to_dtree(struct dev_manager *dm,
					       struct dm_tree_node *dnode,
					       const struct logical_volume *lv)
{
	const char *origin_dlid, *cow_dlid, *merge_dlid;
	struct lv_segment *merging_snap_seg = find_snapshot(lv);

	if (!lv_is_merging_origin(lv)) {
		log_error(INTERNAL_ERROR "LV %s is not a merging snapshot.", lv->name);
		return 0;
	}

	if (!(origin_dlid = build_dm_uuid(dm->mem, lv, "real")))
		return_0;

	if (!(cow_dlid = build_dm_uuid(dm->mem, merging_snap_seg->cow, "cow")))
		return_0;

	if (!(merge_dlid = build_dm_uuid(dm->mem, merging_snap_seg->cow, NULL)))
		return_0;

	if (!dm_tree_node_add_snapshot_merge_target(dnode, lv->size, origin_dlid,
						    cow_dlid, merge_dlid,
						    merging_snap_seg->chunk_size))
		return_0;

	return 1;
}
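
/*
 * For reference (hedged sketch): the loaded table line uses the kernel
 * "snapshot-merge" target, approximately
 *     0 <size> snapshot-merge <origin_dev> <cow_dev> P <chunk_size>
 * with the devices resolved from the -real and -cow dlids built above.
 */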
static int _add_snapshot_target_to_dtree(struct dev_manager *dm,
					 struct dm_tree_node *dnode,
					 const struct logical_volume *lv,
					 struct lv_activate_opts *laopts)
{
	const char *origin_dlid;
	const char *cow_dlid;
	struct lv_segment *snap_seg;
	uint64_t size;

	if (!(snap_seg = find_snapshot(lv))) {
		log_error("Couldn't find snapshot for '%s'.", lv->name);
		return 0;
	}

	if (!(origin_dlid = build_dm_uuid(dm->mem, snap_seg->origin, "real")))
		return_0;

	if (!(cow_dlid = build_dm_uuid(dm->mem, snap_seg->cow, "cow")))
		return_0;

	size = (uint64_t) snap_seg->len * snap_seg->origin->vg->extent_size;

	if (!laopts->no_merging && lv_is_merging_cow(lv)) {
		/* cow is to be merged so load the error target */
		if (!dm_tree_node_add_error_target(dnode, size))
			return_0;
	} else if (!dm_tree_node_add_snapshot_target(dnode, size, origin_dlid,
						     cow_dlid, 1, snap_seg->chunk_size))
		return_0;

	return 1;
}
static int _add_target_to_dtree(struct dev_manager *dm,
				struct dm_tree_node *dnode,
				struct lv_segment *seg,
				struct lv_activate_opts *laopts)
{
	uint64_t extent_size = seg->lv->vg->extent_size;

	if (!seg->segtype->ops->add_target_line) {
		log_error(INTERNAL_ERROR "_emit_target cannot handle "
			  "segment type %s", seg->segtype->name);
		return 0;
	}

	return seg->segtype->ops->add_target_line(dm, dm->mem, dm->cmd,
						  &dm->target_state, seg,
						  laopts, dnode,
						  extent_size * seg->len,
						  &dm->pvmove_mirror_count);
}
static int _add_new_lv_to_dtree(struct dev_manager *dm, struct dm_tree *dtree,
				const struct logical_volume *lv,
				struct lv_activate_opts *laopts,
				const char *layer);
/* Add all replicators' LVs */
static int _add_replicator_dev_target_to_dtree(struct dev_manager *dm,
					       struct dm_tree *dtree,
					       struct lv_segment *seg,
					       struct lv_activate_opts *laopts)
{
	struct replicator_device *rdev;
	struct replicator_site *rsite;

	/* For inactive replicator add linear mapping */
	if (!lv_is_active_replicator_dev(seg->lv)) {
		if (!_add_new_lv_to_dtree(dm, dtree, seg->lv->rdevice->lv, laopts, NULL))
			return_0;
		return 1;
	}

	/* Add rlog and replicator nodes */
	if (!seg->replicator ||
	    !first_seg(seg->replicator)->rlog_lv ||
	    !_add_new_lv_to_dtree(dm, dtree,
				  first_seg(seg->replicator)->rlog_lv,
				  laopts, NULL) ||
	    !_add_new_lv_to_dtree(dm, dtree, seg->replicator, laopts, NULL))
		return_0;

	/* Activation of one replicator_dev node activates all other nodes */
	dm_list_iterate_items(rsite, &seg->replicator->rsites) {
		dm_list_iterate_items(rdev, &rsite->rdevices) {
			if (rdev->lv &&
			    !_add_new_lv_to_dtree(dm, dtree, rdev->lv,
						  laopts, NULL))
				return_0;

			if (rdev->slog &&
			    !_add_new_lv_to_dtree(dm, dtree, rdev->slog,
						  laopts, NULL))
				return_0;
		}
	}

	/* Add remaining replicator-dev nodes in the second loop
	 * to avoid multiple retries for inserting all elements */
	dm_list_iterate_items(rsite, &seg->replicator->rsites) {
		if (rsite->state != REPLICATOR_STATE_ACTIVE)
			continue;
		dm_list_iterate_items(rdev, &rsite->rdevices) {
			if (rdev->replicator_dev->lv == seg->lv)
				continue;
			if (!rdev->replicator_dev->lv ||
			    !_add_new_lv_to_dtree(dm, dtree,
						  rdev->replicator_dev->lv,
						  laopts, NULL))
				return_0;
		}
	}

	return 1;
}
static int _add_new_external_lv_to_dtree ( struct dev_manager * dm ,
struct dm_tree * dtree ,
struct logical_volume * external_lv ,
struct lv_activate_opts * laopts )
2013-02-21 13:25:44 +04:00
{
struct seg_list * sl ;
2013-07-09 14:34:49 +04:00
/* Do not want to recursively add externals again */
if ( dm - > skip_external_lv )
return 1 ;
2013-02-21 13:25:44 +04:00
2013-07-09 14:34:49 +04:00
/*
* Any LV can have only 1 external origin , so we will
* process all LVs related to this LV , and we want to
* skip repeated invocation of external lv processing
*/
dm - > skip_external_lv = 1 ;
log_debug_activation ( " Adding external origin lv %s and all active users. " ,
external_lv - > name ) ;
2013-02-21 13:25:44 +04:00
2013-07-09 14:34:49 +04:00
if ( ! _add_new_lv_to_dtree ( dm , dtree , external_lv , laopts ,
lv_layer ( external_lv ) ) )
return_0 ;
/*
* Add all ACTIVE LVs using this external origin LV . This is
* needed because a thin volume that was also an old snapshot
* may have been converted to an external origin .
*/
//if (lv_is_origin(external_lv))
dm_list_iterate_items ( sl , & external_lv - > segs_using_this_lv )
if ( ( sl - > seg - > external_lv = = external_lv ) & &
/* Add only active layered devices (also avoids loop) */
2014-11-02 22:59:57 +03:00
_cached_dm_info ( dm - > mem , dtree , sl - > seg - > lv ,
lv_layer ( sl - > seg - > lv ) ) & &
2013-02-21 13:25:44 +04:00
! _add_new_lv_to_dtree ( dm , dtree , sl - > seg - > lv ,
laopts , lv_layer ( sl - > seg - > lv ) ) )
return_0 ;
2013-07-09 14:34:49 +04:00
log_debug_activation ( " Finished adding external origin lv %s and all active users. " ,
external_lv - > name ) ;
dm - > skip_external_lv = 0 ;
2013-02-21 13:25:44 +04:00
return 1 ;
}
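The skip_external_lv flag above is a plain re-entrancy guard: set it before walking the users of the external origin, and nested invocations return immediately. The pattern in isolation (a sketch with a hypothetical walk body):

static int _example_guarded_walk(struct dev_manager *dm)
{
	if (dm->skip_external_lv)
		return 1;	/* nested call: external LV already being added */

	dm->skip_external_lv = 1;
	/* ... add the external origin and every active user exactly once ... */
	dm->skip_external_lv = 0;

	return 1;
}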
2005-11-09 16:08:41 +03:00
static int _add_segment_to_dtree ( struct dev_manager * dm ,
2011-06-17 18:14:19 +04:00
struct dm_tree * dtree ,
struct dm_tree_node * dnode ,
struct lv_segment * seg ,
struct lv_activate_opts * laopts ,
const char * layer )
2003-11-12 22:16:48 +03:00
{
2005-11-09 01:52:26 +03:00
uint32_t s ;
2006-11-20 19:45:45 +03:00
struct lv_segment * seg_present ;
2014-11-13 12:08:40 +03:00
const struct segment_type * segtype ;
2010-10-14 01:26:37 +04:00
const char * target_name ;
2003-11-12 22:16:48 +03:00
2005-11-09 01:52:26 +03:00
/* Ensure required device-mapper targets are loaded */
2013-07-03 00:26:03 +04:00
seg_present = find_snapshot ( seg - > lv ) ? : seg ;
2014-11-13 12:08:40 +03:00
segtype = seg_present - > segtype ;
target_name = ( segtype - > ops - > target_name ?
segtype - > ops - > target_name ( seg_present , laopts ) :
segtype - > name ) ;
2006-11-20 19:45:45 +03:00
2013-01-08 02:30:29 +04:00
log_debug_activation ( " Checking kernel supports %s segment type for %s%s%s " ,
target_name , seg - > lv - > name ,
layer ? " - " : " " , layer ? : " " ) ;
2006-11-20 19:45:45 +03:00
2014-11-13 12:08:40 +03:00
if ( segtype - > ops - > target_present & &
! segtype - > ops - > target_present ( seg_present - > lv - > vg - > cmd ,
seg_present , NULL ) ) {
2010-10-14 01:26:37 +04:00
log_error ( " Can't process LV %s: %s target support missing "
" from kernel? " , seg - > lv - > name , target_name ) ;
2003-11-12 22:16:48 +03:00
return 0 ;
}
2013-02-21 13:25:44 +04:00
/* Add external origin layer */
2013-07-09 14:34:49 +04:00
if ( seg - > external_lv & &
! _add_new_external_lv_to_dtree ( dm , dtree , seg - > external_lv , laopts ) )
return_0 ;
2014-02-05 02:50:16 +04:00
2005-11-09 01:52:26 +03:00
/* Add mirror log */
if ( seg - > log_lv & &
2011-06-17 18:14:19 +04:00
! _add_new_lv_to_dtree ( dm , dtree , seg - > log_lv , laopts , NULL ) )
2005-11-09 01:52:26 +03:00
return_0 ;
2014-02-05 02:50:16 +04:00
/* Add pool metadata */
2013-02-21 13:39:47 +04:00
if ( seg - > metadata_lv & &
! _add_new_lv_to_dtree ( dm , dtree , seg - > metadata_lv , laopts , NULL ) )
return_0 ;
2014-02-05 02:50:16 +04:00
/* Add pool layer */
thin: move pool messaging from resume to suspend
The existing messaging interface for thin-pool has a few 'weak' points:
* Messages were posted with each 'resume' operation, thus not allowing
activation of a thin-pool with its existing state.
* The accelerated (skipped) suspend step has not worked in a cluster,
since clvmd resumes only nodes which are suspended (have the proper lock
state).
* Resume may fail, and the code is not really designed to 'fail' in this
phase (the generic rule here is that resume DOES NOT fail unless something
serious is wrong, and the lvm2 tool usually doesn't handle a recovery path
in this case.)
* A full thin-pool suspend happened when taking a thin-volume snapshot.
With this patch, the new method relocates message passing into the suspend
state.
This has a few drawbacks with the current API, but overall it performs
better and gives more possibilities to deal with errors.
The patch introduces new logic for 'origin-only' suspend of a thin-pool,
which also applies to a thin volume when taking a snapshot.
When a suspend_origin_only operation is invoked on a pool with
queued messages, only those messages are posted to the thin-pool, and
the actual suspend of the thin pool and its data and metadata volumes is
skipped.
This makes taking a snapshot of a thin volume a lighter operation and
avoids blocking other unrelated active thin volumes.
Also, failure now happens in the 'suspend' state, where a failure is more
expected and is better handled through error paths.
Activation of a thin-pool now sends no messages and leaves it up to the
tool to decide later how to finish an unfinished double-commit transaction.
A problem which needs some API improvements relates to the lvm2 tree
construction. For the suspend tree we do not add the target table line
into the tree; only the device is inserted.
The current mechanism to attach messages for a thin-pool requires libdm
to know about the thin-pool target, so lvm2 currently assumes the node
really is a thin-pool and fills in the table line for this node (which
should be ensured by the PRELOAD phase, but it's a misuse of the internal
API); we would possibly need to be able to attach messages to 'any' node.
Another thing to notice: the current messaging interface in the thin-pool
target requires suspending the thin volume's origin first and then sending
a create message, but this cannot have any 'nice' solution on the lvm2
side, and IMHO we should introduce something like a 'create_after_resume'
message.
The patch also changes the moment where the lvm2 transaction id is
increased. Now it happens only after a successful finish of the kernel
transaction id change. This change was needed to properly handle
activation of a pool which is in the middle of an unfinished transaction,
and it also corrects usage of the thin-pool by external apps like Docker.
2015-07-01 14:31:37 +03:00
if ( seg - > pool_lv & & ! laopts - > origin_only & &
2013-02-21 13:39:47 +04:00
! _add_new_lv_to_dtree ( dm , dtree , seg - > pool_lv , laopts ,
lv_layer ( seg - > pool_lv ) ) )
return_0 ;
2003-11-12 22:16:48 +03:00
2010-05-24 13:01:05 +04:00
if ( seg_is_replicator_dev ( seg ) ) {
2011-06-17 18:14:19 +04:00
if ( ! _add_replicator_dev_target_to_dtree ( dm , dtree , seg , laopts ) )
2010-05-24 13:01:05 +04:00
return_0 ;
2013-02-21 13:39:47 +04:00
}
/* Add any LVs used by this segment */
for ( s = 0 ; s < seg - > area_count ; + + s ) {
if ( ( seg_type ( seg , s ) = = AREA_LV ) & &
2014-11-13 12:08:40 +03:00
/* origin only for cache without pending delete */
( ! dm - > track_pending_delete | | ! seg_is_cache ( seg ) ) & &
! _add_new_lv_to_dtree ( dm , dtree , seg_lv ( seg , s ) ,
laopts , NULL ) )
2011-09-29 12:56:38 +04:00
return_0 ;
2013-02-21 13:39:47 +04:00
if ( seg_is_raid ( seg ) & &
! _add_new_lv_to_dtree ( dm , dtree , seg_metalv ( seg , s ) ,
laopts , NULL ) )
2011-09-29 12:56:38 +04:00
return_0 ;
2005-11-09 01:52:26 +03:00
}
2014-11-13 12:08:40 +03:00
if ( dm - > track_pending_delete ) {
/* Replace target and all its used devs with error mapping */
log_debug_activation ( " Using error for pending delete %s. " ,
seg - > lv - > name ) ;
if ( ! dm_tree_node_add_error_target ( dnode , ( uint64_t ) seg - > lv - > vg - > extent_size * seg - > len ) )
return_0 ;
} else if ( ! _add_target_to_dtree ( dm , dnode , seg , laopts ) )
2005-11-09 01:52:26 +03:00
return_0 ;
return 1 ;
2003-07-05 02:34:56 +04:00
}
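The suspend-time messaging described in the commit message above works by queuing messages on the thin-pool's dtree node so that libdm can post them when it processes that node. A minimal sketch, assuming a preloaded thin-pool node; the device id is hypothetical:

static int _example_queue_pool_message(struct dm_tree_node *pool_node)
{
	/* Queued now, issued by libdm when the pool node is processed */
	if (!dm_tree_node_add_thin_pool_message(pool_node,
						DM_THIN_MESSAGE_CREATE_THIN,
						1 /* hypothetical device_id */,
						0 /* unused second id */))
		return_0;

	return 1;
}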
2005-10-17 22:00:02 +04:00
2014-05-23 23:22:38 +04:00
#if 0
This patch fixes issues with improper udev flags on sub-LVs.
The current code does not always assign proper udev flags to sub-LVs (e.g.
mirror images and log LVs). This shows up especially during a splitmirror
operation in which an image is split off from a mirror to form a new LV.
A mirror with a disk log is actually composed of 4 different LVs: the 2
mirror images, the log, and the top-level LV that "glues" them all together.
When a 2-way mirror is split into two linear LVs, two of those LVs must be
removed. The segments of the image which is not split off to form the new
LV are transferred to the top-level LV. This is done so that the original
LV can maintain its major/minor, UUID, and name. The sub-LV from which the
segments were transferred gets an error segment as a transitory process
before it is eventually removed. (Note that if the error target was not put
in place, a resume_lv would result in two LVs pointing to the same segment!
If the machine crashes before the eventual removal of the sub-LV, the result
would be a residual LV with the same mapping as the original (now linear) LV.)
So, the two LVs that need to be removed are now the log device and the sub-LV
with the error segment. If udev_flags are not properly set, a resume will
cause the error LV to come up and be scanned by udev. This causes I/O errors.
Additionally, when udev scans sub-LVs (or former sub-LVs), it can cause races
when we are trying to remove those LVs. This is especially bad during failure
conditions.
When the mirror is suspended, the top-level along with its sub-LVs are
suspended. The changes (now 2 linear devices and the yet-to-be-removed log
and error LV) are committed. When the resume takes place on the original
LV, there are no longer links to the other sub-lvs through the LVM metadata.
The links are implicitly handled by querying the kernel for a list of
dependencies. This is done in the '_add_dev' function (which is recursively
called for each dependency found) - called through the following chain:
_add_dev
dm_tree_add_dev_with_udev_flags
<*** DM / LVM divide ***>
_add_dev_to_dtree
_add_lv_to_dtree
_create_partial_dtree
_tree_action
dev_manager_activate
_lv_activate_lv
_lv_resume
lv_resume_if_active
When udev flags are calculated by '_get_udev_flags', it is done by referencing
the 'logical_volume' structure. Those flags are then passed down into
'dm_tree_add_dev_with_udev_flags', which in turn passes them to '_add_dev'.
Unfortunately, when '_add_dev' is finding the dependencies, it has no way to
calculate their proper udev_flags. This is because it is below the DM/LVM
divide - it doesn't have access to the logical_volume structure. In fact,
'_add_dev' simply reuses the udev_flags given for the initial device! This
virtually guarantees the udev_flags are wrong for all the dependencies unless
they are reset by some other mechanism. The current code provides no such
mechanism. Even if '_add_new_lv_to_dtree' were called on the sub-devices -
which it isn't - entries already in the tree are simply passed over, failing
to reset any udev_flags. The solution must retain its implicit nature of
discovering dependencies and be able to go back over the dependencies found
to properly set the udev_flags.
My solution simply calls a new function before leaving '_add_new_lv_to_dtree'
that iterates over the dtree nodes to properly reset the udev_flags of any
children. It is important that this function occur after the '_add_dev' has
done its job of querying the kernel for a list of dependencies. It is this
list of children that we use to look up their respective LVs and properly
calculate the udev_flags.
This solution has worked for single machine, cluster, and cluster w/ exclusive
activation.
2011-10-06 18:45:40 +04:00
static int _set_udev_flags_for_children ( struct dev_manager * dm ,
struct volume_group * vg ,
struct dm_tree_node * dnode )
{
char * p ;
const char * uuid ;
void * handle = NULL ;
struct dm_tree_node * child ;
const struct dm_info * info ;
struct lv_list * lvl ;
while ( ( child = dm_tree_next_child ( & handle , dnode , 0 ) ) ) {
/* Ignore root node */
if ( ! ( info = dm_tree_node_get_info ( child ) ) | | ! info - > exists )
continue ;
if ( ! ( uuid = dm_tree_node_get_uuid ( child ) ) ) {
log_error ( INTERNAL_ERROR
" Failed to get uuid for % " PRIu32 " :% " PRIu32 ,
info - > major , info - > minor ) ;
continue ;
}
/* Ignore non-LVM devices */
if ( ! ( p = strstr ( uuid , UUID_PREFIX ) ) )
continue ;
p + = strlen ( UUID_PREFIX ) ;
/* Ignore LVs that belong to different VGs (due to stacking) */
if ( strncmp ( p , ( char * ) vg - > id . uuid , ID_LEN ) )
continue ;
/* Ignore LVM devices with 'layer' suffixes */
if ( strrchr ( p , ' - ' ) )
continue ;
if ( ! ( lvl = find_lv_in_vg_by_lvid ( vg , ( const union lvid * ) p ) ) ) {
log_error ( INTERNAL_ERROR
" %s (% " PRIu32 " :% " PRIu32 " ) not found in VG " ,
dm_tree_node_get_name ( child ) ,
info - > major , info - > minor ) ;
return 0 ;
}
dm_tree_node_set_udev_flags ( child ,
activation: flag temporary LVs internally
Add LV_TEMPORARY flag for LVs with limited existence during command
execution. Such LVs are temporary in way that they need to be activated,
some action done and then removed immediately. Such LVs are just like
any normal LV - the only difference is that they are removed during
LVM command execution. This is also the case for LVs representing
future pool metadata spare LVs which we need to initialize by using
the usual LV before they are declared as pool metadata spare.
We can optimize some other parts like udev to do a better job if
it knows that the LV is temporary and any processing on it is just
useless.
This flag is orthogonal to LV_NOSCAN flag introduced recently
as LV_NOSCAN flag is primarily used to mark an LV for the scanning
to be avoided before the zeroing of the device happens. The LV_TEMPORARY
flag makes a difference between a full-fledged LV visible in the system
and the LV just used as a temporary overlay for some action that needs to
be done on underlying PVs.
For example: lvcreate --thinpool POOL --zero n -L 1G vg
- first, the usual LV is created to do a clean up for pool metadata
spare. The LV is activated, zeroed, deactivated.
- between "activated" and "zeroed" stage, the LV_NOSCAN flag is used
to avoid any scanning in udev
- betwen "zeroed" and "deactivated" stage, we need to avoid the WATCH
udev rule, but since the LV is just a usual LV, we can't make a
difference. The LV_TEMPORARY internal LV flag helps here. If we
create the LV with this flag, the DM_UDEV_DISABLE_DISK_RULES
and DM_UDEV_DISABLE_OTHER_RULES flag are set (just like as it is
with "invisible" and non-top-level LVs) - udev is directed to
skip WATCH rule use.
- if the LV_TEMPORARY flag was not used, there would normally be
a WATCH event generated once the LV is closed after "zeroed"
stage. This would cause problems with the immediate deactivation that
follows.
2013-10-23 16:06:39 +04:00
_get_udev_flags ( dm , lvl - > lv , NULL , 0 , 0 ) ) ;
2011-10-06 18:45:40 +04:00
}
return 1 ;
}
2014-05-23 23:22:38 +04:00
# endif
This patch fixes issues with improper udev flags on sub-LVs.
The current code does not always assign proper udev flags to sub-LVs (e.g.
mirror images and log LVs). This shows up especially during a splitmirror
operation in which an image is split off from a mirror to form a new LV.
A mirror with a disk log is actually composed of 4 different LVs: the 2
mirror images, the log, and the top-level LV that "glues" them all together.
When a 2-way mirror is split into two linear LVs, two of those LVs must be
removed. The segments of the image which is not split off to form the new
LV are transferred to the top-level LV. This is done so that the original
LV can maintain its major/minor, UUID, and name. The sub-lv from which the
segments were transferred gets an error segment as a transitory process
before it is eventually removed. (Note that if the error target was not put
in place, a resume_lv would result in two LVs pointing to the same segment!
If the machine crashes before the eventual removal of the sub-LV, the result
would be a residual LV with the same mapping as the original (now linear) LV.)
So, the two LVs that need to be removed are now the log device and the sub-LV
with the error segment. If udev_flags are not properly set, a resume will
cause the error LV to come up and be scanned by udev. This causes I/O errors.
Additionally, when udev scans sub-LVs (or former sub-LVs), it can cause races
when we are trying to remove those LVs. This is especially bad during failure
conditions.
When the mirror is suspended, the top-level along with its sub-LVs are
suspended. The changes (now 2 linear devices and the yet-to-be-removed log
and error LV) are committed. When the resume takes place on the original
LV, there are no longer links to the other sub-lvs through the LVM metadata.
The links are implicitly handled by querying the kernel for a list of
dependencies. This is done in the '_add_dev' function (which is recursively
called for each dependency found) - called through the following chain:
_add_dev
dm_tree_add_dev_with_udev_flags
<*** DM / LVM divide ***>
_add_dev_to_dtree
_add_lv_to_dtree
_create_partial_dtree
_tree_action
dev_manager_activate
_lv_activate_lv
_lv_resume
lv_resume_if_active
When udev flags are calculated by '_get_udev_flags', it is done by referencing
the 'logical_volume' structure. Those flags are then passed down into
'dm_tree_add_dev_with_udev_flags', which in turn passes them to '_add_dev'.
Unfortunately, when '_add_dev' is finding the dependencies, it has no way to
calculate their proper udev_flags. This is because it is below the DM/LVM
divide - it doesn't have access to the logical_volume structure. In fact,
'_add_dev' simply reuses the udev_flags given for the initial device! This
virtually guarentees the udev_flags are wrong for all the dependencies unless
they are reset by some other mechanism. The current code provides no such
mechanism. Even if '_add_new_lv_to_dtree' were called on the sub-devices -
which it isn't - entries already in the tree are simply passed over, failing
to reset any udev_flags. The solution must retain its implicit nature of
discovering dependencies and be able to go back over the dependencies found
to properly set the udev_flags.
My solution simply calls a new function before leaving '_add_new_lv_to_dtree'
that iterates over the dtree nodes to properly reset the udev_flags of any
children. It is important that this function occur after the '_add_dev' has
done its job of querying the kernel for a list of dependencies. It is this
list of children that we use to look up their respective LVs and properly
calculate the udev_flags.
This solution has worked for single machine, cluster, and cluster w/ exclusive
activation.
2011-10-06 18:45:40 +04:00
2005-11-09 16:08:41 +03:00
static int _add_new_lv_to_dtree ( struct dev_manager * dm , struct dm_tree * dtree ,
2014-09-22 17:50:07 +04:00
const struct logical_volume * lv , struct lv_activate_opts * laopts ,
2011-06-17 18:14:19 +04:00
const char * layer )
2005-10-17 22:21:05 +04:00
{
2005-11-09 01:52:26 +03:00
struct lv_segment * seg ;
struct lv_layer * lvlayer ;
2011-06-11 04:03:06 +04:00
struct seg_list * sl ;
2013-02-21 13:39:47 +04:00
struct dm_list * snh ;
2005-11-09 16:05:17 +03:00
struct dm_tree_node * dnode ;
2010-01-22 18:40:31 +03:00
const struct dm_info * dinfo ;
2010-02-18 01:59:46 +03:00
char * name , * dlid ;
2007-11-12 23:51:54 +03:00
uint32_t max_stripe_size = UINT32_C ( 0 ) ;
uint32_t read_ahead = lv - > read_ahead ;
2007-11-29 18:04:12 +03:00
uint32_t read_ahead_flags = UINT32_C ( 0 ) ;
2014-11-13 12:08:40 +03:00
int save_pending_delete = dm - > track_pending_delete ;
2005-10-17 22:21:05 +04:00
2014-11-11 13:00:51 +03:00
/* An LV with a pending delete is never newly loaded into a table */
if ( lv_is_pending_delete ( lv ) & & ! _cached_dm_info ( dm - > mem , dtree , lv , NULL ) )
return 1 ; /* Replace with error only when it already exists */
2014-11-02 21:34:50 +03:00
if ( lv_is_cache_pool ( lv ) & &
! dm_list_empty ( & lv - > segs_using_this_lv ) ) {
2014-04-01 19:53:18 +04:00
/* cache pool is 'meta' LV and does not have a real device node */
if ( ! _add_new_lv_to_dtree ( dm , dtree , seg_lv ( first_seg ( lv ) , 0 ) , laopts , NULL ) )
return_0 ;
if ( ! _add_new_lv_to_dtree ( dm , dtree , first_seg ( lv ) - > metadata_lv , laopts , NULL ) )
return_0 ;
return 1 ;
}
2010-01-14 17:39:57 +03:00
/* FIXME Seek a simpler way to lay out the snapshot-merge tree. */
2013-11-28 14:39:38 +04:00
if ( ! layer & & lv_is_merging_origin ( lv ) ) {
seg = find_snapshot ( lv ) ;
2010-01-13 04:54:34 +03:00
/*
* Clear merge attributes if merge isn ' t currently possible :
* either the origin or the merging snapshot is open
2010-01-22 18:40:31 +03:00
* - but use " snapshot-merge " if it is already in use
* - open_count is always retrieved ( as of dm - ioctl 4.7 .0 )
* so just use the tree ' s existing nodes ' info
2010-01-13 04:54:34 +03:00
*/
2013-02-24 22:42:40 +04:00
/* An activating merging origin won't have a node in the tree yet */
2014-11-02 22:59:57 +03:00
if ( ( ( dinfo = _cached_dm_info ( dm - > mem , dtree , lv , NULL ) ) & &
2013-02-18 03:08:38 +04:00
dinfo - > open_count ) | |
2014-11-02 22:59:57 +03:00
( ( dinfo = _cached_dm_info ( dm - > mem , dtree ,
seg_is_thin_volume ( seg ) ?
seg - > lv : seg - > cow , NULL ) ) & &
2013-02-18 03:08:38 +04:00
dinfo - > open_count ) ) {
2013-11-30 00:18:34 +04:00
if ( seg_is_thin_volume ( seg ) | |
/* FIXME Is there anything simpler to check for instead? */
2014-02-18 00:49:51 +04:00
! lv_has_target_type ( dm - > mem , lv , NULL , " snapshot-merge " ) )
2011-06-17 18:22:48 +04:00
laopts - > no_merging = 1 ;
2010-01-13 04:54:34 +03:00
}
}
2011-08-30 18:55:15 +04:00
if ( ! ( name = dm_build_dm_name ( dm - > mem , lv - > vg - > name , lv - > name , layer ) ) )
2005-11-09 01:52:26 +03:00
return_0 ;
2005-10-17 22:21:05 +04:00
2014-11-04 12:33:35 +03:00
/* Even an unused thin-pool still needs to get a layered UUID suffix */
if ( ! layer & & lv_is_new_thin_pool ( lv ) )
layer = lv_layer ( lv ) ;
2014-03-11 20:13:47 +04:00
if ( ! ( dlid = build_dm_uuid ( dm - > mem , lv , layer ) ) )
2005-11-09 01:52:26 +03:00
return_0 ;
2005-10-17 22:21:05 +04:00
2005-11-09 01:52:26 +03:00
/* We've already processed this node if it already has a context ptr */
2005-11-09 16:05:17 +03:00
if ( ( dnode = dm_tree_find_node_by_uuid ( dtree , dlid ) ) & &
dm_tree_node_get_context ( dnode ) )
2005-11-09 01:52:26 +03:00
return 1 ;
2005-10-17 22:21:05 +04:00
2005-11-09 01:52:26 +03:00
if ( ! ( lvlayer = dm_pool_alloc ( dm - > mem , sizeof ( * lvlayer ) ) ) ) {
2009-12-03 13:01:30 +03:00
log_error ( " _add_new_lv_to_dtree: pool alloc failed for %s %s. " ,
2010-02-18 01:59:46 +03:00
lv - > name , layer ) ;
2005-10-17 22:21:05 +04:00
return 0 ;
}
2005-11-09 01:52:26 +03:00
lvlayer - > lv = lv ;
/*
2005-11-09 16:08:41 +03:00
* Add LV to dtree .
2005-11-09 01:52:26 +03:00
* If we ' re working with precommitted metadata , clear any
* existing inactive table left behind .
* Major / minor settings only apply to the visible layer .
*/
2011-06-11 04:03:06 +04:00
/* FIXME Move the clear from here until later, so we can leave
* identical inactive tables untouched . ( For pvmove . )
*/
2009-10-22 17:00:07 +04:00
if ( ! ( dnode = dm_tree_add_new_dev_with_udev_flags ( dtree , name , dlid ,
2006-07-10 23:17:40 +04:00
layer ? UINT32_C ( 0 ) : ( uint32_t ) lv - > major ,
layer ? UINT32_C ( 0 ) : ( uint32_t ) lv - > minor ,
2012-01-12 05:51:56 +04:00
read_only_lv ( lv , laopts ) ,
2011-09-28 02:43:40 +04:00
( ( lv - > vg - > status & PRECOMMITTED ) | laopts - > revert ) ? 1 : 0 ,
2009-10-22 17:00:07 +04:00
lvlayer ,
2013-10-23 16:06:39 +04:00
_get_udev_flags ( dm , lv , layer , laopts - > noscan , laopts - > temporary ) ) ) )
2005-11-09 01:52:26 +03:00
return_0 ;
/* Store existing name so we can do rename later */
2005-11-09 16:05:17 +03:00
lvlayer - > old_name = dm_tree_node_get_name ( dnode ) ;
2005-11-09 01:52:26 +03:00
/* Create table */
dm - > pvmove_mirror_count = 0u ;
2013-02-21 13:39:47 +04:00
2014-11-13 12:08:40 +03:00
if ( lv_is_pending_delete ( lv ) )
2014-11-10 12:56:43 +03:00
/* Handle LVs with pending delete */
2014-11-13 12:08:40 +03:00
/* For now used only by the cache segtype ; TODO : snapshots */
dm - > track_pending_delete = 1 ;
2014-11-10 12:56:43 +03:00
2014-11-02 21:34:50 +03:00
/* This is unused cache-pool - make metadata accessible */
if ( lv_is_cache_pool ( lv ) )
lv = first_seg ( lv ) - > metadata_lv ;
2013-02-21 13:39:47 +04:00
/* If this is a snapshot origin, add real LV */
/* If this is a snapshot origin + merging snapshot, add cow + real LV */
2013-02-21 13:25:44 +04:00
/* Snapshot origin could be also external origin */
2013-02-21 13:39:47 +04:00
if ( lv_is_origin ( lv ) & & ! layer ) {
if ( ! _add_new_lv_to_dtree ( dm , dtree , lv , laopts , " real " ) )
2005-11-09 01:52:26 +03:00
return_0 ;
2013-02-21 13:39:47 +04:00
if ( ! laopts - > no_merging & & lv_is_merging_origin ( lv ) ) {
if ( ! _add_new_lv_to_dtree ( dm , dtree ,
2014-02-18 00:49:51 +04:00
find_snapshot ( lv ) - > cow , laopts , " cow " ) )
2013-02-21 13:39:47 +04:00
return_0 ;
/*
* Must also add " real " LV for use when
* snapshot - merge target is added
*/
if ( ! _add_snapshot_merge_target_to_dtree ( dm , dnode , lv ) )
return_0 ;
} else if ( ! _add_origin_target_to_dtree ( dm , dnode , lv ) )
return_0 ;
/* Add any snapshots of this LV */
dm_list_iterate ( snh , & lv - > snapshot_segs )
if ( ! _add_new_lv_to_dtree ( dm , dtree ,
dm_list_struct_base ( snh , struct lv_segment ,
origin_list ) - > cow ,
laopts , NULL ) )
return_0 ;
} else if ( lv_is_cow ( lv ) & & ! layer ) {
if ( ! _add_new_lv_to_dtree ( dm , dtree , lv , laopts , " cow " ) )
return_0 ;
if ( ! _add_snapshot_target_to_dtree ( dm , dnode , lv , laopts ) )
return_0 ;
2014-11-04 12:33:35 +03:00
} else if ( ! layer & & ( ( lv_is_thin_pool ( lv ) & & ! lv_is_new_thin_pool ( lv ) ) | |
lv_is_external_origin ( lv ) ) ) {
/* External origin or 'used' Thin pool is using layer */
2013-02-21 13:39:47 +04:00
if ( ! _add_new_lv_to_dtree ( dm , dtree , lv , laopts , lv_layer ( lv ) ) )
return_0 ;
if ( ! _add_layer_target_to_dtree ( dm , dnode , lv ) )
return_0 ;
} else {
/* Add 'real' segments for LVs */
dm_list_iterate_items ( seg , & lv - > segments ) {
if ( ! _add_segment_to_dtree ( dm , dtree , dnode , seg , laopts , layer ) )
return_0 ;
if ( max_stripe_size < seg - > stripe_size * seg - > area_count )
max_stripe_size = seg - > stripe_size * seg - > area_count ;
}
2005-11-09 01:52:26 +03:00
}
2005-10-17 22:21:05 +04:00
2013-02-21 13:39:47 +04:00
/* Setup thin pool callback */
if ( lv_is_thin_pool ( lv ) & & layer & &
2014-07-09 19:24:34 +04:00
! _pool_register_callback ( dm , dnode , lv ) )
return_0 ;
if ( lv_is_cache ( lv ) & &
! _pool_register_callback ( dm , dnode , lv ) )
2013-02-21 13:39:47 +04:00
return_0 ;
2007-12-05 22:24:32 +03:00
if ( read_ahead = = DM_READ_AHEAD_AUTO ) {
2008-01-08 19:47:10 +03:00
/* we need RA at least twice a whole stripe - see the comment in md/raid0.c */
read_ahead = max_stripe_size * 2 ;
2013-02-21 13:39:47 +04:00
/* FIXME: layered device read-ahead */
2009-05-20 15:09:49 +04:00
if ( ! read_ahead )
2009-06-01 16:43:31 +04:00
lv_calculate_readahead ( lv , & read_ahead ) ;
2007-11-29 18:04:12 +03:00
read_ahead_flags = DM_READ_AHEAD_MINIMUM_FLAG ;
2007-12-05 22:24:32 +03:00
}
2007-11-12 23:51:54 +03:00
2007-11-29 18:04:12 +03:00
dm_tree_node_set_read_ahead ( dnode , read_ahead , read_ahead_flags ) ;
2007-11-12 23:51:54 +03:00
2011-06-11 04:03:06 +04:00
/* Add any LVs referencing a PVMOVE LV unless told not to */
2014-09-16 00:33:53 +04:00
if ( dm - > track_pvmove_deps & & lv_is_pvmove ( lv ) )
2011-06-11 04:03:06 +04:00
dm_list_iterate_items ( sl , & lv - > segs_using_this_lv )
2011-06-17 18:14:19 +04:00
if ( ! _add_new_lv_to_dtree ( dm , dtree , sl - > seg - > lv , laopts , NULL ) )
2011-06-11 04:03:06 +04:00
return_0 ;
2014-05-23 23:22:38 +04:00
#if 0
/* Should not be needed, will be removed */
2011-10-06 18:45:40 +04:00
if ( ! _set_udev_flags_for_children ( dm , lv - > vg , dnode ) )
return_0 ;
2014-05-23 23:22:38 +04:00
# endif
2011-10-06 18:45:40 +04:00
2014-11-13 12:08:40 +03:00
dm - > track_pending_delete = save_pending_delete ; /* restore */
2005-10-18 16:39:20 +04:00
return 1 ;
2005-10-17 22:21:05 +04:00
}
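The DM_READ_AHEAD_AUTO branch above asks for read-ahead of at least two whole stripes, flagged as a minimum so a larger kernel value can still win. A worked sketch with made-up stripe geometry:

static void _example_read_ahead(struct dm_tree_node *dnode)
{
	/* e.g. 3 stripes of 128 sectors each: max_stripe_size = 384 */
	uint32_t max_stripe_size = 128 * 3;
	uint32_t read_ahead = max_stripe_size * 2;	/* 768 sectors */

	dm_tree_node_set_read_ahead(dnode, read_ahead,
				    DM_READ_AHEAD_MINIMUM_FLAG);
}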
2007-05-15 18:42:01 +04:00
/* FIXME: symlinks should be created/destroyed at the same time
* as the kernel devices but we can ' t do that from within libdevmapper
* at present so we must walk the tree twice instead . */
2005-11-09 01:52:26 +03:00
/*
* Create LV symlinks for children of supplied root node .
*/
2005-11-09 16:05:17 +03:00
static int _create_lv_symlinks ( struct dev_manager * dm , struct dm_tree_node * root )
2005-10-17 22:21:05 +04:00
{
2005-11-09 01:52:26 +03:00
void * handle = NULL ;
2005-11-09 16:05:17 +03:00
struct dm_tree_node * child ;
2005-11-09 01:52:26 +03:00
struct lv_layer * lvlayer ;
2008-12-19 17:22:48 +03:00
char * old_vgname , * old_lvname , * old_layer ;
char * new_vgname , * new_lvname , * new_layer ;
2005-11-09 01:52:26 +03:00
const char * name ;
int r = 1 ;
2005-10-17 22:21:05 +04:00
2011-06-17 18:50:53 +04:00
/* Nothing to do if udev fallback is disabled. */
2013-05-13 13:46:24 +04:00
if ( ! _check_udev_fallback ( dm - > cmd ) ) {
2011-10-14 17:23:47 +04:00
fs_set_create ( ) ;
2011-06-17 18:50:53 +04:00
return 1 ;
2011-10-14 17:23:47 +04:00
}
2011-06-17 18:50:53 +04:00
2005-11-09 16:05:17 +03:00
while ( ( child = dm_tree_next_child ( & handle , root , 0 ) ) ) {
2009-12-03 12:59:54 +03:00
if ( ! ( lvlayer = dm_tree_node_get_context ( child ) ) )
2005-11-09 01:52:26 +03:00
continue ;
2005-10-17 22:21:05 +04:00
2005-11-09 01:52:26 +03:00
/* Detect rename */
2005-11-09 16:05:17 +03:00
name = dm_tree_node_get_name ( child ) ;
2005-10-17 22:21:05 +04:00
2005-11-09 01:52:26 +03:00
if ( name & & lvlayer - > old_name & & * lvlayer - > old_name & & strcmp ( name , lvlayer - > old_name ) ) {
2008-12-19 17:22:48 +03:00
if ( ! dm_split_lvm_name ( dm - > mem , lvlayer - > old_name , & old_vgname , & old_lvname , & old_layer ) ) {
2008-01-30 17:00:02 +03:00
log_error ( " _create_lv_symlinks: Couldn't split up old device name %s " , lvlayer - > old_name ) ;
return 0 ;
}
2008-12-19 17:22:48 +03:00
if ( ! dm_split_lvm_name ( dm - > mem , name , & new_vgname , & new_lvname , & new_layer ) ) {
log_error ( " _create_lv_symlinks: Couldn't split up new device name %s " , name ) ;
return 0 ;
}
if ( ! fs_rename_lv ( lvlayer - > lv , name , old_vgname , old_lvname ) )
r = 0 ;
2009-05-28 05:11:29 +04:00
continue ;
}
if ( lv_is_visible ( lvlayer - > lv ) ) {
2010-02-24 23:00:56 +03:00
if ( ! _dev_manager_lv_mknodes ( lvlayer - > lv ) )
2009-05-28 05:11:29 +04:00
r = 0 ;
continue ;
}
2010-02-24 23:00:56 +03:00
if ( ! _dev_manager_lv_rmnodes ( lvlayer - > lv ) )
2005-11-09 01:52:26 +03:00
r = 0 ;
2005-10-17 22:21:05 +04:00
}
2005-11-09 01:52:26 +03:00
return r ;
}
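The rename detection above hinges on dm_split_lvm_name(), which undoes the dm name mangling (a literal '-' in a VG or LV name is doubled) and splits off any layer suffix. A small sketch with a hypothetical device name:

static void _example_split_name(struct dev_manager *dm)
{
	char *vgname, *lvname, *layer;

	/* "vg00-lvol--1-real" -> vg "vg00", lv "lvol-1", layer "real" */
	if (!dm_split_lvm_name(dm->mem, "vg00-lvol--1-real",
			       &vgname, &lvname, &layer))
		return;

	log_debug_activation("Split to %s/%s layer '%s'.",
			     vgname, lvname, layer);
}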
2005-10-18 16:39:20 +04:00
2007-05-15 18:42:01 +04:00
/*
* Remove LV symlinks for children of supplied root node .
*/
static int _remove_lv_symlinks ( struct dev_manager * dm , struct dm_tree_node * root )
{
void * handle = NULL ;
struct dm_tree_node * child ;
char * vgname , * lvname , * layer ;
int r = 1 ;
2011-06-17 18:50:53 +04:00
/* Nothing to do if udev fallback is disabled. */
2013-05-13 13:46:24 +04:00
if ( ! _check_udev_fallback ( dm - > cmd ) )
2011-06-17 18:50:53 +04:00
return 1 ;
2007-05-15 18:42:01 +04:00
while ( ( child = dm_tree_next_child ( & handle , root , 0 ) ) ) {
2008-01-30 17:00:02 +03:00
if ( ! dm_split_lvm_name ( dm - > mem , dm_tree_node_get_name ( child ) , & vgname , & lvname , & layer ) ) {
2007-05-15 18:42:01 +04:00
r = 0 ;
continue ;
}
2008-06-05 16:45:55 +04:00
if ( ! * vgname )
continue ;
2007-05-15 18:42:01 +04:00
/* only top level layer has symlinks */
if ( * layer )
continue ;
2010-01-07 22:54:21 +03:00
fs_del_lv_byname ( dm - > cmd - > dev_dir , vgname , lvname ,
dm - > cmd - > current_settings . udev_rules ) ;
2007-05-15 18:42:01 +04:00
}
return r ;
}
2014-11-13 12:08:40 +03:00
static int _clean_tree ( struct dev_manager * dm , struct dm_tree_node * root , const char * non_toplevel_tree_dlid )
2005-11-09 01:52:26 +03:00
{
void * handle = NULL ;
2005-11-09 16:05:17 +03:00
struct dm_tree_node * child ;
2005-11-09 01:52:26 +03:00
char * vgname , * lvname , * layer ;
const char * name , * uuid ;
2014-11-13 12:08:40 +03:00
struct dm_str_list * dl ;
/* Deactivate any tracked pending delete nodes */
dm_list_iterate_items ( dl , & dm - > pending_delete ) {
log_debug_activation ( " Deleting tracked UUID %s. " , dl - > str ) ;
if ( ! dm_tree_deactivate_children ( root , dl - > str , strlen ( dl - > str ) ) )
return_0 ;
}
2005-11-09 01:52:26 +03:00
2005-11-09 16:05:17 +03:00
while ( ( child = dm_tree_next_child ( & handle , root , 0 ) ) ) {
if ( ! ( name = dm_tree_node_get_name ( child ) ) )
2005-11-09 01:52:26 +03:00
continue ;
2005-11-09 16:05:17 +03:00
if ( ! ( uuid = dm_tree_node_get_uuid ( child ) ) )
2005-11-09 01:52:26 +03:00
continue ;
2008-01-30 17:00:02 +03:00
if ( ! dm_split_lvm_name ( dm - > mem , name , & vgname , & lvname , & layer ) ) {
log_error ( " _clean_tree: Couldn't split up device name %s. " , name ) ;
return 0 ;
}
2005-11-09 01:52:26 +03:00
/* Not meant to be top level? */
2014-11-13 12:08:40 +03:00
if ( ! * layer )
2014-11-11 02:50:13 +03:00
continue ;
2010-08-17 23:25:05 +04:00
/* If operation was performed on a partial tree, don't remove it */
if ( non_toplevel_tree_dlid & & ! strcmp ( non_toplevel_tree_dlid , uuid ) )
continue ;
2014-11-12 11:34:46 +03:00
if ( ! dm_tree_deactivate_children ( root , uuid , strlen ( uuid ) ) )
2009-08-03 22:01:45 +04:00
return_0 ;
2005-11-09 01:52:26 +03:00
}
return 1 ;
2005-10-17 22:21:05 +04:00
}
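The pending-delete pass above replays dlids recorded while the tree was built; dm_tree_deactivate_children() then tears down every child whose UUID starts with that prefix. The call in isolation (a sketch; the UUID string is made up):

static int _example_deactivate_by_uuid(struct dm_tree_node *root)
{
	/* A dlid prefix as tracked in dm->pending_delete (hypothetical) */
	const char *dlid = "LVM-0123456789abcdef";

	if (!dm_tree_deactivate_children(root, dlid, strlen(dlid)))
		return_0;

	return 1;
}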
2014-09-22 17:50:07 +04:00
static int _tree_action ( struct dev_manager * dm , const struct logical_volume * lv ,
2011-06-17 18:14:19 +04:00
struct lv_activate_opts * laopts , action_t action )
2005-10-17 22:21:05 +04:00
{
2014-11-13 15:15:58 +03:00
static const char _action_names [ ] [ 24 ] = {
2014-11-09 01:41:22 +03:00
" PRELOAD " , " ACTIVATE " , " DEACTIVATE " , " SUSPEND " , " SUSPEND_WITH_LOCKFS " , " CLEAN "
} ;
2011-10-11 14:02:28 +04:00
const size_t DLID_SIZE = ID_LEN + sizeof ( UUID_PREFIX ) - 1 ;
2005-11-09 16:05:17 +03:00
struct dm_tree * dtree ;
struct dm_tree_node * root ;
2005-10-18 16:39:20 +04:00
char * dlid ;
2005-10-17 22:21:05 +04:00
int r = 0 ;
2014-11-09 01:41:22 +03:00
if ( action < DM_ARRAY_SIZE ( _action_names ) )
2015-06-15 15:33:29 +03:00
log_debug_activation ( " Creating %s%s tree for %s. " ,
_action_names [ action ] ,
( laopts - > origin_only ) ? " origin-only " : " " ,
display_lvname ( lv ) ) ;
2014-11-09 01:41:22 +03:00
2014-04-07 22:33:50 +04:00
/* Some LVs cannot be used for the top level tree */
/* TODO: add more.... */
2014-11-02 21:34:50 +03:00
if ( lv_is_cache_pool ( lv ) & & ! dm_list_empty ( & lv - > segs_using_this_lv ) ) {
2014-04-07 22:33:50 +04:00
log_error ( INTERNAL_ERROR " Cannot create tree for %s. " , lv - > name ) ;
return 0 ;
}
2013-07-09 14:34:49 +04:00
/* Some targets may build bigger tree for activation */
dm - > activation = ( ( action = = PRELOAD ) | | ( action = = ACTIVATE ) ) ;
2015-07-01 14:31:37 +03:00
dm - > suspend = ( action = = SUSPEND_WITH_LOCKFS ) | | ( action = = SUSPEND ) ;
2011-06-17 18:14:19 +04:00
if ( ! ( dtree = _create_partial_dtree ( dm , lv , laopts - > origin_only ) ) )
2005-11-09 01:52:26 +03:00
return_0 ;
2005-10-17 22:21:05 +04:00
2005-11-09 16:05:17 +03:00
if ( ! ( root = dm_tree_find_node ( dtree , 0 , 0 ) ) ) {
2005-10-18 16:39:20 +04:00
log_error ( " Lost dependency tree root node " ) ;
2011-01-10 17:02:30 +03:00
goto out_no_root ;
2005-10-17 22:21:05 +04:00
}
2011-01-10 17:02:30 +03:00
/* Restore fs cookie */
dm_tree_set_cookie ( root , fs_get_cookie ( ) ) ;
2014-03-11 20:13:47 +04:00
if ( ! ( dlid = build_dm_uuid ( dm - > mem , lv , laopts - > origin_only ? lv_layer ( lv ) : NULL ) ) )
2005-11-09 01:52:26 +03:00
goto_out ;
2005-10-17 22:21:05 +04:00
2005-10-19 17:59:18 +04:00
/* Only process nodes with uuid of "LVM-" plus VG id. */
2005-10-25 23:08:21 +04:00
switch ( action ) {
2005-11-09 01:52:26 +03:00
case CLEAN :
2014-06-09 12:58:57 +04:00
if ( retry_deactivation ( ) )
dm_tree_retry_remove ( root ) ;
2005-11-09 01:52:26 +03:00
/* Deactivate any unused non-toplevel nodes */
2011-06-17 18:14:19 +04:00
if ( ! _clean_tree ( dm , root , laopts - > origin_only ? dlid : NULL ) )
2005-11-09 01:52:26 +03:00
goto_out ;
break ;
2005-10-25 23:08:21 +04:00
case DEACTIVATE :
2011-09-22 21:39:56 +04:00
if ( retry_deactivation ( ) )
dm_tree_retry_remove ( root ) ;
2011-10-11 12:54:01 +04:00
/* Deactivate LV and all devices it references that nothing else has open. */
2011-10-11 14:02:28 +04:00
if ( ! dm_tree_deactivate_children ( root , dlid , DLID_SIZE ) )
2009-08-03 22:01:45 +04:00
goto_out ;
2007-05-15 18:42:01 +04:00
if ( ! _remove_lv_symlinks ( dm , root ) )
2011-10-11 12:57:13 +04:00
log_warn ( " Failed to remove all device symlinks associated with %s. " , lv - > name ) ;
2005-10-25 23:08:21 +04:00
break ;
case SUSPEND :
2006-08-09 01:20:00 +04:00
dm_tree_skip_lockfs ( root ) ;
2015-10-25 22:37:39 +03:00
if ( ! dm - > flush_required & & ! lv_is_pvmove ( lv ) )
2007-01-09 23:31:08 +03:00
dm_tree_use_no_flush_suspend ( root ) ;
2011-02-28 22:53:03 +03:00
/* Fall through */
2006-08-09 01:20:00 +04:00
case SUSPEND_WITH_LOCKFS :
2011-10-11 14:02:28 +04:00
if ( ! dm_tree_suspend_children ( root , dlid , DLID_SIZE ) )
2005-11-09 01:52:26 +03:00
goto_out ;
break ;
case PRELOAD :
case ACTIVATE :
/* Add all required new devices to tree */
2015-07-01 14:31:37 +03:00
		if (!_add_new_lv_to_dtree(dm, dtree, lv, laopts,
					  (lv_is_origin(lv) && laopts->origin_only) ? "real" :
					  (lv_is_thin_pool(lv) && laopts->origin_only) ? "tpool" : NULL))
2005-11-09 01:52:26 +03:00
			goto_out;

		/* Preload any devices required before any suspensions */
2013-12-17 18:17:44 +04:00
		if (!dm_tree_preload_children(root, dlid, DLID_SIZE))
2009-08-03 22:01:45 +04:00
			goto_out;
2005-11-09 01:52:26 +03:00
2015-10-25 22:41:19 +03:00
		if (dm_tree_node_size_changed(root) < 0)
2009-05-20 13:52:37 +04:00
			dm->flush_required = 1;
2015-10-26 23:46:54 +03:00
		/*
		 * Currently require a flush for any LV that is not a thin
		 * pool or thin volume; for mirrors, only when their size
		 * has changed.
		 */
		if (!lv_is_thin_volume(lv) &&
		    !lv_is_thin_pool(lv) &&
		    (!lv_is_mirror(lv) || dm_tree_node_size_changed(root)))
			dm->flush_required = 1;
2009-07-31 22:30:31 +04:00
		if (action == ACTIVATE) {
2011-10-11 14:02:28 +04:00
			if (!dm_tree_activate_children(root, dlid, DLID_SIZE))
2013-12-17 18:17:44 +04:00
				goto_out;
2011-10-11 12:57:13 +04:00
			if (!_create_lv_symlinks(dm, root))
				log_warn("Failed to create symlinks for %s.", lv->name);
2009-07-31 22:30:31 +04:00
		}
2005-11-09 01:52:26 +03:00
2005-10-25 23:08:21 +04:00
		break;
	default:
2014-11-09 01:41:22 +03:00
		log_error(INTERNAL_ERROR "_tree_action: Action %u not supported.", action);
2005-10-17 22:21:05 +04:00
		goto out;
2009-12-03 12:58:30 +03:00
	}
2005-10-17 22:21:05 +04:00
	r = 1;

out:
2011-01-10 17:02:30 +03:00
	/* Save fs cookie for udev settle, do not wait here */
	fs_set_cookie(dm_tree_get_cookie(root));

out_no_root:
2005-11-09 16:05:17 +03:00
	dm_tree_free(dtree);
2005-10-17 22:21:05 +04:00
	return r;
}
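
/*
 * Illustrative sketch only, not part of the original source: it shows the
 * order in which the exported wrappers below are normally driven when an
 * LV's device-mapper tables are replaced.  The helper name is hypothetical;
 * the real sequencing lives in the callers in lib/activate/activate.c.
 */
static int _example_replace_lv_table(struct dev_manager *dm,
				     const struct logical_volume *lv,
				     struct lv_activate_opts *laopts)
{
	int flush_required = 0;

	/* PRELOAD: load the new tables into the inactive table slots. */
	if (!dev_manager_preload(dm, lv, laopts, &flush_required))
		return_0;

	/* SUSPEND: quiesce the devices, honouring the flush requirement. */
	if (!dev_manager_suspend(dm, lv, laopts, 0, flush_required))
		return_0;

	/* ACTIVATE resumes with the preloaded tables; the CLEAN pass that
	 * removes unused nodes is also performed by dev_manager_activate. */
	return dev_manager_activate(dm, lv, laopts);
}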
2010-08-17 20:25:32 +04:00
/* origin_only may only be set if we are resuming (not activating) an origin LV */
2014-09-22 17:50:07 +04:00
int dev_manager_activate(struct dev_manager *dm, const struct logical_volume *lv,
2011-06-17 18:14:19 +04:00
			 struct lv_activate_opts *laopts)
2005-11-09 01:52:26 +03:00
{
2011-06-17 18:14:19 +04:00
	if (!_tree_action(dm, lv, laopts, ACTIVATE))
2005-11-09 01:52:26 +03:00
		return_0;
2011-10-11 12:59:42 +04:00
	if (!_tree_action(dm, lv, laopts, CLEAN))
		return_0;

	return 1;
2005-11-09 01:52:26 +03:00
}
2010-08-17 20:25:32 +04:00
/* origin_only may only be set if we are resuming (not activating) an origin LV */
2014-09-22 17:50:07 +04:00
int dev_manager_preload(struct dev_manager *dm, const struct logical_volume *lv,
2011-06-17 18:14:19 +04:00
			struct lv_activate_opts *laopts, int *flush_required)
2005-11-09 01:52:26 +03:00
{
2011-06-17 18:14:19 +04:00
	if (!_tree_action(dm, lv, laopts, PRELOAD))
2011-10-11 12:59:42 +04:00
		return_0;
2009-05-20 13:52:37 +04:00
	*flush_required = dm->flush_required;

	return 1;
2005-11-09 01:52:26 +03:00
}
2005-10-26 18:13:52 +04:00
2014-09-22 17:50:07 +04:00
int dev_manager_deactivate(struct dev_manager *dm, const struct logical_volume *lv)
2005-10-25 23:08:21 +04:00
{
2011-06-17 18:14:19 +04:00
	struct lv_activate_opts laopts = { 0 };
2005-10-25 23:08:21 +04:00
2011-10-11 12:59:42 +04:00
	if (!_tree_action(dm, lv, &laopts, DEACTIVATE))
		return_0;
2005-10-25 23:08:21 +04:00
2011-10-11 12:59:42 +04:00
	return 1;
2005-10-25 23:08:21 +04:00
}
2014-09-22 17:50:07 +04:00
int dev_manager_suspend(struct dev_manager *dm, const struct logical_volume *lv,
2011-06-17 18:14:19 +04:00
			struct lv_activate_opts *laopts, int lockfs, int flush_required)
2005-10-26 18:13:52 +04:00
{
2009-05-20 13:52:37 +04:00
	dm->flush_required = flush_required;
2011-10-11 12:59:42 +04:00
	if (!_tree_action(dm, lv, laopts, lockfs ? SUSPEND_WITH_LOCKFS : SUSPEND))
		return_0;

	return 1;
2005-10-26 18:13:52 +04:00
}
2005-10-25 23:08:21 +04:00
/*
 * Does device use VG somewhere in its construction?
 * Returns 1 if uncertain.
 */
2006-05-11 21:58:58 +04:00
int dev_manager_device_uses_vg(struct device *dev,
2005-10-25 23:08:21 +04:00
			       struct volume_group *vg)
{
2005-11-09 16:05:17 +03:00
	struct dm_tree *dtree;
	struct dm_tree_node *root;
2010-07-09 19:34:40 +04:00
	char dlid[sizeof(UUID_PREFIX) + sizeof(struct id) - 1] __attribute__((aligned(8)));
2005-10-25 23:08:21 +04:00
	int r = 1;
2005-11-09 16:05:17 +03:00
	if (!(dtree = dm_tree_create())) {
2005-11-09 16:08:41 +03:00
		log_error("partial dtree creation failed");
2005-10-25 23:08:21 +04:00
		return r;
	}
2014-07-31 00:55:11 +04:00
	dm_tree_set_optional_uuid_suffixes(dtree, &uuid_suffix_list[0]);
2006-05-11 21:58:58 +04:00
	if (!dm_tree_add_dev(dtree, (uint32_t) MAJOR(dev->dev), (uint32_t) MINOR(dev->dev))) {
2005-11-09 16:08:41 +03:00
		log_error("Failed to add device %s (%" PRIu32 ":%" PRIu32 ") to dtree",
2005-10-25 23:08:21 +04:00
			  dev_name(dev), (uint32_t) MAJOR(dev->dev), (uint32_t) MINOR(dev->dev));
		goto out;
	}
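	/*
	 * Build the comparison key: the "LVM-" prefix followed by the raw
	 * binary VG id.  No terminating NUL is needed, since
	 * dm_tree_children_use_uuid() below compares only this fixed length.
	 */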
	memcpy(dlid, UUID_PREFIX, sizeof(UUID_PREFIX) - 1);
	memcpy(dlid + sizeof(UUID_PREFIX) - 1, &vg->id.uuid[0], sizeof(vg->id));
2005-11-09 16:05:17 +03:00
	if (!(root = dm_tree_find_node(dtree, 0, 0))) {
2005-10-25 23:08:21 +04:00
		log_error("Lost dependency tree root node");
		goto out;
	}
2005-11-09 16:05:17 +03:00
	if (dm_tree_children_use_uuid(root, dlid, sizeof(UUID_PREFIX) + sizeof(vg->id) - 1))
2005-11-09 01:52:26 +03:00
		goto_out;
2005-10-25 23:08:21 +04:00
	r = 0;
out:
2005-11-09 16:05:17 +03:00
	dm_tree_free(dtree);
2005-10-25 23:08:21 +04:00
	return r;
}
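
/*
 * Illustrative usage sketch, not part of the original source.  Since
 * dev_manager_device_uses_vg() returns 1 whenever it cannot prove the
 * opposite, only a return of 0 may be used to exclude a device; the
 * helper name below is hypothetical.
 */
static int _example_dev_is_free_of_vg(struct device *dev, struct volume_group *vg)
{
	/* Treat any non-zero answer as "possibly uses vg". */
	return !dev_manager_device_uses_vg(dev, vg);
}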