/*
 * Copyright (C) 2001, 2002 Sistina Software (UK) Limited.
 * Copyright (C) 2004-2006 Red Hat, Inc. All rights reserved.
 *
 * This file is released under the GPL.
 */

#include "dm.h"
#include "dm-bio-list.h"

#include <linux/init.h>
#include <linux/module.h>
#include <linux/mutex.h>
#include <linux/moduleparam.h>
#include <linux/blkpg.h>
#include <linux/bio.h>
#include <linux/buffer_head.h>
#include <linux/mempool.h>
#include <linux/slab.h>
#include <linux/idr.h>
#include <linux/hdreg.h>
#include <linux/blktrace_api.h>
#include <linux/smp_lock.h>

#define DM_MSG_PREFIX "core"

static const char *_name = DM_NAME;

static unsigned int major = 0;
static unsigned int _major = 0;

static DEFINE_SPINLOCK(_minor_lock);

/*
 * One of these is allocated per bio.
 */
struct dm_io {
	struct mapped_device *md;
	int error;
	struct bio *bio;
	atomic_t io_count;
	unsigned long start_time;
};

/*
 * One of these is allocated per target within a bio.  Hopefully
 * this will be simplified out one day.
 */
struct dm_target_io {
	struct dm_io *io;
	struct dm_target *ti;
	union map_info info;
};

union map_info *dm_get_mapinfo(struct bio *bio)
{
	if (bio && bio->bi_private)
		return &((struct dm_target_io *)bio->bi_private)->info;
	return NULL;
}

#define MINOR_ALLOCED ((void *)-1)

/*
 * Bits for the md->flags field.
 */
#define DMF_BLOCK_IO 0
#define DMF_SUSPENDED 1
#define DMF_FROZEN 2
#define DMF_FREEING 3
#define DMF_DELETING 4

[PATCH] dm: suspend: add noflush pushback
In device-mapper, I/O is sometimes queued within targets for later processing.
For example, the multipath target can be configured to store I/O when no paths
are available instead of failing it with -EIO.
This patch allows the device-mapper core to instruct a target to transfer the
contents of any such in-target queue back into the core. This frees up the
resources used by the target so the core can replace that target with an
alternative one and then resend the I/O to it. Without this patch the only
way to change the target in such circumstances involves returning the I/O with
an error back to the filesystem/application. In the multipath case, this
patch will let us add new paths for existing I/O to try after all the existing
paths have failed.
DMF_NOFLUSH_SUSPENDING
----------------------
If the DM_NOFLUSH_FLAG ioctl option is specified at suspend time, the
DMF_NOFLUSH_SUSPENDING flag is set in md->flags during dm_suspend(). It
is always cleared before dm_suspend() returns.
The flag must be visible while the target is flushing pending I/Os, so it
is set before presuspend, where the flush starts, and cleared after the wait
for md->pending, where the flush ends.
Target drivers can check this flag by calling dm_noflush_suspending().
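The ordering rule above (set before presuspend, clear after the wait for md->pending) can be modelled in a few lines of userspace C. This is a minimal sketch under stated assumptions, not dm_suspend(): `struct sk_md`, the `sk_*` helpers and the two demo hooks are hypothetical, the bit helpers are non-atomic stand-ins for the kernel's `set_bit()`/`clear_bit()`/`test_bit()`, and only the bit number 5 is taken from the `DMF_NOFLUSH_SUSPENDING` define in this file.

```c
#include <assert.h>
#include <stdbool.h>

/* Bit number taken from the DMF_NOFLUSH_SUSPENDING define. */
#define SK_DMF_NOFLUSH_SUSPENDING 5

/* Hypothetical stand-in for struct mapped_device: only the flags word. */
struct sk_md {
	unsigned long flags;
};

/* Non-atomic userspace stand-ins for set_bit(), clear_bit() and test_bit(). */
static void sk_set_bit(int nr, unsigned long *addr)
{
	*addr |= 1UL << nr;
}

static void sk_clear_bit(int nr, unsigned long *addr)
{
	*addr &= ~(1UL << nr);
}

static bool sk_test_bit(int nr, const unsigned long *addr)
{
	return (*addr >> nr) & 1UL;
}

/* What a target would ask (cf. dm_noflush_suspending()). */
bool sk_noflush_suspending(const struct sk_md *md)
{
	return sk_test_bit(SK_DMF_NOFLUSH_SUSPENDING, &md->flags);
}

/* Demo target hooks: record whether the flag was visible during the flush. */
bool sk_seen_in_presuspend, sk_seen_in_wait;

static void sk_demo_presuspend(struct sk_md *md)
{
	sk_seen_in_presuspend = sk_noflush_suspending(md);
}

static void sk_demo_wait(struct sk_md *md)
{
	sk_seen_in_wait = sk_noflush_suspending(md);
}

/*
 * Ordering only: raise the flag before presuspend, where targets start
 * flushing, and drop it after the wait for pending I/O, where the flush
 * ends. Everything else dm_suspend() does is omitted.
 */
void sk_suspend(struct sk_md *md, bool noflush)
{
	if (noflush)
		sk_set_bit(SK_DMF_NOFLUSH_SUSPENDING, &md->flags);

	sk_demo_presuspend(md); /* stands in for the targets' presuspend */
	sk_demo_wait(md);       /* stands in for waiting on md->pending */

	/* Always cleared before returning. */
	sk_clear_bit(SK_DMF_NOFLUSH_SUSPENDING, &md->flags);
}
```

Because the flag is raised before the presuspend hook runs, a target checking it during its flush sees it set; without the noflush option it never sees it, and in both cases it is clear once the suspend returns.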
DM_MAPIO_REQUEUE / DM_ENDIO_REQUEUE
-----------------------------------
A target's map() function can now return DM_MAPIO_REQUEUE to request that
the device-mapper core queue the bio.
Similarly, a target's end_io() function can return DM_ENDIO_REQUEUE to request
the same. This has been labelled 'pushback'.
The __map_bio() and clone_endio() functions in the core treat these return
values as errors and call dec_pending() to end the I/O.
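A sketch of how a core might dispatch on a map() result, treating a requeue request like an error so that it goes down the same dec_pending() path. The `SK_MAPIO_*` names and their values are assumptions made for this illustration (they are not the kernel's `DM_MAPIO_*` constants), and "ending" a clone is reduced to recording the value that would be handed to dec_pending(). The end_io() side is not modelled.

```c
#include <assert.h>
#include <errno.h> /* negative errno values such as -EIO */

/* Illustrative map() results; names and values are assumptions. */
enum sk_mapio {
	SK_MAPIO_SUBMITTED = 0, /* target has dispatched the clone itself */
	SK_MAPIO_REMAPPED = 1,  /* core should submit the remapped clone */
	SK_MAPIO_REQUEUE = 2,   /* ask the core to push the bio back */
};

/* What the model core does next with the clone. */
enum sk_action {
	SK_WAIT,   /* completion will arrive later */
	SK_SUBMIT, /* submit the clone to the underlying device */
	SK_END,    /* end the clone now via dec_pending() */
};

/*
 * Push-back requests and negative errors share one path: both end the
 * clone, and *end_error receives the value dec_pending() would be given.
 */
enum sk_action sk_map_dispatch(int r, int *end_error)
{
	switch (r) {
	case SK_MAPIO_SUBMITTED:
		return SK_WAIT;
	case SK_MAPIO_REMAPPED:
		return SK_SUBMIT;
	default:
		assert(r == SK_MAPIO_REQUEUE || r < 0);
		*end_error = r;
		return SK_END;
	}
}
```

Funnelling the requeue through the error path means dec_pending() is the only place that has to decide between completing the bio and pushing it back.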
dec_pending
-----------
dec_pending() saves the pushback request in struct dm_io->error. Once all
the split clones have ended, dec_pending() will put the original bio on
the md->pushback list. Note that this supersedes any I/O errors.
It is possible for the suspend with DM_NOFLUSH_FLAG to be aborted while
in progress (e.g. by user interrupt). dec_pending() checks for this and
completes the I/O with -EIO in that case.
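The dec_pending() rules above (a per-bio count of outstanding clones, push-back taking precedence over errors, -EIO if the noflush suspend has gone away) can be sketched as a single-threaded model. Everything here is hypothetical: `struct sk_io`, `SK_REQUEUE` and its value, and the `noflush_suspending` argument, which stands in for testing the DMF_NOFLUSH_SUSPENDING flag (in dm the final test is made with md->pushback_lock held). Accounting and the actual bio completion are omitted.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define SK_REQUEUE 2 /* positive "push back" code; value is an assumption */

/* Hypothetical per-bio state: outstanding clones and the recorded result. */
struct sk_io {
	int io_count;
	int error;
};

enum sk_outcome { SK_PENDING, SK_COMPLETED, SK_PUSHED_BACK };

/* Called once per finished clone with that clone's result. */
enum sk_outcome sk_dec_pending(struct sk_io *io, int error,
			       bool noflush_suspending)
{
	assert(io->io_count > 0);

	/* A recorded push-back request is not overwritten by later errors. */
	if (error && !(io->error == SK_REQUEUE && noflush_suspending))
		io->error = error;

	if (--io->io_count > 0)
		return SK_PENDING; /* other split clones still in flight */

	if (io->error == SK_REQUEUE) {
		if (noflush_suspending)
			return SK_PUSHED_BACK; /* would go on md->pushback */
		io->error = -EIO; /* the noflush suspend was aborted */
	}
	return SK_COMPLETED;
}
```

In the first test case the -EIO from the second clone does not replace the recorded push-back request; in the second, the noflush suspend is no longer in progress when the last clone finishes, so the bio is failed with -EIO.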
pushback list and pushback_lock
-------------------------------
The bio is queued on md->pushback temporarily in dec_pending(), and after
all pending I/Os return, md->pushback is merged into md->deferred in
dm_suspend() for re-issuing at resume time.
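Merging md->pushback into md->deferred is an O(1) splice because struct bio_list keeps both head and tail pointers. The sketch below models that in userspace with a hypothetical `struct sk_bio` and `struct sk_bio_list`; it is modelled on, not copied from, the helpers in dm-bio-list.h.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct bio: a chaining pointer and an id. */
struct sk_bio {
	struct sk_bio *next;
	int id;
};

/* Singly linked FIFO with head and tail pointers, in the style of bio_list. */
struct sk_bio_list {
	struct sk_bio *head;
	struct sk_bio *tail;
};

/* Append one bio at the tail in O(1). */
void sk_bio_list_add(struct sk_bio_list *bl, struct sk_bio *bio)
{
	bio->next = NULL;
	if (bl->tail)
		bl->tail->next = bio;
	else
		bl->head = bio;
	bl->tail = bio;
}

/* Splice all of bl2 onto the end of bl in O(1); bl2 is not modified. */
void sk_bio_list_merge(struct sk_bio_list *bl, const struct sk_bio_list *bl2)
{
	assert(bl != bl2); /* merging a list into itself would create a cycle */
	if (!bl2->head)
		return;
	if (bl->tail)
		bl->tail->next = bl2->head;
	else
		bl->head = bl2->head;
	bl->tail = bl2->tail;
}
```

The merge leaves the source list still pointing at the moved bios, so the caller should reset it to empty afterwards, for example with `pushback = (struct sk_bio_list){ NULL, NULL };`.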
md->pushback_lock protects md->pushback.
The lock should be held with interrupts disabled because dec_pending() can be
called from interrupt context.
Queueing bios to md->pushback in dec_pending() must be done atomically
with the check for the DMF_NOFLUSH_SUSPENDING flag, so md->pushback_lock is
held when checking the flag. Otherwise dec_pending() could queue a bio to
md->pushback after an interrupted dm_suspend() has flushed md->pushback,
and the bio would be left stranded in md->pushback.
Setting the flag in dm_suspend() can be done without md->pushback_lock because
the flag is checked only after presuspend, and the set value has already been
made visible via the target's presuspend function.
The flag can be checked without md->pushback_lock (e.g. in the first part of
dec_pending() or in target drivers), because the flag is checked again
with md->pushback_lock held when the bio is actually queued to md->pushback,
as described above. So even if the flag is cleared after one of these lockless
checks, the bio isn't left in md->pushback but is returned to the application
with -EIO.
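The locking argument above can be made concrete with a userspace model. This is a sketch under assumptions: a C11 `atomic_flag` spinlock stands in for `spin_lock_irqsave()` (userspace has no interrupts to disable), `struct sk_md`, `sk_push_back()` and `sk_abort_noflush()` are hypothetical names, and the pushback list is a LIFO for brevity. What it shows is that the flag test and the list insertion happen under the same lock that the aborting suspend takes to clear the flag and detach the list, so no bio can be queued after the flush.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Minimal userspace spinlock standing in for spin_lock_irqsave(). */
typedef struct {
	atomic_flag f;
} sk_spinlock_t;

static void sk_spin_lock(sk_spinlock_t *l)
{
	while (atomic_flag_test_and_set_explicit(&l->f, memory_order_acquire))
		; /* busy-wait */
}

static void sk_spin_unlock(sk_spinlock_t *l)
{
	atomic_flag_clear_explicit(&l->f, memory_order_release);
}

struct sk_bio {
	struct sk_bio *next;
};

/* Hypothetical subset of struct mapped_device. */
struct sk_md {
	sk_spinlock_t pushback_lock;    /* protects pushback */
	atomic_bool noflush_suspending; /* lockless reads allowed */
	struct sk_bio *pushback;        /* LIFO here; dm uses a FIFO bio_list */
};

void sk_md_init(struct sk_md *md, bool noflush)
{
	atomic_flag_clear(&md->pushback_lock.f);
	atomic_init(&md->noflush_suspending, noflush);
	md->pushback = NULL;
}

/*
 * Completion side: the flag is tested again with pushback_lock held, so
 * the test and the insertion are atomic with respect to sk_abort_noflush().
 * Returns 0 if the bio was pushed back, -EIO if the suspend is over.
 */
int sk_push_back(struct sk_md *md, struct sk_bio *bio)
{
	int r = 0;

	assert(bio != NULL);
	sk_spin_lock(&md->pushback_lock);
	if (atomic_load(&md->noflush_suspending)) {
		bio->next = md->pushback;
		md->pushback = bio;
	} else {
		r = -EIO;
	}
	sk_spin_unlock(&md->pushback_lock);

	return r;
}

/*
 * Interrupted-suspend side: clear the flag and detach the list under the
 * same lock. A later sk_push_back() sees the cleared flag and fails with
 * -EIO instead of stranding its bio on the list.
 */
struct sk_bio *sk_abort_noflush(struct sk_md *md)
{
	struct sk_bio *list;

	sk_spin_lock(&md->pushback_lock);
	atomic_store(&md->noflush_suspending, false);
	list = md->pushback;
	md->pushback = NULL;
	sk_spin_unlock(&md->pushback_lock);

	return list;
}
```

In dm the detached bios are merged into md->deferred for re-issue at resume time; in this sketch the caller simply receives the detached list.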
Other notes on the current patch
--------------------------------
- md->pushback is added to struct mapped_device instead of using
md->deferred directly because md->io_lock, which protects md->deferred, is
an rw_semaphore and can't be taken from code that runs in interrupt context,
such as dec_pending(); md->io_lock also protects the DMF_BLOCK_IO flag of md->flags.
- Don't issue lock_fs() in dm_suspend() if the DM_NOFLUSH_FLAG
ioctl option is specified, because I/Os generated by lock_fs() would be
pushed back and never return if there were no valid devices.
- If an error occurs in dm_suspend() after the DMF_NOFLUSH_SUSPENDING
flag is set, md->pushback must be flushed because I/Os may be queued to
the list already. (flush_and_out label in dm_suspend())
Test results
------------
I have tested this using the multipath target with the next patch.
The following tests are for regression/compatibility:
- I/Os succeed when valid paths exist;
- I/Os fail when there are no valid paths and queue_if_no_path is not
set;
- I/Os are queued in the multipath target when there are no valid paths and
queue_if_no_path is set;
- The queued I/Os above fail when suspend is issued without the
DM_NOFLUSH_FLAG ioctl option. I/Os spanning 2 multipath targets also
fail.
The following tests are for the normal code path of the new pushback feature:
- Queued I/Os in the multipath target are flushed from the target
but don't return when suspend is issued with the DM_NOFLUSH_FLAG
ioctl option;
- The I/Os above are queued in the multipath target again when
resume is issued without path recovery;
- The I/Os above succeed when resume is issued after path recovery
or table load;
- Queued I/Os in the multipath target succeed when resume is issued
with the DM_NOFLUSH_FLAG ioctl option after table load. I/Os
spanning 2 multipath targets also succeed.
The following tests are for the error paths of the new pushback feature:
- When bdget_disk() fails in dm_suspend(), the
DMF_NOFLUSH_SUSPENDING flag is cleared and I/Os already queued to the
pushback list are flushed properly.
- When suspend with the DM_NOFLUSH_FLAG ioctl option is interrupted,
o I/Os which had already been queued to the pushback list
at the time don't return, and are re-issued at resume time;
o I/Os which hadn't been returned at the time return with EIO.
Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com>
Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Cc: dm-devel@redhat.com
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

#define DMF_NOFLUSH_SUSPENDING 5

struct mapped_device {
	struct rw_semaphore io_lock;
	struct semaphore suspend_lock;
	spinlock_t pushback_lock;
	rwlock_t map_lock;
	atomic_t holders;
	atomic_t open_count;

	unsigned long flags;

	struct request_queue *queue;
	struct gendisk *disk;
	char name[16];

	void *interface_ptr;

	/*
	 * A list of ios that arrived while we were suspended.
	 */
	atomic_t pending;
	wait_queue_head_t wait;
	struct bio_list deferred;
	struct bio_list pushback;

	/*
	 * The current mapping.
	 */
	struct dm_table *map;

	/*
	 * io objects are allocated from here.
	 */
	mempool_t *io_pool;
	mempool_t *tio_pool;

	struct bio_set *bs;

	/*
	 * Event handling.
	 */
	atomic_t event_nr;
	wait_queue_head_t eventq;

	/*
	 * freeze/thaw support requires holding onto a super block
	 */
	struct super_block *frozen_sb;
	struct block_device *suspended_bdev;

	/* forced geometry settings */
	struct hd_geometry geometry;
};

#define MIN_IOS 256
static struct kmem_cache *_io_cache;
static struct kmem_cache *_tio_cache;

static int __init local_init(void)
{
	int r;

	/* allocate a slab for the dm_ios */
	_io_cache = KMEM_CACHE(dm_io, 0);
	if (!_io_cache)
		return -ENOMEM;

	/* allocate a slab for the target ios */
	_tio_cache = KMEM_CACHE(dm_target_io, 0);
	if (!_tio_cache) {
		kmem_cache_destroy(_io_cache);
		return -ENOMEM;
	}

	_major = major;
	r = register_blkdev(_major, _name);
	if (r < 0) {
		kmem_cache_destroy(_tio_cache);
		kmem_cache_destroy(_io_cache);
		return r;
	}

	if (!_major)
		_major = r;

	return 0;
}

static void local_exit(void)
{
	kmem_cache_destroy(_tio_cache);
	kmem_cache_destroy(_io_cache);
	unregister_blkdev(_major, _name);

	_major = 0;

	DMINFO("cleaned up");
}

int (*_inits[])(void) __initdata = {
	local_init,
	dm_target_init,
	dm_linear_init,
	dm_stripe_init,
	dm_interface_init,
};

void (*_exits[])(void) = {
	local_exit,
	dm_target_exit,
	dm_linear_exit,
	dm_stripe_exit,
	dm_interface_exit,
};

static int __init dm_init(void)
{
	const int count = ARRAY_SIZE(_inits);

	int r, i;

	for (i = 0; i < count; i++) {
		r = _inits[i]();
		if (r)
			goto bad;
	}

	return 0;

bad:
	while (i--)
		_exits[i]();

	return r;
}

static void __exit dm_exit(void)
{
	int i = ARRAY_SIZE(_exits);

	while (i--)
		_exits[i]();
}

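dm_init() and dm_exit() drive initialization from two parallel tables and, on failure, undo only the entries that already succeeded. A self-contained sketch of the same pattern follows; the three subsystems and all `sk_` names are hypothetical, and each one logs a letter so the unwind order can be checked. The failing entry is not undone by the loop, which is why local_init() frees its own slabs before returning an error.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Each hypothetical subsystem logs a letter: upper case init, lower case exit. */
static char sk_log[16];
static size_t sk_log_len;

static void sk_note(char c)
{
	assert(sk_log_len + 1 < sizeof(sk_log));
	sk_log[sk_log_len++] = c;
	sk_log[sk_log_len] = '\0';
}

static int sk_init_a(void) { sk_note('A'); return 0; }
static int sk_init_b(void) { sk_note('B'); return 0; }
static int sk_init_c(void) { sk_note('C'); return -ENOMEM; } /* simulated failure */
static void sk_exit_a(void) { sk_note('a'); }
static void sk_exit_b(void) { sk_note('b'); }
static void sk_exit_c(void) { sk_note('c'); }

/* Parallel tables: sk_exits[i] undoes sk_inits[i]. */
static int (*const sk_inits[])(void) = { sk_init_a, sk_init_b, sk_init_c };
static void (*const sk_exits[])(void) = { sk_exit_a, sk_exit_b, sk_exit_c };

#define SK_ARRAY_SIZE(x) (sizeof(x) / sizeof((x)[0]))

/* Same shape as dm_init(): stop at the first failure and unwind backwards. */
int sk_init_all(void)
{
	size_t i;
	int r = 0;

	for (i = 0; i < SK_ARRAY_SIZE(sk_inits); i++) {
		r = sk_inits[i]();
		if (r)
			goto bad;
	}
	return 0;

bad:
	/* Undo entries 0..i-1 only; entry i failed and cleaned up after itself. */
	while (i--)
		sk_exits[i]();
	return r;
}
```

With the simulated failure in the third entry the log reads `ABCba`: the two successful entries are undone in reverse order and the failing entry's exit is never called.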
/*
 * Block device functions
 */
static int dm_blk_open(struct inode *inode, struct file *file)
{
	struct mapped_device *md;

	spin_lock(&_minor_lock);

	md = inode->i_bdev->bd_disk->private_data;
	if (!md)
		goto out;

	if (test_bit(DMF_FREEING, &md->flags) ||
	    test_bit(DMF_DELETING, &md->flags)) {
		md = NULL;
		goto out;
	}

	dm_get(md);
	atomic_inc(&md->open_count);

out:
	spin_unlock(&_minor_lock);

	return md ? 0 : -ENXIO;
}

static int dm_blk_close(struct inode *inode, struct file *file)
{
	struct mapped_device *md;

	md = inode->i_bdev->bd_disk->private_data;
	atomic_dec(&md->open_count);
	dm_put(md);
	return 0;
}

int dm_open_count(struct mapped_device *md)
{
	return atomic_read(&md->open_count);
}

/*
 * Guarantees nothing is using the device before it's deleted.
 */
int dm_lock_for_deletion(struct mapped_device *md)
{
	int r = 0;

	spin_lock(&_minor_lock);

	if (dm_open_count(md))
		r = -EBUSY;
	else
		set_bit(DMF_DELETING, &md->flags);

	spin_unlock(&_minor_lock);

	return r;
}

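The interplay between dm_blk_open() and dm_lock_for_deletion() is easiest to see in a reduced model: both take the same lock, so "open count is zero" and "set DELETING" happen atomically with respect to new opens. The sketch is hypothetical userspace C: a C11 `atomic_flag` spinlock stands in for `_minor_lock`, and the FREEING flag and the dm_get()/dm_put() reference are omitted.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Minimal userspace spinlock standing in for _minor_lock. */
typedef struct {
	atomic_flag f;
} sk_spinlock_t;

static sk_spinlock_t sk_minor_lock = { ATOMIC_FLAG_INIT };

static void sk_spin_lock(sk_spinlock_t *l)
{
	while (atomic_flag_test_and_set_explicit(&l->f, memory_order_acquire))
		; /* busy-wait */
}

static void sk_spin_unlock(sk_spinlock_t *l)
{
	atomic_flag_clear_explicit(&l->f, memory_order_release);
}

/* Hypothetical subset of struct mapped_device. */
struct sk_md {
	atomic_int open_count;
	bool deleting; /* DMF_DELETING; only written with sk_minor_lock held */
};

/* cf. dm_blk_open(): refuse new opens once deletion has been claimed. */
int sk_open(struct sk_md *md)
{
	int r = 0;

	sk_spin_lock(&sk_minor_lock);
	if (md->deleting)
		r = -ENXIO;
	else
		atomic_fetch_add(&md->open_count, 1);
	sk_spin_unlock(&sk_minor_lock);

	return r;
}

/* cf. dm_blk_close(): dropping an open reference takes no lock. */
void sk_close(struct sk_md *md)
{
	int old = atomic_fetch_sub(&md->open_count, 1);

	assert(old > 0);
	(void)old;
}

/* cf. dm_lock_for_deletion(): the check and the flag update share one lock. */
int sk_lock_for_deletion(struct sk_md *md)
{
	int r = 0;

	sk_spin_lock(&sk_minor_lock);
	if (atomic_load(&md->open_count))
		r = -EBUSY;
	else
		md->deleting = true;
	sk_spin_unlock(&sk_minor_lock);

	return r;
}
```

Once deletion has been claimed, every later open fails, so the open count can no longer rise from zero between the check and the actual teardown.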
static int dm_blk_getgeo(struct block_device *bdev, struct hd_geometry *geo)
{
	struct mapped_device *md = bdev->bd_disk->private_data;

	return dm_get_geometry(md, geo);
}

static int dm_blk_ioctl(struct inode *inode, struct file *file,
			unsigned int cmd, unsigned long arg)
{
	struct mapped_device *md;
	struct dm_table *map;
	struct dm_target *tgt;
	int r = -ENOTTY;

	/* We don't really need this lock, but we do need 'inode'. */
	unlock_kernel();

	md = inode->i_bdev->bd_disk->private_data;

	map = dm_get_table(md);

	if (!map || !dm_table_get_size(map))
		goto out;

	/* We only support devices that have a single target */
	if (dm_table_get_num_targets(map) != 1)
		goto out;

	tgt = dm_table_get_target(map, 0);

	if (dm_suspended(md)) {
		r = -EAGAIN;
		goto out;
	}

	if (tgt->type->ioctl)
		r = tgt->type->ioctl(tgt, inode, file, cmd, arg);

out:
	dm_table_put(map);

	lock_kernel();
	return r;
}

static struct dm_io *alloc_io(struct mapped_device *md)
{
	return mempool_alloc(md->io_pool, GFP_NOIO);
}

static void free_io(struct mapped_device *md, struct dm_io *io)
{
	mempool_free(io, md->io_pool);
}

static struct dm_target_io *alloc_tio(struct mapped_device *md)
{
	return mempool_alloc(md->tio_pool, GFP_NOIO);
}

static void free_tio(struct mapped_device *md, struct dm_target_io *tio)
{
	mempool_free(tio, md->tio_pool);
}

static void start_io_acct(struct dm_io *io)
{
	struct mapped_device *md = io->md;

	io->start_time = jiffies;

	preempt_disable();
	disk_round_stats(dm_disk(md));
	preempt_enable();
	dm_disk(md)->in_flight = atomic_inc_return(&md->pending);
}

static int end_io_acct(struct dm_io *io)
{
	struct mapped_device *md = io->md;
	struct bio *bio = io->bio;
	unsigned long duration = jiffies - io->start_time;
	int pending;
	int rw = bio_data_dir(bio);

	preempt_disable();
	disk_round_stats(dm_disk(md));
	preempt_enable();
	dm_disk(md)->in_flight = pending = atomic_dec_return(&md->pending);

	disk_stat_add(dm_disk(md), ticks[rw], duration);

	return !pending;
}

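start_io_acct() and end_io_acct() use the value returned by atomic_inc_return()/atomic_dec_return() rather than re-reading md->pending, so the completion that takes the count to zero learns it from the same atomic operation, without a racy second read. A minimal userspace sketch using C11 atomics follows; `struct sk_md` and the `sk_` functions are hypothetical, and the disk statistics are left out.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical subset of struct mapped_device. */
struct sk_md {
	atomic_int pending; /* cf. md->pending */
};

/* cf. atomic_inc_return(): returns the count after the increment. */
int sk_start_io(struct sk_md *md)
{
	return atomic_fetch_add(&md->pending, 1) + 1;
}

/* cf. atomic_dec_return() and end_io_acct()'s "return !pending". */
bool sk_end_io(struct sk_md *md)
{
	int pending = atomic_fetch_sub(&md->pending, 1) - 1;

	assert(pending >= 0);
	return pending == 0;
}
```

C11's `atomic_fetch_*` functions return the old value, hence the `+ 1` and `- 1` to obtain the kernel-style "new value" results.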
/*
 * Add the bio to the list of deferred io.
 */
static int queue_io(struct mapped_device *md, struct bio *bio)
{
	down_write(&md->io_lock);

	if (!test_bit(DMF_BLOCK_IO, &md->flags)) {
		up_write(&md->io_lock);
		return 1;
	}

	bio_list_add(&md->deferred, bio);

	up_write(&md->io_lock);
	return 0;		/* deferred successfully */
}

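queue_io() re-tests DMF_BLOCK_IO after taking io_lock for writing and only then appends to md->deferred; if the flag has been cleared in the meantime it returns 1 so the caller can process the bio itself. A userspace sketch of that contract follows. It is hypothetical throughout and uses a single exclusive C11 spinlock in place of the rw_semaphore's write side.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Exclusive lock standing in for down_write()/up_write() on io_lock. */
typedef struct {
	atomic_flag f;
} sk_lock_t;

static void sk_lock(sk_lock_t *l)
{
	while (atomic_flag_test_and_set_explicit(&l->f, memory_order_acquire))
		; /* busy-wait */
}

static void sk_unlock(sk_lock_t *l)
{
	atomic_flag_clear_explicit(&l->f, memory_order_release);
}

struct sk_bio {
	struct sk_bio *next;
};

/* Hypothetical subset of struct mapped_device. */
struct sk_md {
	sk_lock_t io_lock;
	bool block_io; /* DMF_BLOCK_IO */
	struct sk_bio *deferred_head, *deferred_tail;
};

void sk_md_init(struct sk_md *md)
{
	atomic_flag_clear(&md->io_lock.f);
	md->block_io = false;
	md->deferred_head = md->deferred_tail = NULL;
}

/* Returns 1 if the caller should map the bio itself, 0 if it was deferred. */
int sk_queue_io(struct sk_md *md, struct sk_bio *bio)
{
	sk_lock(&md->io_lock);

	/* Re-test under the lock: I/O may have been unblocked meanwhile. */
	if (!md->block_io) {
		sk_unlock(&md->io_lock);
		return 1;
	}

	/* FIFO append, so a later flush can walk bios in arrival order. */
	bio->next = NULL;
	if (md->deferred_tail)
		md->deferred_tail->next = bio;
	else
		md->deferred_head = bio;
	md->deferred_tail = bio;

	sk_unlock(&md->io_lock);
	return 0;
}
```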
/*
 * Everyone (including functions in this file) should use this
 * function to access the md->map field, and make sure they call
 * dm_table_put() when finished.
 */
struct dm_table *dm_get_table(struct mapped_device *md)
{
	struct dm_table *t;

	read_lock(&md->map_lock);
	t = md->map;
	if (t)
		dm_table_get(t);
	read_unlock(&md->map_lock);

	return t;
}

/*
 * Get the geometry associated with a dm device
 */
int dm_get_geometry(struct mapped_device *md, struct hd_geometry *geo)
{
	*geo = md->geometry;

	return 0;
}

/*
 * Set the geometry of a device.
 */
int dm_set_geometry(struct mapped_device *md, struct hd_geometry *geo)
{
	sector_t sz = (sector_t)geo->cylinders * geo->heads * geo->sectors;

	if (geo->start > sz) {
		DMWARN("Start sector is beyond the geometry limits.");
		return -EINVAL;
	}

	md->geometry = *geo;

	return 0;
}

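The cast to sector_t in dm_set_geometry() matters. The fields of struct hd_geometry are narrow (unsigned char heads and sectors, unsigned short cylinders), so without the cast they are promoted to int, and the largest geometry, 65535 * 255 * 255 = 4,261,413,375 sectors, exceeds INT_MAX (2,147,483,647). A userspace sketch of the check follows; sector_t is assumed to be 64 bits here (in the kernel its width depends on configuration), and all `sk_` names are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

typedef uint64_t sk_sector_t; /* assumption: 64-bit sector_t */

/* Field widths follow struct hd_geometry in <linux/hdreg.h>. */
struct sk_geometry {
	unsigned char heads;
	unsigned char sectors;
	unsigned short cylinders;
	unsigned long start;
};

sk_sector_t sk_geometry_size(const struct sk_geometry *geo)
{
	/*
	 * Widen before multiplying. Without the cast all three operands are
	 * promoted to int and 65535 * 255 * 255 overflows INT_MAX.
	 */
	return (sk_sector_t)geo->cylinders * geo->heads * geo->sectors;
}

/* Same rule as dm_set_geometry(): start may not lie beyond C*H*S. */
int sk_check_geometry(const struct sk_geometry *geo)
{
	return geo->start > sk_geometry_size(geo) ? -EINVAL : 0;
}
```

Note that the comparison is strict, so a start sector exactly equal to C*H*S is accepted, as in the original check.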
/*-----------------------------------------------------------------
 * CRUD START:
 *   A more elegant soln is in the works that uses the queue
 *   merge fn, unfortunately there are a couple of changes to
 *   the block layer that I want to make for this.  So in the
 *   interests of getting something for people to use I give
 *   you this clearly demarcated crap.
 *---------------------------------------------------------------*/
[PATCH] dm: suspend: add noflush pushback
In device-mapper I/O is sometimes queued within targets for later processing.
For example the multipath target can be configured to store I/O when no paths
are available instead of returning it -EIO.
This patch allows the device-mapper core to instruct a target to transfer the
contents of any such in-target queue back into the core. This frees up the
resources used by the target so the core can replace that target with an
alternative one and then resend the I/O to it. Without this patch the only
way to change the target in such circumstances involves returning the I/O with
an error back to the filesystem/application. In the multipath case, this
patch will let us add new paths for existing I/O to try after all the existing
paths have failed.
DMF_NOFLUSH_SUSPENDING
----------------------
If the DM_NOFLUSH_FLAG ioctl option is specified at suspend time, the
DMF_NOFLUSH_SUSPENDING flag is set in md->flags during dm_suspend(). It
is always cleared before dm_suspend() returns.
The flag must be visible while the target is flushing pending I/Os so it
is set before presuspend where the flush starts and unset after the wait
for md->pending where the flush ends.
Target drivers can check this flag by calling dm_noflush_suspending().
DM_MAPIO_REQUEUE / DM_ENDIO_REQUEUE
-----------------------------------
A target's map() function can now return DM_MAPIO_REQUEUE to request the
device mapper core queue the bio.
Similarly, a target's end_io() function can return DM_ENDIO_REQUEUE to request
the same. This has been labelled 'pushback'.
The __map_bio() and clone_endio() functions in the core treat these return
values as errors and call dec_pending() to end the I/O.
dec_pending
-----------
dec_pending() saves the pushback request in struct dm_io->error. Once all
the split clones have ended, dec_pending() will put the original bio on
the md->pushback list. Note that this supercedes any I/O errors.
It is possible for the suspend with DM_NOFLUSH_FLAG to be aborted while
in progress (e.g. by user interrupt). dec_pending() checks for this and
returns -EIO if it happened.
pushdback list and pushback_lock
--------------------------------
The bio is queued on md->pushback temporarily in dec_pending(), and after
all pending I/Os return, md->pushback is merged into md->deferred in
dm_suspend() for re-issuing at resume time.
md->pushback_lock protects md->pushback.
The lock should be held with irq disabled because dec_pending() can be
called from interrupt context.
Queueing bios to md->pushback in dec_pending() must be done atomically
with the check for DMF_NOFLUSH_SUSPENDING flag. So md->pushback_lock is
held when checking the flag. Otherwise dec_pending() may queue a bio to
md->pushback after the interrupted dm_suspend() flushes md->pushback.
Then the bio would be left in md->pushback.
Flag setting in dm_suspend() can be done without md->pushback_lock because
the flag is checked only after presuspend and the set value is already
made visible via the target's presuspend function.
The flag can be checked without md->pushback_lock (e.g. in the first part of
dec_pending() or in target drivers), because the flag is checked again
with md->pushback_lock held when the bio is actually queued to md->pushback,
as described above. So even if the flag is cleared after a lockless
check, the bio isn't left in md->pushback but is returned to the application
with -EIO.
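The locking argument above can be checked with a small user-space model. It assumes a C11 atomic_flag spin lock in place of spin_lock_irqsave(), since there are no interrupts to disable here. It also counts bios instead of keeping lists. model_push_back() plays the role of the locked part of dec_pending(), and model_abort_noflush() plays an interrupted dm_suspend() draining md->pushback. All names are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

/* User-space spin lock standing in for spin_lock_irqsave()/restore(). */
static void model_lock(atomic_flag *l)
{
	while (atomic_flag_test_and_set(l))
		;	/* spin */
}

static void model_unlock(atomic_flag *l)
{
	atomic_flag_clear(l);
}

struct md_lock_model {
	atomic_flag pushback_lock;	/* md->pushback_lock */
	atomic_bool noflush;		/* DMF_NOFLUSH_SUSPENDING in md->flags */
	int pushback;			/* bios on md->pushback */
	int deferred;			/* bios on md->deferred */
};

/* Lockless peek, as target drivers or the top of dec_pending() may do. */
static bool model_noflush_suspending(struct md_lock_model *md)
{
	return atomic_load(&md->noflush);
}

/* Locked part of dec_pending(): re-check the flag, then queue or fail. */
static int model_push_back(struct md_lock_model *md)
{
	int r = 0;

	model_lock(&md->pushback_lock);
	if (atomic_load(&md->noflush))
		md->pushback++;
	else
		r = -EIO;	/* noflush suspend was interrupted */
	model_unlock(&md->pushback_lock);
	return r;
}

/* Interrupted dm_suspend(): clear the flag and drain pushback under the lock. */
static void model_abort_noflush(struct md_lock_model *md)
{
	model_lock(&md->pushback_lock);
	atomic_store(&md->noflush, false);
	md->deferred += md->pushback;
	md->pushback = 0;
	model_unlock(&md->pushback_lock);
}
```

The flag is tested and the bio queued inside one critical section, and the abort path clears the flag and drains the list inside the same lock. As a result, a bio can never be queued after the drain. It gets -EIO instead, even if a lockless peek just before said otherwise.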
Other notes on the current patch
--------------------------------
- md->pushback is added to struct mapped_device instead of using
md->deferred directly because md->io_lock, which protects md->deferred, is
an rw_semaphore and can't be used in interrupt context such as dec_pending();
md->io_lock also protects the DMF_BLOCK_IO flag of md->flags.
- Don't issue lock_fs() in dm_suspend() if the DM_NOFLUSH_FLAG
ioctl option is specified, because I/Os generated by lock_fs() would be
pushed back and never return if there were no valid devices.
- If an error occurs in dm_suspend() after the DMF_NOFLUSH_SUSPENDING
flag is set, md->pushback must be flushed because I/Os may be queued to
the list already. (flush_and_out label in dm_suspend())
Test results
------------
I have tested using the multipath target with the next patch.
The following tests are for regression/compatibility:
- I/Os succeed when valid paths exist;
- I/Os fail when there are no valid paths and queue_if_no_path is not
set;
- I/Os are queued in the multipath target when there are no valid paths and
queue_if_no_path is set;
- The queued I/Os above fail when suspend is issued without the
DM_NOFLUSH_FLAG ioctl option. I/Os spanning 2 multipath targets also
fail.
The following tests are for the normal code path of the new pushback feature:
- Queued I/Os in the multipath target are flushed from the target
but don't return when suspend is issued with the DM_NOFLUSH_FLAG
ioctl option;
- The I/Os above are queued in the multipath target again when
resume is issued without path recovery;
- The I/Os above succeed when resume is issued after path recovery
or table load;
- Queued I/Os in the multipath target succeed when resume is issued
with the DM_NOFLUSH_FLAG ioctl option after table load. I/Os
spanning 2 multipath targets also succeed.
The following tests are for the error paths of the new pushback feature:
- When bdget_disk() fails in dm_suspend(), the
DMF_NOFLUSH_SUSPENDING flag is cleared and I/Os already queued to the
pushback list are flushed properly.
- When suspend with the DM_NOFLUSH_FLAG ioctl option is interrupted,
o I/Os which had already been queued to the pushback list
at the time don't return, and are re-issued at resume time;
o I/Os which hadn't been returned at the time return with -EIO.
Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com>
Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Cc: dm-devel@redhat.com
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-08 02:41:09 -08:00
static int __noflush_suspending(struct mapped_device *md)
{
	return test_bit(DMF_NOFLUSH_SUSPENDING, &md->flags);
}
2005-04-16 15:20:36 -07:00
/*
 * Decrements the number of outstanding ios that a bio has been
 * cloned into, completing the original io if necessary.
 */
2006-01-14 13:20:43 -08:00
static void dec_pending(struct dm_io *io, int error)
2005-04-16 15:20:36 -07:00
{
2006-12-08 02:41:09 -08:00
	unsigned long flags;
	/* Push-back supersedes any I/O errors */
	if (error && !(io->error > 0 && __noflush_suspending(io->md)))
2005-04-16 15:20:36 -07:00
		io->error = error;
	if (atomic_dec_and_test(&io->io_count)) {
2006-12-08 02:41:09 -08:00
		if (io->error == DM_ENDIO_REQUEUE) {
			/*
			 * Target requested pushing back the I/O.
			 * This must be handled before the sleeper on
			 * suspend queue merges the pushback list.
			 */
			spin_lock_irqsave(&io->md->pushback_lock, flags);
			if (__noflush_suspending(io->md))
				bio_list_add(&io->md->pushback, io->bio);
			else
				/* noflush suspend was interrupted. */
				io->error = -EIO;
			spin_unlock_irqrestore(&io->md->pushback_lock, flags);
		}
2006-02-01 03:04:53 -08:00
		if (end_io_acct(io))
2005-04-16 15:20:36 -07:00
			/* nudge anyone waiting on suspend queue */
			wake_up(&io->md->wait);
2006-12-08 02:41:09 -08:00
		if (io->error != DM_ENDIO_REQUEUE) {
			blk_add_trace_bio(io->md->queue, io->bio,
					  BLK_TA_COMPLETE);
			bio_endio(io->bio, io->bio->bi_size, io->error);
		}
2006-03-23 20:00:26 +01:00
2005-04-16 15:20:36 -07:00
		free_io(io->md, io);
	}
}
static int clone_endio(struct bio *bio, unsigned int done, int error)
{
	int r = 0;
2007-07-12 17:26:32 +01:00
	struct dm_target_io *tio = bio->bi_private;
2006-10-03 01:15:41 -07:00
	struct mapped_device *md = tio->io->md;
2005-04-16 15:20:36 -07:00
	dm_endio_fn endio = tio->ti->type->end_io;
	if (bio->bi_size)
		return 1;
	if (!bio_flagged(bio, BIO_UPTODATE) && !error)
		error = -EIO;
	if (endio) {
		r = endio(tio->ti, bio, error, &tio->info);
2006-12-08 02:41:09 -08:00
		if (r < 0 || r == DM_ENDIO_REQUEUE)
			/*
			 * error and requeue request are handled
			 * in dec_pending().
			 */
2005-04-16 15:20:36 -07:00
			error = r;
2006-12-08 02:41:05 -08:00
		else if (r == DM_ENDIO_INCOMPLETE)
			/* The target will handle the io */
2005-04-16 15:20:36 -07:00
			return 1;
2006-12-08 02:41:05 -08:00
		else if (r) {
			DMWARN("unimplemented target endio return value: %d", r);
			BUG();
		}
2005-04-16 15:20:36 -07:00
	}
2006-10-03 01:15:41 -07:00
	dec_pending(tio->io, error);
	/*
	 * Store md for cleanup instead of tio which is about to get freed.
	 */
	bio->bi_private = md->bs;
2005-04-16 15:20:36 -07:00
	bio_put(bio);
2006-10-03 01:15:41 -07:00
	free_tio(md, tio);
2005-04-16 15:20:36 -07:00
	return r;
}
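The branch structure of clone_endio() above can be summarised as a pure classification function. This is an illustrative sketch: the numeric values of DM_ENDIO_INCOMPLETE and DM_ENDIO_REQUEUE are assumed, classify_endio() is hypothetical, and it reports BAD_RETURN where the kernel calls BUG().

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/* Values assumed for illustration. */
#define DM_ENDIO_INCOMPLETE 1
#define DM_ENDIO_REQUEUE    2

enum endio_action {
	PASS_TO_DEC_PENDING,	/* clone_endio() calls dec_pending(tio->io, error) */
	TARGET_KEEPS_IO,	/* clone_endio() returns 1; the target owns the bio */
	BAD_RETURN		/* the kernel calls BUG() */
};

/*
 * r is the target's end_io() return value, error the error seen so far.
 * On PASS_TO_DEC_PENDING, *out_error is the error handed to dec_pending().
 */
static enum endio_action classify_endio(int r, int error, int *out_error)
{
	assert(out_error != NULL);

	if (r < 0 || r == DM_ENDIO_REQUEUE) {
		*out_error = r;		/* error and requeue are handled in dec_pending() */
		return PASS_TO_DEC_PENDING;
	}
	if (r == DM_ENDIO_INCOMPLETE)
		return TARGET_KEEPS_IO;
	if (r) {
		fprintf(stderr, "unimplemented target endio return value: %d\n", r);
		return BAD_RETURN;
	}
	*out_error = error;		/* r == 0: the original error stands */
	return PASS_TO_DEC_PENDING;
}
```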
static sector_t max_io_len(struct mapped_device *md,
			   sector_t sector, struct dm_target *ti)
{
	sector_t offset = sector - ti->begin;
	sector_t len = ti->len - offset;
	/*
	 * Does the target need to split even further?
	 */
	if (ti->split_io) {
		sector_t boundary;
		boundary = ((offset + ti->split_io) & ~(ti->split_io - 1))
			   - offset;
		if (len > boundary)
			len = boundary;
	}
	return len;
}
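Here is a worked example of the boundary arithmetic in max_io_len(). With split_io = 8 and an offset of 5 sectors into the target, ((5 + 8) & ~7) - 5 = 8 - 5 = 3, so at most 3 sectors are mapped before the next 8-sector boundary. The user-space sketch below assumes split_io is zero or a power of two, because the mask only rounds correctly in that case. It uses a hypothetical struct target_model in place of struct dm_target.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t sector_t;	/* stand-in for the kernel type */

/* Hypothetical subset of struct dm_target used by the calculation. */
struct target_model {
	sector_t begin;		/* first sector the target covers */
	sector_t len;		/* number of sectors it covers */
	sector_t split_io;	/* 0, or a power-of-two chunk size in sectors */
};

static sector_t model_max_io_len(sector_t sector, const struct target_model *ti)
{
	assert(sector >= ti->begin && sector < ti->begin + ti->len);
	sector_t offset = sector - ti->begin;
	sector_t len = ti->len - offset;	/* sectors left in the target */

	if (ti->split_io) {
		/* distance from offset up to the next multiple of split_io */
		sector_t boundary = ((offset + ti->split_io) & ~(ti->split_io - 1)) - offset;
		if (len > boundary)
			len = boundary;
	}
	return len;
}
```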
static void __map_bio(struct dm_target *ti, struct bio *clone,
2007-07-12 17:26:32 +01:00
		      struct dm_target_io *tio)
2005-04-16 15:20:36 -07:00
{
	int r;
2006-03-23 20:00:26 +01:00
	sector_t sector;
2006-10-03 01:15:41 -07:00
	struct mapped_device *md;
2005-04-16 15:20:36 -07:00
	/*
	 * Sanity checks.
	 */
	BUG_ON(!clone->bi_size);
	clone->bi_end_io = clone_endio;
	clone->bi_private = tio;
	/*
	 * Map the clone. If r == 0 we don't need to do
	 * anything, the target has assumed ownership of
	 * this io.
	 */
	atomic_inc(&tio->io->io_count);
2006-03-23 20:00:26 +01:00
	sector = clone->bi_sector;
2005-04-16 15:20:36 -07:00
	r = ti->type->map(ti, clone, &tio->info);
2006-12-08 02:41:05 -08:00
	if (r == DM_MAPIO_REMAPPED) {
2005-04-16 15:20:36 -07:00
		/* the bio has been remapped so dispatch it */
2006-03-23 20:00:26 +01:00
2006-06-26 00:27:33 -07:00
		blk_add_trace_remap(bdev_get_queue(clone->bi_bdev), clone,
				    tio->io->bio->bi_bdev->bd_dev, sector,
2006-03-23 20:00:26 +01:00
				    clone->bi_sector);
2005-04-16 15:20:36 -07:00
		generic_make_request(clone);
		/*
		 * [PATCH] dm: suspend: add noflush pushback
		 *
		 * In device-mapper, I/O is sometimes queued within targets for
		 * later processing. For example, the multipath target can be
		 * configured to store I/O when no paths are available instead of
		 * returning -EIO.
		 *
		 * This patch allows the device-mapper core to instruct a target
		 * to transfer the contents of any such in-target queue back into
		 * the core. This frees up the resources used by the target so the
		 * core can replace that target with an alternative one and then
		 * resend the I/O to it. Without this patch the only way to change
		 * the target in such circumstances involves returning the I/O
		 * with an error back to the filesystem/application. In the
		 * multipath case, this patch will let us add new paths for
		 * existing I/O to try after all the existing paths have failed.
		 *
		 * DMF_NOFLUSH_SUSPENDING
		 * ----------------------
		 * If the DM_NOFLUSH_FLAG ioctl option is specified at suspend
		 * time, the DMF_NOFLUSH_SUSPENDING flag is set in md->flags during
		 * dm_suspend(). It is always cleared before dm_suspend() returns.
		 *
		 * The flag must be visible while the target is flushing pending
		 * I/Os, so it is set before presuspend, where the flush starts,
		 * and cleared after the wait for md->pending, where the flush
		 * ends.
		 *
		 * Target drivers can check this flag by calling
		 * dm_noflush_suspending().
		 *
		 * DM_MAPIO_REQUEUE / DM_ENDIO_REQUEUE
		 * -----------------------------------
		 * A target's map() function can now return DM_MAPIO_REQUEUE to
		 * request that the device-mapper core queue the bio.
		 *
		 * Similarly, a target's end_io() function can return
		 * DM_ENDIO_REQUEUE to request the same. This has been labelled
		 * 'pushback'.
		 *
		 * The __map_bio() and clone_endio() functions in the core treat
		 * these return values as errors and call dec_pending() to end the
		 * I/O.
		 *
		 * dec_pending
		 * -----------
		 * dec_pending() saves the pushback request in struct
		 * dm_io->error. Once all the split clones have ended,
		 * dec_pending() will put the original bio on the md->pushback
		 * list. Note that this supersedes any I/O errors.
		 *
		 * It is possible for the suspend with DM_NOFLUSH_FLAG to be
		 * aborted while in progress (e.g. by user interrupt).
		 * dec_pending() checks for this and returns -EIO if it happened.
		 *
		 * pushback list and pushback_lock
		 * -------------------------------
		 * The bio is queued on md->pushback temporarily in dec_pending(),
		 * and after all pending I/Os return, md->pushback is merged into
		 * md->deferred in dm_suspend() for re-issuing at resume time.
		 *
		 * md->pushback_lock protects md->pushback. The lock should be
		 * held with irqs disabled because dec_pending() can be called
		 * from interrupt context.
		 *
		 * Queueing bios to md->pushback in dec_pending() must be done
		 * atomically with the check for the DMF_NOFLUSH_SUSPENDING flag,
		 * so md->pushback_lock is held when checking the flag. Otherwise
		 * dec_pending() may queue a bio to md->pushback after the
		 * interrupted dm_suspend() flushes md->pushback, and the bio
		 * would then be left in md->pushback.
		 *
		 * The flag can be set in dm_suspend() without md->pushback_lock
		 * because the flag is checked only after presuspend, and the set
		 * value is already made visible via the target's presuspend
		 * function.
		 *
		 * The flag can be checked without md->pushback_lock (e.g. in the
		 * first part of dec_pending() or in target drivers), because the
		 * flag is checked again with md->pushback_lock held when the bio
		 * is actually queued to md->pushback, as described above. So even
		 * if the flag is cleared after the lockless checks, the bio isn't
		 * left in md->pushback but is returned to applications with -EIO.
		 *
		 * Other notes on the current patch
		 * --------------------------------
		 * - md->pushback is added to struct mapped_device instead of
		 *   using md->deferred directly, because md->io_lock, which
		 *   protects md->deferred, is an rw_semaphore and can't be used
		 *   in interrupt context like dec_pending(); md->io_lock also
		 *   protects the DMF_BLOCK_IO flag of md->flags.
		 * - Don't issue lock_fs() in dm_suspend() if the DM_NOFLUSH_FLAG
		 *   ioctl option is specified, because I/Os generated by
		 *   lock_fs() would be pushed back and never return if there were
		 *   no valid devices.
		 * - If an error occurs in dm_suspend() after the
		 *   DMF_NOFLUSH_SUSPENDING flag is set, md->pushback must be
		 *   flushed because I/Os may already be queued to the list
		 *   (flush_and_out label in dm_suspend()).
		 *
		 * Test results
		 * ------------
		 * I have tested using the multipath target with the next patch.
		 *
		 * The following tests are for regression/compatibility:
		 * - I/Os succeed when valid paths exist;
		 * - I/Os fail when there are no valid paths and queue_if_no_path
		 *   is not set;
		 * - I/Os are queued in the multipath target when there are no
		 *   valid paths and queue_if_no_path is set;
		 * - The queued I/Os above fail when suspend is issued without the
		 *   DM_NOFLUSH_FLAG ioctl option. I/Os spanning 2 multipath
		 *   targets also fail.
		 *
		 * The following tests are for the normal code path of the new
		 * pushback feature:
		 * - Queued I/Os in the multipath target are flushed from the
		 *   target but don't return when suspend is issued with the
		 *   DM_NOFLUSH_FLAG ioctl option;
		 * - The I/Os above are queued in the multipath target again when
		 *   resume is issued without path recovery;
		 * - The I/Os above succeed when resume is issued after path
		 *   recovery or table load;
		 * - Queued I/Os in the multipath target succeed when resume is
		 *   issued with the DM_NOFLUSH_FLAG ioctl option after table
		 *   load. I/Os spanning 2 multipath targets also succeed.
		 *
		 * The following tests are for the error paths of the new
		 * pushback feature:
		 * - When bdget_disk() fails in dm_suspend(), the
		 *   DMF_NOFLUSH_SUSPENDING flag is cleared and I/Os already
		 *   queued to the pushback list are flushed properly.
		 * - When suspend with the DM_NOFLUSH_FLAG ioctl option is
		 *   interrupted,
		 *   o I/Os which had already been queued to the pushback list at
		 *     the time don't return, and are re-issued at resume time;
		 *   o I/Os which hadn't been returned at the time return with
		 *     EIO.
		 *
		 * Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com>
		 * Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
		 * Signed-off-by: Alasdair G Kergon <agk@redhat.com>
		 * Cc: dm-devel@redhat.com
		 * Signed-off-by: Andrew Morton <akpm@osdl.org>
		 * Signed-off-by: Linus Torvalds <torvalds@osdl.org>
		 */
	} else if (r < 0 || r == DM_MAPIO_REQUEUE) {
		/* error the io and bail out, or requeue it if needed */
		md = tio->io->md;
		dec_pending(tio->io, r);
		/*
		 * Store bio_set for cleanup.
		 */
		clone->bi_private = md->bs;
		bio_put(clone);
		free_tio(md, tio);
	} else if (r) {
		DMWARN("unimplemented target map return value: %d", r);
		BUG();
	}
}
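
/*
 * A single-threaded sketch of the dec_pending() behaviour described in the
 * noflush pushback notes: a pushback request supersedes ordinary errors,
 * and once the last reference is dropped the bio is either parked for
 * resume or, if the noflush suspend was aborted, failed with -EIO. The
 * positive SKETCH_PUSHBACK encoding, struct sketch_io and
 * dec_pending_sketch() are hypothetical. Per the notes, the real final
 * flag check and list insertion happen under md->pushback_lock with irqs
 * disabled, which this sketch does not model.
 *
```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

// Hypothetical encoding: a positive status marks a pushback (requeue)
// request, a negative status is an ordinary -errno, 0 is success.
#define SKETCH_PUSHBACK 1

enum outcome { STILL_PENDING, COMPLETED, PUSHED_BACK };

struct sketch_io {
	int error;	// merged status of all clones
	int io_count;	// outstanding clones plus the submitter's reference
};

// Drop one reference after recording status. noflush_suspending models
// DMF_NOFLUSH_SUSPENDING; final_error is written only when COMPLETED.
static enum outcome dec_pending_sketch(struct sketch_io *io, int status,
				       bool noflush_suspending,
				       int *final_error)
{
	// A pushback request, once recorded, is never overwritten.
	if (status && io->error != SKETCH_PUSHBACK)
		io->error = status;

	if (--io->io_count > 0)
		return STILL_PENDING;

	if (io->error == SKETCH_PUSHBACK) {
		if (noflush_suspending)
			return PUSHED_BACK;	// would go on md->pushback
		io->error = -EIO;		// noflush suspend was aborted
	}
	*final_error = io->error;
	return COMPLETED;
}
```
 */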

struct clone_info {
	struct mapped_device *md;
	struct dm_table *map;
	struct bio *bio;
	struct dm_io *io;
	sector_t sector;
	sector_t sector_count;
	unsigned short idx;
};

static void dm_bio_destructor(struct bio *bio)
{
	struct bio_set *bs = bio->bi_private;

	bio_free(bio, bs);
}

/*
 * Creates a little bio that just does part of a bvec.
 */
static struct bio *split_bvec(struct bio *bio, sector_t sector,
			      unsigned short idx, unsigned int offset,
			      unsigned int len, struct bio_set *bs)
{
	struct bio *clone;
	struct bio_vec *bv = bio->bi_io_vec + idx;

	clone = bio_alloc_bioset(GFP_NOIO, 1, bs);
	clone->bi_destructor = dm_bio_destructor;
	*clone->bi_io_vec = *bv;

	clone->bi_sector = sector;
	clone->bi_bdev = bio->bi_bdev;
	clone->bi_rw = bio->bi_rw;
	clone->bi_vcnt = 1;
	clone->bi_size = to_bytes(len);
	clone->bi_io_vec->bv_offset = offset;
	clone->bi_io_vec->bv_len = clone->bi_size;

	return clone;
}

/*
 * Creates a bio that consists of a range of complete bvecs.
 */
static struct bio *clone_bio(struct bio *bio, sector_t sector,
			     unsigned short idx, unsigned short bv_count,
			     unsigned int len, struct bio_set *bs)
{
	struct bio *clone;

	clone = bio_alloc_bioset(GFP_NOIO, bio->bi_max_vecs, bs);
	__bio_clone(clone, bio);
	clone->bi_destructor = dm_bio_destructor;
	clone->bi_sector = sector;
	clone->bi_idx = idx;
	clone->bi_vcnt = idx + bv_count;
	clone->bi_size = to_bytes(len);
	clone->bi_flags &= ~(1 << BIO_SEG_VALID);

	return clone;
}

static void __clone_and_map(struct clone_info *ci)
{
	struct bio *clone, *bio = ci->bio;
	struct dm_target *ti = dm_table_find_target(ci->map, ci->sector);
	sector_t len = 0, max = max_io_len(ci->md, ci->sector, ti);
	struct dm_target_io *tio;

	/*
	 * Allocate a target io object.
	 */
	tio = alloc_tio(ci->md);
	tio->io = ci->io;
	tio->ti = ti;
	memset(&tio->info, 0, sizeof(tio->info));

	if (ci->sector_count <= max) {
		/*
		 * Optimise for the simple case where we can do all of
		 * the remaining io with a single clone.
		 */
		clone = clone_bio(bio, ci->sector, ci->idx,
				  bio->bi_vcnt - ci->idx, ci->sector_count,
				  ci->md->bs);
		__map_bio(ti, clone, tio);
		ci->sector_count = 0;

	} else if (to_sector(bio->bi_io_vec[ci->idx].bv_len) <= max) {
		/*
		 * There are some bvecs that don't span targets.
		 * Do as many of these as possible.
		 */
		int i;
		sector_t remaining = max;
		sector_t bv_len;

		for (i = ci->idx; remaining && (i < bio->bi_vcnt); i++) {
			bv_len = to_sector(bio->bi_io_vec[i].bv_len);

			if (bv_len > remaining)
				break;

			remaining -= bv_len;
			len += bv_len;
		}

		clone = clone_bio(bio, ci->sector, ci->idx, i - ci->idx, len,
				  ci->md->bs);
		__map_bio(ti, clone, tio);

		ci->sector += len;
		ci->sector_count -= len;
		ci->idx = i;

	} else {
		/*
		 * Handle a bvec that must be split between two or more targets.
		 */
		struct bio_vec *bv = bio->bi_io_vec + ci->idx;
		sector_t remaining = to_sector(bv->bv_len);
		unsigned int offset = 0;

		do {
			if (offset) {
				ti = dm_table_find_target(ci->map, ci->sector);
				max = max_io_len(ci->md, ci->sector, ti);

				tio = alloc_tio(ci->md);
				tio->io = ci->io;
				tio->ti = ti;
				memset(&tio->info, 0, sizeof(tio->info));
			}

			len = min(remaining, max);

			clone = split_bvec(bio, ci->sector, ci->idx,
					   bv->bv_offset + offset, len,
					   ci->md->bs);

			__map_bio(ti, clone, tio);

			ci->sector += len;
			ci->sector_count -= len;
			offset += to_bytes(len);
		} while (remaining -= len);

		ci->idx++;
	}
}
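
/*
 * A simplified user-space model of how __split_bio() and __clone_and_map()
 * advance ci->sector / ci->sector_count across the table: each piece is
 * clipped to the end of the target containing its first sector. It ignores
 * split_io, bvec boundaries and bio_set allocation. The types and helpers
 * (sketch_target, find_target(), split_range()) are hypothetical; the
 * kernel's dm_table_find_target() does its own lookup.
 *
```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical table model: contiguous targets sorted by begin sector.
struct sketch_target { uint64_t begin, len; };
struct sketch_piece { size_t target; uint64_t sector, count; };

// Linear search for the target covering sector; returns n if none does.
static size_t find_target(const struct sketch_target *t, size_t n,
			  uint64_t sector)
{
	for (size_t i = 0; i < n; i++)
		if (sector >= t[i].begin && sector < t[i].begin + t[i].len)
			return i;
	return n;
}

// Split [sector, sector + count) into per-target pieces. Returns the
// number of pieces, or 0 if a sector is unmapped or out is too small.
static size_t split_range(const struct sketch_target *t, size_t n,
			  uint64_t sector, uint64_t count,
			  struct sketch_piece *out, size_t max_out)
{
	size_t np = 0;

	while (count) {
		size_t i = find_target(t, n, sector);
		if (i == n || np == max_out)
			return 0;
		// Sectors left in this target (the max_io_len() role).
		uint64_t room = t[i].begin + t[i].len - sector;
		uint64_t len = count < room ? count : room;
		out[np++] = (struct sketch_piece){ i, sector, len };
		sector += len;
		count -= len;
	}
	return np;
}
```
 *
 * For targets covering sectors 0-99, 100-149 and 150 onward, a request
 * for sectors 90-169 becomes three pieces of 10, 50 and 20 sectors.
 */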

/*
 * Split the bio into several clones.
 */
static void __split_bio(struct mapped_device *md, struct bio *bio)
{
	struct clone_info ci;

	ci.map = dm_get_table(md);
	if (!ci.map) {
		bio_io_error(bio, bio->bi_size);
		return;
	}

	ci.md = md;
	ci.bio = bio;
	ci.io = alloc_io(md);
	ci.io->error = 0;
	atomic_set(&ci.io->io_count, 1);
	ci.io->bio = bio;
	ci.io->md = md;
	ci.sector = bio->bi_sector;
	ci.sector_count = bio_sectors(bio);
	ci.idx = bio->bi_idx;

	start_io_acct(ci.io);
	while (ci.sector_count)
		__clone_and_map(&ci);

	/* drop the extra reference count */
	dec_pending(ci.io, 0);
	dm_table_put(ci.map);
}
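
/*
 * __split_bio() starts io_count at 1 and only drops that extra reference
 * after the last clone has been submitted, so a clone that completes while
 * later clones are still being mapped cannot end the original bio early.
 * A sketch of that pattern with C11 atomics; struct sketch_io,
 * sketch_submit() and sketch_dec_pending() are hypothetical, and the real
 * struct dm_io also carries error and accounting state.
 *
```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct sketch_io {
	atomic_int io_count;
	bool completed;
};

static void sketch_dec_pending(struct sketch_io *io)
{
	// atomic_fetch_sub returns the previous value: 1 means we hit zero.
	if (atomic_fetch_sub(&io->io_count, 1) == 1)
		io->completed = true;	// the original bio would end here
}

// Submit nclones clones; if complete_immediately, each one finishes
// before the next is issued, as a synchronous completion might.
static void sketch_submit(struct sketch_io *io, int nclones,
			  bool complete_immediately)
{
	atomic_init(&io->io_count, 1);	// the submitter's own reference
	io->completed = false;

	for (int i = 0; i < nclones; i++) {
		atomic_fetch_add(&io->io_count, 1);	// as in __map_bio()
		if (complete_immediately)
			sketch_dec_pending(io);
		assert(!io->completed);	// extra reference keeps it alive
	}

	sketch_dec_pending(io);		// drop the extra reference
}
```
 */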

/*-----------------------------------------------------------------
 * CRUD END
 *---------------------------------------------------------------*/

/*
 * The request function that just remaps the bio built up by
 * dm_merge_bvec.
 */
static int dm_request(struct request_queue *q, struct bio *bio)
{
	int r;
	int rw = bio_data_dir(bio);
	struct mapped_device *md = q->queuedata;

	/*
	 * There is no use in forwarding any barrier request since we can't
	 * guarantee it is (or can be) handled by the targets correctly.
	 */
	if (unlikely(bio_barrier(bio))) {
		bio_endio(bio, bio->bi_size, -EOPNOTSUPP);
		return 0;
	}

	down_read(&md->io_lock);

	disk_stat_inc(dm_disk(md), ios[rw]);
	disk_stat_add(dm_disk(md), sectors[rw], bio_sectors(bio));

	/*
	 * If we're suspended we have to queue
	 * this io for later.
	 */
	while (test_bit(DMF_BLOCK_IO, &md->flags)) {
		up_read(&md->io_lock);

		if (bio_rw(bio) == READA) {
			bio_io_error(bio, bio->bi_size);
			return 0;
		}

		r = queue_io(md, bio);
		if (r < 0) {
			bio_io_error(bio, bio->bi_size);
			return 0;

		} else if (r == 0)
			return 0;	/* deferred successfully */

		/*
		 * We're in a while loop, because someone could suspend
		 * before we get to the following read lock.
		 */
		down_read(&md->io_lock);
	}

	__split_bio(md, bio);
	up_read(&md->io_lock);
	return 0;
}

static int dm_flush_all(struct request_queue *q, struct gendisk *disk,
			sector_t *error_sector)
{
	struct mapped_device *md = q->queuedata;
	struct dm_table *map = dm_get_table(md);
	int ret = -ENXIO;

	if (map) {
		ret = dm_table_flush_all(map);
		dm_table_put(map);
	}

	return ret;
}

static void dm_unplug_all(struct request_queue *q)
{
	struct mapped_device *md = q->queuedata;
	struct dm_table *map = dm_get_table(md);

	if (map) {
		dm_table_unplug_all(map);
		dm_table_put(map);
	}
}

static int dm_any_congested(void *congested_data, int bdi_bits)
{
	int r;
	struct mapped_device *md = (struct mapped_device *) congested_data;
	struct dm_table *map = dm_get_table(md);

	if (!map || test_bit(DMF_BLOCK_IO, &md->flags))
		r = bdi_bits;
	else
		r = dm_table_any_congested(map, bdi_bits);

	dm_table_put(map);
	return r;
}

/*-----------------------------------------------------------------
 * An IDR is used to keep track of allocated minor numbers.
 *---------------------------------------------------------------*/
static DEFINE_IDR(_minor_idr);

static void free_minor(int minor)
{
	spin_lock(&_minor_lock);
	idr_remove(&_minor_idr, minor);
	spin_unlock(&_minor_lock);
}

/*
 * See if the device with a specific minor # is free.
 */
static int specific_minor(struct mapped_device *md, int minor)
{
	int r, m;

	if (minor >= (1 << MINORBITS))
		return -EINVAL;

	r = idr_pre_get(&_minor_idr, GFP_KERNEL);
	if (!r)
		return -ENOMEM;

	spin_lock(&_minor_lock);

	if (idr_find(&_minor_idr, minor)) {
		r = -EBUSY;
		goto out;
	}

	r = idr_get_new_above(&_minor_idr, MINOR_ALLOCED, minor, &m);
	if (r)
		goto out;

	if (m != minor) {
		idr_remove(&_minor_idr, m);
		r = -EBUSY;
		goto out;
	}

out:
	spin_unlock(&_minor_lock);
	return r;
}

static int next_free_minor(struct mapped_device *md, int *minor)
{
	int r, m;

	r = idr_pre_get(&_minor_idr, GFP_KERNEL);
	if (!r)
		return -ENOMEM;

	spin_lock(&_minor_lock);

	r = idr_get_new(&_minor_idr, MINOR_ALLOCED, &m);
	if (r) {
		goto out;
	}

	if (m >= (1 << MINORBITS)) {
		idr_remove(&_minor_idr, m);
		r = -ENOSPC;
		goto out;
	}

	*minor = m;

out:
	spin_unlock(&_minor_lock);
	return r;
}
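
/*
 * The same allocate-specific / allocate-next / free interface, sketched with
 * a plain bitmap instead of the kernel IDR so that it runs in user space.
 * The bitmap size (SKETCH_MAX_MINORS, standing in for 1 << MINORBITS) and
 * the function names are assumptions. The error codes mirror those returned
 * by specific_minor() and next_free_minor(); the _minor_lock locking is
 * omitted.
 *
```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define SKETCH_MAX_MINORS 64

static bool minor_used[SKETCH_MAX_MINORS];

static int sketch_specific_minor(int minor)
{
	if (minor < 0 || minor >= SKETCH_MAX_MINORS)
		return -EINVAL;
	if (minor_used[minor])
		return -EBUSY;
	minor_used[minor] = true;
	return 0;
}

// Claim the lowest free minor.
static int sketch_next_free_minor(int *minor)
{
	for (int m = 0; m < SKETCH_MAX_MINORS; m++) {
		if (!minor_used[m]) {
			minor_used[m] = true;
			*minor = m;
			return 0;
		}
	}
	return -ENOSPC;
}

static void sketch_free_minor(int minor)
{
	minor_used[minor] = false;
}
```
 *
 * Unlike the IDR version there is no separate preallocation step
 * (idr_pre_get()), so this sketch has no -ENOMEM path.
 */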

static struct block_device_operations dm_blk_dops;

/*
 * Allocate and initialise a blank device with a given minor.
 */
static struct mapped_device *alloc_dev(int minor)
{
	int r;
	struct mapped_device *md = kmalloc(sizeof(*md), GFP_KERNEL);
	void *old_md;

	if (!md) {
		DMWARN("unable to allocate device, out of memory.");
		return NULL;
	}

	if (!try_module_get(THIS_MODULE))
		goto bad0;

	/* get a minor number for the dev */
	if (minor == DM_ANY_MINOR)
		r = next_free_minor(md, &minor);
	else
		r = specific_minor(md, minor);
	if (r < 0)
		goto bad1;

	memset(md, 0, sizeof(*md));
	init_rwsem(&md->io_lock);
	init_MUTEX(&md->suspend_lock);
	spin_lock_init(&md->pushback_lock);
	rwlock_init(&md->map_lock);
	atomic_set(&md->holders, 1);
	atomic_set(&md->open_count, 0);
	atomic_set(&md->event_nr, 0);

	md->queue = blk_alloc_queue(GFP_KERNEL);
	if (!md->queue)
		goto bad1_free_minor;

	md->queue->queuedata = md;
	md->queue->backing_dev_info.congested_fn = dm_any_congested;
	md->queue->backing_dev_info.congested_data = md;
	blk_queue_make_request(md->queue, dm_request);
	blk_queue_bounce_limit(md->queue, BLK_BOUNCE_ANY);
	md->queue->unplug_fn = dm_unplug_all;
	md->queue->issue_flush_fn = dm_flush_all;

	md->io_pool = mempool_create_slab_pool(MIN_IOS, _io_cache);
	if (!md->io_pool)
		goto bad2;

	md->tio_pool = mempool_create_slab_pool(MIN_IOS, _tio_cache);
	if (!md->tio_pool)
		goto bad3;

	md->bs = bioset_create(16, 16);
	if (!md->bs)
		goto bad_no_bioset;

	md->disk = alloc_disk(1);
	if (!md->disk)
		goto bad4;

	atomic_set(&md->pending, 0);
	init_waitqueue_head(&md->wait);
	init_waitqueue_head(&md->eventq);

	md->disk->major = _major;
	md->disk->first_minor = minor;
	md->disk->fops = &dm_blk_dops;
	md->disk->queue = md->queue;
	md->disk->private_data = md;
	sprintf(md->disk->disk_name, "dm-%d", minor);
	add_disk(md->disk);
	format_dev_t(md->name, MKDEV(_major, minor));

	/* Populate the mapping, nobody knows we exist yet */
	spin_lock(&_minor_lock);
	old_md = idr_replace(&_minor_idr, md, minor);
	spin_unlock(&_minor_lock);

	BUG_ON(old_md != MINOR_ALLOCED);

	return md;

bad4:
	bioset_free(md->bs);
bad_no_bioset:
	mempool_destroy(md->tio_pool);
bad3:
	mempool_destroy(md->io_pool);
bad2:
	blk_cleanup_queue(md->queue);
bad1_free_minor:
	free_minor(minor);
bad1:
	module_put(THIS_MODULE);
bad0:
	kfree(md);
	return NULL;
}
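
/*
 * alloc_dev() uses the usual kernel goto-unwinding idiom: each failure
 * jumps to a label that releases only what was already acquired, in
 * reverse order. A reduced, testable sketch of the same idiom; the three
 * resources, the fail_at knob that injects a failure and the released
 * counter are all hypothetical.
 *
```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

struct sketch_dev {
	void *queue, *pool, *disk;
};

static int released;	// counts undo actions, for testing

// Step n fails (returns NULL) when n == fail_at; 0 means "never fail".
static void *step(int n, int fail_at)
{
	return n == fail_at ? NULL : malloc(1);
}

static struct sketch_dev *sketch_alloc_dev(int fail_at)
{
	struct sketch_dev *d = malloc(sizeof(*d));

	if (!d)
		return NULL;

	d->queue = step(1, fail_at);
	if (!d->queue)
		goto bad_queue;
	d->pool = step(2, fail_at);
	if (!d->pool)
		goto bad_pool;
	d->disk = step(3, fail_at);
	if (!d->disk)
		goto bad_disk;
	return d;

bad_disk:		// step 3 failed: undo step 2
	free(d->pool);
	released++;
bad_pool:		// step 2 failed: undo step 1
	free(d->queue);
	released++;
bad_queue:		// step 1 failed: undo the container
	free(d);
	released++;
	return NULL;
}
```
 *
 * Falling through the labels is what keeps the cleanup in reverse order
 * without duplicating any release call.
 */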

static void free_dev(struct mapped_device *md)
{
	int minor = md->disk->first_minor;

	if (md->suspended_bdev) {
		thaw_bdev(md->suspended_bdev, NULL);
		bdput(md->suspended_bdev);
	}
	mempool_destroy(md->tio_pool);
	mempool_destroy(md->io_pool);
	bioset_free(md->bs);
	del_gendisk(md->disk);
	free_minor(minor);

	spin_lock(&_minor_lock);
	md->disk->private_data = NULL;
	spin_unlock(&_minor_lock);

	put_disk(md->disk);
	blk_cleanup_queue(md->queue);
	module_put(THIS_MODULE);
	kfree(md);
}

/*
 * Bind a table to the device.
 */
static void event_callback(void *context)
{
	struct mapped_device *md = (struct mapped_device *) context;

	atomic_inc(&md->event_nr);
	wake_up(&md->eventq);
}

static void __set_size(struct mapped_device *md, sector_t size)
{
	set_capacity(md->disk, size);

	mutex_lock(&md->suspended_bdev->bd_inode->i_mutex);
	i_size_write(md->suspended_bdev->bd_inode, (loff_t)size << SECTOR_SHIFT);
	mutex_unlock(&md->suspended_bdev->bd_inode->i_mutex);
}

static int __bind(struct mapped_device *md, struct dm_table *t)
{
	struct request_queue *q = md->queue;
	sector_t size;

	size = dm_table_get_size(t);

	/*
	 * Wipe any geometry if the size of the table changed.
	 */
	if (size != get_capacity(md->disk))
		memset(&md->geometry, 0, sizeof(md->geometry));

	if (md->suspended_bdev)
		__set_size(md, size);
	if (size == 0)
		return 0;

	dm_table_get(t);
	dm_table_event_callback(t, event_callback, md);

	write_lock(&md->map_lock);
	md->map = t;
	dm_table_set_restrictions(t, q);
	write_unlock(&md->map_lock);

	return 0;
}

static void __unbind(struct mapped_device *md)
{
	struct dm_table *map = md->map;

	if (!map)
		return;

	dm_table_event_callback(map, NULL, NULL);
	write_lock(&md->map_lock);
	md->map = NULL;
	write_unlock(&md->map_lock);
	dm_table_put(map);
}
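
/*
 * __set_size() casts the sector count to loff_t before shifting by
 * SECTOR_SHIFT. A small sketch of why the widening matters on
 * configurations where sector_t is 32 bits wide, assuming 512-byte sectors
 * (a shift of 9); SKETCH_SECTOR_SHIFT and the helper name are hypothetical.
 *
```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_SECTOR_SHIFT 9	// assumed 512-byte sectors

// Widen first, then shift: shifting a 32-bit count first would wrap
// for devices of 4 GiB or more.
static int64_t sketch_sectors_to_bytes(uint32_t sectors)
{
	return (int64_t)sectors << SKETCH_SECTOR_SHIFT;
}
```
 *
 * With 2^23 sectors (4 GiB), shifting the 32-bit value first wraps to 0,
 * while the widened result is 4294967296 bytes.
 */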

/*
 * Constructor for a new device.
 */
int dm_create(int minor, struct mapped_device **result)
{
	struct mapped_device *md;

	md = alloc_dev(minor);
	if (!md)
		return -ENXIO;

	*result = md;
	return 0;
}

static struct mapped_device *dm_find_md(dev_t dev)
{
	struct mapped_device *md;
	unsigned minor = MINOR(dev);

	if (MAJOR(dev) != _major || minor >= (1 << MINORBITS))
		return NULL;

	spin_lock(&_minor_lock);

	md = idr_find(&_minor_idr, minor);
	if (md && (md == MINOR_ALLOCED ||
		   (dm_disk(md)->first_minor != minor) ||
		   test_bit(DMF_FREEING, &md->flags))) {
		md = NULL;
		goto out;
	}

out:
	spin_unlock(&_minor_lock);

	return md;
}

struct mapped_device *dm_get_md(dev_t dev)
{
	struct mapped_device *md = dm_find_md(dev);

	if (md)
		dm_get(md);

	return md;
}

void *dm_get_mdptr(struct mapped_device *md)
{
	return md->interface_ptr;
}

void dm_set_mdptr(struct mapped_device *md, void *ptr)
{
	md->interface_ptr = ptr;
}

void dm_get(struct mapped_device *md)
{
	atomic_inc(&md->holders);
}

const char *dm_device_name(struct mapped_device *md)
{
	return md->name;
}
EXPORT_SYMBOL_GPL(dm_device_name);

void dm_put(struct mapped_device *md)
{
	struct dm_table *map;

	BUG_ON(test_bit(DMF_FREEING, &md->flags));

	if (atomic_dec_and_lock(&md->holders, &_minor_lock)) {
		map = dm_get_table(md);
		idr_replace(&_minor_idr, MINOR_ALLOCED, dm_disk(md)->first_minor);
		set_bit(DMF_FREEING, &md->flags);
		spin_unlock(&_minor_lock);
		if (!dm_suspended(md)) {
			dm_table_presuspend_targets(map);
			dm_table_postsuspend_targets(map);
		}
		__unbind(md);
		dm_table_put(map);
		free_dev(md);
	}
}
EXPORT_SYMBOL_GPL(dm_put);

/*
 * Process the deferred bios
 */
static void __flush_deferred_io(struct mapped_device *md, struct bio *c)
{
	struct bio *n;

	while (c) {
		n = c->bi_next;
		c->bi_next = NULL;
		__split_bio(md, c);
		c = n;
	}
}

/*
 * Swap in a new table (destroying the old one).
 */
int dm_swap_table(struct mapped_device *md, struct dm_table *table)
{
	int r = -EINVAL;

	down(&md->suspend_lock);

	/* device must be suspended */
	if (!dm_suspended(md))
		goto out;

	/* without bdev, the device size cannot be changed */
	if (!md->suspended_bdev)
		if (get_capacity(md->disk) != dm_table_get_size(table))
			goto out;

	__unbind(md);
	r = __bind(md, table);

out:
	up(&md->suspend_lock);
	return r;
}

/*
 * Functions to lock and unlock any filesystem running on the
 * device.
 */
static int lock_fs(struct mapped_device *md)
{
	int r;

	WARN_ON(md->frozen_sb);

	md->frozen_sb = freeze_bdev(md->suspended_bdev);
	if (IS_ERR(md->frozen_sb)) {
		r = PTR_ERR(md->frozen_sb);
		md->frozen_sb = NULL;
		return r;
	}

	set_bit(DMF_FROZEN, &md->flags);

	/* don't bdput right now, we don't want the bdev
	 * to go away while it is locked.
	 */
	return 0;
}

static void unlock_fs(struct mapped_device *md)
{
	if (!test_bit(DMF_FROZEN, &md->flags))
		return;

	thaw_bdev(md->suspended_bdev, md->frozen_sb);
	md->frozen_sb = NULL;
	clear_bit(DMF_FROZEN, &md->flags);
}

/*
 * We need to be able to change a mapping table under a mounted
 * filesystem. For example, we might want to move some data in
 * the background. Before the table can be swapped with
 * dm_swap_table, dm_suspend must be called to flush any in
 * flight bios and ensure that any further io gets deferred.
 */
int dm_suspend(struct mapped_device *md, unsigned suspend_flags)
{
	struct dm_table *map = NULL;
[PATCH] dm: suspend: add noflush pushback
In device-mapper I/O is sometimes queued within targets for later processing.
For example, the multipath target can be configured to store I/O when no paths
are available instead of failing it with -EIO.
This patch allows the device-mapper core to instruct a target to transfer the
contents of any such in-target queue back into the core. This frees up the
resources used by the target so the core can replace that target with an
alternative one and then resend the I/O to it. Without this patch the only
way to change the target in such circumstances involves returning the I/O with
an error back to the filesystem/application. In the multipath case, this
patch will let us add new paths for existing I/O to try after all the existing
paths have failed.
DMF_NOFLUSH_SUSPENDING
----------------------
If the DM_NOFLUSH_FLAG ioctl option is specified at suspend time, the
DMF_NOFLUSH_SUSPENDING flag is set in md->flags during dm_suspend(). It
is always cleared before dm_suspend() returns.
The flag must be visible while the target is flushing pending I/Os so it
is set before presuspend where the flush starts and unset after the wait
for md->pending where the flush ends.
Target drivers can check this flag by calling dm_noflush_suspending().
DM_MAPIO_REQUEUE / DM_ENDIO_REQUEUE
-----------------------------------
A target's map() function can now return DM_MAPIO_REQUEUE to request that the
device-mapper core queue the bio.
Similarly, a target's end_io() function can return DM_ENDIO_REQUEUE to request
the same. This has been labelled 'pushback'.
The __map_bio() and clone_endio() functions in the core treat these return
values as errors and call dec_pending() to end the I/O.
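A minimal userspace sketch of how a target's map() decision could use the pushback return code, assuming a multipath-like target with a queue_if_no_path setting. The enum names and values and sketch_map() are hypothetical stand-ins, not the kernel's DM_MAPIO_* definitions, and the rule that only bios the target would otherwise hold get pushed back during a noflush suspend is an assumption made for illustration.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-ins for the map() return codes described above. */
enum sketch_map_result {
	SKETCH_MAPIO_SUBMITTED,	/* target keeps the bio (e.g. queues it internally) */
	SKETCH_MAPIO_REMAPPED,	/* core should submit the remapped bio */
	SKETCH_MAPIO_REQUEUE	/* "pushback": core should queue the bio itself */
};

/*
 * Decide what to do with one bio. Returns an enum value above,
 * or -EIO when the bio should be failed.
 */
static int sketch_map(int have_valid_path, int queue_if_no_path,
		      int noflush_suspending)
{
	if (have_valid_path)
		return SKETCH_MAPIO_REMAPPED;

	/* No path and no queueing configured: fail as before. */
	if (!queue_if_no_path)
		return -EIO;

	/*
	 * Assumption: while a noflush suspend drains the target, hand
	 * the bio back to the core instead of holding it here.
	 */
	if (noflush_suspending)
		return SKETCH_MAPIO_REQUEUE;

	return SKETCH_MAPIO_SUBMITTED;
}
```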
dec_pending
-----------
dec_pending() saves the pushback request in struct dm_io->error. Once all
the split clones have ended, dec_pending() will put the original bio on
the md->pushback list. Note that this supersedes any I/O errors.
It is possible for the suspend with DM_NOFLUSH_FLAG to be aborted while
in progress (e.g. by user interrupt). dec_pending() checks for this and
returns -EIO if it happened.
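The dec_pending() accounting described above can be modelled, single-threaded and without the pushback list itself, roughly as follows. struct sketch_io, SKETCH_ENDIO_REQUEUE and sketch_dec_pending() are hypothetical; the noflush_suspending argument stands in for testing DMF_NOFLUSH_SUSPENDING, and the positive value chosen for the requeue code is an assumption of the sketch.

```c
#include <assert.h>
#include <errno.h>

#define SKETCH_ENDIO_REQUEUE 2	/* hypothetical positive "push back" code */

/* Hypothetical model of the per-original-bio bookkeeping. */
struct sketch_io {
	int io_count;	/* split clones still outstanding */
	int error;	/* 0, a negative errno, or SKETCH_ENDIO_REQUEUE */
};

enum sketch_outcome {
	SKETCH_STILL_PENDING,	/* other clones have not finished yet */
	SKETCH_COMPLETED,	/* original bio ends with io->error */
	SKETCH_PUSHED_BACK	/* original bio goes on the pushback list */
};

static enum sketch_outcome sketch_dec_pending(struct sketch_io *io, int error,
					      int noflush_suspending)
{
	assert(io->io_count > 0);

	/* A pending push-back request is not overwritten by later errors. */
	if (error && !(io->error > 0 && noflush_suspending))
		io->error = error;

	if (--io->io_count)
		return SKETCH_STILL_PENDING;

	if (io->error == SKETCH_ENDIO_REQUEUE) {
		if (noflush_suspending)
			return SKETCH_PUSHED_BACK;
		/* The noflush suspend was interrupted: fail the bio. */
		io->error = -EIO;
	}
	return SKETCH_COMPLETED;
}
```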
pushback list and pushback_lock
-------------------------------
The bio is queued on md->pushback temporarily in dec_pending(), and after
all pending I/Os return, md->pushback is merged into md->deferred in
dm_suspend() for re-issuing at resume time.
md->pushback_lock protects md->pushback.
The lock should be held with irq disabled because dec_pending() can be
called from interrupt context.
Queueing bios to md->pushback in dec_pending() must be done atomically
with the check for DMF_NOFLUSH_SUSPENDING flag. So md->pushback_lock is
held when checking the flag. Otherwise dec_pending() may queue a bio to
md->pushback after the interrupted dm_suspend() flushes md->pushback.
Then the bio would be left in md->pushback.
Flag setting in dm_suspend() can be done without md->pushback_lock because
the flag is checked only after presuspend and the set value is already
made visible via the target's presuspend function.
The flag can be checked without md->pushback_lock (e.g. the first part of
the dec_pending() or target drivers), because the flag is checked again
with md->pushback_lock held when the bio is really queued to md->pushback
as described above. So even if the flag is cleared after the lockless
checks, the bio isn't left in md->pushback but is returned to applications
with -EIO.
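A userspace sketch of the locking rule above, assuming a pthread mutex as a stand-in for md->pushback_lock (the kernel uses a spinlock taken with interrupts disabled). All names are hypothetical. The completion side tests the flag and queues under one lock; in this sketch the interrupted-suspend side clears the flag and collects the list under the same lock, so a bio can never be queued after the list has been taken.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

struct sketch_bio {
	struct sketch_bio *next;
};

/* Hypothetical model of the mapped_device fields involved. */
struct sketch_md {
	pthread_mutex_t pushback_lock;	/* stand-in for md->pushback_lock */
	int noflush_suspending;		/* models DMF_NOFLUSH_SUSPENDING */
	struct sketch_bio *pushback;	/* LIFO chain, for brevity */
};

/* Completion side: flag test and queueing happen atomically. */
static int sketch_push_back(struct sketch_md *md, struct sketch_bio *bio)
{
	int r = 0;

	pthread_mutex_lock(&md->pushback_lock);
	if (md->noflush_suspending) {
		bio->next = md->pushback;
		md->pushback = bio;
	} else {
		r = -EIO;	/* suspend was interrupted: fail the bio */
	}
	pthread_mutex_unlock(&md->pushback_lock);
	return r;
}

/* Suspend side: clear the flag and take the whole list under the lock. */
static struct sketch_bio *sketch_abort_noflush(struct sketch_md *md)
{
	struct sketch_bio *list;

	pthread_mutex_lock(&md->pushback_lock);
	md->noflush_suspending = 0;
	list = md->pushback;
	md->pushback = NULL;
	pthread_mutex_unlock(&md->pushback_lock);
	return list;
}
```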
Other notes on the current patch
--------------------------------
- md->pushback is added to the struct mapped_device instead of using
md->deferred directly because md->io_lock, which protects md->deferred, is an
rw_semaphore and can't be taken in interrupt context, where dec_pending() may run,
and md->io_lock protects the DMF_BLOCK_IO flag of md->flags too.
- Don't issue lock_fs() in dm_suspend() if the DM_NOFLUSH_FLAG
ioctl option is specified, because I/Os generated by lock_fs() would be
pushed back and never return if there were no valid devices.
- If an error occurs in dm_suspend() after the DMF_NOFLUSH_SUSPENDING
flag is set, md->pushback must be flushed because I/Os may be queued to
the list already. (flush_and_out label in dm_suspend())
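The list operations in these notes, md->pushback being merged into md->deferred and a chain of bios being walked and re-issued the way __flush_deferred_io() does, can be sketched in userspace with a tail-pointer list. The types and functions are hypothetical and only loosely modelled on dm-bio-list.h; appending pushed-back bios after those already deferred is an ordering assumption of the sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct bio: only the link matters here. */
struct sketch_bio {
	int id;
	struct sketch_bio *bi_next;
};

/* Singly linked list with a tail pointer for O(1) append and merge. */
struct sketch_bio_list {
	struct sketch_bio *head;
	struct sketch_bio *tail;
};

static void sketch_list_add(struct sketch_bio_list *bl, struct sketch_bio *bio)
{
	bio->bi_next = NULL;
	if (bl->tail)
		bl->tail->bi_next = bio;
	else
		bl->head = bio;
	bl->tail = bio;
}

/* Append every bio on src to dst (e.g. pushback onto deferred); src ends empty. */
static void sketch_list_merge(struct sketch_bio_list *dst,
			      struct sketch_bio_list *src)
{
	if (!src->head)
		return;
	if (dst->tail)
		dst->tail->bi_next = src->head;
	else
		dst->head = src->head;
	dst->tail = src->tail;
	src->head = src->tail = NULL;
}

/*
 * Detach the whole chain, then walk it as __flush_deferred_io() does,
 * unlinking each bio before "re-issuing" it. Here re-issuing just
 * records the bio's id in out[]; returns the number of bios walked.
 */
static int sketch_flush(struct sketch_bio_list *bl, int *out)
{
	struct sketch_bio *c = bl->head, *n;
	int count = 0;

	bl->head = bl->tail = NULL;
	while (c) {
		n = c->bi_next;
		c->bi_next = NULL;
		out[count++] = c->id;
		c = n;
	}
	return count;
}
```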
Test results
------------
I have tested this using the multipath target with the next patch.
The following tests are for regression/compatibility:
- I/Os succeed when valid paths exist;
- I/Os fail when there are no valid paths and queue_if_no_path is not
set;
- I/Os are queued in the multipath target when there are no valid paths and
queue_if_no_path is set;
- The queued I/Os above fail when suspend is issued without the
DM_NOFLUSH_FLAG ioctl option. I/Os spanning 2 multipath targets also
fail.
The following tests are for the normal code path of new pushback feature:
- Queued I/Os in the multipath target are flushed from the target
but don't return when suspend is issued with the DM_NOFLUSH_FLAG
ioctl option;
- The I/Os above are queued in the multipath target again when
resume is issued without path recovery;
- The I/Os above succeed when resume is issued after path recovery
or table load;
- Queued I/Os in the multipath target succeed when resume is issued
with the DM_NOFLUSH_FLAG ioctl option after table load. I/Os
spanning 2 multipath targets also succeed.
The following tests are for the error paths of the new pushback feature:
- When bdget_disk() fails in dm_suspend(), the
DMF_NOFLUSH_SUSPENDING flag is cleared and I/Os already queued to the
pushback list are flushed properly.
- When suspend with the DM_NOFLUSH_FLAG ioctl option is interrupted,
o I/Os which had already been queued to the pushback list
at the time don't return, and are re-issued at resume time;
o I/Os which hadn't been returned at the time return with EIO.
Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com>
Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Cc: dm-devel@redhat.com
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
	unsigned long flags;
	DECLARE_WAITQUEUE(wait, current);
	struct bio *def;
	int r = -EINVAL;
	int do_lockfs = suspend_flags & DM_SUSPEND_LOCKFS_FLAG ? 1 : 0;
	int noflush = suspend_flags & DM_SUSPEND_NOFLUSH_FLAG ? 1 : 0;

	down(&md->suspend_lock);
	if (dm_suspended(md))
		goto out_unlock;

	map = dm_get_table(md);

	/*
	 * DMF_NOFLUSH_SUSPENDING must be set before presuspend.
	 * This flag is cleared before dm_suspend returns.
	 */
	if (noflush)
		set_bit(DMF_NOFLUSH_SUSPENDING, &md->flags);

	/* This does not get reverted if there's an error later. */
	dm_table_presuspend_targets(map);

	/* bdget() can stall if the pending I/Os are not flushed */
	if (!noflush) {
		md->suspended_bdev = bdget_disk(md->disk, 0);
		if (!md->suspended_bdev) {
			DMWARN("bdget failed in dm_suspend");
			r = -ENOMEM;
			goto flush_and_out;
		}
	}

	/*
	 * Flush I/O to the device.
	 * noflush supersedes do_lockfs, because lock_fs() needs to flush I/Os.
	 */
	if (do_lockfs && !noflush) {
		r = lock_fs(md);
		if (r)
			goto out;
	}

	/*
	 * First we set the BLOCK_IO flag so no more ios will be mapped.
	 */
	down_write(&md->io_lock);
	set_bit(DMF_BLOCK_IO, &md->flags);

	add_wait_queue(&md->wait, &wait);
	up_write(&md->io_lock);

	/* unplug */
	if (map)
		dm_table_unplug_all(map);

	/*
	 * Then we wait for the already mapped ios to
	 * complete.
	 */
	while (1) {
		set_current_state(TASK_INTERRUPTIBLE);

		if (!atomic_read(&md->pending) || signal_pending(current))
			break;

		io_schedule();
	}
	set_current_state(TASK_RUNNING);

	down_write(&md->io_lock);
	remove_wait_queue(&md->wait, &wait);
[PATCH] dm: suspend: add noflush pushback
In device-mapper I/O is sometimes queued within targets for later processing.
For example the multipath target can be configured to store I/O when no paths
are available instead of returning it -EIO.
This patch allows the device-mapper core to instruct a target to transfer the
contents of any such in-target queue back into the core. This frees up the
resources used by the target so the core can replace that target with an
alternative one and then resend the I/O to it. Without this patch the only
way to change the target in such circumstances involves returning the I/O with
an error back to the filesystem/application. In the multipath case, this
patch will let us add new paths for existing I/O to try after all the existing
paths have failed.
DMF_NOFLUSH_SUSPENDING
----------------------
If the DM_NOFLUSH_FLAG ioctl option is specified at suspend time, the
DMF_NOFLUSH_SUSPENDING flag is set in md->flags during dm_suspend(). It
is always cleared before dm_suspend() returns.
The flag must be visible while the target is flushing pending I/Os so it
is set before presuspend where the flush starts and unset after the wait
for md->pending where the flush ends.
Target drivers can check this flag by calling dm_noflush_suspending().
DM_MAPIO_REQUEUE / DM_ENDIO_REQUEUE
-----------------------------------
A target's map() function can now return DM_MAPIO_REQUEUE to request the
device mapper core queue the bio.
Similarly, a target's end_io() function can return DM_ENDIO_REQUEUE to request
the same. This has been labelled 'pushback'.
The __map_bio() and clone_endio() functions in the core treat these return
values as errors and call dec_pending() to end the I/O.
dec_pending
-----------
dec_pending() saves the pushback request in struct dm_io->error. Once all
the split clones have ended, dec_pending() will put the original bio on
the md->pushback list. Note that this supercedes any I/O errors.
It is possible for the suspend with DM_NOFLUSH_FLAG to be aborted while
in progress (e.g. by user interrupt). dec_pending() checks for this and
returns -EIO if it happened.
pushdback list and pushback_lock
--------------------------------
The bio is queued on md->pushback temporarily in dec_pending(), and after
all pending I/Os return, md->pushback is merged into md->deferred in
dm_suspend() for re-issuing at resume time.
md->pushback_lock protects md->pushback.
The lock should be held with irq disabled because dec_pending() can be
called from interrupt context.
Queueing bios to md->pushback in dec_pending() must be done atomically
with the check for DMF_NOFLUSH_SUSPENDING flag. So md->pushback_lock is
held when checking the flag. Otherwise dec_pending() may queue a bio to
md->pushback after the interrupted dm_suspend() flushes md->pushback.
Then the bio would be left in md->pushback.
Flag setting in dm_suspend() can be done without md->pushback_lock because
the flag is checked only after presuspend and the set value is already
made visible via the target's presuspend function.
The flag can be checked without md->pushback_lock (e.g. the first part of
the dec_pending() or target drivers), because the flag is checked again
with md->pushback_lock held when the bio is really queued to md->pushback
as described above. So even if the flag is cleared after the lockless
checks, the bio isn't left in md->pushback but is returned to applications
with -EIO.
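The check-then-queue rule can be modelled in userspace with a pthread mutex standing in for md->pushback_lock (spin_lock_irqsave() has no userspace equivalent). Everything named `sketch_*` is hypothetical, and list contents are reduced to counters. A single-threaded run only exercises the two critical sections; it does not by itself demonstrate the race that holding the lock prevents.

```c
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical device state; the mutex stands in for md->pushback_lock. */
struct sketch_md {
	pthread_mutex_t pushback_lock;
	bool noflush_suspending; /* stands in for DMF_NOFLUSH_SUSPENDING */
	int pushback_len;        /* bios parked on the pushback list */
	int deferred_len;        /* bios waiting to be re-issued on resume */
};

/*
 * Completion side: the flag test and the enqueue happen in one critical
 * section, so the suspend side cannot drain the list between them and
 * strand the bio. Returns 0 if parked, -EIO if the bio must be failed.
 */
static int sketch_park_or_fail(struct sketch_md *md)
{
	int r = 0;

	pthread_mutex_lock(&md->pushback_lock);
	if (md->noflush_suspending)
		md->pushback_len++;
	else
		r = -EIO; /* noflush suspend already ended or was interrupted */
	pthread_mutex_unlock(&md->pushback_lock);
	return r;
}

/* Suspend side: clear the flag and drain the list in one critical section. */
static void sketch_end_noflush(struct sketch_md *md)
{
	pthread_mutex_lock(&md->pushback_lock);
	md->noflush_suspending = false;
	md->deferred_len += md->pushback_len; /* models bio_list_merge_head() */
	md->pushback_len = 0;                 /* models bio_list_init() */
	pthread_mutex_unlock(&md->pushback_lock);
}
```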
Other notes on the current patch
--------------------------------
- md->pushback is added to the struct mapped_device instead of using
md->deferred directly because md->io_lock, which protects md->deferred, is an
rw_semaphore and can't be used in interrupt context such as dec_pending(),
and md->io_lock protects the DMF_BLOCK_IO flag of md->flags too.
- Don't issue lock_fs() in dm_suspend() if the DM_NOFLUSH_FLAG
ioctl option is specified, because I/Os generated by lock_fs() would be
pushed back and never return if there were no valid devices.
- If an error occurs in dm_suspend() after the DMF_NOFLUSH_SUSPENDING
flag is set, md->pushback must be flushed because I/Os may be queued to
the list already. (flush_and_out label in dm_suspend())
Test results
------------
I have tested using the multipath target with the next patch.
The following tests are for regression/compatibility:
- I/Os succeed when valid paths exist;
- I/Os fail when there are no valid paths and queue_if_no_path is not
set;
- I/Os are queued in the multipath target when there are no valid paths and
queue_if_no_path is set;
- The queued I/Os above fail when suspend is issued without the
DM_NOFLUSH_FLAG ioctl option. I/Os spanning 2 multipath targets also
fail.
The following tests are for the normal code path of the new pushback feature:
- Queued I/Os in the multipath target are flushed from the target
but don't return when suspend is issued with the DM_NOFLUSH_FLAG
ioctl option;
- The I/Os above are queued in the multipath target again when
resume is issued without path recovery;
- The I/Os above succeed when resume is issued after path recovery
or table load;
- Queued I/Os in the multipath target succeed when resume is issued
with the DM_NOFLUSH_FLAG ioctl option after table load. I/Os
spanning 2 multipath targets also succeed.
The following tests are for the error paths of the new pushback feature:
- When bdget_disk() fails in dm_suspend(), the
DMF_NOFLUSH_SUSPENDING flag is cleared and I/Os already queued to the
pushback list are flushed properly.
- When suspend with the DM_NOFLUSH_FLAG ioctl option is interrupted,
o I/Os which had already been queued to the pushback list
at the time don't return, and are re-issued at resume time;
o I/Os which hadn't been returned at the time return with -EIO.
Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com>
Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Cc: dm-devel@redhat.com
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-08 02:41:09 -08:00
	if (noflush) {
		spin_lock_irqsave(&md->pushback_lock, flags);
		clear_bit(DMF_NOFLUSH_SUSPENDING, &md->flags);
		bio_list_merge_head(&md->deferred, &md->pushback);
		bio_list_init(&md->pushback);
		spin_unlock_irqrestore(&md->pushback_lock, flags);
	}
2005-04-16 15:20:36 -07:00
	/* were we interrupted? */
2005-07-28 21:15:57 -07:00
	r = -EINTR;
2005-07-28 21:16:00 -07:00
	if (atomic_read(&md->pending)) {
2006-03-27 01:17:51 -08:00
		clear_bit(DMF_BLOCK_IO, &md->flags);
		def = bio_list_get(&md->deferred);
		__flush_deferred_io(md, def);
2005-07-28 21:16:00 -07:00
		up_write(&md->io_lock);
		unlock_fs(md);
2006-12-08 02:41:09 -08:00
		goto out; /* pushback list is already flushed, so skip flush */
2005-07-28 21:16:00 -07:00
	}
	up_write(&md->io_lock);
2005-04-16 15:20:36 -07:00
2005-07-28 21:15:57 -07:00
	dm_table_postsuspend_targets(map);
2005-04-16 15:20:36 -07:00
2005-07-28 21:16:00 -07:00
	set_bit(DMF_SUSPENDED, &md->flags);
2005-05-05 16:16:06 -07:00
2005-07-28 21:16:00 -07:00
	r = 0;
2005-05-05 16:16:06 -07:00
2006-12-08 02:41:09 -08:00
flush_and_out:
	if (r && noflush) {
		/*
		 * Because there may already be I/Os in the pushback list,
		 * flush them before returning.
		 */
		down_write(&md->io_lock);
		spin_lock_irqsave(&md->pushback_lock, flags);
		clear_bit(DMF_NOFLUSH_SUSPENDING, &md->flags);
		bio_list_merge_head(&md->deferred, &md->pushback);
		bio_list_init(&md->pushback);
		spin_unlock_irqrestore(&md->pushback_lock, flags);
		def = bio_list_get(&md->deferred);
		__flush_deferred_io(md, def);
		up_write(&md->io_lock);
	}
2005-07-28 21:16:00 -07:00
out:
2006-01-06 00:20:05 -08:00
	if (r && md->suspended_bdev) {
		bdput(md->suspended_bdev);
		md->suspended_bdev = NULL;
	}
2005-07-28 21:16:00 -07:00
	dm_table_put(map);
2006-11-08 17:44:43 -08:00
out_unlock:
2005-07-28 21:16:00 -07:00
	up(&md->suspend_lock);
2005-07-28 21:15:57 -07:00
	return r;
2005-04-16 15:20:36 -07:00
}
int dm_resume(struct mapped_device *md)
{
2005-07-28 21:15:57 -07:00
	int r = -EINVAL;
2005-04-16 15:20:36 -07:00
	struct bio *def;
2005-07-28 21:15:57 -07:00
	struct dm_table *map = NULL;
2005-04-16 15:20:36 -07:00
2005-07-28 21:16:00 -07:00
	down(&md->suspend_lock);
	if (!dm_suspended(md))
2005-07-28 21:15:57 -07:00
		goto out;
	map = dm_get_table(md);
2005-07-28 21:16:00 -07:00
	if (!map || !dm_table_get_size(map))
2005-07-28 21:15:57 -07:00
		goto out;
2005-04-16 15:20:36 -07:00
2006-10-03 01:15:36 -07:00
	r = dm_table_resume_targets(map);
	if (r)
		goto out;
2005-07-28 21:16:00 -07:00
	down_write(&md->io_lock);
2005-04-16 15:20:36 -07:00
	clear_bit(DMF_BLOCK_IO, &md->flags);
	def = bio_list_get(&md->deferred);
	__flush_deferred_io(md, def);
2005-07-28 21:16:00 -07:00
	up_write(&md->io_lock);
	unlock_fs(md);
2007-01-26 00:57:07 -08:00
	if (md->suspended_bdev) {
		bdput(md->suspended_bdev);
		md->suspended_bdev = NULL;
	}
2006-01-06 00:20:05 -08:00
2005-07-28 21:16:00 -07:00
	clear_bit(DMF_SUSPENDED, &md->flags);
2005-04-16 15:20:36 -07:00
	dm_table_unplug_all(map);
2006-10-03 01:15:35 -07:00
	kobject_uevent(&md->disk->kobj, KOBJ_CHANGE);
2005-07-28 21:15:57 -07:00
	r = 0;
2005-07-28 21:16:00 -07:00
2005-07-28 21:15:57 -07:00
out:
	dm_table_put(map);
2005-07-28 21:16:00 -07:00
	up(&md->suspend_lock);
2005-07-28 21:15:57 -07:00
	return r;
2005-04-16 15:20:36 -07:00
}
/*-----------------------------------------------------------------
 * Event notification.
 *---------------------------------------------------------------*/
uint32_t dm_get_event_nr(struct mapped_device *md)
{
	return atomic_read(&md->event_nr);
}
int dm_wait_event(struct mapped_device *md, int event_nr)
{
	return wait_event_interruptible(md->eventq,
			(event_nr != atomic_read(&md->event_nr)));
}
/*
 * The gendisk is only valid as long as you have a reference
 * count on 'md'.
 */
struct gendisk *dm_disk(struct mapped_device *md)
{
	return md->disk;
}
int dm_suspended(struct mapped_device *md)
{
	return test_bit(DMF_SUSPENDED, &md->flags);
}
2006-12-08 02:41:09 -08:00
int dm_noflush_suspending(struct dm_target *ti)
{
	struct mapped_device *md = dm_table_get_md(ti->table);
	int r = __noflush_suspending(md);
	dm_put(md);
	return r;
}
EXPORT_SYMBOL_GPL(dm_noflush_suspending);
2005-04-16 15:20:36 -07:00
static struct block_device_operations dm_blk_dops = {
	.open = dm_blk_open,
	.release = dm_blk_close,
2006-10-03 01:15:15 -07:00
	.ioctl = dm_blk_ioctl,
2006-03-27 01:17:54 -08:00
	.getgeo = dm_blk_getgeo,
2005-04-16 15:20:36 -07:00
	.owner = THIS_MODULE
};
EXPORT_SYMBOL(dm_get_mapinfo);
/*
 * module hooks
 */
module_init(dm_init);
module_exit(dm_exit);
module_param(major, uint, 0);
MODULE_PARM_DESC(major, "The major number of the device mapper");
MODULE_DESCRIPTION(DM_NAME " driver");
MODULE_AUTHOR("Joe Thornber <dm-devel@redhat.com>");
MODULE_LICENSE("GPL");