2006-01-11 23:17:46 +03:00
# include <linux/capability.h>
2005-04-17 02:20:36 +04:00
# include <linux/blkdev.h>
2011-05-27 00:00:52 +04:00
# include <linux/export.h>
include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h
percpu.h is included by sched.h and module.h and thus ends up being
included when building most .c files. percpu.h includes slab.h which
in turn includes gfp.h making everything defined by the two files
universally available and complicating inclusion dependencies.
percpu.h -> slab.h dependency is about to be removed. Prepare for
this change by updating users of gfp and slab facilities to include those
headers directly instead of assuming availability. As this conversion
needs to touch a large number of source files, the following script is
used as the basis of conversion.
http://userweb.kernel.org/~tj/misc/slabh-sweep.py
The script does the following.
* Scan files for gfp and slab usages and update includes such that
only the necessary includes are there. ie. if only gfp is used,
gfp.h, if slab is used, slab.h.
* When the script inserts a new include, it looks at the include
blocks and tries to put the new include such that its order conforms
to its surroundings. It's put in the include block which contains
core kernel includes, in the same order that the rest are ordered -
alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
doesn't seem to be any matching order.
* If the script can't find a place to put a new include (mostly
because the file doesn't have a fitting include block), it prints out
an error message indicating which .h file needs to be added to the
file.
The conversion was done in the following steps.
1. The initial automatic conversion of all .c files updated slightly
over 4000 files, deleting around 700 includes and adding ~480 gfp.h
and ~3000 slab.h inclusions. The script emitted errors for ~400
files.
2. Each error was manually checked. Some didn't need the inclusion,
some needed manual addition while adding it to implementation .h or
embedding .c file was more appropriate for others. This step added
inclusions to around 150 files.
3. The script was run again and the output was compared to the edits
from #2 to make sure no file was left behind.
4. Several build tests were done and a couple of problems were fixed.
e.g. lib/decompress_*.c used malloc/free() wrappers around slab
APIs requiring slab.h to be added manually.
5. The script was run on all .h files but without automatically
editing them as sprinkling gfp.h and slab.h inclusions around .h
files could easily lead to inclusion dependency hell. Most gfp.h
inclusion directives were ignored as stuff from gfp.h was usually
widely available and often used in preprocessor macros. Each
slab.h inclusion directive was examined and added manually as
necessary.
6. percpu.h was updated not to include slab.h.
7. Build tests were done on the following configurations and failures
were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
distributed build env didn't work with gcov compiles) and a few
more options had to be turned off depending on archs to make things
build (like ipr on powerpc/64 which failed due to missing writeq).
* x86 and x86_64 UP and SMP allmodconfig and a custom test config.
* powerpc and powerpc64 SMP allmodconfig
* sparc and sparc64 SMP allmodconfig
* ia64 SMP allmodconfig
* s390 SMP allmodconfig
* alpha SMP allmodconfig
* um on x86_64 SMP allmodconfig
8. percpu.h modifications were reverted so that it could be applied as
a separate patch and serve as bisection point.
Given the fact that I had only a couple of failures from tests on step
6, I'm fairly confident about the coverage of this conversion patch.
If there is a breakage, it's likely to be something in one of the arch
headers which should be easily discoverable on most builds of
the specific arch.
Signed-off-by: Tejun Heo <tj@kernel.org>
Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
2010-03-24 11:04:11 +03:00
# include <linux/gfp.h>
2005-04-17 02:20:36 +04:00
# include <linux/blkpg.h>
2006-01-08 12:02:50 +03:00
# include <linux/hdreg.h>
2005-04-17 02:20:36 +04:00
# include <linux/backing-dev.h>
2011-09-16 10:31:11 +04:00
# include <linux/fs.h>
2006-03-23 22:00:26 +03:00
# include <linux/blktrace_api.h>
2005-04-17 02:20:36 +04:00
# include <asm/uaccess.h>
static int blkpg_ioctl(struct block_device *bdev, struct blkpg_ioctl_arg __user *arg)
{
	struct block_device *bdevp;
	struct gendisk *disk;
2012-08-01 14:24:18 +04:00
	struct hd_struct *part, *lpart;
2005-04-17 02:20:36 +04:00
	struct blkpg_ioctl_arg a;
	struct blkpg_partition p;
2008-09-03 11:03:02 +04:00
	struct disk_part_iter piter;
2005-04-17 02:20:36 +04:00
	long long start, length;
2008-09-03 11:01:09 +04:00
	int partno;
2005-04-17 02:20:36 +04:00
	if (!capable(CAP_SYS_ADMIN))
		return -EACCES;
	if (copy_from_user(&a, arg, sizeof(struct blkpg_ioctl_arg)))
		return -EFAULT;
	if (copy_from_user(&p, a.data, sizeof(struct blkpg_partition)))
		return -EFAULT;
	disk = bdev->bd_disk;
	if (bdev != bdev->bd_contains)
		return -EINVAL;
2008-09-03 11:01:09 +04:00
	partno = p.pno;
2008-08-25 14:56:15 +04:00
	if (partno <= 0)
2005-04-17 02:20:36 +04:00
		return -EINVAL;
	switch (a.op) {
	case BLKPG_ADD_PARTITION:
		start = p.start >> 9;
		length = p.length >> 9;
2012-08-01 14:24:18 +04:00
		/* check for fit in a hd_struct */
		if (sizeof(sector_t) == sizeof(long) &&
2005-04-17 02:20:36 +04:00
		    sizeof(long long) > sizeof(long)) {
			long pstart = start, plength = length;
			if (pstart != start || plength != length
2012-09-17 14:47:13 +04:00
			    || pstart < 0 || plength < 0 || partno > 65535)
2005-04-17 02:20:36 +04:00
				return -EINVAL;
		}
2008-08-25 14:30:16 +04:00
2006-03-23 14:00:28 +03:00
		mutex_lock(&bdev->bd_mutex);
2008-08-25 14:30:16 +04:00
2005-04-17 02:20:36 +04:00
		/* overlap? */
2008-09-03 11:03:02 +04:00
		disk_part_iter_init(&piter, disk,
				    DISK_PITER_INCL_EMPTY);
		while ((part = disk_part_iter_next(&piter))) {
			if (!(start + length <= part->start_sect ||
			      start >= part->start_sect + part->nr_sects)) {
				disk_part_iter_exit(&piter);
2006-03-23 14:00:28 +03:00
				mutex_unlock(&bdev->bd_mutex);
2005-04-17 02:20:36 +04:00
				return -EBUSY;
			}
		}
2008-09-03 11:03:02 +04:00
		disk_part_iter_exit(&piter);
2005-04-17 02:20:36 +04:00
		/* all seems OK */
2008-11-10 09:29:58 +03:00
		part = add_partition(disk, partno, start, length,
2010-09-01 00:47:05 +04:00
				     ADDPART_FLAG_NONE, NULL);
2006-03-23 14:00:28 +03:00
		mutex_unlock(&bdev->bd_mutex);
2013-11-06 11:56:39 +04:00
		return PTR_ERR_OR_ZERO(part);
2005-04-17 02:20:36 +04:00
	case BLKPG_DEL_PARTITION:
2008-09-03 11:03:02 +04:00
		part = disk_get_part(disk, partno);
		if (!part)
2005-04-17 02:20:36 +04:00
			return -ENXIO;
2008-09-03 11:03:02 +04:00
		bdevp = bdget(part_devt(part));
		disk_put_part(part);
2005-04-17 02:20:36 +04:00
		if (!bdevp)
			return -ENOMEM;
2008-09-03 11:03:02 +04:00
2006-12-08 13:36:13 +03:00
		mutex_lock(&bdevp->bd_mutex);
2005-04-17 02:20:36 +04:00
		if (bdevp->bd_openers) {
2006-03-23 14:00:28 +03:00
			mutex_unlock(&bdevp->bd_mutex);
2005-04-17 02:20:36 +04:00
			bdput(bdevp);
			return -EBUSY;
		}
		/* all seems OK */
		fsync_bdev(bdevp);
2007-05-07 01:49:54 +04:00
		invalidate_bdev(bdevp);
2005-04-17 02:20:36 +04:00
2007-02-21 00:58:18 +03:00
		mutex_lock_nested(&bdev->bd_mutex, 1);
2008-09-03 11:01:09 +04:00
		delete_partition(disk, partno);
2006-03-23 14:00:28 +03:00
		mutex_unlock(&bdev->bd_mutex);
		mutex_unlock(&bdevp->bd_mutex);
2005-04-17 02:20:36 +04:00
		bdput(bdevp);
2012-08-01 14:24:18 +04:00
		return 0;
	case BLKPG_RESIZE_PARTITION:
		start = p.start >> 9;
		/* new length of partition in bytes */
		length = p.length >> 9;
		/* check for fit in a hd_struct */
		if (sizeof(sector_t) == sizeof(long) &&
		    sizeof(long long) > sizeof(long)) {
			long pstart = start, plength = length;
			if (pstart != start || plength != length
			    || pstart < 0 || plength < 0)
				return -EINVAL;
		}
		part = disk_get_part(disk, partno);
		if (!part)
			return -ENXIO;
		bdevp = bdget(part_devt(part));
		if (!bdevp) {
			disk_put_part(part);
			return -ENOMEM;
		}
		mutex_lock(&bdevp->bd_mutex);
		mutex_lock_nested(&bdev->bd_mutex, 1);
		if (start != part->start_sect) {
			mutex_unlock(&bdevp->bd_mutex);
			mutex_unlock(&bdev->bd_mutex);
			bdput(bdevp);
			disk_put_part(part);
			return -EINVAL;
		}
		/* overlap? */
		disk_part_iter_init(&piter, disk,
				    DISK_PITER_INCL_EMPTY);
		while ((lpart = disk_part_iter_next(&piter))) {
			if (lpart->partno != partno &&
			    !(start + length <= lpart->start_sect ||
			      start >= lpart->start_sect + lpart->nr_sects)
			   ) {
				disk_part_iter_exit(&piter);
				mutex_unlock(&bdevp->bd_mutex);
				mutex_unlock(&bdev->bd_mutex);
				bdput(bdevp);
				disk_put_part(part);
				return -EBUSY;
			}
		}
		disk_part_iter_exit(&piter);
		part_nr_sects_write(part, (sector_t)length);
		i_size_write(bdevp->bd_inode, p.length);
		mutex_unlock(&bdevp->bd_mutex);
		mutex_unlock(&bdev->bd_mutex);
		bdput(bdevp);
		disk_put_part(part);
2005-04-17 02:20:36 +04:00
		return 0;
	default:
		return -EINVAL;
	}
}
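The two checks that blkpg_ioctl() applies before adding or resizing a partition can be sketched in isolation. The block below is a user-space illustration only, not kernel code: the helper names (bytes_to_sectors, fits_in_long32, ranges_overlap) are hypothetical, and a 32-bit long is modelled with int32_t so the truncation test is visible on 64-bit hosts as well.

```c
#include <stdbool.h>
#include <stdint.h>

/* Convert a byte quantity to 512-byte sectors, as "p.start >> 9" does. */
static uint64_t bytes_to_sectors(uint64_t bytes)
{
	return bytes >> 9;
}

/*
 * Hypothetical model of the "check for fit in a hd_struct" test for a
 * platform where long is 32 bits: narrow the value, then verify that
 * nothing was lost and that the result is not negative. The narrowing
 * conversion is implementation-defined for out-of-range values; common
 * compilers wrap modulo 2^32, which is what this sketch assumes.
 */
static bool fits_in_long32(long long v)
{
	int32_t narrowed = (int32_t)v;

	return narrowed == v && narrowed >= 0;
}

/*
 * Two half-open sector ranges [start, start + nr) overlap unless one of
 * them ends at or before the point where the other begins. This is the
 * same negated condition used in the "overlap?" loops.
 */
static bool ranges_overlap(uint64_t a_start, uint64_t a_nr,
			   uint64_t b_start, uint64_t b_nr)
{
	return !(a_start + a_nr <= b_start || a_start >= b_start + b_nr);
}
```

Adjacent ranges (one ending exactly where the next begins) are not treated as overlapping, which is why a new partition may start on the sector immediately after an existing one.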
static int blkdev_reread_part(struct block_device *bdev)
{
	struct gendisk *disk = bdev->bd_disk;
	int res;
2011-08-23 22:01:04 +04:00
	if (!disk_part_scan_enabled(disk) || bdev != bdev->bd_contains)
2005-04-17 02:20:36 +04:00
		return -EINVAL;
	if (!capable(CAP_SYS_ADMIN))
		return -EACCES;
2006-03-23 14:00:28 +03:00
	if (!mutex_trylock(&bdev->bd_mutex))
2005-04-17 02:20:36 +04:00
		return -EBUSY;
	res = rescan_partitions(disk, bdev);
2006-03-23 14:00:28 +03:00
	mutex_unlock(&bdev->bd_mutex);
2005-04-17 02:20:36 +04:00
	return res;
}
2008-08-11 18:58:42 +04:00
static int blk_ioctl_discard(struct block_device *bdev, uint64_t start,
2010-08-12 01:17:49 +04:00
			     uint64_t len, int secure)
2008-08-11 18:58:42 +04:00
{
2010-09-16 22:51:46 +04:00
	unsigned long flags = 0;
2010-08-12 01:17:49 +04:00
2008-08-11 18:58:42 +04:00
	if (start & 511)
		return -EINVAL;
	if (len & 511)
		return -EINVAL;
	start >>= 9;
	len >>= 9;
2010-11-08 16:39:12 +03:00
	if (start + len > (i_size_read(bdev->bd_inode) >> 9))
2008-08-11 18:58:42 +04:00
		return -EINVAL;
2010-08-12 01:17:49 +04:00
	if (secure)
2010-09-16 22:51:46 +04:00
		flags |= BLKDEV_DISCARD_SECURE;
2010-08-12 01:17:49 +04:00
	return blkdev_issue_discard(bdev, start, len, GFP_KERNEL, flags);
2008-08-11 18:58:42 +04:00
}
2012-09-18 20:19:29 +04:00
static int blk_ioctl_zeroout(struct block_device *bdev, uint64_t start,
			     uint64_t len)
{
	if (start & 511)
		return -EINVAL;
	if (len & 511)
		return -EINVAL;
	start >>= 9;
	len >>= 9;
	if (start + len > (i_size_read(bdev->bd_inode) >> 9))
		return -EINVAL;
	return blkdev_issue_zeroout(bdev, start, len, GFP_KERNEL);
}
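blk_ioctl_discard() and blk_ioctl_zeroout() share the same argument validation: both byte values must be multiples of 512, and the resulting sector range must lie inside the device. A minimal user-space sketch of that validation follows; check_byte_range() and its dev_bytes parameter are hypothetical stand-ins (dev_bytes plays the role of i_size_read(bdev->bd_inode)), not part of the kernel API.

```c
#include <errno.h>
#include <stdint.h>

/*
 * Validate a (start, len) pair given in bytes against a device of
 * dev_bytes bytes. On success, store the range in 512-byte sectors
 * and return 0; otherwise return -EINVAL.
 */
static int check_byte_range(uint64_t start, uint64_t len, uint64_t dev_bytes,
			    uint64_t *first_sector, uint64_t *nr_sectors)
{
	/* Both values must be aligned to the 512-byte sector size. */
	if (start & 511)
		return -EINVAL;
	if (len & 511)
		return -EINVAL;

	/* After the shift each value is below 2^55, so the sum cannot wrap. */
	start >>= 9;
	len >>= 9;

	if (start + len > (dev_bytes >> 9))
		return -EINVAL;

	*first_sector = start;
	*nr_sectors = len;
	return 0;
}
```

Because the alignment test happens before the shift, a request such as start = 100 is rejected outright rather than silently rounded down to sector 0.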
2005-04-17 02:20:36 +04:00
static int put_ushort(unsigned long arg, unsigned short val)
{
	return put_user(val, (unsigned short __user *)arg);
}
static int put_int(unsigned long arg, int val)
{
	return put_user(val, (int __user *)arg);
}
2009-10-03 22:52:01 +04:00
static int put_uint(unsigned long arg, unsigned int val)
{
	return put_user(val, (unsigned int __user *)arg);
}
2005-04-17 02:20:36 +04:00
static int put_long(unsigned long arg, long val)
{
	return put_user(val, (long __user *)arg);
}
static int put_ulong(unsigned long arg, unsigned long val)
{
	return put_user(val, (unsigned long __user *)arg);
}
static int put_u64(unsigned long arg, u64 val)
{
	return put_user(val, (u64 __user *)arg);
}
2007-08-30 04:34:12 +04:00
int __blkdev_driver_ioctl(struct block_device *bdev, fmode_t mode,
			  unsigned cmd, unsigned long arg)
{
	struct gendisk *disk = bdev->bd_disk;
[PATCH] beginning of methods conversion
To keep the size of changesets sane we split the switch by drivers;
to keep the damn thing bisectable we do the following:
1) rename the affected methods, add ones with correct
prototypes, make (few) callers handle both. That's this changeset.
2) for each driver convert to new methods. *ALL* drivers
are converted in this series.
3) kill the old (renamed) methods.
Note that it _is_ a flagday; all in-tree drivers are converted and by the
end of this series no trace of old methods remains. The only reason why
we do that this way is to keep the damn thing bisectable and allow per-driver
debugging if anything goes wrong.
New methods:
open(bdev, mode)
release(disk, mode)
ioctl(bdev, mode, cmd, arg) /* Called without BKL */
compat_ioctl(bdev, mode, cmd, arg)
locked_ioctl(bdev, mode, cmd, arg) /* Called with BKL, legacy */
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2008-03-02 17:09:22 +03:00
	if (disk->fops->ioctl)
		return disk->fops->ioctl(bdev, mode, cmd, arg);
2007-08-30 04:34:12 +04:00
	return -ENOTTY;
}
/*
 * For the record: _GPL here is only because somebody decided to slap it
 * on the previous export. Sheer idiocy, since it wasn't copyrightable
 * at all and could be open-coded without any exports by anybody who cares.
 */
EXPORT_SYMBOL_GPL(__blkdev_driver_ioctl);
2012-01-06 03:40:12 +04:00
/*
 * Is it an unrecognized ioctl? The correct returns are either
 * ENOTTY (final) or ENOIOCTLCMD ("I don't know this one, try a
 * fallback"). ENOIOCTLCMD gets turned into ENOTTY by the ioctl
 * code before returning.
 *
 * Confused drivers sometimes return EINVAL, which is wrong. It
 * means "I understood the ioctl command, but the parameters to
 * it were wrong".
 *
 * We should aim to just fix the broken drivers, the EINVAL case
 * should go away.
 */
static inline int is_unrecognized_ioctl(int ret)
{
	return	ret == -EINVAL ||
		ret == -ENOTTY ||
		ret == -ENOIOCTLCMD;
}
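The driver-first, generic-fallback pattern that is_unrecognized_ioctl() enables (used for BLKFLSBUF and BLKROSET in blkdev_ioctl()) can be sketched in user space as follows. Everything here is illustrative: the ioctl_fn type, dispatch_with_fallback() and the sample_* handlers are hypothetical names, and ENOIOCTLCMD is defined locally because it is a kernel-internal code that user-space errno.h does not provide.

```c
#include <errno.h>
#include <stddef.h>

/* Kernel-internal error code; defined here only for the sketch. */
#ifndef ENOIOCTLCMD
#define ENOIOCTLCMD 515
#endif

/* Hypothetical handler type standing in for a driver's ioctl method. */
typedef int (*ioctl_fn)(unsigned int cmd, unsigned long arg);

static int is_unrecognized(int ret)
{
	return ret == -EINVAL || ret == -ENOTTY || ret == -ENOIOCTLCMD;
}

/* Call the driver method if there is one, else report -ENOTTY. */
static int driver_ioctl(ioctl_fn fn, unsigned int cmd, unsigned long arg)
{
	if (fn)
		return fn(cmd, arg);
	return -ENOTTY;
}

/*
 * Give the driver the first chance. Only if it does not recognize the
 * command does the generic implementation run.
 */
static int dispatch_with_fallback(ioctl_fn driver, ioctl_fn generic,
				  unsigned int cmd, unsigned long arg)
{
	int ret = driver_ioctl(driver, cmd, arg);

	if (!is_unrecognized(ret))
		return ret;
	return generic(cmd, arg);
}

/* Sample handlers used to exercise the dispatcher. */
static int sample_driver_handles(unsigned int cmd, unsigned long arg)
{
	(void)cmd;
	(void)arg;
	return 7;	/* any result other than the three "unrecognized" codes */
}

static int sample_driver_rejects(unsigned int cmd, unsigned long arg)
{
	(void)cmd;
	(void)arg;
	return -ENOIOCTLCMD;
}

static int sample_generic(unsigned int cmd, unsigned long arg)
{
	(void)cmd;
	(void)arg;
	return 0;
}
```

One consequence of treating -EINVAL as "unrecognized" is that a driver which genuinely rejects bad parameters with -EINVAL still falls through to the generic path, which is the wart the comment above says should eventually go away.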
2007-10-09 15:23:51 +04:00
/*
2010-07-08 12:18:46 +04:00
 * always keep this in sync with compat_blkdev_ioctl()
2007-10-09 15:23:51 +04:00
 */
2008-09-19 11:17:36 +04:00
int blkdev_ioctl(struct block_device *bdev, fmode_t mode, unsigned cmd,
2005-06-23 11:10:15 +04:00
		 unsigned long arg)
{
	struct gendisk *disk = bdev->bd_disk;
2008-09-18 23:53:24 +04:00
	struct backing_dev_info *bdi;
	loff_t size;
2005-06-23 11:10:15 +04:00
	int ret, n;
2014-05-25 16:43:33 +04:00
	unsigned int max_sectors;
2005-06-23 11:10:15 +04:00
	switch (cmd) {
2005-04-17 02:20:36 +04:00
	case BLKFLSBUF:
		if (!capable(CAP_SYS_ADMIN))
			return -EACCES;
2005-06-23 11:10:15 +04:00
2008-09-18 11:38:12 +04:00
		ret = __blkdev_driver_ioctl(bdev, mode, cmd, arg);
2012-01-06 03:40:12 +04:00
		if (!is_unrecognized_ioctl(ret))
2005-06-23 11:10:15 +04:00
			return ret;
2005-04-17 02:20:36 +04:00
		fsync_bdev(bdev);
2007-05-07 01:49:54 +04:00
		invalidate_bdev(bdev);
2005-04-17 02:20:36 +04:00
		return 0;
2005-06-23 11:10:15 +04:00
2005-04-17 02:20:36 +04:00
	case BLKROSET:
2008-09-18 11:38:12 +04:00
		ret = __blkdev_driver_ioctl(bdev, mode, cmd, arg);
2012-01-06 03:40:12 +04:00
		if (!is_unrecognized_ioctl(ret))
2005-06-23 11:10:15 +04:00
			return ret;
2005-04-17 02:20:36 +04:00
		if (!capable(CAP_SYS_ADMIN))
			return -EACCES;
		if (get_user(n, (int __user *)(arg)))
			return -EFAULT;
		set_device_ro(bdev, n);
		return 0;
2008-08-11 18:58:42 +04:00
2010-08-12 01:17:49 +04:00
	case BLKDISCARD:
	case BLKSECDISCARD: {
2008-08-11 18:58:42 +04:00
		uint64_t range[2];
2008-09-18 11:38:12 +04:00
		if (!(mode & FMODE_WRITE))
2008-08-11 18:58:42 +04:00
			return -EBADF;
		if (copy_from_user(range, (void __user *)arg, sizeof(range)))
			return -EFAULT;
2010-08-12 01:17:49 +04:00
		return blk_ioctl_discard(bdev, range[0], range[1],
					 cmd == BLKSECDISCARD);
2008-08-11 18:58:42 +04:00
	}
2012-09-18 20:19:29 +04:00
	case BLKZEROOUT: {
		uint64_t range[2];
		if (!(mode & FMODE_WRITE))
			return -EBADF;
		if (copy_from_user(range, (void __user *)arg, sizeof(range)))
			return -EFAULT;
		return blk_ioctl_zeroout(bdev, range[0], range[1]);
	}
2008-08-11 18:58:42 +04:00
2006-01-08 12:02:50 +03:00
	case HDIO_GETGEO: {
		struct hd_geometry geo;
		if (!arg)
			return -EINVAL;
		if (!disk->fops->getgeo)
			return -ENOTTY;
		/*
		 * We need to set the startsect first, the driver may
		 * want to override it.
		 */
2010-11-08 16:42:40 +03:00
		memset(&geo, 0, sizeof(geo));
2006-01-08 12:02:50 +03:00
		geo.start = get_start_sect(bdev);
		ret = disk->fops->getgeo(bdev, &geo);
		if (ret)
			return ret;
		if (copy_to_user((struct hd_geometry __user *)arg, &geo,
				 sizeof(geo)))
			return -EFAULT;
		return 0;
	}
2008-09-18 23:53:24 +04:00
	case BLKRAGET:
	case BLKFRAGET:
		if (!arg)
			return -EINVAL;
		bdi = blk_get_backing_dev_info(bdev);
		return put_long(arg, (bdi->ra_pages * PAGE_CACHE_SIZE) / 512);
	case BLKROGET:
		return put_int(arg, bdev_read_only(bdev) != 0);
2009-10-03 22:52:01 +04:00
	case BLKBSZGET: /* get block device soft block size (cf. BLKSSZGET) */
2008-09-18 23:53:24 +04:00
		return put_int(arg, block_size(bdev));
2009-10-03 22:52:01 +04:00
	case BLKSSZGET: /* get block device logical block size */
2009-05-23 01:17:49 +04:00
		return put_int(arg, bdev_logical_block_size(bdev));
2009-10-03 22:52:01 +04:00
	case BLKPBSZGET: /* get block device physical block size */
		return put_uint(arg, bdev_physical_block_size(bdev));
	case BLKIOMIN:
		return put_uint(arg, bdev_io_min(bdev));
	case BLKIOOPT:
		return put_uint(arg, bdev_io_opt(bdev));
	case BLKALIGNOFF:
		return put_int(arg, bdev_alignment_offset(bdev));
2009-12-03 11:24:48 +03:00
	case BLKDISCARDZEROES:
		return put_uint(arg, bdev_discard_zeroes_data(bdev));
2008-09-18 23:53:24 +04:00
	case BLKSECTGET:
2014-05-25 16:43:33 +04:00
		max_sectors = min_t(unsigned int, USHRT_MAX,
				    queue_max_sectors(bdev_get_queue(bdev)));
		return put_ushort(arg, max_sectors);
2012-01-11 19:29:31 +04:00
	case BLKROTATIONAL:
		return put_ushort(arg, !blk_queue_nonrot(bdev_get_queue(bdev)));
2008-09-18 23:53:24 +04:00
	case BLKRASET:
	case BLKFRASET:
		if (!capable(CAP_SYS_ADMIN))
			return -EACCES;
		bdi = blk_get_backing_dev_info(bdev);
		bdi->ra_pages = (arg * 512) / PAGE_CACHE_SIZE;
		return 0;
	case BLKBSZSET:
		/* set the logical block size */
		if (!capable(CAP_SYS_ADMIN))
			return -EACCES;
		if (!arg)
			return -EINVAL;
		if (get_user(n, (int __user *) arg))
			return -EFAULT;
2011-02-24 17:45:41 +03:00
		if (!(mode & FMODE_EXCL)) {
			bdgrab(bdev);
			if (blkdev_get(bdev, mode | FMODE_EXCL, &bdev) < 0)
				return -EBUSY;
		}
2008-09-18 23:53:24 +04:00
		ret = set_blocksize(bdev, n);
2008-09-19 11:08:13 +04:00
		if (!(mode & FMODE_EXCL))
block: make blkdev_get/put() handle exclusive access
Over time, block layer has accumulated a set of APIs dealing with bdev
open, close, claim and release.
* blkdev_get/put() are the primary open and close functions.
* bd_claim/release() deal with exclusive open.
* open/close_bdev_exclusive() are a combination of open and claim and
the other way around, respectively.
* bd_link/unlink_disk_holder() to create and remove holder/slave
symlinks.
* open_by_devnum() wraps bdget() + blkdev_get().
The interface is a bit confusing and the decoupling of open and claim
makes it impossible to properly guarantee exclusive access as
in-kernel open + claim sequence can disturb the existing exclusive
open even before the block layer knows the current open is for another
exclusive access. Reorganize the interface such that,
* blkdev_get() is extended to include exclusive access management.
@holder argument is added and, if @FMODE_EXCL is specified, it will
gain exclusive access atomically w.r.t. other exclusive accesses.
* blkdev_put() is similarly extended. It now takes @mode argument and
if @FMODE_EXCL is set, it releases an exclusive access. Also, when
the last exclusive claim is released, the holder/slave symlinks are
removed automatically.
* bd_claim/release() and close_bdev_exclusive() are no longer
necessary and either made static or removed.
* bd_link_disk_holder() remains the same but bd_unlink_disk_holder()
is no longer necessary and removed.
* open_bdev_exclusive() becomes a simple wrapper around lookup_bdev()
and blkdev_get(). It also has an unexpected extra bdev_read_only()
test which probably should be moved into blkdev_get().
* open_by_devnum() is modified to take @holder argument and pass it to
blkdev_get().
Most of bdev open/close operations are unified into blkdev_get/put()
and most exclusive accesses are tested atomically at the open time (as
it should). This cleans up code and removes some, both valid and
invalid, but unnecessary all the same, corner cases.
open_bdev_exclusive() and open_by_devnum() can use further cleanup -
rename to blkdev_get_by_path() and blkdev_get_by_devt() and drop
special features. Well, let's leave them for another day.
Most conversions are straight-forward. drbd conversion is a bit more
involved as there was some reordering, but the logic should stay the
same.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Neil Brown <neilb@suse.de>
Acked-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Acked-by: Mike Snitzer <snitzer@redhat.com>
Acked-by: Philipp Reisner <philipp.reisner@linbit.com>
Cc: Peter Osterlund <petero2@telia.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andreas Dilger <adilger.kernel@dilger.ca>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <joel.becker@oracle.com>
Cc: Alex Elder <aelder@sgi.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: dm-devel@redhat.com
Cc: drbd-dev@lists.linbit.com
Cc: Leo Chen <leochen@broadcom.com>
Cc: Scott Branden <sbranden@broadcom.com>
Cc: Chris Mason <chris.mason@oracle.com>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Cc: Dave Kleikamp <shaggy@linux.vnet.ibm.com>
Cc: Joern Engel <joern@logfs.org>
Cc: reiserfs-devel@vger.kernel.org
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
2010-11-13 13:55:17 +03:00
			blkdev_put(bdev, mode | FMODE_EXCL);
2005-06-23 11:10:15 +04:00
		return ret;
2008-09-18 23:53:24 +04:00
	case BLKPG:
		ret = blkpg_ioctl(bdev, (struct blkpg_ioctl_arg __user *) arg);
		break;
	case BLKRRPART:
		ret = blkdev_reread_part(bdev);
		break;
	case BLKGETSIZE:
2010-11-08 16:39:12 +03:00
		size = i_size_read(bdev->bd_inode);
2008-09-18 23:53:24 +04:00
		if ((size >> 9) > ~0UL)
			return -EFBIG;
		return put_ulong(arg, size >> 9);
	case BLKGETSIZE64:
2010-11-08 16:39:12 +03:00
		return put_u64(arg, i_size_read(bdev->bd_inode));
2008-09-18 23:53:24 +04:00
	case BLKTRACESTART:
	case BLKTRACESTOP:
	case BLKTRACESETUP:
	case BLKTRACETEARDOWN:
		ret = blk_trace_ioctl(bdev, cmd, (char __user *) arg);
		break;
	default:
		ret = __blkdev_driver_ioctl(bdev, mode, cmd, arg);
	}
	return ret;
2005-04-17 02:20:36 +04:00
}
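Several of the cases in blkdev_ioctl() are small unit conversions or range clamps. The sketch below restates three of them as user-space helpers so the arithmetic can be checked in isolation. The helper names are hypothetical, SKETCH_PAGE_SIZE is an assumed 4096-byte page standing in for PAGE_CACHE_SIZE, and uint32_t models a 32-bit unsigned long for the BLKGETSIZE overflow test.

```c
#include <errno.h>
#include <limits.h>
#include <stdint.h>

/* Assumed page size for the sketch; the kernel uses PAGE_CACHE_SIZE. */
#define SKETCH_PAGE_SIZE 4096UL

/* BLKRAGET/BLKFRAGET: read-ahead is stored in pages, reported in sectors. */
static unsigned long ra_pages_to_sectors(unsigned long ra_pages)
{
	return (ra_pages * SKETCH_PAGE_SIZE) / 512;
}

/* BLKRASET/BLKFRASET: the caller passes sectors, the bdi stores pages. */
static unsigned long ra_sectors_to_pages(unsigned long sectors)
{
	return (sectors * 512) / SKETCH_PAGE_SIZE;
}

/* BLKSECTGET: the result is an unsigned short, so clamp to USHRT_MAX. */
static unsigned short clamp_max_sectors(unsigned int queue_max)
{
	return queue_max > USHRT_MAX ? USHRT_MAX : (unsigned short)queue_max;
}

/*
 * BLKGETSIZE: the sector count must fit in an unsigned long. Modelled
 * here for a 32-bit unsigned long; larger devices get -EFBIG and the
 * caller is expected to use the 64-bit byte count instead.
 */
static int getsize_sectors32(uint64_t size_bytes, uint32_t *out)
{
	if ((size_bytes >> 9) > UINT32_MAX)
		return -EFBIG;
	*out = (uint32_t)(size_bytes >> 9);
	return 0;
}
```

Note that the sector-to-page conversion truncates: a read-ahead request smaller than one page's worth of sectors becomes zero pages under these assumptions.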
[PATCH] Fix root hole in raw device
[Patch] Fix raw device ioctl pass-through
Raw character devices are supposed to pass ioctls through to the block
devices they are bound to. Unfortunately, they are using the wrong
function for this: ioctl_by_bdev(), instead of blkdev_ioctl().
ioctl_by_bdev() performs a set_fs(KERNEL_DS) before calling the ioctl,
redirecting the user-space buffer access to the kernel address space.
This is, needless to say, a bad thing.
This was noticed first on s390, where raw IO was non-functioning. The
s390 driver config does not actually allow raw IO to be enabled, which
was the first part of the problem. Secondly, the s390 kernel address
space is distinct from user, causing legal raw ioctls to fail. I've
reproduced this on a kernel built with 4G:4G split on x86, which fails
in the same way (-EFAULT if the address does not exist kernel-side;
returns success without actually populating the user buffer if it does.)
The patch below fixes both the config and address-space problems. It's
based closely on a patch by Jan Glauber <jang@de.ibm.com>, which has
been tested on s390 at IBM. I've tested it on x86 4G:4G (split address
space) and x86_64 (common address space).
Kernel-address-space access has been assigned CAN-2005-1264.
Signed-off-by: Stephen Tweedie <sct@redhat.com>
Signed-off-by: Dave Jones <davej@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2005-05-14 07:31:19 +04:00
EXPORT_SYMBOL_GPL(blkdev_ioctl);