linux/block
Yasuaki Ishimatsu 7681bfeecc block: fix accounting bug on cross partition merges
/proc/diskstats would display a strange output as follows.

$ cat /proc/diskstats |grep sda
   8       0 sda 90524 7579 102154 20464 0 0 0 0 0 14096 20089
   8       1 sda1 19085 1352 21841 4209 0 0 0 0 4294967064 15689 4293424691
                                                ~~~~~~~~~~
   8       2 sda2 71252 3624 74891 15950 0 0 0 0 232 23995 1562390
   8       3 sda3 54 487 2188 92 0 0 0 0 0 88 92
   8       4 sda4 4 0 8 0 0 0 0 0 0 0 0
   8       5 sda5 81 2027 2130 138 0 0 0 0 0 87 137

Its reason is the wrong way of accounting hd_struct->in_flight. When a bio is
merged into a request belongs to different partition by ELEVATOR_FRONT_MERGE.

The detailed root cause is as follows.

Assuming that there are two partition, sda1 and sda2.

1. A request for sda2 is in request_queue. Hence sda1's hd_struct->in_flight
   is 0 and sda2's one is 1.

        | hd_struct->in_flight
   ---------------------------
   sda1 |          0
   sda2 |          1
   ---------------------------

2. A bio belongs to sda1 is issued and is merged into the request mentioned on
   step1 by ELEVATOR_BACK_MERGE. The first sector of the request is changed
   from sda2 region to sda1 region. However the two partition's
   hd_struct->in_flight are not changed.

        | hd_struct->in_flight
   ---------------------------
   sda1 |          0
   sda2 |          1
   ---------------------------

3. The request is finished and blk_account_io_done() is called. In this case,
   sda2's hd_struct->in_flight, not a sda1's one, is decremented.

        | hd_struct->in_flight
   ---------------------------
   sda1 |         -1
   sda2 |          1
   ---------------------------

The patch fixes the problem by caching the partition lookup
inside the request structure, hence making sure that the increment
and decrement will always happen on the same partition struct. This
also speeds up IO with accounting enabled, since it cuts down on
the number of lookups we have to do.

When reloading partition tables, quiesce IO to ensure that no
request references to the partition struct exists. When it is safe
to free the partition table, the IO for that device is restarted
again.

Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Cc: stable@kernel.org
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2010-10-19 09:07:02 +02:00
..
blk-barrier.c block: set up rq->rq_disk properly for flush requests 2010-08-07 18:52:41 +02:00
blk-cgroup.c blkio-throttle: limit max iops value to UINT_MAX 2010-10-01 21:16:41 +02:00
blk-cgroup.h blkio-throttle: limit max iops value to UINT_MAX 2010-10-01 21:16:41 +02:00
blk-core.c block: fix accounting bug on cross partition merges 2010-10-19 09:07:02 +02:00
blk-exec.c block: Prevent hang_check firing during long I/O 2010-09-24 15:52:09 +02:00
blk-integrity.c block: Fix double free in blk_integrity_unregister 2010-10-15 15:49:18 +02:00
blk-ioc.c include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h 2010-03-30 22:02:32 +09:00
blk-iopoll.c tree-wide: fix assorted typos all over the place 2009-12-04 15:39:55 +01:00
blk-lib.c block: add secure discard 2010-08-12 08:43:30 -07:00
blk-map.c block: fix an address space warning in blk-map.c 2010-09-15 13:08:27 +02:00
blk-merge.c block: fix accounting bug on cross partition merges 2010-10-19 09:07:02 +02:00
blk-settings.c block: Ensure physical block size is unsigned int 2010-10-13 21:19:12 +02:00
blk-softirq.c generic-ipi: remove CSD_FLAG_WAIT 2009-02-25 14:13:44 +01:00
blk-sysfs.c block/scsi: Provide a limit on the number of integrity segments 2010-09-10 20:50:10 +02:00
blk-tag.c include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h 2010-03-30 22:02:32 +09:00
blk-throttle.c blkio-throttle: Fix possible multiplication overflow in iops calculations 2010-10-01 21:16:42 +02:00
blk-timeout.c block: ensure jiffies wrap is handled correctly in blk_rq_timed_out_timer 2010-04-21 17:42:08 +02:00
blk.h block: fix accounting bug on cross partition merges 2010-10-19 09:07:02 +02:00
bsg.c include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h 2010-03-30 22:02:32 +09:00
cfq-iosched.c blkio: Recalculate the throttled bio dispatch time upon throttle limit change 2010-10-01 14:49:49 +02:00
cfq.h blk-cgroup: Prepare the base for supporting more than one IO control policies 2010-09-16 08:42:04 +02:00
compat_ioctl.c block: add secure discard 2010-08-12 08:43:30 -07:00
deadline-iosched.c block: convert to pos and nr_sectors accessors 2009-05-11 09:50:54 +02:00
elevator.c block: add secure discard 2010-08-12 08:43:30 -07:00
genhd.c block: fix accounting bug on cross partition merges 2010-10-19 09:07:02 +02:00
ioctl.c block, partition: add partition_meta_info to hd_struct 2010-09-15 16:13:18 +02:00
Kconfig blkio: Core implementation of throttle policy 2010-09-16 08:42:52 +02:00
Kconfig.iosched blk-cgroup: config options re-arrangement 2010-04-26 19:27:56 +02:00
Makefile blkio: Core implementation of throttle policy 2010-09-16 08:42:52 +02:00
noop-iosched.c include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h 2010-03-30 22:02:32 +09:00
scsi_ioctl.c block/scsi_ioctl.c: quiet sparse noise 2009-11-04 09:10:33 +01:00