linux

iv/linux

Author	SHA1	Message	Date
Linus Torvalds	0e04c641b1	. Add dm_accept_partial_bio interface to DM core to allow DM targets to only process a portion of a bio, the remainder being sent in the next bio. This enables the old dm snapshot-origin target to only split write bios on chunk boundaries, read bios are now sent to the origin device unchanged. . Add DM core support for disabling WRITE SAME if the underlying SCSI layer disables it due to command failure. . Reduce lock contention in DM's bio-prison. . A few small cleanups and fixes to dm-thin and dm-era. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQEcBAABAgAGBQJTmavIAAoJEMUj8QotnQNay/EIAJI6lEFlQK3gG830+Yaw1m2U 7mnnX/Rd/N6RnccyhmFqk7Xu0REM7gJEgWicTQjR58La2DKFi072N0mwgHgHfB8f 1oOvUKN5Nb/a1CmRcVzSO0sbYcJn9I1r+k0buqfFHivU68wuedG+MrVya3YzOjvC 63MQiu4+3icDprcToxn+etz75FhrFps5QAsS0cH6t1VZqFCGIzxqgUKgY8zGo1CH P9hkYpJhhJe2aDh4vlFvpFYVFXt9zPoR+MkqXFW6Dn9GbR36gGaldTwnyhapla4N XYHDEdETcTqm2srOM5wW+jD0p+Id9/Lmd5Ld5J2zyARCYn/EG/SDCbiaVqOxEnc= =tv1B -----END PGP SIGNATURE----- Merge tag 'dm-3.16-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm Pull device mapper updates from Mike Snitzer: "This pull request is later than I'd have liked because I was waiting for some performance data to help finally justify sending the long-standing dm-crypt cpu scalability improvements upstream. Unfortunately we came up short, so those dm-crypt changes will continue to wait, but it seems we're not far off. . Add dm_accept_partial_bio interface to DM core to allow DM targets to only process a portion of a bio, the remainder being sent in the next bio. This enables the old dm snapshot-origin target to only split write bios on chunk boundaries, read bios are now sent to the origin device unchanged. . Add DM core support for disabling WRITE SAME if the underlying SCSI layer disables it due to command failure. . Reduce lock contention in DM's bio-prison. . A few small cleanups and fixes to dm-thin and dm-era" * tag 'dm-3.16-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: dm thin: update discard_granularity to reflect the thin-pool blocksize dm bio prison: implement per bucket locking in the dm_bio_prison hash table dm: remove symbol export for dm_set_device_limits dm: disable WRITE SAME if it fails dm era: check for a non-NULL metadata object before closing it dm thin: return ENOSPC instead of EIO when error_if_no_space enabled dm thin: cleanup noflush_work to use a proper completion dm snapshot: do not split read bios sent to snapshot-origin target dm snapshot: allocate a per-target structure for snapshot-origin target dm: introduce dm_accept_partial_bio dm: change sector_count member in clone_info from sector_t to unsigned	2014-06-12 13:33:29 -07:00
Lukas Czerner	09869de57e	dm thin: update discard_granularity to reflect the thin-pool blocksize DM thinp already checks whether the discard_granularity of the data device is a factor of the thin-pool block size. But when using the dm-thin-pool's discard passdown support, DM thinp was not selecting the max of the underlying data device's discard_granularity and the thin-pool's block size. Update set_discard_limits() to set discard_granularity to the max of these values. This enables blkdev_issue_discard() to properly align the discards that are sent to the DM thin device on a full block boundary. As such each discard will now cover an entire DM thin-pool block and the block will be reclaimed. Reported-by: Zdenek Kabelac <zkabelac@redhat.com> Signed-off-by: Lukas Czerner <lczerner@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Cc: stable@vger.kernel.org	2014-06-11 16:56:12 -04:00
Heinz Mauelshagen	adcc44472b	dm bio prison: implement per bucket locking in the dm_bio_prison hash table Split the single per bio-prison lock by using per bucket locking. Per bucket locking benefits both dm-thin and dm-cache targets by reducing bio-prison lock contention. Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com> Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2014-06-11 16:48:54 -04:00
Linus Torvalds	8d0304e69d	Assorted md fixes for 3.16 Mostly performance improvements with a few corner-case bug fixes. -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.22 (GNU/Linux) iQIVAwUAU5fevznsnt1WYoG5AQKzNxAAgjZPg6LZgl0wEHJi09ykqLK9dSkglz2B RY5IEWPEMcsGJQbSX7fEME5WSHkqKBIHx43xmdzijPCHyWLnF4vQRubNAi/Wo8PC yfJyI5hMrG8+0rNKnmXWeG+NLGj7l7j+wkExe27VBlu2+6ATvIUCa34kjOWFDJJx xAN9/t22XGly776SiuLXTpMhA0pRWH4sqijJhvM5EqPlsEUjGxvQU/URwUPYzhYr iCbvY64uKB1Crh+vn7yP1IUWc77aDOoUPIxO50m0GXOQFU/wbQta1NAFmMeu8tIo 3QRlGzP+jRJFBV7jLii4nHTMSVxBmD2zMppxVtDoq/egWhyZlJhh2Kd5sUKo2r9V 8bGQm7e1JlsDsRMVXxSlx3TebZDfJhtnFNhYb6JPlpWM5EYjCzXXfJUl3/6LfKX6 BKJ800xvabr5TFkVubF9H80LTQxaO+pQoS6PSs8hnwneqtybdbt9o7V59yluMGao IwaG/BY4jlY+76YnK32Kbr1eofUNYXBLg45mgSkrhUVqv9HLHe0CDnYAzYs68aur zG8a1rhA9IJtBmWwql2Gm8lKOnDFM6onvSmaYntsabJ++eahaNDpL5ck+R4kO8mJ pyI1N6GQDjtLdONk4NmH36P+CuCZje6KwRbVXvsJIBA1042tkZfU0zdqgt5QhFZj Ma9ZCn9wZFo= =MaM/ -----END PGP SIGNATURE----- Merge tag 'md/3.16' of git://neil.brown.name/md Pull md updates from Neil Brown: "Assorted md fixes for 3.16 Mostly performance improvements with a few corner-case bug fixes" * tag 'md/3.16' of git://neil.brown.name/md: raid5: speedup sync_request processing md/raid5: deadlock between retry_aligned_read with barrier io raid5: add an option to avoid copy data from bio to stripe cache md/bitmap: remove confusing code from filemap_get_page. raid5: avoid release list until last reference of the stripe md: md_clear_badblocks should return an error code on failure. md/raid56: Don't perform reads to support writes until stripe is ready. md: refuse to change shape of array if it is active but read-only	2014-06-11 08:33:41 -07:00
Eivind Sarto	053f5b6525	raid5: speedup sync_request processing The raid5 sync_request() processing calls handle_stripe() within the context of the resync-thread. The resync-thread issues the first set of read requests and this adds execution latency and slows down the scheduling of the next sync_request(). The current rebuild/resync speed of raid5 is not much faster than what rotational HDDs can sustain. Testing the following patch on a 6-drive array, I can increase the rebuild speed from 100 MB/s to 175 MB/s. The sync_request() now just sets STRIPE_HANDLE and releases the stripe. This creates some more parallelism between the resync-thread and raid5 kernel daemon. Signed-off-by: Eivind Sarto <esarto@fusionio.com> Signed-off-by: NeilBrown <neilb@suse.de>	2014-06-10 11:02:01 +10:00
Linus Torvalds	3f17ea6dea	Merge branch 'next' (accumulated 3.16 merge window patches) into master Now that 3.15 is released, this merges the 'next' branch into 'master', bringing us to the normal situation where my 'master' branch is the merge window. * accumulated work in next: (6809 commits) ufs: sb mutex merge + mutex_destroy powerpc: update comments for generic idle conversion cris: update comments for generic idle conversion idle: remove cpu_idle() forward declarations nbd: zero from and len fields in NBD_CMD_DISCONNECT. mm: convert some level-less printks to pr_* MAINTAINERS: adi-buildroot-devel is moderated MAINTAINERS: add linux-api for review of API/ABI changes mm/kmemleak-test.c: use pr_fmt for logging fs/dlm/debug_fs.c: replace seq_printf by seq_puts fs/dlm/lockspace.c: convert simple_str to kstr fs/dlm/config.c: convert simple_str to kstr mm: mark remap_file_pages() syscall as deprecated mm: memcontrol: remove unnecessary memcg argument from soft limit functions mm: memcontrol: clean up memcg zoneinfo lookup mm/memblock.c: call kmemleak directly from memblock_(alloc\|free) mm/mempool.c: update the kmemleak stack trace for mempool allocations lib/radix-tree.c: update the kmemleak stack trace for radix tree allocations mm: introduce kmemleak_update_trace() mm/kmemleak.c: use %u to print ->checksum ...	2014-06-08 11:31:16 -07:00
hui jiao	2844dc32ea	md/raid5: deadlock between retry_aligned_read with barrier io A chunk aligned read increases counter active_aligned_reads and decreases it after sub-device handle it successfully. But when a read error occurs, the read redispatched by raid5d, and the active_aligned_reads will not be decreased until we can grab a stripe head in retry_aligned_read. Now suppose, a barrier io comes, set conf->quiesce to 2, and wait until both active_stripes and active_aligned_reads are zero. The retried chunk aligned read gets stuck at get_active_stripe waiting until conf->quiesce becomes 0. Retry_aligned_read and barrier io are waiting each other now. One possible solution is that we ignore conf->quiesce, let the retried aligned read finish. I reproduced this deadlock and test this patch on centos6.0 Signed-off-by: NeilBrown <neilb@suse.de>	2014-06-05 17:18:19 +10:00
Mike Snitzer	11f0431be2	dm: remove symbol export for dm_set_device_limits There is no need for code other than DM core to use dm_set_device_limits so remove its EXPORT_SYMBOL_GPL. Also, cleanup a couple whitespace nits. Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2014-06-04 09:46:34 -04:00
Mike Snitzer	7eee4ae2db	dm: disable WRITE SAME if it fails Add DM core support for disabling WRITE SAME on first failure to both request-based and bio-based targets. The need to disable WRITE SAME stems from SCSI enabling it by default but then disabling it when it fails. When SCSI does this it returns "permanent target failure, do not retry" using -EREMOTEIO. Update DM core to only disable WRITE SAME on failure if the returned error is -EREMOTEIO. Commit `f84cb8a4` ("dm mpath: disable WRITE SAME if it fails") implemented multipath specific disabling of WRITE SAME if it fails. However, as that commit detailed, the multipath-only solution doesn't go far enough if bio-based DM targets are stacked ontop of the request-based dm-multipath target (as is commonly done using dm-linear to support partitions on multipath devices, via kpartx). Signed-off-by: Mike Snitzer <snitzer@redhat.com> Acked-by: Martin K. Petersen <martin.petersen@oracle.com> Tested-by: Alex Chen <alex.chen@huawei.com>	2014-06-04 09:45:52 -04:00
Linus Torvalds	776edb5931	Merge branch 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip into next Pull core locking updates from Ingo Molnar: "The main changes in this cycle were: - reduced/streamlined smp_mb__() interface that allows more usecases and makes the existing ones less buggy, especially in rarer architectures - add rwsem implementation comments - bump up lockdep limits" 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (33 commits) rwsem: Add comments to explain the meaning of the rwsem's count field lockdep: Increase static allocations arch: Mass conversion of smp_mb__() arch,doc: Convert smp_mb__() arch,xtensa: Convert smp_mb__() arch,x86: Convert smp_mb__() arch,tile: Convert smp_mb__() arch,sparc: Convert smp_mb__() arch,sh: Convert smp_mb__() arch,score: Convert smp_mb__() arch,s390: Convert smp_mb__() arch,powerpc: Convert smp_mb__() arch,parisc: Convert smp_mb__() arch,openrisc: Convert smp_mb__() arch,mn10300: Convert smp_mb__() arch,mips: Convert smp_mb__() arch,metag: Convert smp_mb__() arch,m68k: Convert smp_mb__() arch,m32r: Convert smp_mb__() arch,ia64: Convert smp_mb__() ...	2014-06-03 12:57:53 -07:00
Joe Thornber	989f26f5ad	dm era: check for a non-NULL metadata object before closing it era_ctr() may call era_destroy() before era->md is initialized so era_destory() must only close the metadata object if it is not NULL. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Naohiro Aota <naota@elisp.net> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Cc: stable@vger.kernel.org # 3.15+	2014-06-03 13:44:08 -04:00
Mike Snitzer	af91805a49	dm thin: return ENOSPC instead of EIO when error_if_no_space enabled Update the DM thin provisioning target's allocation failure error to be consistent with commit `a9d6ceb8` ("[SCSI] return ENOSPC on thin provisioning failure"). The DM thin target now returns -ENOSPC rather than -EIO when block allocation fails due to the pool being out of data space (and the 'error_if_no_space' thin-pool feature is enabled). Signed-off-by: Mike Snitzer <snitzer@redhat.com> Acked-By: Joe Thornber <ejt@redhat.com>	2014-06-03 13:44:08 -04:00
Joe Thornber	e7a3e871d8	dm thin: cleanup noflush_work to use a proper completion Factor out a pool_work interface that noflush_work makes use of to wait for and complete work items (in terms of a proper completion struct). Allows discontinuing the use of a custom completion in terms of atomic_t and wait_event. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2014-06-03 13:44:07 -04:00
Mikulas Patocka	298eaa89b0	dm snapshot: do not split read bios sent to snapshot-origin target Change the snapshot-origin target so that only write bios are split on chunk boundary. Read bios are passed unchanged to the underlying device, so they don't have to be split. Later, we could change the target so that it accepts a larger write bio if it spans an area that is completely covered by snapshot exceptions. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2014-06-03 13:44:07 -04:00
Mikulas Patocka	599cdf3bfb	dm snapshot: allocate a per-target structure for snapshot-origin target Allocate a per-target dm_origin structure. This is a prerequisite for the next commit ("dm snapshot: do not split read bios sent to snapshot-origin target") which adds a new member to this structure. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2014-06-03 13:44:07 -04:00
Mikulas Patocka	1dd40c3ecd	dm: introduce dm_accept_partial_bio The function dm_accept_partial_bio allows the target to specify how many sectors of the current bio it will process. If the target only wants to accept part of the bio, it calls dm_accept_partial_bio and the DM core sends the rest of the data in next bio. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2014-06-03 13:44:06 -04:00
Mikulas Patocka	e0d6609a5f	dm: change sector_count member in clone_info from sector_t to unsigned It is impossible to create bios with 2^23 or more sectors (the size is stored as a 32-bit byte count in the bio). So we convert some sector_t values to unsigned integers. This is needed for the next commit ("dm: introduce dm_accept_partial_bio") that replaces integer value arguments with pointers, so the size of the integer must match. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2014-06-03 13:44:06 -04:00
Linus Torvalds	ca755175f2	Two md bugfixes for possible corruption when restarting reshape If a raid5/6 reshape is restarted (After stopping and re-assembling the array) and the array is marked read-only (or read-auto), then the reshape will appear to complete immediately, without actually moving anything around. This can result in corruption. There are two patches which do much the same thing in different places. They are separate because one is an older bug and so can be applied to more -stable kernels. -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.22 (GNU/Linux) iQIUAwUAU40KTDnsnt1WYoG5AQJcaA/1GkoZit6LqLjiIQmsK9Ci/4TI+sNqYaSB 9SleSjWt+bcNCRY4sS3Wv0H580LmkoRR24wdei+mukoFa+bpBBs6PodPMABAVsnL VxlnUX+P4Ef77s2zJ8B5wCY3ftmecaQL3TdZf10+hIITacXSp7JmsLJXm3DW+Jvq DZsxJRBEQfsz5obZAZXnvPAcTSkqMT4QQ13nIEmaYEz+AYVn6Tcf8xwDBOcZM4u9 Gdi6BHNaY6RjSU1gsVblPYmWQyqqdgCJ6UEV/KYyY9rtFyozkvJ0SDWcu/kRA74A uydN5U6iVqJatY9l9eK2tV7GQkN+o+MWIA0JocTRZe67ihE4tWxiLRn/7fZdLVsX TV6zYar0M/ZSn3XioGi4hQ0tWDPpq/aCzCAk5JQpywgBmoaMqqh8rttwdCkWvK6P TNnaVfo3r9AMJY8MVm8in/efEhY6jUa3q2oDqCEKjuL916v9ODsxXloqTlbEy2KC NrKNLCZA2subbzPa3T8u4aKRBzl0xSBSig8ecrufSpDC1I0G+Mbuc8wrDzjAnI3N +fbQCxxRR0akcleZrFZD67avOa5/DsQqWJbcW1D5VCekJoZcgdz5CGJz/bNl+0i6 bwrvNWi6q1X2P4Nt2BBhk771xzNiUlufsI0x7SFIJxpDiGlxINkluXvnEQKFSzhr uYSrvTCQwg== =cTEe -----END PGP SIGNATURE----- Merge tag 'md/3.15-fixes' of git://neil.brown.name/md Pull two md bugfixes from Neil Brown: "Two md bugfixes for possible corruption when restarting reshape If a raid5/6 reshape is restarted (After stopping and re-assembling the array) and the array is marked read-only (or read-auto), then the reshape will appear to complete immediately, without actually moving anything around. This can result in corruption. There are two patches which do much the same thing in different places. They are separate because one is an older bug and so can be applied to more -stable kernels" * tag 'md/3.15-fixes' of git://neil.brown.name/md: md: always set MD_RECOVERY_INTR when interrupting a reshape thread. md: always set MD_RECOVERY_INTR when aborting a reshape or other "resync".	2014-06-02 17:04:37 -07:00
Linus Torvalds	681a289548	Merge branch 'for-3.16/core' of git://git.kernel.dk/linux-block into next Pull block core updates from Jens Axboe: "It's a big(ish) round this time, lots of development effort has gone into blk-mq in the last 3 months. Generally we're heading to where 3.16 will be a feature complete and performant blk-mq. scsi-mq is progressing nicely and will hopefully be in 3.17. A nvme port is in progress, and the Micron pci-e flash driver, mtip32xx, is converted and will be sent in with the driver pull request for 3.16. This pull request contains: - Lots of prep and support patches for scsi-mq have been integrated. All from Christoph. - API and code cleanups for blk-mq from Christoph. - Lots of good corner case and error handling cleanup fixes for blk-mq from Ming Lei. - A flew of blk-mq updates from me: * Provide strict mappings so that the driver can rely on the CPU to queue mapping. This enables optimizations in the driver. * Provided a bitmap tagging instead of percpu_ida, which never really worked well for blk-mq. percpu_ida relies on the fact that we have a lot more tags available than we really need, it fails miserably for cases where we exhaust (or are close to exhausting) the tag space. * Provide sane support for shared tag maps, as utilized by scsi-mq * Various fixes for IO timeouts. * API cleanups, and lots of perf tweaks and optimizations. - Remove 'buffer' from struct request. This is ancient code, from when requests were always virtually mapped. Kill it, to reclaim some space in struct request. From me. - Remove 'magic' from blk_plug. Since we store these on the stack and since we've never caught any actual bugs with this, lets just get rid of it. From me. - Only call part_in_flight() once for IO completion, as includes two atomic reads. Hopefully we'll get a better implementation soon, as the part IO stats are now one of the more expensive parts of doing IO on blk-mq. From me. - File migration of block code from {mm,fs}/ to block/. This includes bio.c, bio-integrity.c, bounce.c, and ioprio.c. From me, from a discussion on lkml. That should describe the meat of the pull request. Also has various little fixes and cleanups from Dave Jones, Shaohua Li, Duan Jiong, Fengguang Wu, Fabian Frederick, Randy Dunlap, Robert Elliott, and Sam Bradshaw" * 'for-3.16/core' of git://git.kernel.dk/linux-block: (100 commits) blk-mq: push IPI or local end_io decision to __blk_mq_complete_request() blk-mq: remember to start timeout handler for direct queue block: ensure that the timer is always added blk-mq: blk_mq_unregister_hctx() can be static blk-mq: make the sysfs mq/ layout reflect current mappings blk-mq: blk_mq_tag_to_rq should handle flush request block: remove dead code in scsi_ioctl:blk_verify_command blk-mq: request initialization optimizations block: add queue flag for disabling SG merging block: remove 'magic' from struct blk_plug blk-mq: remove alloc_hctx and free_hctx methods blk-mq: add file comments and update copyright notices blk-mq: remove blk_mq_alloc_request_pinned blk-mq: do not use blk_mq_alloc_request_pinned in blk_mq_map_request blk-mq: remove blk_mq_wait_for_tags blk-mq: initialize request in __blk_mq_alloc_request blk-mq: merge blk_mq_alloc_reserved_request into blk_mq_alloc_request blk-mq: add helper to insert requests from irq context blk-mq: remove stale comment for blk_mq_complete_request() blk-mq: allow non-softirq completions ...	2014-06-02 09:29:34 -07:00
Linus Torvalds	24e19d279f	A dm-cache stable fix to split discards on cache block boundaries because dm-cache cannot yet handle discards that span cache blocks. Really fix a dm-mpath LOCKDEP warning that was introduced in -rc1. Add a 'no_space_timeout' control to dm-thinp to restore the ability to queue IO indefinitely when no data space is available. This fixes a change in behavior that was introduced in -rc6 where the timeout couldn't be disabled. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQEcBAABAgAGBQJTh9QAAAoJEMUj8QotnQNaNpYH/j07FeH8YlxXRcFzDi7xRVtx luK5b9fLLlmPwW2eKSrvpI8Le4jwDvLwBmpEvN9/wyPiRDSUnYIyYdoV7RJXX2LT wqXatObb84fwQBJ6/q8o2YMzU5ODa5XT6KGEZyD4cHdAZ9FZSwfgqhslyrBJDkSN JBFfkXu066qw8cuYA6KFv4DwBf5eHAt5AjV/QPGd5zGXwETHLZ4ypgpwYHAGbdXa MgfHetwtEnJYvVQex/e+9xC5IDc4/BEAhZq4n3YmEJjNq8EbX15udHmCX7S2M5pT +9tNjUMz4j9BhoC9F8ntRz0pxWZtJK9hGojO4xoXqOCOHgp1xLQd/tHrFZS0v8E= =u5Xd -----END PGP SIGNATURE----- Merge tag 'dm-3.15-fixes-3' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm Pull device-mapper fixes from Mike Snitzer: "A dm-cache stable fix to split discards on cache block boundaries because dm-cache cannot yet handle discards that span cache blocks. Really fix a dm-mpath LOCKDEP warning that was introduced in -rc1. Add a 'no_space_timeout' control to dm-thinp to restore the ability to queue IO indefinitely when no data space is available. This fixes a change in behavior that was introduced in -rc6 where the timeout couldn't be disabled" * tag 'dm-3.15-fixes-3' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: dm mpath: really fix lockdep warning dm cache: always split discards on cache block boundaries dm thin: add 'no_space_timeout' dm-thin-pool module param	2014-05-30 12:04:56 -07:00
Shaohua Li	d592a99691	raid5: add an option to avoid copy data from bio to stripe cache The stripe cache has two goals: 1. cache data, so next time if data can be found in stripe cache, disk access can be avoided. 2. stable data. data is copied from bio to stripe cache and calculated parity. data written to disk is from stripe cache, so if upper layer changes bio data, data written to disk isn't impacted. In my environment, I can guarantee 2 will not happen. And BDI_CAP_STABLE_WRITES can guarantee 2 too. For 1, it's not common too. block plug mechanism will dispatch a bunch of sequentail small requests together. And since I'm using SSD, I'm using small chunk size. It's rare case stripe cache is really useful. So I'd like to avoid the copy from bio to stripe cache and it's very helpful for performance. In my 1M randwrite tests, avoid the copy can increase the performance more than 30%. Of course, this shouldn't be enabled by default. It's reported enabling BDI_CAP_STABLE_WRITES can harm some workloads before, so I added an option to control it. Neilb: changed BUG_ON to WARN_ON Removed some assignments from raid5_build_block which are now not needed. Signed-off-by: Shaohua Li <shli@fusionio.com> Signed-off-by: NeilBrown <neilb@suse.de>	2014-05-29 16:59:47 +10:00
NeilBrown	f2e06c5884	md/bitmap: remove confusing code from filemap_get_page. file_page_index(store, 0) is always 0. This is because the bitmap sb, at 256 bytes, is always less than one page. So subtracting it has no effect and the code should be removed. Reported-by: Goldwyn Rodrigues <rgoldwyn@suse.de> Signed-off-by: NeilBrown <neilb@suse.de>	2014-05-29 16:59:47 +10:00
Eivind Sarto	cf170f3fa4	raid5: avoid release list until last reference of the stripe The (lockless) release_list reduces lock contention, but there is excessive queueing and dequeuing of stripes on this list. A stripe will currently be queued on the release_list with a stripe reference count > 1. This can cause the raid5 kernel thread(s) to dequeue the stripe and decrement the refcount without doing any other useful processing of the stripe. The are two cases when the stripe can be put on the release_list multiple times before it is actually handled by the kernel thread(s). 1) make_request() activates the stripe processing in 4k increments. When a write request is large enough to span multiple chunks of a stripe_head, the first 4k chunk adds the stripe to the plug list. The next 4k chunk that is processed for the same stripe puts the stripe on the release_list with a refcount=2. This can cause the kernel thread to process and decrement the stripe before the stripe us unplugged, which again will put it back on the release_list. 2) Whenever IO is scheduled on a stripe (pre-read and/or write), the stripe refcount is set to the number of active IO (for each chunk). The stripe is released as each IO complete, and can be queued and dequeued multiple times on the release_list, until its refcount finally reached zero. This simple patch will ensure a stripe is only queued on the release_list when its refcount=1 and is ready to be handled by the kernel thread(s). I added some instrumentation to raid5 and counted the number of times striped were queued on the release_list for a variety of write IO sizes. Without this patch the number of times stripes got queued on the release_list was 100-500% higher than with the patch. The excess queuing will increase with the IO size. The patch also improved throughput by 5-10%. Signed-off-by: Eivind Sarto <esarto@fusionio.com> Signed-off-by: NeilBrown <neilb@suse.de>	2014-05-29 16:59:46 +10:00
NeilBrown	8b32bf5e37	md: md_clear_badblocks should return an error code on failure. Julia Lawall and coccinelle report that md_clear_badblocks always returns 0, despite appearing to have an error path. The error path really should return an error code. ENOSPC is reasonably appropriate. Reported-by: Julia Lawall <Julia.Lawall@lip6.fr> Signed-off-by: NeilBrown <neilb@suse.de>	2014-05-29 16:59:46 +10:00
NeilBrown	67f455486d	md/raid56: Don't perform reads to support writes until stripe is ready. If it is found that we need to pre-read some blocks before a write can succeed, we normally set STRIPE_DELAYED and don't actually perform the read until STRIPE_PREREAD_ACTIVE subsequently gets set. However for a degraded RAID6 we currently perform the reads as soon as we see that a write is pending. This significantly hurts throughput. So: - when handle_stripe_dirtying find a block that it wants on a device that is failed, set STRIPE_DELAY, instead of doing nothing, and - when fetch_block detects that a read might be required to satisfy a write, only perform the read if STRIPE_PREREAD_ACTIVE is set, and if we would actually need to read something to complete the write. This also helps RAID5, though less often as RAID5 supports a read-modify-write cycle. For RAID5 the read is performed too early only if the write is not a full 4K aligned write (i.e. no an R5_OVERWRITE). Also clean up a couple of horrible bits of formatting. Reported-by: Patrik Horník <patrik@dsl.sk> Signed-off-by: NeilBrown <neilb@suse.de>	2014-05-29 16:59:46 +10:00
NeilBrown	bd8839e03b	md: refuse to change shape of array if it is active but read-only read-only arrays should not be changed. This includes changing the level, layout, size, or number of devices. So reject those changes for readonly arrays. Signed-off-by: NeilBrown <neilb@suse.de>	2014-05-29 16:59:30 +10:00
NeilBrown	2ac295a544	md: always set MD_RECOVERY_INTR when interrupting a reshape thread. Commit `8313b8e57f` md: fix problem when adding device to read-only array with bitmap. added a called to md_reap_sync_thread() which cause a reshape thread to be interrupted (in particular, it could cause md_thread() to never even call md_do_sync()). However it didn't set MD_RECOVERY_INTR so ->finish_reshape() would not know that the reshape didn't complete. This only happens when mddev->ro is set and normally reshape threads don't run in that situation. But raid5 and raid10 can start a reshape thread during "run" is the array is in the middle of a reshape. They do this even if ->ro is set. So it is best to set MD_RECOVERY_INTR before abortingg the sync thread, just in case. Though it rare for this to trigger a problem it can cause data corruption because the reshape isn't finished properly. So it is suitable for any stable which the offending commit was applied to. (3.2 or later) Fixes: `8313b8e57f` Cc: stable@vger.kernel.org (3.2+) Signed-off-by: NeilBrown <neilb@suse.de>	2014-05-29 16:59:06 +10:00
NeilBrown	3991b31ea0	md: always set MD_RECOVERY_INTR when aborting a reshape or other "resync". If mddev->ro is set, md_to_sync will (correctly) abort. However in that case MD_RECOVERY_INTR isn't set. If a RESHAPE had been requested, then ->finish_reshape() will be called and it will think the reshape was successful even though nothing happened. Normally a resync will not be requested if ->ro is set, but if an array is stopped while a reshape is on-going, then when the array is started, the reshape will be restarted. If the array is also set read-only at this point, the reshape will instantly appear to success, resulting in data corruption. Consequently, this patch is suitable for any -stable kernel. Cc: stable@vger.kernel.org (any) Signed-off-by: NeilBrown <neilb@suse.de>	2014-05-28 13:39:39 +10:00
Hannes Reinecke	63d832c301	dm mpath: really fix lockdep warning lockdep complains about a circular locking. And indeed, we need to release the lock before calling dm_table_run_md_queue_async(). As such, commit `4cdd2ad` ("dm mpath: fix lock order inconsistency in multipath_ioctl") must also be reverted in addition to fixing the lock order in the other dm_table_run_md_queue_async() callers. Reported-by: Bart van Assche <bvanassche@acm.org> Tested-by: Bart van Assche <bvanassche@acm.org> Signed-off-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2014-05-27 10:46:01 -04:00
Heinz Mauelshagen	f1daa838e8	dm cache: always split discards on cache block boundaries The DM cache target cannot cope with discards that span multiple cache blocks, so each discard bio that spans more than one cache block must get split by the DM core. Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com> Acked-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Cc: stable@vger.kernel.org # v3.9+	2014-05-27 10:33:05 -04:00
Linus Torvalds	23de4a7af7	A dm-crypt fix for a cpu hotplug crash that switches from using per-cpu data to a mempool allocation (which offers allocation with cpu locality, and there is no inter-cpu communication on slab allocation). A couple dm-thinp stable fixes to address "out-of-data-space" issues. A dm-multipath fix for a LOCKDEP warning introduced in 3.15-rc1. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQEcBAABAgAGBQJTdiI7AAoJEMUj8QotnQNa1fMIAOSGppH4U/VuT1+UMDyabUba eXsK8xBUTIDSBuTJ+ljkE5fyvXpn/wvA+b1hTKLhzVkUZ1pCY4pIw1pwpVcw89Bb BhktFWRYvcv/MAARDHiMGW5yc6xP319Qm04XN3xbMHx71gxGRwpzb191LSO5S2VR 0rjXvZZt7WPJe/QPOFUrqyoP7t59LH9hu2/OH/Ic9o5/D0WxbqPEP6X8iJyIs32u lNvIQ5r5f3xNzt0VDvEq3sxR3qYhQvLPDMdp0YR67c87fKfsKQj4pkvXltXhY5bM wBHFE+NBl6MPzTXNCT9i+p360GXE7B/lY9boochAyE/UEztRq1+oqJOa/dDaBCo= =0vk2 -----END PGP SIGNATURE----- Merge tag 'dm-3.15-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm Pull device mapper fixes from Mike Snitzer: "A dm-crypt fix for a cpu hotplug crash that switches from using per-cpu data to a mempool allocation (which offers allocation with cpu locality, and there is no inter-cpu communication on slab allocation). A couple dm-thinp stable fixes to address "out-of-data-space" issues. A dm-multipath fix for a LOCKDEP warning introduced in 3.15-rc1" * tag 'dm-3.15-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: dm mpath: fix lock order inconsistency in multipath_ioctl dm thin: add timeout to stop out-of-data-space mode holding IO forever dm thin: allow metadata commit if pool is in PM_OUT_OF_DATA_SPACE mode dm crypt: fix cpu hotplug crash by removing per-cpu structure	2014-05-21 17:57:31 +09:00
Mike Snitzer	80c578930c	dm thin: add 'no_space_timeout' dm-thin-pool module param Commit `85ad643b` ("dm thin: add timeout to stop out-of-data-space mode holding IO forever") introduced a fixed 60 second timeout. Users may want to either disable or modify this timeout. Allow the out-of-data-space timeout to be configured using the 'no_space_timeout' dm-thin-pool module param. Setting it to 0 will disable the timeout, resulting in IO being queued until more data space is added to the thin-pool. Signed-off-by: Mike Snitzer <snitzer@redhat.com> Cc: stable@vger.kernel.org # 3.14+	2014-05-20 14:30:36 -04:00
Mike Snitzer	4cdd2ad780	dm mpath: fix lock order inconsistency in multipath_ioctl Commit `3e9f1be1b4` ("dm mpath: remove process_queued_ios()") did not consistently take the multipath device's spinlock (m->lock) before calling dm_table_run_md_queue_async() -- which takes the q->queue_lock. Found with code inspection using hint from reported lockdep warning. Reported-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2014-05-14 16:12:17 -04:00
Joe Thornber	85ad643b7e	dm thin: add timeout to stop out-of-data-space mode holding IO forever If the pool runs out of data space, dm-thin can be configured to either error IOs that would trigger provisioning, or hold those IOs until the pool is resized. Unfortunately, holding IOs until the pool is resized can result in a cascade of tasks hitting the hung_task_timeout, which may render the system unavailable. Add a fixed timeout so IOs can only be held for a maximum of 60 seconds. If LVM is going to resize a thin-pool that is out of data space it needs to be prompt about it. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Cc: stable@vger.kernel.org # 3.14+	2014-05-14 16:11:37 -04:00
Joe Thornber	8d07e8a5f5	dm thin: allow metadata commit if pool is in PM_OUT_OF_DATA_SPACE mode Commit `3e1a0699` ("dm thin: fix out of data space handling") introduced a regression in the metadata commit() method by returning an error if the pool is in PM_OUT_OF_DATA_SPACE mode. This oversight caused a thin device to return errors even if the default queue_if_no_space ENOSPC handling mode is used. Fix commit() to only fail if pool is in PM_READ_ONLY or PM_FAIL mode. Reported-by: qindehua@163.com Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Cc: stable@vger.kernel.org # 3.14+	2014-05-14 16:11:36 -04:00
Mikulas Patocka	610f2de355	dm crypt: fix cpu hotplug crash by removing per-cpu structure The DM crypt target used per-cpu structures to hold pointers to a ablkcipher_request structure. The code assumed that the work item keeps executing on a single CPU, so it didn't use synchronization when accessing this structure. If a CPU is disabled by writing 0 to /sys/devices/system/cpu/cpu*/online, the work item could be moved to another CPU. This causes dm-crypt crashes, like the following, because the code starts using an incorrect ablkcipher_request: smpboot: CPU 7 is now offline BUG: unable to handle kernel NULL pointer dereference at 0000000000000130 IP: [<ffffffffa1862b3d>] crypt_convert+0x12d/0x3c0 [dm_crypt] ... Call Trace: [<ffffffffa1864415>] ? kcryptd_crypt+0x305/0x470 [dm_crypt] [<ffffffff81062060>] ? finish_task_switch+0x40/0xc0 [<ffffffff81052a28>] ? process_one_work+0x168/0x470 [<ffffffff8105366b>] ? worker_thread+0x10b/0x390 [<ffffffff81053560>] ? manage_workers.isra.26+0x290/0x290 [<ffffffff81058d9f>] ? kthread+0xaf/0xc0 [<ffffffff81058cf0>] ? kthread_create_on_node+0x120/0x120 [<ffffffff813464ac>] ? ret_from_fork+0x7c/0xb0 [<ffffffff81058cf0>] ? kthread_create_on_node+0x120/0x120 Fix this bug by removing the per-cpu definition. The structure ablkcipher_request is accessed via a pointer from convert_context. Consequently, if the work item is rescheduled to a different CPU, the thread still uses the same ablkcipher_request. This change may undermine performance improvements intended by commit `c0297721` ("dm crypt: scale to multiple cpus") on select hardware. In practice no performance difference was observed on recent hardware. But regardless, correctness is more important than performance. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Cc: stable@vger.kernel.org	2014-05-14 16:11:35 -04:00
Linus Torvalds	2ddb5998d0	Two bugfixes for md in 3.15 Both tagged for -stable. -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.22 (GNU/Linux) iQIVAwUAU3AfITnsnt1WYoG5AQJiqQ/+Pk4n3AQqqtfjPaR5EWmAVwgLgvy7AX8z yG9UwN9AXqd1IkgaE+PzUwZHEUR1/fYeF52c5cakrHCvluHgxakUX6/T/f9dO8Ht rXK4Q82aTfm+5lfUsZfOL8aeY9ZheXXo97vbVAfegdIDNC6Il2nktHj6AfBfQWlQ r0hm3Vz1rgXxXVam7SLlbxa71JUxltlSpLqUoN487iF/hSJx5D04NiLFT8KJwtUh UtMiyNsUpMJHWfYZjTsX4+o9psLZB2fE+WXJvYy5jB3C/Yy3FB0x38fVTC7+ozej F0J8bhG/6oO0/0gieW7EXTDWNLlCtG8Z/rUi/Hre+7Lps3vp7V65q/uB1B2VnNjn TRzbEaCoWdzMjamp5btSzN64MJgvCPRn1TvPwcm+kSDk/IpslYMllwXK7H+UutXZ GEEw3TVz1jWk7JKxai9raApKtXB7yDpiKREFMjhowBb0rM+VL4/3gvzSpPyVbJxj 4TTj9fUqsXWMG4HzKuyxXlV51hAbcaVnYirf0JrkjzzYkl0d/oBAADQtaApD+NX2 thlfYUW4tjssmMB+X5ok5Zp4A0TV31a1bEmZ8CE63i/IHCf5F8BHsHpyO4P9ITDX zNEo1lKuIbhn5oVHDoLZjNgIPGi2+lq6jvq8+0POKyEBr++Nrbld2u0GB8Q3/SjE LAhU+0iUY6A= =9QhO -----END PGP SIGNATURE----- Merge tag 'md/3.15-fixes' of git://neil.brown.name/md Pull md bugfixes from Neil Brown: "Two bugfixes for md in 3.15 Both tagged for -stable" * tag 'md/3.15-fixes' of git://neil.brown.name/md: md: avoid possible spinning md thread at shutdown. md/raid10: call wait_barrier() for each request submitted.	2014-05-13 11:11:48 +09:00
NeilBrown	0f62fb220a	md: avoid possible spinning md thread at shutdown. If an md array with externally managed metadata (e.g. DDF or IMSM) is in use, then we should not set safemode==2 at shutdown because: 1/ this is ineffective: user-space need to be involved in any 'safemode' handling, 2/ The safemode management code doesn't cope with safemode==2 on external metadata and md_check_recover enters an infinite loop. Even at shutdown, an infinite-looping process can be problematic, so this could cause shutdown to hang. Cc: stable@vger.kernel.org (any kernel) Signed-off-by: NeilBrown <neilb@suse.de>	2014-05-06 09:49:31 +10:00
NeilBrown	cc13b1d150	md/raid10: call wait_barrier() for each request submitted. wait_barrier() includes a counter, so we must call it precisely once (unless balanced by allow_barrier()) for each request submitted. Since commit `20d0189b10` block: Introduce new bio_split() in 3.14-rc1, we don't call it for the extra requests generated when we need to split a bio. When this happens the counter goes negative, any resync/recovery will never start, and "mdadm --stop" will hang. Reported-by: Chris Murphy <lists@colorremedies.com> Fixes: `20d0189b10` Cc: stable@vger.kernel.org (3.14+) Cc: Kent Overstreet <kmo@daterainc.com> Signed-off-by: NeilBrown <neilb@suse.de>	2014-05-06 09:49:26 +10:00
Linus Torvalds	54366a7fd6	A few dm-thinp fixes for changes merged in 3.15-rc1. A dm-verity fix for an immutable biovec regression that affects 3.14+. A dm-cache fix to properly quiesce when using writethrough mode (3.14+). -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQEcBAABAgAGBQJTY+6+AAoJEMUj8QotnQNaeC4H/35S9GZL8SVPEDS5nbQ9YdZ9 co7wAYIGswOInX9u8nq0TqcNtBMhxwwdRX9ScPxHVUTT+/lM/c7axHiMqVjZrMme SVmmAXMp2uUMAnK4BGIQs8jjeyxBCHUF/gyfC3OC+RF72Z1bDkG/xXyKsljBSzMe RP0iFvvvA1Sm7XzBJRuhZLIdJGkXFAy0ooEBICQOoudg6iDvDKCtiU+owB/x4bBh xi9b1MY2VjkobWES6fyW/atolCEpgwU4xhsLl3w534P9oFvCkLEp4GTxdFWBhepl K3usGr0t1QhmHy1hKw++WGsAkMRHocf8nIBqxxdDNWpZvOif2z+weLYbOn+TXTM= =1Yvj -----END PGP SIGNATURE----- Merge tag 'dm-3.15-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm Pull device mapper fixes from Mike Snitzer: "A few dm-thinp fixes for changes merged in 3.15-rc1. A dm-verity fix for an immutable biovec regression that affects 3.14+. A dm-cache fix to properly quiesce when using writethrough mode (3.14+)" * tag 'dm-3.15-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: dm cache: fix writethrough mode quiescing in cache_map dm thin: use INIT_WORK_ONSTACK in noflush_work to avoid ODEBUG warning dm verity: fix biovecs hash calculation regression dm thin: fix rcu_read_lock being held in code that can sleep dm thin: irqsave must always be used with the pool->lock spinlock	2014-05-02 14:14:02 -07:00
Mike Snitzer	131cd131a9	dm cache: fix writethrough mode quiescing in cache_map Commit `2ee57d5873` ("dm cache: add passthrough mode") inadvertently removed the deferred set reference that was taken in cache_map()'s writethrough mode support. Restore taking this reference. This issue was found with code inspection. Signed-off-by: Mike Snitzer <snitzer@redhat.com> Acked-by: Joe Thornber <ejt@redhat.com> Cc: stable@vger.kernel.org # 3.13+	2014-05-01 16:14:24 -04:00
Mike Snitzer	fbcde3d8b9	dm thin: use INIT_WORK_ONSTACK in noflush_work to avoid ODEBUG warning Use INIT_WORK_ONSTACK to silence "ODEBUG: object is on stack, but not annotated". Reported-by: Zdeněk Kabeláč <zkabelac@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Acked-by: Joe Thornber <ejt@redhat.com>	2014-04-29 11:22:04 -04:00
Peter Zijlstra	4e857c58ef	arch: Mass conversion of smp_mb__() Mostly scripted conversion of the smp_mb__ barriers. Signed-off-by: Peter Zijlstra <peterz@infradead.org> Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Link: http://lkml.kernel.org/n/tip-55dhyhocezdw1dg7u19hmh1u@git.kernel.org Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: linux-arch@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-04-18 14:20:48 +02:00
Linus Torvalds	23c1a60e2e	One BUG fix for md for recent commit -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.22 (GNU/Linux) iQIVAwUAU099lznsnt1WYoG5AQJIehAAoPdK4dUZ+A2g+hYxMbXCioakAaqDZwzt nFkYMZjJSan7yugkOpd9zBNR864c/9UYAnuggimimuXZuKu0N++Y8/ztJ7FjncDk 7/R3SPF8AtTaTm0BJ9mzK+/sfBxLRDl1v4Z+ZUAzweH6TTTLzKinuSgIXFObacV4 DjN2Cf1xZHHmUIXK3kzE0sNC+C8nVXlvFz4gdiCAeHloXMp78a//TucBaN9lpE4z +h3FN4++0w+2aFgURdddnmIhY6v76m1fWF7Q9qcbGcnXDnpAxis5CgprBcKGwNAa o0bbVl1MNWlcVxO1H1wafbxrXTQZwE71UE47ssXl6vqePUpM1tKVm5ZP2wFbIlTN kwIRne2oWmhsBw177K6WUohaY28wHohi+ukt6UzfX81Zm6HAnXnB5LLneEizRTO/ WBBftzoObiKJ758HIbPs6s300DoSw8CPs/CmdLO9ycxo1m2p2tmDz0802W5k2mO/ pFSxDGL43c91cnHaoJPAgrWOHf45Lo8IKxfUZDLVliuhgvNKLP+CSyMCLAiV7Kxc aeuI1a9fcmjc/+rRSpC62itzk9tQeinI9TR2iBZJUnQVnTfFoPU889tED6jkElbP E7A+XBHbuOiRisjynX4RebFb2t23ONSnRLd1/Ce3dkVnAB75v2Zbh0xZ1usHlrH4 3uPiETq2KiE= =CxEv -----END PGP SIGNATURE----- Merge tag '3.15-fixes' of git://neil.brown.name/md Pull md bugfix from Neil Brown: "One BUG fix for md for recent commit" * tag '3.15-fixes' of git://neil.brown.name/md: raid5: fix a race of stripe count check	2014-04-17 10:51:01 -07:00
Shaohua Li	c7a6d35e46	raid5: fix a race of stripe count check I hit another BUG_ON with `e240c1839d`. In __get_priority_stripe(), stripe count equals to 0 initially. Between atomic_inc and BUG_ON, get_active_stripe() finds the stripe. So the stripe count isn't 1 any more. V2: keeps the BUG_ON suggested by Neil. Signed-off-by: Shaohua Li <shli@fusionio.com> Signed-off-by: NeilBrown <neilb@suse.de>	2014-04-17 17:05:28 +10:00
Jens Axboe	b4f42e2831	block: remove struct request buffer member This was used in the olden days, back when onions were proper yellow. Basically it mapped to the current buffer to be transferred. With highmem being added more than a decade ago, most drivers map pages out of a bio, and rq->buffer isn't pointing at anything valid. Convert old style drivers to just use bio_data(). For the discard payload use case, just reference the page in the bio. Signed-off-by: Jens Axboe <axboe@fb.com>	2014-04-15 14:03:02 -06:00
Milan Broz	3a7745215e	dm verity: fix biovecs hash calculation regression Commit `003b5c5719` ("block: Convert drivers to immutable biovecs") incorrectly converted biovec iteration in dm-verity to always calculate the hash from a full biovec, but the function only needs to calculate the hash from part of the biovec (up to the calculated "todo" value). Fix this issue by limiting hash input to only the requested data size. This problem was identified using the cryptsetup regression test for veritysetup (verity-compat-test). Signed-off-by: Milan Broz <gmazyland@gmail.com> Acked-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Cc: stable@vger.kernel.org # 3.14+	2014-04-15 12:19:24 -04:00
Linus Torvalds	7f87307818	Just a few md patches for the 3.15 merge window. -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.22 (GNU/Linux) iQIVAwUAU0h5fjnsnt1WYoG5AQKmkw/8DUn0vV4q5UbLp0m2Yy6o6EOxwiSJUH/p 6EEUmSwyXou75w9OWOJKDX2lI7z1yAtzqiuCQ19ekD5lA6gAosja+D0jKjv0SA01 rQm2rMjnwOIxZUIRx/7Z+w/H1ZxjIAc8uxOcCBP6DOynWt/YAyVz1SzLLtwCxELL N9vjgb+4lVt0E9bBYvVRNiJmtVXpDcmn1YySd6Dqyj9t+Mmysnv8QuIStrT3CE7k apss3ew6bBbtbiJHuCno/Q4FDWVAhUH+9GMvksdajw8QW52oHV+RRBB5IpCU6hOx OKCT4MVdzmTgi6GRhSr86Dt+KMOLWZmbx7pK7aRQPiL6uFNhqAlJDb/u0xfaHohG DiRclZBbsHkEpejHaZcJCkyKFHQTEiia3JVk426FAhtiK1qIBuyxEc65RmKf6dsA 1KlZVeclD3wYWKG4hWk/0W3qIPOWBMll+Ely5Zg6s2X3gGy9u5TU+tUsfJaL1aDU NOY+5D0+hg7o21kK7WgTaP2upexC/iaBrVrdlasM2KYXJVDrsfCAQr1/BwTl4qLq Lm/OkIg+WtrQ95RvsI85Hm4PJVxBd1HeyDlKNCcz47kc3Xxqabeq8KnwERyOh4hU U4EmAeCZmSGOOETIWQQxlIn8XdM1+dF4olUH9viEAXrQfGgUyrg6Vcc7BOdTTZCY 8ek3CWG3TwE= =kHl3 -----END PGP SIGNATURE----- Merge tag 'md/3.15' of git://neil.brown.name/md Pull md updates from Neil Brown: "Just a few md patches for the 3.15 merge window. Not much happening in md/raid at the moment. Just a few bug fixes (one for -stable) and a couple of performance tweaks" * tag 'md/3.15' of git://neil.brown.name/md: raid5: get_active_stripe avoids device_lock raid5: make_request does less prepare wait md: avoid oops on unload if some process is in poll or select. md/raid1: r1buf_pool_alloc: free allocate pages when subsequent allocation fails. md/bitmap: don't abuse i_writecount for bitmap files.	2014-04-11 17:20:38 -07:00
Shaohua Li	e240c1839d	raid5: get_active_stripe avoids device_lock For sequential workload (or request size big workload), get_active_stripe can find cached stripe. In this case, we always hold device_lock, which exposes a lot of lock contention for such workload. If stripe count isn't 0, we don't need hold the lock actually, since we just increase its count. And this is the hot code path for such workload. Unfortunately we must delete the BUG_ON. Signed-off-by: Shaohua Li <shli@fusionio.com> Signed-off-by: NeilBrown <neilb@suse.de>	2014-04-09 14:42:42 +10:00
Shaohua Li	27c0f68f07	raid5: make_request does less prepare wait In NUMA machine, prepare_to_wait/finish_wait in make_request exposes a lot of contention for sequential workload (or big request size workload). For such workload, each bio includes several stripes. So we can just do prepare_to_wait/finish_wait once for the whold bio instead of every stripe. This reduces the lock contention completely for such workload. Random workload might have the similar lock contention too, but I didn't see it yet, maybe because my stroage is still not fast enough. Signed-off-by: Shaohua Li <shli@fusionio.com> Signed-off-by: NeilBrown <neilb@suse.de>	2014-04-09 14:42:38 +10:00

1 2 3 4 5 ...

3274 Commits