linux/drivers/md
Shaohua Li 566c09c534 raid5: relieve lock contention in get_active_stripe()
get_active_stripe() is the last place we have lock contention. It has two
paths. One is stripe isn't found and new stripe is allocated, the other is
stripe is found.

The first path basically calls __find_stripe and init_stripe. It accesses
conf->generation, conf->previous_raid_disks, conf->raid_disks,
conf->prev_chunk_sectors, conf->chunk_sectors, conf->max_degraded,
conf->prev_algo, conf->algorithm, the stripe_hashtbl and inactive_list. Except
stripe_hashtbl and inactive_list, other fields are changed very rarely.

With this patch, we split inactive_list and add new hash locks. Each free
stripe belongs to a specific inactive list. Which inactive list is determined
by stripe's lock_hash. Note, even a stripe hasn't a sector assigned, it has a
lock_hash assigned. Stripe's inactive list is protected by a hash lock, which
is determined by it's lock_hash too. The lock_hash is derivied from current
stripe_hashtbl hash, which guarantees any stripe_hashtbl list will be assigned
to a specific lock_hash, so we can use new hash lock to protect stripe_hashtbl
list too. The goal of the new hash locks introduced is we can only use the new
locks in the first path of get_active_stripe(). Since we have several hash
locks, lock contention is relieved significantly.

The first path of get_active_stripe() accesses other fields, since they are
changed rarely, changing them now need take conf->device_lock and all hash
locks. For a slow path, this isn't a problem.

If we need lock device_lock and hash lock, we always lock hash lock first. The
tricky part is release_stripe and friends. We need take device_lock first.
Neil's suggestion is we put inactive stripes to a temporary list and readd it
to inactive_list after device_lock is released. In this way, we add stripes to
temporary list with device_lock hold and remove stripes from the list with hash
lock hold. So we don't allow concurrent access to the temporary list, which
means we need allocate temporary list for all participants of release_stripe.

One downside is free stripes are maintained in their inactive list, they can't
across between the lists. By default, we have total 256 stripes and 8 lists, so
each list will have 32 stripes. It's possible one list has free stripe but
other list hasn't. The chance should be rare because stripes allocation are
even distributed. And we can always allocate more stripes for cache, several
mega bytes memory isn't a big deal.

This completely removes the lock contention of the first path of
get_active_stripe(). It slows down the second code path a little bit though
because we now need takes two locks, but since the hash lock isn't contended,
the overhead should be quite small (several atomic instructions). The second
path of get_active_stripe() (basically sequential write or big request size
randwrite) still has lock contentions.

Signed-off-by: Shaohua Li <shli@fusionio.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2013-11-14 15:20:58 +11:00
..
bcache bcache: Fixed incorrect order of arguments to bio_alloc_bioset() 2013-10-23 07:55:36 +01:00
persistent-data dm space map: optimise sm_ll_dec and sm_ll_inc 2013-08-23 09:02:14 -04:00
bitmap.c sysfs: clean up sysfs_get_dirent() 2013-09-26 15:33:18 -07:00
bitmap.h md/bitmap: record the space available for the bitmap in the superblock. 2012-05-22 13:55:34 +10:00
dm-bio-prison.c dm: add cache target 2013-03-01 22:45:51 +00:00
dm-bio-prison.h dm: add cache target 2013-03-01 22:45:51 +00:00
dm-bio-record.h
dm-bufio.c drivers: convert shrinkers to new count/scan API 2013-09-10 18:56:32 -04:00
dm-bufio.h dm bufio: prefetch 2012-03-28 18:41:29 +01:00
dm-cache-block-types.h dm: add cache target 2013-03-01 22:45:51 +00:00
dm-cache-metadata.c dm cache: replace memcpy with struct assignment 2013-05-10 14:37:18 +01:00
dm-cache-metadata.h dm cache: policy ignore hints if generated by different version 2013-03-20 17:21:28 +00:00
dm-cache-policy-cleaner.c dm cache: policy change version from string to integer set 2013-03-20 17:21:27 +00:00
dm-cache-policy-internal.h dm cache: policy change version from string to integer set 2013-03-20 17:21:27 +00:00
dm-cache-policy-mq.c dm cache: avoid conflicting remove_mapping() in mq policy 2013-08-16 15:56:51 -04:00
dm-cache-policy.c dm cache: policy change version from string to integer set 2013-03-20 17:21:27 +00:00
dm-cache-policy.h dm cache policy: fix description of lookup fn 2013-05-10 14:37:17 +01:00
dm-cache-target.c dm cache: eliminate holes in cache structure 2013-08-23 09:02:14 -04:00
dm-crypt.c dm: stop using WQ_NON_REENTRANT 2013-08-23 09:02:13 -04:00
dm-delay.c dm: rename request variables to bios 2013-03-01 22:45:47 +00:00
dm-exception-store.c dm: replace simple_strtoul 2012-07-27 15:07:59 +01:00
dm-exception-store.h
dm-flakey.c dm flakey: correct ctr alloc failure mesg 2013-07-10 23:41:17 +01:00
dm-io.c dm: add reserved_bio_based_ios module parameter 2013-09-23 10:42:24 -04:00
dm-ioctl.c dm: add statistics support 2013-09-05 20:46:06 -04:00
dm-kcopyd.c dm: stop using WQ_NON_REENTRANT 2013-08-23 09:02:13 -04:00
dm-linear.c dm: rename request variables to bios 2013-03-01 22:45:47 +00:00
dm-log-userspace-base.c
dm-log-userspace-transfer.c connector/userns: replace netlink uses of cap_raised() with capable() 2012-05-10 23:21:39 -04:00
dm-log-userspace-transfer.h
dm-log.c dm: use memweight() 2012-07-30 17:25:16 -07:00
dm-mpath.c dm: add reserved_rq_based_ios module parameter 2013-09-23 10:42:24 -04:00
dm-mpath.h
dm-path-selector.c
dm-path-selector.h
dm-queue-length.c dm: reject trailing characters in sccanf input 2012-03-28 18:41:26 +01:00
dm-raid1.c dm: stop using WQ_NON_REENTRANT 2013-08-23 09:02:13 -04:00
dm-raid.c MD: Remember the last sync operation that was performed 2013-06-26 12:38:24 +10:00
dm-region-hash.c dm raid1: fix crash with mirror recovery and discard 2012-07-20 14:25:03 +01:00
dm-round-robin.c dm: reject trailing characters in sccanf input 2012-03-28 18:41:26 +01:00
dm-service-time.c dm: reject trailing characters in sccanf input 2012-03-28 18:41:26 +01:00
dm-snap-persistent.c dm snapshot: fix data corruption 2013-10-16 03:17:47 +01:00
dm-snap-transient.c
dm-snap.c dm-snapshot: fix performance degradation due to small hash size 2013-09-20 10:36:34 -04:00
dm-stats.c dm stats: fix possible counter corruption on 32-bit systems 2013-09-18 14:41:06 -04:00
dm-stats.h dm: add statistics support 2013-09-05 20:46:06 -04:00
dm-stripe.c dm stripe: silence a couple sparse warnings 2013-09-06 11:36:01 -04:00
dm-switch.c dm: add switch target 2013-07-10 23:41:19 +01:00
dm-sysfs.c
dm-table.c dm ioctl: increase granularity of type_lock when loading table 2013-09-05 20:46:06 -04:00
dm-target.c dm: allow error target to replace bio-based and request-based targets 2013-09-05 20:46:05 -04:00
dm-thin-metadata.c dm thin: generate event when metadata threshold passed 2013-05-10 14:37:21 +01:00
dm-thin-metadata.h dm thin: generate event when metadata threshold passed 2013-05-10 14:37:21 +01:00
dm-thin.c dm thin: do not expose non-zero discard limits if discards disabled 2013-09-23 10:42:06 -04:00
dm-uevent.c
dm-uevent.h
dm-verity.c dm verity: use __ffs and __fls 2013-07-10 23:41:17 +01:00
dm-zero.c dm: rename request variables to bios 2013-03-01 22:45:47 +00:00
dm.c dm: add reserved_bio_based_ios module parameter 2013-09-23 10:42:24 -04:00
dm.h dm: add reserved_bio_based_ios module parameter 2013-09-23 10:42:24 -04:00
faulty.c block: Add bio_end_sector() 2013-03-23 14:15:29 -07:00
Kconfig dm: add switch target 2013-07-10 23:41:19 +01:00
linear.c block: Add bio_end_sector() 2013-03-23 14:15:29 -07:00
linear.h
Makefile dm: add statistics support 2013-09-05 20:46:06 -04:00
md.c md: fix calculation of stacking limits on level change. 2013-11-14 15:16:15 +11:00
md.h sysfs: clean up sysfs_get_dirent() 2013-09-26 15:33:18 -07:00
multipath.c MD: change the parameter of md thread 2012-10-11 13:34:00 +11:00
multipath.h
raid0.c md: fix buglet in RAID5 -> RAID0 conversion. 2013-06-26 12:38:19 +10:00
raid0.h md: add proper merge_bvec handling to RAID0 and Linear. 2012-03-19 12:46:39 +11:00
raid1.c md: Fix skipping recovery for read-only arrays. 2013-10-24 12:55:17 +11:00
raid1.h md/raid1: prevent merging too large request 2012-07-31 10:03:53 +10:00
raid5.c raid5: relieve lock contention in get_active_stripe() 2013-11-14 15:20:58 +11:00
raid5.h raid5: relieve lock contention in get_active_stripe() 2013-11-14 15:20:58 +11:00
raid10.c md: Fix skipping recovery for read-only arrays. 2013-10-24 12:55:17 +11:00
raid10.h MD RAID10: Improve redundancy for 'far' and 'offset' algorithms (part 1) 2013-02-26 11:55:30 +11:00