linux

iv/linux

Author	SHA1	Message	Date
Jens Axboe	ffc4e75957	cfq-iosched: add hlist for browsing parallel to the radix tree It's cumbersome to browse a radix tree from start to finish, especially since we modify keys when a process exits. So add a hlist for the single purpose of browsing over all known cfq_io_contexts, used for exit, io prio change, etc. This fixes http://bugzilla.kernel.org/show_bug.cgi?id=9948 Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2008-02-19 10:04:00 +01:00
Jens Axboe	fe094d98e7	cfq-iosched: make checkpatch compliant Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2008-02-01 09:26:33 +01:00
Jens Axboe	febffd6181	cfq-iosched: kill some big inlines Use of inlines were a bit over the top, trim them down a bit. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2008-01-28 13:19:43 +01:00
Jens Axboe	0871714e08	cfq-iosched: relax IOPRIO_CLASS_IDLE restrictions Currently you must be root to set idle io prio class on a process. This is due to the fact that the idle class is implemented as a true idle class, meaning that it will not make progress if someone else is requesting disk access. Unfortunately this means that it opens DOS opportunities by locking down file system resources, hence it is root only at the moment. This patch relaxes the idle class a little, by removing the truly idle part (which entals a grace period with associated timer). The modifications make the idle class as close to zero impact as can be done while still guarenteeing progress. This means we can relax the root only criteria as well. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2008-01-28 11:38:15 +01:00
Jens Axboe	4ac845a2e9	block: cfq: make the io contect sharing lockless The io context sharing introduced a per-ioc spinlock, that would protect the cfq io context lookup. That is a regression from the original, since we never needed any locking there because the ioc/cic were process private. The cic lookup is changed from an rbtree construct to a radix tree, which we can then use RCU to make the reader side lockless. That is the performance critical path, modifying the radix tree is only done on process creation (when that process first does IO, actually) and on process exit (if that process has done IO). As it so happens, radix trees are also much faster for this type of lookup where the key is a pointer. It's a very sparse tree. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2008-01-28 10:50:33 +01:00
Nikanth Karthikesan	66dac98ed0	io_context sharing - cfq changes changes in the cfq for io_context sharing Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2008-01-28 10:50:32 +01:00
Jens Axboe	fd0928df98	ioprio: move io priority from task_struct to io_context This is where it belongs and then it doesn't take up space for a process that doesn't do IO. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2008-01-28 10:50:29 +01:00
Adrian Bunk	2fdd82bd88	block: let elv_register() return void elv_register() always returns 0, and there isn't anything it does where it should return an error (the only error condition is so grave that it's handled with a BUG_ON). Signed-off-by: Adrian Bunk <bunk@kernel.org> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2007-12-18 08:29:28 +01:00
Oleg Nesterov	0e7be9edb9	cfq_idle_class_timer: add paranoid checks for jiffies overflow In theory, if the queue was idle long enough, cfq_idle_class_timer may have a false (and very long) timeout because jiffies can wrap into the past wrt ->last_end_request. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2007-11-07 13:51:35 +01:00
Oleg Nesterov	b70c864d3c	cfq: fix IOPRIO_CLASS_IDLE delays After the fresh boot: ionice -c3 -p $$ echo cfq >> /sys/block/XXX/queue/scheduler dd if=/dev/XXX of=/dev/null bs=512 count=1 Now dd hangs in D state and the queue is completely stalled for approximately INITIAL_JIFFIES + CFQ_IDLE_GRACE jiffies. This is because cfq_init_queue() forgets to initialize cfq_data->last_end_request. (I guess this patch is not complete, overflow is still possible) Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2007-11-07 09:46:13 +01:00
Oleg Nesterov	2389d1ef17	cfq: fix IOPRIO_CLASS_IDLE accounting Spotted by Nick <gentuu@gmail.com>, hopefully can explain the second trace in http://bugzilla.kernel.org/show_bug.cgi?id=9180. If ->async_idle_cfqq != NULL cfq_put_async_queues() puts it IOPRIO_BE_NR times in a loop. Fix this. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2007-11-07 09:45:00 +01:00
Oleg Nesterov	0a0836a09c	cfq_get_queue: fix possible NULL pointer access cfq_get_queue()->cfq_find_alloc_queue() can fail, check the returned value. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Note that this isn't a bug at the moment, since the regular IO path does not call this path without __GFP_WAIT set. However, it could be a future bug, so I've applied it. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2007-10-29 11:33:05 +01:00
Oleg Nesterov	4310864b9d	cfq_exit_queue() should cancel cfq_data->unplug_work Spotted by Nick <gentuu@gmail.com>, perhaps explains the first trace in http://bugzilla.kernel.org/show_bug.cgi?id=9180. cfq_exit_queue() should cancel cfqd->unplug_work before freeing cfqd. blk_sync_queue() seems unneeded, removed. Q: why cfq_exit_queue() calls cfq_shutdown_timer_wq() twice? Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2007-10-29 11:33:05 +01:00
Jens Axboe	165125e1e4	[BLOCK] Get rid of request_queue_t typedef Some of the code has been gradually transitioned to using the proper struct request_queue, but there's lots left. So do a full sweet of the kernel and get rid of this typedef and replace its uses with the proper type. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2007-07-24 09:28:11 +02:00
Alexey Dobriyan	8350163a90	cfq: Write-only stuff in CFQ data structures There are some leftover bits from the task cooperator patch, that was yanked out again. While it will get reintroduced, no point in having this write-only stuff in the tree. So yank it. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2007-07-20 10:07:50 +02:00
Vasily Tarasov	c2dea2d1fd	cfq: async queue allocation per priority If we have two processes with different ioprio_class, but the same ioprio_data, their async requests will fall into the same queue. I guess such behavior is not expected, because it's not right to put real-time requests and best-effort requests in the same queue. The attached patch fixes the problem by introducing additional *cfqq fields on cfqd, pointing to per-(class,priority) async queues. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2007-07-20 10:06:38 +02:00
Christoph Lameter	94f6030ca7	Slab allocators: Replace explicit zeroing with __GFP_ZERO kmalloc_node() and kmem_cache_alloc_node() were not available in a zeroing variant in the past. But with __GFP_ZERO it is possible now to do zeroing while allocating. Use __GFP_ZERO to remove the explicit clearing of memory via memset whereever we can. Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-17 10:23:02 -07:00
Jens Axboe	15c31be4d5	cfq-iosched: fix async queue behaviour With the cfq_queue hash removal, we inadvertently got rid of the async queue sharing. This was not intentional, in fact CFQ purposely shares the async queue per priority level to get good merging for async writes. So put some logic in cfq_get_queue() to track the shared queues. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2007-07-10 13:43:25 +02:00
Christoph Lameter	0a31bd5f2b	KMEM_CACHE(): simplify slab cache creation This patch provides a new macro KMEM_CACHE(<struct>, <flags>) to simplify slab creation. KMEM_CACHE creates a slab with the name of the struct, with the size of the struct and with the alignment of the struct. Additional slab flags may be specified if necessary. Example struct test_slab { int a,b,c; struct list_head; } __cacheline_aligned_in_smp; test_slab_cache = KMEM_CACHE(test_slab, SLAB_PANIC) will create a new slab named "test_slab" of the size sizeof(struct test_slab) and aligned to the alignment of test slab. If it fails then we panic. Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-07 12:12:55 -07:00
Jens Axboe	597bc485d6	cfq-iosched: speedup cic rb lookup We often lookup the same queue many times in succession, so cache the last looked up queue to avoid browsing the rbtree. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2007-04-30 09:01:23 +02:00
Vasily Tarasov	91fac317a3	cfq-iosched: get rid of cfqq hash cfq hash is no more necessary. We always can get cfqq from io context. cfq_get_io_context_noalloc() function is introduced, because we don't want to allocate cic on merging and checking may_queue. In order to identify sync queue we've used hash key = CFQ_KEY_ASYNC. Since hash is eliminated we need to use other criterion: sync flag for queue is added. In all places where we dig in rb_tree we're in current context, so no additional locking is required. Advantages of this patch: no additional memory for hash, no seeking in hash, code is cleaner. But it is necessary now to seek cic in per-ioc rbtree, but it is faster: - most processes work only with few devices - most systems have only few block devices - it is a rb-tree Signed-off-by: Vasily Tarasov <vtaras@openvz.org> Changes by me: - Merge into CFQ devel branch - Get rid of cfq_get_io_context_noalloc() - Fix various bugs with dereferencing cic->cfqq[] with offset other than 0 or 1. - Fix bug in cfqq setup, is_sync condition was reversed. - Fix bug where only bio_sync() is used, we need to check for a READ too Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2007-04-30 09:01:23 +02:00
Jens Axboe	cc19747977	cfq-iosched: tighten queue request overlap condition For tagged devices, allow overlap of requests if the idle window isn't enabled on the current active queue. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2007-04-30 09:01:23 +02:00
Jens Axboe	3ed9a2965c	cfq-iosched: improve sync vs async workloads Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2007-04-30 09:01:23 +02:00
Jens Axboe	1be92f2fc7	cfq-iosched: never allow an async queue idling We don't enable it by default, don't let it get enabled during runtime. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2007-04-30 09:01:22 +02:00
Jens Axboe	20e493a8d0	cfq-iosched: get rid of ->dispatch_slice We can track it fairly accurately locally, let the slice handling take care of the rest. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2007-04-30 09:01:22 +02:00
Jens Axboe	6084cdda0e	cfq-iosched: don't pass unused preemption variable around We don't use it anymore in the slice expiry handling. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2007-04-30 09:01:22 +02:00
Jens Axboe	edd75ffd92	cfq-iosched: get rid of ->cur_rr and ->cfq_list It's only used for preemption now that the IDLE and RT queues also use the rbtree. If we pass an 'add_front' variable to cfq_service_tree_add(), we can set ->rb_key to 0 to force insertion at the front of the tree. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2007-04-30 09:01:22 +02:00
Jens Axboe	67e6b49e39	cfq-iosched: slice offset should take ioprio into account Use the max_slice-cur_slice as the multipler for the insertion offset. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2007-04-30 09:01:22 +02:00
Jens Axboe	498d3aa2b4	[PATCH] cfq-iosched: style cleanups and comments Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2007-04-30 09:01:22 +02:00
Jens Axboe	67060e3799	cfq-iosched: sort IDLE queues into the rbtree Same treatment as the RT conversion, just put the sorted idle branch at the end of the tree. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2007-04-30 09:01:22 +02:00
Jens Axboe	0c534e0a46	cfq-iosched: sort RT queues into the rbtree Currently CFQ does a linked insert into the current list for RT queues. We can just factor the class into the rb insertion, and then we don't have to treat RT queues in a special way. It's faster, too. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2007-04-30 09:01:22 +02:00
Jens Axboe	cc09e2990f	[PATCH] cfq-iosched: speed up rbtree handling For cases where the rbtree is mainly used for sorting and min retrieval, a nice speedup of the rbtree code is to maintain a cache of the leftmost node in the tree. Also spotted in the CFS CPU scheduler code. Improved by Alan D. Brunelle <Alan.Brunelle@hp.com> by updating the leftmost hint in cfq_rb_first() if it isn't set, instead of only updating it on insert. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2007-04-30 09:01:21 +02:00
Jens Axboe	d9e7620e60	cfq-iosched: rework the whole round-robin list concept Drawing on some inspiration from the CFS CPU scheduler design, overhaul the pending cfq_queue concept list management. Currently CFQ uses a doubly linked list per priority level for sorting and service uses. Kill those lists and maintain an rbtree of cfq_queue's, sorted by when to service them. This unfortunately means that the ionice levels aren't as strong anymore, will work on improving those later. We only scale the slice time now, not the number of times we service. This means that latency is better (for all priority levels), but that the distinction between the highest and lower levels aren't as big. The diffstat speaks for itself. cfq-iosched.c \| 363 +++++++++++++++++--------------------------------- 1 file changed, 125 insertions(+), 238 deletions(-) Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2007-04-30 09:01:21 +02:00
Jens Axboe	1afba0451c	cfq-iosched: minor updates - Move the queue_new flag clear to when the queue is selected - Only select the non-first queue in cfq_get_best_queue(), if there's a substantial difference between the best and first. - Get rid of ->busy_rr - Only select a close cooperator, if the current queue is known to take a while to "think". Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2007-04-30 09:01:21 +02:00
Jens Axboe	6d048f5310	cfq-iosched: development update - Implement logic for detecting cooperating processes, so we choose the best available queue whenever possible. - Improve residual slice time accounting. - Remove dead code: we no longer see async requests coming in on sync queues. That part was removed a long time ago. That means that we can also remove the difference between cfq_cfqq_sync() and cfq_cfqq_class_sync(), they are now indentical. And we can kill the on_dispatch array, just make it a counter. - Allow a process to go into the current list, if it hasn't been serviced in this scheduler tick yet. Possible future improvements including caching the cfqq lookup in cfq_close_cooperator(), so we don't have to look it up twice. cfq_get_best_queue() should just use that last decision instead of doing it again. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2007-04-30 09:01:21 +02:00
Jens Axboe	1e3335de05	cfq-iosched: improve preemption for cooperating tasks When testing the syslet async io approach, I discovered that CFQ sometimes didn't perform as well as expected. cfq_should_preempt() needs to better check for cooperating tasks, so fix that by allowing preemption of an equal priority queue if the recently queued request is as good a candidate for IO as the one we are currently waiting for. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2007-04-30 09:01:21 +02:00
Jens Axboe	5044eed488	cfq-iosched: fix alias + front merge bug There's a really rare and obscure bug in CFQ, that causes a crash in cfq_dispatch_insert() due to rq == NULL. One example of the resulting oops is seen here: http://lkml.org/lkml/2007/4/15/41 Neil correctly diagnosed the situation for how this can happen: if two concurrent requests with the exact same sector number (due to direct IO or aliasing between MD and the raw device access), the alias handling will add the request to the sortlist, but next_rq remains NULL. Read the more complete analysis at: http://lkml.org/lkml/2007/4/25/57 This looks like it requires md to trigger, even though it should potentially be possible to due with O_DIRECT (at least if you edit the kernel and doctor some of the unplug calls). The fix is to move the ->next_rq update to when we add a request to the rbtree. Then we remove the possibility for a request to exist in the rbtree code, but not have ->next_rq correctly updated. Signed-off-by: Jens Axboe <jens.axboe@oracle.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-04-25 08:41:48 -07:00
Jens Axboe	a993800655	cfq-iosched: fix sequential write regression We have a 10-15% performance regression for sequential writes on TCQ/NCQ enabled drives in 2.6.21-rcX after the CFQ update went in. It has been reported by Valerie Clement <valerie.clement@bull.net> and the Intel testing folks. The regression is because of CFQ's now more aggressive queue control, limiting the depth available to the device. This patches fixes that regression by allowing a greater depth when only one queue is busy. It has been tested to not impact sync-vs-async workloads too much - we still do a lot better than 2.6.20. Signed-off-by: Jens Axboe <jens.axboe@oracle.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-04-20 22:56:29 -07:00
Jens Axboe	9ede209e83	cfq-iosched: improve continue or break logic in cfq_dispatch This improves performance considerably for sync requests when you have command queuing enabled. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2007-02-11 23:14:45 +01:00
Jens Axboe	28f95cbc3e	cfq-iosched: remove the implicit queue kicking in slice expire We only really need it for a process going away, so move it to those locations. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2007-02-11 23:14:45 +01:00
Jens Axboe	3c6bd2f879	cfq-iosched: check whether a queue timed out in accounting Makes it more fair for the residual slice count. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2007-02-11 23:14:45 +01:00
Jens Axboe	cb8874119e	cfq-iosched: tweak the FIFO checking We currently check the FIFO once per slice. Optimize that a bit and only do it as the first thing for a new slice, so we don't end up doing a single request and then seek to the FIFO requests. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2007-02-11 23:14:45 +01:00
Jens Axboe	1792669cc1	cfq-iosched: don't pass in queue for cfq_arm_slice_timer() It must always be the active queue, otherwise it's a bug. So just use the active_queue, don't pass it in explicitly. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2007-02-11 23:14:45 +01:00
Jens Axboe	c5b680f3b7	cfq-iosched: account for slice over/under time If a slice uses less than it is entitled to (or perhaps more), include that in the decision on how much time to give it the next time it gets serviced. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2007-02-11 23:14:45 +01:00
Jens Axboe	44f7c16065	cfq-iosched: defer slice activation to first request being active This better matches what time the queue is actually spending doing IO. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2007-02-11 23:14:45 +01:00
Jens Axboe	99f9628aba	[PATCH] cfq-iosched: use last service point as the fairness criteria Right now we use slice_start, which gives async queues an unfair advantage. Chance that to service_last, and base the resorter on that. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2007-02-11 23:14:45 +01:00
Jens Axboe	b0b8d74941	cfq-iosched: document the cfqq flags Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2007-02-11 23:14:44 +01:00
Jens Axboe	98e41c7dfc	[PATCH] cfq-iosched: move on_rr check into cfq_resort_rr_list() Move the on_rr check into cfq_resort_rr_list(), every call site needs to check it anyway. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2007-02-11 23:14:44 +01:00
Jens Axboe	aaf1228ddf	cfq-iosched: remove cfq_io_context last_queue It hasn't been used for a while, kill it off and remove the old if 0 code chunk. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2007-02-11 23:14:44 +01:00
Jens Axboe	ec8acb6904	[PATCH] cfq-iosched: merging problem Two issues: - The final return 1 should be a return 0, otherwise comparing cfqq is a noop. - bio_sync() only checks the sync flag, while rq_is_sync() checks both for READ and sync. The latter is what we want. Expand the bio check to include reads, and relax the restriction to allow merging of async io into sync requests. In the future we want to clean up the SYNC logic, right now it means both sync request (such as READ and O_DIRECT WRITE) and unplug-on-issue. Leave that for later. Signed-off-by: Jens Axboe <jens.axboe@oracle.com> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2007-01-02 09:46:16 -08:00

1 2 3

125 Commits