[PATCH] 05/05 update biodoc to match new generic dispatch api
Updates biodoc to reflect changes in elevator API Signed-off-by: Tejun Heo <htejun@gmail.com> Signed-off-by: Jens Axboe <axboe@suse.de>
This commit is contained in:
parent
98b11471d7
commit
4c9f783640
@ -906,9 +906,20 @@ Aside:
|
||||
|
||||
|
||||
4. The I/O scheduler
|
||||
I/O schedulers are now per queue. They should be runtime switchable and modular
|
||||
but aren't yet. Jens has most bits to do this, but the sysfs implementation is
|
||||
missing.
|
||||
I/O scheduler, a.k.a. elevator, is implemented in two layers. Generic dispatch
|
||||
queue and specific I/O schedulers. Unless stated otherwise, elevator is used
|
||||
to refer to both parts and I/O scheduler to specific I/O schedulers.
|
||||
|
||||
Block layer implements generic dispatch queue in ll_rw_blk.c and elevator.c.
|
||||
The generic dispatch queue is responsible for properly ordering barrier
|
||||
requests, requeueing, handling non-fs requests and all other subtleties.
|
||||
|
||||
Specific I/O schedulers are responsible for ordering normal filesystem
|
||||
requests. They can also choose to delay certain requests to improve
|
||||
throughput or whatever purpose. As the plural form indicates, there are
|
||||
multiple I/O schedulers. They can be built as modules but at least one should
|
||||
be built inside the kernel. Each queue can choose different one and can also
|
||||
change to another one dynamically.
|
||||
|
||||
A block layer call to the i/o scheduler follows the convention elv_xxx(). This
|
||||
calls elevator_xxx_fn in the elevator switch (drivers/block/elevator.c). Oh,
|
||||
@ -921,44 +932,36 @@ keeping work.
|
||||
The functions an elevator may implement are: (* are mandatory)
|
||||
elevator_merge_fn called to query requests for merge with a bio
|
||||
|
||||
elevator_merge_req_fn " " " with another request
|
||||
elevator_merge_req_fn called when two requests get merged. the one
|
||||
which gets merged into the other one will be
|
||||
never seen by I/O scheduler again. IOW, after
|
||||
being merged, the request is gone.
|
||||
|
||||
elevator_merged_fn called when a request in the scheduler has been
|
||||
involved in a merge. It is used in the deadline
|
||||
scheduler for example, to reposition the request
|
||||
if its sorting order has changed.
|
||||
|
||||
*elevator_next_req_fn returns the next scheduled request, or NULL
|
||||
if there are none (or none are ready).
|
||||
elevator_dispatch_fn fills the dispatch queue with ready requests.
|
||||
I/O schedulers are free to postpone requests by
|
||||
not filling the dispatch queue unless @force
|
||||
is non-zero. Once dispatched, I/O schedulers
|
||||
are not allowed to manipulate the requests -
|
||||
they belong to generic dispatch queue.
|
||||
|
||||
*elevator_add_req_fn called to add a new request into the scheduler
|
||||
elevator_add_req_fn called to add a new request into the scheduler
|
||||
|
||||
elevator_queue_empty_fn returns true if the merge queue is empty.
|
||||
Drivers shouldn't use this, but rather check
|
||||
if elv_next_request is NULL (without losing the
|
||||
request if one exists!)
|
||||
|
||||
elevator_remove_req_fn This is called when a driver claims ownership of
|
||||
the target request - it now belongs to the
|
||||
driver. It must not be modified or merged.
|
||||
Drivers must not lose the request! A subsequent
|
||||
call of elevator_next_req_fn must return the
|
||||
_next_ request.
|
||||
|
||||
elevator_requeue_req_fn called to add a request to the scheduler. This
|
||||
is used when the request has alrnadebeen
|
||||
returned by elv_next_request, but hasn't
|
||||
completed. If this is not implemented then
|
||||
elevator_add_req_fn is called instead.
|
||||
|
||||
elevator_former_req_fn
|
||||
elevator_latter_req_fn These return the request before or after the
|
||||
one specified in disk sort order. Used by the
|
||||
block layer to find merge possibilities.
|
||||
|
||||
elevator_completed_req_fn called when a request is completed. This might
|
||||
come about due to being merged with another or
|
||||
when the device completes the request.
|
||||
elevator_completed_req_fn called when a request is completed.
|
||||
|
||||
elevator_may_queue_fn returns true if the scheduler wants to allow the
|
||||
current context to queue a new request even if
|
||||
@ -967,13 +970,33 @@ elevator_may_queue_fn returns true if the scheduler wants to allow the
|
||||
|
||||
elevator_set_req_fn
|
||||
elevator_put_req_fn Must be used to allocate and free any elevator
|
||||
specific storate for a request.
|
||||
specific storage for a request.
|
||||
|
||||
elevator_activate_req_fn Called when device driver first sees a request.
|
||||
I/O schedulers can use this callback to
|
||||
determine when actual execution of a request
|
||||
starts.
|
||||
elevator_deactivate_req_fn Called when device driver decides to delay
|
||||
a request by requeueing it.
|
||||
|
||||
elevator_init_fn
|
||||
elevator_exit_fn Allocate and free any elevator specific storage
|
||||
for a queue.
|
||||
|
||||
4.2 I/O scheduler implementation
|
||||
4.2 Request flows seen by I/O schedulers
|
||||
All requests seens by I/O schedulers strictly follow one of the following three
|
||||
flows.
|
||||
|
||||
set_req_fn ->
|
||||
|
||||
i. add_req_fn -> (merged_fn ->)* -> dispatch_fn -> activate_req_fn ->
|
||||
(deactivate_req_fn -> activate_req_fn ->)* -> completed_req_fn
|
||||
ii. add_req_fn -> (merged_fn ->)* -> merge_req_fn
|
||||
iii. [none]
|
||||
|
||||
-> put_req_fn
|
||||
|
||||
4.3 I/O scheduler implementation
|
||||
The generic i/o scheduler algorithm attempts to sort/merge/batch requests for
|
||||
optimal disk scan and request servicing performance (based on generic
|
||||
principles and device capabilities), optimized for:
|
||||
@ -993,18 +1016,7 @@ request in sort order to prevent binary tree lookups.
|
||||
This arrangement is not a generic block layer characteristic however, so
|
||||
elevators may implement queues as they please.
|
||||
|
||||
ii. Last merge hint
|
||||
The last merge hint is part of the generic queue layer. I/O schedulers must do
|
||||
some management on it. For the most part, the most important thing is to make
|
||||
sure q->last_merge is cleared (set to NULL) when the request on it is no longer
|
||||
a candidate for merging (for example if it has been sent to the driver).
|
||||
|
||||
The last merge performed is cached as a hint for the subsequent request. If
|
||||
sequential data is being submitted, the hint is used to perform merges without
|
||||
any scanning. This is not sufficient when there are multiple processes doing
|
||||
I/O though, so a "merge hash" is used by some schedulers.
|
||||
|
||||
iii. Merge hash
|
||||
ii. Merge hash
|
||||
AS and deadline use a hash table indexed by the last sector of a request. This
|
||||
enables merging code to quickly look up "back merge" candidates, even when
|
||||
multiple I/O streams are being performed at once on one disk.
|
||||
@ -1013,29 +1025,8 @@ multiple I/O streams are being performed at once on one disk.
|
||||
are far less common than "back merges" due to the nature of most I/O patterns.
|
||||
Front merges are handled by the binary trees in AS and deadline schedulers.
|
||||
|
||||
iv. Handling barrier cases
|
||||
A request with flags REQ_HARDBARRIER or REQ_SOFTBARRIER must not be ordered
|
||||
around. That is, they must be processed after all older requests, and before
|
||||
any newer ones. This includes merges!
|
||||
|
||||
In AS and deadline schedulers, barriers have the effect of flushing the reorder
|
||||
queue. The performance cost of this will vary from nothing to a lot depending
|
||||
on i/o patterns and device characteristics. Obviously they won't improve
|
||||
performance, so their use should be kept to a minimum.
|
||||
|
||||
v. Handling insertion position directives
|
||||
A request may be inserted with a position directive. The directives are one of
|
||||
ELEVATOR_INSERT_BACK, ELEVATOR_INSERT_FRONT, ELEVATOR_INSERT_SORT.
|
||||
|
||||
ELEVATOR_INSERT_SORT is a general directive for non-barrier requests.
|
||||
ELEVATOR_INSERT_BACK is used to insert a barrier to the back of the queue.
|
||||
ELEVATOR_INSERT_FRONT is used to insert a barrier to the front of the queue, and
|
||||
overrides the ordering requested by any previous barriers. In practice this is
|
||||
harmless and required, because it is used for SCSI requeueing. This does not
|
||||
require flushing the reorder queue, so does not impose a performance penalty.
|
||||
|
||||
vi. Plugging the queue to batch requests in anticipation of opportunities for
|
||||
merge/sort optimizations
|
||||
iii. Plugging the queue to batch requests in anticipation of opportunities for
|
||||
merge/sort optimizations
|
||||
|
||||
This is just the same as in 2.4 so far, though per-device unplugging
|
||||
support is anticipated for 2.5. Also with a priority-based i/o scheduler,
|
||||
@ -1069,7 +1060,7 @@ Aside:
|
||||
blk_kick_queue() to unplug a specific queue (right away ?)
|
||||
or optionally, all queues, is in the plan.
|
||||
|
||||
4.3 I/O contexts
|
||||
4.4 I/O contexts
|
||||
I/O contexts provide a dynamically allocated per process data area. They may
|
||||
be used in I/O schedulers, and in the block layer (could be used for IO statis,
|
||||
priorities for example). See *io_context in drivers/block/ll_rw_blk.c, and
|
||||
|
Loading…
Reference in New Issue
Block a user