// SPDX-License-Identifier: GPL-2.0-or-later
/*
 * Hierarchical Budget Worst-case Fair Weighted Fair Queueing
 * (B-WF2Q+): hierarchical scheduling algorithm by which the BFQ I/O
 * scheduler schedules generic entities. The latter can represent
 * either single bfq queues (associated with processes) or groups of
 * bfq queues (associated with cgroups).
 */

#include "bfq-iosched.h"
/**
 * bfq_gt - compare two timestamps.
 * @a: first ts.
 * @b: second ts.
 *
 * Return @a > @b, dealing with wrapping correctly.
 */
static int bfq_gt(u64 a, u64 b)
{
	return (s64)(a - b) > 0;
}
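
The wrap-safe comparison in bfq_gt can be tried outside the kernel. The following is a minimal user-space sketch, not kernel code: `ts_gt` is a hypothetical stand-in for bfq_gt, with `uint64_t`/`int64_t` assumed to play the role of the kernel's `u64`/`s64`. It relies on the unsigned-to-signed conversion behaving as two's complement, which holds on common compilers such as GCC and Clang.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical user-space stand-in for bfq_gt(): returns nonzero if
 * timestamp a is "after" timestamp b, even if a has wrapped past zero.
 * The unsigned subtraction wraps modulo 2^64; reading the difference
 * as signed tells which of the two timestamps is ahead, provided they
 * are less than 2^63 apart.
 */
static int ts_gt(uint64_t a, uint64_t b)
{
	return (int64_t)(a - b) > 0;
}

/*
 * Naive comparison, shown only for contrast: it gives the wrong
 * answer once one of the timestamps has wrapped around.
 */
static int ts_gt_naive(uint64_t a, uint64_t b)
{
	return a > b;
}
```

For instance, with `b = UINT64_MAX - 5` and `a = b + 10` (which wraps around to 4), `ts_gt(a, b)` reports that `a` is later, while the naive comparison does not.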

static struct bfq_entity *bfq_root_active_entity(struct rb_root *tree)
{
	struct rb_node *node = tree->rb_node;

	return rb_entry(node, struct bfq_entity, rb_node);
}

static unsigned int bfq_class_idx(struct bfq_entity *entity)
{
	struct bfq_queue *bfqq = bfq_entity_to_bfqq(entity);

	return bfqq ? bfqq->ioprio_class - 1 :
		BFQ_DEFAULT_GRP_CLASS - 1;
}

unsigned int bfq_tot_busy_queues(struct bfq_data *bfqd)
{
	return bfqd->busy_queues[0] + bfqd->busy_queues[1] +
		bfqd->busy_queues[2];
}
block, bfq: make lookup_next_entity push up vtime on expirations
To provide a very smooth service, bfq starts to serve a bfq_queue
only if the queue is 'eligible', i.e., if the same queue would
have started to be served in the ideal, perfectly fair system that
bfq simulates internally. This is obtained by associating each
queue with a virtual start time, and by computing a special system
virtual time quantity: a queue is eligible only if the system
virtual time has reached the virtual start time of the
queue. Finally, bfq guarantees that, when a new queue must be set
in service, there is always at least one eligible entity for each
active parent entity in the scheduler. To provide this guarantee,
the function __bfq_lookup_next_entity pushes up, for each parent
entity on which it is invoked, the system virtual time to the
minimum among the virtual start times of the entities in the
active tree for the parent entity (more precisely, the push up
occurs if the system virtual time happens to be lower than all
such virtual start times).
There is however a circumstance in which __bfq_lookup_next_entity
cannot push up the system virtual time for a parent entity, even
if the system virtual time is lower than the virtual start times
of all the child entities in the active tree. It happens if one of
the child entities is in service. In fact, in such a case, there
is already an eligible entity, the in-service one, even if it may
not be present in the active tree (because in-service entities
may be removed from the active tree).
Unfortunately, in the last re-design of the
hierarchical-scheduling engine, the reset of the pointer to the
in-service entity for a given parent entity--reset to be done as a
consequence of the expiration of the in-service entity--always
happens after the function __bfq_lookup_next_entity has been
invoked. This causes the function to think that there is still an
entity in service for the parent entity, and then that the system
virtual time cannot be pushed up, even if actually such a
no-more-in-service entity has already been properly reinserted
into the active tree (or in some other tree if no more
active). Yet, the system virtual time *had* to be pushed up, to be
ready to correctly choose the next queue to serve. Because of the
lack of this push up, bfq may wrongly set in service a queue that
had been speculatively pre-computed as the possible
next-in-service queue, but that would no more be the one to serve
after the expiration and the reinsertion into the active trees of
the previously in-service entities.
This commit addresses this issue by making
__bfq_lookup_next_entity properly push up the system virtual time
if an expiration is occurring.
Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
Tested-by: Lee Tibbert <lee.tibbert@gmail.com>
Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-08-31 09:46:29 +03:00
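
The eligibility rule and the push-up described in this commit message can be sketched in a few lines of user-space C. This is an illustration under stated assumptions, not the kernel implementation: `toy_entity`, `toy_eligible` and `toy_push_up_vtime` are hypothetical names, and the active tree is replaced by a plain array.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified child entity: only its virtual start time. */
struct toy_entity {
	uint64_t start;
};

/* Wrap-safe "a is after b", in the style of bfq_gt(). */
static bool toy_gt(uint64_t a, uint64_t b)
{
	return (int64_t)(a - b) > 0;
}

/* An entity is eligible once the system virtual time reached its start. */
static bool toy_eligible(const struct toy_entity *e, uint64_t vtime)
{
	return !toy_gt(e->start, vtime);
}

/*
 * Return the system virtual time to use for the next lookup.
 *
 * If some child is still in service and no expiration is in progress,
 * an eligible entity already exists, so vtime is left untouched.
 * Otherwise, if vtime is lower than every start time in the (array
 * standing in for the) active tree, vtime jumps to the minimum start.
 */
static uint64_t toy_push_up_vtime(uint64_t vtime,
				  const struct toy_entity *active, size_t n,
				  bool child_in_service, bool expiration)
{
	uint64_t min_start;
	size_t i;

	if (n == 0 || (child_in_service && !expiration))
		return vtime;

	min_start = active[0].start;
	for (i = 1; i < n; i++)
		if (toy_gt(min_start, active[i].start))
			min_start = active[i].start;

	return toy_gt(min_start, vtime) ? min_start : vtime;
}
```

With start times 30 and 20 and a virtual time of 10, the sketch leaves the virtual time at 10 while a child is in service, and moves it to 20 when the same call is made with `expiration` set, which is the behavior the commit introduces.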
static struct bfq_entity *bfq_lookup_next_entity(struct bfq_sched_data *sd,
						 bool expiration);

static bool bfq_update_parent_budget(struct bfq_entity *next_in_service);

/**
 * bfq_update_next_in_service - update sd->next_in_service
 * @sd: sched_data for which to perform the update.
 * @new_entity: if not NULL, pointer to the entity whose activation,
 *		requeueing or repositioning triggered the invocation of
 *		this function.
 * @expiration: if true, this function is being invoked after the
 *		expiration of the in-service entity
 *
 * This function is called to update sd->next_in_service, which, in
 * its turn, may change as a consequence of the insertion or
 * extraction of an entity into/from one of the active trees of
 * sd. These insertions/extractions occur as a consequence of
 * activations/deactivations of entities, with some activations being
 * 'true' activations, and other activations being requeueings (i.e.,
 * implementing the second, requeueing phase of the mechanism used to
 * reposition an entity in its active tree; see comments on
 * __bfq_activate_entity and __bfq_requeue_entity for details). In
 * both the last two activation sub-cases, new_entity points to the
 * just activated or requeued entity.
 *
 * Returns true if sd->next_in_service changes in such a way that
 * entity->parent may become the next_in_service for its parent
 * entity.
 */
static bool bfq_update_next_in_service(struct bfq_sched_data *sd,
				       struct bfq_entity *new_entity,
				       bool expiration)
{
	struct bfq_entity *next_in_service = sd->next_in_service;
	bool parent_sched_may_change = false;
block, bfq: guarantee update_next_in_service always returns an eligible entity
If the function bfq_update_next_in_service is invoked as a consequence
of the activation or requeueing of an entity, say E, then it doesn't
invoke bfq_lookup_next_entity to get the next-in-service entity. In
contrast, it follows a shorter path: if E happens to be eligible (see
commit "bfq-sq-mq: make lookup_next_entity push up vtime on
expirations" for details on eligibility) and to have a lower virtual
finish time than the current candidate as next-in-service entity, then
E directly becomes the next-in-service entity. Unfortunately, there is
a corner case for which this shorter path makes
bfq_update_next_in_service choose a non eligible entity: it occurs if
both E and the current next-in-service entity happen to be non
eligible when bfq_update_next_in_service is invoked. In this case, E
is not set as next-in-service, and, since bfq_lookup_next_entity is
not invoked, the state of the parent entity is not updated so as to
end up with an eligible entity as the proper next-in-service entity.
In this respect, next-in-service is actually allowed to be non
eligible while some queue is in service: since no system-virtual-time
push-up can be performed in that case (see again commit "bfq-sq-mq:
make lookup_next_entity push up vtime on expirations" for details),
next-in-service is chosen, speculatively, as a function of the
possible value that the system virtual time may get after a push
up. But the correctness of the schedule breaks if next-in-service is
still a non eligible entity when it is time to set in service the next
entity. Unfortunately, this may happen in the above corner case.
This commit fixes this problem by making bfq_update_next_in_service
invoke bfq_lookup_next_entity not only if the above shorter path
cannot be taken, but also if the shorter path is taken but fails to
yield an eligible next-in-service entity.
Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
Tested-by: Lee Tibbert <lee.tibbert@gmail.com>
Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-08-31 09:46:31 +03:00
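
The decision this commit message describes (take the shorter path only when it yields an eligible, earlier-finishing entity of the same class, and fall back to a full lookup otherwise) can be sketched as follows. This is a simplified user-space illustration, not the kernel code: all `toy_*` names are hypothetical, the active trees are replaced by a flat array, and the virtual-time push-up performed by the real lookup is left out.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified entity. */
struct toy_entity {
	uint64_t start;  /* virtual start time */
	uint64_t finish; /* virtual finish time */
	int class_idx;   /* priority class index */
};

static bool toy_gt(uint64_t a, uint64_t b)
{
	return (int64_t)(a - b) > 0;
}

/* Shorter path: may new_entity replace next without a full lookup? */
static bool toy_change_without_lookup(const struct toy_entity *new_entity,
				      const struct toy_entity *next,
				      uint64_t vtime)
{
	if (!new_entity || new_entity == next)
		return false;
	if (!next) /* no current candidate: new_entity is taken as is */
		return true;
	return new_entity->class_idx == next->class_idx &&
	       !toy_gt(new_entity->start, vtime) &&   /* eligible */
	       toy_gt(next->finish, new_entity->finish); /* finishes first */
}

/* Stand-in for the full lookup: eligible entity with minimum finish. */
static const struct toy_entity *toy_lookup(const struct toy_entity *active,
					   size_t n, uint64_t vtime)
{
	const struct toy_entity *best = NULL;
	size_t i;

	for (i = 0; i < n; i++) {
		if (toy_gt(active[i].start, vtime))
			continue; /* not eligible yet */
		if (!best || toy_gt(best->finish, active[i].finish))
			best = &active[i];
	}
	return best;
}

/* Shorter path first; full lookup whenever the shorter path fails. */
static const struct toy_entity *
toy_update_next(const struct toy_entity *active, size_t n, uint64_t vtime,
		const struct toy_entity *new_entity,
		const struct toy_entity *next)
{
	if (toy_change_without_lookup(new_entity, next, vtime))
		return new_entity;
	return toy_lookup(active, n, vtime);
}
```

The point of the fallback is that a failed shorter path no longer leaves the previous candidate in place unchecked: the lookup is run again, so the result is chosen among the entities that are eligible at the current virtual time.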
	bool change_without_lookup = false;

	/*
	 * If this update is triggered by the activation, requeueing
	 * or repositioning of an entity that does not coincide with
	 * sd->next_in_service, then a full lookup in the active tree
	 * can be avoided. In fact, it is enough to check whether the
	 * just-modified entity has the same priority as
	 * sd->next_in_service, is eligible and has a lower virtual
	 * finish time than sd->next_in_service. If this compound
	 * condition holds, then the new entity becomes the new
	 * next_in_service. Otherwise no change is needed.
	 */
	if (new_entity && new_entity != sd->next_in_service) {
		/*
		 * Flag used to decide whether to replace
		 * sd->next_in_service with new_entity. Tentatively
		 * set to true, and left as true if
		 * sd->next_in_service is NULL.
		 */
		change_without_lookup = true;

		/*
		 * If there is already a next_in_service candidate
		 * entity, then compare timestamps to decide whether
		 * to replace sd->next_in_service with new_entity.
		 */
		if (next_in_service) {
			unsigned int new_entity_class_idx =
				bfq_class_idx(new_entity);
			struct bfq_service_tree *st =
				sd->service_tree + new_entity_class_idx;

			change_without_lookup =
				(new_entity_class_idx ==
				 bfq_class_idx(next_in_service)
				 &&
				 !bfq_gt(new_entity->start, st->vtime)
				 &&
				 bfq_gt(next_in_service->finish,
					new_entity->finish));
		}

		if (change_without_lookup)
			next_in_service = new_entity;
	}

	if (!change_without_lookup) /* lookup needed */
		next_in_service = bfq_lookup_next_entity(sd, expiration);

	if (next_in_service) {
		bool new_budget_triggers_change =
			bfq_update_parent_budget(next_in_service);

		parent_sched_may_change = !sd->next_in_service ||
			new_budget_triggers_change;
	}

	sd->next_in_service = next_in_service;

	return parent_sched_may_change;
}

#ifdef CONFIG_BFQ_GROUP_IOSCHED

/*
 * Returns true if this budget change may let next_in_service->parent
 * become the next_in_service entity for its parent entity.
 */
static bool bfq_update_parent_budget(struct bfq_entity *next_in_service)
{
	struct bfq_entity *bfqg_entity;
	struct bfq_group *bfqg;
	struct bfq_sched_data *group_sd;
	bool ret = false;

	group_sd = next_in_service->sched_data;

	bfqg = container_of(group_sd, struct bfq_group, sched_data);
	/*
	 * bfq_group's my_entity field is not NULL only if the group
	 * is not the root group. We must not touch the root entity
	 * as it must never become an in-service entity.
	 */
	bfqg_entity = bfqg->my_entity;
	if (bfqg_entity) {
		if (bfqg_entity->budget > next_in_service->budget)
			ret = true;
		bfqg_entity->budget = next_in_service->budget;
	}

	return ret;
}

/*
 * This function tells whether entity stops being a candidate for next
block, bfq: consider also in_service_entity to state whether an entity is active
Groups of BFQ queues are represented by generic entities in BFQ. When
a queue belonging to a parent entity is deactivated, the parent entity
may need to be deactivated too, in case the deactivated queue was the
only active queue for the parent entity. This deactivation may need to
be propagated upwards if the entity belongs, in its turn, to a further
higher-level entity, and so on. In particular, the upward propagation
of deactivation stops at the first parent entity that remains active
even if one of its child entities has been deactivated.
To decide whether the last non-deactivation condition holds for a
parent entity, BFQ checks whether the field next_in_service is still
not NULL for the parent entity, after the deactivation of one of its
child entities. If it is not NULL, then there are certainly other active
entities in the parent entity, and deactivations can stop.
Unfortunately, this check misses a corner case: if in_service_entity
is not NULL, then next_in_service may happen to be NULL, although the
parent entity is evidently active. This happens if: 1) the entity
pointed to by in_service_entity is the only active entity in the parent
entity, and 2) according to the definition of next_in_service, the
in_service_entity cannot be considered as next_in_service. See the
comments on the definition of next_in_service for details on this
second point.
Hitting the above corner case causes crashes.
To address this issue, this commit:
1) Extends the above check on only next_in_service to controlling both
next_in_service and in_service_entity (if any of them is not NULL,
then no further deactivation is performed)
2) Improves the (important) comments on how next_in_service is defined
and updated; in particular it fixes a few rather obscure paragraphs
Reported-by: Eric Wheeler <bfq-sched@lists.ewheeler.net>
Reported-by: Rick Yiu <rick_yiu@htc.com>
Reported-by: Tom X Nguyen <tom81094@gmail.com>
Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
Tested-by: Eric Wheeler <bfq-sched@lists.ewheeler.net>
Tested-by: Rick Yiu <rick_yiu@htc.com>
Tested-by: Laurentiu Nicola <lnicola@dend.ro>
Tested-by: Tom X Nguyen <tom81094@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-07-29 13:42:56 +03:00
 * service, according to the restrictive definition of the field
 * next_in_service. In particular, this function is invoked for an
 * entity that is about to be set in service.
 *
* If entity is a queue , then the entity is no longer a candidate for
* next service according to that definition , because entity is
* about to become the in - service queue . This function then returns
* true if entity is a queue .
2017-04-19 17:48:24 +03:00
*
2017-07-29 13:42:56 +03:00
* In contrast , entity could still be a candidate for next service if
* it is not a queue , and has more than one active child . In fact ,
* even if one of its children is about to be set in service , other
* active children may still be the next to serve , for the parent
* entity , even according to the above definition . As a consequence , a
* non - queue entity is not a candidate for next - service only if it has
* only one active child . And only if this condition holds , then this
* function returns true for a non - queue entity .
2017-04-19 17:48:24 +03:00
*/
static bool bfq_no_longer_next_in_service ( struct bfq_entity * entity )
{
struct bfq_group * bfqg ;
if ( bfq_entity_to_bfqq ( entity ) )
return true ;
bfqg = container_of ( entity , struct bfq_group , entity ) ;
2017-07-29 13:42:56 +03:00
/*
* The field active_entities does not always contain the
* actual number of active children entities : it happens to
* not account for the in - service entity in case the latter is
* removed from its active tree ( which may get done after
* invoking the function bfq_no_longer_next_in_service in
* bfq_get_next_queue ) . Fortunately , here , i . e . , while
* bfq_no_longer_next_in_service is not yet completed in
* bfq_get_next_queue , bfq_active_extract has not yet been
* invoked , and thus active_entities still coincides with the
* actual number of active entities .
*/
2017-04-19 17:48:24 +03:00
if ( bfqg - > active_entities = = 1 )
return true ;
return false ;
}
# else /* CONFIG_BFQ_GROUP_IOSCHED */
static bool bfq_update_parent_budget ( struct bfq_entity * next_in_service )
{
return false ;
}
static bool bfq_no_longer_next_in_service ( struct bfq_entity * entity )
{
return true ;
}
# endif /* CONFIG_BFQ_GROUP_IOSCHED */
/*
* Shift for timestamp calculations . This actually limits the maximum
* service allowed in one timestamp delta ( small shift values increase it ) ,
* the maximum total weight that can be used for the queues in the system
* ( big shift values increase it ) , and the period of virtual time
* wraparounds .
*/
# define WFQ_SERVICE_SHIFT 22
struct bfq_queue * bfq_entity_to_bfqq ( struct bfq_entity * entity )
{
struct bfq_queue * bfqq = NULL ;
if ( ! entity - > my_sched_data )
bfqq = container_of ( entity , struct bfq_queue , entity ) ;
return bfqq ;
}
/**
* bfq_delta - map service into the virtual time domain .
* @ service : amount of service .
* @ weight : scale factor ( weight of an entity or weight sum ) .
*/
static u64 bfq_delta ( unsigned long service , unsigned long weight )
{
2020-01-20 13:04:43 +03:00
return div64_ul ( ( u64 ) service < < WFQ_SERVICE_SHIFT , weight ) ;
2017-04-19 17:48:24 +03:00
}
/**
* bfq_calc_finish - assign the finish time to an entity .
* @ entity : the entity to act upon .
* @ service : the service to be charged to the entity .
*/
static void bfq_calc_finish ( struct bfq_entity * entity , unsigned long service )
{
struct bfq_queue * bfqq = bfq_entity_to_bfqq ( entity ) ;
entity - > finish = entity - > start +
bfq_delta ( service , entity - > weight ) ;
if ( bfqq ) {
bfq_log_bfqq ( bfqq - > bfqd , bfqq ,
" calc_finish: serv %lu, w %d " ,
service , entity - > weight ) ;
bfq_log_bfqq ( bfqq - > bfqd , bfqq ,
" calc_finish: start %llu, finish %llu, delta %llu " ,
entity - > start , entity - > finish ,
bfq_delta ( service , entity - > weight ) ) ;
}
}
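/*
 * Illustrative user-space sketch (not part of the scheduler): it mirrors
 * the arithmetic of bfq_delta and bfq_calc_finish above, to show that a
 * given amount of service advances the finish time in inverse proportion
 * to the weight. The names sketch_delta, sketch_finish and
 * SKETCH_SERVICE_SHIFT are hypothetical; the shift value is assumed to
 * equal WFQ_SERVICE_SHIFT, and plain 64-bit division stands in for
 * div64_ul.
 */

```c
#include <assert.h>
#include <stdint.h>

/* Assumed to match WFQ_SERVICE_SHIFT in the file above. */
#define SKETCH_SERVICE_SHIFT 22

/*
 * Hypothetical stand-in for bfq_delta(): map an amount of service into
 * the virtual time domain. The caller must pass a non-zero weight.
 */
static uint64_t sketch_delta(unsigned long service, unsigned long weight)
{
	return ((uint64_t)service << SKETCH_SERVICE_SHIFT) / weight;
}

/*
 * Hypothetical stand-in for the assignment done in bfq_calc_finish():
 * finish = start + delta(service, weight).
 */
static uint64_t sketch_finish(uint64_t start, unsigned long service,
			      unsigned long weight)
{
	return start + sketch_delta(service, weight);
}
```

/*
 * With equal service, doubling the weight halves the virtual-time
 * distance between start and finish, which is how a heavier entity ends
 * up being served more often.
 */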
/**
* bfq_entity_of - get an entity from a node .
* @ node : the node field of the entity .
*
* Convert a node pointer to the relative entity . This is used only
* to simplify the logic of some functions and not as the generic
* conversion mechanism because , e . g . , in the tree walking functions ,
* the check for a % NULL value would be redundant .
*/
struct bfq_entity * bfq_entity_of ( struct rb_node * node )
{
struct bfq_entity * entity = NULL ;
if ( node )
entity = rb_entry ( node , struct bfq_entity , rb_node ) ;
return entity ;
}
/**
* bfq_extract - remove an entity from a tree .
* @ root : the tree root .
* @ entity : the entity to remove .
*/
static void bfq_extract ( struct rb_root * root , struct bfq_entity * entity )
{
entity - > tree = NULL ;
rb_erase ( & entity - > rb_node , root ) ;
}
/**
* bfq_idle_extract - extract an entity from the idle tree .
* @ st : the service tree of the owning @ entity .
* @ entity : the entity being removed .
*/
static void bfq_idle_extract ( struct bfq_service_tree * st ,
struct bfq_entity * entity )
{
struct bfq_queue * bfqq = bfq_entity_to_bfqq ( entity ) ;
struct rb_node * next ;
if ( entity = = st - > first_idle ) {
next = rb_next ( & entity - > rb_node ) ;
st - > first_idle = bfq_entity_of ( next ) ;
}
if ( entity = = st - > last_idle ) {
next = rb_prev ( & entity - > rb_node ) ;
st - > last_idle = bfq_entity_of ( next ) ;
}
bfq_extract ( & st - > idle , entity ) ;
if ( bfqq )
list_del ( & bfqq - > bfqq_list ) ;
}
/**
* bfq_insert - generic tree insertion .
* @ root : tree root .
* @ entity : entity to insert .
*
* This is used for the idle and the active tree , since they are both
* ordered by finish time .
*/
static void bfq_insert ( struct rb_root * root , struct bfq_entity * entity )
{
struct bfq_entity * entry ;
struct rb_node * * node = & root - > rb_node ;
struct rb_node * parent = NULL ;
while ( * node ) {
parent = * node ;
entry = rb_entry ( parent , struct bfq_entity , rb_node ) ;
if ( bfq_gt ( entry - > finish , entity - > finish ) )
node = & parent - > rb_left ;
else
node = & parent - > rb_right ;
}
rb_link_node ( & entity - > rb_node , parent , node ) ;
rb_insert_color ( & entity - > rb_node , root ) ;
entity - > tree = root ;
}
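/*
 * Illustrative user-space sketch (not part of the scheduler): bfq_insert
 * orders entities through bfq_gt, which compares timestamps via a signed
 * difference so that the ordering survives wraparound of the 64-bit
 * virtual time. The name sketch_gt is hypothetical. The sketch assumes
 * the usual two's-complement conversion from uint64_t to int64_t, and it
 * is meaningful only while the two timestamps are less than 2^63 apart.
 */

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for bfq_gt(): return non-zero if timestamp a is
 * "after" timestamp b, even when a has wrapped past zero and b has not.
 */
static int sketch_gt(uint64_t a, uint64_t b)
{
	return (int64_t)(a - b) > 0;
}
```

/*
 * A plain a > b comparison would place a freshly wrapped timestamp
 * (for example 3) before one just below UINT64_MAX, whereas the signed
 * difference still reports it as the later of the two.
 */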
/**
* bfq_update_min - update the min_start field of an entity .
* @ entity : the entity to update .
* @ node : one of its children .
*
* This function is called when @ entity may store an invalid value for
* min_start due to updates to the active tree . The function assumes
* that the subtree rooted at @ node ( which may be its left or its right
* child ) has a valid min_start value .
*/
static void bfq_update_min ( struct bfq_entity * entity , struct rb_node * node )
{
struct bfq_entity * child ;
if ( node ) {
child = rb_entry ( node , struct bfq_entity , rb_node ) ;
if ( bfq_gt ( entity - > min_start , child - > min_start ) )
entity - > min_start = child - > min_start ;
}
}
/**
* bfq_update_active_node - recalculate min_start .
* @ node : the node to update .
*
* @ node may have changed position or one of its children may have moved ;
* this function updates its min_start value . The left and right subtrees
* are assumed to hold a correct min_start value .
*/
static void bfq_update_active_node ( struct rb_node * node )
{
struct bfq_entity * entity = rb_entry ( node , struct bfq_entity , rb_node ) ;
entity - > min_start = entity - > start ;
bfq_update_min ( entity , node - > rb_right ) ;
bfq_update_min ( entity , node - > rb_left ) ;
}
/**
* bfq_update_active_tree - update min_start for the whole active tree .
* @ node : the starting node .
*
* @ node must be the deepest modified node after an update . This function
* updates its min_start using the values held by its children , assuming
* that they did not change , and then updates all the nodes that may have
* changed in the path to the root . The only nodes that may have changed
* are the ones in the path or their siblings .
*/
static void bfq_update_active_tree ( struct rb_node * node )
{
struct rb_node * parent ;
up :
bfq_update_active_node ( node ) ;
parent = rb_parent ( node ) ;
if ( ! parent )
return ;
if ( node = = parent - > rb_left & & parent - > rb_right )
bfq_update_active_node ( parent - > rb_right ) ;
else if ( parent - > rb_left )
bfq_update_active_node ( parent - > rb_left ) ;
node = parent ;
goto up ;
}
/**
* bfq_active_insert - insert an entity in the active tree of its
* group / device .
* @ st : the service tree of the entity .
* @ entity : the entity being inserted .
*
* The active tree is ordered by finish time , but an extra key is kept
* for each node , containing the minimum value for the start times of
* its children ( and the node itself ) , so it ' s possible to search for
* the eligible node with the lowest finish time in logarithmic time .
*/
static void bfq_active_insert ( struct bfq_service_tree * st ,
struct bfq_entity * entity )
{
struct bfq_queue * bfqq = bfq_entity_to_bfqq ( entity ) ;
struct rb_node * node = & entity - > rb_node ;
# ifdef CONFIG_BFQ_GROUP_IOSCHED
struct bfq_sched_data * sd = NULL ;
struct bfq_group * bfqg = NULL ;
struct bfq_data * bfqd = NULL ;
# endif
bfq_insert ( & st - > active , entity ) ;
if ( node - > rb_left )
node = node - > rb_left ;
else if ( node - > rb_right )
node = node - > rb_right ;
bfq_update_active_tree ( node ) ;
# ifdef CONFIG_BFQ_GROUP_IOSCHED
sd = entity - > sched_data ;
bfqg = container_of ( sd , struct bfq_group , sched_data ) ;
bfqd = ( struct bfq_data * ) bfqg - > bfqd ;
# endif
if ( bfqq )
list_add ( & bfqq - > bfqq_list , & bfqq - > bfqd - > active_list ) ;
# ifdef CONFIG_BFQ_GROUP_IOSCHED
if ( bfqg ! = bfqd - > root_group )
bfqg - > active_entities + + ;
# endif
}
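/*
 * Illustrative user-space sketch (not the scheduler's own lookup code):
 * it shows why keeping min_start in every node, as described in the
 * comment on bfq_active_insert, allows finding the eligible node with the
 * lowest finish time in a single root-to-leaf descent. A plain binary
 * search tree ordered by finish time stands in for the kernel rb tree;
 * struct sketch_node and all sketch_* functions are hypothetical names.
 * A node is treated as eligible when its start time is not after vtime.
 */

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct sketch_node {
	uint64_t start;      /* virtual start time */
	uint64_t finish;     /* virtual finish time (tree ordering key) */
	uint64_t min_start;  /* minimum start over this node's subtree */
	struct sketch_node *left, *right;
};

/* Wrap-safe "a is after b", in the style of bfq_gt. */
static int sketch_gt(uint64_t a, uint64_t b)
{
	return (int64_t)(a - b) > 0;
}

/* Recompute min_start for a whole subtree, children first. */
static void sketch_update_min(struct sketch_node *n)
{
	if (!n)
		return;
	sketch_update_min(n->left);
	sketch_update_min(n->right);
	n->min_start = n->start;
	if (n->left && sketch_gt(n->min_start, n->left->min_start))
		n->min_start = n->left->min_start;
	if (n->right && sketch_gt(n->min_start, n->right->min_start))
		n->min_start = n->right->min_start;
}

/*
 * Return the eligible node with the lowest finish time, or NULL if no
 * node is eligible. At each step: if the left subtree holds an eligible
 * node, the answer lies there (smaller finish times); otherwise the
 * current node is the answer if it is eligible; otherwise only the right
 * subtree can hold one.
 */
static struct sketch_node *sketch_first_eligible(struct sketch_node *n,
						 uint64_t vtime)
{
	while (n) {
		if (n->left && !sketch_gt(n->left->min_start, vtime)) {
			n = n->left;
			continue;
		}
		if (!sketch_gt(n->start, vtime))
			return n;
		n = n->right;
	}
	return NULL;
}
```

/*
 * The descent visits one node per tree level, so on a balanced tree the
 * lookup cost is logarithmic in the number of active entities.
 */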
/**
* bfq_ioprio_to_weight - calc a weight from an ioprio .
* @ ioprio : the ioprio value to convert .
*/
unsigned short bfq_ioprio_to_weight ( int ioprio )
{
2021-08-11 06:37:01 +03:00
return ( IOPRIO_NR_LEVELS - ioprio ) * BFQ_WEIGHT_CONVERSION_COEFF ;
2017-04-19 17:48:24 +03:00
}
/**
* bfq_weight_to_ioprio - calc an ioprio from a weight .
* @ weight : the weight value to convert .
*
* To preserve as much as possible the old only - ioprio user interface ,
* 0 is used as an escape ioprio value for weights ( numerically ) equal to or
2021-08-11 06:37:01 +03:00
* larger than IOPRIO_NR_LEVELS * BFQ_WEIGHT_CONVERSION_COEFF .
2017-04-19 17:48:24 +03:00
*/
static unsigned short bfq_weight_to_ioprio ( int weight )
{
return max_t ( int , 0 ,
2022-01-07 09:58:59 +03:00
IOPRIO_NR_LEVELS - weight / BFQ_WEIGHT_CONVERSION_COEFF ) ;
2017-04-19 17:48:24 +03:00
}
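/*
 * Illustrative user-space sketch (not part of the scheduler): it replays
 * the two conversions above with concrete numbers. The constant values
 * are assumptions made for the example (8 priority levels and a
 * conversion coefficient of 10); the real values come from the kernel
 * headers. All sketch_* and SKETCH_* names are hypothetical.
 */

```c
#include <assert.h>

#define SKETCH_IOPRIO_NR_LEVELS 8          /* assumed value */
#define SKETCH_WEIGHT_CONVERSION_COEFF 10  /* assumed value */

/* Same formula as bfq_ioprio_to_weight(): lower ioprio, higher weight. */
static unsigned short sketch_ioprio_to_weight(int ioprio)
{
	return (SKETCH_IOPRIO_NR_LEVELS - ioprio) *
		SKETCH_WEIGHT_CONVERSION_COEFF;
}

/*
 * Same formula as bfq_weight_to_ioprio(): weights at or above
 * NR_LEVELS * COEFF are clamped to the escape value 0.
 */
static unsigned short sketch_weight_to_ioprio(int weight)
{
	int ioprio = SKETCH_IOPRIO_NR_LEVELS -
		weight / SKETCH_WEIGHT_CONVERSION_COEFF;

	return ioprio > 0 ? ioprio : 0;
}
```

/*
 * Under these assumed constants, ioprio 4 maps to weight 40 and back to
 * ioprio 4, while any weight of 80 or more maps to ioprio 0. Weights that
 * are not multiples of the coefficient lose their remainder in the
 * integer division, so the weight-to-ioprio direction is lossy.
 */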
static void bfq_get_entity ( struct bfq_entity * entity )
{
struct bfq_queue * bfqq = bfq_entity_to_bfqq ( entity ) ;
if ( bfqq ) {
bfqq - > ref + + ;
bfq_log_bfqq ( bfqq - > bfqd , bfqq , " get_entity: %p %d " ,
bfqq , bfqq - > ref ) ;
2020-08-11 09:43:40 +03:00
}
2017-04-19 17:48:24 +03:00
}
/**
* bfq_find_deepest - find the deepest node that an extraction can modify .
* @ node : the node being removed .
*
* Do the first step of an extraction in an rb tree , looking for the
* node that will replace @ node , and returning the deepest node that
* the following modifications to the tree can touch . If @ node is the
* last node in the tree return % NULL .
*/
static struct rb_node * bfq_find_deepest ( struct rb_node * node )
{
struct rb_node * deepest ;
if ( ! node - > rb_right & & ! node - > rb_left )
deepest = rb_parent ( node ) ;
else if ( ! node - > rb_right )
deepest = node - > rb_left ;
else if ( ! node - > rb_left )
deepest = node - > rb_right ;
else {
deepest = rb_next ( node ) ;
if ( deepest - > rb_right )
deepest = deepest - > rb_right ;
else if ( rb_parent ( deepest ) ! = node )
deepest = rb_parent ( deepest ) ;
}
return deepest ;
}
/**
* bfq_active_extract - remove an entity from the active tree .
* @ st : the service_tree containing the tree .
* @ entity : the entity being removed .
*/
static void bfq_active_extract ( struct bfq_service_tree * st ,
struct bfq_entity * entity )
{
struct bfq_queue * bfqq = bfq_entity_to_bfqq ( entity ) ;
struct rb_node * node ;
# ifdef CONFIG_BFQ_GROUP_IOSCHED
struct bfq_sched_data * sd = NULL ;
struct bfq_group * bfqg = NULL ;
struct bfq_data * bfqd = NULL ;
# endif
node = bfq_find_deepest ( & entity - > rb_node ) ;
bfq_extract ( & st - > active , entity ) ;
if ( node )
bfq_update_active_tree ( node ) ;
# ifdef CONFIG_BFQ_GROUP_IOSCHED
sd = entity - > sched_data ;
bfqg = container_of ( sd , struct bfq_group , sched_data ) ;
bfqd = ( struct bfq_data * ) bfqg - > bfqd ;
# endif
if ( bfqq )
list_del ( & bfqq - > bfqq_list ) ;
# ifdef CONFIG_BFQ_GROUP_IOSCHED
if ( bfqg ! = bfqd - > root_group )
bfqg - > active_entities - - ;
# endif
}
/**
* bfq_idle_insert - insert an entity into the idle tree .
* @ st : the service tree containing the tree .
* @ entity : the entity to insert .
*/
static void bfq_idle_insert ( struct bfq_service_tree * st ,
struct bfq_entity * entity )
{
struct bfq_queue * bfqq = bfq_entity_to_bfqq ( entity ) ;
struct bfq_entity * first_idle = st - > first_idle ;
struct bfq_entity * last_idle = st - > last_idle ;
if ( ! first_idle | | bfq_gt ( first_idle - > finish , entity - > finish ) )
st - > first_idle = entity ;
if ( ! last_idle | | bfq_gt ( entity - > finish , last_idle - > finish ) )
st - > last_idle = entity ;
bfq_insert ( & st - > idle , entity ) ;
if ( bfqq )
list_add ( & bfqq - > bfqq_list , & bfqq - > bfqd - > idle_list ) ;
}
/**
* bfq_forget_entity - do not consider entity any longer for scheduling
* @ st : the service tree .
* @ entity : the entity being removed .
* @ is_in_service : true if entity is currently the in - service entity .
*
* Forget everything about @ entity . In addition , if entity represents
* a queue , and the latter is not in service , then release the service
* reference to the queue ( the one taken through bfq_get_entity ) . In
* fact , in this case , there is really no more service reference to
* the queue , as the latter is also outside any service tree . If ,
* instead , the queue is in service , then __bfq_bfqd_reset_in_service
* will take care of putting the reference when the queue finally
* stops being served .
*/
static void bfq_forget_entity ( struct bfq_service_tree * st ,
struct bfq_entity * entity ,
bool is_in_service )
{
struct bfq_queue * bfqq = bfq_entity_to_bfqq ( entity ) ;
2020-02-03 13:40:57 +03:00
entity - > on_st_or_in_serv = false ;
2017-04-19 17:48:24 +03:00
st - > wsum - = entity - > weight ;
2020-08-11 09:43:40 +03:00
if ( bfqq & & ! is_in_service )
2017-04-19 17:48:24 +03:00
bfq_put_queue ( bfqq ) ;
}
/**
* bfq_put_idle_entity - release the idle tree ref of an entity .
* @ st : service tree for the entity .
* @ entity : the entity being released .
*/
void bfq_put_idle_entity ( struct bfq_service_tree * st , struct bfq_entity * entity )
{
bfq_idle_extract ( st , entity ) ;
bfq_forget_entity ( st , entity ,
entity = = entity - > sched_data - > in_service_entity ) ;
}
/**
* bfq_forget_idle - update the idle tree if necessary .
* @ st : the service tree to act upon .
*
* To preserve the global O ( log N ) complexity we only remove one entry here ;
* as the idle tree will not grow indefinitely this can be done safely .
*/
static void bfq_forget_idle ( struct bfq_service_tree * st )
{
struct bfq_entity * first_idle = st - > first_idle ;
struct bfq_entity * last_idle = st - > last_idle ;
if ( RB_EMPTY_ROOT ( & st - > active ) & & last_idle & &
! bfq_gt ( last_idle - > finish , st - > vtime ) ) {
/*
* Forget the whole idle tree , increasing the vtime past
* the last finish time of idle entities .
*/
st - > vtime = last_idle - > finish ;
}
if ( first_idle & & ! bfq_gt ( first_idle - > finish , st - > vtime ) )
bfq_put_idle_entity ( st , first_idle ) ;
}
struct bfq_service_tree * bfq_entity_service_tree ( struct bfq_entity * entity )
{
struct bfq_sched_data * sched_data = entity - > sched_data ;
unsigned int idx = bfq_class_idx ( entity ) ;
return sched_data - > service_tree + idx ;
}
block, bfq: don't change ioprio class for a bfq_queue on a service tree
On each deactivation or re-scheduling (after being served) of a
bfq_queue, BFQ invokes the function __bfq_entity_update_weight_prio(),
to perform pending updates of ioprio, weight and ioprio class for the
bfq_queue. BFQ also invokes this function on I/O-request dispatches,
to raise or lower weights more quickly when needed, thereby improving
latency. However, the entity representing the bfq_queue may be on the
active (sub)tree of a service tree when this happens, and, although
with a very low probability, the bfq_queue may happen to also have a
pending change of its ioprio class. If both conditions hold when
__bfq_entity_update_weight_prio() is invoked, then the entity moves to
a sort of hybrid state: the new service tree for the entity, as
returned by bfq_entity_service_tree(), differs from service tree on
which the entity still is. The functions that handle activations and
deactivations of entities do not cope with such a hybrid state (and
would need to become more complex to cope).
This commit addresses this issue by just making
__bfq_entity_update_weight_prio() not perform also a possible pending
change of ioprio class, when invoked on an I/O-request dispatch for a
bfq_queue. Such a change is thus postponed to when
__bfq_entity_update_weight_prio() is invoked on deactivation or
re-scheduling of the bfq_queue.
Reported-by: Marco Piazza <mpiazza@gmail.com>
Reported-by: Laurentiu Nicola <lnicola@dend.ro>
Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
Tested-by: Marco Piazza <mpiazza@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-07-03 11:00:10 +03:00
/*
* Update weight and priority of entity . If update_class_too is true ,
* then update the ioprio_class of entity too .
*
* The reason why the update of ioprio_class is controlled through the
* last parameter is as follows . Changing the ioprio class of an
* entity implies changing the destination service trees for that
* entity . If such a change occurred when the entity is already on one
* of the service trees for its previous class , then the state of the
* entity would become more complex : none of the new possible service
* trees for the entity , according to bfq_entity_service_tree ( ) , would
* match any of the possible service trees on which the entity
* is . Complex operations involving these trees , such as entity
* activations and deactivations , should take into account this
* additional complexity . To avoid this issue , this function is
* invoked with update_class_too unset in the points in the code where
* entity may happen to be on some tree .
*/
2017-04-19 17:48:24 +03:00
struct bfq_service_tree *
__bfq_entity_update_weight_prio ( struct bfq_service_tree * old_st ,
2017-07-03 11:00:10 +03:00
struct bfq_entity * entity ,
bool update_class_too )
2017-04-19 17:48:24 +03:00
{
struct bfq_service_tree * new_st = old_st ;
if ( entity - > prio_changed ) {
struct bfq_queue * bfqq = bfq_entity_to_bfqq ( entity ) ;
unsigned int prev_weight , new_weight ;
struct bfq_data * bfqd = NULL ;
block, bfq: do not idle for lowest-weight queues
In most cases, it is detrimental for throughput to plug I/O dispatch
when the in-service bfq_queue becomes temporarily empty (plugging is
performed to wait for the possible arrival, soon, of new I/O from the
in-service queue). There is however a case where plugging is needed
for service guarantees. If a bfq_queue, say Q, has a higher weight
than some other active bfq_queue, and is sync, i.e., contains sync
I/O, then, to guarantee that Q does receive a higher share of the
throughput than other lower-weight queues, it is necessary to plug I/O
dispatch when Q remains temporarily empty while being served.
For this reason, BFQ performs I/O plugging when some active bfq_queue
has a higher weight than some other active bfq_queue. But this is
unnecessarily overkill. In fact, if the in-service bfq_queue actually
has a weight lower than or equal to the other queues, then the queue
*must not* be guaranteed a higher share of the throughput than the
other queues. So, not plugging I/O cannot cause any harm to the
queue. And can boost throughput.
Taking advantage of this fact, this commit does not plug I/O for sync
bfq_queues with a weight lower than or equal to the weights of the
other queues. Here is an example of the resulting throughput boost
with the dbench workload, which is particularly nasty for BFQ. With
the dbench test in the Phoronix suite, BFQ reaches its lowest total
throughput with 6 clients on a filesystem with journaling, in case the
journaling daemon has a higher weight than normal processes. Before
this commit, the total throughput was ~80 MB/sec on a PLEXTOR PX-256M5,
after this commit it is ~100 MB/sec.
Tested-by: Holger Hoffstätte <holger@applied-asynchrony.com>
Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name>
Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-03-12 11:59:28 +03:00
struct rb_root_cached * root ;
2017-04-19 17:48:24 +03:00
# ifdef CONFIG_BFQ_GROUP_IOSCHED
struct bfq_sched_data * sd ;
struct bfq_group * bfqg ;
# endif
if ( bfqq )
bfqd = bfqq - > bfqd ;
# ifdef CONFIG_BFQ_GROUP_IOSCHED
else {
sd = entity - > my_sched_data ;
bfqg = container_of ( sd , struct bfq_group , sched_data ) ;
bfqd = ( struct bfq_data * ) bfqg - > bfqd ;
}
# endif
2019-08-28 06:54:51 +03:00
/* Matches the smp_wmb() in bfq_group_set_weight. */
smp_rmb ( ) ;
2017-04-19 17:48:24 +03:00
old_st - > wsum - = entity - > weight ;
if ( entity - > new_weight ! = entity - > orig_weight ) {
if ( entity - > new_weight < BFQ_MIN_WEIGHT | |
entity - > new_weight > BFQ_MAX_WEIGHT ) {
pr_crit ( " update_weight_prio: new_weight %d \n " ,
entity - > new_weight ) ;
if ( entity - > new_weight < BFQ_MIN_WEIGHT )
entity - > new_weight = BFQ_MIN_WEIGHT ;
else
entity - > new_weight = BFQ_MAX_WEIGHT ;
}
entity - > orig_weight = entity - > new_weight ;
if ( bfqq )
bfqq - > ioprio =
bfq_weight_to_ioprio ( entity - > orig_weight ) ;
}
2017-07-03 11:00:10 +03:00
if ( bfqq & & update_class_too )
2017-04-19 17:48:24 +03:00
bfqq - > ioprio_class = bfqq - > new_ioprio_class ;
2017-07-03 11:00:10 +03:00
		/*
		 * Reset prio_changed only if the ioprio_class change
		 * is not pending any longer.
		 */
		if (!bfqq || bfqq->ioprio_class == bfqq->new_ioprio_class)
			entity->prio_changed = 0;
2017-04-19 17:48:24 +03:00
		/*
		 * NOTE: here we may be changing the weight too early,
		 * this will cause unfairness. The correct approach
		 * would have required additional complexity to defer
		 * weight changes to the proper time instants (i.e.,
		 * when entity->finish <= old_st->vtime).
		 */
		new_st = bfq_entity_service_tree(entity);
		prev_weight = entity->weight;
		new_weight = entity->orig_weight *
			     (bfqq ? bfqq->wr_coeff : 1);
		/*
block, bfq: improve asymmetric scenarios detection
bfq defines as asymmetric a scenario where an active entity, say E
(representing either a single bfq_queue or a group of other entities),
has a higher weight than some other entities. If the entity E does sync
I/O in such a scenario, then bfq plugs the dispatch of the I/O of the
other entities in the following situation: E is in service but
temporarily has no pending I/O request. In fact, without this plugging,
all the times that E stops being temporarily idle, it may find the
internal queues of the storage device already filled with an
out-of-control number of extra requests, from other entities. So E may
have to wait for the service of these extra requests, before finally
having its own requests served. This may easily break service
guarantees, with E getting less than its fair share of the device
throughput. Usually, the end result is that E gets the same fraction of
the throughput as the other entities, instead of getting more, according
to its higher weight.
Yet there are two other more subtle cases where E, even if its weight is
actually equal to or even lower than the weight of any other active
entities, may get less than its fair share of the throughput in case the
above I/O plugging is not performed:
1. other entities issue larger requests than E;
2. other entities contain more active child entities than E (or in
general tend to have more backlog than E).
In the first case, other entities may get more service than E because
they get larger requests, than those of E, served during the temporary
idle periods of E. In the second case, other entities get more service
because, by having many child entities, they have many requests ready
for dispatching while E is temporarily idle.
This commit addresses this issue by extending the definition of
asymmetric scenario: a scenario is asymmetric when
- active entities representing bfq_queues have differentiated weights,
as in the original definition
or (inclusive)
- one or more entities representing groups of entities are active.
This broader definition makes sure that I/O plugging will be performed
in all the above cases, provided that there is at least one active
group. Of course, this definition is very coarse, so it will trigger
I/O plugging also in cases where it is not needed, such as, e.g.,
multiple active entities with just one child each, and all with the same
I/O-request size. The reason for this coarse definition is just that a
finer-grained definition would be rather heavy to compute.
On the opposite end, even this new definition does not trigger I/O
plugging in all cases where there is no active group, and all bfq_queues
have the same weight. So, in these cases some unfairness may occur if
there are asymmetries in I/O-request sizes. We made this choice because
I/O plugging may lower throughput, and probably a user that has not
created any group cares more about throughput than about perfect
fairness. At any rate, as for possible applications that may care about
service guarantees, bfq already guarantees a high responsiveness and a
low latency to soft real-time applications automatically.
Signed-off-by: Federico Motta <federico@willer.it>
Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-10-12 12:55:57 +03:00
		 * If the weight of the entity changes, and the entity is a
		 * queue, remove the entity from its old weight counter (if
		 * there is a counter associated with the entity).
2017-04-19 17:48:24 +03:00
		 */
block, bfq: fix asymmetric scenarios detection
Since commit 2d29c9f89fcd ("block, bfq: improve asymmetric scenarios
detection"), a scenario is defined as asymmetric when one of the
following conditions holds:
- active bfq_queues have different weights
- one or more groups of entities (bfq_queues or other groups of entities)
are active
bfq grants fairness and low latency also in such asymmetric scenarios,
by plugging the dispatching of I/O if the bfq_queue in service happens
to be temporarily idle. This plugging may lower throughput, so it is
important to do it only when strictly needed.
By mistake, in commit 2d29c9f89fcd ("block, bfq: improve asymmetric
scenarios detection") the num_active_groups counter was first
incremented and then decremented on any entity (group or
bfq_queue) weight change.
This is useless, because only transitions from active to inactive and
vice versa matter for that counter. Unfortunately this is also
incorrect in the following case: the entity at issue is a bfq_queue
and it is under weight raising. In fact in this case there is a
spurious increment of the num_active_groups counter.
This spurious increment may cause scenarios to be wrongly detected as
asymmetric, thus causing useless plugging and loss of throughput.
This commit fixes this issue by simply removing the above useless and
wrong increments and decrements.
Fixes: 2d29c9f89fcd ("block, bfq: improve asymmetric scenarios detection")
Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name>
Signed-off-by: Federico Motta <federico@willer.it>
Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-10-24 20:13:25 +03:00
		if (prev_weight != new_weight && bfqq) {
			root = &bfqd->queue_weights_tree;
			__bfq_weights_tree_remove(bfqd, bfqq, root);
2017-04-19 17:48:24 +03:00
		}
		entity->weight = new_weight;
		/*
2018-10-12 12:55:57 +03:00
		 * Add the entity, if it is not a weight-raised queue,
		 * to the counter associated with its new weight.
2017-04-19 17:48:24 +03:00
		 */
2018-10-24 20:13:25 +03:00
		if (prev_weight != new_weight && bfqq && bfqq->wr_coeff == 1) {
			/* If we get here, root has been initialized. */
			bfq_weights_tree_add(bfqd, bfqq, root);
2018-10-12 12:55:57 +03:00
		}
2017-04-19 17:48:24 +03:00
		new_st->wsum += entity->weight;
		if (new_st != old_st)
			entity->start = new_st->vtime;
	}
	return new_st;
}
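The weight bookkeeping at the end of __bfq_entity_update_weight_prio() can be illustrated with a minimal, user-space sketch. The struct and function names below (toy_service_tree, toy_entity, toy_update_weight) are hypothetical stand-ins, not kernel symbols. The sketch also assumes that the old weight is subtracted from the old tree's wsum earlier in the function, a step that lies outside the lines shown here.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for struct bfq_service_tree. */
struct toy_service_tree {
	uint64_t vtime;		/* system virtual time of this tree */
	unsigned long wsum;	/* sum of the weights of the entities on it */
};

/* Hypothetical, simplified stand-in for struct bfq_entity + bfq_queue. */
struct toy_entity {
	int orig_weight;	/* weight before weight raising */
	int weight;		/* effective weight currently counted in wsum */
	unsigned int wr_coeff;	/* weight-raising coefficient, 1 = not raised */
	uint64_t start;		/* virtual start time */
};

/*
 * Recompute the effective weight and move its contribution from the
 * old tree's weight sum to the new tree's weight sum.
 */
static void toy_update_weight(struct toy_service_tree *old_st,
			      struct toy_service_tree *new_st,
			      struct toy_entity *e)
{
	assert(e->wr_coeff >= 1);

	/* Assumed earlier step: drop the old weight from the old tree. */
	old_st->wsum -= e->weight;

	/* Effective weight = original weight times raising coefficient. */
	e->weight = e->orig_weight * e->wr_coeff;
	new_st->wsum += e->weight;

	/* On a tree change, restart from the new tree's virtual time. */
	if (new_st != old_st)
		e->start = new_st->vtime;
}
```

With orig_weight = 100, wr_coeff = 30 and a tree whose wsum is 300 (including this entity's old weight of 100), the entity's weight becomes 3000 and wsum becomes 3200; start is left untouched because the tree did not change.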
/**
 * bfq_bfqq_served - update the scheduler status after selection for
 *                   service.
 * @bfqq: the queue being served.
 * @served: bytes to transfer.
 *
 * NOTE: this can be optimized, as the timestamps of upper level entities
 * are synchronized every time a new bfqq is selected for service. By now,
 * we keep it to better check consistency.
 */
void bfq_bfqq_served(struct bfq_queue *bfqq, int served)
{
	struct bfq_entity *entity = &bfqq->entity;
	struct bfq_service_tree *st;
block, bfq: let a queue be merged only shortly after starting I/O
In BFQ and CFQ, two processes are said to be cooperating if they do
I/O in such a way that the union of their I/O requests yields a
sequential I/O pattern. To get such a sequential I/O pattern out of
the non-sequential pattern of each cooperating process, BFQ and CFQ
merge the queues associated with these processes. In more detail,
cooperating processes, and thus their associated queues, usually
start, or restart, to do I/O shortly after each other. This is the
case, e.g., for the I/O threads of KVM/QEMU and of the dump
utility. Based on this assumption, this commit allows a bfq_queue to
be merged only during a short time interval (100ms) after it starts,
or re-starts, to do I/O. This filtering provides two important
benefits.
First, it greatly reduces the probability that two non-cooperating
processes have their queues merged by mistake, if they just happen to
do I/O close to each other for a short time interval. These spurious
merges cause loss of service guarantees. A low-weight bfq_queue may
unjustly get more than its expected share of the throughput: if such a
low-weight queue is merged with a high-weight queue, then the I/O for
the low-weight queue is served as if the queue had a high weight. This
may damage other high-weight queues unexpectedly. For instance,
because of this issue, lxterminal occasionally took 7.5 seconds to
start, instead of 6.5 seconds, when some sequential readers and
writers did I/O in the background on a FUJITSU MHX2300BT HDD. The
reason is that the bfq_queues associated with some of the readers or
the writers were merged with the high-weight queues of some processes
that had to do some urgent but little I/O. The readers then exploited
the inherited high weight for all or most of their I/O, during the
start-up of the terminal. The filtering introduced by this commit
eliminated any outlier caused by spurious queue merges in our start-up
time tests.
This filtering also provides a little boost of the throughput
sustainable by BFQ: 3-4%, depending on the CPU. The reason is that,
once a bfq_queue cannot be merged any longer, this commit makes BFQ
stop updating the data needed to handle merging for the queue.
Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
Signed-off-by: Angelo Ruocco <angeloruocco90@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-12-20 14:38:33 +03:00
	if (!bfqq->service_from_backlogged)
		bfqq->first_IO_time = jiffies;
block, bfq: limit sectors served with interactive weight raising
To maximise responsiveness, BFQ raises the weight, and performs device
idling, for bfq_queues associated with processes deemed as
interactive. In particular, weight raising has a maximum duration,
equal to the time needed to start a large application. If a
weight-raised process goes on doing I/O beyond this maximum duration,
it loses weight-raising.
This mechanism is evidently vulnerable to the following false
positives: I/O-bound applications that will go on doing I/O for much
longer than the duration of weight-raising. These applications have
basically no benefit from being weight-raised at the beginning of
their I/O. On the opposite end, while being weight-raised, these
applications
a) unjustly steal throughput from applications that may truly need
low latency;
b) make BFQ uselessly perform device idling; device idling results
in loss of device throughput with most flash-based storage, and may
increase latencies when used purposelessly.
This commit adds a countermeasure to reduce both the above
problems. To introduce this countermeasure, we provide the following
extra piece of information (full details in the comments added by this
commit). During the start-up of the large application used as a
reference to set the duration of weight-raising, involved processes
transfer at most ~110K sectors each. Accordingly, a process initially
deemed as interactive has no right to be weight-raised any longer,
once it has transferred 110K sectors or more.
Based on this consideration, this commit early-ends weight-raising
for a bfq_queue if the latter happens to have received an amount of
service at least equal to 110K sectors (actually, a little bit more,
to keep a safety margin). I/O-bound applications that reach a high
throughput, such as file copy, get to this threshold much before the
allowed weight-raising period finishes. Thus this early ending of
weight-raising reduces the amount of time during which these
applications cause the problems described above.
Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name>
Tested-by: Holger Hoffstätte <holger@applied-asynchrony.com>
Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-01-13 14:05:18 +03:00
	if (bfqq->wr_coeff > 1)
		bfqq->service_from_wr += served;
2017-12-20 14:38:33 +03:00
	bfqq->service_from_backlogged += served;
2017-04-19 17:48:24 +03:00
	for_each_entity(entity) {
		st = bfq_entity_service_tree(entity);
		entity->service += served;
		st->vtime += bfq_delta(served, st->wsum);
		bfq_forget_idle(st);
	}
	bfq_log_bfqq(bfqq->bfqd, bfqq, "bfqq_served %d secs", served);
}
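The loop in bfq_bfqq_served() advances each service tree's virtual time by the service received divided by the tree's weight sum. The following user-space sketch shows that arithmetic in fixed point. The names toy_delta and toy_advance_vtime are hypothetical, and the shift of 22 bits is an assumption: the actual constant used by bfq_delta() is defined outside the lines shown here.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed fixed-point shift; the kernel's real constant is not shown here. */
#define TOY_SERVICE_SHIFT 22

/*
 * Map an amount of service to a virtual-time increment: the service is
 * scaled up to fixed point and divided by the weight.
 */
static uint64_t toy_delta(unsigned long service, unsigned long weight)
{
	assert(weight > 0);
	return ((uint64_t)service << TOY_SERVICE_SHIFT) / weight;
}

/* Advance a tree's virtual time after 'served' units of service. */
static uint64_t toy_advance_vtime(uint64_t vtime, unsigned long served,
				  unsigned long wsum)
{
	return vtime + toy_delta(served, wsum);
}
```

The larger the weight sum of a tree, the more slowly its virtual time advances for the same amount of service: toy_delta(8, 4) is half of toy_delta(8, 2).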
/**
 * bfq_bfqq_charge_time - charge an amount of service equivalent to the length
 *			  of the time interval during which bfqq has been in
 *			  service.
 * @bfqd: the device
 * @bfqq: the queue that needs a service update.
 * @time_ms: the amount of time during which the queue has received service
 *
 * If a queue does not consume its budget fast enough, then providing
 * the queue with service fairness may impair throughput, more or less
 * severely. For this reason, queues that consume their budget slowly
 * are provided with time fairness instead of service fairness. This
 * goal is achieved through the BFQ scheduling engine, even if such an
 * engine works in the service, and not in the time domain. The trick
 * is charging these queues with an inflated amount of service, equal
 * to the amount of service that they would have received during their
 * service slot if they had been fast, i.e., if their requests had
 * been dispatched at a rate equal to the estimated peak rate.
 *
 * It is worth noting that time fairness can cause important
 * distortions in terms of bandwidth distribution, on devices with
 * internal queueing. The reason is that I/O requests dispatched
 * during the service slot of a queue may be served after that service
 * slot is finished, and may have a total processing time loosely
 * correlated with the duration of the service slot. This is
 * especially true for short service slots.
 */
void bfq_bfqq_charge_time(struct bfq_data *bfqd, struct bfq_queue *bfqq,
			  unsigned long time_ms)
{
	struct bfq_entity *entity = &bfqq->entity;
2018-08-16 19:51:18 +03:00
	unsigned long timeout_ms = jiffies_to_msecs(bfq_timeout);
	unsigned long bounded_time_ms = min(time_ms, timeout_ms);
	int serv_to_charge_for_time =
		(bfqd->bfq_max_budget * bounded_time_ms) / timeout_ms;
	int tot_serv_to_charge = max(serv_to_charge_for_time, entity->service);
2017-04-19 17:48:24 +03:00
	/* Increase budget to avoid inconsistencies */
	if (tot_serv_to_charge > entity->budget)
		entity->budget = tot_serv_to_charge;
	bfq_bfqq_served(bfqq,
			max_t(int, 0, tot_serv_to_charge - entity->service));
}
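The time-to-service conversion in bfq_bfqq_charge_time() can be checked with a small worked sketch. The function name toy_service_to_charge is hypothetical, and the figures in the usage note (a maximum budget of 16384 sectors and a 125 ms timeout) are illustrative assumptions rather than values taken from the lines above.

```c
#include <assert.h>

/*
 * Return the total service to charge to a slow queue: the share of the
 * maximum budget that corresponds to the time spent in service (capped
 * at the timeout), but never less than the service already received.
 */
static int toy_service_to_charge(int max_budget, unsigned long time_ms,
				 unsigned long timeout_ms, int service)
{
	unsigned long bounded_ms;
	int for_time;

	assert(timeout_ms > 0);

	/* Time beyond the timeout is not charged. */
	bounded_ms = time_ms < timeout_ms ? time_ms : timeout_ms;

	/* Proportional share of the maximum budget, integer division. */
	for_time = (int)(((unsigned long)max_budget * bounded_ms) / timeout_ms);

	return for_time > service ? for_time : service;
}
```

With a 16384-sector maximum budget and a 125 ms timeout, a queue that held the device for 50 ms but received only 1000 sectors is charged 6553 sectors in total, so 5553 extra sectors are passed to the service-accounting step. A queue that already received more than its time-based share, for example 20000 sectors in 50 ms, is charged only what it actually received.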
static void bfq_update_fin_time_enqueue(struct bfq_entity *entity,
					struct bfq_service_tree *st,
					bool backshifted)
{
	struct bfq_queue *bfqq = bfq_entity_to_bfqq(entity);
2017-07-03 11:00:10 +03:00
	/*
	 * When this function is invoked, entity is not in any service
	 * tree, then it is safe to invoke next function with the last
	 * parameter set (see the comments on the function).
	 */
	st = __bfq_entity_update_weight_prio(st, entity, true);
2017-04-19 17:48:24 +03:00
	bfq_calc_finish(entity, entity->budget);
	/*
	 * If some queues enjoy backshifting for a while, then their
	 * (virtual) finish timestamps may happen to become lower and
	 * lower than the system virtual time. In particular, if
	 * these queues often happen to be idle for short time
	 * periods, and during such time periods other queues with
	 * higher timestamps happen to be busy, then the backshifted
	 * timestamps of the former queues can become much lower than
	 * the system virtual time. In fact, to serve the queues with
	 * higher timestamps while the ones with lower timestamps are
	 * idle, the system virtual time may be pushed-up to much
	 * higher values than the finish timestamps of the idle
	 * queues. As a consequence, the finish timestamps of all new
	 * or newly activated queues may end up being much larger than
	 * those of lucky queues with backshifted timestamps. The
	 * latter queues may then monopolize the device for a lot of
	 * time. This would simply break service guarantees.
	 *
	 * To reduce this problem, push up a little bit the
	 * backshifted timestamps of the queue associated with this
	 * entity (only a queue can happen to have the backshifted
	 * flag set): just enough to let the finish timestamp of the
	 * queue be equal to the current value of the system virtual
	 * time. This may introduce a little unfairness among queues
	 * with backshifted timestamps, but it does not break
	 * worst-case fairness guarantees.
	 *
	 * As a special case, if bfqq is weight-raised, push up
	 * timestamps much less, to keep very low the probability that
	 * this push up causes the backshifted finish timestamps of
	 * weight-raised queues to become higher than the backshifted
	 * finish timestamps of non weight-raised queues.
	 */
	if (backshifted && bfq_gt(st->vtime, entity->finish)) {
		unsigned long delta = st->vtime - entity->finish;
		if (bfqq)
			delta /= bfqq->wr_coeff;
		entity->start += delta;
		entity->finish += delta;
	}
	bfq_active_insert(st, entity);
}
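The push-up of backshifted timestamps described in the comment above can be sketched in user space as follows. The names toy_gt, toy_ts and toy_push_up are hypothetical. toy_gt mirrors the wrap-safe comparison of bfq_gt() defined at the top of the file, and the sketch uses 64-bit arithmetic throughout, which is a simplification of this example rather than a statement about the kernel code.

```c
#include <assert.h>
#include <stdint.h>

/* Wrap-safe "a > b" for 64-bit virtual timestamps, as in bfq_gt(). */
static int toy_gt(uint64_t a, uint64_t b)
{
	return (int64_t)(a - b) > 0;
}

/* Hypothetical pair of virtual timestamps of an entity. */
struct toy_ts {
	uint64_t start;
	uint64_t finish;
};

/*
 * If the finish timestamp lags behind the system virtual time, shift
 * both timestamps forward. A non-raised queue (wr_coeff == 1) is
 * shifted until finish == vtime; a weight-raised queue is shifted by
 * only 1/wr_coeff of the gap.
 */
static void toy_push_up(struct toy_ts *ts, uint64_t vtime,
			unsigned int wr_coeff)
{
	assert(wr_coeff >= 1);

	if (toy_gt(vtime, ts->finish)) {
		uint64_t delta = (vtime - ts->finish) / wr_coeff;

		ts->start += delta;
		ts->finish += delta;
	}
}
```

Starting from start = 100 and finish = 200 with a virtual time of 500, a non-raised queue ends up at start = 400 and finish = 500, while a queue with wr_coeff = 30 moves only to start = 110 and finish = 210. If the virtual time has not passed the finish timestamp, nothing changes.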
/**
 * __bfq_activate_entity - handle activation of entity.
 * @entity: the entity being activated.
 * @non_blocking_wait_rq: true if entity was waiting for a request
 *
 * Called for a 'true' activation, i.e., if entity is not active and
 * one of its children receives a new request.
 *
 * Basically, this function updates the timestamps of entity and
block, bfq: add/remove entity weights correctly
To keep I/O throughput high as often as possible, BFQ performs
I/O-dispatch plugging (aka device idling) only when beneficial exactly
for throughput, or when needed for service guarantees (low latency,
fairness). An important case where the latter condition holds is when
the scenario is 'asymmetric' in terms of weights: i.e., when some
bfq_queue or whole group of queues has a higher weight, and thus has
to receive more service, than other queues or groups. Without dispatch
plugging, lower-weight queues/groups may unjustly steal bandwidth from
higher-weight queues/groups.
To detect asymmetric scenarios, BFQ checks some sufficient
conditions. One of these conditions is that active groups have
different weights. BFQ controls this condition by maintaining a
special set of unique weights of active groups
(group_weights_tree). To this purpose, in the function
bfq_active_insert/bfq_active_extract BFQ adds/removes the weight of a
group to/from this set.
Unfortunately, the function bfq_active_extract may happen to be
invoked also for a group that is still active (to preserve the correct
update of the next queue to serve, see comments in function
bfq_no_longer_next_in_service() for details). In this case, removing
the weight of the group makes the set group_weights_tree
inconsistent. Service-guarantee violations follow.
This commit addresses this issue by moving group_weights_tree
insertions from their previous location (in bfq_active_insert) into
the function __bfq_activate_entity, and by moving group_weights_tree
extractions from bfq_active_extract to when the entity that represents
a group remains throughly idle, i.e., with no request either enqueued
or dispatched.
Tested-by: Holger Hoffstätte <holger@applied-asynchrony.com>
Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name>
Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-06-25 22:55:34 +03:00
 * inserts entity into its active tree, after possibly extracting it
 * from its idle tree.
 */
static void __bfq_activate_entity(struct bfq_entity *entity,
				  bool non_blocking_wait_rq)
{
	struct bfq_service_tree *st = bfq_entity_service_tree(entity);
	bool backshifted = false;
	unsigned long long min_vstart;

	/* See comments on bfq_bfqq_update_budg_for_activation */
	if (non_blocking_wait_rq && bfq_gt(st->vtime, entity->finish)) {
		backshifted = true;
		min_vstart = entity->finish;
	} else
		min_vstart = st->vtime;

	if (entity->tree == &st->idle) {
		/*
		 * Must be on the idle tree, bfq_idle_extract() will
		 * check for that.
		 */
		bfq_idle_extract(st, entity);
		entity->start = bfq_gt(min_vstart, entity->finish) ?
			min_vstart : entity->finish;
	} else {
		/*
		 * The finish time of the entity may be invalid, and
		 * it is in the past for sure, otherwise the queue
		 * would have been on the idle tree.
		 */
		entity->start = min_vstart;
		st->wsum += entity->weight;
		/*
		 * entity is about to be inserted into a service tree,
		 * and then set in service: get a reference to make
		 * sure entity does not disappear until it is no
		 * longer in service or scheduled for service.
		 */
		bfq_get_entity(entity);

		entity->on_st_or_in_serv = true;
	}

#ifdef CONFIG_BFQ_GROUP_IOSCHED
	if (!bfq_entity_to_bfqq(entity)) { /* bfq_group */
		struct bfq_group *bfqg =
			container_of(entity, struct bfq_group, entity);
		struct bfq_data *bfqd = bfqg->bfqd;

		if (!entity->in_groups_with_pending_reqs) {
			entity->in_groups_with_pending_reqs = true;
			bfqd->num_groups_with_pending_reqs++;
		}
	}
#endif

	bfq_update_fin_time_enqueue(entity, st, backshifted);
}

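/*
 * Illustration only, not part of the kernel source. A minimal sketch of how
 * __bfq_activate_entity() above picks the new virtual start time, rewritten
 * as a pure function over plain integers. demo_gt() mirrors bfq_gt(); the
 * demo_* names and the on_idle_tree flag (standing in for the
 * entity->tree == &st->idle test) are hypothetical.
 */

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Wrap-safe "a > b" on 64-bit virtual timestamps, same idea as bfq_gt(). */
static int demo_gt(uint64_t a, uint64_t b)
{
	return (int64_t)(a - b) > 0;
}

/*
 * Return the start time chosen on activation and report through
 * *backshifted whether the old finish time was used as lower bound
 * instead of the current virtual time.
 */
static uint64_t demo_activation_start(uint64_t vtime, uint64_t finish,
				      bool on_idle_tree,
				      bool non_blocking_wait_rq,
				      bool *backshifted)
{
	uint64_t min_vstart;

	assert(backshifted);
	*backshifted = false;

	if (non_blocking_wait_rq && demo_gt(vtime, finish)) {
		/* The request arrived in time: restart from the old finish. */
		*backshifted = true;
		min_vstart = finish;
	} else {
		min_vstart = vtime;
	}

	/* An entity on the idle tree never starts before its old finish time. */
	if (on_idle_tree)
		return demo_gt(min_vstart, finish) ? min_vstart : finish;

	return min_vstart;
}
```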
/**
 * __bfq_requeue_entity - handle requeueing or repositioning of an entity.
 * @entity: the entity being requeued or repositioned.
 *
 * Requeueing is needed if this entity stops being served, which
 * happens if a leaf descendant entity has expired. On the other hand,
 * repositioning is needed if the next_in_service entity for the child
 * entity has changed. See the comments inside the function for
 * details.
 *
 * Basically, this function: 1) removes entity from its active tree if
 * present there, 2) updates the timestamps of entity and 3) inserts
 * entity back into its active tree (in the new, right position for
 * the new values of the timestamps).
 */
static void __bfq_requeue_entity(struct bfq_entity *entity)
{
	struct bfq_sched_data *sd = entity->sched_data;
	struct bfq_service_tree *st = bfq_entity_service_tree(entity);

	if (entity == sd->in_service_entity) {
		/*
		 * We are requeueing the current in-service entity,
		 * which may have to be done for one of the following
		 * reasons:
		 * - entity represents the in-service queue, and the
		 *   in-service queue is being requeued after an
		 *   expiration;
		 * - entity represents a group, and its budget has
		 *   changed because one of its child entities has
		 *   just been either activated or requeued for some
		 *   reason; the timestamps of the entity need then to
		 *   be updated, and the entity needs to be enqueued
		 *   or repositioned accordingly.
		 *
		 * In particular, before requeueing, the start time of
		 * the entity must be moved forward to account for the
		 * service that the entity has received while in
		 * service. This is done by the next instructions. The
		 * finish time will then be updated according to this
		 * new value of the start time, and to the budget of
		 * the entity.
		 */
		bfq_calc_finish(entity, entity->service);
		entity->start = entity->finish;
		/*
		 * In addition, if the entity had more than one child
		 * when set in service, then it was not extracted from
		 * the active tree. This implies that the position of
		 * the entity in the active tree may need to be
		 * changed now, because we have just updated the start
		 * time of the entity, and we will update its finish
		 * time in a moment (the requeueing is then, more
		 * precisely, a repositioning in this case). To
		 * implement this repositioning, we: 1) dequeue the
		 * entity here, 2) update the finish time and requeue
		 * the entity according to the new timestamps below.
		 */
		if (entity->tree)
			bfq_active_extract(st, entity);
	} else { /* The entity is already active, and not in service */
		/*
		 * In this case, this function gets called only if the
		 * next_in_service entity below this entity has
		 * changed, and this change has caused the budget of
		 * this entity to change, which, finally, implies that
		 * the finish time of this entity must be
		 * updated. Such an update may cause the scheduling,
		 * i.e., the position in the active tree, of this
		 * entity to change. We handle this change by: 1)
		 * dequeueing the entity here, 2) updating the finish
		 * time and requeueing the entity according to the new
		 * timestamps below. This is the same approach as the
		 * non-extracted-entity sub-case above.
		 */
		bfq_active_extract(st, entity);
	}

	bfq_update_fin_time_enqueue(entity, st, false);
}

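/*
 * Illustration only, not part of the kernel source. A minimal sketch of the
 * timestamp arithmetic described in the comments of __bfq_requeue_entity()
 * for the in-service case. The helpers that do this work in the kernel are
 * not shown in this excerpt, so the sketch assumes that a finish time is
 * "start + amount scaled by a fixed shift and divided by the weight".
 * DEMO_SERVICE_SHIFT, struct demo_entity and the demo_* functions are
 * hypothetical.
 */

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical fixed-point shift used to keep precision in the division. */
#define DEMO_SERVICE_SHIFT 22

/* Hypothetical stand-in for the relevant fields of struct bfq_entity. */
struct demo_entity {
	uint64_t start;
	uint64_t finish;
	unsigned long service;	/* service received while in service */
	unsigned long budget;	/* budget for the next service slot */
	unsigned long weight;
};

/* Virtual-time span that corresponds to 'amount' of service at 'weight'. */
static uint64_t demo_delta(unsigned long amount, unsigned long weight)
{
	assert(weight > 0);
	return ((uint64_t)amount << DEMO_SERVICE_SHIFT) / weight;
}

static void demo_requeue_in_service(struct demo_entity *e)
{
	/* 1) Charge only the service actually received so far. */
	e->finish = e->start + demo_delta(e->service, e->weight);

	/* 2) The next slot starts where the charged service ends. */
	e->start = e->finish;

	/* 3) The new finish time follows from the new start and the budget. */
	e->finish = e->start + demo_delta(e->budget, e->weight);
}
```

/*
 * In this sketch an entity that used only part of its budget is charged
 * only for that part, so its next start time moves forward less than it
 * would if the whole budget had been consumed.
 */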
static void __bfq_activate_requeue_entity(struct bfq_entity *entity,
					  struct bfq_sched_data *sd,
					  bool non_blocking_wait_rq)
{
	struct bfq_service_tree *st = bfq_entity_service_tree(entity);

	if (sd->in_service_entity == entity || entity->tree == &st->active)
		/*
		 * in service or already queued on the active tree,
		 * requeue or reposition
		 */
		__bfq_requeue_entity(entity);
	else
		/*
		 * Not in service and not queued on its active tree:
		 * the entity is idle and this is a true activation.
		 */
		__bfq_activate_entity(entity, non_blocking_wait_rq);
}

/**
 * bfq_activate_requeue_entity - activate or requeue an entity representing a
 *				 bfq_queue, and activate, requeue or reposition
 *				 all ancestors for which such an update becomes
 *				 necessary.
 * @entity: the entity to activate.
 * @non_blocking_wait_rq: true if this entity was waiting for a request
 * @requeue: true if this is a requeue, which implies that bfqq is
 *	     being expired; thus ALL its ancestors stop being served and must
 *	     therefore be requeued
 * @expiration: true if this function is being invoked in the expiration path
 *		of the in-service queue
 */
static void bfq_activate_requeue_entity(struct bfq_entity *entity,
					bool non_blocking_wait_rq,
					bool requeue, bool expiration)
{
	struct bfq_sched_data *sd;

	for_each_entity(entity) {
		sd = entity->sched_data;
		__bfq_activate_requeue_entity(entity, sd, non_blocking_wait_rq);
2017-08-31 09:46:29 +03:00
if ( ! bfq_update_next_in_service ( sd , entity , expiration ) & &
! requeue )
2017-04-19 17:48:24 +03:00
break ;
}
}
/**
2018-12-06 21:18:19 +03:00
* __bfq_deactivate_entity - update sched_data and service trees for
* entity , so as to represent entity as inactive
* @ entity : the entity being deactivated .
2017-04-19 17:48:24 +03:00
* @ ins_into_idle_tree : if false , the entity will not be put into the
* idle tree .
*
2018-12-06 21:18:19 +03:00
* If necessary and allowed , puts entity into the idle tree . NOTE :
* entity may be on no tree if in service .
2017-04-19 17:48:24 +03:00
*/
bool __bfq_deactivate_entity ( struct bfq_entity * entity , bool ins_into_idle_tree )
{
struct bfq_sched_data * sd = entity - > sched_data ;
2017-05-09 12:37:27 +03:00
struct bfq_service_tree * st ;
bool is_in_service ;
2017-04-19 17:48:24 +03:00
2020-02-03 13:40:57 +03:00
if ( ! entity - > on_st_or_in_serv ) /*
* entity never activated , or
* already inactive
*/
2017-04-19 17:48:24 +03:00
return false ;
2017-05-09 12:37:27 +03:00
/*
* If we get here , then entity is active , which implies that
* bfq_group_set_parent has already been invoked for the group
* represented by entity . Therefore , the field
* entity - > sched_data has been set , and we can safely use it .
*/
st = bfq_entity_service_tree ( entity ) ;
is_in_service = entity = = sd - > in_service_entity ;
block, bfq: correctly charge and reset entity service in all cases
BFQ schedules entities (which represent either per-process queues or
groups of queues) as a function of their timestamps. In particular, as
a function of their (virtual) finish times. The finish time of an
entity is computed as a function of the budget assigned to the entity,
assuming, tentatively, that the entity, once in service, will receive
an amount of service equal to its budget. Then, when the entity is
expired because it has finished being served, this finish time is updated
as a function of the actual service received by the entity. This
allows the entity to be correctly charged with only the service
received, and then to be correctly re-scheduled.
Yet an entity may receive service also while not being the entity in
service (in the scheduling environment of its parent entity), for
several reasons. If the entity remains with no backlog while receiving
this 'unofficial' service, then it is expired. Also on such an
expiration, the finish time of the entity should be updated to account
for only the service actually received by the entity. Unfortunately,
such an update is not performed for an entity expiring without being
the entity in service.
In a similar vein, the service counter of the entity in service is
reset when the entity is expired, to be ready to be used for next
service cycle. This reset too should be performed also in case an
entity is expired because it remains empty after receiving service
while not being the entity in service. But in this case the reset is
not performed.
This commit performs the above update of the finish time and reset of
the service received, also for an entity expiring while not being the
entity in service.
Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-09-14 17:23:07 +03:00
bfq_calc_finish ( entity , entity - > service ) ;
if ( is_in_service )
block, bfq: reset in_service_entity if it becomes idle
BFQ implements hierarchical scheduling by representing each group of
queues with a generic parent entity. For each parent entity, BFQ
maintains an in_service_entity pointer: if one of the child entities
happens to be in service, in_service_entity points to it. The
resetting of these pointers happens only on queue expirations: when
the in-service queue is expired, i.e., stops being the queue in
service, BFQ resets all in_service_entity pointers along the
parent-entity path from this queue to the root entity.
Functions handling the scheduling of entities assume, naturally, that
in-service entities are active, i.e., have pending I/O requests (or,
as a special case, even if they have no pending requests, they are
expected to receive a new request very soon, with the scheduler idling
the storage device while waiting for such an event). Unfortunately,
the above resetting scheme of the in_service_entity pointers may cause
this assumption to be violated. For example, the in-service queue may
happen to remain without requests because of a request merge. In this
case the queue does become idle, and all related data structures are
updated accordingly. But in_service_entity still points to the queue
in the parent entity. This inconsistency may even propagate to
higher-level parent entities, if they happen to become idle as well,
as a consequence of the leaf queue becoming idle. For this queue and
parent entities, scheduling functions have an undefined behaviour,
and, as reported, may easily lead to kernel crashes or hangs.
This commit addresses this issue by simply resetting the
in_service_entity field also when it is detected to point to an entity
becoming idle (regardless of why the entity becomes idle).
Reported-by: Laurentiu Nicola <lnicola@dend.ro>
Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
Tested-by: Laurentiu Nicola <lnicola@dend.ro>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-07-28 22:41:18 +03:00
sd - > in_service_entity = NULL ;
2018-09-14 17:23:07 +03:00
else
/*
* Non in - service entity : nobody will take care of
* resetting its service counter on expiration . Do it
* now .
*/
entity - > service = 0 ;
2017-04-19 17:48:24 +03:00
if ( entity - > tree = = & st - > active )
bfq_active_extract ( st , entity ) ;
else if ( ! is_in_service & & entity - > tree = = & st - > idle )
bfq_idle_extract ( st , entity ) ;
if ( ! ins_into_idle_tree | | ! bfq_gt ( entity - > finish , st - > vtime ) )
bfq_forget_entity ( st , entity , is_in_service ) ;
else
bfq_idle_insert ( st , entity ) ;
return true ;
}
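The last branch of __bfq_deactivate_entity decides whether a deactivated entity is forgotten or parked in the idle tree. The following is a minimal standalone sketch of that decision only. The names deactivation_dest, DEST_FORGET, DEST_IDLE and ts_gt are hypothetical stand-ins introduced for illustration; the comparison mirrors bfq_gt and the condition tested above, and nothing else of the real data structures is modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical labels for the two outcomes of a deactivation. */
enum dest { DEST_FORGET, DEST_IDLE };

/* Wrap-safe "a is later than b", written like bfq_gt. */
static int ts_gt(uint64_t a, uint64_t b)
{
	return (int64_t)(a - b) > 0;
}

/*
 * An entity goes to the idle tree only if the caller allows it AND its
 * finish time is still ahead of the service tree's virtual time.
 * Otherwise it is forgotten.
 */
static enum dest deactivation_dest(uint64_t finish, uint64_t vtime,
				   bool ins_into_idle_tree)
{
	if (!ins_into_idle_tree || !ts_gt(finish, vtime))
		return DEST_FORGET;
	return DEST_IDLE;
}
```

In this sketch, an entity whose finish time equals the virtual time is forgotten, because the test is a strict "greater than".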
/**
* bfq_deactivate_entity - deactivate an entity representing a bfq_queue .
* @ entity : the entity to deactivate .
block, bfq: consider also in_service_entity to state whether an entity is active
Groups of BFQ queues are represented by generic entities in BFQ. When
a queue belonging to a parent entity is deactivated, the parent entity
may need to be deactivated too, in case the deactivated queue was the
only active queue for the parent entity. This deactivation may need to
be propagated upwards if the entity belongs, in its turn, to a further
higher-level entity, and so on. In particular, the upward propagation
of deactivation stops at the first parent entity that remains active
even if one of its child entities has been deactivated.
To decide whether the last non-deactivation condition holds for a
parent entity, BFQ checks whether the field next_in_service is still
not NULL for the parent entity, after the deactivation of one of its
child entities. If it is not NULL, then there are certainly other active
entities in the parent entity, and deactivations can stop.
Unfortunately, this check misses a corner case: if in_service_entity
is not NULL, then next_in_service may happen to be NULL, although the
parent entity is evidently active. This happens if: 1) the entity
pointed to by in_service_entity is the only active entity in the parent
entity, and 2) according to the definition of next_in_service, the
in_service_entity cannot be considered as next_in_service. See the
comments on the definition of next_in_service for details on this
second point.
Hitting the above corner case causes crashes.
To address this issue, this commit:
1) Extends the above check on only next_in_service to controlling both
next_in_service and in_service_entity (if any of them is not NULL,
then no further deactivation is performed)
2) Improves the (important) comments on how next_in_service is defined
and updated; in particular it fixes a few rather obscure paragraphs
Reported-by: Eric Wheeler <bfq-sched@lists.ewheeler.net>
Reported-by: Rick Yiu <rick_yiu@htc.com>
Reported-by: Tom X Nguyen <tom81094@gmail.com>
Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
Tested-by: Eric Wheeler <bfq-sched@lists.ewheeler.net>
Tested-by: Rick Yiu <rick_yiu@htc.com>
Tested-by: Laurentiu Nicola <lnicola@dend.ro>
Tested-by: Tom X Nguyen <tom81094@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-07-29 13:42:56 +03:00
* @ ins_into_idle_tree : true if the entity can be put into the idle tree
2017-08-31 09:46:29 +03:00
* @ expiration : true if this function is being invoked in the expiration path
* of the in - service queue
2017-04-19 17:48:24 +03:00
*/
static void bfq_deactivate_entity ( struct bfq_entity * entity ,
bool ins_into_idle_tree ,
bool expiration )
{
struct bfq_sched_data * sd ;
struct bfq_entity * parent = NULL ;
for_each_entity_safe ( entity , parent ) {
sd = entity - > sched_data ;
if ( ! __bfq_deactivate_entity ( entity , ins_into_idle_tree ) ) {
/*
* entity is not in any tree any more , so
* this deactivation is a no - op , and there is
* nothing to change for upper - level entities
* ( in case of expiration , this can never
* happen ) .
*/
return ;
}
if ( sd - > next_in_service = = entity )
/*
* entity was the next_in_service entity ,
* then , since entity has just been
* deactivated , a new one must be found .
*/
2017-08-31 09:46:29 +03:00
bfq_update_next_in_service ( sd , NULL , expiration ) ;
2017-04-19 17:48:24 +03:00
2017-07-29 13:42:56 +03:00
if ( sd - > next_in_service | | sd - > in_service_entity ) {
2017-04-19 17:48:24 +03:00
/*
2017-07-29 13:42:56 +03:00
* The parent entity is still active , because
* either next_in_service or in_service_entity
* is not NULL . So , no further upwards
* deactivation must be performed . Yet ,
* next_in_service has changed . Then the
* schedule does need to be updated upwards .
*
* NOTE If in_service_entity is not NULL , then
* next_in_service may happen to be NULL ,
* although the parent entity is evidently
* active . This happens if 1 ) the entity
* pointed to by in_service_entity is the only
* active entity in the parent entity , and 2 )
* according to the definition of
* next_in_service , the in_service_entity
* cannot be considered as
* next_in_service . See the comments on the
* definition of next_in_service for details .
2017-04-19 17:48:24 +03:00
*/
break ;
2017-07-29 13:42:56 +03:00
}
2017-04-19 17:48:24 +03:00
/*
* If we get here , then the parent is no longer
* backlogged and we need to propagate the
* deactivation upwards . Thus let the loop go on .
*/
/*
* Also let parent be queued into the idle tree on
* deactivation , to preserve service guarantees , and
* assuming that whoever invoked this function does not
* need parent entities too to be removed completely .
*/
ins_into_idle_tree = true ;
}
/*
* If the deactivation loop is fully executed , then there are
* no more entities to touch and next loop is not executed at
* all . Otherwise , requeue remaining entities if they are
* about to stop receiving service , or reposition them if this
* is not the case .
*/
entity = parent ;
for_each_entity ( entity ) {
/*
* Invoke __bfq_requeue_entity on entity , even if
* already active , to requeue / reposition it in the
* active tree ( because sd - > next_in_service has
* changed )
*/
__bfq_requeue_entity ( entity ) ;
sd = entity - > sched_data ;
2017-08-31 09:46:29 +03:00
if ( ! bfq_update_next_in_service ( sd , entity , expiration ) & &
2017-04-19 17:48:24 +03:00
! expiration )
/*
* next_in_service unchanged or not causing
* any change in entity - > parent - > sd , and no
* requeueing needed for expiration : stop
* here .
*/
break ;
}
}
/**
* bfq_calc_vtime_jump - compute the value to which the vtime should jump ,
* if needed , to have at least one entity eligible .
* @ st : the service tree to act upon .
*
* Assumes that st is not empty .
*/
static u64 bfq_calc_vtime_jump ( struct bfq_service_tree * st )
{
struct bfq_entity * root_entity = bfq_root_active_entity ( & st - > active ) ;
if ( bfq_gt ( root_entity - > min_start , st - > vtime ) )
return root_entity - > min_start ;
return st - > vtime ;
}
static void bfq_update_vtime ( struct bfq_service_tree * st , u64 new_value )
{
if ( new_value > st - > vtime ) {
st - > vtime = new_value ;
bfq_forget_idle ( st ) ;
}
}
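The following is a minimal userspace sketch of the wrap-aware comparison that bfq_calc_vtime_jump relies on. The names ts_gt and calc_vtime_jump are hypothetical stand-ins, not kernel symbols, and min_start is passed directly instead of being read from the root of the active tree. The sketch covers only the jump computation, not the idle-tree cleanup done by bfq_update_vtime.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Wrap-safe "a > b" on 64-bit timestamps, modeled on bfq_gt().
 * The unsigned difference is reinterpreted as signed, so the result
 * stays correct as long as the two values are less than 2^63 apart.
 * (The unsigned-to-signed conversion is implementation-defined in ISO C;
 * this sketch assumes the usual two's-complement behavior.)
 */
static int ts_gt(uint64_t a, uint64_t b)
{
	return (int64_t)(a - b) > 0;
}

/*
 * Hypothetical stand-in for bfq_calc_vtime_jump(): return the value the
 * system virtual time must reach so that at least one entity is eligible.
 * min_start plays the role of root_entity->min_start.
 */
static uint64_t calc_vtime_jump(uint64_t min_start, uint64_t vtime)
{
	return ts_gt(min_start, vtime) ? min_start : vtime;
}
```

With a virtual time close to the top of the 64-bit range and a min_start that has already wrapped, a plain `>` would keep the old virtual time, whereas the signed-difference comparison still selects min_start.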
/**
* bfq_first_active_entity - find the eligible entity with
* the smallest finish time
* @ st : the service tree to select from .
* @ vtime : the system virtual time to use as a reference for eligibility
*
* This function searches the first schedulable entity , starting from the
* root of the tree and going on the left every time on this side there is
2017-07-12 10:25:01 +03:00
* a subtree with at least one eligible ( start < = vtime ) entity . The path on
2017-04-19 17:48:24 +03:00
* the right is followed only if a ) the left subtree contains no eligible
* entities and b ) no eligible entity has been found yet .
*/
static struct bfq_entity * bfq_first_active_entity ( struct bfq_service_tree * st ,
u64 vtime )
{
struct bfq_entity * entry , * first = NULL ;
struct rb_node * node = st - > active . rb_node ;
while ( node ) {
entry = rb_entry ( node , struct bfq_entity , rb_node ) ;
left :
if ( ! bfq_gt ( entry - > start , vtime ) )
first = entry ;
if ( node - > rb_left ) {
entry = rb_entry ( node - > rb_left ,
struct bfq_entity , rb_node ) ;
if ( ! bfq_gt ( entry - > min_start , vtime ) ) {
node = node - > rb_left ;
goto left ;
}
}
if ( first )
break ;
node = node - > rb_right ;
}
return first ;
}
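The search performed by bfq_first_active_entity can be sketched in userspace with a plain binary tree in place of the kernel rbtree. This is an illustration under stated assumptions: struct node, first_eligible and ts_gt are hypothetical names, the tree is assumed to be ordered by finish time, and min_start is assumed to hold the minimum start time in the subtree rooted at each node (the caller is responsible for keeping it up to date).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a bfq_entity linked into the active tree. */
struct node {
	uint64_t start;      /* virtual start time */
	uint64_t finish;     /* virtual finish time (assumed ordering key) */
	uint64_t min_start;  /* minimum start time in this subtree */
	struct node *left, *right;
};

/* Wrap-safe "a > b", modeled on bfq_gt(). */
static int ts_gt(uint64_t a, uint64_t b)
{
	return (int64_t)(a - b) > 0;
}

/*
 * Return the eligible node (start <= vtime) with the smallest finish
 * time, or NULL if no node is eligible.
 */
static struct node *first_eligible(struct node *root, uint64_t vtime)
{
	struct node *first = NULL;
	struct node *n = root;

	while (n) {
		/* Remember the current node if it is eligible. */
		if (!ts_gt(n->start, vtime))
			first = n;

		/* Prefer the left subtree whenever it holds an eligible node. */
		if (n->left && !ts_gt(n->left->min_start, vtime)) {
			n = n->left;
			continue;
		}

		/* Go right only if nothing eligible has been found so far. */
		if (first)
			break;
		n = n->right;
	}
	return first;
}
```

The min_start field is what lets the walk skip a left subtree in constant time: if the smallest start time in that subtree is still greater than vtime, nothing below it can be eligible.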
/**
* __bfq_lookup_next_entity - return the first eligible entity in @ st .
* @ st : the service tree .
2022-06-18 00:08:59 +03:00
* @ in_service : whether or not there is an in - service entity for the sched_data
* this active tree belongs to .
2017-04-19 17:48:24 +03:00
*
* If there is no in - service entity for the sched_data st belongs to ,
* then return the entity that will be set in service if :
* 1 ) the parent entity this st belongs to is set in service ;
* 2 ) no entity belonging to such parent entity undergoes a state change
* that would influence the timestamps of the entity ( e . g . , becomes idle ,
* becomes backlogged , changes its budget , . . . ) .
*
* In this first case , update the virtual time in @ st too ( see the
* comments on this update inside the function ) .
*
2019-04-08 18:35:34 +03:00
* In contrast , if there is an in - service entity , then return the
2017-04-19 17:48:24 +03:00
* entity that would be set in service if not only the above
* conditions , but also the next one held true : the currently
* in - service entity , on expiration ,
* 1 ) gets a finish time equal to the current one , or
* 2 ) is not eligible any more , or
* 3 ) is idle .
*/
static struct bfq_entity *
__bfq_lookup_next_entity ( struct bfq_service_tree * st , bool in_service )
{
struct bfq_entity * entity ;
u64 new_vtime ;
if ( RB_EMPTY_ROOT ( & st - > active ) )
return NULL ;
/*
* Get the value of the system virtual time for which at
* least one entity is eligible .
*/
new_vtime = bfq_calc_vtime_jump ( st ) ;
/*
* If there is no in - service entity for the sched_data this
* active tree belongs to , then push the system virtual time
* up to the value that guarantees that at least one entity is
* eligible . If , instead , there is an in - service entity , then
* do not make any such update , because there is already an
* eligible entity , namely the in - service one ( even if the
* entity is not on st , because it was extracted when set in
* service ) .
*/
if ( ! in_service )
bfq_update_vtime ( st , new_vtime ) ;
entity = bfq_first_active_entity ( st , new_vtime ) ;
return entity ;
}
/**
* bfq_lookup_next_entity - return the first eligible entity in @ sd .
* @ sd : the sched_data .
2017-08-31 09:46:29 +03:00
* @ expiration : true if we are on the expiration path of the in - service queue
2017-04-19 17:48:24 +03:00
*
* This function is invoked when there has been a change in the trees
2017-08-31 09:46:29 +03:00
* for sd , and we need to know what is the new next entity to serve
* after this change .
2017-04-19 17:48:24 +03:00
*/
2017-08-31 09:46:29 +03:00
static struct bfq_entity * bfq_lookup_next_entity ( struct bfq_sched_data * sd ,
bool expiration )
2017-04-19 17:48:24 +03:00
{
struct bfq_service_tree * st = sd - > service_tree ;
struct bfq_service_tree * idle_class_st = st + ( BFQ_IOPRIO_CLASSES - 1 ) ;
struct bfq_entity * entity = NULL ;
int class_idx = 0 ;
/*
* Choose from idle class , if needed to guarantee a minimum
* bandwidth to this class ( and if there is some active entity
* in idle class ) . This should also mitigate
* priority - inversion problems in case a low priority task is
* holding file system resources .
*/
if ( time_is_before_jiffies ( sd - > bfq_class_idle_last_service +
BFQ_CL_IDLE_TIMEOUT ) ) {
if ( ! RB_EMPTY_ROOT ( & idle_class_st - > active ) )
class_idx = BFQ_IOPRIO_CLASSES - 1 ;
/* About to be served if backlogged, or not yet backlogged */
sd - > bfq_class_idle_last_service = jiffies ;
}
/*
* Find the next entity to serve for the highest - priority
* class , unless the idle class needs to be served .
*/
for ( ; class_idx < BFQ_IOPRIO_CLASSES ; class_idx + + ) {
2017-08-31 09:46:29 +03:00
/*
* If expiration is true , then bfq_lookup_next_entity
* is being invoked as a part of the expiration path
* of the in - service queue . In this case , even if
* sd - > in_service_entity is not NULL ,
2019-04-08 18:35:34 +03:00
* sd - > in_service_entity at this point is actually not
2017-08-31 09:46:29 +03:00
* in service any more , and , if needed , has already
* been properly queued or requeued into the right
* tree . The reason why sd - > in_service_entity is still
* not NULL here , even if expiration is true , is that
2019-04-08 18:35:34 +03:00
* sd - > in_service_entity is reset as a last step in the
2017-08-31 09:46:29 +03:00
* expiration path . So , if expiration is true , tell
* __bfq_lookup_next_entity that there is no
* sd - > in_service_entity .
*/
2017-04-19 17:48:24 +03:00
entity = __bfq_lookup_next_entity ( st + class_idx ,
2017-08-31 09:46:29 +03:00
sd - > in_service_entity & &
! expiration ) ;
2017-04-19 17:48:24 +03:00
if ( entity )
break ;
}
return entity ;
}
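The two decisions made by bfq_lookup_next_entity, which class to start from and what to pass as the in_service argument, can be sketched in isolation as follows. This is a simplified model, not the kernel logic: NUM_CLASSES, pick_class and treat_as_in_service are hypothetical names, and a class with a non-empty active tree is assumed to always yield an entity, which the real function does not guarantee when an entity is in service.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NUM_CLASSES 3 /* assumption: mirrors BFQ_IOPRIO_CLASSES */

/*
 * Value handed to the per-tree lookup as "in_service": on the
 * expiration path the pointer is still set but must be ignored.
 */
static bool treat_as_in_service(const void *in_service_entity, bool expiration)
{
	return in_service_entity != NULL && !expiration;
}

/*
 * Return the index of the class to serve, or -1 if none is active.
 * active[i] says whether class i has a non-empty active tree.
 * If the idle-class timeout has expired and the idle class is active,
 * the scan starts (and ends) at the idle class; otherwise it starts
 * from the highest-priority class.
 */
static int pick_class(const bool active[NUM_CLASSES], bool idle_timeout_expired)
{
	int idx = 0;

	if (idle_timeout_expired && active[NUM_CLASSES - 1])
		idx = NUM_CLASSES - 1;

	for (; idx < NUM_CLASSES; idx++)
		if (active[idx])
			return idx;

	return -1;
}
```

Passing `in_service_entity != NULL && !expiration` is what allows the virtual time to be pushed up during an expiration, even though the in-service pointer is reset only as the last step of that path.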
bool next_queue_may_preempt ( struct bfq_data * bfqd )
{
struct bfq_sched_data * sd = & bfqd - > root_group - > sched_data ;
return sd - > next_in_service ! = sd - > in_service_entity ;
}
/*
* Get next queue for service .
*/
struct bfq_queue * bfq_get_next_queue ( struct bfq_data * bfqd )
{
struct bfq_entity * entity = NULL ;
struct bfq_sched_data * sd ;
struct bfq_queue * bfqq ;
2019-01-29 14:06:29 +03:00
if ( bfq_tot_busy_queues ( bfqd ) = = 0 )
2017-04-19 17:48:24 +03:00
return NULL ;
/*
* Traverse the path from the root to the leaf entity to
* serve . Set in service all the entities visited along the
* way .
*/
sd = & bfqd - > root_group - > sched_data ;
for ( ; sd ; sd = entity - > my_sched_data ) {
/*
* WARNING . We are about to set the in - service entity
* to sd - > next_in_service , i . e . , to the ( cached ) value
* returned by bfq_lookup_next_entity ( sd ) the last
* time it was invoked , i . e . , the last time when the
* service order in sd changed as a consequence of the
* activation or deactivation of an entity . In this
* respect , if we execute bfq_lookup_next_entity ( sd )
* in this very moment , it may , although with low
* probability , yield a different entity than that
* pointed to by sd - > next_in_service . This rare event
* happens in case there was no CLASS_IDLE entity to
* serve for sd when bfq_lookup_next_entity ( sd ) was
* invoked for the last time , while there is now one
* such entity .
*
* If the above event happens , then the scheduling of
* such entity in CLASS_IDLE is postponed until the
* service of the sd - > next_in_service entity
* finishes . In fact , when the latter is expired ,
* bfq_lookup_next_entity ( sd ) gets called again ,
* exactly to update sd - > next_in_service .
*/
/* Make next_in_service entity become in_service_entity */
entity = sd - > next_in_service ;
sd - > in_service_entity = entity ;
/*
* If entity is no longer a candidate for next
block, bfq: consider also in_service_entity to state whether an entity is active
Groups of BFQ queues are represented by generic entities in BFQ. When
a queue belonging to a parent entity is deactivated, the parent entity
may need to be deactivated too, in case the deactivated queue was the
only active queue for the parent entity. This deactivation may need to
be propagated upwards if the entity belongs, in its turn, to a further
higher-level entity, and so on. In particular, the upward propagation
of deactivation stops at the first parent entity that remains active
even if one of its child entities has been deactivated.
To decide whether the last non-deactivation condition holds for a
parent entity, BFQ checks whether the field next_in_service is still
not NULL for the parent entity, after the deactivation of one of its
child entities. If it is not NULL, then there are certainly other active
entities in the parent entity, and deactivations can stop.
Unfortunately, this check misses a corner case: if in_service_entity
is not NULL, then next_in_service may happen to be NULL, although the
parent entity is evidently active. This happens if: 1) the entity
pointed by in_service_entity is the only active entity in the parent
entity, and 2) according to the definition of next_in_service, the
in_service_entity cannot be considered as next_in_service. See the
comments on the definition of next_in_service for details on this
second point.
Hitting the above corner case causes crashes.
To address this issue, this commit:
1) Extends the above check on only next_in_service to controlling both
next_in_service and in_service_entity (if any of them is not NULL,
then no further deactivation is performed)
2) Improves the (important) comments on how next_in_service is defined
and updated; in particular it fixes a few rather obscure paragraphs
Reported-by: Eric Wheeler <bfq-sched@lists.ewheeler.net>
Reported-by: Rick Yiu <rick_yiu@htc.com>
Reported-by: Tom X Nguyen <tom81094@gmail.com>
Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
Tested-by: Eric Wheeler <bfq-sched@lists.ewheeler.net>
Tested-by: Rick Yiu <rick_yiu@htc.com>
Tested-by: Laurentiu Nicola <lnicola@dend.ro>
Tested-by: Tom X Nguyen <tom81094@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-07-29 13:42:56 +03:00
* service , then it must be extracted from its active
* tree , so as to make sure that it won ' t be
* considered when computing next_in_service . See the
* comments on the function
* bfq_no_longer_next_in_service ( ) for details .
2017-04-19 17:48:24 +03:00
*/
if ( bfq_no_longer_next_in_service ( entity ) )
bfq_active_extract ( bfq_entity_service_tree ( entity ) ,
entity ) ;
/*
2017-07-29 13:42:56 +03:00
		 * Even if entity is not to be extracted according to
		 * the above check, a descendant entity may get
		 * extracted in one of the next iterations of this
		 * loop. Such an event could cause a change in
		 * next_in_service for the level of the descendant
		 * entity, and thus possibly back to this level.
2017-04-19 17:48:24 +03:00
*
2017-07-29 13:42:56 +03:00
		 * However, we cannot perform the resulting needed
		 * update of next_in_service for this level before the
		 * end of the whole loop, because, to know which is
		 * the correct next-to-serve candidate entity for each
		 * level, we need first to find the leaf entity to set
		 * in service. In fact, only after we know which is
		 * the next-to-serve leaf entity, we can discover
		 * whether the parent entity of the leaf entity
		 * becomes the next-to-serve, and so on.
2017-04-19 17:48:24 +03:00
		 */
	}
	bfqq = bfq_entity_to_bfqq(entity);
	/*
	 * We can finally update all next-to-serve entities along the
	 * path from the leaf entity just set in service to the root.
	 */
	for_each_entity(entity) {
		struct bfq_sched_data *sd = entity->sched_data;
2017-08-31 09:46:29 +03:00
		if (!bfq_update_next_in_service(sd, NULL, false))
2017-04-19 17:48:24 +03:00
			break;
	}
	return bfqq;
}
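The commit message "make lookup_next_entity push up vtime on expirations" describes an eligibility rule that can be illustrated in isolation. What follows is a minimal sketch under stated assumptions, not the kernel implementation: `toy_tree`, `ts_gt` and `toy_push_up_vtime` are hypothetical names, and a plain array stands in for the active tree. It shows the system virtual time being pushed up to the minimum virtual start time when no active entity is eligible, and the push being skipped only when an entity is still in service and no expiration is in progress.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Wrap-safe "a > b" on 64-bit timestamps, same idea as bfq_gt(). */
static bool ts_gt(uint64_t a, uint64_t b)
{
	return (int64_t)(a - b) > 0;
}

/* Hypothetical, simplified stand-in for one service tree. */
struct toy_tree {
	uint64_t vtime;		/* system virtual time */
	const uint64_t *starts;	/* virtual start times of active entities */
	size_t nr_active;	/* number of entries in starts */
	bool has_in_service;	/* in-service pointer not yet reset */
};

/*
 * Push vtime up to the smallest start time if vtime is lower than all
 * start times. The push is skipped when an entity is in service and no
 * expiration is occurring, because that entity is already eligible.
 */
static uint64_t toy_push_up_vtime(struct toy_tree *t, bool expiration)
{
	uint64_t min_start;
	size_t i;

	assert(t != NULL);
	if (t->nr_active == 0)
		return t->vtime;
	if (t->has_in_service && !expiration)
		return t->vtime;

	min_start = t->starts[0];
	for (i = 1; i < t->nr_active; i++)
		if (ts_gt(min_start, t->starts[i]))
			min_start = t->starts[i];

	if (ts_gt(min_start, t->vtime))
		t->vtime = min_start;
	return t->vtime;
}
```

Passing `expiration` explicitly mirrors the idea in the commit message: the in-service pointer is reset only later, so the lookup cannot rely on that pointer alone to decide whether a push up is allowed.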
block, bfq: fix use after free in bfq_bfqq_expire
The function bfq_bfqq_expire() invokes the function
__bfq_bfqq_expire(), and the latter may free the in-service bfq-queue.
If this happens, then no other instruction of bfq_bfqq_expire() must
be executed, or a use-after-free will occur.
Based on the assumption that __bfq_bfqq_expire() invokes
bfq_put_queue() on the in-service bfq-queue exactly once, the queue is
assumed to be freed if its refcounter is equal to one right before
invoking __bfq_bfqq_expire().
But, since commit 9dee8b3b057e ("block, bfq: fix queue removal from
weights tree") this assumption is false. __bfq_bfqq_expire() may also
invoke bfq_weights_tree_remove() and, since commit 9dee8b3b057e
("block, bfq: fix queue removal from weights tree"), also
the latter function may invoke bfq_put_queue(). So __bfq_bfqq_expire()
may invoke bfq_put_queue() twice, and this is the actual case where
the in-service queue may happen to be freed.
To address this issue, this commit moves the check on the refcounter
of the queue right around the last bfq_put_queue() that may be invoked
on the queue.
Fixes: 9dee8b3b057e ("block, bfq: fix queue removal from weights tree")
Reported-by: Dmitrii Tcvetkov <demfloro@demfloro.ru>
Reported-by: Douglas Anderson <dianders@chromium.org>
Tested-by: Dmitrii Tcvetkov <demfloro@demfloro.ru>
Tested-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-04-10 11:38:33 +03:00
/* returns true if the in-service queue gets freed */
bool __bfq_bfqd_reset_in_service(struct bfq_data *bfqd)
2017-04-19 17:48:24 +03:00
{
	struct bfq_queue *in_serv_bfqq = bfqd->in_service_queue;
	struct bfq_entity *in_serv_entity = &in_serv_bfqq->entity;
	struct bfq_entity *entity = in_serv_entity;
	bfq_clear_bfqq_wait_request(in_serv_bfqq);
	hrtimer_try_to_cancel(&bfqd->idle_slice_timer);
	bfqd->in_service_queue = NULL;
	/*
	 * When this function is called, all in-service entities have
	 * been properly deactivated or requeued, so we can safely
	 * execute the final step: reset in_service_entity along the
	 * path from entity to the root.
	 */
	for_each_entity(entity)
		entity->sched_data->in_service_entity = NULL;
	/*
	 * in_serv_entity is no longer in service, so, if it is in no
	 * service tree either, then release the service reference to
	 * the queue it represents (taken with bfq_get_entity).
	 */
2020-02-03 13:40:57 +03:00
	if (!in_serv_entity->on_st_or_in_serv) {
2019-04-10 11:38:33 +03:00
		/*
		 * If no process is referencing in_serv_bfqq any
		 * longer, then the service reference may be the only
		 * reference to the queue. If this is the case, then
		 * bfqq gets freed here.
		 */
		int ref = in_serv_bfqq->ref;
2017-04-19 17:48:24 +03:00
		bfq_put_queue(in_serv_bfqq);
2019-04-10 11:38:33 +03:00
		if (ref == 1)
			return true;
	}
	return false;
2017-04-19 17:48:24 +03:00
}
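The pattern used above, reading the reference count right before the last put and never touching the object afterwards, can be shown on its own. This is a minimal sketch, assuming a hypothetical `toy_queue` type with a plain integer counter; the names `toy_queue_alloc`, `toy_put_queue` and `toy_put_and_report` are illustrative and are not part of BFQ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical refcounted object standing in for a bfq_queue. */
struct toy_queue {
	int ref;
};

static struct toy_queue *toy_queue_alloc(int initial_ref)
{
	struct toy_queue *q = malloc(sizeof(*q));

	if (q)
		q->ref = initial_ref;
	return q;
}

/* Drop one reference; free the object when the last one goes away. */
static void toy_put_queue(struct toy_queue *q)
{
	assert(q->ref > 0);
	if (--q->ref == 0)
		free(q);
}

/*
 * Returns true if this put freed the queue. The counter is sampled
 * before the put, because q may no longer exist after it.
 */
static bool toy_put_and_report(struct toy_queue *q)
{
	int ref = q->ref;

	toy_put_queue(q);
	return ref == 1;
}
```

A caller that gets `true` back must stop using the pointer immediately, which is the contract the comment "returns true if the in-service queue gets freed" expresses.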
void bfq_deactivate_bfqq(struct bfq_data *bfqd, struct bfq_queue *bfqq,
			 bool ins_into_idle_tree, bool expiration)
{
	struct bfq_entity *entity = &bfqq->entity;
	bfq_deactivate_entity(entity, ins_into_idle_tree, expiration);
}
void bfq_activate_bfqq(struct bfq_data *bfqd, struct bfq_queue *bfqq)
{
	struct bfq_entity *entity = &bfqq->entity;
	bfq_activate_requeue_entity(entity, bfq_bfqq_non_blocking_wait_rq(bfqq),
2017-08-31 09:46:29 +03:00
				    false, false);
2017-04-19 17:48:24 +03:00
	bfq_clear_bfqq_non_blocking_wait_rq(bfqq);
}
2017-08-31 09:46:29 +03:00
void bfq_requeue_bfqq(struct bfq_data *bfqd, struct bfq_queue *bfqq,
		      bool expiration)
2017-04-19 17:48:24 +03:00
{
	struct bfq_entity *entity = &bfqq->entity;
	bfq_activate_requeue_entity(entity, false,
2017-08-31 09:46:29 +03:00
				    bfqq == bfqd->in_service_queue, expiration);
2017-04-19 17:48:24 +03:00
}
/*
 * Called when the bfqq no longer has requests pending: remove it from
 * the service tree. As a special case, it can be invoked during an
 * expiration.
 */
void bfq_del_bfqq_busy(struct bfq_data *bfqd, struct bfq_queue *bfqq,
		       bool expiration)
{
	bfq_log_bfqq(bfqd, bfqq, "del from busy");
	bfq_clear_bfqq_busy(bfqq);
2019-01-29 14:06:29 +03:00
	bfqd->busy_queues[bfqq->ioprio_class - 1]--;
2017-04-19 17:48:24 +03:00
	if (bfqq->wr_coeff > 1)
		bfqd->wr_busy_queues--;
	bfqg_stats_update_dequeue(bfqq_group(bfqq));
	bfq_deactivate_bfqq(bfqd, bfqq, true, expiration);
2019-01-29 14:06:34 +03:00
	if (!bfqq->dispatched)
		bfq_weights_tree_remove(bfqd, bfqq);
2017-04-19 17:48:24 +03:00
}
/*
 * Called when an inactive queue receives a new request.
 */
void bfq_add_bfqq_busy(struct bfq_data *bfqd, struct bfq_queue *bfqq)
{
	bfq_log_bfqq(bfqd, bfqq, "add to busy");
	bfq_activate_bfqq(bfqd, bfqq);
	bfq_mark_bfqq_busy(bfqq);
2019-01-29 14:06:29 +03:00
	bfqd->busy_queues[bfqq->ioprio_class - 1]++;
2017-04-19 17:48:24 +03:00
	if (!bfqq->dispatched)
		if (bfqq->wr_coeff == 1)
block, bfq: improve asymmetric scenarios detection
bfq defines as asymmetric a scenario where an active entity, say E
(representing either a single bfq_queue or a group of other entities),
has a higher weight than some other entities. If the entity E does sync
I/O in such a scenario, then bfq plugs the dispatch of the I/O of the
other entities in the following situation: E is in service but
temporarily has no pending I/O request. In fact, without this plugging,
all the times that E stops being temporarily idle, it may find the
internal queues of the storage device already filled with an
out-of-control number of extra requests, from other entities. So E may
have to wait for the service of these extra requests, before finally
having its own requests served. This may easily break service
guarantees, with E getting less than its fair share of the device
throughput. Usually, the end result is that E gets the same fraction of
the throughput as the other entities, instead of getting more, according
to its higher weight.
Yet there are two other more subtle cases where E, even if its weight is
actually equal to or even lower than the weight of any other active
entities, may get less than its fair share of the throughput in case the
above I/O plugging is not performed:
1. other entities issue larger requests than E;
2. other entities contain more active child entities than E (or in
general tend to have more backlog than E).
In the first case, other entities may get more service than E because
they get larger requests, than those of E, served during the temporary
idle periods of E. In the second case, other entities get more service
because, by having many child entities, they have many requests ready
for dispatching while E is temporarily idle.
This commit addresses this issue by extending the definition of
asymmetric scenario: a scenario is asymmetric when
- active entities representing bfq_queues have differentiated weights,
as in the original definition
or (inclusive)
- one or more entities representing groups of entities are active.
This broader definition makes sure that I/O plugging will be performed
in all the above cases, provided that there is at least one active
group. Of course, this definition is very coarse, so it will trigger
I/O plugging also in cases where it is not needed, such as, e.g.,
multiple active entities with just one child each, and all with the same
I/O-request size. The reason for this coarse definition is just that a
finer-grained definition would be rather heavy to compute.
On the opposite end, even this new definition does not trigger I/O
plugging in all cases where there is no active group, and all bfq_queues
have the same weight. So, in these cases some unfairness may occur if
there are asymmetries in I/O-request sizes. We made this choice because
I/O plugging may lower throughput, and probably a user that has not
created any group cares more about throughput than about perfect
fairness. At any rate, as for possible applications that may care about
service guarantees, bfq already guarantees a high responsiveness and a
low latency to soft real-time applications automatically.
Signed-off-by: Federico Motta <federico@willer.it>
Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-10-12 12:55:57 +03:00
			bfq_weights_tree_add(bfqd, bfqq,
2017-04-19 17:48:24 +03:00
					     &bfqd->queue_weights_tree);
	if (bfqq->wr_coeff > 1)
		bfqd->wr_busy_queues++;
2021-03-04 20:46:22 +03:00
	/* Move bfqq to the head of the woken list of its waker */
	if (!hlist_unhashed(&bfqq->woken_list_node) &&
	    &bfqq->woken_list_node != bfqq->waker_bfqq->woken_list.first) {
		hlist_del_init(&bfqq->woken_list_node);
		hlist_add_head(&bfqq->woken_list_node,
			       &bfqq->waker_bfqq->woken_list);
	}
2017-04-19 17:48:24 +03:00
}
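The extended definition of an asymmetric scenario given in the commit message "improve asymmetric scenarios detection" reduces to a two-part predicate. The following is only a sketch of that definition, assuming a hypothetical flat array of queue weights and a hypothetical count of active groups; `toy_scenario_is_asymmetric` is not a BFQ function, and the kernel tracks this information with a weights tree and counters instead.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * A scenario counts as asymmetric when active queues have
 * differentiated weights, or (inclusive) when at least one entity
 * representing a group is active.
 */
static bool toy_scenario_is_asymmetric(const unsigned int *queue_weights,
				       size_t nr_queues,
				       unsigned int nr_active_groups)
{
	size_t i;

	assert(queue_weights != NULL || nr_queues == 0);

	/* Any active group is enough, whatever the queue weights are. */
	if (nr_active_groups > 0)
		return true;

	/* Otherwise look for at least two different queue weights. */
	for (i = 1; i < nr_queues; i++)
		if (queue_weights[i] != queue_weights[0])
			return true;

	return false;
}
```

As the commit message notes, the group condition is deliberately coarse: it reports asymmetry even when the groups happen to be balanced, in exchange for being cheap to evaluate.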