cpuidle: New timer events oriented governor for tickless systems
The venerable menu governor does some things that are quite
questionable in my view.
First, it includes timer wakeups in the pattern detection data and
mixes them up with wakeups from other sources which in some cases
causes it to expect what essentially would be a timer wakeup in a
time frame in which no timer wakeups are possible (because it knows
the time until the next timer event and that is later than the
expected wakeup time).
Second, it uses the extra exit latency limit based on the predicted
idle duration and depending on the number of tasks waiting on I/O,
even though those tasks may run on a different CPU when they are
woken up. Moreover, the time ranges used by it for the sleep length
correction factors depend on whether or not there are tasks waiting
on I/O, which again doesn't imply anything in particular, and they
are not correlated to the list of available idle states in any way
whatever.
Also, the pattern detection code in menu may end up considering
values that are too large to matter at all, in which cases running
it is a waste of time.
A major rework of the menu governor would be required to address
these issues and the performance of at least some workloads (tuned
specifically to the current behavior of the menu governor) is likely
to suffer from that. It is thus better to introduce an entirely new
governor without them and let everybody use the governor that works
better with their actual workloads.
The new governor introduced here, the timer events oriented (TEO)
governor, uses the same basic strategy as menu: it always tries to
find the deepest idle state that can be used in the given conditions.
However, it applies a different approach to that problem.
First, it doesn't use "correction factors" for the time till the
closest timer, but instead it tries to correlate the measured idle
duration values with the available idle states and use that
information to pick up the idle state that is most likely to "match"
the upcoming CPU idle interval.
Second, it doesn't take the number of "I/O waiters" into account at
all and the pattern detection code in it avoids taking timer wakeups
into account. It also only uses idle duration values less than the
current time till the closest timer (with the tick excluded) for that
purpose.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2019-01-04 14:30:47 +03:00
// SPDX-License-Identifier: GPL-2.0
/*
* Timer events oriented CPU idle governor
*
cpuidle: teo: Introduce util-awareness
Modern interactive systems, such as recent Android phones, tend to have
power efficient shallow idle states. Selecting deeper idle states on a
device while a latency-sensitive workload is running can adversely
impact performance due to increased latency. Additionally, if the CPU
wakes up from a deeper sleep before its target residency as is often the
case, it results in a waste of energy on top of that.
At the moment, none of the available idle governors take any scheduling
information into account. They also tend to overestimate the idle
duration quite often, which causes them to select excessively deep idle
states, thus leading to increased wakeup latency and lower performance
with no power saving. For 'menu' while web browsing on Android for
instance, those types of wakeups ('too deep') account for over 24% of
all wakeups.
At the same time, on some platforms idle state 0 can be power efficient
enough to warrant wanting to prefer it over idle state 1. This is
because the power usage of the two states can be so close that
sufficient amounts of too deep state 1 sleeps can completely offset the
state 1 power saving to the point where it would've been more power
efficient to just use state 0 instead. This is, of course, for systems
where state 0 is not a polling state, such as arm-based devices.
Sleeps that happened in state 0 while they could have used state 1 ('too
shallow') only save less power than they otherwise could have. Too deep
sleeps, on the other hand, harm performance and nullify the potential
power saving from using state 1 in the first place. While taking this
into account, it is clear that on balance it is preferable for an idle
governor to have more too shallow sleeps instead of more too deep sleeps
on those kinds of platforms.
This patch specifically tunes TEO to prefer shallower idle states in
order to reduce wakeup latency and achieve better performance.
To this end, before selecting the next idle state it uses the avg_util
signal of a CPU's runqueue in order to determine to what extent the CPU
is being utilized. This util value is then compared to a threshold
defined as a percentage of the CPU's capacity (capacity >> 6 ie. ~1.5%
in the current implementation). If the util is above the threshold, the
index of the idle state selected by TEO metrics will be reduced by 1,
thus selecting a shallower state. If the util is below the threshold,
the governor defaults to the TEO metrics mechanism to try to select the
deepest available idle state based on the closest timer event and its
own correctness.
The main goal of this is to reduce latency and increase performance for
some workloads. Under some workloads it will result in an increase in
power usage (Geekbench 5) while for other workloads it will also result
in a decrease in power usage compared to TEO (PCMark Web, Jankbench,
Speedometer).
It can provide drastically decreased latency and performance benefits in
certain types of workloads that are sensitive to latency.
Example test results:
1. GB5 (better score, latency & more power usage)
| metric | menu | teo | teo-util-aware |
| ------------------------------------- | -------------- | ----------------- | ----------------- |
| gmean score | 2826.5 (0.0%) | 2764.8 (-2.18%) | 2865 (1.36%) |
| gmean power usage [mW] | 2551.4 (0.0%) | 2606.8 (2.17%) | 2722.3 (6.7%) |
| gmean too deep % | 14.99% | 9.65% | 4.02% |
| gmean too shallow % | 2.5% | 5.96% | 14.59% |
| gmean task wakeup latency (asynctask) | 78.16μs (0.0%) | 61.60μs (-21.19%) | 54.45μs (-30.34%) |
2. Jankbench (better score, latency & less power usage)
| metric | menu | teo | teo-util-aware |
| ------------------------------------- | -------------- | ----------------- | ----------------- |
| gmean frame duration | 13.9 (0.0%) | 14.7 (6.0%) | 12.6 (-9.0%) |
| gmean jank percentage | 1.5 (0.0%) | 2.1 (36.99%) | 1.3 (-17.37%) |
| gmean power usage [mW] | 144.6 (0.0%) | 136.9 (-5.27%) | 121.3 (-16.08%) |
| gmean too deep % | 26.00% | 11.00% | 2.54% |
| gmean too shallow % | 4.74% | 11.89% | 21.93% |
| gmean wakeup latency (RenderThread) | 139.5μs (0.0%) | 116.5μs (-16.49%) | 91.11μs (-34.7%) |
| gmean wakeup latency (surfaceflinger) | 124.0μs (0.0%) | 151.9μs (22.47%) | 87.65μs (-29.33%) |
Signed-off-by: Kajetan Puchalski <kajetan.puchalski@arm.com>
[ rjw: Comment edits and white space adjustments ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-01-05 17:51:59 +03:00
* TEO governor :
cpuidle: teo: Change the main idle state selection logic
Two aspects of the current main idle state selection logic in the
TEO (Timer Events Oriented) cpuidle governor are quite questionable.
First of all, the "hits" and "misses" metrics used by it are only
updated for a given idle state if the time till the next timer event
("sleep length") is between the target residency of that state and
the target residency of the next one. Consequently, they are likely
to become stale if the sleep length tends to fall outside that
interval which increases the likelihood of subomtimal idle state
selection.
Second, the decision on whether or not to select the idle state
"matching" the sleep length is based on the metrics collected for
that state alone, whereas in principle the metrics collected for
the other idle states should be taken into consideration when that
decision is made. For example, if the measured idle duration is less
than the target residency of the idle state "matching" the sleep
length, then it is also less than the target residency of any deeper
idle state and that should be taken into account when considering
whether or not to select any of those states, but currently it is
not.
In order to address the above shortcomings, modify the main idle
state selection logic in the TEO governor to take the metrics
collected for all of the idle states into account when deciding
whether or not to select the one "matching" the sleep length.
Moreover, drop the "misses" metric that becomes redundant after the
above change and rename the "early_hits" metric to "intercepts" so
that its role is better reflected by its name (the idea being that
if a CPU wakes up earlier than indicated by the sleep length, then
it must be a result of a non-timer interrupt that "intercepts" the
CPU).
Also rename the states[] array in struct struct teo_cpu to
state_bins[] to avoid confusing it with the states[] array in
struct cpuidle_driver and update the documentation to match the
new code (and make it more comprehensive while at it).
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-06-02 21:16:32 +03:00
* Copyright ( C ) 2018 - 2021 Intel Corporation
cpuidle: New timer events oriented governor for tickless systems
The venerable menu governor does some things that are quite
questionable in my view.
First, it includes timer wakeups in the pattern detection data and
mixes them up with wakeups from other sources which in some cases
causes it to expect what essentially would be a timer wakeup in a
time frame in which no timer wakeups are possible (because it knows
the time until the next timer event and that is later than the
expected wakeup time).
Second, it uses the extra exit latency limit based on the predicted
idle duration and depending on the number of tasks waiting on I/O,
even though those tasks may run on a different CPU when they are
woken up. Moreover, the time ranges used by it for the sleep length
correction factors depend on whether or not there are tasks waiting
on I/O, which again doesn't imply anything in particular, and they
are not correlated to the list of available idle states in any way
whatever.
Also, the pattern detection code in menu may end up considering
values that are too large to matter at all, in which cases running
it is a waste of time.
A major rework of the menu governor would be required to address
these issues and the performance of at least some workloads (tuned
specifically to the current behavior of the menu governor) is likely
to suffer from that. It is thus better to introduce an entirely new
governor without them and let everybody use the governor that works
better with their actual workloads.
The new governor introduced here, the timer events oriented (TEO)
governor, uses the same basic strategy as menu: it always tries to
find the deepest idle state that can be used in the given conditions.
However, it applies a different approach to that problem.
First, it doesn't use "correction factors" for the time till the
closest timer, but instead it tries to correlate the measured idle
duration values with the available idle states and use that
information to pick up the idle state that is most likely to "match"
the upcoming CPU idle interval.
Second, it doesn't take the number of "I/O waiters" into account at
all and the pattern detection code in it avoids taking timer wakeups
into account. It also only uses idle duration values less than the
current time till the closest timer (with the tick excluded) for that
purpose.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2019-01-04 14:30:47 +03:00
* Author : Rafael J . Wysocki < rafael . j . wysocki @ intel . com >
cpuidle: teo: Introduce util-awareness
Modern interactive systems, such as recent Android phones, tend to have
power efficient shallow idle states. Selecting deeper idle states on a
device while a latency-sensitive workload is running can adversely
impact performance due to increased latency. Additionally, if the CPU
wakes up from a deeper sleep before its target residency as is often the
case, it results in a waste of energy on top of that.
At the moment, none of the available idle governors take any scheduling
information into account. They also tend to overestimate the idle
duration quite often, which causes them to select excessively deep idle
states, thus leading to increased wakeup latency and lower performance
with no power saving. For 'menu' while web browsing on Android for
instance, those types of wakeups ('too deep') account for over 24% of
all wakeups.
At the same time, on some platforms idle state 0 can be power efficient
enough to warrant wanting to prefer it over idle state 1. This is
because the power usage of the two states can be so close that
sufficient amounts of too deep state 1 sleeps can completely offset the
state 1 power saving to the point where it would've been more power
efficient to just use state 0 instead. This is, of course, for systems
where state 0 is not a polling state, such as arm-based devices.
Sleeps that happened in state 0 while they could have used state 1 ('too
shallow') only save less power than they otherwise could have. Too deep
sleeps, on the other hand, harm performance and nullify the potential
power saving from using state 1 in the first place. While taking this
into account, it is clear that on balance it is preferable for an idle
governor to have more too shallow sleeps instead of more too deep sleeps
on those kinds of platforms.
This patch specifically tunes TEO to prefer shallower idle states in
order to reduce wakeup latency and achieve better performance.
To this end, before selecting the next idle state it uses the avg_util
signal of a CPU's runqueue in order to determine to what extent the CPU
is being utilized. This util value is then compared to a threshold
defined as a percentage of the CPU's capacity (capacity >> 6 ie. ~1.5%
in the current implementation). If the util is above the threshold, the
index of the idle state selected by TEO metrics will be reduced by 1,
thus selecting a shallower state. If the util is below the threshold,
the governor defaults to the TEO metrics mechanism to try to select the
deepest available idle state based on the closest timer event and its
own correctness.
The main goal of this is to reduce latency and increase performance for
some workloads. Under some workloads it will result in an increase in
power usage (Geekbench 5) while for other workloads it will also result
in a decrease in power usage compared to TEO (PCMark Web, Jankbench,
Speedometer).
It can provide drastically decreased latency and performance benefits in
certain types of workloads that are sensitive to latency.
Example test results:
1. GB5 (better score, latency & more power usage)
| metric | menu | teo | teo-util-aware |
| ------------------------------------- | -------------- | ----------------- | ----------------- |
| gmean score | 2826.5 (0.0%) | 2764.8 (-2.18%) | 2865 (1.36%) |
| gmean power usage [mW] | 2551.4 (0.0%) | 2606.8 (2.17%) | 2722.3 (6.7%) |
| gmean too deep % | 14.99% | 9.65% | 4.02% |
| gmean too shallow % | 2.5% | 5.96% | 14.59% |
| gmean task wakeup latency (asynctask) | 78.16μs (0.0%) | 61.60μs (-21.19%) | 54.45μs (-30.34%) |
2. Jankbench (better score, latency & less power usage)
| metric | menu | teo | teo-util-aware |
| ------------------------------------- | -------------- | ----------------- | ----------------- |
| gmean frame duration | 13.9 (0.0%) | 14.7 (6.0%) | 12.6 (-9.0%) |
| gmean jank percentage | 1.5 (0.0%) | 2.1 (36.99%) | 1.3 (-17.37%) |
| gmean power usage [mW] | 144.6 (0.0%) | 136.9 (-5.27%) | 121.3 (-16.08%) |
| gmean too deep % | 26.00% | 11.00% | 2.54% |
| gmean too shallow % | 4.74% | 11.89% | 21.93% |
| gmean wakeup latency (RenderThread) | 139.5μs (0.0%) | 116.5μs (-16.49%) | 91.11μs (-34.7%) |
| gmean wakeup latency (surfaceflinger) | 124.0μs (0.0%) | 151.9μs (22.47%) | 87.65μs (-29.33%) |
Signed-off-by: Kajetan Puchalski <kajetan.puchalski@arm.com>
[ rjw: Comment edits and white space adjustments ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-01-05 17:51:59 +03:00
*
* Util - awareness mechanism :
* Copyright ( C ) 2022 Arm Ltd .
* Author : Kajetan Puchalski < kajetan . puchalski @ arm . com >
2021-06-02 21:18:02 +03:00
*/
/**
* DOC : teo - description
cpuidle: New timer events oriented governor for tickless systems
The venerable menu governor does some things that are quite
questionable in my view.
First, it includes timer wakeups in the pattern detection data and
mixes them up with wakeups from other sources which in some cases
causes it to expect what essentially would be a timer wakeup in a
time frame in which no timer wakeups are possible (because it knows
the time until the next timer event and that is later than the
expected wakeup time).
Second, it uses the extra exit latency limit based on the predicted
idle duration and depending on the number of tasks waiting on I/O,
even though those tasks may run on a different CPU when they are
woken up. Moreover, the time ranges used by it for the sleep length
correction factors depend on whether or not there are tasks waiting
on I/O, which again doesn't imply anything in particular, and they
are not correlated to the list of available idle states in any way
whatever.
Also, the pattern detection code in menu may end up considering
values that are too large to matter at all, in which cases running
it is a waste of time.
A major rework of the menu governor would be required to address
these issues and the performance of at least some workloads (tuned
specifically to the current behavior of the menu governor) is likely
to suffer from that. It is thus better to introduce an entirely new
governor without them and let everybody use the governor that works
better with their actual workloads.
The new governor introduced here, the timer events oriented (TEO)
governor, uses the same basic strategy as menu: it always tries to
find the deepest idle state that can be used in the given conditions.
However, it applies a different approach to that problem.
First, it doesn't use "correction factors" for the time till the
closest timer, but instead it tries to correlate the measured idle
duration values with the available idle states and use that
information to pick up the idle state that is most likely to "match"
the upcoming CPU idle interval.
Second, it doesn't take the number of "I/O waiters" into account at
all and the pattern detection code in it avoids taking timer wakeups
into account. It also only uses idle duration values less than the
current time till the closest timer (with the tick excluded) for that
purpose.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2019-01-04 14:30:47 +03:00
*
* The idea of this governor is based on the observation that on many systems
* timer events are two or more orders of magnitude more frequent than any
cpuidle: teo: Change the main idle state selection logic
Two aspects of the current main idle state selection logic in the
TEO (Timer Events Oriented) cpuidle governor are quite questionable.
First of all, the "hits" and "misses" metrics used by it are only
updated for a given idle state if the time till the next timer event
("sleep length") is between the target residency of that state and
the target residency of the next one. Consequently, they are likely
to become stale if the sleep length tends to fall outside that
interval which increases the likelihood of subomtimal idle state
selection.
Second, the decision on whether or not to select the idle state
"matching" the sleep length is based on the metrics collected for
that state alone, whereas in principle the metrics collected for
the other idle states should be taken into consideration when that
decision is made. For example, if the measured idle duration is less
than the target residency of the idle state "matching" the sleep
length, then it is also less than the target residency of any deeper
idle state and that should be taken into account when considering
whether or not to select any of those states, but currently it is
not.
In order to address the above shortcomings, modify the main idle
state selection logic in the TEO governor to take the metrics
collected for all of the idle states into account when deciding
whether or not to select the one "matching" the sleep length.
Moreover, drop the "misses" metric that becomes redundant after the
above change and rename the "early_hits" metric to "intercepts" so
that its role is better reflected by its name (the idea being that
if a CPU wakes up earlier than indicated by the sleep length, then
it must be a result of a non-timer interrupt that "intercepts" the
CPU).
Also rename the states[] array in struct struct teo_cpu to
state_bins[] to avoid confusing it with the states[] array in
struct cpuidle_driver and update the documentation to match the
new code (and make it more comprehensive while at it).
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-06-02 21:16:32 +03:00
* other interrupts , so they are likely to be the most significant cause of CPU
cpuidle: New timer events oriented governor for tickless systems
The venerable menu governor does some things that are quite
questionable in my view.
First, it includes timer wakeups in the pattern detection data and
mixes them up with wakeups from other sources which in some cases
causes it to expect what essentially would be a timer wakeup in a
time frame in which no timer wakeups are possible (because it knows
the time until the next timer event and that is later than the
expected wakeup time).
Second, it uses the extra exit latency limit based on the predicted
idle duration and depending on the number of tasks waiting on I/O,
even though those tasks may run on a different CPU when they are
woken up. Moreover, the time ranges used by it for the sleep length
correction factors depend on whether or not there are tasks waiting
on I/O, which again doesn't imply anything in particular, and they
are not correlated to the list of available idle states in any way
whatever.
Also, the pattern detection code in menu may end up considering
values that are too large to matter at all, in which cases running
it is a waste of time.
A major rework of the menu governor would be required to address
these issues and the performance of at least some workloads (tuned
specifically to the current behavior of the menu governor) is likely
to suffer from that. It is thus better to introduce an entirely new
governor without them and let everybody use the governor that works
better with their actual workloads.
The new governor introduced here, the timer events oriented (TEO)
governor, uses the same basic strategy as menu: it always tries to
find the deepest idle state that can be used in the given conditions.
However, it applies a different approach to that problem.
First, it doesn't use "correction factors" for the time till the
closest timer, but instead it tries to correlate the measured idle
duration values with the available idle states and use that
information to pick up the idle state that is most likely to "match"
the upcoming CPU idle interval.
Second, it doesn't take the number of "I/O waiters" into account at
all and the pattern detection code in it avoids taking timer wakeups
into account. It also only uses idle duration values less than the
current time till the closest timer (with the tick excluded) for that
purpose.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2019-01-04 14:30:47 +03:00
* wakeups from idle states . Moreover , information about what happened in the
* ( relatively recent ) past can be used to estimate whether or not the deepest
cpuidle: teo: Change the main idle state selection logic
Two aspects of the current main idle state selection logic in the
TEO (Timer Events Oriented) cpuidle governor are quite questionable.
First of all, the "hits" and "misses" metrics used by it are only
updated for a given idle state if the time till the next timer event
("sleep length") is between the target residency of that state and
the target residency of the next one. Consequently, they are likely
to become stale if the sleep length tends to fall outside that
interval which increases the likelihood of subomtimal idle state
selection.
Second, the decision on whether or not to select the idle state
"matching" the sleep length is based on the metrics collected for
that state alone, whereas in principle the metrics collected for
the other idle states should be taken into consideration when that
decision is made. For example, if the measured idle duration is less
than the target residency of the idle state "matching" the sleep
length, then it is also less than the target residency of any deeper
idle state and that should be taken into account when considering
whether or not to select any of those states, but currently it is
not.
In order to address the above shortcomings, modify the main idle
state selection logic in the TEO governor to take the metrics
collected for all of the idle states into account when deciding
whether or not to select the one "matching" the sleep length.
Moreover, drop the "misses" metric that becomes redundant after the
above change and rename the "early_hits" metric to "intercepts" so
that its role is better reflected by its name (the idea being that
if a CPU wakes up earlier than indicated by the sleep length, then
it must be a result of a non-timer interrupt that "intercepts" the
CPU).
Also rename the states[] array in struct struct teo_cpu to
state_bins[] to avoid confusing it with the states[] array in
struct cpuidle_driver and update the documentation to match the
new code (and make it more comprehensive while at it).
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-06-02 21:16:32 +03:00
* idle state with target residency within the ( known ) time till the closest
* timer event , referred to as the sleep length , is likely to be suitable for
* the upcoming CPU idle period and , if not , then which of the shallower idle
* states to choose instead of it .
cpuidle: New timer events oriented governor for tickless systems
The venerable menu governor does some things that are quite
questionable in my view.
First, it includes timer wakeups in the pattern detection data and
mixes them up with wakeups from other sources which in some cases
causes it to expect what essentially would be a timer wakeup in a
time frame in which no timer wakeups are possible (because it knows
the time until the next timer event and that is later than the
expected wakeup time).
Second, it uses the extra exit latency limit based on the predicted
idle duration and depending on the number of tasks waiting on I/O,
even though those tasks may run on a different CPU when they are
woken up. Moreover, the time ranges used by it for the sleep length
correction factors depend on whether or not there are tasks waiting
on I/O, which again doesn't imply anything in particular, and they
are not correlated to the list of available idle states in any way
whatever.
Also, the pattern detection code in menu may end up considering
values that are too large to matter at all, in which cases running
it is a waste of time.
A major rework of the menu governor would be required to address
these issues and the performance of at least some workloads (tuned
specifically to the current behavior of the menu governor) is likely
to suffer from that. It is thus better to introduce an entirely new
governor without them and let everybody use the governor that works
better with their actual workloads.
The new governor introduced here, the timer events oriented (TEO)
governor, uses the same basic strategy as menu: it always tries to
find the deepest idle state that can be used in the given conditions.
However, it applies a different approach to that problem.
First, it doesn't use "correction factors" for the time till the
closest timer, but instead it tries to correlate the measured idle
duration values with the available idle states and use that
information to pick up the idle state that is most likely to "match"
the upcoming CPU idle interval.
Second, it doesn't take the number of "I/O waiters" into account at
all and the pattern detection code in it avoids taking timer wakeups
into account. It also only uses idle duration values less than the
current time till the closest timer (with the tick excluded) for that
purpose.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2019-01-04 14:30:47 +03:00
*
cpuidle: teo: Change the main idle state selection logic
Two aspects of the current main idle state selection logic in the
TEO (Timer Events Oriented) cpuidle governor are quite questionable.
First of all, the "hits" and "misses" metrics used by it are only
updated for a given idle state if the time till the next timer event
("sleep length") is between the target residency of that state and
the target residency of the next one. Consequently, they are likely
to become stale if the sleep length tends to fall outside that
interval which increases the likelihood of subomtimal idle state
selection.
Second, the decision on whether or not to select the idle state
"matching" the sleep length is based on the metrics collected for
that state alone, whereas in principle the metrics collected for
the other idle states should be taken into consideration when that
decision is made. For example, if the measured idle duration is less
than the target residency of the idle state "matching" the sleep
length, then it is also less than the target residency of any deeper
idle state and that should be taken into account when considering
whether or not to select any of those states, but currently it is
not.
In order to address the above shortcomings, modify the main idle
state selection logic in the TEO governor to take the metrics
collected for all of the idle states into account when deciding
whether or not to select the one "matching" the sleep length.
Moreover, drop the "misses" metric that becomes redundant after the
above change and rename the "early_hits" metric to "intercepts" so
that its role is better reflected by its name (the idea being that
if a CPU wakes up earlier than indicated by the sleep length, then
it must be a result of a non-timer interrupt that "intercepts" the
CPU).
Also rename the states[] array in struct struct teo_cpu to
state_bins[] to avoid confusing it with the states[] array in
struct cpuidle_driver and update the documentation to match the
new code (and make it more comprehensive while at it).
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-06-02 21:16:32 +03:00
* Of course , non - timer wakeup sources are more important in some use cases
* which can be covered by taking a few most recent idle time intervals of the
* CPU into account . However , even in that context it is not necessary to
* consider idle duration values greater than the sleep length , because the
* closest timer will ultimately wake up the CPU anyway unless it is woken up
* earlier .
cpuidle: New timer events oriented governor for tickless systems
The venerable menu governor does some things that are quite
questionable in my view.
First, it includes timer wakeups in the pattern detection data and
mixes them up with wakeups from other sources which in some cases
causes it to expect what essentially would be a timer wakeup in a
time frame in which no timer wakeups are possible (because it knows
the time until the next timer event and that is later than the
expected wakeup time).
Second, it uses the extra exit latency limit based on the predicted
idle duration and depending on the number of tasks waiting on I/O,
even though those tasks may run on a different CPU when they are
woken up. Moreover, the time ranges used by it for the sleep length
correction factors depend on whether or not there are tasks waiting
on I/O, which again doesn't imply anything in particular, and they
are not correlated to the list of available idle states in any way
whatever.
Also, the pattern detection code in menu may end up considering
values that are too large to matter at all, in which cases running
it is a waste of time.
A major rework of the menu governor would be required to address
these issues and the performance of at least some workloads (tuned
specifically to the current behavior of the menu governor) is likely
to suffer from that. It is thus better to introduce an entirely new
governor without them and let everybody use the governor that works
better with their actual workloads.
The new governor introduced here, the timer events oriented (TEO)
governor, uses the same basic strategy as menu: it always tries to
find the deepest idle state that can be used in the given conditions.
However, it applies a different approach to that problem.
First, it doesn't use "correction factors" for the time till the
closest timer, but instead it tries to correlate the measured idle
duration values with the available idle states and use that
information to pick up the idle state that is most likely to "match"
the upcoming CPU idle interval.
Second, it doesn't take the number of "I/O waiters" into account at
all and the pattern detection code in it avoids taking timer wakeups
into account. It also only uses idle duration values less than the
current time till the closest timer (with the tick excluded) for that
purpose.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2019-01-04 14:30:47 +03:00
*
cpuidle: teo: Change the main idle state selection logic
Two aspects of the current main idle state selection logic in the
TEO (Timer Events Oriented) cpuidle governor are quite questionable.
First of all, the "hits" and "misses" metrics used by it are only
updated for a given idle state if the time till the next timer event
("sleep length") is between the target residency of that state and
the target residency of the next one. Consequently, they are likely
to become stale if the sleep length tends to fall outside that
interval which increases the likelihood of subomtimal idle state
selection.
Second, the decision on whether or not to select the idle state
"matching" the sleep length is based on the metrics collected for
that state alone, whereas in principle the metrics collected for
the other idle states should be taken into consideration when that
decision is made. For example, if the measured idle duration is less
than the target residency of the idle state "matching" the sleep
length, then it is also less than the target residency of any deeper
idle state and that should be taken into account when considering
whether or not to select any of those states, but currently it is
not.
In order to address the above shortcomings, modify the main idle
state selection logic in the TEO governor to take the metrics
collected for all of the idle states into account when deciding
whether or not to select the one "matching" the sleep length.
Moreover, drop the "misses" metric that becomes redundant after the
above change and rename the "early_hits" metric to "intercepts" so
that its role is better reflected by its name (the idea being that
if a CPU wakes up earlier than indicated by the sleep length, then
it must be a result of a non-timer interrupt that "intercepts" the
CPU).
Also rename the states[] array in struct struct teo_cpu to
state_bins[] to avoid confusing it with the states[] array in
struct cpuidle_driver and update the documentation to match the
new code (and make it more comprehensive while at it).
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-06-02 21:16:32 +03:00
* Thus this governor estimates whether or not the prospective idle duration of
* a CPU is likely to be significantly shorter than the sleep length and selects
* an idle state for it accordingly .
cpuidle: New timer events oriented governor for tickless systems
The venerable menu governor does some things that are quite
questionable in my view.
First, it includes timer wakeups in the pattern detection data and
mixes them up with wakeups from other sources which in some cases
causes it to expect what essentially would be a timer wakeup in a
time frame in which no timer wakeups are possible (because it knows
the time until the next timer event and that is later than the
expected wakeup time).
Second, it uses the extra exit latency limit based on the predicted
idle duration and depending on the number of tasks waiting on I/O,
even though those tasks may run on a different CPU when they are
woken up. Moreover, the time ranges used by it for the sleep length
correction factors depend on whether or not there are tasks waiting
on I/O, which again doesn't imply anything in particular, and they
are not correlated to the list of available idle states in any way
whatever.
Also, the pattern detection code in menu may end up considering
values that are too large to matter at all, in which cases running
it is a waste of time.
A major rework of the menu governor would be required to address
these issues and the performance of at least some workloads (tuned
specifically to the current behavior of the menu governor) is likely
to suffer from that. It is thus better to introduce an entirely new
governor without them and let everybody use the governor that works
better with their actual workloads.
The new governor introduced here, the timer events oriented (TEO)
governor, uses the same basic strategy as menu: it always tries to
find the deepest idle state that can be used in the given conditions.
However, it applies a different approach to that problem.
First, it doesn't use "correction factors" for the time till the
closest timer, but instead it tries to correlate the measured idle
duration values with the available idle states and use that
information to pick up the idle state that is most likely to "match"
the upcoming CPU idle interval.
Second, it doesn't take the number of "I/O waiters" into account at
all and the pattern detection code in it avoids taking timer wakeups
into account. It also only uses idle duration values less than the
current time till the closest timer (with the tick excluded) for that
purpose.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2019-01-04 14:30:47 +03:00
*
cpuidle: teo: Change the main idle state selection logic
Two aspects of the current main idle state selection logic in the
TEO (Timer Events Oriented) cpuidle governor are quite questionable.
First of all, the "hits" and "misses" metrics used by it are only
updated for a given idle state if the time till the next timer event
("sleep length") is between the target residency of that state and
the target residency of the next one. Consequently, they are likely
to become stale if the sleep length tends to fall outside that
interval which increases the likelihood of subomtimal idle state
selection.
Second, the decision on whether or not to select the idle state
"matching" the sleep length is based on the metrics collected for
that state alone, whereas in principle the metrics collected for
the other idle states should be taken into consideration when that
decision is made. For example, if the measured idle duration is less
than the target residency of the idle state "matching" the sleep
length, then it is also less than the target residency of any deeper
idle state and that should be taken into account when considering
whether or not to select any of those states, but currently it is
not.
In order to address the above shortcomings, modify the main idle
state selection logic in the TEO governor to take the metrics
collected for all of the idle states into account when deciding
whether or not to select the one "matching" the sleep length.
Moreover, drop the "misses" metric that becomes redundant after the
above change and rename the "early_hits" metric to "intercepts" so
that its role is better reflected by its name (the idea being that
if a CPU wakes up earlier than indicated by the sleep length, then
it must be a result of a non-timer interrupt that "intercepts" the
CPU).
Also rename the states[] array in struct struct teo_cpu to
state_bins[] to avoid confusing it with the states[] array in
struct cpuidle_driver and update the documentation to match the
new code (and make it more comprehensive while at it).
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-06-02 21:16:32 +03:00
* The computations carried out by this governor are based on using bins whose
* boundaries are aligned with the target residency parameter values of the CPU
2021-06-02 21:18:02 +03:00
* idle states provided by the % CPUIdle driver in the ascending order . That is ,
cpuidle: teo: Change the main idle state selection logic
Two aspects of the current main idle state selection logic in the
TEO (Timer Events Oriented) cpuidle governor are quite questionable.
First of all, the "hits" and "misses" metrics used by it are only
updated for a given idle state if the time till the next timer event
("sleep length") is between the target residency of that state and
the target residency of the next one. Consequently, they are likely
to become stale if the sleep length tends to fall outside that
interval which increases the likelihood of subomtimal idle state
selection.
Second, the decision on whether or not to select the idle state
"matching" the sleep length is based on the metrics collected for
that state alone, whereas in principle the metrics collected for
the other idle states should be taken into consideration when that
decision is made. For example, if the measured idle duration is less
than the target residency of the idle state "matching" the sleep
length, then it is also less than the target residency of any deeper
idle state and that should be taken into account when considering
whether or not to select any of those states, but currently it is
not.
In order to address the above shortcomings, modify the main idle
state selection logic in the TEO governor to take the metrics
collected for all of the idle states into account when deciding
whether or not to select the one "matching" the sleep length.
Moreover, drop the "misses" metric that becomes redundant after the
above change and rename the "early_hits" metric to "intercepts" so
that its role is better reflected by its name (the idea being that
if a CPU wakes up earlier than indicated by the sleep length, then
it must be a result of a non-timer interrupt that "intercepts" the
CPU).
Also rename the states[] array in struct struct teo_cpu to
state_bins[] to avoid confusing it with the states[] array in
struct cpuidle_driver and update the documentation to match the
new code (and make it more comprehensive while at it).
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-06-02 21:16:32 +03:00
* the first bin spans from 0 up to , but not including , the target residency of
* the second idle state ( idle state 1 ) , the second bin spans from the target
* residency of idle state 1 up to , but not including , the target residency of
* idle state 2 , the third bin spans from the target residency of idle state 2
* up to , but not including , the target residency of idle state 3 and so on .
* The last bin spans from the target residency of the deepest idle state
* supplied by the driver to infinity .
cpuidle: New timer events oriented governor for tickless systems
The venerable menu governor does some things that are quite
questionable in my view.
First, it includes timer wakeups in the pattern detection data and
mixes them up with wakeups from other sources which in some cases
causes it to expect what essentially would be a timer wakeup in a
time frame in which no timer wakeups are possible (because it knows
the time until the next timer event and that is later than the
expected wakeup time).
Second, it uses the extra exit latency limit based on the predicted
idle duration and depending on the number of tasks waiting on I/O,
even though those tasks may run on a different CPU when they are
woken up. Moreover, the time ranges used by it for the sleep length
correction factors depend on whether or not there are tasks waiting
on I/O, which again doesn't imply anything in particular, and they
are not correlated to the list of available idle states in any way
whatever.
Also, the pattern detection code in menu may end up considering
values that are too large to matter at all, in which cases running
it is a waste of time.
A major rework of the menu governor would be required to address
these issues and the performance of at least some workloads (tuned
specifically to the current behavior of the menu governor) is likely
to suffer from that. It is thus better to introduce an entirely new
governor without them and let everybody use the governor that works
better with their actual workloads.
The new governor introduced here, the timer events oriented (TEO)
governor, uses the same basic strategy as menu: it always tries to
find the deepest idle state that can be used in the given conditions.
However, it applies a different approach to that problem.
First, it doesn't use "correction factors" for the time till the
closest timer, but instead it tries to correlate the measured idle
duration values with the available idle states and use that
information to pick up the idle state that is most likely to "match"
the upcoming CPU idle interval.
Second, it doesn't take the number of "I/O waiters" into account at
all and the pattern detection code in it avoids taking timer wakeups
into account. It also only uses idle duration values less than the
current time till the closest timer (with the tick excluded) for that
purpose.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2019-01-04 14:30:47 +03:00
*
cpuidle: teo: Change the main idle state selection logic
Two aspects of the current main idle state selection logic in the
TEO (Timer Events Oriented) cpuidle governor are quite questionable.
First of all, the "hits" and "misses" metrics used by it are only
updated for a given idle state if the time till the next timer event
("sleep length") is between the target residency of that state and
the target residency of the next one. Consequently, they are likely
to become stale if the sleep length tends to fall outside that
interval which increases the likelihood of subomtimal idle state
selection.
Second, the decision on whether or not to select the idle state
"matching" the sleep length is based on the metrics collected for
that state alone, whereas in principle the metrics collected for
the other idle states should be taken into consideration when that
decision is made. For example, if the measured idle duration is less
than the target residency of the idle state "matching" the sleep
length, then it is also less than the target residency of any deeper
idle state and that should be taken into account when considering
whether or not to select any of those states, but currently it is
not.
In order to address the above shortcomings, modify the main idle
state selection logic in the TEO governor to take the metrics
collected for all of the idle states into account when deciding
whether or not to select the one "matching" the sleep length.
Moreover, drop the "misses" metric that becomes redundant after the
above change and rename the "early_hits" metric to "intercepts" so
that its role is better reflected by its name (the idea being that
if a CPU wakes up earlier than indicated by the sleep length, then
it must be a result of a non-timer interrupt that "intercepts" the
CPU).
Also rename the states[] array in struct struct teo_cpu to
state_bins[] to avoid confusing it with the states[] array in
struct cpuidle_driver and update the documentation to match the
new code (and make it more comprehensive while at it).
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-06-02 21:16:32 +03:00
* Two metrics called " hits " and " intercepts " are associated with each bin .
* They are updated every time before selecting an idle state for the given CPU
* in accordance with what happened last time .
cpuidle: New timer events oriented governor for tickless systems
The venerable menu governor does some things that are quite
questionable in my view.
First, it includes timer wakeups in the pattern detection data and
mixes them up with wakeups from other sources which in some cases
causes it to expect what essentially would be a timer wakeup in a
time frame in which no timer wakeups are possible (because it knows
the time until the next timer event and that is later than the
expected wakeup time).
Second, it uses the extra exit latency limit based on the predicted
idle duration and depending on the number of tasks waiting on I/O,
even though those tasks may run on a different CPU when they are
woken up. Moreover, the time ranges used by it for the sleep length
correction factors depend on whether or not there are tasks waiting
on I/O, which again doesn't imply anything in particular, and they
are not correlated to the list of available idle states in any way
whatever.
Also, the pattern detection code in menu may end up considering
values that are too large to matter at all, in which cases running
it is a waste of time.
A major rework of the menu governor would be required to address
these issues and the performance of at least some workloads (tuned
specifically to the current behavior of the menu governor) is likely
to suffer from that. It is thus better to introduce an entirely new
governor without them and let everybody use the governor that works
better with their actual workloads.
The new governor introduced here, the timer events oriented (TEO)
governor, uses the same basic strategy as menu: it always tries to
find the deepest idle state that can be used in the given conditions.
However, it applies a different approach to that problem.
First, it doesn't use "correction factors" for the time till the
closest timer, but instead it tries to correlate the measured idle
duration values with the available idle states and use that
information to pick up the idle state that is most likely to "match"
the upcoming CPU idle interval.
Second, it doesn't take the number of "I/O waiters" into account at
all and the pattern detection code in it avoids taking timer wakeups
into account. It also only uses idle duration values less than the
current time till the closest timer (with the tick excluded) for that
purpose.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2019-01-04 14:30:47 +03:00
*
cpuidle: teo: Change the main idle state selection logic
Two aspects of the current main idle state selection logic in the
TEO (Timer Events Oriented) cpuidle governor are quite questionable.
First of all, the "hits" and "misses" metrics used by it are only
updated for a given idle state if the time till the next timer event
("sleep length") is between the target residency of that state and
the target residency of the next one. Consequently, they are likely
to become stale if the sleep length tends to fall outside that
interval which increases the likelihood of subomtimal idle state
selection.
Second, the decision on whether or not to select the idle state
"matching" the sleep length is based on the metrics collected for
that state alone, whereas in principle the metrics collected for
the other idle states should be taken into consideration when that
decision is made. For example, if the measured idle duration is less
than the target residency of the idle state "matching" the sleep
length, then it is also less than the target residency of any deeper
idle state and that should be taken into account when considering
whether or not to select any of those states, but currently it is
not.
In order to address the above shortcomings, modify the main idle
state selection logic in the TEO governor to take the metrics
collected for all of the idle states into account when deciding
whether or not to select the one "matching" the sleep length.
Moreover, drop the "misses" metric that becomes redundant after the
above change and rename the "early_hits" metric to "intercepts" so
that its role is better reflected by its name (the idea being that
if a CPU wakes up earlier than indicated by the sleep length, then
it must be a result of a non-timer interrupt that "intercepts" the
CPU).
Also rename the states[] array in struct struct teo_cpu to
state_bins[] to avoid confusing it with the states[] array in
struct cpuidle_driver and update the documentation to match the
new code (and make it more comprehensive while at it).
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-06-02 21:16:32 +03:00
* The " hits " metric reflects the relative frequency of situations in which the
* sleep length and the idle duration measured after CPU wakeup fall into the
* same bin ( that is , the CPU appears to wake up " on time " relative to the sleep
* length ) . In turn , the " intercepts " metric reflects the relative frequency of
* situations in which the measured idle duration is so much shorter than the
* sleep length that the bin it falls into corresponds to an idle state
2021-06-02 21:17:18 +03:00
* shallower than the one whose bin is fallen into by the sleep length ( these
* situations are referred to as " intercepts " below ) .
*
* In addition to the metrics described above , the governor counts recent
2021-06-02 21:18:02 +03:00
* intercepts ( that is , intercepts that have occurred during the last
* % NR_RECENT invocations of it for the given CPU ) for each bin .
cpuidle: New timer events oriented governor for tickless systems
The venerable menu governor does some things that are quite
questionable in my view.
First, it includes timer wakeups in the pattern detection data and
mixes them up with wakeups from other sources which in some cases
causes it to expect what essentially would be a timer wakeup in a
time frame in which no timer wakeups are possible (because it knows
the time until the next timer event and that is later than the
expected wakeup time).
Second, it uses the extra exit latency limit based on the predicted
idle duration and depending on the number of tasks waiting on I/O,
even though those tasks may run on a different CPU when they are
woken up. Moreover, the time ranges used by it for the sleep length
correction factors depend on whether or not there are tasks waiting
on I/O, which again doesn't imply anything in particular, and they
are not correlated to the list of available idle states in any way
whatever.
Also, the pattern detection code in menu may end up considering
values that are too large to matter at all, in which cases running
it is a waste of time.
A major rework of the menu governor would be required to address
these issues and the performance of at least some workloads (tuned
specifically to the current behavior of the menu governor) is likely
to suffer from that. It is thus better to introduce an entirely new
governor without them and let everybody use the governor that works
better with their actual workloads.
The new governor introduced here, the timer events oriented (TEO)
governor, uses the same basic strategy as menu: it always tries to
find the deepest idle state that can be used in the given conditions.
However, it applies a different approach to that problem.
First, it doesn't use "correction factors" for the time till the
closest timer, but instead it tries to correlate the measured idle
duration values with the available idle states and use that
information to pick up the idle state that is most likely to "match"
the upcoming CPU idle interval.
Second, it doesn't take the number of "I/O waiters" into account at
all and the pattern detection code in it avoids taking timer wakeups
into account. It also only uses idle duration values less than the
current time till the closest timer (with the tick excluded) for that
purpose.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2019-01-04 14:30:47 +03:00
*
cpuidle: teo: Change the main idle state selection logic
Two aspects of the current main idle state selection logic in the
TEO (Timer Events Oriented) cpuidle governor are quite questionable.
First of all, the "hits" and "misses" metrics used by it are only
updated for a given idle state if the time till the next timer event
("sleep length") is between the target residency of that state and
the target residency of the next one. Consequently, they are likely
to become stale if the sleep length tends to fall outside that
interval which increases the likelihood of subomtimal idle state
selection.
Second, the decision on whether or not to select the idle state
"matching" the sleep length is based on the metrics collected for
that state alone, whereas in principle the metrics collected for
the other idle states should be taken into consideration when that
decision is made. For example, if the measured idle duration is less
than the target residency of the idle state "matching" the sleep
length, then it is also less than the target residency of any deeper
idle state and that should be taken into account when considering
whether or not to select any of those states, but currently it is
not.
In order to address the above shortcomings, modify the main idle
state selection logic in the TEO governor to take the metrics
collected for all of the idle states into account when deciding
whether or not to select the one "matching" the sleep length.
Moreover, drop the "misses" metric that becomes redundant after the
above change and rename the "early_hits" metric to "intercepts" so
that its role is better reflected by its name (the idea being that
if a CPU wakes up earlier than indicated by the sleep length, then
it must be a result of a non-timer interrupt that "intercepts" the
CPU).
Also rename the states[] array in struct struct teo_cpu to
state_bins[] to avoid confusing it with the states[] array in
struct cpuidle_driver and update the documentation to match the
new code (and make it more comprehensive while at it).
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-06-02 21:16:32 +03:00
* In order to select an idle state for a CPU , the governor takes the following
* steps ( modulo the possible latency constraint that must be taken into account
* too ) :
cpuidle: New timer events oriented governor for tickless systems
The venerable menu governor does some things that are quite
questionable in my view.
First, it includes timer wakeups in the pattern detection data and
mixes them up with wakeups from other sources which in some cases
causes it to expect what essentially would be a timer wakeup in a
time frame in which no timer wakeups are possible (because it knows
the time until the next timer event and that is later than the
expected wakeup time).
Second, it uses the extra exit latency limit based on the predicted
idle duration and depending on the number of tasks waiting on I/O,
even though those tasks may run on a different CPU when they are
woken up. Moreover, the time ranges used by it for the sleep length
correction factors depend on whether or not there are tasks waiting
on I/O, which again doesn't imply anything in particular, and they
are not correlated to the list of available idle states in any way
whatever.
Also, the pattern detection code in menu may end up considering
values that are too large to matter at all, in which cases running
it is a waste of time.
A major rework of the menu governor would be required to address
these issues and the performance of at least some workloads (tuned
specifically to the current behavior of the menu governor) is likely
to suffer from that. It is thus better to introduce an entirely new
governor without them and let everybody use the governor that works
better with their actual workloads.
The new governor introduced here, the timer events oriented (TEO)
governor, uses the same basic strategy as menu: it always tries to
find the deepest idle state that can be used in the given conditions.
However, it applies a different approach to that problem.
First, it doesn't use "correction factors" for the time till the
closest timer, but instead it tries to correlate the measured idle
duration values with the available idle states and use that
information to pick up the idle state that is most likely to "match"
the upcoming CPU idle interval.
Second, it doesn't take the number of "I/O waiters" into account at
all and the pattern detection code in it avoids taking timer wakeups
into account. It also only uses idle duration values less than the
current time till the closest timer (with the tick excluded) for that
purpose.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2019-01-04 14:30:47 +03:00
*
cpuidle: teo: Change the main idle state selection logic
Two aspects of the current main idle state selection logic in the
TEO (Timer Events Oriented) cpuidle governor are quite questionable.
First of all, the "hits" and "misses" metrics used by it are only
updated for a given idle state if the time till the next timer event
("sleep length") is between the target residency of that state and
the target residency of the next one. Consequently, they are likely
to become stale if the sleep length tends to fall outside that
interval which increases the likelihood of subomtimal idle state
selection.
Second, the decision on whether or not to select the idle state
"matching" the sleep length is based on the metrics collected for
that state alone, whereas in principle the metrics collected for
the other idle states should be taken into consideration when that
decision is made. For example, if the measured idle duration is less
than the target residency of the idle state "matching" the sleep
length, then it is also less than the target residency of any deeper
idle state and that should be taken into account when considering
whether or not to select any of those states, but currently it is
not.
In order to address the above shortcomings, modify the main idle
state selection logic in the TEO governor to take the metrics
collected for all of the idle states into account when deciding
whether or not to select the one "matching" the sleep length.
Moreover, drop the "misses" metric that becomes redundant after the
above change and rename the "early_hits" metric to "intercepts" so
that its role is better reflected by its name (the idea being that
if a CPU wakes up earlier than indicated by the sleep length, then
it must be a result of a non-timer interrupt that "intercepts" the
CPU).
Also rename the states[] array in struct struct teo_cpu to
state_bins[] to avoid confusing it with the states[] array in
struct cpuidle_driver and update the documentation to match the
new code (and make it more comprehensive while at it).
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-06-02 21:16:32 +03:00
* 1. Find the deepest CPU idle state whose target residency does not exceed
2021-06-02 21:17:18 +03:00
* the current sleep length ( the candidate idle state ) and compute 3 sums as
* follows :
cpuidle: teo: Change the main idle state selection logic
Two aspects of the current main idle state selection logic in the
TEO (Timer Events Oriented) cpuidle governor are quite questionable.
First of all, the "hits" and "misses" metrics used by it are only
updated for a given idle state if the time till the next timer event
("sleep length") is between the target residency of that state and
the target residency of the next one. Consequently, they are likely
to become stale if the sleep length tends to fall outside that
interval which increases the likelihood of subomtimal idle state
selection.
Second, the decision on whether or not to select the idle state
"matching" the sleep length is based on the metrics collected for
that state alone, whereas in principle the metrics collected for
the other idle states should be taken into consideration when that
decision is made. For example, if the measured idle duration is less
than the target residency of the idle state "matching" the sleep
length, then it is also less than the target residency of any deeper
idle state and that should be taken into account when considering
whether or not to select any of those states, but currently it is
not.
In order to address the above shortcomings, modify the main idle
state selection logic in the TEO governor to take the metrics
collected for all of the idle states into account when deciding
whether or not to select the one "matching" the sleep length.
Moreover, drop the "misses" metric that becomes redundant after the
above change and rename the "early_hits" metric to "intercepts" so
that its role is better reflected by its name (the idea being that
if a CPU wakes up earlier than indicated by the sleep length, then
it must be a result of a non-timer interrupt that "intercepts" the
CPU).
Also rename the states[] array in struct struct teo_cpu to
state_bins[] to avoid confusing it with the states[] array in
struct cpuidle_driver and update the documentation to match the
new code (and make it more comprehensive while at it).
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-06-02 21:16:32 +03:00
*
* - The sum of the " hits " and " intercepts " metrics for the candidate state
* and all of the deeper idle states ( it represents the cases in which the
* CPU was idle long enough to avoid being intercepted if the sleep length
* had been equal to the current one ) .
*
* - The sum of the " intercepts " metrics for all of the idle states shallower
* than the candidate one ( it represents the cases in which the CPU was not
* idle long enough to avoid being intercepted if the sleep length had been
* equal to the current one ) .
*
2021-06-02 21:17:18 +03:00
* - The sum of the numbers of recent intercepts for all of the idle states
* shallower than the candidate one .
*
* 2. If the second sum is greater than the first one or the third sum is
2021-06-02 21:18:02 +03:00
* greater than % NR_RECENT / 2 , the CPU is likely to wake up early , so look
2021-06-02 21:17:18 +03:00
* for an alternative idle state to select .
cpuidle: teo: Change the main idle state selection logic
Two aspects of the current main idle state selection logic in the
TEO (Timer Events Oriented) cpuidle governor are quite questionable.
First of all, the "hits" and "misses" metrics used by it are only
updated for a given idle state if the time till the next timer event
("sleep length") is between the target residency of that state and
the target residency of the next one. Consequently, they are likely
to become stale if the sleep length tends to fall outside that
interval which increases the likelihood of subomtimal idle state
selection.
Second, the decision on whether or not to select the idle state
"matching" the sleep length is based on the metrics collected for
that state alone, whereas in principle the metrics collected for
the other idle states should be taken into consideration when that
decision is made. For example, if the measured idle duration is less
than the target residency of the idle state "matching" the sleep
length, then it is also less than the target residency of any deeper
idle state and that should be taken into account when considering
whether or not to select any of those states, but currently it is
not.
In order to address the above shortcomings, modify the main idle
state selection logic in the TEO governor to take the metrics
collected for all of the idle states into account when deciding
whether or not to select the one "matching" the sleep length.
Moreover, drop the "misses" metric that becomes redundant after the
above change and rename the "early_hits" metric to "intercepts" so
that its role is better reflected by its name (the idea being that
if a CPU wakes up earlier than indicated by the sleep length, then
it must be a result of a non-timer interrupt that "intercepts" the
CPU).
Also rename the states[] array in struct struct teo_cpu to
state_bins[] to avoid confusing it with the states[] array in
struct cpuidle_driver and update the documentation to match the
new code (and make it more comprehensive while at it).
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-06-02 21:16:32 +03:00
*
* - Traverse the idle states shallower than the candidate one in the
* descending order .
*
2021-06-02 21:17:18 +03:00
* - For each of them compute the sum of the " intercepts " metrics and the sum
* of the numbers of recent intercepts over all of the idle states between
* it and the candidate one ( including the former and excluding the
* latter ) .
cpuidle: teo: Change the main idle state selection logic
Two aspects of the current main idle state selection logic in the
TEO (Timer Events Oriented) cpuidle governor are quite questionable.
First of all, the "hits" and "misses" metrics used by it are only
updated for a given idle state if the time till the next timer event
("sleep length") is between the target residency of that state and
the target residency of the next one. Consequently, they are likely
to become stale if the sleep length tends to fall outside that
interval which increases the likelihood of subomtimal idle state
selection.
Second, the decision on whether or not to select the idle state
"matching" the sleep length is based on the metrics collected for
that state alone, whereas in principle the metrics collected for
the other idle states should be taken into consideration when that
decision is made. For example, if the measured idle duration is less
than the target residency of the idle state "matching" the sleep
length, then it is also less than the target residency of any deeper
idle state and that should be taken into account when considering
whether or not to select any of those states, but currently it is
not.
In order to address the above shortcomings, modify the main idle
state selection logic in the TEO governor to take the metrics
collected for all of the idle states into account when deciding
whether or not to select the one "matching" the sleep length.
Moreover, drop the "misses" metric that becomes redundant after the
above change and rename the "early_hits" metric to "intercepts" so
that its role is better reflected by its name (the idea being that
if a CPU wakes up earlier than indicated by the sleep length, then
it must be a result of a non-timer interrupt that "intercepts" the
CPU).
Also rename the states[] array in struct struct teo_cpu to
state_bins[] to avoid confusing it with the states[] array in
struct cpuidle_driver and update the documentation to match the
new code (and make it more comprehensive while at it).
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-06-02 21:16:32 +03:00
*
2021-06-02 21:17:18 +03:00
* - If each of these sums that needs to be taken into account ( because the
* check related to it has indicated that the CPU is likely to wake up
* early ) is greater than a half of the corresponding sum computed in step
* 1 ( which means that the target residency of the state in question had
* not exceeded the idle duration in over a half of the relevant cases ) ,
* select the given idle state instead of the candidate one .
cpuidle: teo: Change the main idle state selection logic
Two aspects of the current main idle state selection logic in the
TEO (Timer Events Oriented) cpuidle governor are quite questionable.
First of all, the "hits" and "misses" metrics used by it are only
updated for a given idle state if the time till the next timer event
("sleep length") is between the target residency of that state and
the target residency of the next one. Consequently, they are likely
to become stale if the sleep length tends to fall outside that
interval which increases the likelihood of subomtimal idle state
selection.
Second, the decision on whether or not to select the idle state
"matching" the sleep length is based on the metrics collected for
that state alone, whereas in principle the metrics collected for
the other idle states should be taken into consideration when that
decision is made. For example, if the measured idle duration is less
than the target residency of the idle state "matching" the sleep
length, then it is also less than the target residency of any deeper
idle state and that should be taken into account when considering
whether or not to select any of those states, but currently it is
not.
In order to address the above shortcomings, modify the main idle
state selection logic in the TEO governor to take the metrics
collected for all of the idle states into account when deciding
whether or not to select the one "matching" the sleep length.
Moreover, drop the "misses" metric that becomes redundant after the
above change and rename the "early_hits" metric to "intercepts" so
that its role is better reflected by its name (the idea being that
if a CPU wakes up earlier than indicated by the sleep length, then
it must be a result of a non-timer interrupt that "intercepts" the
CPU).
Also rename the states[] array in struct struct teo_cpu to
state_bins[] to avoid confusing it with the states[] array in
struct cpuidle_driver and update the documentation to match the
new code (and make it more comprehensive while at it).
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-06-02 21:16:32 +03:00
*
2021-06-02 21:17:18 +03:00
* 3. By default , select the candidate state .
cpuidle: teo: Introduce util-awareness
Modern interactive systems, such as recent Android phones, tend to have
power efficient shallow idle states. Selecting deeper idle states on a
device while a latency-sensitive workload is running can adversely
impact performance due to increased latency. Additionally, if the CPU
wakes up from a deeper sleep before its target residency as is often the
case, it results in a waste of energy on top of that.
At the moment, none of the available idle governors take any scheduling
information into account. They also tend to overestimate the idle
duration quite often, which causes them to select excessively deep idle
states, thus leading to increased wakeup latency and lower performance
with no power saving. For 'menu' while web browsing on Android for
instance, those types of wakeups ('too deep') account for over 24% of
all wakeups.
At the same time, on some platforms idle state 0 can be power efficient
enough to warrant wanting to prefer it over idle state 1. This is
because the power usage of the two states can be so close that
sufficient amounts of too deep state 1 sleeps can completely offset the
state 1 power saving to the point where it would've been more power
efficient to just use state 0 instead. This is, of course, for systems
where state 0 is not a polling state, such as arm-based devices.
Sleeps that happened in state 0 while they could have used state 1 ('too
shallow') only save less power than they otherwise could have. Too deep
sleeps, on the other hand, harm performance and nullify the potential
power saving from using state 1 in the first place. While taking this
into account, it is clear that on balance it is preferable for an idle
governor to have more too shallow sleeps instead of more too deep sleeps
on those kinds of platforms.
This patch specifically tunes TEO to prefer shallower idle states in
order to reduce wakeup latency and achieve better performance.
To this end, before selecting the next idle state it uses the avg_util
signal of a CPU's runqueue in order to determine to what extent the CPU
is being utilized. This util value is then compared to a threshold
defined as a percentage of the CPU's capacity (capacity >> 6 ie. ~1.5%
in the current implementation). If the util is above the threshold, the
index of the idle state selected by TEO metrics will be reduced by 1,
thus selecting a shallower state. If the util is below the threshold,
the governor defaults to the TEO metrics mechanism to try to select the
deepest available idle state based on the closest timer event and its
own correctness.
The main goal of this is to reduce latency and increase performance for
some workloads. Under some workloads it will result in an increase in
power usage (Geekbench 5) while for other workloads it will also result
in a decrease in power usage compared to TEO (PCMark Web, Jankbench,
Speedometer).
It can provide drastically decreased latency and performance benefits in
certain types of workloads that are sensitive to latency.
Example test results:
1. GB5 (better score, latency & more power usage)
| metric | menu | teo | teo-util-aware |
| ------------------------------------- | -------------- | ----------------- | ----------------- |
| gmean score | 2826.5 (0.0%) | 2764.8 (-2.18%) | 2865 (1.36%) |
| gmean power usage [mW] | 2551.4 (0.0%) | 2606.8 (2.17%) | 2722.3 (6.7%) |
| gmean too deep % | 14.99% | 9.65% | 4.02% |
| gmean too shallow % | 2.5% | 5.96% | 14.59% |
| gmean task wakeup latency (asynctask) | 78.16μs (0.0%) | 61.60μs (-21.19%) | 54.45μs (-30.34%) |
2. Jankbench (better score, latency & less power usage)
| metric | menu | teo | teo-util-aware |
| ------------------------------------- | -------------- | ----------------- | ----------------- |
| gmean frame duration | 13.9 (0.0%) | 14.7 (6.0%) | 12.6 (-9.0%) |
| gmean jank percentage | 1.5 (0.0%) | 2.1 (36.99%) | 1.3 (-17.37%) |
| gmean power usage [mW] | 144.6 (0.0%) | 136.9 (-5.27%) | 121.3 (-16.08%) |
| gmean too deep % | 26.00% | 11.00% | 2.54% |
| gmean too shallow % | 4.74% | 11.89% | 21.93% |
| gmean wakeup latency (RenderThread) | 139.5μs (0.0%) | 116.5μs (-16.49%) | 91.11μs (-34.7%) |
| gmean wakeup latency (surfaceflinger) | 124.0μs (0.0%) | 151.9μs (22.47%) | 87.65μs (-29.33%) |
Signed-off-by: Kajetan Puchalski <kajetan.puchalski@arm.com>
[ rjw: Comment edits and white space adjustments ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-01-05 17:51:59 +03:00
*
* Util - awareness mechanism :
*
* The idea behind the util - awareness extension is that there are two distinct
* scenarios for the CPU which should result in two different approaches to idle
* state selection - utilized and not utilized .
*
* In this case , ' utilized ' means that the average runqueue util of the CPU is
* above a certain threshold .
*
* When the CPU is utilized while going into idle , more likely than not it will
* be woken up to do more work soon and so a shallower idle state should be
* selected to minimise latency and maximise performance . When the CPU is not
* being utilized , the usual metrics - based approach to selecting the deepest
* available idle state should be preferred to take advantage of the power
* saving .
*
* In order to achieve this , the governor uses a utilization threshold .
* The threshold is computed per - CPU as a percentage of the CPU ' s capacity
* by bit shifting the capacity value . Based on testing , the shift of 6 ( ~ 1.56 % )
* seems to be getting the best results .
*
* Before selecting the next idle state , the governor compares the current CPU
* util to the precomputed util threshold . If it ' s below , it defaults to the
* TEO metrics mechanism . If it ' s above , the closest shallower idle state will
* be selected instead , as long as is not a polling state .
cpuidle: New timer events oriented governor for tickless systems
The venerable menu governor does some things that are quite
questionable in my view.
First, it includes timer wakeups in the pattern detection data and
mixes them up with wakeups from other sources which in some cases
causes it to expect what essentially would be a timer wakeup in a
time frame in which no timer wakeups are possible (because it knows
the time until the next timer event and that is later than the
expected wakeup time).
Second, it uses the extra exit latency limit based on the predicted
idle duration and depending on the number of tasks waiting on I/O,
even though those tasks may run on a different CPU when they are
woken up. Moreover, the time ranges used by it for the sleep length
correction factors depend on whether or not there are tasks waiting
on I/O, which again doesn't imply anything in particular, and they
are not correlated to the list of available idle states in any way
whatever.
Also, the pattern detection code in menu may end up considering
values that are too large to matter at all, in which cases running
it is a waste of time.
A major rework of the menu governor would be required to address
these issues and the performance of at least some workloads (tuned
specifically to the current behavior of the menu governor) is likely
to suffer from that. It is thus better to introduce an entirely new
governor without them and let everybody use the governor that works
better with their actual workloads.
The new governor introduced here, the timer events oriented (TEO)
governor, uses the same basic strategy as menu: it always tries to
find the deepest idle state that can be used in the given conditions.
However, it applies a different approach to that problem.
First, it doesn't use "correction factors" for the time till the
closest timer, but instead it tries to correlate the measured idle
duration values with the available idle states and use that
information to pick up the idle state that is most likely to "match"
the upcoming CPU idle interval.
Second, it doesn't take the number of "I/O waiters" into account at
all and the pattern detection code in it avoids taking timer wakeups
into account. It also only uses idle duration values less than the
current time till the closest timer (with the tick excluded) for that
purpose.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2019-01-04 14:30:47 +03:00
*/
# include <linux/cpuidle.h>
# include <linux/jiffies.h>
# include <linux/kernel.h>
cpuidle: teo: Introduce util-awareness
Modern interactive systems, such as recent Android phones, tend to have
power efficient shallow idle states. Selecting deeper idle states on a
device while a latency-sensitive workload is running can adversely
impact performance due to increased latency. Additionally, if the CPU
wakes up from a deeper sleep before its target residency as is often the
case, it results in a waste of energy on top of that.
At the moment, none of the available idle governors take any scheduling
information into account. They also tend to overestimate the idle
duration quite often, which causes them to select excessively deep idle
states, thus leading to increased wakeup latency and lower performance
with no power saving. For 'menu' while web browsing on Android for
instance, those types of wakeups ('too deep') account for over 24% of
all wakeups.
At the same time, on some platforms idle state 0 can be power efficient
enough to warrant wanting to prefer it over idle state 1. This is
because the power usage of the two states can be so close that
sufficient amounts of too deep state 1 sleeps can completely offset the
state 1 power saving to the point where it would've been more power
efficient to just use state 0 instead. This is, of course, for systems
where state 0 is not a polling state, such as arm-based devices.
Sleeps that happened in state 0 while they could have used state 1 ('too
shallow') only save less power than they otherwise could have. Too deep
sleeps, on the other hand, harm performance and nullify the potential
power saving from using state 1 in the first place. While taking this
into account, it is clear that on balance it is preferable for an idle
governor to have more too shallow sleeps instead of more too deep sleeps
on those kinds of platforms.
This patch specifically tunes TEO to prefer shallower idle states in
order to reduce wakeup latency and achieve better performance.
To this end, before selecting the next idle state it uses the avg_util
signal of a CPU's runqueue in order to determine to what extent the CPU
is being utilized. This util value is then compared to a threshold
defined as a percentage of the CPU's capacity (capacity >> 6 ie. ~1.5%
in the current implementation). If the util is above the threshold, the
index of the idle state selected by TEO metrics will be reduced by 1,
thus selecting a shallower state. If the util is below the threshold,
the governor defaults to the TEO metrics mechanism to try to select the
deepest available idle state based on the closest timer event and its
own correctness.
The main goal of this is to reduce latency and increase performance for
some workloads. Under some workloads it will result in an increase in
power usage (Geekbench 5) while for other workloads it will also result
in a decrease in power usage compared to TEO (PCMark Web, Jankbench,
Speedometer).
It can provide drastically decreased latency and performance benefits in
certain types of workloads that are sensitive to latency.
Example test results:
1. GB5 (better score, latency & more power usage)
| metric | menu | teo | teo-util-aware |
| ------------------------------------- | -------------- | ----------------- | ----------------- |
| gmean score | 2826.5 (0.0%) | 2764.8 (-2.18%) | 2865 (1.36%) |
| gmean power usage [mW] | 2551.4 (0.0%) | 2606.8 (2.17%) | 2722.3 (6.7%) |
| gmean too deep % | 14.99% | 9.65% | 4.02% |
| gmean too shallow % | 2.5% | 5.96% | 14.59% |
| gmean task wakeup latency (asynctask) | 78.16μs (0.0%) | 61.60μs (-21.19%) | 54.45μs (-30.34%) |
2. Jankbench (better score, latency & less power usage)
| metric | menu | teo | teo-util-aware |
| ------------------------------------- | -------------- | ----------------- | ----------------- |
| gmean frame duration | 13.9 (0.0%) | 14.7 (6.0%) | 12.6 (-9.0%) |
| gmean jank percentage | 1.5 (0.0%) | 2.1 (36.99%) | 1.3 (-17.37%) |
| gmean power usage [mW] | 144.6 (0.0%) | 136.9 (-5.27%) | 121.3 (-16.08%) |
| gmean too deep % | 26.00% | 11.00% | 2.54% |
| gmean too shallow % | 4.74% | 11.89% | 21.93% |
| gmean wakeup latency (RenderThread) | 139.5μs (0.0%) | 116.5μs (-16.49%) | 91.11μs (-34.7%) |
| gmean wakeup latency (surfaceflinger) | 124.0μs (0.0%) | 151.9μs (22.47%) | 87.65μs (-29.33%) |
Signed-off-by: Kajetan Puchalski <kajetan.puchalski@arm.com>
[ rjw: Comment edits and white space adjustments ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-01-05 17:51:59 +03:00
# include <linux/sched.h>
cpuidle: New timer events oriented governor for tickless systems
The venerable menu governor does some things that are quite
questionable in my view.
First, it includes timer wakeups in the pattern detection data and
mixes them up with wakeups from other sources which in some cases
causes it to expect what essentially would be a timer wakeup in a
time frame in which no timer wakeups are possible (because it knows
the time until the next timer event and that is later than the
expected wakeup time).
Second, it uses the extra exit latency limit based on the predicted
idle duration and depending on the number of tasks waiting on I/O,
even though those tasks may run on a different CPU when they are
woken up. Moreover, the time ranges used by it for the sleep length
correction factors depend on whether or not there are tasks waiting
on I/O, which again doesn't imply anything in particular, and they
are not correlated to the list of available idle states in any way
whatever.
Also, the pattern detection code in menu may end up considering
values that are too large to matter at all, in which cases running
it is a waste of time.
A major rework of the menu governor would be required to address
these issues and the performance of at least some workloads (tuned
specifically to the current behavior of the menu governor) is likely
to suffer from that. It is thus better to introduce an entirely new
governor without them and let everybody use the governor that works
better with their actual workloads.
The new governor introduced here, the timer events oriented (TEO)
governor, uses the same basic strategy as menu: it always tries to
find the deepest idle state that can be used in the given conditions.
However, it applies a different approach to that problem.
First, it doesn't use "correction factors" for the time till the
closest timer, but instead it tries to correlate the measured idle
duration values with the available idle states and use that
information to pick up the idle state that is most likely to "match"
the upcoming CPU idle interval.
Second, it doesn't take the number of "I/O waiters" into account at
all and the pattern detection code in it avoids taking timer wakeups
into account. It also only uses idle duration values less than the
current time till the closest timer (with the tick excluded) for that
purpose.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2019-01-04 14:30:47 +03:00
# include <linux/sched/clock.h>
cpuidle: teo: Introduce util-awareness
Modern interactive systems, such as recent Android phones, tend to have
power efficient shallow idle states. Selecting deeper idle states on a
device while a latency-sensitive workload is running can adversely
impact performance due to increased latency. Additionally, if the CPU
wakes up from a deeper sleep before its target residency as is often the
case, it results in a waste of energy on top of that.
At the moment, none of the available idle governors take any scheduling
information into account. They also tend to overestimate the idle
duration quite often, which causes them to select excessively deep idle
states, thus leading to increased wakeup latency and lower performance
with no power saving. For 'menu' while web browsing on Android for
instance, those types of wakeups ('too deep') account for over 24% of
all wakeups.
At the same time, on some platforms idle state 0 can be power efficient
enough to warrant wanting to prefer it over idle state 1. This is
because the power usage of the two states can be so close that
sufficient amounts of too deep state 1 sleeps can completely offset the
state 1 power saving to the point where it would've been more power
efficient to just use state 0 instead. This is, of course, for systems
where state 0 is not a polling state, such as arm-based devices.
Sleeps that happened in state 0 while they could have used state 1 ('too
shallow') only save less power than they otherwise could have. Too deep
sleeps, on the other hand, harm performance and nullify the potential
power saving from using state 1 in the first place. While taking this
into account, it is clear that on balance it is preferable for an idle
governor to have more too shallow sleeps instead of more too deep sleeps
on those kinds of platforms.
This patch specifically tunes TEO to prefer shallower idle states in
order to reduce wakeup latency and achieve better performance.
To this end, before selecting the next idle state it uses the avg_util
signal of a CPU's runqueue in order to determine to what extent the CPU
is being utilized. This util value is then compared to a threshold
defined as a percentage of the CPU's capacity (capacity >> 6 ie. ~1.5%
in the current implementation). If the util is above the threshold, the
index of the idle state selected by TEO metrics will be reduced by 1,
thus selecting a shallower state. If the util is below the threshold,
the governor defaults to the TEO metrics mechanism to try to select the
deepest available idle state based on the closest timer event and its
own correctness.
The main goal of this is to reduce latency and increase performance for
some workloads. Under some workloads it will result in an increase in
power usage (Geekbench 5) while for other workloads it will also result
in a decrease in power usage compared to TEO (PCMark Web, Jankbench,
Speedometer).
It can provide drastically decreased latency and performance benefits in
certain types of workloads that are sensitive to latency.
Example test results:
1. GB5 (better score, latency & more power usage)
| metric | menu | teo | teo-util-aware |
| ------------------------------------- | -------------- | ----------------- | ----------------- |
| gmean score | 2826.5 (0.0%) | 2764.8 (-2.18%) | 2865 (1.36%) |
| gmean power usage [mW] | 2551.4 (0.0%) | 2606.8 (2.17%) | 2722.3 (6.7%) |
| gmean too deep % | 14.99% | 9.65% | 4.02% |
| gmean too shallow % | 2.5% | 5.96% | 14.59% |
| gmean task wakeup latency (asynctask) | 78.16μs (0.0%) | 61.60μs (-21.19%) | 54.45μs (-30.34%) |
2. Jankbench (better score, latency & less power usage)
| metric | menu | teo | teo-util-aware |
| ------------------------------------- | -------------- | ----------------- | ----------------- |
| gmean frame duration | 13.9 (0.0%) | 14.7 (6.0%) | 12.6 (-9.0%) |
| gmean jank percentage | 1.5 (0.0%) | 2.1 (36.99%) | 1.3 (-17.37%) |
| gmean power usage [mW] | 144.6 (0.0%) | 136.9 (-5.27%) | 121.3 (-16.08%) |
| gmean too deep % | 26.00% | 11.00% | 2.54% |
| gmean too shallow % | 4.74% | 11.89% | 21.93% |
| gmean wakeup latency (RenderThread) | 139.5μs (0.0%) | 116.5μs (-16.49%) | 91.11μs (-34.7%) |
| gmean wakeup latency (surfaceflinger) | 124.0μs (0.0%) | 151.9μs (22.47%) | 87.65μs (-29.33%) |
Signed-off-by: Kajetan Puchalski <kajetan.puchalski@arm.com>
[ rjw: Comment edits and white space adjustments ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-01-05 17:51:59 +03:00
# include <linux/sched/topology.h>
cpuidle: New timer events oriented governor for tickless systems
The venerable menu governor does some things that are quite
questionable in my view.
First, it includes timer wakeups in the pattern detection data and
mixes them up with wakeups from other sources which in some cases
causes it to expect what essentially would be a timer wakeup in a
time frame in which no timer wakeups are possible (because it knows
the time until the next timer event and that is later than the
expected wakeup time).
Second, it uses the extra exit latency limit based on the predicted
idle duration and depending on the number of tasks waiting on I/O,
even though those tasks may run on a different CPU when they are
woken up. Moreover, the time ranges used by it for the sleep length
correction factors depend on whether or not there are tasks waiting
on I/O, which again doesn't imply anything in particular, and they
are not correlated to the list of available idle states in any way
whatever.
Also, the pattern detection code in menu may end up considering
values that are too large to matter at all, in which cases running
it is a waste of time.
A major rework of the menu governor would be required to address
these issues and the performance of at least some workloads (tuned
specifically to the current behavior of the menu governor) is likely
to suffer from that. It is thus better to introduce an entirely new
governor without them and let everybody use the governor that works
better with their actual workloads.
The new governor introduced here, the timer events oriented (TEO)
governor, uses the same basic strategy as menu: it always tries to
find the deepest idle state that can be used in the given conditions.
However, it applies a different approach to that problem.
First, it doesn't use "correction factors" for the time till the
closest timer, but instead it tries to correlate the measured idle
duration values with the available idle states and use that
information to pick up the idle state that is most likely to "match"
the upcoming CPU idle interval.
Second, it doesn't take the number of "I/O waiters" into account at
all and the pattern detection code in it avoids taking timer wakeups
into account. It also only uses idle duration values less than the
current time till the closest timer (with the tick excluded) for that
purpose.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2019-01-04 14:30:47 +03:00
# include <linux/tick.h>
cpuidle: teo: Introduce util-awareness
Modern interactive systems, such as recent Android phones, tend to have
power efficient shallow idle states. Selecting deeper idle states on a
device while a latency-sensitive workload is running can adversely
impact performance due to increased latency. Additionally, if the CPU
wakes up from a deeper sleep before its target residency as is often the
case, it results in a waste of energy on top of that.
At the moment, none of the available idle governors take any scheduling
information into account. They also tend to overestimate the idle
duration quite often, which causes them to select excessively deep idle
states, thus leading to increased wakeup latency and lower performance
with no power saving. For 'menu' while web browsing on Android for
instance, those types of wakeups ('too deep') account for over 24% of
all wakeups.
At the same time, on some platforms idle state 0 can be power efficient
enough to warrant wanting to prefer it over idle state 1. This is
because the power usage of the two states can be so close that
sufficient amounts of too deep state 1 sleeps can completely offset the
state 1 power saving to the point where it would've been more power
efficient to just use state 0 instead. This is, of course, for systems
where state 0 is not a polling state, such as arm-based devices.
Sleeps that happened in state 0 while they could have used state 1 ('too
shallow') only save less power than they otherwise could have. Too deep
sleeps, on the other hand, harm performance and nullify the potential
power saving from using state 1 in the first place. While taking this
into account, it is clear that on balance it is preferable for an idle
governor to have more too shallow sleeps instead of more too deep sleeps
on those kinds of platforms.
This patch specifically tunes TEO to prefer shallower idle states in
order to reduce wakeup latency and achieve better performance.
To this end, before selecting the next idle state it uses the avg_util
signal of a CPU's runqueue in order to determine to what extent the CPU
is being utilized. This util value is then compared to a threshold
defined as a percentage of the CPU's capacity (capacity >> 6 ie. ~1.5%
in the current implementation). If the util is above the threshold, the
index of the idle state selected by TEO metrics will be reduced by 1,
thus selecting a shallower state. If the util is below the threshold,
the governor defaults to the TEO metrics mechanism to try to select the
deepest available idle state based on the closest timer event and its
own correctness.
The main goal of this is to reduce latency and increase performance for
some workloads. Under some workloads it will result in an increase in
power usage (Geekbench 5) while for other workloads it will also result
in a decrease in power usage compared to TEO (PCMark Web, Jankbench,
Speedometer).
It can provide drastically decreased latency and performance benefits in
certain types of workloads that are sensitive to latency.
Example test results:
1. GB5 (better score, latency & more power usage)
| metric | menu | teo | teo-util-aware |
| ------------------------------------- | -------------- | ----------------- | ----------------- |
| gmean score | 2826.5 (0.0%) | 2764.8 (-2.18%) | 2865 (1.36%) |
| gmean power usage [mW] | 2551.4 (0.0%) | 2606.8 (2.17%) | 2722.3 (6.7%) |
| gmean too deep % | 14.99% | 9.65% | 4.02% |
| gmean too shallow % | 2.5% | 5.96% | 14.59% |
| gmean task wakeup latency (asynctask) | 78.16μs (0.0%) | 61.60μs (-21.19%) | 54.45μs (-30.34%) |
2. Jankbench (better score, latency & less power usage)
| metric | menu | teo | teo-util-aware |
| ------------------------------------- | -------------- | ----------------- | ----------------- |
| gmean frame duration | 13.9 (0.0%) | 14.7 (6.0%) | 12.6 (-9.0%) |
| gmean jank percentage | 1.5 (0.0%) | 2.1 (36.99%) | 1.3 (-17.37%) |
| gmean power usage [mW] | 144.6 (0.0%) | 136.9 (-5.27%) | 121.3 (-16.08%) |
| gmean too deep % | 26.00% | 11.00% | 2.54% |
| gmean too shallow % | 4.74% | 11.89% | 21.93% |
| gmean wakeup latency (RenderThread) | 139.5μs (0.0%) | 116.5μs (-16.49%) | 91.11μs (-34.7%) |
| gmean wakeup latency (surfaceflinger) | 124.0μs (0.0%) | 151.9μs (22.47%) | 87.65μs (-29.33%) |
Signed-off-by: Kajetan Puchalski <kajetan.puchalski@arm.com>
[ rjw: Comment edits and white space adjustments ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-01-05 17:51:59 +03:00
/*
* The number of bits to shift the CPU ' s capacity by in order to determine
* the utilized threshold .
*
* 6 was chosen based on testing as the number that achieved the best balance
* of power and performance on average .
*
* The resulting threshold is high enough to not be triggered by background
* noise and low enough to react quickly when activity starts to ramp up .
*/
# define UTIL_THRESHOLD_SHIFT 6
cpuidle: New timer events oriented governor for tickless systems
The venerable menu governor does some things that are quite
questionable in my view.
First, it includes timer wakeups in the pattern detection data and
mixes them up with wakeups from other sources which in some cases
causes it to expect what essentially would be a timer wakeup in a
time frame in which no timer wakeups are possible (because it knows
the time until the next timer event and that is later than the
expected wakeup time).
Second, it uses the extra exit latency limit based on the predicted
idle duration and depending on the number of tasks waiting on I/O,
even though those tasks may run on a different CPU when they are
woken up. Moreover, the time ranges used by it for the sleep length
correction factors depend on whether or not there are tasks waiting
on I/O, which again doesn't imply anything in particular, and they
are not correlated to the list of available idle states in any way
whatever.
Also, the pattern detection code in menu may end up considering
values that are too large to matter at all, in which cases running
it is a waste of time.
A major rework of the menu governor would be required to address
these issues and the performance of at least some workloads (tuned
specifically to the current behavior of the menu governor) is likely
to suffer from that. It is thus better to introduce an entirely new
governor without them and let everybody use the governor that works
better with their actual workloads.
The new governor introduced here, the timer events oriented (TEO)
governor, uses the same basic strategy as menu: it always tries to
find the deepest idle state that can be used in the given conditions.
However, it applies a different approach to that problem.
First, it doesn't use "correction factors" for the time till the
closest timer, but instead it tries to correlate the measured idle
duration values with the available idle states and use that
information to pick up the idle state that is most likely to "match"
the upcoming CPU idle interval.
Second, it doesn't take the number of "I/O waiters" into account at
all and the pattern detection code in it avoids taking timer wakeups
into account. It also only uses idle duration values less than the
current time till the closest timer (with the tick excluded) for that
purpose.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2019-01-04 14:30:47 +03:00
/*
* The PULSE value is added to metrics when they grow and the DECAY_SHIFT value
* is used for decreasing metrics on a regular basis .
*/
# define PULSE 1024
# define DECAY_SHIFT 3
/*
* Number of the most recent idle duration values to take into consideration for
2021-06-02 21:17:18 +03:00
* the detection of recent early wakeup patterns .
cpuidle: New timer events oriented governor for tickless systems
The venerable menu governor does some things that are quite
questionable in my view.
First, it includes timer wakeups in the pattern detection data and
mixes them up with wakeups from other sources which in some cases
causes it to expect what essentially would be a timer wakeup in a
time frame in which no timer wakeups are possible (because it knows
the time until the next timer event and that is later than the
expected wakeup time).
Second, it uses the extra exit latency limit based on the predicted
idle duration and depending on the number of tasks waiting on I/O,
even though those tasks may run on a different CPU when they are
woken up. Moreover, the time ranges used by it for the sleep length
correction factors depend on whether or not there are tasks waiting
on I/O, which again doesn't imply anything in particular, and they
are not correlated to the list of available idle states in any way
whatever.
Also, the pattern detection code in menu may end up considering
values that are too large to matter at all, in which cases running
it is a waste of time.
A major rework of the menu governor would be required to address
these issues and the performance of at least some workloads (tuned
specifically to the current behavior of the menu governor) is likely
to suffer from that. It is thus better to introduce an entirely new
governor without them and let everybody use the governor that works
better with their actual workloads.
The new governor introduced here, the timer events oriented (TEO)
governor, uses the same basic strategy as menu: it always tries to
find the deepest idle state that can be used in the given conditions.
However, it applies a different approach to that problem.
First, it doesn't use "correction factors" for the time till the
closest timer, but instead it tries to correlate the measured idle
duration values with the available idle states and use that
information to pick up the idle state that is most likely to "match"
the upcoming CPU idle interval.
Second, it doesn't take the number of "I/O waiters" into account at
all and the pattern detection code in it avoids taking timer wakeups
into account. It also only uses idle duration values less than the
current time till the closest timer (with the tick excluded) for that
purpose.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2019-01-04 14:30:47 +03:00
*/
2021-06-02 21:17:18 +03:00
# define NR_RECENT 9
cpuidle: New timer events oriented governor for tickless systems
The venerable menu governor does some things that are quite
questionable in my view.
First, it includes timer wakeups in the pattern detection data and
mixes them up with wakeups from other sources which in some cases
causes it to expect what essentially would be a timer wakeup in a
time frame in which no timer wakeups are possible (because it knows
the time until the next timer event and that is later than the
expected wakeup time).
Second, it uses the extra exit latency limit based on the predicted
idle duration and depending on the number of tasks waiting on I/O,
even though those tasks may run on a different CPU when they are
woken up. Moreover, the time ranges used by it for the sleep length
correction factors depend on whether or not there are tasks waiting
on I/O, which again doesn't imply anything in particular, and they
are not correlated to the list of available idle states in any way
whatever.
Also, the pattern detection code in menu may end up considering
values that are too large to matter at all, in which cases running
it is a waste of time.
A major rework of the menu governor would be required to address
these issues and the performance of at least some workloads (tuned
specifically to the current behavior of the menu governor) is likely
to suffer from that. It is thus better to introduce an entirely new
governor without them and let everybody use the governor that works
better with their actual workloads.
The new governor introduced here, the timer events oriented (TEO)
governor, uses the same basic strategy as menu: it always tries to
find the deepest idle state that can be used in the given conditions.
However, it applies a different approach to that problem.
First, it doesn't use "correction factors" for the time till the
closest timer, but instead it tries to correlate the measured idle
duration values with the available idle states and use that
information to pick up the idle state that is most likely to "match"
the upcoming CPU idle interval.
Second, it doesn't take the number of "I/O waiters" into account at
all and the pattern detection code in it avoids taking timer wakeups
into account. It also only uses idle duration values less than the
current time till the closest timer (with the tick excluded) for that
purpose.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2019-01-04 14:30:47 +03:00
/**
cpuidle: teo: Change the main idle state selection logic
Two aspects of the current main idle state selection logic in the
TEO (Timer Events Oriented) cpuidle governor are quite questionable.
First of all, the "hits" and "misses" metrics used by it are only
updated for a given idle state if the time till the next timer event
("sleep length") is between the target residency of that state and
the target residency of the next one. Consequently, they are likely
to become stale if the sleep length tends to fall outside that
interval which increases the likelihood of subomtimal idle state
selection.
Second, the decision on whether or not to select the idle state
"matching" the sleep length is based on the metrics collected for
that state alone, whereas in principle the metrics collected for
the other idle states should be taken into consideration when that
decision is made. For example, if the measured idle duration is less
than the target residency of the idle state "matching" the sleep
length, then it is also less than the target residency of any deeper
idle state and that should be taken into account when considering
whether or not to select any of those states, but currently it is
not.
In order to address the above shortcomings, modify the main idle
state selection logic in the TEO governor to take the metrics
collected for all of the idle states into account when deciding
whether or not to select the one "matching" the sleep length.
Moreover, drop the "misses" metric that becomes redundant after the
above change and rename the "early_hits" metric to "intercepts" so
that its role is better reflected by its name (the idea being that
if a CPU wakes up earlier than indicated by the sleep length, then
it must be a result of a non-timer interrupt that "intercepts" the
CPU).
Also rename the states[] array in struct struct teo_cpu to
state_bins[] to avoid confusing it with the states[] array in
struct cpuidle_driver and update the documentation to match the
new code (and make it more comprehensive while at it).
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-06-02 21:16:32 +03:00
* struct teo_bin - Metrics used by the TEO cpuidle governor .
* @ intercepts : The " intercepts " metric .
* @ hits : The " hits " metric .
2021-06-02 21:17:18 +03:00
* @ recent : The number of recent " intercepts " .
cpuidle: New timer events oriented governor for tickless systems
The venerable menu governor does some things that are quite
questionable in my view.
First, it includes timer wakeups in the pattern detection data and
mixes them up with wakeups from other sources which in some cases
causes it to expect what essentially would be a timer wakeup in a
time frame in which no timer wakeups are possible (because it knows
the time until the next timer event and that is later than the
expected wakeup time).
Second, it uses the extra exit latency limit based on the predicted
idle duration and depending on the number of tasks waiting on I/O,
even though those tasks may run on a different CPU when they are
woken up. Moreover, the time ranges used by it for the sleep length
correction factors depend on whether or not there are tasks waiting
on I/O, which again doesn't imply anything in particular, and they
are not correlated to the list of available idle states in any way
whatever.
Also, the pattern detection code in menu may end up considering
values that are too large to matter at all, in which cases running
it is a waste of time.
A major rework of the menu governor would be required to address
these issues and the performance of at least some workloads (tuned
specifically to the current behavior of the menu governor) is likely
to suffer from that. It is thus better to introduce an entirely new
governor without them and let everybody use the governor that works
better with their actual workloads.
The new governor introduced here, the timer events oriented (TEO)
governor, uses the same basic strategy as menu: it always tries to
find the deepest idle state that can be used in the given conditions.
However, it applies a different approach to that problem.
First, it doesn't use "correction factors" for the time till the
closest timer, but instead it tries to correlate the measured idle
duration values with the available idle states and use that
information to pick up the idle state that is most likely to "match"
the upcoming CPU idle interval.
Second, it doesn't take the number of "I/O waiters" into account at
all and the pattern detection code in it avoids taking timer wakeups
into account. It also only uses idle duration values less than the
current time till the closest timer (with the tick excluded) for that
purpose.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2019-01-04 14:30:47 +03:00
*/
cpuidle: teo: Change the main idle state selection logic
Two aspects of the current main idle state selection logic in the
TEO (Timer Events Oriented) cpuidle governor are quite questionable.
First of all, the "hits" and "misses" metrics used by it are only
updated for a given idle state if the time till the next timer event
("sleep length") is between the target residency of that state and
the target residency of the next one. Consequently, they are likely
to become stale if the sleep length tends to fall outside that
interval which increases the likelihood of subomtimal idle state
selection.
Second, the decision on whether or not to select the idle state
"matching" the sleep length is based on the metrics collected for
that state alone, whereas in principle the metrics collected for
the other idle states should be taken into consideration when that
decision is made. For example, if the measured idle duration is less
than the target residency of the idle state "matching" the sleep
length, then it is also less than the target residency of any deeper
idle state and that should be taken into account when considering
whether or not to select any of those states, but currently it is
not.
In order to address the above shortcomings, modify the main idle
state selection logic in the TEO governor to take the metrics
collected for all of the idle states into account when deciding
whether or not to select the one "matching" the sleep length.
Moreover, drop the "misses" metric that becomes redundant after the
above change and rename the "early_hits" metric to "intercepts" so
that its role is better reflected by its name (the idea being that
if a CPU wakes up earlier than indicated by the sleep length, then
it must be a result of a non-timer interrupt that "intercepts" the
CPU).
Also rename the states[] array in struct struct teo_cpu to
state_bins[] to avoid confusing it with the states[] array in
struct cpuidle_driver and update the documentation to match the
new code (and make it more comprehensive while at it).
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-06-02 21:16:32 +03:00
struct teo_bin {
unsigned int intercepts ;
cpuidle: New timer events oriented governor for tickless systems
The venerable menu governor does some things that are quite
questionable in my view.
First, it includes timer wakeups in the pattern detection data and
mixes them up with wakeups from other sources which in some cases
causes it to expect what essentially would be a timer wakeup in a
time frame in which no timer wakeups are possible (because it knows
the time until the next timer event and that is later than the
expected wakeup time).
Second, it uses the extra exit latency limit based on the predicted
idle duration and depending on the number of tasks waiting on I/O,
even though those tasks may run on a different CPU when they are
woken up. Moreover, the time ranges used by it for the sleep length
correction factors depend on whether or not there are tasks waiting
on I/O, which again doesn't imply anything in particular, and they
are not correlated to the list of available idle states in any way
whatever.
Also, the pattern detection code in menu may end up considering
values that are too large to matter at all, in which cases running
it is a waste of time.
A major rework of the menu governor would be required to address
these issues and the performance of at least some workloads (tuned
specifically to the current behavior of the menu governor) is likely
to suffer from that. It is thus better to introduce an entirely new
governor without them and let everybody use the governor that works
better with their actual workloads.
The new governor introduced here, the timer events oriented (TEO)
governor, uses the same basic strategy as menu: it always tries to
find the deepest idle state that can be used in the given conditions.
However, it applies a different approach to that problem.
First, it doesn't use "correction factors" for the time till the
closest timer, but instead it tries to correlate the measured idle
duration values with the available idle states and use that
information to pick up the idle state that is most likely to "match"
the upcoming CPU idle interval.
Second, it doesn't take the number of "I/O waiters" into account at
all and the pattern detection code in it avoids taking timer wakeups
into account. It also only uses idle duration values less than the
current time till the closest timer (with the tick excluded) for that
purpose.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2019-01-04 14:30:47 +03:00
unsigned int hits ;
2021-06-02 21:17:18 +03:00
unsigned int recent ;
cpuidle: New timer events oriented governor for tickless systems
The venerable menu governor does some things that are quite
questionable in my view.
First, it includes timer wakeups in the pattern detection data and
mixes them up with wakeups from other sources which in some cases
causes it to expect what essentially would be a timer wakeup in a
time frame in which no timer wakeups are possible (because it knows
the time until the next timer event and that is later than the
expected wakeup time).
Second, it uses the extra exit latency limit based on the predicted
idle duration and depending on the number of tasks waiting on I/O,
even though those tasks may run on a different CPU when they are
woken up. Moreover, the time ranges used by it for the sleep length
correction factors depend on whether or not there are tasks waiting
on I/O, which again doesn't imply anything in particular, and they
are not correlated to the list of available idle states in any way
whatever.
Also, the pattern detection code in menu may end up considering
values that are too large to matter at all, in which cases running
it is a waste of time.
A major rework of the menu governor would be required to address
these issues and the performance of at least some workloads (tuned
specifically to the current behavior of the menu governor) is likely
to suffer from that. It is thus better to introduce an entirely new
governor without them and let everybody use the governor that works
better with their actual workloads.
The new governor introduced here, the timer events oriented (TEO)
governor, uses the same basic strategy as menu: it always tries to
find the deepest idle state that can be used in the given conditions.
However, it applies a different approach to that problem.
First, it doesn't use "correction factors" for the time till the
closest timer, but instead it tries to correlate the measured idle
duration values with the available idle states and use that
information to pick up the idle state that is most likely to "match"
the upcoming CPU idle interval.
Second, it doesn't take the number of "I/O waiters" into account at
all and the pattern detection code in it avoids taking timer wakeups
into account. It also only uses idle duration values less than the
current time till the closest timer (with the tick excluded) for that
purpose.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2019-01-04 14:30:47 +03:00
} ;
/**
* struct teo_cpu - CPU data used by the TEO cpuidle governor .
* @ time_span_ns : Time between idle state selection and post - wakeup update .
* @ sleep_length_ns : Time till the closest timer event ( at the selection time ) .
cpuidle: teo: Change the main idle state selection logic
Two aspects of the current main idle state selection logic in the
TEO (Timer Events Oriented) cpuidle governor are quite questionable.
First of all, the "hits" and "misses" metrics used by it are only
updated for a given idle state if the time till the next timer event
("sleep length") is between the target residency of that state and
the target residency of the next one. Consequently, they are likely
to become stale if the sleep length tends to fall outside that
interval which increases the likelihood of subomtimal idle state
selection.
Second, the decision on whether or not to select the idle state
"matching" the sleep length is based on the metrics collected for
that state alone, whereas in principle the metrics collected for
the other idle states should be taken into consideration when that
decision is made. For example, if the measured idle duration is less
than the target residency of the idle state "matching" the sleep
length, then it is also less than the target residency of any deeper
idle state and that should be taken into account when considering
whether or not to select any of those states, but currently it is
not.
In order to address the above shortcomings, modify the main idle
state selection logic in the TEO governor to take the metrics
collected for all of the idle states into account when deciding
whether or not to select the one "matching" the sleep length.
Moreover, drop the "misses" metric that becomes redundant after the
above change and rename the "early_hits" metric to "intercepts" so
that its role is better reflected by its name (the idea being that
if a CPU wakes up earlier than indicated by the sleep length, then
it must be a result of a non-timer interrupt that "intercepts" the
CPU).
Also rename the states[] array in struct struct teo_cpu to
state_bins[] to avoid confusing it with the states[] array in
struct cpuidle_driver and update the documentation to match the
new code (and make it more comprehensive while at it).
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-06-02 21:16:32 +03:00
* @ state_bins : Idle state data bins for this CPU .
cpuidle: teo: Introduce util-awareness
Modern interactive systems, such as recent Android phones, tend to have
power efficient shallow idle states. Selecting deeper idle states on a
device while a latency-sensitive workload is running can adversely
impact performance due to increased latency. Additionally, if the CPU
wakes up from a deeper sleep before its target residency as is often the
case, it results in a waste of energy on top of that.
At the moment, none of the available idle governors take any scheduling
information into account. They also tend to overestimate the idle
duration quite often, which causes them to select excessively deep idle
states, thus leading to increased wakeup latency and lower performance
with no power saving. For 'menu' while web browsing on Android for
instance, those types of wakeups ('too deep') account for over 24% of
all wakeups.
At the same time, on some platforms idle state 0 can be power efficient
enough to warrant wanting to prefer it over idle state 1. This is
because the power usage of the two states can be so close that
sufficient amounts of too deep state 1 sleeps can completely offset the
state 1 power saving to the point where it would've been more power
efficient to just use state 0 instead. This is, of course, for systems
where state 0 is not a polling state, such as arm-based devices.
Sleeps that happened in state 0 while they could have used state 1 ('too
shallow') only save less power than they otherwise could have. Too deep
sleeps, on the other hand, harm performance and nullify the potential
power saving from using state 1 in the first place. While taking this
into account, it is clear that on balance it is preferable for an idle
governor to have more too shallow sleeps instead of more too deep sleeps
on those kinds of platforms.
This patch specifically tunes TEO to prefer shallower idle states in
order to reduce wakeup latency and achieve better performance.
To this end, before selecting the next idle state it uses the avg_util
signal of a CPU's runqueue in order to determine to what extent the CPU
is being utilized. This util value is then compared to a threshold
defined as a percentage of the CPU's capacity (capacity >> 6 ie. ~1.5%
in the current implementation). If the util is above the threshold, the
index of the idle state selected by TEO metrics will be reduced by 1,
thus selecting a shallower state. If the util is below the threshold,
the governor defaults to the TEO metrics mechanism to try to select the
deepest available idle state based on the closest timer event and its
own correctness.
The main goal of this is to reduce latency and increase performance for
some workloads. Under some workloads it will result in an increase in
power usage (Geekbench 5) while for other workloads it will also result
in a decrease in power usage compared to TEO (PCMark Web, Jankbench,
Speedometer).
It can provide drastically decreased latency and performance benefits in
certain types of workloads that are sensitive to latency.
Example test results:
1. GB5 (better score, latency & more power usage)
| metric | menu | teo | teo-util-aware |
| ------------------------------------- | -------------- | ----------------- | ----------------- |
| gmean score | 2826.5 (0.0%) | 2764.8 (-2.18%) | 2865 (1.36%) |
| gmean power usage [mW] | 2551.4 (0.0%) | 2606.8 (2.17%) | 2722.3 (6.7%) |
| gmean too deep % | 14.99% | 9.65% | 4.02% |
| gmean too shallow % | 2.5% | 5.96% | 14.59% |
| gmean task wakeup latency (asynctask) | 78.16μs (0.0%) | 61.60μs (-21.19%) | 54.45μs (-30.34%) |
2. Jankbench (better score, latency & less power usage)
| metric | menu | teo | teo-util-aware |
| ------------------------------------- | -------------- | ----------------- | ----------------- |
| gmean frame duration | 13.9 (0.0%) | 14.7 (6.0%) | 12.6 (-9.0%) |
| gmean jank percentage | 1.5 (0.0%) | 2.1 (36.99%) | 1.3 (-17.37%) |
| gmean power usage [mW] | 144.6 (0.0%) | 136.9 (-5.27%) | 121.3 (-16.08%) |
| gmean too deep % | 26.00% | 11.00% | 2.54% |
| gmean too shallow % | 4.74% | 11.89% | 21.93% |
| gmean wakeup latency (RenderThread) | 139.5μs (0.0%) | 116.5μs (-16.49%) | 91.11μs (-34.7%) |
| gmean wakeup latency (surfaceflinger) | 124.0μs (0.0%) | 151.9μs (22.47%) | 87.65μs (-29.33%) |
Signed-off-by: Kajetan Puchalski <kajetan.puchalski@arm.com>
[ rjw: Comment edits and white space adjustments ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-01-05 17:51:59 +03:00
* @ total : Grand total of the " intercepts " and " hits " metrics for all bins .
2021-06-02 21:17:18 +03:00
* @ next_recent_idx : Index of the next @ recent_idx entry to update .
* @ recent_idx : Indices of bins corresponding to recent " intercepts " .
cpuidle: teo: Introduce util-awareness
Modern interactive systems, such as recent Android phones, tend to have
power efficient shallow idle states. Selecting deeper idle states on a
device while a latency-sensitive workload is running can adversely
impact performance due to increased latency. Additionally, if the CPU
wakes up from a deeper sleep before its target residency as is often the
case, it results in a waste of energy on top of that.
At the moment, none of the available idle governors take any scheduling
information into account. They also tend to overestimate the idle
duration quite often, which causes them to select excessively deep idle
states, thus leading to increased wakeup latency and lower performance
with no power saving. For 'menu' while web browsing on Android for
instance, those types of wakeups ('too deep') account for over 24% of
all wakeups.
At the same time, on some platforms idle state 0 can be power efficient
enough to warrant wanting to prefer it over idle state 1. This is
because the power usage of the two states can be so close that
sufficient amounts of too deep state 1 sleeps can completely offset the
state 1 power saving to the point where it would've been more power
efficient to just use state 0 instead. This is, of course, for systems
where state 0 is not a polling state, such as arm-based devices.
Sleeps that happened in state 0 while they could have used state 1 ('too
shallow') only save less power than they otherwise could have. Too deep
sleeps, on the other hand, harm performance and nullify the potential
power saving from using state 1 in the first place. While taking this
into account, it is clear that on balance it is preferable for an idle
governor to have more too shallow sleeps instead of more too deep sleeps
on those kinds of platforms.
This patch specifically tunes TEO to prefer shallower idle states in
order to reduce wakeup latency and achieve better performance.
To this end, before selecting the next idle state it uses the avg_util
signal of a CPU's runqueue in order to determine to what extent the CPU
is being utilized. This util value is then compared to a threshold
defined as a percentage of the CPU's capacity (capacity >> 6 ie. ~1.5%
in the current implementation). If the util is above the threshold, the
index of the idle state selected by TEO metrics will be reduced by 1,
thus selecting a shallower state. If the util is below the threshold,
the governor defaults to the TEO metrics mechanism to try to select the
deepest available idle state based on the closest timer event and its
own correctness.
The main goal of this is to reduce latency and increase performance for
some workloads. Under some workloads it will result in an increase in
power usage (Geekbench 5) while for other workloads it will also result
in a decrease in power usage compared to TEO (PCMark Web, Jankbench,
Speedometer).
It can provide drastically decreased latency and performance benefits in
certain types of workloads that are sensitive to latency.
Example test results:
1. GB5 (better score, latency & more power usage)
| metric | menu | teo | teo-util-aware |
| ------------------------------------- | -------------- | ----------------- | ----------------- |
| gmean score | 2826.5 (0.0%) | 2764.8 (-2.18%) | 2865 (1.36%) |
| gmean power usage [mW] | 2551.4 (0.0%) | 2606.8 (2.17%) | 2722.3 (6.7%) |
| gmean too deep % | 14.99% | 9.65% | 4.02% |
| gmean too shallow % | 2.5% | 5.96% | 14.59% |
| gmean task wakeup latency (asynctask) | 78.16μs (0.0%) | 61.60μs (-21.19%) | 54.45μs (-30.34%) |
2. Jankbench (better score, latency & less power usage)
| metric | menu | teo | teo-util-aware |
| ------------------------------------- | -------------- | ----------------- | ----------------- |
| gmean frame duration | 13.9 (0.0%) | 14.7 (6.0%) | 12.6 (-9.0%) |
| gmean jank percentage | 1.5 (0.0%) | 2.1 (36.99%) | 1.3 (-17.37%) |
| gmean power usage [mW] | 144.6 (0.0%) | 136.9 (-5.27%) | 121.3 (-16.08%) |
| gmean too deep % | 26.00% | 11.00% | 2.54% |
| gmean too shallow % | 4.74% | 11.89% | 21.93% |
| gmean wakeup latency (RenderThread) | 139.5μs (0.0%) | 116.5μs (-16.49%) | 91.11μs (-34.7%) |
| gmean wakeup latency (surfaceflinger) | 124.0μs (0.0%) | 151.9μs (22.47%) | 87.65μs (-29.33%) |
Signed-off-by: Kajetan Puchalski <kajetan.puchalski@arm.com>
[ rjw: Comment edits and white space adjustments ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-01-05 17:51:59 +03:00
* @ util_threshold : Threshold above which the CPU is considered utilized
* @ utilized : Whether the last sleep on the CPU happened while utilized
cpuidle: New timer events oriented governor for tickless systems
The venerable menu governor does some things that are quite
questionable in my view.
First, it includes timer wakeups in the pattern detection data and
mixes them up with wakeups from other sources which in some cases
causes it to expect what essentially would be a timer wakeup in a
time frame in which no timer wakeups are possible (because it knows
the time until the next timer event and that is later than the
expected wakeup time).
Second, it uses the extra exit latency limit based on the predicted
idle duration and depending on the number of tasks waiting on I/O,
even though those tasks may run on a different CPU when they are
woken up. Moreover, the time ranges used by it for the sleep length
correction factors depend on whether or not there are tasks waiting
on I/O, which again doesn't imply anything in particular, and they
are not correlated to the list of available idle states in any way
whatever.
Also, the pattern detection code in menu may end up considering
values that are too large to matter at all, in which cases running
it is a waste of time.
A major rework of the menu governor would be required to address
these issues and the performance of at least some workloads (tuned
specifically to the current behavior of the menu governor) is likely
to suffer from that. It is thus better to introduce an entirely new
governor without them and let everybody use the governor that works
better with their actual workloads.
The new governor introduced here, the timer events oriented (TEO)
governor, uses the same basic strategy as menu: it always tries to
find the deepest idle state that can be used in the given conditions.
However, it applies a different approach to that problem.
First, it doesn't use "correction factors" for the time till the
closest timer, but instead it tries to correlate the measured idle
duration values with the available idle states and use that
information to pick up the idle state that is most likely to "match"
the upcoming CPU idle interval.
Second, it doesn't take the number of "I/O waiters" into account at
all and the pattern detection code in it avoids taking timer wakeups
into account. It also only uses idle duration values less than the
current time till the closest timer (with the tick excluded) for that
purpose.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2019-01-04 14:30:47 +03:00
*/
struct teo_cpu {
2021-03-29 21:21:43 +03:00
s64 time_span_ns ;
s64 sleep_length_ns ;
cpuidle: teo: Change the main idle state selection logic
Two aspects of the current main idle state selection logic in the
TEO (Timer Events Oriented) cpuidle governor are quite questionable.
First of all, the "hits" and "misses" metrics used by it are only
updated for a given idle state if the time till the next timer event
("sleep length") is between the target residency of that state and
the target residency of the next one. Consequently, they are likely
to become stale if the sleep length tends to fall outside that
interval which increases the likelihood of subomtimal idle state
selection.
Second, the decision on whether or not to select the idle state
"matching" the sleep length is based on the metrics collected for
that state alone, whereas in principle the metrics collected for
the other idle states should be taken into consideration when that
decision is made. For example, if the measured idle duration is less
than the target residency of the idle state "matching" the sleep
length, then it is also less than the target residency of any deeper
idle state and that should be taken into account when considering
whether or not to select any of those states, but currently it is
not.
In order to address the above shortcomings, modify the main idle
state selection logic in the TEO governor to take the metrics
collected for all of the idle states into account when deciding
whether or not to select the one "matching" the sleep length.
Moreover, drop the "misses" metric that becomes redundant after the
above change and rename the "early_hits" metric to "intercepts" so
that its role is better reflected by its name (the idea being that
if a CPU wakes up earlier than indicated by the sleep length, then
it must be a result of a non-timer interrupt that "intercepts" the
CPU).
Also rename the states[] array in struct struct teo_cpu to
state_bins[] to avoid confusing it with the states[] array in
struct cpuidle_driver and update the documentation to match the
new code (and make it more comprehensive while at it).
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-06-02 21:16:32 +03:00
struct teo_bin state_bins [ CPUIDLE_STATE_MAX ] ;
unsigned int total ;
2021-06-02 21:17:18 +03:00
int next_recent_idx ;
int recent_idx [ NR_RECENT ] ;
cpuidle: teo: Introduce util-awareness
Modern interactive systems, such as recent Android phones, tend to have
power efficient shallow idle states. Selecting deeper idle states on a
device while a latency-sensitive workload is running can adversely
impact performance due to increased latency. Additionally, if the CPU
wakes up from a deeper sleep before its target residency as is often the
case, it results in a waste of energy on top of that.
At the moment, none of the available idle governors take any scheduling
information into account. They also tend to overestimate the idle
duration quite often, which causes them to select excessively deep idle
states, thus leading to increased wakeup latency and lower performance
with no power saving. For 'menu' while web browsing on Android for
instance, those types of wakeups ('too deep') account for over 24% of
all wakeups.
At the same time, on some platforms idle state 0 can be power efficient
enough to warrant wanting to prefer it over idle state 1. This is
because the power usage of the two states can be so close that
sufficient amounts of too deep state 1 sleeps can completely offset the
state 1 power saving to the point where it would've been more power
efficient to just use state 0 instead. This is, of course, for systems
where state 0 is not a polling state, such as arm-based devices.
Sleeps that happened in state 0 while they could have used state 1 ('too
shallow') only save less power than they otherwise could have. Too deep
sleeps, on the other hand, harm performance and nullify the potential
power saving from using state 1 in the first place. While taking this
into account, it is clear that on balance it is preferable for an idle
governor to have more too shallow sleeps instead of more too deep sleeps
on those kinds of platforms.
This patch specifically tunes TEO to prefer shallower idle states in
order to reduce wakeup latency and achieve better performance.
To this end, before selecting the next idle state it uses the avg_util
signal of a CPU's runqueue in order to determine to what extent the CPU
is being utilized. This util value is then compared to a threshold
defined as a percentage of the CPU's capacity (capacity >> 6 ie. ~1.5%
in the current implementation). If the util is above the threshold, the
index of the idle state selected by TEO metrics will be reduced by 1,
thus selecting a shallower state. If the util is below the threshold,
the governor defaults to the TEO metrics mechanism to try to select the
deepest available idle state based on the closest timer event and its
own correctness.
The main goal of this is to reduce latency and increase performance for
some workloads. Under some workloads it will result in an increase in
power usage (Geekbench 5) while for other workloads it will also result
in a decrease in power usage compared to TEO (PCMark Web, Jankbench,
Speedometer).
It can provide drastically decreased latency and performance benefits in
certain types of workloads that are sensitive to latency.
Example test results:
1. GB5 (better score, latency & more power usage)
| metric | menu | teo | teo-util-aware |
| ------------------------------------- | -------------- | ----------------- | ----------------- |
| gmean score | 2826.5 (0.0%) | 2764.8 (-2.18%) | 2865 (1.36%) |
| gmean power usage [mW] | 2551.4 (0.0%) | 2606.8 (2.17%) | 2722.3 (6.7%) |
| gmean too deep % | 14.99% | 9.65% | 4.02% |
| gmean too shallow % | 2.5% | 5.96% | 14.59% |
| gmean task wakeup latency (asynctask) | 78.16μs (0.0%) | 61.60μs (-21.19%) | 54.45μs (-30.34%) |
2. Jankbench (better score, latency & less power usage)
| metric | menu | teo | teo-util-aware |
| ------------------------------------- | -------------- | ----------------- | ----------------- |
| gmean frame duration | 13.9 (0.0%) | 14.7 (6.0%) | 12.6 (-9.0%) |
| gmean jank percentage | 1.5 (0.0%) | 2.1 (36.99%) | 1.3 (-17.37%) |
| gmean power usage [mW] | 144.6 (0.0%) | 136.9 (-5.27%) | 121.3 (-16.08%) |
| gmean too deep % | 26.00% | 11.00% | 2.54% |
| gmean too shallow % | 4.74% | 11.89% | 21.93% |
| gmean wakeup latency (RenderThread) | 139.5μs (0.0%) | 116.5μs (-16.49%) | 91.11μs (-34.7%) |
| gmean wakeup latency (surfaceflinger) | 124.0μs (0.0%) | 151.9μs (22.47%) | 87.65μs (-29.33%) |
Signed-off-by: Kajetan Puchalski <kajetan.puchalski@arm.com>
[ rjw: Comment edits and white space adjustments ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-01-05 17:51:59 +03:00
unsigned long util_threshold ;
bool utilized ;
cpuidle: New timer events oriented governor for tickless systems
The venerable menu governor does some things that are quite
questionable in my view.
First, it includes timer wakeups in the pattern detection data and
mixes them up with wakeups from other sources which in some cases
causes it to expect what essentially would be a timer wakeup in a
time frame in which no timer wakeups are possible (because it knows
the time until the next timer event and that is later than the
expected wakeup time).
Second, it uses the extra exit latency limit based on the predicted
idle duration and depending on the number of tasks waiting on I/O,
even though those tasks may run on a different CPU when they are
woken up. Moreover, the time ranges used by it for the sleep length
correction factors depend on whether or not there are tasks waiting
on I/O, which again doesn't imply anything in particular, and they
are not correlated to the list of available idle states in any way
whatever.
Also, the pattern detection code in menu may end up considering
values that are too large to matter at all, in which cases running
it is a waste of time.
A major rework of the menu governor would be required to address
these issues and the performance of at least some workloads (tuned
specifically to the current behavior of the menu governor) is likely
to suffer from that. It is thus better to introduce an entirely new
governor without them and let everybody use the governor that works
better with their actual workloads.
The new governor introduced here, the timer events oriented (TEO)
governor, uses the same basic strategy as menu: it always tries to
find the deepest idle state that can be used in the given conditions.
However, it applies a different approach to that problem.
First, it doesn't use "correction factors" for the time till the
closest timer, but instead it tries to correlate the measured idle
duration values with the available idle states and use that
information to pick up the idle state that is most likely to "match"
the upcoming CPU idle interval.
Second, it doesn't take the number of "I/O waiters" into account at
all and the pattern detection code in it avoids taking timer wakeups
into account. It also only uses idle duration values less than the
current time till the closest timer (with the tick excluded) for that
purpose.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2019-01-04 14:30:47 +03:00
} ;
static DEFINE_PER_CPU ( struct teo_cpu , teo_cpus ) ;
cpuidle: teo: Introduce util-awareness
Modern interactive systems, such as recent Android phones, tend to have
power efficient shallow idle states. Selecting deeper idle states on a
device while a latency-sensitive workload is running can adversely
impact performance due to increased latency. Additionally, if the CPU
wakes up from a deeper sleep before its target residency as is often the
case, it results in a waste of energy on top of that.
At the moment, none of the available idle governors take any scheduling
information into account. They also tend to overestimate the idle
duration quite often, which causes them to select excessively deep idle
states, thus leading to increased wakeup latency and lower performance
with no power saving. For 'menu' while web browsing on Android for
instance, those types of wakeups ('too deep') account for over 24% of
all wakeups.
At the same time, on some platforms idle state 0 can be power efficient
enough to warrant wanting to prefer it over idle state 1. This is
because the power usage of the two states can be so close that
sufficient amounts of too deep state 1 sleeps can completely offset the
state 1 power saving to the point where it would've been more power
efficient to just use state 0 instead. This is, of course, for systems
where state 0 is not a polling state, such as arm-based devices.
Sleeps that happened in state 0 while they could have used state 1 ('too
shallow') only save less power than they otherwise could have. Too deep
sleeps, on the other hand, harm performance and nullify the potential
power saving from using state 1 in the first place. While taking this
into account, it is clear that on balance it is preferable for an idle
governor to have more too shallow sleeps instead of more too deep sleeps
on those kinds of platforms.
This patch specifically tunes TEO to prefer shallower idle states in
order to reduce wakeup latency and achieve better performance.
To this end, before selecting the next idle state it uses the avg_util
signal of a CPU's runqueue in order to determine to what extent the CPU
is being utilized. This util value is then compared to a threshold
defined as a percentage of the CPU's capacity (capacity >> 6 ie. ~1.5%
in the current implementation). If the util is above the threshold, the
index of the idle state selected by TEO metrics will be reduced by 1,
thus selecting a shallower state. If the util is below the threshold,
the governor defaults to the TEO metrics mechanism to try to select the
deepest available idle state based on the closest timer event and its
own correctness.
The main goal of this is to reduce latency and increase performance for
some workloads. Under some workloads it will result in an increase in
power usage (Geekbench 5) while for other workloads it will also result
in a decrease in power usage compared to TEO (PCMark Web, Jankbench,
Speedometer).
It can provide drastically decreased latency and performance benefits in
certain types of workloads that are sensitive to latency.
Example test results:
1. GB5 (better score, latency & more power usage)
| metric | menu | teo | teo-util-aware |
| ------------------------------------- | -------------- | ----------------- | ----------------- |
| gmean score | 2826.5 (0.0%) | 2764.8 (-2.18%) | 2865 (1.36%) |
| gmean power usage [mW] | 2551.4 (0.0%) | 2606.8 (2.17%) | 2722.3 (6.7%) |
| gmean too deep % | 14.99% | 9.65% | 4.02% |
| gmean too shallow % | 2.5% | 5.96% | 14.59% |
| gmean task wakeup latency (asynctask) | 78.16μs (0.0%) | 61.60μs (-21.19%) | 54.45μs (-30.34%) |
2. Jankbench (better score, latency & less power usage)
| metric | menu | teo | teo-util-aware |
| ------------------------------------- | -------------- | ----------------- | ----------------- |
| gmean frame duration | 13.9 (0.0%) | 14.7 (6.0%) | 12.6 (-9.0%) |
| gmean jank percentage | 1.5 (0.0%) | 2.1 (36.99%) | 1.3 (-17.37%) |
| gmean power usage [mW] | 144.6 (0.0%) | 136.9 (-5.27%) | 121.3 (-16.08%) |
| gmean too deep % | 26.00% | 11.00% | 2.54% |
| gmean too shallow % | 4.74% | 11.89% | 21.93% |
| gmean wakeup latency (RenderThread) | 139.5μs (0.0%) | 116.5μs (-16.49%) | 91.11μs (-34.7%) |
| gmean wakeup latency (surfaceflinger) | 124.0μs (0.0%) | 151.9μs (22.47%) | 87.65μs (-29.33%) |
Signed-off-by: Kajetan Puchalski <kajetan.puchalski@arm.com>
[ rjw: Comment edits and white space adjustments ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-01-05 17:51:59 +03:00
/**
* teo_cpu_is_utilized - Check if the CPU ' s util is above the threshold
* @ cpu : Target CPU
* @ cpu_data : Governor CPU data for the target CPU
*/
# ifdef CONFIG_SMP
static bool teo_cpu_is_utilized ( int cpu , struct teo_cpu * cpu_data )
{
return sched_cpu_util ( cpu ) > cpu_data - > util_threshold ;
}
# else
static bool teo_cpu_is_utilized ( int cpu , struct teo_cpu * cpu_data )
{
return false ;
}
# endif
cpuidle: New timer events oriented governor for tickless systems
The venerable menu governor does some things that are quite
questionable in my view.
First, it includes timer wakeups in the pattern detection data and
mixes them up with wakeups from other sources which in some cases
causes it to expect what essentially would be a timer wakeup in a
time frame in which no timer wakeups are possible (because it knows
the time until the next timer event and that is later than the
expected wakeup time).
Second, it uses the extra exit latency limit based on the predicted
idle duration and depending on the number of tasks waiting on I/O,
even though those tasks may run on a different CPU when they are
woken up. Moreover, the time ranges used by it for the sleep length
correction factors depend on whether or not there are tasks waiting
on I/O, which again doesn't imply anything in particular, and they
are not correlated to the list of available idle states in any way
whatever.
Also, the pattern detection code in menu may end up considering
values that are too large to matter at all, in which cases running
it is a waste of time.
A major rework of the menu governor would be required to address
these issues and the performance of at least some workloads (tuned
specifically to the current behavior of the menu governor) is likely
to suffer from that. It is thus better to introduce an entirely new
governor without them and let everybody use the governor that works
better with their actual workloads.
The new governor introduced here, the timer events oriented (TEO)
governor, uses the same basic strategy as menu: it always tries to
find the deepest idle state that can be used in the given conditions.
However, it applies a different approach to that problem.
First, it doesn't use "correction factors" for the time till the
closest timer, but instead it tries to correlate the measured idle
duration values with the available idle states and use that
information to pick up the idle state that is most likely to "match"
the upcoming CPU idle interval.
Second, it doesn't take the number of "I/O waiters" into account at
all and the pattern detection code in it avoids taking timer wakeups
into account. It also only uses idle duration values less than the
current time till the closest timer (with the tick excluded) for that
purpose.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2019-01-04 14:30:47 +03:00
/**
cpuidle: teo: Change the main idle state selection logic
Two aspects of the current main idle state selection logic in the
TEO (Timer Events Oriented) cpuidle governor are quite questionable.
First of all, the "hits" and "misses" metrics used by it are only
updated for a given idle state if the time till the next timer event
("sleep length") is between the target residency of that state and
the target residency of the next one. Consequently, they are likely
to become stale if the sleep length tends to fall outside that
interval which increases the likelihood of subomtimal idle state
selection.
Second, the decision on whether or not to select the idle state
"matching" the sleep length is based on the metrics collected for
that state alone, whereas in principle the metrics collected for
the other idle states should be taken into consideration when that
decision is made. For example, if the measured idle duration is less
than the target residency of the idle state "matching" the sleep
length, then it is also less than the target residency of any deeper
idle state and that should be taken into account when considering
whether or not to select any of those states, but currently it is
not.
In order to address the above shortcomings, modify the main idle
state selection logic in the TEO governor to take the metrics
collected for all of the idle states into account when deciding
whether or not to select the one "matching" the sleep length.
Moreover, drop the "misses" metric that becomes redundant after the
above change and rename the "early_hits" metric to "intercepts" so
that its role is better reflected by its name (the idea being that
if a CPU wakes up earlier than indicated by the sleep length, then
it must be a result of a non-timer interrupt that "intercepts" the
CPU).
Also rename the states[] array in struct struct teo_cpu to
state_bins[] to avoid confusing it with the states[] array in
struct cpuidle_driver and update the documentation to match the
new code (and make it more comprehensive while at it).
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-06-02 21:16:32 +03:00
* teo_update - Update CPU metrics after wakeup .
cpuidle: New timer events oriented governor for tickless systems
The venerable menu governor does some things that are quite
questionable in my view.
First, it includes timer wakeups in the pattern detection data and
mixes them up with wakeups from other sources which in some cases
causes it to expect what essentially would be a timer wakeup in a
time frame in which no timer wakeups are possible (because it knows
the time until the next timer event and that is later than the
expected wakeup time).
Second, it uses the extra exit latency limit based on the predicted
idle duration and depending on the number of tasks waiting on I/O,
even though those tasks may run on a different CPU when they are
woken up. Moreover, the time ranges used by it for the sleep length
correction factors depend on whether or not there are tasks waiting
on I/O, which again doesn't imply anything in particular, and they
are not correlated to the list of available idle states in any way
whatever.
Also, the pattern detection code in menu may end up considering
values that are too large to matter at all, in which cases running
it is a waste of time.
A major rework of the menu governor would be required to address
these issues and the performance of at least some workloads (tuned
specifically to the current behavior of the menu governor) is likely
to suffer from that. It is thus better to introduce an entirely new
governor without them and let everybody use the governor that works
better with their actual workloads.
The new governor introduced here, the timer events oriented (TEO)
governor, uses the same basic strategy as menu: it always tries to
find the deepest idle state that can be used in the given conditions.
However, it applies a different approach to that problem.
First, it doesn't use "correction factors" for the time till the
closest timer, but instead it tries to correlate the measured idle
duration values with the available idle states and use that
information to pick up the idle state that is most likely to "match"
the upcoming CPU idle interval.
Second, it doesn't take the number of "I/O waiters" into account at
all and the pattern detection code in it avoids taking timer wakeups
into account. It also only uses idle duration values less than the
current time till the closest timer (with the tick excluded) for that
purpose.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2019-01-04 14:30:47 +03:00
* @ drv : cpuidle driver containing state data .
* @ dev : Target CPU .
*/
static void teo_update ( struct cpuidle_driver * drv , struct cpuidle_device * dev )
{
struct teo_cpu * cpu_data = per_cpu_ptr ( & teo_cpus , dev - > cpu ) ;
2021-06-02 21:15:10 +03:00
int i , idx_timer = 0 , idx_duration = 0 ;
2019-11-07 17:25:12 +03:00
u64 measured_ns ;
cpuidle: New timer events oriented governor for tickless systems
The venerable menu governor does some things that are quite
questionable in my view.
First, it includes timer wakeups in the pattern detection data and
mixes them up with wakeups from other sources which in some cases
causes it to expect what essentially would be a timer wakeup in a
time frame in which no timer wakeups are possible (because it knows
the time until the next timer event and that is later than the
expected wakeup time).
Second, it uses the extra exit latency limit based on the predicted
idle duration and depending on the number of tasks waiting on I/O,
even though those tasks may run on a different CPU when they are
woken up. Moreover, the time ranges used by it for the sleep length
correction factors depend on whether or not there are tasks waiting
on I/O, which again doesn't imply anything in particular, and they
are not correlated to the list of available idle states in any way
whatever.
Also, the pattern detection code in menu may end up considering
values that are too large to matter at all, in which cases running
it is a waste of time.
A major rework of the menu governor would be required to address
these issues and the performance of at least some workloads (tuned
specifically to the current behavior of the menu governor) is likely
to suffer from that. It is thus better to introduce an entirely new
governor without them and let everybody use the governor that works
better with their actual workloads.
The new governor introduced here, the timer events oriented (TEO)
governor, uses the same basic strategy as menu: it always tries to
find the deepest idle state that can be used in the given conditions.
However, it applies a different approach to that problem.
First, it doesn't use "correction factors" for the time till the
closest timer, but instead it tries to correlate the measured idle
duration values with the available idle states and use that
information to pick up the idle state that is most likely to "match"
the upcoming CPU idle interval.
Second, it doesn't take the number of "I/O waiters" into account at
all and the pattern detection code in it avoids taking timer wakeups
into account. It also only uses idle duration values less than the
current time till the closest timer (with the tick excluded) for that
purpose.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2019-01-04 14:30:47 +03:00
if ( cpu_data - > time_span_ns > = cpu_data - > sleep_length_ns ) {
/*
2019-07-30 13:11:08 +03:00
* One of the safety nets has triggered or the wakeup was close
* enough to the closest timer event expected at the idle state
* selection time to be discarded .
cpuidle: New timer events oriented governor for tickless systems
The venerable menu governor does some things that are quite
questionable in my view.
First, it includes timer wakeups in the pattern detection data and
mixes them up with wakeups from other sources which in some cases
causes it to expect what essentially would be a timer wakeup in a
time frame in which no timer wakeups are possible (because it knows
the time until the next timer event and that is later than the
expected wakeup time).
Second, it uses the extra exit latency limit based on the predicted
idle duration and depending on the number of tasks waiting on I/O,
even though those tasks may run on a different CPU when they are
woken up. Moreover, the time ranges used by it for the sleep length
correction factors depend on whether or not there are tasks waiting
on I/O, which again doesn't imply anything in particular, and they
are not correlated to the list of available idle states in any way
whatever.
Also, the pattern detection code in menu may end up considering
values that are too large to matter at all, in which cases running
it is a waste of time.
A major rework of the menu governor would be required to address
these issues and the performance of at least some workloads (tuned
specifically to the current behavior of the menu governor) is likely
to suffer from that. It is thus better to introduce an entirely new
governor without them and let everybody use the governor that works
better with their actual workloads.
The new governor introduced here, the timer events oriented (TEO)
governor, uses the same basic strategy as menu: it always tries to
find the deepest idle state that can be used in the given conditions.
However, it applies a different approach to that problem.
First, it doesn't use "correction factors" for the time till the
closest timer, but instead it tries to correlate the measured idle
duration values with the available idle states and use that
information to pick up the idle state that is most likely to "match"
the upcoming CPU idle interval.
Second, it doesn't take the number of "I/O waiters" into account at
all and the pattern detection code in it avoids taking timer wakeups
into account. It also only uses idle duration values less than the
current time till the closest timer (with the tick excluded) for that
purpose.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2019-01-04 14:30:47 +03:00
*/
2019-11-07 17:25:12 +03:00
measured_ns = U64_MAX ;
cpuidle: New timer events oriented governor for tickless systems
The venerable menu governor does some things that are quite
questionable in my view.
First, it includes timer wakeups in the pattern detection data and
mixes them up with wakeups from other sources which in some cases
causes it to expect what essentially would be a timer wakeup in a
time frame in which no timer wakeups are possible (because it knows
the time until the next timer event and that is later than the
expected wakeup time).
Second, it uses the extra exit latency limit based on the predicted
idle duration and depending on the number of tasks waiting on I/O,
even though those tasks may run on a different CPU when they are
woken up. Moreover, the time ranges used by it for the sleep length
correction factors depend on whether or not there are tasks waiting
on I/O, which again doesn't imply anything in particular, and they
are not correlated to the list of available idle states in any way
whatever.
Also, the pattern detection code in menu may end up considering
values that are too large to matter at all, in which cases running
it is a waste of time.
A major rework of the menu governor would be required to address
these issues and the performance of at least some workloads (tuned
specifically to the current behavior of the menu governor) is likely
to suffer from that. It is thus better to introduce an entirely new
governor without them and let everybody use the governor that works
better with their actual workloads.
The new governor introduced here, the timer events oriented (TEO)
governor, uses the same basic strategy as menu: it always tries to
find the deepest idle state that can be used in the given conditions.
However, it applies a different approach to that problem.
First, it doesn't use "correction factors" for the time till the
closest timer, but instead it tries to correlate the measured idle
duration values with the available idle states and use that
information to pick up the idle state that is most likely to "match"
the upcoming CPU idle interval.
Second, it doesn't take the number of "I/O waiters" into account at
all and the pattern detection code in it avoids taking timer wakeups
into account. It also only uses idle duration values less than the
current time till the closest timer (with the tick excluded) for that
purpose.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2019-01-04 14:30:47 +03:00
} else {
2019-11-07 17:25:12 +03:00
u64 lat_ns = drv - > states [ dev - > last_state_idx ] . exit_latency_ns ;
2019-07-04 02:51:27 +03:00
2019-11-12 12:51:16 +03:00
/*
* The computations below are to determine whether or not the
* ( saved ) time till the next timer event and the measured idle
* duration fall into the same " bin " , so use last_residency_ns
* for that instead of time_span_ns which includes the cpuidle
* overhead .
*/
measured_ns = dev - > last_residency_ns ;
cpuidle: New timer events oriented governor for tickless systems
The venerable menu governor does some things that are quite
questionable in my view.
First, it includes timer wakeups in the pattern detection data and
mixes them up with wakeups from other sources which in some cases
causes it to expect what essentially would be a timer wakeup in a
time frame in which no timer wakeups are possible (because it knows
the time until the next timer event and that is later than the
expected wakeup time).
Second, it uses the extra exit latency limit based on the predicted
idle duration and depending on the number of tasks waiting on I/O,
even though those tasks may run on a different CPU when they are
woken up. Moreover, the time ranges used by it for the sleep length
correction factors depend on whether or not there are tasks waiting
on I/O, which again doesn't imply anything in particular, and they
are not correlated to the list of available idle states in any way
whatever.
Also, the pattern detection code in menu may end up considering
values that are too large to matter at all, in which cases running
it is a waste of time.
A major rework of the menu governor would be required to address
these issues and the performance of at least some workloads (tuned
specifically to the current behavior of the menu governor) is likely
to suffer from that. It is thus better to introduce an entirely new
governor without them and let everybody use the governor that works
better with their actual workloads.
The new governor introduced here, the timer events oriented (TEO)
governor, uses the same basic strategy as menu: it always tries to
find the deepest idle state that can be used in the given conditions.
However, it applies a different approach to that problem.
First, it doesn't use "correction factors" for the time till the
closest timer, but instead it tries to correlate the measured idle
duration values with the available idle states and use that
information to pick up the idle state that is most likely to "match"
the upcoming CPU idle interval.
Second, it doesn't take the number of "I/O waiters" into account at
all and the pattern detection code in it avoids taking timer wakeups
into account. It also only uses idle duration values less than the
current time till the closest timer (with the tick excluded) for that
purpose.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2019-01-04 14:30:47 +03:00
/*
* The delay between the wakeup and the first instruction
* executed by the CPU is not likely to be worst - case every
* time , so take 1 / 2 of the exit latency as a very rough
* approximation of the average of it .
*/
2019-11-07 17:25:12 +03:00
if ( measured_ns > = lat_ns )
measured_ns - = lat_ns / 2 ;
cpuidle: New timer events oriented governor for tickless systems
The venerable menu governor does some things that are quite
questionable in my view.
First, it includes timer wakeups in the pattern detection data and
mixes them up with wakeups from other sources which in some cases
causes it to expect what essentially would be a timer wakeup in a
time frame in which no timer wakeups are possible (because it knows
the time until the next timer event and that is later than the
expected wakeup time).
Second, it uses the extra exit latency limit based on the predicted
idle duration and depending on the number of tasks waiting on I/O,
even though those tasks may run on a different CPU when they are
woken up. Moreover, the time ranges used by it for the sleep length
correction factors depend on whether or not there are tasks waiting
on I/O, which again doesn't imply anything in particular, and they
are not correlated to the list of available idle states in any way
whatever.
Also, the pattern detection code in menu may end up considering
values that are too large to matter at all, in which cases running
it is a waste of time.
A major rework of the menu governor would be required to address
these issues and the performance of at least some workloads (tuned
specifically to the current behavior of the menu governor) is likely
to suffer from that. It is thus better to introduce an entirely new
governor without them and let everybody use the governor that works
better with their actual workloads.
The new governor introduced here, the timer events oriented (TEO)
governor, uses the same basic strategy as menu: it always tries to
find the deepest idle state that can be used in the given conditions.
However, it applies a different approach to that problem.
First, it doesn't use "correction factors" for the time till the
closest timer, but instead it tries to correlate the measured idle
duration values with the available idle states and use that
information to pick up the idle state that is most likely to "match"
the upcoming CPU idle interval.
Second, it doesn't take the number of "I/O waiters" into account at
all and the pattern detection code in it avoids taking timer wakeups
into account. It also only uses idle duration values less than the
current time till the closest timer (with the tick excluded) for that
purpose.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2019-01-04 14:30:47 +03:00
else
2019-11-07 17:25:12 +03:00
measured_ns / = 2 ;
cpuidle: New timer events oriented governor for tickless systems
The venerable menu governor does some things that are quite
questionable in my view.
First, it includes timer wakeups in the pattern detection data and
mixes them up with wakeups from other sources which in some cases
causes it to expect what essentially would be a timer wakeup in a
time frame in which no timer wakeups are possible (because it knows
the time until the next timer event and that is later than the
expected wakeup time).
Second, it uses the extra exit latency limit based on the predicted
idle duration and depending on the number of tasks waiting on I/O,
even though those tasks may run on a different CPU when they are
woken up. Moreover, the time ranges used by it for the sleep length
correction factors depend on whether or not there are tasks waiting
on I/O, which again doesn't imply anything in particular, and they
are not correlated to the list of available idle states in any way
whatever.
Also, the pattern detection code in menu may end up considering
values that are too large to matter at all, in which cases running
it is a waste of time.
A major rework of the menu governor would be required to address
these issues and the performance of at least some workloads (tuned
specifically to the current behavior of the menu governor) is likely
to suffer from that. It is thus better to introduce an entirely new
governor without them and let everybody use the governor that works
better with their actual workloads.
The new governor introduced here, the timer events oriented (TEO)
governor, uses the same basic strategy as menu: it always tries to
find the deepest idle state that can be used in the given conditions.
However, it applies a different approach to that problem.
First, it doesn't use "correction factors" for the time till the
closest timer, but instead it tries to correlate the measured idle
duration values with the available idle states and use that
information to pick up the idle state that is most likely to "match"
the upcoming CPU idle interval.
Second, it doesn't take the number of "I/O waiters" into account at
all and the pattern detection code in it avoids taking timer wakeups
into account. It also only uses idle duration values less than the
current time till the closest timer (with the tick excluded) for that
purpose.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2019-01-04 14:30:47 +03:00
}
cpuidle: teo: Change the main idle state selection logic
Two aspects of the current main idle state selection logic in the
TEO (Timer Events Oriented) cpuidle governor are quite questionable.
First of all, the "hits" and "misses" metrics used by it are only
updated for a given idle state if the time till the next timer event
("sleep length") is between the target residency of that state and
the target residency of the next one. Consequently, they are likely
to become stale if the sleep length tends to fall outside that
interval which increases the likelihood of subomtimal idle state
selection.
Second, the decision on whether or not to select the idle state
"matching" the sleep length is based on the metrics collected for
that state alone, whereas in principle the metrics collected for
the other idle states should be taken into consideration when that
decision is made. For example, if the measured idle duration is less
than the target residency of the idle state "matching" the sleep
length, then it is also less than the target residency of any deeper
idle state and that should be taken into account when considering
whether or not to select any of those states, but currently it is
not.
In order to address the above shortcomings, modify the main idle
state selection logic in the TEO governor to take the metrics
collected for all of the idle states into account when deciding
whether or not to select the one "matching" the sleep length.
Moreover, drop the "misses" metric that becomes redundant after the
above change and rename the "early_hits" metric to "intercepts" so
that its role is better reflected by its name (the idea being that
if a CPU wakes up earlier than indicated by the sleep length, then
it must be a result of a non-timer interrupt that "intercepts" the
CPU).
Also rename the states[] array in struct struct teo_cpu to
state_bins[] to avoid confusing it with the states[] array in
struct cpuidle_driver and update the documentation to match the
new code (and make it more comprehensive while at it).
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-06-02 21:16:32 +03:00
cpu_data - > total = 0 ;
cpuidle: New timer events oriented governor for tickless systems
The venerable menu governor does some things that are quite
questionable in my view.
First, it includes timer wakeups in the pattern detection data and
mixes them up with wakeups from other sources which in some cases
causes it to expect what essentially would be a timer wakeup in a
time frame in which no timer wakeups are possible (because it knows
the time until the next timer event and that is later than the
expected wakeup time).
Second, it uses the extra exit latency limit based on the predicted
idle duration and depending on the number of tasks waiting on I/O,
even though those tasks may run on a different CPU when they are
woken up. Moreover, the time ranges used by it for the sleep length
correction factors depend on whether or not there are tasks waiting
on I/O, which again doesn't imply anything in particular, and they
are not correlated to the list of available idle states in any way
whatever.
Also, the pattern detection code in menu may end up considering
values that are too large to matter at all, in which cases running
it is a waste of time.
A major rework of the menu governor would be required to address
these issues and the performance of at least some workloads (tuned
specifically to the current behavior of the menu governor) is likely
to suffer from that. It is thus better to introduce an entirely new
governor without them and let everybody use the governor that works
better with their actual workloads.
The new governor introduced here, the timer events oriented (TEO)
governor, uses the same basic strategy as menu: it always tries to
find the deepest idle state that can be used in the given conditions.
However, it applies a different approach to that problem.
First, it doesn't use "correction factors" for the time till the
closest timer, but instead it tries to correlate the measured idle
duration values with the available idle states and use that
information to pick up the idle state that is most likely to "match"
the upcoming CPU idle interval.
Second, it doesn't take the number of "I/O waiters" into account at
all and the pattern detection code in it avoids taking timer wakeups
into account. It also only uses idle duration values less than the
current time till the closest timer (with the tick excluded) for that
purpose.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2019-01-04 14:30:47 +03:00
/*
cpuidle: teo: Change the main idle state selection logic
Two aspects of the current main idle state selection logic in the
TEO (Timer Events Oriented) cpuidle governor are quite questionable.
First of all, the "hits" and "misses" metrics used by it are only
updated for a given idle state if the time till the next timer event
("sleep length") is between the target residency of that state and
the target residency of the next one. Consequently, they are likely
to become stale if the sleep length tends to fall outside that
interval which increases the likelihood of subomtimal idle state
selection.
Second, the decision on whether or not to select the idle state
"matching" the sleep length is based on the metrics collected for
that state alone, whereas in principle the metrics collected for
the other idle states should be taken into consideration when that
decision is made. For example, if the measured idle duration is less
than the target residency of the idle state "matching" the sleep
length, then it is also less than the target residency of any deeper
idle state and that should be taken into account when considering
whether or not to select any of those states, but currently it is
not.
In order to address the above shortcomings, modify the main idle
state selection logic in the TEO governor to take the metrics
collected for all of the idle states into account when deciding
whether or not to select the one "matching" the sleep length.
Moreover, drop the "misses" metric that becomes redundant after the
above change and rename the "early_hits" metric to "intercepts" so
that its role is better reflected by its name (the idea being that
if a CPU wakes up earlier than indicated by the sleep length, then
it must be a result of a non-timer interrupt that "intercepts" the
CPU).
Also rename the states[] array in struct struct teo_cpu to
state_bins[] to avoid confusing it with the states[] array in
struct cpuidle_driver and update the documentation to match the
new code (and make it more comprehensive while at it).
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-06-02 21:16:32 +03:00
* Decay the " hits " and " intercepts " metrics for all of the bins and
* find the bins that the sleep length and the measured idle duration
* fall into .
cpuidle: New timer events oriented governor for tickless systems
The venerable menu governor does some things that are quite
questionable in my view.
First, it includes timer wakeups in the pattern detection data and
mixes them up with wakeups from other sources which in some cases
causes it to expect what essentially would be a timer wakeup in a
time frame in which no timer wakeups are possible (because it knows
the time until the next timer event and that is later than the
expected wakeup time).
Second, it uses the extra exit latency limit based on the predicted
idle duration and depending on the number of tasks waiting on I/O,
even though those tasks may run on a different CPU when they are
woken up. Moreover, the time ranges used by it for the sleep length
correction factors depend on whether or not there are tasks waiting
on I/O, which again doesn't imply anything in particular, and they
are not correlated to the list of available idle states in any way
whatever.
Also, the pattern detection code in menu may end up considering
values that are too large to matter at all, in which cases running
it is a waste of time.
A major rework of the menu governor would be required to address
these issues and the performance of at least some workloads (tuned
specifically to the current behavior of the menu governor) is likely
to suffer from that. It is thus better to introduce an entirely new
governor without them and let everybody use the governor that works
better with their actual workloads.
The new governor introduced here, the timer events oriented (TEO)
governor, uses the same basic strategy as menu: it always tries to
find the deepest idle state that can be used in the given conditions.
However, it applies a different approach to that problem.
First, it doesn't use "correction factors" for the time till the
closest timer, but instead it tries to correlate the measured idle
duration values with the available idle states and use that
information to pick up the idle state that is most likely to "match"
the upcoming CPU idle interval.
Second, it doesn't take the number of "I/O waiters" into account at
all and the pattern detection code in it avoids taking timer wakeups
into account. It also only uses idle duration values less than the
current time till the closest timer (with the tick excluded) for that
purpose.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2019-01-04 14:30:47 +03:00
*/
for ( i = 0 ; i < drv - > state_count ; i + + ) {
2021-06-02 21:15:10 +03:00
s64 target_residency_ns = drv - > states [ i ] . target_residency_ns ;
cpuidle: teo: Change the main idle state selection logic
Two aspects of the current main idle state selection logic in the
TEO (Timer Events Oriented) cpuidle governor are quite questionable.
First of all, the "hits" and "misses" metrics used by it are only
updated for a given idle state if the time till the next timer event
("sleep length") is between the target residency of that state and
the target residency of the next one. Consequently, they are likely
to become stale if the sleep length tends to fall outside that
interval which increases the likelihood of subomtimal idle state
selection.
Second, the decision on whether or not to select the idle state
"matching" the sleep length is based on the metrics collected for
that state alone, whereas in principle the metrics collected for
the other idle states should be taken into consideration when that
decision is made. For example, if the measured idle duration is less
than the target residency of the idle state "matching" the sleep
length, then it is also less than the target residency of any deeper
idle state and that should be taken into account when considering
whether or not to select any of those states, but currently it is
not.
In order to address the above shortcomings, modify the main idle
state selection logic in the TEO governor to take the metrics
collected for all of the idle states into account when deciding
whether or not to select the one "matching" the sleep length.
Moreover, drop the "misses" metric that becomes redundant after the
above change and rename the "early_hits" metric to "intercepts" so
that its role is better reflected by its name (the idea being that
if a CPU wakes up earlier than indicated by the sleep length, then
it must be a result of a non-timer interrupt that "intercepts" the
CPU).
Also rename the states[] array in struct struct teo_cpu to
state_bins[] to avoid confusing it with the states[] array in
struct cpuidle_driver and update the documentation to match the
new code (and make it more comprehensive while at it).
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-06-02 21:16:32 +03:00
struct teo_bin * bin = & cpu_data - > state_bins [ i ] ;
bin - > hits - = bin - > hits > > DECAY_SHIFT ;
bin - > intercepts - = bin - > intercepts > > DECAY_SHIFT ;
cpuidle: New timer events oriented governor for tickless systems
The venerable menu governor does some things that are quite
questionable in my view.
First, it includes timer wakeups in the pattern detection data and
mixes them up with wakeups from other sources which in some cases
causes it to expect what essentially would be a timer wakeup in a
time frame in which no timer wakeups are possible (because it knows
the time until the next timer event and that is later than the
expected wakeup time).
Second, it uses the extra exit latency limit based on the predicted
idle duration and depending on the number of tasks waiting on I/O,
even though those tasks may run on a different CPU when they are
woken up. Moreover, the time ranges used by it for the sleep length
correction factors depend on whether or not there are tasks waiting
on I/O, which again doesn't imply anything in particular, and they
are not correlated to the list of available idle states in any way
whatever.
Also, the pattern detection code in menu may end up considering
values that are too large to matter at all, in which cases running
it is a waste of time.
A major rework of the menu governor would be required to address
these issues and the performance of at least some workloads (tuned
specifically to the current behavior of the menu governor) is likely
to suffer from that. It is thus better to introduce an entirely new
governor without them and let everybody use the governor that works
better with their actual workloads.
The new governor introduced here, the timer events oriented (TEO)
governor, uses the same basic strategy as menu: it always tries to
find the deepest idle state that can be used in the given conditions.
However, it applies a different approach to that problem.
First, it doesn't use "correction factors" for the time till the
closest timer, but instead it tries to correlate the measured idle
duration values with the available idle states and use that
information to pick up the idle state that is most likely to "match"
the upcoming CPU idle interval.
Second, it doesn't take the number of "I/O waiters" into account at
all and the pattern detection code in it avoids taking timer wakeups
into account. It also only uses idle duration values less than the
current time till the closest timer (with the tick excluded) for that
purpose.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2019-01-04 14:30:47 +03:00
cpuidle: teo: Change the main idle state selection logic
Two aspects of the current main idle state selection logic in the
TEO (Timer Events Oriented) cpuidle governor are quite questionable.
First of all, the "hits" and "misses" metrics used by it are only
updated for a given idle state if the time till the next timer event
("sleep length") is between the target residency of that state and
the target residency of the next one. Consequently, they are likely
to become stale if the sleep length tends to fall outside that
interval which increases the likelihood of subomtimal idle state
selection.
Second, the decision on whether or not to select the idle state
"matching" the sleep length is based on the metrics collected for
that state alone, whereas in principle the metrics collected for
the other idle states should be taken into consideration when that
decision is made. For example, if the measured idle duration is less
than the target residency of the idle state "matching" the sleep
length, then it is also less than the target residency of any deeper
idle state and that should be taken into account when considering
whether or not to select any of those states, but currently it is
not.
In order to address the above shortcomings, modify the main idle
state selection logic in the TEO governor to take the metrics
collected for all of the idle states into account when deciding
whether or not to select the one "matching" the sleep length.
Moreover, drop the "misses" metric that becomes redundant after the
above change and rename the "early_hits" metric to "intercepts" so
that its role is better reflected by its name (the idea being that
if a CPU wakes up earlier than indicated by the sleep length, then
it must be a result of a non-timer interrupt that "intercepts" the
CPU).
Also rename the states[] array in struct struct teo_cpu to
state_bins[] to avoid confusing it with the states[] array in
struct cpuidle_driver and update the documentation to match the
new code (and make it more comprehensive while at it).
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-06-02 21:16:32 +03:00
cpu_data - > total + = bin - > hits + bin - > intercepts ;
cpuidle: New timer events oriented governor for tickless systems
The venerable menu governor does some things that are quite
questionable in my view.
First, it includes timer wakeups in the pattern detection data and
mixes them up with wakeups from other sources which in some cases
causes it to expect what essentially would be a timer wakeup in a
time frame in which no timer wakeups are possible (because it knows
the time until the next timer event and that is later than the
expected wakeup time).
Second, it uses the extra exit latency limit based on the predicted
idle duration and depending on the number of tasks waiting on I/O,
even though those tasks may run on a different CPU when they are
woken up. Moreover, the time ranges used by it for the sleep length
correction factors depend on whether or not there are tasks waiting
on I/O, which again doesn't imply anything in particular, and they
are not correlated to the list of available idle states in any way
whatever.
Also, the pattern detection code in menu may end up considering
values that are too large to matter at all, in which cases running
it is a waste of time.
A major rework of the menu governor would be required to address
these issues and the performance of at least some workloads (tuned
specifically to the current behavior of the menu governor) is likely
to suffer from that. It is thus better to introduce an entirely new
governor without them and let everybody use the governor that works
better with their actual workloads.
The new governor introduced here, the timer events oriented (TEO)
governor, uses the same basic strategy as menu: it always tries to
find the deepest idle state that can be used in the given conditions.
However, it applies a different approach to that problem.
First, it doesn't use "correction factors" for the time till the
closest timer, but instead it tries to correlate the measured idle
duration values with the available idle states and use that
information to pick up the idle state that is most likely to "match"
the upcoming CPU idle interval.
Second, it doesn't take the number of "I/O waiters" into account at
all and the pattern detection code in it avoids taking timer wakeups
into account. It also only uses idle duration values less than the
current time till the closest timer (with the tick excluded) for that
purpose.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2019-01-04 14:30:47 +03:00
2021-06-02 21:15:10 +03:00
if ( target_residency_ns < = cpu_data - > sleep_length_ns ) {
cpuidle: New timer events oriented governor for tickless systems
The venerable menu governor does some things that are quite
questionable in my view.
First, it includes timer wakeups in the pattern detection data and
mixes them up with wakeups from other sources which in some cases
causes it to expect what essentially would be a timer wakeup in a
time frame in which no timer wakeups are possible (because it knows
the time until the next timer event and that is later than the
expected wakeup time).
Second, it uses the extra exit latency limit based on the predicted
idle duration and depending on the number of tasks waiting on I/O,
even though those tasks may run on a different CPU when they are
woken up. Moreover, the time ranges used by it for the sleep length
correction factors depend on whether or not there are tasks waiting
on I/O, which again doesn't imply anything in particular, and they
are not correlated to the list of available idle states in any way
whatever.
Also, the pattern detection code in menu may end up considering
values that are too large to matter at all, in which cases running
it is a waste of time.
A major rework of the menu governor would be required to address
these issues and the performance of at least some workloads (tuned
specifically to the current behavior of the menu governor) is likely
to suffer from that. It is thus better to introduce an entirely new
governor without them and let everybody use the governor that works
better with their actual workloads.
The new governor introduced here, the timer events oriented (TEO)
governor, uses the same basic strategy as menu: it always tries to
find the deepest idle state that can be used in the given conditions.
However, it applies a different approach to that problem.
First, it doesn't use "correction factors" for the time till the
closest timer, but instead it tries to correlate the measured idle
duration values with the available idle states and use that
information to pick up the idle state that is most likely to "match"
the upcoming CPU idle interval.
Second, it doesn't take the number of "I/O waiters" into account at
all and the pattern detection code in it avoids taking timer wakeups
into account. It also only uses idle duration values less than the
current time till the closest timer (with the tick excluded) for that
purpose.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2019-01-04 14:30:47 +03:00
idx_timer = i ;
2021-06-02 21:15:10 +03:00
if ( target_residency_ns < = measured_ns )
idx_duration = i ;
cpuidle: New timer events oriented governor for tickless systems
The venerable menu governor does some things that are quite
questionable in my view.
First, it includes timer wakeups in the pattern detection data and
mixes them up with wakeups from other sources which in some cases
causes it to expect what essentially would be a timer wakeup in a
time frame in which no timer wakeups are possible (because it knows
the time until the next timer event and that is later than the
expected wakeup time).
Second, it uses the extra exit latency limit based on the predicted
idle duration and depending on the number of tasks waiting on I/O,
even though those tasks may run on a different CPU when they are
woken up. Moreover, the time ranges used by it for the sleep length
correction factors depend on whether or not there are tasks waiting
on I/O, which again doesn't imply anything in particular, and they
are not correlated to the list of available idle states in any way
whatever.
Also, the pattern detection code in menu may end up considering
values that are too large to matter at all, in which cases running
it is a waste of time.
A major rework of the menu governor would be required to address
these issues and the performance of at least some workloads (tuned
specifically to the current behavior of the menu governor) is likely
to suffer from that. It is thus better to introduce an entirely new
governor without them and let everybody use the governor that works
better with their actual workloads.
The new governor introduced here, the timer events oriented (TEO)
governor, uses the same basic strategy as menu: it always tries to
find the deepest idle state that can be used in the given conditions.
However, it applies a different approach to that problem.
First, it doesn't use "correction factors" for the time till the
closest timer, but instead it tries to correlate the measured idle
duration values with the available idle states and use that
information to pick up the idle state that is most likely to "match"
the upcoming CPU idle interval.
Second, it doesn't take the number of "I/O waiters" into account at
all and the pattern detection code in it avoids taking timer wakeups
into account. It also only uses idle duration values less than the
current time till the closest timer (with the tick excluded) for that
purpose.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2019-01-04 14:30:47 +03:00
}
}
2021-06-02 21:17:18 +03:00
i = cpu_data - > next_recent_idx + + ;
if ( cpu_data - > next_recent_idx > = NR_RECENT )
cpu_data - > next_recent_idx = 0 ;
if ( cpu_data - > recent_idx [ i ] > = 0 )
cpu_data - > state_bins [ cpu_data - > recent_idx [ i ] ] . recent - - ;
cpuidle: New timer events oriented governor for tickless systems
The venerable menu governor does some things that are quite
questionable in my view.
First, it includes timer wakeups in the pattern detection data and
mixes them up with wakeups from other sources which in some cases
causes it to expect what essentially would be a timer wakeup in a
time frame in which no timer wakeups are possible (because it knows
the time until the next timer event and that is later than the
expected wakeup time).
Second, it uses the extra exit latency limit based on the predicted
idle duration and depending on the number of tasks waiting on I/O,
even though those tasks may run on a different CPU when they are
woken up. Moreover, the time ranges used by it for the sleep length
correction factors depend on whether or not there are tasks waiting
on I/O, which again doesn't imply anything in particular, and they
are not correlated to the list of available idle states in any way
whatever.
Also, the pattern detection code in menu may end up considering
values that are too large to matter at all, in which cases running
it is a waste of time.
A major rework of the menu governor would be required to address
these issues and the performance of at least some workloads (tuned
specifically to the current behavior of the menu governor) is likely
to suffer from that. It is thus better to introduce an entirely new
governor without them and let everybody use the governor that works
better with their actual workloads.
The new governor introduced here, the timer events oriented (TEO)
governor, uses the same basic strategy as menu: it always tries to
find the deepest idle state that can be used in the given conditions.
However, it applies a different approach to that problem.
First, it doesn't use "correction factors" for the time till the
closest timer, but instead it tries to correlate the measured idle
duration values with the available idle states and use that
information to pick up the idle state that is most likely to "match"
the upcoming CPU idle interval.
Second, it doesn't take the number of "I/O waiters" into account at
all and the pattern detection code in it avoids taking timer wakeups
into account. It also only uses idle duration values less than the
current time till the closest timer (with the tick excluded) for that
purpose.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2019-01-04 14:30:47 +03:00
/*
cpuidle: teo: Change the main idle state selection logic
Two aspects of the current main idle state selection logic in the
TEO (Timer Events Oriented) cpuidle governor are quite questionable.
First of all, the "hits" and "misses" metrics used by it are only
updated for a given idle state if the time till the next timer event
("sleep length") is between the target residency of that state and
the target residency of the next one. Consequently, they are likely
to become stale if the sleep length tends to fall outside that
interval which increases the likelihood of subomtimal idle state
selection.
Second, the decision on whether or not to select the idle state
"matching" the sleep length is based on the metrics collected for
that state alone, whereas in principle the metrics collected for
the other idle states should be taken into consideration when that
decision is made. For example, if the measured idle duration is less
than the target residency of the idle state "matching" the sleep
length, then it is also less than the target residency of any deeper
idle state and that should be taken into account when considering
whether or not to select any of those states, but currently it is
not.
In order to address the above shortcomings, modify the main idle
state selection logic in the TEO governor to take the metrics
collected for all of the idle states into account when deciding
whether or not to select the one "matching" the sleep length.
Moreover, drop the "misses" metric that becomes redundant after the
above change and rename the "early_hits" metric to "intercepts" so
that its role is better reflected by its name (the idea being that
if a CPU wakes up earlier than indicated by the sleep length, then
it must be a result of a non-timer interrupt that "intercepts" the
CPU).
Also rename the states[] array in struct struct teo_cpu to
state_bins[] to avoid confusing it with the states[] array in
struct cpuidle_driver and update the documentation to match the
new code (and make it more comprehensive while at it).
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-06-02 21:16:32 +03:00
* If the measured idle duration falls into the same bin as the sleep
* length , this is a " hit " , so update the " hits " metric for that bin .
* Otherwise , update the " intercepts " metric for the bin fallen into by
* the measured idle duration .
cpuidle: New timer events oriented governor for tickless systems
The venerable menu governor does some things that are quite
questionable in my view.
First, it includes timer wakeups in the pattern detection data and
mixes them up with wakeups from other sources which in some cases
causes it to expect what essentially would be a timer wakeup in a
time frame in which no timer wakeups are possible (because it knows
the time until the next timer event and that is later than the
expected wakeup time).
Second, it uses the extra exit latency limit based on the predicted
idle duration and depending on the number of tasks waiting on I/O,
even though those tasks may run on a different CPU when they are
woken up. Moreover, the time ranges used by it for the sleep length
correction factors depend on whether or not there are tasks waiting
on I/O, which again doesn't imply anything in particular, and they
are not correlated to the list of available idle states in any way
whatever.
Also, the pattern detection code in menu may end up considering
values that are too large to matter at all, in which cases running
it is a waste of time.
A major rework of the menu governor would be required to address
these issues and the performance of at least some workloads (tuned
specifically to the current behavior of the menu governor) is likely
to suffer from that. It is thus better to introduce an entirely new
governor without them and let everybody use the governor that works
better with their actual workloads.
The new governor introduced here, the timer events oriented (TEO)
governor, uses the same basic strategy as menu: it always tries to
find the deepest idle state that can be used in the given conditions.
However, it applies a different approach to that problem.
First, it doesn't use "correction factors" for the time till the
closest timer, but instead it tries to correlate the measured idle
duration values with the available idle states and use that
information to pick up the idle state that is most likely to "match"
the upcoming CPU idle interval.
Second, it doesn't take the number of "I/O waiters" into account at
all and the pattern detection code in it avoids taking timer wakeups
into account. It also only uses idle duration values less than the
current time till the closest timer (with the tick excluded) for that
purpose.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2019-01-04 14:30:47 +03:00
*/
2021-06-02 21:17:18 +03:00
if ( idx_timer = = idx_duration ) {
cpuidle: teo: Change the main idle state selection logic
Two aspects of the current main idle state selection logic in the
TEO (Timer Events Oriented) cpuidle governor are quite questionable.
First of all, the "hits" and "misses" metrics used by it are only
updated for a given idle state if the time till the next timer event
("sleep length") is between the target residency of that state and
the target residency of the next one. Consequently, they are likely
to become stale if the sleep length tends to fall outside that
interval which increases the likelihood of subomtimal idle state
selection.
Second, the decision on whether or not to select the idle state
"matching" the sleep length is based on the metrics collected for
that state alone, whereas in principle the metrics collected for
the other idle states should be taken into consideration when that
decision is made. For example, if the measured idle duration is less
than the target residency of the idle state "matching" the sleep
length, then it is also less than the target residency of any deeper
idle state and that should be taken into account when considering
whether or not to select any of those states, but currently it is
not.
In order to address the above shortcomings, modify the main idle
state selection logic in the TEO governor to take the metrics
collected for all of the idle states into account when deciding
whether or not to select the one "matching" the sleep length.
Moreover, drop the "misses" metric that becomes redundant after the
above change and rename the "early_hits" metric to "intercepts" so
that its role is better reflected by its name (the idea being that
if a CPU wakes up earlier than indicated by the sleep length, then
it must be a result of a non-timer interrupt that "intercepts" the
CPU).
Also rename the states[] array in struct struct teo_cpu to
state_bins[] to avoid confusing it with the states[] array in
struct cpuidle_driver and update the documentation to match the
new code (and make it more comprehensive while at it).
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-06-02 21:16:32 +03:00
cpu_data - > state_bins [ idx_timer ] . hits + = PULSE ;
2021-06-02 21:17:18 +03:00
cpu_data - > recent_idx [ i ] = - 1 ;
} else {
cpuidle: teo: Change the main idle state selection logic
Two aspects of the current main idle state selection logic in the
TEO (Timer Events Oriented) cpuidle governor are quite questionable.
First of all, the "hits" and "misses" metrics used by it are only
updated for a given idle state if the time till the next timer event
("sleep length") is between the target residency of that state and
the target residency of the next one. Consequently, they are likely
to become stale if the sleep length tends to fall outside that
interval which increases the likelihood of subomtimal idle state
selection.
Second, the decision on whether or not to select the idle state
"matching" the sleep length is based on the metrics collected for
that state alone, whereas in principle the metrics collected for
the other idle states should be taken into consideration when that
decision is made. For example, if the measured idle duration is less
than the target residency of the idle state "matching" the sleep
length, then it is also less than the target residency of any deeper
idle state and that should be taken into account when considering
whether or not to select any of those states, but currently it is
not.
In order to address the above shortcomings, modify the main idle
state selection logic in the TEO governor to take the metrics
collected for all of the idle states into account when deciding
whether or not to select the one "matching" the sleep length.
Moreover, drop the "misses" metric that becomes redundant after the
above change and rename the "early_hits" metric to "intercepts" so
that its role is better reflected by its name (the idea being that
if a CPU wakes up earlier than indicated by the sleep length, then
it must be a result of a non-timer interrupt that "intercepts" the
CPU).
Also rename the states[] array in struct struct teo_cpu to
state_bins[] to avoid confusing it with the states[] array in
struct cpuidle_driver and update the documentation to match the
new code (and make it more comprehensive while at it).
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-06-02 21:16:32 +03:00
cpu_data - > state_bins [ idx_duration ] . intercepts + = PULSE ;
2021-06-02 21:17:18 +03:00
cpu_data - > state_bins [ idx_duration ] . recent + + ;
cpu_data - > recent_idx [ i ] = idx_duration ;
}
cpuidle: New timer events oriented governor for tickless systems
The venerable menu governor does some things that are quite
questionable in my view.
First, it includes timer wakeups in the pattern detection data and
mixes them up with wakeups from other sources which in some cases
causes it to expect what essentially would be a timer wakeup in a
time frame in which no timer wakeups are possible (because it knows
the time until the next timer event and that is later than the
expected wakeup time).
Second, it uses the extra exit latency limit based on the predicted
idle duration and depending on the number of tasks waiting on I/O,
even though those tasks may run on a different CPU when they are
woken up. Moreover, the time ranges used by it for the sleep length
correction factors depend on whether or not there are tasks waiting
on I/O, which again doesn't imply anything in particular, and they
are not correlated to the list of available idle states in any way
whatever.
Also, the pattern detection code in menu may end up considering
values that are too large to matter at all, in which cases running
it is a waste of time.
A major rework of the menu governor would be required to address
these issues and the performance of at least some workloads (tuned
specifically to the current behavior of the menu governor) is likely
to suffer from that. It is thus better to introduce an entirely new
governor without them and let everybody use the governor that works
better with their actual workloads.
The new governor introduced here, the timer events oriented (TEO)
governor, uses the same basic strategy as menu: it always tries to
find the deepest idle state that can be used in the given conditions.
However, it applies a different approach to that problem.
First, it doesn't use "correction factors" for the time till the
closest timer, but instead it tries to correlate the measured idle
duration values with the available idle states and use that
information to pick up the idle state that is most likely to "match"
the upcoming CPU idle interval.
Second, it doesn't take the number of "I/O waiters" into account at
all and the pattern detection code in it avoids taking timer wakeups
into account. It also only uses idle duration values less than the
current time till the closest timer (with the tick excluded) for that
purpose.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2019-01-04 14:30:47 +03:00
cpuidle: teo: Change the main idle state selection logic
Two aspects of the current main idle state selection logic in the
TEO (Timer Events Oriented) cpuidle governor are quite questionable.
First of all, the "hits" and "misses" metrics used by it are only
updated for a given idle state if the time till the next timer event
("sleep length") is between the target residency of that state and
the target residency of the next one. Consequently, they are likely
to become stale if the sleep length tends to fall outside that
interval which increases the likelihood of subomtimal idle state
selection.
Second, the decision on whether or not to select the idle state
"matching" the sleep length is based on the metrics collected for
that state alone, whereas in principle the metrics collected for
the other idle states should be taken into consideration when that
decision is made. For example, if the measured idle duration is less
than the target residency of the idle state "matching" the sleep
length, then it is also less than the target residency of any deeper
idle state and that should be taken into account when considering
whether or not to select any of those states, but currently it is
not.
In order to address the above shortcomings, modify the main idle
state selection logic in the TEO governor to take the metrics
collected for all of the idle states into account when deciding
whether or not to select the one "matching" the sleep length.
Moreover, drop the "misses" metric that becomes redundant after the
above change and rename the "early_hits" metric to "intercepts" so
that its role is better reflected by its name (the idea being that
if a CPU wakes up earlier than indicated by the sleep length, then
it must be a result of a non-timer interrupt that "intercepts" the
CPU).
Also rename the states[] array in struct struct teo_cpu to
state_bins[] to avoid confusing it with the states[] array in
struct cpuidle_driver and update the documentation to match the
new code (and make it more comprehensive while at it).
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-06-02 21:16:32 +03:00
cpu_data - > total + = PULSE ;
cpuidle: New timer events oriented governor for tickless systems
The venerable menu governor does some things that are quite
questionable in my view.
First, it includes timer wakeups in the pattern detection data and
mixes them up with wakeups from other sources which in some cases
causes it to expect what essentially would be a timer wakeup in a
time frame in which no timer wakeups are possible (because it knows
the time until the next timer event and that is later than the
expected wakeup time).
Second, it uses the extra exit latency limit based on the predicted
idle duration and depending on the number of tasks waiting on I/O,
even though those tasks may run on a different CPU when they are
woken up. Moreover, the time ranges used by it for the sleep length
correction factors depend on whether or not there are tasks waiting
on I/O, which again doesn't imply anything in particular, and they
are not correlated to the list of available idle states in any way
whatever.
Also, the pattern detection code in menu may end up considering
values that are too large to matter at all, in which cases running
it is a waste of time.
A major rework of the menu governor would be required to address
these issues and the performance of at least some workloads (tuned
specifically to the current behavior of the menu governor) is likely
to suffer from that. It is thus better to introduce an entirely new
governor without them and let everybody use the governor that works
better with their actual workloads.
The new governor introduced here, the timer events oriented (TEO)
governor, uses the same basic strategy as menu: it always tries to
find the deepest idle state that can be used in the given conditions.
However, it applies a different approach to that problem.
First, it doesn't use "correction factors" for the time till the
closest timer, but instead it tries to correlate the measured idle
duration values with the available idle states and use that
information to pick up the idle state that is most likely to "match"
the upcoming CPU idle interval.
Second, it doesn't take the number of "I/O waiters" into account at
all and the pattern detection code in it avoids taking timer wakeups
into account. It also only uses idle duration values less than the
current time till the closest timer (with the tick excluded) for that
purpose.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2019-01-04 14:30:47 +03:00
}
2019-11-13 03:10:13 +03:00
static bool teo_time_ok ( u64 interval_ns )
{
return ! tick_nohz_tick_stopped ( ) | | interval_ns > = TICK_NSEC ;
}
cpuidle: teo: Change the main idle state selection logic
Two aspects of the current main idle state selection logic in the
TEO (Timer Events Oriented) cpuidle governor are quite questionable.
First of all, the "hits" and "misses" metrics used by it are only
updated for a given idle state if the time till the next timer event
("sleep length") is between the target residency of that state and
the target residency of the next one. Consequently, they are likely
to become stale if the sleep length tends to fall outside that
interval which increases the likelihood of subomtimal idle state
selection.
Second, the decision on whether or not to select the idle state
"matching" the sleep length is based on the metrics collected for
that state alone, whereas in principle the metrics collected for
the other idle states should be taken into consideration when that
decision is made. For example, if the measured idle duration is less
than the target residency of the idle state "matching" the sleep
length, then it is also less than the target residency of any deeper
idle state and that should be taken into account when considering
whether or not to select any of those states, but currently it is
not.
In order to address the above shortcomings, modify the main idle
state selection logic in the TEO governor to take the metrics
collected for all of the idle states into account when deciding
whether or not to select the one "matching" the sleep length.
Moreover, drop the "misses" metric that becomes redundant after the
above change and rename the "early_hits" metric to "intercepts" so
that its role is better reflected by its name (the idea being that
if a CPU wakes up earlier than indicated by the sleep length, then
it must be a result of a non-timer interrupt that "intercepts" the
CPU).
Also rename the states[] array in struct struct teo_cpu to
state_bins[] to avoid confusing it with the states[] array in
struct cpuidle_driver and update the documentation to match the
new code (and make it more comprehensive while at it).
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-06-02 21:16:32 +03:00
static s64 teo_middle_of_bin ( int idx , struct cpuidle_driver * drv )
{
return ( drv - > states [ idx ] . target_residency_ns +
drv - > states [ idx + 1 ] . target_residency_ns ) / 2 ;
}
cpuidle: New timer events oriented governor for tickless systems
The venerable menu governor does some things that are quite
questionable in my view.
First, it includes timer wakeups in the pattern detection data and
mixes them up with wakeups from other sources which in some cases
causes it to expect what essentially would be a timer wakeup in a
time frame in which no timer wakeups are possible (because it knows
the time until the next timer event and that is later than the
expected wakeup time).
Second, it uses the extra exit latency limit based on the predicted
idle duration and depending on the number of tasks waiting on I/O,
even though those tasks may run on a different CPU when they are
woken up. Moreover, the time ranges used by it for the sleep length
correction factors depend on whether or not there are tasks waiting
on I/O, which again doesn't imply anything in particular, and they
are not correlated to the list of available idle states in any way
whatever.
Also, the pattern detection code in menu may end up considering
values that are too large to matter at all, in which cases running
it is a waste of time.
A major rework of the menu governor would be required to address
these issues and the performance of at least some workloads (tuned
specifically to the current behavior of the menu governor) is likely
to suffer from that. It is thus better to introduce an entirely new
governor without them and let everybody use the governor that works
better with their actual workloads.
The new governor introduced here, the timer events oriented (TEO)
governor, uses the same basic strategy as menu: it always tries to
find the deepest idle state that can be used in the given conditions.
However, it applies a different approach to that problem.
First, it doesn't use "correction factors" for the time till the
closest timer, but instead it tries to correlate the measured idle
duration values with the available idle states and use that
information to pick up the idle state that is most likely to "match"
the upcoming CPU idle interval.
Second, it doesn't take the number of "I/O waiters" into account at
all and the pattern detection code in it avoids taking timer wakeups
into account. It also only uses idle duration values less than the
current time till the closest timer (with the tick excluded) for that
purpose.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2019-01-04 14:30:47 +03:00
/**
* teo_find_shallower_state - Find shallower idle state matching given duration .
* @ drv : cpuidle driver containing state data .
* @ dev : Target CPU .
* @ state_idx : Index of the capping idle state .
2019-11-07 17:25:12 +03:00
* @ duration_ns : Idle duration value to match .
2023-01-05 17:51:58 +03:00
* @ no_poll : Don ' t consider polling states .
cpuidle: New timer events oriented governor for tickless systems
The venerable menu governor does some things that are quite
questionable in my view.
First, it includes timer wakeups in the pattern detection data and
mixes them up with wakeups from other sources which in some cases
causes it to expect what essentially would be a timer wakeup in a
time frame in which no timer wakeups are possible (because it knows
the time until the next timer event and that is later than the
expected wakeup time).
Second, it uses the extra exit latency limit based on the predicted
idle duration and depending on the number of tasks waiting on I/O,
even though those tasks may run on a different CPU when they are
woken up. Moreover, the time ranges used by it for the sleep length
correction factors depend on whether or not there are tasks waiting
on I/O, which again doesn't imply anything in particular, and they
are not correlated to the list of available idle states in any way
whatever.
Also, the pattern detection code in menu may end up considering
values that are too large to matter at all, in which cases running
it is a waste of time.
A major rework of the menu governor would be required to address
these issues and the performance of at least some workloads (tuned
specifically to the current behavior of the menu governor) is likely
to suffer from that. It is thus better to introduce an entirely new
governor without them and let everybody use the governor that works
better with their actual workloads.
The new governor introduced here, the timer events oriented (TEO)
governor, uses the same basic strategy as menu: it always tries to
find the deepest idle state that can be used in the given conditions.
However, it applies a different approach to that problem.
First, it doesn't use "correction factors" for the time till the
closest timer, but instead it tries to correlate the measured idle
duration values with the available idle states and use that
information to pick up the idle state that is most likely to "match"
the upcoming CPU idle interval.
Second, it doesn't take the number of "I/O waiters" into account at
all and the pattern detection code in it avoids taking timer wakeups
into account. It also only uses idle duration values less than the
current time till the closest timer (with the tick excluded) for that
purpose.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2019-01-04 14:30:47 +03:00
*/
static int teo_find_shallower_state ( struct cpuidle_driver * drv ,
struct cpuidle_device * dev , int state_idx ,
2023-01-05 17:51:58 +03:00
s64 duration_ns , bool no_poll )
cpuidle: New timer events oriented governor for tickless systems
The venerable menu governor does some things that are quite
questionable in my view.
First, it includes timer wakeups in the pattern detection data and
mixes them up with wakeups from other sources which in some cases
causes it to expect what essentially would be a timer wakeup in a
time frame in which no timer wakeups are possible (because it knows
the time until the next timer event and that is later than the
expected wakeup time).
Second, it uses the extra exit latency limit based on the predicted
idle duration and depending on the number of tasks waiting on I/O,
even though those tasks may run on a different CPU when they are
woken up. Moreover, the time ranges used by it for the sleep length
correction factors depend on whether or not there are tasks waiting
on I/O, which again doesn't imply anything in particular, and they
are not correlated to the list of available idle states in any way
whatever.
Also, the pattern detection code in menu may end up considering
values that are too large to matter at all, in which cases running
it is a waste of time.
A major rework of the menu governor would be required to address
these issues and the performance of at least some workloads (tuned
specifically to the current behavior of the menu governor) is likely
to suffer from that. It is thus better to introduce an entirely new
governor without them and let everybody use the governor that works
better with their actual workloads.
The new governor introduced here, the timer events oriented (TEO)
governor, uses the same basic strategy as menu: it always tries to
find the deepest idle state that can be used in the given conditions.
However, it applies a different approach to that problem.
First, it doesn't use "correction factors" for the time till the
closest timer, but instead it tries to correlate the measured idle
duration values with the available idle states and use that
information to pick up the idle state that is most likely to "match"
the upcoming CPU idle interval.
Second, it doesn't take the number of "I/O waiters" into account at
all and the pattern detection code in it avoids taking timer wakeups
into account. It also only uses idle duration values less than the
current time till the closest timer (with the tick excluded) for that
purpose.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2019-01-04 14:30:47 +03:00
{
int i ;
for ( i = state_idx - 1 ; i > = 0 ; i - - ) {
2023-01-05 17:51:58 +03:00
if ( dev - > states_usage [ i ] . disable | |
( no_poll & & drv - > states [ i ] . flags & CPUIDLE_FLAG_POLLING ) )
cpuidle: New timer events oriented governor for tickless systems
The venerable menu governor does some things that are quite
questionable in my view.
First, it includes timer wakeups in the pattern detection data and
mixes them up with wakeups from other sources which in some cases
causes it to expect what essentially would be a timer wakeup in a
time frame in which no timer wakeups are possible (because it knows
the time until the next timer event and that is later than the
expected wakeup time).
Second, it uses the extra exit latency limit based on the predicted
idle duration and depending on the number of tasks waiting on I/O,
even though those tasks may run on a different CPU when they are
woken up. Moreover, the time ranges used by it for the sleep length
correction factors depend on whether or not there are tasks waiting
on I/O, which again doesn't imply anything in particular, and they
are not correlated to the list of available idle states in any way
whatever.
Also, the pattern detection code in menu may end up considering
values that are too large to matter at all, in which cases running
it is a waste of time.
A major rework of the menu governor would be required to address
these issues and the performance of at least some workloads (tuned
specifically to the current behavior of the menu governor) is likely
to suffer from that. It is thus better to introduce an entirely new
governor without them and let everybody use the governor that works
better with their actual workloads.
The new governor introduced here, the timer events oriented (TEO)
governor, uses the same basic strategy as menu: it always tries to
find the deepest idle state that can be used in the given conditions.
However, it applies a different approach to that problem.
First, it doesn't use "correction factors" for the time till the
closest timer, but instead it tries to correlate the measured idle
duration values with the available idle states and use that
information to pick up the idle state that is most likely to "match"
the upcoming CPU idle interval.
Second, it doesn't take the number of "I/O waiters" into account at
all and the pattern detection code in it avoids taking timer wakeups
into account. It also only uses idle duration values less than the
current time till the closest timer (with the tick excluded) for that
purpose.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2019-01-04 14:30:47 +03:00
continue ;
state_idx = i ;
2019-11-07 17:25:12 +03:00
if ( drv - > states [ i ] . target_residency_ns < = duration_ns )
cpuidle: New timer events oriented governor for tickless systems
The venerable menu governor does some things that are quite
questionable in my view.
First, it includes timer wakeups in the pattern detection data and
mixes them up with wakeups from other sources which in some cases
causes it to expect what essentially would be a timer wakeup in a
time frame in which no timer wakeups are possible (because it knows
the time until the next timer event and that is later than the
expected wakeup time).
Second, it uses the extra exit latency limit based on the predicted
idle duration and depending on the number of tasks waiting on I/O,
even though those tasks may run on a different CPU when they are
woken up. Moreover, the time ranges used by it for the sleep length
correction factors depend on whether or not there are tasks waiting
on I/O, which again doesn't imply anything in particular, and they
are not correlated to the list of available idle states in any way
whatever.
Also, the pattern detection code in menu may end up considering
values that are too large to matter at all, in which cases running
it is a waste of time.
A major rework of the menu governor would be required to address
these issues and the performance of at least some workloads (tuned
specifically to the current behavior of the menu governor) is likely
to suffer from that. It is thus better to introduce an entirely new
governor without them and let everybody use the governor that works
better with their actual workloads.
The new governor introduced here, the timer events oriented (TEO)
governor, uses the same basic strategy as menu: it always tries to
find the deepest idle state that can be used in the given conditions.
However, it applies a different approach to that problem.
First, it doesn't use "correction factors" for the time till the
closest timer, but instead it tries to correlate the measured idle
duration values with the available idle states and use that
information to pick up the idle state that is most likely to "match"
the upcoming CPU idle interval.
Second, it doesn't take the number of "I/O waiters" into account at
all and the pattern detection code in it avoids taking timer wakeups
into account. It also only uses idle duration values less than the
current time till the closest timer (with the tick excluded) for that
purpose.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2019-01-04 14:30:47 +03:00
break ;
}
return state_idx ;
}
/**
* teo_select - Selects the next idle state to enter .
* @ drv : cpuidle driver containing state data .
* @ dev : Target CPU .
* @ stop_tick : Indication on whether or not to stop the scheduler tick .
*/
static int teo_select ( struct cpuidle_driver * drv , struct cpuidle_device * dev ,
bool * stop_tick )
{
struct teo_cpu * cpu_data = per_cpu_ptr ( & teo_cpus , dev - > cpu ) ;
2019-11-07 17:25:12 +03:00
s64 latency_req = cpuidle_governor_latency_req ( dev - > cpu ) ;
cpuidle: teo: Change the main idle state selection logic
Two aspects of the current main idle state selection logic in the
TEO (Timer Events Oriented) cpuidle governor are quite questionable.
First of all, the "hits" and "misses" metrics used by it are only
updated for a given idle state if the time till the next timer event
("sleep length") is between the target residency of that state and
the target residency of the next one. Consequently, they are likely
to become stale if the sleep length tends to fall outside that
interval which increases the likelihood of subomtimal idle state
selection.
Second, the decision on whether or not to select the idle state
"matching" the sleep length is based on the metrics collected for
that state alone, whereas in principle the metrics collected for
the other idle states should be taken into consideration when that
decision is made. For example, if the measured idle duration is less
than the target residency of the idle state "matching" the sleep
length, then it is also less than the target residency of any deeper
idle state and that should be taken into account when considering
whether or not to select any of those states, but currently it is
not.
In order to address the above shortcomings, modify the main idle
state selection logic in the TEO governor to take the metrics
collected for all of the idle states into account when deciding
whether or not to select the one "matching" the sleep length.
Moreover, drop the "misses" metric that becomes redundant after the
above change and rename the "early_hits" metric to "intercepts" so
that its role is better reflected by its name (the idea being that
if a CPU wakes up earlier than indicated by the sleep length, then
it must be a result of a non-timer interrupt that "intercepts" the
CPU).
Also rename the states[] array in struct struct teo_cpu to
state_bins[] to avoid confusing it with the states[] array in
struct cpuidle_driver and update the documentation to match the
new code (and make it more comprehensive while at it).
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-06-02 21:16:32 +03:00
unsigned int idx_intercept_sum = 0 ;
unsigned int intercept_sum = 0 ;
2021-06-02 21:17:18 +03:00
unsigned int idx_recent_sum = 0 ;
unsigned int recent_sum = 0 ;
cpuidle: teo: Change the main idle state selection logic
Two aspects of the current main idle state selection logic in the
TEO (Timer Events Oriented) cpuidle governor are quite questionable.
First of all, the "hits" and "misses" metrics used by it are only
updated for a given idle state if the time till the next timer event
("sleep length") is between the target residency of that state and
the target residency of the next one. Consequently, they are likely
to become stale if the sleep length tends to fall outside that
interval which increases the likelihood of subomtimal idle state
selection.
Second, the decision on whether or not to select the idle state
"matching" the sleep length is based on the metrics collected for
that state alone, whereas in principle the metrics collected for
the other idle states should be taken into consideration when that
decision is made. For example, if the measured idle duration is less
than the target residency of the idle state "matching" the sleep
length, then it is also less than the target residency of any deeper
idle state and that should be taken into account when considering
whether or not to select any of those states, but currently it is
not.
In order to address the above shortcomings, modify the main idle
state selection logic in the TEO governor to take the metrics
collected for all of the idle states into account when deciding
whether or not to select the one "matching" the sleep length.
Moreover, drop the "misses" metric that becomes redundant after the
above change and rename the "early_hits" metric to "intercepts" so
that its role is better reflected by its name (the idea being that
if a CPU wakes up earlier than indicated by the sleep length, then
it must be a result of a non-timer interrupt that "intercepts" the
CPU).
Also rename the states[] array in struct struct teo_cpu to
state_bins[] to avoid confusing it with the states[] array in
struct cpuidle_driver and update the documentation to match the
new code (and make it more comprehensive while at it).
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-06-02 21:16:32 +03:00
unsigned int idx_hit_sum = 0 ;
unsigned int hit_sum = 0 ;
int constraint_idx = 0 ;
int idx0 = 0 , idx = - 1 ;
2021-06-02 21:17:18 +03:00
bool alt_intercepts , alt_recent ;
cpuidle: New timer events oriented governor for tickless systems
The venerable menu governor does some things that are quite
questionable in my view.
First, it includes timer wakeups in the pattern detection data and
mixes them up with wakeups from other sources which in some cases
causes it to expect what essentially would be a timer wakeup in a
time frame in which no timer wakeups are possible (because it knows
the time until the next timer event and that is later than the
expected wakeup time).
Second, it uses the extra exit latency limit based on the predicted
idle duration and depending on the number of tasks waiting on I/O,
even though those tasks may run on a different CPU when they are
woken up. Moreover, the time ranges used by it for the sleep length
correction factors depend on whether or not there are tasks waiting
on I/O, which again doesn't imply anything in particular, and they
are not correlated to the list of available idle states in any way
whatever.
Also, the pattern detection code in menu may end up considering
values that are too large to matter at all, in which cases running
it is a waste of time.
A major rework of the menu governor would be required to address
these issues and the performance of at least some workloads (tuned
specifically to the current behavior of the menu governor) is likely
to suffer from that. It is thus better to introduce an entirely new
governor without them and let everybody use the governor that works
better with their actual workloads.
The new governor introduced here, the timer events oriented (TEO)
governor, uses the same basic strategy as menu: it always tries to
find the deepest idle state that can be used in the given conditions.
However, it applies a different approach to that problem.
First, it doesn't use "correction factors" for the time till the
closest timer, but instead it tries to correlate the measured idle
duration values with the available idle states and use that
information to pick up the idle state that is most likely to "match"
the upcoming CPU idle interval.
Second, it doesn't take the number of "I/O waiters" into account at
all and the pattern detection code in it avoids taking timer wakeups
into account. It also only uses idle duration values less than the
current time till the closest timer (with the tick excluded) for that
purpose.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2019-01-04 14:30:47 +03:00
ktime_t delta_tick ;
2021-03-29 21:21:43 +03:00
s64 duration_ns ;
2021-06-02 21:15:52 +03:00
int i ;
cpuidle: New timer events oriented governor for tickless systems
The venerable menu governor does some things that are quite
questionable in my view.
First, it includes timer wakeups in the pattern detection data and
mixes them up with wakeups from other sources which in some cases
causes it to expect what essentially would be a timer wakeup in a
time frame in which no timer wakeups are possible (because it knows
the time until the next timer event and that is later than the
expected wakeup time).
Second, it uses the extra exit latency limit based on the predicted
idle duration and depending on the number of tasks waiting on I/O,
even though those tasks may run on a different CPU when they are
woken up. Moreover, the time ranges used by it for the sleep length
correction factors depend on whether or not there are tasks waiting
on I/O, which again doesn't imply anything in particular, and they
are not correlated to the list of available idle states in any way
whatever.
Also, the pattern detection code in menu may end up considering
values that are too large to matter at all, in which cases running
it is a waste of time.
A major rework of the menu governor would be required to address
these issues and the performance of at least some workloads (tuned
specifically to the current behavior of the menu governor) is likely
to suffer from that. It is thus better to introduce an entirely new
governor without them and let everybody use the governor that works
better with their actual workloads.
The new governor introduced here, the timer events oriented (TEO)
governor, uses the same basic strategy as menu: it always tries to
find the deepest idle state that can be used in the given conditions.
However, it applies a different approach to that problem.
First, it doesn't use "correction factors" for the time till the
closest timer, but instead it tries to correlate the measured idle
duration values with the available idle states and use that
information to pick up the idle state that is most likely to "match"
the upcoming CPU idle interval.
Second, it doesn't take the number of "I/O waiters" into account at
all and the pattern detection code in it avoids taking timer wakeups
into account. It also only uses idle duration values less than the
current time till the closest timer (with the tick excluded) for that
purpose.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2019-01-04 14:30:47 +03:00
2019-07-04 02:51:27 +03:00
if ( dev - > last_state_idx > = 0 ) {
cpuidle: New timer events oriented governor for tickless systems
The venerable menu governor does some things that are quite
questionable in my view.
First, it includes timer wakeups in the pattern detection data and
mixes them up with wakeups from other sources which in some cases
causes it to expect what essentially would be a timer wakeup in a
time frame in which no timer wakeups are possible (because it knows
the time until the next timer event and that is later than the
expected wakeup time).
Second, it uses the extra exit latency limit based on the predicted
idle duration and depending on the number of tasks waiting on I/O,
even though those tasks may run on a different CPU when they are
woken up. Moreover, the time ranges used by it for the sleep length
correction factors depend on whether or not there are tasks waiting
on I/O, which again doesn't imply anything in particular, and they
are not correlated to the list of available idle states in any way
whatever.
Also, the pattern detection code in menu may end up considering
values that are too large to matter at all, in which cases running
it is a waste of time.
A major rework of the menu governor would be required to address
these issues and the performance of at least some workloads (tuned
specifically to the current behavior of the menu governor) is likely
to suffer from that. It is thus better to introduce an entirely new
governor without them and let everybody use the governor that works
better with their actual workloads.
The new governor introduced here, the timer events oriented (TEO)
governor, uses the same basic strategy as menu: it always tries to
find the deepest idle state that can be used in the given conditions.
However, it applies a different approach to that problem.
First, it doesn't use "correction factors" for the time till the
closest timer, but instead it tries to correlate the measured idle
duration values with the available idle states and use that
information to pick up the idle state that is most likely to "match"
the upcoming CPU idle interval.
Second, it doesn't take the number of "I/O waiters" into account at
all and the pattern detection code in it avoids taking timer wakeups
into account. It also only uses idle duration values less than the
current time till the closest timer (with the tick excluded) for that
purpose.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2019-01-04 14:30:47 +03:00
teo_update ( drv , dev ) ;
2019-07-04 02:51:27 +03:00
dev - > last_state_idx = - 1 ;
cpuidle: New timer events oriented governor for tickless systems
The venerable menu governor does some things that are quite
questionable in my view.
First, it includes timer wakeups in the pattern detection data and
mixes them up with wakeups from other sources which in some cases
causes it to expect what essentially would be a timer wakeup in a
time frame in which no timer wakeups are possible (because it knows
the time until the next timer event and that is later than the
expected wakeup time).
Second, it uses the extra exit latency limit based on the predicted
idle duration and depending on the number of tasks waiting on I/O,
even though those tasks may run on a different CPU when they are
woken up. Moreover, the time ranges used by it for the sleep length
correction factors depend on whether or not there are tasks waiting
on I/O, which again doesn't imply anything in particular, and they
are not correlated to the list of available idle states in any way
whatever.
Also, the pattern detection code in menu may end up considering
values that are too large to matter at all, in which cases running
it is a waste of time.
A major rework of the menu governor would be required to address
these issues and the performance of at least some workloads (tuned
specifically to the current behavior of the menu governor) is likely
to suffer from that. It is thus better to introduce an entirely new
governor without them and let everybody use the governor that works
better with their actual workloads.
The new governor introduced here, the timer events oriented (TEO)
governor, uses the same basic strategy as menu: it always tries to
find the deepest idle state that can be used in the given conditions.
However, it applies a different approach to that problem.
First, it doesn't use "correction factors" for the time till the
closest timer, but instead it tries to correlate the measured idle
duration values with the available idle states and use that
information to pick up the idle state that is most likely to "match"
the upcoming CPU idle interval.
Second, it doesn't take the number of "I/O waiters" into account at
all and the pattern detection code in it avoids taking timer wakeups
into account. It also only uses idle duration values less than the
current time till the closest timer (with the tick excluded) for that
purpose.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2019-01-04 14:30:47 +03:00
}
cpu_data - > time_span_ns = local_clock ( ) ;
2019-11-07 17:25:12 +03:00
duration_ns = tick_nohz_get_sleep_length ( & delta_tick ) ;
cpu_data - > sleep_length_ns = duration_ns ;
cpuidle: New timer events oriented governor for tickless systems
The venerable menu governor does some things that are quite
questionable in my view.
First, it includes timer wakeups in the pattern detection data and
mixes them up with wakeups from other sources which in some cases
causes it to expect what essentially would be a timer wakeup in a
time frame in which no timer wakeups are possible (because it knows
the time until the next timer event and that is later than the
expected wakeup time).
Second, it uses the extra exit latency limit based on the predicted
idle duration and depending on the number of tasks waiting on I/O,
even though those tasks may run on a different CPU when they are
woken up. Moreover, the time ranges used by it for the sleep length
correction factors depend on whether or not there are tasks waiting
on I/O, which again doesn't imply anything in particular, and they
are not correlated to the list of available idle states in any way
whatever.
Also, the pattern detection code in menu may end up considering
values that are too large to matter at all, in which cases running
it is a waste of time.
A major rework of the menu governor would be required to address
these issues and the performance of at least some workloads (tuned
specifically to the current behavior of the menu governor) is likely
to suffer from that. It is thus better to introduce an entirely new
governor without them and let everybody use the governor that works
better with their actual workloads.
The new governor introduced here, the timer events oriented (TEO)
governor, uses the same basic strategy as menu: it always tries to
find the deepest idle state that can be used in the given conditions.
However, it applies a different approach to that problem.
First, it doesn't use "correction factors" for the time till the
closest timer, but instead it tries to correlate the measured idle
duration values with the available idle states and use that
information to pick up the idle state that is most likely to "match"
the upcoming CPU idle interval.
Second, it doesn't take the number of "I/O waiters" into account at
all and the pattern detection code in it avoids taking timer wakeups
into account. It also only uses idle duration values less than the
current time till the closest timer (with the tick excluded) for that
purpose.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2019-01-04 14:30:47 +03:00
cpuidle: teo: Change the main idle state selection logic
Two aspects of the current main idle state selection logic in the
TEO (Timer Events Oriented) cpuidle governor are quite questionable.
First of all, the "hits" and "misses" metrics used by it are only
updated for a given idle state if the time till the next timer event
("sleep length") is between the target residency of that state and
the target residency of the next one. Consequently, they are likely
to become stale if the sleep length tends to fall outside that
interval which increases the likelihood of subomtimal idle state
selection.
Second, the decision on whether or not to select the idle state
"matching" the sleep length is based on the metrics collected for
that state alone, whereas in principle the metrics collected for
the other idle states should be taken into consideration when that
decision is made. For example, if the measured idle duration is less
than the target residency of the idle state "matching" the sleep
length, then it is also less than the target residency of any deeper
idle state and that should be taken into account when considering
whether or not to select any of those states, but currently it is
not.
In order to address the above shortcomings, modify the main idle
state selection logic in the TEO governor to take the metrics
collected for all of the idle states into account when deciding
whether or not to select the one "matching" the sleep length.
Moreover, drop the "misses" metric that becomes redundant after the
above change and rename the "early_hits" metric to "intercepts" so
that its role is better reflected by its name (the idea being that
if a CPU wakes up earlier than indicated by the sleep length, then
it must be a result of a non-timer interrupt that "intercepts" the
CPU).
Also rename the states[] array in struct struct teo_cpu to
state_bins[] to avoid confusing it with the states[] array in
struct cpuidle_driver and update the documentation to match the
new code (and make it more comprehensive while at it).
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-06-02 21:16:32 +03:00
/* Check if there is any choice in the first place. */
if ( drv - > state_count < 2 ) {
2021-06-15 14:49:20 +03:00
idx = 0 ;
cpuidle: teo: Change the main idle state selection logic
Two aspects of the current main idle state selection logic in the
TEO (Timer Events Oriented) cpuidle governor are quite questionable.
First of all, the "hits" and "misses" metrics used by it are only
updated for a given idle state if the time till the next timer event
("sleep length") is between the target residency of that state and
the target residency of the next one. Consequently, they are likely
to become stale if the sleep length tends to fall outside that
interval which increases the likelihood of subomtimal idle state
selection.
Second, the decision on whether or not to select the idle state
"matching" the sleep length is based on the metrics collected for
that state alone, whereas in principle the metrics collected for
the other idle states should be taken into consideration when that
decision is made. For example, if the measured idle duration is less
than the target residency of the idle state "matching" the sleep
length, then it is also less than the target residency of any deeper
idle state and that should be taken into account when considering
whether or not to select any of those states, but currently it is
not.
In order to address the above shortcomings, modify the main idle
state selection logic in the TEO governor to take the metrics
collected for all of the idle states into account when deciding
whether or not to select the one "matching" the sleep length.
Moreover, drop the "misses" metric that becomes redundant after the
above change and rename the "early_hits" metric to "intercepts" so
that its role is better reflected by its name (the idea being that
if a CPU wakes up earlier than indicated by the sleep length, then
it must be a result of a non-timer interrupt that "intercepts" the
CPU).
Also rename the states[] array in struct struct teo_cpu to
state_bins[] to avoid confusing it with the states[] array in
struct cpuidle_driver and update the documentation to match the
new code (and make it more comprehensive while at it).
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-06-02 21:16:32 +03:00
goto end ;
}
if ( ! dev - > states_usage [ 0 ] . disable ) {
idx = 0 ;
if ( drv - > states [ 1 ] . target_residency_ns > duration_ns )
goto end ;
}
cpuidle: teo: Fix "early hits" handling for disabled idle states
The TEO governor uses idle duration "bins" defined in accordance with
the CPU idle states table provided by the driver, so that each "bin"
covers the idle duration range between the target residency of the
idle state corresponding to it and the target residency of the closest
deeper idle state. The governor collects statistics for each bin
regardless of whether or not the idle state corresponding to it is
currently enabled.
In particular, the "early hits" metric measures the likelihood of a
situation in which the idle duration measured after wakeup falls into
to given bin, but the time till the next timer (sleep length) falls
into a bin corresponding to one of the deeper idle states. It is
used when the "hits" and "misses" metrics indicate that the state
"matching" the sleep length should not be selected, so that the state
with the maximum "early hits" value is selected instead of it.
If the idle state corresponding to the given bin is disabled, it
cannot be selected and if it turns out to be the one that should be
selected, a shallower idle state needs to be used instead of it.
Nevertheless, the metrics collected for the bin corresponding to it
are still valid and need to be taken into account as though that
state had not been disabled.
As far as the "early hits" metric is concerned, teo_select() tries to
take disabled states into account, but the state index corresponding
to the maximum "early hits" value computed by it may be incorrect.
Namely, it always uses the index of the previous maximum "early hits"
state then, but there may be enabled idle states closer to the
disabled one in question. In particular, if the current candidate
state (whose index is the idx value) is closer to the disabled one
and the "early hits" value of the disabled state is greater than the
current maximum, the index of the current candidate state (idx)
should replace the "maximum early hits state" index.
Modify the code to handle that case correctly.
Fixes: b26bf6ab716f ("cpuidle: New timer events oriented governor for tickless systems")
Reported-by: Doug Smythies <dsmythies@telus.net>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: 5.1+ <stable@vger.kernel.org> # 5.1+
2019-10-11 00:37:39 +03:00
cpuidle: teo: Introduce util-awareness
Modern interactive systems, such as recent Android phones, tend to have
power efficient shallow idle states. Selecting deeper idle states on a
device while a latency-sensitive workload is running can adversely
impact performance due to increased latency. Additionally, if the CPU
wakes up from a deeper sleep before its target residency as is often the
case, it results in a waste of energy on top of that.
At the moment, none of the available idle governors take any scheduling
information into account. They also tend to overestimate the idle
duration quite often, which causes them to select excessively deep idle
states, thus leading to increased wakeup latency and lower performance
with no power saving. For 'menu' while web browsing on Android for
instance, those types of wakeups ('too deep') account for over 24% of
all wakeups.
At the same time, on some platforms idle state 0 can be power efficient
enough to warrant wanting to prefer it over idle state 1. This is
because the power usage of the two states can be so close that
sufficient amounts of too deep state 1 sleeps can completely offset the
state 1 power saving to the point where it would've been more power
efficient to just use state 0 instead. This is, of course, for systems
where state 0 is not a polling state, such as arm-based devices.
Sleeps that happened in state 0 while they could have used state 1 ('too
shallow') only save less power than they otherwise could have. Too deep
sleeps, on the other hand, harm performance and nullify the potential
power saving from using state 1 in the first place. While taking this
into account, it is clear that on balance it is preferable for an idle
governor to have more too shallow sleeps instead of more too deep sleeps
on those kinds of platforms.
This patch specifically tunes TEO to prefer shallower idle states in
order to reduce wakeup latency and achieve better performance.
To this end, before selecting the next idle state it uses the avg_util
signal of a CPU's runqueue in order to determine to what extent the CPU
is being utilized. This util value is then compared to a threshold
defined as a percentage of the CPU's capacity (capacity >> 6 ie. ~1.5%
in the current implementation). If the util is above the threshold, the
index of the idle state selected by TEO metrics will be reduced by 1,
thus selecting a shallower state. If the util is below the threshold,
the governor defaults to the TEO metrics mechanism to try to select the
deepest available idle state based on the closest timer event and its
own correctness.
The main goal of this is to reduce latency and increase performance for
some workloads. Under some workloads it will result in an increase in
power usage (Geekbench 5) while for other workloads it will also result
in a decrease in power usage compared to TEO (PCMark Web, Jankbench,
Speedometer).
It can provide drastically decreased latency and performance benefits in
certain types of workloads that are sensitive to latency.
Example test results:
1. GB5 (better score, latency & more power usage)
| metric | menu | teo | teo-util-aware |
| ------------------------------------- | -------------- | ----------------- | ----------------- |
| gmean score | 2826.5 (0.0%) | 2764.8 (-2.18%) | 2865 (1.36%) |
| gmean power usage [mW] | 2551.4 (0.0%) | 2606.8 (2.17%) | 2722.3 (6.7%) |
| gmean too deep % | 14.99% | 9.65% | 4.02% |
| gmean too shallow % | 2.5% | 5.96% | 14.59% |
| gmean task wakeup latency (asynctask) | 78.16μs (0.0%) | 61.60μs (-21.19%) | 54.45μs (-30.34%) |
2. Jankbench (better score, latency & less power usage)
| metric | menu | teo | teo-util-aware |
| ------------------------------------- | -------------- | ----------------- | ----------------- |
| gmean frame duration | 13.9 (0.0%) | 14.7 (6.0%) | 12.6 (-9.0%) |
| gmean jank percentage | 1.5 (0.0%) | 2.1 (36.99%) | 1.3 (-17.37%) |
| gmean power usage [mW] | 144.6 (0.0%) | 136.9 (-5.27%) | 121.3 (-16.08%) |
| gmean too deep % | 26.00% | 11.00% | 2.54% |
| gmean too shallow % | 4.74% | 11.89% | 21.93% |
| gmean wakeup latency (RenderThread) | 139.5μs (0.0%) | 116.5μs (-16.49%) | 91.11μs (-34.7%) |
| gmean wakeup latency (surfaceflinger) | 124.0μs (0.0%) | 151.9μs (22.47%) | 87.65μs (-29.33%) |
Signed-off-by: Kajetan Puchalski <kajetan.puchalski@arm.com>
[ rjw: Comment edits and white space adjustments ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-01-05 17:51:59 +03:00
cpu_data - > utilized = teo_cpu_is_utilized ( dev - > cpu , cpu_data ) ;
/*
* If the CPU is being utilized over the threshold and there are only 2
* states to choose from , the metrics need not be considered , so choose
* the shallowest non - polling state and exit .
*/
if ( drv - > state_count < 3 & & cpu_data - > utilized ) {
for ( i = 0 ; i < drv - > state_count ; + + i ) {
if ( ! dev - > states_usage [ i ] . disable & &
! ( drv - > states [ i ] . flags & CPUIDLE_FLAG_POLLING ) ) {
idx = i ;
goto end ;
}
}
}
cpuidle: teo: Change the main idle state selection logic
Two aspects of the current main idle state selection logic in the
TEO (Timer Events Oriented) cpuidle governor are quite questionable.
First of all, the "hits" and "misses" metrics used by it are only
updated for a given idle state if the time till the next timer event
("sleep length") is between the target residency of that state and
the target residency of the next one. Consequently, they are likely
to become stale if the sleep length tends to fall outside that
interval which increases the likelihood of subomtimal idle state
selection.
Second, the decision on whether or not to select the idle state
"matching" the sleep length is based on the metrics collected for
that state alone, whereas in principle the metrics collected for
the other idle states should be taken into consideration when that
decision is made. For example, if the measured idle duration is less
than the target residency of the idle state "matching" the sleep
length, then it is also less than the target residency of any deeper
idle state and that should be taken into account when considering
whether or not to select any of those states, but currently it is
not.
In order to address the above shortcomings, modify the main idle
state selection logic in the TEO governor to take the metrics
collected for all of the idle states into account when deciding
whether or not to select the one "matching" the sleep length.
Moreover, drop the "misses" metric that becomes redundant after the
above change and rename the "early_hits" metric to "intercepts" so
that its role is better reflected by its name (the idea being that
if a CPU wakes up earlier than indicated by the sleep length, then
it must be a result of a non-timer interrupt that "intercepts" the
CPU).
Also rename the states[] array in struct struct teo_cpu to
state_bins[] to avoid confusing it with the states[] array in
struct cpuidle_driver and update the documentation to match the
new code (and make it more comprehensive while at it).
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-06-02 21:16:32 +03:00
/*
* Find the deepest idle state whose target residency does not exceed
* the current sleep length and the deepest idle state not deeper than
* the former whose exit latency does not exceed the current latency
* constraint . Compute the sums of metrics for early wakeup pattern
* detection .
*/
for ( i = 1 ; i < drv - > state_count ; i + + ) {
struct teo_bin * prev_bin = & cpu_data - > state_bins [ i - 1 ] ;
struct cpuidle_state * s = & drv - > states [ i ] ;
cpuidle: teo: Fix "early hits" handling for disabled idle states
The TEO governor uses idle duration "bins" defined in accordance with
the CPU idle states table provided by the driver, so that each "bin"
covers the idle duration range between the target residency of the
idle state corresponding to it and the target residency of the closest
deeper idle state. The governor collects statistics for each bin
regardless of whether or not the idle state corresponding to it is
currently enabled.
In particular, the "early hits" metric measures the likelihood of a
situation in which the idle duration measured after wakeup falls into
to given bin, but the time till the next timer (sleep length) falls
into a bin corresponding to one of the deeper idle states. It is
used when the "hits" and "misses" metrics indicate that the state
"matching" the sleep length should not be selected, so that the state
with the maximum "early hits" value is selected instead of it.
If the idle state corresponding to the given bin is disabled, it
cannot be selected and if it turns out to be the one that should be
selected, a shallower idle state needs to be used instead of it.
Nevertheless, the metrics collected for the bin corresponding to it
are still valid and need to be taken into account as though that
state had not been disabled.
As far as the "early hits" metric is concerned, teo_select() tries to
take disabled states into account, but the state index corresponding
to the maximum "early hits" value computed by it may be incorrect.
Namely, it always uses the index of the previous maximum "early hits"
state then, but there may be enabled idle states closer to the
disabled one in question. In particular, if the current candidate
state (whose index is the idx value) is closer to the disabled one
and the "early hits" value of the disabled state is greater than the
current maximum, the index of the current candidate state (idx)
should replace the "maximum early hits state" index.
Modify the code to handle that case correctly.
Fixes: b26bf6ab716f ("cpuidle: New timer events oriented governor for tickless systems")
Reported-by: Doug Smythies <dsmythies@telus.net>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: 5.1+ <stable@vger.kernel.org> # 5.1+
2019-10-11 00:37:39 +03:00
cpuidle: teo: Change the main idle state selection logic
Two aspects of the current main idle state selection logic in the
TEO (Timer Events Oriented) cpuidle governor are quite questionable.
First of all, the "hits" and "misses" metrics used by it are only
updated for a given idle state if the time till the next timer event
("sleep length") is between the target residency of that state and
the target residency of the next one. Consequently, they are likely
to become stale if the sleep length tends to fall outside that
interval which increases the likelihood of subomtimal idle state
selection.
Second, the decision on whether or not to select the idle state
"matching" the sleep length is based on the metrics collected for
that state alone, whereas in principle the metrics collected for
the other idle states should be taken into consideration when that
decision is made. For example, if the measured idle duration is less
than the target residency of the idle state "matching" the sleep
length, then it is also less than the target residency of any deeper
idle state and that should be taken into account when considering
whether or not to select any of those states, but currently it is
not.
In order to address the above shortcomings, modify the main idle
state selection logic in the TEO governor to take the metrics
collected for all of the idle states into account when deciding
whether or not to select the one "matching" the sleep length.
Moreover, drop the "misses" metric that becomes redundant after the
above change and rename the "early_hits" metric to "intercepts" so
that its role is better reflected by its name (the idea being that
if a CPU wakes up earlier than indicated by the sleep length, then
it must be a result of a non-timer interrupt that "intercepts" the
CPU).
Also rename the states[] array in struct struct teo_cpu to
state_bins[] to avoid confusing it with the states[] array in
struct cpuidle_driver and update the documentation to match the
new code (and make it more comprehensive while at it).
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-06-02 21:16:32 +03:00
/*
* Update the sums of idle state mertics for all of the states
* shallower than the current one .
*/
intercept_sum + = prev_bin - > intercepts ;
hit_sum + = prev_bin - > hits ;
2021-06-02 21:17:18 +03:00
recent_sum + = prev_bin - > recent ;
cpuidle: New timer events oriented governor for tickless systems
The venerable menu governor does some things that are quite
questionable in my view.
First, it includes timer wakeups in the pattern detection data and
mixes them up with wakeups from other sources which in some cases
causes it to expect what essentially would be a timer wakeup in a
time frame in which no timer wakeups are possible (because it knows
the time until the next timer event and that is later than the
expected wakeup time).
Second, it uses the extra exit latency limit based on the predicted
idle duration and depending on the number of tasks waiting on I/O,
even though those tasks may run on a different CPU when they are
woken up. Moreover, the time ranges used by it for the sleep length
correction factors depend on whether or not there are tasks waiting
on I/O, which again doesn't imply anything in particular, and they
are not correlated to the list of available idle states in any way
whatever.
Also, the pattern detection code in menu may end up considering
values that are too large to matter at all, in which cases running
it is a waste of time.
A major rework of the menu governor would be required to address
these issues and the performance of at least some workloads (tuned
specifically to the current behavior of the menu governor) is likely
to suffer from that. It is thus better to introduce an entirely new
governor without them and let everybody use the governor that works
better with their actual workloads.
The new governor introduced here, the timer events oriented (TEO)
governor, uses the same basic strategy as menu: it always tries to
find the deepest idle state that can be used in the given conditions.
However, it applies a different approach to that problem.
First, it doesn't use "correction factors" for the time till the
closest timer, but instead it tries to correlate the measured idle
duration values with the available idle states and use that
information to pick up the idle state that is most likely to "match"
the upcoming CPU idle interval.
Second, it doesn't take the number of "I/O waiters" into account at
all and the pattern detection code in it avoids taking timer wakeups
into account. It also only uses idle duration values less than the
current time till the closest timer (with the tick excluded) for that
purpose.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2019-01-04 14:30:47 +03:00
cpuidle: teo: Change the main idle state selection logic
Two aspects of the current main idle state selection logic in the
TEO (Timer Events Oriented) cpuidle governor are quite questionable.
First of all, the "hits" and "misses" metrics used by it are only
updated for a given idle state if the time till the next timer event
("sleep length") is between the target residency of that state and
the target residency of the next one. Consequently, they are likely
to become stale if the sleep length tends to fall outside that
interval which increases the likelihood of subomtimal idle state
selection.
Second, the decision on whether or not to select the idle state
"matching" the sleep length is based on the metrics collected for
that state alone, whereas in principle the metrics collected for
the other idle states should be taken into consideration when that
decision is made. For example, if the measured idle duration is less
than the target residency of the idle state "matching" the sleep
length, then it is also less than the target residency of any deeper
idle state and that should be taken into account when considering
whether or not to select any of those states, but currently it is
not.
In order to address the above shortcomings, modify the main idle
state selection logic in the TEO governor to take the metrics
collected for all of the idle states into account when deciding
whether or not to select the one "matching" the sleep length.
Moreover, drop the "misses" metric that becomes redundant after the
above change and rename the "early_hits" metric to "intercepts" so
that its role is better reflected by its name (the idea being that
if a CPU wakes up earlier than indicated by the sleep length, then
it must be a result of a non-timer interrupt that "intercepts" the
CPU).
Also rename the states[] array in struct struct teo_cpu to
state_bins[] to avoid confusing it with the states[] array in
struct cpuidle_driver and update the documentation to match the
new code (and make it more comprehensive while at it).
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-06-02 21:16:32 +03:00
if ( dev - > states_usage [ i ] . disable )
cpuidle: New timer events oriented governor for tickless systems
The venerable menu governor does some things that are quite
questionable in my view.
First, it includes timer wakeups in the pattern detection data and
mixes them up with wakeups from other sources which in some cases
causes it to expect what essentially would be a timer wakeup in a
time frame in which no timer wakeups are possible (because it knows
the time until the next timer event and that is later than the
expected wakeup time).
Second, it uses the extra exit latency limit based on the predicted
idle duration and depending on the number of tasks waiting on I/O,
even though those tasks may run on a different CPU when they are
woken up. Moreover, the time ranges used by it for the sleep length
correction factors depend on whether or not there are tasks waiting
on I/O, which again doesn't imply anything in particular, and they
are not correlated to the list of available idle states in any way
whatever.
Also, the pattern detection code in menu may end up considering
values that are too large to matter at all, in which cases running
it is a waste of time.
A major rework of the menu governor would be required to address
these issues and the performance of at least some workloads (tuned
specifically to the current behavior of the menu governor) is likely
to suffer from that. It is thus better to introduce an entirely new
governor without them and let everybody use the governor that works
better with their actual workloads.
The new governor introduced here, the timer events oriented (TEO)
governor, uses the same basic strategy as menu: it always tries to
find the deepest idle state that can be used in the given conditions.
However, it applies a different approach to that problem.
First, it doesn't use "correction factors" for the time till the
closest timer, but instead it tries to correlate the measured idle
duration values with the available idle states and use that
information to pick up the idle state that is most likely to "match"
the upcoming CPU idle interval.
Second, it doesn't take the number of "I/O waiters" into account at
all and the pattern detection code in it avoids taking timer wakeups
into account. It also only uses idle duration values less than the
current time till the closest timer (with the tick excluded) for that
purpose.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2019-01-04 14:30:47 +03:00
continue ;
cpuidle: teo: Consider hits and misses metrics of disabled states
The TEO governor uses idle duration "bins" defined in accordance with
the CPU idle states table provided by the driver, so that each "bin"
covers the idle duration range between the target residency of the
idle state corresponding to it and the target residency of the closest
deeper idle state. The governor collects statistics for each bin
regardless of whether or not the idle state corresponding to it is
currently enabled.
In particular, the "hits" and "misses" metrics measure the likelihood
of a situation in which both the time till the next timer (sleep
length) and the idle duration measured after wakeup fall into the
given bin. Namely, if the "hits" value is greater than the "misses"
one, that situation is more likely than the one in which the sleep
length falls into the given bin, but the idle duration measured after
wakeup falls into a bin corresponding to one of the shallower idle
states.
If the idle state corresponding to the given bin is disabled, it
cannot be selected and if it turns out to be the one that should be
selected, a shallower idle state needs to be used instead of it.
Nevertheless, the metrics collected for the bin corresponding to it
are still valid and need to be taken into account as though that
state had not been disabled.
For this reason, make teo_select() always use the "hits" and "misses"
values of the idle duration range that the sleep length falls into
even if the specific idle state corresponding to it is disabled and
if the "hits" values is greater than the "misses" one, select the
closest enabled shallower idle state in that case.
Fixes: b26bf6ab716f ("cpuidle: New timer events oriented governor for tickless systems")
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: 5.1+ <stable@vger.kernel.org> # 5.1+
2019-10-11 00:36:15 +03:00
if ( idx < 0 ) {
cpuidle: New timer events oriented governor for tickless systems
The venerable menu governor does some things that are quite
questionable in my view.
First, it includes timer wakeups in the pattern detection data and
mixes them up with wakeups from other sources which in some cases
causes it to expect what essentially would be a timer wakeup in a
time frame in which no timer wakeups are possible (because it knows
the time until the next timer event and that is later than the
expected wakeup time).
Second, it uses the extra exit latency limit based on the predicted
idle duration and depending on the number of tasks waiting on I/O,
even though those tasks may run on a different CPU when they are
woken up. Moreover, the time ranges used by it for the sleep length
correction factors depend on whether or not there are tasks waiting
on I/O, which again doesn't imply anything in particular, and they
are not correlated to the list of available idle states in any way
whatever.
Also, the pattern detection code in menu may end up considering
values that are too large to matter at all, in which cases running
it is a waste of time.
A major rework of the menu governor would be required to address
these issues and the performance of at least some workloads (tuned
specifically to the current behavior of the menu governor) is likely
to suffer from that. It is thus better to introduce an entirely new
governor without them and let everybody use the governor that works
better with their actual workloads.
The new governor introduced here, the timer events oriented (TEO)
governor, uses the same basic strategy as menu: it always tries to
find the deepest idle state that can be used in the given conditions.
However, it applies a different approach to that problem.
First, it doesn't use "correction factors" for the time till the
closest timer, but instead it tries to correlate the measured idle
duration values with the available idle states and use that
information to pick up the idle state that is most likely to "match"
the upcoming CPU idle interval.
Second, it doesn't take the number of "I/O waiters" into account at
all and the pattern detection code in it avoids taking timer wakeups
into account. It also only uses idle duration values less than the
current time till the closest timer (with the tick excluded) for that
purpose.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2019-01-04 14:30:47 +03:00
idx = i ; /* first enabled state */
2021-03-29 21:21:43 +03:00
idx0 = i ;
cpuidle: teo: Consider hits and misses metrics of disabled states
The TEO governor uses idle duration "bins" defined in accordance with
the CPU idle states table provided by the driver, so that each "bin"
covers the idle duration range between the target residency of the
idle state corresponding to it and the target residency of the closest
deeper idle state. The governor collects statistics for each bin
regardless of whether or not the idle state corresponding to it is
currently enabled.
In particular, the "hits" and "misses" metrics measure the likelihood
of a situation in which both the time till the next timer (sleep
length) and the idle duration measured after wakeup fall into the
given bin. Namely, if the "hits" value is greater than the "misses"
one, that situation is more likely than the one in which the sleep
length falls into the given bin, but the idle duration measured after
wakeup falls into a bin corresponding to one of the shallower idle
states.
If the idle state corresponding to the given bin is disabled, it
cannot be selected and if it turns out to be the one that should be
selected, a shallower idle state needs to be used instead of it.
Nevertheless, the metrics collected for the bin corresponding to it
are still valid and need to be taken into account as though that
state had not been disabled.
For this reason, make teo_select() always use the "hits" and "misses"
values of the idle duration range that the sleep length falls into
even if the specific idle state corresponding to it is disabled and
if the "hits" values is greater than the "misses" one, select the
closest enabled shallower idle state in that case.
Fixes: b26bf6ab716f ("cpuidle: New timer events oriented governor for tickless systems")
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: 5.1+ <stable@vger.kernel.org> # 5.1+
2019-10-11 00:36:15 +03:00
}
cpuidle: New timer events oriented governor for tickless systems
The venerable menu governor does some things that are quite
questionable in my view.
First, it includes timer wakeups in the pattern detection data and
mixes them up with wakeups from other sources which in some cases
causes it to expect what essentially would be a timer wakeup in a
time frame in which no timer wakeups are possible (because it knows
the time until the next timer event and that is later than the
expected wakeup time).
Second, it uses the extra exit latency limit based on the predicted
idle duration and depending on the number of tasks waiting on I/O,
even though those tasks may run on a different CPU when they are
woken up. Moreover, the time ranges used by it for the sleep length
correction factors depend on whether or not there are tasks waiting
on I/O, which again doesn't imply anything in particular, and they
are not correlated to the list of available idle states in any way
whatever.
Also, the pattern detection code in menu may end up considering
values that are too large to matter at all, in which cases running
it is a waste of time.
A major rework of the menu governor would be required to address
these issues and the performance of at least some workloads (tuned
specifically to the current behavior of the menu governor) is likely
to suffer from that. It is thus better to introduce an entirely new
governor without them and let everybody use the governor that works
better with their actual workloads.
The new governor introduced here, the timer events oriented (TEO)
governor, uses the same basic strategy as menu: it always tries to
find the deepest idle state that can be used in the given conditions.
However, it applies a different approach to that problem.
First, it doesn't use "correction factors" for the time till the
closest timer, but instead it tries to correlate the measured idle
duration values with the available idle states and use that
information to pick up the idle state that is most likely to "match"
the upcoming CPU idle interval.
Second, it doesn't take the number of "I/O waiters" into account at
all and the pattern detection code in it avoids taking timer wakeups
into account. It also only uses idle duration values less than the
current time till the closest timer (with the tick excluded) for that
purpose.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2019-01-04 14:30:47 +03:00
2019-11-07 17:25:12 +03:00
if ( s - > target_residency_ns > duration_ns )
cpuidle: New timer events oriented governor for tickless systems
The venerable menu governor does some things that are quite
questionable in my view.
First, it includes timer wakeups in the pattern detection data and
mixes them up with wakeups from other sources which in some cases
causes it to expect what essentially would be a timer wakeup in a
time frame in which no timer wakeups are possible (because it knows
the time until the next timer event and that is later than the
expected wakeup time).
Second, it uses the extra exit latency limit based on the predicted
idle duration and depending on the number of tasks waiting on I/O,
even though those tasks may run on a different CPU when they are
woken up. Moreover, the time ranges used by it for the sleep length
correction factors depend on whether or not there are tasks waiting
on I/O, which again doesn't imply anything in particular, and they
are not correlated to the list of available idle states in any way
whatever.
Also, the pattern detection code in menu may end up considering
values that are too large to matter at all, in which cases running
it is a waste of time.
A major rework of the menu governor would be required to address
these issues and the performance of at least some workloads (tuned
specifically to the current behavior of the menu governor) is likely
to suffer from that. It is thus better to introduce an entirely new
governor without them and let everybody use the governor that works
better with their actual workloads.
The new governor introduced here, the timer events oriented (TEO)
governor, uses the same basic strategy as menu: it always tries to
find the deepest idle state that can be used in the given conditions.
However, it applies a different approach to that problem.
First, it doesn't use "correction factors" for the time till the
closest timer, but instead it tries to correlate the measured idle
duration values with the available idle states and use that
information to pick up the idle state that is most likely to "match"
the upcoming CPU idle interval.
Second, it doesn't take the number of "I/O waiters" into account at
all and the pattern detection code in it avoids taking timer wakeups
into account. It also only uses idle duration values less than the
current time till the closest timer (with the tick excluded) for that
purpose.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2019-01-04 14:30:47 +03:00
break ;
cpuidle: teo: Change the main idle state selection logic
Two aspects of the current main idle state selection logic in the
TEO (Timer Events Oriented) cpuidle governor are quite questionable.
First of all, the "hits" and "misses" metrics used by it are only
updated for a given idle state if the time till the next timer event
("sleep length") is between the target residency of that state and
the target residency of the next one. Consequently, they are likely
to become stale if the sleep length tends to fall outside that
interval which increases the likelihood of subomtimal idle state
selection.
Second, the decision on whether or not to select the idle state
"matching" the sleep length is based on the metrics collected for
that state alone, whereas in principle the metrics collected for
the other idle states should be taken into consideration when that
decision is made. For example, if the measured idle duration is less
than the target residency of the idle state "matching" the sleep
length, then it is also less than the target residency of any deeper
idle state and that should be taken into account when considering
whether or not to select any of those states, but currently it is
not.
In order to address the above shortcomings, modify the main idle
state selection logic in the TEO governor to take the metrics
collected for all of the idle states into account when deciding
whether or not to select the one "matching" the sleep length.
Moreover, drop the "misses" metric that becomes redundant after the
above change and rename the "early_hits" metric to "intercepts" so
that its role is better reflected by its name (the idea being that
if a CPU wakes up earlier than indicated by the sleep length, then
it must be a result of a non-timer interrupt that "intercepts" the
CPU).
Also rename the states[] array in struct struct teo_cpu to
state_bins[] to avoid confusing it with the states[] array in
struct cpuidle_driver and update the documentation to match the
new code (and make it more comprehensive while at it).
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-06-02 21:16:32 +03:00
idx = i ;
if ( s - > exit_latency_ns < = latency_req )
2019-07-19 13:12:42 +03:00
constraint_idx = i ;
cpuidle: New timer events oriented governor for tickless systems
The venerable menu governor does some things that are quite
questionable in my view.
First, it includes timer wakeups in the pattern detection data and
mixes them up with wakeups from other sources which in some cases
causes it to expect what essentially would be a timer wakeup in a
time frame in which no timer wakeups are possible (because it knows
the time until the next timer event and that is later than the
expected wakeup time).
Second, it uses the extra exit latency limit based on the predicted
idle duration and depending on the number of tasks waiting on I/O,
even though those tasks may run on a different CPU when they are
woken up. Moreover, the time ranges used by it for the sleep length
correction factors depend on whether or not there are tasks waiting
on I/O, which again doesn't imply anything in particular, and they
are not correlated to the list of available idle states in any way
whatever.
Also, the pattern detection code in menu may end up considering
values that are too large to matter at all, in which cases running
it is a waste of time.
A major rework of the menu governor would be required to address
these issues and the performance of at least some workloads (tuned
specifically to the current behavior of the menu governor) is likely
to suffer from that. It is thus better to introduce an entirely new
governor without them and let everybody use the governor that works
better with their actual workloads.
The new governor introduced here, the timer events oriented (TEO)
governor, uses the same basic strategy as menu: it always tries to
find the deepest idle state that can be used in the given conditions.
However, it applies a different approach to that problem.
First, it doesn't use "correction factors" for the time till the
closest timer, but instead it tries to correlate the measured idle
duration values with the available idle states and use that
information to pick up the idle state that is most likely to "match"
the upcoming CPU idle interval.
Second, it doesn't take the number of "I/O waiters" into account at
all and the pattern detection code in it avoids taking timer wakeups
into account. It also only uses idle duration values less than the
current time till the closest timer (with the tick excluded) for that
purpose.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2019-01-04 14:30:47 +03:00
cpuidle: teo: Change the main idle state selection logic
Two aspects of the current main idle state selection logic in the
TEO (Timer Events Oriented) cpuidle governor are quite questionable.
First of all, the "hits" and "misses" metrics used by it are only
updated for a given idle state if the time till the next timer event
("sleep length") is between the target residency of that state and
the target residency of the next one. Consequently, they are likely
to become stale if the sleep length tends to fall outside that
interval which increases the likelihood of subomtimal idle state
selection.
Second, the decision on whether or not to select the idle state
"matching" the sleep length is based on the metrics collected for
that state alone, whereas in principle the metrics collected for
the other idle states should be taken into consideration when that
decision is made. For example, if the measured idle duration is less
than the target residency of the idle state "matching" the sleep
length, then it is also less than the target residency of any deeper
idle state and that should be taken into account when considering
whether or not to select any of those states, but currently it is
not.
In order to address the above shortcomings, modify the main idle
state selection logic in the TEO governor to take the metrics
collected for all of the idle states into account when deciding
whether or not to select the one "matching" the sleep length.
Moreover, drop the "misses" metric that becomes redundant after the
above change and rename the "early_hits" metric to "intercepts" so
that its role is better reflected by its name (the idea being that
if a CPU wakes up earlier than indicated by the sleep length, then
it must be a result of a non-timer interrupt that "intercepts" the
CPU).
Also rename the states[] array in struct struct teo_cpu to
state_bins[] to avoid confusing it with the states[] array in
struct cpuidle_driver and update the documentation to match the
new code (and make it more comprehensive while at it).
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-06-02 21:16:32 +03:00
idx_intercept_sum = intercept_sum ;
idx_hit_sum = hit_sum ;
2021-06-02 21:17:18 +03:00
idx_recent_sum = recent_sum ;
cpuidle: teo: Change the main idle state selection logic
Two aspects of the current main idle state selection logic in the
TEO (Timer Events Oriented) cpuidle governor are quite questionable.
First of all, the "hits" and "misses" metrics used by it are only
updated for a given idle state if the time till the next timer event
("sleep length") is between the target residency of that state and
the target residency of the next one. Consequently, they are likely
to become stale if the sleep length tends to fall outside that
interval which increases the likelihood of subomtimal idle state
selection.
Second, the decision on whether or not to select the idle state
"matching" the sleep length is based on the metrics collected for
that state alone, whereas in principle the metrics collected for
the other idle states should be taken into consideration when that
decision is made. For example, if the measured idle duration is less
than the target residency of the idle state "matching" the sleep
length, then it is also less than the target residency of any deeper
idle state and that should be taken into account when considering
whether or not to select any of those states, but currently it is
not.
In order to address the above shortcomings, modify the main idle
state selection logic in the TEO governor to take the metrics
collected for all of the idle states into account when deciding
whether or not to select the one "matching" the sleep length.
Moreover, drop the "misses" metric that becomes redundant after the
above change and rename the "early_hits" metric to "intercepts" so
that its role is better reflected by its name (the idea being that
if a CPU wakes up earlier than indicated by the sleep length, then
it must be a result of a non-timer interrupt that "intercepts" the
CPU).
Also rename the states[] array in struct struct teo_cpu to
state_bins[] to avoid confusing it with the states[] array in
struct cpuidle_driver and update the documentation to match the
new code (and make it more comprehensive while at it).
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-06-02 21:16:32 +03:00
}
/* Avoid unnecessary overhead. */
if ( idx < 0 ) {
idx = 0 ; /* No states enabled, must use 0. */
goto end ;
} else if ( idx = = idx0 ) {
goto end ;
cpuidle: New timer events oriented governor for tickless systems
The venerable menu governor does some things that are quite
questionable in my view.
First, it includes timer wakeups in the pattern detection data and
mixes them up with wakeups from other sources which in some cases
causes it to expect what essentially would be a timer wakeup in a
time frame in which no timer wakeups are possible (because it knows
the time until the next timer event and that is later than the
expected wakeup time).
Second, it uses the extra exit latency limit based on the predicted
idle duration and depending on the number of tasks waiting on I/O,
even though those tasks may run on a different CPU when they are
woken up. Moreover, the time ranges used by it for the sleep length
correction factors depend on whether or not there are tasks waiting
on I/O, which again doesn't imply anything in particular, and they
are not correlated to the list of available idle states in any way
whatever.
Also, the pattern detection code in menu may end up considering
values that are too large to matter at all, in which cases running
it is a waste of time.
A major rework of the menu governor would be required to address
these issues and the performance of at least some workloads (tuned
specifically to the current behavior of the menu governor) is likely
to suffer from that. It is thus better to introduce an entirely new
governor without them and let everybody use the governor that works
better with their actual workloads.
The new governor introduced here, the timer events oriented (TEO)
governor, uses the same basic strategy as menu: it always tries to
find the deepest idle state that can be used in the given conditions.
However, it applies a different approach to that problem.
First, it doesn't use "correction factors" for the time till the
closest timer, but instead it tries to correlate the measured idle
duration values with the available idle states and use that
information to pick up the idle state that is most likely to "match"
the upcoming CPU idle interval.
Second, it doesn't take the number of "I/O waiters" into account at
all and the pattern detection code in it avoids taking timer wakeups
into account. It also only uses idle duration values less than the
current time till the closest timer (with the tick excluded) for that
purpose.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2019-01-04 14:30:47 +03:00
}
/*
cpuidle: teo: Change the main idle state selection logic
Two aspects of the current main idle state selection logic in the
TEO (Timer Events Oriented) cpuidle governor are quite questionable.
First of all, the "hits" and "misses" metrics used by it are only
updated for a given idle state if the time till the next timer event
("sleep length") is between the target residency of that state and
the target residency of the next one. Consequently, they are likely
to become stale if the sleep length tends to fall outside that
interval which increases the likelihood of subomtimal idle state
selection.
Second, the decision on whether or not to select the idle state
"matching" the sleep length is based on the metrics collected for
that state alone, whereas in principle the metrics collected for
the other idle states should be taken into consideration when that
decision is made. For example, if the measured idle duration is less
than the target residency of the idle state "matching" the sleep
length, then it is also less than the target residency of any deeper
idle state and that should be taken into account when considering
whether or not to select any of those states, but currently it is
not.
In order to address the above shortcomings, modify the main idle
state selection logic in the TEO governor to take the metrics
collected for all of the idle states into account when deciding
whether or not to select the one "matching" the sleep length.
Moreover, drop the "misses" metric that becomes redundant after the
above change and rename the "early_hits" metric to "intercepts" so
that its role is better reflected by its name (the idea being that
if a CPU wakes up earlier than indicated by the sleep length, then
it must be a result of a non-timer interrupt that "intercepts" the
CPU).
Also rename the states[] array in struct struct teo_cpu to
state_bins[] to avoid confusing it with the states[] array in
struct cpuidle_driver and update the documentation to match the
new code (and make it more comprehensive while at it).
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-06-02 21:16:32 +03:00
* If the sum of the intercepts metric for all of the idle states
* shallower than the current candidate one ( idx ) is greater than the
* sum of the intercepts and hits metrics for the candidate state and
2021-06-02 21:17:18 +03:00
* all of the deeper states , or the sum of the numbers of recent
* intercepts over all of the states shallower than the candidate one
* is greater than a half of the number of recent events taken into
* account , the CPU is likely to wake up early , so find an alternative
* idle state to select .
cpuidle: New timer events oriented governor for tickless systems
The venerable menu governor does some things that are quite
questionable in my view.
First, it includes timer wakeups in the pattern detection data and
mixes them up with wakeups from other sources which in some cases
causes it to expect what essentially would be a timer wakeup in a
time frame in which no timer wakeups are possible (because it knows
the time until the next timer event and that is later than the
expected wakeup time).
Second, it uses the extra exit latency limit based on the predicted
idle duration and depending on the number of tasks waiting on I/O,
even though those tasks may run on a different CPU when they are
woken up. Moreover, the time ranges used by it for the sleep length
correction factors depend on whether or not there are tasks waiting
on I/O, which again doesn't imply anything in particular, and they
are not correlated to the list of available idle states in any way
whatever.
Also, the pattern detection code in menu may end up considering
values that are too large to matter at all, in which cases running
it is a waste of time.
A major rework of the menu governor would be required to address
these issues and the performance of at least some workloads (tuned
specifically to the current behavior of the menu governor) is likely
to suffer from that. It is thus better to introduce an entirely new
governor without them and let everybody use the governor that works
better with their actual workloads.
The new governor introduced here, the timer events oriented (TEO)
governor, uses the same basic strategy as menu: it always tries to
find the deepest idle state that can be used in the given conditions.
However, it applies a different approach to that problem.
First, it doesn't use "correction factors" for the time till the
closest timer, but instead it tries to correlate the measured idle
duration values with the available idle states and use that
information to pick up the idle state that is most likely to "match"
the upcoming CPU idle interval.
Second, it doesn't take the number of "I/O waiters" into account at
all and the pattern detection code in it avoids taking timer wakeups
into account. It also only uses idle duration values less than the
current time till the closest timer (with the tick excluded) for that
purpose.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2019-01-04 14:30:47 +03:00
*/
2021-06-02 21:17:18 +03:00
alt_intercepts = 2 * idx_intercept_sum > cpu_data - > total - idx_hit_sum ;
alt_recent = idx_recent_sum > NR_RECENT / 2 ;
if ( alt_recent | | alt_intercepts ) {
2021-07-30 17:38:52 +03:00
s64 first_suitable_span_ns = duration_ns ;
int first_suitable_idx = idx ;
cpuidle: teo: Change the main idle state selection logic
Two aspects of the current main idle state selection logic in the
TEO (Timer Events Oriented) cpuidle governor are quite questionable.
First of all, the "hits" and "misses" metrics used by it are only
updated for a given idle state if the time till the next timer event
("sleep length") is between the target residency of that state and
the target residency of the next one. Consequently, they are likely
to become stale if the sleep length tends to fall outside that
interval which increases the likelihood of subomtimal idle state
selection.
Second, the decision on whether or not to select the idle state
"matching" the sleep length is based on the metrics collected for
that state alone, whereas in principle the metrics collected for
the other idle states should be taken into consideration when that
decision is made. For example, if the measured idle duration is less
than the target residency of the idle state "matching" the sleep
length, then it is also less than the target residency of any deeper
idle state and that should be taken into account when considering
whether or not to select any of those states, but currently it is
not.
In order to address the above shortcomings, modify the main idle
state selection logic in the TEO governor to take the metrics
collected for all of the idle states into account when deciding
whether or not to select the one "matching" the sleep length.
Moreover, drop the "misses" metric that becomes redundant after the
above change and rename the "early_hits" metric to "intercepts" so
that its role is better reflected by its name (the idea being that
if a CPU wakes up earlier than indicated by the sleep length, then
it must be a result of a non-timer interrupt that "intercepts" the
CPU).
Also rename the states[] array in struct struct teo_cpu to
state_bins[] to avoid confusing it with the states[] array in
struct cpuidle_driver and update the documentation to match the
new code (and make it more comprehensive while at it).
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-06-02 21:16:32 +03:00
2019-11-13 03:03:24 +03:00
/*
cpuidle: teo: Change the main idle state selection logic
Two aspects of the current main idle state selection logic in the
TEO (Timer Events Oriented) cpuidle governor are quite questionable.
First of all, the "hits" and "misses" metrics used by it are only
updated for a given idle state if the time till the next timer event
("sleep length") is between the target residency of that state and
the target residency of the next one. Consequently, they are likely
to become stale if the sleep length tends to fall outside that
interval which increases the likelihood of subomtimal idle state
selection.
Second, the decision on whether or not to select the idle state
"matching" the sleep length is based on the metrics collected for
that state alone, whereas in principle the metrics collected for
the other idle states should be taken into consideration when that
decision is made. For example, if the measured idle duration is less
than the target residency of the idle state "matching" the sleep
length, then it is also less than the target residency of any deeper
idle state and that should be taken into account when considering
whether or not to select any of those states, but currently it is
not.
In order to address the above shortcomings, modify the main idle
state selection logic in the TEO governor to take the metrics
collected for all of the idle states into account when deciding
whether or not to select the one "matching" the sleep length.
Moreover, drop the "misses" metric that becomes redundant after the
above change and rename the "early_hits" metric to "intercepts" so
that its role is better reflected by its name (the idea being that
if a CPU wakes up earlier than indicated by the sleep length, then
it must be a result of a non-timer interrupt that "intercepts" the
CPU).
Also rename the states[] array in struct struct teo_cpu to
state_bins[] to avoid confusing it with the states[] array in
struct cpuidle_driver and update the documentation to match the
new code (and make it more comprehensive while at it).
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-06-02 21:16:32 +03:00
* Look for the deepest idle state whose target residency had
* not exceeded the idle duration in over a half of the relevant
2021-06-02 21:17:18 +03:00
* cases ( both with respect to intercepts overall and with
* respect to the recent intercepts only ) in the past .
cpuidle: teo: Change the main idle state selection logic
Two aspects of the current main idle state selection logic in the
TEO (Timer Events Oriented) cpuidle governor are quite questionable.
First of all, the "hits" and "misses" metrics used by it are only
updated for a given idle state if the time till the next timer event
("sleep length") is between the target residency of that state and
the target residency of the next one. Consequently, they are likely
to become stale if the sleep length tends to fall outside that
interval which increases the likelihood of subomtimal idle state
selection.
Second, the decision on whether or not to select the idle state
"matching" the sleep length is based on the metrics collected for
that state alone, whereas in principle the metrics collected for
the other idle states should be taken into consideration when that
decision is made. For example, if the measured idle duration is less
than the target residency of the idle state "matching" the sleep
length, then it is also less than the target residency of any deeper
idle state and that should be taken into account when considering
whether or not to select any of those states, but currently it is
not.
In order to address the above shortcomings, modify the main idle
state selection logic in the TEO governor to take the metrics
collected for all of the idle states into account when deciding
whether or not to select the one "matching" the sleep length.
Moreover, drop the "misses" metric that becomes redundant after the
above change and rename the "early_hits" metric to "intercepts" so
that its role is better reflected by its name (the idea being that
if a CPU wakes up earlier than indicated by the sleep length, then
it must be a result of a non-timer interrupt that "intercepts" the
CPU).
Also rename the states[] array in struct struct teo_cpu to
state_bins[] to avoid confusing it with the states[] array in
struct cpuidle_driver and update the documentation to match the
new code (and make it more comprehensive while at it).
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-06-02 21:16:32 +03:00
*
* Take the possible latency constraint and duration limitation
* present if the tick has been stopped already into account .
2019-11-13 03:03:24 +03:00
*/
cpuidle: teo: Change the main idle state selection logic
Two aspects of the current main idle state selection logic in the
TEO (Timer Events Oriented) cpuidle governor are quite questionable.
First of all, the "hits" and "misses" metrics used by it are only
updated for a given idle state if the time till the next timer event
("sleep length") is between the target residency of that state and
the target residency of the next one. Consequently, they are likely
to become stale if the sleep length tends to fall outside that
interval which increases the likelihood of subomtimal idle state
selection.
Second, the decision on whether or not to select the idle state
"matching" the sleep length is based on the metrics collected for
that state alone, whereas in principle the metrics collected for
the other idle states should be taken into consideration when that
decision is made. For example, if the measured idle duration is less
than the target residency of the idle state "matching" the sleep
length, then it is also less than the target residency of any deeper
idle state and that should be taken into account when considering
whether or not to select any of those states, but currently it is
not.
In order to address the above shortcomings, modify the main idle
state selection logic in the TEO governor to take the metrics
collected for all of the idle states into account when deciding
whether or not to select the one "matching" the sleep length.
Moreover, drop the "misses" metric that becomes redundant after the
above change and rename the "early_hits" metric to "intercepts" so
that its role is better reflected by its name (the idea being that
if a CPU wakes up earlier than indicated by the sleep length, then
it must be a result of a non-timer interrupt that "intercepts" the
CPU).
Also rename the states[] array in struct struct teo_cpu to
state_bins[] to avoid confusing it with the states[] array in
struct cpuidle_driver and update the documentation to match the
new code (and make it more comprehensive while at it).
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-06-02 21:16:32 +03:00
intercept_sum = 0 ;
2021-06-02 21:17:18 +03:00
recent_sum = 0 ;
cpuidle: teo: Change the main idle state selection logic
Two aspects of the current main idle state selection logic in the
TEO (Timer Events Oriented) cpuidle governor are quite questionable.
First of all, the "hits" and "misses" metrics used by it are only
updated for a given idle state if the time till the next timer event
("sleep length") is between the target residency of that state and
the target residency of the next one. Consequently, they are likely
to become stale if the sleep length tends to fall outside that
interval which increases the likelihood of subomtimal idle state
selection.
Second, the decision on whether or not to select the idle state
"matching" the sleep length is based on the metrics collected for
that state alone, whereas in principle the metrics collected for
the other idle states should be taken into consideration when that
decision is made. For example, if the measured idle duration is less
than the target residency of the idle state "matching" the sleep
length, then it is also less than the target residency of any deeper
idle state and that should be taken into account when considering
whether or not to select any of those states, but currently it is
not.
In order to address the above shortcomings, modify the main idle
state selection logic in the TEO governor to take the metrics
collected for all of the idle states into account when deciding
whether or not to select the one "matching" the sleep length.
Moreover, drop the "misses" metric that becomes redundant after the
above change and rename the "early_hits" metric to "intercepts" so
that its role is better reflected by its name (the idea being that
if a CPU wakes up earlier than indicated by the sleep length, then
it must be a result of a non-timer interrupt that "intercepts" the
CPU).
Also rename the states[] array in struct struct teo_cpu to
state_bins[] to avoid confusing it with the states[] array in
struct cpuidle_driver and update the documentation to match the
new code (and make it more comprehensive while at it).
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-06-02 21:16:32 +03:00
2021-07-30 17:38:07 +03:00
for ( i = idx - 1 ; i > = 0 ; i - - ) {
2021-06-02 21:17:18 +03:00
struct teo_bin * bin = & cpu_data - > state_bins [ i ] ;
cpuidle: teo: Change the main idle state selection logic
Two aspects of the current main idle state selection logic in the
TEO (Timer Events Oriented) cpuidle governor are quite questionable.
First of all, the "hits" and "misses" metrics used by it are only
updated for a given idle state if the time till the next timer event
("sleep length") is between the target residency of that state and
the target residency of the next one. Consequently, they are likely
to become stale if the sleep length tends to fall outside that
interval which increases the likelihood of subomtimal idle state
selection.
Second, the decision on whether or not to select the idle state
"matching" the sleep length is based on the metrics collected for
that state alone, whereas in principle the metrics collected for
the other idle states should be taken into consideration when that
decision is made. For example, if the measured idle duration is less
than the target residency of the idle state "matching" the sleep
length, then it is also less than the target residency of any deeper
idle state and that should be taken into account when considering
whether or not to select any of those states, but currently it is
not.
In order to address the above shortcomings, modify the main idle
state selection logic in the TEO governor to take the metrics
collected for all of the idle states into account when deciding
whether or not to select the one "matching" the sleep length.
Moreover, drop the "misses" metric that becomes redundant after the
above change and rename the "early_hits" metric to "intercepts" so
that its role is better reflected by its name (the idea being that
if a CPU wakes up earlier than indicated by the sleep length, then
it must be a result of a non-timer interrupt that "intercepts" the
CPU).
Also rename the states[] array in struct struct teo_cpu to
state_bins[] to avoid confusing it with the states[] array in
struct cpuidle_driver and update the documentation to match the
new code (and make it more comprehensive while at it).
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-06-02 21:16:32 +03:00
s64 span_ns ;
2019-11-13 03:03:24 +03:00
2021-06-02 21:17:18 +03:00
intercept_sum + = bin - > intercepts ;
recent_sum + = bin - > recent ;
cpuidle: teo: Change the main idle state selection logic
Two aspects of the current main idle state selection logic in the
TEO (Timer Events Oriented) cpuidle governor are quite questionable.
First of all, the "hits" and "misses" metrics used by it are only
updated for a given idle state if the time till the next timer event
("sleep length") is between the target residency of that state and
the target residency of the next one. Consequently, they are likely
to become stale if the sleep length tends to fall outside that
interval which increases the likelihood of subomtimal idle state
selection.
Second, the decision on whether or not to select the idle state
"matching" the sleep length is based on the metrics collected for
that state alone, whereas in principle the metrics collected for
the other idle states should be taken into consideration when that
decision is made. For example, if the measured idle duration is less
than the target residency of the idle state "matching" the sleep
length, then it is also less than the target residency of any deeper
idle state and that should be taken into account when considering
whether or not to select any of those states, but currently it is
not.
In order to address the above shortcomings, modify the main idle
state selection logic in the TEO governor to take the metrics
collected for all of the idle states into account when deciding
whether or not to select the one "matching" the sleep length.
Moreover, drop the "misses" metric that becomes redundant after the
above change and rename the "early_hits" metric to "intercepts" so
that its role is better reflected by its name (the idea being that
if a CPU wakes up earlier than indicated by the sleep length, then
it must be a result of a non-timer interrupt that "intercepts" the
CPU).
Also rename the states[] array in struct struct teo_cpu to
state_bins[] to avoid confusing it with the states[] array in
struct cpuidle_driver and update the documentation to match the
new code (and make it more comprehensive while at it).
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-06-02 21:16:32 +03:00
2021-07-30 17:38:07 +03:00
span_ns = teo_middle_of_bin ( i , drv ) ;
if ( ( ! alt_recent | | 2 * recent_sum > idx_recent_sum ) & &
( ! alt_intercepts | |
2 * intercept_sum > idx_intercept_sum ) ) {
if ( teo_time_ok ( span_ns ) & &
! dev - > states_usage [ i ] . disable ) {
idx = i ;
duration_ns = span_ns ;
} else {
/*
* The current state is too shallow or
* disabled , so take the first enabled
* deeper state with suitable time span .
*/
2021-07-30 17:38:52 +03:00
idx = first_suitable_idx ;
duration_ns = first_suitable_span_ns ;
2021-07-30 17:38:07 +03:00
}
break ;
}
cpuidle: teo: Change the main idle state selection logic
Two aspects of the current main idle state selection logic in the
TEO (Timer Events Oriented) cpuidle governor are quite questionable.
First of all, the "hits" and "misses" metrics used by it are only
updated for a given idle state if the time till the next timer event
("sleep length") is between the target residency of that state and
the target residency of the next one. Consequently, they are likely
to become stale if the sleep length tends to fall outside that
interval which increases the likelihood of subomtimal idle state
selection.
Second, the decision on whether or not to select the idle state
"matching" the sleep length is based on the metrics collected for
that state alone, whereas in principle the metrics collected for
the other idle states should be taken into consideration when that
decision is made. For example, if the measured idle duration is less
than the target residency of the idle state "matching" the sleep
length, then it is also less than the target residency of any deeper
idle state and that should be taken into account when considering
whether or not to select any of those states, but currently it is
not.
In order to address the above shortcomings, modify the main idle
state selection logic in the TEO governor to take the metrics
collected for all of the idle states into account when deciding
whether or not to select the one "matching" the sleep length.
Moreover, drop the "misses" metric that becomes redundant after the
above change and rename the "early_hits" metric to "intercepts" so
that its role is better reflected by its name (the idea being that
if a CPU wakes up earlier than indicated by the sleep length, then
it must be a result of a non-timer interrupt that "intercepts" the
CPU).
Also rename the states[] array in struct struct teo_cpu to
state_bins[] to avoid confusing it with the states[] array in
struct cpuidle_driver and update the documentation to match the
new code (and make it more comprehensive while at it).
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-06-02 21:16:32 +03:00
if ( dev - > states_usage [ i ] . disable )
continue ;
if ( ! teo_time_ok ( span_ns ) ) {
/*
2021-07-30 17:38:07 +03:00
* The current state is too shallow , but if an
* alternative candidate state has been found ,
* it may still turn out to be a better choice .
cpuidle: teo: Change the main idle state selection logic
Two aspects of the current main idle state selection logic in the
TEO (Timer Events Oriented) cpuidle governor are quite questionable.
First of all, the "hits" and "misses" metrics used by it are only
updated for a given idle state if the time till the next timer event
("sleep length") is between the target residency of that state and
the target residency of the next one. Consequently, they are likely
to become stale if the sleep length tends to fall outside that
interval which increases the likelihood of subomtimal idle state
selection.
Second, the decision on whether or not to select the idle state
"matching" the sleep length is based on the metrics collected for
that state alone, whereas in principle the metrics collected for
the other idle states should be taken into consideration when that
decision is made. For example, if the measured idle duration is less
than the target residency of the idle state "matching" the sleep
length, then it is also less than the target residency of any deeper
idle state and that should be taken into account when considering
whether or not to select any of those states, but currently it is
not.
In order to address the above shortcomings, modify the main idle
state selection logic in the TEO governor to take the metrics
collected for all of the idle states into account when deciding
whether or not to select the one "matching" the sleep length.
Moreover, drop the "misses" metric that becomes redundant after the
above change and rename the "early_hits" metric to "intercepts" so
that its role is better reflected by its name (the idea being that
if a CPU wakes up earlier than indicated by the sleep length, then
it must be a result of a non-timer interrupt that "intercepts" the
CPU).
Also rename the states[] array in struct struct teo_cpu to
state_bins[] to avoid confusing it with the states[] array in
struct cpuidle_driver and update the documentation to match the
new code (and make it more comprehensive while at it).
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-06-02 21:16:32 +03:00
*/
2021-07-30 17:38:52 +03:00
if ( first_suitable_idx ! = idx )
2021-07-30 17:38:07 +03:00
continue ;
cpuidle: teo: Change the main idle state selection logic
Two aspects of the current main idle state selection logic in the
TEO (Timer Events Oriented) cpuidle governor are quite questionable.
First of all, the "hits" and "misses" metrics used by it are only
updated for a given idle state if the time till the next timer event
("sleep length") is between the target residency of that state and
the target residency of the next one. Consequently, they are likely
to become stale if the sleep length tends to fall outside that
interval which increases the likelihood of subomtimal idle state
selection.
Second, the decision on whether or not to select the idle state
"matching" the sleep length is based on the metrics collected for
that state alone, whereas in principle the metrics collected for
the other idle states should be taken into consideration when that
decision is made. For example, if the measured idle duration is less
than the target residency of the idle state "matching" the sleep
length, then it is also less than the target residency of any deeper
idle state and that should be taken into account when considering
whether or not to select any of those states, but currently it is
not.
In order to address the above shortcomings, modify the main idle
state selection logic in the TEO governor to take the metrics
collected for all of the idle states into account when deciding
whether or not to select the one "matching" the sleep length.
Moreover, drop the "misses" metric that becomes redundant after the
above change and rename the "early_hits" metric to "intercepts" so
that its role is better reflected by its name (the idea being that
if a CPU wakes up earlier than indicated by the sleep length, then
it must be a result of a non-timer interrupt that "intercepts" the
CPU).
Also rename the states[] array in struct struct teo_cpu to
state_bins[] to avoid confusing it with the states[] array in
struct cpuidle_driver and update the documentation to match the
new code (and make it more comprehensive while at it).
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-06-02 21:16:32 +03:00
break ;
}
2021-07-30 17:38:52 +03:00
first_suitable_span_ns = span_ns ;
first_suitable_idx = i ;
2019-11-13 03:03:24 +03:00
}
cpuidle: New timer events oriented governor for tickless systems
The venerable menu governor does some things that are quite
questionable in my view.
First, it includes timer wakeups in the pattern detection data and
mixes them up with wakeups from other sources which in some cases
causes it to expect what essentially would be a timer wakeup in a
time frame in which no timer wakeups are possible (because it knows
the time until the next timer event and that is later than the
expected wakeup time).
Second, it uses the extra exit latency limit based on the predicted
idle duration and depending on the number of tasks waiting on I/O,
even though those tasks may run on a different CPU when they are
woken up. Moreover, the time ranges used by it for the sleep length
correction factors depend on whether or not there are tasks waiting
on I/O, which again doesn't imply anything in particular, and they
are not correlated to the list of available idle states in any way
whatever.
Also, the pattern detection code in menu may end up considering
values that are too large to matter at all, in which cases running
it is a waste of time.
A major rework of the menu governor would be required to address
these issues and the performance of at least some workloads (tuned
specifically to the current behavior of the menu governor) is likely
to suffer from that. It is thus better to introduce an entirely new
governor without them and let everybody use the governor that works
better with their actual workloads.
The new governor introduced here, the timer events oriented (TEO)
governor, uses the same basic strategy as menu: it always tries to
find the deepest idle state that can be used in the given conditions.
However, it applies a different approach to that problem.
First, it doesn't use "correction factors" for the time till the
closest timer, but instead it tries to correlate the measured idle
duration values with the available idle states and use that
information to pick up the idle state that is most likely to "match"
the upcoming CPU idle interval.
Second, it doesn't take the number of "I/O waiters" into account at
all and the pattern detection code in it avoids taking timer wakeups
into account. It also only uses idle duration values less than the
current time till the closest timer (with the tick excluded) for that
purpose.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2019-01-04 14:30:47 +03:00
}
2019-07-19 13:12:42 +03:00
/*
cpuidle: teo: Change the main idle state selection logic
Two aspects of the current main idle state selection logic in the
TEO (Timer Events Oriented) cpuidle governor are quite questionable.
First of all, the "hits" and "misses" metrics used by it are only
updated for a given idle state if the time till the next timer event
("sleep length") is between the target residency of that state and
the target residency of the next one. Consequently, they are likely
to become stale if the sleep length tends to fall outside that
interval which increases the likelihood of subomtimal idle state
selection.
Second, the decision on whether or not to select the idle state
"matching" the sleep length is based on the metrics collected for
that state alone, whereas in principle the metrics collected for
the other idle states should be taken into consideration when that
decision is made. For example, if the measured idle duration is less
than the target residency of the idle state "matching" the sleep
length, then it is also less than the target residency of any deeper
idle state and that should be taken into account when considering
whether or not to select any of those states, but currently it is
not.
In order to address the above shortcomings, modify the main idle
state selection logic in the TEO governor to take the metrics
collected for all of the idle states into account when deciding
whether or not to select the one "matching" the sleep length.
Moreover, drop the "misses" metric that becomes redundant after the
above change and rename the "early_hits" metric to "intercepts" so
that its role is better reflected by its name (the idea being that
if a CPU wakes up earlier than indicated by the sleep length, then
it must be a result of a non-timer interrupt that "intercepts" the
CPU).
Also rename the states[] array in struct struct teo_cpu to
state_bins[] to avoid confusing it with the states[] array in
struct cpuidle_driver and update the documentation to match the
new code (and make it more comprehensive while at it).
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-06-02 21:16:32 +03:00
* If there is a latency constraint , it may be necessary to select an
* idle state shallower than the current candidate one .
2019-07-19 13:12:42 +03:00
*/
cpuidle: teo: Change the main idle state selection logic
Two aspects of the current main idle state selection logic in the
TEO (Timer Events Oriented) cpuidle governor are quite questionable.
First of all, the "hits" and "misses" metrics used by it are only
updated for a given idle state if the time till the next timer event
("sleep length") is between the target residency of that state and
the target residency of the next one. Consequently, they are likely
to become stale if the sleep length tends to fall outside that
interval which increases the likelihood of subomtimal idle state
selection.
Second, the decision on whether or not to select the idle state
"matching" the sleep length is based on the metrics collected for
that state alone, whereas in principle the metrics collected for
the other idle states should be taken into consideration when that
decision is made. For example, if the measured idle duration is less
than the target residency of the idle state "matching" the sleep
length, then it is also less than the target residency of any deeper
idle state and that should be taken into account when considering
whether or not to select any of those states, but currently it is
not.
In order to address the above shortcomings, modify the main idle
state selection logic in the TEO governor to take the metrics
collected for all of the idle states into account when deciding
whether or not to select the one "matching" the sleep length.
Moreover, drop the "misses" metric that becomes redundant after the
above change and rename the "early_hits" metric to "intercepts" so
that its role is better reflected by its name (the idea being that
if a CPU wakes up earlier than indicated by the sleep length, then
it must be a result of a non-timer interrupt that "intercepts" the
CPU).
Also rename the states[] array in struct struct teo_cpu to
state_bins[] to avoid confusing it with the states[] array in
struct cpuidle_driver and update the documentation to match the
new code (and make it more comprehensive while at it).
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-06-02 21:16:32 +03:00
if ( idx > constraint_idx )
2019-07-19 13:12:42 +03:00
idx = constraint_idx ;
cpuidle: teo: Introduce util-awareness
Modern interactive systems, such as recent Android phones, tend to have
power efficient shallow idle states. Selecting deeper idle states on a
device while a latency-sensitive workload is running can adversely
impact performance due to increased latency. Additionally, if the CPU
wakes up from a deeper sleep before its target residency as is often the
case, it results in a waste of energy on top of that.
At the moment, none of the available idle governors take any scheduling
information into account. They also tend to overestimate the idle
duration quite often, which causes them to select excessively deep idle
states, thus leading to increased wakeup latency and lower performance
with no power saving. For 'menu' while web browsing on Android for
instance, those types of wakeups ('too deep') account for over 24% of
all wakeups.
At the same time, on some platforms idle state 0 can be power efficient
enough to warrant wanting to prefer it over idle state 1. This is
because the power usage of the two states can be so close that
sufficient amounts of too deep state 1 sleeps can completely offset the
state 1 power saving to the point where it would've been more power
efficient to just use state 0 instead. This is, of course, for systems
where state 0 is not a polling state, such as arm-based devices.
Sleeps that happened in state 0 while they could have used state 1 ('too
shallow') only save less power than they otherwise could have. Too deep
sleeps, on the other hand, harm performance and nullify the potential
power saving from using state 1 in the first place. While taking this
into account, it is clear that on balance it is preferable for an idle
governor to have more too shallow sleeps instead of more too deep sleeps
on those kinds of platforms.
This patch specifically tunes TEO to prefer shallower idle states in
order to reduce wakeup latency and achieve better performance.
To this end, before selecting the next idle state it uses the avg_util
signal of a CPU's runqueue in order to determine to what extent the CPU
is being utilized. This util value is then compared to a threshold
defined as a percentage of the CPU's capacity (capacity >> 6 ie. ~1.5%
in the current implementation). If the util is above the threshold, the
index of the idle state selected by TEO metrics will be reduced by 1,
thus selecting a shallower state. If the util is below the threshold,
the governor defaults to the TEO metrics mechanism to try to select the
deepest available idle state based on the closest timer event and its
own correctness.
The main goal of this is to reduce latency and increase performance for
some workloads. Under some workloads it will result in an increase in
power usage (Geekbench 5) while for other workloads it will also result
in a decrease in power usage compared to TEO (PCMark Web, Jankbench,
Speedometer).
It can provide drastically decreased latency and performance benefits in
certain types of workloads that are sensitive to latency.
Example test results:
1. GB5 (better score, latency & more power usage)
| metric | menu | teo | teo-util-aware |
| ------------------------------------- | -------------- | ----------------- | ----------------- |
| gmean score | 2826.5 (0.0%) | 2764.8 (-2.18%) | 2865 (1.36%) |
| gmean power usage [mW] | 2551.4 (0.0%) | 2606.8 (2.17%) | 2722.3 (6.7%) |
| gmean too deep % | 14.99% | 9.65% | 4.02% |
| gmean too shallow % | 2.5% | 5.96% | 14.59% |
| gmean task wakeup latency (asynctask) | 78.16μs (0.0%) | 61.60μs (-21.19%) | 54.45μs (-30.34%) |
2. Jankbench (better score, latency & less power usage)
| metric | menu | teo | teo-util-aware |
| ------------------------------------- | -------------- | ----------------- | ----------------- |
| gmean frame duration | 13.9 (0.0%) | 14.7 (6.0%) | 12.6 (-9.0%) |
| gmean jank percentage | 1.5 (0.0%) | 2.1 (36.99%) | 1.3 (-17.37%) |
| gmean power usage [mW] | 144.6 (0.0%) | 136.9 (-5.27%) | 121.3 (-16.08%) |
| gmean too deep % | 26.00% | 11.00% | 2.54% |
| gmean too shallow % | 4.74% | 11.89% | 21.93% |
| gmean wakeup latency (RenderThread) | 139.5μs (0.0%) | 116.5μs (-16.49%) | 91.11μs (-34.7%) |
| gmean wakeup latency (surfaceflinger) | 124.0μs (0.0%) | 151.9μs (22.47%) | 87.65μs (-29.33%) |
Signed-off-by: Kajetan Puchalski <kajetan.puchalski@arm.com>
[ rjw: Comment edits and white space adjustments ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-01-05 17:51:59 +03:00
/*
* If the CPU is being utilized over the threshold , choose a shallower
* non - polling state to improve latency
*/
if ( cpu_data - > utilized )
idx = teo_find_shallower_state ( drv , dev , idx , duration_ns , true ) ;
cpuidle: teo: Change the main idle state selection logic
Two aspects of the current main idle state selection logic in the
TEO (Timer Events Oriented) cpuidle governor are quite questionable.
First of all, the "hits" and "misses" metrics used by it are only
updated for a given idle state if the time till the next timer event
("sleep length") is between the target residency of that state and
the target residency of the next one. Consequently, they are likely
to become stale if the sleep length tends to fall outside that
interval which increases the likelihood of subomtimal idle state
selection.
Second, the decision on whether or not to select the idle state
"matching" the sleep length is based on the metrics collected for
that state alone, whereas in principle the metrics collected for
the other idle states should be taken into consideration when that
decision is made. For example, if the measured idle duration is less
than the target residency of the idle state "matching" the sleep
length, then it is also less than the target residency of any deeper
idle state and that should be taken into account when considering
whether or not to select any of those states, but currently it is
not.
In order to address the above shortcomings, modify the main idle
state selection logic in the TEO governor to take the metrics
collected for all of the idle states into account when deciding
whether or not to select the one "matching" the sleep length.
Moreover, drop the "misses" metric that becomes redundant after the
above change and rename the "early_hits" metric to "intercepts" so
that its role is better reflected by its name (the idea being that
if a CPU wakes up earlier than indicated by the sleep length, then
it must be a result of a non-timer interrupt that "intercepts" the
CPU).
Also rename the states[] array in struct struct teo_cpu to
state_bins[] to avoid confusing it with the states[] array in
struct cpuidle_driver and update the documentation to match the
new code (and make it more comprehensive while at it).
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-06-02 21:16:32 +03:00
end :
cpuidle: New timer events oriented governor for tickless systems
The venerable menu governor does some things that are quite
questionable in my view.
First, it includes timer wakeups in the pattern detection data and
mixes them up with wakeups from other sources which in some cases
causes it to expect what essentially would be a timer wakeup in a
time frame in which no timer wakeups are possible (because it knows
the time until the next timer event and that is later than the
expected wakeup time).
Second, it uses the extra exit latency limit based on the predicted
idle duration and depending on the number of tasks waiting on I/O,
even though those tasks may run on a different CPU when they are
woken up. Moreover, the time ranges used by it for the sleep length
correction factors depend on whether or not there are tasks waiting
on I/O, which again doesn't imply anything in particular, and they
are not correlated to the list of available idle states in any way
whatever.
Also, the pattern detection code in menu may end up considering
values that are too large to matter at all, in which cases running
it is a waste of time.
A major rework of the menu governor would be required to address
these issues and the performance of at least some workloads (tuned
specifically to the current behavior of the menu governor) is likely
to suffer from that. It is thus better to introduce an entirely new
governor without them and let everybody use the governor that works
better with their actual workloads.
The new governor introduced here, the timer events oriented (TEO)
governor, uses the same basic strategy as menu: it always tries to
find the deepest idle state that can be used in the given conditions.
However, it applies a different approach to that problem.
First, it doesn't use "correction factors" for the time till the
closest timer, but instead it tries to correlate the measured idle
duration values with the available idle states and use that
information to pick up the idle state that is most likely to "match"
the upcoming CPU idle interval.
Second, it doesn't take the number of "I/O waiters" into account at
all and the pattern detection code in it avoids taking timer wakeups
into account. It also only uses idle duration values less than the
current time till the closest timer (with the tick excluded) for that
purpose.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2019-01-04 14:30:47 +03:00
/*
* Don ' t stop the tick if the selected state is a polling one or if the
* expected idle duration is shorter than the tick period length .
*/
if ( ( ( drv - > states [ idx ] . flags & CPUIDLE_FLAG_POLLING ) | |
2019-11-07 17:25:12 +03:00
duration_ns < TICK_NSEC ) & & ! tick_nohz_tick_stopped ( ) ) {
cpuidle: New timer events oriented governor for tickless systems
The venerable menu governor does some things that are quite
questionable in my view.
First, it includes timer wakeups in the pattern detection data and
mixes them up with wakeups from other sources which in some cases
causes it to expect what essentially would be a timer wakeup in a
time frame in which no timer wakeups are possible (because it knows
the time until the next timer event and that is later than the
expected wakeup time).
Second, it uses the extra exit latency limit based on the predicted
idle duration and depending on the number of tasks waiting on I/O,
even though those tasks may run on a different CPU when they are
woken up. Moreover, the time ranges used by it for the sleep length
correction factors depend on whether or not there are tasks waiting
on I/O, which again doesn't imply anything in particular, and they
are not correlated to the list of available idle states in any way
whatever.
Also, the pattern detection code in menu may end up considering
values that are too large to matter at all, in which cases running
it is a waste of time.
A major rework of the menu governor would be required to address
these issues and the performance of at least some workloads (tuned
specifically to the current behavior of the menu governor) is likely
to suffer from that. It is thus better to introduce an entirely new
governor without them and let everybody use the governor that works
better with their actual workloads.
The new governor introduced here, the timer events oriented (TEO)
governor, uses the same basic strategy as menu: it always tries to
find the deepest idle state that can be used in the given conditions.
However, it applies a different approach to that problem.
First, it doesn't use "correction factors" for the time till the
closest timer, but instead it tries to correlate the measured idle
duration values with the available idle states and use that
information to pick up the idle state that is most likely to "match"
the upcoming CPU idle interval.
Second, it doesn't take the number of "I/O waiters" into account at
all and the pattern detection code in it avoids taking timer wakeups
into account. It also only uses idle duration values less than the
current time till the closest timer (with the tick excluded) for that
purpose.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2019-01-04 14:30:47 +03:00
* stop_tick = false ;
/*
* The tick is not going to be stopped , so if the target
* residency of the state to be returned is not within the time
* till the closest timer including the tick , try to correct
* that .
*/
2021-03-29 21:21:43 +03:00
if ( idx > idx0 & &
drv - > states [ idx ] . target_residency_ns > delta_tick )
2023-01-05 17:51:58 +03:00
idx = teo_find_shallower_state ( drv , dev , idx , delta_tick , false ) ;
cpuidle: New timer events oriented governor for tickless systems
The venerable menu governor does some things that are quite
questionable in my view.
First, it includes timer wakeups in the pattern detection data and
mixes them up with wakeups from other sources which in some cases
causes it to expect what essentially would be a timer wakeup in a
time frame in which no timer wakeups are possible (because it knows
the time until the next timer event and that is later than the
expected wakeup time).
Second, it uses the extra exit latency limit based on the predicted
idle duration and depending on the number of tasks waiting on I/O,
even though those tasks may run on a different CPU when they are
woken up. Moreover, the time ranges used by it for the sleep length
correction factors depend on whether or not there are tasks waiting
on I/O, which again doesn't imply anything in particular, and they
are not correlated to the list of available idle states in any way
whatever.
Also, the pattern detection code in menu may end up considering
values that are too large to matter at all, in which cases running
it is a waste of time.
A major rework of the menu governor would be required to address
these issues and the performance of at least some workloads (tuned
specifically to the current behavior of the menu governor) is likely
to suffer from that. It is thus better to introduce an entirely new
governor without them and let everybody use the governor that works
better with their actual workloads.
The new governor introduced here, the timer events oriented (TEO)
governor, uses the same basic strategy as menu: it always tries to
find the deepest idle state that can be used in the given conditions.
However, it applies a different approach to that problem.
First, it doesn't use "correction factors" for the time till the
closest timer, but instead it tries to correlate the measured idle
duration values with the available idle states and use that
information to pick up the idle state that is most likely to "match"
the upcoming CPU idle interval.
Second, it doesn't take the number of "I/O waiters" into account at
all and the pattern detection code in it avoids taking timer wakeups
into account. It also only uses idle duration values less than the
current time till the closest timer (with the tick excluded) for that
purpose.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2019-01-04 14:30:47 +03:00
}
return idx ;
}
/**
* teo_reflect - Note that governor data for the CPU need to be updated .
* @ dev : Target CPU .
* @ state : Entered state .
*/
static void teo_reflect ( struct cpuidle_device * dev , int state )
{
struct teo_cpu * cpu_data = per_cpu_ptr ( & teo_cpus , dev - > cpu ) ;
2019-07-04 02:51:27 +03:00
dev - > last_state_idx = state ;
cpuidle: New timer events oriented governor for tickless systems
The venerable menu governor does some things that are quite
questionable in my view.
First, it includes timer wakeups in the pattern detection data and
mixes them up with wakeups from other sources which in some cases
causes it to expect what essentially would be a timer wakeup in a
time frame in which no timer wakeups are possible (because it knows
the time until the next timer event and that is later than the
expected wakeup time).
Second, it uses the extra exit latency limit based on the predicted
idle duration and depending on the number of tasks waiting on I/O,
even though those tasks may run on a different CPU when they are
woken up. Moreover, the time ranges used by it for the sleep length
correction factors depend on whether or not there are tasks waiting
on I/O, which again doesn't imply anything in particular, and they
are not correlated to the list of available idle states in any way
whatever.
Also, the pattern detection code in menu may end up considering
values that are too large to matter at all, in which cases running
it is a waste of time.
A major rework of the menu governor would be required to address
these issues and the performance of at least some workloads (tuned
specifically to the current behavior of the menu governor) is likely
to suffer from that. It is thus better to introduce an entirely new
governor without them and let everybody use the governor that works
better with their actual workloads.
The new governor introduced here, the timer events oriented (TEO)
governor, uses the same basic strategy as menu: it always tries to
find the deepest idle state that can be used in the given conditions.
However, it applies a different approach to that problem.
First, it doesn't use "correction factors" for the time till the
closest timer, but instead it tries to correlate the measured idle
duration values with the available idle states and use that
information to pick up the idle state that is most likely to "match"
the upcoming CPU idle interval.
Second, it doesn't take the number of "I/O waiters" into account at
all and the pattern detection code in it avoids taking timer wakeups
into account. It also only uses idle duration values less than the
current time till the closest timer (with the tick excluded) for that
purpose.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2019-01-04 14:30:47 +03:00
/*
* If the wakeup was not " natural " , but triggered by one of the safety
* nets , assume that the CPU might have been idle for the entire sleep
* length time .
*/
if ( dev - > poll_time_limit | |
( tick_nohz_idle_got_tick ( ) & & cpu_data - > sleep_length_ns > TICK_NSEC ) ) {
dev - > poll_time_limit = false ;
cpu_data - > time_span_ns = cpu_data - > sleep_length_ns ;
} else {
cpu_data - > time_span_ns = local_clock ( ) - cpu_data - > time_span_ns ;
}
}
/**
* teo_enable_device - Initialize the governor ' s data for the target CPU .
* @ drv : cpuidle driver ( not used ) .
* @ dev : Target CPU .
*/
static int teo_enable_device ( struct cpuidle_driver * drv ,
struct cpuidle_device * dev )
{
struct teo_cpu * cpu_data = per_cpu_ptr ( & teo_cpus , dev - > cpu ) ;
cpuidle: teo: Introduce util-awareness
Modern interactive systems, such as recent Android phones, tend to have
power efficient shallow idle states. Selecting deeper idle states on a
device while a latency-sensitive workload is running can adversely
impact performance due to increased latency. Additionally, if the CPU
wakes up from a deeper sleep before its target residency as is often the
case, it results in a waste of energy on top of that.
At the moment, none of the available idle governors take any scheduling
information into account. They also tend to overestimate the idle
duration quite often, which causes them to select excessively deep idle
states, thus leading to increased wakeup latency and lower performance
with no power saving. For 'menu' while web browsing on Android for
instance, those types of wakeups ('too deep') account for over 24% of
all wakeups.
At the same time, on some platforms idle state 0 can be power efficient
enough to warrant wanting to prefer it over idle state 1. This is
because the power usage of the two states can be so close that
sufficient amounts of too deep state 1 sleeps can completely offset the
state 1 power saving to the point where it would've been more power
efficient to just use state 0 instead. This is, of course, for systems
where state 0 is not a polling state, such as arm-based devices.
Sleeps that happened in state 0 while they could have used state 1 ('too
shallow') only save less power than they otherwise could have. Too deep
sleeps, on the other hand, harm performance and nullify the potential
power saving from using state 1 in the first place. While taking this
into account, it is clear that on balance it is preferable for an idle
governor to have more too shallow sleeps instead of more too deep sleeps
on those kinds of platforms.
This patch specifically tunes TEO to prefer shallower idle states in
order to reduce wakeup latency and achieve better performance.
To this end, before selecting the next idle state it uses the avg_util
signal of a CPU's runqueue in order to determine to what extent the CPU
is being utilized. This util value is then compared to a threshold
defined as a percentage of the CPU's capacity (capacity >> 6 ie. ~1.5%
in the current implementation). If the util is above the threshold, the
index of the idle state selected by TEO metrics will be reduced by 1,
thus selecting a shallower state. If the util is below the threshold,
the governor defaults to the TEO metrics mechanism to try to select the
deepest available idle state based on the closest timer event and its
own correctness.
The main goal of this is to reduce latency and increase performance for
some workloads. Under some workloads it will result in an increase in
power usage (Geekbench 5) while for other workloads it will also result
in a decrease in power usage compared to TEO (PCMark Web, Jankbench,
Speedometer).
It can provide drastically decreased latency and performance benefits in
certain types of workloads that are sensitive to latency.
Example test results:
1. GB5 (better score, latency & more power usage)
| metric | menu | teo | teo-util-aware |
| ------------------------------------- | -------------- | ----------------- | ----------------- |
| gmean score | 2826.5 (0.0%) | 2764.8 (-2.18%) | 2865 (1.36%) |
| gmean power usage [mW] | 2551.4 (0.0%) | 2606.8 (2.17%) | 2722.3 (6.7%) |
| gmean too deep % | 14.99% | 9.65% | 4.02% |
| gmean too shallow % | 2.5% | 5.96% | 14.59% |
| gmean task wakeup latency (asynctask) | 78.16μs (0.0%) | 61.60μs (-21.19%) | 54.45μs (-30.34%) |
2. Jankbench (better score, latency & less power usage)
| metric | menu | teo | teo-util-aware |
| ------------------------------------- | -------------- | ----------------- | ----------------- |
| gmean frame duration | 13.9 (0.0%) | 14.7 (6.0%) | 12.6 (-9.0%) |
| gmean jank percentage | 1.5 (0.0%) | 2.1 (36.99%) | 1.3 (-17.37%) |
| gmean power usage [mW] | 144.6 (0.0%) | 136.9 (-5.27%) | 121.3 (-16.08%) |
| gmean too deep % | 26.00% | 11.00% | 2.54% |
| gmean too shallow % | 4.74% | 11.89% | 21.93% |
| gmean wakeup latency (RenderThread) | 139.5μs (0.0%) | 116.5μs (-16.49%) | 91.11μs (-34.7%) |
| gmean wakeup latency (surfaceflinger) | 124.0μs (0.0%) | 151.9μs (22.47%) | 87.65μs (-29.33%) |
Signed-off-by: Kajetan Puchalski <kajetan.puchalski@arm.com>
[ rjw: Comment edits and white space adjustments ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-01-05 17:51:59 +03:00
unsigned long max_capacity = arch_scale_cpu_capacity ( dev - > cpu ) ;
cpuidle: New timer events oriented governor for tickless systems
The venerable menu governor does some things that are quite
questionable in my view.
First, it includes timer wakeups in the pattern detection data and
mixes them up with wakeups from other sources which in some cases
causes it to expect what essentially would be a timer wakeup in a
time frame in which no timer wakeups are possible (because it knows
the time until the next timer event and that is later than the
expected wakeup time).
Second, it uses the extra exit latency limit based on the predicted
idle duration and depending on the number of tasks waiting on I/O,
even though those tasks may run on a different CPU when they are
woken up. Moreover, the time ranges used by it for the sleep length
correction factors depend on whether or not there are tasks waiting
on I/O, which again doesn't imply anything in particular, and they
are not correlated to the list of available idle states in any way
whatever.
Also, the pattern detection code in menu may end up considering
values that are too large to matter at all, in which cases running
it is a waste of time.
A major rework of the menu governor would be required to address
these issues and the performance of at least some workloads (tuned
specifically to the current behavior of the menu governor) is likely
to suffer from that. It is thus better to introduce an entirely new
governor without them and let everybody use the governor that works
better with their actual workloads.
The new governor introduced here, the timer events oriented (TEO)
governor, uses the same basic strategy as menu: it always tries to
find the deepest idle state that can be used in the given conditions.
However, it applies a different approach to that problem.
First, it doesn't use "correction factors" for the time till the
closest timer, but instead it tries to correlate the measured idle
duration values with the available idle states and use that
information to pick up the idle state that is most likely to "match"
the upcoming CPU idle interval.
Second, it doesn't take the number of "I/O waiters" into account at
all and the pattern detection code in it avoids taking timer wakeups
into account. It also only uses idle duration values less than the
current time till the closest timer (with the tick excluded) for that
purpose.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2019-01-04 14:30:47 +03:00
int i ;
memset ( cpu_data , 0 , sizeof ( * cpu_data ) ) ;
cpuidle: teo: Introduce util-awareness
Modern interactive systems, such as recent Android phones, tend to have
power efficient shallow idle states. Selecting deeper idle states on a
device while a latency-sensitive workload is running can adversely
impact performance due to increased latency. Additionally, if the CPU
wakes up from a deeper sleep before its target residency as is often the
case, it results in a waste of energy on top of that.
At the moment, none of the available idle governors take any scheduling
information into account. They also tend to overestimate the idle
duration quite often, which causes them to select excessively deep idle
states, thus leading to increased wakeup latency and lower performance
with no power saving. For 'menu' while web browsing on Android for
instance, those types of wakeups ('too deep') account for over 24% of
all wakeups.
At the same time, on some platforms idle state 0 can be power efficient
enough to warrant wanting to prefer it over idle state 1. This is
because the power usage of the two states can be so close that
sufficient amounts of too deep state 1 sleeps can completely offset the
state 1 power saving to the point where it would've been more power
efficient to just use state 0 instead. This is, of course, for systems
where state 0 is not a polling state, such as arm-based devices.
Sleeps that happened in state 0 while they could have used state 1 ('too
shallow') only save less power than they otherwise could have. Too deep
sleeps, on the other hand, harm performance and nullify the potential
power saving from using state 1 in the first place. While taking this
into account, it is clear that on balance it is preferable for an idle
governor to have more too shallow sleeps instead of more too deep sleeps
on those kinds of platforms.
This patch specifically tunes TEO to prefer shallower idle states in
order to reduce wakeup latency and achieve better performance.
To this end, before selecting the next idle state it uses the avg_util
signal of a CPU's runqueue in order to determine to what extent the CPU
is being utilized. This util value is then compared to a threshold
defined as a percentage of the CPU's capacity (capacity >> 6 ie. ~1.5%
in the current implementation). If the util is above the threshold, the
index of the idle state selected by TEO metrics will be reduced by 1,
thus selecting a shallower state. If the util is below the threshold,
the governor defaults to the TEO metrics mechanism to try to select the
deepest available idle state based on the closest timer event and its
own correctness.
The main goal of this is to reduce latency and increase performance for
some workloads. Under some workloads it will result in an increase in
power usage (Geekbench 5) while for other workloads it will also result
in a decrease in power usage compared to TEO (PCMark Web, Jankbench,
Speedometer).
It can provide drastically decreased latency and performance benefits in
certain types of workloads that are sensitive to latency.
Example test results:
1. GB5 (better score, latency & more power usage)
| metric | menu | teo | teo-util-aware |
| ------------------------------------- | -------------- | ----------------- | ----------------- |
| gmean score | 2826.5 (0.0%) | 2764.8 (-2.18%) | 2865 (1.36%) |
| gmean power usage [mW] | 2551.4 (0.0%) | 2606.8 (2.17%) | 2722.3 (6.7%) |
| gmean too deep % | 14.99% | 9.65% | 4.02% |
| gmean too shallow % | 2.5% | 5.96% | 14.59% |
| gmean task wakeup latency (asynctask) | 78.16μs (0.0%) | 61.60μs (-21.19%) | 54.45μs (-30.34%) |
2. Jankbench (better score, latency & less power usage)
| metric | menu | teo | teo-util-aware |
| ------------------------------------- | -------------- | ----------------- | ----------------- |
| gmean frame duration | 13.9 (0.0%) | 14.7 (6.0%) | 12.6 (-9.0%) |
| gmean jank percentage | 1.5 (0.0%) | 2.1 (36.99%) | 1.3 (-17.37%) |
| gmean power usage [mW] | 144.6 (0.0%) | 136.9 (-5.27%) | 121.3 (-16.08%) |
| gmean too deep % | 26.00% | 11.00% | 2.54% |
| gmean too shallow % | 4.74% | 11.89% | 21.93% |
| gmean wakeup latency (RenderThread) | 139.5μs (0.0%) | 116.5μs (-16.49%) | 91.11μs (-34.7%) |
| gmean wakeup latency (surfaceflinger) | 124.0μs (0.0%) | 151.9μs (22.47%) | 87.65μs (-29.33%) |
Signed-off-by: Kajetan Puchalski <kajetan.puchalski@arm.com>
[ rjw: Comment edits and white space adjustments ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-01-05 17:51:59 +03:00
cpu_data - > util_threshold = max_capacity > > UTIL_THRESHOLD_SHIFT ;
cpuidle: New timer events oriented governor for tickless systems
The venerable menu governor does some things that are quite
questionable in my view.
First, it includes timer wakeups in the pattern detection data and
mixes them up with wakeups from other sources which in some cases
causes it to expect what essentially would be a timer wakeup in a
time frame in which no timer wakeups are possible (because it knows
the time until the next timer event and that is later than the
expected wakeup time).
Second, it uses the extra exit latency limit based on the predicted
idle duration and depending on the number of tasks waiting on I/O,
even though those tasks may run on a different CPU when they are
woken up. Moreover, the time ranges used by it for the sleep length
correction factors depend on whether or not there are tasks waiting
on I/O, which again doesn't imply anything in particular, and they
are not correlated to the list of available idle states in any way
whatever.
Also, the pattern detection code in menu may end up considering
values that are too large to matter at all, in which cases running
it is a waste of time.
A major rework of the menu governor would be required to address
these issues and the performance of at least some workloads (tuned
specifically to the current behavior of the menu governor) is likely
to suffer from that. It is thus better to introduce an entirely new
governor without them and let everybody use the governor that works
better with their actual workloads.
The new governor introduced here, the timer events oriented (TEO)
governor, uses the same basic strategy as menu: it always tries to
find the deepest idle state that can be used in the given conditions.
However, it applies a different approach to that problem.
First, it doesn't use "correction factors" for the time till the
closest timer, but instead it tries to correlate the measured idle
duration values with the available idle states and use that
information to pick up the idle state that is most likely to "match"
the upcoming CPU idle interval.
Second, it doesn't take the number of "I/O waiters" into account at
all and the pattern detection code in it avoids taking timer wakeups
into account. It also only uses idle duration values less than the
current time till the closest timer (with the tick excluded) for that
purpose.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2019-01-04 14:30:47 +03:00
2021-06-02 21:17:18 +03:00
for ( i = 0 ; i < NR_RECENT ; i + + )
cpu_data - > recent_idx [ i ] = - 1 ;
cpuidle: New timer events oriented governor for tickless systems
The venerable menu governor does some things that are quite
questionable in my view.
First, it includes timer wakeups in the pattern detection data and
mixes them up with wakeups from other sources which in some cases
causes it to expect what essentially would be a timer wakeup in a
time frame in which no timer wakeups are possible (because it knows
the time until the next timer event and that is later than the
expected wakeup time).
Second, it uses the extra exit latency limit based on the predicted
idle duration and depending on the number of tasks waiting on I/O,
even though those tasks may run on a different CPU when they are
woken up. Moreover, the time ranges used by it for the sleep length
correction factors depend on whether or not there are tasks waiting
on I/O, which again doesn't imply anything in particular, and they
are not correlated to the list of available idle states in any way
whatever.
Also, the pattern detection code in menu may end up considering
values that are too large to matter at all, in which cases running
it is a waste of time.
A major rework of the menu governor would be required to address
these issues and the performance of at least some workloads (tuned
specifically to the current behavior of the menu governor) is likely
to suffer from that. It is thus better to introduce an entirely new
governor without them and let everybody use the governor that works
better with their actual workloads.
The new governor introduced here, the timer events oriented (TEO)
governor, uses the same basic strategy as menu: it always tries to
find the deepest idle state that can be used in the given conditions.
However, it applies a different approach to that problem.
First, it doesn't use "correction factors" for the time till the
closest timer, but instead it tries to correlate the measured idle
duration values with the available idle states and use that
information to pick up the idle state that is most likely to "match"
the upcoming CPU idle interval.
Second, it doesn't take the number of "I/O waiters" into account at
all and the pattern detection code in it avoids taking timer wakeups
into account. It also only uses idle duration values less than the
current time till the closest timer (with the tick excluded) for that
purpose.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2019-01-04 14:30:47 +03:00
return 0 ;
}
static struct cpuidle_governor teo_governor = {
. name = " teo " ,
. rating = 19 ,
. enable = teo_enable_device ,
. select = teo_select ,
. reflect = teo_reflect ,
} ;
static int __init teo_governor_init ( void )
{
return cpuidle_register_governor ( & teo_governor ) ;
}
postcore_initcall ( teo_governor_init ) ;