Mel Gorman 7a25759eaa cpuidle: Select polling interval based on a c-state with a longer target residency
It was noted that a few workloads that idle rapidly regressed when commit
36fcb4292473 ("cpuidle: use first valid target residency as poll time")
was merged. The workloads in question were heavy communicators that idle
rapidly and were impacted by the c-state exit latency as the active CPUs
were not polling at the time of wakeup. As they were not particularly
realistic workloads, it was not considered to be a major problem.

Unfortunately, a bug was reported for a real workload in a production
environment that relied on large numbers of threads operating in a worker
pool pattern. These threads would idle for periods of time longer than the
C1 target residency and so incurred the c-state exit latency penalty. The
application is very sensitive to wakeup latency and indirectly relying
on behaviour prior to commit on a37b969a61c1 ("cpuidle: poll_state: Add
time limit to poll_idle()") to poll for long enough to avoid the exit
latency cost.

The target residency of C1 is typically very short. On some x86 machines,
it can be as low as 2 microseconds. In poll_idle(), the clock is checked
every POLL_IDLE_RELAX_COUNT interations of cpu_relax() and even one
iteration of that loop can be over 1 microsecond so the polling interval is
very close to the granularity of what poll_idle() can detect. Furthermore,
a basic ping pong workload like perf bench pipe has a longer round-trip
time than the 2 microseconds meaning that the CPU will almost certainly
not be polling when the ping-pong completes.

This patch selects a polling interval based on an enabled c-state that
has an target residency longer than 10usec. If there is no enabled-cstate
then polling will be up to a TICK_NSEC/16 similar to what it was up until
kernel 4.20. Polling for a full tick is unlikely (rescheduling event)
and is much longer than the existing target residencies for a deep c-state.

As an example, consider a CPU with the following c-state information from
an Intel CPU;

	residency	exit_latency
C1	2		2
C1E	20		10
C3	100		33
C6	400		133

The polling interval selected is 20usec. If booted with
intel_idle.max_cstate=1 then the polling interval is 250usec as the deeper
c-states were not available.

On an AMD EPYC machine, the c-state information is more limited and
looks like

	residency	exit_latency
C1	2		1
C2	800		400

The polling interval selected is 250usec. While C2 was considered, the
polling interval was clamped by CPUIDLE_POLL_MAX.

Note that it is not expected that polling will be a universal win. As
well as potentially trading power for performance, the performance is not
guaranteed if the extra polling prevented a turbo state being reached.
Making it a tunable was considered but it's driver-specific, may be
overridden by a governor and is not a guaranteed polling interval making
it difficult to describe without knowledge of the implementation.

tbench4
			     vanilla		    polling
Hmean     1        497.89 (   0.00%)      543.15 *   9.09%*
Hmean     2        975.88 (   0.00%)     1059.73 *   8.59%*
Hmean     4       1953.97 (   0.00%)     2081.37 *   6.52%*
Hmean     8       3645.76 (   0.00%)     4052.95 *  11.17%*
Hmean     16      6882.21 (   0.00%)     6995.93 *   1.65%*
Hmean     32     10752.20 (   0.00%)    10731.53 *  -0.19%*
Hmean     64     12875.08 (   0.00%)    12478.13 *  -3.08%*
Hmean     128    21500.54 (   0.00%)    21098.60 *  -1.87%*
Hmean     256    21253.70 (   0.00%)    21027.18 *  -1.07%*
Hmean     320    20813.50 (   0.00%)    20580.64 *  -1.12%*

Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2020-12-01 17:59:18 +01:00
2020-10-13 13:04:41 -07:00
2020-11-15 09:46:36 -08:00
2020-11-06 12:54:00 -08:00
2020-11-14 12:04:02 -08:00
2020-11-05 18:19:32 +01:00
2020-10-17 11:18:18 -07:00
2020-11-15 10:02:41 -08:00
2020-11-15 16:44:31 -08:00

Linux kernel
============

There are several guides for kernel developers and users. These guides can
be rendered in a number of formats, like HTML and PDF. Please read
Documentation/admin-guide/README.rst first.

In order to build the documentation, use ``make htmldocs`` or
``make pdfdocs``.  The formatted documentation can also be read online at:

    https://www.kernel.org/doc/html/latest/

There are various text files in the Documentation/ subdirectory,
several of them using the Restructured Text markup notation.

Please read the Documentation/process/changes.rst file, as it contains the
requirements for building and running the kernel, and information about
the problems which may result by upgrading your kernel.
Description
No description provided
Readme 5.7 GiB
Languages
C 97.6%
Assembly 1%
Shell 0.5%
Python 0.3%
Makefile 0.3%