==========================
Real-Time group scheduling
==========================

.. CONTENTS

   0. WARNING
   1. Overview
     1.1 The problem
     1.2 The solution
   2. The interface
     2.1 System-wide settings
     2.2 Default behaviour
     2.3 Basis for grouping tasks
   3. Future plans


0. WARNING
==========

Fiddling with these settings can result in an unstable system. The knobs are
root only and assume that root knows what they are doing.

Most notable:

* very small values in sched_rt_period_us can result in an unstable
  system when the period is smaller than either the available hrtimer
  resolution, or the time it takes to handle the budget refresh itself.

* very small values in sched_rt_runtime_us can result in an unstable
  system when the runtime is so small the system has difficulty making
  forward progress (NOTE: the migration thread and kstopmachine both
  are real-time processes).


1. Overview
===========


1.1 The problem
---------------

Realtime scheduling is all about determinism: a group has to be able to rely on
the amount of bandwidth (e.g. CPU time) being constant. In order to schedule
multiple groups of realtime tasks, each group must be assigned a fixed portion
of the CPU time available. Without a minimum guarantee, a realtime group can
obviously fall short. A fuzzy upper limit is of no use since it cannot be
relied upon, which leaves us with just the single fixed portion.


1.2 The solution
----------------

CPU time is divided by means of specifying how much time can be spent running
in a given period. We allocate this "run time" to each realtime group, and the
other realtime groups are not permitted to use it.

Any time not allocated to a realtime group will be used to run normal priority
tasks (SCHED_OTHER). Any allocated run time not used will also be picked up by
SCHED_OTHER.

Let's consider an example: a fixed-frame-rate realtime renderer must deliver 25
frames a second, which yields a period of 0.04s per frame. Now say it will also
have to play some music and respond to input, leaving it with around 80% CPU
time dedicated for the graphics. We can then give this group a run time of 0.8
* 0.04s = 0.032s.

This way the graphics group will have a 0.04s period with a 0.032s run time
limit. Now if the audio thread needs to refill the DMA buffer every 0.005s, but
needs only about 3% CPU time to do so, it can do with a run time of 0.03 *
0.005s = 0.00015s. So this group can be scheduled with a period of 0.005s and a
run time of 0.00015s.
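
The arithmetic in this example can be sketched in Python. This is only an
illustration: the helper name ``runtime_us`` is hypothetical, and the values are
kept in whole microseconds and whole percent so that no floating-point rounding
is involved.

```python
def runtime_us(period_us, cpu_percent):
    """Return the run time (in us) that gives cpu_percent of each period."""
    if period_us <= 0 or not 0 <= cpu_percent <= 100:
        raise ValueError("period must be positive and percent within 0..100")
    return period_us * cpu_percent // 100

# Graphics group: 25 frames per second, about 80% of the CPU.
graphics_period_us = 1_000_000 // 25          # 40000us, i.e. 0.04s
graphics_runtime_us = runtime_us(graphics_period_us, 80)

# Audio group: refill the DMA buffer every 0.005s, about 3% of the CPU.
audio_period_us = 5_000
audio_runtime_us = runtime_us(audio_period_us, 3)

print(graphics_period_us, graphics_runtime_us)  # 40000 32000
print(audio_period_us, audio_runtime_us)        # 5000 150
```

The results, 32000us and 150us, match the 0.032s and 0.00015s run times
derived in the text.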

The remaining CPU time will be used for user input and other tasks. Because
realtime tasks have explicitly allocated the CPU time they need to perform
their tasks, buffer underruns in the graphics or audio can be eliminated.

NOTE: the above example is not fully implemented yet. We still
lack an EDF scheduler to make non-uniform periods usable.


2. The interface
================


2.1 System-wide settings
------------------------

The system-wide settings are configured under the /proc virtual file system:

/proc/sys/kernel/sched_rt_period_us:
  The scheduling period that is equivalent to 100% CPU bandwidth.

/proc/sys/kernel/sched_rt_runtime_us:
  A global limit on how much time realtime scheduling may use. Even without
  CONFIG_RT_GROUP_SCHED enabled, this will limit time reserved to realtime
  processes. With CONFIG_RT_GROUP_SCHED it signifies the total bandwidth
  available to all realtime groups.

  * Time is specified in us because the interface is s32. This gives an
    operating range from 1us to about 35 minutes.
  * sched_rt_period_us takes values from 1 to INT_MAX.
  * sched_rt_runtime_us takes values from -1 to (INT_MAX - 1).
  * A run time of -1 specifies runtime == period, i.e. no limit.
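
As a rough sketch, the ranges listed above can be expressed as a small Python
check. The function name ``check_rt_settings`` is hypothetical, and the check
covers only the ranges stated here; the kernel may apply further validation
that this sketch does not model.

```python
INT_MAX = 2**31 - 1  # largest value of a signed 32-bit integer (s32)

def check_rt_settings(period_us, runtime_us):
    """Check both values against the documented ranges.

    Returns the effective run time in us; -1 maps to the full period,
    i.e. no limit.
    """
    if not 1 <= period_us <= INT_MAX:
        raise ValueError("sched_rt_period_us must be within 1..INT_MAX")
    if not -1 <= runtime_us <= INT_MAX - 1:
        raise ValueError("sched_rt_runtime_us must be within -1..INT_MAX-1")
    return period_us if runtime_us == -1 else runtime_us

# Upper end of the operating range, converted from us to minutes.
max_minutes = INT_MAX / 1_000_000 / 60
print(round(max_minutes, 1))                 # 35.8
print(check_rt_settings(1_000_000, 950_000)) # 950000
print(check_rt_settings(1_000_000, -1))      # 1000000
```

The first line of output shows where the "about 35 minutes" figure comes from:
INT_MAX microseconds is roughly 2147 seconds.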


2.2 Default behaviour
---------------------

The default values are 1000000 (1s) for sched_rt_period_us and 950000 (0.95s)
for sched_rt_runtime_us. This leaves 0.05s per period to be used by
SCHED_OTHER (non-RT tasks). These defaults were chosen so that a runaway
realtime task will not lock up the machine but will leave a little time to
recover it. Setting the runtime to -1 restores the old behaviour.
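
The time left over for non-RT tasks under these defaults can be worked out as
follows. This is a minimal sketch; ``non_rt_time_us`` and the two constant
names are hypothetical and exist only for this illustration.

```python
def non_rt_time_us(period_us, runtime_us):
    """Time per period (in us) that realtime tasks cannot claim."""
    if runtime_us == -1:      # -1 means runtime == period, i.e. no limit
        return 0
    return period_us - runtime_us

DEFAULT_PERIOD_US = 1_000_000   # sched_rt_period_us default (1s)
DEFAULT_RUNTIME_US = 950_000    # sched_rt_runtime_us default (0.95s)

left_us = non_rt_time_us(DEFAULT_PERIOD_US, DEFAULT_RUNTIME_US)
print(left_us, 100 * left_us // DEFAULT_PERIOD_US)   # 50000 5
print(non_rt_time_us(DEFAULT_PERIOD_US, -1))         # 0
```

With the defaults, 50000us (5%) of every second is kept away from realtime
tasks; with a runtime of -1 nothing is held back.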

By default all bandwidth is assigned to the root group and new groups get the
period from /proc/sys/kernel/sched_rt_period_us and a run time of 0. If you
want to assign bandwidth to another group, reduce the root group's bandwidth
and assign some or all of the difference to another group.

Realtime group scheduling means you have to assign a portion of total CPU
bandwidth to the group before it will accept realtime tasks. Therefore you will
not be able to run realtime tasks as any user other than root until you have
done that, even if the user has the rights to run processes with realtime
priority!


2.3 Basis for grouping tasks
----------------------------

Enabling CONFIG_RT_GROUP_SCHED lets you explicitly allocate real
CPU bandwidth to task groups.

This uses the cgroup virtual file system and "<cgroup>/cpu.rt_runtime_us"
to control the CPU time reserved for each control group.
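
A minimal sketch of setting a group's run time from Python is shown below. The
helper names are hypothetical. The demonstration writes into a scratch
directory rather than a real control group; on a real system the directory
would be the group's location under the mounted cgroup hierarchy (which depends
on the local setup), and the write would need root.

```python
import os
import tempfile

def set_group_runtime(cgroup_dir, runtime_us):
    """Write runtime_us to <cgroup_dir>/cpu.rt_runtime_us and return the path."""
    path = os.path.join(cgroup_dir, "cpu.rt_runtime_us")
    with open(path, "w") as f:
        f.write(f"{runtime_us}\n")
    return path

def get_group_runtime(cgroup_dir):
    """Read the value back from <cgroup_dir>/cpu.rt_runtime_us as an int."""
    with open(os.path.join(cgroup_dir, "cpu.rt_runtime_us")) as f:
        return int(f.read().strip())

# Demonstration against a scratch directory, so no real cgroup is touched.
with tempfile.TemporaryDirectory() as scratch:
    set_group_runtime(scratch, 32000)
    demo_value = get_group_runtime(scratch)

print(demo_value)  # 32000
```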

For more information on working with control groups, you should read
Documentation/cgroup-v1/cgroups.rst as well.

Group settings are checked against the following limits in order to keep the
configuration schedulable:

   \Sum_{i} runtime_{i} / global_period <= global_runtime / global_period

For now, this can be simplified to just the following (but see Future plans):

   \Sum_{i} runtime_{i} <= global_runtime
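
The simplified limit can be sketched as a one-line check. The function name
``groups_schedulable`` is hypothetical, and the sketch assumes, as the
simplified formula does, that every group uses the global period.

```python
def groups_schedulable(group_runtimes_us, global_runtime_us):
    """Simplified check: the group run times must fit in the global run time."""
    return sum(group_runtimes_us) <= global_runtime_us

# With the default global run time of 950000us:
print(groups_schedulable([500_000, 400_000], 950_000))  # True
print(groups_schedulable([600_000, 400_000], 950_000))  # False
```

In the second case the groups ask for 1000000us in total, which exceeds the
950000us global run time, so the configuration would not be schedulable.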


3. Future plans
===============

There is work in progress to make the scheduling period for each group
("<cgroup>/cpu.rt_period_us") configurable as well.

The constraint on the period is that a subgroup must have a smaller or
equal period to its parent. But realistically it's not very useful _yet_
as it's prone to starvation without deadline scheduling.

Consider two sibling groups A and B; both have 50% bandwidth, but A's
period is twice the length of B's.

* group A: period=100000us, runtime=50000us

  - this runs for 0.05s once every 0.1s

* group B: period= 50000us, runtime=25000us

  - this runs for 0.025s twice every 0.1s (or once every 0.05 sec).

This means that currently a while (1) loop in A will run for the full period of
B and can starve B's tasks (assuming they are of lower priority) for a whole
period.
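
The starvation can be reproduced with a toy model. The sketch below is not
kernel code: it assumes strict priority between the groups (A above B), budgets
that are refilled at the start of each period, and tasks that always want to
run. The function name ``simulate`` and the 5000us step are choices made for
this illustration only.

```python
def simulate(groups, horizon_us, step_us):
    """Toy fixed-priority model; groups are listed highest priority first.

    Each group is (name, period_us, runtime_us). Returns, per group, the
    run time received in each of its periods within the horizon.
    """
    budget = {}
    received = {name: [] for name, _, _ in groups}
    for t in range(0, horizon_us, step_us):
        # Refill the budget of every group whose period starts now.
        for name, period_us, runtime_us in groups:
            if t % period_us == 0:
                budget[name] = runtime_us
                received[name].append(0)
        # The highest-priority group with budget left runs for this step.
        for name, _, _ in groups:
            if budget[name] >= step_us:
                budget[name] -= step_us
                received[name][-1] += step_us
                break
    return received

result = simulate([("A", 100_000, 50_000), ("B", 50_000, 25_000)],
                  horizon_us=100_000, step_us=5_000)
print(result)  # {'A': [50000], 'B': [0, 25000]}
```

In this model B receives nothing during its first period, because A's busy
loop consumes the whole 50000us, and only gets its 25000us in the second one.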

The next project will be SCHED_EDF (Earliest Deadline First scheduling) to bring
full deadline scheduling to the Linux kernel. Deadline scheduling the above
groups, and treating the end of the period as a deadline, will ensure that they
both get their allocated time.

Implementing SCHED_EDF might take a while to complete. Priority Inheritance is
the biggest challenge as the current Linux PI infrastructure is geared towards
the limited static priority levels 0-99. With deadline scheduling you need to
do deadline inheritance (since priority is inversely proportional to the
deadline delta (deadline - now)).

This means the whole PI machinery will have to be reworked - and that is one of
the most complex pieces of code we have.