Merge git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-sched

* git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-sched: (61 commits)
  sched: refine negative nice level granularity
  sched: fix update_stats_enqueue() reniced codepath
  sched: round a bit better
  sched: make the multiplication table more accurate
  sched: optimize update_rq_clock() calls in the load-balancer
  sched: optimize activate_task()
  sched: clean up set_curr_task_fair()
  sched: remove __update_rq_clock() call from entity_tick()
  sched: move the __update_rq_clock() call to scheduler_tick()
  sched debug: remove the 'u64 now' parameter from print_task()/_rq()
  sched: remove the 'u64 now' local variables
  sched: remove the 'u64 now' parameter from deactivate_task()
  sched: remove the 'u64 now' parameter from dequeue_task()
  sched: remove the 'u64 now' parameter from enqueue_task()
  sched: remove the 'u64 now' parameter from dec_nr_running()
  sched: remove the 'u64 now' parameter from inc_nr_running()
  sched: remove the 'u64 now' parameter from dec_load()
  sched: remove the 'u64 now' parameter from inc_load()
  sched: remove the 'u64 now' parameter from update_curr_load()
  sched: remove the 'u64 now' parameter from ->task_new()
  ...
Linus Torvalds 2007-08-09 08:23:31 -07:00
commit be12014dd7
8 changed files with 422 additions and 335 deletions


@ -83,7 +83,7 @@ Some implementation details:
CFS uses nanosecond granularity accounting and does not rely on any CFS uses nanosecond granularity accounting and does not rely on any
jiffies or other HZ detail. Thus the CFS scheduler has no notion of jiffies or other HZ detail. Thus the CFS scheduler has no notion of
'timeslices' and has no heuristics whatsoever. There is only one 'timeslices' and has no heuristics whatsoever. There is only one
central tunable: central tunable (you have to switch on CONFIG_SCHED_DEBUG):
/proc/sys/kernel/sched_granularity_ns /proc/sys/kernel/sched_granularity_ns


@ -0,0 +1,108 @@
This document explains the thinking about the revamped and streamlined
nice-levels implementation in the new Linux scheduler.
Nice levels were always pretty weak under Linux and people continuously
pestered us to make nice +19 tasks use up much less CPU time.
Unfortunately that was not that easy to implement under the old
scheduler (otherwise we'd have done it long ago), because nice level
support was historically coupled to timeslice length, and timeslice
units were driven by the HZ tick, so the smallest timeslice was 1/HZ.
In the O(1) scheduler (in 2003) we changed negative nice levels to be
much stronger than they were before in 2.4 (and people were happy about
that change), and we also intentionally calibrated the linear timeslice
rule so that nice +19 level would be _exactly_ 1 jiffy. To better
understand it, the timeslice graph went like this (cheesy ASCII art
alert!):
                   A
             \     | [timeslice length]
              \    |
               \   |
                \  |
                 \ |
                  \|___100msecs
                   |^ . _
                   |      ^ . _
                   |            ^ . _
 -*----------------------------------*-----> [nice level]
 -20               |                +19
                   |
                   |
So that if someone wanted to really renice tasks, +19 would give a much
bigger hit than the normal linear rule would do. (The solution of
changing the ABI to extend priorities was discarded early on.)
This approach worked to some degree for some time, but later on with
HZ=1000 it caused 1 jiffy to be 1 msec, which meant 0.1% CPU usage which
we felt to be a bit excessive. Excessive _not_ because it's too small of
a CPU utilization, but because it causes too frequent (once per
millisec) rescheduling (and would thus thrash the cache, etc.). Remember,
this was long ago when hardware was weaker and caches were smaller, and
people were running number crunching apps at nice +19.
So for HZ=1000 we changed nice +19 to 5msecs, because that felt like the
right minimal granularity - and this translates to 5% CPU utilization.
But the fundamental HZ-sensitive property for nice +19 still remained,
and we never got a single complaint about nice +19 being too _weak_ in
terms of CPU utilization, we only got complaints about it (still) being
too _strong_ :-)
To sum it up: we always wanted to make nice levels more consistent, but
within the constraints of HZ and jiffies and their nasty design level
coupling to timeslices and granularity it was not really viable.
The second (less frequent but still periodically occurring) complaint
about Linux's nice level support was its asymmetry around the origin
(which you can see demonstrated in the picture above), or more
accurately: the fact that nice level behavior depended on the _absolute_
nice level as well, while the nice API itself is fundamentally
"relative":
int nice(int inc);
asmlinkage long sys_nice(int increment)
(the first one is the glibc API, the second one is the syscall API.)
Note that the 'inc' is relative to the current nice level. Tools like
bash's "nice" command mirror this relative API.
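The relative semantics can be observed from user space. The following is
a minimal sketch, assuming a Linux/glibc (or other POSIX) system on which
nice() returns the new nice value; the helper name renice_relative() is
hypothetical and not part of any API.

```c
#include <assert.h>
#include <errno.h>
#include <unistd.h>

/*
 * Hypothetical helper: apply a relative nice increment to the calling
 * process and return the resulting nice level. nice() can legitimately
 * return -1, so errno is cleared first and checked afterwards.
 */
static int renice_relative(int inc)
{
	int level;

	errno = 0;
	level = nice(inc);
	/* -1 with errno set means failure (e.g. EPERM for a negative inc) */
	assert(!(level == -1 && errno != 0));
	return level;
}
```

Calling renice_relative(0) reads the current level without changing it;
a following renice_relative(1) gives a level one higher (capped at +19),
whatever the starting level was.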
With the old scheduler, if you for example started a niced task with +1
and another task with +2, the CPU split between the two tasks would
depend on the nice level of the parent shell - if it was at nice -10 the
CPU split was different than if it was at +5 or +10.
A third complaint against Linux's nice level support was that negative
nice levels were not 'punchy enough', so lots of people had to resort to
running audio (and other multimedia) apps under RT priorities such as
SCHED_FIFO. But this caused other problems: SCHED_FIFO is not starvation
proof, and a buggy SCHED_FIFO app can also lock up the system for good.
The new scheduler in v2.6.23 addresses all three types of complaints:
To address the first complaint (of nice +19 tasks still using too much
CPU time), the scheduler was decoupled from 'time slice' and HZ concepts
(and granularity was made a separate concept from nice levels) and thus
it was possible to implement better and more consistent nice +19
support: with the new scheduler nice +19 tasks get a HZ-independent
1.5%, instead of the variable 3%-5%-9% range they got in the old
scheduler.
To address the second complaint (of nice levels not being consistent),
the new scheduler makes nice(1) have the same CPU utilization effect on
tasks, regardless of their absolute nice levels. So on the new
scheduler, running a nice +10 and a nice +11 task has the same CPU
utilization "split" between them as running a nice -5 and a nice -4
task. (one will get 55% of the CPU, the other 45%.) That is why nice
levels were changed to be "multiplicative" (or exponential) - that way
it does not matter which nice level you start out from, the 'relative
result' will always be the same.
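The "same split regardless of absolute level" property can be checked
numerically. The sketch below assumes that two always-runnable tasks
share the CPU in proportion to their weights, and copies the weights from
the new prio_to_weight[] table later in this commit; the helper name
cpu_share_permille() is hypothetical.

```c
#include <assert.h>

/* Weights for nice -20..+19, copied from the new prio_to_weight[] table */
static const int nice_to_weight[40] = {
	/* -20 */ 88761, 71755, 56483, 46273, 36291,
	/* -15 */ 29154, 23254, 18705, 14949, 11916,
	/* -10 */  9548,  7620,  6100,  4904,  3906,
	/*  -5 */  3121,  2501,  1991,  1586,  1277,
	/*   0 */  1024,   820,   655,   526,   423,
	/*   5 */   335,   272,   215,   172,   137,
	/*  10 */   110,    87,    70,    56,    45,
	/*  15 */    36,    29,    23,    18,    15,
};

/*
 * Hypothetical helper: CPU share, in 1/1000ths (rounded down), that a
 * task at nice_a gets when competing with one task at nice_b.
 */
static int cpu_share_permille(int nice_a, int nice_b)
{
	long wa, wb;

	assert(nice_a >= -20 && nice_a <= 19);
	assert(nice_b >= -20 && nice_b <= 19);
	wa = nice_to_weight[nice_a + 20];
	wb = nice_to_weight[nice_b + 20];
	return (int)(wa * 1000 / (wa + wb));
}
```

Under that assumption, nice +10 against nice +11 gives about 55.8% and
nice -5 against nice -4 about 55.5%, while a nice +19 task competing with
a nice 0 task gets roughly 1.4%, close to the 1.5% quoted above.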
The third complaint (of negative nice levels not being "punchy" enough
and forcing audio apps to run under the more dangerous SCHED_FIFO
scheduling policy) is addressed by the new scheduler almost
automatically: stronger negative nice levels are an automatic
side-effect of the recalibrated dynamic range of nice levels.


@ -139,7 +139,7 @@ struct cfs_rq;
extern void proc_sched_show_task(struct task_struct *p, struct seq_file *m); extern void proc_sched_show_task(struct task_struct *p, struct seq_file *m);
extern void proc_sched_set_task(struct task_struct *p); extern void proc_sched_set_task(struct task_struct *p);
extern void extern void
print_cfs_rq(struct seq_file *m, int cpu, struct cfs_rq *cfs_rq, u64 now); print_cfs_rq(struct seq_file *m, int cpu, struct cfs_rq *cfs_rq);
#else #else
static inline void static inline void
proc_sched_show_task(struct task_struct *p, struct seq_file *m) proc_sched_show_task(struct task_struct *p, struct seq_file *m)
@ -149,7 +149,7 @@ static inline void proc_sched_set_task(struct task_struct *p)
{ {
} }
static inline void static inline void
print_cfs_rq(struct seq_file *m, int cpu, struct cfs_rq *cfs_rq, u64 now) print_cfs_rq(struct seq_file *m, int cpu, struct cfs_rq *cfs_rq)
{ {
} }
#endif #endif
@ -855,26 +855,24 @@ struct sched_domain;
struct sched_class { struct sched_class {
struct sched_class *next; struct sched_class *next;
void (*enqueue_task) (struct rq *rq, struct task_struct *p, void (*enqueue_task) (struct rq *rq, struct task_struct *p, int wakeup);
int wakeup, u64 now); void (*dequeue_task) (struct rq *rq, struct task_struct *p, int sleep);
void (*dequeue_task) (struct rq *rq, struct task_struct *p,
int sleep, u64 now);
void (*yield_task) (struct rq *rq, struct task_struct *p); void (*yield_task) (struct rq *rq, struct task_struct *p);
void (*check_preempt_curr) (struct rq *rq, struct task_struct *p); void (*check_preempt_curr) (struct rq *rq, struct task_struct *p);
struct task_struct * (*pick_next_task) (struct rq *rq, u64 now); struct task_struct * (*pick_next_task) (struct rq *rq);
void (*put_prev_task) (struct rq *rq, struct task_struct *p, u64 now); void (*put_prev_task) (struct rq *rq, struct task_struct *p);
int (*load_balance) (struct rq *this_rq, int this_cpu, unsigned long (*load_balance) (struct rq *this_rq, int this_cpu,
struct rq *busiest, struct rq *busiest,
unsigned long max_nr_move, unsigned long max_load_move, unsigned long max_nr_move, unsigned long max_load_move,
struct sched_domain *sd, enum cpu_idle_type idle, struct sched_domain *sd, enum cpu_idle_type idle,
int *all_pinned, unsigned long *total_load_moved); int *all_pinned, int *this_best_prio);
void (*set_curr_task) (struct rq *rq); void (*set_curr_task) (struct rq *rq);
void (*task_tick) (struct rq *rq, struct task_struct *p); void (*task_tick) (struct rq *rq, struct task_struct *p);
void (*task_new) (struct rq *rq, struct task_struct *p, u64 now); void (*task_new) (struct rq *rq, struct task_struct *p);
}; };
struct load_weight { struct load_weight {


@ -318,15 +318,19 @@ static inline int cpu_of(struct rq *rq)
} }
/* /*
* Per-runqueue clock, as finegrained as the platform can give us: * Update the per-runqueue clock, as finegrained as the platform can give
* us, but without assuming monotonicity, etc.:
*/ */
static unsigned long long __rq_clock(struct rq *rq) static void __update_rq_clock(struct rq *rq)
{ {
u64 prev_raw = rq->prev_clock_raw; u64 prev_raw = rq->prev_clock_raw;
u64 now = sched_clock(); u64 now = sched_clock();
s64 delta = now - prev_raw; s64 delta = now - prev_raw;
u64 clock = rq->clock; u64 clock = rq->clock;
#ifdef CONFIG_SCHED_DEBUG
WARN_ON_ONCE(cpu_of(rq) != smp_processor_id());
#endif
/* /*
* Protect against sched_clock() occasionally going backwards: * Protect against sched_clock() occasionally going backwards:
*/ */
@ -349,18 +353,12 @@ static unsigned long long __rq_clock(struct rq *rq)
rq->prev_clock_raw = now; rq->prev_clock_raw = now;
rq->clock = clock; rq->clock = clock;
return clock;
} }
static inline unsigned long long rq_clock(struct rq *rq) static void update_rq_clock(struct rq *rq)
{ {
int this_cpu = smp_processor_id(); if (likely(smp_processor_id() == cpu_of(rq)))
__update_rq_clock(rq);
if (this_cpu == cpu_of(rq))
return __rq_clock(rq);
return rq->clock;
} }
/* /*
@ -386,9 +384,12 @@ unsigned long long cpu_clock(int cpu)
{ {
unsigned long long now; unsigned long long now;
unsigned long flags; unsigned long flags;
struct rq *rq;
local_irq_save(flags); local_irq_save(flags);
now = rq_clock(cpu_rq(cpu)); rq = cpu_rq(cpu);
update_rq_clock(rq);
now = rq->clock;
local_irq_restore(flags); local_irq_restore(flags);
return now; return now;
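The hunks above replace the old rq_clock() accessor with an explicit
update step followed by a plain read of rq->clock. A minimal user-space
sketch of that update-then-read pattern follows. The middle of
__update_rq_clock() is not visible in this diff, so the handling of a
backwards-going raw clock here (advance by 1 ns) is an assumption, and
the names toy_rq and toy_update_clock() are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

struct toy_rq {
	uint64_t prev_clock_raw;	/* last raw timestamp seen */
	uint64_t clock;			/* cached, never-decreasing clock */
};

/* Fold a new raw timestamp into the cached clock */
static void toy_update_clock(struct toy_rq *rq, uint64_t raw_now)
{
	int64_t delta;

	assert(rq);
	delta = (int64_t)(raw_now - rq->prev_clock_raw);

	if (delta < 0)
		rq->clock += 1;		/* assumed: raw clock went backwards */
	else
		rq->clock += (uint64_t)delta;

	rq->prev_clock_raw = raw_now;
}
```

Callers update once and then read the cached clock field as often as
they need, which is what lets the 'u64 now' parameters be dropped from
the call chains in the rest of this commit.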
@ -637,6 +638,11 @@ static u64 div64_likely32(u64 divident, unsigned long divisor)
#define WMULT_SHIFT 32 #define WMULT_SHIFT 32
/*
* Shift right and round:
*/
#define RSR(x, y) (((x) + (1UL << ((y) - 1))) >> (y))
static unsigned long static unsigned long
calc_delta_mine(unsigned long delta_exec, unsigned long weight, calc_delta_mine(unsigned long delta_exec, unsigned long weight,
struct load_weight *lw) struct load_weight *lw)
@ -644,18 +650,17 @@ calc_delta_mine(unsigned long delta_exec, unsigned long weight,
u64 tmp; u64 tmp;
if (unlikely(!lw->inv_weight)) if (unlikely(!lw->inv_weight))
lw->inv_weight = WMULT_CONST / lw->weight; lw->inv_weight = (WMULT_CONST - lw->weight/2) / lw->weight + 1;
tmp = (u64)delta_exec * weight; tmp = (u64)delta_exec * weight;
/* /*
* Check whether we'd overflow the 64-bit multiplication: * Check whether we'd overflow the 64-bit multiplication:
*/ */
if (unlikely(tmp > WMULT_CONST)) { if (unlikely(tmp > WMULT_CONST))
tmp = ((tmp >> WMULT_SHIFT/2) * lw->inv_weight) tmp = RSR(RSR(tmp, WMULT_SHIFT/2) * lw->inv_weight,
>> (WMULT_SHIFT/2); WMULT_SHIFT/2);
} else { else
tmp = (tmp * lw->inv_weight) >> WMULT_SHIFT; tmp = RSR(tmp * lw->inv_weight, WMULT_SHIFT);
}
return (unsigned long)min(tmp, (u64)(unsigned long)LONG_MAX); return (unsigned long)min(tmp, (u64)(unsigned long)LONG_MAX);
} }
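The effect of RSR() in calc_delta_mine() can be tried out in user space.
This is a sketch only: WMULT_CONST is not defined in the visible hunks
and is assumed to be 1 << 32 here, all arithmetic is done in 64 bits, and
the helper name scale_delta() is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define WMULT_SHIFT	32
#define WMULT_CONST	(1ULL << WMULT_SHIFT)	/* assumed value */

/* Shift right and round to nearest (y must be >= 1) */
#define RSR(x, y)	(((x) + (1ULL << ((y) - 1))) >> (y))

/*
 * Hypothetical helper: approximate delta_exec * weight / lw_weight by
 * multiplying with a precomputed inverse of lw_weight. Very large
 * inputs can still overflow 64 bits; this sketch does not guard that.
 */
static uint64_t scale_delta(uint64_t delta_exec, uint64_t weight,
			    uint64_t lw_weight)
{
	uint64_t inv, tmp;

	assert(lw_weight >= 2);	/* keeps inv within 32 bits */
	inv = (WMULT_CONST - lw_weight / 2) / lw_weight + 1;
	tmp = delta_exec * weight;

	/* Large products are shifted in two halves */
	if (tmp > WMULT_CONST)
		return RSR(RSR(tmp, WMULT_SHIFT / 2) * inv, WMULT_SHIFT / 2);
	return RSR(tmp * inv, WMULT_SHIFT);
}
```

For example, scale_delta(3, 1024, 2048) computes 1.5 and returns 2; a
plain truncating shift would return 1.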
@ -703,11 +708,14 @@ static void update_load_sub(struct load_weight *lw, unsigned long dec)
* the relative distance between them is ~25%.) * the relative distance between them is ~25%.)
*/ */
static const int prio_to_weight[40] = { static const int prio_to_weight[40] = {
/* -20 */ 88818, 71054, 56843, 45475, 36380, 29104, 23283, 18626, 14901, 11921, /* -20 */ 88761, 71755, 56483, 46273, 36291,
/* -10 */ 9537, 7629, 6103, 4883, 3906, 3125, 2500, 2000, 1600, 1280, /* -15 */ 29154, 23254, 18705, 14949, 11916,
/* 0 */ NICE_0_LOAD /* 1024 */, /* -10 */ 9548, 7620, 6100, 4904, 3906,
/* 1 */ 819, 655, 524, 419, 336, 268, 215, 172, 137, /* -5 */ 3121, 2501, 1991, 1586, 1277,
/* 10 */ 110, 87, 70, 56, 45, 36, 29, 23, 18, 15, /* 0 */ 1024, 820, 655, 526, 423,
/* 5 */ 335, 272, 215, 172, 137,
/* 10 */ 110, 87, 70, 56, 45,
/* 15 */ 36, 29, 23, 18, 15,
}; };
/* /*
@ -718,14 +726,14 @@ static const int prio_to_weight[40] = {
* into multiplications: * into multiplications:
*/ */
static const u32 prio_to_wmult[40] = { static const u32 prio_to_wmult[40] = {
/* -20 */ 48356, 60446, 75558, 94446, 118058, /* -20 */ 48388, 59856, 76040, 92818, 118348,
/* -15 */ 147573, 184467, 230589, 288233, 360285, /* -15 */ 147320, 184698, 229616, 287308, 360437,
/* -10 */ 450347, 562979, 703746, 879575, 1099582, /* -10 */ 449829, 563644, 704093, 875809, 1099582,
/* -5 */ 1374389, 1717986, 2147483, 2684354, 3355443, /* -5 */ 1376151, 1717300, 2157191, 2708050, 3363326,
/* 0 */ 4194304, 5244160, 6557201, 8196502, 10250518, /* 0 */ 4194304, 5237765, 6557202, 8165337, 10153587,
/* 5 */ 12782640, 16025997, 19976592, 24970740, 31350126, /* 5 */ 12820798, 15790321, 19976592, 24970740, 31350126,
/* 10 */ 39045157, 49367440, 61356675, 76695844, 95443717, /* 10 */ 39045157, 49367440, 61356676, 76695844, 95443717,
/* 15 */ 119304647, 148102320, 186737708, 238609294, 286331153, /* 15 */ 119304647, 148102320, 186737708, 238609294, 286331153,
}; };
static void activate_task(struct rq *rq, struct task_struct *p, int wakeup); static void activate_task(struct rq *rq, struct task_struct *p, int wakeup);
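The relation between the two new tables above can be spot-checked: the
prio_to_wmult[] entries tested here equal 2^32 divided by the matching
prio_to_weight[] entry, rounded to nearest. A small sketch, with the
helper name inverse_weight() being hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: 2^32 / weight, rounded to nearest */
static uint32_t inverse_weight(uint32_t weight)
{
	assert(weight >= 2);	/* result must fit in 32 bits */
	return (uint32_t)(((1ULL << 32) + weight / 2) / weight);
}
```

For instance, inverse_weight(1024) is 4194304 (nice 0) and
inverse_weight(820) is 5237765 (nice +1), matching the new table values.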
@ -745,8 +753,7 @@ static int balance_tasks(struct rq *this_rq, int this_cpu, struct rq *busiest,
unsigned long max_nr_move, unsigned long max_load_move, unsigned long max_nr_move, unsigned long max_load_move,
struct sched_domain *sd, enum cpu_idle_type idle, struct sched_domain *sd, enum cpu_idle_type idle,
int *all_pinned, unsigned long *load_moved, int *all_pinned, unsigned long *load_moved,
int this_best_prio, int best_prio, int best_prio_seen, int *this_best_prio, struct rq_iterator *iterator);
struct rq_iterator *iterator);
#include "sched_stats.h" #include "sched_stats.h"
#include "sched_rt.c" #include "sched_rt.c"
@ -782,14 +789,14 @@ static void __update_curr_load(struct rq *rq, struct load_stat *ls)
* This function is called /before/ updating rq->ls.load * This function is called /before/ updating rq->ls.load
* and when switching tasks. * and when switching tasks.
*/ */
static void update_curr_load(struct rq *rq, u64 now) static void update_curr_load(struct rq *rq)
{ {
struct load_stat *ls = &rq->ls; struct load_stat *ls = &rq->ls;
u64 start; u64 start;
start = ls->load_update_start; start = ls->load_update_start;
ls->load_update_start = now; ls->load_update_start = rq->clock;
ls->delta_stat += now - start; ls->delta_stat += rq->clock - start;
/* /*
* Stagger updates to ls->delta_fair. Very frequent updates * Stagger updates to ls->delta_fair. Very frequent updates
* can be expensive. * can be expensive.
@ -798,30 +805,28 @@ static void update_curr_load(struct rq *rq, u64 now)
__update_curr_load(rq, ls); __update_curr_load(rq, ls);
} }
static inline void static inline void inc_load(struct rq *rq, const struct task_struct *p)
inc_load(struct rq *rq, const struct task_struct *p, u64 now)
{ {
update_curr_load(rq, now); update_curr_load(rq);
update_load_add(&rq->ls.load, p->se.load.weight); update_load_add(&rq->ls.load, p->se.load.weight);
} }
static inline void static inline void dec_load(struct rq *rq, const struct task_struct *p)
dec_load(struct rq *rq, const struct task_struct *p, u64 now)
{ {
update_curr_load(rq, now); update_curr_load(rq);
update_load_sub(&rq->ls.load, p->se.load.weight); update_load_sub(&rq->ls.load, p->se.load.weight);
} }
static void inc_nr_running(struct task_struct *p, struct rq *rq, u64 now) static void inc_nr_running(struct task_struct *p, struct rq *rq)
{ {
rq->nr_running++; rq->nr_running++;
inc_load(rq, p, now); inc_load(rq, p);
} }
static void dec_nr_running(struct task_struct *p, struct rq *rq, u64 now) static void dec_nr_running(struct task_struct *p, struct rq *rq)
{ {
rq->nr_running--; rq->nr_running--;
dec_load(rq, p, now); dec_load(rq, p);
} }
static void set_load_weight(struct task_struct *p) static void set_load_weight(struct task_struct *p)
@ -848,18 +853,16 @@ static void set_load_weight(struct task_struct *p)
p->se.load.inv_weight = prio_to_wmult[p->static_prio - MAX_RT_PRIO]; p->se.load.inv_weight = prio_to_wmult[p->static_prio - MAX_RT_PRIO];
} }
static void static void enqueue_task(struct rq *rq, struct task_struct *p, int wakeup)
enqueue_task(struct rq *rq, struct task_struct *p, int wakeup, u64 now)
{ {
sched_info_queued(p); sched_info_queued(p);
p->sched_class->enqueue_task(rq, p, wakeup, now); p->sched_class->enqueue_task(rq, p, wakeup);
p->se.on_rq = 1; p->se.on_rq = 1;
} }
static void static void dequeue_task(struct rq *rq, struct task_struct *p, int sleep)
dequeue_task(struct rq *rq, struct task_struct *p, int sleep, u64 now)
{ {
p->sched_class->dequeue_task(rq, p, sleep, now); p->sched_class->dequeue_task(rq, p, sleep);
p->se.on_rq = 0; p->se.on_rq = 0;
} }
@ -914,13 +917,11 @@ static int effective_prio(struct task_struct *p)
*/ */
static void activate_task(struct rq *rq, struct task_struct *p, int wakeup) static void activate_task(struct rq *rq, struct task_struct *p, int wakeup)
{ {
u64 now = rq_clock(rq);
if (p->state == TASK_UNINTERRUPTIBLE) if (p->state == TASK_UNINTERRUPTIBLE)
rq->nr_uninterruptible--; rq->nr_uninterruptible--;
enqueue_task(rq, p, wakeup, now); enqueue_task(rq, p, wakeup);
inc_nr_running(p, rq, now); inc_nr_running(p, rq);
} }
/* /*
@ -928,13 +929,13 @@ static void activate_task(struct rq *rq, struct task_struct *p, int wakeup)
*/ */
static inline void activate_idle_task(struct task_struct *p, struct rq *rq) static inline void activate_idle_task(struct task_struct *p, struct rq *rq)
{ {
u64 now = rq_clock(rq); update_rq_clock(rq);
if (p->state == TASK_UNINTERRUPTIBLE) if (p->state == TASK_UNINTERRUPTIBLE)
rq->nr_uninterruptible--; rq->nr_uninterruptible--;
enqueue_task(rq, p, 0, now); enqueue_task(rq, p, 0);
inc_nr_running(p, rq, now); inc_nr_running(p, rq);
} }
/* /*
@ -942,13 +943,11 @@ static inline void activate_idle_task(struct task_struct *p, struct rq *rq)
*/ */
static void deactivate_task(struct rq *rq, struct task_struct *p, int sleep) static void deactivate_task(struct rq *rq, struct task_struct *p, int sleep)
{ {
u64 now = rq_clock(rq);
if (p->state == TASK_UNINTERRUPTIBLE) if (p->state == TASK_UNINTERRUPTIBLE)
rq->nr_uninterruptible++; rq->nr_uninterruptible++;
dequeue_task(rq, p, sleep, now); dequeue_task(rq, p, sleep);
dec_nr_running(p, rq, now); dec_nr_running(p, rq);
} }
/** /**
@ -1516,6 +1515,7 @@ out_set_cpu:
out_activate: out_activate:
#endif /* CONFIG_SMP */ #endif /* CONFIG_SMP */
update_rq_clock(rq);
activate_task(rq, p, 1); activate_task(rq, p, 1);
/* /*
* Sync wakeups (i.e. those types of wakeups where the waker * Sync wakeups (i.e. those types of wakeups where the waker
@ -1647,12 +1647,11 @@ void fastcall wake_up_new_task(struct task_struct *p, unsigned long clone_flags)
unsigned long flags; unsigned long flags;
struct rq *rq; struct rq *rq;
int this_cpu; int this_cpu;
u64 now;
rq = task_rq_lock(p, &flags); rq = task_rq_lock(p, &flags);
BUG_ON(p->state != TASK_RUNNING); BUG_ON(p->state != TASK_RUNNING);
this_cpu = smp_processor_id(); /* parent's CPU */ this_cpu = smp_processor_id(); /* parent's CPU */
now = rq_clock(rq); update_rq_clock(rq);
p->prio = effective_prio(p); p->prio = effective_prio(p);
@ -1666,8 +1665,8 @@ void fastcall wake_up_new_task(struct task_struct *p, unsigned long clone_flags)
* Let the scheduling class do new task startup * Let the scheduling class do new task startup
* management (if any): * management (if any):
*/ */
p->sched_class->task_new(rq, p, now); p->sched_class->task_new(rq, p);
inc_nr_running(p, rq, now); inc_nr_running(p, rq);
} }
check_preempt_curr(rq, p); check_preempt_curr(rq, p);
task_rq_unlock(rq, &flags); task_rq_unlock(rq, &flags);
@ -1954,7 +1953,6 @@ static void update_cpu_load(struct rq *this_rq)
unsigned long total_load = this_rq->ls.load.weight; unsigned long total_load = this_rq->ls.load.weight;
unsigned long this_load = total_load; unsigned long this_load = total_load;
struct load_stat *ls = &this_rq->ls; struct load_stat *ls = &this_rq->ls;
u64 now = __rq_clock(this_rq);
int i, scale; int i, scale;
this_rq->nr_load_updates++; this_rq->nr_load_updates++;
@ -1962,7 +1960,7 @@ static void update_cpu_load(struct rq *this_rq)
goto do_avg; goto do_avg;
/* Update delta_fair/delta_exec fields first */ /* Update delta_fair/delta_exec fields first */
update_curr_load(this_rq, now); update_curr_load(this_rq);
fair_delta64 = ls->delta_fair + 1; fair_delta64 = ls->delta_fair + 1;
ls->delta_fair = 0; ls->delta_fair = 0;
@ -1970,8 +1968,8 @@ static void update_cpu_load(struct rq *this_rq)
exec_delta64 = ls->delta_exec + 1; exec_delta64 = ls->delta_exec + 1;
ls->delta_exec = 0; ls->delta_exec = 0;
sample_interval64 = now - ls->load_update_last; sample_interval64 = this_rq->clock - ls->load_update_last;
ls->load_update_last = now; ls->load_update_last = this_rq->clock;
if ((s64)sample_interval64 < (s64)TICK_NSEC) if ((s64)sample_interval64 < (s64)TICK_NSEC)
sample_interval64 = TICK_NSEC; sample_interval64 = TICK_NSEC;
@ -2026,6 +2024,8 @@ static void double_rq_lock(struct rq *rq1, struct rq *rq2)
spin_lock(&rq1->lock); spin_lock(&rq1->lock);
} }
} }
update_rq_clock(rq1);
update_rq_clock(rq2);
} }
/* /*
@ -2166,8 +2166,7 @@ static int balance_tasks(struct rq *this_rq, int this_cpu, struct rq *busiest,
unsigned long max_nr_move, unsigned long max_load_move, unsigned long max_nr_move, unsigned long max_load_move,
struct sched_domain *sd, enum cpu_idle_type idle, struct sched_domain *sd, enum cpu_idle_type idle,
int *all_pinned, unsigned long *load_moved, int *all_pinned, unsigned long *load_moved,
int this_best_prio, int best_prio, int best_prio_seen, int *this_best_prio, struct rq_iterator *iterator)
struct rq_iterator *iterator)
{ {
int pulled = 0, pinned = 0, skip_for_load; int pulled = 0, pinned = 0, skip_for_load;
struct task_struct *p; struct task_struct *p;
@ -2192,12 +2191,8 @@ next:
*/ */
skip_for_load = (p->se.load.weight >> 1) > rem_load_move + skip_for_load = (p->se.load.weight >> 1) > rem_load_move +
SCHED_LOAD_SCALE_FUZZ; SCHED_LOAD_SCALE_FUZZ;
if (skip_for_load && p->prio < this_best_prio) if ((skip_for_load && p->prio >= *this_best_prio) ||
skip_for_load = !best_prio_seen && p->prio == best_prio;
if (skip_for_load ||
!can_migrate_task(p, busiest, this_cpu, sd, idle, &pinned)) { !can_migrate_task(p, busiest, this_cpu, sd, idle, &pinned)) {
best_prio_seen |= p->prio == best_prio;
p = iterator->next(iterator->arg); p = iterator->next(iterator->arg);
goto next; goto next;
} }
@ -2211,8 +2206,8 @@ next:
* and the prescribed amount of weighted load. * and the prescribed amount of weighted load.
*/ */
if (pulled < max_nr_move && rem_load_move > 0) { if (pulled < max_nr_move && rem_load_move > 0) {
if (p->prio < this_best_prio) if (p->prio < *this_best_prio)
this_best_prio = p->prio; *this_best_prio = p->prio;
p = iterator->next(iterator->arg); p = iterator->next(iterator->arg);
goto next; goto next;
} }
@ -2231,32 +2226,52 @@ out:
} }
/* /*
* move_tasks tries to move up to max_nr_move tasks and max_load_move weighted * move_tasks tries to move up to max_load_move weighted load from busiest to
* load from busiest to this_rq, as part of a balancing operation within * this_rq, as part of a balancing operation within domain "sd".
* "domain". Returns the number of tasks moved. * Returns 1 if successful and 0 otherwise.
* *
* Called with both runqueues locked. * Called with both runqueues locked.
*/ */
static int move_tasks(struct rq *this_rq, int this_cpu, struct rq *busiest, static int move_tasks(struct rq *this_rq, int this_cpu, struct rq *busiest,
unsigned long max_nr_move, unsigned long max_load_move, unsigned long max_load_move,
struct sched_domain *sd, enum cpu_idle_type idle, struct sched_domain *sd, enum cpu_idle_type idle,
int *all_pinned) int *all_pinned)
{ {
struct sched_class *class = sched_class_highest; struct sched_class *class = sched_class_highest;
unsigned long load_moved, total_nr_moved = 0, nr_moved; unsigned long total_load_moved = 0;
long rem_load_move = max_load_move; int this_best_prio = this_rq->curr->prio;
do { do {
nr_moved = class->load_balance(this_rq, this_cpu, busiest, total_load_moved +=
max_nr_move, (unsigned long)rem_load_move, class->load_balance(this_rq, this_cpu, busiest,
sd, idle, all_pinned, &load_moved); ULONG_MAX, max_load_move - total_load_moved,
total_nr_moved += nr_moved; sd, idle, all_pinned, &this_best_prio);
max_nr_move -= nr_moved;
rem_load_move -= load_moved;
class = class->next; class = class->next;
} while (class && max_nr_move && rem_load_move > 0); } while (class && max_load_move > total_load_moved);
return total_nr_moved; return total_load_moved > 0;
}
/*
* move_one_task tries to move exactly one task from busiest to this_rq, as
* part of active balancing operations within "domain".
* Returns 1 if successful and 0 otherwise.
*
* Called with both runqueues locked.
*/
static int move_one_task(struct rq *this_rq, int this_cpu, struct rq *busiest,
struct sched_domain *sd, enum cpu_idle_type idle)
{
struct sched_class *class;
int this_best_prio = MAX_PRIO;
for (class = sched_class_highest; class; class = class->next)
if (class->load_balance(this_rq, this_cpu, busiest,
1, ULONG_MAX, sd, idle, NULL,
&this_best_prio))
return 1;
return 0;
} }
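The new move_tasks() loop walks the scheduling classes from the highest
one downwards and stops once the requested weighted load has been moved.
The control flow can be sketched in user space as follows; toy_class,
toy_balance() and toy_move_tasks() are hypothetical stand-ins, and each
class is modelled as a simple amount of movable load rather than a real
runqueue.

```c
#include <assert.h>
#include <stddef.h>

struct toy_class {
	struct toy_class *next;
	unsigned long movable;	/* load this class could hand over */
};

/* Each class moves at most max_load and reports how much it moved */
static unsigned long toy_balance(struct toy_class *class,
				 unsigned long max_load)
{
	unsigned long moved = class->movable < max_load ?
			      class->movable : max_load;

	class->movable -= moved;
	return moved;
}

/* Returns 1 if any load was moved, 0 otherwise */
static int toy_move_tasks(struct toy_class *highest, unsigned long max_load)
{
	struct toy_class *class = highest;
	unsigned long total = 0;

	assert(highest != NULL);
	do {
		total += toy_balance(class, max_load - total);
		class = class->next;
	} while (class && max_load > total);

	return total > 0;
}
```

As in the hunk above, the return value only says whether anything was
moved, not how many tasks were moved.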
/* /*
@ -2588,11 +2603,6 @@ find_busiest_queue(struct sched_group *group, enum cpu_idle_type idle,
*/ */
#define MAX_PINNED_INTERVAL 512 #define MAX_PINNED_INTERVAL 512
static inline unsigned long minus_1_or_zero(unsigned long n)
{
return n > 0 ? n - 1 : 0;
}
/* /*
* Check this_cpu to ensure it is balanced within domain. Attempt to move * Check this_cpu to ensure it is balanced within domain. Attempt to move
* tasks if there is an imbalance. * tasks if there is an imbalance.
@ -2601,7 +2611,7 @@ static int load_balance(int this_cpu, struct rq *this_rq,
struct sched_domain *sd, enum cpu_idle_type idle, struct sched_domain *sd, enum cpu_idle_type idle,
int *balance) int *balance)
{ {
int nr_moved, all_pinned = 0, active_balance = 0, sd_idle = 0; int ld_moved, all_pinned = 0, active_balance = 0, sd_idle = 0;
struct sched_group *group; struct sched_group *group;
unsigned long imbalance; unsigned long imbalance;
struct rq *busiest; struct rq *busiest;
@ -2642,18 +2652,17 @@ redo:
schedstat_add(sd, lb_imbalance[idle], imbalance); schedstat_add(sd, lb_imbalance[idle], imbalance);
nr_moved = 0; ld_moved = 0;
if (busiest->nr_running > 1) { if (busiest->nr_running > 1) {
/* /*
* Attempt to move tasks. If find_busiest_group has found * Attempt to move tasks. If find_busiest_group has found
* an imbalance but busiest->nr_running <= 1, the group is * an imbalance but busiest->nr_running <= 1, the group is
* still unbalanced. nr_moved simply stays zero, so it is * still unbalanced. ld_moved simply stays zero, so it is
* correctly treated as an imbalance. * correctly treated as an imbalance.
*/ */
local_irq_save(flags); local_irq_save(flags);
double_rq_lock(this_rq, busiest); double_rq_lock(this_rq, busiest);
nr_moved = move_tasks(this_rq, this_cpu, busiest, ld_moved = move_tasks(this_rq, this_cpu, busiest,
minus_1_or_zero(busiest->nr_running),
imbalance, sd, idle, &all_pinned); imbalance, sd, idle, &all_pinned);
double_rq_unlock(this_rq, busiest); double_rq_unlock(this_rq, busiest);
local_irq_restore(flags); local_irq_restore(flags);
@ -2661,7 +2670,7 @@ redo:
/* /*
* some other cpu did the load balance for us. * some other cpu did the load balance for us.
*/ */
if (nr_moved && this_cpu != smp_processor_id()) if (ld_moved && this_cpu != smp_processor_id())
resched_cpu(this_cpu); resched_cpu(this_cpu);
/* All tasks on this runqueue were pinned by CPU affinity */ /* All tasks on this runqueue were pinned by CPU affinity */
@ -2673,7 +2682,7 @@ redo:
} }
} }
if (!nr_moved) { if (!ld_moved) {
schedstat_inc(sd, lb_failed[idle]); schedstat_inc(sd, lb_failed[idle]);
sd->nr_balance_failed++; sd->nr_balance_failed++;
@ -2722,10 +2731,10 @@ redo:
sd->balance_interval *= 2; sd->balance_interval *= 2;
} }
if (!nr_moved && !sd_idle && sd->flags & SD_SHARE_CPUPOWER && if (!ld_moved && !sd_idle && sd->flags & SD_SHARE_CPUPOWER &&
!test_sd_parent(sd, SD_POWERSAVINGS_BALANCE)) !test_sd_parent(sd, SD_POWERSAVINGS_BALANCE))
return -1; return -1;
return nr_moved; return ld_moved;
out_balanced: out_balanced:
schedstat_inc(sd, lb_balanced[idle]); schedstat_inc(sd, lb_balanced[idle]);
@ -2757,7 +2766,7 @@ load_balance_newidle(int this_cpu, struct rq *this_rq, struct sched_domain *sd)
struct sched_group *group; struct sched_group *group;
struct rq *busiest = NULL; struct rq *busiest = NULL;
unsigned long imbalance; unsigned long imbalance;
int nr_moved = 0; int ld_moved = 0;
int sd_idle = 0; int sd_idle = 0;
int all_pinned = 0; int all_pinned = 0;
cpumask_t cpus = CPU_MASK_ALL; cpumask_t cpus = CPU_MASK_ALL;
@ -2792,12 +2801,13 @@ redo:
schedstat_add(sd, lb_imbalance[CPU_NEWLY_IDLE], imbalance); schedstat_add(sd, lb_imbalance[CPU_NEWLY_IDLE], imbalance);
nr_moved = 0; ld_moved = 0;
if (busiest->nr_running > 1) { if (busiest->nr_running > 1) {
/* Attempt to move tasks */ /* Attempt to move tasks */
double_lock_balance(this_rq, busiest); double_lock_balance(this_rq, busiest);
nr_moved = move_tasks(this_rq, this_cpu, busiest, /* this_rq->clock is already updated */
minus_1_or_zero(busiest->nr_running), update_rq_clock(busiest);
ld_moved = move_tasks(this_rq, this_cpu, busiest,
imbalance, sd, CPU_NEWLY_IDLE, imbalance, sd, CPU_NEWLY_IDLE,
&all_pinned); &all_pinned);
spin_unlock(&busiest->lock); spin_unlock(&busiest->lock);
@ -2809,7 +2819,7 @@ redo:
} }
} }
if (!nr_moved) { if (!ld_moved) {
schedstat_inc(sd, lb_failed[CPU_NEWLY_IDLE]); schedstat_inc(sd, lb_failed[CPU_NEWLY_IDLE]);
if (!sd_idle && sd->flags & SD_SHARE_CPUPOWER && if (!sd_idle && sd->flags & SD_SHARE_CPUPOWER &&
!test_sd_parent(sd, SD_POWERSAVINGS_BALANCE)) !test_sd_parent(sd, SD_POWERSAVINGS_BALANCE))
@ -2817,7 +2827,7 @@ redo:
} else } else
sd->nr_balance_failed = 0; sd->nr_balance_failed = 0;
return nr_moved; return ld_moved;
out_balanced: out_balanced:
schedstat_inc(sd, lb_balanced[CPU_NEWLY_IDLE]); schedstat_inc(sd, lb_balanced[CPU_NEWLY_IDLE]);
@ -2894,6 +2904,8 @@ static void active_load_balance(struct rq *busiest_rq, int busiest_cpu)
/* move a task from busiest_rq to target_rq */ /* move a task from busiest_rq to target_rq */
double_lock_balance(busiest_rq, target_rq); double_lock_balance(busiest_rq, target_rq);
update_rq_clock(busiest_rq);
update_rq_clock(target_rq);
/* Search for an sd spanning us and the target CPU. */ /* Search for an sd spanning us and the target CPU. */
for_each_domain(target_cpu, sd) { for_each_domain(target_cpu, sd) {
@ -2905,8 +2917,8 @@ static void active_load_balance(struct rq *busiest_rq, int busiest_cpu)
if (likely(sd)) { if (likely(sd)) {
schedstat_inc(sd, alb_cnt); schedstat_inc(sd, alb_cnt);
if (move_tasks(target_rq, target_cpu, busiest_rq, 1, if (move_one_task(target_rq, target_cpu, busiest_rq,
ULONG_MAX, sd, CPU_IDLE, NULL)) sd, CPU_IDLE))
schedstat_inc(sd, alb_pushed); schedstat_inc(sd, alb_pushed);
else else
schedstat_inc(sd, alb_failed); schedstat_inc(sd, alb_failed);
@ -3175,8 +3187,7 @@ static int balance_tasks(struct rq *this_rq, int this_cpu, struct rq *busiest,
unsigned long max_nr_move, unsigned long max_load_move, unsigned long max_nr_move, unsigned long max_load_move,
struct sched_domain *sd, enum cpu_idle_type idle, struct sched_domain *sd, enum cpu_idle_type idle,
int *all_pinned, unsigned long *load_moved, int *all_pinned, unsigned long *load_moved,
int this_best_prio, int best_prio, int best_prio_seen, int *this_best_prio, struct rq_iterator *iterator)
struct rq_iterator *iterator)
{ {
*load_moved = 0; *load_moved = 0;
@ -3202,7 +3213,8 @@ unsigned long long task_sched_runtime(struct task_struct *p)
rq = task_rq_lock(p, &flags); rq = task_rq_lock(p, &flags);
ns = p->se.sum_exec_runtime; ns = p->se.sum_exec_runtime;
if (rq->curr == p) { if (rq->curr == p) {
delta_exec = rq_clock(rq) - p->se.exec_start; update_rq_clock(rq);
delta_exec = rq->clock - p->se.exec_start;
if ((s64)delta_exec > 0) if ((s64)delta_exec > 0)
ns += delta_exec; ns += delta_exec;
} }
@ -3298,9 +3310,10 @@ void scheduler_tick(void)
struct task_struct *curr = rq->curr; struct task_struct *curr = rq->curr;
spin_lock(&rq->lock); spin_lock(&rq->lock);
__update_rq_clock(rq);
update_cpu_load(rq);
if (curr != rq->idle) /* FIXME: needed? */ if (curr != rq->idle) /* FIXME: needed? */
curr->sched_class->task_tick(rq, curr); curr->sched_class->task_tick(rq, curr);
update_cpu_load(rq);
spin_unlock(&rq->lock); spin_unlock(&rq->lock);
#ifdef CONFIG_SMP #ifdef CONFIG_SMP
@ -3382,7 +3395,7 @@ static inline void schedule_debug(struct task_struct *prev)
* Pick up the highest-prio task: * Pick up the highest-prio task:
*/ */
static inline struct task_struct * static inline struct task_struct *
pick_next_task(struct rq *rq, struct task_struct *prev, u64 now) pick_next_task(struct rq *rq, struct task_struct *prev)
{ {
struct sched_class *class; struct sched_class *class;
struct task_struct *p; struct task_struct *p;
@ -3392,14 +3405,14 @@ pick_next_task(struct rq *rq, struct task_struct *prev, u64 now)
* the fair class we can call that function directly: * the fair class we can call that function directly:
*/ */
if (likely(rq->nr_running == rq->cfs.nr_running)) { if (likely(rq->nr_running == rq->cfs.nr_running)) {
p = fair_sched_class.pick_next_task(rq, now); p = fair_sched_class.pick_next_task(rq);
if (likely(p)) if (likely(p))
return p; return p;
} }
class = sched_class_highest; class = sched_class_highest;
for ( ; ; ) { for ( ; ; ) {
p = class->pick_next_task(rq, now); p = class->pick_next_task(rq);
if (p) if (p)
return p; return p;
/* /*
@ -3418,7 +3431,6 @@ asmlinkage void __sched schedule(void)
struct task_struct *prev, *next; struct task_struct *prev, *next;
long *switch_count; long *switch_count;
struct rq *rq; struct rq *rq;
u64 now;
int cpu; int cpu;
need_resched: need_resched:
@ -3436,6 +3448,7 @@ need_resched_nonpreemptible:
spin_lock_irq(&rq->lock); spin_lock_irq(&rq->lock);
clear_tsk_need_resched(prev); clear_tsk_need_resched(prev);
__update_rq_clock(rq);
if (prev->state && !(preempt_count() & PREEMPT_ACTIVE)) { if (prev->state && !(preempt_count() & PREEMPT_ACTIVE)) {
if (unlikely((prev->state & TASK_INTERRUPTIBLE) && if (unlikely((prev->state & TASK_INTERRUPTIBLE) &&
@ -3450,9 +3463,8 @@ need_resched_nonpreemptible:
if (unlikely(!rq->nr_running)) if (unlikely(!rq->nr_running))
idle_balance(cpu, rq); idle_balance(cpu, rq);
now = __rq_clock(rq); prev->sched_class->put_prev_task(rq, prev);
prev->sched_class->put_prev_task(rq, prev, now); next = pick_next_task(rq, prev);
next = pick_next_task(rq, prev, now);
sched_info_switch(prev, next); sched_info_switch(prev, next);
@ -3895,17 +3907,16 @@ void rt_mutex_setprio(struct task_struct *p, int prio)
unsigned long flags; unsigned long flags;
int oldprio, on_rq; int oldprio, on_rq;
struct rq *rq; struct rq *rq;
u64 now;
BUG_ON(prio < 0 || prio > MAX_PRIO); BUG_ON(prio < 0 || prio > MAX_PRIO);
rq = task_rq_lock(p, &flags); rq = task_rq_lock(p, &flags);
now = rq_clock(rq); update_rq_clock(rq);
oldprio = p->prio; oldprio = p->prio;
on_rq = p->se.on_rq; on_rq = p->se.on_rq;
if (on_rq) if (on_rq)
dequeue_task(rq, p, 0, now); dequeue_task(rq, p, 0);
if (rt_prio(prio)) if (rt_prio(prio))
p->sched_class = &rt_sched_class; p->sched_class = &rt_sched_class;
@ -3915,7 +3926,7 @@ void rt_mutex_setprio(struct task_struct *p, int prio)
p->prio = prio; p->prio = prio;
if (on_rq) { if (on_rq) {
enqueue_task(rq, p, 0, now); enqueue_task(rq, p, 0);
/* /*
* Reschedule if we are currently running on this runqueue and * Reschedule if we are currently running on this runqueue and
* our priority decreased, or if we are not currently running on * our priority decreased, or if we are not currently running on
@ -3938,7 +3949,6 @@ void set_user_nice(struct task_struct *p, long nice)
int old_prio, delta, on_rq; int old_prio, delta, on_rq;
unsigned long flags; unsigned long flags;
struct rq *rq; struct rq *rq;
u64 now;
if (TASK_NICE(p) == nice || nice < -20 || nice > 19) if (TASK_NICE(p) == nice || nice < -20 || nice > 19)
return; return;
@ -3947,7 +3957,7 @@ void set_user_nice(struct task_struct *p, long nice)
* the task might be in the middle of scheduling on another CPU. * the task might be in the middle of scheduling on another CPU.
*/ */
rq = task_rq_lock(p, &flags); rq = task_rq_lock(p, &flags);
now = rq_clock(rq); update_rq_clock(rq);
/* /*
* The RT priorities are set via sched_setscheduler(), but we still * The RT priorities are set via sched_setscheduler(), but we still
* allow the 'normal' nice value to be set - but as expected * allow the 'normal' nice value to be set - but as expected
@ -3960,8 +3970,8 @@ void set_user_nice(struct task_struct *p, long nice)
} }
on_rq = p->se.on_rq; on_rq = p->se.on_rq;
if (on_rq) { if (on_rq) {
dequeue_task(rq, p, 0, now); dequeue_task(rq, p, 0);
dec_load(rq, p, now); dec_load(rq, p);
} }
p->static_prio = NICE_TO_PRIO(nice); p->static_prio = NICE_TO_PRIO(nice);
@ -3971,8 +3981,8 @@ void set_user_nice(struct task_struct *p, long nice)
delta = p->prio - old_prio; delta = p->prio - old_prio;
if (on_rq) { if (on_rq) {
enqueue_task(rq, p, 0, now); enqueue_task(rq, p, 0);
inc_load(rq, p, now); inc_load(rq, p);
/* /*
* If the task increased its priority or is running and * If the task increased its priority or is running and
* lowered its priority, then reschedule its CPU: * lowered its priority, then reschedule its CPU:
@ -4208,6 +4218,7 @@ recheck:
spin_unlock_irqrestore(&p->pi_lock, flags); spin_unlock_irqrestore(&p->pi_lock, flags);
goto recheck; goto recheck;
} }
update_rq_clock(rq);
on_rq = p->se.on_rq; on_rq = p->se.on_rq;
if (on_rq) if (on_rq)
deactivate_task(rq, p, 0); deactivate_task(rq, p, 0);
@ -4463,10 +4474,8 @@ long sched_getaffinity(pid_t pid, cpumask_t *mask)
out_unlock: out_unlock:
read_unlock(&tasklist_lock); read_unlock(&tasklist_lock);
mutex_unlock(&sched_hotcpu_mutex); mutex_unlock(&sched_hotcpu_mutex);
if (retval)
return retval;
return 0; return retval;
} }
/** /**
@ -4966,6 +4975,7 @@ static int __migrate_task(struct task_struct *p, int src_cpu, int dest_cpu)
on_rq = p->se.on_rq; on_rq = p->se.on_rq;
if (on_rq) if (on_rq)
deactivate_task(rq_src, p, 0); deactivate_task(rq_src, p, 0);
set_task_cpu(p, dest_cpu); set_task_cpu(p, dest_cpu);
if (on_rq) { if (on_rq) {
activate_task(rq_dest, p, 0); activate_task(rq_dest, p, 0);
@ -5198,7 +5208,8 @@ static void migrate_dead_tasks(unsigned int dead_cpu)
for ( ; ; ) { for ( ; ; ) {
if (!rq->nr_running) if (!rq->nr_running)
break; break;
next = pick_next_task(rq, rq->curr, rq_clock(rq)); update_rq_clock(rq);
next = pick_next_task(rq, rq->curr);
if (!next) if (!next)
break; break;
migrate_dead(dead_cpu, next); migrate_dead(dead_cpu, next);
@ -5210,12 +5221,19 @@ static void migrate_dead_tasks(unsigned int dead_cpu)
#if defined(CONFIG_SCHED_DEBUG) && defined(CONFIG_SYSCTL) #if defined(CONFIG_SCHED_DEBUG) && defined(CONFIG_SYSCTL)
static struct ctl_table sd_ctl_dir[] = { static struct ctl_table sd_ctl_dir[] = {
{CTL_UNNUMBERED, "sched_domain", NULL, 0, 0755, NULL, }, {
.procname = "sched_domain",
.mode = 0755,
},
{0,}, {0,},
}; };
static struct ctl_table sd_ctl_root[] = { static struct ctl_table sd_ctl_root[] = {
{CTL_UNNUMBERED, "kernel", NULL, 0, 0755, sd_ctl_dir, }, {
.procname = "kernel",
.mode = 0755,
.child = sd_ctl_dir,
},
{0,}, {0,},
}; };
@ -5231,11 +5249,10 @@ static struct ctl_table *sd_alloc_ctl_entry(int n)
} }
static void static void
set_table_entry(struct ctl_table *entry, int ctl_name, set_table_entry(struct ctl_table *entry,
const char *procname, void *data, int maxlen, const char *procname, void *data, int maxlen,
mode_t mode, proc_handler *proc_handler) mode_t mode, proc_handler *proc_handler)
{ {
entry->ctl_name = ctl_name;
entry->procname = procname; entry->procname = procname;
entry->data = data; entry->data = data;
entry->maxlen = maxlen; entry->maxlen = maxlen;
@ -5248,28 +5265,28 @@ sd_alloc_ctl_domain_table(struct sched_domain *sd)
{ {
struct ctl_table *table = sd_alloc_ctl_entry(14); struct ctl_table *table = sd_alloc_ctl_entry(14);
set_table_entry(&table[0], 1, "min_interval", &sd->min_interval, set_table_entry(&table[0], "min_interval", &sd->min_interval,
sizeof(long), 0644, proc_doulongvec_minmax); sizeof(long), 0644, proc_doulongvec_minmax);
set_table_entry(&table[1], 2, "max_interval", &sd->max_interval, set_table_entry(&table[1], "max_interval", &sd->max_interval,
sizeof(long), 0644, proc_doulongvec_minmax); sizeof(long), 0644, proc_doulongvec_minmax);
set_table_entry(&table[2], 3, "busy_idx", &sd->busy_idx, set_table_entry(&table[2], "busy_idx", &sd->busy_idx,
sizeof(int), 0644, proc_dointvec_minmax); sizeof(int), 0644, proc_dointvec_minmax);
set_table_entry(&table[3], 4, "idle_idx", &sd->idle_idx, set_table_entry(&table[3], "idle_idx", &sd->idle_idx,
sizeof(int), 0644, proc_dointvec_minmax); sizeof(int), 0644, proc_dointvec_minmax);
set_table_entry(&table[4], 5, "newidle_idx", &sd->newidle_idx, set_table_entry(&table[4], "newidle_idx", &sd->newidle_idx,
sizeof(int), 0644, proc_dointvec_minmax); sizeof(int), 0644, proc_dointvec_minmax);
set_table_entry(&table[5], 6, "wake_idx", &sd->wake_idx, set_table_entry(&table[5], "wake_idx", &sd->wake_idx,
sizeof(int), 0644, proc_dointvec_minmax); sizeof(int), 0644, proc_dointvec_minmax);
set_table_entry(&table[6], 7, "forkexec_idx", &sd->forkexec_idx, set_table_entry(&table[6], "forkexec_idx", &sd->forkexec_idx,
sizeof(int), 0644, proc_dointvec_minmax); sizeof(int), 0644, proc_dointvec_minmax);
set_table_entry(&table[7], 8, "busy_factor", &sd->busy_factor, set_table_entry(&table[7], "busy_factor", &sd->busy_factor,
sizeof(int), 0644, proc_dointvec_minmax); sizeof(int), 0644, proc_dointvec_minmax);
set_table_entry(&table[8], 9, "imbalance_pct", &sd->imbalance_pct, set_table_entry(&table[8], "imbalance_pct", &sd->imbalance_pct,
sizeof(int), 0644, proc_dointvec_minmax); sizeof(int), 0644, proc_dointvec_minmax);
set_table_entry(&table[10], 11, "cache_nice_tries", set_table_entry(&table[10], "cache_nice_tries",
&sd->cache_nice_tries, &sd->cache_nice_tries,
sizeof(int), 0644, proc_dointvec_minmax); sizeof(int), 0644, proc_dointvec_minmax);
set_table_entry(&table[12], 13, "flags", &sd->flags, set_table_entry(&table[12], "flags", &sd->flags,
sizeof(int), 0644, proc_dointvec_minmax); sizeof(int), 0644, proc_dointvec_minmax);
return table; return table;
@ -5289,7 +5306,6 @@ static ctl_table *sd_alloc_ctl_cpu_table(int cpu)
i = 0; i = 0;
for_each_domain(cpu, sd) { for_each_domain(cpu, sd) {
snprintf(buf, 32, "domain%d", i); snprintf(buf, 32, "domain%d", i);
entry->ctl_name = i + 1;
entry->procname = kstrdup(buf, GFP_KERNEL); entry->procname = kstrdup(buf, GFP_KERNEL);
entry->mode = 0755; entry->mode = 0755;
entry->child = sd_alloc_ctl_domain_table(sd); entry->child = sd_alloc_ctl_domain_table(sd);
@ -5310,7 +5326,6 @@ static void init_sched_domain_sysctl(void)
for (i = 0; i < cpu_num; i++, entry++) { for (i = 0; i < cpu_num; i++, entry++) {
snprintf(buf, 32, "cpu%d", i); snprintf(buf, 32, "cpu%d", i);
entry->ctl_name = i + 1;
entry->procname = kstrdup(buf, GFP_KERNEL); entry->procname = kstrdup(buf, GFP_KERNEL);
entry->mode = 0755; entry->mode = 0755;
entry->child = sd_alloc_ctl_cpu_table(i); entry->child = sd_alloc_ctl_cpu_table(i);
@ -5379,6 +5394,7 @@ migration_call(struct notifier_block *nfb, unsigned long action, void *hcpu)
rq->migration_thread = NULL; rq->migration_thread = NULL;
/* Idle task back to normal (off runqueue, low prio) */ /* Idle task back to normal (off runqueue, low prio) */
rq = task_rq_lock(rq->idle, &flags); rq = task_rq_lock(rq->idle, &flags);
update_rq_clock(rq);
deactivate_task(rq, rq->idle, 0); deactivate_task(rq, rq->idle, 0);
rq->idle->static_prio = MAX_PRIO; rq->idle->static_prio = MAX_PRIO;
__setscheduler(rq, rq->idle, SCHED_NORMAL, 0); __setscheduler(rq, rq->idle, SCHED_NORMAL, 0);
@ -6616,12 +6632,13 @@ void normalize_rt_tasks(void)
goto out_unlock; goto out_unlock;
#endif #endif
update_rq_clock(rq);
on_rq = p->se.on_rq; on_rq = p->se.on_rq;
if (on_rq) if (on_rq)
deactivate_task(task_rq(p), p, 0); deactivate_task(rq, p, 0);
__setscheduler(rq, p, SCHED_NORMAL, 0); __setscheduler(rq, p, SCHED_NORMAL, 0);
if (on_rq) { if (on_rq) {
activate_task(task_rq(p), p, 0); activate_task(rq, p, 0);
resched_task(rq->curr); resched_task(rq->curr);
} }
#ifdef CONFIG_SMP #ifdef CONFIG_SMP
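
The hunks above replace the `u64 now` argument that used to be threaded through the scheduler with a timestamp cached in the runqueue: a caller refreshes `rq->clock` once via `update_rq_clock()` and the callees read the cached value. A minimal sketch of that pattern, assuming simplified stand-in types (`struct rq_sketch`, `fake_time_ns` and the `_sketch` functions are hypothetical names for illustration, not kernel code):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the time source the kernel would sample. */
static uint64_t fake_time_ns;

/* Hypothetical, heavily reduced runqueue: only the fields the sketch needs. */
struct rq_sketch {
	uint64_t clock;            /* cached timestamp, in nanoseconds */
	uint64_t exec_start;       /* when the current task last started accounting */
	uint64_t sum_exec_runtime; /* accumulated runtime of the current task */
};

/* Refresh the cached clock once, typically right after taking the rq lock. */
static void update_rq_clock_sketch(struct rq_sketch *rq)
{
	assert(rq != NULL);
	rq->clock = fake_time_ns;
}

/*
 * Account runtime using the cached clock. There is no 'now' parameter:
 * the caller is expected to have refreshed rq->clock beforehand.
 */
static void update_curr_sketch(struct rq_sketch *rq)
{
	uint64_t delta_exec = rq->clock - rq->exec_start;

	rq->sum_exec_runtime += delta_exec;
	rq->exec_start = rq->clock;
}
```

In this sketch, calling `update_curr_sketch()` twice without an intervening `update_rq_clock_sketch()` accounts no additional time. That is why several hunks above add an explicit `update_rq_clock()` call before dequeue/enqueue operations.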


@ -29,7 +29,7 @@
} while (0) } while (0)
static void static void
print_task(struct seq_file *m, struct rq *rq, struct task_struct *p, u64 now) print_task(struct seq_file *m, struct rq *rq, struct task_struct *p)
{ {
if (rq->curr == p) if (rq->curr == p)
SEQ_printf(m, "R"); SEQ_printf(m, "R");
@ -56,7 +56,7 @@ print_task(struct seq_file *m, struct rq *rq, struct task_struct *p, u64 now)
#endif #endif
} }
static void print_rq(struct seq_file *m, struct rq *rq, int rq_cpu, u64 now) static void print_rq(struct seq_file *m, struct rq *rq, int rq_cpu)
{ {
struct task_struct *g, *p; struct task_struct *g, *p;
@ -77,7 +77,7 @@ static void print_rq(struct seq_file *m, struct rq *rq, int rq_cpu, u64 now)
if (!p->se.on_rq || task_cpu(p) != rq_cpu) if (!p->se.on_rq || task_cpu(p) != rq_cpu)
continue; continue;
print_task(m, rq, p, now); print_task(m, rq, p);
} while_each_thread(g, p); } while_each_thread(g, p);
read_unlock_irq(&tasklist_lock); read_unlock_irq(&tasklist_lock);
@ -106,7 +106,7 @@ print_cfs_rq_runtime_sum(struct seq_file *m, int cpu, struct cfs_rq *cfs_rq)
(long long)wait_runtime_rq_sum); (long long)wait_runtime_rq_sum);
} }
void print_cfs_rq(struct seq_file *m, int cpu, struct cfs_rq *cfs_rq, u64 now) void print_cfs_rq(struct seq_file *m, int cpu, struct cfs_rq *cfs_rq)
{ {
SEQ_printf(m, "\ncfs_rq %p\n", cfs_rq); SEQ_printf(m, "\ncfs_rq %p\n", cfs_rq);
@ -124,7 +124,7 @@ void print_cfs_rq(struct seq_file *m, int cpu, struct cfs_rq *cfs_rq, u64 now)
print_cfs_rq_runtime_sum(m, cpu, cfs_rq); print_cfs_rq_runtime_sum(m, cpu, cfs_rq);
} }
static void print_cpu(struct seq_file *m, int cpu, u64 now) static void print_cpu(struct seq_file *m, int cpu)
{ {
struct rq *rq = &per_cpu(runqueues, cpu); struct rq *rq = &per_cpu(runqueues, cpu);
@ -166,9 +166,9 @@ static void print_cpu(struct seq_file *m, int cpu, u64 now)
P(cpu_load[4]); P(cpu_load[4]);
#undef P #undef P
print_cfs_stats(m, cpu, now); print_cfs_stats(m, cpu);
print_rq(m, rq, cpu, now); print_rq(m, rq, cpu);
} }
static int sched_debug_show(struct seq_file *m, void *v) static int sched_debug_show(struct seq_file *m, void *v)
@ -184,7 +184,7 @@ static int sched_debug_show(struct seq_file *m, void *v)
SEQ_printf(m, "now at %Lu nsecs\n", (unsigned long long)now); SEQ_printf(m, "now at %Lu nsecs\n", (unsigned long long)now);
for_each_online_cpu(cpu) for_each_online_cpu(cpu)
print_cpu(m, cpu, now); print_cpu(m, cpu);
SEQ_printf(m, "\n"); SEQ_printf(m, "\n");


@ -222,21 +222,25 @@ niced_granularity(struct sched_entity *curr, unsigned long granularity)
{ {
u64 tmp; u64 tmp;
/* if (likely(curr->load.weight == NICE_0_LOAD))
* Negative nice levels get the same granularity as nice-0:
*/
if (likely(curr->load.weight >= NICE_0_LOAD))
return granularity; return granularity;
/* /*
* Positive nice level tasks get linearly finer * Positive nice levels get the same granularity as nice-0:
*/
if (likely(curr->load.weight < NICE_0_LOAD)) {
tmp = curr->load.weight * (u64)granularity;
return (long) (tmp >> NICE_0_SHIFT);
}
/*
* Negative nice level tasks get linearly finer
* granularity: * granularity:
*/ */
tmp = curr->load.weight * (u64)granularity; tmp = curr->load.inv_weight * (u64)granularity;
/* /*
* It will always fit into 'long': * It will always fit into 'long':
*/ */
return (long) (tmp >> NICE_0_SHIFT); return (long) (tmp >> WMULT_SHIFT);
} }
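
The new side of `niced_granularity()` above has three branches: nice-0 weight, lighter than nice-0, and heavier than nice-0. A standalone sketch of the same arithmetic follows. It assumes `NICE_0_LOAD` is 1024, `NICE_0_SHIFT` is 10, `WMULT_SHIFT` is 32, and that `inv_weight` holds roughly 2^32 / weight. Those values and the `_sketch` names are assumptions for illustration and are not taken from this diff:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed constants; the diff uses the names but does not show the values. */
#define NICE_0_LOAD_SKETCH  1024UL
#define NICE_0_SHIFT_SKETCH 10
#define WMULT_SHIFT_SKETCH  32

/* Hypothetical reduced form of the load weight pair used in the hunk. */
struct load_weight_sketch {
	unsigned long weight;
	unsigned long inv_weight; /* assumed to be about 2^32 / weight */
};

/* Mirrors the three branches on the new side of the hunk. */
static long niced_granularity_sketch(const struct load_weight_sketch *lw,
				     unsigned long granularity)
{
	uint64_t tmp;

	assert(lw != NULL);

	/* Nice-0 weight: granularity is returned unchanged. */
	if (lw->weight == NICE_0_LOAD_SKETCH)
		return (long)granularity;

	/* Lighter than nice-0: scale by weight / NICE_0_LOAD. */
	if (lw->weight < NICE_0_LOAD_SKETCH) {
		tmp = (uint64_t)lw->weight * granularity;
		return (long)(tmp >> NICE_0_SHIFT_SKETCH);
	}

	/* Heavier than nice-0: multiply by inv_weight, shift by WMULT_SHIFT. */
	tmp = (uint64_t)lw->inv_weight * granularity;
	return (long)(tmp >> WMULT_SHIFT_SKETCH);
}
```

Under these assumptions, a weight of 512 halves a 2,000,000 ns granularity to 1,000,000 ns. A weight of 2048 with an `inv_weight` of 2^21 gives 976 ns, because the last branch as written works out to about granularity / weight.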
static inline void static inline void
@ -281,26 +285,25 @@ add_wait_runtime(struct cfs_rq *cfs_rq, struct sched_entity *se, long delta)
* are not in our scheduling class. * are not in our scheduling class.
*/ */
static inline void static inline void
__update_curr(struct cfs_rq *cfs_rq, struct sched_entity *curr, u64 now) __update_curr(struct cfs_rq *cfs_rq, struct sched_entity *curr)
{ {
unsigned long delta, delta_exec, delta_fair; unsigned long delta, delta_exec, delta_fair, delta_mine;
long delta_mine;
struct load_weight *lw = &cfs_rq->load; struct load_weight *lw = &cfs_rq->load;
unsigned long load = lw->weight; unsigned long load = lw->weight;
if (unlikely(!load))
return;
delta_exec = curr->delta_exec; delta_exec = curr->delta_exec;
schedstat_set(curr->exec_max, max((u64)delta_exec, curr->exec_max)); schedstat_set(curr->exec_max, max((u64)delta_exec, curr->exec_max));
curr->sum_exec_runtime += delta_exec; curr->sum_exec_runtime += delta_exec;
cfs_rq->exec_clock += delta_exec; cfs_rq->exec_clock += delta_exec;
if (unlikely(!load))
return;
delta_fair = calc_delta_fair(delta_exec, lw); delta_fair = calc_delta_fair(delta_exec, lw);
delta_mine = calc_delta_mine(delta_exec, curr->load.weight, lw); delta_mine = calc_delta_mine(delta_exec, curr->load.weight, lw);
if (cfs_rq->sleeper_bonus > sysctl_sched_stat_granularity) { if (cfs_rq->sleeper_bonus > sysctl_sched_granularity) {
delta = calc_delta_mine(cfs_rq->sleeper_bonus, delta = calc_delta_mine(cfs_rq->sleeper_bonus,
curr->load.weight, lw); curr->load.weight, lw);
if (unlikely(delta > cfs_rq->sleeper_bonus)) if (unlikely(delta > cfs_rq->sleeper_bonus))
@ -321,7 +324,7 @@ __update_curr(struct cfs_rq *cfs_rq, struct sched_entity *curr, u64 now)
add_wait_runtime(cfs_rq, curr, delta_mine - delta_exec); add_wait_runtime(cfs_rq, curr, delta_mine - delta_exec);
} }
static void update_curr(struct cfs_rq *cfs_rq, u64 now) static void update_curr(struct cfs_rq *cfs_rq)
{ {
struct sched_entity *curr = cfs_rq_curr(cfs_rq); struct sched_entity *curr = cfs_rq_curr(cfs_rq);
unsigned long delta_exec; unsigned long delta_exec;
@ -334,22 +337,22 @@ static void update_curr(struct cfs_rq *cfs_rq, u64 now)
* since the last time we changed load (this cannot * since the last time we changed load (this cannot
* overflow on 32 bits): * overflow on 32 bits):
*/ */
delta_exec = (unsigned long)(now - curr->exec_start); delta_exec = (unsigned long)(rq_of(cfs_rq)->clock - curr->exec_start);
curr->delta_exec += delta_exec; curr->delta_exec += delta_exec;
if (unlikely(curr->delta_exec > sysctl_sched_stat_granularity)) { if (unlikely(curr->delta_exec > sysctl_sched_stat_granularity)) {
__update_curr(cfs_rq, curr, now); __update_curr(cfs_rq, curr);
curr->delta_exec = 0; curr->delta_exec = 0;
} }
curr->exec_start = now; curr->exec_start = rq_of(cfs_rq)->clock;
} }
static inline void static inline void
update_stats_wait_start(struct cfs_rq *cfs_rq, struct sched_entity *se, u64 now) update_stats_wait_start(struct cfs_rq *cfs_rq, struct sched_entity *se)
{ {
se->wait_start_fair = cfs_rq->fair_clock; se->wait_start_fair = cfs_rq->fair_clock;
schedstat_set(se->wait_start, now); schedstat_set(se->wait_start, rq_of(cfs_rq)->clock);
} }
/* /*
@ -377,8 +380,7 @@ calc_weighted(unsigned long delta, unsigned long weight, int shift)
/* /*
* Task is being enqueued - update stats: * Task is being enqueued - update stats:
*/ */
static void static void update_stats_enqueue(struct cfs_rq *cfs_rq, struct sched_entity *se)
update_stats_enqueue(struct cfs_rq *cfs_rq, struct sched_entity *se, u64 now)
{ {
s64 key; s64 key;
@ -387,7 +389,7 @@ update_stats_enqueue(struct cfs_rq *cfs_rq, struct sched_entity *se, u64 now)
* a dequeue/enqueue event is a NOP) * a dequeue/enqueue event is a NOP)
*/ */
if (se != cfs_rq_curr(cfs_rq)) if (se != cfs_rq_curr(cfs_rq))
update_stats_wait_start(cfs_rq, se, now); update_stats_wait_start(cfs_rq, se);
/* /*
* Update the key: * Update the key:
*/ */
@ -407,7 +409,8 @@ update_stats_enqueue(struct cfs_rq *cfs_rq, struct sched_entity *se, u64 now)
(WMULT_SHIFT - NICE_0_SHIFT); (WMULT_SHIFT - NICE_0_SHIFT);
} else { } else {
tmp = se->wait_runtime; tmp = se->wait_runtime;
key -= (tmp * se->load.weight) >> NICE_0_SHIFT; key -= (tmp * se->load.inv_weight) >>
(WMULT_SHIFT - NICE_0_SHIFT);
} }
} }
@ -418,11 +421,12 @@ update_stats_enqueue(struct cfs_rq *cfs_rq, struct sched_entity *se, u64 now)
* Note: must be called with a freshly updated rq->fair_clock. * Note: must be called with a freshly updated rq->fair_clock.
*/ */
static inline void static inline void
__update_stats_wait_end(struct cfs_rq *cfs_rq, struct sched_entity *se, u64 now) __update_stats_wait_end(struct cfs_rq *cfs_rq, struct sched_entity *se)
{ {
unsigned long delta_fair = se->delta_fair_run; unsigned long delta_fair = se->delta_fair_run;
schedstat_set(se->wait_max, max(se->wait_max, now - se->wait_start)); schedstat_set(se->wait_max, max(se->wait_max,
rq_of(cfs_rq)->clock - se->wait_start));
if (unlikely(se->load.weight != NICE_0_LOAD)) if (unlikely(se->load.weight != NICE_0_LOAD))
delta_fair = calc_weighted(delta_fair, se->load.weight, delta_fair = calc_weighted(delta_fair, se->load.weight,
@ -432,7 +436,7 @@ __update_stats_wait_end(struct cfs_rq *cfs_rq, struct sched_entity *se, u64 now)
} }
static void static void
update_stats_wait_end(struct cfs_rq *cfs_rq, struct sched_entity *se, u64 now) update_stats_wait_end(struct cfs_rq *cfs_rq, struct sched_entity *se)
{ {
unsigned long delta_fair; unsigned long delta_fair;
@ -442,7 +446,7 @@ update_stats_wait_end(struct cfs_rq *cfs_rq, struct sched_entity *se, u64 now)
se->delta_fair_run += delta_fair; se->delta_fair_run += delta_fair;
if (unlikely(abs(se->delta_fair_run) >= if (unlikely(abs(se->delta_fair_run) >=
sysctl_sched_stat_granularity)) { sysctl_sched_stat_granularity)) {
__update_stats_wait_end(cfs_rq, se, now); __update_stats_wait_end(cfs_rq, se);
se->delta_fair_run = 0; se->delta_fair_run = 0;
} }
@ -451,34 +455,34 @@ update_stats_wait_end(struct cfs_rq *cfs_rq, struct sched_entity *se, u64 now)
} }
static inline void static inline void
update_stats_dequeue(struct cfs_rq *cfs_rq, struct sched_entity *se, u64 now) update_stats_dequeue(struct cfs_rq *cfs_rq, struct sched_entity *se)
{ {
update_curr(cfs_rq, now); update_curr(cfs_rq);
/* /*
* Mark the end of the wait period if dequeueing a * Mark the end of the wait period if dequeueing a
* waiting task: * waiting task:
*/ */
if (se != cfs_rq_curr(cfs_rq)) if (se != cfs_rq_curr(cfs_rq))
update_stats_wait_end(cfs_rq, se, now); update_stats_wait_end(cfs_rq, se);
} }
/* /*
* We are picking a new current task - update its stats: * We are picking a new current task - update its stats:
*/ */
static inline void static inline void
update_stats_curr_start(struct cfs_rq *cfs_rq, struct sched_entity *se, u64 now) update_stats_curr_start(struct cfs_rq *cfs_rq, struct sched_entity *se)
{ {
/* /*
* We are starting a new run period: * We are starting a new run period:
*/ */
se->exec_start = now; se->exec_start = rq_of(cfs_rq)->clock;
} }
/* /*
* We are descheduling a task - update its stats: * We are descheduling a task - update its stats:
*/ */
static inline void static inline void
update_stats_curr_end(struct cfs_rq *cfs_rq, struct sched_entity *se, u64 now) update_stats_curr_end(struct cfs_rq *cfs_rq, struct sched_entity *se)
{ {
se->exec_start = 0; se->exec_start = 0;
} }
@ -487,8 +491,7 @@ update_stats_curr_end(struct cfs_rq *cfs_rq, struct sched_entity *se, u64 now)
* Scheduling class queueing methods: * Scheduling class queueing methods:
*/ */
static void static void __enqueue_sleeper(struct cfs_rq *cfs_rq, struct sched_entity *se)
__enqueue_sleeper(struct cfs_rq *cfs_rq, struct sched_entity *se, u64 now)
{ {
unsigned long load = cfs_rq->load.weight, delta_fair; unsigned long load = cfs_rq->load.weight, delta_fair;
long prev_runtime; long prev_runtime;
@ -522,8 +525,7 @@ __enqueue_sleeper(struct cfs_rq *cfs_rq, struct sched_entity *se, u64 now)
schedstat_add(cfs_rq, wait_runtime, se->wait_runtime); schedstat_add(cfs_rq, wait_runtime, se->wait_runtime);
} }
static void static void enqueue_sleeper(struct cfs_rq *cfs_rq, struct sched_entity *se)
enqueue_sleeper(struct cfs_rq *cfs_rq, struct sched_entity *se, u64 now)
{ {
struct task_struct *tsk = task_of(se); struct task_struct *tsk = task_of(se);
unsigned long delta_fair; unsigned long delta_fair;
@ -538,7 +540,7 @@ enqueue_sleeper(struct cfs_rq *cfs_rq, struct sched_entity *se, u64 now)
se->delta_fair_sleep += delta_fair; se->delta_fair_sleep += delta_fair;
if (unlikely(abs(se->delta_fair_sleep) >= if (unlikely(abs(se->delta_fair_sleep) >=
sysctl_sched_stat_granularity)) { sysctl_sched_stat_granularity)) {
__enqueue_sleeper(cfs_rq, se, now); __enqueue_sleeper(cfs_rq, se);
se->delta_fair_sleep = 0; se->delta_fair_sleep = 0;
} }
@ -546,7 +548,7 @@ enqueue_sleeper(struct cfs_rq *cfs_rq, struct sched_entity *se, u64 now)
#ifdef CONFIG_SCHEDSTATS #ifdef CONFIG_SCHEDSTATS
if (se->sleep_start) { if (se->sleep_start) {
u64 delta = now - se->sleep_start; u64 delta = rq_of(cfs_rq)->clock - se->sleep_start;
if ((s64)delta < 0) if ((s64)delta < 0)
delta = 0; delta = 0;
@ -558,7 +560,7 @@ enqueue_sleeper(struct cfs_rq *cfs_rq, struct sched_entity *se, u64 now)
se->sum_sleep_runtime += delta; se->sum_sleep_runtime += delta;
} }
if (se->block_start) { if (se->block_start) {
u64 delta = now - se->block_start; u64 delta = rq_of(cfs_rq)->clock - se->block_start;
if ((s64)delta < 0) if ((s64)delta < 0)
delta = 0; delta = 0;
@ -573,26 +575,24 @@ enqueue_sleeper(struct cfs_rq *cfs_rq, struct sched_entity *se, u64 now)
} }
static void static void
enqueue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, enqueue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int wakeup)
int wakeup, u64 now)
{ {
/* /*
* Update the fair clock. * Update the fair clock.
*/ */
update_curr(cfs_rq, now); update_curr(cfs_rq);
if (wakeup) if (wakeup)
enqueue_sleeper(cfs_rq, se, now); enqueue_sleeper(cfs_rq, se);
update_stats_enqueue(cfs_rq, se, now); update_stats_enqueue(cfs_rq, se);
__enqueue_entity(cfs_rq, se); __enqueue_entity(cfs_rq, se);
} }
static void static void
dequeue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, dequeue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int sleep)
int sleep, u64 now)
{ {
update_stats_dequeue(cfs_rq, se, now); update_stats_dequeue(cfs_rq, se);
if (sleep) { if (sleep) {
se->sleep_start_fair = cfs_rq->fair_clock; se->sleep_start_fair = cfs_rq->fair_clock;
#ifdef CONFIG_SCHEDSTATS #ifdef CONFIG_SCHEDSTATS
@ -600,9 +600,9 @@ dequeue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se,
struct task_struct *tsk = task_of(se); struct task_struct *tsk = task_of(se);
if (tsk->state & TASK_INTERRUPTIBLE) if (tsk->state & TASK_INTERRUPTIBLE)
se->sleep_start = now; se->sleep_start = rq_of(cfs_rq)->clock;
if (tsk->state & TASK_UNINTERRUPTIBLE) if (tsk->state & TASK_UNINTERRUPTIBLE)
se->block_start = now; se->block_start = rq_of(cfs_rq)->clock;
} }
cfs_rq->wait_runtime -= se->wait_runtime; cfs_rq->wait_runtime -= se->wait_runtime;
#endif #endif
@ -629,7 +629,7 @@ __check_preempt_curr_fair(struct cfs_rq *cfs_rq, struct sched_entity *se,
} }
static inline void static inline void
set_next_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, u64 now) set_next_entity(struct cfs_rq *cfs_rq, struct sched_entity *se)
{ {
/* /*
* Any task has to be enqueued before it get to execute on * Any task has to be enqueued before it get to execute on
@ -638,49 +638,46 @@ set_next_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, u64 now)
* done a put_prev_task_fair() shortly before this, which * done a put_prev_task_fair() shortly before this, which
* updated rq->fair_clock - used by update_stats_wait_end()) * updated rq->fair_clock - used by update_stats_wait_end())
*/ */
update_stats_wait_end(cfs_rq, se, now); update_stats_wait_end(cfs_rq, se);
update_stats_curr_start(cfs_rq, se, now); update_stats_curr_start(cfs_rq, se);
set_cfs_rq_curr(cfs_rq, se); set_cfs_rq_curr(cfs_rq, se);
} }
static struct sched_entity *pick_next_entity(struct cfs_rq *cfs_rq, u64 now) static struct sched_entity *pick_next_entity(struct cfs_rq *cfs_rq)
{ {
struct sched_entity *se = __pick_next_entity(cfs_rq); struct sched_entity *se = __pick_next_entity(cfs_rq);
set_next_entity(cfs_rq, se, now); set_next_entity(cfs_rq, se);
return se; return se;
} }
static void static void put_prev_entity(struct cfs_rq *cfs_rq, struct sched_entity *prev)
put_prev_entity(struct cfs_rq *cfs_rq, struct sched_entity *prev, u64 now)
{ {
/* /*
* If still on the runqueue then deactivate_task() * If still on the runqueue then deactivate_task()
* was not called and update_curr() has to be done: * was not called and update_curr() has to be done:
*/ */
if (prev->on_rq) if (prev->on_rq)
update_curr(cfs_rq, now); update_curr(cfs_rq);
update_stats_curr_end(cfs_rq, prev, now); update_stats_curr_end(cfs_rq, prev);
if (prev->on_rq) if (prev->on_rq)
update_stats_wait_start(cfs_rq, prev, now); update_stats_wait_start(cfs_rq, prev);
set_cfs_rq_curr(cfs_rq, NULL); set_cfs_rq_curr(cfs_rq, NULL);
} }
static void entity_tick(struct cfs_rq *cfs_rq, struct sched_entity *curr) static void entity_tick(struct cfs_rq *cfs_rq, struct sched_entity *curr)
{ {
struct rq *rq = rq_of(cfs_rq);
struct sched_entity *next; struct sched_entity *next;
u64 now = __rq_clock(rq);
/* /*
* Dequeue and enqueue the task to update its * Dequeue and enqueue the task to update its
* position within the tree: * position within the tree:
*/ */
dequeue_entity(cfs_rq, curr, 0, now); dequeue_entity(cfs_rq, curr, 0);
enqueue_entity(cfs_rq, curr, 0, now); enqueue_entity(cfs_rq, curr, 0);
/* /*
* Reschedule if another task tops the current one. * Reschedule if another task tops the current one.
@ -785,8 +782,7 @@ static inline int is_same_group(struct task_struct *curr, struct task_struct *p)
* increased. Here we update the fair scheduling stats and * increased. Here we update the fair scheduling stats and
* then put the task into the rbtree: * then put the task into the rbtree:
*/ */
static void static void enqueue_task_fair(struct rq *rq, struct task_struct *p, int wakeup)
enqueue_task_fair(struct rq *rq, struct task_struct *p, int wakeup, u64 now)
{ {
struct cfs_rq *cfs_rq; struct cfs_rq *cfs_rq;
struct sched_entity *se = &p->se; struct sched_entity *se = &p->se;
@ -795,7 +791,7 @@ enqueue_task_fair(struct rq *rq, struct task_struct *p, int wakeup, u64 now)
if (se->on_rq) if (se->on_rq)
break; break;
cfs_rq = cfs_rq_of(se); cfs_rq = cfs_rq_of(se);
enqueue_entity(cfs_rq, se, wakeup, now); enqueue_entity(cfs_rq, se, wakeup);
} }
} }
@ -804,15 +800,14 @@ enqueue_task_fair(struct rq *rq, struct task_struct *p, int wakeup, u64 now)
* decreased. We remove the task from the rbtree and * decreased. We remove the task from the rbtree and
* update the fair scheduling stats: * update the fair scheduling stats:
*/ */
static void static void dequeue_task_fair(struct rq *rq, struct task_struct *p, int sleep)
dequeue_task_fair(struct rq *rq, struct task_struct *p, int sleep, u64 now)
{ {
struct cfs_rq *cfs_rq; struct cfs_rq *cfs_rq;
struct sched_entity *se = &p->se; struct sched_entity *se = &p->se;
for_each_sched_entity(se) { for_each_sched_entity(se) {
cfs_rq = cfs_rq_of(se); cfs_rq = cfs_rq_of(se);
dequeue_entity(cfs_rq, se, sleep, now); dequeue_entity(cfs_rq, se, sleep);
/* Don't dequeue parent if it has other entities besides us */ /* Don't dequeue parent if it has other entities besides us */
if (cfs_rq->load.weight) if (cfs_rq->load.weight)
break; break;
@ -825,14 +820,14 @@ dequeue_task_fair(struct rq *rq, struct task_struct *p, int sleep, u64 now)
static void yield_task_fair(struct rq *rq, struct task_struct *p) static void yield_task_fair(struct rq *rq, struct task_struct *p)
{ {
struct cfs_rq *cfs_rq = task_cfs_rq(p); struct cfs_rq *cfs_rq = task_cfs_rq(p);
u64 now = __rq_clock(rq);
__update_rq_clock(rq);
/* /*
* Dequeue and enqueue the task to update its * Dequeue and enqueue the task to update its
* position within the tree: * position within the tree:
*/ */
dequeue_entity(cfs_rq, &p->se, 0, now); dequeue_entity(cfs_rq, &p->se, 0);
enqueue_entity(cfs_rq, &p->se, 0, now); enqueue_entity(cfs_rq, &p->se, 0);
} }
/* /*
@ -845,7 +840,8 @@ static void check_preempt_curr_fair(struct rq *rq, struct task_struct *p)
unsigned long gran; unsigned long gran;
if (unlikely(rt_prio(p->prio))) { if (unlikely(rt_prio(p->prio))) {
update_curr(cfs_rq, rq_clock(rq)); update_rq_clock(rq);
update_curr(cfs_rq);
resched_task(curr); resched_task(curr);
return; return;
} }
@ -861,7 +857,7 @@ static void check_preempt_curr_fair(struct rq *rq, struct task_struct *p)
__check_preempt_curr_fair(cfs_rq, &p->se, &curr->se, gran); __check_preempt_curr_fair(cfs_rq, &p->se, &curr->se, gran);
} }
static struct task_struct *pick_next_task_fair(struct rq *rq, u64 now) static struct task_struct *pick_next_task_fair(struct rq *rq)
{ {
struct cfs_rq *cfs_rq = &rq->cfs; struct cfs_rq *cfs_rq = &rq->cfs;
struct sched_entity *se; struct sched_entity *se;
@ -870,7 +866,7 @@ static struct task_struct *pick_next_task_fair(struct rq *rq, u64 now)
return NULL; return NULL;
do { do {
se = pick_next_entity(cfs_rq, now); se = pick_next_entity(cfs_rq);
cfs_rq = group_cfs_rq(se); cfs_rq = group_cfs_rq(se);
} while (cfs_rq); } while (cfs_rq);
@ -880,14 +876,14 @@ static struct task_struct *pick_next_task_fair(struct rq *rq, u64 now)
/* /*
* Account for a descheduled task: * Account for a descheduled task:
*/ */
static void put_prev_task_fair(struct rq *rq, struct task_struct *prev, u64 now) static void put_prev_task_fair(struct rq *rq, struct task_struct *prev)
{ {
struct sched_entity *se = &prev->se; struct sched_entity *se = &prev->se;
struct cfs_rq *cfs_rq; struct cfs_rq *cfs_rq;
for_each_sched_entity(se) { for_each_sched_entity(se) {
cfs_rq = cfs_rq_of(se); cfs_rq = cfs_rq_of(se);
put_prev_entity(cfs_rq, se, now); put_prev_entity(cfs_rq, se);
} }
} }
@ -930,6 +926,7 @@ static struct task_struct *load_balance_next_fair(void *arg)
return __load_balance_iterator(cfs_rq, cfs_rq->rb_load_balance_curr); return __load_balance_iterator(cfs_rq, cfs_rq->rb_load_balance_curr);
} }
#ifdef CONFIG_FAIR_GROUP_SCHED
static int cfs_rq_best_prio(struct cfs_rq *cfs_rq) static int cfs_rq_best_prio(struct cfs_rq *cfs_rq)
{ {
struct sched_entity *curr; struct sched_entity *curr;
@ -943,12 +940,13 @@ static int cfs_rq_best_prio(struct cfs_rq *cfs_rq)
return p->prio; return p->prio;
} }
#endif
static int static unsigned long
load_balance_fair(struct rq *this_rq, int this_cpu, struct rq *busiest, load_balance_fair(struct rq *this_rq, int this_cpu, struct rq *busiest,
unsigned long max_nr_move, unsigned long max_load_move, unsigned long max_nr_move, unsigned long max_load_move,
struct sched_domain *sd, enum cpu_idle_type idle, struct sched_domain *sd, enum cpu_idle_type idle,
int *all_pinned, unsigned long *total_load_moved) int *all_pinned, int *this_best_prio)
{ {
struct cfs_rq *busy_cfs_rq; struct cfs_rq *busy_cfs_rq;
unsigned long load_moved, total_nr_moved = 0, nr_moved; unsigned long load_moved, total_nr_moved = 0, nr_moved;
@ -959,10 +957,10 @@ load_balance_fair(struct rq *this_rq, int this_cpu, struct rq *busiest,
cfs_rq_iterator.next = load_balance_next_fair; cfs_rq_iterator.next = load_balance_next_fair;
for_each_leaf_cfs_rq(busiest, busy_cfs_rq) { for_each_leaf_cfs_rq(busiest, busy_cfs_rq) {
#ifdef CONFIG_FAIR_GROUP_SCHED
struct cfs_rq *this_cfs_rq; struct cfs_rq *this_cfs_rq;
long imbalance; long imbalance;
unsigned long maxload; unsigned long maxload;
int this_best_prio, best_prio, best_prio_seen = 0;
this_cfs_rq = cpu_cfs_rq(busy_cfs_rq, this_cpu); this_cfs_rq = cpu_cfs_rq(busy_cfs_rq, this_cpu);
@ -976,27 +974,17 @@ load_balance_fair(struct rq *this_rq, int this_cpu, struct rq *busiest,
imbalance /= 2; imbalance /= 2;
maxload = min(rem_load_move, imbalance); maxload = min(rem_load_move, imbalance);
this_best_prio = cfs_rq_best_prio(this_cfs_rq); *this_best_prio = cfs_rq_best_prio(this_cfs_rq);
best_prio = cfs_rq_best_prio(busy_cfs_rq); #else
#define maxload rem_load_move
/* #endif
* Enable handling of the case where there is more than one task
* with the best priority. If the current running task is one
* of those with prio==best_prio we know it won't be moved
* and therefore it's safe to override the skip (based on load)
* of any task we find with that prio.
*/
if (cfs_rq_curr(busy_cfs_rq) == &busiest->curr->se)
best_prio_seen = 1;
/* pass busy_cfs_rq argument into /* pass busy_cfs_rq argument into
* load_balance_[start|next]_fair iterators * load_balance_[start|next]_fair iterators
*/ */
cfs_rq_iterator.arg = busy_cfs_rq; cfs_rq_iterator.arg = busy_cfs_rq;
nr_moved = balance_tasks(this_rq, this_cpu, busiest, nr_moved = balance_tasks(this_rq, this_cpu, busiest,
max_nr_move, maxload, sd, idle, all_pinned, max_nr_move, maxload, sd, idle, all_pinned,
&load_moved, this_best_prio, best_prio, &load_moved, this_best_prio, &cfs_rq_iterator);
best_prio_seen, &cfs_rq_iterator);
total_nr_moved += nr_moved; total_nr_moved += nr_moved;
max_nr_move -= nr_moved; max_nr_move -= nr_moved;
@ -1006,9 +994,7 @@ load_balance_fair(struct rq *this_rq, int this_cpu, struct rq *busiest,
break; break;
} }
*total_load_moved = max_load_move - rem_load_move; return max_load_move - rem_load_move;
return total_nr_moved;
} }
/* /*
@ -1032,14 +1018,14 @@ static void task_tick_fair(struct rq *rq, struct task_struct *curr)
* monopolize the CPU. Note: the parent runqueue is locked, * monopolize the CPU. Note: the parent runqueue is locked,
* the child is not running yet. * the child is not running yet.
*/ */
static void task_new_fair(struct rq *rq, struct task_struct *p, u64 now) static void task_new_fair(struct rq *rq, struct task_struct *p)
{ {
struct cfs_rq *cfs_rq = task_cfs_rq(p); struct cfs_rq *cfs_rq = task_cfs_rq(p);
struct sched_entity *se = &p->se; struct sched_entity *se = &p->se;
sched_info_queued(p); sched_info_queued(p);
update_stats_enqueue(cfs_rq, se, now); update_stats_enqueue(cfs_rq, se);
/* /*
* Child runs first: we let it run before the parent * Child runs first: we let it run before the parent
* until it reschedules once. We set up the key so that * until it reschedules once. We set up the key so that
@ -1072,15 +1058,10 @@ static void task_new_fair(struct rq *rq, struct task_struct *p, u64 now)
*/ */
static void set_curr_task_fair(struct rq *rq) static void set_curr_task_fair(struct rq *rq)
{ {
struct task_struct *curr = rq->curr; struct sched_entity *se = &rq->curr->se;
struct sched_entity *se = &curr->se;
u64 now = rq_clock(rq);
struct cfs_rq *cfs_rq;
for_each_sched_entity(se) { for_each_sched_entity(se)
cfs_rq = cfs_rq_of(se); set_next_entity(cfs_rq_of(se), se);
set_next_entity(cfs_rq, se, now);
}
} }
#else #else
static void set_curr_task_fair(struct rq *rq) static void set_curr_task_fair(struct rq *rq)
@ -1109,12 +1090,11 @@ struct sched_class fair_sched_class __read_mostly = {
}; };
#ifdef CONFIG_SCHED_DEBUG #ifdef CONFIG_SCHED_DEBUG
void print_cfs_stats(struct seq_file *m, int cpu, u64 now) static void print_cfs_stats(struct seq_file *m, int cpu)
{ {
struct rq *rq = cpu_rq(cpu);
struct cfs_rq *cfs_rq; struct cfs_rq *cfs_rq;
for_each_leaf_cfs_rq(rq, cfs_rq) for_each_leaf_cfs_rq(cpu_rq(cpu), cfs_rq)
print_cfs_rq(m, cpu, cfs_rq, now); print_cfs_rq(m, cpu, cfs_rq);
} }
#endif #endif
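Note on the pattern in the hunks above: callers no longer pass a `u64 now` timestamp down the call chain. The runqueue clock is refreshed once and the accounting helpers read it from the runqueue. The following is only a minimal userspace sketch of that idea, not kernel code; `struct model_rq`, `model_update_rq_clock()` and `model_update_curr()` are hypothetical names invented for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for a runqueue (not the kernel's struct rq). */
struct model_rq {
	uint64_t clock;            /* cached timestamp, refreshed explicitly */
	uint64_t exec_start;       /* when the current task was last accounted */
	uint64_t sum_exec_runtime; /* total runtime accounted so far */
};

/* Refresh the cached clock once; hw_now stands in for a real time source. */
static void model_update_rq_clock(struct model_rq *rq, uint64_t hw_now)
{
	assert(rq);
	/* Keep the cached clock monotonic even if the source steps backwards. */
	if (hw_now > rq->clock)
		rq->clock = hw_now;
}

/* Accounting helper: reads rq->clock instead of taking a 'now' argument. */
static void model_update_curr(struct model_rq *rq)
{
	int64_t delta = (int64_t)(rq->clock - rq->exec_start);

	/* Clamp negative deltas to zero, as update_curr_rt() does. */
	if (delta < 0)
		delta = 0;

	rq->sum_exec_runtime += (uint64_t)delta;
	rq->exec_start = rq->clock;
}
```

In this sketch, calling model_update_curr() twice without refreshing the clock accounts the elapsed time only once, since the second call sees a zero delta.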


@ -13,7 +13,7 @@ static void check_preempt_curr_idle(struct rq *rq, struct task_struct *p)
resched_task(rq->idle); resched_task(rq->idle);
} }
static struct task_struct *pick_next_task_idle(struct rq *rq, u64 now) static struct task_struct *pick_next_task_idle(struct rq *rq)
{ {
schedstat_inc(rq, sched_goidle); schedstat_inc(rq, sched_goidle);
@ -25,7 +25,7 @@ static struct task_struct *pick_next_task_idle(struct rq *rq, u64 now)
* message if some code attempts to do it: * message if some code attempts to do it:
*/ */
static void static void
dequeue_task_idle(struct rq *rq, struct task_struct *p, int sleep, u64 now) dequeue_task_idle(struct rq *rq, struct task_struct *p, int sleep)
{ {
spin_unlock_irq(&rq->lock); spin_unlock_irq(&rq->lock);
printk(KERN_ERR "bad: scheduling from the idle thread!\n"); printk(KERN_ERR "bad: scheduling from the idle thread!\n");
@ -33,15 +33,15 @@ dequeue_task_idle(struct rq *rq, struct task_struct *p, int sleep, u64 now)
spin_lock_irq(&rq->lock); spin_lock_irq(&rq->lock);
} }
static void put_prev_task_idle(struct rq *rq, struct task_struct *prev, u64 now) static void put_prev_task_idle(struct rq *rq, struct task_struct *prev)
{ {
} }
static int static unsigned long
load_balance_idle(struct rq *this_rq, int this_cpu, struct rq *busiest, load_balance_idle(struct rq *this_rq, int this_cpu, struct rq *busiest,
unsigned long max_nr_move, unsigned long max_load_move, unsigned long max_nr_move, unsigned long max_load_move,
struct sched_domain *sd, enum cpu_idle_type idle, struct sched_domain *sd, enum cpu_idle_type idle,
int *all_pinned, unsigned long *total_load_moved) int *all_pinned, int *this_best_prio)
{ {
return 0; return 0;
} }
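Note on the load_balance_*() signature change above: each class hook now returns the load it moved as an `unsigned long` and takes an `int *this_best_prio`, where it previously returned a task count and wrote the load through an `unsigned long *` out-parameter. The sketch below shows how a caller could accumulate those return values. The caller loop does not appear in this part of the diff, so its shape is an assumption, and all `model_*` names and numeric values are hypothetical.

```c
#include <assert.h>

/* Hypothetical per-class hook: returns the load weight it managed to move. */
typedef unsigned long (*model_balance_fn)(unsigned long max_load_move,
					  int *this_best_prio);

/* Like load_balance_idle(): the idle class never moves anything. */
static unsigned long model_balance_idle(unsigned long max_load_move,
					int *this_best_prio)
{
	(void)max_load_move;
	(void)this_best_prio;
	return 0;
}

/* Illustrative class that can move at most 300 units of load. */
static unsigned long model_balance_fixed(unsigned long max_load_move,
					 int *this_best_prio)
{
	(void)this_best_prio;
	return max_load_move < 300 ? max_load_move : 300;
}

/* Assumed caller shape: sum the returned load until the target is met. */
static unsigned long model_move_tasks(const model_balance_fn *classes,
				      int nr_classes,
				      unsigned long max_load_move)
{
	unsigned long total_load_moved = 0;
	int this_best_prio = 0; /* arbitrary starting value for the sketch */
	int i;

	for (i = 0; i < nr_classes && total_load_moved < max_load_move; i++)
		total_load_moved += classes[i](max_load_move - total_load_moved,
					       &this_best_prio);

	return total_load_moved;
}
```

Returning the moved load directly lets the caller add the results together without a separate out-parameter per class.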


@ -7,7 +7,7 @@
* Update the current task's runtime statistics. Skip current tasks that * Update the current task's runtime statistics. Skip current tasks that
* are not in our scheduling class. * are not in our scheduling class.
*/ */
static inline void update_curr_rt(struct rq *rq, u64 now) static inline void update_curr_rt(struct rq *rq)
{ {
struct task_struct *curr = rq->curr; struct task_struct *curr = rq->curr;
u64 delta_exec; u64 delta_exec;
@ -15,18 +15,17 @@ static inline void update_curr_rt(struct rq *rq, u64 now)
if (!task_has_rt_policy(curr)) if (!task_has_rt_policy(curr))
return; return;
delta_exec = now - curr->se.exec_start; delta_exec = rq->clock - curr->se.exec_start;
if (unlikely((s64)delta_exec < 0)) if (unlikely((s64)delta_exec < 0))
delta_exec = 0; delta_exec = 0;
schedstat_set(curr->se.exec_max, max(curr->se.exec_max, delta_exec)); schedstat_set(curr->se.exec_max, max(curr->se.exec_max, delta_exec));
curr->se.sum_exec_runtime += delta_exec; curr->se.sum_exec_runtime += delta_exec;
curr->se.exec_start = now; curr->se.exec_start = rq->clock;
} }
static void static void enqueue_task_rt(struct rq *rq, struct task_struct *p, int wakeup)
enqueue_task_rt(struct rq *rq, struct task_struct *p, int wakeup, u64 now)
{ {
struct rt_prio_array *array = &rq->rt.active; struct rt_prio_array *array = &rq->rt.active;
@ -37,12 +36,11 @@ enqueue_task_rt(struct rq *rq, struct task_struct *p, int wakeup, u64 now)
/* /*
* Adding/removing a task to/from a priority array: * Adding/removing a task to/from a priority array:
*/ */
static void static void dequeue_task_rt(struct rq *rq, struct task_struct *p, int sleep)
dequeue_task_rt(struct rq *rq, struct task_struct *p, int sleep, u64 now)
{ {
struct rt_prio_array *array = &rq->rt.active; struct rt_prio_array *array = &rq->rt.active;
update_curr_rt(rq, now); update_curr_rt(rq);
list_del(&p->run_list); list_del(&p->run_list);
if (list_empty(array->queue + p->prio)) if (list_empty(array->queue + p->prio))
@ -75,7 +73,7 @@ static void check_preempt_curr_rt(struct rq *rq, struct task_struct *p)
resched_task(rq->curr); resched_task(rq->curr);
} }
static struct task_struct *pick_next_task_rt(struct rq *rq, u64 now) static struct task_struct *pick_next_task_rt(struct rq *rq)
{ {
struct rt_prio_array *array = &rq->rt.active; struct rt_prio_array *array = &rq->rt.active;
struct task_struct *next; struct task_struct *next;
@ -89,14 +87,14 @@ static struct task_struct *pick_next_task_rt(struct rq *rq, u64 now)
queue = array->queue + idx; queue = array->queue + idx;
next = list_entry(queue->next, struct task_struct, run_list); next = list_entry(queue->next, struct task_struct, run_list);
next->se.exec_start = now; next->se.exec_start = rq->clock;
return next; return next;
} }
static void put_prev_task_rt(struct rq *rq, struct task_struct *p, u64 now) static void put_prev_task_rt(struct rq *rq, struct task_struct *p)
{ {
update_curr_rt(rq, now); update_curr_rt(rq);
p->se.exec_start = 0; p->se.exec_start = 0;
} }
@ -172,28 +170,15 @@ static struct task_struct *load_balance_next_rt(void *arg)
return p; return p;
} }
static int static unsigned long
load_balance_rt(struct rq *this_rq, int this_cpu, struct rq *busiest, load_balance_rt(struct rq *this_rq, int this_cpu, struct rq *busiest,
unsigned long max_nr_move, unsigned long max_load_move, unsigned long max_nr_move, unsigned long max_load_move,
struct sched_domain *sd, enum cpu_idle_type idle, struct sched_domain *sd, enum cpu_idle_type idle,
int *all_pinned, unsigned long *load_moved) int *all_pinned, int *this_best_prio)
{ {
int this_best_prio, best_prio, best_prio_seen = 0;
int nr_moved; int nr_moved;
struct rq_iterator rt_rq_iterator; struct rq_iterator rt_rq_iterator;
unsigned long load_moved;
best_prio = sched_find_first_bit(busiest->rt.active.bitmap);
this_best_prio = sched_find_first_bit(this_rq->rt.active.bitmap);
/*
* Enable handling of the case where there is more than one task
* with the best priority. If the current running task is one
* of those with prio==best_prio we know it won't be moved
* and therefore it's safe to override the skip (based on load)
* of any task we find with that prio.
*/
if (busiest->curr->prio == best_prio)
best_prio_seen = 1;
rt_rq_iterator.start = load_balance_start_rt; rt_rq_iterator.start = load_balance_start_rt;
rt_rq_iterator.next = load_balance_next_rt; rt_rq_iterator.next = load_balance_next_rt;
@ -203,11 +188,10 @@ load_balance_rt(struct rq *this_rq, int this_cpu, struct rq *busiest,
rt_rq_iterator.arg = busiest; rt_rq_iterator.arg = busiest;
nr_moved = balance_tasks(this_rq, this_cpu, busiest, max_nr_move, nr_moved = balance_tasks(this_rq, this_cpu, busiest, max_nr_move,
max_load_move, sd, idle, all_pinned, load_moved, max_load_move, sd, idle, all_pinned, &load_moved,
this_best_prio, best_prio, best_prio_seen, this_best_prio, &rt_rq_iterator);
&rt_rq_iterator);
return nr_moved; return load_moved;
} }
static void task_tick_rt(struct rq *rq, struct task_struct *p) static void task_tick_rt(struct rq *rq, struct task_struct *p)