17720 Commits

Author SHA1 Message Date
Frederic Weisbecker
e0a23b0628 watchdog: Simplify a little the IPI call
In order to remotely restart the watchdog hrtimer, update_timers()
allocates a csd on the stack and pass it to __smp_call_function_single().

There is no partcular need, however, for a specific csd here. Lets
simplify that a little by calling smp_call_function_single()
which can already take care of the csd allocation by itself.

Acked-by: Don Zickus <dzickus@redhat.com>
Reviewed-by: Michal Hocko <mhocko@suse.cz>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Don Zickus <dzickus@redhat.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jan Kara <jack@suse.cz>
Cc: Jens Axboe <axboe@fb.com>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2014-02-24 14:47:05 -08:00
Frederic Weisbecker
d7877c03f1 smp: Move __smp_call_function_single() below its safe version
Move this function closer to __smp_call_function_single(). These functions
have very similar behavior and should be displayed in the same block
for clarity.

Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jan Kara <jack@suse.cz>
Cc: Jens Axboe <axboe@fb.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2014-02-24 14:47:01 -08:00
Frederic Weisbecker
8b28499a71 smp: Consolidate the various smp_call_function_single() declensions
__smp_call_function_single() and smp_call_function_single() share some
code that can be factorized: execute inline when the target is local,
check if the target is online, lock the csd, call generic_exec_single().

Lets move the common parts to generic_exec_single().

Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jan Kara <jack@suse.cz>
Cc: Jens Axboe <axboe@fb.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2014-02-24 14:46:58 -08:00
Jan Kara
08eed44c72 smp: Teach __smp_call_function_single() to check for offline cpus
Align __smp_call_function_single() with smp_call_function_single() so
that it also checks whether requested cpu is still online.

Signed-off-by: Jan Kara <jack@suse.cz>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jens Axboe <axboe@fb.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2014-02-24 14:46:55 -08:00
Jan Kara
5fd77595ec smp: Iterate functions through llist_for_each_entry_safe()
The IPI function llist iteration is open coded. Lets simplify this
with using an llist iterator.

Also we want to keep the iteration safe against possible
csd.llist->next value reuse from the IPI handler. At least the block
subsystem used to do such things so lets stay careful and use
llist_for_each_entry_safe().

Signed-off-by: Jan Kara <jack@suse.cz>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jens Axboe <axboe@fb.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2014-02-24 14:46:47 -08:00
Joe Perches
f5645d3575 capability: Use current logging styles
Prefix logging output with "capability: " via pr_fmt.
Convert printks to pr_<level>.
Use pr_<level>_once instead of guard flags.
Coalesce formats.

Signed-off-by: Joe Perches <joe@perches.com>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
Signed-off-by: James Morris <james.l.morris@oracle.com>
2014-02-24 14:44:53 +11:00
Linus Torvalds
b2880eb83d Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer fix from Thomas Gleixner:
 "Serialize the registration of a new sched_clock in the currently ARM
  only generic sched_clock facilty to avoid sched_clock havoc"

* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched_clock: Prevent callers from seeing half-updated data
2014-02-23 14:17:08 -08:00
Paul E. McKenney
e086481baf rcutorture: Add a lock_busted to test the test
This commit adds a maximally broken locking primitive in which
lock acquisition and release are both no-ops.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2014-02-23 09:04:43 -08:00
Paul E. McKenney
f881825a73 rcutorture: Gracefully handle NULL cleanup hooks
Although most torture tests will have some cleanup hook, it is possible
that one might not.  This commit therefore enables graceful handling of
a NULL cleanup hook during torture-test shutdown.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2014-02-23 09:04:39 -08:00
Paul E. McKenney
ff20e251c4 rcutorture: Add an rcu_busted to test the test
This commit adds a deliberately buggy RCU implementation into rcutorture
to allow easy checking that rcutorture correctly flags buggy RCU
implementations.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2014-02-23 09:04:30 -08:00
Paul E. McKenney
0af3fe1efa locktorture: Add a lock-torture kernel module
This commit adds the locking counterpart to rcutorture.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
[ paulmck: Make n_lock_torture_errors and torture_spinlock static
  as suggested by Fengguang Wu. ]
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2014-02-23 09:04:29 -08:00
Paul E. McKenney
bfefc73aa1 rcutorture: Stop generic kthreads in torture_cleanup()
The specific torture modules (like rcutorture) need to call
torture_cleanup() in any case, so this commit makes torture_cleanup()
deal with torture_shutdown_cleanup() and torture_stutter_cleanup() so
that the specific modules don't have to deal with these details.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2014-02-23 09:04:27 -08:00
Paul E. McKenney
9c029b8609 rcutorture: Abstract torture_stop_kthread()
Stopping of kthreads is not RCU-specific, so this commit abstracts
out torture_stop_kthread(), saving a few lines of code in the process.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2014-02-23 09:04:25 -08:00
Paul E. McKenney
47cf29b9e7 rcutorture: Abstract torture_create_kthread()
Creation of kthreads is not RCU-specific, so this commit abstracts
out torture_create_kthread(), saving a few tens of lines of code in
the process.

This change requires modifying VERBOSE_TOROUT_ERRSTRING() to take a
non-const string, so that _torture_create_kthread() can avoid an
open-coded substitute.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2014-02-23 09:03:24 -08:00
Paul E. McKenney
bc8f83e2c0 rcutorture: Fix missing-return bug in rcu_torture_barrier_init()
This commit adds a missing error return to the code path that creates
the rcu_torture_barrier() kthread.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2014-02-23 09:03:22 -08:00
Paul E. McKenney
7fafaac5b9 rcutorture: Fix rcutorture shutdown races
Not all of the rcutorture kthreads waited for kthread_should_stop()
before returning from their top-level functions, and none of them
used torture_shutdown_absorb() properly.  These problems can result in
segfaults and hangs at shutdown time, and some recent changes perturbed
timing sufficiently to make them much more probable.  This commit
therefore creates a torture_kthread_stopping() function that does the
proper kthread shutdown dance in one centralized location.

Accommodate this grouping by making VERBOSE_TOROUT_STRING() capable of
taking a non-const string as its argument, which allows the new
torture_kthread_stopping() to pass its "title" argument directly to
the updated version of VERBOSE_TOROUT_STRING().

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2014-02-23 09:03:21 -08:00
Paul E. McKenney
14562d1cf1 rcutorture: Announce task creation
A few "stealth-start rcutorture kthreads" have accumulated over the years,
so this commit adds console-log announcements (but only if the torture
tests are running verbose).

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2014-02-23 09:03:20 -08:00
Paul E. McKenney
01025ebc99 rcutorture: Clean up rcu_torture_init() error checking
This commit applies some simple cleanups to rcu_torture_init() error
checking.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2014-02-23 09:03:19 -08:00
Paul E. McKenney
e991dbc077 rcutorture: Abstract torture_shutdown()
Because auto-shutdown of torture testing is not specific to RCU,
this commit moves the auto-shutdown function to kernel/torture.c.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2014-02-23 09:03:18 -08:00
Paul E. McKenney
57a2fe90fc rcutorture: Apply ACCESS_ONCE() to racy fullstop accesses
Because the fullstop variable can be accessed while it is being updated,
this commit avoids any resulting compiler mischief through use of
ACCESS_ONCE() for non-initialization accesses to this shared variable.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2014-02-23 09:03:16 -08:00
Paul E. McKenney
628edaa506 rcutorture: Abstract stutter_wait()
Because stuttering the test load (stopping and restarting it) is useful
for non-RCU testing, this commit moves the load-stuttering functionality
to kernel/torture.c.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2014-02-23 09:02:54 -08:00
Paul E. McKenney
fac480efcb rcutorture: Add diagnostic for unscheduled system shutdown
Currently, rcutorture can terminate via rmmod, via self-shutdown,
via something else shutting the system down, or of course the usual
catastrophic termination.  The first two get flagged, so this commit adds
a message for the third.  For the fourth, your warranty is void as always.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2014-02-23 09:01:13 -08:00
Paul E. McKenney
36970bb91d rcutorture: Privatize fullstop
This commit introduces the torture_must_stop() function in order to
keep use of the fullstop variable local to kernel/torture.c.  There
is also a torture_must_stop_irq() counterpart for use from RCU callbacks,
timeout handlers, and the like.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2014-02-23 09:01:12 -08:00
Paul E. McKenney
4622b487ec rcutorture: Abstract torture_shutdown_notify()
Because handling the race between rmmod and system shutdown is not
specific to RCU, this commit abstracts torture_shutdown_notify(),
placing this code into kernel/torture.c.  This change also allows
fullstop_mutex to be private to kernel/torture.c.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2014-02-23 09:01:11 -08:00
Paul E. McKenney
cc47ae0830 rcutorture: Abstract torture-test cleanup
This commit creates a torture_cleanup() that handles the generic
cleanup actions local to kernel/torture.c.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2014-02-23 09:01:08 -08:00
Paul E. McKenney
b5daa8f3b3 rcutorture: Abstract torture-test initialization
This commit creates torture_init_begin() and torture_init_end() functions
to abstract locking and allow the torture_type and verbose variables
in kernel/torture.o to become static.  With a bit more abstraction,
fullstop_mutex will also become static.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2014-02-23 09:01:07 -08:00
Paul E. McKenney
2e9e8081d2 rcutorture: Abstract torture_onoff()
Because online/offline torturing is not specific to RCU, this commit
abstracts it into the kernel/torture.c module to allow other torture
tests to use it.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2014-02-23 09:01:06 -08:00
Paul E. McKenney
3808dc9fab rcutorture: Abstract torture_shuffle()
The torture_shuffle() function forces each CPU in turn to go idle
periodically in order to check for problems interacting with per-CPU
variables and with dyntick-idle mode.  Because this sort of debugging
is not specific to RCU, this commit abstracts that functionality.
This in turn requires abstracting some additional infrastructure.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2014-02-23 09:01:05 -08:00
Paul E. McKenney
f67a33561e rcutorture: Abstract torture_shutdown_absorb()
Because handling races between rmmod and normal shutdown is not specific
to rcutorture, this commit renames rcutorture_shutdown_absorb() to
torture_shutdown_absorb() and pulls it out into then kernel/torture.c
module.  This implies pulling the fullstop mechanism into kernel/torture.c
as well.

The exporting of fullstop and fullstop_mutex is ugly and must die.
And it does in fact die in later commits that introduce higher-level
APIs that encapsulate both of these variables.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>`
2014-02-23 09:01:04 -08:00
Paul E. McKenney
c2884de38e rcutorture: Abstract TOROUT_STRING() and friends
These diagnostic macros are not confined to torturing RCU, so this commit
makes them available to other torture tests.  Also removed the do-while
from TOROUT_STRING() in response to checkpatch complaints.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2014-02-23 09:01:02 -08:00
Paul E. McKenney
5ccf60f23d rcutorture: Rename PRINTK to TOROUT
Since it doesn't do printk()s anymore anyway, this commit renames these
macros from PRINTK to TOROUT (short for torture output).

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2014-02-23 09:01:01 -08:00
Paul E. McKenney
9e25022541 rcutorture: Abstract torture_param()
Create a torture_param() macro and apply it to rcutorture in order to
save a few lines of code.  This same macro may be applied to other
torture frameworks.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2014-02-23 09:01:00 -08:00
Paul E. McKenney
51b1130eb5 rcutorture: Abstract rcu_torture_random()
Because rcu_torture_random() will be used by the locking equivalent to
rcutorture, pull it out into its own module.  This new module cannot
be separately configured, instead, use the Kconfig "select" statement
from the Kconfig options of tests depending on it.

Suggested-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2014-02-23 09:00:58 -08:00
Paul E. McKenney
806274c018 rcutorture: Fix checkpatch complaint
This commit does a code-style cleanup so that the first curly brace
of an initializer does not appear at the beginning of a line.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2014-02-23 09:00:57 -08:00
Mike Galbraith
d987fc7f32 sched, nohz: Exclude isolated cores from load balancing
The user explicitly disabled load balancing, else this core would not be
disconnected.  Don't add these to nohz.idle_cpus_mask.

Signed-off-by: Mike Galbraith <mgalbraith@suse.de>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Lei Wen <leiwen@marvell.com>
Link: http://lkml.kernel.org/n/tip-vmme4f49psirp966pklm5l9j@git.kernel.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-02-22 18:17:22 +01:00
Morten Rasmussen
de91b9cb97 sched: Fix select_task_rq_fair() description comments
Brings select_task_rq_fair() description comments up-to-date.

Signed-off-by: Morten Rasmussen <morten.rasmussen@arm.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1392732864-10927-1-git-send-email-morten.rasmussen@arm.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-02-22 18:17:04 +01:00
Dongsheng Yang
144818422b workqueue: Replace hardcoding of -20 and 19 with MIN_NICE and MAX_NICE
Signed-off-by: Dongsheng Yang <yangds.fnst@cn.fujitsu.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/6d85138180c00ce86975addab6e34b24b84f00a5.1392103744.git.yangds.fnst@cn.fujitsu.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-02-22 18:16:41 +01:00
Dongsheng Yang
c4a4d2f431 sys: Replace hardcoding of -20 and 19 with MIN_NICE and MAX_NICE
Signed-off-by: Dongsheng Yang <yangds.fnst@cn.fujitsu.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Robin Holt <holt@sgi.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Kees Cook <keescook@chromium.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Link: http://lkml.kernel.org/r/0261f094b836f1acbcdf52e7166487c0c77323c8.1392103744.git.yangds.fnst@cn.fujitsu.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-02-22 18:16:19 +01:00
Dongsheng Yang
75e45d512f sched: Replace hardcoding of -20 and 19 with MIN_NICE and MAX_NICE
Signed-off-by: Dongsheng Yang <yangds.fnst@cn.fujitsu.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/bd80780f19b4f9b4a765acc353c8dbc130274dd6.1392103744.git.yangds.fnst@cn.fujitsu.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-02-22 18:15:54 +01:00
Dongsheng Yang
d277d868da rcu: Use MAX_NICE to replace hardcoding of 19
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Signed-off-by: Dongsheng Yang <yangds.fnst@cn.fujitsu.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Link: http://lkml.kernel.org/r/5b3bf232f41b33ab703a1595e94671b303e2d1fc.1392103744.git.yangds.fnst@cn.fujitsu.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-02-22 18:15:24 +01:00
Li Zefan
11c785b79e sched/rt: Make init_sched_rt_calss() __init
It's a bootstrap function.

Signed-off-by: Li Zefan <lizefan@huawei.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/52F5CC09.1080502@huawei.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-02-22 18:11:10 +01:00
Li Zefan
d82fd25356 sched/rt: Remove 'leaf_rt_rq_list' from 'struct rq'
This is a leftover from commit e23ee74777f389369431d77390c4b09332ce026a
("sched/rt: Simplify pull_rt_task() logic and remove .leaf_rt_rq_list").

Signed-off-by: Li Zefan <lizefan@huawei.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/52F5CBF6.4060901@huawei.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-02-22 18:10:43 +01:00
Thomas Gleixner
c365c292d0 sched: Consider pi boosting in setscheduler()
If a PI boosted task policy/priority is modified by a setscheduler()
call we unconditionally dequeue and requeue the task if it is on the
runqueue even if the new priority is lower than the current effective
boosted priority. This can result in undesired reordering of the
priority bucket list.

If the new priority is less or equal than the current effective we
just store the new parameters in the task struct and leave the
scheduler class and the runqueue untouched. This is handled when the
task deboosts itself. Only if the new priority is higher than the
effective boosted priority we apply the change immediately.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
[ Rebase ontop of v3.14-rc1. ]
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Dario Faggioli <raistlin@linux.it>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1391803122-4425-7-git-send-email-bigeasy@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-02-22 18:10:04 +01:00
Thomas Gleixner
81a44c5441 sched: Queue RT tasks to head when prio drops
The following scenario does not work correctly:

Runqueue of CPUx contains two runnable and pinned tasks:

 T1: SCHED_FIFO, prio 80
 T2: SCHED_FIFO, prio 80

T1 is on the cpu and executes the following syscalls (classic priority
ceiling scenario):

 sys_sched_setscheduler(pid(T1), SCHED_FIFO, .prio = 90);
 ...
 sys_sched_setscheduler(pid(T1), SCHED_FIFO, .prio = 80);
 ...

Now T1 gets preempted by T3 (SCHED_FIFO, prio 95). After T3 goes back
to sleep the scheduler picks T2. Surprise!

The same happens w/o actual preemption when T1 is forced into the
scheduler due to a sporadic NEED_RESCHED event. The scheduler invokes
pick_next_task() which returns T2. So T1 gets preempted and scheduled
out.

This happens because sched_setscheduler() dequeues T1 from the prio 90
list and then enqueues it on the tail of the prio 80 list behind T2.
This violates the POSIX spec and surprises user space which relies on
the guarantee that SCHED_FIFO tasks are not scheduled out unless they
give the CPU up voluntarily or are preempted by a higher priority
task. In the latter case the preempted task must get back on the CPU
after the preempting task schedules out again.

We fixed a similar issue already in commit 60db48c (sched: Queue a
deboosted task to the head of the RT prio queue). The same treatment
is necessary for sched_setscheduler(). So enqueue to head of the prio
bucket list if the priority of the task is lowered.

It might be possible that existing user space relies on the current
behaviour, but it can be considered highly unlikely due to the corner
case nature of the application scenario.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1391803122-4425-6-git-send-email-bigeasy@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-02-22 18:09:41 +01:00
Thomas Gleixner
d6b1e91197 sched: Adjust p->sched_reset_on_fork when nothing else changes
If the policy and priority remain unchanged a possible modification of
p->sched_reset_on_fork gets lost in the early exit path.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
[ Rebase ontop of v3.14-rc1. ]
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1391803122-4425-5-git-send-email-bigeasy@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-02-22 18:08:43 +01:00
Thomas Gleixner
8f47b1871b sched: Add better debug output for might_sleep()
might_sleep() can tell us where interrupts have been disabled, but we
have no idea what disabled preemption. Add some debug infrastructure.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1391803122-4425-4-git-send-email-bigeasy@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-02-22 18:08:08 +01:00
Thomas Gleixner
db273be2a7 sched: Check for idle task in might_sleep()
Idle is not allowed to call sleeping functions ever!

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1391803122-4425-3-git-send-email-bigeasy@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-02-22 18:07:50 +01:00
Thomas Gleixner
77177856e3 sched: Init idle->on_rq in init_idle()
We stumbled in RT over a SMP bringup issue on ARM where the
idle->on_rq == 0 was causing try_to_wakeup() on the other cpu to run
into nada land.

After adding that idle->on_rq = 1; I was able to find the root cause
of the lockup: the idle task on the newly woken up cpu was fiddling
with a sleeping spinlock, which is a nono.

I kept the init of idle->on_rq to keep the state consistent and to
avoid another long lasting debug session.

As a side note, the whole debug mess could have been avoided if
might_sleep() would have yelled when called from the idle task. That's
fixed with patch 2/6 - and that one actually has a changelog :)

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1391803122-4425-2-git-send-email-bigeasy@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-02-22 18:07:36 +01:00
Peter Zijlstra
cd578abb24 perf/x86: Warn to early_printk() in case irq_work is too slow
On Mon, Feb 10, 2014 at 08:45:16AM -0800, Dave Hansen wrote:
> The reason I coded this up was that NMIs were firing off so fast that
> nothing else was getting a chance to run.  With this patch, at least the
> printk() would come out and I'd have some idea what was going on.

It will start spewing to early_printk() (which is a lot nicer to use
from NMI context too) when it fails to queue the IRQ-work because its
already enqueued.

It does have the false-positive for when two CPUs trigger the warn
concurrently, but that should be rare and some extra clutter on the
early printk shouldn't be a problem.

Cc: hpa@zytor.com
Cc: tglx@linutronix.de
Cc: dzickus@redhat.com
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: mingo@kernel.org
Fixes: 6a02ad66b2c4 ("perf/x86: Push the duration-logging printk() to IRQ context")
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20140211150116.GO27965@twins.programming.kicks-ass.net
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-02-21 21:49:07 +01:00
Peter Zijlstra
dc87734106 sched: Remove some #ifdeffery
Remove a few gratuitous #ifdefs in pick_next_task*().

Cc: Ingo Molnar <mingo@kernel.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Juri Lelli <juri.lelli@gmail.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/n/tip-nnzddp5c4fijyzzxxrwlxghf@git.kernel.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-02-21 21:43:18 +01:00