299548 Commits

Author SHA1 Message Date
Paul E. McKenney
dc36be4419 Merge branches 'barrier.2012.05.09a', 'fixes.2012.04.26a', 'inline.2012.05.02b' and 'srcu.2012.05.07b' into HEAD
barrier:  Reduce the amount of disturbance by rcu_barrier() to the rest of
    	the system.  This branch also includes improvements to
    	RCU_FAST_NO_HZ, which are included here due to conflicts.
fixes:  Miscellaneous fixes.
inline:  Remaining changes from an abortive attempt to inline
    	preemptible RCU's __rcu_read_lock().  These are (1) making
    	exit_rcu() avoid unnecessary work and (2) avoiding having
    	preemptible RCU record a blocked thread when the scheduler
    	declines to do a context switch.
srcu:	Lai Jiangshan's algorithmic implementation of SRCU, including
    	call_srcu().
2012-05-11 10:14:21 -07:00
Paul E. McKenney
b1420f1c8b rcu: Make rcu_barrier() less disruptive
The rcu_barrier() primitive interrupts each and every CPU, registering
a callback on every CPU.  Once all of these callbacks have been invoked,
rcu_barrier() knows that every callback that was registered before
the call to rcu_barrier() has also been invoked.

However, there is no point in registering a callback on a CPU that
currently has no callbacks, most especially if that CPU is in a
deep idle state.  This commit therefore makes rcu_barrier() avoid
interrupting CPUs that have no callbacks.  Doing this requires reworking
the handling of orphaned callbacks, otherwise callbacks could slip through
rcu_barrier()'s net by being orphaned from a CPU that rcu_barrier() had
not yet interrupted to a CPU that rcu_barrier() had already interrupted.
This reworking was needed anyway to take a first step towards weaning
RCU from the CPU_DYING notifier's use of stop_cpu().

Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2012-05-09 14:27:54 -07:00
Paul E. McKenney
98248a0e24 rcu: Explicitly initialize RCU_FAST_NO_HZ per-CPU variables
The current initialization of the RCU_FAST_NO_HZ per-CPU variables makes
needless and fragile assumptions about the initial value of things like
the jiffies counter.  This commit therefore explicitly initializes all of
them that are better started with a non-zero value.  It also adds some
comments describing the per-CPU state variables.

Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2012-05-09 14:26:57 -07:00
Paul E. McKenney
21e52e1566 rcu: Make RCU_FAST_NO_HZ handle timer migration
The current RCU_FAST_NO_HZ assumes that timers do not migrate unless a
CPU goes offline, in which case it assumes that the CPU will have to come
out of dyntick-idle mode (cancelling the timer) in order to go offline.
This is important because when RCU_FAST_NO_HZ permits a CPU to enter
dyntick-idle mode despite having RCU callbacks pending, it posts a timer
on that CPU to force a wakeup on that CPU.  This wakeup ensures that the
CPU will eventually handle the end of the grace period, including invoking
its RCU callbacks.

However, Pascal Chapperon's test setup shows that the timer handler
rcu_idle_gp_timer_func() really does get invoked in some cases.  This is
problematic because this can cause the CPU that entered dyntick-idle
mode despite still having RCU callbacks pending to remain in
dyntick-idle mode indefinitely, which means that its RCU callbacks might
never be invoked.  This situation can result in grace-period delays or
even system hangs, which matches Pascal's observations of slow boot-up
and shutdown (https://lkml.org/lkml/2012/4/5/142).  See also the bugzilla:

	https://bugzilla.redhat.com/show_bug.cgi?id=806548

This commit therefore causes the "should never be invoked" timer handler
rcu_idle_gp_timer_func() to use smp_call_function_single() to wake up
the CPU for which the timer was intended, allowing that CPU to invoke
its RCU callbacks in a timely manner.

Reported-by: Pascal Chapperon <pascal.chapperon@wanadoo.fr>
Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2012-05-09 14:26:56 -07:00
Paul E. McKenney
9fab97876a rcu: Update RCU maintainership
Split SRCU out and add Lai Jiangshan as SRCU co-maintainer.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2012-05-07 09:36:34 -07:00
Paul E. McKenney
9dd8fb16c3 rcu: Make exit_rcu() more precise and consolidate
When running preemptible RCU, if a task exits in an RCU read-side
critical section having blocked within that same RCU read-side critical
section, the task must be removed from the list of tasks blocking a
grace period (perhaps the current grace period, perhaps the next grace
period, depending on timing).  The exit() path invokes exit_rcu() to
do this cleanup.

However, the current implementation of exit_rcu() needlessly does the
cleanup even if the task did not block within the current RCU read-side
critical section, which wastes time and needlessly increases the size
of the state space.  Fix this by only doing the cleanup if the current
task is actually on the list of tasks blocking some grace period.

While we are at it, consolidate the two identical exit_rcu() functions
into a single function.

Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Tested-by: Linus Torvalds <torvalds@linux-foundation.org>

Conflicts:

	kernel/rcupdate.c
2012-05-02 14:48:27 -07:00
Paul E. McKenney
616c310e83 rcu: Move PREEMPT_RCU preemption to switch_to() invocation
Currently, PREEMPT_RCU readers are enqueued upon entry to the scheduler.
This is inefficient because enqueuing is required only if there is a
context switch, and entry to the scheduler does not guarantee a context
switch.

The commit therefore moves the enqueuing to immediately precede the
call to switch_to() from the scheduler.

Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Tested-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-05-02 14:43:23 -07:00
Paul E. McKenney
f511fc6246 rcu: Ensure that RCU_FAST_NO_HZ timers expire on correct CPU
Timers are subject to migration, which can lead to the following
system-hang scenario when CONFIG_RCU_FAST_NO_HZ=y:

1.	CPU 0 executes synchronize_rcu(), which posts an RCU callback.

2.	CPU 0 then goes idle.  It cannot immediately invoke the callback,
	but there is nothing RCU needs from ti, so it enters dyntick-idle
	mode after posting a timer.

3.	The timer gets migrated to CPU 1.

4.	CPU 0 never wakes up, so the synchronize_rcu() never returns, so
	the system hangs.

This commit fixes this problem by using mod_timer_pinned(), as suggested
by Peter Zijlstra, to ensure that the timer is actually posted on the
running CPU.

Reported-by: Dipankar Sarma <dipankar@in.ibm.com>
Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2012-05-01 08:22:50 -07:00
Lai Jiangshan
9059c94017 rcu: Add rcutorture test for call_srcu()
Add srcu_torture_deferred_free() for srcu_ops so as to test the new
call_srcu().  Rename the original srcu_ops to srcu_sync_ops.

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2012-04-30 10:48:26 -07:00
Lai Jiangshan
931ea9d1a6 rcu: Implement per-domain single-threaded call_srcu() state machine
This commit implements an SRCU state machine in support of call_srcu().
The state machine is preemptible, light-weight, and single-threaded,
minimizing synchronization overhead.  In particular, there is no longer
any need for synchronize_srcu() to be guarded by a mutex.

Expedited processing is handled, at least in the absence of concurrent
grace-period operations on that same srcu_struct structure, by having
the synchronize_srcu_expedited() thread take on the role of the
workqueue thread for one iteration.

There is a reasonable probability that a given SRCU callback will
be invoked on the same CPU that registered it, however, there is no
guarantee.  Concurrent SRCU grace-period primitives can cause callbacks
to be executed elsewhere, even in absence of CPU-hotplug operations.

Callbacks execute in process context, but under the influence of
local_bh_disable(), so it is illegal to sleep in an SRCU callback
function.

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2012-04-30 10:48:25 -07:00
Lai Jiangshan
d9792edd7a rcu: Use single value to handle expedited SRCU grace periods
The earlier algorithm used an "expedited" flag combined with a "trycount"
counter to differentiate between normal and expedited SRCU grace periods.
However, the difference can be encoded into a single counter with a cutoff
value and different initial values for expedited and normal SRCU grace
periods.  This commit makes that change.

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

Conflicts:

	kernel/srcu.c
2012-04-30 10:48:24 -07:00
Lai Jiangshan
dc87917501 rcu: Improve srcu_readers_active_idx()'s cache locality
Expand the calls to srcu_readers_active_idx() from srcu_readers_active()
inline.  This change improves cache locality by interating over the CPUs
once rather than twice.

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2012-04-30 10:48:24 -07:00
Lai Jiangshan
966f58c2f6 rcu: Remove unused srcu_barrier()
The old srcu_barrier() macro is now unused.  This commit removes it so
that it may be used for the SRCU flavor of rcu_barrier(), which will in
turn be needed to allow the upcoming call_srcu() to be used from within
modules.

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2012-04-30 10:48:23 -07:00
Lai Jiangshan
b52ce066c5 rcu: Implement a variant of Peter's SRCU algorithm
This commit implements a variant of Peter's algorithm, which may be found
at https://lkml.org/lkml/2012/2/1/119.

o	Make the checking lock-free to enable parallel checking.
	Parallel checking is required when (1) the original checking
	task is preempted for a long time, (2) sychronize_srcu_expedited()
	starts during an ongoing SRCU grace period, or (3) we wish to
	avoid acquiring a lock.

o	Since the checking is lock-free, we avoid a mutex in state machine
	for call_srcu().

o	Remove the SRCU_REF_MASK and remove the coupling with the flipping.
	This might allow us to remove the preempt_disable() in future
	versions, though such removal will need great care because it
	rescinds the one-old-reader-per-CPU guarantee.

o	Remove a smp_mb(), simplify the comments and make the smp_mb() pairs
	more intuitive.

Inspired-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2012-04-30 10:48:22 -07:00
Lai Jiangshan
18108ebfeb rcu: Improve SRCU's wait_idx() comments
The safety of SRCU is provided byy wait_idx() rather than flipping.
The flipping actually prevents starvation.

This commit therefore updates the comments to more accurately and
precisely describe what is going on.

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2012-04-30 10:48:22 -07:00
Lai Jiangshan
944ce9af47 rcu: Flip ->completed only once per SRCU grace period
This is an optimization of the SRCU grace period.  To guard against
preempted readers with old values of the counter, it suffices to scan the
old counters once more, then flip ->completed only one time.  The reason
this works is that the old readers must have incremented the old set of
counters (if they have not yet incremented, then their critical section
starts after this grace period, so they may be safely ignored).

This commit therefore optimizes the second flip out in favor of a simple
rescan.

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2012-04-30 10:48:21 -07:00
Lai Jiangshan
440253c17f rcu: Increment upper bit only for srcu_read_lock()
The purpose of the upper bit of SRCU's per-CPU counters is to guarantee
that no reasonable series of srcu_read_lock() and srcu_read_unlock()
operations can return the value of the counter to its original value.
This guarantee is require only after the index has been switched to
the other set of counters, so at most one srcu_read_lock() can affect
a given CPU's counter.  The number of srcu_read_unlock() operations
on a given counter is limited to the number of tasks in the system,
which given the Linux kernel's current structure is limited to far less
than 2^30 on 32-bit systems and far less than 2^62 on 64-bit systems.
(Something about a limited number of bytes in the kernel's address space.)

Therefore, if srcu_read_lock() increments the upper bits, then
srcu_read_unlock() need not do so.  In this case, an srcu_read_lock() and
an srcu_read_unlock() will flip the lower bit of the upper field of the
counter.  An unreasonably large additional number of srcu_read_unlock()
operations would be required to return the counter to its initial value,
thus preserving the guarantee.

This commit takes this approach, which further allows it to shrink
the size of the upper field to one bit, making the number of
srcu_read_unlock() operations required to return the counter to its
initial value even more unreasonable than before.

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2012-04-30 10:48:20 -07:00
Lai Jiangshan
4b7a3e9e32 rcu: Remove fast check path from __synchronize_srcu()
The fastpath in __synchronize_srcu() is designed to handle cases where
there are a large number of concurrent calls for the same srcu_struct
structure.  However, the Linux kernel currently does not use SRCU in
this manner, so remove the fastpath checks for simplicity.

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2012-04-30 10:48:20 -07:00
Paul E. McKenney
cef50120b6 rcu: Direct algorithmic SRCU implementation
The current implementation of synchronize_srcu_expedited() can cause
severe OS jitter due to its use of synchronize_sched(), which in turn
invokes try_stop_cpus(), which causes each CPU to be sent an IPI.
This can result in severe performance degradation for real-time workloads
and especially for short-interation-length HPC workloads.  Furthermore,
because only one instance of try_stop_cpus() can be making forward progress
at a given time, only one instance of synchronize_srcu_expedited() can
make forward progress at a time, even if they are all operating on
distinct srcu_struct structures.

This commit, inspired by an earlier implementation by Peter Zijlstra
(https://lkml.org/lkml/2012/1/31/211) and by further offline discussions,
takes a strictly algorithmic bits-in-memory approach.  This has the
disadvantage of requiring one explicit memory-barrier instruction in
each of srcu_read_lock() and srcu_read_unlock(), but on the other hand
completely dispenses with OS jitter and furthermore allows SRCU to be
used freely by CPUs that RCU believes to be idle or offline.

The update-side implementation handles the single read-side memory
barrier by rechecking the per-CPU counters after summing them and
by running through the update-side state machine twice.

This implementation has passed moderate rcutorture testing on both
x86 and Power.  Also updated to use this_cpu_ptr() instead of per_cpu_ptr(),
as suggested by Peter Zijlstra.

Reported-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>
2012-04-30 10:48:19 -07:00
Paul E. McKenney
fae4b54f28 rcu: Introduce rcutorture testing for rcu_barrier()
Although rcutorture does invoke rcu_barrier() and friends, it cannot
really be called a torture test given that it invokes them only once
at the end of the test.  This commit therefore introduces heavy-duty
rcutorture testing for rcu_barrier(), which may be carried out
concurrently with normal rcutorture testing.

Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2012-04-30 10:48:18 -07:00
Paul E. McKenney
048a0e8f5e timer: Fix mod_timer_pinned() header comment
The mod_timer_pinned() header comment states that it prevents timers
from being migrated to a different CPU.  This is not the case, instead,
it ensures that the timer is posted to the current CPU, but does nothing
to prevent CPU-hotplug operations from migrating the timer.

This commit therefore brings the comment header into alignment with
reality.

Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
2012-04-26 12:28:03 -07:00
Paul E. McKenney
79b9a75fb7 rcu: Add warning for RCU_FAST_NO_HZ timer firing
RCU_FAST_NO_HZ uses a timer to limit the time that a CPU with callbacks
can remain in dyntick-idle mode.  This timer is cancelled when the CPU
exits idle, and therefore should never fire.  However, if the timer
were migrated to some other CPU for whatever reason (1) the timer could
actually fire and (2) firing on some other CPU would fail to wake up the
CPU with callbacks, possibly resulting in sluggishness or a system hang.

This commit therfore adds a WARN_ON_ONCE() to the timer handler in order
to detect this condition.

Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2012-04-26 08:49:05 -07:00
Paul E. McKenney
37e377d282 rcu: Fixes to rcutorture error handling and cleanup
The rcutorture initialization code ignored the error returns from
rcu_torture_onoff_init() and rcu_torture_stall_init().  The rcutorture
cleanup code failed to NULL out a number of pointers.  These bugs will
normally have no effect, but this commit fixes them nevertheless.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2012-04-24 20:55:39 -07:00
Paul E. McKenney
c57afe80db rcu: Make RCU_FAST_NO_HZ account for pauses out of idle
Both Steven Rostedt's new idle-capable trace macros and the RCU_NONIDLE()
macro can cause RCU to momentarily pause out of idle without the rest
of the system being involved.  This can cause rcu_prepare_for_idle()
to run through its state machine too quickly, which can in turn result
in needless scheduling-clock interrupts.

This commit therefore adds code to enable rcu_prepare_for_idle() to
distinguish between an initial entry to idle on the one hand (which needs
to advance the rcu_prepare_for_idle() state machine) and an idle reentry
due to idle-capable trace macros and RCU_NONIDLE() on the other hand
(which should avoid advancing the rcu_prepare_for_idle() state machine).
Additional state is maintained to allow the timer to be correctly reposted
when returning after a momentary pause out of idle, and even more state
is maintained to detect when new non-lazy callbacks have been enqueued
(which may require re-evaluation of the approach to idleness).

Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2012-04-24 20:55:20 -07:00
Paul E. McKenney
2ee3dc8066 rcu: Make RCU_FAST_NO_HZ use timer rather than hrtimer
The RCU_FAST_NO_HZ facility uses an hrtimer to wake up a CPU when
it is allowed to go into dyntick-idle mode, which is almost always
cancelled soon after.  This is not what hrtimers are good at, so
this commit switches to the timer wheel.

Reported-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2012-04-24 20:55:19 -07:00
Paul E. McKenney
2fdbb31b66 rcu: Add RCU_FAST_NO_HZ tracing for idle exit
Traces of rcu_prep_idle events can be confusing because
rcu_cleanup_after_idle() does no tracing.  This commit therefore adds
this tracing.

Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2012-04-24 20:55:19 -07:00
Paul E. McKenney
6d8133919b rcu: Document why rcu_blocking_is_gp() is safe
The rcu_blocking_is_gp() function tests to see if there is only one
online CPU, and if so, synchronize_sched() and friends become no-ops.
However, for larger systems, num_online_cpus() scans a large vector,
and might be preempted while doing so.  While preempted, any number
of CPUs might come online and go offline, potentially resulting in
num_online_cpus() returning 1 when there never had only been one
CPU online.  This could result in a too-short RCU grace period, which
could in turn result in total failure, except that the only way that
the grace period is too short is if there is an RCU read-side critical
section spanning it.  For RCU-sched and RCU-bh (which are the only
cases using rcu_blocking_is_gp()), RCU read-side critical sections
have either preemption or bh disabled, which prevents CPUs from going
offline.  This in turn prevents actual failures from occurring.

This commit therefore adds a large block comment to rcu_blocking_is_gp()
documenting why it is safe.  This commit also moves rcu_blocking_is_gp()
into kernel/rcutree.c, which should help prevent unwary developers from
mistaking it for a generally useful function.

Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2012-04-24 20:54:53 -07:00
Paul E. McKenney
dabb8aa960 rcu: Document kernel command-line parameters
Bring RCU's kernel command-line parameter documentation up to date.

Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2012-04-24 20:54:53 -07:00
Paul E. McKenney
8932a63d5e rcu: Reduce cache-miss initialization latencies for large systems
Commit #0209f649 (rcu: limit rcu_node leaf-level fanout) set an upper
limit of 16 on the leaf-level fanout for the rcu_node tree.  This was
needed to reduce lock contention that was induced by the synchronization
of scheduling-clock interrupts, which was in turn needed to improve
energy efficiency for moderate-sized lightly loaded servers.

However, reducing the leaf-level fanout means that there are more
leaf-level rcu_node structures in the tree, which in turn means that
RCU's grace-period initialization incurs more cache misses.  This is
not a problem on moderate-sized servers with only a few tens of CPUs,
but becomes a major source of real-time latency spikes on systems with
many hundreds of CPUs.  In addition, the workloads running on these large
systems tend to be CPU-bound, which eliminates the energy-efficiency
advantages of synchronizing scheduling-clock interrupts.  Therefore,
these systems need maximal values for the rcu_node leaf-level fanout.

This commit addresses this problem by introducing a new kernel parameter
named RCU_FANOUT_LEAF that directly controls the leaf-level fanout.
This parameter defaults to 16 to handle the common case of a moderate
sized lightly loaded servers, but may be set higher on larger systems.

Reported-by: Mike Galbraith <efault@gmx.de>
Reported-by: Dimitri Sivanich <sivanich@sgi.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2012-04-24 20:54:52 -07:00
Jan Engelhardt
d8169d4c36 rcu: Make __kfree_rcu() less dependent on compiler choices
Currently, __kfree_rcu() is implemented as an inline function, and
contains a BUILD_BUG_ON() that malfunctions if __kfree_rcu() is compiled
as an out-of-line function.  Unfortunately, there are compiler settings
(e.g., -O0) that can result in __kfree_rcu() being compiled out of line,
resulting in annoying build breakage.  This commit therefore converts
both __kfree_rcu() and __is_kfree_rcu_offset() from inline functions to
macros to prevent such misbehavior on the part of the compiler.

Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2012-04-24 20:54:51 -07:00
Paul E. McKenney
c9336643e1 rcu: Clarify help text for RCU_BOOST_PRIO
The old text confused real-time applications with real-time threads, so
that you pretty much needed to understand how this kernel configuration
parameter worked to understand the help text.  This commit therefore
attempts to make the help text human-readable.

Reported-by: Jörn Engel <joern@purestorage.com>
Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2012-04-24 20:54:50 -07:00
Michel Machado
f88022a4f6 rcu: Replace list_first_entry_rcu() with list_first_or_null_rcu()
The list_first_entry_rcu() macro is inherently unsafe because it cannot
be applied to an empty list.  But because RCU readers do not exclude
updaters, a list might become empty between the time that list_empty()
claimed it was non-empty and the time that list_first_entry_rcu() is
invoked.  Therefore, the list_empty() test cannot be separated from the
list_first_entry_rcu() call.  This commit therefore combines these to
macros to create a new list_first_or_null_rcu() macro that replaces
the old (and unsafe) list_first_entry_rcu() macro.

This patch incorporates Paul's review comments on the previous version of
this patch available here:

https://lkml.org/lkml/2012/4/2/536

This patch cannot break any upstream code because list_first_entry_rcu()
is not being used anywhere in the kernel (tested with grep(1)), and any
external code using it is probably broken as a result of using it.

Signed-off-by: Michel Machado <michel@digirati.com.br>
CC: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
CC: Dipankar Sarma <dipankar@in.ibm.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2012-04-24 20:54:49 -07:00
Dave Jones
559f9badd1 rcu: List-debug variants of rcu list routines.
* Make __list_add_rcu check the next->prev and prev->next pointers
  just like __list_add does.
* Make list_del_rcu use __list_del_entry, which does the same checking
  at deletion time.

Has been running for a week here without anything being tripped up,
but it seems worth adding for completeness just in case something
ever does corrupt those lists.

Signed-off-by: Dave Jones <davej@redhat.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2012-04-24 20:54:49 -07:00
Linus Torvalds
66f75a5d02 Linux 3.4-rc4 2012-04-21 14:47:52 -07:00
Yong Zhang
e9a5ea1852 sparc32,leon: add notify_cpu_starting()
Otherwise cpu_active_mask will not set, which lead to other issue.

Signed-off-by: Yong Zhang <yong.zhang0@gmail.com>
Signed-off-by: Konrad Eisele <konrad@gaisler.com>
Reviewed-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-21 16:35:06 -04:00
Linus Torvalds
8f4f9d4d3c ARM: SoC fixes for 3.4-rc
- at91, ux500, imx, omap and bcmring:
   - at91 fixes for =m driver build issues, irqdomain fixes and config
     dependency fixes
   - ux500 kconfig dependency fixes and a  smp wakeup bugfix
   - imx idle bugfix and build fix due to irq domain changes
   - omap uart pinmux fixes, softreset regression revert and misc fixes
   - bcmring build error regression fix
 
 - ux500 and imx had some small defconfig updates in this branch
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.11 (GNU/Linux)
 
 iQIcBAABAgAGBQJPksdXAAoJEIwa5zzehBx3iqIP/ibJhM5QYWCXCoudOHouvXW7
 FALYsWTJodKf3qN0SQtty3RdmEKjdvCHGPcwtSEtjIc5xngCtYPEd5JWNl1u94nb
 f0+rhELfHljG+5XSyqnZfPmAN74ApvULl52hGXVudLjEwCB4uoYV5BN4c0dyr3Of
 Wm1+9HUyUo/WwXbE9UxxJkLDVsB+eAm2iOeAcerxCqsgKUzUqGP3fgp2eVJ7Q8LT
 f9tiSzaLeQnbYVymNeAiCzk3L9lKFx5r3QoxH2QOW6ieNUqAZC11X3L9anj30joG
 Ns1dMf5jGbWoSbHGMbff+PWj1sRxOzuoksjZ/jEZ2eXgNF4maoFrPqFZe6+o4K36
 pekjYwKfeuKdT9JXPSwJe3yQULxfWMTwvg2ZF86R9KOIcRGI2gwGDDTYZ4LOr8mp
 vQbOMWRGDNFxzWPZA9BfMDnG5AQxuoBlprnZQht2HqLar0dbHTx0qsqoxttOwenG
 GwnLG0ZiwsCkrXcAJ/PSSlHfmhEE37H8NsCzBXMeF1kpWwDJZROjEOdDQREuwtYB
 qcQ7GfQ9u8ysemMbXyrAM+SySsc9r8HXw/J2NMIlEBSdPba6CIgIQigUMYnWgFt8
 oJAPuTsT4/VIJbqCLkEOai3m5oaGllZ0zd5+SxzSbmmhybtFUrttjMP4WCUTqtuX
 lEKPTuNko5mb+2q6MEW0
 =mzPZ
 -----END PGP SIGNATURE-----

Merge tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc

Pull "ARM: SoC fixes" from Olof Johansson:
 * at91, ux500, imx, omap and bcmring:
  - at91 fixes for =m driver build issues, irqdomain fixes and config
    dependency fixes
  - ux500 kconfig dependency fixes and a  smp wakeup bugfix
  - imx idle bugfix and build fix due to irq domain changes
  - omap uart pinmux fixes, softreset regression revert and misc fixes
  - bcmring build error regression fix

 * ux500 and imx had some small defconfig updates in this branch

* tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (27 commits)
  ARM: bcmring: fix UART declarations
  ARM: imx: Fix imx5 idle logic bug
  ARM: imx27-dt: Fix build due to removal of irq_domain_add_simple()
  ARM: imx_v4_v5_defconfig: Add support for CONFIG_REGULATOR_FIXED_VOLTAGE
  ARM: OMAP1: DMTIMER: fix broken timer clock source selection
  ARM: OMAP: serial: Fix the ocp smart idlemode handling bug
  ARM: OMAP2+: UART: Fix incorrect population of default uart pads
  ARM: OMAP: sram: fix BUG in dpll code for !PM case
  dmaengine: Kconfig: fix Atmel at_hdmac entry
  USB: gadget/at91_udc: add gpio_to_irq() function to vbus interrupt
  USB: ohci-at91: change annotations for probe/remove functions
  leds-atmel-pwm.c: Make pwmled_probe() __devinit
  ARM: at91: fix at91sam9261ek Ethernet dm9000 irq
  ARM: at91: fix rm9200ek flash size
  ARM: at91: remove empty at91_init_serial function
  ARM: at91: fix typo in at91_pmc_base assembly declaration
  ARM: at91: Export at91_matrix_base
  ARM: at91: Export at91_pmc_base
  ARM: at91: Export at91_ramc_base
  ARM: at91: Export at91_st_base
  ...
2012-04-21 12:45:52 -07:00
Linus Torvalds
126a3483d6 MMC fixes for 3.4-rc4:
The major fixes here are:
   * Build fix for omap_hsmmc with OF against 3.4-rc1.
   * Fix CONFIG_MMC_UNSAFE_RESUME semantics regression against 3.3,
     which broke hotplug card detection when UNSAFE_RESUME is set.
   * Fix a race condition in omap_hsmmc with runtime PM.
   * Fix two libertas SDIO-powered-resume regressions.
  Also small fixes for discard/sanitize, dw_mmc, cd-gpio and esdhc-imx.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.11 (GNU/Linux)
 
 iQIcBAABAgAGBQJPkh1jAAoJEHNBYZ7TNxYMe0EQAIpvPan5J3eT9rQU2R0Fsgbd
 hd10NVOPQAQaBDc59r/jmTj5iqDQQ1PGkrHti94IECCPWr3dbeTamhmUfYt6+XXZ
 Ae9dlYQ8llKa6ZLSVPmNsMLmFnsqDek3LIsxk/viJIKqSGee/yOHeM6W7v6d2T+z
 GCLFtV3xADU4F9PrZxdfWOJ3kDsV9WCTYnn5/WR78maSURo8iAfck5u+isitO0rn
 NaXfgO29WIHlYmRHRpXMGk2pCmv8cMX8i89mBExeIWdfn40yGCZRi+kUb0VquR1W
 7lqdeKEF0NY1EU9ejuPZK2VepvxEhC1ibOlPkGfgHBd52YLrDaSl3WcqKdzKILIm
 wYKEls8AU5ZnWiuRpJq7QpPemTsbFQpI72O6OgntYhwiKE2TkJ0BY3KtjvFpVCxi
 s+zqZsQwIt6rjJZyQzg1Y5yGA42JF75rkzWnRtRfWcSma83MFiZawAVAV/qGZsrn
 EddUdOodvvO3owC9dD4wI0Q2eCRSZog3sVO9uSQg8zVHvfRGD61GBvqguyR8mbiH
 +CUAMMD9mpKiApu3gvHbjn3OjZuKQ9AGDz0SYj+PbQEwsEBF+qXs5BeytBC8S1Yh
 cNNdpvzjZl7EM/lJWywZ1XQ741Tg1Qgdgtx8njsoilfE51UTlR9TUd04V0bg0Fk2
 fS3P+BRKf+4WR+iGyXJ1
 =Zl8K
 -----END PGP SIGNATURE-----

Merge tag 'mmc-fixes-for-3.4-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/cjb/mmc

Pull MMC fixes from Chris Ball:
 - Build fix for omap_hsmmc with OF against 3.4-rc1.
 - Fix CONFIG_MMC_UNSAFE_RESUME semantics regression against 3.3, which
   broke hotplug card detection when UNSAFE_RESUME is set.
 - Fix a race condition in omap_hsmmc with runtime PM.
 - Fix two libertas SDIO-powered-resume regressions.
 - Small fixes for discard/sanitize, dw_mmc, cd-gpio and esdhc-imx.

* tag 'mmc-fixes-for-3.4-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/cjb/mmc:
  mmc: core: Do not pre-claim host in suspend
  mmc: dw_mmc: prevent NULL dereference for dma_ops
  mmc: unbreak sdhci-esdhc-imx on i.MX25
  mmc: cd-gpio: Include header to pickup exported symbol prototypes
  mmc: sdhci: refine non-removable card checking for card detection
  mmc: dw_mmc: Fix switch from DMA to PIO
  mmc: remove MMC bus legacy suspend/resume method
  mmc: omap_hsmmc: Get rid of of_have_populated_dt() usage
  mmc: omap_hsmmc: build fix for CONFIG_OF=y and CONFIG_MMC_OMAP_HS=m
  mmc: fixes for eMMC v4.5 sanitize operation
  mmc: fixes for eMMC v4.5 discard operation
2012-04-21 12:44:37 -07:00
Linus Torvalds
8898159650 Merge branch 'v4l_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media
Pull media fixes from Mauro Carvalho Chehab:
 - Fixes a regression at DVB core when switching from DVB-S2 to DVB-S on
   Kaffeine (Fedora 16 Bugzilla #812895);
 - Fixes a mutex unlock at an error condition at drx-k;
 - Fix winbond-cir set mode;
 - mt9m032: Fix a compilation breakage with some random Kconfig;
 - mt9m032: fix two dead locks;
 - xc5000: don't require an special firmware (that won't be provided by
   the vendor) just because the xtal frequency is different;
 - V4L DocBook: fix some typos at multi-plane formats description.

* 'v4l_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media:
  [media] xc5000: support 32MHz & 31.875MHz xtal using the 41.024.5 firmware
  [media] V4L: mt9m032: fix compilation breakage
  [media] V4L: DocBook: Fix typos in the multi-plane formats description
  [media] V4L: mt9m032: fix two dead-locks
  [media] rc-core: set mode for winbond-cir
  [media] drxk: Does not unlock mutex if sanity check failed in scu_command()
  [media] dvb_frontend: Fix a regression when switching back to DVB-S
2012-04-21 12:43:23 -07:00
Linus Torvalds
9f24ff6f42 First MFD pull request for 3.4 fixes
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.11 (GNU/Linux)
 
 iQIcBAABAgAGBQJPkXdoAAoJEIqAPN1PVmxKOuMP/3K87kcpwUUI/vA0pSPYf58T
 Q+Sxsd85C6c0SOvE3MOI+1stibLAXeeT+MsmMKYIhmAXbTtKsmMW5TC1aTapJHQx
 kDGuhqiw5Zyk5tPrZ333cLBdgiDDr8qWUBRzcNCK5O1xuDET76JtQwqtehSoDXDh
 Afcg3BLzYA3HIz0nm+Wlll1yeyKrAg20dESOCvl1ptNbb2BVBSfaBpOqTjw6R88J
 BRtua//L9HGHQIRntYnrH6/nzwDAhkrw2m3p1ZGWG+y5j88cQy4s0/dtZ7FJ8ZAE
 qoUx2YqH6dPYGZa2A6XaOkF4hvDC6iAXawWllvsDxcQSYRWR4qxmHYm5KkxyT6y9
 UACk+c7qdRmZgHfPcNNaq5CPDAEFvSFRKfDBpXUJdO6O/bVzBsA/P4fCjYFZ1FOC
 NQtouAbz2BpH1iwCMRWtTsCSwiVXSHQL/jR4vQrtXU6KwX1ArKF5W1zTvnbaK13c
 Bc9E4Se4Hn5Bs+FkJIbBnViAW/9gv7KUe9AtDjhcrUWkxZLswDnXhUd1k2x1Gxfp
 WQf29FZmoLiITA4ffsizqR6wC98lzIrHW29FdoSyTnz9SSoqo6J10l82w8ED45lJ
 wGanen7Txjsc2ub9GYqzCUYHGBitLfaQSkSvBIRSWc43Ju3b0l/esH12ioajjSEu
 sAvMHCkaR7l7NZVEt6rS
 =gHlK
 -----END PGP SIGNATURE-----

Merge tag 'mfd-for-linus-3.4' of git://git.kernel.org/pub/scm/linux/kernel/git/sameo/mfd-2.6

Pull MFD fixes from Samuel Ortiz:
 "We have 3 build fixes, a OMAP USB host PHY reset fix and the twl6040
  conversion to an i2c driver.  The latter may not sound like a fix but
  the twl6040 MFD driver won't probe without it, triggering an OMAP4
  audio regression."

* tag 'mfd-for-linus-3.4' of git://git.kernel.org/pub/scm/linux/kernel/git/sameo/mfd-2.6:
  mfd: Fix modular builds of rc5t583 regulator support
  mfd: Fix asic3_gpio_to_irq
  ARM: OMAP3: USB: Fix the EHCI ULPI PHY reset issue
  mfd: Convert twl6040 to i2c driver, and separate it from twl core
  mfd : Fix dbx500 compilation error
2012-04-21 12:42:12 -07:00
Al Viro
bfce281c28 kill mm argument of vm_munmap()
it's always current->mm

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-04-21 01:58:20 -04:00
Al Viro
9f3a4afb27 perfmon: kill some helpers and arguments
pfm_vm_munmap() is simply vm_munmap() and pfm_remove_smpl_mapping()
always get current as the first argument.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-04-21 01:58:18 -04:00
Al Viro
936af1576e aio: don't bother with unmapping when aio_free_ring() is coming from exit_aio()
... since exit_mmap() is coming and it will munmap() everything anyway.
In all other cases aio_free_ring() has ctx->mm == current->mm; moreover,
all other callers of vm_munmap() have mm == current->mm, so this will
allow us to get rid of mm argument of vm_munmap().

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-04-21 01:58:16 -04:00
Ulf Hansson
7c57091940 mmc: core: Do not pre-claim host in suspend
Since SDIO drivers may want to do some SDIO operations in their suspend
callback functions, we must not keep the host claimed when calling them.

Daniel Drake reported that libertas_sdio encountered a deadlock in its
suspend function.

Signed-off-by: Ulf Hansson <ulf.hansson@stericsson.com>
Tested-by: Daniel Drake <dsd@laptop.org>
[stable@: please apply to 3.2-stable and 3.3-stable]
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Chris Ball <cjb@laptop.org>
2012-04-20 21:52:13 -04:00
Jaehoon Chung
e1631f989e mmc: dw_mmc: prevent NULL dereference for dma_ops
Now, dma_ops is assumed that use the IDMAC.  But if dma_ops is assigned
the pdata->dma_ops, we didn't ensure that callback function is defined.

If the callback isn't defined, then we should run in PIO mode.

Signed-off-by: Jaehoon Chung <jh80.chung@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
Acked-by: Will Newton <will.newton@imgtec.com>
Signed-off-by: Chris Ball <cjb@laptop.org>
2012-04-20 21:52:05 -04:00
Eric Bénard
b89152824f mmc: unbreak sdhci-esdhc-imx on i.MX25
This was broken by me in 37865fe91582582a6f6c00652f6a2b1ff71f8a78
("mmc: sdhci-esdhc-imx: fix timeout on i.MX's sdhci") where more
extensive tests would have shown that read or write of data to the
card were failing (even if the partition table was correctly read).

Signed-off-by: Eric Bénard <eric@eukrea.com>
Acked-by: Wolfram Sang <w.sang@pengutronix.de>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Chris Ball <cjb@laptop.org>
2012-04-20 20:45:00 -04:00
H Hartley Sweeten
5ca6518832 mmc: cd-gpio: Include header to pickup exported symbol prototypes
Include the linux/mmc/cd-gpio.h header to pickup the prototypes
for the two exported symbols.

This quiets the sparse warnings:

warning: symbol 'mmc_cd_gpio_request' was not declared. Should it be static?
warning: symbol 'mmc_cd_gpio_free' was not declared. Should it be static?

Signed-off-by: H Hartley Sweeten <hsweeten@visionengravers.com>
Acked-by: Guennadi Liakhovetski <g.liakhovetski@gmx.de>
Acked-by: Venkatraman S <svenkatr@ti.com>
Signed-off-by: Chris Ball <cjb@laptop.org>
2012-04-20 20:45:00 -04:00
Daniel Drake
87b87a3fc0 mmc: sdhci: refine non-removable card checking for card detection
Commit c79396c191bc19 ("mmc: sdhci: prevent card detection activity
for non-removable cards") disables card detection where the cards
are marked as non-removable.

This makes sense, but the implementation detail of calling
mmc_card_is_removable() causes some problems, because
mmc_card_is_removable() is overloaded with CONFIG_MMC_UNSAFE_RESUME
semantics.

In the OLPC XO case, we need CONFIG_MMC_UNSAFE_RESUME because our root
filesystem is stored on SD, but we also have external SD card slots
where we want automatic card detection.

Refine the check to only apply to hosts marked as MMC_CAP_NONREMOVABLE,
which is defined to mean that the card is *really* nonremovable. This
could be revisited in future if we find a way to improve
CONFIG_MMC_UNSAFE_RESUME semantics.

Signed-off-by: Daniel Drake <dsd@laptop.org>
Acked-by: Chuanxiao Dong <chuanxiao.dong@intel.com>
[stable@: please apply to 3.3-stable]
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Chris Ball <cjb@laptop.org>
2012-04-20 20:44:25 -04:00
Seungwon Jeon
a99aa9b9b4 mmc: dw_mmc: Fix switch from DMA to PIO
When dw_mci_pre_dma_transfer returns failure in some reasons,
dw_mci_submit_data will prepare to switch the PIO mode from DMA.
After switching to PIO mode, DMA(IDMAC in particular) is still
enabled. This makes the corruption in handling interrupt and
the driver lock-up.

Signed-off-by: Seungwon Jeon <tgih.jun@samsung.com>
Acked-by: Will Newton <will.newton@imgtec.com>
Signed-off-by: Chris Ball <cjb@laptop.org>
2012-04-20 20:30:37 -04:00
Chuanxiao Dong
32d317c60e mmc: remove MMC bus legacy suspend/resume method
MMC bus is using legacy suspend/resume method, which is not compatible if
runtime pm callbacks are used. In this scenario, MMC bus suspend/resume
callbacks cannot be called when system entering S3. So change to use the
new defined dev_pm_ops for system sleeping mode.

Tested on AM335x Platform. Solves major issue/crash reported at
http://www.mail-archive.com/linux-omap@vger.kernel.org/msg65425.html

Signed-off-by: Chuanxiao Dong <chuanxiao.dong@intel.com>
Tested-by: Hebbar, Gururaja <gururaja.hebbar@ti.com>
Acked-by: Linus Walleij <linus.walleij@linaro.org>
Acked-by: Ulf Hansson <ulf.hansson@stericsson.com>
Signed-off-by: Chris Ball <cjb@laptop.org>
2012-04-20 20:30:19 -04:00
Linus Torvalds
6be5ceb02e VM: add "vm_mmap()" helper function
This continues the theme started with vm_brk() and vm_munmap():
vm_mmap() does the same thing as do_mmap(), but additionally does the
required VM locking.

This uninlines (and rewrites it to be clearer) do_mmap(), which sadly
duplicates it in mm/mmap.c and mm/nommu.c.  But that way we don't have
to export our internal do_mmap_pgoff() function.

Some day we hopefully don't have to export do_mmap() either, if all
modular users can become the simpler vm_mmap() instead.  We're actually
very close to that already, with the notable exception of the (broken)
use in i810, and a couple of stragglers in binfmt_elf.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-04-20 17:29:13 -07:00