15123 Commits

Author SHA1 Message Date
Oleg Nesterov
806a98bdf2 uprobes: Rationalize the usage of filter_chain()
filter_chain() was added into install_breakpoint/remove_breakpoint to
simplify the initial changes but this is sub-optimal.

This patch shifts the callsite to the callers, register_for_each_vma()
and uprobe_mmap(). This way:

- It will be easier to add the new arguments. This is the main reason,
  we can do more optimizations later.

- register_for_each_vma(is_register => true) can be optimized, we only
  need to consult the new consumer. The previous consumers were already
  asked when they called uprobe_register().

This patch also moves the MMF_HAS_UPROBES check from remove_breakpoint(),
this allows to avoid the potentionally costly filter_chain(). Note that
register_for_each_vma(is_register => false) doesn't really need to take
->consumer_rwsem, but I don't think it makes sense to optimize this and
introduce filter_chain_lockless().

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
2013-02-08 17:47:10 +01:00
Oleg Nesterov
66d06dffa5 uprobes: Kill uprobes_mutex[], separate alloc_uprobe() and __uprobe_register()
uprobe_register() and uprobe_unregister() are the only users of
mutex_lock(uprobes_hash(inode)), and the only reason why we can't
simply remove it is that we need to ensure that delete_uprobe() is
not possible after alloc_uprobe() and before consumer_add().

IOW, we need to ensure that when we take uprobe->register_rwsem
this uprobe is still valid and we didn't race with _unregister()
which called delete_uprobe() in between.

With this patch uprobe_register() simply checks uprobe_is_active()
and retries if it hits this very unlikely race. uprobes_mutex[] is
no longer needed and can be removed.

There is another reason for this change, prepare_uprobe() should be
folded into alloc_uprobe() and we do not want to hold the extra locks
around read_mapping_page/etc.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Anton Arapov <anton@redhat.com>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
2013-02-08 17:47:10 +01:00
Oleg Nesterov
06b7bcd8cb uprobes: Introduce uprobe_is_active()
The lifetime of uprobe->rb_node and uprobe->inode is not refcounted,
delete_uprobe() is called when we detect that uprobe has no consumers,
and it would be deadly wrong to do this twice.

Change delete_uprobe() to WARN() if it was already called. We use
RB_CLEAR_NODE() to mark uprobe "inactive", then RB_EMPTY_NODE() can
be used to detect this case.

RB_EMPTY_NODE() is not used directly, we add the trivial helper for
the next change.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Anton Arapov <anton@redhat.com>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
2013-02-08 17:47:09 +01:00
Oleg Nesterov
441f1eb7db uprobes: Kill uprobe_events, use RB_EMPTY_ROOT() instead
uprobe_events counts the number of uprobes in uprobes_tree but
it is used as a boolean. We can use RB_EMPTY_ROOT() instead.

Probably no_uprobe_events() added by this patch can have more
callers, say, mmf_recalc_uprobes().

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Anton Arapov <anton@redhat.com>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
2013-02-08 17:47:08 +01:00
Oleg Nesterov
d4d3ccc6d1 uprobes: Kill uprobe->copy_mutex
Now that ->register_rwsem is safe under ->mmap_sem we can kill
->copy_mutex and abuse down_write(&uprobe->consumer_rwsem).

This makes prepare_uprobe() even more ugly, but we should kill
it anyway.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
2013-02-08 17:47:08 +01:00
Oleg Nesterov
bb929284be uprobes: Kill UPROBE_RUN_HANDLER flag
Simply remove UPROBE_RUN_HANDLER and the corresponding code.

It can only help if uprobe has a single consumer, and in fact
it is no longer needed after handler_chain() was changed to use
->register_rwsem, we simply can not race with uprobe_register().

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
2013-02-08 17:47:06 +01:00
Oleg Nesterov
1ff6fee5e6 uprobes: Change filter_chain() to iterate ->consumers list
Now that it safe to use ->consumer_rwsem under ->mmap_sem we can
almost finish the implementation of filter_chain(). It still lacks
the actual uc->filter(...) call but othewrwise it is ready, just
it pretends that ->filter() always returns true.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
2013-02-08 17:47:05 +01:00
Oleg Nesterov
e591c8d78e uprobes: Introduce uprobe->register_rwsem
Introduce uprobe->register_rwsem. It is taken for writing around
__uprobe_register/unregister.

Change handler_chain() to use this sem rather than consumer_rwsem.

The main reason for this change is that we have the nasty problem
with mmap_sem/consumer_rwsem dependency. filter_chain() needs to
protect uprobe->consumers like handler_chain(), but they can not
use the same lock. filter_chain() can be called under ->mmap_sem
(currently this is always true), but we want to allow ->handler()
to play with the probed task's memory, and this needs ->mmap_sem.

Alternatively we could use srcu, but synchronize_srcu() is very
slow and ->register_rwsem allows us to do more. In particular, we
can teach handler_chain() to do remove_breakpoint() if this bp is
"nacked" by all consumers, we know that we can't race with the
new consumer which does uprobe_register().

See also the next patches. uprobes_mutex[] is almost ready to die.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
2013-02-08 17:47:03 +01:00
Oleg Nesterov
9a98e03cc1 uprobes: _register() should always do register_for_each_vma(true)
To support the filtering uprobe_register() should do
register_for_each_vma(true) every time the new consumer comes,
we need to install the previously nacked breakpoints.

Note:
	- uprobes_mutex[] should die, what it actually protects is
	  alloc_uprobe().

	- UPROBE_RUN_HANDLER should die too, obviously it can't work
	  unless uprobe has a single consumer. The consumer should
	  serialize with _register/_unregister itself. Or this flag
	  should live in uprobe_consumer->state.

	- Perhaps we can do some optimizations later. For example, if
	  filter_chain() never returns false uprobe can record this
	  fact and avoid the unnecessary register_for_each_vma().

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
2013-02-08 17:47:03 +01:00
Oleg Nesterov
04aab9b200 uprobes: _unregister() should always do register_for_each_vma(false)
uprobe_unregister() removes the breakpoints only if the last consumer
goes away. To support the filtering it should do this every time, we
want to remove the breakpoints which nobody else want to keep.

Note: given that filter_chain() is not actually implemented, this patch
itself doesn't change the behaviour yet, register_for_each_vma(false)
is a heavy "nop" unless there are no more consumers.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
2013-02-08 17:47:03 +01:00
Oleg Nesterov
63633cbf82 uprobes: Introduce filter_chain()
Add the new helper filter_chain(). Currently it is only placeholder,
the comment explains what is should do. We will change it later to
consult every consumer to decide whether we need to install the swbp.
Until then it works as if any consumer returns true, this matches the
current behavior.

Change install_breakpoint() to call filter_chain() instead of checking
uprobe->consumers != NULL. We obviously need this, and this equally
closes the race with _unregister().

Change remove_breakpoint() to call this helper too. Currently this is
pointless because remove_breakpoint() is only called when the last
consumer goes away, but we will change this.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
2013-02-08 17:47:02 +01:00
Oleg Nesterov
fe20d71f25 uprobes: Kill uprobe_consumer->filter()
uprobe_consumer->filter() is pointless in its current form, kill it.

We will add it back, but with the different signature/semantics. Perhaps
we will even re-introduce the callsite in handler_chain(), but not to
just skip uc->handler().

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
2013-02-08 17:47:02 +01:00
Oleg Nesterov
f0744af7d0 uprobes: Kill the pointless inode/uc checks in register/unregister
register/unregister verifies that inode/uc != NULL. For what?
This really looks like "hide the potential problem", the caller
should pass the valid data.

register() also checks uc->next == NULL, probably to prevent the
double-register but the caller can do other stupid/wrong things.
If we do this check, then we should document that uc->next should
be cleared before register() and add BUG_ON().

Also add the small comment about the i_size_read() check.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
2013-02-08 17:47:01 +01:00
Oleg Nesterov
bbc33d0593 uprobes: Move __set_bit(UPROBE_SKIP_SSTEP) into alloc_uprobe()
Cosmetic. __set_bit(UPROBE_SKIP_SSTEP) is the part of initialization,
it is not clear why it is set in insert_uprobe().

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
2013-02-08 17:46:59 +01:00
Dave Airlie
5845b81bda Revert "Revert "console: implement lockdep support for console_lock""
This reverts commit ff0d05bf73620eb7dc8aee7423e992ef87870bdf.

Now that we have all the locking fixes in place, we can revert the
revert. This re-enables lockdep tracking for the console lock,
daee779718a319ff9f83e1ba3339334ac650bb22.

Signed-off-by: Dave Airlie <airlied@redhat.com>
2013-02-08 12:07:01 +10:00
H. Peter Anvin
63a3f60341 timeconst.pl: Eliminate Perl warning
defined(@array) is deprecated in Perl and gives off a warning.
Restructure the code to remove that warning.

[ hpa: it would be interesting to revert to the timeconst.bc script.
  It appears that the failures reported by akpm during testing of
  that script was due to a known broken version of make, not a problem
  with bc.  The Makefile rules could probably be restructured to avoid
  the make bug, or it is probably old enough that it doesn't matter. ]

Reported-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: <stable@vger.kernel.org>
2013-02-07 17:14:08 -08:00
Lai Jiangshan
7a6b55e710 srcu: use ACCESS_ONCE() to access sp->completed in srcu_read_lock()
The old SRCU implementation loads sp->completed within an
RCU-sched section, courtesy of preempt_disable().  This was required
due to the use of synchronize_sched() in the old implemenation's
synchronize_srcu().  However, the new implementation does not rely
on synchronize_sched(), so it in turn does not require the load of
sp->completed and the ->c[] counter to be in a single preempt-disabled
region of code.  This commit therefore moves the sp->completed access
outside of the preempt-disabled region and applies ACCESS_ONCE().

The resulting code is almost as the same as before, but it removes the
now-misleading rcu_dereference_index_check() call.

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2013-02-07 15:19:36 -08:00
Lai Jiangshan
49271ca606 srcu: Update synchronize_srcu_expedited()'s comments
Because synchronize_srcu_expedited() no longer uses
synchronize_rcu_sched_expedited(), synchronize_srcu_expedited() no longer
indirectly acquires any CPU-hotplug-related locks.  This commit therefore
updates the comments accordingly.

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2013-02-07 15:17:45 -08:00
Lai Jiangshan
34a64b6bb6 srcu: Update synchronize_srcu()'s comments
The core of SRCU is changed, but synchronize_srcu()'s comments describe
the old algorithm.  This commit therefore updates them to match the
new algorithm.

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2013-02-07 15:17:01 -08:00
Lai Jiangshan
ab4d2986e4 srcu: Simple cleanup for cleanup_srcu_struct()
Pack six lines of code into two lines.

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2013-02-07 15:10:10 -08:00
Lai Jiangshan
6e6f1b307e srcu: Add might_sleep() annotation to synchronize_srcu()
Although synchronize_srcu() can sleep, it will not sleep if the fast
path succeeds, which means that illegal use of synchronize_rcu()
might go unnoticed.  This commit therefore adds might_sleep(), which
unconditionally catches illegal use of synchronize_rcu() from atomic
context.

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2013-02-07 15:08:19 -08:00
Lai Jiangshan
5a41344a3d srcu: Simplify __srcu_read_unlock() via this_cpu_dec()
This commit replaces disabling of preemption and decrement of a per-CPU
variable with this_cpu_dec(), which avoids preemption disabling on x86
and shortens the code on all platforms.

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2013-02-07 15:06:25 -08:00
Lai Jiangshan
8594fade39 workqueue: pick cwq instead of pool in __queue_work()
Currently, __queue_work() chooses the pool to queue a work item to and
then determines cwq from the target wq and the chosen pool.  This is a
bit backwards in that we can determine cwq first and simply use
cwq->pool.  This way, we can skip get_std_worker_pool() in queueing
path which will be a hurdle when implementing custom worker pools.

Update __queue_work() such that it chooses the target cwq and then use
cwq->pool instead of the other way around.  While at it, add missing
{} in an if statement.

This patch doesn't introduce any functional changes.

tj: The original patch had two get_cwq() calls - the first to
    determine the pool by doing get_cwq(cpu, wq)->pool and the second
    to determine the matching cwq from get_cwq(pool->cpu, wq).
    Updated the function such that it chooses cwq instead of pool and
    removed the second call.  Rewrote the description.

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2013-02-07 13:17:51 -08:00
Lai Jiangshan
54d5b7d079 workqueue: make get_work_pool_id() cheaper
get_work_pool_id() currently first obtains pool using get_work_pool()
and then return pool->id.  For an off-queue work item, this involves
obtaining pool ID from worker->data, performing idr_find() to find the
matching pool and then returning its pool->id which of course is the
same as the one which went into idr_find().

Just open code WORK_STRUCT_CWQ case and directly return pool ID from
work->data.

tj: The original patch dropped on-queue work item handling and renamed
    the function to offq_work_pool_id().  There isn't much benefit in
    doing so.  Handling it only requires a single if() and we need at
    least BUG_ON(), which is also a branch, even if we drop on-queue
    handling.  Open code WORK_STRUCT_CWQ case and keep the function in
    line with get_work_pool().  Rewrote the description.

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2013-02-07 13:14:20 -08:00
Tejun Heo
e19e397a85 workqueue: move nr_running into worker_pool
As nr_running is likely to be accessed from other CPUs during
try_to_wake_up(), it was kept outside worker_pool; however, while less
frequent, other fields in worker_pool are accessed from other CPUs
for, e.g., non-reentrancy check.  Also, with recent pool related
changes, accessing nr_running matching the worker_pool isn't as simple
as it used to be.

Move nr_running inside worker_pool.  Keep it aligned to cacheline and
define CPU pools using DEFINE_PER_CPU_SHARED_ALIGNED().  This should
give at least the same cacheline behavior.

get_pool_nr_running() is replaced with direct pool->nr_running
accesses.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Joonsoo Kim <js1304@gmail.com>
2013-02-07 13:14:20 -08:00
Clark Williams
8bd75c77b7 sched/rt: Move rt specific bits into new header file
Move rt scheduler definitions out of include/linux/sched.h into
new file include/linux/sched/rt.h

Signed-off-by: Clark Williams <williams@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/20130207094707.7b9f825f@riff.lan
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-02-07 20:51:08 +01:00
Clark Williams
ce0dbbbb30 sched/rt: Add a tuning knob to allow changing SCHED_RR timeslice
Add a /proc/sys/kernel scheduler knob named
sched_rr_timeslice_ms that allows global changing of the
SCHED_RR timeslice value. User visable value is in milliseconds
but is stored as jiffies.  Setting to 0 (zero) resets to the
default (currently 100ms).

Signed-off-by: Clark Williams <williams@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/20130207094704.13751796@riff.lan
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-02-07 20:51:07 +01:00
Clark Williams
cf4aebc292 sched: Move sched.h sysctl bits into separate header
Move the sysctl-related bits from include/linux/sched.h into
a new file: include/linux/sched/sysctl.h. Then update source
files requiring access to those bits by including the new
header file.

Signed-off-by: Clark Williams <williams@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/20130207094659.06dced96@riff.lan
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-02-07 20:50:54 +01:00
Tejun Heo
1606283622 workqueue: cosmetic update in try_to_grab_pending()
With the recent is-work-queued-here test simplification, the nested
if() in try_to_grab_pending() can be collapsed.  Collapse it.

This patch is purely cosmetic.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
2013-02-06 18:04:53 -08:00
Lai Jiangshan
0b3dae68ac workqueue: simplify is-work-item-queued-here test
Currently, determining whether a work item is queued on a locked pool
involves somewhat convoluted memory barrier dancing.  It goes like the
following.

* When a work item is queued on a pool, work->data is updated before
  work->entry is linked to the pending list with a wmb() inbetween.

* When trying to determine whether a work item is currently queued on
  a pool pointed to by work->data, it locks the pool and looks at
  work->entry.  If work->entry is linked, we then do rmb() and then
  check whether work->data points to the current pool.

This works because, work->data can only point to a pool if it
currently is or were on the pool and,

* If it currently is on the pool, the tests would obviously succeed.

* It it left the pool, its work->entry was cleared under pool->lock,
  so if we're seeing non-empty work->entry, it has to be from the work
  item being linked on another pool.  Because work->data is updated
  before work->entry is linked with wmb() inbetween, work->data update
  from another pool is guaranteed to be visible if we do rmb() after
  seeing non-empty work->entry.  So, we either see empty work->entry
  or we see updated work->data pointin to another pool.

While this works, it's convoluted, to put it mildly.  With recent
updates, it's now guaranteed that work->data points to cwq only while
the work item is queued and that updating work->data to point to cwq
or back to pool is done under pool->lock, so we can simply test
whether work->data points to cwq which is associated with the
currently locked pool instead of the convoluted memory barrier
dancing.

This patch replaces the memory barrier based "are you still here,
really?" test with much simpler "does work->data points to me?" test -
if work->data points to a cwq which is associated with the currently
locked pool, the work item is guaranteed to be queued on the pool as
work->data can start and stop pointing to such cwq only under
pool->lock and the start and stop coincide with queue and dequeue.

tj: Rewrote the comments and description.

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2013-02-06 18:04:53 -08:00
Lai Jiangshan
4468a00fd9 workqueue: make work->data point to pool after try_to_grab_pending()
We plan to use work->data pointing to cwq as the synchronization
invariant when determining whether a given work item is on a locked
pool or not, which requires work->data pointing to cwq only while the
work item is queued on the associated pool.

With delayed_work updated not to overload work->data for target
workqueue recording, the only case where we still have off-queue
work->data pointing to cwq is try_to_grab_pending() which doesn't
update work->data after stealing a queued work item.  There's no
reason for try_to_grab_pending() to not update work->data to point to
the pool instead of cwq, like the normal execution does.

This patch adds set_work_pool_and_keep_pending() which makes
work->data point to pool instead of cwq but keeps the pending bit
unlike set_work_pool_and_clear_pending() (surprise!).

After this patch, it's guaranteed that only queued work items point to
cwqs.

This patch doesn't introduce any visible behavior change.

tj: Renamed the new helper function to match
    set_work_pool_and_clear_pending() and rewrote the description.

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2013-02-06 18:04:53 -08:00
Lai Jiangshan
60c057bca2 workqueue: add delayed_work->wq to simplify reentrancy handling
To avoid executing the same work item from multiple CPUs concurrently,
a work_struct records the last pool it was on in its ->data so that,
on the next queueing, the pool can be queried to determine whether the
work item is still executing or not.

A delayed_work goes through timer before actually being queued on the
target workqueue and the timer needs to know the target workqueue and
CPU.  This is currently achieved by modifying delayed_work->work.data
such that it points to the cwq which points to the target workqueue
and the last CPU the work item was on.  __queue_delayed_work()
extracts the last CPU from delayed_work->work.data and then combines
it with the target workqueue to create new work.data.

The only thing this rather ugly hack achieves is encoding the target
workqueue into delayed_work->work.data without using a separate field,
which could be a trade off one can make; unfortunately, this entangles
work->data management between regular workqueue and delayed_work code
by setting cwq pointer before the work item is actually queued and
becomes a hindrance for further improvements of work->data handling.

This can be easily made sane by adding a target workqueue field to
delayed_work.  While delayed_work is used widely in the kernel and
this does make it a bit larger (<5%), I think this is the right
trade-off especially given the prospect of much saner handling of
work->data which currently involves quite tricky memory barrier
dancing, and don't expect to see any measureable effect.

Add delayed_work->wq and drop the delayed_work->work.data overloading.

tj: Rewrote the description.

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2013-02-06 18:04:53 -08:00
Lai Jiangshan
038366c5cf workqueue: make work_busy() test WORK_STRUCT_PENDING first
Currently, work_busy() first tests whether the work has a pool
associated with it and if not, considers it idle.  This works fine
even for delayed_work.work queued on timer, as __queue_delayed_work()
sets cwq on delayed_work.work - a queued delayed_work always has its
cwq and thus pool associated with it.

However, we're about to update delayed_work queueing and this won't
hold.  Update work_busy() such that it tests WORK_STRUCT_PENDING
before the associated pool.  This doesn't make any noticeable behavior
difference now.

With work_pending() test moved, the function read a lot better with
"if (!pool)" test flipped to positive.  Flip it.

While at it, lose the comment about now non-existent reentrant
workqueues.

tj: Reorganized the function and rewrote the description.

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2013-02-06 18:04:53 -08:00
Lai Jiangshan
6be195886a workqueue: replace WORK_CPU_NONE/LAST with WORK_CPU_END
Now that workqueue has moved away from gcwqs, workqueue no longer has
the need to have a CPU identifier indicating "no cpu associated" - we
now use WORK_OFFQ_POOL_NONE instead - and most uses of WORK_CPU_NONE
are gone.

The only left usage is as the end marker for for_each_*wq*()
iterators, where the name WORK_CPU_NONE is confusing w/o actual
WORK_CPU_NONE usages.  Similarly, WORK_CPU_LAST which equals
WORK_CPU_NONE no longer makes sense.

Replace both WORK_CPU_NONE and LAST with WORK_CPU_END.  This patch
doesn't introduce any functional difference.

tj: s/WORK_CPU_LAST/WORK_CPU_END/ and rewrote the description.

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2013-02-06 18:04:53 -08:00
Michał Mirosław
9f3b795a62 driver-core: constify data for class_find_device()
All in-kernel users of class_find_device() don't really need mutable
data for match callback.

In two places (kernel/power/suspend_test.c, drivers/scsi/osd/osd_uld.c)
this patch changes match callbacks to use const search data.

The const is propagated to rtc_class_open() and power_supply_get_by_name()
parameters.

Note that there's a dev reference leak in suspend_test.c that's not
touched in this patch.

Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Acked-by: Grant Likely <grant.likely@secretlab.ca>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-02-06 12:18:56 -08:00
Ingo Molnar
0351096eb0 Merge branch 'rcu/next' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu into core/rcu
Pull RCU build fixlet from Paul E. McKenney.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-02-06 12:43:04 +01:00
Stephen Hemminger
ca2eb5679f tcp: remove Appropriate Byte Count support
TCP Appropriate Byte Count was added by me, but later disabled.
There is no point in maintaining it since it is a potential source
of bugs and Linux already implements other better window protection
heuristics.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-02-05 14:51:16 -05:00
Ingo Molnar
b2c77a57e4 This implements the cputime accounting on full dynticks CPUs.
Typical cputime stats infrastructure relies on the timer tick and
 its periodic polling on the CPU to account the amount of time
 spent by the CPUs and the tasks per high level domains such as
 userspace, kernelspace, guest, ...
 
 Now we are preparing to implement full dynticks capability on
 Linux for Real Time and HPC users who want full CPU isolation.
 This feature requires a cputime accounting that doesn't depend
 on the timer tick.
 
 To implement it, this new cputime infrastructure plugs into
 kernel/user/guest boundaries to take snapshots of cputime and
 flush these to the stats when needed. This performs pretty
 much like CONFIG_VIRT_CPU_ACCOUNTING except that context location
 and cputime snaphots are synchronized between write and read
 side such that the latter can safely retrieve the pending tickless
 cputime of a task and add it to its latest cputime snapshot to
 return the correct result to the user.
 
 Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQIcBAABAgAGBQJRBsKnAAoJEIUkVEdQjox3lMgP/2R6DU2f8PyGIao3hne4M3Pu
 L3q+mAG53b24Dy014KeW7gd8yv45fE7wp/rs8CGLte9VzbLkRCDSFQPgBuXVagRj
 tV5nfAuqD0wHTnA+HhBE3l3C2RKAPGIu79rBpnIR/QIPPl8Z3Dby8YgmxEQKDf8G
 j7MEBu2LthSuqEi2ZXemnO5r0oEnQAzAp4TTi/M38k0Fmt59nOGyjLnI+xHYCBMa
 1pnz7j3jjR9NJExGu8iVvbo+jupuQngP8qmkLXHvYnj/TEJNwzO1hHVoSwOpjYpS
 9ycl+T8IKQLbAkBywLtq3Mzde43xt/t8wYyGZ0oAV+Z7MIpz/9YIfDJwqQeqoNbD
 dAdbNjKMbsxCgmrnyqSagfMQg/r3CPZ4vf40TMCaN4gNUJC4Ie+E4kPRKRh59+PB
 Ukthmqujn0f40LAa+HXTUuzafd3b0s/ewH+8FuQ6LAG9b5+WnoN8JTJ5u6+ydokO
 ZleeOowuRZZEg+abQ8Sm2GRm/BzN29gi/npb//I+ZDXWv/+3yccgsiPjCRzCAAaO
 g1RmYryFSRUwHQbGNNypVWVuOLWvrBQ4jqbGO7BBuBByZMSHryKxR6mb+inH3qLE
 xIDM9SdSJisc292OzoFKwVZki4MaXaadJXJduVvqYlZQvXXs7eAa4wo3euhtVITD
 NLQO5OZXE4oIQmDFb0FV
 =1Tzp
 -----END PGP SIGNATURE-----

Merge tag 'full-dynticks-cputime-for-mingo' of git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks into sched/core

Pull full-dynticks (user-space execution is undisturbed and
receives no timer IRQs) preparation changes that convert the
cputime accounting code to be full-dynticks ready,
from Frederic Weisbecker:

 "This implements the cputime accounting on full dynticks CPUs.

  Typical cputime stats infrastructure relies on the timer tick and
  its periodic polling on the CPU to account the amount of time
  spent by the CPUs and the tasks per high level domains such as
  userspace, kernelspace, guest, ...

  Now we are preparing to implement full dynticks capability on
  Linux for Real Time and HPC users who want full CPU isolation.
  This feature requires a cputime accounting that doesn't depend
  on the timer tick.

  To implement it, this new cputime infrastructure plugs into
  kernel/user/guest boundaries to take snapshots of cputime and
  flush these to the stats when needed. This performs pretty
  much like CONFIG_VIRT_CPU_ACCOUNTING except that context location
  and cputime snaphots are synchronized between write and read
  side such that the latter can safely retrieve the pending tickless
  cputime of a task and add it to its latest cputime snapshot to
  return the correct result to the user."

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-02-05 13:10:33 +01:00
Dan Carpenter
c3c186403c sched: Fix signedness bug in yield_to()
In 7b270f6099 "sched: Bail out of yield_to when source and
target runqueue has one task" we changed this to store -ESRCH so
it needs to be signed.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: kbuild@01.org
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Mike Galbraith <efault@gmx.de>
Link: http://lkml.kernel.org/r/20130205113751.GA20521@elgon.mountain
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-02-05 12:59:29 +01:00
Leonid Shatz
b22affe0ae hrtimer: Prevent hrtimer_enqueue_reprogram race
hrtimer_enqueue_reprogram contains a race which could result in
timer.base switch during unlock/lock sequence.

hrtimer_enqueue_reprogram is releasing the lock protecting the timer
base for calling raise_softirq_irqsoff() due to a lock ordering issue
versus rq->lock.

If during that time another CPU calls __hrtimer_start_range_ns() on
the same hrtimer, the timer base might switch, before the current CPU
can lock base->lock again and therefor the unlock_timer_base() call
will unlock the wrong lock.

[ tglx: Added comment and massaged changelog ]

Signed-off-by: Leonid Shatz <leonid.shatz@ravellosystems.com>
Signed-off-by: Izik Eidus <izik.eidus@ravellosystems.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/1359981217-389-1-git-send-email-izik.eidus@ravellosystems.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2013-02-05 11:52:41 +01:00
Frederic Weisbecker
077931446b Merge branch 'nohz/printk-v8' into irq/core
Conflicts:
	kernel/irq_work.c

Add support for printk in full dynticks CPU.

* Don't stop tick with irq works pending. This
fix is generally useful and concerns archs that
can't raise self IPIs.

* Flush irq works before CPU offlining.

* Introduce "lazy" irq works that can wait for the
next tick to be executed, unless it's stopped.

* Implement klogd wake up using irq work. This
removes the ad-hoc printk_tick()/printk_needs_cpu()
hooks and make it working even in dynticks mode.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
2013-02-05 00:48:46 +01:00
Greg Kroah-Hartman
16559ae48c kgdb: remove #include <linux/serial_8250.h> from kgdb.h
There's no reason kgdb.h itself needs to include the 8250 serial port
header file.  So push it down to the _very_ limited number of individual
drivers that need the values in that file, and fix up the places where
people really wanted serial_core.h and platform_device.h.

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-02-04 15:35:26 -08:00
Linus Torvalds
2a6f79e8c1 Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler fixes from Ingo Molnar:
 "Three small fixlets"

* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched/debug: Fix format string for 32-bit platforms
  sched: Fix warning in kernel/sched/fair.c
  sched/rt: Use root_domain of rt_rq not current processor
2013-02-05 07:58:24 +11:00
Linus Torvalds
51c1abb95f Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Ingo Molnar:
 "Three fixlets and two small (and low risk) hw-enablement changes"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf: Fix event group context move
  x86/perf: Add IvyBridge EP support
  perf/x86: Fix P6 driver section warning
  arch/x86/tools/insn_sanity.c: Identify source of messages
  perf/x86: Enable Intel Lincroft/Penwell/Cloverview Atom support
2013-02-05 07:57:09 +11:00
Linus Torvalds
5dc31b5767 Merge branch 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull two small RCU fixlets from Ingo Molnar.

* 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  rcu: Make rcu_nocb_poll an early_param instead of module_param
  rcu: Prevent soft-lockup complaints about no-CBs CPUs
2013-02-05 07:56:07 +11:00
Steven Rostedt
e4aa0da39b rcu: Allow rcutorture to be built at low optimization levels
The uses of trace_clock_local() are dead code when CONFIG_RCU_TRACE=n,
but some compilers might nevertheless generate code calling this function.
This commit therefore ensures that trace_clock_local() is invoked only
when CONFIG_RCU_TRACE=y.

Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-02-04 12:18:20 -08:00
Mike Galbraith
e0a79f529d sched: Fix select_idle_sibling() bouncing cow syndrome
If the previous CPU is cache affine and idle, select it.

The current implementation simply traverses the sd_llc domain,
taking the first idle CPU encountered, which walks buddy pairs
hand in hand over the package, inflicting excruciating pain.

1 tbench pair (worst case) in a 10 core + SMT package:

  pre   15.22 MB/sec 1 procs
  post 252.01 MB/sec 1 procs

Signed-off-by: Mike Galbraith <bitbucket@online.de>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1359371965.5783.127.camel@marge.simpson.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-02-04 20:07:24 +01:00
Ingo Molnar
9228b5f243 Merge branch 'rcu/next' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu into core/rcu
Pull RCU updates from Paul E. McKenney:

1.	Changes to rcutorture and to RCU documentation. Posted to LKML at
        https://lkml.org/lkml/2013/1/26/188.

2.	Enhancements to uniprocessor handling in tiny RCU. Posted to LKML
        at https://lkml.org/lkml/2013/1/27/2.

3.	Tag RCU callbacks with grace-period number to simplify callback
        advancement. Posted to LKML at https://lkml.org/lkml/2013/1/26/203.

4.	Miscellaneous fixes. Posted to LKML at https://lkml.org/lkml/2013/1/26/204.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-02-04 19:06:34 +01:00
anish kumar
c02cf5f8ed irq_work: Remove return value from the irq_work_queue() function
As no one is using the return value of irq_work_queue(),
so it is better to just make it void.

Signed-off-by: anish kumar <anish198519851985@gmail.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
[ Fix stale comments, remove now unnecessary __irq_work_queue() intermediate function ]
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Link: http://lkml.kernel.org/r/1359925703-24304-1-git-send-email-fweisbec@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-02-04 11:50:59 +01:00
Thomas Gleixner
90889a635a Merge branch 'fortglx/3.9/time' of git://git.linaro.org/people/jstultz/linux into timers/core
Trivial conflict in arch/x86/Kconfig

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2013-02-04 11:03:03 +01:00