8395 Commits

Author SHA1 Message Date
Thomas Gleixner
322a2c100a futex: Move exit_pi_state() call to release_mm()
exit_pi_state() is called from do_exit() but not from do_execve().
Move it to release_mm() so it gets called from do_execve() as well.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
LKML-Reference: <new-submission>
Cc: stable@kernel.org
Cc: Anirban Sinha <ani@anirban.org>
Cc: Peter Zijlstra <peterz@infradead.org>
2009-10-06 17:00:01 +02:00
Peter Zijlstra
fc6b177dee futex: Nullify robust lists after cleanup
The robust list pointers of user space held futexes are kept intact
over an exec() call. When the exec'ed task exits exit_robust_list() is
called with the stale pointer. The risk of corruption is minimal, but
still it is incorrect to keep the pointers valid. Actually glibc
should uninstall the robust list before calling exec() but we have to
deal with it anyway.

Nullify the pointers after [compat_]exit_robust_list() has been
called.

Reported-by: Anirban Sinha <ani@anirban.org>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
LKML-Reference: <new-submission>
Cc: stable@kernel.org
2009-10-06 17:00:01 +02:00
Tom Zanussi
26a50744b2 tracing/events: Add 'signed' field to format files
The sign info used for filters in the kernel is also useful to
applications that process the trace stream.  Add it to the format
files and make it available to userspace.

Signed-off-by: Tom Zanussi <tzanussi@gmail.com>
Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: rostedt@goodmis.org
Cc: lizf@cn.fujitsu.com
Cc: hch@infradead.org
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
LKML-Reference: <1254809398-8078-2-git-send-email-tzanussi@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-10-06 15:04:45 +02:00
Hiroshi Shimamoto
b0f56f1a63 trace: Fix missing assignment in trace_ctxwake_*
The state char variable S should be reassigned, if S == 0.

We are missing the state of the task that is going to sleep for the
context switch events (in the raw mode).

Fortunately the problem arises with the sched_switch/wake_up
tracers, not the sched trace events.

The formers are legacy now. But still, that was buggy.

Signed-off-by: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com>
Cc: Steven Rostedt <srostedt@redhat.com>
Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <4AC43118.6050409@ct.jp.nec.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-10-06 14:28:24 +02:00
Peter Zijlstra
906010b213 perf_event: Provide vmalloc() based mmap() backing
Some architectures such as Sparc, ARM and MIPS (basically
everything with flush_dcache_page()) need to deal with dcache
aliases by carefully placing pages in both kernel and user maps.

These architectures typically have to use vmalloc_user() for this.

However, on other architectures, vmalloc() is not needed and has
the downsides of being more restricted and slower than regular
allocations.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: David Miller <davem@davemloft.net>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Jens Axboe <jens.axboe@oracle.com>
Cc: Paul Mackerras <paulus@samba.org>
LKML-Reference: <1254830228.21044.272.camel@laptop>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-10-06 14:21:50 +02:00
Tom Zanussi
ee949a86b3 tracing/syscalls: Use long for syscall ret format and field definitions
The syscall event definitions use long for the syscall exit ret
value, but unsigned long for the same thing in the format and field
definitions.  Change them all to long.

Signed-off-by: Tom Zanussi <tzanussi@gmail.com>
Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: rostedt@goodmis.org
Cc: lizf@cn.fujitsu.com
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
LKML-Reference: <1254808849-7829-4-git-send-email-tzanussi@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-10-06 12:02:34 +02:00
Thomas Gleixner
eaaea8036d futex: Fix locking imbalance
Rich reported a lock imbalance in the futex code:

   http://bugzilla.kernel.org/show_bug.cgi?id=14288

It's caused by the displacement of the retry_private label in
futex_wake_op(). The code unlocks the hash bucket locks in the
error handling path and retries without locking them again which
makes the next unlock fail.

Move retry_private so we lock the hash bucket locks when we retry.

Reported-by: Rich Ercolany <rercola@acm.jhu.edu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Darren Hart <dvhltc@us.ibm.com>
Cc: stable-2.6.31 <stable@kernel.org>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-10-05 21:08:14 +02:00
Aaro Koskinen
d014e8894d panic: Fix panic message visibility by calling bust_spinlocks(0) before dying
Commit ffd71da4e3f ("panic: decrease oops_in_progress only after
having done the panic") moved bust_spinlocks(0) to the end of the
function, which in practice is never reached.

As a result console_unblank() is not called, and on some systems
the user may not see the panic message.

Move it back up to before the unblanking.

Signed-off-by: Aaro Koskinen <aaro.koskinen@nokia.com>
Reviewed-by: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <1254483680-25578-1-git-send-email-aaro.koskinen@nokia.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-10-05 21:08:09 +02:00
Linus Torvalds
41cb6654eb Merge branch 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  perf tools: Run generate-cmdlist.sh properly
  perf_event: Clean up perf_event_init_task()
  perf_event: Fix event group handling in __perf_event_sched_*()
  perf timechart: Add a power-only mode
  perf top: Add poll_idle to the skip list
2009-10-05 12:04:41 -07:00
Linus Torvalds
e69a9ac596 Merge branch 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  hrtimer: Remove overly verbose "switch to high res mode" message
2009-10-05 12:04:16 -07:00
Linus Torvalds
0f26ec69f0 Merge branch 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  kmemtrace: Fix up tracer registration
  tracing: Fix infinite recursion in ftrace_update_pid_func()
2009-10-05 12:03:43 -07:00
Paul E. McKenney
135c8aea55 rcu: Replace the rcu_barrier enum with pointer to call_rcu*() function
The rcu_barrier enum causes several problems:

  (1) you have to define the enum somewhere, and there is no
      convenient place,

  (2) the difference between TREE_RCU and TREE_PREEMPT_RCU causes
      problems when you need to map from rcu_barrier enum to struct
      rcu_state,

  (3) the switch statement are large, and

  (4) TINY_RCU really needs a different rcu_barrier() than do the
      treercu implementations.

So replace it with a functionally equivalent but cleaner function
pointer abstraction.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Cc: laijs@cn.fujitsu.com
Cc: dipankar@in.ibm.com
Cc: akpm@linux-foundation.org
Cc: josh@joshtriplett.org
Cc: dvhltc@us.ibm.com
Cc: niv@us.ibm.com
Cc: peterz@infradead.org
Cc: rostedt@goodmis.org
Cc: Valdis.Kletnieks@vt.edu
Cc: dhowells@redhat.com
LKML-Reference: <12541998232366-git-send-email->
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-10-05 21:02:05 +02:00
Paul E. McKenney
a0b6c9a78c rcu: Clean up code based on review feedback from Josh Triplett, part 4
These issues identified during an old-fashioned face-to-face code
review extending over many hours.  This group improves an existing
abstraction and introduces two new ones.  It also fixes an RCU
stall-warning bug found while making the other changes.

o	Make RCU_INIT_FLAVOR() declare its own variables, removing
	the need to declare them at each call site.

o	Create an rcu_for_each_leaf() macro that scans the leaf
	nodes of the rcu_node tree.

o	Create an rcu_for_each_node_breadth_first() macro that does
	a breadth-first traversal of the rcu_node tree, AKA
	stepping through the array in index-number order.

o	If all CPUs corresponding to a given leaf rcu_node
	structure go offline, then any tasks queued on that leaf
	will be moved to the root rcu_node structure.  Therefore,
	the stall-warning code must dump out tasks queued on the
	root rcu_node structure as well as those queued on the leaf
	rcu_node structures.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: laijs@cn.fujitsu.com
Cc: dipankar@in.ibm.com
Cc: akpm@linux-foundation.org
Cc: mathieu.desnoyers@polymtl.ca
Cc: josh@joshtriplett.org
Cc: dvhltc@us.ibm.com
Cc: niv@us.ibm.com
Cc: peterz@infradead.org
Cc: rostedt@goodmis.org
Cc: Valdis.Kletnieks@vt.edu
Cc: dhowells@redhat.com
LKML-Reference: <12541491934126-git-send-email->
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-10-05 21:02:04 +02:00
Paul E. McKenney
3d76c08290 rcu: Clean up code based on review feedback from Josh Triplett, part 3
Whitespace fixes, updated comments, and trivial code movement.

o	Fix whitespace error in RCU_HEAD_INIT()

o	Move "So where is rcu_write_lock()" comment so that it does
	not come between the rcu_read_unlock() header comment and
	the rcu_read_unlock() definition.

o	Move the module_param statements for blimit, qhimark, and
	qlowmark to immediately follow the corresponding
	definitions.

o	In __rcu_offline_cpu(), move the assignment to rdp_me
	inside the "if" statement, given that rdp_me is not used
	outside of that "if" statement.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: laijs@cn.fujitsu.com
Cc: dipankar@in.ibm.com
Cc: akpm@linux-foundation.org
Cc: mathieu.desnoyers@polymtl.ca
Cc: josh@joshtriplett.org
Cc: dvhltc@us.ibm.com
Cc: niv@us.ibm.com
Cc: peterz@infradead.org
Cc: rostedt@goodmis.org
Cc: Valdis.Kletnieks@vt.edu
Cc: dhowells@redhat.com
LKML-Reference: <12541491931164-git-send-email->
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-10-05 21:02:02 +02:00
Paul E. McKenney
162cc2794d rcu: Fix rcu_lock_map build failure on CONFIG_PROVE_LOCKING=y
Move the rcu_lock_map definition from rcutree.c to rcupdate.c so that
TINY_RCU can use lockdep.

Reported-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-10-05 21:01:28 +02:00
Peter Williams
f83f9ac263 sched: Set correct normal_prio and prio values in sched_fork()
normal_prio should be updated if policy changes from RT to
SCHED_MORMAL or if static_prio/nice is changed.

Some paths through sched_fork() ignore this requirement and may
result in normal_prio having an invalid value.

Fixing this issue allows the call to effective_prio() in
wake_up_new_task() to be removed.

Signed-off-by: Peter Williams <pwil3058@bigpond.net.au>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
LKML-Reference: <f8f46736fd4e7f090ac0.1253774830@mudlark.pw.nest>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-10-05 13:42:20 +02:00
Frederic Weisbecker
75fb4090b3 tracing: Use free_percpu instead of kfree
In the event->profile_enable() failure path, we release the per cpu
buffers using kfree which is wrong because they are per cpu pointers.
Although free_percpu only wraps kfree for now, that may change in the
future so lets use the correct way.

Reported-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Li Zefan <lizf@cn.fujitsu.com>
2009-10-05 10:57:56 +02:00
Frederic Weisbecker
fe8e5b5a60 tracing: Check total refcount before releasing bufs in profile_enable failure
When we call the profile_enable() callback of an event, we release the
shared perf event tracing buffers unconditionnaly in the failure path.
This is wrong because there may be other users of these. Then check the
total refcount before doing this.

Reported-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Li Zefan <lizf@cn.fujitsu.com>
2009-10-05 10:57:41 +02:00
Linus Torvalds
58e57fbd1c Merge branch 'for-linus' of git://git.kernel.dk/linux-2.6-block
* 'for-linus' of git://git.kernel.dk/linux-2.6-block: (41 commits)
  Revert "Seperate read and write statistics of in_flight requests"
  cfq-iosched: don't delay async queue if it hasn't dispatched at all
  block: Topology ioctls
  cfq-iosched: use assigned slice sync value, not default
  cfq-iosched: rename 'desktop' sysfs entry to 'low_latency'
  cfq-iosched: implement slower async initiate and queue ramp up
  cfq-iosched: delay async IO dispatch, if sync IO was just done
  cfq-iosched: add a knob for desktop interactiveness
  Add a tracepoint for block request remapping
  block: allow large discard requests
  block: use normal I/O path for discard requests
  swapfile: avoid NULL pointer dereference in swapon when s_bdev is NULL
  fs/bio.c: move EXPORT* macros to line after function
  Add missing blk_trace_remove_sysfs to be in pair with blk_trace_init_sysfs
  cciss: fix build when !PROC_FS
  block: Do not clamp max_hw_sectors for stacking devices
  block: Set max_sectors correctly for stacking devices
  cciss: cciss_host_attr_groups should be const
  cciss: Dynamically allocate the drive_info_struct for each logical drive.
  cciss: Add usage_count attribute to each logical drive in /sys
  ...
2009-10-04 12:39:14 -07:00
KAMEZAWA Hiroyuki
4e649152cb memcg: some modification to softlimit under hierarchical memory reclaim.
This patch clean up/fixes for memcg's uncharge soft limit path.

Problems:
  Now, res_counter_charge()/uncharge() handles softlimit information at
  charge/uncharge and softlimit-check is done when event counter per memcg
  goes over limit. Now, event counter per memcg is updated only when
  memory usage is over soft limit. Here, considering hierarchical memcg
  management, ancesotors should be taken care of.

  Now, ancerstors(hierarchy) are handled in charge() but not in uncharge().
  This is not good.

  Prolems:
  1. memcg's event counter incremented only when softlimit hits. That's bad.
     It makes event counter hard to be reused for other purpose.

  2. At uncharge, only the lowest level rescounter is handled. This is bug.
     Because ancesotor's event counter is not incremented, children should
     take care of them.

  3. res_counter_uncharge()'s 3rd argument is NULL in most case.
     ops under res_counter->lock should be small. No "if" sentense is better.

Fixes:
  * Removed soft_limit_xx poitner and checks in charge and uncharge.
    Do-check-only-when-necessary scheme works enough well without them.

  * make event-counter of memcg incremented at every charge/uncharge.
    (per-cpu area will be accessed soon anyway)

  * All ancestors are checked at soft-limit-check. This is necessary because
    ancesotor's event counter may never be modified. Then, they should be
    checked at the same time.

Reviewed-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Paul Menage <menage@google.com>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-10-01 16:11:13 -07:00
KAMEZAWA Hiroyuki
3dece8347d cgroup: catch bad css refcnt at css_put
__css_put() doesn't check a bug as refcnt goes to minus.
I think it should be caught. This patch adds a check for it.

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Paul Menage <menage@google.com>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-10-01 16:11:12 -07:00
Alexey Dobriyan
828c09509b const: constify remaining file_operations
[akpm@linux-foundation.org: fix KVM]
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Acked-by: Mike Frysinger <vapier@gentoo.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-10-01 16:11:11 -07:00
Paul Mundt
3ae91c21dd module: fix up CONFIG_KALLSYMS=n build.
Starting from commit 4a4962263f07d14660849ec134ee42b63e95ea9a "reduce
symbol table for loaded modules (v2)", the kernel/module.c build is broken
with CONFIG_KALLSYMS disabled.

  CC      kernel/module.o
kernel/module.c:1995: warning: type defaults to 'int' in declaration of 'Elf_Hdr'
kernel/module.c:1995: error: expected ';', ',' or ')' before '*' token
kernel/module.c: In function 'load_module':
kernel/module.c:2203: error: 'strmap' undeclared (first use in this function)
kernel/module.c:2203: error: (Each undeclared identifier is reported only once
kernel/module.c:2203: error: for each function it appears in.)
kernel/module.c:2239: error: 'symoffs' undeclared (first use in this function)
kernel/module.c:2239: error: implicit declaration of function 'layout_symtab'
kernel/module.c:2240: error: 'stroffs' undeclared (first use in this function)
make[1]: *** [kernel/module.o] Error 1
make: *** [kernel/module.o] Error 2

There are three different issues:

    - layout_symtab() takes a const Elf_Ehdr

    - layout_symtab() needs to return a value

    - symoffs/stroffs/strmap are referenced by the load_module() code
      despite being ifdefed out, which seems unnecessary given the noop
      behaviour of layout_symtab()/add_kallsyms() in the case of
      CONFIG_KALLSYMS=n.

Signed-off-by: Paul Mundt <lethal@linux-sh.org>
Acked-by: Jan Beulich <jbeulich@novell.com>
Acked-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-10-01 16:11:11 -07:00
Jun'ichi Nomura
b0da3f0dad Add a tracepoint for block request remapping
Since 2.6.31 now has request-based device-mapper, it's useful to have
a tracepoint for request-remapping as well as bio-remapping.
This patch adds a tracepoint for request-remapping, trace_block_rq_remap().

Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com>
Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Cc: Alasdair G Kergon <agk@redhat.com>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-10-01 21:19:34 +02:00
Zdenek Kabelac
48c0d4d4c0 Add missing blk_trace_remove_sysfs to be in pair with blk_trace_init_sysfs
Add missing blk_trace_remove_sysfs to be in pair with blk_trace_init_sysfs
introduced in commit 1d54ad6da9192fed5dd3b60224d9f2dfea0dcd82.
Release kobject also in case the request_fn is NULL.

Problem was noticed via kmemleak backtrace when some sysfs entries were
note properly destroyed during  device removal:

unreferenced object 0xffff88001aa76640 (size 80):
  comm "lvcreate", pid 2120, jiffies 4294885144
  hex dump (first 32 bytes):
    01 00 00 00 00 00 00 00 f0 65 a7 1a 00 88 ff ff  .........e......
    90 66 a7 1a 00 88 ff ff 86 1d 53 81 ff ff ff ff  .f........S.....
  backtrace:
    [<ffffffff813f9cc6>] kmemleak_alloc+0x26/0x60
    [<ffffffff8111d693>] kmem_cache_alloc+0x133/0x1c0
    [<ffffffff81195891>] sysfs_new_dirent+0x41/0x120
    [<ffffffff81194b0c>] sysfs_add_file_mode+0x3c/0xb0
    [<ffffffff81197c81>] internal_create_group+0xc1/0x1a0
    [<ffffffff81197d93>] sysfs_create_group+0x13/0x20
    [<ffffffff810d8004>] blk_trace_init_sysfs+0x14/0x20
    [<ffffffff8123f45c>] blk_register_queue+0x3c/0xf0
    [<ffffffff812447e4>] add_disk+0x94/0x160
    [<ffffffffa00d8b08>] dm_create+0x598/0x6e0 [dm_mod]
    [<ffffffffa00de951>] dev_create+0x51/0x350 [dm_mod]
    [<ffffffffa00de823>] ctl_ioctl+0x1a3/0x240 [dm_mod]
    [<ffffffffa00de8f2>] dm_compat_ctl_ioctl+0x12/0x20 [dm_mod]
    [<ffffffff81177bfd>] compat_sys_ioctl+0xcd/0x4f0
    [<ffffffff81036ed8>] sysenter_dispatch+0x7/0x2c
    [<ffffffffffffffff>] 0xffffffffffffffff

Signed-off-by: Zdenek Kabelac <zkabelac@redhat.com>
Reviewed-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-10-01 21:15:46 +02:00
Paul Mundt
f9ac5a69ed kmemtrace: Fix up tracer registration
Commit ddc1637af217dbd8bc51f30e6d24e84476a869a6 ("kmemtrace: Print
binary output only if 'bin' option is set") ended up inverting the
error detection logic. register_tracer() returns 0 on success,
which this change caused to treat as an error, resulting in:

[    0.132000] Warning: could not register the kmem tracer

as well as bailing out of the initcall with an error value. This
restores the old logic.

Signed-off-by: Paul Mundt <lethal@linux-sh.org>
Acked-by: Pekka Enberg <penberg@cs.helsinki.fi>
Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Li Zefan <lizf@cn.fujitsu.com>
LKML-Reference: <20090928075540.GD6668@linux-sh.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-10-01 11:53:44 +02:00
Ingo Molnar
0aa73ba1c4 Merge branch 'tracing/urgent' into tracing/core
Merge reason: Pick up latest fixes and update to latest upstream.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-10-01 11:20:48 +02:00
Xiao Guangrong
27f9994c50 perf_event: Clean up perf_event_init_task()
While at it: we can traverse ctx->group_list to get all
group leader, it should be safe since we hold ctx->mutex.

Changlog v1->v2:

  - remove WARN_ON_ONCE() according to Peter Zijlstra's suggestion

Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Paul Mackerras <paulus@samba.org>
LKML-Reference: <4ABC5AF9.6060808@cn.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-10-01 09:30:44 +02:00
Xiao Guangrong
8c9ed8e14c perf_event: Fix event group handling in __perf_event_sched_*()
Paul Mackerras says:

 "Actually, looking at this more closely, it has to be a group
 leader anyway since it's at the top level of ctx->group_list.  In
 fact I see four places where we do:

  list_for_each_entry(event, &ctx->group_list, group_entry) {
	if (event == event->group_leader)
		...

 or the equivalent, three of which appear to have been introduced
 by afedadf2 ("perf_counter: Optimize sched in/out of counters")
 back in May by Peter Z.

 As far as I can see the if () is superfluous in each case (a
 singleton event will be a group of 1 and will have its
 group_leader pointing to itself)."

 [ See: http://marc.info/?l=linux-kernel&m=125361238901442&w=2 ]

And Peter Zijlstra points out this is a bugfix:

 "The intent was to call event_sched_{in,out}() for single event
  groups because that's cheaper than group_sched_{in,out}(),
  however..

  - as you noticed, I got the condition wrong, it should have read:

      list_empty(&event->sibling_list)

  - it failed to call group_can_go_on() which deals with ->exclusive.

  - it also doesn't call hw_perf_group_sched_in() which might break
    power."

 [ See: http://marc.info/?l=linux-kernel&m=125369523318583&w=2 ]

Changelog v1->v2:

 - Fix the title name according to Peter Zijlstra's suggestion

 - Remove the comments and WARN_ON_ONCE() as Peter Zijlstra's
   suggestion

Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Paul Mackerras <paulus@samba.org>
LKML-Reference: <4ABC5A55.7000208@cn.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-10-01 09:30:44 +02:00
Matt Fleming
33974093c0 tracing: Fix infinite recursion in ftrace_update_pid_func()
When CONFIG_HAVE_FUNCTION_TRACE_MCOUNT_TEST is enabled
__ftrace_trace_function contains the current trace function, not
ftrace_trace_function.

In ftrace_update_pid_func() we currently incorrectly assign the
value of ftrace_trace_function to __ftrace_trace_funcion before
returning.

Without this patch it is possible to execute an infinite recursion
whereby ftrace_test_stop_func() calls __ftrace_trace_function,
which was assigned ftrace_test_stop_func() in
ftrace_update_pid_func().

Signed-off-by: Matt Fleming <matthew.fleming@imgtec.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <1254152581-18347-1-git-send-email-matt@console-pimps.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-10-01 08:19:24 +02:00
Eric Dumazet
152f9d0710 sched_clock: Fix atomicity/continuity bug by using cmpxchg64()
Commit def0a9b2573 (sched_clock: Make it NMI safe) assumed
cmpxchg() of 64bit values was available on X86_32.

That is not so - and causes some subtle scheduler misbehavior due
to incorrect timestamps off to up by ~4 seconds.

Two symptoms are known right now:

 - interactivity problems seen by Arjan: up to 600 msecs
   latencies instead of the expected 20-40 msecs. These
   latencies are very visible on the desktop.

 - incorrect CPU stats: occasionally too high percentages in 'top',
   and crazy CPU usage stats.

Reported-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: John Stultz <johnstul@us.ibm.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <20090930170754.0886ff2e@infradead.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-09-30 22:56:10 +02:00
Alexey Dobriyan
f0f37e2f77 const: mark struct vm_struct_operations
* mark struct vm_area_struct::vm_ops as const
* mark vm_ops in AGP code

But leave TTM code alone, something is fishy there with global vm_ops
being used.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-09-27 11:39:25 -07:00
Linus Torvalds
6f5071020d Merge branch 'timers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'timers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  hrtimer: Eliminate needless reprogramming of clock events device
2009-09-27 10:39:04 -07:00
Linus Torvalds
3b383767c4 Merge branch 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  futex: Add memory barrier commentary to futex_wait_queue_me()
  futex: Fix wakeup race by setting TASK_INTERRUPTIBLE before queue_me()
  futex: Correct futex_q woken state commentary
  futex: Make function kernel-doc commentary consistent
  futex: Correct queue_me and unqueue_me commentary
  futex: Correct futex_wait_requeue_pi() commentary
2009-09-26 10:15:53 -07:00
Linus Torvalds
179b9145d5 Merge branch 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  clocksource: Resume clocksource without taking the clocksource mutex
2009-09-26 10:14:41 -07:00
Linus Torvalds
4187e7e9f1 Merge branch 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  modules, tracing: Remove stale struct marker signature from module_layout()
  tracing/workqueue: Use %pf in workqueue trace events
  tracing: Fix a comment and a trivial format issue in tracepoint.h
  tracing: Fix failure path in ftrace_regex_open()
  tracing: Fix failure path in ftrace_graph_write()
  tracing: Check the return value of trace_get_user()
  tracing: Fix off-by-one in trace_get_user()
2009-09-26 10:13:54 -07:00
Roland Dreier
d3f6302e7e hrtimer: Remove overly verbose "switch to high res mode" message
On big systems, printing <number of CPUs> copies of

    Switched to high resolution mode on CPU nnn

clutters up the kernel log for minimal gain.  Just get rid of them.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
LKML-Reference: <ada1vlw126s.fsf_-_@cisco.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-09-26 16:58:09 +02:00
David S. Miller
8b3f6af863 Merge branch 'master' of /home/davem/src/GIT/linux-2.6/
Conflicts:
	drivers/staging/Kconfig
	drivers/staging/Makefile
	drivers/staging/cpc-usb/TODO
	drivers/staging/cpc-usb/cpc-usb_drv.c
	drivers/staging/cpc-usb/cpc.h
	drivers/staging/cpc-usb/cpc_int.h
	drivers/staging/cpc-usb/cpcusb.h
2009-09-24 15:13:11 -07:00
Martin Schwidefsky
89133f9350 clocksource: Resume clocksource without taking the clocksource mutex
git commit 75c5158f70c065b9 converted the clocksource spinlock to a
mutex. This causes the following BUG:

BUG: sleeping function called from invalid context at
kernel/mutex.c:280 in_atomic(): 0, irqs_disabled(): 1, pid: 2473,
name: pm-suspend 2 locks held by pm-suspend/2473:
 #0:  (&buffer->mutex){......}, at: [<ffffffff8115ab13>]
sysfs_write_file+0x3c/0x137
 #1:  (pm_mutex){......}, at: [<ffffffff810865b5>]
enter_state+0x39/0x130 Pid: 2473, comm: pm-suspend Not tainted 2.6.31
#1 Call Trace:
 [<ffffffff810792f0>] ? __debug_show_held_locks+0x22/0x24
 [<ffffffff8104a2ef>] __might_sleep+0x107/0x10b
 [<ffffffff8141fca9>] mutex_lock_nested+0x25/0x43
 [<ffffffff81073537>] clocksource_resume+0x1c/0x60
 [<ffffffff81072902>] timekeeping_resume+0x1e/0x1c8
 [<ffffffff812aee62>] __sysdev_resume+0x25/0xcf
 [<ffffffff812aef79>] sysdev_resume+0x6d/0xae
 [<ffffffff810864f8>] suspend_devices_and_enter+0x12b/0x1af
 [<ffffffff8108665b>] enter_state+0xdf/0x130
 [<ffffffff81085dc3>] state_store+0xb6/0xd3
 [<ffffffff81204c73>] kobj_attr_store+0x17/0x19
 [<ffffffff8115abd2>] sysfs_write_file+0xfb/0x137
 [<ffffffff811057d2>] vfs_write+0xae/0x10b
 [<ffffffff81208392>] ? __up_read+0x1a/0x7f
 [<ffffffff811058ef>] sys_write+0x4a/0x6e
 [<ffffffff81011b82>] system_call_fastpath+0x16/0x1b

clocksource_resume is called early in the resume process, there is
only one cpu, no processes are running and the interrupts are
disabled. It is therefore possible to resume the clocksources
without taking the clocksource mutex.

Reported-by: Xiaotian Feng <xtfeng@gmail.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Tested-by: Michal Schmidt <mschmidt@redhat.com>
Cc: Xiaotian Feng <xtfeng@gmail.com>
Cc: John Stultz <johnstul@us.ibm.com>
LKML-Reference: <20090924172952.49697825@mschwide.boeblingen.de.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-09-24 22:37:53 +02:00
Darren Hart
9beba3c54d futex: Add memory barrier commentary to futex_wait_queue_me()
The memory barrier semantics of futex_wait_queue_me() are
non-obvious. Add some commentary to try and clarify it.

Signed-off-by: Darren Hart <dvhltc@us.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Dinakar Guniguntala <dino@in.ibm.com>
Cc: John Stultz <johnstul@us.ibm.com>
LKML-Reference: <20090924185447.694.38948.stgit@Aeon>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-09-24 22:30:10 +02:00
Frederic Weisbecker
3f6fe06dbf tracing/filters: Unify the regex parsing helpers
The filter code has stolen the regex parsing function from ftrace to
get the regex support.
We have duplicated this code, so factorize it in the filter area and
make it generally available, as the filter code is the most suited to
host this feature.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Tom Zanussi <tzanussi@gmail.com>
Cc: Li Zefan <lizf@cn.fujitsu.com>
2009-09-24 21:40:13 +02:00
Frederic Weisbecker
1889d20922 tracing/filters: Provide basic regex support
This patch provides basic support for regular expressions in filters.

It supports the following types of regexp:

- *match_beginning
- *match_middle*
- match_end*
- !don't match

Example:
	cd /debug/tracing/events/bkl/lock_kernel
	echo 'file == "*reiserfs*"' > filter
	echo 1 > enable

           gedit-4941  [000]   457.735437: lock_kernel: depth: 0, fs/reiserfs/namei.c:334 reiserfs_lookup()
     sync_supers-227   [001]   461.379985: lock_kernel: depth: 0, fs/reiserfs/super.c:69 reiserfs_sync_fs()
     sync_supers-227   [000]   461.383096: lock_kernel: depth: 0, fs/reiserfs/journal.c:1069 flush_commit_list()
      reiserfs/1-1369  [001]   461.479885: lock_kernel: depth: 0, fs/reiserfs/journal.c:3509 flush_async_commits()

Every string is now handled as a regexp in the filter framework, which
helps to factorize the code for handling both simple strings and
regexp comparisons.

(The regexp parsing code has been wildly cherry picked from ftrace.c
written by Steve.)

v2: Simplify the whole and drop the filter_regex file

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Tom Zanussi <tzanussi@gmail.com>
Cc: Li Zefan <lizf@cn.fujitsu.com>
2009-09-24 21:39:27 +02:00
Linus Torvalds
a6b49cb210 Merge branch 'for-linus' of git://git.monstr.eu/linux-2.6-microblaze
* 'for-linus' of git://git.monstr.eu/linux-2.6-microblaze: (24 commits)
  microblaze: Disable heartbeat/enable emaclite in defconfigs
  microblaze: Support simpleImage.dts make target
  microblaze: Fix _start symbol to physical address
  microblaze: Use LOAD_OFFSET macro to get correct LMA for all sections
  microblaze: Create the LOAD_OFFSET macro used to compute VMA vs LMA offsets
  microblaze: Copy ppc asm-compat.h for clean handling of constants in asm and C
  microblaze: Actually show KiB rather than pages in "Freeing initrd memory:"
  microblaze: Support ptrace syscall tracing.
  microblaze: Updated CPU version and FPGA family codes in PVR
  microblaze: Generate correct signal and siginfo for integer div-by-zero
  microblaze: Don't be noisy when userspace causes hardware exceptions
  microblaze: Remove ipc.h file which points to non-existing asm-generic file
  microblaze: Clear sticky FSR register after generating exception signals
  microblaze: Ensure CPU usermode is set on new userspace processes
  microblaze: Use correct kbuild variable KBUILD_CFLAGS
  microblaze: Save and restore msr in hw exception
  microblaze: Add architectural support for USB EHCI host controllers
  microblaze: Implement include/asm/syscall.h.
  microblaze: Improve checking mechanism for MSR instruction
  microblaze: Add checking mechanism for MSR instruction
  ...
2009-09-24 09:01:44 -07:00
Linus Torvalds
2c9871de0a Merge git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linus
* git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linus:
  module: don't call percpu_modfree on NULL pointer.
  module: fix memory leak when load fails after srcversion/version allocated
  module: preferred way to use MODULE_AUTHOR
  param: allow whitespace as kernel parameter separator
  module: reduce string table for loaded modules (v2)
  module: reduce symbol table for loaded modules (v2)
2009-09-24 09:01:05 -07:00
Linus Torvalds
6d39b27f0a Merge git://git.kernel.org/pub/scm/linux/kernel/git/viro/audit-current
* git://git.kernel.org/pub/scm/linux/kernel/git/viro/audit-current:
  lsm: Use a compressed IPv6 string format in audit events
  Audit: send signal info if selinux is disabled
  Audit: rearrange audit_context to save 16 bytes per struct
  Audit: reorganize struct audit_watch to save 8 bytes
2009-09-24 08:31:04 -07:00
Rusty Russell
ffa9f12a41 module: don't call percpu_modfree on NULL pointer.
The general one handles NULL, the static obsolescent
(CONFIG_HAVE_LEGACY_PER_CPU_AREA) one in module.c doesn't; Eric's
commit 720eba31 assumed it did, and various frobbings since then kept
that assumption.

All other callers in module.c all protect it with an if; this effectively
does the same as free_init is only goto if we fail percpu_modalloc().

Reported-by: Kamalesh Babulal <kamalesh@linux.vnet.ibm.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: Eric Dumazet <dada1@cosmosbay.com>
Cc: Masami Hiramatsu <mhiramat@redhat.com>
Cc: Américo Wang <xiyou.wangcong@gmail.com>
Tested-by: Kamalesh Babulal <kamalesh@linux.vnet.ibm.com>
2009-09-25 00:32:59 +09:30
Rusty Russell
a263f7763c module: fix memory leak when load fails after srcversion/version allocated
Normally the twisty paths of sysfs will free the attributes, but not if
we fail before we hook it into sysfs (which is the last thing we do in
load_module).

(This sysfs code is a turd, no doubt there are other issues lurking too).

Reported-by: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Tested-by: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
2009-09-25 00:32:59 +09:30
Peter Oberparleiter
26d052bfce param: allow whitespace as kernel parameter separator
Some boot mechanisms require that kernel parameters are stored in a
separate file which is loaded to memory without further processing
(e.g. the "Load from FTP" method on s390). When such a file contains
newline characters, the kernel parameter preceding the newline might
not be correctly parsed (due to the newline being stuck to the end of
the actual parameter value) which can lead to boot failures.

This patch improves kernel command line usability in such a situation
by allowing generic whitespace characters as separators between kernel
parameters.

Signed-off-by: Peter Oberparleiter <oberpar@linux.vnet.ibm.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2009-09-25 00:32:58 +09:30
Jan Beulich
554bdfe5ac module: reduce string table for loaded modules (v2)
Also remove all parts of the string table (referenced by the symbol
table) that are not needed for kallsyms use (i.e. which were only
referenced by symbols discarded by the previous patch, or not
referenced at all for whatever reason).

Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2009-09-25 00:32:57 +09:30
Jan Beulich
4a4962263f module: reduce symbol table for loaded modules (v2)
Discard all symbols not interesting for kallsyms use: absolute,
section, and in the common case (!KALLSYMS_ALL) data ones.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2009-09-25 00:32:57 +09:30