8405 Commits

Author SHA1 Message Date
Carsten Emde
41dfba4367 tracing: remove unused local variables in tracer probe functions
When the nsecs_to_usecs() conversion in probe_wakeup_sched_switch() and
check_critical_timing() was moved to a later stage in order to avoid
unnecessary computing, it was overlooked to remove the original
variables, assignments and comments..

Signed-off-by: Carsten Emde <C.Emde@osadl.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2009-09-12 21:44:13 -04:00
Linus Torvalds
483e3cd6a3 Merge branch 'tracing-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'tracing-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (105 commits)
  ring-buffer: only enable ring_buffer_swap_cpu when needed
  ring-buffer: check for swapped buffers in start of committing
  tracing: report error in trace if we fail to swap latency buffer
  tracing: add trace_array_printk for internal tracers to use
  tracing: pass around ring buffer instead of tracer
  tracing: make tracing_reset safe for external use
  tracing: use timestamp to determine start of latency traces
  tracing: Remove mentioning of legacy latency_trace file from documentation
  tracing/filters: Defer pred allocation, fix memory leak
  tracing: remove users of tracing_reset
  tracing: disable buffers and synchronize_sched before resetting
  tracing: disable update max tracer while reading trace
  tracing: print out start and stop in latency traces
  ring-buffer: disable all cpu buffers when one finds a problem
  ring-buffer: do not count discarded events
  ring-buffer: remove ring_buffer_event_discard
  ring-buffer: fix ring_buffer_read crossing pages
  ring-buffer: remove unnecessary cpu_relax
  ring-buffer: do not swap buffers during a commit
  ring-buffer: do not reset while in a commit
  ...
2009-09-11 13:24:03 -07:00
Linus Torvalds
774a694f8c Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (64 commits)
  sched: Fix sched::sched_stat_wait tracepoint field
  sched: Disable NEW_FAIR_SLEEPERS for now
  sched: Keep kthreads at default priority
  sched: Re-tune the scheduler latency defaults to decrease worst-case latencies
  sched: Turn off child_runs_first
  sched: Ensure that a child can't gain time over it's parent after fork()
  sched: enable SD_WAKE_IDLE
  sched: Deal with low-load in wake_affine()
  sched: Remove short cut from select_task_rq_fair()
  sched: Turn on SD_BALANCE_NEWIDLE
  sched: Clean up topology.h
  sched: Fix dynamic power-balancing crash
  sched: Remove reciprocal for cpu_power
  sched: Try to deal with low capacity, fix update_sd_power_savings_stats()
  sched: Try to deal with low capacity
  sched: Scale down cpu_power due to RT tasks
  sched: Implement dynamic cpu_power
  sched: Add smt_gain
  sched: Update the cpu_power sum during load-balance
  sched: Add SD_PREFER_SIBLING
  ...
2009-09-11 13:23:18 -07:00
Linus Torvalds
4f0ac85416 Merge branch 'perfcounters-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'perfcounters-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (60 commits)
  perf tools: Avoid unnecessary work in directory lookups
  perf stat: Clean up statistics calculations a bit more
  perf stat: More advanced variance computation
  perf stat: Use stddev_mean in stead of stddev
  perf stat: Remove the limit on repeat
  perf stat: Change noise calculation to use stddev
  x86, perf_counter, bts: Do not allow kernel BTS tracing for now
  x86, perf_counter, bts: Correct pointer-to-u64 casts
  x86, perf_counter, bts: Fail if BTS is not available
  perf_counter: Fix output-sharing error path
  perf trace: Fix read_string()
  perf trace: Print out in nanoseconds
  perf tools: Seek to the end of the header area
  perf trace: Fix parsing of perf.data
  perf trace: Sample timestamps as well
  perf_counter: Introduce new (non-)paranoia level to allow raw tracepoint access
  perf trace: Sample the CPU too
  perf tools: Work around strict aliasing related warnings
  perf tools: Clean up warnings list in the Makefile
  perf tools: Complete support for dynamic strings
  ...
2009-09-11 13:22:43 -07:00
Linus Torvalds
d90a7e8640 Merge branch 'irq-threaded-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'irq-threaded-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  genirq: Do not mask oneshot edge type interrupts
  genirq: Support nested threaded irq handling
  genirq: Add buslock support
  genirq: Add oneshot support
2009-09-11 13:21:31 -07:00
Linus Torvalds
12a499612e Merge branch 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  pci/intr_remapping: Allocate irq_iommu on node
  irq: Add irq_node() primitive
  irq: Make sure irq_desc for legacy irq get correct node setting
  genirq: Add prototype for handle_nested_irq()
  irq: Remove superfluous NULL pointer check in check_irq_resend()
  irq: Clean up by removing irqfixup MODULE_PARM_DESC()
  genirq: Fix comment describing suspend_device_irqs()
  genirq: Remove obsolete defines and typedefs
2009-09-11 13:20:42 -07:00
Linus Torvalds
eee2775d99 Merge branch 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (28 commits)
  rcu: Move end of special early-boot RCU operation earlier
  rcu: Changes from reviews: avoid casts, fix/add warnings, improve comments
  rcu: Create rcutree plugins to handle hotplug CPU for multi-level trees
  rcu: Remove lockdep annotations from RCU's _notrace() API members
  rcu: Add #ifdef to suppress __rcu_offline_cpu() warning in !HOTPLUG_CPU builds
  rcu: Add CPU-offline processing for single-node configurations
  rcu: Add "notrace" to RCU function headers used by ftrace
  rcu: Remove CONFIG_PREEMPT_RCU
  rcu: Merge preemptable-RCU functionality into hierarchical RCU
  rcu: Simplify rcu_pending()/rcu_check_callbacks() API
  rcu: Use debugfs_remove_recursive() simplify code.
  rcu: Merge per-RCU-flavor initialization into pre-existing macro
  rcu: Fix online/offline indication for rcudata.csv trace file
  rcu: Consolidate sparse and lockdep declarations in include/linux/rcupdate.h
  rcu: Renamings to increase RCU clarity
  rcu: Move private definitions from include/linux/rcutree.h to kernel/rcutree.h
  rcu: Expunge lingering references to CONFIG_CLASSIC_RCU, optimize on !SMP
  rcu: Delay rcu_barrier() wait until beginning of next CPU-hotunplug operation.
  rcu: Fix typo in rcu_irq_exit() comment header
  rcu: Make rcupreempt_trace.c look at offline CPUs
  ...
2009-09-11 13:20:18 -07:00
Linus Torvalds
53e16fbd30 Merge branch 'core-printk-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'core-printk-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  printk: Fix "printk: Enable the use of more than one CON_BOOT (early console)"
  printk: Restore previous console_loglevel when re-enabling logging
  printk: Ensure that "console enabled" messages are printed on the console
  printk: Enable the use of more than one CON_BOOT (early console)
2009-09-11 13:19:40 -07:00
Linus Torvalds
4e3408d9f7 Merge branch 'core-locking-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'core-locking-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (32 commits)
  locking, m68k/asm-offsets: Rename signal defines
  locking: Inline spinlock code for all locking variants on s390
  locking: Simplify spinlock inlining
  locking: Allow arch-inlined spinlocks
  locking: Move spinlock function bodies to header file
  locking, m68k: Calculate thread_info offset with asm offset
  locking, m68k/asm-offsets: Rename pt_regs offset defines
  locking, sparc: Rename __spin_try_lock() and friends
  locking, powerpc: Rename __spin_try_lock() and friends
  lockdep: Remove recursion stattistics
  lockdep: Simplify lock_stat seqfile code
  lockdep: Simplify lockdep_chains seqfile code
  lockdep: Simplify lockdep seqfile code
  lockdep: Fix missing entries in /proc/lock_chains
  lockdep: Fix missing entry in /proc/lock_stat
  lockdep: Fix memory usage info of BFS
  lockdep: Reintroduce generation count to make BFS faster
  lockdep: Deal with many similar locks
  lockdep: Introduce lockdep_assert_held()
  lockdep: Fix style nits
  ...
2009-09-11 13:17:24 -07:00
Linus Torvalds
7193bea53f Merge branch 'core-futexes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'core-futexes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  futex: Detect mismatched requeue targets
  futex: Correct futex_wait_requeue_pi() commentary
2009-09-11 13:16:22 -07:00
Linus Torvalds
989aa44a5f Merge branch 'core-debug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'core-debug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  debug lockups: Improve lockup detection, fix generic arch fallback
  debug lockups: Improve lockup detection
2009-09-11 13:15:55 -07:00
Linus Torvalds
4004f02d7a Merge branch 'core-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'core-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  workqueues: Improve schedule_work() documentation
2009-09-11 13:13:32 -07:00
jolsa@redhat.com
689fd8b65d tracing: trace parser support for function and graph
Convert the writing to 'set_graph_function', 'set_ftrace_filter'
and 'set_ftrace_notrace' to use the generic trace_parser
'trace_get_user' function.

Removed FTRACE_ITER_CONT flag, since it's not needed after this change.

Minor fix in set_graph_function display - g_show function.

Signed-off-by: Jiri Olsa <jolsa@redhat.com>
LKML-Reference: <1252682969-3366-4-git-send-email-jolsa@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2009-09-11 15:20:18 -04:00
jolsa@redhat.com
489663644c tracing: trace parser support for set_event
Convert the parsing of the file 'set_event' to use the generic
trace_praser 'trace_get_user' function.

Signed-off-by: Jiri Olsa <jolsa@redhat.com>
LKML-Reference: <1252682969-3366-3-git-send-email-jolsa@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2009-09-11 14:47:11 -04:00
jolsa@redhat.com
b63f39ea50 tracing: create generic trace parser
Create a "trace_parser" that can parse the user space input for
separate words.

struct trace_parser is the descriptor.

Generic "trace_get_user" function that can be a helper to read multiple
words passed in by user space.

Signed-off-by: Jiri Olsa <jolsa@redhat.com>
LKML-Reference: <1252682969-3366-2-git-send-email-jolsa@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2009-09-11 14:46:55 -04:00
Steven Rostedt
f81c972d27 tracing: consolidate code between trace_output.c and trace_function_graph.c
Both trace_output.c and trace_function_graph.c do basically the same
thing to handle the printing of the latency-format. This patch moves
the code into one function that both can use.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2009-09-11 14:24:13 -04:00
Martin Schwidefsky
f79e0258ea clocksource: Resolve cpu hotplug dead lock with TSC unstable, fix crash
The watchdog timer is started after the watchdog clocksource
and at least one watched clocksource have been registered. The
clocksource work element watchdog_work is initialized just
before the clocksource timer is started. This is too late for
the clocksource_mark_unstable call from native_cpu_up. To fix
this use a static initializer for watchdog_work.

This resolves a boot crash reported by multiple people.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Jens Axboe <jens.axboe@oracle.com>
Cc: John Stultz <johnstul@us.ibm.com>
LKML-Reference: <20090911153305.3fe9a361@skybase>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-09-11 20:17:18 +02:00
Steven Rostedt
637e7e8641 tracing: add lock depth to entries
This patch adds the lock depth of the big kernel lock to the generic
entry header. This way we can see the depth of the lock and help
in removing the BKL.

Example:

 #                  _------=> CPU#
 #                 / _-----=> irqs-off
 #                | / _----=> need-resched
 #                || / _---=> hardirq/softirq
 #                ||| / _--=> preempt-depth
 #                |||| /_--=> lock-depth
 #                |||||/     delay
 #  cmd     pid   |||||| time  |   caller
 #     \   /      ||||||   \   |   /
   <idle>-0       2.N..3 5902255250us+: lock_acquire: read rcu_read_lock
   <idle>-0       2.N..3 5902255253us+: lock_release: rcu_read_lock
   <idle>-0       2dN..3 5902255257us+: lock_acquire: xtime_lock
   <idle>-0       2dN..4 5902255259us : lock_acquire: clocksource_lock
   <idle>-0       2dN..4 5902255261us+: lock_release: clocksource_lock

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2009-09-11 13:55:35 -04:00
Linus Torvalds
a12e4d304c Merge branch 'writeback' of git://git.kernel.dk/linux-2.6-block
* 'writeback' of git://git.kernel.dk/linux-2.6-block:
  writeback: check for registered bdi in flusher add and inode dirty
  writeback: add name to backing_dev_info
  writeback: add some debug inode list counters to bdi stats
  writeback: get rid of pdflush completely
  writeback: switch to per-bdi threads for flushing data
  writeback: move dirty inodes from super_block to backing_dev_info
  writeback: get rid of generic_sync_sb_inodes() export
2009-09-11 09:17:05 -07:00
Steven Rostedt
48659d3119 tracing: move tgid out of generic entry and into userstack
The userstack trace required the recording of the tgid entry.
Unfortunately, it was added to the generic entry where it wasted
4 bytes of every entry and was only used by one entry.

This patch moves it out of the generic field and moves it into the
only user (userstack_entry).

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2009-09-11 11:36:23 -04:00
Steven Rostedt
49ff590390 tracing: add latency format to function_graph tracer
While debugging something with the function_graph tracer, I found the
need to see the preempt count of the traces. Unfortunately, since
the function graph tracer has its own output formatting, it does not
honor the latency-format option.

This patch makes the function_graph tracer honor the latency-format
option, but still keeps control of the output. But now we have the
same details that the latency-format supplies.

 # tracer: function_graph
 #
 #      _-----=> irqs-off
 #     / _----=> need-resched
 #    | / _---=> hardirq/softirq
 #    || / _--=> preempt-depth
 #    ||| /
 #    ||||
 # CPU||||  DURATION                  FUNCTION CALLS
 # |  ||||   |   |                     |   |   |   |
  3)  d..1  1.333 us    |        idle_cpu();
  3)  d.h1              |        tick_check_idle() {
  3)  d.h1  0.550 us    |          tick_check_oneshot_broadcast();
  3)  d.h1              |          tick_nohz_stop_idle() {
  3)  d.h1              |            ktime_get() {
  3)  d.h1              |              ktime_get_ts() {

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2009-09-11 10:59:49 -04:00
Jens Axboe
5e605b64a1 block: add blk-iopoll, a NAPI like approach for block devices
This borrows some code from NAPI and implements a polled completion
mode for block devices. The idea is the same as NAPI - instead of
doing the command completion when the irq occurs, schedule a dedicated
softirq in the hopes that we will complete more IO when the iopoll
handler is invoked. Devices have a budget of commands assigned, and will
stay in polled mode as long as they continue to consume their budget
from the iopoll softirq handler. If they do not, the device is set back
to interrupt completion mode.

This patch holds the core bits for blk-iopoll, device driver support
sold separately.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-09-11 14:33:31 +02:00
Jens Axboe
d993831fa7 writeback: add name to backing_dev_info
This enables us to track who does what and print info. Its main use
is catching dirty inodes on the default_backing_dev_info, so we can
fix that up.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-09-11 09:20:26 +02:00
James Morris
a3c8b97396 Merge branch 'next' into for-linus 2009-09-11 08:04:49 +10:00
Ingo Molnar
e1f8450854 sched: Fix sched::sched_stat_wait tracepoint field
This weird perf trace output:

  cc1-9943  [001]  2802.059479616: sched_stat_wait: task: as:9944 wait: 2801938766276 [ns]

Is caused by setting one component field of the delta to zero
a bit too early. Move it to later.

( Note, this does not affect the NEW_FAIR_SLEEPERS interactivity bug,
  it's just a reporting bug in essence. )

Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Nikos Chantziaras <realnc@arcor.de>
Cc: Jens Axboe <jens.axboe@oracle.com>
Cc: Mike Galbraith <efault@gmx.de>
LKML-Reference: <4AA93D34.8040500@arcor.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-09-10 20:52:54 +02:00
Ingo Molnar
3f2aa307c4 sched: Disable NEW_FAIR_SLEEPERS for now
Nikos Chantziaras and Jens Axboe reported that turning off
NEW_FAIR_SLEEPERS improves desktop interactivity visibly.

Nikos described his experiences the following way:

  " With this setting, I can do "nice -n 19 make -j20" and
    still have a very smooth desktop and watch a movie at
    the same time.  Various other annoyances (like the
    "logout/shutdown/restart" dialog of KDE not appearing
    at all until the background fade-out effect has finished)
    are also gone.  So this seems to be the single most
    important setting that vastly improves desktop behavior,
    at least here. "

Jens described it the following way, referring to a 10-seconds
xmodmap scheduling delay he was trying to debug:

  " Then I tried switching NO_NEW_FAIR_SLEEPERS on, and then
    I get:

    Performance counter stats for 'xmodmap .xmodmap-carl':

         9.009137  task-clock-msecs         #      0.447 CPUs
               18  context-switches         #      0.002 M/sec
                1  CPU-migrations           #      0.000 M/sec
              315  page-faults              #      0.035 M/sec

    0.020167093  seconds time elapsed

    Woot! "

So disable it for now. In perf trace output i can see weird
delta timestamps:

  cc1-9943  [001]  2802.059479616: sched_stat_wait: task: as:9944 wait: 2801938766276 [ns]

That nsec field is not supposed to be that large. More digging
is needed - but lets turn it off while the real bug is found.

Reported-by: Nikos Chantziaras <realnc@arcor.de>
Tested-by: Nikos Chantziaras <realnc@arcor.de>
Reported-by: Jens Axboe <jens.axboe@oracle.com>
Tested-by: Jens Axboe <jens.axboe@oracle.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
LKML-Reference: <4AA93D34.8040500@arcor.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-09-10 20:34:48 +02:00
Li Zefan
197e2eabc9 tracing: move PRED macros to trace_events_filter.c
Move DEFINE_COMPARISON_PRED() and DEFINE_EQUALITY_PRED()
  to kernel/trace/trace_events_filter.c

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
LKML-Reference: <4AA8579B.4020706@cn.fujitsu.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2009-09-09 23:54:11 -04:00
Li Zefan
a5921c6c37 tracing: remove stats from struct tracer
Remove unused field @stats from struct tracer.

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
LKML-Reference: <4AA8579B.4020706@cn.fujitsu.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2009-09-09 23:54:09 -04:00
Li Zefan
bd9cfca9cb tracing: format clean ups
Fix white-space formatting.

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
LKML-Reference: <4AA8579B.4020706@cn.fujitsu.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2009-09-09 23:54:07 -04:00
Li Zefan
e0ab5f2dae tracing: remove dead code
Removes unreachable code.

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
LKML-Reference: <4AA8579B.4020706@cn.fujitsu.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2009-09-09 23:54:06 -04:00
Steven Rostedt
478142c39c tracing: do not grab lock in wakeup latency function tracing
The wakeup tracer, when enabled, has its own function tracer.
It only traces the functions on the CPU where the task it is following
is on. If a task is woken on one CPU but then migrates to another CPU
before it wakes up, the latency tracer will then start tracing functions
on the other CPU.

To find which CPU the task is on, the wakeup function tracer performs
a task_cpu(wakeup_task). But to make sure the task does not disappear
it grabs the wakeup_lock, which is also taken when the task wakes up.
By taking this lock, the function tracer does not need to worry about
the task being freed as it checks its cpu.

Jan Blunck found a problem with this approach on his 32 CPU box. When
a task is being traced by the wakeup tracer, all functions take this
lock. That means that on all 32 CPUs, each function call is taking
this one lock to see if the task is on that CPU. This lock has just
serialized all functions on all 32 CPUs. Needless to say, this caused
major issues on that box. It would even lockup.

This patch changes the wakeup latency to insert a probe on the migrate task
tracepoint. When a task changes its CPU that it will run on, the
probe will take note. Now the wakeup function tracer no longer needs
to take the lock. It only compares the current CPU with a variable that
holds the current CPU the task is on. We don't worry about races since
it is OK to add or miss a function trace.

Reported-by: Jan Blunck <jblunck@suse.de>
Tested-by: Jan Blunck <jblunck@suse.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2009-09-09 23:54:04 -04:00
Robert Richter
d8eeb2d3b2 ring-buffer: consolidate interface of rb_buffer_peek()
rb_buffer_peek() operates with struct ring_buffer_per_cpu *cpu_buffer
only. Thus, instead of passing variables buffer and cpu it is better
to use cpu_buffer directly. This also reduces the risk of races since
cpu_buffer is not calculated twice.

Signed-off-by: Robert Richter <robert.richter@amd.com>
LKML-Reference: <1249045084-3028-1-git-send-email-robert.richter@amd.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2009-09-09 23:54:02 -04:00
Rafael J. Wysocki
bf992fa2bc Merge branch 'master' into for-linus 2009-09-10 00:02:02 +02:00
Mike Galbraith
61cbe54d94 sched: Keep kthreads at default priority
Removes kthread/workqueue priority boost, they increase worst-case
desktop latencies.

Signed-off-by: Mike Galbraith <efault@gmx.de>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1252486344.28645.18.camel@marge.simson.net>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-09-09 17:30:06 +02:00
Mike Galbraith
172e082a91 sched: Re-tune the scheduler latency defaults to decrease worst-case latencies
Reduce the latency target from 20 msecs to 5 msecs.

Why? Larger latencies increase spread, which is good for scaling,
but bad for worst case latency.

We still have the ilog(nr_cpus) rule to scale up on bigger
server boxes.

Signed-off-by: Mike Galbraith <efault@gmx.de>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1252486344.28645.18.camel@marge.simson.net>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-09-09 17:30:06 +02:00
Mike Galbraith
2bba22c50b sched: Turn off child_runs_first
Set child_runs_first default to off.

It hurts 'optimal' make -j<NR_CPUS> workloads as make jobs
get preempted by child tasks, reducing parallelism.

Note, this patch might make existing races in user
applications more prominent than before - so breakages
might be bisected to this commit.

Child-runs-first is broken on SMP to begin with, and we
already had it off briefly in v2.6.23 so most of the
offenders ought to be fixed. Would be nice not to revert
this commit but fix those apps finally ...

Signed-off-by: Mike Galbraith <efault@gmx.de>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1252486344.28645.18.camel@marge.simson.net>
[ made the sysctl independent of CONFIG_SCHED_DEBUG, in case
  people want to work around broken apps. ]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-09-09 17:30:05 +02:00
Mike Galbraith
b5d9d734a5 sched: Ensure that a child can't gain time over it's parent after fork()
A fork/exec load is usually "pass the baton", so the child
should never be placed behind the parent.  With START_DEBIT we
make room for the new task, but with child_runs_first, that
room comes out of the _parent's_ hide. There's nothing to say
that the parent wasn't ahead of min_vruntime at fork() time,
which means that the "baton carrier", who is essentially the
parent in drag, can gain time and increase scheduling latencies
for waiters.

With NEW_FAIR_SLEEPERS + START_DEBIT + child_runs_first
enabled, we essentially pass the sleeper fairness off to the
child, which is fine, but if we don't base placement on the
parent's updated vruntime, we can end up compounding latency
woes if the child itself then does fork/exec.  The debit
incurred at fork doesn't hurt the parent who is then going to
sleep and maybe exit, but the child who acquires the error
harms all comers.

This improves latencies of make -j<n> kernel build workloads.

Reported-by: Jens Axboe <jens.axboe@oracle.com>
Signed-off-by: Mike Galbraith <efault@gmx.de>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-09-08 13:15:34 +02:00
Peter Zijlstra
71a29aa7b6 sched: Deal with low-load in wake_affine()
wake_affine() would always fail under low-load situations where
both prev and this were idle, because adding a single task will
always be a significant imbalance, even if there's nothing
around that could balance it.

Deal with this by allowing imbalance when there's nothing you
can do about it.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-09-07 20:39:06 +02:00
Peter Zijlstra
cdd2ab3de4 sched: Remove short cut from select_task_rq_fair()
select_task_rq_fair() incorrectly skips the wake_affine()
logic, remove this.

When prev_cpu == this_cpu, the code jumps straight to the
wake_idle() logic, this doesn't give the wake_affine() logic
the chance to pin the task to this cpu.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-09-07 20:39:05 +02:00
Jaswinder Singh Rajput
b6d9c25631 Security/SELinux: includecheck fix kernel/sysctl.c
fix the following 'make includecheck' warning:

  kernel/sysctl.c: linux/security.h is included more than once.

Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
Signed-off-by: James Morris <jmorris@namei.org>
2009-09-07 11:41:03 +10:00
Ingo Molnar
d28daf923a Merge branch 'tracing/core' of git://git.kernel.org/pub/scm/linux/kernel/git/frederic/random-tracing into tracing/core 2009-09-06 06:27:40 +02:00
Ingo Molnar
ed011b22ce Merge commit 'v2.6.31-rc9' into tracing/core
Merge reason: move from -rc5 to -rc9.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-09-06 06:11:42 +02:00
Linus Torvalds
93697a3cab Merge branch 'perfcounters-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'perfcounters-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  perf_counter/powerpc: Fix cache event codes for POWER7
  perf_counter: Fix /0 bug in swcounters
  perf_counters: Increase paranoia level
2009-09-05 13:48:37 -07:00
Steven Rostedt
85bac32c4a ring-buffer: only enable ring_buffer_swap_cpu when needed
Since the ability to swap the cpu buffers adds a small overhead to
the recording of a trace, we only want to add it when needed.

Only the irqsoff and preemptoff tracers use this feature, and both are
not recommended for production kernels. This patch disables its use
when neither irqsoff nor preemptoff is configured.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2009-09-04 19:42:22 -04:00
Steven Rostedt
62f0b3eb5c ring-buffer: check for swapped buffers in start of committing
Because the irqsoff tracer can swap an internal CPU buffer, it is possible
that a swap happens between the start of the write and before the committing
bit is set (the committing bit will disable swapping).

This patch adds a check for this and will fail the write if it detects it.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2009-09-04 19:38:42 -04:00
Steven Rostedt
e8165dbb03 tracing: report error in trace if we fail to swap latency buffer
The irqsoff tracer will fail to swap the cpu buffer with the max
buffer if it preempts a commit. Instead of ignoring this, this patch
makes the tracer report it if the last max latency failed due to preempting
a current commit.

The output of the latency tracer will look like this:

 # tracer: irqsoff
 #
 # irqsoff latency trace v1.1.5 on 2.6.31-rc5
 # --------------------------------------------------------------------
 # latency: 112 us, #1/1, CPU#1 | (M:preempt VP:0, KP:0, SP:0 HP:0 #P:4)
 #    -----------------
 #    | task: -4281 (uid:0 nice:0 policy:0 rt_prio:0)
 #    -----------------
 #  => started at: save_args
 #  => ended at:   __do_softirq
 #
 #
 #                  _------=> CPU#
 #                 / _-----=> irqs-off
 #                | / _----=> need-resched
 #                || / _---=> hardirq/softirq
 #                ||| / _--=> preempt-depth
 #                |||| /
 #                |||||     delay
 #  cmd     pid   ||||| time  |   caller
 #     \   /      |||||   \   |   /
    bash-4281    1d.s6  265us : update_max_tr_single: Failed to swap buffers due to commit in progress

Note the latency time and the functions that disabled the irqs or preemption
will still be listed.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2009-09-04 19:22:41 -04:00
Steven Rostedt
659372d3e4 tracing: add trace_array_printk for internal tracers to use
This patch adds a trace_array_printk to allow a tracer to use the
trace_printk on its own trace array.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2009-09-04 19:13:53 -04:00
Steven Rostedt
e77405ad80 tracing: pass around ring buffer instead of tracer
The latency tracers (irqsoff and wakeup) can swap trace buffers
on the fly. If an event is happening and has reserved data on one of
the buffers, and the latency tracer swaps the global buffer with the
max buffer, the result is that the event may commit the data to the
wrong buffer.

This patch changes the API to the trace recording to be recieve the
buffer that was used to reserve a commit. Then this buffer can be passed
in to the commit.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2009-09-04 18:59:39 -04:00
Steven Rostedt
f633903af2 tracing: make tracing_reset safe for external use
Reseting the trace buffer without first disabling the buffer and
waiting for any writers to complete, can corrupt the ring buffer.

This patch makes the external version of tracing_reset safe from
corruption by disabling the ring buffer and calling synchronize_sched.

This version can no longer be called from interrupt context. But all those
callers have been removed.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2009-09-04 18:46:51 -04:00
Steven Rostedt
2f26ebd549 tracing: use timestamp to determine start of latency traces
Currently the latency tracers reset the ring buffer. Unfortunately
if a commit is in process (due to a trace event), this can corrupt
the ring buffer. When this happens, the ring buffer will detect
the corruption and then permanently disable the ring buffer.

The bug does not crash the system, but it does prevent further tracing
after the bug is hit.

Instead of reseting the trace buffers, the timestamp of the start of
the trace is used instead. The buffers will still contain the previous
data, but the output will not count any data that is before the
timestamp of the trace.

Note, this only affects the static trace output (trace) and not the
runtime trace output (trace_pipe). The runtime trace output does not
make sense for the latency tracers anyway.

Reported-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2009-09-04 18:44:22 -04:00