iv/linux

Author	SHA1	Message	Date
Arnaldo Carvalho de Melo	51a763dd84	tracing: Introduce trace_buffer_{lock_reserve,unlock_commit} Impact: new API These new functions do what previously was being open coded, reducing the number of details ftrace plugin writers have to worry about. It also standardizes the handling of stacktrace, userstacktrace and other trace options we may introduce in the future. With this patch, for instance, the blk tracer (and some others already in the tree) can use the "userstacktrace" /d/tracing/trace_options facility. $ codiff /tmp/vmlinux.before /tmp/vmlinux.after linux-2.6-tip/kernel/trace/trace.c: trace_vprintk \| -5 trace_graph_return \| -22 trace_graph_entry \| -26 trace_function \| -45 __ftrace_trace_stack \| -27 ftrace_trace_userstack \| -29 tracing_sched_switch_trace \| -66 tracing_stop \| +1 trace_seq_to_user \| -1 ftrace_trace_special \| -63 ftrace_special \| +1 tracing_sched_wakeup_trace \| -70 tracing_reset_online_cpus \| -1 13 functions changed, 2 bytes added, 355 bytes removed, diff: -353 linux-2.6-tip/block/blktrace.c: __blk_add_trace \| -58 1 function changed, 58 bytes removed, diff: -58 linux-2.6-tip/kernel/trace/trace.c: trace_buffer_lock_reserve \| +88 trace_buffer_unlock_commit \| +86 2 functions changed, 174 bytes added, diff: +174 /tmp/vmlinux.after: 16 functions changed, 176 bytes added, 413 bytes removed, diff: -237 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Acked-by: Frédéric Weisbecker <fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-02-06 01:01:41 +01:00
Arnaldo Carvalho de Melo	0a9877514c	ring_buffer: remove unused flags parameter Impact: API change, cleanup From ring_buffer_{lock_reserve,unlock_commit}. $ codiff /tmp/vmlinux.before /tmp/vmlinux.after linux-2.6-tip/kernel/trace/trace.c: trace_vprintk \| -14 trace_graph_return \| -14 trace_graph_entry \| -10 trace_function \| -8 __ftrace_trace_stack \| -8 ftrace_trace_userstack \| -8 tracing_sched_switch_trace \| -8 ftrace_trace_special \| -12 tracing_sched_wakeup_trace \| -8 9 functions changed, 90 bytes removed, diff: -90 linux-2.6-tip/block/blktrace.c: __blk_add_trace \| -1 1 function changed, 1 bytes removed, diff: -1 /tmp/vmlinux.after: 10 functions changed, 91 bytes removed, diff: -91 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Acked-by: Frédéric Weisbecker <fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-02-06 01:01:40 +01:00
Johannes Weiner	777c6c5f1f	wait: prevent exclusive waiter starvation With exclusive waiters, every process woken up through the wait queue must ensure that the next waiter down the line is woken when it has finished. Interruptible waiters don't do that when aborting due to a signal. And if an aborting waiter is concurrently woken up through the waitqueue, no one will ever wake up the next waiter. This has been observed with __wait_on_bit_lock() used by lock_page_killable(): the first contender on the queue was aborting when the actual lock holder woke it up concurrently. The aborted contender didn't acquire the lock and therefore never did an unlock followed by waking up the next waiter. Add abort_exclusive_wait() which removes the process' wait descriptor from the waitqueue, iff still queued, or wakes up the next waiter otherwise. It does so under the waitqueue lock. Racing with a wake up means the aborting process is either already woken (removed from the queue) and will wake up the next waiter, or it will remove itself from the queue and the concurrent wake up will apply to the next waiter after it. Use abort_exclusive_wait() in __wait_event_interruptible_exclusive() and __wait_on_bit_lock() when they were interrupted by other means than a wake up through the queue. [akpm@linux-foundation.org: coding-style fixes] Reported-by: Chris Mason <chris.mason@oracle.com> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Mentored-by: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Matthew Wilcox <matthew@wil.cx> Cc: Chuck Lever <cel@citi.umich.edu> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Cc: Ingo Molnar <mingo@elte.hu> Cc: <stable@kernel.org> ["after some testing"] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-02-05 12:56:48 -08:00
Andrew Morton	60fd760fb9	revert "rlimit: permit setting RLIMIT_NOFILE to RLIM_INFINITY" Revert commit 0c2d64fb6cae9aae480f6a46cfe79f8d7d48b59f because it causes (arguably poorly designed) existing userspace to spend interminable periods closing billions of not-open file descriptors. We could bring this back, with some sort of opt-in tunable in /proc, which defaults to "off". Peter's analysis follows: : I spent several hours trying to get to the bottom of a serious : performance issue that appeared on one of our servers after upgrading to : 2.6.28. In the end it's what could be considered a userspace bug that : was triggered by a change in 2.6.28. Since this might also affect other : people I figured I'd at least document what I found here, and maybe we : can even do something about it: : : : So, I upgraded some of debian.org's machines to 2.6.28.1 and immediately : the team maintaining our ftp archive complained that one of their : scripts that previously ran in a few minutes still hadn't even come : close to being done after an hour or so. Downgrading to 2.6.27 fixed : that. : : Turns out that script is forking a lot and something in it or python or : wherever closes all the file descriptors it doesn't want to pass on. : That is, it starts at zero and goes up to ulimit -n/RLIMIT_NOFILE and : closes them all with a few exceptions. : : Turns out that takes a long time when your limit -n is now 2^20 (1048576). : : With 2.6.27.* the ulimit -n was the standard 1024, but with 2.6.28 it is : now a thousand times that. : : 2.6.28 included a patch titled "rlimit: permit setting RLIMIT_NOFILE to : RLIM_INFINITY" (0c2d64fb6cae9aae480f6a46cfe79f8d7d48b59f)[1] that : allows, as the title implies, to set the limit for number of files to : infinity. : : Closer investigation showed that the broken default ulimit did not apply : to "system" processes (like stuff started from init). In the end I : could establish that all processes that passed through pam_limit at one : point had the bad resource limit. : : Apparently the pam library in Debian etch (4.0) initializes the limits : to some default values when it doesn't have any settings in limit.conf : to override them. Turns out that for nofiles this is RLIM_INFINITY. : Commenting out "case RLIMIT_NOFILE" in pam_limit.c:267 of our pam : package version 0.79-5 fixes that - tho I'm not sure what side effects : that has. : : Debian lenny (the upcoming 5.0 version) doesn't have this issue as it : uses a different pam (version). Reported-by: Peter Palfrader <weasel@debian.org> Cc: Adam Tkac <vonsch@gmail.com> Cc: Michael Kerrisk <mtk.manpages@googlemail.com> Cc: <stable@kernel.org> [2.6.28.x] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-02-05 12:56:47 -08:00
Andrew Morton	58763a2974	kernel/async.c: fix printk warnings alpha: kernel/async.c: In function 'run_one_entry': kernel/async.c:141: warning: format '%lli' expects type 'long long int', but argument 2 has type 'async_cookie_t' kernel/async.c:149: warning: format '%lli' expects type 'long long int', but argument 2 has type 'async_cookie_t' kernel/async.c:149: warning: format '%lld' expects type 'long long int', but argument 4 has type 's64' kernel/async.c: In function 'async_synchronize_cookie_special': kernel/async.c:250: warning: format '%lli' expects type 'long long int', but argument 3 has type 's64' Cc: Arjan van de Ven <arjan@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-02-05 12:56:46 -08:00
Steven Rostedt	dac7494028	trace: code style clean up Ingo Molnar suggested using goto logic to keep the indentation down and to be able to remove the nasty line breaks. This actually makes the code a bit more readable. Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-02-05 14:35:47 +01:00
Arnaldo Carvalho de Melo	7be421510b	trace: Remove unused trace_array_cpu parameter Impact: cleanup Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-02-05 14:35:47 +01:00
Arnaldo Carvalho de Melo	97e5b191ae	trace_branch: Remove unused function Impact: cleanup Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-02-05 14:35:46 +01:00
Arnaldo Carvalho de Melo	268ccda0cb	trace: assign defaults at register_ftrace_event Impact: simplification of tracers As all tracers are doing this we might as well do it in register_ftrace_event and save one branch each time we call these callbacks. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Acked-by: Frederic Weisbecker <fweisbec@gmail.com> Acked-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-02-05 14:35:46 +01:00
Peter Zijlstra	4cd4c1b40d	timers: split process wide cpu clocks/timers Change the process wide cpu timers/clocks so that we: 1) don't mess up the kernel with too many threads, 2) don't have a per-cpu allocation for each process, 3) have no impact when not used. In order to accomplish this we're going to split it into two parts: - clocks; which can take all the time they want since they run from user context -- ie. sys_clock_gettime(CLOCK_PROCESS_CPUTIME_ID) - timers; which need constant time sampling but since they're explicitly used, the user can pay the overhead. The clock readout will go back to a full sum of the thread group, while the timers will run off a global 'clock' that only runs when needed, so only programs that make use of the facility pay the price. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Reviewed-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-02-05 13:04:33 +01:00
Peter Zijlstra	32bd671d6c	signal: re-add dead task accumulation stats. We're going to split the process wide cpu accounting into two parts: - clocks; which can take all the time they want since they run from user context. - timers; which need constant time tracing but can afford the overhead because they're default off -- and rare. The clock readout will go back to a full sum of the thread group, for this we need to re-add the exit stats that were removed in the initial itimer rework (f06febc9: timers: fix itimer/many thread hang). Furthermore, since that full sum can be rather slow for large thread groups and we have the complete dead task stats, revert the do_notify_parent time computation. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Reviewed-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-02-05 13:04:33 +01:00
Linus Torvalds	647802d6db	Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86: APIC: enable workaround on AMD Fam10h CPUs xen: disable interrupts before saving in percpu x86: add x86@kernel.org to MAINTAINERS x86: push old stack address on irqstack for unwinder irq, x86: fix lock status with numa_migrate_irq_desc x86: add cache descriptors for Intel Core i7 x86/Voyager: make it build and boot	2009-02-04 13:58:50 -08:00
Suresh Siddha	483b4ee60e	sched: fix nohz load balancer on cpu offline Christian Borntraeger reports: > After a logical cpu offline, even on a complete idle system, there > is one cpu with full ticks. It turns out that nohz.cpu_mask has the > offlined cpu still set. > > In select_nohz_load_balancer() we check if the system is completely > idle to turn off load balancing. We compare cpu_online_map with > nohz.cpu_mask. Since cpu_online_map is updated on cpu unplug, > but nohz.cpu_mask is not, the check fails and the scheduler believes > that we need an "idle load balancer" even on a fully idle system. > Since the ilb cpu does not deactivate the timer tick this breaks NOHZ. Fix the select_nohz_load_balancer() to not set the nohz.cpu_mask while a cpu is going offline. Reported-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Tested-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-02-04 22:31:19 +01:00
Arnaldo Carvalho de Melo	ae7462b4f1	trace: make the trace_event callbacks return enum print_line_t As they actually all return these enumerators. Reported-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Acked-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-02-04 20:48:39 +01:00
Arnaldo Carvalho de Melo	d9793bd801	trace: judicious error checking of trace_seq results Impact: bugfix and cleanup Some callsites were returning either TRACE_ITER_PARTIAL_LINE if the trace_seq routines (trace_seq_printf, etc) returned 0 meaning its buffer was full, or zero otherwise. But... /* Return values for print_line callback */ enum print_line_t { TRACE_TYPE_PARTIAL_LINE = 0, /* Retry after flushing the seq */ TRACE_TYPE_HANDLED = 1, TRACE_TYPE_UNHANDLED = 2 /* Relay to other output functions */ }; In other cases the return value was not being relayed at all. Most of the time it didn't hurt because the page wasn't getting filled, but for correctness' sake, handle the return values everywhere. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Acked-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-02-04 20:48:30 +01:00
Ingo Molnar	ce70a0b472	Merge branches 'tracing/blktrace', 'tracing/ftrace', 'tracing/urgent' and 'linus' into tracing/core	2009-02-04 20:45:41 +01:00
Ingo Molnar	bb960a1e42	Merge branch 'core/xen' into x86/urgent	2009-02-04 14:54:56 +01:00
Oleg Nesterov	229c4ef8ae	ftrace: do_each_pid_task() needs rcu lock "ftrace: use struct pid" commit 978f3a45d9499c7a447ca7615455cefb63d44165 converted ftrace_pid_trace to "struct pid*". But we can't use do_each_pid_task() without rcu_read_lock() even if we know the pid itself can't go away (it was pinned in ftrace_pid_write). The exiting task can detach itself from this pid at any moment. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-02-03 22:50:58 +01:00
Arnaldo Carvalho de Melo	2c9b238eb3	trace: Change struct trace_event callbacks parameter list Impact: API change The trace_seq and trace_entry are in trace_iterator, where there are more fields that may be needed by tracers, so just pass the tracer_iterator as is already the case for struct tracer->print_line. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-02-03 14:03:52 +01:00
Frederic Weisbecker	c4a8e8be2d	trace: better manage the context info for events Impact: make trace_event more convenient for tracers All tracers (for the moment) that use the struct trace_event want to have the context info printed before their own output: the pid/cmdline, cpu, and timestamp. But some other tracers that want to implement their trace_event callbacks will not necessarily need this information or they may want to format it as they want. This patch adds a new default-enabled trace option: TRACE_ITER_CONTEXT_INFO When disabled through: echo nocontext-info > /debugfs/tracing/trace_options The pid, cpu and timestamps headers will not be printed. IE with the sched_switch tracer with context-info (default): bash-2935 [001] 100.356561: 2935:120:S ==> [001] 0:140:R <idle> <idle>-0 [000] 100.412804: 0:140:R + [000] 11:115:S events/0 <idle>-0 [000] 100.412816: 0:140:R ==> [000] 11:115:R events/0 events/0-11 [000] 100.412829: 11:115:S ==> [000] 0:140:R <idle> Without context-info: 2935:120:S ==> [001] 0:140:R <idle> 0:140:R + [000] 11:115:S events/0 0:140:R ==> [000] 11:115:R events/0 11:115:S ==> [000] 0:140:R <idle> A tracer can disable it at runtime by clearing the bit TRACE_ITER_CONTEXT_INFO in trace_flags. The print routines were renamed to trace_print_context and trace_print_lat_context, so that they can be used by tracers if they want to use them for one of the trace_event callbacks. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-02-03 14:03:52 +01:00
Steven Rostedt	79fb0768fb	trace: let boot trace be chosen by command line Now that we have a working ftrace=<tracer> function, make the boot tracer get activated by it. This way we can turn it on or off without recompiling the kernel, as well as keeping the selftests on. The selftests are disabled whenever a default tracer starts running. Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-02-03 06:26:12 +01:00
Steven Rostedt	b2821ae68b	trace: fix default boot up tracer Peter Zijlstra started the functionality to start up a default tracing at bootup. This patch finishes the work. Now if you add 'ftrace=<tracer>' to the command line, when that tracer is registered on bootup, that tracer is selected and starts tracing. Note, all selftests for tracers that are registered after this tracer are disabled. This prevents the selftests from disturbing the running tracer, or the running tracer from disturbing the selftest. Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-02-03 06:26:12 +01:00
Ingo Molnar	dc573f9b20	Merge branches 'tracing/ftrace', 'tracing/kmemtrace' and 'linus' into tracing/core	2009-02-03 06:25:38 +01:00
Linus Torvalds	31c952dcf8	Merge branch 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: sched_rt: don't use first_cpu on cpumask created with cpumask_and sched: fix buddie group latency sched: clear buddies more aggressively sched: symmetric sync vs avg_overlap sched: fix sync wakeups cpuset: fix possible deadlock in async_rebuild_sched_domains	2009-02-02 19:26:29 -08:00
Eric Dumazet	720eba31f4	modules: Use a better scheme for refcounting Current refcounting for modules (done if CONFIG_MODULE_UNLOAD=y) is using a lot of memory. Each 'struct module' contains an [NR_CPUS] array of full cache lines. This patch uses existing infrastructure (percpu_modalloc() & percpu_modfree()) to allocate percpu space for the refcount storage. Instead of wasting NR_CPUS*128 bytes (on i386), we now use nr_cpu_ids*sizeof(local_t) bytes. On a typical distro, where NR_CPUS=8, shipping 2000 modules, we reduce size of module files by about 2 Mbytes. (1Kb per module) Instead of having all refcounters in the same memory node - with TLB misses because of vmalloc() - this new implementation permits better NUMA properties, since each CPU will use storage on its preferred node, thanks to percpu storage. Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-02-02 19:17:55 -08:00
Yinghai Lu	10b888d6ce	irq, x86: fix lock status with numa_migrate_irq_desc Eric Paris reported: > I have an hp dl785g5 which is unable to successfully run > 2.6.29-0.66.rc3.fc11.x86_64 or 2.6.29-rc2-next-20090126. During bootup > (early in userspace daemons starting) I get the below BUG, which quickly > renders the machine dead. I assume it is because sparse_irq_lock never > gets released when the BUG kills that task. Adjust lock sequence when migrating a descriptor with CONFIG_NUMA_MIGRATE_IRQ_DESC enabled. Signed-off-by: Yinghai Lu <yinghai@kernel.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-02-01 11:36:31 +01:00
Rusty Russell	3d398703ef	sched_rt: don't use first_cpu on cpumask created with cpumask_and cpumask_and() only initializes nr_cpu_ids bits, so the (deprecated) first_cpu() might find one of those uninitialized bits if nr_cpu_ids is less than NR_CPUS (as it can be for CONFIG_CPUMASK_OFFSTACK). Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-02-01 10:49:52 +01:00
Peter Zijlstra	a571bbeafb	sched: fix buddie group latency Similar to the previous patch, by not clearing buddies we can select entities past their run quota, which can increase latency. This means we have to clear group buddies as well. Do not use the group clear for pick_next_task(), otherwise that'll get O(n^2). Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-02-01 10:49:51 +01:00
Mike Galbraith	a9f3e2b549	sched: clear buddies more aggressively It was noticed that a task could get re-elected past its run quota due to buddy affinities. This could increase latency a little. Cure it by more aggressively clearing buddy state. We do so in two situations: - when we force preempt - when we select a buddy to run Signed-off-by: Mike Galbraith <efault@gmx.de> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-02-01 10:49:50 +01:00
Peter Zijlstra	1596e29773	sched: symmetric sync vs avg_overlap Reinstate the weakening of the sync hint if set. This yields a more symmetric usage of avg_overlap. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-02-01 10:49:49 +01:00
Peter Zijlstra	d942fb6c7d	sched: fix sync wakeups Pawel Dziekonski reported that the openssl benchmark and his quantum chemistry application both show slowdowns due to the scheduler under-parallelizing execution. The reason is that pipe wakeups still do 'sync' wakeups, which override the normal buddy wakeup logic - even if waker and wakee are loosely coupled. Fix an inversion of logic in the buddy wakeup code. Reported-by: Pawel Dziekonski <dzieko@gmail.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-02-01 10:49:06 +01:00
Linus Torvalds	1347e965f5	Merge branch 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: generic-ipi: use per cpu data for single cpu ipi calls cpumask: convert lib/smp_processor_id to new cpumask ops signals, debug: fix BUG: using smp_processor_id() in preemptible code in print_fatal_signal()	2009-01-31 15:55:05 -08:00
Linus Torvalds	ac56b94f80	Merge branch 'irq-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'irq-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: irq: export __set_irq_handler() and handle_level_irq()	2009-01-31 15:54:30 -08:00
Linus Torvalds	5b2d3e6d54	Merge branch 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: hrtimer: prevent negative expiry value after clock_was_set() hrtimers: allow the hot-unplugging of all cpus hrtimers: increase clock min delta threshold while interrupt hanging	2009-01-31 15:54:06 -08:00
Linus Torvalds	f6490438fc	Merge branch 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86, ds, bts: cleanup/fix DS configuration ring-buffer: reset timestamps when ring buffer is reset trace: set max latency variable to zero on default trace: stop all recording to ring buffer on ftrace_dump trace: print ftrace_dump at KERN_EMERG log level ring_buffer: reset write when reserve buffer fail tracing/function-graph-tracer: fix a regression while suspend to disk ring-buffer: fix alignment problem	2009-01-31 15:53:30 -08:00
Thomas Gleixner	b0a9b5111a	hrtimer: prevent negative expiry value after clock_was_set() Impact: prevent false positive WARN_ON() in clockevents_program_event() clock_was_set() changes the base->offset of CLOCK_REALTIME and enforces the reprogramming of the clockevent device to expire timers which are based on CLOCK_REALTIME. If the clock change is large enough then the subtraction of the timer expiry value and base->offset can become negative which triggers the warning in clockevents_program_event(). Check the subtraction result and set a negative value to 0. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2009-01-30 22:35:34 +01:00
Sebastien Dugue	94df7de028	hrtimers: allow the hot-unplugging of all cpus Impact: fix CPU hotplug hang on Power6 testbox On architectures that support offlining all cpus (at least powerpc/pseries), hot-unplugging the tick_do_timer_cpu can result in a system hang. This comes from the fact that if the cpu going down happens to be the cpu doing the tick, then as the tick_do_timer_cpu handover happens after the cpu is dead (via the CPU_DEAD notification), we're left without ticks, jiffies are frozen and any task relying on timers (msleep, ...) is stuck. That's particularly the case for the cpu looping in __cpu_die() waiting for the dying cpu to be dead. This patch addresses this by having the tick_do_timer_cpu handover happen earlier during the CPU_DYING notification. For this, a new clockevent notification type is introduced (CLOCK_EVT_NOTIFY_CPU_DYING) which is triggered in hrtimer_cpu_notify(). Signed-off-by: Sebastien Dugue <sebastien.dugue@bull.net> Cc: <stable@kernel.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-01-30 22:35:29 +01:00
Frederic Weisbecker	7f22391cbe	hrtimers: increase clock min delta threshold while interrupt hanging Impact: avoid timer IRQ hanging slow systems While using the function graph tracer on a virtualized system, the hrtimer_interrupt can hang the system on an infinite loop. This can be caused in several situations: - the hardware is very slow and HZ is set too high - something intrusive is slowing the system down (tracing under emulation) ... and the next clock events to program are always before the current time. This patch implements a reasonable compromise: if such a situation is detected, we share the CPUs time in 1/4 to process the hrtimer interrupts. This is enough to keep the system running without serious starvation. It has been successfully tested under VirtualBox with 1000 HZ and 100 HZ with function graph tracer launched. In both cases, the clock events were increased until about 25 ms periodic ticks, which means 40 HZ. So we change a hard to debug hang into a warning message and a system that still manages to limp along. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-01-30 22:35:10 +01:00
Steven Rostedt	d7240b9880	generic-ipi: use per cpu data for single cpu ipi calls The smp_call_function can be passed a wait parameter telling it to wait for all the functions running on other CPUs to complete before returning, or to return without waiting. Unfortunately, this is currently just a suggestion and not mandatory. That is, the smp_call_function can decide not to return and wait instead. The reason for this is because it uses kmalloc to allocate storage to send to the called CPU and that CPU will free it when it is done. But if we fail to allocate the storage, the stack is used instead. This means we must wait for the called CPU to finish before continuing. Unfortunately, some callers do not abide by this hint and act as if the non-wait option is mandatory. The MTRR code for instance will deadlock if the smp_call_function is set to wait. This is because the smp_call_function will wait for the other CPUs to finish their called functions, but those functions are waiting on the caller to continue. This patch changes the generic smp_call_function code to use per cpu variables if the allocation of the data fails for a single CPU call. The smp_call_function_many will fall back to the smp_call_function_single if it fails its alloc. The smp_call_function_single is modified to not force the wait state. Since we now are using a single data per cpu we must synchronize the callers to prevent a second caller modifying the data before the first called IPI functions complete. To do so, I added a flag to the call_single_data called CSD_FLAG_LOCK. When the single CPU is called (which can be called when a many call fails an alloc), we set the LOCK bit on this per cpu data. When the caller finishes it clears the LOCK bit. The caller must wait till the LOCK bit is cleared before setting it. When it is cleared, there is no IPI function using it. Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Jens Axboe <jens.axboe@oracle.com> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-01-30 18:31:08 +01:00
Randy Dunlap	ecf441b593	kmemtrace: fix printk formats, fix Geert Uytterhoeven wrote: > %4zu? Reported-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Acked-by: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-01-30 16:12:33 +01:00
Paul Menage	839ec5452e	cgroup: fix root_count when mount fails due to busy subsystem root_count was being incremented in cgroup_get_sb() after all error checking was complete, but decremented in cgroup_kill_sb(), which can be called on a superblock that we gave up on due to an error. This patch changes cgroup_kill_sb() to only decrement root_count if the root was previously linked into the list of roots. Signed-off-by: Paul Menage <menage@google.com> Tested-by: Serge Hallyn <serue@us.ibm.com> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-01-29 18:04:45 -08:00
Paul Menage	804b3c28a4	cgroups: add cpu_relax() calls in css_tryget() and cgroup_clear_css_refs() css_tryget() and cgroup_clear_css_refs() contain polling loops; these loops should have cpu_relax calls in them to reduce cross-cache traffic. Signed-off-by: Paul Menage <menage@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-01-29 18:04:45 -08:00
Li Zefan	1404f06565	cgroups: fix lock inconsistency in cgroup_clone() I fixed a bug in cgroup_clone() in Linus' tree in commit 7b574b7 ("cgroups: fix a race between cgroup_clone and umount") without noticing there was a cleanup patch in -mm tree that should be rebased (now commit 104cbd5, "cgroups: use task_lock() for access tsk->cgroups safe in cgroup_clone()"), thus resulted in lock inconsistency. Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Acked-by: Paul Menage <menage@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-01-29 18:04:45 -08:00
KAMEZAWA Hiroyuki	baef99a08a	cgroups: use hierarchy mutex in creation failure path Now, cgrp->sibling is handled under hierarchy mutex. The error route should do so, too. Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Li Zefan <lizf@cn.fujitsu.com> Acked-by: Paul Menage <menage@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-01-29 18:04:43 -08:00
Arnaldo Carvalho de Melo	b3a8c34886	trace_sched_wakeup: Remove unused variable Impact: cleanup Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-01-29 14:31:03 +01:00
Arnaldo Carvalho de Melo	f04109bf1b	trace: Use tracing_reset_online_cpus in more places Impact: cleanup Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Acked-by: Frédéric Weisbecker <fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-01-29 14:28:31 +01:00
David Daney	97179fd46d	cpumask fallout: Initialize irq_default_affinity earlier Move the initialization of irq_default_affinity to early_irq_init as core_initcall is too late. irq_default_affinity can be used in init_IRQ and potentially timer and SMP init as well. All of these happen before core_initcall. Moving the initialization to early_irq_init ensures that it is initialized before it is used. Signed-off-by: David Daney <ddaney@caviumnetworks.com> Acked-by: Mike Travis <travis@sgi.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-01-27 16:06:55 -08:00
David Daney	1267a8df20	Make irq_*_affinity depend on CONFIG_GENERIC_HARDIRQS too. In interrupt.h these functions are declared only if CONFIG_GENERIC_HARDIRQS is set. We should define them under identical conditions. Signed-off-by: David Daney <ddaney@caviumnetworks.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-01-27 16:06:49 -08:00
Linus Torvalds	490a8d70cd	Merge branch 'hibern_fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/libata-dev * 'hibern_fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/libata-dev: SATA PIIX: Blacklist system that spins off disks during ACPI power off SATA Sil: Blacklist system that spins off disks during ACPI power off SATA AHCI: Blacklist system that spins off disks during ACPI power off SATA: Blacklisting of systems that spin off disks during ACPI power off DMI: Introduce dmi_first_match to make the interface more flexible Hibernation: Introduce system_entering_hibernation	2009-01-27 07:50:41 -08:00
Ingo Molnar	4a66a82be7	Merge branches 'tracing/blktrace', 'tracing/kmemtrace' and 'tracing/urgent' into tracing/core	2009-01-27 14:30:57 +01:00

6160 Commits