linux

iv/linux

Author	SHA1	Message	Date
Linus Torvalds	7193675dc8	Merge branch 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: posix-timers: Fix oops in clock_nanosleep() with CLOCK_MONOTONIC_RAW	2009-08-04 15:32:08 -07:00
Linus Torvalds	9c66812b6b	Merge branch 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: tracing: Fix missing function_graph events when we splice_read from trace_pipe tracing: Fix invalid function_graph entry trace: stop tracer in oops_enter() ftrace: Only update $offset when we update $ref_func ftrace: Fix the conditional that updates $ref_func tracing: only truncate ftrace files when O_TRUNC is set tracing: show proper address for trace-printk format	2009-08-04 15:31:51 -07:00
Martin Schwidefsky	97fd9ed48c	timers: Cache __next_timer_interrupt result Each time a cpu goes to sleep on a NOHZ=y system the timer wheel is searched for the next timer interrupt. It can take quite a few cycles to find the next pending timer. This patch adds a field to tvec_base that caches the result of __next_timer_interrupt. The hit ratio is around 80% on my thinkpad under normal use, on a server I've seen hit ratios from 5% to 95% dependent on the workload. -v2: jiffies wrap fixes Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: john stultz <johnstul@us.ibm.com> Cc: Venki Pallipadi <venkatesh.pallipadi@intel.com> LKML-Reference: <20090721202505.7d56a079@skybase> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-08-04 20:28:25 +02:00
Jiri Slaby	990a55eb25	irq: Clean up by removing irqfixup MODULE_PARM_DESC() It's wrong (the parm takes no arguments) and compile-time wiped away anyway (due to !MODULE). Signed-off-by: Jiri Slaby <jirislaby@gmail.com> LKML-Reference: <1248991326-26267-1-git-send-email-jirislaby@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-08-04 17:00:11 +02:00
Darren Hart	cc6db4e601	futex: Correct futex_wait_requeue_pi() commentary The state machine described in the comments wasn't updated with a follow-on fix. Address that and cleanup the corresponding commentary in the function. Signed-off-by: Darren Hart <dvhltc@us.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> LKML-Reference: <4A737C2A.9090001@us.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-08-04 15:59:14 +02:00
Bart Van Assche	5b0f437df0	workqueues: Improve schedule_work() documentation Two important aspects of the schedule_work() function are not yet documented: - that it is allowed to pass a struct work_struct * to this function that is already on the kernel-global workqueue; - the meaning of its return value. The patch below documents both aspects. Signed-off-by: Bart Van Assche <bart.vanassche@gmail.com> Cc: "Greg Kroah-Hartman" <gregkh@suse.de> Cc: Andrew Morton <akpm@linux-foundation.org> LKML-Reference: <200907301900.54202.bart.vanassche@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-08-04 15:21:16 +02:00
Ingo Molnar	e16852cfc5	Merge branch 'tracing/fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/frederic/random-tracing into tracing/urgent	2009-08-04 13:58:28 +02:00
Hiroshi Shimamoto	70d715fd05	posix-timers: Fix oops in clock_nanosleep() with CLOCK_MONOTONIC_RAW Prevent calling do_nanosleep() with clockid CLOCK_MONOTONIC_RAW, it may cause oops, such as NULL pointer dereference. Signed-off-by: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: John Stultz <johnstul@us.ibm.com> Cc: <stable@kernel.org> LKML-Reference: <4A764FF3.50607@ct.jp.nec.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-08-04 10:16:41 +02:00
Stanislaw Gruszka	a42548a188	cputime: Optimize jiffies_to_cputime(1) For powerpc with CONFIG_VIRT_CPU_ACCOUNTING jiffies_to_cputime(1) is not compile time constant and run time calculations are quite expensive. To optimize we use precomputed value. For all other architectures is is preprocessor definition. Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> LKML-Reference: <1248862529-6063-5-git-send-email-sgruszka@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-08-03 14:48:36 +02:00
Stanislaw Gruszka	d1e3b6d195	itimers: Simplify arm_timer() code a bit Don't update values in expiration cache when new ones are equal. Add expire_le() and expire_gt() helpers to simplify the code. Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> LKML-Reference: <1248862529-6063-4-git-send-email-sgruszka@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-08-03 14:48:36 +02:00
Stanislaw Gruszka	8356b5f9c4	itimers: Fix periodic tics precision Measure ITIMER_PROF and ITIMER_VIRT timers interval error between real ticks and requested by user. Take it into account when scheduling next tick. This patch introduce possibility where time between two consecutive tics is smaller then requested interval, it preserve however dependency that n tick is generated not earlier than n*interval time - counting from the beginning of periodic signal generation. Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> LKML-Reference: <1248862529-6063-3-git-send-email-sgruszka@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-08-03 14:48:35 +02:00
Stanislaw Gruszka	42c4ab41a1	itimers: Merge ITIMER_VIRT and ITIMER_PROF Both cpu itimers have same data flow in the few places, this patch make unification of code related with VIRT and PROF itimers. Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> LKML-Reference: <1248862529-6063-2-git-send-email-sgruszka@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-08-03 14:48:35 +02:00
Ming Lei	90629209a0	lockdep: Fix memory usage info of BFS The unit is KB, so sizeof(struct circular_queue) should be divided by 1024. Signed-off-by: Ming Lei <tom.leiming@gmail.com> Cc: akpm@linux-foundation.org Cc: torvalds@linux-foundation.org Cc: davem@davemloft.net Cc: Ming Lei <tom.leiming@gmail.com> Cc: a.p.zijlstra@chello.nl LKML-Reference: <1249220616-7190-1-git-send-email-tom.leiming@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-08-02 15:45:28 +02:00
Ming Lei	e351b660fd	lockdep: Reintroduce generation count to make BFS faster We still can apply DaveM's generation count optimization to BFS, based on the following idea: - before doing each BFS, increase the global generation id by 1 - if one node in the graph has been visited, mark it as visited by storing the current global generation id into the node's dep_gen_id field - so we can decide if one node has been visited already, by comparing the node's dep_gen_id with the global generation id. By applying DaveM's generation count optimization to current implementation of BFS, we gain the following advantages: - we save MAX_LOCKDEP_ENTRIES/8 bytes memory; - we remove the bitmap_zero(bfs_accessed, MAX_LOCKDEP_ENTRIES); in each BFS, which is very time-consuming since MAX_LOCKDEP_ENTRIES may be very large.(16384UL) Signed-off-by: Ming Lei <tom.leiming@gmail.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: "David S. Miller" <davem@davemloft.net> LKML-Reference: <1248274089-6358-1-git-send-email-tom.leiming@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-08-02 15:41:37 +02:00
Peter Zijlstra	bb97a91e25	lockdep: Deal with many similar locks spin_lock_nest_lock() allows to take many instances of the same class, this can easily lead to overflow of MAX_LOCK_DEPTH. To avoid this overflow, we'll stop accounting instances but start reference counting the class in the held_lock structure. [ We could maintain a list of instances, if we'd move the hlock stuff into __lock_acquired(), but that would require significant modifications to the current code. ] We restrict this mode to spin_lock_nest_lock() only, because it degrades the lockdep quality due to lost of instance. For lockstat this means we don't track lock statistics for any but the first lock in the series. Currently nesting is limited to 11 bits because that was the spare space available in held_lock. This yields a 2048 instances maximium. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-08-02 15:41:35 +02:00
Peter Zijlstra	f607c66857	lockdep: Introduce lockdep_assert_held() Add a lockdep helper to validate that we indeed are the owner of a lock. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-08-02 15:41:34 +02:00
Peter Zijlstra	98c33eddaf	lockdep: Fix style nits fixes a few comments and whitespaces that annoyed me. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-08-02 15:41:32 +02:00
Peter Zijlstra	4f84f4330a	lockdep: Fix backtraces Truncate stupid -1 entries in backtraces. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <1248096665.15751.8816.camel@twins> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-08-02 15:41:31 +02:00
Ingo Molnar	bcf08df3b2	sched: Fix cpupri build on !CONFIG_SMP This build bug: In file included from kernel/sched.c:1765: kernel/sched_rt.c: In function ‘has_pushable_tasks’: kernel/sched_rt.c:1069: error: ‘struct rt_rq’ has no member named ‘pushable_tasks’ kernel/sched_rt.c: In function ‘pick_next_task_rt’: kernel/sched_rt.c:1084: error: ‘struct rq’ has no member named ‘post_schedule’ Triggers because both pushable_tasks and post_schedule are SMP-only fields. Move pushable_tasks() to the SMP section and #ifdef the post_schedule use. Cc: Gregory Haskins <ghaskins@novell.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <20090729150422.17691.55590.stgit@dev.haskins.net> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-08-02 15:15:51 +02:00
Ingo Molnar	bbfa26229a	lockdep: Fix BFS build Fix: kernel/built-in.o: In function `lockdep_stats_show': lockdep_proc.c:(.text+0x48202): undefined reference to `max_bfs_queue_depth' As max_bfs_queue_depth is only available under CONFIG_PROVE_LOCKING=y. Cc: Ming Lei <tom.leiming@gmail.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-08-02 14:51:06 +02:00
Peter Zijlstra	693525e3be	sched: Ensure the migration task doesn't go away during use Like sched_migrate_task(), set_cpus_allowed_ptr() should hold onto the migration thread too. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-08-02 14:26:15 +02:00
Peter Zijlstra	8f48894fcc	sched: Add debug check to task_of() A frequent mistake appears to be to call task_of() on a scheduler entity that is not actually a task, which can result in a wild pointer. Add a check to catch these mistakes. Suggested-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-08-02 14:26:14 +02:00
Gregory Haskins	00aec93d10	sched: Fully integrate cpus_active_map and root-domain code Reflect "active" cpus in the rq->rd->online field, instead of the online_map. The motivation is that things that use the root-domain code (such as cpupri) only care about cpus classified as "active" anyway. By synchronizing the root-domain state with the active map, we allow several optimizations. For instance, we can remove an extra cpumask_and from the scheduler hotpath by utilizing rq->rd->online (since it is now a cached version of cpu_active_map & rq->rd->span). Signed-off-by: Gregory Haskins <ghaskins@novell.com> Acked-by: Peter Zijlstra <peterz@infradead.org> Acked-by: Max Krasnyansky <maxk@qualcomm.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <20090730145723.25226.24493.stgit@dev.haskins.net> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-08-02 14:26:12 +02:00
Gregory Haskins	3f029d3c6d	sched: Enhance the pre/post scheduling logic We currently have an explicit "needs_post" vtable method which returns a stack variable for whether we should later run post-schedule. This leads to an awkward exchange of the variable as it bubbles back up out of the context switch. Peter Zijlstra observed that this information could be stored in the run-queue itself instead of handled on the stack. Therefore, we revert to the method of having context_switch return void, and update an internal rq->post_schedule variable when we require further processing. In addition, we fix a race condition where we try to access current->sched_class without holding the rq->lock. This is technically racy, as the sched-class could change out from under us. Instead, we reference the per-rq post_schedule variable with the runqueue unlocked, but with preemption disabled to see if we need to reacquire the rq->lock. Finally, we clean the code up slightly by removing the #ifdef CONFIG_SMP conditionals from the schedule() call, and implement some inline helper functions instead. This patch passes checkpatch, and rt-migrate. Signed-off-by: Gregory Haskins <ghaskins@novell.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <20090729150422.17691.55590.stgit@dev.haskins.net> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-08-02 14:26:10 +02:00
Steven Rostedt	c3a2ae3d93	sched: Add new prio to cpupri before removing old prio We need to add the new prio to the cpupri accounting before removing the old prio. This is because removing the old prio first will open a race window where the cpu will be removed from pri_active. In this case the cpu will not be visible for RT push and pulls. This could cause a RT task to not migrate appropriately, and create a very large latency. This bug was found with the use of ftrace sched events and trace_printk. Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <20090729042526.438281019@goodmis.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-08-02 14:26:09 +02:00
Steven Rostedt	da19ab5103	sched: Check for pushing rt tasks after all scheduling The current method for pushing RT tasks after scheduling only happens after a context switch. But we found cases where a task is set up on a run queue to be pushed but the push never happens because the schedule chooses the same task. This bug was found with the help of Gregory Haskins and the use of ftrace (trace_printk). It tooks several days for both of us analyzing the code and the trace output to find this. Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <20090729042526.205923666@goodmis.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-08-02 14:26:08 +02:00
Peter Zijlstra	e709715915	sched: Optimize unused cgroup configuration When cgroup group scheduling is built in, skip some code paths if we don't have any (but the root) cgroups configured. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-08-02 14:26:07 +02:00
Peter Zijlstra	a5004278f0	sched: Fix cgroup smp fairness Commit ec4e0e2fe018992d980910db901637c814575914 ("fix inconsistency when redistribute per-cpu tg->cfs_rq shares") broke cgroup smp fairness. In order to avoid starvation of newly placed tasks, we never quite set the share of an empty cpu group-task to 0, but instead we set it as if there's a single NICE-0 task present. If however we actually set this in cfs_rq[cpu]->shares, that means the total shares for that group will be slightly inflated every time we balance, causing the observed unfairness. Fix this by setting cfs_rq[cpu]->shares to 0 but actually setting the effective weight of the related se to the inflated number. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <1248696557.6987.1615.camel@twins> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-08-02 14:26:06 +02:00
Ingo Molnar	8e9ed8b024	Merge branch 'sched/urgent' into sched/core Merge reason: avoid upcoming patch conflict. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-08-02 14:23:57 +02:00
Gregory Haskins	07903af152	sched: Fix race in cpupri introduced by cpumask_var changes Background: Several race conditions in the scheduler have cropped up recently, which Steven and I have tracked down using ftrace. The most recent one turns out to be a race in how the scheduler determines a suitable migration target for RT tasks, introduced recently with commit: commit 68e74568fbe5854952355e942acca51f138096d9 Date: Tue Nov 25 02:35:13 2008 +1030 sched: convert struct cpupri_vec cpumask_var_t. The original design of cpupri allowed lockless readers to quickly determine a best-estimate target. Races between the pri_active bitmap and the vec->mask were handled in the original code because we would detect and return "0" when this occured. The design was predicated on the effective atomicity () of caching the result of cpus_and() between the cpus_allowed and the vec->mask. Commit 68e74568 changed the behavior such that vec->mask is accessed multiple times. This introduces a subtle race, the result of which means we can have a result that returns "1", but with an empty bitmap. ) yes, we know cpus_and() is not a locked operator across the entire composite array, but it is implicitly atomic on a per-word basis which is all the design required to work. Implementation: Rather than forgoing the lockless design, or reverting to a stack-based cpumask_t, we simply check for when the race has been encountered and continue processing in the event that the race is hit. This renders the removal race as if the priority bit had been atomically cleared as well, and allows the algorithm to execute correctly. Signed-off-by: Gregory Haskins <ghaskins@novell.com> CC: Rusty Russell <rusty@rustcorp.com.au> CC: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <20090730145728.25226.92769.stgit@dev.haskins.net> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-08-02 14:23:29 +02:00
Peter Zijlstra	e414314cce	sched: Fix latencytop and sleep profiling vs group scheduling The latencytop and sleep accounting code assumes that any scheduler entity represents a task, this is not so. Cc: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-08-02 14:10:12 +02:00
Peter Zijlstra	9f498cc5be	perf_counter: Full task tracing In order to be able to distinguish between no samples due to inactivity and no samples due to task ended, Arjan asked for PERF_EVENT_EXIT events. This is useful to the boot delay instrumentation (bootchart) app. This patch changes the PERF_EVENT_FORK to be emitted on every clone, and adds PERF_EVENT_EXIT to be emitted on task exit, after the task's counters have been closed. This task tracing is controlled through: attr.comm \|\| attr.mmap and through the new attr.task field. Suggested-by: Arjan van de Ven <arjan@linux.intel.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Anton Blanchard <anton@samba.org> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> [ cleaned up perf_counter.h a bit ] Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-08-02 13:47:56 +02:00
Peter Zijlstra	e53c099470	perf_counter: Collapse inherit on read() Currently the counter value returned by read() is the value of the parent counter, to which child counters are only fed back on child exit. Thus read() can return rather erratic (and meaningless) numbers depending on the state of the child processes. Change this by always iterating the full child hierarchy on read() and sum all counters. Suggested-by: Corey Ashford <cjashfor@linux.vnet.ibm.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-08-02 13:47:54 +02:00
Ingo Molnar	c1dc0b9c0c	debug lockups: Improve lockup detection When debugging a recent lockup bug i found various deficiencies in how our current lockup detection helpers work: - SysRq-L is not very efficient as it uses a workqueue, hence it cannot punch through hard lockups and cannot see through most soft lockups either. - The SysRq-L code depends on the NMI watchdog - which is off by default. - We dont print backtraces from the RCU code's built-in 'RCU state machine is stuck' debug code. This debug code tends to be one of the first (and only) mechanisms that show that a lockup has occured. This patch changes the code so taht we: - Trigger the NMI backtrace code from SysRq-L instead of using a workqueue (which cannot punch through hard lockups) - Trigger print-all-CPU-backtraces from the RCU lockup detection code Also decouple the backtrace printing code from the NMI watchdog: - Dont use variable size cpumasks (it might not be initialized and they are a bit more fragile anyway) - Trigger an NMI immediately via an IPI, instead of waiting for the NMI tick to occur. This is a lot faster and can produce more relevant backtraces. It will also work if the NMI watchdog is disabled. - Dont print the 'dazed and confused' message when we print a backtrace from the NMI - Do a show_regs() plus a dump_stack() to get maximum info out of the dump. Worst-case we get two stacktraces - which is not a big deal. Sometimes, if register content is corrupted, the precise stack walker in show_regs() wont give us a full backtrace - in this case dump_stack() will do it. Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-08-02 13:27:17 +02:00
Linus Torvalds	0dd8486b5c	do_sigaltstack: small cleanups The previous commit ("do_sigaltstack: avoid copying 'stack_t' as a structure to user space") fixed a real bug. This one just cleans up the copy from user space to that gcc can generate better code for it (and so that it looks the same as the later copy back to user space). Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-08-01 11:18:56 -07:00
Linus Torvalds	0083fc2c50	do_sigaltstack: avoid copying 'stack_t' as a structure to user space Ulrich Drepper correctly points out that there is generally padding in the structure on 64-bit hosts, and that copying the structure from kernel to user space can leak information from the kernel stack in those padding bytes. Avoid the whole issue by just copying the three members one by one instead, which also means that the function also can avoid the need for a stack frame. This also happens to match how we copy the new structure from user space, so it all even makes sense. [ The obvious solution of adding a memset() generates horrid code, gcc does really stupid things. ] Reported-by: Ulrich Drepper <drepper@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-08-01 10:46:52 -07:00
Linus Torvalds	b592972493	Merge branch 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: tracing/stat: Fix seqfile memory leak function-graph: Fix seqfile memory leak trace_stack: Fix seqfile memory leak profile: Suppress warning about large allocations when profile=1 is specified	2009-07-30 16:46:58 -07:00
Masami Hiramatsu	ec30c5f3a1	kprobes: Use kernel_text_address() for checking probe address Use kernel_text_address() for checking probe address instead of __kernel_text_address(), because __kernel_text_address() returns true for init functions even after relaseing those functions. That will hit a BUG() in text_poke(). Signed-off-by: Masami Hiramatsu <mhiramat@redhat.com> Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com> Cc: Jim Keniston <jkenisto@us.ibm.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-07-30 16:44:06 -07:00
Mel Gorman	b62f495dad	profile: suppress warning about large allocations when profile=1 is specified When profile= is used, a large buffer is allocated early at boot. This can be larger than what the page allocator can provide so it prints a warning. However, the caller is able to handle the situation so this patch suppresses the warning. Signed-off-by: Mel Gorman <mel@csn.ul.ie> Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-07-29 19:10:36 -07:00
KAMEZAWA Hiroyuki	887032670d	cgroup avoid permanent sleep at rmdir After commit ec64f51545fffbc4cb968f0cea56341a4b07e85a ("cgroup: fix frequent -EBUSY at rmdir"), cgroup's rmdir (especially against memcg) doesn't return -EBUSY by temporary ref counts. That commit expects all refs after pre_destroy() is temporary but...it wasn't. Then, rmdir can wait permanently. This patch tries to fix that and change followings. - set CGRP_WAIT_ON_RMDIR flag before pre_destroy(). - clear CGRP_WAIT_ON_RMDIR flag when the subsys finds racy case. if there are sleeping ones, wakes them up. - rmdir() sleeps only when CGRP_WAIT_ON_RMDIR flag is set. Tested-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> Reported-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> Reviewed-by: Paul Menage <menage@google.com> Acked-by: Balbir Sigh <balbir@linux.vnet.ibm.com> Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-07-29 19:10:35 -07:00
Li Zefan	096b7fe012	cgroups: fix pid namespace bug The bug was introduced by commit cc31edceee04a7b87f2be48f9489ebb72d264844 ("cgroups: convert tasks file to use a seq_file with shared pid array"). We cache a pid array for all threads that are opening the same "tasks" file, but the pids in the array are always from the namespace of the last process that opened the file, so all other threads will read pids from that namespace instead of their own namespaces. To fix it, we maintain a list of pid arrays, which is keyed by pid_ns. The list will be of length 1 at most time. Reported-by: Paul Menage <menage@google.com> Idea-by: Paul Menage <menage@google.com> Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Reviewed-by: Serge Hallyn <serue@us.ibm.com> Cc: Balbir Singh <balbir@in.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-07-29 19:10:35 -07:00
Hidetoshi Seto	11c7da4b0c	kexec: fix omitting offset in extended crashkernel syntax Setting "crashkernel=512M-2G:64M,2G-:128M" does not work but it turns to work if it has a trailing-whitespace, like "crashkernel=512M-2G:64M,2G-:128M ". It was because of a bug in the parser, running over the cmdline. This patch adds a check of the termination. Reported-by: Jin Dongming <jin.dongming@np.css.fujitsu.com> Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Tested-by: Jin Dongming <jin.dongming@np.css.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-07-29 19:10:34 -07:00
Rik van Riel	933b787b57	mm: copy over oom_adj value at fork time Fix a post-2.6.31 regression which was introduced by 2ff05b2b4eac2e63d345fc731ea151a060247f53 ("oom: move oom_adj value from task_struct to mm_struct"). After moving the oom_adj value from the task struct to the mm_struct, the oom_adj value was no longer properly inherited by child processes. Copying over the oom_adj value at fork time fixes that bug. [kosaki.motohiro@jp.fujitsu.com: test for current->mm before dereferencing it] Signed-off-by: Rik van Riel <riel@redhat.com> Reported-by: Paul Menage <manage@google.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Acked-by: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-07-29 19:10:34 -07:00
Lai Jiangshan	74e7ff8c50	tracing: Fix missing function_graph events when we splice_read from trace_pipe About a half events are missing when we splice_read from trace_pipe. They are unexpectedly consumed because we ignore the TRACE_TYPE_NO_CONSUME return value used by the function graph tracer when it needs to consume the events by itself to walk on the ring buffer. The same problem appears with ftrace_dump() Example of an output before this patch: 1) \| ktime_get_real() { 1) 2.846 us \| read_hpet(); 1) 4.558 us \| } 1) 6.195 us \| } After this patch: 0) \| ktime_get_real() { 0) \| getnstimeofday() { 0) 1.960 us \| read_hpet(); 0) 3.597 us \| } 0) 5.196 us \| } The fix also applies on 2.6.30 Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: stable@kernel.org LKML-Reference: <4A6EEC52.90704@cn.fujitsu.com> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>	2009-07-28 23:17:23 +02:00
Lai Jiangshan	38ceb592fc	tracing: Fix invalid function_graph entry When print_graph_entry() computes a function call entry event, it needs to also check the next entry to guess if it matches the return event of the current function entry. In order to look at this next event, it needs to consume the current entry before going ahead in the ring buffer. However, if the current event that gets consumed is the last one in the ring buffer head page, the ring_buffer may reuse the page for writers. The consumed entry will then become invalid because of possible racy overwriting. Me must then handle this entry by making a copy of it. The fix also applies on 2.6.30 Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: stable@kernel.org LKML-Reference: <4A6EEAEC.3050508@cn.fujitsu.com> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>	2009-07-28 23:17:23 +02:00
Oleg Nesterov	9ae260270c	update the comment in kthread_stop() Commit 63706172f332fd3f6e7458ebfb35fa6de9c21dc5 ("kthreads: rework kthread_stop()") removed the limitation that the thread function mysr not call do_exit() itself, but forgot to update the comment. Since that commit it is OK to use kthread_stop() even if kthread can exit itself. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-07-27 12:15:46 -07:00
Mike Frysinger	6560dc160f	module: use MODULE_SYMBOL_PREFIX with module_layout The check_modstruct_version() needs to look up the symbol "module_layout" in the kernel, but it does so literally and not by a C identifier. The trouble is that it does not include a symbol prefix for those ports that need it (like the Blackfin and H8300 port). So make sure we tack on the MODULE_SYMBOL_PREFIX define to the front of it. Signed-off-by: Mike Frysinger <vapier@gentoo.org> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-07-27 12:15:45 -07:00
Thomas Gleixner	bdff78707f	trace: stop tracer in oops_enter() If trace_printk_on_oops is set we lose interesting trace information when the tracer is enabled across oops handling and printing. We want the trace which might give us information _WHY_ we oopsed. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2009-07-24 15:30:45 -04:00
Peter Zijlstra	af01296145	lockdep: BFS cleanup Some cleanups of the lockdep code after the BFS series: - Remove the last traces of the generation id - Fixup comment style - Move the bfs routines into lockdep.c - Cleanup the bfs routines [ tom.leiming@gmail.com: Fix crash ] Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <1246201486-7308-11-git-send-email-tom.leiming@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-07-24 10:53:29 +02:00
Ming Lei	12f3dfd022	lockdep: Add statistics info for max bfs queue depth Add BFS statistics to the existing lockdep stats. Signed-off-by: Ming Lei <tom.leiming@gmail.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <1246201486-7308-10-git-send-email-tom.leiming@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-07-24 10:52:03 +02:00

... 12 13 14 15 16 ...

8405 Commits