linux

iv/linux

Author	SHA1	Message	Date
Dan Carpenter	528f7ce6e4	PM / Suspend: Off by one in pm_suspend() In enter_state() we use "state" as an offset for the pm_states[] array. The pm_states[] array only has PM_SUSPEND_MAX elements so this test is off by one. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Cc: stable@kernel.org	2011-10-16 23:27:46 +02:00
Martin Schwidefsky	85055dd805	PM / Hibernate: Include storage keys in hibernation image on s390 For s390 there is one additional byte associated with each page, the storage key. This byte contains the referenced and changed bits and needs to be included into the hibernation image. If the storage keys are not restored to their previous state all original pages would appear to be dirty. This can cause inconsistencies e.g. with read-only filesystems. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>	2011-10-16 23:27:46 +02:00
Rafael J. Wysocki	ca123102f6	PM: Fix build issue in main.c for CONFIG_PM_SLEEP unset Suspend statistics should depend on CONFIG_PM_SLEEP, so make that happen. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>	2011-10-16 23:27:46 +02:00
ShuoX Liu	2a77c46de1	PM / Suspend: Add statistics debugfs file for suspend to RAM Record S3 failure time about each reason and the latest two failed devices' names in S3 progress. We can check it through 'suspend_stats' entry in debugfs. The motivation of the patch: We are enabling power features on Medfield. Comparing with PC/notebook, a mobile enters/exits suspend-2-ram (we call it s3 on Medfield) far more frequently. If it can't enter suspend-2-ram in time, the power might be used up soon. We often find sometimes, a device suspend fails. Then, system retries s3 over and over again. As display is off, testers and developers don't know what happens. Some testers and developers complain they don't know if system tries suspend-2-ram, and what device fails to suspend. They need such info for a quick check. The patch adds suspend_stats under debugfs for users to check suspend to RAM statistics quickly. If not using this patch, we have other methods to get info about what device fails. One is to turn on CONFIG_PM_DEBUG, but users would get too much info and testers need recompile the system. In addition, dynamic debug is another good tool to dump debug info. But it still doesn't match our utilization scenario closely. 1) user need write a user space parser to process the syslog output; 2) Our testing scenario is we leave the mobile for at least hours. Then, check its status. No serial console available during the testing. One is because console would be suspended, and the other is serial console connecting with spi or HSU devices would consume power. These devices are powered off at suspend-2-ram. Signed-off-by: ShuoX Liu <shuox.liu@intel.com> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>	2011-10-16 23:27:45 +02:00
Steven Rostedt	436fc28026	tracing: Fix returning of duplicate data after EOF in trace_pipe_raw The trace_pipe_raw handler holds a cached page from the time the file is opened to the time it is closed. The cached page is used to handle the case of the user space buffer being smaller than what was read from the ring buffer. The left over buffer is held in the cache so that the next read will continue where the data left off. After EOF is returned (no more data in the buffer), the index of the cached page is set to zero. If a user app reads the page again after EOF, the check in the buffer will see that the cached page is less than page size and will return the cached page again. This will cause reading the trace_pipe_raw again after EOF to return duplicate data, making the output look like the time went backwards but instead data is just repeated. The fix is to not reset the index right after all data is read from the cache, but to reset it after all data is read and more data exists in the ring buffer. Cc: stable <stable@kernel.org> Reported-by: Jeremy Eder <jeder@redhat.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2011-10-14 10:44:25 -04:00
Geunsik Lim	9b5f8b31af	ftrace: Fix README to state tracing_on to start/stop tracing tracing_enabled option is deprecated. To start/stop tracing, write to /sys/kernel/debug/tracing/tracing_on without tracing_enabled. This patch is based on Linux 3.1.0-rc1 Signed-off-by: Geunsik Lim <geunsik.lim@samsung.com> Link: http://lkml.kernel.org/r/1313127022-23830-1-git-send-email-leemgs1@gmail.com Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2011-10-14 10:41:33 -04:00
Ingo Molnar	910e94dd0c	Merge branch 'tip/perf/core' of git://github.com/rostedt/linux into perf/core	2011-10-12 17:14:47 +02:00
Ingo Molnar	177e2163fe	Merge branch 'tip/perf/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace into perf/core	2011-10-12 09:07:49 +02:00
Steven Rostedt	d696b58ca2	tracing: Do not allocate buffer for trace_marker When doing intense tracing, the kmalloc inside trace_marker can introduce side effects to what is being traced. As trace_marker() is used by userspace to inject data into the kernel ring buffer, it needs to do so with the least amount of intrusion to the operations of the kernel or the user space application. As the ring buffer is designed to write directly into the buffer without the need to make a temporary buffer, and userspace already went through the hassle of knowing how big the write will be, we can simply pin the userspace pages and write the data directly into the buffer. This improves the impact of tracing via trace_marker tremendously! Thanks to Peter Zijlstra and Thomas Gleixner for pointing out the use of get_user_pages_fast() and kmap_atomic(). Suggested-by: Thomas Gleixner <tglx@linutronix.de> Suggested-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2011-10-11 09:13:53 -04:00
Steven Rostedt	e0a413f619	tracing: Warn on output if the function tracer was found corrupted As the function tracer is very intrusive, lots of self checks are performed on the tracer and if something is found to be strange it will shut itself down keeping it from corrupting the rest of the kernel. This shutdown may still allow functions to be traced, as the tracing only stops new modifications from happening. Trying to stop the function tracer itself can cause more harm as it requires code modification. Although a WARN_ON() is executed, a user may not notice it. To help the user see that something isn't right with the tracing of the system a big warning is added to the output of the tracer that lets the user know that their data may be incomplete. Reported-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2011-10-11 09:13:25 -04:00
Masami Hiramatsu	02ca1521ad	ftrace/kprobes: Fix not to delete probes if in use Fix kprobe-tracer not to delete a probe if the probe is in use. In that case, delete operation will return -EBUSY. This bug can cause a kernel panic if enabled probes are deleted during perf record. (Add some probes on functions) sh-4.2# perf probe --del probe:\* sh-4.2# exit (kernel panic) This is originally reported on the fedora bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=742383 I've also checked that this problem doesn't happen on tracepoints when module removing because perf event locks target module. $ sudo ./perf record -e xfs:\* -aR sh sh-4.2# rmmod xfs ERROR: Module xfs is in use sh-4.2# exit [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.203 MB perf.data (~8862 samples) ] Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Frank Ch. Eigler <fche@redhat.com> Cc: stable@kernel.org Link: http://lkml.kernel.org/r/20111004104438.14591.6553.stgit@fedora15 Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2011-10-10 15:13:03 -04:00
Rafael J. Wysocki	9696cc9007	Merge branch 'pm-qos' into pm-for-linus * pm-qos: PM / QoS: Update Documentation for the pm_qos and dev_pm_qos frameworks PM / QoS: Add function dev_pm_qos_read_value() (v3) PM QoS: Add global notification mechanism for device constraints PM QoS: Implement per-device PM QoS constraints PM QoS: Generalize and export constraints management code PM QoS: Reorganize data structs PM QoS: Code reorganization PM QoS: Minor clean-ups PM QoS: Move and rename the implementation files	2011-10-07 23:17:07 +02:00
Rafael J. Wysocki	d727b60659	Merge branch 'pm-runtime' into pm-for-linus * pm-runtime: PM / Tracing: build rpm-traces.c only if CONFIG_PM_RUNTIME is set PM / Runtime: Replace dev_dbg() with trace_rpm_() PM / Runtime: Introduce trace points for tracing rpm_ functions PM / Runtime: Don't run callbacks under lock for power.irq_safe set USB: Add wakeup info to debugging messages PM / Runtime: pm_runtime_idle() can be called in atomic context PM / Runtime: Add macro to test for runtime PM events PM / Runtime: Add might_sleep() to runtime PM functions	2011-10-07 23:16:55 +02:00
David S. Miller	88c5100c28	Merge branch 'master' of github.com:davem330/net Conflicts: net/batman-adv/soft-interface.c	2011-10-07 13:38:43 -04:00
Ingo Molnar	9d01402023	Merge commit 'v3.1-rc9' into perf/core Merge reason: pick up latest fixes. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-10-06 12:49:21 +02:00
Thomas Gleixner	510f5acc4f	sched: Don't use tasklist_lock for debug prints Avoid taking locks from debug prints, this avoids latencies on -rt, and improves reliability of the debug code. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-10-06 12:47:08 +02:00
Thomas Gleixner	1c83437e80	sched: Warn on rt throttling The default rt-throttling is a source of never ending questions. Warn once when we go into throttling so folks have that info in dmesg. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/alpine.LFD.2.02.1110051331480.18778@ionos Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-10-06 12:47:04 +02:00
Peter Zijlstra	4939602a24	sched: Unify the ->cpus_allowed mask copy Currently every sched_class::set_cpus_allowed() implementation has to copy the cpumask into task_struct::cpus_allowed, this is pointless, put this copy in the generic code. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/n/tip-jhl5s9fckd9ptw1fzbqqlrd3@git.kernel.org Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-10-06 12:47:00 +02:00
Peter Zijlstra	fa17b507f1	sched: Wrap scheduler p->cpus_allowed access This task is preparatory for the migrate_disable() implementation, but stands on its own and provides a cleanup. It currently only converts those sites required for task-placement. Kosaki-san once mentioned replacing cpus_allowed with a proper cpumask_t instead of the NR_CPUS sized array it currently is, that would also require something like this. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Link: http://lkml.kernel.org/n/tip-e42skvaddos99psip0vce41o@git.kernel.org Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-10-06 12:46:56 +02:00
Suresh Siddha	6eb57e0d65	sched: Request for idle balance during nohz idle load balance rq's idle_at_tick is set to idle/busy during the timer tick depending on the cpu was idle or not. This will be used later in the load balance that will be done in the softirq context (which is a process context in -RT kernels). For nohz kernels, for the cpu doing nohz idle load balance on behalf of all the idle cpu's, its rq->idle_at_tick might have a stale value (which is recorded when it got the timer tick presumably when it is busy). As the nohz idle load balancing is also being done at the same place as the regular load balancing, nohz idle load balancing was bailing out when it sees rq's idle_at_tick not set. Thus leading to poor system utilization. Rename rq's idle_at_tick to idle_balance and set it when someone requests for nohz idle balance on an idle cpu. Reported-by: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com> Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/20111003220934.892350549@sbsiddha-desk.sc.intel.com Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-10-06 12:46:27 +02:00
Suresh Siddha	ca38062e57	sched: Use resched IPI to kick off the nohz idle balance Current use of smp call function to kick the nohz idle balance can deadlock in this scenario. 1. cpu-A did a generic_exec_single() to cpu-B and after queuing its call single data (csd) to the call single queue, cpu-A took a timer interrupt. Actual IPI to cpu-B to process the call single queue is not yet sent. 2. As part of the timer interrupt handler, cpu-A decided to kick cpu-B for the idle load balancing (sets cpu-B's rq->nohz_balance_kick to 1) and __smp_call_function_single() with nowait will queue the csd to the cpu-B's queue. But the generic_exec_single() won't send an IPI to cpu-B as the call single queue was not empty. 3. cpu-A is busy with lot of interrupts 4. Meanwhile cpu-B is entering and exiting idle and noticed that it has it's rq->nohz_balance_kick set to '1'. So it will go ahead and do the idle load balancer and clear its rq->nohz_balance_kick. 5. At this point, csd queued as part of the step-2 above is still locked and waiting to be serviced on cpu-B. 6. cpu-A is still busy with interrupt load and now it got another timer interrupt and as part of it decided to kick cpu-B for another idle load balancing (as it finds cpu-B's rq->nohz_balance_kick cleared in step-4 above) and does __smp_call_function_single() with the same csd that is still locked. 7. And we get a deadlock waiting for the csd_lock() in the __smp_call_function_single(). Main issue here is that cpu-B can service the idle load balancer kick request from cpu-A even with out receiving the IPI and this lead to doing multiple __smp_call_function_single() on the same csd leading to deadlock. To kick a cpu, scheduler already has the reschedule vector reserved. Use that mechanism (kick_process()) instead of using the generic smp call function mechanism to kick off the nohz idle load balancing and avoid the deadlock. [ This issue is present from 2.6.35+ kernels, but marking it -stable only from v3.0+ as the proposed fix depends on the scheduler_ipi() that is introduced recently. ] Reported-by: Prarit Bhargava <prarit@redhat.com> Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Cc: stable@kernel.org # v3.0+ Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/20111003220934.834943260@sbsiddha-desk.sc.intel.com Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-10-06 12:46:23 +02:00
Thomas Gleixner	68cc3990a5	rtmutex: Add missing rcu_read_unlock() in debug_rt_mutex_print_deadlock() Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2011-10-05 13:20:24 +02:00
Thomas Gleixner	32cffdde4a	genirq: Fix fatfinered fixup really Putting the argument inside the quote does not really help. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2011-10-04 18:43:57 +02:00
Thomas Gleixner	908a328372	sched: Fix idle_cpu() On -rt we observed hackbench waking all 400 tasks to a single cpu. This is because of select_idle_sibling()'s interaction with the new ipi based wakeup scheme. The existing idle_cpu() test only checks to see if the current task on that cpu is the idle task, it does not take already queued tasks into account, nor does it take queued to be woken tasks into account. If the remote wakeup IPIs come hard enough, there won't be time to schedule away from the idle task, and would thus keep thinking the cpu was in fact idle, regardless of the fact that there were already several hundred tasks runnable. We couldn't reproduce on mainline, but there's no reason it couldn't happen. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/n/tip-3o30p18b2paswpc9ohy2gltp@git.kernel.org Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-10-04 12:44:07 +02:00
Peter Zijlstra	fa14ff4acc	sched: Convert to struct llist Use the generic llist primitives. We had a private lockless list implementation in the scheduler in the wake-list code, now that we have a generic llist implementation that provides all required operations, switch to it. This patch is not expected to change any behavior. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Huang Ying <ying.huang@intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Link: http://lkml.kernel.org/r/1315836353.26517.42.camel@twins Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-10-04 12:43:58 +02:00
Peter Zijlstra	924f8f5af3	llist: Add llist_next() So we don't have to expose the struct list_node member. Cc: Huang Ying <ying.huang@intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1315836348.26517.41.camel@twins Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-10-04 12:43:53 +02:00
Huang Ying	38aaf8090d	irq_work: Use llist in the struct irq_work logic Use llist in irq_work instead of the lock-less linked list implementation in irq_work to avoid the code duplication. Signed-off-by: Huang Ying <ying.huang@intel.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1315461646-1379-6-git-send-email-ying.huang@intel.com Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-10-04 12:43:49 +02:00
Ingo Molnar	22f92bacbe	Merge branch 'linus' into sched/core Merge reason: pick up the latest fixes. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-10-04 11:09:08 +02:00
Vasily Averin	349d2895cc	ipv4: NET_IPV4_ROUTE_GC_INTERVAL removal removing obsoleted sysctl, ip_rt_gc_interval variable no longer used since 2.6.38 Signed-off-by: Vasily Averin <vvs@sw.ru> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-10-03 14:13:01 -04:00
Marc Zyngier	1e7c5fd294	genirq: percpu: allow interrupt type to be set at enable time As request_percpu_irq() doesn't allow for a percpu interrupt to have its type configured (it is generally impossible to configure it on all CPUs at once), add a 'type' argument to enable_percpu_irq(). This allows some low-level, board specific init code to be switched to a generic API. [ tglx: Added WARN_ON argument ] Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Cc: Abhijeet Dharmapurikar <adharmap@codeaurora.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2011-10-03 15:35:27 +02:00
Marc Zyngier	31d9d9b6d8	genirq: Add support for per-cpu dev_id interrupts The ARM GIC interrupt controller offers per CPU interrupts (PPIs), which are usually used to connect local timers to each core. Each CPU has its own private interface to the GIC, and only sees the PPIs that are directly connect to it. While these timers are separate devices and have a separate interrupt line to a core, they all use the same IRQ number. For these devices, request_irq() is not the right API as it assumes that an IRQ number is visible by a number of CPUs (through the affinity setting), but makes it very awkward to express that an IRQ number can be handled by all CPUs, and yet be a different interrupt line on each CPU, requiring a different dev_id cookie to be passed back to the handler. The _percpu_irq() functions is designed to overcome these limitations, by providing a per-cpu dev_id vector: int request_percpu_irq(unsigned int irq, irq_handler_t handler, const char devname, void __percpu percpu_dev_id); void free_percpu_irq(unsigned int, void __percpu ); int setup_percpu_irq(unsigned int irq, struct irqaction new); void remove_percpu_irq(unsigned int irq, struct irqaction act); void enable_percpu_irq(unsigned int irq); void disable_percpu_irq(unsigned int irq); The API has a number of limitations: - no interrupt sharing - no threading - common handler across all the CPUs Once the interrupt is requested using setup_percpu_irq() or request_percpu_irq(), it must be enabled by each core that wishes its local interrupt to be delivered. Based on an initial patch by Thomas Gleixner. Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Cc: linux-arm-kernel@lists.infradead.org Link: http://lkml.kernel.org/r/1316793788-14500-2-git-send-email-marc.zyngier@arm.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2011-10-03 15:35:26 +02:00
Wu Fengguang	9d823e8f6b	writeback: per task dirty rate limit Add two fields to task_struct. 1) account dirtied pages in the individual tasks, for accuracy 2) per-task balance_dirty_pages() call intervals, for flexibility The balance_dirty_pages() call interval (ie. nr_dirtied_pause) will scale near-sqrt to the safety gap between dirty pages and threshold. The main problem of per-task nr_dirtied is, if 1k+ tasks start dirtying pages at exactly the same time, each task will be assigned a large initial nr_dirtied_pause, so that the dirty threshold will be exceeded long before each task reached its nr_dirtied_pause and hence call balance_dirty_pages(). The solution is to watch for the number of pages dirtied on each CPU in between the calls into balance_dirty_pages(). If it exceeds ratelimit_pages (3% dirty threshold), force call balance_dirty_pages() for a chance to set bdi->dirty_exceeded. In normal situations, this safeguarding condition is not expected to trigger at all. On the sqrt in dirty_poll_interval(): It will serve as an initial guess when dirty pages are still in the freerun area. When dirty pages are floating inside the dirty control scope [freerun, limit], a followup patch will use some refined dirty poll interval to get the desired pause time. thresh-dirty (MB) sqrt 1 16 2 22 4 32 8 45 16 64 32 90 64 128 128 181 256 256 512 362 1024 512 The above table means, given 1MB (or 1GB) gap and the dd tasks polling balance_dirty_pages() on every 16 (or 512) pages, the dirty limit won't be exceeded as long as there are less than 16 (or 512) concurrent dd's. So sqrt naturally leads to less overheads and more safe concurrent tasks for large memory servers, which have large (thresh-freerun) gaps. peter: keep the per-CPU ratelimit for safeguarding the 1k+ tasks case CC: Peter Zijlstra <a.p.zijlstra@chello.nl> Reviewed-by: Andrea Righi <andrea@betterlinux.com> Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>	2011-10-03 21:08:57 +08:00
Linus Torvalds	f72a209a3e	Merge branches 'irq-urgent-for-linus', 'x86-urgent-for-linus' and 'sched-urgent-for-linus' of git://tesla.tglx.de/git/linux-2.6-tip * 'irq-urgent-for-linus' of git://tesla.tglx.de/git/linux-2.6-tip: irq: Fix check for already initialized irq_domain in irq_domain_add irq: Add declaration of irq_domain_simple_ops to irqdomain.h * 'x86-urgent-for-linus' of git://tesla.tglx.de/git/linux-2.6-tip: x86/rtc: Don't recursively acquire rtc_lock * 'sched-urgent-for-linus' of git://tesla.tglx.de/git/linux-2.6-tip: posix-cpu-timers: Cure SMP wobbles sched: Fix up wchan borkage sched/rt: Migrate equal priority tasks to available CPUs	2011-10-01 08:37:25 -07:00
Ingo Molnar	048b718029	Merge branch 'rcu/next' of git://github.com/paulmckrcu/linux into core/rcu	2011-10-01 14:21:36 +02:00
Peter Zijlstra	d670ec1317	posix-cpu-timers: Cure SMP wobbles David reported: Attached below is a watered-down version of rt/tst-cpuclock2.c from GLIBC. Just build it with "gcc -o test test.c -lpthread -lrt" or similar. Run it several times, and you will see cases where the main thread will measure a process clock difference before and after the nanosleep which is smaller than the cpu-burner thread's individual thread clock difference. This doesn't make any sense since the cpu-burner thread is part of the top-level process's thread group. I've reproduced this on both x86-64 and sparc64 (using both 32-bit and 64-bit binaries). For example: [davem@boricha build-x86_64-linux]$ ./test process: before(0.001221967) after(0.498624371) diff(497402404) thread: before(0.000081692) after(0.498316431) diff(498234739) self: before(0.001223521) after(0.001240219) diff(16698) [davem@boricha build-x86_64-linux]$ The diff of 'process' should always be >= the diff of 'thread'. I make sure to wrap the 'thread' clock measurements the most tightly around the nanosleep() call, and that the 'process' clock measurements are the outer-most ones. --- #include <unistd.h> #include <stdio.h> #include <stdlib.h> #include <time.h> #include <fcntl.h> #include <string.h> #include <errno.h> #include <pthread.h> static pthread_barrier_t barrier; static void chew_cpu(void arg) { pthread_barrier_wait(&barrier); while (1) __asm__ __volatile__("" : : : "memory"); return NULL; } int main(void) { clockid_t process_clock, my_thread_clock, th_clock; struct timespec process_before, process_after; struct timespec me_before, me_after; struct timespec th_before, th_after; struct timespec sleeptime; unsigned long diff; pthread_t th; int err; err = clock_getcpuclockid(0, &process_clock); if (err) return 1; err = pthread_getcpuclockid(pthread_self(), &my_thread_clock); if (err) return 1; pthread_barrier_init(&barrier, NULL, 2); err = pthread_create(&th, NULL, chew_cpu, NULL); if (err) return 1; err = pthread_getcpuclockid(th, &th_clock); if (err) return 1; pthread_barrier_wait(&barrier); err = clock_gettime(process_clock, &process_before); if (err) return 1; err = clock_gettime(my_thread_clock, &me_before); if (err) return 1; err = clock_gettime(th_clock, &th_before); if (err) return 1; sleeptime.tv_sec = 0; sleeptime.tv_nsec = 500000000; nanosleep(&sleeptime, NULL); err = clock_gettime(th_clock, &th_after); if (err) return 1; err = clock_gettime(my_thread_clock, &me_after); if (err) return 1; err = clock_gettime(process_clock, &process_after); if (err) return 1; diff = process_after.tv_nsec - process_before.tv_nsec; printf("process: before(%lu.%.9lu) after(%lu.%.9lu) diff(%lu)\n", process_before.tv_sec, process_before.tv_nsec, process_after.tv_sec, process_after.tv_nsec, diff); diff = th_after.tv_nsec - th_before.tv_nsec; printf("thread: before(%lu.%.9lu) after(%lu.%.9lu) diff(%lu)\n", th_before.tv_sec, th_before.tv_nsec, th_after.tv_sec, th_after.tv_nsec, diff); diff = me_after.tv_nsec - me_before.tv_nsec; printf("self: before(%lu.%.9lu) after(%lu.%.9lu) diff(%lu)\n", me_before.tv_sec, me_before.tv_nsec, me_after.tv_sec, me_after.tv_nsec, diff); return 0; } This is due to us using p->se.sum_exec_runtime in thread_group_cputime() where we iterate the thread group and sum all data. This does not take time since the last schedule operation (tick or otherwise) into account. We can cure this by using task_sched_runtime() at the cost of having to take locks. This also means we can (and must) do away with thread_group_sched_runtime() since the modified thread_group_cputime() is now more accurate and would deadlock when called from thread_group_sched_runtime(). Aside of that it makes the function safe on 32 bit systems. The old code added t->se.sum_exec_runtime unprotected. sum_exec_runtime is a 64bit value and could be changed on another cpu at the same time. Reported-by: David Miller <davem@davemloft.net> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: stable@kernel.org Link: http://lkml.kernel.org/r/1314874459.7945.22.camel@twins Tested-by: David Miller <davem@davemloft.net> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2011-09-30 14:07:06 +02:00
Ram Pai	47ea91b405	Resource: fix wrong resource window calculation __find_resource() incorrectly returns a resource window which overlaps an existing allocated window. This happens when the parent's resource-window spans 0x00000000 to 0xffffffff and is entirely allocated to all its children resource-windows. __find_resource() looks for gaps in resource allocation among the children resource windows. When it encounters the last child window it blindly tries the range next to one allocated to the last child. Since the last child's window ends at 0xffffffff the calculation overflows, leading the algorithm to believe that any window in the range 0x0000000 to 0xfffffff is available for allocation. This leads to a conflicting window allocation. Michal Ludvig reported this issue seen on his platform. The following patch fixes the problem and has been verified by Michal. I believe this bug has been there for ages. It got exposed by git commit 2bbc6942273b ("PCI : ability to relocate assigned pci-resources") Signed-off-by: Ram Pai <linuxram@us.ibm.com> Tested-by: Michal Ludvig <mludvig@logix.net.nz> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-09-29 20:04:34 -07:00
Serge Hallyn	d178bc3a70	user namespace: usb: make usb urbs user namespace aware (v2) Add to the dev_state and alloc_async structures the user namespace corresponding to the uid and euid. Pass these to kill_pid_info_as_uid(), which can then implement a proper, user-namespace-aware uid check. Changelog: Sep 20: Per Oleg's suggestion: Instead of caching and passing user namespace, uid, and euid each separately, pass a struct cred. Sep 26: Address Alan Stern's comments: don't define a struct cred at usbdev_open(), and take and put a cred at async_completed() to ensure it lasts for the duration of kill_pid_info_as_cred(). Signed-off-by: Serge Hallyn <serge.hallyn@canonical.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Tejun Heo <tj@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2011-09-29 13:13:08 -07:00
Ming Lei	2a5306cc5f	PM / Tracing: build rpm-traces.c only if CONFIG_PM_RUNTIME is set Do not build kernel/trace/rpm-traces.c if CONFIG_PM_RUNTIME is not set, which avoids a build failure. [rjw: Added the changelog and modified the subject slightly.] Signed-off-by: Ming Lei <ming.lei@canonical.com> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>	2011-09-29 22:07:23 +02:00
Paul E. McKenney	afe24b122e	rcu: Move propagation of ->completed from rcu_start_gp() to rcu_report_qs_rsp() It is possible for the CPU that noted the end of the prior grace period to not need a new one, and therefore to decide to propagate ->completed throughout the rcu_node tree without starting another grace period. However, in so doing, it releases the root rcu_node structure's lock, which can allow some other CPU to start another grace period. The first CPU will be propagating ->completed in parallel with the second CPU initializing the rcu_node tree for the new grace period. In theory this is harmless, but in practice we need to keep things simple. This commit therefore moves the propagation of ->completed to rcu_report_qs_rsp(), and refrains from marking the old grace period as having been completed until it has finished doing this. This prevents anyone from starting a new grace period concurrently with marking the old grace period as having been completed. Of course, the optimization where a CPU needing a new grace period doesn't bother marking the old one completed is still in effect: In that case, the marking happens implicitly as part of initializing the new grace period. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2011-09-28 21:38:49 -07:00
Paul E. McKenney	e90c53d3e2	rcu: Remove rcu_needs_cpu_flush() to avoid false quiescent states The purpose of rcu_needs_cpu_flush() was to iterate on pushing the current grace period in order to help the current CPU enter dyntick-idle mode. However, this can result in failures if the CPU starts entering dyntick-idle mode, but then backs out. In this case, the call to rcu_pending() from rcu_needs_cpu_flush() might end up announcing a non-existing quiescent state. This commit therefore removes rcu_needs_cpu_flush() in favor of letting the dyntick-idle machinery at the end of the softirq handler push the loop along via its call to rcu_pending(). Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2011-09-28 21:38:48 -07:00
Mike Galbraith	5b61b0baa9	rcu: Wire up RCU_BOOST_PRIO for rcutree RCU boost threads start life at RCU_BOOST_PRIO, while others remain at RCU_KTHREAD_PRIO. While here, change thread names to match other kthreads, and adjust rcu_yield() to not override the priority set by the user. This last change sets the stage for runtime changes to priority in the -rt tree. Signed-off-by: Mike Galbraith <efault@gmx.de> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2011-09-28 21:38:47 -07:00
Paul E. McKenney	ab8f11e5f6	rcu: Make rcu_torture_boost() exit loops at end of test One of the loops in rcu_torture_boost() fails to check kthread_should_stop(), and thus might be slowing or even stopping completion of rcutorture tests at rmmod time. This commit adds the kthread_should_stop() check to the offending loop. Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2011-09-28 21:38:46 -07:00
Paul E. McKenney	93898fb1a3	rcu: Make rcu_torture_fqs() exit loops at end of test The rcu_torture_fqs() function can prevent the rcutorture tests from completing, resulting in a hang. This commit therefore ensures that rcu_torture_fqs() will exit its inner loops at the end of the test, and also applies the newish ULONG_CMP_LT() macro to time comparisons. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2011-09-28 21:38:44 -07:00
Paul E. McKenney	5342e269b2	rcu: Permit rt_mutex_unlock() with irqs disabled Create a separate lockdep class for the rt_mutex used for RCU priority boosting and enable use of rt_mutex_lock() with irqs disabled. This prevents RCU priority boosting from falling prey to deadlocks when someone begins an RCU read-side critical section in preemptible state, but releases it with an irq-disabled lock held. Unfortunately, the scheduler's runqueue and priority-inheritance locks still must either completely enclose or be completely enclosed by any overlapping RCU read-side critical section. This version removes a redundant local_irq_restore() noted by Yong Zhang. Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2011-09-28 21:38:43 -07:00
Paul E. McKenney	06ae115a1d	rcu: Avoid having just-onlined CPU resched itself when RCU is idle CPUs set rdp->qs_pending when coming online to resolve races with grace-period start. However, this means that if RCU is idle, the just-onlined CPU might needlessly send itself resched IPIs. Adjust the online-CPU initialization to avoid this, and also to correctly cause the CPU to respond to the current grace period if needed. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Tested-by: Josh Boyer <jwboyer@redhat.com> Tested-by: Christian Hoffmann <email@christianhoffmann.info>	2011-09-28 21:38:42 -07:00
Paul E. McKenney	9bc8b5586f	rcu: Suppress NMI backtraces when stall ends before dump It is possible for an RCU CPU stall to end just as it is detected, in which case the current code will uselessly dump all CPU's stacks. This commit therefore checks for this condition and refrains from sending needless NMIs. And yes, the stall might also end just after we checked all CPUs and tasks, but in that case we would at least have given some clue as to which CPU/task was at fault. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2011-09-28 21:38:41 -07:00
Paul E. McKenney	037067a1b6	rcu: Prohibit grace periods during early boot Greater use of RCU during early boot (before the scheduler is operating) is causing RCU to attempt to start grace periods during that time, which in turn is resulting in both RCU and the callback functions attempting to use the scheduler before it is ready. This commit prevents these problems by prohibiting RCU grace periods until after the scheduler has spawned the first non-idle task. Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2011-09-28 21:38:40 -07:00
Paul E. McKenney	82e78d80fc	rcu: Simplify unboosting checks Commit 7765be (Fix RCU_BOOST race handling current->rcu_read_unlock_special) introduced a new ->rcu_boosted field in the task structure. This is redundant because the existing ->rcu_boost_mutex will be non-NULL at any time that ->rcu_boosted is nonzero. Therefore, this commit removes ->rcu_boosted and tests ->rcu_boost_mutex instead. Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2011-09-28 21:38:39 -07:00
Paul E. McKenney	5c51dd7349	rcu: Prevent early boot set_need_resched() from __rcu_pending() There isn't a whole lot of point in poking the scheduler before there are other tasks to switch to. This commit therefore adds a check for rcu_scheduler_fully_active in __rcu_pending() to suppress any pre-scheduler calls to set_need_resched(). The downside of this approach is additional runtime overhead in a reasonably hot code path. Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2011-09-28 21:38:37 -07:00
Paul E. McKenney	4627e240df	rcu: Dump local stack if cannot dump all CPUs' stacks The trigger_all_cpu_backtrace() function is a no-op in architectures that do not define arch_trigger_all_cpu_backtrace. On such architectures, RCU CPU stall warning messages contain no stack trace information, which makes debugging quite difficult. This commit therefore substitutes dump_stack() for architectures that do not define arch_trigger_all_cpu_backtrace, so that at least the local CPU's stack is dumped as part of the RCU CPU stall warning message. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2011-09-28 21:38:36 -07:00

1 2 3 4 5 ...

12402 Commits