linux

iv/linux

Author	SHA1	Message	Date
Peter Zijlstra	2667de81f3	perf_counter: Allow for a wakeup watermark Currently we wake the mmap() consumer once every PAGE_SIZE of data and/or once event wakeup_events when specified. For high speed sampling this results in too many wakeups wrt. the buffer size, hence change this. We move the default wakeup limit to 1/4-th the buffer size, and provide for means to manually specify this limit. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-09-17 22:08:26 +02:00
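The wakeup policy in the commit above can be modeled roughly as follows. This is a simplified sketch, not the perf_counter ring-buffer code. The struct and function names (`ring_model`, `effective_watermark`, `should_wake`) are hypothetical. The sketch assumes that a watermark of 0 means "use the default of one quarter of the buffer".

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of an mmap() ring buffer's wakeup state. */
struct ring_model {
	size_t size;          /* total data buffer size in bytes */
	size_t watermark;     /* bytes between wakeups; 0 = use the default */
	size_t since_wakeup;  /* bytes written since the consumer was last woken */
};

/* Default wakeup limit: one quarter of the buffer size, per the commit text. */
static size_t effective_watermark(const struct ring_model *rb)
{
	return rb->watermark ? rb->watermark : rb->size / 4;
}

/*
 * Account for 'len' newly written bytes and report whether the
 * consumer should be woken. The byte counter restarts after each wakeup.
 */
static bool should_wake(struct ring_model *rb, size_t len)
{
	rb->since_wakeup += len;
	if (rb->since_wakeup >= effective_watermark(rb)) {
		rb->since_wakeup = 0;
		return true;
	}
	return false;
}
```

Take a 64 KiB buffer and assume 4 KiB pages. Under the default, the consumer is woken once per 16 KiB of data instead of once per 4 KiB page.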
Peter Zijlstra	850bc73ffc	perf_counter: Do not throttle single swcounter events We can have swcounter events that contribute more than a single count per event, when used with a non-zero period, those can generate multiple events, which is when we need throttling. However, swcounter that contribute only a single count per event can only come as fast as we can run code, hence don't throttle them. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-09-17 22:08:25 +02:00
Li Zefan	5dd4de587f	softirq: add BLOCK_IOPOLL to softirq_to_name With BLOCK_IOPOLL_SOFTIRQ added, softirq_to_name[] and show_softirq_name() needs to be updated. Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> LKML-Reference: <4AB20398.8070209@cn.fujitsu.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2009-09-17 15:53:44 -04:00
Steven Rostedt	b375a11a23	tracing: switch function prints from %pf to %ps For direct function pointers (like what mcount provides) PowerPC64 requires the use of %ps, otherwise nothing is printed. This patch converts all prints of functions retrieved through mcount to use the %ps format from the %pf. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2009-09-17 15:53:40 -04:00
Ingo Molnar	45bd00d31d	Merge branch 'linus' into tracing/core Merge reason: Pick up kernel/softirq.c update for dependent fix. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-09-17 20:53:10 +02:00
Peter Zijlstra	29cd8bae39	sched: Fix SD_POWERSAVING_BALANCE|SD_PREFER_LOCAL vs SD_WAKE_AFFINE The SD_POWERSAVING_BALANCE|SD_PREFER_LOCAL code can break out of the domain iteration early, making us miss the SD_WAKE_AFFINE bits. Fix this by continuing iteration until there is no need for a larger domain. This also cleans up the cgroup stuff a bit, but not having two update_shares() invocations. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-09-17 10:40:31 +02:00
Peter Zijlstra	de69a80be3	sched: Stop buddies from hogging the system Clear buddies more agressively. The (theoretical, haven't actually observed any of this) problem is that when we do not select either buddy in pick_next_entity() because they are too far ahead of the left-most task, we do not clear the buddies. This means that as soon as we service the left-most task, these same buddies will be tried again on the next schedule. Now if the left-most task was a pure hog, it wouldn't have done any wakeups and it wouldn't have set buddies of its own. That leads to the old buddies dominating, which would lead to bad latencies. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mike Galbraith <efault@gmx.de> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-09-17 10:40:30 +02:00
Peter Zijlstra	ad4b78bbcb	sched: Add new wakeup preemption mode: WAKEUP_RUNNING Create a new wakeup preemption mode, preempt towards tasks that run shorter on avg. It sets next buddy to be sure we actually run the task we preempted for. Test results: root@twins:~# while :; do :; done & [1] 6537 root@twins:~# while :; do :; done & [2] 6538 root@twins:~# while :; do :; done & [3] 6539 root@twins:~# while :; do :; done & [4] 6540 root@twins:/home/peter# ./latt -c4 sleep 4 Entries: 48 (clients=4) Averages: ------------------------------ Max 4750 usec Avg 497 usec Stdev 737 usec root@twins:/home/peter# echo WAKEUP_RUNNING > /debug/sched_features root@twins:/home/peter# ./latt -c4 sleep 4 Entries: 48 (clients=4) Averages: ------------------------------ Max 14 usec Avg 5 usec Stdev 3 usec Disabled by default - needs more testing. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Mike Galbraith <efault@gmx.de> Signed-off-by: Ingo Molnar <mingo@elte.hu> LKML-Reference: <new-submission>	2009-09-17 10:17:25 +02:00
Ingo Molnar	eb24073bc1	sched: Fix TASK_WAKING & loadaverage breakage Fix this: top - 21:54:00 up 2:59, 1 user, load average: 432512.33, 426421.74, 417432.74 Which happens because we now set TASK_WAKING before activate_task(). Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mike Galbraith <efault@gmx.de> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-09-17 09:51:20 +02:00
Linus Torvalds	ab86e5765d	Merge git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core-2.6: Driver Core: devtmpfs - kernel-maintained tmpfs-based /dev debugfs: Modify default debugfs directory for debugging pktcdvd. debugfs: Modified default dir of debugfs for debugging UHCI. debugfs: Change debugfs directory of IWMC3200 debugfs: Change debuhgfs directory of trace-events-sample.h debugfs: Fix mount directory of debugfs by default in events.txt hpilo: add poll f_op hpilo: add interrupt handler hpilo: staging for interrupt handling driver core: platform_device_add_data(): use kmemdup() Driver core: Add support for compatibility classes uio: add generic driver for PCI 2.3 devices driver-core: move dma-coherent.c from kernel to driver/base mem_class: fix bug mem_class: use minor as index instead of searching the array driver model: constify attribute groups UIO: remove 'default n' from Kconfig Driver core: Add accessor for device platform data Driver core: move dev_get/set_drvdata to drivers/base/dd.c Driver core: add new device to bus's list before probing	2009-09-16 08:27:10 -07:00
Linus Torvalds	6b7b352f21	Merge branch 'for-linus' of git://git.kernel.dk/linux-2.6-block * 'for-linus' of git://git.kernel.dk/linux-2.6-block: block: fix linkage problem with blk_iopoll and !CONFIG_BLOCK	2009-09-16 07:46:34 -07:00
Peter Zijlstra	5a9b86f647	sched: Rename flags to wake_flags For consistencies sake, rename the argument (again). Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-09-16 16:44:33 +02:00
Peter Zijlstra	5158f4e442	sched: Clean up the load_idx selection in select_task_rq_fair Clean up the code a little. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-09-16 16:44:32 +02:00
Peter Zijlstra	3b64089422	sched: Optimize cgroup vs wakeup a bit We don't need to call update_shares() for each domain we iterate, just got the largets one. However, we should call it before wake_affine() as well, so that that can use up-to-date values too. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-09-16 16:44:32 +02:00
Atsushi Tsuji	b36461da2a	tracing: Fix minor bugs for __unregister_ftrace_function_probe Fix the condition of strcmp for "*". Also fix NULL pointer dereference when glob is NULL. Signed-off-by: Atsushi Tsuji <a-tsuji@bk.jp.nec.com> LKML-Reference: <4AAF6726.5090905@bk.jp.nec.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2009-09-16 09:08:54 -04:00
Andi Kleen	6a46079cf5	HWPOISON: The high level memory error handler in the VM v7 Add the high level memory handler that poisons pages that got corrupted by hardware (typically by a two bit flip in a DIMM or a cache) on the Linux level. The goal is to prevent everyone from accessing these pages in the future. This done at the VM level by marking a page hwpoisoned and doing the appropriate action based on the type of page it is. The code that does this is portable and lives in mm/memory-failure.c To quote the overview comment: High level machine check handler. Handles pages reported by the hardware as being corrupted usually due to a 2bit ECC memory or cache failure. This focuses on pages detected as corrupted in the background. When the current CPU tries to consume corruption the currently running process can just be killed directly instead. This implies that if the error cannot be handled for some reason it's safe to just ignore it because no corruption has been consumed yet. Instead when that happens another machine check will happen. Handles page cache pages in various states. The tricky part here is that we can access any page asynchronous to other VM users, because memory failures could happen anytime and anywhere, possibly violating some of their assumptions. This is why this code has to be extremely careful. Generally it tries to use normal locking rules, as in get the standard locks, even if that means the error handling takes potentially a long time. Some of the operations here are somewhat inefficient and have non linear algorithmic complexity, because the data structures have not been optimized for this case. This is in particular the case for the mapping from a vma to a process. Since this case is expected to be rare we hope we can get away with this. There are in principle two strategies to kill processes on poison: - just unmap the data and wait for an actual reference before killing - kill as soon as corruption is detected. Both have advantages and disadvantages and should be used in different situations. Right now both are implemented and can be switched with a new sysctl vm.memory_failure_early_kill The default is early kill. The patch does some rmap data structure walking on its own to collect processes to kill. This is unusual because normally all rmap data structure knowledge is in rmap.c only. I put it here for now to keep everything together and rmap knowledge has been seeping out anyways Includes contributions from Johannes Weiner, Chris Mason, Fengguang Wu, Nick Piggin (who did a lot of great work) and others. Cc: npiggin@suse.de Cc: riel@redhat.com Signed-off-by: Andi Kleen <ak@linux.intel.com> Acked-by: Rik van Riel <riel@redhat.com> Reviewed-by: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>	2009-09-16 11:50:15 +02:00
Andi Kleen	4db96cf077	HWPOISON: Add PR_MCE_KILL prctl to control early kill behaviour per process This allows processes to override their early/late kill behaviour on hardware memory errors. Typically applications which are memory error aware is better of with early kill (see the error as soon as possible), all others with late kill (only see the error when the error is really impacting execution) There's a global sysctl, but this way an application can set its specific policy. We're using two bits, one to signify that the process stated its intention and that I also made the prctl future proof by enforcing the unused arguments are 0. The state is inherited to children. Note this makes us officially run out of process flags on 32bit, but the next patch can easily add another field. Manpage patch will be supplied separately. Signed-off-by: Andi Kleen <ak@linux.intel.com>	2009-09-16 11:50:14 +02:00
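The per-process override in the commit above can be sketched as a two-bit policy lookup. One bit records that the process stated a preference, and the other records what that preference is. When the process has stated nothing, the global vm.memory_failure_early_kill sysctl decides. The flag names and values (`MCE_PROCESS`, `MCE_EARLY`) and the function name are hypothetical stand-ins, not the kernel's definitions.

```c
#include <stdbool.h>

/* Hypothetical per-process flag bits (illustrative values only). */
#define MCE_PROCESS 0x1u  /* process has stated its own policy */
#define MCE_EARLY   0x2u  /* ...and that policy is early kill */

/*
 * Decide whether a process should be killed early on a hardware
 * memory error. A per-process choice wins; otherwise use the sysctl.
 * MCE_EARLY is ignored unless MCE_PROCESS is also set.
 */
static bool wants_early_kill(unsigned int flags, bool sysctl_early_kill)
{
	if (flags & MCE_PROCESS)
		return (flags & MCE_EARLY) != 0;
	return sysctl_early_kill;
}
```

Two bits are needed instead of one so that "the process chose late kill" can be distinguished from "the process chose nothing and follows the global default".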
Ingo Molnar	51e0304ce6	sched: Implement a gentler fair-sleepers feature Add back FAIR_SLEEPERS and GENTLE_FAIR_SLEEPERS. FAIR_SLEEPERS is the old logic: credit sleepers with their sleep time. GENTLE_FAIR_SLEEPERS dampens this a bit: 50% of their sleep time gets credited. The hope here is to still give the benefits of fair-sleepers logic (quick wakeups, etc.) while not allow them to have 100% of their sleep time as if they were running. Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mike Galbraith <efault@gmx.de> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-09-16 09:05:20 +02:00
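The following is a simplified model of how a woken sleeper's virtual runtime might be placed under FAIR_SLEEPERS and GENTLE_FAIR_SLEEPERS. It assumes the general shape of CFS sleeper placement:

- The sleeper is credited with up to one scheduling-latency period below the queue's minimum vruntime.
- That credit is halved under GENTLE_FAIR_SLEEPERS.
- The task is never placed behind where it already was.

The function and parameter names are hypothetical, and weight normalization is omitted.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical sketch: choose a woken task's vruntime.
 * min_vruntime: the run queue's current minimum vruntime
 * old_vruntime: the task's vruntime when it went to sleep
 * latency:      maximum sleeper credit (one latency period)
 */
static uint64_t place_sleeper(uint64_t min_vruntime, uint64_t old_vruntime,
			      uint64_t latency, bool fair_sleepers, bool gentle)
{
	uint64_t vruntime = min_vruntime;

	if (fair_sleepers) {
		/* GENTLE_FAIR_SLEEPERS halves the credit. */
		uint64_t credit = gentle ? latency / 2 : latency;
		/* Avoid unsigned underflow when min_vruntime is small. */
		vruntime = vruntime > credit ? vruntime - credit : 0;
	}

	/* A sleeper never gains time by being placed behind where it was. */
	return vruntime > old_vruntime ? vruntime : old_vruntime;
}
```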
Peter Zijlstra	59abf02644	sched: Add SD_PREFER_LOCAL And turn it on for NUMA and MC domains. This improves locality in balancing decisions by keeping up to capacity amount of tasks local before looking for idle CPUs. (and twice the capacity if SD_POWERSAVINGS_BALANCE is set.) Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-09-16 08:42:40 +02:00
Jens Axboe	cb684b5bcd	block: fix linkage problem with blk_iopoll and !CONFIG_BLOCK kernel/built-in.o:(.data+0x17b0): undefined reference to `blk_iopoll_enabled' Since the extern declaration makes the compile work, but the actual symbol is missing when block/blk-iopoll.o isn't linked in. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-09-15 21:53:11 +02:00
Peter Zijlstra	e69b0f1b41	sched: Add a few SYNC hint knobs to play with Currently we use overlap to weaken the SYNC hint, but allow it to set the hint as well. echo NO_SYNC_WAKEUP > /debug/sched_features echo SYNC_MORE > /debug/sched_features preserves pipe-test behaviour without using the WF_SYNC hint. Worth playing with on more workloads... Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-09-15 19:47:23 +02:00
Peter Zijlstra	63859d4fe4	sched: Fix sync wakeups again The sync argument rename to introduce WF_* broke stuff by missing a local alias for an argument in __wake_up_common, fix it by using the more descriptive wake_flags name. This restores WF_SYNC propagation, which fixes wake_affine() behaviour, which fixes pipe-test. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-09-15 19:47:22 +02:00
Linus Torvalds	723e9db7a4	Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc * 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: (134 commits) powerpc/nvram: Enable use Generic NVRAM driver for different size chips powerpc/iseries: Fix oops reading from /proc/iSeries/mf/*/cmdline powerpc/ps3: Workaround for flash memory I/O error powerpc/booke: Don't set DABR on 64-bit BookE, use DAC1 instead powerpc/perf_counters: Reduce stack usage of power_check_constraints powerpc: Fix bug where perf_counters breaks oprofile powerpc/85xx: Fix SMP compile error and allow NULL for smp_ops powerpc/irq: Improve nanodoc powerpc: Fix some late PowerMac G5 with PCIe ATI graphics powerpc/fsl-booke: Use HW PTE format if CONFIG_PTE_64BIT powerpc/book3e: Add missing page sizes powerpc/pseries: Fix to handle slb resize across migration powerpc/powermac: Thermal control turns system off too eagerly powerpc/pci: Merge ppc32 and ppc64 versions of phb_scan() powerpc/405ex: support cuImage via included dtb powerpc/405ex: provide necessary fixup function to support cuImage powerpc/40x: Add support for the ESTeem 195E (PPC405EP) SBC powerpc/44x: Add Eiger AMCC (AppliedMicro) PPC460SX evaluation board support. powerpc/44x: Update Arches defconfig powerpc/44x: Update Arches dts ... Fix up conflicts in drivers/char/agp/uninorth-agp.c	2009-09-15 09:51:09 -07:00
Ming Lei	a56af87648	driver-core: move dma-coherent.c from kernel to driver/base Placing dma-coherent.c in driver/base is better than in kernel, since it contains code to do per-device coherent dma memory handling. Signed-off-by: Ming Lei <tom.leiming@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2009-09-15 09:50:47 -07:00
Linus Torvalds	ada3fa1505	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu: (46 commits) powerpc64: convert to dynamic percpu allocator sparc64: use embedding percpu first chunk allocator percpu: kill lpage first chunk allocator x86,percpu: use embedding for 64bit NUMA and page for 32bit NUMA percpu: update embedding first chunk allocator to handle sparse units percpu: use group information to allocate vmap areas sparsely vmalloc: implement pcpu_get_vm_areas() vmalloc: separate out insert_vmalloc_vm() percpu: add chunk->base_addr percpu: add pcpu_unit_offsets[] percpu: introduce pcpu_alloc_info and pcpu_group_info percpu: move pcpu_lpage_build_unit_map() and pcpul_lpage_dump_cfg() upward percpu: add @align to pcpu_fc_alloc_fn_t percpu: make @dyn_size mandatory for pcpu_setup_first_chunk() percpu: drop @static_size from first chunk allocators percpu: generalize first chunk allocator selection percpu: build first chunk allocators selectively percpu: rename 4k first chunk allocator to page percpu: improve boot messages percpu: fix pcpu_reclaim() locking ... Fix trivial conflict as by Tejun Heo in kernel/sched.c	2009-09-15 09:39:44 -07:00
Linus Torvalds	f199fd9906	Merge branch 'perfcounters-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'perfcounters-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: perf_counter: Fix buffer overflow in perf_copy_attr()	2009-09-15 09:34:27 -07:00
Steven Rostedt	6ca6cca31d	tracing: optimize global_trace_clock cachelines The prev_trace_clock_time is only read or written to when the trace_clock_lock is taken. For better perfomance, they should share the same cache line. Reported-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2009-09-15 12:24:22 -04:00
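The layout idea in the commit above can be shown in portable C11. The lock and the value it protects live together in one cache-line-aligned object, so taking the lock and then reading or writing prev_time touches a single line.

This is a sketch only. The 64-byte line size is an assumption, `lock_model_t` is a hypothetical stand-in for a real spinlock, and the struct name is illustrative.

```c
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

#define CACHELINE 64  /* assumed cache-line size in bytes */

/* Hypothetical stand-in for a spinlock word. */
typedef struct { volatile int locked; } lock_model_t;

/*
 * Aligning the first member forces the whole struct onto a
 * cache-line boundary; both fields fit in that one line.
 */
struct clock_state {
	alignas(CACHELINE) uint64_t prev_time;
	lock_model_t lock;
};

static struct clock_state global_clock;
```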
Linus Torvalds	227423904c	Merge branch 'x86-pat-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-pat-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86, pat: Fix cacheflush address in change_page_attr_set_clr() mm: remove !NUMA condition from PAGEFLAGS_EXTENDED condition set x86: Fix earlyprintk=dbgp for machines without NX x86, pat: Sanity check remap_pfn_range for RAM region x86, pat: Lookup the protection from memtype list on vm_insert_pfn() x86, pat: Add lookup_memtype to get the current memtype of a paddr x86, pat: Use page flags to track memtypes of RAM pages x86, pat: Generalize the use of page flag PG_uncached x86, pat: Add rbtree to do quick lookup in memtype tracking x86, pat: Add PAT reserve free to io_mapping* APIs x86, pat: New i/f for driver to request memtype for IO regions x86, pat: ioremap to follow same PAT restrictions as other PAT users x86, pat: Keep identity maps consistent with mmaps even when pat_disabled x86, mtrr: make mtrr_aps_delayed_init static bool x86, pat/mtrr: Rendezvous all the cpus for MTRR/PAT init generic-ipi: Allow cpus not yet online to call smp_call_function with irqs disabled x86: Fix an incorrect argument of reserve_bootmem() x86: Fix system crash when loading with "reservetop" parameter	2009-09-15 09:19:38 -07:00
Linus Torvalds	1aaf2e5913	Merge branch 'x86-txt-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-txt-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86, intel_txt: clean up the impact on generic code, unbreak non-x86 x86, intel_txt: Handle ACPI_SLEEP without X86_TRAMPOLINE x86, intel_txt: Fix typos in Kconfig help x86, intel_txt: Factor out the code for S3 setup x86, intel_txt: tboot.c needs <asm/fixmap.h> intel_txt: Force IOMMU on for Intel TXT launch x86, intel_txt: Intel TXT Sx shutdown support x86, intel_txt: Intel TXT reboot/halt shutdown support x86, intel_txt: Intel TXT boot support	2009-09-15 09:19:20 -07:00
Ashwin Chaugule	7403f41f19	hrtimer: Eliminate needless reprogramming of clock events device On NOHZ systems the following timers, - tick_nohz_restart_sched_tick (tick_sched_timer) - hrtimer_start (tick_sched_timer) are reprogramming the clock events device far more often than needed. No specific test case was required to observe this effect. This occurres because there was no check to see if the currently removed or restarted hrtimer was: 1) the one which previously armed the clock events device. 2) going to be replaced by another timer which has the same expiry time. Avoid the reprogramming in hrtimer_force_reprogram when the new expiry value which is evaluated from the clock bases is equal to cpu_base->expires_next. This results in faster application startup time by ~4%. [ tglx: simplified initial solution ] Signed-off-by: Ashwin Chaugule <ashwinc@quicinc.com> LKML-Reference: <4AA00165.90609@codeaurora.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2009-09-15 17:09:44 +02:00
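The check described in the commit above can be modeled roughly as follows. The earliest expiry across the clock bases is recomputed, and the clock events device is reprogrammed only if that value differs from what it is already armed for.

This is a hypothetical model rather than the hrtimer code. The struct, the two-base count, and the reprogram counter (which stands in for writing the hardware) are all illustrative.

```c
#include <stdbool.h>
#include <stdint.h>

#define NBASES 2  /* hypothetical number of clock bases */

/* Hypothetical per-CPU timer state. */
struct cpu_timer_model {
	int64_t base_first_expiry[NBASES]; /* earliest expiry per clock base */
	int64_t expires_next;              /* time the device is armed for */
	unsigned int reprograms;           /* count of device writes */
};

/* Recompute the earliest expiry; touch the device only if it changed. */
static bool force_reprogram(struct cpu_timer_model *cb)
{
	int64_t next = INT64_MAX;

	for (int i = 0; i < NBASES; i++)
		if (cb->base_first_expiry[i] < next)
			next = cb->base_first_expiry[i];

	if (next == cb->expires_next)
		return false;  /* device already armed for this time: skip */

	cb->expires_next = next;
	cb->reprograms++;  /* stands in for programming the hardware */
	return true;
}
```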
Peter Zijlstra	a7558e0105	sched: Add WF_FORK Avoid the cache buddies from biasing the time distribution away from fork()ers. Normally the next buddy will be the preferred scheduling target, but this makes fork()s prefer to run the new child, whereas we prefer to run the parent, since that will generate more work. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-09-15 16:51:31 +02:00
Peter Zijlstra	7d47872146	sched: Rename sync arguments In order to extend the functions to have more than 1 flag (sync), rename the argument to flags, and explicitly define a WF_ space for individual flags. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-09-15 16:51:30 +02:00
Peter Zijlstra	0763a660a8	sched: Rename select_task_rq() argument In order to be able to rename the sync argument, we need to rename the current flag argument. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-09-15 16:51:29 +02:00
Peter Zijlstra	8e6598af3f	sched: Feature to disable APERF/MPERF cpu_power I suspect a feed-back loop between cpuidle and the aperf/mperf cpu_power bits, where when we have idle C-states lower the ratio, which leads to lower cpu_power and then less load, which generates more idle time, etc.. Put in a knob to disable it. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-09-15 16:51:28 +02:00
Peter Zijlstra	d6a59aa3a2	sched: Provide arch_scale_freq_power Provide an ach specific hook for cpufreq based scaling of cpu_power. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> [ego@in.ibm.com: spotting bugs] LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-09-15 16:51:24 +02:00
Mike Galbraith	0ec9fab3d1	sched: Improve latencies and throughput Make the idle balancer more agressive, to improve a x264 encoding workload provided by Jason Garrett-Glaser: NEXT_BUDDY NO_LB_BIAS encoded 600 frames, 252.82 fps, 22096.60 kb/s encoded 600 frames, 250.69 fps, 22096.60 kb/s encoded 600 frames, 245.76 fps, 22096.60 kb/s NO_NEXT_BUDDY LB_BIAS encoded 600 frames, 344.44 fps, 22096.60 kb/s encoded 600 frames, 346.66 fps, 22096.60 kb/s encoded 600 frames, 352.59 fps, 22096.60 kb/s NO_NEXT_BUDDY NO_LB_BIAS encoded 600 frames, 425.75 fps, 22096.60 kb/s encoded 600 frames, 425.45 fps, 22096.60 kb/s encoded 600 frames, 422.49 fps, 22096.60 kb/s Peter pointed out that this is better done via newidle_idx, not via LB_BIAS, newidle balancing should look for where there is load _now_, not where there was load 2 ticks ago. Worst-case latencies are improved as well as no buddies means less vruntime spread. (as per prior lkml discussions) This change improves kbuild-peak parallelism as well. Reported-by: Jason Garrett-Glaser <darkshikari@gmail.com> Signed-off-by: Mike Galbraith <efault@gmx.de> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <1253011667.9128.16.camel@marge.simson.net> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-09-15 16:51:16 +02:00
Peter Zijlstra	78e7ed53c9	sched: Tweak wake_idx When merging select_task_rq_fair() and sched_balance_self() we lost the use of wake_idx, restore that and set them to 0 to make wake balancing more aggressive. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-09-15 16:01:07 +02:00
Peter Zijlstra	d7c33c4930	sched: Fix task affinity for select_task_rq_fair While merging select_task_rq_fair() and sched_balance_self() I made a mistake that leads to testing the wrong task affinty. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-09-15 16:01:07 +02:00
Peter Zijlstra	83f54960c1	sched: for_each_domain() vs RCU for_each_domain() uses RCU to serialize the sched_domains, except it doesn't actually use rcu_read_lock() and instead relies on disabling preemption -> FAIL. XXX: audit other sched_domain code. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-09-15 16:01:06 +02:00
Peter Zijlstra	ae154be1f3	sched: Weaken SD_POWERSAVINGS_BALANCE One of the problems of power-saving balancing is that under certain scenarios it is too slow and allows tons of real work to pile up. Avoid this by ignoring the powersave stuff when there's real work to be done. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-09-15 16:01:06 +02:00
Peter Zijlstra	c88d591089	sched: Merge select_task_rq_fair() and sched_balance_self() The problem with wake_idle() is that is doesn't respect things like cpu_power, which means it doesn't deal well with SMT nor the recent RT interaction. To cure this, it needs to do what sched_balance_self() does, which leads to the possibility of merging select_task_rq_fair() and sched_balance_self(). Modify sched_balance_self() to: - update_shares() when walking up the domain tree, (it only called it for the top domain, but it should have done this anyway), which allows us to remove this ugly bit from try_to_wake_up(). - do wake_affine() on the smallest domain that contains both this (the waking) and the prev (the wakee) cpu for WAKE invocations. Then use the top-down balance steps it had to replace wake_idle(). This leads to the dissapearance of SD_WAKE_BALANCE and SD_WAKE_IDLE_FAR, with SD_WAKE_IDLE replaced with SD_BALANCE_WAKE. SD_WAKE_AFFINE needs SD_BALANCE_WAKE to be effective. Touch all topology bits to replace the old with new SD flags -- platforms might need re-tuning, enabling SD_BALANCE_WAKE conditionally on a NUMA distance seems like a good additional feature, magny-core and small nehalem systems would want this enabled, systems with slow interconnects would not. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-09-15 16:01:05 +02:00
Peter Zijlstra	e9c8431185	sched: Add TASK_WAKING We're going to want to drop rq->lock in try_to_wake_up() for a longer period of time, however we also want to deal with concurrent waking of the same task, which is currently handled by holding rq->lock. So introduce a new TASK state, namely TASK_WAKING, which indicates someone is already waking the task (other wakers will fail p->state & state). We also keep preemption disabled over the whole ttwu(). Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-09-15 16:01:05 +02:00
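The "other wakers will fail p->state & state" rule in the commit above can be illustrated with a small model. The first waker to find the task in a wakeable state marks it TASK_WAKING, and any later waker sees a state outside its mask and gives up.

Note the substituted technique. The commit relies on rq->lock around the check, but this sketch uses a C11 compare-and-swap so it is self-contained. The state values and names are hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical task-state bits (illustrative values only). */
#define T_INTERRUPTIBLE   0x1u
#define T_UNINTERRUPTIBLE 0x2u
#define T_WAKING          0x100u

struct task_model {
	_Atomic unsigned int state;
};

/*
 * Claim the wakeup: succeed only if the task is in one of the states
 * in 'mask', and atomically mark it WAKING so that concurrent wakers
 * fail the (state & mask) test.
 */
static bool try_claim_wakeup(struct task_model *p, unsigned int mask)
{
	unsigned int cur = atomic_load(&p->state);

	while (cur & mask) {
		if (atomic_compare_exchange_weak(&p->state, &cur, T_WAKING))
			return true;
		/* A failed exchange reloads cur; loop re-tests the mask. */
	}
	return false;
}
```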
Peter Zijlstra	5f3edc1b1e	sched: Hook sched_balance_self() into sched_class::select_task_rq() Rather ugly patch to fully place the sched_balance_self() code inside the fair class. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-09-15 16:01:04 +02:00
Peter Zijlstra	aaee1203ca	sched: Move sched_balance_self() into sched_fair.c Move the sched_balance_self() code into sched_fair.c This facilitates the merger of sched_balance_self() and sched_fair::select_task_rq(). Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-09-15 16:01:04 +02:00
Peter Zijlstra	f5f08f39ee	sched: Move code around In preparation to other code movement, move weighted_cpuload(), source_load() and target_load() before the class includes. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-09-15 16:01:03 +02:00
Peter Zijlstra	e26af0e8b2	sched: Add come comments to the sched features Add text... Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-09-15 16:01:03 +02:00
Mike Galbraith	3cb63d527f	sched: Complete buddy switches Add a NEXT_BUDDY feature flag to aid in debugging. Signed-off-by: Mike Galbraith <efault@gmx.de> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-09-15 16:01:02 +02:00
Peter Zijlstra	e6b1b2c9c0	sched: Split WAKEUP_OVERLAP It consists of two conditions, split them out in separate toggles so we can test them independently. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-09-15 16:01:02 +02:00
Peter Zijlstra	b78bb868c5	sched: Fix double_rq_lock() compile warning Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-09-15 16:01:01 +02:00
Thomas Gleixner	12e09337fe	time: Prevent 32 bit overflow with set_normalized_timespec() set_normalized_timespec() nsec argument is of type long. The recent timekeeping changes of ktime_get_ts() feed ts->tv_nsec + tomono.tv_nsec + nsecs to set_normalized_timespec(). On 32 bit machines that sum can be larger than (1 << 31) and therefor result in a negative value which screws up the result completely. Make the nsec argument of set_normalized_timespec() s64 to fix the problem at hand. This also prevents similar problems for future users of set_normalized_timespec(). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Carsten Emde <carsten.emde@osadl.org> LKML-Reference: <new-submission> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: John Stultz <johnstul@us.ibm.com>	2009-09-15 10:17:30 +02:00
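The overflow in the commit above and the effect of widening the argument can be shown in a userspace sketch. Three nanosecond fields of up to 999,999,999 each can sum to almost 3e9. That exceeds the 32-bit signed maximum of 2,147,483,647, but fits easily in 64 bits.

The names `ts_model` and `normalize_timespec` are hypothetical, and `int64_t` stands in for the kernel's `s64`.

```c
#include <stdint.h>

#define NSEC_PER_SEC 1000000000LL

/* Hypothetical timespec-like pair with 64-bit fields. */
struct ts_model {
	int64_t tv_sec;
	int64_t tv_nsec;
};

/*
 * Normalize (sec, nsec) so that 0 <= tv_nsec < NSEC_PER_SEC.
 * Because nsec is 64-bit, a sum of several nanosecond fields
 * arrives intact instead of wrapping negative as a 32-bit long would.
 */
static void normalize_timespec(struct ts_model *ts, int64_t sec, int64_t nsec)
{
	while (nsec >= NSEC_PER_SEC) {
		nsec -= NSEC_PER_SEC;
		++sec;
	}
	while (nsec < 0) {
		nsec += NSEC_PER_SEC;
		--sec;
	}
	ts->tv_sec = sec;
	ts->tv_nsec = nsec;
}
```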


8288 Commits