Hannes Frederic Sowa
c4b2c0c5f6 static_key: WARN on usage before jump_label_init was called
The static key primitives that toggle a branch must not be used before
jump_label_init() is called from init/main.c. jump_label_init()
reorganizes and wires up the jump_entries, so usage before that point
could have unforeseen consequences.

The following primitives are now checked for correct use:
* static_key_slow_inc
* static_key_slow_dec
* static_key_slow_dec_deferred
* jump_label_rate_limit

The x86 architecture already checks for this by testing whether the
default_nop was already replaced with an optimal nop or with a branch
instruction, and panics if so. Other architectures don't check for this.

Because we need to relax this check on x86 to allow code to transition
from default_nop to the enabled state, and because other architectures
did not check for this at all, this patch introduces the check in the
static_key primitives themselves, in an arch-independent manner.

All checked functions are considered slow-path so the additional check
does no harm to performance.

The warnings are best observed with earlyprintk.
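
A minimal sketch of the kind of guard this adds, in kernel style; the
flag and macro names here are illustrative assumptions, not necessarily
the exact symbols introduced by the patch:

  /* set at the end of jump_label_init(); illustrative name */
  static bool jump_label_up;

  #define STATIC_KEY_CHECK_USE() \
          WARN(!jump_label_up, "%s used before call to jump_label_init\n", \
               __func__)

  void static_key_slow_inc(struct static_key *key)
  {
          STATIC_KEY_CHECK_USE();
          /* ... existing slow-path enable logic unchanged ... */
  }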

Based on a patch from Andi Kleen.

Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Andi Kleen <andi@firstfloor.org>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-19 19:45:35 -04:00
Greg Kroah-Hartman
a7204d72db Merge 3.12-rc6 into driver-core-next
We want these fixes here too.

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-10-19 13:05:38 -07:00
Namhyung Kim
29ad23b004 ftrace: Add set_graph_notrace filter
The set_graph_notrace filter is analogous to set_ftrace_notrace and
can be used to eliminate uninteresting parts of the function graph
trace output.  It also works nicely with set_graph_function.

  # cd /sys/kernel/debug/tracing/
  # echo do_page_fault > set_graph_function
  # perf ftrace live true
   2)               |  do_page_fault() {
   2)               |    __do_page_fault() {
   2)   0.381 us    |      down_read_trylock();
   2)   0.055 us    |      __might_sleep();
   2)   0.696 us    |      find_vma();
   2)               |      handle_mm_fault() {
   2)               |        handle_pte_fault() {
   2)               |          __do_fault() {
   2)               |            filemap_fault() {
   2)               |              find_get_page() {
   2)   0.033 us    |                __rcu_read_lock();
   2)   0.035 us    |                __rcu_read_unlock();
   2)   1.696 us    |              }
   2)   0.031 us    |              __might_sleep();
   2)   2.831 us    |            }
   2)               |            _raw_spin_lock() {
   2)   0.046 us    |              add_preempt_count();
   2)   0.841 us    |            }
   2)   0.033 us    |            page_add_file_rmap();
   2)               |            _raw_spin_unlock() {
   2)   0.057 us    |              sub_preempt_count();
   2)   0.568 us    |            }
   2)               |            unlock_page() {
   2)   0.084 us    |              page_waitqueue();
   2)   0.126 us    |              __wake_up_bit();
   2)   1.117 us    |            }
   2)   7.729 us    |          }
   2)   8.397 us    |        }
   2)   8.956 us    |      }
   2)   0.085 us    |      up_read();
   2) + 12.745 us   |    }
   2) + 13.401 us   |  }
  ...

  # echo handle_mm_fault > set_graph_notrace
  # perf ftrace live true
   1)               |  do_page_fault() {
   1)               |    __do_page_fault() {
   1)   0.205 us    |      down_read_trylock();
   1)   0.041 us    |      __might_sleep();
   1)   0.344 us    |      find_vma();
   1)   0.069 us    |      up_read();
   1)   4.692 us    |    }
   1)   5.311 us    |  }
  ...

Link: http://lkml.kernel.org/r/1381739066-7531-5-git-send-email-namhyung@kernel.org

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-10-18 22:23:16 -04:00
Namhyung Kim
6a10108bdb ftrace: Narrow down the protected area of graph_lock
The parser setup is just a generic utility that uses local variables
allocated by the function. There's no need to hold the graph_lock for
this setup.

This also makes the code simpler.

Link: http://lkml.kernel.org/r/1381739066-7531-4-git-send-email-namhyung@kernel.org

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-10-18 22:20:33 -04:00
Namhyung Kim
faf982a60f ftrace: Introduce struct ftrace_graph_data
The struct ftrace_graph_data generalizes access to the
set_graph_function file.  This is a preparation for adding support for
set_graph_notrace.

Link: http://lkml.kernel.org/r/1381739066-7531-3-git-send-email-namhyung@kernel.org

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-10-18 22:17:51 -04:00
Namhyung Kim
9aa72b4bf8 ftrace: Get rid of ftrace_graph_filter_enabled
The ftrace_graph_filter_enabled flag means that the user has set a
function filter, which always carries the same meaning as
ftrace_graph_count > 0.

Link: http://lkml.kernel.org/r/1381739066-7531-2-git-send-email-namhyung@kernel.org

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-10-18 22:15:25 -04:00
Steven Rostedt
057db8488b tracing: Fix potential out-of-bounds in trace_get_user()
Andrey reported the following AddressSanitizer report:

ERROR: AddressSanitizer: heap-buffer-overflow on address ffff8800359c99f3
ffff8800359c99f3 is located 0 bytes to the right of 243-byte region [ffff8800359c9900, ffff8800359c99f3)
Accessed by thread T13003:
  #0 ffffffff810dd2da (asan_report_error+0x32a/0x440)
  #1 ffffffff810dc6b0 (asan_check_region+0x30/0x40)
  #2 ffffffff810dd4d3 (__tsan_write1+0x13/0x20)
  #3 ffffffff811cd19e (ftrace_regex_release+0x1be/0x260)
  #4 ffffffff812a1065 (__fput+0x155/0x360)
  #5 ffffffff812a12de (____fput+0x1e/0x30)
  #6 ffffffff8111708d (task_work_run+0x10d/0x140)
  #7 ffffffff810ea043 (do_exit+0x433/0x11f0)
  #8 ffffffff810eaee4 (do_group_exit+0x84/0x130)
  #9 ffffffff810eafb1 (SyS_exit_group+0x21/0x30)
  #10 ffffffff81928782 (system_call_fastpath+0x16/0x1b)

Allocated by thread T5167:
  #0 ffffffff810dc778 (asan_slab_alloc+0x48/0xc0)
  #1 ffffffff8128337c (__kmalloc+0xbc/0x500)
  #2 ffffffff811d9d54 (trace_parser_get_init+0x34/0x90)
  #3 ffffffff811cd7b3 (ftrace_regex_open+0x83/0x2e0)
  #4 ffffffff811cda7d (ftrace_filter_open+0x2d/0x40)
  #5 ffffffff8129b4ff (do_dentry_open+0x32f/0x430)
  #6 ffffffff8129b668 (finish_open+0x68/0xa0)
  #7 ffffffff812b66ac (do_last+0xb8c/0x1710)
  #8 ffffffff812b7350 (path_openat+0x120/0xb50)
  #9 ffffffff812b8884 (do_filp_open+0x54/0xb0)
  #10 ffffffff8129d36c (do_sys_open+0x1ac/0x2c0)
  #11 ffffffff8129d4b7 (SyS_open+0x37/0x50)
  #12 ffffffff81928782 (system_call_fastpath+0x16/0x1b)

Shadow bytes around the buggy address:
  ffff8800359c9700: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  ffff8800359c9780: fd fd fd fd fd fd fd fd fa fa fa fa fa fa fa fa
  ffff8800359c9800: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  ffff8800359c9880: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  ffff8800359c9900: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
=>ffff8800359c9980: 00 00 00 00 00 00 00 00 00 00 00 00 00 00[03]fb
  ffff8800359c9a00: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  ffff8800359c9a80: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  ffff8800359c9b00: fa fa fa fa fa fa fa fa 00 00 00 00 00 00 00 00
  ffff8800359c9b80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  ffff8800359c9c00: 00 00 00 00 00 00 00 00 fa fa fa fa fa fa fa fa
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07
  Heap redzone:          fa
  Heap kmalloc redzone:  fb
  Freed heap region:     fd
  Shadow gap:            fe

The out-of-bounds access happens on 'parser->buffer[parser->idx] = 0;'

Although the crash surfaced in ftrace_regex_release() (see the stack
trace above), the real bug is in trace_get_user(), where parser->idx is
incremented without a check against the size. The way it is triggered:
if userspace sends in 128 characters (EVENT_BUF_SIZE + 1), the loop
that reads the last character stores it and then breaks out because
there are no more characters. Then the last character is read to
determine what to do next, and the index is incremented without
checking the size.

The caller of trace_get_user() then usually null-terminates the buffer,
but since the index is equal to the size, it writes the NUL byte just
past the allocated space, which can corrupt memory.

Luckily, only the root user has write access to this file.
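
A sketch of the kind of bounds check the fix needs in trace_get_user()
(illustrative, not necessarily the exact diff): refuse to advance
parser->idx past the buffer so the caller's terminating write stays in
bounds.

  if (parser->idx < parser->size - 1)
          parser->buffer[parser->idx++] = ch;
  else {
          ret = -EINVAL;
          goto out;
  }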

Link: http://lkml.kernel.org/r/20131009222323.04fd1a0d@gandalf.local.home

Reported-by: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-10-18 21:02:56 -04:00
Patrick Palka
891292a767 time: Fix signedness bug in sysfs_get_uname() and its callers
sysfs_get_uname() is erroneously declared as returning size_t even
though it may return a negative value, specifically -EINVAL.  Its
callers then check whether its return value is less than zero, which
can never be the case for an unsigned size_t.

This patch changes sysfs_get_uname() to return ssize_t and makes sure
its callers use ssize_t accordingly.
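
An illustrative before/after of the pattern being fixed (the parameter
list here is an assumption for the sketch):

  /* before: callers compare an unsigned return against zero */
  size_t sysfs_get_uname(const char *buf, size_t cnt, char *dst);

  /* after: a signed return lets the error check actually fire */
  ssize_t sysfs_get_uname(const char *buf, size_t cnt, char *dst);

  ssize_t ret = sysfs_get_uname(buf, cnt, clockname);
  if (ret < 0)    /* was always false while ret was a size_t */
          return ret;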

Signed-off-by: Patrick Palka <patrick@parcs.ath.cx>
[jstultz: Didn't apply cleanly, as a similar partial fix was also applied
so had to resolve the collisions]
Signed-off-by: John Stultz <john.stultz@linaro.org>
2013-10-18 16:45:58 -07:00
Xie XiuQi
b7bc50e451 timekeeping: Fix some trivial typos in comments
Fix some typos in timekeeping comments.

Signed-off-by: Xie XiuQi <xiexiuqi@huawei.com>
[jstultz: Commit message tweaks]
Signed-off-by: John Stultz <john.stultz@linaro.org>
2013-10-18 16:30:17 -07:00
KOSAKI Motohiro
98d6f4dd84 alarmtimer: return EINVAL instead of ENOTSUPP if rtcdev doesn't exist
Fedora Ruby maintainer reported latest Ruby doesn't work on Fedora Rawhide
on ARM. (http://bugs.ruby-lang.org/issues/9008)

Commit 1c6b39ad3f (alarmtimers: Return -ENOTSUPP if no
RTC device is present) introduced returning ENOTSUPP when
clock_get{time,res} can't find an RTC device. However, this is incorrect.

First, ENOTSUPP isn't exported to userland (ENOTSUP and EOPNOTSUPP are
the closest userland equivalents).

Second, POSIX and the Linux man pages agree that clock_gettime and
clock_getres should return EINVAL if the clk_id argument is invalid.
While the argument could be made that the clockid is valid but just not
supported on this hardware, that is a technicality that doesn't help
userspace applications, and only complicates error handling.

Thus, this patch changes the code to use EINVAL.
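
The substance of the change is essentially a one-liner in the
alarm_clock_get{time,res} paths; a sketch:

  if (!alarmtimer_get_rtcdev())
          return -EINVAL;         /* was -ENOTSUPP, invisible to userland */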

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: stable <stable@vger.kernel.org>  #3.0 and up
Reported-by: Vit Ondruch <v.ondruch@tiscali.cz>
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
[jstultz: Tweaks to commit message to include full rational]
Signed-off-by: John Stultz <john.stultz@linaro.org>
2013-10-18 16:23:58 -07:00
Rafael J. Wysocki
7bc9b1cffc PM / Hibernate: Use bool for boolean fields of struct snapshot_data
The snapshot_data structure used internally by the hibernate user
space interface code in user.c has three char fields that are used
to store boolean values.  Change their data type to bool and use
true and false instead of 1 and 0, respectively, in assignments
involving those fields.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-10-18 22:20:40 +02:00
Tetsuo Handa
b0267507df mutex: Avoid gcc version dependent __builtin_constant_p() usage
Commit 040a0a37 ("mutex: Add support for wound/wait style locks")
used "!__builtin_constant_p(p == NULL)", but gcc 3.x cannot
handle such an expression correctly, leading to boot failure when
built with CONFIG_DEBUG_MUTEXES=y.

Fix it by explicitly passing a bool which tells whether p != NULL
or not.
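
A sketch of the shape of the fix, simplified from the description (the
exact parameter list is an assumption): the caller, which knows at the
call site whether a wound/wait context is in play, passes an explicit
bool instead of relying on __builtin_constant_p():

  /* caller passes use_ww_ctx, i.e. "ww_ctx != NULL" at the call site */
  __mutex_lock_common(lock, state, subclass, nest_lock, ip,
                      ww_ctx, use_ww_ctx);

  /* ...and inside __mutex_lock_common() the tests become: */
  if (use_ww_ctx) {
          /* wound/wait specific handling */
  }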

[ PeterZ: This is a sad patch, but provided it actually generates
          similar code I suppose it's the best we can do bar wholesale
          deprecating gcc-3. ]

Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Maarten Lankhorst <maarten.lankhorst@canonical.com>
Cc: peterz@infradead.org
Cc: imirkin@alum.mit.edu
Cc: daniel.vetter@ffwll.ch
Cc: robdclark@gmail.com
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: http://lkml.kernel.org/r/201310171945.AGB17114.FSQVtHOJFOOFML@I-love.SAKURA.ne.jp
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-10-18 21:58:54 +02:00
Xie XiuQi
1e4cfed127 timekeeping: Fix some trivial typos in comments
Signed-off-by: Xie XiuQi <xiexiuqi@huawei.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2013-10-18 14:50:02 +02:00
Xie XiuQi
f788e7bf05 irq: Fix some trivial typos in comments
Signed-off-by: Xie XiuQi <xiexiuqi@huawei.com>
[jkosina@suse.cz: fix 'explicitly', noticed by Randy Dunlap]
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2013-10-18 14:49:30 +02:00
Benoit Goby
70fea60d88 PM / Sleep: Detect device suspend/resume lockup and log event
Rather than hard-lock the kernel, dump the suspend/resume thread stack
and panic() to capture a message in pstore when a driver takes too long
to suspend/resume. The default suspend/resume watchdog timeout is set
to 12 seconds, longer than the usbhid 10 second timeout, but can be
changed at compile time.

Exclude from the watchdog the time spent waiting for children that
are resumed asynchronously, and time every device, whether or not it
resumed synchronously.

This patch is targeted at mobile devices, where a suspend/resume lockup
could cause a system reboot. Information about the failing device can be
retrieved in a subsequent boot session by mounting pstore and inspecting
the log. Laptops with EFI-enabled pstore could also benefit from
this feature.

The hardware watchdog timer is likely suspended during this time and
cannot be relied upon. The soft-lockup detector would eventually notice
that tasks are not being scheduled, but would provide little context as
to why. The patch hence uses the system timer and assumes it is still
active while the devices are suspended/resumed.

This feature can be enabled/disabled during kernel configuration.
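
A minimal sketch of the mechanism using 3.12-era timer APIs; the
handler name and exact placement are illustrative assumptions:

  static void dpm_wd_handler(unsigned long data)
  {
          struct task_struct *tsk = (struct task_struct *)data;

          show_stack(tsk, NULL);  /* dump the stuck suspend/resume thread */
          panic("device suspend/resume watchdog expired");
  }

  /* wrapped around each device's suspend/resume callback: */
  struct timer_list timer;

  setup_timer(&timer, dpm_wd_handler, (unsigned long)current);
  mod_timer(&timer, jiffies + 12 * HZ);   /* default 12 second timeout */
  /* ... invoke the device's suspend/resume callback ... */
  del_timer_sync(&timer);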

This change is based on earlier work by San Mehat.

Signed-off-by: Benoit Goby <benoit@android.com>
Signed-off-by: Zoran Markovic <zoran.markovic@linaro.org>
Acked-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-10-18 13:33:08 +02:00
Ingo Molnar
0e95c69bde Merge branch 'rcu/next' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu into core/rcu
Pull RCU updates from Paul E. McKenney.

Major changes:

" 1.	Update RCU documentation.  These were posted to LKML at
	http://article.gmane.org/gmane.linux.kernel/1566994.

  2.	Miscellaneous fixes.  These were posted to LKML at
	http://article.gmane.org/gmane.linux.kernel/1567027.

  3.	Grace-period-related changes, primarily to aid in debugging,
	inspired by a -rt debugging session.  These were posted to
	LKML at http://article.gmane.org/gmane.linux.kernel/1567076.

  4.	Idle entry/exit changes, primarily to address issues located
	by Tibor Billes.  These were posted to LKML at
	http://article.gmane.org/gmane.linux.kernel/1567096.

  5.	Code reorganization moving RCU's source files from kernel
	to kernel/rcu.  This was posted to LKML at
	http://article.gmane.org/gmane.linux.kernel/1577344."

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-10-18 12:46:14 +02:00
Andy Shevchenko
d4f7ecf728 PM / QoS: simplify pm_qos_power_write()
Let kstrtos32_from_user() do the necessary calls and checks.
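
For reference, the resulting pattern is roughly the following (the
numeric base is illustrative):

  s32 value;
  int ret;

  ret = kstrtos32_from_user(buf, count, 10, &value);
  if (ret)
          return ret;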

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-10-17 22:52:20 +02:00
Stephane Eranian
3090ffb5a2 perf: Disable PERF_RECORD_MMAP2 support
For now, we disable the extended MMAP record support (MMAP2).

We have identified cases where it would not report the correct mapping
information: clone(CLONE_VM) but with separate pids.  We will revisit
the support once we find a solution for this case.

The patch changes the kernel to return EINVAL if attr->mmap2 is set. The
patch also modifies the perf tool to use regular PERF_RECORD_MMAP for
synthetic events and it also prevents the tool from requesting
attr->mmap2 mode because the kernel would reject it.

The support will be revisited once the kernel interface is updated.

In V2, we reduce the patch to the strict minimum.

In V3, we avoid calling perf_event_open() with mmap2 set because we know
it will fail and require fallback retry.
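
On the kernel side, the gate is essentially a one-line check; an
illustrative sketch:

  if (attr->mmap2)
          return -EINVAL; /* MMAP2 disabled until the interface is fixed */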

Signed-off-by: Stephane Eranian <eranian@google.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20131017173215.GA8820@quad
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2013-10-17 16:27:14 -03:00
Frantisek Hrbata
eb3057df73 kernel: add support for init_array constructors
This adds the .init_array section as yet another section with
constructors. This is needed because gcc may emit __gcov_init calls into
either the .init_array or the .ctors section, depending on the gcc (and
binutils) version.

v2: - reuse mod->ctors for .init_array section for modules, because gcc uses
      .ctors or .init_array, but not both at the same time
v3: - fail to load if that does happen somehow.
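
For context, the boot-time constructor walk in init/main.c is roughly
the following (simplified); with the linker script pulling .init_array*
in alongside .ctors, this single loop covers both sections:

  static void __init do_ctors(void)
  {
  #ifdef CONFIG_CONSTRUCTORS
          ctor_fn_t *fn = (ctor_fn_t *) __ctors_start;

          for (; fn < (ctor_fn_t *) __ctors_end; fn++)
                  (*fn)();
  #endif
  }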

Signed-off-by: Frantisek Hrbata <fhrbata@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2013-10-17 15:05:17 +10:30
Ben Hutchings
8eaede49df sysrq: Allow magic SysRq key functions to be disabled through Kconfig
Turn the initial value of sysctl kernel.sysrq (SYSRQ_DEFAULT_ENABLE)
into a Kconfig variable.
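
A sketch of the resulting Kconfig entry (help text approximate):

  config MAGIC_SYSRQ_DEFAULT_ENABLE
          hex "Enable magic SysRq key functions by default"
          depends on MAGIC_SYSRQ
          default 0x1
          help
            Specifies the default mask for the magic SysRq key functions.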

Original version by Bastian Blank <waldi@debian.org>.

Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-10-16 13:01:44 -07:00
Oleg Nesterov
c2d816443e sched/wait: Introduce prepare_to_wait_event()
Add a new helper, prepare_to_wait_event(), which should only be used
by ___wait_event().

prepare_to_wait_event() returns -ERESTARTSYS if signal_pending_state()
is true, otherwise it does prepare_to_wait/exclusive.  This allows us to
uninline the signal-pending checks in the wait_event*() macros.

Also, it can initialize wait->private/func. We do not care if they were
already initialized; the values are the same. This also shaves a couple
of instructions off the inlined code.
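
A condensed sketch of the helper's shape, reconstructed from the
description (details in the actual patch may differ):

  long prepare_to_wait_event(wait_queue_head_t *q, wait_queue_t *wait,
                             int state)
  {
          unsigned long flags;

          if (signal_pending_state(state, current))
                  return -ERESTARTSYS;

          /* harmless if already initialized; the values are the same */
          wait->private = current;
          wait->func = autoremove_wake_function;

          spin_lock_irqsave(&q->lock, flags);
          if (list_empty(&wait->task_list)) {
                  if (wait->flags & WQ_FLAG_EXCLUSIVE)
                          __add_wait_queue_tail(q, wait);
                  else
                          __add_wait_queue(q, wait);
          }
          set_current_state(state);
          spin_unlock_irqrestore(&q->lock, flags);

          return 0;
  }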

This obviously makes the prepare_*() path a little bit slower, but we
are likely going to sleep anyway, so I think it makes sense to shrink
.text:

               text    data      bss      dec     hex  filename
            ===================================================
   before:  5126092 2959248 10117120 18202460 115bf5c   vmlinux
    after:  5124618 2955152 10117120 18196890 115a99a   vmlinux

on my build.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20131007161824.GA29757@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-10-16 14:22:18 +02:00
Peter Zijlstra
6acce3ef84 sched: Remove get_online_cpus() usage
Remove get_online_cpus() usage from the scheduler; there are 4 sites
that use it:

 - sched_init_smp(); where it's completely superfluous since we're in
   'early' boot and there simply cannot be any hotplugging.

 - sched_getaffinity(); we already take a raw spinlock to protect the
   task cpus_allowed mask, this disables preemption and therefore
   also stabilizes cpu_online_mask as that's modified using
   stop_machine. However switch to active mask for symmetry with
   sched_setaffinity()/set_cpus_allowed_ptr(). We guarantee active
   mask stability by inserting sync_rcu/sched() into _cpu_down.

 - sched_setaffinity(); we don't appear to need get_online_cpus()
   either; there are two sites where hotplug appears relevant:
    * cpuset_cpus_allowed(); for the !cpuset case we use possible_mask,
      for the cpuset case we hold task_lock, which is a spinlock and
      thus for mainline disables preemption (might cause pain on RT).
    * set_cpus_allowed_ptr(); Holds all scheduler locks and thus has
      preemption properly disabled; also it already deals with hotplug
      races explicitly where it releases them.

 - migrate_swap(); we can make stop_two_cpus() do the heavy lifting for
   us with a little trickery. By adding a sync_sched/rcu() after the
   CPU_DOWN_PREPARE notifier we can provide preempt/rcu guarantees for
   cpu_active_mask. Use these to validate that both our cpus are active
   when queueing the stop work before we queue the stop_machine works
   for take_cpu_down().

Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: "Srivatsa S. Bhat" <srivatsa.bhat@linux.vnet.ibm.com>
Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Rik van Riel <riel@redhat.com>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Link: http://lkml.kernel.org/r/20131011123820.GV3081@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-10-16 14:22:16 +02:00
Peter Zijlstra
746023159c sched: Fix race in migrate_swap_stop()
There is a subtle race in migrate_swap, when task P, on CPU A, decides to swap
places with task T, on CPU B.

Task P:
  - call migrate_swap
Task T:
  - go to sleep, removing itself from the runqueue
Task P:
  - double lock the runqueues on CPU A & B
Task T:
  - get woken up, place itself on the runqueue of CPU C
Task P:
  - see that task T is on a runqueue, and pretend to remove it
    from the runqueue on CPU B

Now CPUs B & C both have corrupted scheduler data structures.

This patch fixes it, by holding the pi_lock for both of the tasks
involved in the migrate swap. This prevents task T from waking up,
and placing itself onto another runqueue, until after migrate_swap
has released all locks.

This means that, when migrate_swap checks, task T will be either
on the runqueue where it was originally seen, or not on any
runqueue at all. migrate_swap deals correctly with both of those cases.
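
A simplified sketch of the resulting locking order in the stopper
callback (function and field names follow the description; treat them
as illustrative):

  /* with both pi_locks held, neither task can complete a wakeup and
   * requeue itself elsewhere while we inspect and lock the rqs */
  double_raw_lock(&arg->src_task->pi_lock, &arg->dst_task->pi_lock);
  double_rq_lock(src_rq, dst_rq);

  if (task_cpu(arg->src_task) != arg->src_cpu ||
      task_cpu(arg->dst_task) != arg->dst_cpu)
          goto unlock;    /* a task moved after the check; bail out */

  __migrate_swap_task(arg->src_task, arg->dst_cpu);
  __migrate_swap_task(arg->dst_task, arg->src_cpu);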

Tested-by: Joe Mario <jmario@redhat.com>
Acked-by: Mel Gorman <mgorman@suse.de>
Reviewed-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: hannes@cmpxchg.org
Cc: aarcange@redhat.com
Cc: srikar@linux.vnet.ibm.com
Cc: tglx@linutronix.de
Cc: hpa@zytor.com
Link: http://lkml.kernel.org/r/20131010181722.GO13848@laptop.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-10-16 14:22:14 +02:00
Peter Zijlstra
7c3f2ab7b8 sched/rt: Add missing rmb()
While discussing the proposed SCHED_DEADLINE patches, which in part
mimic the existing FIFO code, it was noticed that the wmb in
rt_set_overloaded() didn't have a matching barrier.

The only site using rt_overloaded() to test the rto_count is
pull_rt_task(), and we should issue a matching rmb before then assuming
there's an rto_mask bit set.

Without that smp_rmb() in there we could actually miss seeing the
rto_mask bit.

Also, change to using smp_[wr]mb(), even though this is SMP only code;
memory barriers without smp_ always make me think they're against
hardware of some sort.
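
A simplified sketch of the pairing (the function bodies are condensed;
names follow the description):

  static inline void rt_set_overload(struct rq *rq)
  {
          cpumask_set_cpu(rq->cpu, rq->rd->rto_mask);
          /* the rto_mask bit must be visible before the count bump;
           * matched by the smp_rmb() in pull_rt_task() below */
          smp_wmb();
          atomic_inc(&rq->rd->rto_count);
  }

  static int pull_rt_task(struct rq *this_rq)
  {
          if (likely(!rt_overloaded(this_rq)))
                  return 0;
          /* order the rto_count read before scanning rto_mask */
          smp_rmb();
          /* ... scan rto_mask for source CPUs to pull from ... */
  }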

Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: vincent.guittot@linaro.org
Cc: luca.abeni@unitn.it
Cc: bruce.ashfield@windriver.com
Cc: dhaval.giani@gmail.com
Cc: rostedt@goodmis.org
Cc: hgu1972@gmail.com
Cc: oleg@redhat.com
Cc: fweisbec@gmail.com
Cc: darren@dvhart.com
Cc: johan.eker@ericsson.com
Cc: p.faure@akatech.ch
Cc: paulmck@linux.vnet.ibm.com
Cc: raistlin@linux.it
Cc: claudio@evidence.eu.com
Cc: insop.song@gmail.com
Cc: michael@amarulasolutions.com
Cc: liming.wang@windriver.com
Cc: fchecconi@gmail.com
Cc: jkacur@redhat.com
Cc: tommaso.cucinotta@sssup.it
Cc: Juri Lelli <juri.lelli@gmail.com>
Cc: harald.gustafsson@ericsson.com
Cc: nicola.manica@disi.unitn.it
Cc: tglx@linutronix.de
Link: http://lkml.kernel.org/r/20131015103507.GF10651@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-10-16 14:22:13 +02:00
Paul E. McKenney
4102adab91 rcu: Move RCU-related source code to kernel/rcu directory
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
2013-10-15 12:53:31 -07:00
Paul E. McKenney
2529973309 Merge branch 'idle.2013.09.25a' into HEAD
idle.2013.09.25a: Topic branch for idle entry-/exit-related changes.
2013-10-15 12:49:59 -07:00
Paul E. McKenney
25e03a74e4 Merge branch 'gp.2013.09.25a' into HEAD
gp.2013.09.25a: Topic branch for grace-period updates.
2013-10-15 12:47:04 -07:00
Ingo Molnar
426ee9e3bb Linux 3.12-rc5
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.14 (GNU/Linux)
 
 iQEcBAABAgAGBQJSWyGgAAoJEHm+PkMAQRiGA8MH/35upHXImoRCsI5uC1qvHJtI
 QvQAhDFxoEXbFUKeaYgTfcM8q9FgnqnfjhLf8eYa4Q7tDZeqLXOE8bkI807mSZMl
 yECr3jcwlV+zyhV2MP/HdwTjzy25bwxLM3Zy43S7QROrYoMHZYznil/QPfyMATCJ
 XLPuXZC1FtuUen89n4BoDIuL8QaVrIR/zLqFklAQcdTcGpLHSOwFtH8gb2WaRLhv
 +4IikFRFgTNZiMR5tP0GPc6UH6TVTvRb4QKSqqa7J8OmfAIvOzAUdhqWSPOIwWwt
 Z/+JFxFDczAcNmpv4gE6jkgc2vR8CVeHsvh0j61RDSFObBWspwk337CSyUZxYSA=
 =w4VQ
 -----END PGP SIGNATURE-----

Merge tag 'v3.12-rc5' into perf/core

Merge Linux v3.12-rc5, to pick up the latest fixes.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-10-15 07:05:18 +02:00
Geert Uytterhoeven
002ace782c kexec: Typo s/the/then/
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2013-10-14 15:37:41 +02:00
Kamalesh Babulal
ed1b773286 sched/fair: Fix trivial typos in comments
- 'load_icx' => 'load_idx'
 - 'calculcate_imbalance' => 'calculate_imbalance'

Signed-off-by: Kamalesh Babulal <kamalesh@linux.vnet.ibm.com>
Cc: peterz@infradead.org
Link: http://lkml.kernel.org/r/1381685775-3544-1-git-send-email-kamalesh@linux.vnet.ibm.com
[ Also, don't capitalize 'idle' unnecessarily. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-10-14 09:22:55 +02:00
Anjana V Kumar
ea84753c98 cgroup: fix to break the while loop in cgroup_attach_task() correctly
Both Anjana and Eunki reported a stall in the while_each_thread loop
in cgroup_attach_task().

It's because, when we attach a single thread to a cgroup, if the cgroup
is exiting or the task is already in that cgroup, we won't break the
loop.
If the task is already in the cgroup, the bug can lead to another thread
being attached to the cgroup unexpectedly:

  # echo 5207 > tasks
  # cat tasks
  5207
  # echo 5207 > tasks
  # cat tasks
  5207
  5215

What's worse, if the task to be attached isn't the leader of the thread
group, we might never exit the loop, hence the cpu stall. Thanks to Oleg
for the analysis.

This bug was introduced by commit 081aa458c38ba576bdd4265fc807fa95b48b9e79
("cgroup: consolidate cgroup_attach_task() and cgroup_attach_proc()")

[ lizf: - fixed the first continue, pointed out by Oleg,
        - rewrote changelog. ]

Cc: <stable@vger.kernel.org> # 3.9+
Reported-by: Eunki Kim <eunki_kim@samsung.com>
Reported-by: Anjana V Kumar <anjanavk12@gmail.com>
Signed-off-by: Anjana V Kumar <anjanavk12@gmail.com>
Signed-off-by: Li Zefan <lizefan@huawei.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2013-10-13 16:07:10 -04:00
Ramkumar Ramachandra
62e947cb0c sched: Remove bogus parameter in structured comment
The balance parameter was removed by 23f0d20 ("sched: Factor out
code to should_we_balance()", 2013-08-06).

Signed-off-by: Ramkumar Ramachandra <artagnon@gmail.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1381400433-2030-1-git-send-email-artagnon@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-10-12 19:01:24 +02:00
Ingo Molnar
ec0ad3d01f Merge branch 'core/urgent' into sched/core
Merge in asm goto fix, to be able to apply the asm/rmwcc.h fix.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-10-11 07:39:37 +02:00
Dong Zhu
2cb763614c timer stats: Add a 'Collection: active/inactive' line to timer usage statistics
We can enable/disable timer statistics collection via:

  echo [1|0] > /proc/timer_stats

and it would be nice if apps had the ability to check
what the current collection status is.

This patch adds a 'Collection: active/inactive' line to display the
current timer collection status.

Also bump up the timer stats version to v0.3.

Signed-off-by: Dong Zhu <bluezhudong@gmail.com>
Cc: John Stultz <john.stultz@linaro.org>
Link: http://lkml.kernel.org/r/20131010075618.GH2139@zhudong.nay.redhat.com
[ Improved the changelog and the code. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-10-10 09:59:25 +02:00
Ingo Molnar
8a749de5e3 Merge branch 'fortglx/3.13/time' of git://git.linaro.org/people/jstultz/linux into timers/core
Pull more timekeeping items for v3.13 from John Stultz:

  * Small cleanup in the clocksource code.

  * Fix for rtc-pl031 to let it work with alarmtimers.

  * Move arm64 to using the generic sched_clock framework & resulting
    cleanup in the generic sched_clock code.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-10-10 06:25:23 +02:00
Wang YanQing
b9be6d026d tracing: Show more exact help information about snapshot
The current "help" that comes out of the snapshot file when it is
not allocated looks like this:

 # * Snapshot is freed *
 #
 # Snapshot commands:
 # echo 0 > snapshot : Clears and frees snapshot buffer
 # echo 1 > snapshot : Allocates snapshot buffer, if not already allocated.
 #                      Takes a snapshot of the main buffer.
 # echo 2 > snapshot : Clears snapshot buffer (but does not allocate)
 #                      (Doesn't have to be '2' works with any number that
 #                       is not a '0' or '1')

Echo 2 says that it does not allocate the buffer, which is correct,
but to be more consistent with "echo 0" it should also state
that it does not free.

Link: http://lkml.kernel.org/r/20130914045916.GA4243@udknight

Signed-off-by: Wang YanQing <udknight@gmail.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-10-09 21:38:22 -04:00
Stephen Boyd
b4042ceaab sched_clock: Remove sched_clock_func() hook
Nobody is using sched_clock_func() anymore now that sched_clock
supports up to 64 bits. Remove the hook so that new code only
uses sched_clock_register().

Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
Signed-off-by: John Stultz <john.stultz@linaro.org>
2013-10-09 16:54:39 -07:00
Peter Zijlstra
3354781a21 sched/numa: Reflow task_numa_group() to avoid a compiler warning
Reflow the function a bit because GCC gets confused:

  kernel/sched/fair.c: In function ‘task_numa_fault’:
  kernel/sched/fair.c:1448:3: warning: ‘my_grp’ may be used uninitialized in this function [-Wmaybe-uninitialized]
  kernel/sched/fair.c:1463:27: note: ‘my_grp’ was declared here

Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/n/tip-6ebt6x7u64pbbonq1khqu2z9@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-10-09 14:48:27 +02:00
Rik van Riel
2739d3eef3 sched/numa: Retry task_numa_migrate() periodically
Short spikes of CPU load can lead to a task being migrated
away from its preferred node for temporary reasons.

It is important that the task is migrated back to where it
belongs, in order to avoid migrating too much memory to its
new location, and generally disturbing a task's NUMA location.

This patch fixes NUMA placement for 4 specjbb instances on
a 4 node system. Without this patch, things take longer to
converge, and processes are not always completely on their
own node.

Signed-off-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1381141781-10992-64-git-send-email-mgorman@suse.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-10-09 14:48:25 +02:00
Mel Gorman
989348b5fc sched/numa: Use unsigned longs for numa group fault stats
As Peter says, "If you're going to hold locks you can also do away with
all that atomic_long_*() nonsense".  Lock acquisition is moved slightly
to protect the updates.

Signed-off-by: Mel Gorman <mgorman@suse.de>
Reviewed-by: Rik van Riel <riel@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1381141781-10992-63-git-send-email-mgorman@suse.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-10-09 14:48:23 +02:00
Rik van Riel
de1c9ce6f0 sched/numa: Skip some page migrations after a shared fault
Shared faults can lead to lots of unnecessary page migrations,
slowing down the system, and causing private faults to hit the
per-pgdat migration ratelimit.

This patch adds sysctl numa_balancing_migrate_deferred, which specifies
how many shared page migrations to skip unconditionally, after each page
migration that is skipped because it is a shared fault.

This reduces the number of page migrations back and forth in
shared fault situations. It also gives a strong preference to
the tasks that are already running where most of the memory is,
and to moving the other tasks to near the memory.

Testing this with a much higher scan rate than the default
still seems to result in fewer page migrations than before.

Memory seems to be somewhat better consolidated than previously,
with multi-instance specjbb runs on a 4 node system.

Signed-off-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1381141781-10992-62-git-send-email-mgorman@suse.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-10-09 14:48:21 +02:00
Rik van Riel
1e3646ffc6 mm: numa: Revert temporarily disabling of NUMA migration
With the scan rate code working (at least for multi-instance specjbb),
the large hammer that is "sched: Do not migrate memory immediately after
switching node" can be replaced with something smarter. Revert temporarily
migration disabling and all traces of numa_migrate_seq.

Signed-off-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1381141781-10992-61-git-send-email-mgorman@suse.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-10-09 14:48:20 +02:00
Mel Gorman
930aa174fc sched/numa: Remove the numa_balancing_scan_period_reset sysctl
With scan rate adaptions based on whether the workload has properly
converged or not there should be no need for the scan period reset
hammer. Get rid of it.

Signed-off-by: Mel Gorman <mgorman@suse.de>
Reviewed-by: Rik van Riel <riel@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1381141781-10992-60-git-send-email-mgorman@suse.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-10-09 14:48:18 +02:00
Rik van Riel
04bb2f9475 sched/numa: Adjust scan rate in task_numa_placement
Adjust numa_scan_period in task_numa_placement, depending on how much
useful work the numa code can do. The more local faults there are in a
given scan window, the longer the period (and hence the slower the scan
rate) during the next window. If there are excessive shared faults, the
scan period will decrease, with the amount of scaling depending on the
ratio of shared to private faults. If the preferred node changes, the
scan rate is reset to recheck whether the task is properly placed.

Signed-off-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1381141781-10992-59-git-send-email-mgorman@suse.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-10-09 14:48:16 +02:00
Mel Gorman
3e6a9418cf sched/numa: Take false sharing into account when adapting scan rate
Scan rate is altered based on whether shared/private faults dominated.
task_numa_group() may detect false sharing but that information is not
taken into account when adapting the scan rate. Take it into account.

Signed-off-by: Mel Gorman <mgorman@suse.de>
Reviewed-by: Rik van Riel <riel@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1381141781-10992-58-git-send-email-mgorman@suse.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-10-09 14:48:14 +02:00
Rik van Riel
dabe1d9924 sched/numa: Be more careful about joining numa groups
Due to the way the pid is truncated, and tasks are moved between
CPUs by the scheduler, it is possible for the current task_numa_fault
to group together tasks that do not actually share memory.

This patch adds a few easy sanity checks to task_numa_fault, joining
tasks together if they share the same tsk->mm, or if the fault was on
a page with an elevated mapcount, in a shared VMA.

Signed-off-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1381141781-10992-57-git-send-email-mgorman@suse.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-10-09 14:48:12 +02:00
Peter Zijlstra
0ec8aa00f2 sched/numa: Avoid migrating tasks that are placed on their preferred node
This patch classifies scheduler domains and runqueues into types
depending on the number of tasks that care about their NUMA placement
and the number that are currently running on their preferred node. The
types are

regular: There are tasks running that do not care about their NUMA
	placement.

remote: There are tasks running that care about their placement but are
	currently running on a node remote to their ideal placement.

all: No distinction.

To implement this, the patch tracks the number of tasks that are
optimally NUMA placed (rq->nr_preferred_running) and the number of tasks
running that care about their placement (nr_numa_running). The load
balancer uses this information to avoid migrating ideally placed NUMA
tasks as long as better options for load balancing exist. For example,
it will not consider balancing between a group whose tasks are all
perfectly placed and a group with remote tasks.

Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Reviewed-by: Rik van Riel <riel@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Link: http://lkml.kernel.org/r/1381141781-10992-56-git-send-email-mgorman@suse.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-10-09 14:48:10 +02:00
Rik van Riel
ca28aa53dd sched/numa: Fix task or group comparison
This patch separately considers task and group affinities when
searching for swap candidates during NUMA placement. If tasks
are part of the same group, or no group at all, the task weights
are considered.

Some hysteresis is added to prevent tasks within one group from
getting bounced between NUMA nodes due to tiny differences.

If tasks are part of different groups, the code compares group
weights, in order to favor grouping task groups together.

The patch also changes the group weight multiplier to be the
same as the task weight multiplier, since the two are no longer
added up like before.

Signed-off-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1381141781-10992-55-git-send-email-mgorman@suse.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-10-09 14:48:08 +02:00
Rik van Riel
887c290e82 sched/numa: Decide whether to favour task or group weights based on swap candidate relationships
This patch separately considers task and group affinities when searching
for swap candidates during task NUMA placement. If the tasks are not
part of a group, or not part of the same group, then the task weights
are considered.  Otherwise the group weights are compared.

Signed-off-by: Rik van Riel <riel@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1381141781-10992-54-git-send-email-mgorman@suse.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-10-09 14:48:06 +02:00
Ingo Molnar
b32e86b430 sched/numa: Add debugging
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Rik van Riel <riel@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: http://lkml.kernel.org/r/1381141781-10992-53-git-send-email-mgorman@suse.de
2013-10-09 14:48:04 +02:00