linux


Author	SHA1	Message	Date
Paul E. McKenney	59ee0326cc	rcutorture: Suppress forward-progress complaints during early boot Some larger systems can take in excess of 50 seconds to complete their early boot initcalls prior to spawning init. This does not in any way help the forward-progress judgments of built-in rcutorture (when rcutorture is built as a module, the insmod or modprobe command normally cannot happen until some time after boot completes). This commit therefore suppresses such complaints until about the time that init is spawned. This also includes a fix to a stupid error located by kbuild test robot. [ paulmck: Apply kbuild test robot feedback. ] Signed-off-by: Paul E. McKenney <paulmck@kernel.org> [ paulmck: Fix to nohz_full slow-expediting recovery logic, per bpetkov. ] [ paulmck: Restrict splat to CONFIG_PREEMPT_RT=y kernels and simplify. ] Tested-by: Borislav Petkov <bp@alien8.de>	2020-02-20 16:03:30 -08:00
Paul E. McKenney	0e247386d9	Merge branches 'doc.2019.12.10a', 'exp.2019.12.09a', 'fixes.2020.01.24a', 'kfree_rcu.2020.01.24a', 'list.2020.01.10a', 'preempt.2020.01.24a' and 'torture.2019.12.09a' into HEAD doc.2019.12.10a: Documentations updates exp.2019.12.09a: Expedited grace-period updates fixes.2020.01.24a: Miscellaneous fixes kfree_rcu.2020.01.24a: Batch kfree_rcu() work list.2020.01.10a: RCU-protected-list updates preempt.2020.01.24a: Preemptible RCU updates torture.2019.12.09a: Torture-test updates	2020-01-24 10:37:27 -08:00
Ben Dooks	e1350e8e0e	rcu: Move rcu_{expedited,normal} definitions into rcupdate.h This commit moves the rcu_{expedited,normal} definitions from kernel/rcu/update.c to include/linux/rcupdate.h to make sure they are in sync, and also to avoid the following warning from sparse: kernel/ksysfs.c:150:5: warning: symbol 'rcu_expedited' was not declared. Should it be static? kernel/ksysfs.c:167:5: warning: symbol 'rcu_normal' was not declared. Should it be static? Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2020-01-24 10:33:50 -08:00
Byungchul Park	a35d16905e	rcu: Add basic support for kfree_rcu() batching Recently a discussion about stability and performance of a system involving a high rate of kfree_rcu() calls surfaced on the list [1] which led to another discussion how to prepare for this situation. This patch adds basic batching support for kfree_rcu(). It is "basic" because we do none of the slab management, dynamic allocation, code moving or any of the other things, some of which previous attempts did [2]. These fancier improvements can be follow-up patches and there are different ideas being discussed in those regards. This is an effort to start simple, and build up from there. In the future, an extension to use kfree_bulk and possibly per-slab batching could be done to further improve performance due to cache-locality and slab-specific bulk free optimizations. By using an array of pointers, the worker thread processing the work would need to read less data since it does not need to deal with large rcu_head(s) any longer. Torture tests follow in the next patch and show improvements of around 5x reduction in number of grace periods on a 16 CPU system. More details and test data are in that patch. There is an implication with rcu_barrier() with this patch. Since the kfree_rcu() calls can be batched, and may not be handed yet to the RCU machinery in fact, the monitor may not have even run yet to do the queue_rcu_work(), there seems no easy way of implementing rcu_barrier() to wait for those kfree_rcu()s that are already made. So this means a kfree_rcu() followed by an rcu_barrier() does not imply that memory will be freed once rcu_barrier() returns. Another implication is higher active memory usage (although not run-away..) until the kfree_rcu() flooding ends, in comparison to without batching. More details about this are in the second patch which adds an rcuperf test.
Finally, in the near future we will get rid of kfree_rcu() special casing within RCU such as in rcu_do_batch and switch everything to just batching. Currently we don't do that since the timer subsystem is not yet up and we cannot schedule the kfree_rcu() monitor as the timer subsystem's locks are not initialized. That would also mean getting rid of kfree_call_rcu_nobatch() entirely. [1] http://lore.kernel.org/lkml/20190723035725-mutt-send-email-mst@kernel.org [2] https://lkml.org/lkml/2017/12/19/824 Cc: kernel-team@android.com Cc: kernel-team@lge.com Co-developed-by: Byungchul Park <byungchul.park@lge.com> Signed-off-by: Byungchul Park <byungchul.park@lge.com> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> [ paulmck: Applied 0day and Paul Walmsley feedback on ->monitor_todo. ] [ paulmck: Make it work during early boot. ] [ paulmck: Add a crude early boot self-test. ] [ paulmck: Style adjustments and experimental docbook structure header. ] Link: https://lore.kernel.org/lkml/alpine.DEB.2.21.9999.1908161931110.32497@viisi.sifive.com/T/#me9956f66cb611b95d26ae92700e1d901f46e8c59 Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2020-01-24 10:17:03 -08:00
Lai Jiangshan	b3e627d3d5	rcu: Make PREEMPT_RCU be a modifier to TREE_RCU Currently PREEMPT_RCU and TREE_RCU are mutually exclusive Kconfig options. But PREEMPT_RCU actually specifies a kind of TREE_RCU, namely a preemptible TREE_RCU. This commit therefore makes PREEMPT_RCU be a modifier to the TREE_RCU Kconfig option. This has the benefit of simplifying several of the #if expressions that formerly needed to check both, but now need only check one or the other. Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com> Signed-off-by: Lai Jiangshan <jiangshanlai@gmail.com> Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2019-12-09 12:37:51 -08:00
Paul E. McKenney	31da067023	Merge branches 'consolidate.2019.08.01b', 'fixes.2019.08.12a', 'lists.2019.08.13a' and 'torture.2019.08.01b' into HEAD consolidate.2019.08.01b: Further consolidation cleanups fixes.2019.08.12a: Miscellaneous fixes lists.2019.08.13a: Optional lockdep arguments for RCU list macros torture.2019.08.01b: Torture-test updates	2019-08-13 14:30:30 -07:00
Paul E. McKenney	b823cafa75	rcu: Remove redundant "if" condition from rcu_gp_is_expedited() Because rcu_expedited_nesting is initialized to 1 and not decremented until just before init is spawned, rcu_expedited_nesting is guaranteed to be non-zero whenever rcu_scheduler_active == RCU_SCHEDULER_INIT. This commit therefore removes this redundant "if" equality test. Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com> Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>	2019-08-12 11:25:06 -07:00
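The reasoning in the rcu_gp_is_expedited() commit above can be illustrated with a small userspace sketch. It is not the kernel source: the `_sketch` names are hypothetical stand-ins, and the only behavior assumed is what the message states, namely a nesting count that starts at 1 and is first decremented just before init is spawned.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace stand-ins for the kernel variables named above. */
static int rcu_expedited_sketch;                 /* administrator override, 0 by default */
static atomic_int expedited_nesting_sketch = 1;  /* starts at 1, so boot runs expedited */

/* Request expedited grace periods; requests may nest. */
static void expedite_gp_sketch(void)
{
	atomic_fetch_add(&expedited_nesting_sketch, 1);
}

/* Undo one earlier request; the initial count of 1 is dropped once, at end of boot. */
static void unexpedite_gp_sketch(void)
{
	atomic_fetch_sub(&expedited_nesting_sketch, 1);
}

/*
 * No separate "are we still in early boot?" test is needed here:
 * the nonzero initial count already makes this return true during boot.
 */
static bool gp_is_expedited_sketch(void)
{
	return rcu_expedited_sketch || atomic_load(&expedited_nesting_sketch) != 0;
}
```

Because the count is nonzero for the whole boot phase, an additional comparison against the scheduler state could never change the result, which is the redundancy the commit removes.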
Joel Fernandes (Google)	28875945ba	rcu: Add support for consolidated-RCU reader checking This commit adds RCU-reader checks to list_for_each_entry_rcu() and hlist_for_each_entry_rcu(). These checks are optional, and are indicated by a lockdep expression passed to a new optional argument to these two macros. If this optional lockdep expression is omitted, these two macros act as before, checking for an RCU read-side critical section. Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> [ paulmck: Update to eliminate return within macro and update comment. ] Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>	2019-08-09 11:00:35 -07:00
Joel Fernandes (Google)	9147089bee	rcu: Remove redundant debug_locks check in rcu_read_lock_sched_held() The debug_locks flag can never be true at the end of rcu_read_lock_sched_held() because it is already checked by the earlier call to debug_lockdep_rcu_enabled(). This commit therefore removes this redundant check. Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>	2019-08-01 14:17:01 -07:00
Paul E. McKenney	cdc694b235	rcu: Add kernel parameter to dump trace after RCU CPU stall warning This commit adds a rcu_cpu_stall_ftrace_dump kernel boot parameter, that, when set, causes the trace buffer to be dumped after an RCU CPU stall warning is printed. This kernel boot parameter is disabled by default, maintaining compatibility with previous behavior. Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>	2019-08-01 14:05:51 -07:00
Paul E. McKenney	c682db558e	rcutorture: Add trivial RCU implementation I have been showing off a trivial RCU implementation for non-preemptive environments for some time now: #define rcu_read_lock() #define rcu_read_unlock() #define rcu_dereference(p) READ_ONCE(p) #define rcu_assign_pointer(p, v) smp_store_release(&(p), (v)) void synchronize_rcu(void) { int cpu; for_each_online_cpu(cpu) sched_setaffinity(current->pid, cpumask_of(cpu)); } Trivial or not, as the old saying goes, "if it ain't tested, it don't work!". This commit therefore adds a "trivial" flavor to rcutorture and a corresponding TRIVIAL test scenario. This variant does not handle CPU hotplug, which is unconditionally enabled on x86 for post-v5.1-rc3 kernels, which is why the TRIVIAL.boot says "rcutorture.onoff_interval=0". This commit actually does handle CONFIG_PREEMPT=y kernels, but only because it turns back the Linux-kernel clock in order to provide these alternative definitions (or the moral equivalent thereof): #define rcu_read_lock() preempt_disable() #define rcu_read_unlock() preempt_enable() In CONFIG_PREEMPT=n kernels without debugging, these are equivalent to empty macros give or take a compiler barrier. However, they have been successfully tested with actual empty macros as well. Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com> [ paulmck: Fix symbol issue reported by kbuild test robot <lkp@intel.com>. ] [ paulmck: Work around sched_setaffinity() issue noted by Andrea Parri. ] [ paulmck: Add rcutorture.shuffle_interval=0 to TRIVIAL.boot to fix interaction with shuffler task noted by Peter Zijlstra. ] Tested-by: Andrea Parri <andrea.parri@amarulasolutions.com>	2019-05-28 09:06:09 -07:00
Paul E. McKenney	10462d6f58	rcu: Move RCU CPU stall-warning code out of update.c The RCU CPU stall-warning code for normal grace periods is currently scattered across three files, due to earlier Tiny RCU support for RCU CPU stall warnings and for old Kconfig options that have long since been retired. Given that it is hard for the lead RCU maintainer to find relevant stall-warning code, it would be good to consolidate it. This commit starts this process by moving stall-warning code from kernel/rcu/update.c to a new kernel/rcu/tree_stall.h file. Note that the definitions of rcu_cpu_stall_suppress and rcu_cpu_stall_timeout must remain in kernel/rcu/update.c to provide compatibility for kernel boot parameter lists. Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>	2019-03-26 14:40:13 -07:00
Linus Torvalds	203b6609e0	Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf updates from Ingo Molnar: "Lots of tooling updates - too many to list, here's a few highlights: - Various subcommand updates to 'perf trace', 'perf report', 'perf record', 'perf annotate', 'perf script', 'perf test', etc. - CPU and NUMA topology and affinity handling improvements, - HW tracing and HW support updates: - Intel PT updates - ARM CoreSight updates - vendor HW event updates - BPF updates - Tons of infrastructure updates, both on the build system and the library support side - Documentation updates. - ... and lots of other changes, see the changelog for details. Kernel side updates: - Tighten up kprobes blacklist handling, reduce the number of places where developers can install a kprobe and hang/crash the system. - Fix/enhance vma address filter handling. - Various PMU driver updates, small fixes and additions. - refcount_t conversions - BPF updates - error code propagation enhancements - misc other changes" * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (238 commits) perf script python: Add Python3 support to syscall-counts-by-pid.py perf script python: Add Python3 support to syscall-counts.py perf script python: Add Python3 support to stat-cpi.py perf script python: Add Python3 support to stackcollapse.py perf script python: Add Python3 support to sctop.py perf script python: Add Python3 support to powerpc-hcalls.py perf script python: Add Python3 support to net_dropmonitor.py perf script python: Add Python3 support to mem-phys-addr.py perf script python: Add Python3 support to failed-syscalls-by-pid.py perf script python: Add Python3 support to netdev-times.py perf tools: Add perf_exe() helper to find perf binary perf script: Handle missing fields with -F +.. 
perf data: Add perf_data__open_dir_data function perf data: Add perf_data__(create_dir\|close_dir) functions perf data: Fail check_backup in case of error perf data: Make check_backup work over directories perf tools: Add rm_rf_perf_data function perf tools: Add pattern name checking to rm_rf perf tools: Add depth checking to rm_rf perf data: Add global path holder ...	2019-03-06 07:59:36 -08:00
Masami Hiramatsu	a39f15b964	kprobes: Prohibit probing on RCU debug routine Since kprobe itself depends on RCU, probing on RCU debug routine can cause recursive breakpoint bugs. Prohibit probing on RCU debug routines. int3 ->do_int3() ->ist_enter() ->RCU_LOCKDEP_WARN() ->debug_lockdep_rcu_enabled() -> int3 Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andrea Righi <righi.andrea@gmail.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/154998807741.31052.11229157537816341591.stgit@devbox Signed-off-by: Ingo Molnar <mingo@kernel.org>	2019-02-13 08:16:40 +01:00
Paul E. McKenney	38b4df649e	rcu/update: Convert to SPDX license identifier Replace the license boilerplate with an SPDX license identifier. While in the area, update an email address. Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de>	2019-02-09 08:44:27 -08:00
Paul E. McKenney	eaaf055f27	Merge branches 'bug.2018.11.12a', 'consolidate.2018.12.01a', 'doc.2018.11.12a', 'fixes.2018.11.12a', 'initrd.2018.11.08b', 'sil.2018.11.12a' and 'srcu.2018.11.27a' into HEAD bug.2018.11.12a: Get rid of BUG_ON() and friends consolidate.2018.12.01a: Continued RCU flavor-consolidation cleanup doc.2018.11.12a: Documentation updates fixes.2018.11.12a: Miscellaneous fixes initrd.2018.11.08b: Automate creation of rcutorture initrd sil.2018.11.12a: Remove more spin_unlock_wait() calls	2018-12-01 12:43:16 -08:00
Paul E. McKenney	f0ad56e876	rcu: Eliminate BUG_ON() for kernel/rcu/update.c The update.c file has a number of calls to BUG_ON(), which panics the kernel, which is not a good strategy for devices (like embedded) that don't have a way to capture console output. This commit therefore converts these BUG_ON() calls to WARN_ON_ONCE() and WARN_ONCE(). Reported-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>	2018-11-12 08:15:59 -08:00
Paul E. McKenney	309ba859b9	rcu: Eliminate synchronize_rcu_mult() Now that synchronize_rcu() waits for both RCU read-side critical sections and preempt-disabled regions of code, the sole caller of synchronize_rcu_mult() can be replaced by synchronize_rcu(). This patch makes this change and removes synchronize_rcu_mult(). Note that _wait_rcu_gp() still supports synchronize_rcu_mult(), and thus might be simplified in the future to take only a single call_rcu() function rather than the current list of them. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2018-11-08 21:43:20 -08:00
Paul E. McKenney	b56ada1209	Merge branches 'doc.2018.08.30a', 'dynticks.2018.08.30b', 'srcu.2018.08.30b' and 'torture.2018.08.29a' into HEAD doc.2018.08.30a: Documentation updates dynticks.2018.08.30b: RCU flavor consolidation updates and cleanups srcu.2018.08.30b: SRCU updates torture.2018.08.29a: Torture-test updates	2018-08-30 16:12:53 -07:00
Paul E. McKenney	e0fcba9ac0	srcu: Make call_srcu() available during very early boot Event tracing is moving to SRCU in order to take advantage of the fact that SRCU may be safely used from idle and even offline CPUs. However, event tracing can invoke call_srcu() very early in the boot process, even before workqueue_init_early() is invoked (let alone rcu_init()). Therefore, call_srcu()'s attempts to queue work fail miserably. This commit therefore detects this situation, and refrains from attempting to queue work before rcu_init() time, but does everything else that it would have done, and in addition, adds the srcu_struct to a global list. The rcu_init() function now invokes a new srcu_init() function, which is empty if CONFIG_SRCU=n. Otherwise, srcu_init() queues work for each srcu_struct on the list. This all happens early enough in boot that there is but a single CPU with interrupts disabled, which allows synchronization to be dispensed with. Of course, the queued work won't actually be invoked until after workqueue_init() is invoked, which happens shortly after the scheduler is up and running. This means that although call_srcu() may be invoked any time after per-CPU variables have been set up, there is still a very narrow window when synchronize_srcu() won't work, and this window extends from the time that the scheduler starts until the time that workqueue_init() returns. This can be fixed in a manner similar to the fix for synchronize_rcu_expedited() and friends, but until someone actually needs to use synchronize_srcu() during this window, this fix is added churn for no benefit. Finally, note that Tree SRCU's new srcu_init() function invokes queue_work() rather than the queue_delayed_work() function that is invoked post-boot. The reason is that queue_delayed_work() will (as you would expect) post a timer, and timers have not yet been initialized. So use of queue_work() avoids the complaints about use of uninitialized spinlocks that would otherwise result. 
Besides, some delay is already provided by the aforementioned fact that the queued work won't actually be invoked until after the scheduler is up and running. Requested-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Tested-by: Steven Rostedt (VMware) <rostedt@goodmis.org>	2018-08-30 16:10:19 -07:00
Paul E. McKenney	06462efc80	rcu: Clean up flavor-related definitions and comments in update.c Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2018-08-30 16:03:36 -07:00
Paul E. McKenney	72ce30dd1f	rcu: Stop testing RCU-bh and RCU-sched Now that the RCU-bh and RCU-sched update-side functions are simple wrappers around their RCU counterparts, there isn't a whole lot of point in testing them. This commit therefore removes the self-test capability and removes the corresponding kernel-boot parameters. It also updates the various rcutorture .boot files to remove the kernel boot parameters that call for testing RCU-bh and RCU-sched. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2018-08-30 16:03:29 -07:00
Paul E. McKenney	82fcecfa81	rcu: Update comments and help text for no more RCU-bh updaters This commit updates comments and help text to account for the fact that RCU-bh update-side functions are now simple wrappers for their RCU or RCU-sched counterparts. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2018-08-30 16:02:42 -07:00
Paul E. McKenney	1b27291b1e	rcutorture: Add forward-progress tests for RCU grace periods This commit adds a kthread that loops going into and out of RCU read-side critical sections, but also including a cond_resched(), optionally guarded by a check of need_resched(), in that same loop. This commit relies solely on rcu_torture_writer() progress to judge the forward progress of grace periods. Note that Tasks RCU and SRCU are exempted from forward-progress testing due to their (intentionally) less-robust forward-progress guarantees. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2018-08-29 09:20:48 -07:00
Steven Rostedt (VMware)	cd23ac8ddb	rcu: Add comment to the last sleep in the rcu tasks loop At the end of rcu_tasks_kthread() there's a lonely schedule_timeout_uninterruptible() call with no apparent rationale for its existence. But there is. It is to keep the thread from going into a tight loop if there's some anomaly. That really needs a comment. Link: http://lkml.kernel.org/r/20180524223839.GU3803@linux.vnet.ibm.com Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2018-07-12 15:39:21 -07:00
Steven Rostedt (VMware)	c03be752d3	rcu: Speed up calling of RCU tasks callbacks Joel Fernandes found that the synchronize_rcu_tasks() was taking a significant amount of time. He demonstrated it with the following test: # cd /sys/kernel/tracing # while [ 1 ]; do x=1; done & # echo '__schedule_bug:traceon' > set_ftrace_filter # time echo '!__schedule_bug:traceon' > set_ftrace_filter; real 0m1.064s user 0m0.000s sys 0m0.004s Where it takes a little over a second to perform the synchronize, because there's a loop that waits 1 second at a time for tasks to get through their quiescent points when there's a task that must be waited for. After discussion we came up with a simple way to wait for holdouts but increase the time for each iteration of the loop but no more than a full second. With the new patch we have: # time echo '!__schedule_bug:traceon' > set_ftrace_filter; real 0m0.131s user 0m0.000s sys 0m0.004s Which drops it down to 13% of what the original wait time was. Link: http://lkml.kernel.org/r/20180523063815.198302-2-joel@joelfernandes.org Reported-by: Joel Fernandes (Google) <joel@joelfernandes.org> Suggested-by: Joel Fernandes (Google) <joel@joelfernandes.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2018-07-12 15:39:21 -07:00
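The "increase the time for each iteration of the loop but no more than a full second" idea in the commit above can be sketched in plain C. The message does not spell out the exact schedule, so the starting divisor of 10, the `HZ_SKETCH` tick rate, and the helper name `next_holdout_wait()` are all assumptions made for illustration only.

```c
/* Hypothetical tick rate; the kernel's HZ is a build-time constant. */
#define HZ_SKETCH 1000

/*
 * Return the wait, in ticks, for the next pass of a holdout-scanning loop
 * and advance the schedule. *divisor starts at an assumed value of 10 and
 * shrinks toward 1, so the waits grow from HZ/10 up to a cap of one second.
 */
static int next_holdout_wait(int *divisor)
{
	int wait = HZ_SKETCH / *divisor;

	if (*divisor > 1)
		(*divisor)--;
	return wait;
}
```

With these assumed numbers the first pass waits a tenth of a second instead of a full second, so a holdout that clears quickly is noticed much sooner, while a long-lived holdout is still polled only once per second after the first few passes.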
Paul E. McKenney	6f56f714db	rcu: Improve RCU-tasks naming and comments The naming and comments associated with some RCU-tasks code make the faulty assumption that context switches due to cond_resched() are voluntary. As several people pointed out, this is not the case. This commit therefore updates function names and comments to better reflect current reality. Reported-by: Byungchul Park <byungchul.park@lge.com> Reported-by: Joel Fernandes <joel@joelfernandes.org> Reported-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2018-07-12 15:39:15 -07:00
Paul E. McKenney	0e5da22e3f	rcu: Move __rcu_read_lock() and __rcu_read_unlock() to tree_plugin.h The __rcu_read_lock() and __rcu_read_unlock() functions were moved to kernel/rcu/update.c in order to implement tiny preemptible RCU. However, tiny preemptible RCU was removed from the kernel a long time ago, so this commit belatedly moves them back into the only remaining preemptible-RCU code. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Tested-by: Nicholas Piggin <npiggin@gmail.com>	2018-05-15 10:27:41 -07:00
Paul E. McKenney	cee4393989	rcu: Rename cond_resched_rcu_qs() to cond_resched_tasks_rcu_qs() Commit `e31d28b6ab` ("trace: Eliminate cond_resched_rcu_qs() in favor of cond_resched()") substituted cond_resched() for the earlier call to cond_resched_rcu_qs(). However, the new-age cond_resched() does not do anything to help RCU-tasks grace periods because (1) RCU-tasks is only enabled when CONFIG_PREEMPT=y and (2) cond_resched() is a complete no-op when preemption is enabled. This situation results in hangs when running the trace benchmarks. A number of potential fixes were discussed on LKML (https://lkml.kernel.org/r/20180224151240.0d63a059@vmware.local.home), including making cond_resched() not be a no-op; making cond_resched() not be a no-op, but only when running tracing benchmarks; reverting the aforementioned commit (which works because cond_resched_rcu_qs() does provide an RCU-tasks quiescent state); and adding a call to the scheduler/RCU rcu_note_voluntary_context_switch() function. All were deemed unsatisfactory, either due to added cond_resched() overhead or due to magic functions inviting cargo culting. This commit renames cond_resched_rcu_qs() to cond_resched_tasks_rcu_qs(), which provides a clear hint as to what this function is doing and why and where it should be used, and then replaces the call to cond_resched() with cond_resched_tasks_rcu_qs() in the trace benchmark's benchmark_event_kthread() function. Reported-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Tested-by: Nicholas Piggin <npiggin@gmail.com>	2018-05-15 10:27:29 -07:00
Paul E. McKenney	156baec397	rcu: Export init_rcu_head() and destroy_rcu_head() to GPL modules Use of init_rcu_head() and destroy_rcu_head() from modules results in the following build-time error with CONFIG_DEBUG_OBJECTS_RCU_HEAD=y: ERROR: "init_rcu_head" [drivers/scsi/scsi_mod.ko] undefined! ERROR: "destroy_rcu_head" [drivers/scsi/scsi_mod.ko] undefined! This commit therefore adds EXPORT_SYMBOL_GPL() for each to allow them to be used by GPL-licensed kernel modules. Reported-by: Bart Van Assche <Bart.VanAssche@wdc.com> Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2017-12-07 19:51:49 -05:00
Linus Torvalds	3e2014637c	Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler updates from Ingo Molnar: "The main updates in this cycle were: - Group balancing enhancements and cleanups (Brendan Jackman) - Move CPU isolation related functionality into its separate kernel/sched/isolation.c file, with related 'housekeeping_()' namespace and nomenclature et al. (Frederic Weisbecker) - Improve the interactive/cpu-intense fairness calculation (Josef Bacik) - Improve the PELT code and related cleanups (Peter Zijlstra) - Improve the logic of pick_next_task_fair() (Uladzislau Rezki) - Improve the RT IPI based balancing logic (Steven Rostedt) - Various micro-optimizations: - better !CONFIG_SCHED_DEBUG optimizations (Patrick Bellasi) - better idle loop (Cheng Jian) - ... plus misc fixes, cleanups and updates" 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (54 commits) sched/core: Optimize sched_feat() for !CONFIG_SCHED_DEBUG builds sched/sysctl: Fix attributes of some extern declarations sched/isolation: Document isolcpus= boot parameter flags, mark it deprecated sched/isolation: Add basic isolcpus flags sched/isolation: Move isolcpus= handling to the housekeeping code sched/isolation: Handle the nohz_full= parameter sched/isolation: Introduce housekeeping flags sched/isolation: Split out new CONFIG_CPU_ISOLATION=y config from CONFIG_NO_HZ_FULL sched/isolation: Rename is_housekeeping_cpu() to housekeeping_cpu() sched/isolation: Use its own static key sched/isolation: Make the housekeeping cpumask private sched/isolation: Provide a dynamic off-case to housekeeping_any_cpu() sched/isolation, watchdog: Use housekeeping_cpumask() instead of ad-hoc version sched/isolation: Move housekeeping related code to its own file sched/idle: Micro-optimize the idle loop sched/isolcpus: Fix "isolcpus=" boot parameter handling when !CONFIG_CPUMASK_OFFSTACK x86/tsc: Append the 'tsc=' description for the 
'tsc=unstable' boot parameter sched/rt: Simplify the IPI based RT balancing logic block/ioprio: Use a helper to check for RT prio sched/rt: Add a helper to test for a RT task ...	2017-11-13 13:37:52 -08:00
Frederic Weisbecker	de201559df	sched/isolation: Introduce housekeeping flags Before we implement isolcpus under housekeeping, we need the isolation features to be more fine-grained. For example some people want NOHZ_FULL without the full scheduler isolation, others want full scheduler isolation without NOHZ_FULL. So let's cut all these isolation features piecewise, at the risk of overcutting it right now. We can still merge some flags later if they always make sense together. Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: Chris Metcalf <cmetcalf@mellanox.com> Cc: Christoph Lameter <cl@linux.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Luiz Capitulino <lcapitulino@redhat.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Wanpeng Li <kernellwp@gmail.com> Link: http://lkml.kernel.org/r/1509072159-31808-9-git-send-email-frederic@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-10-27 09:55:29 +02:00
Frederic Weisbecker	7863406143	sched/isolation: Move housekeeping related code to its own file The housekeeping code is currently tied to the NOHZ code. As we are planning to make housekeeping independent from it, start with moving the relevant code to its own file. Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Acked-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Chris Metcalf <cmetcalf@mellanox.com> Cc: Christoph Lameter <cl@linux.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Luiz Capitulino <lcapitulino@redhat.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Wanpeng Li <kernellwp@gmail.com> Link: http://lkml.kernel.org/r/1509072159-31808-2-git-send-email-frederic@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-10-27 09:55:24 +02:00
Paul E. McKenney	ad4e25a3a1	Merge branches 'doc.2017.10.20a', 'fixes.2017.10.19a', 'stall.2017.10.09a' and 'torture.2017.10.09a' into HEAD doc.2017.10.20a: Documentation updates. fixes.2017.10.19a: Miscellaneous fixes. stall.2017.10.09a: RCU CPU stall-warning updates. torture.2017.10.09a: Torture-test updates.	2017-10-20 11:11:15 -07:00
Paul E. McKenney	f22ce09157	rcu: Suppress RCU CPU stall warnings while dumping trace Currently, RCU emits RCU CPU stall warnings during its automatically initiated ftrace_dump() calls after detecting an error condition, which can result in excessive console output and lost trace events. This commit therefore suppresses RCU CPU stall warnings across any of these ftrace_dump() calls. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2017-10-09 14:25:17 -07:00
Paul E. McKenney	c63eb17ff0	rcu: Create call_rcu_tasks() kthread at boot time Currently the call_rcu_tasks() kthread is created upon first invocation of call_rcu_tasks(). This has the advantage of avoiding creation if there are never any invocations of call_rcu_tasks() and of synchronize_rcu_tasks(), but it requires an unreliable heuristic to determine when it is safe to create the kthread. For example, it is not safe to create the kthread when call_rcu_tasks() is invoked with a spinlock held, but there is no good way to detect this in !PREEMPT kernels. This commit therefore creates this kthread unconditionally at core_initcall() time. If you don't want this kthread created, then build with CONFIG_TASKS_RCU=n. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2017-10-09 14:24:14 -07:00
Paul E. McKenney	ccdd29ffff	rcu: Create reasonable API for do_exit() TASKS_RCU processing Currently, the exit-time support for TASKS_RCU is open-coded in do_exit(). This commit creates exit_tasks_rcu_start() and exit_tasks_rcu_finish() APIs for do_exit() use. This has the benefit of confining the use of the tasks_rcu_exit_srcu variable to one file, allowing it to become static. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2017-08-17 07:26:05 -07:00
Paul E. McKenney	3caec62fbb	rcu: Move rcu_expedited and rcu_normal externs from rcupdate.h The rcu_expedited and rcu_normal variables are used only by sysctl and kernel/rcu/update.c, so it does not make sense to have their extern declarations in rcupdate.h. This commit therefore moves these extern declarations to update.c. Reported-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2017-06-08 18:52:27 -07:00
Paul E. McKenney	a68a2bb28b	rcu: Move docbook comments out of rcupdate.h The include/linux/rcupdate.h file is included by more than 200 files, so shrinking it should provide some build-time benefits. This commit therefore moves several docbook comments from rcupdate.h to kernel/rcu/update.c, kernel/rcu/tree.c, and kernel/rcu/tree_plugin.h, thus reducing the number of times that the compiler has to scan these comments. This likely provides only a small benefit, but every little bit helps. This commit also fixes a malformed bulleted list noted by the 0day Test Robot. Reported-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2017-06-08 18:52:27 -07:00
Paul E. McKenney	68ab0b4263	rcu: Make synchronize_rcu_mult() check for duplicates Currently, doing synchronize_rcu_mult(call_rcu, call_rcu) might (or might not) wait for two RCU grace periods. One approach is of course "don't do that!", but in CONFIG_PREEMPT=n kernels, synchronize_rcu_mult(call_rcu, call_rcu_sched) does exactly that. This results in an ugly #ifdef in sched_cpu_deactivate(). This commit therefore makes __wait_rcu_gp() check for duplicates, which in turn allows duplicates to be passed to synchronize_rcu_mult() without risk of waiting twice on the same type of grace period. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2017-06-08 08:25:39 -07:00
Paul E. McKenney	59d80fd835	rcu: Print out rcupdate.c non-default boot-time settings This commit adds a rcupdate_announce_bootup_oddness() function to print out non-default values of significant kernel boot parameter settings to aid in debugging. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2017-06-08 08:25:35 -07:00
Paul E. McKenney	bcbfdd01dc	rcu: Make non-preemptive schedule be Tasks RCU quiescent state Currently, a call to schedule() acts as a Tasks RCU quiescent state only if a context switch actually takes place. However, just the call to schedule() guarantees that the calling task has moved off of whatever tracing trampoline that it might have been on previously. This commit therefore plumbs schedule()'s "preempt" parameter into rcu_note_context_switch(), which then records the Tasks RCU quiescent state, but only if this call to schedule() was -not- due to a preemption. To avoid adding overhead to the common-case context-switch path, this commit hides the rcu_note_context_switch() check under an existing non-common-case check. Suggested-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2017-04-21 05:59:27 -07:00
Paul E. McKenney	900b1028ec	srcu: Allow SRCU to access rcu_scheduler_active This is primarily a code-movement commit in preparation for allowing SRCU to handle early-boot SRCU grace periods. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2017-04-18 11:38:18 -07:00
Ingo Molnar	b17b01533b	sched/headers: Prepare for new header dependencies before moving code to <linux/sched/debug.h> We are going to split <linux/sched/debug.h> out of <linux/sched.h>, which will have to be picked up from other headers and a couple of .c files. Create a trivial placeholder <linux/sched/debug.h> file that just maps to <linux/sched.h> to make this patch obviously correct and bisectable. Include the new header in the files that are going to need it. Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-03-02 08:42:34 +01:00
Ingo Molnar	3f07c01441	sched/headers: Prepare for new header dependencies before moving code to <linux/sched/signal.h> We are going to split <linux/sched/signal.h> out of <linux/sched.h>, which will have to be picked up from other headers and a couple of .c files. Create a trivial placeholder <linux/sched/signal.h> file that just maps to <linux/sched.h> to make this patch obviously correct and bisectable. Include the new header in the files that are going to need it. Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-03-02 08:42:29 +01:00
Ingo Molnar	f9411ebe3d	rcu: Separate the RCU synchronization types and APIs into <linux/rcupdate_wait.h> So rcupdate.h is a pretty complex header, in particular it includes <linux/completion.h> which includes <linux/wait.h> - creating a dependency that includes <linux/wait.h> in <linux/sched.h>, which prevents the isolation of <linux/sched.h> from the derived <linux/wait.h> header. Solve part of the problem by decoupling rcupdate.h from completions: this can be done by separating out the rcu_synchronize types and APIs, and updating their usage sites. Since these are mostly RCU-internal types, this will not just simplify <linux/sched.h>'s dependencies, but will make all the hundreds of .c files that include rcupdate.h but not completions or wait.h build faster. ( For rcutiny this means that two dependent APIs have to be uninlined, but that shouldn't be much of a problem as they are rare variants. ) Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-03-02 08:42:24 +01:00
Sebastian Andrzej Siewior	7c6094db59	rcu: update: Make RCU_EXPEDITE_BOOT be the default RCU_EXPEDITE_BOOT should speed up the boot process by enforcing synchronize_rcu_expedited() instead of synchronize_rcu() during the boot process. There should be no reason why one does not want this and there is no need to worry about real time latency at this point. Therefore make it default. Note that users wishing to avoid expediting entirely, for example when bringing up new hardware possibly having flaky IPIs, can use the rcu_normal boot parameter to override boot-time expediting. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> [ paulmck: Reworded commit log. ] Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>	2017-01-16 16:56:39 -08:00
Paul E. McKenney	52d7e48b86	rcu: Narrow early boot window of illegal synchronous grace periods The current preemptible RCU implementation goes through three phases during bootup. In the first phase, there is only one CPU that is running with preemption disabled, so that a no-op is a synchronous grace period. In the second mid-boot phase, the scheduler is running, but RCU has not yet gotten its kthreads spawned (and, for expedited grace periods, workqueues are not yet running). During this time, any attempt to do a synchronous grace period will hang the system (or complain bitterly, depending). In the third and final phase, RCU is fully operational and everything works normally. This has been OK for some time, but there have recently been some synchronous grace periods showing up during the second mid-boot phase. This code worked "by accident" for awhile, but started failing as soon as expedited RCU grace periods switched over to workqueues in commit `8b355e3bc1` ("rcu: Drive expedited grace periods from workqueue"). Note that the code was buggy even before this commit, as it was subject to failure on real-time systems that forced all expedited grace periods to run as normal grace periods (for example, using the rcu_normal ksysfs parameter). The callchain from the failure case is as follows: early_amd_iommu_init() |-> acpi_put_table(ivrs_base); |-> acpi_tb_put_table(table_desc); |-> acpi_tb_invalidate_table(table_desc); |-> acpi_tb_release_table(...) |-> acpi_os_unmap_memory |-> acpi_os_unmap_iomem |-> acpi_os_map_cleanup |-> synchronize_rcu_expedited The kernel showing this callchain was built with CONFIG_PREEMPT_RCU=y, which caused the code to try using workqueues before they were initialized, which did not go well. This commit therefore reworks RCU to permit synchronous grace periods to proceed during this mid-boot phase. This commit is therefore a fix to a regression introduced in v4.9, and is therefore being put forward post-merge-window in v4.10.
This commit sets a flag from the existing rcu_scheduler_starting() function which causes all synchronous grace periods to take the expedited path. The expedited path now checks this flag, using the requesting task to drive the expedited grace period forward during the mid-boot phase. Finally, this flag is updated by a core_initcall() function named rcu_exp_runtime_mode(), which causes the runtime codepaths to be used. Note that this arrangement assumes that tasks are not sent POSIX signals (or anything similar) from the time that the first task is spawned through core_initcall() time. Fixes: `8b355e3bc1` ("rcu: Drive expedited grace periods from workqueue") Reported-by: "Zheng, Lv" <lv.zheng@intel.com> Reported-by: Borislav Petkov <bp@alien8.de> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Tested-by: Stan Kain <stan.kain@gmail.com> Tested-by: Ivan <waffolz@hotmail.com> Tested-by: Emanuel Castelo <emanuel.castelo@gmail.com> Tested-by: Bruno Pesavento <bpesavento@infinito.it> Tested-by: Borislav Petkov <bp@suse.de> Tested-by: Frederic Bezies <fredbezies@gmail.com> Cc: <stable@vger.kernel.org> # 4.9.0-	2017-01-14 21:23:48 -08:00
Paul Gortmaker	e77b704125	rcu: Don't use modular infrastructure in non-modular code The Kconfig currently controlling compilation of tree.c is: init/Kconfig:config TREE_RCU init/Kconfig: bool ...and update.c and sync.c are "obj-y" meaning that none are ever built as a module by anyone. Since MODULE_ALIAS is a no-op for non-modular code, we can remove them from these files. We leave moduleparam.h behind since the files instantiate some boot time configuration parameters with module_param() still. Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Lai Jiangshan <jiangshanlai@gmail.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2016-08-22 09:35:27 -07:00
Paul E. McKenney	4929c913bd	rcu: Make call_rcu_tasks() tolerate first call with irqs disabled Currently, if the very first call to call_rcu_tasks() has irqs disabled, it will create the rcu_tasks_kthread with irqs disabled, which will result in a splat in the memory allocator, which kthread_run() invokes with the expectation that irqs are enabled. This commit fixes this problem by deferring kthread creation if called with irqs disabled. The first call to call_rcu_tasks() that has irqs enabled will create the kthread. This bug was detected by rcutorture changes that were motivated by Iftekhar Ahmed's mutation-testing efforts. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2016-06-15 15:45:00 -07:00

95 Commits