linux

iv/linux

Author	SHA1	Message	Date
Paul E. McKenney	11ca7a9d54	Merge branches 'consolidate.2019.05.28a', 'doc.2019.05.28a', 'fixes.2019.06.13a', 'srcu.2019.05.28a', 'sync.2019.05.28a' and 'torture.2019.05.28a' into HEAD consolidate.2019.05.28a: RCU flavor consolidation cleanups and optmizations. doc.2019.05.28a: Documentation updates. fixes.2019.06.13a: Miscellaneous fixes. srcu.2019.05.28a: SRCU updates. sync.2019.05.28a: RCU-sync flavor consolidation. torture.2019.05.28a: Torture-test updates.	2019-06-19 09:21:46 -07:00
Paul E. McKenney	96050c68be	rcu: Upgrade sync_exp_work_done() to smp_mb() The sync_exp_work_done() function uses smp_mb__before_atomic(), but there is no obvious atomic in the ensuing code. The ordering is absolutely required for grace periods to work correctly, so this commit upgrades the smp_mb__before_atomic() to smp_mb(). Fixes: 6fba2b3767ea ("rcu: Remove deprecated RCU debugfs tracing code") Reported-by: Andrea Parri <andrea.parri@amarulasolutions.com> Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>	2019-06-13 15:33:19 -07:00
Paul E. McKenney	354ea05d02	rcutorture: Upper case solves the case of the vanishing NULL pointer Various security techniques can obfuscate pointer printouts on the console. Unfortunately, rcutorture relies on either "null" or all zeroes to identify the last few statistics printouts at the end of the test. These need to be identified because failing to do so will results in false-positive complaints about grace-period hangs. This commit therefore prints the "ver:" in capitals ("VER:") when the RCU-protected pointer has been set to NULL, which causes rcutorture's parse-console.sh script to correctly ignore these lines. Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>	2019-05-28 09:06:09 -07:00
Paul E. McKenney	34aa34b818	rcutorture: Dump trace buffer for callback pipe drain failures Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>	2019-05-28 09:06:09 -07:00
Paul E. McKenney	c682db558e	rcutorture: Add trivial RCU implementation I have been showing off a trivial RCU implementation for non-preemptive environments for some time now: #define rcu_read_lock() #define rcu_read_unlock() #define rcu_dereference(p) READ_ONCE(p) #define rcu_assign_pointer(p, v) smp_store_release(&(p), (v)) void synchronize_rcu(void) { int cpu; for_each_online_cpu(cpu) sched_setaffinity(current->pid, cpumask_of(cpu)); } Trivial or not, as the old saying goes, "if it ain't tested, it don't work!". This commit therefore adds a "trivial" flavor to rcutorture and a corresponding TRIVIAL test scenario. This variant does not handle CPU hotplug, which is unconditionally enabled on x86 for post-v5.1-rc3 kernels, which is why the TRIVIAL.boot says "rcutorture.onoff_interval=0". This commit actually does handle CONFIG_PREEMPT=y kernels, but only because it turns back the Linux-kernel clock in order to provide these alternative definitions (or the moral equivalent thereof): #define rcu_read_lock() preempt_disable() #define rcu_read_unlock() preempt_enable() In CONFIG_PREEMPT=n kernels without debugging, these are equivalent to empty macros give or take a compiler barrier. However, the have been successfully tested with actual empty macros as well. Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com> [ paulmck: Fix symbol issue reported by kbuild test robot <lkp@intel.com>. ] [ paulmck: Work around sched_setaffinity() issue noted by Andrea Parri. ] [ paulmck: Add rcutorture.shuffle_interval=0 to TRIVIAL.boot to fix interaction with shuffler task noted by Peter Zijlstra. ] Tested-by: Andrea Parri <andrea.parri@amarulasolutions.com>	2019-05-28 09:06:09 -07:00
Paul E. McKenney	3432d765c5	rcutorture: Halt forward-progress checks at end of run Once removed, an rcu_torture element can be deferred-freed by a chain of call_rcu() invocations, with each callback invoking another round of call_rcu() until either a fixed number of call_rcu() invocations have been chained or until the test ends. This means that if the test ends, some of the rcu_torture elements will be "stranded" partway through the deferred-free process, which results in false-positive warnings from rcu_torture_writer() due to lack of forward progress should the test end just at the end of a stutter interval. This commit therefore suppresses rcu_torture_writer()'s forward-progress checks when the test ends in order to avoid these false-positive reports.. Reported-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>	2019-05-28 09:06:09 -07:00
Paul E. McKenney	ab21f6081f	rcutorture: Give the scheduler a chance on PREEMPT && NO_HZ_FULL kernels In !PREEMPT kernels, cond_resched() is a no-op. In NO_HZ_FULL kernels, in-kernel execution (such as that of rcutorture's kthreads) might extend indefinitely without the scheduler gaining the aid of a scheduling-clock interrupt. This combination can make the interaction of an rcutorture forward-progress test and a CPU-hotplug stop_machine operation make less forward progress than one might like. Additionally, Sebastian Siewior notes that NO_HZ_FULL kernels have a scheduler check upon return to userspace execution, which suggests that in-kernel emulation of tight userspace loops containing system calls doing call_rcu() might also need explicit checks in the PREEMPT && NO_HZ_FULL case. This commit therefore introduces a rcu_torture_fwd_prog_cond_resched() function that explicitly invokes schedule() in such kernels whenever need_resched() returns true, while retaining use of cond_resched() for kernels that are either !PREEMPT or !NO_HZ_FULL. Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>	2019-05-28 09:06:09 -07:00
Paul E. McKenney	5eabea594b	rcutorture: Exempt tasks RCU from timely draining of grace periods After the end of each stutter pause interval, the rcu_torture_writer() kthread checks to be sure that all prior callbacks have completed so that all the test structures have been freed. This works fine except for tasks RCU, in which grace periods can take one good long time. This commit therefore exempts tasks RCU from this check. Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>	2019-05-28 09:06:09 -07:00
Paul E. McKenney	ff3bf92d90	torture: Allow inter-stutter interval to be specified Currently, the inter-stutter interval is the same as the stutter duration, that is, whatever number of jiffies is passed into torture_stutter_init(). This has worked well for quite some time, but the addition of forward-progress testing to rcutorture can delay processes for several seconds, which can triple the time that they are stuttered. This commit therefore adds a second argument to torture_stutter_init() that specifies the inter-stutter interval. While locktorture preserves the current behavior, rcutorture uses the RCU CPU stall warning interval to provide a wider inter-stutter interval. Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>	2019-05-28 09:06:09 -07:00
Paul E. McKenney	e8516c64fe	rcutorture: Fix stutter_wait() return value and freelist checks The stutter_wait() function is supposed to return true if it actually waits and false otherwise, but it instead unconditionally returns false. Which hides a bug in rcu_torture_writer() that fails to account for the fact that one of the rcu_tortures[] array elements will normally be referenced by rcu_torture_current, and thus not be on the freelist. This commit therefore corrects the stutter_wait() return value and adds a check for rcu_torture_current to rcu_torture_writer()'s check that things get freed after everything goes quiescent. In addition, this commit causes torture_stutter() to give a bit more than one second (instead of only one jiffy) warning of the end of the stutter interval. Finally, this commit disables long-delay readers and aggressive update-side forward-progress checks while forward-progress testing is in flight. Reported-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>	2019-05-28 09:06:09 -07:00
Paul E. McKenney	140e53f20b	rcutorture: Add cond_resched() to forward-progress free-up loop The rcu_torture_fwd_prog_cbfree() function frees callbacks used during rcutorture's call_rcu() forward-progress test, but does so in a tight loop. This could cause problems given a very long list of callbacks to be freed, and actual testing produces lists with as many as 25M callbacks. This commit therefore adds a cond_resched() to this loop. While in the area, this commit also rearranges the lock releases to look a bit more sane. Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>	2019-05-28 09:06:09 -07:00
Oleg Nesterov	89da3b94bb	rcu/sync: Simplify the state machine With this patch rcu_sync has a single state variable and the transition rules become really simple: GP_IDLE - owned by the first rcu_sync_enter() which moves it to GP_ENTER - owned by rcu-callback which moves it to GP_PASSED - owned by the last rcu_sync_exit() which moves it to GP_EXIT - and this is the only "nontrivial" state. rcu-callback moves it back to GP_IDLE unless another enter() comes before a GP pass. If rcu-callback is invoked before the next rcu_sync_exit() it must see gp_count incremented by that enter() and set GP_PASSED. Otherwise, if the next rcu_sync_exit() wins the race, it will move it to GP_REPLAY - owned by rcu-callback which moves it to GP_EXIT Signed-off-by: Oleg Nesterov <oleg@redhat.com> [ paulmck: While here, apply READ_ONCE() and WRITE_ONCE() to ->gp_state. ] [ paulmck: Tweaks to make htmldocs happy. (Reported by kbuild test robot.) ] Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>	2019-05-28 09:05:23 -07:00
Oleg Nesterov	3f2947b781	locking/percpu-rwsem: Add DEFINE_PERCPU_RWSEM(), use it to initialize cgroup_threadgroup_rwsem Turn DEFINE_STATIC_PERCPU_RWSEM() into __DEFINE_PERCPU_RWSEM() with the additional "is_static" argument to introduce DEFINE_PERCPU_RWSEM(). Change cgroup.c to use DEFINE_PERCPU_RWSEM(cgroup_threadgroup_rwsem). Signed-off-by: Oleg Nesterov <oleg@redhat.com> Reviewed-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>	2019-05-28 09:05:23 -07:00
Oleg Nesterov	2bf1acc299	uprobes: Use DEFINE_STATIC_PERCPU_RWSEM() to initialize dup_mmap_sem Use DEFINE_STATIC_PERCPU_RWSEM() to initialize dup_mmap_sem. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Reviewed-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>	2019-05-28 09:05:23 -07:00
Oleg Nesterov	95bf33b55f	rcu/sync: Kill rcu_sync_type/gp_type Now that the RCU flavors have been consolidated, rcu_sync_type makes no sense because none of internal update functions aside from .held() depend on gp_type. This commit therefore removes this field and consolidates the relevant code. Signed-off-by: Oleg Nesterov <oleg@redhat.com> [ paulmck: Added RCU and RCU-bh checks to rcu_sync_is_idle(). ] [ paulmck: And applied subsequent feedback from Oleg Nesterov. ] Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>	2019-05-28 09:05:23 -07:00
Jiang Biao	11b000457f	rcu: Make __call_srcu static Because __call_srcu() is not used outside kernel/rcu/srcutree.c, this commit makes it static. Signed-off-by: Jiang Biao <benbjiang@tencent.com> Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>	2019-05-28 09:03:35 -07:00
Paul E. McKenney	fe15b50cde	srcu: Allocate per-CPU data for DEFINE_SRCU() in modules Adding DEFINE_SRCU() or DEFINE_STATIC_SRCU() to a loadable module requires that the size of the reserved region be increased, which is not something we want to be doing all that often. One approach would be to require that loadable modules define an srcu_struct and invoke init_srcu_struct() from their module_init function and cleanup_srcu_struct() from their module_exit function. However, this is more than a bit user unfriendly. This commit therefore creates an ___srcu_struct_ptrs linker section, and pointers to srcu_struct structures created by DEFINE_SRCU() and DEFINE_STATIC_SRCU() within a module are placed into that module's ___srcu_struct_ptrs section. The required init_srcu_struct() and cleanup_srcu_struct() functions are then automatically invoked as needed when that module is loaded and unloaded, thus allowing modules to continue to use DEFINE_SRCU() and DEFINE_STATIC_SRCU() while avoiding the need to increase the size of the reserved region. Many of the algorithms and some of the code was cheerfully cherry-picked from other code making use of linker sections, perhaps most notably from tracepoints. All bugs are nevertheless the sole property of the author. Suggested-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> [ paulmck: Use __section() and use "default" in srcu_module_notify()'s "switch" statement as suggested by Joel Fernandes. ] Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com> Tested-by: Joel Fernandes (Google) <joel@joelfernandes.org>	2019-05-28 09:03:35 -07:00
Paul E. McKenney	d5a9a8c3bc	rcu: Set a maximum limit for back-to-back callback invocation Currently, if a CPU has more than 10,000 callbacks pending, it will increase rdp->blimit to LONG_MAX. If you are lucky, LONG_MAX is only about two billion, but this is still a bit too many callbacks to invoke back-to-back while otherwise ignoring the world. This commit therefore sets a maximum limit of DEFAULT_MAX_RCU_BLIMIT, which is set to 10,000, for rdp->blimit. Reported-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>	2019-05-28 09:02:57 -07:00
Neeraj Upadhyay	3ae976a7e3	rcu: Correctly unlock root node in rcu_check_gp_start_stall() On systems whose rcu_node tree has only one node, the rcu_check_gp_start_stall() function's values of rnp and rnp_root will be identical. In this case, it clearly does not make sense to release both rnp->lock and rnp_root->lock, but that is exactly what this function does in the last early exit. This commit therefore unlocks only rnp->lock when rnp and rnp_root are equal. Signed-off-by: Neeraj Upadhyay <neeraju@codeaurora.org> Reviewed-by: Mukesh Ojha <mojha@codeaurora.org> Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>	2019-05-28 09:02:57 -07:00
Neeraj Upadhyay	cd6d17b4a4	rcu: Dump specified number of blocked tasks The dump_blkd_tasks() function dumps at most 10 blocked tasks, ignoring the value of the ncheck parameter. This commit therefore substitutes the value of ncheck for the hard-coded value of 10. Because all callers currently pass 10 as the number, this patch does not change behavior, but it is clearly an accident waiting to happen. Signed-off-by: Neeraj Upadhyay <neeraju@codeaurora.org> Reviewed-by: Mukesh Ojha <mojha@codeaurora.org> Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>	2019-05-28 09:02:57 -07:00
Jiang Biao	f0b6356273	rcu: Remove unused rdp local from synchronize_rcu_expedited() Because rdp is initialized but never used in synchronize_rcu_expedited(), this commit removes it. Signed-off-by: Jiang Biao <benbjiang@tencent.com> Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>	2019-05-28 08:48:19 -07:00
Paul E. McKenney	1bb336443c	rcu: Rename rcu_data's ->deferred_qs to ->exp_deferred_qs The rcu_data structure's ->deferred_qs field is used to indicate that the current CPU is blocking an expedited grace period (perhaps a future one). Given that it is used only for expedited grace periods, its current name is misleading, so this commit renames it to ->exp_deferred_qs. Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>	2019-05-28 08:48:19 -07:00
Joel Fernandes (Google)	eddded8012	rcu: Add checks for dynticks counters in rcu_is_cpu_rrupt_from_idle() It would be good to combine the dynticks and dynticks_nesting counters in order to simplify the code. Unfortunately, there are concerns about usermode upcalls appearing to RCU as half of an interrupt, as Byungchul learned [1]. The "half" in "half interrupt" is due to an unpaired rcu_irq_enter(): Normally, each rcu_irq_enter() has a later call to rcu_irq_exit(). Out of an abundance of caution, Paul added warnings [2] in the RCU code which if not fired by 2021 will be interpreted as meaning that this half-interrupt scenario cannot happen any more, thus permitting simplification of this code. In the meantime, this commit makes the following changes: (1) Combining these two counters requires that rcu_rrupt_from_idle() is invoked only from hard-interrupt contexts as discussed here [3]. This commit therefore adds the required lockdep_assert_in_irq() to check this constraint. (2) Furthermore, rcu_rrupt_from_idle() is not explicit about how it is using the counters which can lead to weird future bugs. This commit therefore adds comments indicating the meaning and use of each counter. (3) Lastly, this commit checks for counter underflows as another check that half interrupts don't occur. (Previously, the function would simply return true upon underflow.) All these checks checks are NOOPs if PROVE_LOCKING (and thus PROVE_RCU) are disabled. [1] https://lore.kernel.org/patchwork/patch/952349/ [2] Commit e11ec65cc8d6 ("rcu: Add warning to detect half-interrupts") [3] https://lore.kernel.org/lkml/20190312150514.GB249405@google.com/ Cc: byungchul.park@lge.com Cc: kernel-team@android.com Cc: rcu@vger.kernel.org Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>	2019-05-28 08:48:19 -07:00
Paul E. McKenney	e015a34112	rcu: Avoid self-IPI in sync_sched_exp_online_cleanup() The sync_sched_exp_online_cleanup() is invoked at online time to handle the case where the start of an expedited grace period ran concurrently with a CPU being taken offline and then immediately being placed online. It checks to see if RCU needs an expedited quiescent state from the incoming CPU, sending it an IPI if so. However, it is quite possible that sync_sched_exp_online_cleanup() is running on that CPU, in which case it is considerably less overhead to simply request the quiescent state locally instead of simulating a self-IPI. This commit therefore places the last few lines of rcu_exp_handler() into a new rcu_exp_need_qs() function, which is invoked both by rcu_exp_handler() and by sync_sched_exp_online_cleanup() in the self-IPI case. This also reduces the rcu_exp_handler() function's state space by removing the direct call that this smp_call_function_single() uses to emulate the requested self-IPI. This in turn will allow tighter error checking in rcu_is_cpu_rrupt_from_idle(). Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com> Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>	2019-05-25 14:50:50 -07:00
Paul E. McKenney	b9ad4d6ed1	rcu: Avoid self-IPI in sync_rcu_exp_select_node_cpus() Although sync_rcu_exp_select_node_cpus() treats the current CPU as being in a quiescent state, it might well migrate to some other CPU before reaching the smp_call_function_single(), which could then result in an unnecessary simulated self-IPI. This commit therefore instead simply refuses to invoke smp_call_function_single() on the current CPU, which causes the later rcu_report_exp_cpu_mult() to report this CPU's quiescent state with less overhead. This also reduces the rcu_exp_handler() function's state space by removing the direct call that this smp_call_function_single() uses to emulate the requested self-IPI. Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com> [ paulmck: Use get_cpu() instead of preempt_disable() per Joel Fernandes. ]	2019-05-25 14:50:50 -07:00
Paul E. McKenney	43e903ad3e	rcu: Inline invoke_rcu_callbacks() into its sole remaining caller This commit saves a few lines of code by inlining invoke_rcu_callbacks() into its sole remaining caller. Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>	2019-05-25 14:50:49 -07:00
Paul E. McKenney	0864f057b0	rcu: Use irq_work to get scheduler's attention in clean context When rcu_read_unlock_special() is invoked with interrupts disabled, is either not in an interrupt handler or is not using RCU_SOFTIRQ, is not the first RCU read-side critical section in the chain, and either there is an expedited grace period in flight or this is a NO_HZ_FULL kernel, the end of the grace period can be unduly delayed. The reason for this is that it is not safe to do wakeups in this situation. This commit fixes this problem by using the irq_work subsystem to force a later interrupt handler in a clean environment. Because set_tsk_need_resched(current) and set_preempt_need_resched() are invoked prior to this, the scheduler will force a context switch upon return from this interrupt (though perhaps at the end of any interrupted preempt-disable or BH-disable region of code), which will invoke rcu_note_context_switch() (again in a clean environment), which will in turn give RCU the chance to report the deferred quiescent state. Of course, by then this task might be within another RCU read-side critical section. But that will be detected at that time and reporting will be further deferred to the outermost rcu_read_unlock(). See rcu_preempt_need_deferred_qs() and rcu_preempt_deferred_qs() for more details on the checking. Suggested-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>	2019-05-25 14:50:49 -07:00
Paul E. McKenney	385b599e8c	rcu: Allow rcu_read_unlock_special() to raise_softirq() if in_irq() When running in an interrupt handler, raise_softirq() and raise_softirq_irqoff() have extremely low overhead: They simply set a bit in a per-CPU mask, which is checked upon exit from that interrupt handler. Therefore, if rcu_read_unlock_special() is invoked within an interrupt handler and RCU_SOFTIRQ is in use, this commit make use of raise_softirq_irqoff() even if there is no expedited grace period in flight and even if this is not a nohz_full CPU. Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>	2019-05-25 14:50:48 -07:00
Paul E. McKenney	25102de65f	rcu: Only do rcu_read_unlock_special() wakeups if expedited Currently, rcu_read_unlock_special() will do wakeups whenever it is safe to do so. However, wakeups are expensive, and they are only really needed when the just-ended RCU read-side critical section is blocking an expedited grace period (in which case speed is of the essence) or on a nohz_full CPU (where it might be a good long time before an interrupt arrives). This commit therefore checks for these conditions, and does the expensive wakeups only if doing so would be useful. Note it can be rather expensive to determine whether or not the current task (as opposed to the current CPU) is blocking the current expedited grace period. Doing so requires traversing the ->blkd_tasks list, which can be quite long. This commit therefore cheats: If the current task is on a given ->blkd_tasks list, and some task on that list is blocking the current expedited grace period, the code assumes that the current task is blocking that expedited grace period. Reported-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>	2019-05-25 14:50:48 -07:00
Paul E. McKenney	23634ebc1d	rcu: Check for wakeup-safe conditions in rcu_read_unlock_special() When RCU core processing is offloaded from RCU_SOFTIRQ to the rcuc kthreads, a full and unconditional wakeup is required to initiate RCU core processing. In contrast, when RCU core processing is carried out by RCU_SOFTIRQ, a raise_softirq() suffices. Of course, there are situations where raise_softirq() does a full wakeup, but these do not occur with normal usage of rcu_read_unlock(). The reason that full wakeups can be problematic is that the scheduler sometimes invokes rcu_read_unlock() with its pi or rq locks held, which can of course result in deadlock in CONFIG_PREEMPT=y kernels when rcu_read_unlock() invokes the scheduler. Scheduler invocations can happen in the following situations: (1) The just-ended reader has been subjected to RCU priority boosting, in which case rcu_read_unlock() must deboost, (2) Interrupts were disabled across the call to rcu_read_unlock(), so the quiescent state must be deferred, requiring a wakeup of the rcuc kthread corresponding to the current CPU. Now, the scheduler may hold one of its locks across rcu_read_unlock() only if preemption has been disabled across the entire RCU read-side critical section, which in the days prior to RCU flavor consolidation meant that rcu_read_unlock() never needed to do wakeups. However, this is no longer the case for any but the first rcu_read_unlock() following a condition (e.g., preempted RCU reader) requiring special rcu_read_unlock() attention. For example, an RCU read-side critical section might be preempted, but preemption might be disabled across the rcu_read_unlock(). The rcu_read_unlock() must defer the quiescent state, and therefore leaves the task queued on its leaf rcu_node structure. If a scheduler interrupt occurs, the scheduler might well invoke rcu_read_unlock() with one of its locks held. However, the preempted task is still queued, so rcu_read_unlock() will attempt to defer the quiescent state once more. When RCU core processing is carried out by RCU_SOFTIRQ, this works just fine: The raise_softirq() function simply sets a bit in a per-CPU mask and the RCU core processing will be undertaken upon return from interrupt. Not so when RCU core processing is carried out by the rcuc kthread: In this case, the required wakeup can result in deadlock. The initial solution to this problem was to use set_tsk_need_resched() and set_preempt_need_resched() to force a future context switch, which allows rcu_preempt_note_context_switch() to report the deferred quiescent state to RCU's core processing. Unfortunately for expedited grace periods, there can be a significant delay between the call for a context switch and the actual context switch. This commit therefore introduces a ->deferred_qs flag to the task_struct structure's rcu_special structure. This flag is initially false, and is set to true by the first call to rcu_read_unlock() requiring special attention, then finally reset back to false when the quiescent state is finally reported. Then rcu_read_unlock() attempts full wakeups only when ->deferred_qs is false, that is, on the first rcu_read_unlock() requiring special attention. Note that a chain of RCU readers linked by some other sort of reader may find that a later rcu_read_unlock() is once again able to do a full wakeup, courtesy of an intervening preemption: rcu_read_lock(); /* preempted / local_irq_disable(); rcu_read_unlock(); / Can do full wakeup, sets ->deferred_qs. / rcu_read_lock(); local_irq_enable(); preempt_disable() rcu_read_unlock(); / Cannot do full wakeup, ->deferred_qs set. / rcu_read_lock(); preempt_enable(); / preempted, >deferred_qs reset. / local_irq_disable(); rcu_read_unlock(); / Can again do full wakeup, sets ->deferred_qs. */ Such linked RCU readers do not yet seem to appear in the Linux kernel, and it is probably best if they don't. However, RCU needs to handle them, and some variations on this theme could make even raise_softirq() unsafe due to the possibility of its doing a full wakeup. This commit therefore also avoids invoking raise_softirq() when the ->deferred_qs set flag is set. Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>	2019-05-25 14:50:47 -07:00
Sebastian Andrzej Siewior	48d07c04b4	rcu: Enable elimination of Tree-RCU softirq processing Some workloads need to change kthread priority for RCU core processing without affecting other softirq work. This commit therefore introduces the rcutree.use_softirq kernel boot parameter, which moves the RCU core work from softirq to a per-CPU SCHED_OTHER kthread named rcuc. Use of SCHED_OTHER approach avoids the scalability problems that appeared with the earlier attempt to move RCU core processing to from softirq to kthreads. That said, kernels built with RCU_BOOST=y will run the rcuc kthreads at the RCU-boosting priority. Note that rcutree.use_softirq=0 must be specified to move RCU core processing to the rcuc kthreads: rcutree.use_softirq=1 is the default. Reported-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Mike Galbraith <efault@gmx.de> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> [ paulmck: Adjust for invoke_rcu_callbacks() only ever being invoked from RCU core processing, in contrast to softirq->rcuc transition in old mainline RCU priority boosting. ] [ paulmck: Avoid wakeups when scheduler might have invoked rcu_read_unlock() while holding rq or pi locks, also possibly fixing a pre-existing latent bug involving raise_softirq()-induced wakeups. ] Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>	2019-05-25 14:50:46 -07:00
Linus Torvalds	cb6f8739fb	Merge branch 'akpm' (patches from Andrew) Merge yet more updates from Andrew Morton: "A few final bits: - large changes to vmalloc, yielding large performance benefits - tweak the console-flush-on-panic code - a few fixes" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: panic: add an option to replay all the printk message in buffer initramfs: don't free a non-existent initrd fs/writeback.c: use rcu_barrier() to wait for inflight wb switches going into workqueue when umount mm/compaction.c: correct zone boundary handling when isolating pages from a pageblock mm/vmap: add DEBUG_AUGMENT_LOWEST_MATCH_CHECK macro mm/vmap: add DEBUG_AUGMENT_PROPAGATE_CHECK macro mm/vmalloc.c: keep track of free blocks for vmap allocation	2019-05-19 12:15:32 -07:00
Linus Torvalds	d9351ea14d	Merge branch 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull IRQ chip updates from Ingo Molnar: "A late irqchips update: - New TI INTR/INTA set of drivers - Rewrite of the stm32mp1-exti driver as a platform driver - Update the IOMMU MSI mapping API to be RT friendly - A number of cleanups and other low impact fixes" * 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (34 commits) iommu/dma-iommu: Remove iommu_dma_map_msi_msg() irqchip/gic-v3-mbi: Don't map the MSI page in mbi_compose_m{b, s}i_msg() irqchip/ls-scfg-msi: Don't map the MSI page in ls_scfg_msi_compose_msg() irqchip/gic-v3-its: Don't map the MSI page in its_irq_compose_msi_msg() irqchip/gicv2m: Don't map the MSI page in gicv2m_compose_msi_msg() iommu/dma-iommu: Split iommu_dma_map_msi_msg() in two parts genirq/msi: Add a new field in msi_desc to store an IOMMU cookie arm64: arch_k3: Enable interrupt controller drivers irqchip/ti-sci-inta: Add msi domain support soc: ti: Add MSI domain bus support for Interrupt Aggregator irqchip/ti-sci-inta: Add support for Interrupt Aggregator driver dt-bindings: irqchip: Introduce TISCI Interrupt Aggregator bindings irqchip/ti-sci-intr: Add support for Interrupt Router driver dt-bindings: irqchip: Introduce TISCI Interrupt router bindings gpio: thunderx: Use the default parent apis for {request,release}_resources genirq: Introduce irq_chip_{request,release}_resource_parent() apis firmware: ti_sci: Add helper apis to manage resources firmware: ti_sci: Add RM mapping table for am654 firmware: ti_sci: Add support for IRQ management firmware: ti_sci: Add support for RM core ops ...	2019-05-19 10:58:45 -07:00
Feng Tang	de6da1e8bc	panic: add an option to replay all the printk message in buffer Currently on panic, kernel will lower the loglevel and print out pending printk msg only with console_flush_on_panic(). Add an option for users to configure the "panic_print" to replay all dmesg in buffer, some of which they may have never seen due to the loglevel setting, which will help panic debugging . [feng.tang@intel.com: keep the original console_flush_on_panic() inside panic()] Link: http://lkml.kernel.org/r/1556199137-14163-1-git-send-email-feng.tang@intel.com [feng.tang@intel.com: use logbuf lock to protect the console log index] Link: http://lkml.kernel.org/r/1556269868-22654-1-git-send-email-feng.tang@intel.com Link: http://lkml.kernel.org/r/1556095872-36838-1-git-send-email-feng.tang@intel.com Signed-off-by: Feng Tang <feng.tang@intel.com> Reviewed-by: Petr Mladek <pmladek@suse.com> Cc: Aaro Koskinen <aaro.koskinen@nokia.com> Cc: Petr Mladek <pmladek@suse.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> Cc: Kees Cook <keescook@chromium.org> Cc: Borislav Petkov <bp@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2019-05-18 15:52:26 -07:00
Linus Torvalds	5f3ab27b9e	Merge branch 'for-5.2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup Pull cgroup fix from Tejun Heo: "The cgroup2 freezer pulled in this cycle broke strace. This pull request includes a workaround for the problem. It's not a complete fix in that it may cause spurious frozen state flip-flops which is fairly minor. Will push a full fix once it's ready" * 'for-5.2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: signal: unconditionally leave the frozen state in ptrace_stop()	2019-05-16 19:01:23 -07:00
Linus Torvalds	b2c3dda6f8	Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull time fixes from Ingo Molnar: "A TIA adjtimex interface extension, and a POSIX compliance ABI fix for timespec64 users" * 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: ntp: Allow TAI-UTC offset to be set to zero y2038: Make CONFIG_64BIT_TIME unconditional	2019-05-16 11:00:20 -07:00
Linus Torvalds	f57d7715d7	Merge branch 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull locking fix from Ingo Molnar: "A single rwsem fix" * 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: locking/rwsem: Prevent decrement of reader count before increment	2019-05-16 10:54:19 -07:00
Roman Gushchin	05b2892637	signal: unconditionally leave the frozen state in ptrace_stop() Alex Xu reported a regression in strace, caused by the introduction of the cgroup v2 freezer. The regression can be reproduced by stracing the following simple program: #include <unistd.h> int main() { write(1, "a", 1); return 0; } An attempt to run strace ./a.out leads to the infinite loop: [ pre-main omitted ] write(1, "a", 1) = ? ERESTARTSYS (To be restarted if SA_RESTART is set) write(1, "a", 1) = ? ERESTARTSYS (To be restarted if SA_RESTART is set) write(1, "a", 1) = ? ERESTARTSYS (To be restarted if SA_RESTART is set) write(1, "a", 1) = ? ERESTARTSYS (To be restarted if SA_RESTART is set) write(1, "a", 1) = ? ERESTARTSYS (To be restarted if SA_RESTART is set) write(1, "a", 1) = ? ERESTARTSYS (To be restarted if SA_RESTART is set) [ repeats forever ] The problem occurs because the traced task leaves ptrace_stop() (and the signal handling loop) with the frozen bit set. So let's call cgroup_leave_frozen(true) unconditionally after sleeping in ptrace_stop(). With this patch applied, strace works as expected: [ pre-main omitted ] write(1, "a", 1) = 1 exit_group(0) = ? +++ exited with 0 +++ Reported-by: Alex Xu <alex_y_xu@yahoo.ca> Fixes: 76f969e8948d ("cgroup: cgroup v2 freezer") Signed-off-by: Roman Gushchin <guro@fb.com> Acked-by: Oleg Nesterov <oleg@redhat.com> Cc: Tejun Heo <tj@kernel.org> Signed-off-by: Tejun Heo <tj@kernel.org>	2019-05-16 10:43:58 -07:00
Linus Torvalds	d2d8b14604	The major changes in this tracing update includes: - Removing of non-DYNAMIC_FTRACE from 32bit x86 - Removing of mcount support from x86 - Emulating a call from int3 on x86_64, fixes live kernel patching - Consolidated Tracing Error logs file Minor updates: - Removal of klp_check_compiler_support() - kdb ftrace dumping output changes - Accessing and creating ftrace instances from inside the kernel - Clean up of #define if macro - Introduction of TRACE_EVENT_NOP() to disable trace events based on config options And other minor fixes and clean ups -----BEGIN PGP SIGNATURE----- iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCXNxMZxQccm9zdGVkdEBn b29kbWlzLm9yZwAKCRAp5XQQmuv6qq4PAP44kP6VbwL8CHyI2A3xuJ6Hwxd+2Z2r ip66RtzyJ+2iCgEA2QCuWUlEt2bLpF9a8IQ4N9tWenSeW2i7gunPb+tioQw= =RVQo -----END PGP SIGNATURE----- Merge tag 'trace-v5.2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing updates from Steven Rostedt: "The major changes in this tracing update includes: - Removal of non-DYNAMIC_FTRACE from 32bit x86 - Removal of mcount support from x86 - Emulating a call from int3 on x86_64, fixes live kernel patching - Consolidated Tracing Error logs file Minor updates: - Removal of klp_check_compiler_support() - kdb ftrace dumping output changes - Accessing and creating ftrace instances from inside the kernel - Clean up of #define if macro - Introduction of TRACE_EVENT_NOP() to disable trace events based on config options And other minor fixes and clean ups" * tag 'trace-v5.2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (44 commits) x86: Hide the int3_emulate_call/jmp functions from UML livepatch: Remove klp_check_compiler_support() ftrace/x86: Remove mcount support ftrace/x86_32: Remove support for non DYNAMIC_FTRACE tracing: Simplify "if" macro code tracing: Fix documentation about disabling options using trace_options tracing: Replace kzalloc with kcalloc tracing: Fix partial reading of trace event's id file tracing: Allow RCU to run between postponed startup tests tracing: Fix white space issues in parse_pred() function tracing: Eliminate const char[] auto variables ring-buffer: Fix mispelling of Calculate tracing: probeevent: Fix to make the type of $comm string tracing: probeevent: Do not accumulate on ret variable tracing: uprobes: Re-enable $comm support for uprobe events ftrace/x86_64: Emulate call function while updating in breakpoint handler x86_64: Allow breakpoints to emulate call instructions x86_64: Add gap to int3 to allow for call emulation tracing: kdb: Allow ftdump to skip all but the last few entries tracing: Add trace_total_entries() / trace_total_entries_cpu() ...	2019-05-15 16:05:47 -07:00
Stephen Rothwell	89963adcdb	kernel/compat.c: mark expected switch fall-throughs In preparation to enabling -Wimplicit-fallthrough, mark switch cases where we are expecting to fall through. This patch aims to suppress 3 missing-break-in-switch false positives on some architectures. Acked-by: Arnd Bergmann <arnd@arndb.de> Cc: Deepa Dinamani <deepa.kernel@gmail.com> Cc: Gustavo A. R. Silva <gustavo@embeddedor.com> Cc: Kees Cook <keescook@chromium.org> Cc: Jann Horn <jannh@google.com> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2019-05-15 08:16:14 -07:00
Linus Torvalds	1064d85773	Merge branch 'akpm' (patches from Andrew) Merge more updates from Andrew Morton: - a couple of hotfixes - almost all of the rest of MM - lib/ updates - binfmt_elf updates - autofs updates - quite a lot of misc fixes and updates - reiserfs, fatfs - signals - exec - cpumask - rapidio - sysctl - pids - eventfd - gcov - panic - pps - gdb script updates - ipc updates * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (126 commits) mm: memcontrol: fix NUMA round-robin reclaim at intermediate level mm: memcontrol: fix recursive statistics correctness & scalabilty mm: memcontrol: move stat/event counting functions out-of-line mm: memcontrol: make cgroup stats and events query API explicitly local drivers/virt/fsl_hypervisor.c: prevent integer overflow in ioctl drivers/virt/fsl_hypervisor.c: dereferencing error pointers in ioctl mm, memcg: rename ambiguously named memory.stat counters and functions arch: remove <asm/sizes.h> and <asm-generic/sizes.h> treewide: replace #include <asm/sizes.h> with #include <linux/sizes.h> fs/block_dev.c: Remove duplicate header fs/cachefiles/namei.c: remove duplicate header include/linux/sched/signal.h: replace `tsk' with `task' fs/coda/psdev.c: remove duplicate header ipc: do cyclic id allocation for the ipc object. ipc: conserve sequence numbers in ipcmni_extend mode ipc: allow boot time extension of IPCMNI from 32k to 16M ipc/mqueue: optimize msg_get() ipc/mqueue: remove redundant wq task assignment ipc: prevent lockup on alloc_msg and free_msg scripts/gdb: print cached rate in lx-clk-summary ...	2019-05-14 20:08:51 -07:00
Aaro Koskinen	b287a25a71	panic/reboot: allow specifying reboot_mode for panic only Allow specifying reboot_mode for panic only. This is needed on systems where ramoops is used to store panic logs, and user wants to use warm reset to preserve those, while still having cold reset on normal reboots. Link: http://lkml.kernel.org/r/20190322004735.27702-1-aaro.koskinen@iki.fi Signed-off-by: Aaro Koskinen <aaro.koskinen@nokia.com> Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2019-05-14 19:52:51 -07:00
Feng Tang	c39ea0b9dd	panic: avoid the extra noise dmesg When kernel panic happens, it will first print the panic call stack, then the ending msg like: [ 35.743249] ---[ end Kernel panic - not syncing: Fatal exception [ 35.749975] ------------[ cut here ]------------ The above message are very useful for debugging. But if system is configured to not reboot on panic, say the "panic_timeout" parameter equals 0, it will likely print out many noisy message like WARN() call stack for each and every CPU except the panic one, messages like below: WARNING: CPU: 1 PID: 280 at kernel/sched/core.c:1198 set_task_cpu+0x183/0x190 Call Trace: <IRQ> try_to_wake_up default_wake_function autoremove_wake_function __wake_up_common __wake_up_common_lock __wake_up wake_up_klogd_work_func irq_work_run_list irq_work_tick update_process_times tick_sched_timer __hrtimer_run_queues hrtimer_interrupt smp_apic_timer_interrupt apic_timer_interrupt For people working in console mode, the screen will first show the panic call stack, but immediately overridden by these noisy extra messages, which makes debugging much more difficult, as the original context gets lost on screen. Also these noisy messages will confuse some users, as I have seen many bug reporters posted the noisy message into bugzilla, instead of the real panic call stack and context. Adding a flag "suppress_printk" which gets set in panic() to avoid those noisy messages, without changing current kernel behavior that both panic blinking and sysrq magic key can work as is, suggested by Petr Mladek. To verify this, make sure kernel is not configured to reboot on panic and in console # echo c > /proc/sysrq-trigger to see if console only prints out the panic call stack. Link: http://lkml.kernel.org/r/1551430186-24169-1-git-send-email-feng.tang@intel.com Signed-off-by: Feng Tang <feng.tang@intel.com> Suggested-by: Petr Mladek <pmladek@suse.com> Reviewed-by: Petr Mladek <pmladek@suse.com> Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Acked-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Kees Cook <keescook@chromium.org> Cc: Borislav Petkov <bp@suse.de> Cc: Andi Kleen <ak@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Jiri Slaby <jslaby@suse.com> Cc: Sasha Levin <sashal@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2019-05-14 19:52:51 -07:00
Greg Hackmann	e178a5beb3	gcov: clang support LLVM uses profiling data that's deliberately similar to GCC, but has a very different way of exporting that data. LLVM calls llvm_gcov_init() once per module, and provides a couple of callbacks that we can use to ask for more data. We care about the "writeout" callback, which in turn calls back into compiler-rt/this module to dump all the gathered coverage data to disk: llvm_gcda_start_file() llvm_gcda_emit_function() llvm_gcda_emit_arcs() llvm_gcda_emit_function() llvm_gcda_emit_arcs() [... repeats for each function ...] llvm_gcda_summary_info() llvm_gcda_end_file() This design is much more stateless and unstructured than gcc's, and is intended to run at process exit. This forces us to keep some local state about which module we're dealing with at the moment. On the other hand, it also means we don't depend as much on how LLVM represents profiling data internally. See LLVM's lib/Transforms/Instrumentation/GCOVProfiling.cpp for more details on how this works, particularly GCOVProfiler::emitProfileArcs(), GCOVProfiler::insertCounterWriteout(), and GCOVProfiler::insertFlush(). [akpm@linux-foundation.org: coding-style fixes] Link: http://lkml.kernel.org/r/20190417225328.208129-1-trong@android.com Signed-off-by: Greg Hackmann <ghackmann@android.com> Signed-off-by: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: Tri Vo <trong@android.com> Co-developed-by: Nick Desaulniers <ndesaulniers@google.com> Co-developed-by: Tri Vo <trong@android.com> Tested-by: Trilok Soni <tsoni@quicinc.com> Tested-by: Prasad Sodagudi <psodagud@quicinc.com> Tested-by: Tri Vo <trong@android.com> Tested-by: Daniel Mentz <danielmentz@google.com> Tested-by: Petri Gynther <pgynther@google.com> Reviewed-by: Peter Oberparleiter <oberpar@linux.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2019-05-14 19:52:51 -07:00
Greg Hackmann	826eba0d77	gcov: clang: move common GCC code into gcc_base.c Patch series "gcov: add Clang support", v4. This patch (of 3): base.c contains a few callbacks specific to GCC's gcov implementation. Move these into their own module in preparation for Clang support. Link: http://lkml.kernel.org/r/20190318025411.98014-2-trong@android.com Signed-off-by: Greg Hackmann <ghackmann@android.com> Signed-off-by: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: Tri Vo <trong@android.com> Tested-by: Trilok Soni <tsoni@quicinc.com> Tested-by: Prasad Sodagudi <psodagud@quicinc.com> Tested-by: Tri Vo <trong@android.com> Reviewed-by: Peter Oberparleiter <oberpar@linux.ibm.com> Cc: Daniel Mentz <danielmentz@google.com> Cc: Petri Gynther <pgynther@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2019-05-14 19:52:51 -07:00
Timmy Li	1fd402df45	kernel/pid.c: remove unneeded hash header file Hash functions are not needed since idr is used now. Let's remove hash header file for cleanup. Link: http://lkml.kernel.org/r/20190430053319.95913-1-scuttimmy@gmail.com Signed-off-by: Timmy Li <scuttimmy@gmail.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Mike Rapoport <rppt@linux.vnet.ibm.com> Cc: KJ Tsanaktsidis <ktsanaktsidis@zendesk.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2019-05-14 19:52:51 -07:00
Eric Sandeen	3116ad38f5	kernel/sysctl.c: fix proc_do_large_bitmap for large input buffers Today, proc_do_large_bitmap() truncates a large write input buffer to PAGE_SIZE - 1, which may result in misparsed numbers at the (truncated) end of the buffer. Further, it fails to notify the caller that the buffer was truncated, so it doesn't get called iteratively to finish the entire input buffer. Tell the caller if there's more work to do by adding the skipped amount back to left/*lenp before returning. To fix the misparsing, reset the position if we have completely consumed a truncated buffer (or if just one char is left, which may be a "-" in a range), and ask the caller to come back for more. Link: http://lkml.kernel.org/r/20190320222831.8243-7-mcgrof@kernel.org Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> Acked-by: Kees Cook <keescook@chromium.org> Cc: Eric Sandeen <sandeen@sandeen.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2019-05-14 19:52:51 -07:00
Christian Brauner	e260ad01f0	sysctl: return -EINVAL if val violates minmax Currently when userspace gives us a values that overflow e.g. file-max and other callers of __do_proc_doulongvec_minmax() we simply ignore the new value and leave the current value untouched. This can be problematic as it gives the illusion that the limit has indeed be bumped when in fact it failed. This commit makes sure to return EINVAL when an overflow is detected. Please note that this is a userspace facing change. Link: http://lkml.kernel.org/r/20190210203943.8227-4-christian@brauner.io Signed-off-by: Christian Brauner <christian@brauner.io> Acked-by: Luis Chamberlain <mcgrof@kernel.org> Cc: Kees Cook <keescook@chromium.org> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Dominik Brodowski <linux@dominikbrodowski.net> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Joe Lawrence <joe.lawrence@redhat.com> Cc: Waiman Long <longman@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2019-05-14 19:52:51 -07:00
Andy Shevchenko	475dae3854	kernel/sysctl.c: switch to bitmap_zalloc() Switch to bitmap_zalloc() to show clearly what we are allocating. Besides that it returns pointer of bitmap type instead of opaque void *. Link: http://lkml.kernel.org/r/20190304094037.57756-1-andriy.shevchenko@linux.intel.com Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Acked-by: Kees Cook <keescook@chromium.org> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Luis Chamberlain <mcgrof@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2019-05-14 19:52:51 -07:00
Mathieu Malaterre	b028fb6128	kernel/signal.c: annotate implicit fall through There is a plan to build the kernel with -Wimplicit-fallthrough and this place in the code produced a warning (W=1). This commit remove the following warning: kernel/signal.c:795:13: warning: this statement may fall through [-Wimplicit-fallthrough=] Link: http://lkml.kernel.org/r/20190114203505.17875-1-malat@debian.org Signed-off-by: Mathieu Malaterre <malat@debian.org> Acked-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2019-05-14 19:52:50 -07:00

1 2 3 4 5 ...

30372 Commits