Frederic Weisbecker
8ca1836769 timer/migration: Fix quick check reporting late expiry
When a CPU is the last active one in the hierarchy and tries to enter
idle, the quick check looking up the next event for the cpuidle
heuristics may report an expiry that is too late, such as in the
following scenario:

                        [GRP1:0]
                     migrator = NONE
                     active   = NONE
                     nextevt  = T0:0, T0:1
                     /              \
          [GRP0:0]                  [GRP0:1]
       migrator = NONE           migrator = NONE
       active   = NONE           active   = NONE
       nextevt  = T0, T1         nextevt  = T2
       /         \                /         \
      0           1              2           3
    idle       idle           idle         idle

0) The whole system is idle, and CPU 0 was the last migrator. CPU 0 has
a timer (T0), CPU 1 has a timer (T1) and CPU 2 has a timer (T2). The
expire order is T0 < T1 < T2.

                        [GRP1:0]
                     migrator = GRP0:0
                     active   = GRP0:0
                     nextevt  = T0:0(i), T0:1
                   /              \
          [GRP0:0]                  [GRP0:1]
       migrator = CPU0           migrator = NONE
       active   = CPU0           active   = NONE
       nextevt  = T0(i), T1      nextevt  = T2
       /         \                /         \
      0           1              2           3
    active       idle           idle         idle

1) CPU 0 becomes active. The (i) means a now ignored timer.

                        [GRP1:0]
                     migrator = GRP0:0
                     active   = GRP0:0
                     nextevt  = T0:1
                     /              \
          [GRP0:0]                  [GRP0:1]
       migrator = CPU0           migrator = NONE
       active   = CPU0           active   = NONE
       nextevt  = T1             nextevt  = T2
       /         \                /         \
      0           1              2           3
    active       idle           idle         idle

2) CPU 0 handles remote expiry. No timer actually expired, but the
   ignored timers have been cleaned out while their siblings' timers
   haven't been propagated up. As a result the top level's next event
   is T2 and not T1.

3) CPU 0 tries to enter idle without any global timer enqueued and calls
   tmigr_quick_check(). The expiry of T2 is returned instead of the
   expiry of T1.

When the quick check returns an expiry that is too late, the cpuidle
governor may pick a C-state that is too deep. This may result in
undesired CPU wakeup latency if the next timer is actually close enough.

Fix this by assuming that expiries aren't sorted top-down while
performing the quick check, and instead pick up the earliest one
encountered while walking up the hierarchy.
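
A minimal sketch of the corrected walk, simplified from the
timer_migration code (structure and field names are assumptions, not
the exact upstream fix):

    u64 tmigr_quick_check(u64 nextevt)
    {
            struct tmigr_group *group = this_cpu_ptr(&tmigr_cpu)->tmgroup;

            do {
                    /*
                     * Expiries aren't sorted top-down, so take the
                     * earliest event seen on the way up instead of
                     * trusting the top level value.
                     */
                    nextevt = min_t(u64, nextevt, READ_ONCE(group->next_expiry));
                    group = group->parent;
            } while (group);

            return nextevt;
    }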

Fixes: 7ee988770326 ("timers: Implement the hierarchical pull model")
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20240305002822.18130-1-frederic@kernel.org
2024-03-06 15:02:09 +01:00
Arnd Bergmann
a184d9835a tick/sched: Fix build failure for CONFIG_NO_HZ_COMMON=n
In configurations with CONFIG_TICK_ONESHOT but neither CONFIG_NO_HZ nor
CONFIG_HIGH_RES_TIMERS, tick_sched_timer_dying() is stubbed out but
also still defined as a global function:

kernel/time/tick-sched.c:1599:6: error: redefinition of 'tick_sched_timer_dying'
 1599 | void tick_sched_timer_dying(int cpu)
      |      ^
kernel/time/tick-sched.h:111:20: note: previous definition is here
  111 | static inline void tick_sched_timer_dying(int cpu) { }
      |                    ^

This configuration only appears with ARM CONFIG_ARCH_BCM_MOBILE,
which should not actually select CONFIG_TICK_ONESHOT.

Adjust the #ifdef for the stub to match the condition for building the
tick-sched.c file for consistency with the definition and to avoid
the build regression.
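
A sketch of the adjusted stub guard in tick-sched.h, assuming it now
mirrors the build condition of tick-sched.c:

    #if defined(CONFIG_NO_HZ_COMMON) || defined(CONFIG_HIGH_RES_TIMERS)
    extern void tick_sched_timer_dying(int cpu);
    #else /* tick-sched.c is not built, provide an empty stub: */
    static inline void tick_sched_timer_dying(int cpu) { }
    #endif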

Fixes: 3aedb7fcd88a ("tick/sched: Remove useless oneshot ifdeffery")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20240228123850.3499024-1-arnd@kernel.org
2024-02-29 17:41:29 +01:00
Anna-Maria Behnsen
8b3843ae36 vdso/datapage: Quick fix - use asm/page-def.h for ARM64
The vdso rework for the generic union vdso_data_store broke compat VDSO on
arm64:

In file included from arch/arm64/include/asm/lse.h:5,
		 from arch/arm64/include/asm/cmpxchg.h:14,
		 from arch/arm64/include/asm/atomic.h:16,
		 from include/linux/atomic.h:7,
		 from include/asm-generic/bitops/atomic.h:5,
		 from arch/arm64/include/asm/bitops.h:25,
		 from include/linux/bitops.h:68,
		 from arch/arm64/include/asm/memory.h:209,
		 from arch/arm64/include/asm/page.h:46,
		 from include/vdso/datapage.h:22,
		 from lib/vdso/gettimeofday.c:5,
		 from <command-line>:
arch/arm64/include/asm/atomic_ll_sc.h:298:9: error: unknown type name 'u128'
  298 |         u128 full;
      |         ^~~~
arch/arm64/include/asm/atomic_ll_sc.h:305:24: error: unknown type name 'u128'
  305 | static __always_inline u128                                     \
      |                        ^~~~

The reason is the include of asm/page.h which in turn includes headers
which are outside the scope of compat VDSO. The only reason for the
asm/page.h include is the required definition of PAGE_SIZE. But as arm64
defines PAGE_SIZE in asm/page-def.h without extra header includes, this
could be used instead.

Caution: this is a quick fix only! The final fix is an upcoming cleanup
from Arnd which consolidates the PAGE_SIZE definition. After that
cleanup, the include of asm/page.h to access PAGE_SIZE is no longer
required.
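
A sketch of the resulting include in include/vdso/datapage.h (the
arm64 conditional is an assumption about how the quick fix is wired
up):

    #ifdef CONFIG_ARM64
    #include <asm/page-def.h>   /* PAGE_SIZE without dragging in asm/page.h */
    #else
    #include <asm/page.h>
    #endif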

Fixes: a0d2fcd62ac2 ("vdso/ARM: Make union vdso_data_store available for all architectures")
Reported-by: Linux Kernel Functional Testing <lkft@linaro.org>
Signed-off-by: Anna-Maria Behnsen <anna-maria@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20240226175023.56679-1-anna-maria@linutronix.de
Link: https://lore.kernel.org/lkml/CA+G9fYtrXXm_KO9fNPz3XaRxHV7UD_yQp-TEuPQrNRHU+_0W_Q@mail.gmail.com/
2024-02-26 23:13:41 +01:00
Frederic Weisbecker
19b344a91f timers: Assert no next dyntick timer look-up while CPU is offline
The next timer (re-)evaluation, with the purpose of entering/updating
the dyntick mode, can happen from 3 sites and none of them are relevant
while the CPU is offline:

1) The idle loop:
	a) From the quick check helping the cpuidle governor to heuristically
	   predict the best C-state.
	b) While stopping the tick.

   But if the CPU is offline, the tick has been cancelled and there is
   consequently no need to further stop the tick.

2) Remote expiry: when a CPU remotely expires global timers on behalf of
   another CPU, the latter target's next timer is re-evaluated
   afterwards. However remote expiry doesn't happen on offline CPUs.

3) IRQ exit: in nohz_full mode, the tick is (re-)evaluated on IRQ exit.
   But full dynticks is disabled on offline CPUs.

Therefore it is safe to assume that no next dyntick timer lookup can
be performed on offline CPUs.

Assert this expectation to report any surprise.
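
For illustration, the assertion could look like this in the dyntick
look-up paths (exact placement and form assumed):

    /* No next dyntick timer look-up is expected on offline CPUs: */
    WARN_ON_ONCE(cpu_is_offline(smp_processor_id()));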

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20240225225508.11587-17-frederic@kernel.org
2024-02-26 11:37:32 +01:00
Frederic Weisbecker
500f8f9bce tick: Assume timekeeping is correctly handed over upon last offline idle call
The timekeeping duty is handed over from the outgoing CPU on stop
machine, then the oneshot tick is stopped right after.  Therefore it's
guaranteed that the current CPU isn't the timekeeper upon its last call
to idle.

Besides, calling tick_nohz_idle_stop_tick() while the dying CPU goes
into idle suggests that the tick is going to be stopped while it is
actually stopped already from the appropriate CPU hotplug state.

Remove the confusing call and the obsolete case handling and convert it
to a sanity check that verifies the above assumption.
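
A sketch of such a sanity check in the offline idle path (exact
placement and form assumed):

    /* The dying CPU must have handed the timekeeping duty over already: */
    WARN_ON_ONCE(tick_do_timer_cpu == smp_processor_id());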

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20240225225508.11587-16-frederic@kernel.org
2024-02-26 11:37:32 +01:00
Frederic Weisbecker
3f69d04e14 tick: Shut down low-res tick from dying CPU
The timekeeping duty is handed over from the outgoing CPU within stop
machine. This works well if CONFIG_NO_HZ_COMMON=n or the tick is in
high-res mode. However in low-res dynticks mode, the tick isn't
cancelled until the clockevent is shut down, which can happen later. The
tick may therefore fire again once IRQs are re-enabled on stop machine
and until IRQs are disabled for good upon the last call to idle.

That leaves plenty of opportunities for a timekeeper to go idle and for
the outgoing CPU to take the duty over again. This is why
tick_nohz_idle_stop_tick() is called one last time on idle if the CPU
is seen offline: so that the timekeeping duty is handed over once more
in case the CPU has re-taken it.

This means there are two timekeeping handovers on CPU down hotplug with
different undocumented constraints and purposes:

1) A handover on stop machine for !dynticks || highres. All online CPUs
   are guaranteed to be non-idle and the timekeeping duty can be safely
   handed over. The hrtimer tick is cancelled so it is guaranteed that in
   dynticks mode the outgoing CPU won't take the duty again.

2) A handover on last idle call for dynticks && lowres.  Setting the
   duty to TICK_DO_TIMER_NONE makes sure that a CPU will take over the
   timekeeping.

Prepare for consolidating the handover to a single place (the first
one) by shutting down the low-res tick from tick_cancel_sched_timer()
as well. This will simplify the handover and unify the tick
cancellation between high-res and low-res modes.

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20240225225508.11587-15-frederic@kernel.org
2024-02-26 11:37:32 +01:00
Frederic Weisbecker
7988e5ae2b tick: Split nohz and highres features from nohz_mode
The nohz mode field tells whether the tick is in low resolution or high
resolution nohz mode, but it doesn't tell about high resolution
non-nohz mode.

In order to retrieve the latter state, tick_cancel_sched_timer() must
fiddle with struct hrtimer's internals to guess if the tick has been
initialized in high resolution.

Instead, move the nohz mode field information into the tick flags and
provide two new bits: one to know whether the tick is in nohz mode and
another one to know whether the tick is in high resolution. The
combination of those two flags provides all the information needed to
determine which of the three tick modes is running.
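
A sketch of the resulting flag bits (names and bit positions assumed):

    #define TS_FLAG_NOHZ        BIT(1)  /* tick runs in nohz (dynticks) mode */
    #define TS_FLAG_HIGHRES     BIT(2)  /* tick runs in high resolution mode */

    /*
     * Neither set:  periodic low-res tick
     * NOHZ only:    low-res dynticks
     * HIGHRES only: high-res periodic tick
     * Both set:     high-res dynticks
     */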

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20240225225508.11587-14-frederic@kernel.org
2024-02-26 11:37:32 +01:00
Frederic Weisbecker
a478ffb2ae tick: Move individual bit features to debuggable mask accesses
The individual bitfields of struct tick_sched must be modified with
interrupts disabled, otherwise concurrent modifications can race
because the bitfields share the same memory storage.

The recent move of the "got_idle_tick" bitfield to its own storage
shows that the use of these bitfields, as pretty as they look, can be
just as error prone.

In order to avoid future issues of this kind and to make sure that
those bitfields are safely accessed, move those flags to an explicit
mask along with a mutator function that performs a basic
interrupts-disabled sanity check.
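
A minimal sketch of such a mutator (names assumed):

    static inline void tick_sched_flag_set(struct tick_sched *ts,
                                           unsigned long flag)
    {
            /* Flag writes share one word, so they must not race: */
            lockdep_assert_irqs_disabled();
            ts->flags |= flag;
    }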

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20240225225508.11587-13-frederic@kernel.org
2024-02-26 11:37:32 +01:00
Frederic Weisbecker
3ce74f1a85 tick: Move got_idle_tick away from common flags
tick_nohz_idle_got_tick() is called by cpuidle_reflect() within the idle
loop with interrupts enabled. This function modifies the struct
tick_sched's bitfield "got_idle_tick". However this bitfield is stored
within the same word as other bitfields that can be modified from
interrupts.

Fortunately, so far the only race that appears possible is the
following: while ->got_idle_tick is written to 0, an interrupt fires
and writes the ->idle_active field to 0. It's then possible that the
interrupted write to ->got_idle_tick restores the old value of
->idle_active, setting it back to 1.

However if that happens, the worst possible outcome is that the time
spent between that interrupt and the upcoming call to
tick_nohz_idle_exit() is accounted as idle, which is a negligible
amount.

Still, all the bitfield writes within this struct tick_sched's shadow
mask should be IRQ-safe. Therefore move this bitfield out to its own
storage to avoid further surprises.

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20240225225508.11587-12-frederic@kernel.org
2024-02-26 11:37:32 +01:00
Frederic Weisbecker
d9b1865c86 tick: Assume the tick can't be stopped in NOHZ_MODE_INACTIVE mode
The full-nohz update function checks if the nohz mode is active before
proceeding. It considers one exception though: if the tick is already
stopped even though the nohz mode is inactive, it still moves on in
order to update/restart the tick if needed.

However in order for the tick to be stopped, the nohz_mode has to be
either NOHZ_MODE_LOWRES or NOHZ_MODE_HIGHRES. Therefore it doesn't make
sense to test if the tick is stopped before verifying NOHZ_MODE_INACTIVE
mode.

Remove the needless condition.
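
Sketched, the check in the full-nohz update path reduces from:

    if (!ts->tick_stopped && ts->nohz_mode == NOHZ_MODE_INACTIVE)
            return;

to:

    /* The tick can't be stopped while nohz mode is inactive: */
    if (ts->nohz_mode == NOHZ_MODE_INACTIVE)
            return;

(field names taken from the existing tick_sched code; the surrounding
context is simplified).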

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20240225225508.11587-11-frederic@kernel.org
2024-02-26 11:37:32 +01:00
Frederic Weisbecker
ef8969bb55 tick: Move broadcast cancellation up to CPUHP_AP_TICK_DYING
The broadcast shutdown code is executed through a random explicit call
within stop machine from the outgoing CPU.

However the tick broadcast is a middle layer between the tick callback
and the clockevent devices, therefore it makes more sense to shut it
down after the tick callback and before the clockevent drivers.

Move it instead to the common tick shutdown CPU hotplug state where
related operations can be ordered from highest to lowest level.

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20240225225508.11587-10-frederic@kernel.org
2024-02-26 11:37:32 +01:00
Frederic Weisbecker
f04e51220a tick: Move tick cancellation up to CPUHP_AP_TICK_DYING
The tick hrtimer is cancelled right before hrtimers are migrated. This
is done from the hrtimer subsystem even though it shouldn't know about
its actual users.

Move instead the tick hrtimer cancellation to the relevant CPU hotplug
state that aims at centralizing high level tick shutdown operations so
that the related flow is easy to follow.

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20240225225508.11587-9-frederic@kernel.org
2024-02-26 11:37:31 +01:00
Frederic Weisbecker
3ad6eb0683 tick: Start centralizing tick related CPU hotplug operations
During the CPU offlining process, the various timer tick features are
shut down from scattered places, sometimes from teardown callbacks on
stop machine, sometimes through explicit calls, sometimes from the
control CPU after the CPU died. The reason why these shutdown operations
are spread around is not always clear and it makes the tick lifecycle
hard to follow.

The tick should be shut down in order from highest to lowest level:

On stop machine from the dying CPU (high-level):

 1) Hand-over the timekeeping duty (tick_handover_do_timer())
 2) Cancel the tick implementation called by the clockevent callback
    (tick_cancel_sched_timer())
 3) Shutdown broadcasting (tick_offline_cpu() / tick_broadcast_offline())

On stop machine from the dying CPU (low-level):

 4) Shutdown clockevents drivers (CPUHP_AP_*_TIMER_STARTING states)

From the control CPU after the CPU died (low-level):

 5) Shutdown/unregister/cleanup clockevents for the dead CPU
    (tick_cleanup_dead_cpu())

However the current order is 2, then 4 (both from CPU hotplug states),
then 1 and 3 through direct calls. This layout and order don't make
much sense. The operations 1, 2 and 3 should be gathered together and
in order.

Sort this out by creating a new TICK shutdown CPU hotplug state and
start by introducing the timekeeping duty hand-over there. The state
must precede hrtimers migration because the tick hrtimer will be
stopped from it in a further patch.
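
A sketch of the new state entry in the cpuhp table (the teardown
callback of a *_DYING state runs on the outgoing CPU during stop
machine; the callback name is assumed):

    [CPUHP_AP_TICK_DYING] = {
            .name               = "tick:dying",
            .startup.single     = NULL,
            .teardown.single    = tick_cpu_dying,
    },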

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20240225225508.11587-8-frederic@kernel.org
2024-02-26 11:37:31 +01:00
Frederic Weisbecker
60313c21c3 tick/sched: Don't clear ts::next_tick again in can_stop_idle_tick()
The tick sched structure is already cleared from
tick_cancel_sched_timer(), so there is no need to clear the
ts::next_tick field again from can_stop_idle_tick().

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20240225225508.11587-7-frederic@kernel.org
2024-02-26 11:37:31 +01:00
Frederic Weisbecker
3650f49bfb tick/sched: Rename tick_nohz_stop_sched_tick() to tick_nohz_full_stop_tick()
tick_nohz_stop_sched_tick() is only about NOHZ_full and not about
dynticks-idle. Reflect that in the function name to avoid confusion.

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20240225225508.11587-6-frederic@kernel.org
2024-02-26 11:37:31 +01:00
Frederic Weisbecker
27dc08096c tick: Use IS_ENABLED() whenever possible
Avoid ifdeffery by converting it to IS_ENABLED() wherever possible.

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20240225225508.11587-5-frederic@kernel.org
2024-02-26 11:37:31 +01:00
Frederic Weisbecker
3aedb7fcd8 tick/sched: Remove useless oneshot ifdeffery
tick-sched.c is only built when CONFIG_TICK_ONESHOT=y, which is selected
only if CONFIG_NO_HZ_COMMON=y or CONFIG_HIGH_RES_TIMERS=y. Therefore
the related ifdeffery in this file is needless and can be removed.

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20240225225508.11587-4-frederic@kernel.org
2024-02-26 11:37:31 +01:00
Peng Liu
37263ba0c4 tick/nohz: Remove duplicate between lowres and highres handlers
tick_nohz_lowres_handler() does the same work as
tick_nohz_highres_handler() plus the clockevent device reprogramming, so
make the former reuse the latter and rename it accordingly.

Signed-off-by: Peng Liu <liupeng17@lenovo.com>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20240225225508.11587-3-frederic@kernel.org
2024-02-26 11:37:31 +01:00
Peng Liu
ffb7e01c4e tick/nohz: Remove duplicate between tick_nohz_switch_to_nohz() and tick_setup_sched_timer()
The ts->sched_timer initialization work of tick_nohz_switch_to_nohz()
is almost the same as that of tick_setup_sched_timer(), so adjust the
latter to get it reused by tick_nohz_switch_to_nohz().

This also makes the low resolution mode sched_timer benefit from the tick
skew boot option.

Signed-off-by: Peng Liu <liupeng17@lenovo.com>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20240225225508.11587-2-frederic@kernel.org
2024-02-26 11:37:31 +01:00
Costa Shulyupin
56c2cb1012 hrtimer: Select housekeeping CPU during migration
During CPU-down hotplug, hrtimers may migrate to isolated CPUs,
compromising CPU isolation.

Address this issue by masking valid CPUs for hrtimers using
housekeeping_cpumask(HK_TYPE_TIMER).

Suggested-by: Waiman Long <longman@redhat.com>
Signed-off-by: Costa Shulyupin <costa.shul@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Waiman Long <longman@redhat.com>
Link: https://lore.kernel.org/r/20240222200856.569036-1-costa.shul@redhat.com
2024-02-22 22:18:21 +01:00
Anna-Maria Behnsen
b2cf7507e1 timers: Always queue timers on the local CPU
The timer pull model is in place so we can remove the heuristics which try
to guess the best target CPU at enqueue/modification time.

All non-pinned timers are queued on the local CPU in separate storage
and eventually pulled to a remote CPU at expiry time.

Originally-by: Richard Cochran (linutronix GmbH) <richardcochran@gmail.com>
Signed-off-by: Anna-Maria Behnsen <anna-maria@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Link: https://lore.kernel.org/r/20240221090548.36600-21-anna-maria@linutronix.de
2024-02-22 17:52:32 +01:00
Anna-Maria Behnsen
36e40df35d timer_migration: Add tracepoints
The timer pull logic needs proper debugging aids. Add tracepoints so the
hierarchical idle machinery can be diagnosed.

Signed-off-by: Anna-Maria Behnsen <anna-maria@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20240222103403.31923-1-anna-maria@linutronix.de
2024-02-22 17:52:32 +01:00
Anna-Maria Behnsen
7ee9887703 timers: Implement the hierarchical pull model
Placing timers at enqueue time on a target CPU based on dubious heuristics
does not make any sense:

 1) Most timer wheel timers are canceled or rearmed before they expire.

 2) The heuristics to predict which CPU will be busy when the timer expires
    are wrong by definition.

So placing the timers at enqueue time wastes precious cycles.

The proper solution to this problem is to always queue the timers on the
local CPU and allow the non pinned timers to be pulled onto a busy CPU at
expiry time.

Therefore split the timer storage into local pinned and global timers:
Local pinned timers are always expired on the CPU on which they have been
queued. Global timers can be expired on any CPU.

As long as a CPU is busy it expires both local and global timers. When a
CPU goes idle it arms its tick for the first expiring local timer. If the
first expiring pinned (local) timer is before the first expiring movable
timer, then no action is required because the CPU will wake up before the
first movable timer expires. If the first expiring movable timer is before
the first expiring pinned (local) timer, then this timer is queued into an
idle timerqueue and eventually expired by another active CPU.

To avoid global locking the timerqueues are implemented as a hierarchy. The
lowest level of the hierarchy holds the CPUs. The CPUs are associated to
groups of 8, which are separated per node. If more than one CPU group
exist, then a second level in the hierarchy collects the groups. Depending
on the size of the system more than 2 levels are required. Each group has a
"migrator" which checks the timerqueue during the tick for remote expirable
timers.

If the last CPU in a group goes idle it reports the first expiring event in
the group up to the next group(s) in the hierarchy. If the last CPU of the
whole system goes idle it arms its timer for the first system wide expiring
timer to ensure that no timer event is missed.
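
A rough sketch of a hierarchy group (field names are assumptions; the
real structure carries more state):

    struct tmigr_group {
            raw_spinlock_t          lock;
            struct tmigr_group      *parent;
            struct timerqueue_head  events;      /* events of idle children */
            u64                     next_expiry; /* first event of this group */
            unsigned int            level;       /* hierarchy level */
            u8                      childmask;   /* bit of this group in parent */
    };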

Signed-off-by: Anna-Maria Behnsen <anna-maria@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Link: https://lore.kernel.org/r/20240222103710.32582-1-anna-maria@linutronix.de
2024-02-22 17:52:32 +01:00
Anna-Maria Behnsen
57e95a5c41 timers: Introduce function to check timer base is_idle flag
To prepare for the conversion of the NOHZ timer placement to a pull at
expiry time model, a function is required that returns the value of the
timer base's is_idle flag, so that the hierarchy state can be kept in
sync with the timer base state while the CPU is online.

No functional change.

Signed-off-by: Anna-Maria Behnsen <anna-maria@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Link: https://lore.kernel.org/r/20240221090548.36600-18-anna-maria@linutronix.de
2024-02-22 17:52:32 +01:00
Richard Cochran (linutronix GmbH)
4c532939aa tick/sched: Split out jiffies update helper function
The logic to get the time of the last jiffies update will be needed by
the timer pull model as well.

Move the code into a global function in anticipation of the new caller.

No functional change.

Signed-off-by: Richard Cochran (linutronix GmbH) <richardcochran@gmail.com>
Signed-off-by: Anna-Maria Behnsen <anna-maria@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Link: https://lore.kernel.org/r/20240221090548.36600-17-anna-maria@linutronix.de
2024-02-22 17:52:32 +01:00
Anna-Maria Behnsen
89f01e10c9 timers: Check if timers base is handled already
Due to the conversion of the NOHZ timer placement to a pull at expiry
time model, the per CPU timer bases with non-pinned timers are no
longer handled only by the local CPU. In case a remote CPU has already
expired the non-pinned timer base of the local CPU, nothing more needs
to be done by the local CPU. A check at the beginning of the expire
timers routine is required, because the timer base lock is dropped
before executing the timer callback function.

This is a preparatory work, but has no functional impact right now.

Signed-off-by: Anna-Maria Behnsen <anna-maria@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Link: https://lore.kernel.org/r/20240221090548.36600-16-anna-maria@linutronix.de
2024-02-22 17:52:32 +01:00
Richard Cochran (linutronix GmbH)
90f5df66c8 timers: Restructure internal locking
Move the locking out from __run_timers() to the call sites, so the
protected section can be extended at the call site. Preparatory work for
changing the NOHZ timer placement to a pull at expiry time model.

No functional change.

Signed-off-by: Richard Cochran (linutronix GmbH) <richardcochran@gmail.com>
Signed-off-by: Anna-Maria Behnsen <anna-maria@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Link: https://lore.kernel.org/r/20240221090548.36600-15-anna-maria@linutronix.de
2024-02-22 17:52:31 +01:00
Anna-Maria Behnsen
f73d9257ff timers: Add get next timer interrupt functionality for remote CPUs
To prepare for the conversion of the NOHZ timer placement to a pull at
expiry time model, functionality for getting the next timer interrupt
on a remote CPU is required.

Locking the timer bases and getting the information for the next timer
interrupt are split into separate functions. This is required to comply
with the lock ordering when the new model is in place.

Signed-off-by: Anna-Maria Behnsen <anna-maria@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Link: https://lore.kernel.org/r/20240221090548.36600-14-anna-maria@linutronix.de
2024-02-22 17:52:31 +01:00
Anna-Maria Behnsen
70b4cf84f3 timers: Split out "get next timer interrupt" functionality
The functionality for getting the next timer interrupt in
get_next_timer_interrupt() is split into a separate function
fetch_next_timer_interrupt() to be usable by other call sites.

This is preparatory work for the conversion of the NOHZ timer
placement to a pull at expiry time model. No functional change.

Signed-off-by: Anna-Maria Behnsen <anna-maria@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Link: https://lore.kernel.org/r/20240221090548.36600-13-anna-maria@linutronix.de
2024-02-22 17:52:31 +01:00
Anna-Maria Behnsen
21927fc89e timers: Retrieve next expiry of pinned/non-pinned timers separately
For the conversion of the NOHZ timer placement to a pull at expiry time
model it's required to have separate expiry times for the pinned and the
non-pinned (movable) timers. Therefore struct timer_events is introduced.
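
A sketch of the new structure, a simple pair of expiry values (field
names assumed):

    struct timer_events {
            u64     local;  /* next expiry of pinned (local) timers */
            u64     global; /* next expiry of movable (global) timers */
    };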

No functional change.

Originally-by: Richard Cochran (linutronix GmbH) <richardcochran@gmail.com>
Signed-off-by: Anna-Maria Behnsen <anna-maria@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Link: https://lore.kernel.org/r/20240221090548.36600-12-anna-maria@linutronix.de
2024-02-22 17:52:31 +01:00
Anna-Maria Behnsen
83a665dc99 timers: Keep the pinned timers separate from the others
Separate the storage space for pinned timers. Deferrable timers (no
matter whether pinned or not) are still enqueued into their own base.

This is preparatory work for changing the NOHZ timer placement from a push
at enqueue time to a pull at expiry time model.
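
Sketched, the per-CPU base array grows from two to three bases (index
names assumed):

    enum {
            BASE_LOCAL,  /* pinned timers */
            BASE_GLOBAL, /* movable timers, candidates for pulling */
            BASE_DEF,    /* deferrable timers, pinned or not */
            NR_BASES,
    };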

Originally-by: Richard Cochran (linutronix GmbH) <richardcochran@gmail.com>
Signed-off-by: Anna-Maria Behnsen <anna-maria@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Link: https://lore.kernel.org/r/20240221090548.36600-11-anna-maria@linutronix.de
2024-02-22 17:52:31 +01:00
Anna-Maria Behnsen
9f6a3c602c timers: Split next timer interrupt logic
Split the logic for getting the next timer interrupt (whether
recalculated or already stored in base->next_expiry) into a separate
function named next_timer_interrupt(). Make it available to local call
sites only.

No functional change.

Signed-off-by: Anna-Maria Behnsen <anna-maria@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Link: https://lore.kernel.org/r/20240221090548.36600-10-anna-maria@linutronix.de
2024-02-22 17:52:31 +01:00
Anna-Maria Behnsen
af68cb3fc7 timers: Simplify code in run_local_timers()
The logic for raising a softirq, the way it is implemented right now,
is readable for two timer bases. When increasing the number of timer
bases, the code gets harder to read. With the introduction of the timer
migration hierarchy, there will be three timer bases.

Therefore restructure the code to use a loop. No functional change.
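
A simplified sketch of the restructured loop (base index names
assumed, details elided):

    static void run_local_timers(void)
    {
            struct timer_base *base = this_cpu_ptr(&timer_bases[BASE_LOCAL]);

            hrtimer_run_queues();

            for (int i = 0; i < NR_BASES; i++, base++) {
                    /* Raise the softirq only if required: */
                    if (time_after_eq(jiffies, base->next_expiry)) {
                            raise_softirq(TIMER_SOFTIRQ);
                            return;
                    }
            }
    }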

Signed-off-by: Anna-Maria Behnsen <anna-maria@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Link: https://lore.kernel.org/r/20240221090548.36600-9-anna-maria@linutronix.de
2024-02-22 17:52:31 +01:00
Anna-Maria Behnsen
aae55e9fb8 timers: Make sure TIMER_PINNED flag is set in add_timer_on()
When adding a timer to the timer wheel using add_timer_on(), it is
implicitly a pinned timer. With the timer pull at expiry time model in
place, the TIMER_PINNED flag is required to make sure timers end up in
the proper base.

Set the TIMER_PINNED flag unconditionally when add_timer_on() is executed.
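
Sketched (the enqueue path itself is unchanged and elided):

    void add_timer_on(struct timer_list *timer, int cpu)
    {
            /*
             * add_timer_on() has always meant "expire on this CPU", so
             * make the pinning explicit for the pull model:
             */
            timer->flags |= TIMER_PINNED;

            /* ... existing enqueue on the target CPU's base ... */
    }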

Signed-off-by: Anna-Maria Behnsen <anna-maria@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Link: https://lore.kernel.org/r/20240221090548.36600-8-anna-maria@linutronix.de
2024-02-22 17:52:31 +01:00
Anna-Maria Behnsen
c0e8c5b599 workqueue: Use global variant for add_timer()
The implementation of the NOHZ pull at expiry model will change the timer
bases per CPU. Timers that have to expire on a specific CPU require the
TIMER_PINNED flag. If the CPU doesn't matter, the TIMER_PINNED flag must be
dropped. This is required for call sites which use the timer alternately as
pinned and non-pinned timer, like workqueues do.

Therefore use add_timer_global() in __queue_delayed_work() for non-bound
delayed work to make sure the TIMER_PINNED flag is dropped.
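
Sketched, the relevant call site in __queue_delayed_work() becomes
(simplified):

    if (unlikely(cpu != WORK_CPU_UNBOUND))
            add_timer_on(timer, cpu);   /* sets TIMER_PINNED */
    else
            add_timer_global(timer);    /* makes sure TIMER_PINNED is dropped */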

Signed-off-by: Anna-Maria Behnsen <anna-maria@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Acked-by: Tejun Heo <tj@kernel.org>
Link: https://lore.kernel.org/r/20240221090548.36600-7-anna-maria@linutronix.de
2024-02-22 17:52:30 +01:00
Anna-Maria Behnsen
8e7e247f64 timers: Introduce add_timer() variants which modify timer flags
A timer might be used as a pinned timer (using add_timer_on()) and later on
as a non-pinned timer using add_timer(). When the "NOHZ timer pull at
expiry model" is in place, the TIMER_PINNED flag is required whenever a
timer needs to expire on a dedicated CPU; otherwise the flag must not be
set.

add_timer_on()'s behavior will be changed during the preparation patches
for the "NOHZ timer pull at expiry model" to unconditionally set the
TIMER_PINNED flag. To be able to clear/set the flag when queueing a
timer, two variants of add_timer() are introduced.

This is a preparatory step and has no functional change.
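
A sketch of the resulting API surface (add_timer_global() also appears
in the follow-up workqueue patch; add_timer_local() is assumed as the
pinned counterpart):

    extern void add_timer_local(struct timer_list *timer);  /* sets TIMER_PINNED */
    extern void add_timer_global(struct timer_list *timer); /* clears TIMER_PINNED */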

Signed-off-by: Anna-Maria Behnsen <anna-maria@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Link: https://lore.kernel.org/r/20240221090548.36600-6-anna-maria@linutronix.de
2024-02-22 17:52:30 +01:00
Anna-Maria Behnsen
73129cf4b6 timers: Optimization for timer_base_try_to_set_idle()
When the tick is stopped, the timer base's is_idle flag is set as well.
When reentering timer_base_try_to_set_idle() with the tick stopped,
there is no need to check whether the timer base needs to be set idle
again. When a timer was enqueued in the meantime, this is already
handled by the tick_nohz_next_event() call which was executed before
tick_nohz_stop_tick().

Signed-off-by: Anna-Maria Behnsen <anna-maria@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Link: https://lore.kernel.org/r/20240221090548.36600-5-anna-maria@linutronix.de
2024-02-22 17:52:30 +01:00
Anna-Maria Behnsen
e2e1d724e9 timers: Move marking timer bases idle into tick_nohz_stop_tick()
The timer base is marked idle when get_next_timer_interrupt() is
executed. But the decision whether the tick will be stopped and whether
the system is able to go idle is made later. When the timer base is
marked idle and a new first timer is enqueued remotely, an IPI is
raised even though it is not required, because the tick is not stopped
and the timer base is evaluated again at the next tick anyway.

To prevent this, mark the timer base idle in tick_nohz_stop_tick()
instead and streamline get_next_timer_interrupt() so that it only looks
for the next timer interrupt. All other work is postponed to
timer_base_try_to_set_idle() which is called by tick_nohz_stop_tick().
timer_base_try_to_set_idle() never resets the timer_base::is_idle
state. That is done when the tick is restarted via
tick_nohz_restart_sched_tick().

With this, tick_sched::tick_stopped and timer_base::is_idle are always
in sync, so there is no longer a need to execute timer_clear_idle() in
tick_nohz_idle_retain_tick(). This was required before, as
tick_nohz_next_event() set timer_base::is_idle even if the tick would
not be stopped. Now timer_clear_idle() is only executed when the timer
base is idle, so the check whether the timer base is idle is no longer
required either.

While at it fix some nearby whitespace damage as well.

Signed-off-by: Anna-Maria Behnsen <anna-maria@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Link: https://lore.kernel.org/r/20240221090548.36600-4-anna-maria@linutronix.de
2024-02-22 17:52:30 +01:00
Anna-Maria Behnsen
39ed699fb6 timers: Split out get next timer interrupt
Split out get_next_timer_interrupt() to be able to extend it and make it
reusable for other call sites.

No functional change.

Signed-off-by: Anna-Maria Behnsen <anna-maria@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Link: https://lore.kernel.org/r/20240221090548.36600-3-anna-maria@linutronix.de
2024-02-22 17:52:30 +01:00
Anna-Maria Behnsen
bebed6649e timers: Restructure get_next_timer_interrupt()
get_next_timer_interrupt() contains two parts for the next timer interrupt
calculation. Those two parts are separated by forwarding the base
clock. But the second part does not depend on the forwarded base
clock.

Therefore restructure get_next_timer_interrupt() to keep things together
which belong together.

No functional change.

Signed-off-by: Anna-Maria Behnsen <anna-maria@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Link: https://lore.kernel.org/r/20240221090548.36600-2-anna-maria@linutronix.de
2024-02-22 17:52:30 +01:00
Feng Tang
2ed08e4bc5 clocksource: Scale the watchdog read retries automatically
On an 8-socket server the TSC is wrongly marked as 'unstable' and
disabled during boot on about one out of 120 boot attempts:

    clocksource: timekeeping watchdog on CPU227: wd-tsc-wd excessive read-back delay of 153560ns vs. limit of 125000ns,
    wd-wd read-back delay only 11440ns, attempt 3, marking tsc unstable
    tsc: Marking TSC unstable due to clocksource watchdog
    TSC found unstable after boot, most likely due to broken BIOS. Use 'tsc=unstable'.
    sched_clock: Marking unstable (119294969739, 159204297)<-(125446229205, -5992055152)
    clocksource: Checking clocksource tsc synchronization from CPU 319 to CPUs 0,99,136,180,210,542,601,896.
    clocksource: Switched to clocksource hpet

The reason is that on platforms with a large number of CPUs there are
sporadic big or huge read latencies while reading the
watchdog/clocksource during boot or when the system is under stress
workload, and the frequency and maximum value of the latency go up with
the number of online CPUs.

The current code already has logic to detect and filter out such high
latency cases by reading the watchdog twice and checking the two
deltas. Due to the randomness of the latency, there is a low
probability that the first delta (latency) is big but the second delta
is small and looks valid. The watchdog code retries the readouts by
default twice, which is not necessarily sufficient for systems with a
large number of CPUs.

There is a command line parameter 'max_cswd_read_retries' which allows
increasing the number of retries, but that's not user friendly as it
needs to be tweaked per system. As the number of required retries is
proportional to the number of online CPUs, this parameter can be
calculated at runtime.

Scale and enlarge the number of retries according to the number of online
CPUs and remove the command line parameter completely.
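
A sketch of the runtime scaling, assuming a logarithmic growth with the
CPU count (the exact formula is an assumption):

    static inline long clocksource_get_max_watchdog_retry(void)
    {
            /*
             * Bigger systems see bigger and more frequent read
             * latencies, so scale the retry count with the number of
             * online CPUs:
             */
            return (ilog2(num_online_cpus()) / 2) + 1;
    }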

[ tglx: Massaged change log and comments ]

Signed-off-by: Feng Tang <feng.tang@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Jin Wang <jin1.wang@intel.com>
Tested-by: Paul E. McKenney <paulmck@kernel.org>
Reviewed-by: Waiman Long <longman@redhat.com>
Reviewed-by: Paul E. McKenney <paulmck@kernel.org>
Link: https://lore.kernel.org/r/20240221060859.1027450-1-feng.tang@intel.com
2024-02-21 12:00:42 +01:00
David Gow
e0a1284b29 time/kunit: Use correct format specifier
'days' is a s64 (from div_s64), and so should use a %lld specifier.

This was found by extending KUnit's assertion macros to use gcc's
__printf attribute.
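
For reference, the s64/%lld pairing looks like this (illustrative
only):

    s64 secs = ktime_get_real_seconds();
    s64 days = div_s64(secs, 86400);

    /* s64 is 64-bit signed, so it takes the %lld conversion: */
    pr_info("days = %lld\n", days);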

Fixes: 276010551664 ("time: Improve performance of time64_to_tm()")
Signed-off-by: David Gow <davidgow@google.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20240221092728.1281499-5-davidgow@google.com
2024-02-21 12:00:42 +01:00
Anna-Maria Behnsen
56145a0f84 csky/vdso: Use generic union vdso_data_store
There is already a generic union definition for vdso_data_store in the vdso
datapage header.

Use this definition to prevent code duplication.

Signed-off-by: Anna-Maria Behnsen <anna-maria@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Acked-by: Guo Ren <guoren@kernel.org>
Link: https://lore.kernel.org/r/20240219153939.75719-11-anna-maria@linutronix.de
2024-02-20 20:56:01 +01:00
Anna-Maria Behnsen
d697a9997a MIPS: vdso: Use generic union vdso_data_store
There is already a generic union definition for vdso_data_store in the vdso
datapage header.

Use this definition to prevent code duplication.

Signed-off-by: Anna-Maria Behnsen <anna-maria@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20240219153939.75719-10-anna-maria@linutronix.de
2024-02-20 20:56:01 +01:00
Anna-Maria Behnsen
8d87d2cd1d LoongArch: vdso: Use generic union vdso_data_store
There is already a generic union definition for vdso_data_store in the
vdso datapage header.

Use this definition to prevent code duplication.

Signed-off-by: Anna-Maria Behnsen <anna-maria@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20240219153939.75719-9-anna-maria@linutronix.de
2024-02-20 20:56:00 +01:00
Anna-Maria Behnsen
cb3444cfdb s390/vdso: Use generic union vdso_data_store
There is already a generic union definition for vdso_data_store in the vdso
datapage header.

Use this definition to prevent code duplication.

Signed-off-by: Anna-Maria Behnsen <anna-maria@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Link: https://lore.kernel.org/r/20240219153939.75719-8-anna-maria@linutronix.de
2024-02-20 20:56:00 +01:00
Anna-Maria Behnsen
eba755314f riscv: vdso: Use generic union vdso_data_store
There is already a generic union definition for vdso_data_store in the vdso
datapage header.

Use this definition to prevent code duplication.

Signed-off-by: Anna-Maria Behnsen <anna-maria@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20240220085212.6547-1-anna-maria@linutronix.de
2024-02-20 20:56:00 +01:00
Anna-Maria Behnsen
d0fba04847 arm64: vdso: Use generic union vdso_data_store
There is already a generic union definition for vdso_data_store in the
vdso datapage header.

Use this definition to prevent code duplication.

Signed-off-by: Anna-Maria Behnsen <anna-maria@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20240219153939.75719-6-anna-maria@linutronix.de
2024-02-20 20:56:00 +01:00
Anna-Maria Behnsen
a0d2fcd62a vdso/ARM: Make union vdso_data_store available for all architectures
The vDSO data page "union vdso_data_store" is defined in an ARM specific
header file and also defined in several other places.

Move the definition from the ARM header file into the generic vdso datapage
header to make it also usable for others and to prevent code duplication.

Signed-off-by: Anna-Maria Behnsen <anna-maria@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20240219153939.75719-5-anna-maria@linutronix.de
2024-02-20 20:56:00 +01:00
Anna-Maria Behnsen
4eb0833d7d csky/vdso: Remove superfluous ifdeffery
CSKY selects GENERIC_TIME_VSYSCALL. GENERIC_TIME_VSYSCALL dependent
ifdeffery is superfluous. Clean it up.

Signed-off-by: Anna-Maria Behnsen <anna-maria@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Acked-by: Guo Ren <guoren@kernel.org>
Link: https://lore.kernel.org/r/20240219153939.75719-4-anna-maria@linutronix.de
2024-02-20 20:56:00 +01:00