Locking changes for this cycle were:

 - rtmutex cleanup & spring cleaning pass that removes ~400 lines of code
 - Futex simplifications & cleanups
 - Add debugging to the CSD code, to help track down a tenacious race (or hw problem)
 - Add lockdep_assert_not_held(), to allow code to require a lock to not be held,
   and propagate this into the ath10k driver
 - Misc LKMM documentation updates
 - Misc KCSAN updates: cleanups & documentation updates
 - Misc fixes and cleanups
 - Fix locktorture bugs with ww_mutexes
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmCJDn0RHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1hPrRAAryS4zPnuDsfkVk0smxo7a0lK5ljbH2Xo
 28QUZXOl6upnEV8dzbjwG7eAjt5ZJVI5tKIeG0PV0NUJH2nsyHwESdtULGGYuPf/
 4YUzNwZJa+nI/jeBnVsXCimLVxxnNCRdR7yOVOHm4ukEwa+YTNt1pvlYRmUd4YyH
 Q5cCrpb3THvLka3AAamEbqnHnAdGxHKuuHYVRkODpMQ+zrQvtN8antYsuk8kJsqM
 m+GZg/dVCuLEPah5k+lOACtcq/w7HCmTlxS8t4XLvD52jywFZLcCPvi1rk0+JR+k
 Vd9TngC09GJ4jXuDpr42YKkU9/X6qy2Es39iA/ozCvc1Alrhspx/59XmaVSuWQGo
 XYuEPx38Yuo/6w16haSgp0k4WSay15A4uhCTQ75VF4vli8Bqgg9PaxLyQH1uG8e2
 xk8U90R7bDzLlhKYIx1Vu5Z0t7A1JtB5CJtgpcfg/zQLlzygo75fHzdAiU5fDBDm
 3QQXSU2Oqzt7c5ZypioHWazARk7tL6th38KGN1gZDTm5zwifpaCtHi7sml6hhZ/4
 ATH6zEPzIbXJL2UqumSli6H4ye5ORNjOu32r7YPqLI4IDbzpssfoSwfKYlQG4Tvn
 4H1Ukirzni0gz5+wbleItzf2aeo1rocs4YQTnaT02j8NmUHUz4AzOHGOQFr5Tvh0
 wk/P4MIoSb0=
 =cOOk
 -----END PGP SIGNATURE-----

Merge tag 'locking-core-2021-04-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull locking updates from Ingo Molnar:

 - rtmutex cleanup & spring cleaning pass that removes ~400 lines of
   code

 - Futex simplifications & cleanups

 - Add debugging to the CSD code, to help track down a tenacious race
   (or hw problem)

 - Add lockdep_assert_not_held(), which lets code assert that a lock is
   not held, and use it in the ath10k driver (an illustrative sketch
   follows this list)

 - Misc LKMM documentation updates

 - Misc KCSAN updates: cleanups & documentation updates

 - Misc fixes and cleanups

 - Fix locktorture bugs with ww_mutexes (a sketch of the wound/wait
   acquire pattern follows this list)
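
As an illustration (a minimal sketch with hypothetical function and lock
names, not taken from the patches themselves), this is how the new
assertion is meant to be used:

    /* Hypothetical use of the new lockdep_assert_not_held() helper. */
    #include <linux/lockdep.h>
    #include <linux/mutex.h>

    static DEFINE_MUTEX(conf_mutex);

    static void drain_pending_work(void)
    {
            /*
             * Warn if the caller enters with conf_mutex held; the work
             * flushed below may need to take conf_mutex itself.  With
             * CONFIG_DEBUG_LOCK_ALLOC=n this compiles to (void)(&conf_mutex).
             */
            lockdep_assert_not_held(&conf_mutex);

            /* ... flush work that may itself acquire conf_mutex ... */
    }

Unlike an open-coded WARN_ON(lockdep_is_held(...)), the assertion only fires
when lockdep actually knows the answer: lock_is_held_type() now returns the
LOCK_STATE_* values added in the lockdep hunks below, and LOCK_STATE_UNKNOWN
(lockdep disabled) triggers neither lockdep_assert_held() nor
lockdep_assert_not_held().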
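
Also as an illustration (hedged sketch; demo_ww_class, demo_lock_pair and the
mutex parameters are hypothetical): the locktorture fix further down gives
each writer thread its own long-lived ww_acquire_ctx, so the context survives
from writelock() to writeunlock(). The wound/wait acquire pattern that such a
context drives looks roughly like this, assuming both mutexes were set up
with ww_mutex_init() against the same class:

    /* Hypothetical sketch: take two ww_mutexes of one class, with backoff. */
    #include <linux/kernel.h>               /* swap() */
    #include <linux/ww_mutex.h>

    static DEFINE_WD_CLASS(demo_ww_class);  /* wait-die, as locktorture uses */

    static void demo_lock_pair(struct ww_mutex *m1, struct ww_mutex *m2,
                               struct ww_acquire_ctx *ctx)
    {
            ww_acquire_init(ctx, &demo_ww_class);

            /* The first lock taken in a fresh ctx cannot return -EDEADLK. */
            if (ww_mutex_lock(m1, ctx))
                    return;

            while (ww_mutex_lock(m2, ctx) == -EDEADLK) {
                    /* Wounded: drop what we hold, sleep on the contended lock, retry. */
                    ww_mutex_unlock(m1);
                    ww_mutex_lock_slow(m2, ctx);
                    swap(m1, m2);           /* the lock still missing is now m2 */
            }
            ww_acquire_done(ctx);           /* optional: no further locks in this ctx */

            /*
             * ... critical section ...  Later, ww_mutex_unlock() both mutexes
             * and call ww_acquire_fini(ctx), as locktorture's unlock path now does.
             */
    }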

* tag 'locking-core-2021-04-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (44 commits)
  kcsan: Fix printk format string
  static_call: Relax static_call_update() function argument type
  static_call: Fix unused variable warn w/o MODULE
  locking/rtmutex: Clean up signal handling in __rt_mutex_slowlock()
  locking/rtmutex: Restrict the trylock WARN_ON() to debug
  locking/rtmutex: Fix misleading comment in rt_mutex_postunlock()
  locking/rtmutex: Consolidate the fast/slowpath invocation
  locking/rtmutex: Make text section and inlining consistent
  locking/rtmutex: Move debug functions as inlines into common header
  locking/rtmutex: Decrapify __rt_mutex_init()
  locking/rtmutex: Remove pointless CONFIG_RT_MUTEXES=n stubs
  locking/rtmutex: Inline chainwalk depth check
  locking/rtmutex: Move rt_mutex_debug_task_free() to rtmutex.c
  locking/rtmutex: Remove empty and unused debug stubs
  locking/rtmutex: Consolidate rt_mutex_init()
  locking/rtmutex: Remove output from deadlock detector
  locking/rtmutex: Remove rtmutex deadlock tester leftovers
  locking/rtmutex: Remove rt_mutex_timed_lock()
  MAINTAINERS: Add myself as futex reviewer
  locking/mutex: Remove repeated declaration
  ...
Linus Torvalds 2021-04-28 12:37:53 -07:00
commit 0ff0edb550
44 changed files with 1248 additions and 829 deletions


@ -782,6 +782,16 @@
cs89x0_media= [HW,NET]
Format: { rj45 | aui | bnc }
csdlock_debug= [KNL] Enable debug add-ons of cross-CPU function call
handling. When switched on, additional debug data is
printed to the console in case a hanging CPU is
detected, and that CPU is pinged again in order to try
to resolve the hang situation.
0: disable csdlock debugging (default)
1: enable basic csdlock debugging (minor impact)
ext: enable extended csdlock debugging (more impact,
but more data)
dasd= [HW,NET]
See header of drivers/s390/block/dasd_devmap.c.


@ -1,3 +1,6 @@
.. SPDX-License-Identifier: GPL-2.0
.. Copyright (C) 2019, Google LLC.
The Kernel Concurrency Sanitizer (KCSAN)
========================================


@ -7452,6 +7452,7 @@ M: Thomas Gleixner <tglx@linutronix.de>
M: Ingo Molnar <mingo@redhat.com>
R: Peter Zijlstra <peterz@infradead.org>
R: Darren Hart <dvhart@infradead.org>
R: Davidlohr Bueso <dave@stgolabs.net>
L: linux-kernel@vger.kernel.org
S: Maintained
T: git git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git locking/core


@ -22,7 +22,7 @@
* assembler to insert a extra (16-bit) IT instruction, depending on the
* presence or absence of neighbouring conditional instructions.
*
* To avoid this unpredictableness, an approprite IT is inserted explicitly:
* To avoid this unpredictability, an appropriate IT is inserted explicitly:
* the assembler won't change IT instructions which are explicitly present
* in the input.
*/


@ -14,7 +14,7 @@
#include <linux/stringify.h>
#include <linux/types.h>
static __always_inline bool arch_static_branch(struct static_key *key, bool branch)
static __always_inline bool arch_static_branch(struct static_key * const key, const bool branch)
{
asm_volatile_goto("1:"
".byte " __stringify(BYTES_NOP5) "\n\t"
@ -30,7 +30,7 @@ l_yes:
return true;
}
static __always_inline bool arch_static_branch_jump(struct static_key *key, bool branch)
static __always_inline bool arch_static_branch_jump(struct static_key * const key, const bool branch)
{
asm_volatile_goto("1:"
".byte 0xe9\n\t .long %l[l_yes] - 2f\n\t"


@ -4727,6 +4727,8 @@ out:
/* Must not be called with conf_mutex held as workers can use that also. */
void ath10k_drain_tx(struct ath10k *ar)
{
lockdep_assert_not_held(&ar->conf_mutex);
/* make sure rcu-protected mac80211 tx path itself is drained */
synchronize_net();


@ -1,4 +1,10 @@
/* SPDX-License-Identifier: GPL-2.0 */
/*
* KCSAN access checks and modifiers. These can be used to explicitly check
* uninstrumented accesses, or change KCSAN checking behaviour of accesses.
*
* Copyright (C) 2019, Google LLC.
*/
#ifndef _LINUX_KCSAN_CHECKS_H
#define _LINUX_KCSAN_CHECKS_H


@ -1,4 +1,11 @@
/* SPDX-License-Identifier: GPL-2.0 */
/*
* The Kernel Concurrency Sanitizer (KCSAN) infrastructure. Public interface and
* data structures to set up runtime. See kcsan-checks.h for explicit checks and
* modifiers. For more info please see Documentation/dev-tools/kcsan.rst.
*
* Copyright (C) 2019, Google LLC.
*/
#ifndef _LINUX_KCSAN_H
#define _LINUX_KCSAN_H


@ -155,7 +155,7 @@ extern void lockdep_set_selftest_task(struct task_struct *task);
extern void lockdep_init_task(struct task_struct *task);
/*
* Split the recrursion counter in two to readily detect 'off' vs recursion.
* Split the recursion counter in two to readily detect 'off' vs recursion.
*/
#define LOCKDEP_RECURSION_BITS 16
#define LOCKDEP_OFF (1U << LOCKDEP_RECURSION_BITS)
@ -268,6 +268,11 @@ extern void lock_acquire(struct lockdep_map *lock, unsigned int subclass,
extern void lock_release(struct lockdep_map *lock, unsigned long ip);
/* lock_is_held_type() returns */
#define LOCK_STATE_UNKNOWN -1
#define LOCK_STATE_NOT_HELD 0
#define LOCK_STATE_HELD 1
/*
* Same "read" as for lock_acquire(), except -1 means any.
*/
@ -302,7 +307,13 @@ extern void lock_unpin_lock(struct lockdep_map *lock, struct pin_cookie);
#define lockdep_depth(tsk) (debug_locks ? (tsk)->lockdep_depth : 0)
#define lockdep_assert_held(l) do { \
WARN_ON(debug_locks && !lockdep_is_held(l)); \
WARN_ON(debug_locks && \
lockdep_is_held(l) == LOCK_STATE_NOT_HELD); \
} while (0)
#define lockdep_assert_not_held(l) do { \
WARN_ON(debug_locks && \
lockdep_is_held(l) == LOCK_STATE_HELD); \
} while (0)
#define lockdep_assert_held_write(l) do { \
@ -397,6 +408,7 @@ extern int lockdep_is_held(const void *);
#define lockdep_is_held_type(l, r) (1)
#define lockdep_assert_held(l) do { (void)(l); } while (0)
#define lockdep_assert_not_held(l) do { (void)(l); } while (0)
#define lockdep_assert_held_write(l) do { (void)(l); } while (0)
#define lockdep_assert_held_read(l) do { (void)(l); } while (0)
#define lockdep_assert_held_once(l) do { (void)(l); } while (0)


@ -20,6 +20,7 @@
#include <linux/osq_lock.h>
#include <linux/debug_locks.h>
struct ww_class;
struct ww_acquire_ctx;
/*
@ -65,9 +66,6 @@ struct mutex {
#endif
};
struct ww_class;
struct ww_acquire_ctx;
struct ww_mutex {
struct mutex base;
struct ww_acquire_ctx *ctx;


@ -31,12 +31,6 @@ struct rt_mutex {
raw_spinlock_t wait_lock;
struct rb_root_cached waiters;
struct task_struct *owner;
#ifdef CONFIG_DEBUG_RT_MUTEXES
int save_state;
const char *name, *file;
int line;
void *magic;
#endif
#ifdef CONFIG_DEBUG_LOCK_ALLOC
struct lockdep_map dep_map;
#endif
@ -46,35 +40,17 @@ struct rt_mutex_waiter;
struct hrtimer_sleeper;
#ifdef CONFIG_DEBUG_RT_MUTEXES
extern int rt_mutex_debug_check_no_locks_freed(const void *from,
unsigned long len);
extern void rt_mutex_debug_check_no_locks_held(struct task_struct *task);
extern void rt_mutex_debug_task_free(struct task_struct *tsk);
#else
static inline int rt_mutex_debug_check_no_locks_freed(const void *from,
unsigned long len)
{
return 0;
}
# define rt_mutex_debug_check_no_locks_held(task) do { } while (0)
static inline void rt_mutex_debug_task_free(struct task_struct *tsk) { }
#endif
#ifdef CONFIG_DEBUG_RT_MUTEXES
# define __DEBUG_RT_MUTEX_INITIALIZER(mutexname) \
, .name = #mutexname, .file = __FILE__, .line = __LINE__
# define rt_mutex_init(mutex) \
#define rt_mutex_init(mutex) \
do { \
static struct lock_class_key __key; \
__rt_mutex_init(mutex, __func__, &__key); \
} while (0)
extern void rt_mutex_debug_task_free(struct task_struct *tsk);
#else
# define __DEBUG_RT_MUTEX_INITIALIZER(mutexname)
# define rt_mutex_init(mutex) __rt_mutex_init(mutex, NULL, NULL)
# define rt_mutex_debug_task_free(t) do { } while (0)
#endif
#ifdef CONFIG_DEBUG_LOCK_ALLOC
#define __DEP_MAP_RT_MUTEX_INITIALIZER(mutexname) \
, .dep_map = { .name = #mutexname }
@ -86,7 +62,6 @@ do { \
{ .wait_lock = __RAW_SPIN_LOCK_UNLOCKED(mutexname.wait_lock) \
, .waiters = RB_ROOT_CACHED \
, .owner = NULL \
__DEBUG_RT_MUTEX_INITIALIZER(mutexname) \
__DEP_MAP_RT_MUTEX_INITIALIZER(mutexname)}
#define DEFINE_RT_MUTEX(mutexname) \
@ -104,7 +79,6 @@ static inline int rt_mutex_is_locked(struct rt_mutex *lock)
}
extern void __rt_mutex_init(struct rt_mutex *lock, const char *name, struct lock_class_key *key);
extern void rt_mutex_destroy(struct rt_mutex *lock);
#ifdef CONFIG_DEBUG_LOCK_ALLOC
extern void rt_mutex_lock_nested(struct rt_mutex *lock, unsigned int subclass);
@ -115,9 +89,6 @@ extern void rt_mutex_lock(struct rt_mutex *lock);
#endif
extern int rt_mutex_lock_interruptible(struct rt_mutex *lock);
extern int rt_mutex_timed_lock(struct rt_mutex *lock,
struct hrtimer_sleeper *timeout);
extern int rt_mutex_trylock(struct rt_mutex *lock);
extern void rt_mutex_unlock(struct rt_mutex *lock);


@ -110,7 +110,7 @@ do { \
/*
* This is the same regardless of which rwsem implementation that is being used.
* It is just a heuristic meant to be called by somebody alreadying holding the
* It is just a heuristic meant to be called by somebody already holding the
* rwsem to see if somebody from an incompatible type is wanting access to the
* lock.
*/


@ -118,9 +118,9 @@ extern void arch_static_call_transform(void *site, void *tramp, void *func, bool
#define static_call_update(name, func) \
({ \
BUILD_BUG_ON(!__same_type(*(func), STATIC_CALL_TRAMP(name))); \
typeof(&STATIC_CALL_TRAMP(name)) __F = (func); \
__static_call_update(&STATIC_CALL_KEY(name), \
STATIC_CALL_TRAMP_ADDR(name), func); \
STATIC_CALL_TRAMP_ADDR(name), __F); \
})
#define static_call_query(name) (READ_ONCE(STATIC_CALL_KEY(name).func))


@ -48,39 +48,26 @@ struct ww_acquire_ctx {
#endif
};
#ifdef CONFIG_DEBUG_LOCK_ALLOC
# define __WW_CLASS_MUTEX_INITIALIZER(lockname, class) \
, .ww_class = class
#else
# define __WW_CLASS_MUTEX_INITIALIZER(lockname, class)
#endif
#define __WW_CLASS_INITIALIZER(ww_class, _is_wait_die) \
{ .stamp = ATOMIC_LONG_INIT(0) \
, .acquire_name = #ww_class "_acquire" \
, .mutex_name = #ww_class "_mutex" \
, .is_wait_die = _is_wait_die }
#define __WW_MUTEX_INITIALIZER(lockname, class) \
{ .base = __MUTEX_INITIALIZER(lockname.base) \
__WW_CLASS_MUTEX_INITIALIZER(lockname, class) }
#define DEFINE_WD_CLASS(classname) \
struct ww_class classname = __WW_CLASS_INITIALIZER(classname, 1)
#define DEFINE_WW_CLASS(classname) \
struct ww_class classname = __WW_CLASS_INITIALIZER(classname, 0)
#define DEFINE_WW_MUTEX(mutexname, ww_class) \
struct ww_mutex mutexname = __WW_MUTEX_INITIALIZER(mutexname, ww_class)
/**
* ww_mutex_init - initialize the w/w mutex
* @lock: the mutex to be initialized
* @ww_class: the w/w class the mutex should belong to
*
* Initialize the w/w mutex to unlocked state and associate it with the given
* class.
* class. Static define macro for w/w mutex is not provided and this function
* is the only way to properly initialize the w/w mutex.
*
* It is not allowed to initialize an already locked mutex.
*/


@ -981,6 +981,7 @@ static inline void exit_pi_state_list(struct task_struct *curr) { }
* p->pi_lock:
*
* p->pi_state_list -> pi_state->list, relation
* pi_mutex->owner -> pi_state->owner, relation
*
* pi_state->refcount:
*
@ -1494,13 +1495,14 @@ static void mark_wake_futex(struct wake_q_head *wake_q, struct futex_q *q)
static int wake_futex_pi(u32 __user *uaddr, u32 uval, struct futex_pi_state *pi_state)
{
u32 curval, newval;
struct rt_mutex_waiter *top_waiter;
struct task_struct *new_owner;
bool postunlock = false;
DEFINE_WAKE_Q(wake_q);
int ret = 0;
new_owner = rt_mutex_next_owner(&pi_state->pi_mutex);
if (WARN_ON_ONCE(!new_owner)) {
top_waiter = rt_mutex_top_waiter(&pi_state->pi_mutex);
if (WARN_ON_ONCE(!top_waiter)) {
/*
* As per the comment in futex_unlock_pi() this should not happen.
*
@ -1513,6 +1515,8 @@ static int wake_futex_pi(u32 __user *uaddr, u32 uval, struct futex_pi_state *pi_
goto out_unlock;
}
new_owner = top_waiter->task;
/*
* We pass it to the next owner. The WAITERS bit is always kept
* enabled while there is PI state around. We cleanup the owner
@ -2315,19 +2319,15 @@ retry:
/*
* PI futexes can not be requeued and must remove themself from the
* hash bucket. The hash bucket lock (i.e. lock_ptr) is held on entry
* and dropped here.
* hash bucket. The hash bucket lock (i.e. lock_ptr) is held.
*/
static void unqueue_me_pi(struct futex_q *q)
__releases(q->lock_ptr)
{
__unqueue_futex(q);
BUG_ON(!q->pi_state);
put_pi_state(q->pi_state);
q->pi_state = NULL;
spin_unlock(q->lock_ptr);
}
static int __fixup_pi_state_owner(u32 __user *uaddr, struct futex_q *q,
@ -2909,8 +2909,8 @@ no_block:
if (res)
ret = (res < 0) ? res : 0;
/* Unqueue and drop the lock */
unqueue_me_pi(&q);
spin_unlock(q.lock_ptr);
goto out;
out_unlock_put_key:
@ -3237,15 +3237,14 @@ static int futex_wait_requeue_pi(u32 __user *uaddr, unsigned int flags,
* reference count.
*/
/* Check if the requeue code acquired the second futex for us. */
if (!q.rt_waiter) {
/*
* Got the lock. We might not be the anticipated owner if we
* did a lock-steal - fix up the PI-state in that case.
* Check if the requeue code acquired the second futex for us and do
* any pertinent fixup.
*/
if (!q.rt_waiter) {
if (q.pi_state && (q.pi_state->owner != current)) {
spin_lock(q.lock_ptr);
ret = fixup_pi_state_owner(uaddr2, &q, current);
ret = fixup_owner(uaddr2, &q, true);
/*
* Drop the reference to the pi state which
* the requeue_pi() code acquired for us.
@ -3287,8 +3286,8 @@ static int futex_wait_requeue_pi(u32 __user *uaddr, unsigned int flags,
if (res)
ret = (res < 0) ? res : 0;
/* Unqueue and drop the lock. */
unqueue_me_pi(&q);
spin_unlock(q.lock_ptr);
}
if (ret == -EINTR) {


@ -13,5 +13,5 @@ CFLAGS_core.o := $(call cc-option,-fno-conserve-stack) \
obj-y := core.o debugfs.o report.o
obj-$(CONFIG_KCSAN_SELFTEST) += selftest.o
CFLAGS_kcsan-test.o := $(CFLAGS_KCSAN) -g -fno-omit-frame-pointer
obj-$(CONFIG_KCSAN_TEST) += kcsan-test.o
CFLAGS_kcsan_test.o := $(CFLAGS_KCSAN) -g -fno-omit-frame-pointer
obj-$(CONFIG_KCSAN_KUNIT_TEST) += kcsan_test.o


@ -1,4 +1,9 @@
/* SPDX-License-Identifier: GPL-2.0 */
/*
* Rules for implicitly atomic memory accesses.
*
* Copyright (C) 2019, Google LLC.
*/
#ifndef _KERNEL_KCSAN_ATOMIC_H
#define _KERNEL_KCSAN_ATOMIC_H


@ -1,4 +1,9 @@
// SPDX-License-Identifier: GPL-2.0
/*
* KCSAN core runtime.
*
* Copyright (C) 2019, Google LLC.
*/
#define pr_fmt(fmt) "kcsan: " fmt
@ -639,8 +644,6 @@ void __init kcsan_init(void)
BUG_ON(!in_task());
kcsan_debugfs_init();
for_each_possible_cpu(cpu)
per_cpu(kcsan_rand_state, cpu) = (u32)get_cycles();


@ -1,4 +1,9 @@
// SPDX-License-Identifier: GPL-2.0
/*
* KCSAN debugfs interface.
*
* Copyright (C) 2019, Google LLC.
*/
#define pr_fmt(fmt) "kcsan: " fmt
@ -261,7 +266,9 @@ static const struct file_operations debugfs_ops =
.release = single_release
};
void __init kcsan_debugfs_init(void)
static void __init kcsan_debugfs_init(void)
{
debugfs_create_file("kcsan", 0644, NULL, NULL, &debugfs_ops);
}
late_initcall(kcsan_debugfs_init);


@ -1,4 +1,9 @@
/* SPDX-License-Identifier: GPL-2.0 */
/*
* KCSAN watchpoint encoding.
*
* Copyright (C) 2019, Google LLC.
*/
#ifndef _KERNEL_KCSAN_ENCODING_H
#define _KERNEL_KCSAN_ENCODING_H


@ -1,8 +1,9 @@
/* SPDX-License-Identifier: GPL-2.0 */
/*
* The Kernel Concurrency Sanitizer (KCSAN) infrastructure. For more info please
* see Documentation/dev-tools/kcsan.rst.
*
* Copyright (C) 2019, Google LLC.
*/
#ifndef _KERNEL_KCSAN_KCSAN_H
@ -30,11 +31,6 @@ extern bool kcsan_enabled;
void kcsan_save_irqtrace(struct task_struct *task);
void kcsan_restore_irqtrace(struct task_struct *task);
/*
* Initialize debugfs file.
*/
void kcsan_debugfs_init(void);
/*
* Statistics counters displayed via debugfs; should only be modified in
* slow-paths.


@ -13,6 +13,8 @@
* Author: Marco Elver <elver@google.com>
*/
#define pr_fmt(fmt) "kcsan_test: " fmt
#include <kunit/test.h>
#include <linux/jiffies.h>
#include <linux/kcsan-checks.h>
@ -951,22 +953,53 @@ static void test_atomic_builtins(struct kunit *test)
}
/*
* Each test case is run with different numbers of threads. Until KUnit supports
* passing arguments for each test case, we encode #threads in the test case
* name (read by get_num_threads()). [The '-' was chosen as a stylistic
* preference to separate test name and #threads.]
* Generate thread counts for all test cases. Values generated are in interval
* [2, 5] followed by exponentially increasing thread counts from 8 to 32.
*
* The thread counts are chosen to cover potentially interesting boundaries and
* corner cases (range 2-5), and then stress the system with larger counts.
* corner cases (2 to 5), and then stress the system with larger counts.
*/
#define KCSAN_KUNIT_CASE(test_name) \
{ .run_case = test_name, .name = #test_name "-02" }, \
{ .run_case = test_name, .name = #test_name "-03" }, \
{ .run_case = test_name, .name = #test_name "-04" }, \
{ .run_case = test_name, .name = #test_name "-05" }, \
{ .run_case = test_name, .name = #test_name "-08" }, \
{ .run_case = test_name, .name = #test_name "-16" }
static const void *nthreads_gen_params(const void *prev, char *desc)
{
long nthreads = (long)prev;
if (nthreads < 0 || nthreads >= 32)
nthreads = 0; /* stop */
else if (!nthreads)
nthreads = 2; /* initial value */
else if (nthreads < 5)
nthreads++;
else if (nthreads == 5)
nthreads = 8;
else
nthreads *= 2;
if (!IS_ENABLED(CONFIG_PREEMPT) || !IS_ENABLED(CONFIG_KCSAN_INTERRUPT_WATCHER)) {
/*
* Without any preemption, keep 2 CPUs free for other tasks, one
* of which is the main test case function checking for
* completion or failure.
*/
const long min_unused_cpus = IS_ENABLED(CONFIG_PREEMPT_NONE) ? 2 : 0;
const long min_required_cpus = 2 + min_unused_cpus;
if (num_online_cpus() < min_required_cpus) {
pr_err_once("Too few online CPUs (%u < %ld) for test\n",
num_online_cpus(), min_required_cpus);
nthreads = 0;
} else if (nthreads >= num_online_cpus() - min_unused_cpus) {
/* Use negative value to indicate last param. */
nthreads = -(num_online_cpus() - min_unused_cpus);
pr_warn_once("Limiting number of threads to %ld (only %d online CPUs)\n",
-nthreads, num_online_cpus());
}
}
snprintf(desc, KUNIT_PARAM_DESC_SIZE, "threads=%ld", abs(nthreads));
return (void *)nthreads;
}
#define KCSAN_KUNIT_CASE(test_name) KUNIT_CASE_PARAM(test_name, nthreads_gen_params)
static struct kunit_case kcsan_test_cases[] = {
KCSAN_KUNIT_CASE(test_basic),
KCSAN_KUNIT_CASE(test_concurrent_races),
@ -996,24 +1029,6 @@ static struct kunit_case kcsan_test_cases[] = {
/* ===== End test cases ===== */
/* Get number of threads encoded in test name. */
static bool __no_kcsan
get_num_threads(const char *test, int *nthreads)
{
int len = strlen(test);
if (WARN_ON(len < 3))
return false;
*nthreads = test[len - 1] - '0';
*nthreads += (test[len - 2] - '0') * 10;
if (WARN_ON(*nthreads < 0))
return false;
return true;
}
/* Concurrent accesses from interrupts. */
__no_kcsan
static void access_thread_timer(struct timer_list *timer)
@ -1076,9 +1091,6 @@ static int test_init(struct kunit *test)
if (!torture_init_begin((char *)test->name, 1))
return -EBUSY;
if (!get_num_threads(test->name, &nthreads))
goto err;
if (WARN_ON(threads))
goto err;
@ -1087,39 +1099,19 @@ static int test_init(struct kunit *test)
goto err;
}
if (!IS_ENABLED(CONFIG_PREEMPT) || !IS_ENABLED(CONFIG_KCSAN_INTERRUPT_WATCHER)) {
/*
* Without any preemption, keep 2 CPUs free for other tasks, one
* of which is the main test case function checking for
* completion or failure.
*/
const int min_unused_cpus = IS_ENABLED(CONFIG_PREEMPT_NONE) ? 2 : 0;
const int min_required_cpus = 2 + min_unused_cpus;
if (num_online_cpus() < min_required_cpus) {
pr_err("%s: too few online CPUs (%u < %d) for test",
test->name, num_online_cpus(), min_required_cpus);
nthreads = abs((long)test->param_value);
if (WARN_ON(!nthreads))
goto err;
} else if (nthreads > num_online_cpus() - min_unused_cpus) {
nthreads = num_online_cpus() - min_unused_cpus;
pr_warn("%s: limiting number of threads to %d\n",
test->name, nthreads);
}
}
if (nthreads) {
threads = kcalloc(nthreads + 1, sizeof(struct task_struct *),
GFP_KERNEL);
threads = kcalloc(nthreads + 1, sizeof(struct task_struct *), GFP_KERNEL);
if (WARN_ON(!threads))
goto err;
threads[nthreads] = NULL;
for (i = 0; i < nthreads; ++i) {
if (torture_create_kthread(access_thread, NULL,
threads[i]))
if (torture_create_kthread(access_thread, NULL, threads[i]))
goto err;
}
}
torture_init_end();
@ -1156,7 +1148,7 @@ static void test_exit(struct kunit *test)
}
static struct kunit_suite kcsan_test_suite = {
.name = "kcsan-test",
.name = "kcsan",
.test_cases = kcsan_test_cases,
.init = test_init,
.exit = test_exit,


@ -1,4 +1,9 @@
// SPDX-License-Identifier: GPL-2.0
/*
* KCSAN reporting.
*
* Copyright (C) 2019, Google LLC.
*/
#include <linux/debug_locks.h>
#include <linux/delay.h>


@ -1,4 +1,9 @@
// SPDX-License-Identifier: GPL-2.0
/*
* KCSAN short boot-time selftests.
*
* Copyright (C) 2019, Google LLC.
*/
#define pr_fmt(fmt) "kcsan: " fmt


@ -12,7 +12,6 @@ ifdef CONFIG_FUNCTION_TRACER
CFLAGS_REMOVE_lockdep.o = $(CC_FLAGS_FTRACE)
CFLAGS_REMOVE_lockdep_proc.o = $(CC_FLAGS_FTRACE)
CFLAGS_REMOVE_mutex-debug.o = $(CC_FLAGS_FTRACE)
CFLAGS_REMOVE_rtmutex-debug.o = $(CC_FLAGS_FTRACE)
endif
obj-$(CONFIG_DEBUG_IRQFLAGS) += irqflag-debug.o
@ -26,7 +25,6 @@ obj-$(CONFIG_LOCK_SPIN_ON_OWNER) += osq_lock.o
obj-$(CONFIG_PROVE_LOCKING) += spinlock.o
obj-$(CONFIG_QUEUED_SPINLOCKS) += qspinlock.o
obj-$(CONFIG_RT_MUTEXES) += rtmutex.o
obj-$(CONFIG_DEBUG_RT_MUTEXES) += rtmutex-debug.o
obj-$(CONFIG_DEBUG_SPINLOCK) += spinlock.o
obj-$(CONFIG_DEBUG_SPINLOCK) += spinlock_debug.o
obj-$(CONFIG_QUEUED_RWLOCKS) += qrwlock.o


@ -54,6 +54,7 @@
#include <linux/nmi.h>
#include <linux/rcupdate.h>
#include <linux/kprobes.h>
#include <linux/lockdep.h>
#include <asm/sections.h>
@ -1747,7 +1748,7 @@ static enum bfs_result __bfs(struct lock_list *source_entry,
/*
* Step 4: if not match, expand the path by adding the
* forward or backwards dependencis in the search
* forward or backwards dependencies in the search
*
*/
first = true;
@ -1916,7 +1917,7 @@ print_circular_bug_header(struct lock_list *entry, unsigned int depth,
* -> B is -(ER)-> or -(EN)->, then we don't need to add A -> B into the
* dependency graph, as any strong path ..-> A -> B ->.. we can get with
* having dependency A -> B, we could already get a equivalent path ..-> A ->
* .. -> B -> .. with A -> .. -> B. Therefore A -> B is reduntant.
* .. -> B -> .. with A -> .. -> B. Therefore A -> B is redundant.
*
* We need to make sure both the start and the end of A -> .. -> B is not
* weaker than A -> B. For the start part, please see the comment in
@ -5253,13 +5254,13 @@ int __lock_is_held(const struct lockdep_map *lock, int read)
if (match_held_lock(hlock, lock)) {
if (read == -1 || hlock->read == read)
return 1;
return LOCK_STATE_HELD;
return 0;
return LOCK_STATE_NOT_HELD;
}
}
return 0;
return LOCK_STATE_NOT_HELD;
}
static struct pin_cookie __lock_pin_lock(struct lockdep_map *lock)
@ -5538,10 +5539,14 @@ EXPORT_SYMBOL_GPL(lock_release);
noinstr int lock_is_held_type(const struct lockdep_map *lock, int read)
{
unsigned long flags;
int ret = 0;
int ret = LOCK_STATE_NOT_HELD;
/*
* Avoid false negative lockdep_assert_held() and
* lockdep_assert_not_held().
*/
if (unlikely(!lockdep_enabled()))
return 1; /* avoid false negative lockdep_assert_held() */
return LOCK_STATE_UNKNOWN;
raw_local_irq_save(flags);
check_flags(flags);


@ -348,7 +348,7 @@ static int lockdep_stats_show(struct seq_file *m, void *v)
debug_locks);
/*
* Zappped classes and lockdep data buffers reuse statistics.
* Zapped classes and lockdep data buffers reuse statistics.
*/
seq_puts(m, "\n");
seq_printf(m, " zapped classes: %11lu\n",


@ -76,13 +76,13 @@ static void lock_torture_cleanup(void);
struct lock_torture_ops {
void (*init)(void);
void (*exit)(void);
int (*writelock)(void);
int (*writelock)(int tid);
void (*write_delay)(struct torture_random_state *trsp);
void (*task_boost)(struct torture_random_state *trsp);
void (*writeunlock)(void);
int (*readlock)(void);
void (*writeunlock)(int tid);
int (*readlock)(int tid);
void (*read_delay)(struct torture_random_state *trsp);
void (*readunlock)(void);
void (*readunlock)(int tid);
unsigned long flags; /* for irq spinlocks */
const char *name;
@ -105,7 +105,7 @@ static struct lock_torture_cxt cxt = { 0, 0, false, false,
* Definitions for lock torture testing.
*/
static int torture_lock_busted_write_lock(void)
static int torture_lock_busted_write_lock(int tid __maybe_unused)
{
return 0; /* BUGGY, do not use in real life!!! */
}
@ -122,7 +122,7 @@ static void torture_lock_busted_write_delay(struct torture_random_state *trsp)
torture_preempt_schedule(); /* Allow test to be preempted. */
}
static void torture_lock_busted_write_unlock(void)
static void torture_lock_busted_write_unlock(int tid __maybe_unused)
{
/* BUGGY, do not use in real life!!! */
}
@ -145,7 +145,8 @@ static struct lock_torture_ops lock_busted_ops = {
static DEFINE_SPINLOCK(torture_spinlock);
static int torture_spin_lock_write_lock(void) __acquires(torture_spinlock)
static int torture_spin_lock_write_lock(int tid __maybe_unused)
__acquires(torture_spinlock)
{
spin_lock(&torture_spinlock);
return 0;
@ -169,7 +170,8 @@ static void torture_spin_lock_write_delay(struct torture_random_state *trsp)
torture_preempt_schedule(); /* Allow test to be preempted. */
}
static void torture_spin_lock_write_unlock(void) __releases(torture_spinlock)
static void torture_spin_lock_write_unlock(int tid __maybe_unused)
__releases(torture_spinlock)
{
spin_unlock(&torture_spinlock);
}
@ -185,7 +187,7 @@ static struct lock_torture_ops spin_lock_ops = {
.name = "spin_lock"
};
static int torture_spin_lock_write_lock_irq(void)
static int torture_spin_lock_write_lock_irq(int tid __maybe_unused)
__acquires(torture_spinlock)
{
unsigned long flags;
@ -195,7 +197,7 @@ __acquires(torture_spinlock)
return 0;
}
static void torture_lock_spin_write_unlock_irq(void)
static void torture_lock_spin_write_unlock_irq(int tid __maybe_unused)
__releases(torture_spinlock)
{
spin_unlock_irqrestore(&torture_spinlock, cxt.cur_ops->flags);
@ -214,7 +216,8 @@ static struct lock_torture_ops spin_lock_irq_ops = {
static DEFINE_RWLOCK(torture_rwlock);
static int torture_rwlock_write_lock(void) __acquires(torture_rwlock)
static int torture_rwlock_write_lock(int tid __maybe_unused)
__acquires(torture_rwlock)
{
write_lock(&torture_rwlock);
return 0;
@ -235,12 +238,14 @@ static void torture_rwlock_write_delay(struct torture_random_state *trsp)
udelay(shortdelay_us);
}
static void torture_rwlock_write_unlock(void) __releases(torture_rwlock)
static void torture_rwlock_write_unlock(int tid __maybe_unused)
__releases(torture_rwlock)
{
write_unlock(&torture_rwlock);
}
static int torture_rwlock_read_lock(void) __acquires(torture_rwlock)
static int torture_rwlock_read_lock(int tid __maybe_unused)
__acquires(torture_rwlock)
{
read_lock(&torture_rwlock);
return 0;
@ -261,7 +266,8 @@ static void torture_rwlock_read_delay(struct torture_random_state *trsp)
udelay(shortdelay_us);
}
static void torture_rwlock_read_unlock(void) __releases(torture_rwlock)
static void torture_rwlock_read_unlock(int tid __maybe_unused)
__releases(torture_rwlock)
{
read_unlock(&torture_rwlock);
}
@ -277,7 +283,8 @@ static struct lock_torture_ops rw_lock_ops = {
.name = "rw_lock"
};
static int torture_rwlock_write_lock_irq(void) __acquires(torture_rwlock)
static int torture_rwlock_write_lock_irq(int tid __maybe_unused)
__acquires(torture_rwlock)
{
unsigned long flags;
@ -286,13 +293,14 @@ static int torture_rwlock_write_lock_irq(void) __acquires(torture_rwlock)
return 0;
}
static void torture_rwlock_write_unlock_irq(void)
static void torture_rwlock_write_unlock_irq(int tid __maybe_unused)
__releases(torture_rwlock)
{
write_unlock_irqrestore(&torture_rwlock, cxt.cur_ops->flags);
}
static int torture_rwlock_read_lock_irq(void) __acquires(torture_rwlock)
static int torture_rwlock_read_lock_irq(int tid __maybe_unused)
__acquires(torture_rwlock)
{
unsigned long flags;
@ -301,7 +309,7 @@ static int torture_rwlock_read_lock_irq(void) __acquires(torture_rwlock)
return 0;
}
static void torture_rwlock_read_unlock_irq(void)
static void torture_rwlock_read_unlock_irq(int tid __maybe_unused)
__releases(torture_rwlock)
{
read_unlock_irqrestore(&torture_rwlock, cxt.cur_ops->flags);
@ -320,7 +328,8 @@ static struct lock_torture_ops rw_lock_irq_ops = {
static DEFINE_MUTEX(torture_mutex);
static int torture_mutex_lock(void) __acquires(torture_mutex)
static int torture_mutex_lock(int tid __maybe_unused)
__acquires(torture_mutex)
{
mutex_lock(&torture_mutex);
return 0;
@ -340,7 +349,8 @@ static void torture_mutex_delay(struct torture_random_state *trsp)
torture_preempt_schedule(); /* Allow test to be preempted. */
}
static void torture_mutex_unlock(void) __releases(torture_mutex)
static void torture_mutex_unlock(int tid __maybe_unused)
__releases(torture_mutex)
{
mutex_unlock(&torture_mutex);
}
@ -357,12 +367,34 @@ static struct lock_torture_ops mutex_lock_ops = {
};
#include <linux/ww_mutex.h>
/*
* The torture ww_mutexes should belong to the same lock class as
* torture_ww_class to avoid lockdep problem. The ww_mutex_init()
* function is called for initialization to ensure that.
*/
static DEFINE_WD_CLASS(torture_ww_class);
static DEFINE_WW_MUTEX(torture_ww_mutex_0, &torture_ww_class);
static DEFINE_WW_MUTEX(torture_ww_mutex_1, &torture_ww_class);
static DEFINE_WW_MUTEX(torture_ww_mutex_2, &torture_ww_class);
static struct ww_mutex torture_ww_mutex_0, torture_ww_mutex_1, torture_ww_mutex_2;
static struct ww_acquire_ctx *ww_acquire_ctxs;
static int torture_ww_mutex_lock(void)
static void torture_ww_mutex_init(void)
{
ww_mutex_init(&torture_ww_mutex_0, &torture_ww_class);
ww_mutex_init(&torture_ww_mutex_1, &torture_ww_class);
ww_mutex_init(&torture_ww_mutex_2, &torture_ww_class);
ww_acquire_ctxs = kmalloc_array(cxt.nrealwriters_stress,
sizeof(*ww_acquire_ctxs),
GFP_KERNEL);
if (!ww_acquire_ctxs)
VERBOSE_TOROUT_STRING("ww_acquire_ctx: Out of memory");
}
static void torture_ww_mutex_exit(void)
{
kfree(ww_acquire_ctxs);
}
static int torture_ww_mutex_lock(int tid)
__acquires(torture_ww_mutex_0)
__acquires(torture_ww_mutex_1)
__acquires(torture_ww_mutex_2)
@ -372,7 +404,7 @@ __acquires(torture_ww_mutex_2)
struct list_head link;
struct ww_mutex *lock;
} locks[3], *ll, *ln;
struct ww_acquire_ctx ctx;
struct ww_acquire_ctx *ctx = &ww_acquire_ctxs[tid];
locks[0].lock = &torture_ww_mutex_0;
list_add(&locks[0].link, &list);
@ -383,12 +415,12 @@ __acquires(torture_ww_mutex_2)
locks[2].lock = &torture_ww_mutex_2;
list_add(&locks[2].link, &list);
ww_acquire_init(&ctx, &torture_ww_class);
ww_acquire_init(ctx, &torture_ww_class);
list_for_each_entry(ll, &list, link) {
int err;
err = ww_mutex_lock(ll->lock, &ctx);
err = ww_mutex_lock(ll->lock, ctx);
if (!err)
continue;
@ -399,25 +431,29 @@ __acquires(torture_ww_mutex_2)
if (err != -EDEADLK)
return err;
ww_mutex_lock_slow(ll->lock, &ctx);
ww_mutex_lock_slow(ll->lock, ctx);
list_move(&ll->link, &list);
}
ww_acquire_fini(&ctx);
return 0;
}
static void torture_ww_mutex_unlock(void)
static void torture_ww_mutex_unlock(int tid)
__releases(torture_ww_mutex_0)
__releases(torture_ww_mutex_1)
__releases(torture_ww_mutex_2)
{
struct ww_acquire_ctx *ctx = &ww_acquire_ctxs[tid];
ww_mutex_unlock(&torture_ww_mutex_0);
ww_mutex_unlock(&torture_ww_mutex_1);
ww_mutex_unlock(&torture_ww_mutex_2);
ww_acquire_fini(ctx);
}
static struct lock_torture_ops ww_mutex_lock_ops = {
.init = torture_ww_mutex_init,
.exit = torture_ww_mutex_exit,
.writelock = torture_ww_mutex_lock,
.write_delay = torture_mutex_delay,
.task_boost = torture_boost_dummy,
@ -431,7 +467,8 @@ static struct lock_torture_ops ww_mutex_lock_ops = {
#ifdef CONFIG_RT_MUTEXES
static DEFINE_RT_MUTEX(torture_rtmutex);
static int torture_rtmutex_lock(void) __acquires(torture_rtmutex)
static int torture_rtmutex_lock(int tid __maybe_unused)
__acquires(torture_rtmutex)
{
rt_mutex_lock(&torture_rtmutex);
return 0;
@ -487,7 +524,8 @@ static void torture_rtmutex_delay(struct torture_random_state *trsp)
torture_preempt_schedule(); /* Allow test to be preempted. */
}
static void torture_rtmutex_unlock(void) __releases(torture_rtmutex)
static void torture_rtmutex_unlock(int tid __maybe_unused)
__releases(torture_rtmutex)
{
rt_mutex_unlock(&torture_rtmutex);
}
@ -505,7 +543,8 @@ static struct lock_torture_ops rtmutex_lock_ops = {
#endif
static DECLARE_RWSEM(torture_rwsem);
static int torture_rwsem_down_write(void) __acquires(torture_rwsem)
static int torture_rwsem_down_write(int tid __maybe_unused)
__acquires(torture_rwsem)
{
down_write(&torture_rwsem);
return 0;
@ -525,12 +564,14 @@ static void torture_rwsem_write_delay(struct torture_random_state *trsp)
torture_preempt_schedule(); /* Allow test to be preempted. */
}
static void torture_rwsem_up_write(void) __releases(torture_rwsem)
static void torture_rwsem_up_write(int tid __maybe_unused)
__releases(torture_rwsem)
{
up_write(&torture_rwsem);
}
static int torture_rwsem_down_read(void) __acquires(torture_rwsem)
static int torture_rwsem_down_read(int tid __maybe_unused)
__acquires(torture_rwsem)
{
down_read(&torture_rwsem);
return 0;
@ -550,7 +591,8 @@ static void torture_rwsem_read_delay(struct torture_random_state *trsp)
torture_preempt_schedule(); /* Allow test to be preempted. */
}
static void torture_rwsem_up_read(void) __releases(torture_rwsem)
static void torture_rwsem_up_read(int tid __maybe_unused)
__releases(torture_rwsem)
{
up_read(&torture_rwsem);
}
@ -579,24 +621,28 @@ static void torture_percpu_rwsem_exit(void)
percpu_free_rwsem(&pcpu_rwsem);
}
static int torture_percpu_rwsem_down_write(void) __acquires(pcpu_rwsem)
static int torture_percpu_rwsem_down_write(int tid __maybe_unused)
__acquires(pcpu_rwsem)
{
percpu_down_write(&pcpu_rwsem);
return 0;
}
static void torture_percpu_rwsem_up_write(void) __releases(pcpu_rwsem)
static void torture_percpu_rwsem_up_write(int tid __maybe_unused)
__releases(pcpu_rwsem)
{
percpu_up_write(&pcpu_rwsem);
}
static int torture_percpu_rwsem_down_read(void) __acquires(pcpu_rwsem)
static int torture_percpu_rwsem_down_read(int tid __maybe_unused)
__acquires(pcpu_rwsem)
{
percpu_down_read(&pcpu_rwsem);
return 0;
}
static void torture_percpu_rwsem_up_read(void) __releases(pcpu_rwsem)
static void torture_percpu_rwsem_up_read(int tid __maybe_unused)
__releases(pcpu_rwsem)
{
percpu_up_read(&pcpu_rwsem);
}
@ -621,6 +667,7 @@ static struct lock_torture_ops percpu_rwsem_lock_ops = {
static int lock_torture_writer(void *arg)
{
struct lock_stress_stats *lwsp = arg;
int tid = lwsp - cxt.lwsa;
DEFINE_TORTURE_RANDOM(rand);
VERBOSE_TOROUT_STRING("lock_torture_writer task started");
@ -631,7 +678,7 @@ static int lock_torture_writer(void *arg)
schedule_timeout_uninterruptible(1);
cxt.cur_ops->task_boost(&rand);
cxt.cur_ops->writelock();
cxt.cur_ops->writelock(tid);
if (WARN_ON_ONCE(lock_is_write_held))
lwsp->n_lock_fail++;
lock_is_write_held = true;
@ -642,7 +689,7 @@ static int lock_torture_writer(void *arg)
cxt.cur_ops->write_delay(&rand);
lock_is_write_held = false;
WRITE_ONCE(last_lock_release, jiffies);
cxt.cur_ops->writeunlock();
cxt.cur_ops->writeunlock(tid);
stutter_wait("lock_torture_writer");
} while (!torture_must_stop());
@ -659,6 +706,7 @@ static int lock_torture_writer(void *arg)
static int lock_torture_reader(void *arg)
{
struct lock_stress_stats *lrsp = arg;
int tid = lrsp - cxt.lrsa;
DEFINE_TORTURE_RANDOM(rand);
VERBOSE_TOROUT_STRING("lock_torture_reader task started");
@ -668,7 +716,7 @@ static int lock_torture_reader(void *arg)
if ((torture_random(&rand) & 0xfffff) == 0)
schedule_timeout_uninterruptible(1);
cxt.cur_ops->readlock();
cxt.cur_ops->readlock(tid);
lock_is_read_held = true;
if (WARN_ON_ONCE(lock_is_write_held))
lrsp->n_lock_fail++; /* rare, but... */
@ -676,7 +724,7 @@ static int lock_torture_reader(void *arg)
lrsp->n_lock_acquired++;
cxt.cur_ops->read_delay(&rand);
lock_is_read_held = false;
cxt.cur_ops->readunlock();
cxt.cur_ops->readunlock(tid);
stutter_wait("lock_torture_reader");
} while (!torture_must_stop());
@ -891,16 +939,16 @@ static int __init lock_torture_init(void)
goto unwind;
}
if (cxt.cur_ops->init) {
cxt.cur_ops->init();
cxt.init_called = true;
}
if (nwriters_stress >= 0)
cxt.nrealwriters_stress = nwriters_stress;
else
cxt.nrealwriters_stress = 2 * num_online_cpus();
if (cxt.cur_ops->init) {
cxt.cur_ops->init();
cxt.init_called = true;
}
#ifdef CONFIG_DEBUG_MUTEXES
if (str_has_prefix(torture_type, "mutex"))
cxt.debug_lock = true;


@ -7,7 +7,7 @@
* The MCS lock (proposed by Mellor-Crummey and Scott) is a simple spin-lock
* with the desirable properties of being fair, and with each cpu trying
* to acquire the lock spinning on a local variable.
* It avoids expensive cache bouncings that common test-and-set spin-lock
* It avoids expensive cache bounces that common test-and-set spin-lock
* implementations incur.
*/
#ifndef __LINUX_MCS_SPINLOCK_H


@ -92,7 +92,7 @@ static inline unsigned long __owner_flags(unsigned long owner)
}
/*
* Trylock variant that retuns the owning task on failure.
* Trylock variant that returns the owning task on failure.
*/
static inline struct task_struct *__mutex_trylock_or_owner(struct mutex *lock)
{
@ -207,7 +207,7 @@ __mutex_add_waiter(struct mutex *lock, struct mutex_waiter *waiter,
/*
* Give up ownership to a specific task, when @task = NULL, this is equivalent
* to a regular unlock. Sets PICKUP on a handoff, clears HANDOF, preserves
* to a regular unlock. Sets PICKUP on a handoff, clears HANDOFF, preserves
* WAITERS. Provides RELEASE semantics like a regular unlock, the
* __mutex_trylock() provides a matching ACQUIRE semantics for the handoff.
*/


@ -135,7 +135,7 @@ bool osq_lock(struct optimistic_spin_queue *lock)
*/
/*
* Wait to acquire the lock or cancelation. Note that need_resched()
* Wait to acquire the lock or cancellation. Note that need_resched()
* will come with an IPI, which will wake smp_cond_load_relaxed() if it
* is implemented with a monitor-wait. vcpu_is_preempted() relies on
* polling, be careful.
@ -164,7 +164,7 @@ bool osq_lock(struct optimistic_spin_queue *lock)
/*
* We can only fail the cmpxchg() racing against an unlock(),
* in which case we should observe @node->locked becomming
* in which case we should observe @node->locked becoming
* true.
*/
if (smp_load_acquire(&node->locked))


@ -1,182 +0,0 @@
// SPDX-License-Identifier: GPL-2.0
/*
* RT-Mutexes: blocking mutual exclusion locks with PI support
*
* started by Ingo Molnar and Thomas Gleixner:
*
* Copyright (C) 2004-2006 Red Hat, Inc., Ingo Molnar <mingo@redhat.com>
* Copyright (C) 2006 Timesys Corp., Thomas Gleixner <tglx@timesys.com>
*
* This code is based on the rt.c implementation in the preempt-rt tree.
* Portions of said code are
*
* Copyright (C) 2004 LynuxWorks, Inc., Igor Manyilov, Bill Huey
* Copyright (C) 2006 Esben Nielsen
* Copyright (C) 2006 Kihon Technologies Inc.,
* Steven Rostedt <rostedt@goodmis.org>
*
* See rt.c in preempt-rt for proper credits and further information
*/
#include <linux/sched.h>
#include <linux/sched/rt.h>
#include <linux/sched/debug.h>
#include <linux/delay.h>
#include <linux/export.h>
#include <linux/spinlock.h>
#include <linux/kallsyms.h>
#include <linux/syscalls.h>
#include <linux/interrupt.h>
#include <linux/rbtree.h>
#include <linux/fs.h>
#include <linux/debug_locks.h>
#include "rtmutex_common.h"
static void printk_task(struct task_struct *p)
{
if (p)
printk("%16s:%5d [%p, %3d]", p->comm, task_pid_nr(p), p, p->prio);
else
printk("<none>");
}
static void printk_lock(struct rt_mutex *lock, int print_owner)
{
if (lock->name)
printk(" [%p] {%s}\n",
lock, lock->name);
else
printk(" [%p] {%s:%d}\n",
lock, lock->file, lock->line);
if (print_owner && rt_mutex_owner(lock)) {
printk(".. ->owner: %p\n", lock->owner);
printk(".. held by: ");
printk_task(rt_mutex_owner(lock));
printk("\n");
}
}
void rt_mutex_debug_task_free(struct task_struct *task)
{
DEBUG_LOCKS_WARN_ON(!RB_EMPTY_ROOT(&task->pi_waiters.rb_root));
DEBUG_LOCKS_WARN_ON(task->pi_blocked_on);
}
/*
* We fill out the fields in the waiter to store the information about
* the deadlock. We print when we return. act_waiter can be NULL in
* case of a remove waiter operation.
*/
void debug_rt_mutex_deadlock(enum rtmutex_chainwalk chwalk,
struct rt_mutex_waiter *act_waiter,
struct rt_mutex *lock)
{
struct task_struct *task;
if (!debug_locks || chwalk == RT_MUTEX_FULL_CHAINWALK || !act_waiter)
return;
task = rt_mutex_owner(act_waiter->lock);
if (task && task != current) {
act_waiter->deadlock_task_pid = get_pid(task_pid(task));
act_waiter->deadlock_lock = lock;
}
}
void debug_rt_mutex_print_deadlock(struct rt_mutex_waiter *waiter)
{
struct task_struct *task;
if (!waiter->deadlock_lock || !debug_locks)
return;
rcu_read_lock();
task = pid_task(waiter->deadlock_task_pid, PIDTYPE_PID);
if (!task) {
rcu_read_unlock();
return;
}
if (!debug_locks_off()) {
rcu_read_unlock();
return;
}
pr_warn("\n");
pr_warn("============================================\n");
pr_warn("WARNING: circular locking deadlock detected!\n");
pr_warn("%s\n", print_tainted());
pr_warn("--------------------------------------------\n");
printk("%s/%d is deadlocking current task %s/%d\n\n",
task->comm, task_pid_nr(task),
current->comm, task_pid_nr(current));
printk("\n1) %s/%d is trying to acquire this lock:\n",
current->comm, task_pid_nr(current));
printk_lock(waiter->lock, 1);
printk("\n2) %s/%d is blocked on this lock:\n",
task->comm, task_pid_nr(task));
printk_lock(waiter->deadlock_lock, 1);
debug_show_held_locks(current);
debug_show_held_locks(task);
printk("\n%s/%d's [blocked] stackdump:\n\n",
task->comm, task_pid_nr(task));
show_stack(task, NULL, KERN_DEFAULT);
printk("\n%s/%d's [current] stackdump:\n\n",
current->comm, task_pid_nr(current));
dump_stack();
debug_show_all_locks();
rcu_read_unlock();
printk("[ turning off deadlock detection."
"Please report this trace. ]\n\n");
}
void debug_rt_mutex_lock(struct rt_mutex *lock)
{
}
void debug_rt_mutex_unlock(struct rt_mutex *lock)
{
DEBUG_LOCKS_WARN_ON(rt_mutex_owner(lock) != current);
}
void
debug_rt_mutex_proxy_lock(struct rt_mutex *lock, struct task_struct *powner)
{
}
void debug_rt_mutex_proxy_unlock(struct rt_mutex *lock)
{
DEBUG_LOCKS_WARN_ON(!rt_mutex_owner(lock));
}
void debug_rt_mutex_init_waiter(struct rt_mutex_waiter *waiter)
{
memset(waiter, 0x11, sizeof(*waiter));
waiter->deadlock_task_pid = NULL;
}
void debug_rt_mutex_free_waiter(struct rt_mutex_waiter *waiter)
{
put_pid(waiter->deadlock_task_pid);
memset(waiter, 0x22, sizeof(*waiter));
}
void debug_rt_mutex_init(struct rt_mutex *lock, const char *name, struct lock_class_key *key)
{
/*
* Make sure we are not reinitializing a held lock:
*/
debug_check_no_locks_freed((void *)lock, sizeof(*lock));
lock->name = name;
#ifdef CONFIG_DEBUG_LOCK_ALLOC
lockdep_init_map(&lock->dep_map, name, key, 0);
#endif
}


@ -1,37 +0,0 @@
/* SPDX-License-Identifier: GPL-2.0 */
/*
* RT-Mutexes: blocking mutual exclusion locks with PI support
*
* started by Ingo Molnar and Thomas Gleixner:
*
* Copyright (C) 2004-2006 Red Hat, Inc., Ingo Molnar <mingo@redhat.com>
* Copyright (C) 2006, Timesys Corp., Thomas Gleixner <tglx@timesys.com>
*
* This file contains macros used solely by rtmutex.c. Debug version.
*/
extern void debug_rt_mutex_init_waiter(struct rt_mutex_waiter *waiter);
extern void debug_rt_mutex_free_waiter(struct rt_mutex_waiter *waiter);
extern void debug_rt_mutex_init(struct rt_mutex *lock, const char *name, struct lock_class_key *key);
extern void debug_rt_mutex_lock(struct rt_mutex *lock);
extern void debug_rt_mutex_unlock(struct rt_mutex *lock);
extern void debug_rt_mutex_proxy_lock(struct rt_mutex *lock,
struct task_struct *powner);
extern void debug_rt_mutex_proxy_unlock(struct rt_mutex *lock);
extern void debug_rt_mutex_deadlock(enum rtmutex_chainwalk chwalk,
struct rt_mutex_waiter *waiter,
struct rt_mutex *lock);
extern void debug_rt_mutex_print_deadlock(struct rt_mutex_waiter *waiter);
# define debug_rt_mutex_reset_waiter(w) \
do { (w)->deadlock_lock = NULL; } while (0)
static inline bool debug_rt_mutex_detect_deadlock(struct rt_mutex_waiter *waiter,
enum rtmutex_chainwalk walk)
{
return (waiter != NULL);
}
static inline void rt_mutex_print_deadlock(struct rt_mutex_waiter *w)
{
debug_rt_mutex_print_deadlock(w);
}


@ -49,7 +49,7 @@
* set this bit before looking at the lock.
*/
static void
static __always_inline void
rt_mutex_set_owner(struct rt_mutex *lock, struct task_struct *owner)
{
unsigned long val = (unsigned long)owner;
@ -60,13 +60,13 @@ rt_mutex_set_owner(struct rt_mutex *lock, struct task_struct *owner)
WRITE_ONCE(lock->owner, (struct task_struct *)val);
}
static inline void clear_rt_mutex_waiters(struct rt_mutex *lock)
static __always_inline void clear_rt_mutex_waiters(struct rt_mutex *lock)
{
lock->owner = (struct task_struct *)
((unsigned long)lock->owner & ~RT_MUTEX_HAS_WAITERS);
}
static void fixup_rt_mutex_waiters(struct rt_mutex *lock)
static __always_inline void fixup_rt_mutex_waiters(struct rt_mutex *lock)
{
unsigned long owner, *p = (unsigned long *) &lock->owner;
@ -149,7 +149,7 @@ static void fixup_rt_mutex_waiters(struct rt_mutex *lock)
* all future threads that attempt to [Rmw] the lock to the slowpath. As such
* relaxed semantics suffice.
*/
static inline void mark_rt_mutex_waiters(struct rt_mutex *lock)
static __always_inline void mark_rt_mutex_waiters(struct rt_mutex *lock)
{
unsigned long owner, *p = (unsigned long *) &lock->owner;
@ -165,7 +165,7 @@ static inline void mark_rt_mutex_waiters(struct rt_mutex *lock)
* 2) Drop lock->wait_lock
* 3) Try to unlock the lock with cmpxchg
*/
static inline bool unlock_rt_mutex_safe(struct rt_mutex *lock,
static __always_inline bool unlock_rt_mutex_safe(struct rt_mutex *lock,
unsigned long flags)
__releases(lock->wait_lock)
{
@ -204,7 +204,7 @@ static inline bool unlock_rt_mutex_safe(struct rt_mutex *lock,
# define rt_mutex_cmpxchg_acquire(l,c,n) (0)
# define rt_mutex_cmpxchg_release(l,c,n) (0)
static inline void mark_rt_mutex_waiters(struct rt_mutex *lock)
static __always_inline void mark_rt_mutex_waiters(struct rt_mutex *lock)
{
lock->owner = (struct task_struct *)
((unsigned long)lock->owner | RT_MUTEX_HAS_WAITERS);
@ -213,7 +213,7 @@ static inline void mark_rt_mutex_waiters(struct rt_mutex *lock)
/*
* Simple slow path only version: lock->owner is protected by lock->wait_lock.
*/
static inline bool unlock_rt_mutex_safe(struct rt_mutex *lock,
static __always_inline bool unlock_rt_mutex_safe(struct rt_mutex *lock,
unsigned long flags)
__releases(lock->wait_lock)
{
@ -229,8 +229,7 @@ static inline bool unlock_rt_mutex_safe(struct rt_mutex *lock,
#define task_to_waiter(p) \
&(struct rt_mutex_waiter){ .prio = (p)->prio, .deadline = (p)->dl.deadline }
static inline int
rt_mutex_waiter_less(struct rt_mutex_waiter *left,
static __always_inline int rt_mutex_waiter_less(struct rt_mutex_waiter *left,
struct rt_mutex_waiter *right)
{
if (left->prio < right->prio)
@ -248,8 +247,7 @@ rt_mutex_waiter_less(struct rt_mutex_waiter *left,
return 0;
}
static inline int
rt_mutex_waiter_equal(struct rt_mutex_waiter *left,
static __always_inline int rt_mutex_waiter_equal(struct rt_mutex_waiter *left,
struct rt_mutex_waiter *right)
{
if (left->prio != right->prio)
@ -270,18 +268,18 @@ rt_mutex_waiter_equal(struct rt_mutex_waiter *left,
#define __node_2_waiter(node) \
rb_entry((node), struct rt_mutex_waiter, tree_entry)
static inline bool __waiter_less(struct rb_node *a, const struct rb_node *b)
static __always_inline bool __waiter_less(struct rb_node *a, const struct rb_node *b)
{
return rt_mutex_waiter_less(__node_2_waiter(a), __node_2_waiter(b));
}
static void
static __always_inline void
rt_mutex_enqueue(struct rt_mutex *lock, struct rt_mutex_waiter *waiter)
{
rb_add_cached(&waiter->tree_entry, &lock->waiters, __waiter_less);
}
static void
static __always_inline void
rt_mutex_dequeue(struct rt_mutex *lock, struct rt_mutex_waiter *waiter)
{
if (RB_EMPTY_NODE(&waiter->tree_entry))
@ -294,18 +292,19 @@ rt_mutex_dequeue(struct rt_mutex *lock, struct rt_mutex_waiter *waiter)
#define __node_2_pi_waiter(node) \
rb_entry((node), struct rt_mutex_waiter, pi_tree_entry)
static inline bool __pi_waiter_less(struct rb_node *a, const struct rb_node *b)
static __always_inline bool
__pi_waiter_less(struct rb_node *a, const struct rb_node *b)
{
return rt_mutex_waiter_less(__node_2_pi_waiter(a), __node_2_pi_waiter(b));
}
static void
static __always_inline void
rt_mutex_enqueue_pi(struct task_struct *task, struct rt_mutex_waiter *waiter)
{
rb_add_cached(&waiter->pi_tree_entry, &task->pi_waiters, __pi_waiter_less);
}
static void
static __always_inline void
rt_mutex_dequeue_pi(struct task_struct *task, struct rt_mutex_waiter *waiter)
{
if (RB_EMPTY_NODE(&waiter->pi_tree_entry))
@ -315,7 +314,7 @@ rt_mutex_dequeue_pi(struct task_struct *task, struct rt_mutex_waiter *waiter)
RB_CLEAR_NODE(&waiter->pi_tree_entry);
}
static void rt_mutex_adjust_prio(struct task_struct *p)
static __always_inline void rt_mutex_adjust_prio(struct task_struct *p)
{
struct task_struct *pi_task = NULL;
@ -340,17 +339,13 @@ static void rt_mutex_adjust_prio(struct task_struct *p)
* deadlock detection is disabled independent of the detect argument
* and the config settings.
*/
static bool rt_mutex_cond_detect_deadlock(struct rt_mutex_waiter *waiter,
static __always_inline bool
rt_mutex_cond_detect_deadlock(struct rt_mutex_waiter *waiter,
enum rtmutex_chainwalk chwalk)
{
/*
* This is just a wrapper function for the following call,
* because debug_rt_mutex_detect_deadlock() smells like a magic
* debug feature and I wanted to keep the cond function in the
* main source file along with the comments instead of having
* two of the same in the headers.
*/
return debug_rt_mutex_detect_deadlock(waiter, chwalk);
if (IS_ENABLED(CONFIG_DEBUG_RT_MUTEX))
return waiter != NULL;
return chwalk == RT_MUTEX_FULL_CHAINWALK;
}
/*
@ -358,7 +353,7 @@ static bool rt_mutex_cond_detect_deadlock(struct rt_mutex_waiter *waiter,
*/
int max_lock_depth = 1024;
static inline struct rt_mutex *task_blocked_on_lock(struct task_struct *p)
static __always_inline struct rt_mutex *task_blocked_on_lock(struct task_struct *p)
{
return p->pi_blocked_on ? p->pi_blocked_on->lock : NULL;
}
@ -426,7 +421,7 @@ static inline struct rt_mutex *task_blocked_on_lock(struct task_struct *p)
* unlock(lock->wait_lock); release [L]
* goto again;
*/
static int rt_mutex_adjust_prio_chain(struct task_struct *task,
static int __sched rt_mutex_adjust_prio_chain(struct task_struct *task,
enum rtmutex_chainwalk chwalk,
struct rt_mutex *orig_lock,
struct rt_mutex *next_lock,
@ -579,7 +574,6 @@ static int rt_mutex_adjust_prio_chain(struct task_struct *task,
* walk, we detected a deadlock.
*/
if (lock == orig_lock || rt_mutex_owner(lock) == top_task) {
debug_rt_mutex_deadlock(chwalk, orig_waiter, lock);
raw_spin_unlock(&lock->wait_lock);
ret = -EDEADLK;
goto out_unlock_pi;
@ -706,7 +700,7 @@ static int rt_mutex_adjust_prio_chain(struct task_struct *task,
} else if (prerequeue_top_waiter == waiter) {
/*
* The waiter was the top waiter on the lock, but is
* no longer the top prority waiter. Replace waiter in
* no longer the top priority waiter. Replace waiter in
* the owner tasks pi waiters tree with the new top
* (highest priority) waiter and adjust the priority
* of the owner.
@ -784,7 +778,8 @@ static int rt_mutex_adjust_prio_chain(struct task_struct *task,
* @waiter: The waiter that is queued to the lock's wait tree if the
* callsite called task_blocked_on_lock(), otherwise NULL
*/
static int try_to_take_rt_mutex(struct rt_mutex *lock, struct task_struct *task,
static int __sched
try_to_take_rt_mutex(struct rt_mutex *lock, struct task_struct *task,
struct rt_mutex_waiter *waiter)
{
lockdep_assert_held(&lock->wait_lock);
@ -886,9 +881,6 @@ static int try_to_take_rt_mutex(struct rt_mutex *lock, struct task_struct *task,
raw_spin_unlock(&task->pi_lock);
takeit:
/* We got the lock. */
debug_rt_mutex_lock(lock);
/*
* This either preserves the RT_MUTEX_HAS_WAITERS bit if there
* are still waiters or clears it.
@ -905,7 +897,7 @@ takeit:
*
* This must be called with lock->wait_lock held and interrupts disabled
*/
static int task_blocks_on_rt_mutex(struct rt_mutex *lock,
static int __sched task_blocks_on_rt_mutex(struct rt_mutex *lock,
struct rt_mutex_waiter *waiter,
struct task_struct *task,
enum rtmutex_chainwalk chwalk)
@ -994,7 +986,7 @@ static int task_blocks_on_rt_mutex(struct rt_mutex *lock,
*
* Called with lock->wait_lock held and interrupts disabled.
*/
static void mark_wakeup_next_waiter(struct wake_q_head *wake_q,
static void __sched mark_wakeup_next_waiter(struct wake_q_head *wake_q,
struct rt_mutex *lock)
{
struct rt_mutex_waiter *waiter;
@ -1044,7 +1036,7 @@ static void mark_wakeup_next_waiter(struct wake_q_head *wake_q,
* Must be called with lock->wait_lock held and interrupts disabled. I must
* have just failed to try_to_take_rt_mutex().
*/
static void remove_waiter(struct rt_mutex *lock,
static void __sched remove_waiter(struct rt_mutex *lock,
struct rt_mutex_waiter *waiter)
{
bool is_top_waiter = (waiter == rt_mutex_top_waiter(lock));
@ -1102,7 +1094,7 @@ static void remove_waiter(struct rt_mutex *lock,
*
* Called from sched_setscheduler
*/
void rt_mutex_adjust_pi(struct task_struct *task)
void __sched rt_mutex_adjust_pi(struct task_struct *task)
{
struct rt_mutex_waiter *waiter;
struct rt_mutex *next_lock;
@ -1125,7 +1117,7 @@ void rt_mutex_adjust_pi(struct task_struct *task)
next_lock, NULL, task);
}
void rt_mutex_init_waiter(struct rt_mutex_waiter *waiter)
void __sched rt_mutex_init_waiter(struct rt_mutex_waiter *waiter)
{
debug_rt_mutex_init_waiter(waiter);
RB_CLEAR_NODE(&waiter->pi_tree_entry);
@ -1143,8 +1135,7 @@ void rt_mutex_init_waiter(struct rt_mutex_waiter *waiter)
*
* Must be called with lock->wait_lock held and interrupts disabled
*/
static int __sched
__rt_mutex_slowlock(struct rt_mutex *lock, int state,
static int __sched __rt_mutex_slowlock(struct rt_mutex *lock, int state,
struct hrtimer_sleeper *timeout,
struct rt_mutex_waiter *waiter)
{
@ -1155,24 +1146,17 @@ __rt_mutex_slowlock(struct rt_mutex *lock, int state,
if (try_to_take_rt_mutex(lock, current, waiter))
break;
/*
* TASK_INTERRUPTIBLE checks for signals and
* timeout. Ignored otherwise.
*/
if (likely(state == TASK_INTERRUPTIBLE)) {
/* Signal pending? */
if (signal_pending(current))
ret = -EINTR;
if (timeout && !timeout->task)
if (timeout && !timeout->task) {
ret = -ETIMEDOUT;
if (ret)
break;
}
if (signal_pending_state(state, current)) {
ret = -EINTR;
break;
}
raw_spin_unlock_irq(&lock->wait_lock);
debug_rt_mutex_print_deadlock(waiter);
schedule();
raw_spin_lock_irq(&lock->wait_lock);
@ -1183,7 +1167,7 @@ __rt_mutex_slowlock(struct rt_mutex *lock, int state,
return ret;
}
static void rt_mutex_handle_deadlock(int res, int detect_deadlock,
static void __sched rt_mutex_handle_deadlock(int res, int detect_deadlock,
struct rt_mutex_waiter *w)
{
/*
@ -1194,9 +1178,9 @@ static void rt_mutex_handle_deadlock(int res, int detect_deadlock,
return;
/*
* Yell lowdly and stop the task right here.
* Yell loudly and stop the task right here.
*/
rt_mutex_print_deadlock(w);
WARN(1, "rtmutex deadlock detected\n");
while (1) {
set_current_state(TASK_INTERRUPTIBLE);
schedule();
@ -1206,8 +1190,7 @@ static void rt_mutex_handle_deadlock(int res, int detect_deadlock,
/*
* Slow path lock function:
*/
static int __sched
rt_mutex_slowlock(struct rt_mutex *lock, int state,
static int __sched rt_mutex_slowlock(struct rt_mutex *lock, int state,
struct hrtimer_sleeper *timeout,
enum rtmutex_chainwalk chwalk)
{
@ -1268,7 +1251,7 @@ rt_mutex_slowlock(struct rt_mutex *lock, int state,
return ret;
}
static inline int __rt_mutex_slowtrylock(struct rt_mutex *lock)
static int __sched __rt_mutex_slowtrylock(struct rt_mutex *lock)
{
int ret = try_to_take_rt_mutex(lock, current, NULL);
@ -1284,7 +1267,7 @@ static inline int __rt_mutex_slowtrylock(struct rt_mutex *lock)
/*
* Slow path try-lock function:
*/
static inline int rt_mutex_slowtrylock(struct rt_mutex *lock)
static int __sched rt_mutex_slowtrylock(struct rt_mutex *lock)
{
unsigned long flags;
int ret;
@ -1310,14 +1293,25 @@ static inline int rt_mutex_slowtrylock(struct rt_mutex *lock)
return ret;
}
/*
* Performs the wakeup of the top-waiter and re-enables preemption.
*/
void __sched rt_mutex_postunlock(struct wake_q_head *wake_q)
{
wake_up_q(wake_q);
/* Pairs with preempt_disable() in mark_wakeup_next_waiter() */
preempt_enable();
}
/*
* Slow path to release a rt-mutex.
*
* Return whether the current task needs to call rt_mutex_postunlock().
*/
static bool __sched rt_mutex_slowunlock(struct rt_mutex *lock,
struct wake_q_head *wake_q)
static void __sched rt_mutex_slowunlock(struct rt_mutex *lock)
{
DEFINE_WAKE_Q(wake_q);
unsigned long flags;
/* irqsave required to support early boot calls */
@ -1359,7 +1353,7 @@ static bool __sched rt_mutex_slowunlock(struct rt_mutex *lock,
while (!rt_mutex_has_waiters(lock)) {
/* Drops lock->wait_lock ! */
if (unlock_rt_mutex_safe(lock, flags) == true)
return false;
return;
/* Relock the rtmutex and try again */
raw_spin_lock_irqsave(&lock->wait_lock, flags);
}
@ -1370,10 +1364,10 @@ static bool __sched rt_mutex_slowunlock(struct rt_mutex *lock,
*
* Queue the next waiter for wakeup once we release the wait_lock.
*/
mark_wakeup_next_waiter(wake_q, lock);
mark_wakeup_next_waiter(&wake_q, lock);
raw_spin_unlock_irqrestore(&lock->wait_lock, flags);
return true; /* call rt_mutex_postunlock() */
rt_mutex_postunlock(&wake_q);
}
/*
@ -1382,74 +1376,21 @@ static bool __sched rt_mutex_slowunlock(struct rt_mutex *lock,
* The atomic acquire/release ops are compiled away, when either the
* architecture does not support cmpxchg or when debugging is enabled.
*/
static inline int
rt_mutex_fastlock(struct rt_mutex *lock, int state,
int (*slowfn)(struct rt_mutex *lock, int state,
struct hrtimer_sleeper *timeout,
enum rtmutex_chainwalk chwalk))
static __always_inline int __rt_mutex_lock(struct rt_mutex *lock, long state,
unsigned int subclass)
{
if (likely(rt_mutex_cmpxchg_acquire(lock, NULL, current)))
return 0;
int ret;
return slowfn(lock, state, NULL, RT_MUTEX_MIN_CHAINWALK);
}
static inline int
rt_mutex_timed_fastlock(struct rt_mutex *lock, int state,
struct hrtimer_sleeper *timeout,
enum rtmutex_chainwalk chwalk,
int (*slowfn)(struct rt_mutex *lock, int state,
struct hrtimer_sleeper *timeout,
enum rtmutex_chainwalk chwalk))
{
if (chwalk == RT_MUTEX_MIN_CHAINWALK &&
likely(rt_mutex_cmpxchg_acquire(lock, NULL, current)))
return 0;
return slowfn(lock, state, timeout, chwalk);
}
static inline int
rt_mutex_fasttrylock(struct rt_mutex *lock,
int (*slowfn)(struct rt_mutex *lock))
{
if (likely(rt_mutex_cmpxchg_acquire(lock, NULL, current)))
return 1;
return slowfn(lock);
}
/*
* Performs the wakeup of the top-waiter and re-enables preemption.
*/
void rt_mutex_postunlock(struct wake_q_head *wake_q)
{
wake_up_q(wake_q);
/* Pairs with preempt_disable() in rt_mutex_slowunlock() */
preempt_enable();
}
static inline void
rt_mutex_fastunlock(struct rt_mutex *lock,
bool (*slowfn)(struct rt_mutex *lock,
struct wake_q_head *wqh))
{
DEFINE_WAKE_Q(wake_q);
if (likely(rt_mutex_cmpxchg_release(lock, current, NULL)))
return;
if (slowfn(lock, &wake_q))
rt_mutex_postunlock(&wake_q);
}
static inline void __rt_mutex_lock(struct rt_mutex *lock, unsigned int subclass)
{
might_sleep();
mutex_acquire(&lock->dep_map, subclass, 0, _RET_IP_);
rt_mutex_fastlock(lock, TASK_UNINTERRUPTIBLE, rt_mutex_slowlock);
if (likely(rt_mutex_cmpxchg_acquire(lock, NULL, current)))
return 0;
ret = rt_mutex_slowlock(lock, state, NULL, RT_MUTEX_MIN_CHAINWALK);
if (ret)
mutex_release(&lock->dep_map, _RET_IP_);
return ret;
}
#ifdef CONFIG_DEBUG_LOCK_ALLOC
@ -1461,7 +1402,7 @@ static inline void __rt_mutex_lock(struct rt_mutex *lock, unsigned int subclass)
*/
void __sched rt_mutex_lock_nested(struct rt_mutex *lock, unsigned int subclass)
{
__rt_mutex_lock(lock, subclass);
__rt_mutex_lock(lock, TASK_UNINTERRUPTIBLE, subclass);
}
EXPORT_SYMBOL_GPL(rt_mutex_lock_nested);
@ -1474,7 +1415,7 @@ EXPORT_SYMBOL_GPL(rt_mutex_lock_nested);
*/
void __sched rt_mutex_lock(struct rt_mutex *lock)
{
__rt_mutex_lock(lock, 0);
__rt_mutex_lock(lock, TASK_UNINTERRUPTIBLE, 0);
}
EXPORT_SYMBOL_GPL(rt_mutex_lock);
#endif
@ -1490,82 +1431,37 @@ EXPORT_SYMBOL_GPL(rt_mutex_lock);
*/
int __sched rt_mutex_lock_interruptible(struct rt_mutex *lock)
{
int ret;
might_sleep();
mutex_acquire(&lock->dep_map, 0, 0, _RET_IP_);
ret = rt_mutex_fastlock(lock, TASK_INTERRUPTIBLE, rt_mutex_slowlock);
if (ret)
mutex_release(&lock->dep_map, _RET_IP_);
return ret;
return __rt_mutex_lock(lock, TASK_INTERRUPTIBLE, 0);
}
EXPORT_SYMBOL_GPL(rt_mutex_lock_interruptible);
/*
* Futex variant, must not use fastpath.
*/
int __sched rt_mutex_futex_trylock(struct rt_mutex *lock)
{
return rt_mutex_slowtrylock(lock);
}
int __sched __rt_mutex_futex_trylock(struct rt_mutex *lock)
{
return __rt_mutex_slowtrylock(lock);
}
/**
* rt_mutex_timed_lock - lock a rt_mutex interruptible
* the timeout structure is provided
* by the caller
*
* @lock: the rt_mutex to be locked
* @timeout: timeout structure or NULL (no timeout)
*
* Returns:
* 0 on success
* -EINTR when interrupted by a signal
* -ETIMEDOUT when the timeout expired
*/
int
rt_mutex_timed_lock(struct rt_mutex *lock, struct hrtimer_sleeper *timeout)
{
int ret;
might_sleep();
mutex_acquire(&lock->dep_map, 0, 0, _RET_IP_);
ret = rt_mutex_timed_fastlock(lock, TASK_INTERRUPTIBLE, timeout,
RT_MUTEX_MIN_CHAINWALK,
rt_mutex_slowlock);
if (ret)
mutex_release(&lock->dep_map, _RET_IP_);
return ret;
}
EXPORT_SYMBOL_GPL(rt_mutex_timed_lock);
/**
* rt_mutex_trylock - try to lock a rt_mutex
*
* @lock: the rt_mutex to be locked
*
* This function can only be called in thread context. It's safe to
* call it from atomic regions, but not from hard interrupt or soft
* interrupt context.
* This function can only be called in thread context. It's safe to call it
* from atomic regions, but not from hard or soft interrupt context.
*
* Returns 1 on success and 0 on contention
* Returns:
* 1 on success
* 0 on contention
*/
int __sched rt_mutex_trylock(struct rt_mutex *lock)
{
int ret;
if (WARN_ON_ONCE(in_irq() || in_nmi() || in_serving_softirq()))
if (IS_ENABLED(CONFIG_DEBUG_RT_MUTEXES) && WARN_ON_ONCE(!in_task()))
return 0;
ret = rt_mutex_fasttrylock(lock, rt_mutex_slowtrylock);
/*
* No lockdep annotation required because lockdep disables the fast
* path.
*/
if (likely(rt_mutex_cmpxchg_acquire(lock, NULL, current)))
return 1;
ret = rt_mutex_slowtrylock(lock);
if (ret)
mutex_acquire(&lock->dep_map, 0, 1, _RET_IP_);
@ -1581,10 +1477,26 @@ EXPORT_SYMBOL_GPL(rt_mutex_trylock);
void __sched rt_mutex_unlock(struct rt_mutex *lock)
{
mutex_release(&lock->dep_map, _RET_IP_);
rt_mutex_fastunlock(lock, rt_mutex_slowunlock);
if (likely(rt_mutex_cmpxchg_release(lock, current, NULL)))
return;
rt_mutex_slowunlock(lock);
}
EXPORT_SYMBOL_GPL(rt_mutex_unlock);
/*
* Futex variants, must not use fastpath.
*/
int __sched rt_mutex_futex_trylock(struct rt_mutex *lock)
{
return rt_mutex_slowtrylock(lock);
}
int __sched __rt_mutex_futex_trylock(struct rt_mutex *lock)
{
return __rt_mutex_slowtrylock(lock);
}
/**
* __rt_mutex_futex_unlock - Futex variant, that since futex variants
* do not use the fast-path, can be simple and will not need to retry.
@ -1629,23 +1541,6 @@ void __sched rt_mutex_futex_unlock(struct rt_mutex *lock)
rt_mutex_postunlock(&wake_q);
}
/**
* rt_mutex_destroy - mark a mutex unusable
* @lock: the mutex to be destroyed
*
* This function marks the mutex uninitialized, and any subsequent
* use of the mutex is forbidden. The mutex must not be locked when
* this function is called.
*/
void rt_mutex_destroy(struct rt_mutex *lock)
{
WARN_ON(rt_mutex_is_locked(lock));
#ifdef CONFIG_DEBUG_RT_MUTEXES
lock->magic = NULL;
#endif
}
EXPORT_SYMBOL_GPL(rt_mutex_destroy);
/**
* __rt_mutex_init - initialize the rt_mutex
*
@ -1657,15 +1552,13 @@ EXPORT_SYMBOL_GPL(rt_mutex_destroy);
*
* Initializing of a locked rt_mutex is not allowed
*/
void __rt_mutex_init(struct rt_mutex *lock, const char *name,
void __sched __rt_mutex_init(struct rt_mutex *lock, const char *name,
struct lock_class_key *key)
{
lock->owner = NULL;
raw_spin_lock_init(&lock->wait_lock);
lock->waiters = RB_ROOT_CACHED;
debug_check_no_locks_freed((void *)lock, sizeof(*lock));
lockdep_init_map(&lock->dep_map, name, key, 0);
if (name && key)
debug_rt_mutex_init(lock, name, key);
__rt_mutex_basic_init(lock);
}
EXPORT_SYMBOL_GPL(__rt_mutex_init);
@ -1683,11 +1576,10 @@ EXPORT_SYMBOL_GPL(__rt_mutex_init);
* possible at this point because the pi_state which contains the rtmutex
* is not yet visible to other tasks.
*/
void rt_mutex_init_proxy_locked(struct rt_mutex *lock,
void __sched rt_mutex_init_proxy_locked(struct rt_mutex *lock,
struct task_struct *proxy_owner)
{
__rt_mutex_init(lock, NULL, NULL);
debug_rt_mutex_proxy_lock(lock, proxy_owner);
__rt_mutex_basic_init(lock);
rt_mutex_set_owner(lock, proxy_owner);
}
@ -1703,7 +1595,7 @@ void rt_mutex_init_proxy_locked(struct rt_mutex *lock,
* possible because it belongs to the pi_state which is about to be freed
 * and it is no longer visible to other tasks.
*/
void rt_mutex_proxy_unlock(struct rt_mutex *lock)
void __sched rt_mutex_proxy_unlock(struct rt_mutex *lock)
{
debug_rt_mutex_proxy_unlock(lock);
rt_mutex_set_owner(lock, NULL);
@ -1728,7 +1620,7 @@ void rt_mutex_proxy_unlock(struct rt_mutex *lock)
*
* Special API call for PI-futex support.
*/
int __rt_mutex_start_proxy_lock(struct rt_mutex *lock,
int __sched __rt_mutex_start_proxy_lock(struct rt_mutex *lock,
struct rt_mutex_waiter *waiter,
struct task_struct *task)
{
@ -1753,8 +1645,6 @@ int __rt_mutex_start_proxy_lock(struct rt_mutex *lock,
ret = 0;
}
debug_rt_mutex_print_deadlock(waiter);
return ret;
}
@ -1777,7 +1667,7 @@ int __rt_mutex_start_proxy_lock(struct rt_mutex *lock,
*
* Special API call for PI-futex support.
*/
int rt_mutex_start_proxy_lock(struct rt_mutex *lock,
int __sched rt_mutex_start_proxy_lock(struct rt_mutex *lock,
struct rt_mutex_waiter *waiter,
struct task_struct *task)
{
@ -1792,26 +1682,6 @@ int rt_mutex_start_proxy_lock(struct rt_mutex *lock,
return ret;
}
/**
* rt_mutex_next_owner - return the next owner of the lock
*
* @lock: the rt lock query
*
* Returns the next owner of the lock or NULL
*
* Caller has to serialize against other accessors to the lock
* itself.
*
* Special API call for PI-futex support
*/
struct task_struct *rt_mutex_next_owner(struct rt_mutex *lock)
{
if (!rt_mutex_has_waiters(lock))
return NULL;
return rt_mutex_top_waiter(lock)->task;
}
/**
* rt_mutex_wait_proxy_lock() - Wait for lock acquisition
* @lock: the rt_mutex we were woken on
@ -1829,7 +1699,7 @@ struct task_struct *rt_mutex_next_owner(struct rt_mutex *lock)
*
* Special API call for PI-futex support
*/
int rt_mutex_wait_proxy_lock(struct rt_mutex *lock,
int __sched rt_mutex_wait_proxy_lock(struct rt_mutex *lock,
struct hrtimer_sleeper *to,
struct rt_mutex_waiter *waiter)
{
@ -1869,7 +1739,7 @@ int rt_mutex_wait_proxy_lock(struct rt_mutex *lock,
*
* Special API call for PI-futex support
*/
bool rt_mutex_cleanup_proxy_lock(struct rt_mutex *lock,
bool __sched rt_mutex_cleanup_proxy_lock(struct rt_mutex *lock,
struct rt_mutex_waiter *waiter)
{
bool cleanup = false;
@ -1905,3 +1775,11 @@ bool rt_mutex_cleanup_proxy_lock(struct rt_mutex *lock,
return cleanup;
}
#ifdef CONFIG_DEBUG_RT_MUTEXES
void rt_mutex_debug_task_free(struct task_struct *task)
{
DEBUG_LOCKS_WARN_ON(!RB_EMPTY_ROOT(&task->pi_waiters.rb_root));
DEBUG_LOCKS_WARN_ON(task->pi_blocked_on);
}
#endif

View File

@ -1,35 +0,0 @@
/* SPDX-License-Identifier: GPL-2.0 */
/*
* RT-Mutexes: blocking mutual exclusion locks with PI support
*
* started by Ingo Molnar and Thomas Gleixner:
*
* Copyright (C) 2004-2006 Red Hat, Inc., Ingo Molnar <mingo@redhat.com>
* Copyright (C) 2006, Timesys Corp., Thomas Gleixner <tglx@timesys.com>
*
* This file contains macros used solely by rtmutex.c.
* Non-debug version.
*/
#define rt_mutex_deadlock_check(l) (0)
#define debug_rt_mutex_init_waiter(w) do { } while (0)
#define debug_rt_mutex_free_waiter(w) do { } while (0)
#define debug_rt_mutex_lock(l) do { } while (0)
#define debug_rt_mutex_proxy_lock(l,p) do { } while (0)
#define debug_rt_mutex_proxy_unlock(l) do { } while (0)
#define debug_rt_mutex_unlock(l) do { } while (0)
#define debug_rt_mutex_init(m, n, k) do { } while (0)
#define debug_rt_mutex_deadlock(d, a ,l) do { } while (0)
#define debug_rt_mutex_print_deadlock(w) do { } while (0)
#define debug_rt_mutex_reset_waiter(w) do { } while (0)
static inline void rt_mutex_print_deadlock(struct rt_mutex_waiter *w)
{
WARN(1, "rtmutex deadlock detected\n");
}
static inline bool debug_rt_mutex_detect_deadlock(struct rt_mutex_waiter *w,
enum rtmutex_chainwalk walk)
{
return walk == RT_MUTEX_FULL_CHAINWALK;
}

View File

@ -13,6 +13,7 @@
#ifndef __KERNEL_RTMUTEX_COMMON_H
#define __KERNEL_RTMUTEX_COMMON_H
#include <linux/debug_locks.h>
#include <linux/rtmutex.h>
#include <linux/sched/wake_q.h>
@ -23,34 +24,30 @@
* @tree_entry: pi node to enqueue into the mutex waiters tree
* @pi_tree_entry: pi node to enqueue into the mutex owner waiters tree
* @task: task reference to the blocked task
* @lock: Pointer to the rt_mutex on which the waiter blocks
* @prio: Priority of the waiter
* @deadline: Deadline of the waiter if applicable
*/
struct rt_mutex_waiter {
struct rb_node tree_entry;
struct rb_node pi_tree_entry;
struct task_struct *task;
struct rt_mutex *lock;
#ifdef CONFIG_DEBUG_RT_MUTEXES
unsigned long ip;
struct pid *deadlock_task_pid;
struct rt_mutex *deadlock_lock;
#endif
int prio;
u64 deadline;
};
/*
* Various helpers to access the waiters-tree:
* Must be guarded because this header is included from rcu/tree_plugin.h
* unconditionally.
*/
#ifdef CONFIG_RT_MUTEXES
static inline int rt_mutex_has_waiters(struct rt_mutex *lock)
{
return !RB_EMPTY_ROOT(&lock->waiters.rb_root);
}
static inline struct rt_mutex_waiter *
rt_mutex_top_waiter(struct rt_mutex *lock)
static inline struct rt_mutex_waiter *rt_mutex_top_waiter(struct rt_mutex *lock)
{
struct rb_node *leftmost = rb_first_cached(&lock->waiters);
struct rt_mutex_waiter *w = NULL;
@ -67,42 +64,12 @@ static inline int task_has_pi_waiters(struct task_struct *p)
return !RB_EMPTY_ROOT(&p->pi_waiters.rb_root);
}
static inline struct rt_mutex_waiter *
task_top_pi_waiter(struct task_struct *p)
static inline struct rt_mutex_waiter *task_top_pi_waiter(struct task_struct *p)
{
return rb_entry(p->pi_waiters.rb_leftmost,
struct rt_mutex_waiter, pi_tree_entry);
return rb_entry(p->pi_waiters.rb_leftmost, struct rt_mutex_waiter,
pi_tree_entry);
}
#else
static inline int rt_mutex_has_waiters(struct rt_mutex *lock)
{
return false;
}
static inline struct rt_mutex_waiter *
rt_mutex_top_waiter(struct rt_mutex *lock)
{
return NULL;
}
static inline int task_has_pi_waiters(struct task_struct *p)
{
return false;
}
static inline struct rt_mutex_waiter *
task_top_pi_waiter(struct task_struct *p)
{
return NULL;
}
#endif
/*
* lock->owner state tracking:
*/
#define RT_MUTEX_HAS_WAITERS 1UL
static inline struct task_struct *rt_mutex_owner(struct rt_mutex *lock)
@ -111,6 +78,13 @@ static inline struct task_struct *rt_mutex_owner(struct rt_mutex *lock)
return (struct task_struct *) (owner & ~RT_MUTEX_HAS_WAITERS);
}
#else /* CONFIG_RT_MUTEXES */
/* Used in rcu/tree_plugin.h */
static inline struct task_struct *rt_mutex_owner(struct rt_mutex *lock)
{
return NULL;
}
#endif /* !CONFIG_RT_MUTEXES */
/*
* Constants for rt mutex functions which have a selectable deadlock
@ -127,10 +101,16 @@ enum rtmutex_chainwalk {
RT_MUTEX_FULL_CHAINWALK,
};
static inline void __rt_mutex_basic_init(struct rt_mutex *lock)
{
lock->owner = NULL;
raw_spin_lock_init(&lock->wait_lock);
lock->waiters = RB_ROOT_CACHED;
}
/*
* PI-futex support (proxy locking functions, etc.):
*/
extern struct task_struct *rt_mutex_next_owner(struct rt_mutex *lock);
extern void rt_mutex_init_proxy_locked(struct rt_mutex *lock,
struct task_struct *proxy_owner);
extern void rt_mutex_proxy_unlock(struct rt_mutex *lock);
@ -156,10 +136,29 @@ extern bool __rt_mutex_futex_unlock(struct rt_mutex *lock,
extern void rt_mutex_postunlock(struct wake_q_head *wake_q);
#ifdef CONFIG_DEBUG_RT_MUTEXES
# include "rtmutex-debug.h"
#else
# include "rtmutex.h"
#endif
/* Debug functions */
static inline void debug_rt_mutex_unlock(struct rt_mutex *lock)
{
if (IS_ENABLED(CONFIG_DEBUG_RT_MUTEXES))
DEBUG_LOCKS_WARN_ON(rt_mutex_owner(lock) != current);
}
static inline void debug_rt_mutex_proxy_unlock(struct rt_mutex *lock)
{
if (IS_ENABLED(CONFIG_DEBUG_RT_MUTEXES))
DEBUG_LOCKS_WARN_ON(!rt_mutex_owner(lock));
}
static inline void debug_rt_mutex_init_waiter(struct rt_mutex_waiter *waiter)
{
if (IS_ENABLED(CONFIG_DEBUG_RT_MUTEXES))
memset(waiter, 0x11, sizeof(*waiter));
}
static inline void debug_rt_mutex_free_waiter(struct rt_mutex_waiter *waiter)
{
if (IS_ENABLED(CONFIG_DEBUG_RT_MUTEXES))
memset(waiter, 0x22, sizeof(*waiter));
}
#endif

View File

@ -632,7 +632,7 @@ static inline bool rwsem_can_spin_on_owner(struct rw_semaphore *sem)
}
/*
* The rwsem_spin_on_owner() function returns the folowing 4 values
* The rwsem_spin_on_owner() function returns the following 4 values
* depending on the lock owner state.
* OWNER_NULL : owner is currently NULL
* OWNER_WRITER: when owner changes and is a writer
@ -819,7 +819,7 @@ static bool rwsem_optimistic_spin(struct rw_semaphore *sem)
* we try to get it. The new owner may be a spinnable
* writer.
*
* To take advantage of two scenarios listed agove, the RT
* To take advantage of two scenarios listed above, the RT
* task is made to retry one more time to see if it can
* acquire the lock or continue spinning on the new owning
* writer. Of course, if the time lag is long enough or the

View File

@ -58,10 +58,10 @@ EXPORT_PER_CPU_SYMBOL(__mmiowb_state);
/*
* We build the __lock_function inlines here. They are too large for
* inlining all over the place, but here is only one user per function
* which embedds them into the calling _lock_function below.
* which embeds them into the calling _lock_function below.
*
* This could be a long-held lock. We both prepare to spin for a long
* time (making _this_ CPU preemptable if possible), and we also signal
* time (making _this_ CPU preemptible if possible), and we also signal
* towards that other CPU that it should break the lock ASAP.
*/
#define BUILD_LOCK_OPS(op, locktype) \

View File

@ -5396,25 +5396,25 @@ static void sched_dynamic_update(int mode)
switch (mode) {
case preempt_dynamic_none:
static_call_update(cond_resched, __cond_resched);
static_call_update(might_resched, (typeof(&__cond_resched)) __static_call_return0);
static_call_update(preempt_schedule, (typeof(&preempt_schedule)) NULL);
static_call_update(preempt_schedule_notrace, (typeof(&preempt_schedule_notrace)) NULL);
static_call_update(irqentry_exit_cond_resched, (typeof(&irqentry_exit_cond_resched)) NULL);
static_call_update(might_resched, (void *)&__static_call_return0);
static_call_update(preempt_schedule, NULL);
static_call_update(preempt_schedule_notrace, NULL);
static_call_update(irqentry_exit_cond_resched, NULL);
pr_info("Dynamic Preempt: none\n");
break;
case preempt_dynamic_voluntary:
static_call_update(cond_resched, __cond_resched);
static_call_update(might_resched, __cond_resched);
static_call_update(preempt_schedule, (typeof(&preempt_schedule)) NULL);
static_call_update(preempt_schedule_notrace, (typeof(&preempt_schedule_notrace)) NULL);
static_call_update(irqentry_exit_cond_resched, (typeof(&irqentry_exit_cond_resched)) NULL);
static_call_update(preempt_schedule, NULL);
static_call_update(preempt_schedule_notrace, NULL);
static_call_update(irqentry_exit_cond_resched, NULL);
pr_info("Dynamic Preempt: voluntary\n");
break;
case preempt_dynamic_full:
static_call_update(cond_resched, (typeof(&__cond_resched)) __static_call_return0);
static_call_update(might_resched, (typeof(&__cond_resched)) __static_call_return0);
static_call_update(cond_resched, (void *)&__static_call_return0);
static_call_update(might_resched, (void *)&__static_call_return0);
static_call_update(preempt_schedule, __preempt_schedule_func);
static_call_update(preempt_schedule_notrace, __preempt_schedule_notrace_func);
static_call_update(irqentry_exit_cond_resched, irqentry_exit_cond_resched);

View File

@ -24,14 +24,70 @@
#include <linux/sched/clock.h>
#include <linux/nmi.h>
#include <linux/sched/debug.h>
#include <linux/jump_label.h>
#include "smpboot.h"
#include "sched/smp.h"
#define CSD_TYPE(_csd) ((_csd)->node.u_flags & CSD_FLAG_TYPE_MASK)
#ifdef CONFIG_CSD_LOCK_WAIT_DEBUG
union cfd_seq_cnt {
u64 val;
struct {
u64 src:16;
u64 dst:16;
#define CFD_SEQ_NOCPU 0xffff
u64 type:4;
#define CFD_SEQ_QUEUE 0
#define CFD_SEQ_IPI 1
#define CFD_SEQ_NOIPI 2
#define CFD_SEQ_PING 3
#define CFD_SEQ_PINGED 4
#define CFD_SEQ_HANDLE 5
#define CFD_SEQ_DEQUEUE 6
#define CFD_SEQ_IDLE 7
#define CFD_SEQ_GOTIPI 8
#define CFD_SEQ_HDLEND 9
u64 cnt:28;
} u;
};
static char *seq_type[] = {
[CFD_SEQ_QUEUE] = "queue",
[CFD_SEQ_IPI] = "ipi",
[CFD_SEQ_NOIPI] = "noipi",
[CFD_SEQ_PING] = "ping",
[CFD_SEQ_PINGED] = "pinged",
[CFD_SEQ_HANDLE] = "handle",
[CFD_SEQ_DEQUEUE] = "dequeue (src CPU 0 == empty)",
[CFD_SEQ_IDLE] = "idle",
[CFD_SEQ_GOTIPI] = "gotipi",
[CFD_SEQ_HDLEND] = "hdlend (src CPU 0 == early)",
};
struct cfd_seq_local {
u64 ping;
u64 pinged;
u64 handle;
u64 dequeue;
u64 idle;
u64 gotipi;
u64 hdlend;
};
#endif
struct cfd_percpu {
call_single_data_t csd;
#ifdef CONFIG_CSD_LOCK_WAIT_DEBUG
u64 seq_queue;
u64 seq_ipi;
u64 seq_noipi;
#endif
};
struct call_function_data {
call_single_data_t __percpu *csd;
struct cfd_percpu __percpu *pcpu;
cpumask_var_t cpumask;
cpumask_var_t cpumask_ipi;
};
@ -54,8 +110,8 @@ int smpcfd_prepare_cpu(unsigned int cpu)
free_cpumask_var(cfd->cpumask);
return -ENOMEM;
}
cfd->csd = alloc_percpu(call_single_data_t);
if (!cfd->csd) {
cfd->pcpu = alloc_percpu(struct cfd_percpu);
if (!cfd->pcpu) {
free_cpumask_var(cfd->cpumask);
free_cpumask_var(cfd->cpumask_ipi);
return -ENOMEM;
@ -70,7 +126,7 @@ int smpcfd_dead_cpu(unsigned int cpu)
free_cpumask_var(cfd->cpumask);
free_cpumask_var(cfd->cpumask_ipi);
free_percpu(cfd->csd);
free_percpu(cfd->pcpu);
return 0;
}
@ -102,15 +158,60 @@ void __init call_function_init(void)
#ifdef CONFIG_CSD_LOCK_WAIT_DEBUG
static DEFINE_STATIC_KEY_FALSE(csdlock_debug_enabled);
static DEFINE_STATIC_KEY_FALSE(csdlock_debug_extended);
static int __init csdlock_debug(char *str)
{
unsigned int val = 0;
if (str && !strcmp(str, "ext")) {
val = 1;
static_branch_enable(&csdlock_debug_extended);
} else
get_option(&str, &val);
if (val)
static_branch_enable(&csdlock_debug_enabled);
return 0;
}
early_param("csdlock_debug", csdlock_debug);
static DEFINE_PER_CPU(call_single_data_t *, cur_csd);
static DEFINE_PER_CPU(smp_call_func_t, cur_csd_func);
static DEFINE_PER_CPU(void *, cur_csd_info);
static DEFINE_PER_CPU(struct cfd_seq_local, cfd_seq_local);
#define CSD_LOCK_TIMEOUT (5ULL * NSEC_PER_SEC)
static atomic_t csd_bug_count = ATOMIC_INIT(0);
static u64 cfd_seq;
#define CFD_SEQ(s, d, t, c) \
(union cfd_seq_cnt){ .u.src = s, .u.dst = d, .u.type = t, .u.cnt = c }
static u64 cfd_seq_inc(unsigned int src, unsigned int dst, unsigned int type)
{
union cfd_seq_cnt new, old;
new = CFD_SEQ(src, dst, type, 0);
do {
old.val = READ_ONCE(cfd_seq);
new.u.cnt = old.u.cnt + 1;
} while (cmpxchg(&cfd_seq, old.val, new.val) != old.val);
return old.val;
}
#define cfd_seq_store(var, src, dst, type) \
do { \
if (static_branch_unlikely(&csdlock_debug_extended)) \
var = cfd_seq_inc(src, dst, type); \
} while (0)
/* Record current CSD work for current CPU, NULL to erase. */
static void csd_lock_record(call_single_data_t *csd)
static void __csd_lock_record(call_single_data_t *csd)
{
if (!csd) {
smp_mb(); /* NULL cur_csd after unlock. */
@ -125,7 +226,13 @@ static void csd_lock_record(call_single_data_t *csd)
/* Or before unlock, as the case may be. */
}
static __always_inline int csd_lock_wait_getcpu(call_single_data_t *csd)
static __always_inline void csd_lock_record(call_single_data_t *csd)
{
if (static_branch_unlikely(&csdlock_debug_enabled))
__csd_lock_record(csd);
}
static int csd_lock_wait_getcpu(call_single_data_t *csd)
{
unsigned int csd_type;
@ -135,12 +242,86 @@ static __always_inline int csd_lock_wait_getcpu(call_single_data_t *csd)
return -1;
}
static void cfd_seq_data_add(u64 val, unsigned int src, unsigned int dst,
unsigned int type, union cfd_seq_cnt *data,
unsigned int *n_data, unsigned int now)
{
union cfd_seq_cnt new[2];
unsigned int i, j, k;
new[0].val = val;
new[1] = CFD_SEQ(src, dst, type, new[0].u.cnt + 1);
for (i = 0; i < 2; i++) {
if (new[i].u.cnt <= now)
new[i].u.cnt |= 0x80000000U;
for (j = 0; j < *n_data; j++) {
if (new[i].u.cnt == data[j].u.cnt) {
/* Direct read value trumps generated one. */
if (i == 0)
data[j].val = new[i].val;
break;
}
if (new[i].u.cnt < data[j].u.cnt) {
for (k = *n_data; k > j; k--)
data[k].val = data[k - 1].val;
data[j].val = new[i].val;
(*n_data)++;
break;
}
}
if (j == *n_data) {
data[j].val = new[i].val;
(*n_data)++;
}
}
}
static const char *csd_lock_get_type(unsigned int type)
{
return (type >= ARRAY_SIZE(seq_type)) ? "?" : seq_type[type];
}
static void csd_lock_print_extended(call_single_data_t *csd, int cpu)
{
struct cfd_seq_local *seq = &per_cpu(cfd_seq_local, cpu);
unsigned int srccpu = csd->node.src;
struct call_function_data *cfd = per_cpu_ptr(&cfd_data, srccpu);
struct cfd_percpu *pcpu = per_cpu_ptr(cfd->pcpu, cpu);
unsigned int now;
union cfd_seq_cnt data[2 * ARRAY_SIZE(seq_type)];
unsigned int n_data = 0, i;
data[0].val = READ_ONCE(cfd_seq);
now = data[0].u.cnt;
cfd_seq_data_add(pcpu->seq_queue, srccpu, cpu, CFD_SEQ_QUEUE, data, &n_data, now);
cfd_seq_data_add(pcpu->seq_ipi, srccpu, cpu, CFD_SEQ_IPI, data, &n_data, now);
cfd_seq_data_add(pcpu->seq_noipi, srccpu, cpu, CFD_SEQ_NOIPI, data, &n_data, now);
cfd_seq_data_add(per_cpu(cfd_seq_local.ping, srccpu), srccpu, CFD_SEQ_NOCPU, CFD_SEQ_PING, data, &n_data, now);
cfd_seq_data_add(per_cpu(cfd_seq_local.pinged, srccpu), srccpu, CFD_SEQ_NOCPU, CFD_SEQ_PINGED, data, &n_data, now);
cfd_seq_data_add(seq->idle, CFD_SEQ_NOCPU, cpu, CFD_SEQ_IDLE, data, &n_data, now);
cfd_seq_data_add(seq->gotipi, CFD_SEQ_NOCPU, cpu, CFD_SEQ_GOTIPI, data, &n_data, now);
cfd_seq_data_add(seq->handle, CFD_SEQ_NOCPU, cpu, CFD_SEQ_HANDLE, data, &n_data, now);
cfd_seq_data_add(seq->dequeue, CFD_SEQ_NOCPU, cpu, CFD_SEQ_DEQUEUE, data, &n_data, now);
cfd_seq_data_add(seq->hdlend, CFD_SEQ_NOCPU, cpu, CFD_SEQ_HDLEND, data, &n_data, now);
for (i = 0; i < n_data; i++) {
pr_alert("\tcsd: cnt(%07x): %04x->%04x %s\n",
data[i].u.cnt & ~0x80000000U, data[i].u.src,
data[i].u.dst, csd_lock_get_type(data[i].u.type));
}
pr_alert("\tcsd: cnt now: %07x\n", now);
}
/*
* Complain if too much time spent waiting. Note that only
* the CSD_TYPE_SYNC/ASYNC types provide the destination CPU,
* so waiting on other types gets much less information.
*/
static __always_inline bool csd_lock_wait_toolong(call_single_data_t *csd, u64 ts0, u64 *ts1, int *bug_id)
static bool csd_lock_wait_toolong(call_single_data_t *csd, u64 ts0, u64 *ts1, int *bug_id)
{
int cpu = -1;
int cpux;
@ -184,6 +365,8 @@ static __always_inline bool csd_lock_wait_toolong(call_single_data_t *csd, u64 t
*bug_id, !cpu_cur_csd ? "unresponsive" : "handling this request");
}
if (cpu >= 0) {
if (static_branch_unlikely(&csdlock_debug_extended))
csd_lock_print_extended(csd, cpu);
if (!trigger_single_cpu_backtrace(cpu))
dump_cpu_task(cpu);
if (!cpu_cur_csd) {
@ -204,7 +387,7 @@ static __always_inline bool csd_lock_wait_toolong(call_single_data_t *csd, u64 t
 * previous function call. For multi-cpu calls it's even more interesting
* as we'll have to ensure no other cpu is observing our csd.
*/
static __always_inline void csd_lock_wait(call_single_data_t *csd)
static void __csd_lock_wait(call_single_data_t *csd)
{
int bug_id = 0;
u64 ts0, ts1;
@ -218,7 +401,36 @@ static __always_inline void csd_lock_wait(call_single_data_t *csd)
smp_acquire__after_ctrl_dep();
}
static __always_inline void csd_lock_wait(call_single_data_t *csd)
{
if (static_branch_unlikely(&csdlock_debug_enabled)) {
__csd_lock_wait(csd);
return;
}
smp_cond_load_acquire(&csd->node.u_flags, !(VAL & CSD_FLAG_LOCK));
}
static void __smp_call_single_queue_debug(int cpu, struct llist_node *node)
{
unsigned int this_cpu = smp_processor_id();
struct cfd_seq_local *seq = this_cpu_ptr(&cfd_seq_local);
struct call_function_data *cfd = this_cpu_ptr(&cfd_data);
struct cfd_percpu *pcpu = per_cpu_ptr(cfd->pcpu, cpu);
cfd_seq_store(pcpu->seq_queue, this_cpu, cpu, CFD_SEQ_QUEUE);
if (llist_add(node, &per_cpu(call_single_queue, cpu))) {
cfd_seq_store(pcpu->seq_ipi, this_cpu, cpu, CFD_SEQ_IPI);
cfd_seq_store(seq->ping, this_cpu, cpu, CFD_SEQ_PING);
send_call_function_single_ipi(cpu);
cfd_seq_store(seq->pinged, this_cpu, cpu, CFD_SEQ_PINGED);
} else {
cfd_seq_store(pcpu->seq_noipi, this_cpu, cpu, CFD_SEQ_NOIPI);
}
}
#else
#define cfd_seq_store(var, src, dst, type)
static void csd_lock_record(call_single_data_t *csd)
{
}
@ -256,6 +468,19 @@ static DEFINE_PER_CPU_SHARED_ALIGNED(call_single_data_t, csd_data);
void __smp_call_single_queue(int cpu, struct llist_node *node)
{
#ifdef CONFIG_CSD_LOCK_WAIT_DEBUG
if (static_branch_unlikely(&csdlock_debug_extended)) {
unsigned int type;
type = CSD_TYPE(container_of(node, call_single_data_t,
node.llist));
if (type == CSD_TYPE_SYNC || type == CSD_TYPE_ASYNC) {
__smp_call_single_queue_debug(cpu, node);
return;
}
}
#endif
/*
* The list addition should be visible before sending the IPI
* handler locks the list to pull the entry off it because of
@ -314,6 +539,8 @@ static int generic_exec_single(int cpu, call_single_data_t *csd)
*/
void generic_smp_call_function_single_interrupt(void)
{
cfd_seq_store(this_cpu_ptr(&cfd_seq_local)->gotipi, CFD_SEQ_NOCPU,
smp_processor_id(), CFD_SEQ_GOTIPI);
flush_smp_call_function_queue(true);
}
@ -341,7 +568,13 @@ static void flush_smp_call_function_queue(bool warn_cpu_offline)
lockdep_assert_irqs_disabled();
head = this_cpu_ptr(&call_single_queue);
cfd_seq_store(this_cpu_ptr(&cfd_seq_local)->handle, CFD_SEQ_NOCPU,
smp_processor_id(), CFD_SEQ_HANDLE);
entry = llist_del_all(head);
cfd_seq_store(this_cpu_ptr(&cfd_seq_local)->dequeue,
/* Special meaning of source cpu: 0 == queue empty */
entry ? CFD_SEQ_NOCPU : 0,
smp_processor_id(), CFD_SEQ_DEQUEUE);
entry = llist_reverse_order(entry);
/* There shouldn't be any pending callbacks on an offline CPU. */
@ -400,8 +633,12 @@ static void flush_smp_call_function_queue(bool warn_cpu_offline)
}
}
if (!entry)
if (!entry) {
cfd_seq_store(this_cpu_ptr(&cfd_seq_local)->hdlend,
0, smp_processor_id(),
CFD_SEQ_HDLEND);
return;
}
/*
* Second; run all !SYNC callbacks.
@ -439,6 +676,9 @@ static void flush_smp_call_function_queue(bool warn_cpu_offline)
*/
if (entry)
sched_ttwu_pending(entry);
cfd_seq_store(this_cpu_ptr(&cfd_seq_local)->hdlend, CFD_SEQ_NOCPU,
smp_processor_id(), CFD_SEQ_HDLEND);
}
void flush_smp_call_function_from_idle(void)
@ -448,6 +688,8 @@ void flush_smp_call_function_from_idle(void)
if (llist_empty(this_cpu_ptr(&call_single_queue)))
return;
cfd_seq_store(this_cpu_ptr(&cfd_seq_local)->idle, CFD_SEQ_NOCPU,
smp_processor_id(), CFD_SEQ_IDLE);
local_irq_save(flags);
flush_smp_call_function_queue(true);
if (local_softirq_pending())
@ -664,7 +906,8 @@ static void smp_call_function_many_cond(const struct cpumask *mask,
cpumask_clear(cfd->cpumask_ipi);
for_each_cpu(cpu, cfd->cpumask) {
call_single_data_t *csd = per_cpu_ptr(cfd->csd, cpu);
struct cfd_percpu *pcpu = per_cpu_ptr(cfd->pcpu, cpu);
call_single_data_t *csd = &pcpu->csd;
if (cond_func && !cond_func(cpu, info))
continue;
@ -678,18 +921,27 @@ static void smp_call_function_many_cond(const struct cpumask *mask,
csd->node.src = smp_processor_id();
csd->node.dst = cpu;
#endif
if (llist_add(&csd->node.llist, &per_cpu(call_single_queue, cpu)))
cfd_seq_store(pcpu->seq_queue, this_cpu, cpu, CFD_SEQ_QUEUE);
if (llist_add(&csd->node.llist, &per_cpu(call_single_queue, cpu))) {
__cpumask_set_cpu(cpu, cfd->cpumask_ipi);
cfd_seq_store(pcpu->seq_ipi, this_cpu, cpu, CFD_SEQ_IPI);
} else {
cfd_seq_store(pcpu->seq_noipi, this_cpu, cpu, CFD_SEQ_NOIPI);
}
}
/* Send a message to all CPUs in the map */
cfd_seq_store(this_cpu_ptr(&cfd_seq_local)->ping, this_cpu,
CFD_SEQ_NOCPU, CFD_SEQ_PING);
arch_send_call_function_ipi_mask(cfd->cpumask_ipi);
cfd_seq_store(this_cpu_ptr(&cfd_seq_local)->pinged, this_cpu,
CFD_SEQ_NOCPU, CFD_SEQ_PINGED);
if (wait) {
for_each_cpu(cpu, cfd->cpumask) {
call_single_data_t *csd;
csd = per_cpu_ptr(cfd->csd, cpu);
csd = &per_cpu_ptr(cfd->pcpu, cpu)->csd;
csd_lock_wait(csd);
}
}

View File

@ -165,13 +165,13 @@ void __static_call_update(struct static_call_key *key, void *tramp, void *func)
stop = __stop_static_call_sites;
#ifdef CONFIG_MODULES
if (mod) {
#ifdef CONFIG_MODULES
stop = mod->static_call_sites +
mod->num_static_call_sites;
init = mod->state == MODULE_STATE_COMING;
}
#endif
}
for (site = site_mod->sites;
site < stop && static_call_key(site) == key; site++) {

View File

@ -69,8 +69,9 @@ config KCSAN_SELFTEST
panic. Recommended to be enabled, ensuring critical functionality
works as intended.
config KCSAN_TEST
tristate "KCSAN test for integrated runtime behaviour"
config KCSAN_KUNIT_TEST
tristate "KCSAN test for integrated runtime behaviour" if !KUNIT_ALL_TESTS
default KUNIT_ALL_TESTS
depends on TRACEPOINTS && KUNIT
select TORTURE_TEST
help

View File

@ -0,0 +1,479 @@
MARKING SHARED-MEMORY ACCESSES
==============================
This document provides guidelines for marking intentionally concurrent
normal accesses to shared memory, that is "normal" as in accesses that do
not use read-modify-write atomic operations. It also describes how to
document these accesses, both with comments and with special assertions
processed by the Kernel Concurrency Sanitizer (KCSAN). This discussion
builds on an earlier LWN article [1].
ACCESS-MARKING OPTIONS
======================
The Linux kernel provides the following access-marking options:
1. Plain C-language accesses (unmarked), for example, "a = b;"
2. Data-race marking, for example, "data_race(a = b);"
3. READ_ONCE(), for example, "a = READ_ONCE(b);"
The various forms of atomic_read() also fit in here.
4. WRITE_ONCE(), for example, "WRITE_ONCE(a, b);"
The various forms of atomic_set() also fit in here.
These may be used in combination, as shown in this admittedly improbable
example:
WRITE_ONCE(a, b + data_race(c + d) + READ_ONCE(e));
Neither plain C-language accesses nor data_race() (#1 and #2 above) place
any sort of constraint on the compiler's choice of optimizations [2].
In contrast, READ_ONCE() and WRITE_ONCE() (#3 and #4 above) restrict the
compiler's use of code-motion and common-subexpression optimizations.
Therefore, if a given access is involved in an intentional data race,
using READ_ONCE() for loads and WRITE_ONCE() for stores is usually
preferable to data_race(), which in turn is usually preferable to plain
C-language accesses.
KCSAN will complain about many types of data races involving plain
C-language accesses, but marking all accesses involved in a given data
race with one of data_race(), READ_ONCE(), or WRITE_ONCE() will prevent
KCSAN from complaining. Of course, lack of KCSAN complaints does not
imply correct code. Therefore, please take a thoughtful approach
when responding to KCSAN complaints. Churning the code base with
ill-considered additions of data_race(), READ_ONCE(), and WRITE_ONCE()
is unhelpful.
In fact, the following sections describe situations where use of
data_race() and even plain C-language accesses is preferable to
READ_ONCE() and WRITE_ONCE().
Use of the data_race() Macro
----------------------------
Here are some situations where data_race() should be used instead of
READ_ONCE() and WRITE_ONCE():
1. Data-racy loads from shared variables whose values are used only
for diagnostic purposes.
2. Data-racy reads whose values are checked against marked reload.
3. Reads whose values feed into error-tolerant heuristics.
4. Writes setting values that feed into error-tolerant heuristics.
Data-Racy Reads for Approximate Diagnostics
Approximate diagnostics include lockdep reports, monitoring/statistics
(including /proc and /sys output), WARN*()/BUG*() checks whose return
values are ignored, and other situations where reads from shared variables
are not an integral part of the core concurrency design.
In fact, use of data_race() instead of READ_ONCE() for these diagnostic
reads can enable better checking of the remaining accesses implementing
the core concurrency design. For example, suppose that the core design
prevents any non-diagnostic reads from shared variable x from running
concurrently with updates to x. Then using plain C-language writes
to x allows KCSAN to detect reads from x from within regions of code
that fail to exclude the updates. In this case, it is important to use
data_race() for the diagnostic reads because otherwise KCSAN would give
false-positive warnings about these diagnostic reads.
In theory, plain C-language loads can also be used for this use case.
However, in practice this will have the disadvantage of causing KCSAN
to generate false positives because KCSAN will have no way of knowing
that the resulting data race was intentional.
Data-Racy Reads That Are Checked Against Marked Reload
The values from some reads are not implicitly trusted. They are instead
fed into some operation that checks the full value against a later marked
load from memory, which means that the occasional arbitrarily bogus value
is not a problem. For example, if a bogus value is fed into cmpxchg(),
all that happens is that this cmpxchg() fails, which normally results
in a retry. Unless the race condition that resulted in the bogus value
recurs, this retry will with high probability succeed, so no harm done.
However, please keep in mind that a data_race() load feeding into
a cmpxchg_relaxed() might still be subject to load fusing on some
architectures. Therefore, it is best to capture the return value from
the failing cmpxchg() for the next iteration of the loop, an approach
that provides the compiler much less scope for mischievous optimizations.
Capturing the return value from cmpxchg() also saves a memory reference
in many cases.
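For illustration only, a sketch of this capture-the-return-value pattern might
look as follows (set_foo_bits() and its variables are invented for this
document, not taken from any kernel code):

int foo;

void set_foo_bits(int mask)
{
	int old, new, newold;

	newold = data_race(foo); /* Checked by cmpxchg() below. */
	do {
		old = newold;
		new = old | mask;
		/* On failure, cmpxchg() returns the value it observed, so foo need not be reloaded. */
		newold = cmpxchg(&foo, old, new);
	} while (newold != old);
}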
In theory, plain C-language loads can also be used for this use case.
However, in practice this will have the disadvantage of causing KCSAN
to generate false positives because KCSAN will have no way of knowing
that the resulting data race was intentional.
Reads Feeding Into Error-Tolerant Heuristics
Values from some reads feed into heuristics that can tolerate occasional
errors. Such reads can use data_race(), thus allowing KCSAN to focus on
the other accesses to the relevant shared variables. But please note
that data_race() loads are subject to load fusing, which can result in
consistent errors, which in turn are quite capable of breaking heuristics.
Therefore use of data_race() should be limited to cases where some other
code (such as a barrier() call) will force the occasional reload.
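As a rough sketch (nr_queued, queue_looks_busy(), and wait_until_quiet() are
made-up names, not kernel APIs), such a heuristic might be structured like
this:

int nr_queued; /* Updated elsewhere by lock-protected or marked writes. */

/* Error-tolerant heuristic: an occasionally stale value is harmless. */
static bool queue_looks_busy(void)
{
	return data_race(nr_queued) > 100;
}

static void wait_until_quiet(void)
{
	while (queue_looks_busy())
		barrier(); /* Forces the compiler to reload nr_queued on each iteration. */
}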
In theory, plain C-language loads can also be used for this use case.
However, in practice this will have the disadvantage of causing KCSAN
to generate false positives because KCSAN will have no way of knowing
that the resulting data race was intentional.
Writes Setting Values Feeding Into Error-Tolerant Heuristics
The values read into error-tolerant heuristics come from somewhere,
for example, from sysfs. This means that some code in sysfs writes
to this same variable, and these writes can also use data_race().
After all, if the heuristic can tolerate the occasional bogus value
due to compiler-mangled reads, it can also tolerate the occasional
compiler-mangled write, at least assuming that the proper value is in
place once the write completes.
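A write-side counterpart might look like the following sketch (busy_threshold
and set_busy_threshold() are again invented names, and a real sysfs store
method would first parse and validate its input):

int busy_threshold = 100; /* Read via data_race() by an error-tolerant heuristic. */

static void set_busy_threshold(int val)
{
	/* An occasional compiler-mangled store is tolerable here. */
	data_race(busy_threshold = val);
}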
Plain C-language stores can also be used for this use case. However,
in kernels built with CONFIG_KCSAN_ASSUME_PLAIN_WRITES_ATOMIC=n, this
will have the disadvantage of causing KCSAN to generate false positives
because KCSAN will have no way of knowing that the resulting data race
was intentional.
Use of Plain C-Language Accesses
--------------------------------
Here are some example situations where plain C-language accesses should
used instead of READ_ONCE(), WRITE_ONCE(), and data_race():
1. Accesses protected by mutual exclusion, including strict locking
and sequence locking.
2. Initialization-time and cleanup-time accesses. This covers a
wide variety of situations, including the uniprocessor phase of
system boot, variables to be used by not-yet-spawned kthreads,
structures not yet published to reference-counted or RCU-protected
data structures, and the cleanup side of any of these situations.
3. Per-CPU variables that are not accessed from other CPUs.
4. Private per-task variables, including on-stack variables, some
fields in the task_struct structure, and task-private heap data.
5. Any other loads for which there is not supposed to be a concurrent
store to that same variable.
6. Any other stores for which there should be neither concurrent
loads nor concurrent stores to that same variable.
But note that KCSAN makes three explicit exceptions to this rule
by default, refraining from flagging plain C-language stores:
a. No matter what. You can override this default by building
with CONFIG_KCSAN_ASSUME_PLAIN_WRITES_ATOMIC=n.
b. When the store writes the value already contained in
that variable. You can override this default by building
with CONFIG_KCSAN_REPORT_VALUE_CHANGE_ONLY=n.
c. When one of the stores is in an interrupt handler and
the other in the interrupted code. You can override this
default by building with CONFIG_KCSAN_INTERRUPT_WATCHER=y.
Note that it is important to use plain C-language accesses in these cases,
because doing otherwise prevents KCSAN from detecting violations of your
code's synchronization rules.
ACCESS-DOCUMENTATION OPTIONS
============================
It is important to comment marked accesses so that people reading your
code, yourself included, are reminded of the synchronization design.
However, it is even more important to comment plain C-language accesses
that are intentionally involved in data races. Such comments are
needed to remind people reading your code, again, yourself included,
of how the compiler has been prevented from optimizing those accesses
into concurrency bugs.
It is also possible to tell KCSAN about your synchronization design.
For example, ASSERT_EXCLUSIVE_ACCESS(foo) tells KCSAN that any
concurrent access to variable foo by any other CPU is an error, even
if that concurrent access is marked with READ_ONCE(). In addition,
ASSERT_EXCLUSIVE_WRITER(foo) tells KCSAN that although it is OK for there
to be concurrent reads from foo from other CPUs, it is an error for some
other CPU to be concurrently writing to foo, even if that concurrent
write is marked with data_race() or WRITE_ONCE().
Note that although KCSAN will call out data races involving either
ASSERT_EXCLUSIVE_ACCESS() or ASSERT_EXCLUSIVE_WRITER() on the one hand
and data_race() writes on the other, KCSAN will not report the location
of these data_race() writes.
EXAMPLES
========
As noted earlier, the goal is to prevent the compiler from destroying
your concurrent algorithm, to help the human reader, and to inform
KCSAN of aspects of your concurrency design. This section looks at a
few examples showing how this can be done.
Lock Protection With Lockless Diagnostic Access
-----------------------------------------------
For example, suppose a shared variable "foo" is read only while a
reader-writer spinlock is read-held, written only while that same
spinlock is write-held, except that it is also read locklessly for
diagnostic purposes. The code might look as follows:
int foo;
DEFINE_RWLOCK(foo_rwlock);
void update_foo(int newval)
{
write_lock(&foo_rwlock);
foo = newval;
do_something(newval);
write_unlock(&foo_rwlock);
}
int read_foo(void)
{
int ret;
read_lock(&foo_rwlock);
do_something_else();
ret = foo;
read_unlock(&foo_rwlock);
return ret;
}
int read_foo_diagnostic(void)
{
return data_race(foo);
}
The reader-writer lock prevents the compiler from introducing concurrency
bugs into any part of the main algorithm using foo, which means that
the accesses to foo within both update_foo() and read_foo() can (and
should) be plain C-language accesses. One benefit of making them be
plain C-language accesses is that KCSAN can detect any erroneous lockless
reads from or updates to foo. The data_race() in read_foo_diagnostic()
tells KCSAN that data races are expected, and should be silently
ignored. This data_race() also tells the human reading the code that
read_foo_diagnostic() might sometimes return a bogus value.
However, please note that your kernel must be built with
CONFIG_KCSAN_ASSUME_PLAIN_WRITES_ATOMIC=n in order for KCSAN to
detect a buggy lockless write. If you need KCSAN to detect such a
write even if that write did not change the value of foo, you also
need CONFIG_KCSAN_REPORT_VALUE_CHANGE_ONLY=n. If you need KCSAN to
detect such a write happening in an interrupt handler running on the
same CPU doing the legitimate lock-protected write, you also need
CONFIG_KCSAN_INTERRUPT_WATCHER=y. With some or all of these Kconfig
options set properly, KCSAN can be quite helpful, although it is not
necessarily a full replacement for hardware watchpoints. On the other
hand, neither are hardware watchpoints a full replacement for KCSAN
because it is not always easy to tell a hardware watchpoint to conditionally
trap on accesses.
Lock-Protected Writes With Lockless Reads
-----------------------------------------
For another example, suppose a shared variable "foo" is updated only
while holding a spinlock, but is read locklessly. The code might look
as follows:
int foo;
DEFINE_SPINLOCK(foo_lock);
void update_foo(int newval)
{
spin_lock(&foo_lock);
WRITE_ONCE(foo, newval);
ASSERT_EXCLUSIVE_WRITER(foo);
do_something(newval);
spin_unlock(&foo_lock);
}
int read_foo(void)
{
do_something_else();
return READ_ONCE(foo);
}
Because foo is read locklessly, all accesses are marked. The purpose
of the ASSERT_EXCLUSIVE_WRITER() is to allow KCSAN to check for a buggy
concurrent lockless write.
Lockless Reads and Writes
-------------------------
For another example, suppose a shared variable "foo" is both read and
updated locklessly. The code might look as follows:
int foo;
int update_foo(int newval)
{
int ret;
ret = xchg(&foo, newval);
do_something(newval);
return ret;
}
int read_foo(void)
{
do_something_else();
return READ_ONCE(foo);
}
Because foo is accessed locklessly, all accesses are marked. It does
not make sense to use ASSERT_EXCLUSIVE_WRITER() in this case because
there really can be concurrent lockless writers. KCSAN would
flag any concurrent plain C-language reads from foo, and given
CONFIG_KCSAN_ASSUME_PLAIN_WRITES_ATOMIC=n, also any concurrent plain
C-language writes to foo.
Lockless Reads and Writes, But With Single-Threaded Initialization
------------------------------------------------------------------
For yet another example, suppose that foo is initialized in a
single-threaded manner, but that a number of kthreads are then created
that locklessly and concurrently access foo. Some snippets of this code
might look as follows:
int foo;
void initialize_foo(int initval, int nkthreads)
{
int i;
foo = initval;
ASSERT_EXCLUSIVE_ACCESS(foo);
for (i = 0; i < nkthreads; i++)
kthread_run(access_foo_concurrently, ...);
}
/* Called from access_foo_concurrently(). */
int update_foo(int newval)
{
int ret;
ret = xchg(&foo, newval);
do_something(newval);
return ret;
}
/* Also called from access_foo_concurrently(). */
int read_foo(void)
{
do_something_else();
return READ_ONCE(foo);
}
The initialize_foo() function uses a plain C-language write to foo because there
are not supposed to be concurrent accesses during initialization. The
ASSERT_EXCLUSIVE_ACCESS() allows KCSAN to flag buggy concurrent unmarked
reads, and the ASSERT_EXCLUSIVE_ACCESS() call further allows KCSAN to
flag buggy concurrent writes, even if: (1) Those writes are marked or
(2) The kernel was built with CONFIG_KCSAN_ASSUME_PLAIN_WRITES_ATOMIC=y.
Checking Stress-Test Race Coverage
----------------------------------
When designing stress tests it is important to ensure that race conditions
of interest really do occur. For example, consider the following code
fragment:
int foo;
int update_foo(int newval)
{
return xchg(&foo, newval);
}
int xor_shift_foo(int shift, int mask)
{
int old, new, newold;
newold = data_race(foo); /* Checked by cmpxchg(). */
do {
old = newold;
new = (old << shift) ^ mask;
newold = cmpxchg(&foo, old, new);
} while (newold != old);
return old;
}
int read_foo(void)
{
return READ_ONCE(foo);
}
If it is possible for update_foo(), xor_shift_foo(), and read_foo() to be
invoked concurrently, the stress test should force this concurrency to
actually happen. KCSAN can evaluate the stress test when the above code
is modified to read as follows:
int foo;
int update_foo(int newval)
{
ASSERT_EXCLUSIVE_ACCESS(foo);
return xchg(&foo, newval);
}
int xor_shift_foo(int shift, int mask)
{
int old, new, newold;
newold = data_race(foo); /* Checked by cmpxchg(). */
do {
old = newold;
new = (old << shift) ^ mask;
ASSERT_EXCLUSIVE_ACCESS(foo);
newold = cmpxchg(&foo, old, new);
} while (newold != old);
return old;
}
int read_foo(void)
{
ASSERT_EXCLUSIVE_ACCESS(foo);
return READ_ONCE(foo);
}
If a given stress-test run does not result in KCSAN complaints from
each possible pair of ASSERT_EXCLUSIVE_ACCESS() invocations, the
stress test needs improvement. If the stress test was to be evaluated
on a regular basis, it would be wise to place the above instances of
ASSERT_EXCLUSIVE_ACCESS() under #ifdef so that they did not result in
false positives when not evaluating the stress test.
REFERENCES
==========
[1] "Concurrency bugs should fear the big bad data-race detector (part 2)"
https://lwn.net/Articles/816854/
[2] "Who's afraid of a big bad optimizing compiler?"
https://lwn.net/Articles/793253/

View File

@ -189,7 +189,6 @@ Additional information may be found in these files:
Documentation/atomic_t.txt
Documentation/atomic_bitops.txt
Documentation/core-api/atomic_ops.rst
Documentation/core-api/refcount-vs-atomic.rst
Reading code using these primitives is often also quite helpful.