e031a5f3f1
Traditionally, LoongArch uses "dbar 0" (full completion barrier) for everything. But the full completion barrier is a performance killer, so Loongson-3A6000 and newer processors have made finer granularity hints available: Bit4: ordering or completion (0: completion, 1: ordering) Bit3: barrier for previous read (0: true, 1: false) Bit2: barrier for previous write (0: true, 1: false) Bit1: barrier for succeeding read (0: true, 1: false) Bit0: barrier for succeeding write (0: true, 1: false) Hint 0x700: barrier for "read after read" from the same address, which is needed by LL-SC loops on old models (dbar 0x700 behaves the same as nop if such reordering is disabled on new models). This patch makes use of the various new hints for different kinds of memory barriers. It brings performance improvements on Loongson-3A6000 series, while not affecting the existing models because all variants are treated as 'dbar 0' there. Why override queued_spin_unlock()? After commit 01e3b958efe85a26d9b ("drivers: Remove explicit invocations of mmiowb()") we need a completion barrier in queued_spin_unlock(), but the generic implementation use smp_store_release() which only provide an ordering barrier. Signed-off-by: Jun Yi <yijun@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
19 lines
400 B
C
19 lines
400 B
C
/* SPDX-License-Identifier: GPL-2.0 */
|
|
#ifndef _ASM_QSPINLOCK_H
|
|
#define _ASM_QSPINLOCK_H
|
|
|
|
#include <asm-generic/qspinlock_types.h>
|
|
|
|
#define queued_spin_unlock queued_spin_unlock
|
|
|
|
static inline void queued_spin_unlock(struct qspinlock *lock)
|
|
{
|
|
compiletime_assert_atomic_type(lock->locked);
|
|
c_sync();
|
|
WRITE_ONCE(lock->locked, 0);
|
|
}
|
|
|
|
#include <asm-generic/qspinlock.h>
|
|
|
|
#endif /* _ASM_QSPINLOCK_H */
|