linux/arch/arm64
Mark Rutland 053f58bab3 arm64: atomics: lse: define RETURN ops in terms of FETCH ops
The FEAT_LSE atomic instructions include LD* instructions which return
the original value of a memory location can be used to directly
implement FETCH opertations. Each RETURN op is implemented as a copy of
the corresponding FETCH op with a trailing instruction to generate the
new value of the memory location. We only directly implement
*_fetch_add*(), for which we have a trailing `add` instruction.

As the compiler has no visibility of the `add`, this leads to less than
optimal code generation when consuming the result.

For example, the compiler cannot constant-fold the addition into later
operations, and currently GCC 11.1.0 will compile:

       return __lse_atomic_sub_return(1, v) == 0;

As:

	mov     w1, #0xffffffff
	ldaddal w1, w2, [x0]
	add     w1, w1, w2
	cmp     w1, #0x0
	cset    w0, eq  // eq = none
	ret

This patch improves this by replacing the `add` with C addition after
the inline assembly block, e.g.

	ret += i;

This allows the compiler to manipulate `i`. This permits the compiler to
merge the `add` and `cmp` for the above, e.g.

	mov     w1, #0xffffffff
	ldaddal w1, w1, [x0]
	cmp     w1, #0x1
	cset    w0, eq  // eq = none
	ret

With this change the assembly for each RETURN op is identical to the
corresponding FETCH op (including barriers and clobbers) so I've removed
the inline assembly and rewritten each RETURN op in terms of the
corresponding FETCH op, e.g.

| static inline void __lse_atomic_add_return(int i, atomic_t *v)
| {
|       return __lse_atomic_fetch_add(i, v) + i
| }

The new construction does not adversely affect the common case, and
before and after this patch GCC 11.1.0 can compile:

	__lse_atomic_add_return(i, v)

As:

	ldaddal w0, w2, [x1]
	add     w0, w0, w2

... while having the freedom to do better elsewhere.

This is intended as an optimization and cleanup.
There should be no functional change as a result of this patch.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Will Deacon <will@kernel.org>
Acked-by: Will Deacon <will@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20211210151410.2782645-6-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2021-12-14 13:00:33 +00:00
..
boot arm64: dts: exynos: drop samsung,ufs-shareability-reg-offset in ExynosAutov9 2021-11-25 14:46:00 +01:00
configs ARM: defconfig updates for 5.16 2021-11-03 17:07:02 -07:00
crypto crypto: arm64/aes-ccm - avoid by-ref argument for ce_aes_ccm_auth_data 2021-09-17 11:05:11 +08:00
hyperv arm64: hyperv: Initialize hypervisor on boot 2021-08-04 16:54:36 +00:00
include arm64: atomics: lse: define RETURN ops in terms of FETCH ops 2021-12-14 13:00:33 +00:00
kernel arm64: ftrace: use HAVE_FUNCTION_GRAPH_RET_ADDR_PTR 2021-11-16 09:47:54 +00:00
kvm KVM: arm64: Cap KVM_CAP_NR_VCPUS by kvm_arm_default_max_vcpus() 2021-11-18 02:12:14 -05:00
lib Kbuild updates for v5.16 2021-11-08 09:15:45 -08:00
mm kasan: add kasan mode messages when kasan init 2021-11-11 09:34:35 -08:00
net arm64 updates for 5.16 2021-11-01 16:33:53 -07:00
tools Merge branch 'for-next/trbe-errata' into for-next/core 2021-10-29 12:25:33 +01:00
xen xen: allow pv-only hypercalls only with CONFIG_XEN_PV 2021-11-02 08:11:01 -05:00
Kbuild kbuild: use more subdir- for visiting subdirectories while cleaning 2021-10-24 13:49:46 +09:00
Kconfig Merge branch 'akpm' (patches from Andrew) 2021-11-06 14:08:17 -07:00
Kconfig.debug
Kconfig.platforms ARM: SoC drivers for 5.16 2021-11-03 17:00:52 -07:00
Makefile kbuild: use more subdir- for visiting subdirectories while cleaning 2021-10-24 13:49:46 +09:00