MINOR: atomic/arm64: detect and use builtins for the double-word CAS

GCC 10.2 implements outline atomics on aarch64. They replace all inline
atomic ops with a function call that checks whether the machine supports
LSE atomics. This comes with a small cost but allows modern machines to
scale much better than with the old LL/SC ones even when built for full
ARMv8.0 compatibility.

This patch enables the use of the __atomic_compare_exchange() builtin
for the double-word CAS when detected as available instead of using the
hand-written LL/SC version. The extra cost of the function call is
negligible because we do very few DWCAS operations (essentially FD
migrations and shared pools), and under high contention the LSE version
can still be beneficial. As expected, no performance difference was
measured in either direction on 4-core machines with this change.
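
The selection logic described above can be sketched outside of the tree as
follows. This is only an illustration under stated assumptions: struct
dw_pair, dw_raw, dw_cas(), dw_cas_backend() and dw_lock are hypothetical
names that do not exist in the code base, and the fallback branch uses a
simple global spinlock as a stand-in for the hand-written LL/SC version,
which is not reproduced here.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical 16-byte value: two 64-bit words updated as one unit. */
struct dw_pair {
	uint64_t lo;
	uint64_t hi;
} __attribute__((aligned(16)));

#if defined(__SIZEOF_INT128__) && defined(__GCC_HAVE_SYNC_COMPARE_AND_SWAP_16)

/* may_alias lets this 128-bit type overlay struct dw_pair without
 * breaking strict-aliasing rules (an assumption of this sketch).
 */
typedef unsigned __int128 dw_raw __attribute__((may_alias));

static inline const char *dw_cas_backend(void) { return "builtin"; }

/* Returns 0 on failure, non-zero on success. On failure, *expected is
 * refreshed with the value currently found in *target.
 */
static inline int dw_cas(struct dw_pair *target, struct dw_pair *expected,
                         const struct dw_pair *desired)
{
	dw_raw want;

	memcpy(&want, desired, sizeof(want));
	return __atomic_compare_exchange((dw_raw *)target, (dw_raw *)expected, &want,
	                                 0, __ATOMIC_RELAXED, __ATOMIC_RELAXED);
}

#else /* no 128-bit CAS builtin detected: illustrative spinlock fallback */

static unsigned char dw_lock; /* hypothetical global lock, fallback only */

static inline const char *dw_cas_backend(void) { return "spinlock"; }

/* Same contract as above, serialized by dw_lock instead of a hardware CAS. */
static inline int dw_cas(struct dw_pair *target, struct dw_pair *expected,
                         const struct dw_pair *desired)
{
	int ok;

	while (__atomic_test_and_set(&dw_lock, __ATOMIC_ACQUIRE))
		; /* spin until the lock is released */

	ok = (target->lo == expected->lo && target->hi == expected->hi);
	if (ok)
		*target = *desired;
	else
		*expected = *target;

	__atomic_clear(&dw_lock, __ATOMIC_RELEASE);
	return ok;
}

#endif
```

A caller would typically load the current pair, build the new one, and
retry dw_cas() until it returns non-zero. Depending on the target and the
compiler version, the builtin branch may be emitted as a library call, in
which case linking against libatomic may be required.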

This could be backported to 2.3 if FD migrations were shown to be a
significant source of contention, but for now it does not appear to be
needed.
Willy Tarreau 2021-04-06 09:21:33 +02:00
parent 184b21259b
commit 6756d95a8e

@@ -550,8 +550,27 @@ static forceinline int __ha_cas_dw(void *target, void *compare, const void *set)
 	return ret;
 }
-#else // no ARMv8.1-A atomics
+#elif defined(__SIZEOF_INT128__) && defined(__GCC_HAVE_SYNC_COMPARE_AND_SWAP_16) // no ARMv8.1-A atomics but 128-bit atomics
+/* According to https://gcc.gnu.org/onlinedocs/gcc/_005f_005fatomic-Builtins.html
+ * we can use atomics on __int128. The availability of CAS is defined there:
+ * https://gcc.gnu.org/onlinedocs/cpp/Common-Predefined-Macros.html
+ * However these usually involve a function call which can be expensive for some
+ * cases, but gcc 10.2 and above can reroute the function call to either LL/SC for
+ * v8.0 or LSE for v8.1+, which allows to use a more scalable version on v8.1+ at
+ * the extra cost of a function call.
+ */
+/* returns 0 on failure, non-zero on success */
+static __inline int __ha_cas_dw(void *target, void *compare, const void *set)
+{
+	return __atomic_compare_exchange((__int128*)target, (__int128*)compare, (const __int128*)set,
+	                                 0, __ATOMIC_RELAXED, __ATOMIC_RELAXED);
+}
+#else // neither ARMv8.1-A atomics nor 128-bit atomics
 /* returns 0 on failure, non-zero on success */
 static __inline int __ha_cas_dw(void *target, void *compare, void *set)
 {
 	void *value[2];