linux/arch/powerpc
Simon Guo c2a4e54e8b powerpc/64: add 32 bytes prechecking before using VMX optimization on memcmp()
This patch is based on the previous VMX patch on memcmp().

To optimize ppc64 memcmp() with VMX instructions, we need to consider the
VMX penalty that comes with them: if the kernel uses VMX instructions, it
needs to save/restore the current thread's VMX registers. There are 32 x
128-bit VMX registers on PPC, which means 32 x 16 = 512 bytes to load and store.

The major concern regarding in-kernel memcmp() performance is KSM, which
uses memcmp() frequently to merge identical pages. So it makes sense to
take some measurements on KSM to see whether any improvement can be made
here.  Cyril Bur indicated in the following mail that memcmp() calls for
KSM have a high probability of failing (mismatching) within the first few
bytes:
	https://patchwork.ozlabs.org/patch/817322/#1773629
This patch is a follow-up on that observation.

Testing shows that KSM memcmp() calls usually fail within the first 32
bytes.  More specifically:
    - 76% of cases fail/mismatch within the first 16 bytes;
    - 83% of cases fail/mismatch within the first 32 bytes;
    - 84% of cases fail/mismatch within the first 64 bytes;
So 32 bytes looks like a better pre-checking size than the other choices.

Early failure is also common for memcmp() in non-KSM cases. With a
non-typical call load, ~73% of cases fail within the first 32 bytes.

This patch adds a 32-byte pre-check before jumping into VMX operations, to
avoid the unnecessary VMX penalty. It is not limited to the KSM case.
Testing shows a ~20% improvement in average memcmp() execution time with
this patch.

Note that the 32-byte pre-check is only performed when the compare size is
long enough (currently >= 4K) for the VMX path to be considered at all.
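
For illustration only, below is a minimal C sketch of the resulting control
flow, under stated assumptions: the real implementation is hand-written
assembly in arch/powerpc/lib/memcmp_64.S, and the names memcmp_sketch(),
scalar_memcmp(), vmx_memcmp_body() and VMX_COMPARE_THRESHOLD are
hypothetical stand-ins, not the actual symbols.

    /*
     * Minimal sketch (not the real implementation): the actual code is
     * powerpc assembly; all names below are hypothetical.
     */
    #include <stddef.h>

    #define VMX_COMPARE_THRESHOLD 4096  /* assumption: VMX only for >= 4K */

    /* Byte-wise compare standing in for the GPR-based loop. */
    static int scalar_memcmp(const unsigned char *a, const unsigned char *b,
                             size_t n)
    {
            size_t i;

            for (i = 0; i < n; i++) {
                    if (a[i] != b[i])
                            return a[i] < b[i] ? -1 : 1;
            }
            return 0;
    }

    /*
     * Stand-in for the vector body; the real code saves/restores the
     * thread's VMX state around 16-byte vector compares.
     */
    static int vmx_memcmp_body(const unsigned char *a, const unsigned char *b,
                               size_t n)
    {
            return scalar_memcmp(a, b, n);
    }

    static int memcmp_sketch(const void *pa, const void *pb, size_t n)
    {
            const unsigned char *a = pa, *b = pb;

            if (n >= VMX_COMPARE_THRESHOLD) {
                    /*
                     * 32-byte pre-check with ordinary loads: most callers
                     * (e.g. KSM) mismatch here, so the 512-byte VMX register
                     * save/restore is skipped entirely.
                     */
                    int diff = scalar_memcmp(a, b, 32);

                    if (diff)
                            return diff;

                    /* Identical so far: pay the VMX entry cost for the rest. */
                    return vmx_memcmp_body(a + 32, b + 32, n - 32);
            }

            /* Short compares never take the VMX path. */
            return scalar_memcmp(a, b, n);
    }

The point of the split is that the common mismatch-early case only ever
touches GPRs; the 512-byte VMX register save/restore is paid only by
compares that really are long and largely identical.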

The detailed data and analysis are at:
https://github.com/justdoitqd/publicFiles/blob/master/memcmp/README.md

Signed-off-by: Simon Guo <wei.guo.simon@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-07-24 22:03:21 +10:00
boot powerpc/dts: Use a correct at24 compatible fallback in ac14xx 2018-07-10 10:58:40 +10:00
configs powerpc: Add ppc32_allmodconfig defconfig target 2018-07-24 22:03:15 +10:00
crypto crypto: hash - annotate algorithms taking optional key 2018-01-12 23:03:35 +11:00
include powerpc/64: enhance memcmp() with VMX instruction for long bytes comparision 2018-07-24 22:03:21 +10:00
kernel powerpc/mm: Check memblock_add against MAX_PHYSMEM_BITS range 2018-07-24 22:03:16 +10:00
kvm powerpc/powernv/ioda: Allocate indirect TCE levels on demand 2018-07-16 22:53:11 +10:00
lib powerpc/64: add 32 bytes prechecking before using VMX optimization on memcmp() 2018-07-24 22:03:21 +10:00
math-emu License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
mm powerpc/mm/hash: Reduce contention on hpte lock 2018-07-24 22:03:18 +10:00
net treewide: kzalloc() -> kcalloc() 2018-06-12 16:19:22 -07:00
oprofile treewide: kzalloc() -> kcalloc() 2018-06-12 16:19:22 -07:00
perf powerpc/64s: Remove POWER9 DD1 support 2018-07-16 11:37:21 +10:00
platforms powerpc/pseries/mm: Improve error reporting on HCALL failures 2018-07-24 22:03:19 +10:00
purgatory License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
sysdev powerpc/mpic: Pass first free vector number to mpic_setup_error_int() 2018-07-19 21:58:09 +10:00
tools powerpc/kbuild: move -mprofile-kernel check to Kconfig 2018-06-11 09:16:29 +09:00
xmon Merge branch 'topic/ppc-kvm' into next 2018-07-19 14:37:57 +10:00
Kconfig powerpc: Enable kernel XZ compression option on BOOK3S_32 2018-07-04 22:41:10 +10:00
Kconfig.debug powerpc: Add new kconfig CONFIG_PPC_IRQ_SOFT_MASK_DEBUG 2018-01-19 22:37:03 +11:00
Makefile powerpc: Add ppc64le and ppc64_book3e allmodconfig targets 2018-07-24 22:03:16 +10:00
Makefile.postlink License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00