linux/arch/x86/lib
Matt Fleming bb5570ad3b x86/asm/64: Align start of __clear_user() loop to 16-bytes
x86 CPUs can suffer severe performance drops if a tight loop, such as
the ones in __clear_user(), straddles a 16-byte instruction fetch
window, or worse, a 64-byte cacheline. This issues was discovered in the
SUSE kernel with the following commit,

  1153933703 ("x86/asm/64: Micro-optimize __clear_user() - Use immediate constants")

which increased the code object size from 10 bytes to 15 bytes and
caused the 8-byte copy loop in __clear_user() to be split across a
64-byte cacheline.

Aligning the start of the loop to 16-bytes makes this fit neatly inside
a single instruction fetch window again and restores the performance of
__clear_user() which is used heavily when reading from /dev/zero.

Here are some numbers from running libmicro's read_z* and pread_z*
microbenchmarks which read from /dev/zero:

  Zen 1 (Naples)

  libmicro-file
                                        5.7.0-rc6              5.7.0-rc6              5.7.0-rc6
                                                    revert-1153933703d9+               align16+
  Time mean95-pread_z100k       9.9195 (   0.00%)      5.9856 (  39.66%)      5.9938 (  39.58%)
  Time mean95-pread_z10k        1.1378 (   0.00%)      0.7450 (  34.52%)      0.7467 (  34.38%)
  Time mean95-pread_z1k         0.2623 (   0.00%)      0.2251 (  14.18%)      0.2252 (  14.15%)
  Time mean95-pread_zw100k      9.9974 (   0.00%)      6.0648 (  39.34%)      6.0756 (  39.23%)
  Time mean95-read_z100k        9.8940 (   0.00%)      5.9885 (  39.47%)      5.9994 (  39.36%)
  Time mean95-read_z10k         1.1394 (   0.00%)      0.7483 (  34.33%)      0.7482 (  34.33%)

Note that this doesn't affect Haswell or Broadwell microarchitectures
which seem to avoid the alignment issue by executing the loop straight
out of the Loop Stream Detector (verified using perf events).

Fixes: 1153933703 ("x86/asm/64: Micro-optimize __clear_user() - Use immediate constants")
Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: <stable@vger.kernel.org> # v4.19+
Link: https://lkml.kernel.org/r/20200618102002.30034-1-matt@codeblueprint.co.uk
2020-06-19 18:32:11 +02:00
..
.gitignore .gitignore: add SPDX License Identifier 2020-03-25 11:50:48 +01:00
atomic64_32.c x86: Adjust asm constraints in atomic64 wrappers 2012-01-20 17:29:31 -08:00
atomic64_386_32.S x86/asm/32: Change all ENTRY+ENDPROC to SYM_FUNC_* 2019-10-18 12:03:43 +02:00
atomic64_cx8_32.S x86/asm/32: Change all ENTRY+ENDPROC to SYM_FUNC_* 2019-10-18 12:03:43 +02:00
cache-smp.c smp: Remove smp_call_function() and on_each_cpu() return values 2019-06-23 14:26:26 +02:00
checksum_32.S x86: Change {JMP,CALL}_NOSPEC argument 2020-04-30 20:14:34 +02:00
clear_page_64.S x86/asm: Change all ENTRY+ENDPROC to SYM_FUNC_* 2019-10-18 11:58:33 +02:00
cmdline.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 497 2019-06-19 17:09:53 +02:00
cmpxchg8b_emu.S x86/asm: Change all ENTRY+ENDPROC to SYM_FUNC_* 2019-10-18 11:58:33 +02:00
cmpxchg16b_emu.S x86/asm: Change all ENTRY+ENDPROC to SYM_FUNC_* 2019-10-18 11:58:33 +02:00
copy_page_64.S x86/asm: Change all ENTRY+ENDPROC to SYM_FUNC_* 2019-10-18 11:58:33 +02:00
copy_user_64.S x86/asm: Change all ENTRY+ENDPROC to SYM_FUNC_* 2019-10-18 11:58:33 +02:00
cpu.c x86/lib/cpu: Address missing prototypes warning 2019-08-08 08:25:53 +02:00
csum-copy_64.S x86/asm: Change all ENTRY+ENDPROC to SYM_FUNC_* 2019-10-18 11:58:33 +02:00
csum-partial_64.c x86/lib: Fix indentation issue, remove extra tab 2019-03-21 12:24:38 +01:00
csum-wrappers_64.c x86: switch both 32bit and 64bit to providing csum_and_copy_from_user() 2020-05-29 16:11:48 -04:00
delay.c x86/delay: Introduce TPAUSE delay 2020-05-07 16:06:20 +02:00
error-inject.c x86/asm: Mark all top level asm statements as .text 2019-04-19 17:46:55 +02:00
getuser.S x86/asm: Change all ENTRY+ENDPROC to SYM_FUNC_* 2019-10-18 11:58:33 +02:00
hweight.S x86/asm: Change all ENTRY+ENDPROC to SYM_FUNC_* 2019-10-18 11:58:33 +02:00
inat.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 156 2019-05-30 11:26:35 -07:00
insn-eval.c x86/insn-eval: Add support for 64-bit kernel mode 2019-12-30 20:17:15 +01:00
insn.c x86: xen: insn: Decode Xen and KVM emulate-prefix signature 2019-10-17 21:31:57 +02:00
iomap_copy_64.S x86/asm: Change all ENTRY+ENDPROC to SYM_FUNC_* 2019-10-18 11:58:33 +02:00
iomem.c x86: explicitly align IO accesses in memcpy_{to,from}io 2019-02-01 09:07:48 -08:00
kaslr.c x86/kaslr: Fix incorrect i8254 outb() parameters 2019-01-11 21:35:47 +01:00
Makefile kcsan, trace: Make KCSAN compatible with tracing 2020-03-21 09:44:41 +01:00
memcpy_32.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
memcpy_64.S x86/asm: Change all ENTRY+ENDPROC to SYM_FUNC_* 2019-10-18 11:58:33 +02:00
memmove_64.S x86/cpufeatures: Add support for fast short REP; MOVSB 2020-01-08 11:29:25 +01:00
memset_64.S x86/asm: Change all ENTRY+ENDPROC to SYM_FUNC_* 2019-10-18 11:58:33 +02:00
misc.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
mmx_32.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
msr-reg-export.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
msr-reg.S x86/asm: Change all ENTRY+ENDPROC to SYM_FUNC_* 2019-10-18 11:58:33 +02:00
msr-smp.c x86/msr: Make rdmsrl_safe_on_cpu() scheduling safe as well 2018-03-28 10:34:13 +02:00
msr.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
putuser.S x86/asm: Change all ENTRY+ENDPROC to SYM_FUNC_* 2019-10-18 11:58:33 +02:00
retpoline.S x86/retpoline: Fix retpoline unwind 2020-04-30 20:14:34 +02:00
string_32.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
strstr_32.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
usercopy_32.c docs/core-api/mm: fix user memory accessors formatting 2019-03-05 21:07:20 -08:00
usercopy_64.c x86/asm/64: Align start of __clear_user() loop to 16-bytes 2020-06-19 18:32:11 +02:00
usercopy.c x86/nmi: Fix NMI uaccess race against CR3 switching 2018-08-31 17:08:22 +02:00
x86-opcode-map.txt x86/insn: Add Control-flow Enforcement (CET) instructions to the opcode map 2020-03-26 12:21:40 +01:00