iv/linux

Linus Torvalds c228d294f2 x86: explicitly align IO accesses in memcpy_{to,from}io

In commit `170d13ca3a` ("x86: re-introduce non-generic memcpy_{to,from}io") I made our copy from IO space use a separate copy routine rather than rely on the generic memcpy. I did that because our generic memory copy isn't actually well-defined when it comes to internal access ordering or alignment, and will in fact depend on various CPUID flags.

In particular, the default memcpy() for a modern Intel CPU will generally be just a "rep movsb", which works reasonably well for medium-sized memory copies of regular RAM, since the CPU will turn it into fairly optimized microcode.

However, for non-cached memory and IO, "rep movs" ends up being horrendously slow and will just do the architectural "one byte at a time" accesses implied by the movsb.

At the other end of the spectrum, if you _don't_ end up using the "rep movsb" code, you'd likely fall back to the software copy, which does overlapping accesses for the tail, and may copy things backwards. Again, for regular memory that's fine, for IO memory not so much.

The thinking was that clearly nobody really cared (because things worked), but some people had seen horrible performance due to the byte accesses, so let's just revert back to our long ago version that did "rep movsl" for the bulk of the copy, and then fixed up the potentially last few bytes of the tail with "movsw/b".

Interestingly (and perhaps not entirely surprisingly), while that was our original memory copy implementation, and had been used before for IO, in the meantime many new users of memcpy_io() had come about. And while the access patterns for the memory copy weren't well-defined (so arguably _any_ access pattern should work), in practice the "rep movsb" case had been very common for the last several years.

In particular Jarkko Sakkinen reported that the memcpy_io() change resulted in weird errors from his Geminilake NUC TPM module.

And it turns out that the TPM TCG accesses according to spec require that the accesses be

 (a) done strictly sequentially

 (b) be naturally aligned

otherwise the TPM chip will abort the PCI transaction.

And, in fact, the tpm_crb.c driver did this:

	memcpy_fromio(buf, priv->rsp, 6);
	...
	memcpy_fromio(&buf[6], &priv->rsp[6], expected - 6);

which really should never have worked in the first place, but back before commit `170d13ca3a` it happened to work, because the memcpy_fromio() would be expanded to a regular memcpy, and

 (a) gcc would expand the first memcpy in-line, and turn it into a 4-byte and a 2-byte read, and they happened to be in the right order, and the alignment was right.

 (b) gcc would call "memcpy()" for the second one, and the machines that had this TPM chip also apparently ended up always having ERMS ("Enhanced REP MOVSB/STOSB instructions"), so we'd use the "rep movsb" for that copy.

In other words, basically by pure luck, the code happened to use the right access sizes in the (two different!) memcpy() implementations to make it all work.

But after commit `170d13ca3a`, both of the memcpy_fromio() calls resulted in a call to the routine with the consistent memory accesses, and in both cases it started out transferring with 4-byte accesses. Which worked for the first copy, but resulted in the second copy doing a 32-bit read at an address that was only 2-byte aligned.

Jarkko is actually fixing the fragile code in the TPM driver, but since this is an excellent example of why we absolutely must not use a generic memcpy for IO accesses, _and_ an IO-specific one really should strive to align the IO accesses, let's do exactly that.

Side note: Jarkko also noted that the driver had been used on ARM platforms, and had worked. That was because on 32-bit ARM, memcpy_io() ends up always doing byte accesses, and on 64-bit ARM it first does byte accesses to align to 8-byte boundaries, and then does 8-byte accesses for the bulk.

So ARM actually worked by design, and the x86 case worked by pure luck.

We _might_ want to make x86-64 do the 8-byte case too. That should be a pretty straightforward extension, but let's do one thing at a time. And generally MMIO accesses aren't really all that performance-critical, as shown by the fact that for a long time we just did them a byte at a time, and very few people ever noticed.

Reported-and-tested-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Tested-by: Jerry Snitselaar <jsnitsel@redhat.com>
Cc: David Laight <David.Laight@aculab.com>
Fixes: `170d13ca3a` ("x86: re-introduce non-generic memcpy_{to,from}io")
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>		2019-02-01 09:07:48 -08:00
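
The access pattern the commit message argues for can be modelled in plain userspace C. The sketch below is not the code in iomem.c (which does the copy with inline "movs" assembly); it only plans which bus accesses an aligning copy would issue. It assumes a leading byte and/or 16-bit access to reach 4-byte alignment, 32-bit accesses for the bulk, and a 16-bit and/or byte tail. The names `io_access`, `emit`, `plan_aligned_io_copy` and `is_naturally_aligned` are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* One simulated bus access: byte offset into the IO region, width in bytes. */
struct io_access {
	size_t offset;
	unsigned int width;
};

/* Append one access to the plan and advance the copy cursor past it. */
static void emit(struct io_access *out, size_t max, size_t *count,
		 size_t *from, size_t *n, unsigned int width)
{
	assert(*count < max);	/* caller must supply a large enough buffer */
	out[*count].offset = *from;
	out[*count].width = width;
	(*count)++;
	*from += width;
	*n -= width;
}

/*
 * Plan the accesses for copying n bytes starting at IO offset `from`.
 * Hypothetical model only: head accesses align the IO side to 4 bytes,
 * the bulk uses 4-byte accesses, and the tail uses a 2-byte and/or
 * 1-byte access. Accesses are emitted in strictly increasing order.
 * Returns the number of accesses written to `out`.
 */
static size_t plan_aligned_io_copy(size_t from, size_t n,
				   struct io_access *out, size_t max)
{
	size_t count = 0;

	if (n >= 1 && (from & 1))		/* odd offset: one byte first */
		emit(out, max, &count, &from, &n, 1);
	if (n >= 2 && (from & 2))		/* 2-byte aligned only: one word */
		emit(out, max, &count, &from, &n, 2);
	while (n >= 4)				/* bulk: aligned 32-bit accesses */
		emit(out, max, &count, &from, &n, 4);
	if (n & 2)				/* tail: remaining word */
		emit(out, max, &count, &from, &n, 2);
	if (n & 1)				/* tail: remaining byte */
		emit(out, max, &count, &from, &n, 1);

	return count;
}

/* An access is naturally aligned when its offset is a multiple of its width. */
static int is_naturally_aligned(const struct io_access *a)
{
	return a->offset % a->width == 0;
}
```

In this model, the two tpm_crb.c copies quoted above behave as follows, taking `expected` as 16 purely for illustration. `plan_aligned_io_copy(0, 6, ...)` plans a 4-byte access at offset 0 followed by a 2-byte access at offset 4. `plan_aligned_io_copy(6, 10, ...)` plans a 2-byte access at offset 6 followed by 4-byte accesses at offsets 8 and 12. Every planned access is sequential and naturally aligned, whereas starting the second copy with a 4-byte access at offset 6 would not be.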
..
.gitignore
atomic64_32.c
atomic64_386_32.S	x86/debug: Remove perpetually broken, unmaintainable dwarf annotations	2015-06-02 07:57:48 +02:00
atomic64_cx8_32.S	x86/debug: Remove perpetually broken, unmaintainable dwarf annotations	2015-06-02 07:57:48 +02:00
cache-smp.c	License cleanup: add SPDX GPL-2.0 license identifier to files with no license	2017-11-02 11:10:55 +01:00
checksum_32.S	x86/extable: Introduce _ASM_EXTABLE_UA for uaccess fixups	2018-09-03 15:12:09 +02:00
clear_page_64.S	x86/asm: Trim clear_page.S includes	2018-02-13 17:37:07 +01:00
cmdline.c	x86/boot: Add early cmdline parsing for options with arguments	2017-07-18 11:38:06 +02:00
cmpxchg8b_emu.S	x86: move exports to actual definitions	2016-08-07 23:47:15 -04:00
cmpxchg16b_emu.S	x86/debug: Remove perpetually broken, unmaintainable dwarf annotations	2015-06-02 07:57:48 +02:00
copy_page_64.S	License cleanup: add SPDX GPL-2.0 license identifier to files with no license	2017-11-02 11:10:55 +01:00
copy_user_64.S	x86/extable: Introduce _ASM_EXTABLE_UA for uaccess fixups	2018-09-03 15:12:09 +02:00
cpu.c	x86/cpu: Rename cpu_data.x86_mask to cpu_data.x86_stepping	2018-02-15 01:15:52 +01:00
csum-copy_64.S	x86/extable: Introduce _ASM_EXTABLE_UA for uaccess fixups	2018-09-03 15:12:09 +02:00
csum-partial_64.c	License cleanup: add SPDX GPL-2.0 license identifier to files with no license	2017-11-02 11:10:55 +01:00
csum-wrappers_64.c	Remove 'type' argument from access_ok() function	2019-01-03 18:57:57 -08:00
delay.c	Merge branch 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip	2018-01-30 13:01:09 -08:00
error-inject.c	x86/error_inject: Make just_return_func() globally visible	2018-02-13 14:33:35 +01:00
getuser.S	x86/extable: Introduce _ASM_EXTABLE_UA for uaccess fixups	2018-09-03 15:12:09 +02:00
hweight.S	License cleanup: add SPDX GPL-2.0 license identifier to files with no license	2017-11-02 11:10:55 +01:00
inat.c
insn-eval.c	x86/umip: Fix insn_get_code_seg_params()'s return value	2017-11-23 20:17:59 +01:00
insn.c	x86/insn: Add AVX-512 support to the instruction decoder	2016-07-21 09:37:11 -03:00
iomap_copy_64.S	x86/debug: Remove perpetually broken, unmaintainable dwarf annotations	2015-06-02 07:57:48 +02:00
iomem.c	x86: explicitly align IO accesses in memcpy_{to,from}io	2019-02-01 09:07:48 -08:00
kaslr.c	x86/kaslr: Fix incorrect i8254 outb() parameters	2019-01-11 21:35:47 +01:00
Makefile	kbuild: remove redundant target cleaning on failure	2019-01-06 09:46:51 +09:00
memcpy_32.c	License cleanup: add SPDX GPL-2.0 license identifier to files with no license	2017-11-02 11:10:55 +01:00
memcpy_64.S	x86/asm/64: Use 32-bit XOR to zero registers	2018-07-03 09:59:29 +02:00
memmove_64.S	License cleanup: add SPDX GPL-2.0 license identifier to files with no license	2017-11-02 11:10:55 +01:00
memset_64.S	License cleanup: add SPDX GPL-2.0 license identifier to files with no license	2017-11-02 11:10:55 +01:00
misc.c	License cleanup: add SPDX GPL-2.0 license identifier to files with no license	2017-11-02 11:10:55 +01:00
mmx_32.c	License cleanup: add SPDX GPL-2.0 license identifier to files with no license	2017-11-02 11:10:55 +01:00
msr-reg-export.c	License cleanup: add SPDX GPL-2.0 license identifier to files with no license	2017-11-02 11:10:55 +01:00
msr-reg.S	License cleanup: add SPDX GPL-2.0 license identifier to files with no license	2017-11-02 11:10:55 +01:00
msr-smp.c	x86/msr: Make rdmsrl_safe_on_cpu() scheduling safe as well	2018-03-28 10:34:13 +02:00
msr.c	License cleanup: add SPDX GPL-2.0 license identifier to files with no license	2017-11-02 11:10:55 +01:00
putuser.S	x86/extable: Introduce _ASM_EXTABLE_UA for uaccess fixups	2018-09-03 15:12:09 +02:00
retpoline.S	Revert "x86/retpoline: Simplify vmexit_fill_RSB()"	2018-02-20 09:38:26 +01:00
rwsem.S	locking/arch, x86: Add __down_read_killable()	2017-10-10 11:50:15 +02:00
string_32.c	License cleanup: add SPDX GPL-2.0 license identifier to files with no license	2017-11-02 11:10:55 +01:00
strstr_32.c	License cleanup: add SPDX GPL-2.0 license identifier to files with no license	2017-11-02 11:10:55 +01:00
usercopy_32.c	Remove 'type' argument from access_ok() function	2019-01-03 18:57:57 -08:00
usercopy_64.c	Remove 'type' argument from access_ok() function	2019-01-03 18:57:57 -08:00
usercopy.c	x86/nmi: Fix NMI uaccess race against CR3 switching	2018-08-31 17:08:22 +02:00
x86-opcode-map.txt	x86/decoder: Fix and update the opcodes map	2017-12-15 13:45:20 +01:00