61 Commits

Author SHA1 Message Date
Masahiro Yamada
94ea9c0521 x86/headers: Replace #include <asm/export.h> with #include <linux/export.h>
The following commit:

  ddb5cdbafaaa ("kbuild: generate KSYMTAB entries by modpost")

deprecated <asm/export.h>, which is now a wrapper of <linux/export.h>.

Use <linux/export.h> in *.S as well as in *.c files.

After all the <asm/export.h> lines are replaced, <asm/export.h> and
<asm-generic/export.h> will be removed.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20230806145958.380314-2-masahiroy@kernel.org
2023-10-03 10:38:07 +02:00
Mateusz Guzik
ca96b162bf x86: bring back rep movsq for user access on CPUs without ERMS
Intel CPUs ship with ERMS for over a decade, but this is not true for
AMD.  In particular one reasonably recent uarch (EPYC 7R13) does not
have it (or at least the bit is inactive when running on the Amazon EC2
cloud -- I found rather conflicting information about AMD CPUs vs the
extension).

Hand-rolled mov loops executing in this case are quite pessimal compared
to rep movsq for bigger sizes.  While the upper limit depends on uarch,
everyone is well south of 1KB AFAICS and sizes bigger than that are
common.

While technically ancient CPUs may be suffering from rep usage, gcc has
been emitting it for years all over kernel code, so I don't think this
is a legitimate concern.

Sample result from read1_processes from will-it-scale (4KB reads/s):

  before:   1507021
  after:    1721828 (+14%)

Note that the cutoff point for rep usage is set to 64 bytes, which is
way too conservative but I'm sticking to what was done in 47ee3f1dd93b
("x86: re-introduce support for ERMS copies for user space accesses").
That is to say *some* copies will now go slower, which is fixable but
beyond the scope of this patch.

Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2023-08-30 09:45:12 -07:00
Linus Torvalds
47ee3f1dd9 x86: re-introduce support for ERMS copies for user space accesses
I tried to streamline our user memory copy code fairly aggressively in
commit adfcf4231b8c ("x86: don't use REP_GOOD or ERMS for user memory
copies"), in order to then be able to clean up the code and inline the
modern FSRM case in commit 577e6a7fd50d ("x86: inline the 'rep movs' in
user copies for the FSRM case").

We had reports [1] of that causing regressions earlier with blogbench,
but that turned out to be a horrible benchmark for that case, and not a
sufficient reason for re-instating "rep movsb" on older machines.

However, now Eric Dumazet reported [2] a regression in performance that
seems to be a rather more real benchmark, where due to the removal of
"rep movs" a TCP stream over a 100Gbps network no longer reaches line
speed.

And it turns out that with the simplified the calling convention for the
non-FSRM case in commit 427fda2c8a49 ("x86: improve on the non-rep
'copy_user' function"), re-introducing the ERMS case is actually fairly
simple.

Of course, that "fairly simple" is glossing over several missteps due to
having to fight our assembler alternative code.  This code really wanted
to rewrite a conditional branch to have two different targets, but that
made objtool sufficiently unhappy that this instead just ended up doing
a choice between "jump to the unrolled loop, or use 'rep movsb'
directly".

Let's see if somebody finds a case where the kernel memory copies also
care (see commit 68674f94ffc9: "x86: don't use REP_GOOD or ERMS for
small memory copies").  But Eric does argue that the user copies are
special because networking tries to copy up to 32KB at a time, if
order-3 pages allocations are possible.

In-kernel memory copies are typically small, unless they are the special
"copy pages at a time" kind that still use "rep movs".

Link: https://lore.kernel.org/lkml/202305041446.71d46724-yujie.liu@intel.com/ [1]
Link: https://lore.kernel.org/lkml/CANn89iKUbyrJ=r2+_kK+sb2ZSSHifFZ7QkPLDpAtkJ8v4WUumA@mail.gmail.com/ [2]
Reported-and-tested-by: Eric Dumazet <edumazet@google.com>
Fixes: adfcf4231b8c ("x86: don't use REP_GOOD or ERMS for user memory copies")
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2023-05-26 12:34:20 -07:00
Linus Torvalds
034ff37d34 x86: rewrite '__copy_user_nocache' function
I didn't really want to do this, but as part of all the other changes to
the user copy loops, I've been looking at this horror.

I tried to clean it up multiple times, but every time I just found more
problems, and the way it's written, it's just too hard to fix them.

For example, the code is written to do quad-word alignment, and will use
regular byte accesses to get to that point.  That's fairly simple, but
it means that any initial 8-byte alignment will be done with cached
copies.

However, the code then is very careful to do any 4-byte _tail_ accesses
using an uncached 4-byte write, and that was claimed to be relevant in
commit a82eee742452 ("x86/uaccess/64: Handle the caching of 4-byte
nocache copies properly in __copy_user_nocache()").

So if you do a 4-byte copy using that function, it carefully uses a
4-byte 'movnti' for the destination.  But if you were to do a 12-byte
copy that is 4-byte aligned, it would _not_ do a 4-byte 'movnti'
followed by a 8-byte 'movnti' to keep it all uncached.

Instead, it would align the destination to 8 bytes using a
byte-at-a-time loop, and then do a 8-byte 'movnti' for the final 8
bytes.

The main caller that cares is __copy_user_flushcache(), which knows
about this insanity, and has odd cases for it all.  But I just can't
deal with looking at this kind of "it does one case right, and another
related case entirely wrong".

And the code really wasn't fixable without hard drugs, which I try to
avoid.

So instead, rewrite it in a form that hopefully not only gets this
right, but is a bit more maintainable.  Knock wood.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2023-04-20 18:53:49 -07:00
Linus Torvalds
e1f2750edc x86: remove 'zerorest' argument from __copy_user_nocache()
Every caller passes in zero, meaning they don't want any partial copy to
zero the remainder of the destination buffer.

Which is just as well, because the implementation of that function
didn't actually even look at that argument, and wasn't even aware it
existed, although some misleading comments did mention it still.

The 'zerorest' thing is a historical artifact of how "copy_from_user()"
worked, in that it would zero the rest of the kernel buffer that it
copied into.

That zeroing still exists, but it's long since been moved to generic
code, and the raw architecture-specific code doesn't do it.  See
_copy_from_user() in lib/usercopy.c for this all.

However, while __copy_user_nocache() shares some history and superficial
other similarities with copy_from_user(), it is in many ways also very
different.

In particular, while the code makes it *look* similar to the generic
user copy functions that can copy both to and from user space, and take
faults on both reads and writes as a result, __copy_user_nocache() does
no such thing at all.

__copy_user_nocache() always copies to kernel space, and will never take
a page fault on the destination.  What *can* happen, though, is that the
non-temporal stores take a machine check because one of the use cases is
for writing to stable memory, and any memory errors would then take
synchronous faults.

So __copy_user_nocache() does look a lot like copy_from_user(), but has
faulting behavior that is more akin to our old copy_in_user() (which no
longer exists, but copied from user space to user space and could fault
on both source and destination).

And it very much does not have the "zero the end of the destination
buffer", since a problem with the destination buffer is very possibly
the very source of the partial copy.

So this whole thing was just a confusing historical artifact from having
shared some code with a completely different function with completely
different use cases.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2023-04-19 19:09:52 -07:00
Linus Torvalds
427fda2c8a x86: improve on the non-rep 'copy_user' function
The old 'copy_user_generic_unrolled' function was oddly implemented for
largely historical reasons: it had been largely based on the uncached
copy case, which has some other concerns.

For example, the __copy_user_nocache() function uses 'movnti' for the
destination stores, and those want the destination to be aligned.  In
contrast, the regular copy function doesn't really care, and trying to
align things only complicates matters.

Also, like the clear_user function, the copy function had some odd
handling of the repeat counts, complicating the exception handling for
no really good reason.  So as with clear_user, just write it to keep all
the byte counts in the %rcx register, exactly like the 'rep movs'
functionality that this replaces.

Unlike a real 'rep movs', we do allow for this to trash a few temporary
registers to not have to unnecessarily save/restore registers on the
stack.

And like the clearing case, rename this to what it now clearly is:
'rep_movs_alternative', and make it one coherent function, so that it
shows up as such in profiles (instead of the odd split between
"copy_user_generic_unrolled" and "copy_user_short_string", the latter of
which was not about strings at all, and which was shared with the
uncached case).

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2023-04-18 17:05:28 -07:00
Linus Torvalds
577e6a7fd5 x86: inline the 'rep movs' in user copies for the FSRM case
This does the same thing for the user copies as commit 0db7058e8e23
("x86/clear_user: Make it faster") did for clear_user().  In other
words, it inlines the "rep movs" case when X86_FEATURE_FSRM is set,
avoiding the function call entirely.

In order to do that, it makes the calling convention for the out-of-line
case ("copy_user_generic_unrolled") match the 'rep movs' calling
convention, although it does also end up clobbering a number of
additional registers.

Also, to simplify code sharing in the low-level assembly with the
__copy_user_nocache() function (that uses the normal C calling
convention), we end up with a kind of mixed return value for the
low-level asm code: it will return the result in both %rcx (to work as
an alternative for the 'rep movs' case), _and_ in %rax (for the nocache
case).

We could avoid this by wrapping __copy_user_nocache() callers in an
inline asm, but since the cost is just an extra register copy, it's
probably not worth it.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2023-04-18 17:05:28 -07:00
Linus Torvalds
3639a53558 x86: move stac/clac from user copy routines into callers
This is preparatory work for inlining the 'rep movs' case, but also a
cleanup.  The __copy_user_nocache() function was mis-used by the rdma
code to do uncached kernel copies that don't actually want user copies
at all, and as a result doesn't want the stac/clac either.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2023-04-18 17:05:28 -07:00
Linus Torvalds
adfcf4231b x86: don't use REP_GOOD or ERMS for user memory copies
The modern target to use is FSRM (Fast Short REP MOVS), and the other
cases should only be used for bigger areas (ie mainly things like page
clearing).

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2023-04-18 17:05:28 -07:00
Josh Poimboeuf
02041b3225 x86/uaccess: Don't jump between functions
For unwinding sanity, a function shouldn't jump to the middle of another
function.  Move the short string user copy code out to a separate
non-function code snippet.

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/9519e4853148b765e047967708f2b61e56c93186.1649718562.git.jpoimboe@redhat.com
2022-04-19 21:58:53 +02:00
Linus Torvalds
64ad946152 - Get rid of all the .fixup sections because this generates
misleading/wrong stacktraces and confuse RELIABLE_STACKTRACE and
 LIVEPATCH as the backtrace misses the function which is being fixed up.
 
 - Add Straight Light Speculation mitigation support which uses a new
 compiler switch -mharden-sls= which sticks an INT3 after a RET or an
 indirect branch in order to block speculation after them. Reportedly,
 CPUs do speculate behind such insns.
 
 - The usual set of cleanups and improvements
 -----BEGIN PGP SIGNATURE-----
 
 iQIyBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmHfKA0ACgkQEsHwGGHe
 VUqLJg/2I2X2xXr5filJVaK+sQgmvDzk67DKnbxRBW2xcPF+B5sSW5yhe3G5UPW7
 SJVdhQ3gHcTiliGGlBf/VE7KXbqxFN0vO4/VFHZm78r43g7OrXTxz6WXXQRJ1n67
 U3YwRH3b6cqXZNFMs+X4bJt6qsGJM1kdTTZ2as4aERnaFr5AOAfQvfKbyhxLe/XA
 3SakfYISVKCBQ2RkTfpMpwmqlsatGFhTC5IrvuDQ83dDsM7O+Dx1J6Gu3fwjKmie
 iVzPOjCh+xTpZQp/SIZmt7MzoduZvpSym4YVyHvEnMiexQT4AmyaRthWqrhnEXY/
 qOvj8/XIqxmix8EaooGqRIK0Y2ZegxkPckNFzaeC3lsWohwMIGIhNXwHNEeuhNyH
 yvNGAW9Cq6NeDRgz5MRUXcimYw4P4oQKYLObS1WqFZhNMqm4sNtoEAYpai/lPYfs
 zUDckgXF2AoPOsSqy3hFAVaGovAgzfDaJVzkt0Lk4kzzjX2WQiNLhmiior460w+K
 0l2Iej58IajSp3MkWmFH368Jo8YfUVmkjbbpsmjsBppA08e1xamJB7RmswI/Ezj6
 s5re6UioCD+UYdjWx41kgbvYdvIkkZ2RLrktoZd/hqHrOLWEIiwEbyFO2nRFJIAh
 YjvPkB1p7iNuAeYcP1x9Ft9GNYVIsUlJ+hK86wtFCqy+abV+zQ==
 =R52z
 -----END PGP SIGNATURE-----

Merge tag 'x86_core_for_v5.17_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 core updates from Borislav Petkov:

 - Get rid of all the .fixup sections because this generates
   misleading/wrong stacktraces and confuse RELIABLE_STACKTRACE and
   LIVEPATCH as the backtrace misses the function which is being fixed
   up.

 - Add Straight Line Speculation mitigation support which uses a new
   compiler switch -mharden-sls= which sticks an INT3 after a RET or an
   indirect branch in order to block speculation after them. Reportedly,
   CPUs do speculate behind such insns.

 - The usual set of cleanups and improvements

* tag 'x86_core_for_v5.17_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (32 commits)
  x86/entry_32: Fix segment exceptions
  objtool: Remove .fixup handling
  x86: Remove .fixup section
  x86/word-at-a-time: Remove .fixup usage
  x86/usercopy: Remove .fixup usage
  x86/usercopy_32: Simplify __copy_user_intel_nocache()
  x86/sgx: Remove .fixup usage
  x86/checksum_32: Remove .fixup usage
  x86/vmx: Remove .fixup usage
  x86/kvm: Remove .fixup usage
  x86/segment: Remove .fixup usage
  x86/fpu: Remove .fixup usage
  x86/xen: Remove .fixup usage
  x86/uaccess: Remove .fixup usage
  x86/futex: Remove .fixup usage
  x86/msr: Remove .fixup usage
  x86/extable: Extend extable functionality
  x86/entry_32: Remove .fixup usage
  x86/entry_64: Remove .fixup usage
  x86/copy_mc_64: Remove .fixup usage
  ...
2022-01-12 16:31:19 -08:00
Linus Torvalds
7e740ae635 - First part of a series to move the AMD address translation code from
arch/x86/ to amd64_edac as that is its only user anyway
 
 - Some MCE error injection improvements to the AMD side
 
 - Reorganization of the #MC handler code and the facilities it calls to
 make it noinstr-safe
 
 - Add support for new AMD MCA bank types and non-uniform banks layout
 
 - The usual set of cleanups and fixes
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmHcGZ4ACgkQEsHwGGHe
 VUr6Zw//WBvNvfV/akQGsvVo94G0DaF+buYB+Tl1p0goMd7QfKA5iHxjB1alEJC2
 dTchIr7pjiiE3nr4svuWLLQZamx8kMwQNqipioBHXg3YThj0wD4PbUOC9TlIceBR
 3yxVbvwlD7Y7sb2PII6IMlagzTiIeW0/ps29DHFr5vqDBvEanNdAHoV/h2vQi+76
 Ma96psIxzTMSk11yGB6l9k66EASCdDGBU7sODjup7wuQmuRaQ/1oJAWY0wIJvJez
 frjpaz/YKmlTwTf9bxoJbky2FkeBsD4yXXUGwjDgMq0EyUUaeSbvaQkm8gSHX9Yr
 VDDv1WvT6QIw6x7Wc4skS8lWmZghNBbAHOoNS31BPJ2IDmFWkF5Q2bNEuHrtU4EC
 0mkNeyN6x48L/F8j/1aE/tm+SjiGexZX4zhi6MNWReTV140I1zqQq/r7CCu5+MEa
 PAB1YH/96k2dMPT6mbFrRIFJmkDuBuZOAkuwYWEjO/XjPl2SGBGj1jKolWW3qjRR
 Po7vBJnDt7wgigWFh6+R4rJv+fh87XfB7B2wEOt4Yn37jUkK6dNRIy0zFmDaC1J2
 bHgsJbWC+Sgs1G57gnYABJYzLj7RRdDyCu1/UUVyBBP7/WfZJw0kjABE7p3AaYTd
 15JV1L0c/Ypuv05LJf40LkyF2F5w2fnP5QM2Rr8U4xW/GumEyWs=
 =8Hu7
 -----END PGP SIGNATURE-----

Merge tag 'ras_core_for_v5.17_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull RAS updates from Borislav Petkov:
 "A relatively big amount of movements in RAS-land this time around:

   - First part of a series to move the AMD address translation code
     from arch/x86/ to amd64_edac as that is its only user anyway

   - Some MCE error injection improvements to the AMD side

   - Reorganization of the #MC handler code and the facilities it calls
     to make it noinstr-safe

   - Add support for new AMD MCA bank types and non-uniform banks layout

   - The usual set of cleanups and fixes"

* tag 'ras_core_for_v5.17_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (24 commits)
  x86/mce: Reduce number of machine checks taken during recovery
  x86/mce/inject: Avoid out-of-bounds write when setting flags
  x86/MCE/AMD, EDAC/mce_amd: Support non-uniform MCA bank type enumeration
  x86/MCE/AMD, EDAC/mce_amd: Add new SMCA bank types
  x86/mce: Check regs before accessing it
  x86/mce: Mark mce_start() noinstr
  x86/mce: Mark mce_timed_out() noinstr
  x86/mce: Move the tainting outside of the noinstr region
  x86/mce: Mark mce_read_aux() noinstr
  x86/mce: Mark mce_end() noinstr
  x86/mce: Mark mce_panic() noinstr
  x86/mce: Prevent severity computation from being instrumented
  x86/mce: Allow instrumentation during task work queueing
  x86/mce: Remove noinstr annotation from mce_setup()
  x86/mce: Use mce_rdmsrl() in severity checking code
  x86/mce: Remove function-local cpus variables
  x86/mce: Do not use memset to clear the banks bitmaps
  x86/mce/inject: Set the valid bit in MCA_STATUS before error injection
  x86/mce/inject: Check if a bank is populated before injecting
  x86/mce: Get rid of cpu_missing
  ...
2022-01-10 11:43:09 -08:00
Youquan Song
3376136300 x86/mce: Reduce number of machine checks taken during recovery
When any of the copy functions in arch/x86/lib/copy_user_64.S take a
fault, the fixup code copies the remaining byte count from %ecx to %edx
and unconditionally jumps to .Lcopy_user_handle_tail to continue the
copy in case any more bytes can be copied.

If the fault was #PF this may copy more bytes (because the page fault
handler might have fixed the fault). But when the fault is a machine
check the original copy code will have copied all the way to the poisoned
cache line. So .Lcopy_user_handle_tail will just take another machine
check for no good reason.

Every code path to .Lcopy_user_handle_tail comes from an exception fixup
path, so add a check there to check the trap type (in %eax) and simply
return the count of remaining bytes if the trap was a machine check.

Doing this reduces the number of machine checks taken during synthetic
tests from four to three.

As well as reducing the number of machine checks, this also allows
Skylake generation Xeons to recover some cases that currently fail. The
is because REP; MOVSB is only recoverable when source and destination
are well aligned and the byte count is large. That useless call to
.Lcopy_user_handle_tail may violate one or more of these conditions and
generate a fatal machine check.

  [ Tony: Add more details to commit message. ]
  [ bp: Fixup comment.
    Also, another tip patchset which is adding straight-line speculation
    mitigation changes the "ret" instruction to an all-caps macro "RET".
    But, since gas is case-insensitive, use "RET" in the newly added asm block
    already in order to simplify tip branch merging on its way upstream.
  ]

Signed-off-by: Youquan Song <youquan.song@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lore.kernel.org/r/YcTW5dh8yTGucDd+@agluck-desk2.amr.corp.intel.com
2021-12-31 18:22:32 +01:00
Tony Luck
244122b4d2 x86/lib: Add fast-short-rep-movs check to copy_user_enhanced_fast_string()
Commit

  f444a5ff95dc ("x86/cpufeatures: Add support for fast short REP; MOVSB")

fixed memmove() with an ALTERNATIVE that will use REP MOVSB for all
string lengths.

copy_user_enhanced_fast_string() has a similar run time check to avoid
using REP MOVSB for copies less that 64 bytes.

Add an ALTERNATIVE to patch out the short length check and always use
REP MOVSB on X86_FEATURE_FSRM CPUs.

Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lore.kernel.org/r/20211216172431.1396371-1-tony.luck@intel.com
2021-12-29 13:46:02 +01:00
Peter Zijlstra
acba44d243 x86/copy_user_64: Remove .fixup usage
Place the anonymous .fixup code at the tail of the regular functions.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Reviewed-by: Borislav Petkov <bp@suse.de>
Link: https://lore.kernel.org/r/20211110101325.068505810@infradead.org
2021-12-11 09:09:45 +01:00
Peter Zijlstra
f94909ceb1 x86: Prepare asm files for straight-line-speculation
Replace all ret/retq instructions with RET in preparation of making
RET a macro. Since AS is case insensitive it's a big no-op without
RET defined.

  find arch/x86/ -name \*.S | while read file
  do
	sed -i 's/\<ret[q]*\>/RET/' $file
  done

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lore.kernel.org/r/20211204134907.905503893@infradead.org
2021-12-08 12:25:37 +01:00
Tony Luck
690658471b x86/mce: Drop copyin special case for #MC
Fixes to the iterator code to handle faults that are not on page
boundaries mean that the special case for machine check during copy from
user is no longer needed.

For a full list of those fixes, see the output of:

  git log --oneline v5.14 ^v5.13 -- lib/iov_iter.c

Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20210818002942.1607544-4-tony.luck@intel.com
2021-09-20 21:18:23 +02:00
Juergen Gross
5e21a3ecad x86/alternative: Merge include files
Merge arch/x86/include/asm/alternative-asm.h into
arch/x86/include/asm/alternative.h in order to make it easier to use
common definitions later.

Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20210311142319.4723-2-jgross@suse.com
2021-03-11 15:58:02 +01:00
Tony Luck
a2f73400e4 x86/mce: Avoid tail copy when machine check terminated a copy from user
In the page fault case it is ok to see if a few more unaligned bytes
can be copied from the source address. Worst case is that the page fault
will be triggered again.

Machine checks are more serious. Just give up at the point where the
main copy loop triggered the #MC and return from the copy code as if
the copy succeeded. The machine check handler will use task_work_add() to
make sure that the task is sent a SIGBUS.

Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20201006210910.21062-5-tony.luck@intel.com
2020-10-07 11:26:56 +02:00
Youquan Song
278b917f8c x86/mce: Add _ASM_EXTABLE_CPY for copy user access
_ASM_EXTABLE_UA is a general exception entry to record the exception fixup
for all exception spots between kernel and user space access.

To enable recovery from machine checks while coping data from user
addresses it is necessary to be able to distinguish the places that are
looping copying data from those that copy a single byte/word/etc.

Add a new macro _ASM_EXTABLE_CPY and use it in place of _ASM_EXTABLE_UA
in the copy functions.

Record the exception reason number to regs->ax at
ex_handler_uaccess which is used to check MCE triggered.

The new fixup routine ex_handler_copy() is almost an exact copy of
ex_handler_uaccess() The difference is that it sets regs->ax to the trap
number. Following patches use this to avoid trying to copy remaining
bytes from the tail of the copy and possibly hitting the poison again.

New mce.kflags bit MCE_IN_KERNEL_COPYIN will be used by mce_severity()
calculation to indicate that a machine check is recoverable because the
kernel was copying from user space.

Signed-off-by: Youquan Song <youquan.song@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20201006210910.21062-4-tony.luck@intel.com
2020-10-07 11:19:11 +02:00
Jiri Slaby
6dcc5627f6 x86/asm: Change all ENTRY+ENDPROC to SYM_FUNC_*
These are all functions which are invoked from elsewhere, so annotate
them as global using the new SYM_FUNC_START and their ENDPROC's by
SYM_FUNC_END.

Make sure ENTRY/ENDPROC is not defined on X86_64, given these were the
last users.

Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> [hibernate]
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> [xen bits]
Acked-by: Herbert Xu <herbert@gondor.apana.org.au> [crypto]
Cc: Allison Randal <allison@lohutok.net>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Andy Shevchenko <andy@infradead.org>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Armijn Hemel <armijn@tjaldur.nl>
Cc: Cao jin <caoj.fnst@cn.fujitsu.com>
Cc: Darren Hart <dvhart@infradead.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Enrico Weigelt <info@metux.net>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jim Mattson <jmattson@google.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Kate Stewart <kstewart@linuxfoundation.org>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: kvm ML <kvm@vger.kernel.org>
Cc: Len Brown <len.brown@intel.com>
Cc: linux-arch@vger.kernel.org
Cc: linux-crypto@vger.kernel.org
Cc: linux-efi <linux-efi@vger.kernel.org>
Cc: linux-efi@vger.kernel.org
Cc: linux-pm@vger.kernel.org
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: platform-driver-x86@vger.kernel.org
Cc: "Radim Krčmář" <rkrcmar@redhat.com>
Cc: Sean Christopherson <sean.j.christopherson@intel.com>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: "Steven Rostedt (VMware)" <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Wanpeng Li <wanpengli@tencent.com>
Cc: Wei Huang <wei@redhat.com>
Cc: x86-ml <x86@kernel.org>
Cc: xen-devel@lists.xenproject.org
Cc: Xiaoyao Li <xiaoyao.li@linux.intel.com>
Link: https://lkml.kernel.org/r/20191011115108.12392-25-jslaby@suse.cz
2019-10-18 11:58:33 +02:00
Jiri Slaby
fa97220196 x86/uaccess: Annotate local function
.Lcopy_user_handle_tail is a self-standing local function, annotate it
as such using SYM_CODE_START_LOCAL.

Again, no functional change, just documentation.

Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: linux-arch@vger.kernel.org
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/20191011115108.12392-9-jslaby@suse.cz
2019-10-18 10:31:42 +02:00
Jiri Slaby
98ededb61f x86/asm: Make some functions local labels
Boris suggests to make a local label (prepend ".L") to these functions
to eliminate them from the symbol table. These are functions with very
local names and really should not be visible anywhere.

Note that objtool won't see these functions anymore (to generate ORC
debug info). But all the functions are not annotated with ENDPROC, so
they won't have objtool's attention anyway.

Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Cao jin <caoj.fnst@cn.fujitsu.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steve Winslow <swinslow@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Wei Huang <wei@redhat.com>
Cc: x86-ml <x86@kernel.org>
Cc: Xiaoyao Li <xiaoyao.li@linux.intel.com>
Link: https://lkml.kernel.org/r/20190906075550.23435-2-jslaby@suse.cz
2019-09-06 10:41:11 +02:00
Josh Poimboeuf
3a6ab4bcc5 x86/uaccess: Remove ELF function annotation from copy_user_handle_tail()
After an objtool improvement, it's complaining about the CLAC in
copy_user_handle_tail():

  arch/x86/lib/copy_user_64.o: warning: objtool: .altinstr_replacement+0x12: redundant UACCESS disable
  arch/x86/lib/copy_user_64.o: warning: objtool:   copy_user_handle_tail()+0x6: (alt)
  arch/x86/lib/copy_user_64.o: warning: objtool:   copy_user_handle_tail()+0x2: (alt)
  arch/x86/lib/copy_user_64.o: warning: objtool:   copy_user_handle_tail()+0x0: <=== (func)

copy_user_handle_tail() is incorrectly marked as a callable function, so
objtool is rightfully concerned about the CLAC with no corresponding
STAC.

Remove the ELF function annotation.  The copy_user_handle_tail() code
path is already verified by objtool because it's jumped to by other
callable asm code (which does the corresponding STAC).

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/6b6e436774678b4b9873811ff023bd29935bee5b.1563413318.git.jpoimboe@redhat.com
2019-07-18 21:01:05 +02:00
Thomas Gleixner
3fc2175113 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 224
Based on 1 normalized pattern(s):

  subject to the gnu public license v2

extracted by the scancode license scanner the SPDX license identifier

  GPL-2.0-only

has been chosen to replace the boilerplate/reference in 1 file(s).

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Alexios Zavras <alexios.zavras@intel.com>
Reviewed-by: Allison Randal <allison@lohutok.net>
Reviewed-by: Steve Winslow <swinslow@gmail.com>
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190528171440.222651153@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-30 11:29:55 -07:00
Peter Zijlstra
3693ca8115 x86/uaccess: Move copy_user_handle_tail() into asm
By writing the function in asm we avoid cross object code flow and
objtool no longer gets confused about a 'stray' CLAC.

Also; the asm version is actually _simpler_.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-04-03 09:36:29 +02:00
Jann Horn
75045f77f7 x86/extable: Introduce _ASM_EXTABLE_UA for uaccess fixups
Currently, most fixups for attempting to access userspace memory are
handled using _ASM_EXTABLE, which is also used for various other types of
fixups (e.g. safe MSR access, IRET failures, and a bunch of other things).
In order to make it possible to add special safety checks to uaccess fixups
(in particular, checking whether the fault address is actually in
userspace), introduce a new exception table handler ex_handler_uaccess()
and wire it up to all the user access fixups (excluding ones that
already use _ASM_EXTABLE_EX).

Signed-off-by: Jann Horn <jannh@google.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Kees Cook <keescook@chromium.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: kernel-hardening@lists.openwall.com
Cc: dvyukov@google.com
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: "Naveen N. Rao" <naveen.n.rao@linux.vnet.ibm.com>
Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: linux-fsdevel@vger.kernel.org
Cc: Borislav Petkov <bp@alien8.de>
Link: https://lkml.kernel.org/r/20180828201421.157735-5-jannh@google.com
2018-09-03 15:12:09 +02:00
Paolo Abeni
236222d393 x86/uaccess: Optimize copy_user_enhanced_fast_string() for short strings
According to the Intel datasheet, the REP MOVSB instruction
exposes a pretty heavy setup cost (50 ticks), which hurts
short string copy operations.

This change tries to avoid this cost by calling the explicit
loop available in the unrolled code for strings shorter
than 64 bytes.

The 64 bytes cutoff value is arbitrary from the code logic
point of view - it has been selected based on measurements,
as the largest value that still ensures a measurable gain.

Micro benchmarks of the __copy_from_user() function with
lengths in the [0-63] range show this performance gain
(shorter the string, larger the gain):

 - in the [55%-4%] range on Intel Xeon(R) CPU E5-2690 v4
 - in the [72%-9%] range on Intel Core i7-4810MQ

Other tested CPUs - namely Intel Atom S1260 and AMD Opteron
8216 - show no difference, because they do not expose the
ERMS feature bit.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Alan Cox <gnomes@lxorguk.ukuu.org.uk>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/4533a1d101fd460f80e21329a34928fad521c1d4.1498744345.git.pabeni@redhat.com
[ Clarified the changelog. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-06-30 09:52:51 +02:00
Borislav Petkov
adb402cd14 x86/copy_user: Unify the code by removing the 64-bit asm _copy_*_user() variants
We already have the same functionality in usercopy_32.c. Share it with
64-bit and get rid of some more asm glue which is not needed anymore.

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20161031151015.22087-1-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-11-01 07:41:27 +01:00
Al Viro
784d5699ed x86: move exports to actual definitions
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-08-07 23:47:15 -04:00
Andy Lutomirski
13d4ea097d x86/uaccess: Move thread_info::addr_limit to thread_struct
struct thread_info is a legacy mess.  To prepare for its partial removal,
move thread_info::addr_limit out.

As an added benefit, this way is simpler.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/15bee834d09402b47ac86f2feccdf6529f9bc5b0.1468527351.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-15 10:26:30 +02:00
Ingo Molnar
3a2f2ac9b9 Merge branch 'x86/urgent' into x86/asm, to pick up fixes
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-18 09:28:03 +01:00
Toshi Kani
a82eee7424 x86/uaccess/64: Handle the caching of 4-byte nocache copies properly in __copy_user_nocache()
Data corruption issues were observed in tests which initiated
a system crash/reset while accessing BTT devices.  This problem
is reproducible.

The BTT driver calls pmem_rw_bytes() to update data in pmem
devices.  This interface calls __copy_user_nocache(), which
uses non-temporal stores so that the stores to pmem are
persistent.

__copy_user_nocache() uses non-temporal stores when a request
size is 8 bytes or larger (and is aligned by 8 bytes).  The
BTT driver updates the BTT map table, which entry size is
4 bytes.  Therefore, updates to the map table entries remain
cached, and are not written to pmem after a crash.

Change __copy_user_nocache() to use non-temporal store when
a request size is 4 bytes.  The change extends the current
byte-copy path for a less-than-8-bytes request, and does not
add any overhead to the regular path.

Reported-and-tested-by: Micah Parrish <micah.parrish@hpe.com>
Reported-and-tested-by: Brian Boylston <brian.boylston@hpe.com>
Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
Cc: <stable@vger.kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Luis R. Rodriguez <mcgrof@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Toshi Kani <toshi.kani@hp.com>
Cc: Vishal Verma <vishal.l.verma@intel.com>
Cc: linux-nvdimm@lists.01.org
Link: http://lkml.kernel.org/r/1455225857-12039-3-git-send-email-toshi.kani@hpe.com
[ Small readability edits. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-17 09:10:23 +01:00
Toshi Kani
ee9737c924 x86/uaccess/64: Make the __copy_user_nocache() assembly code more readable
Add comments to __copy_user_nocache() to clarify its procedures
and alignment requirements.

Also change numeric branch target labels to named local labels.

No code changed:

 arch/x86/lib/copy_user_64.o:

    text    data     bss     dec     hex filename
    1239       0       0    1239     4d7 copy_user_64.o.before
    1239       0       0    1239     4d7 copy_user_64.o.after

 md5:
    58bed94c2db98c1ca9a2d46d0680aaae  copy_user_64.o.before.asm
    58bed94c2db98c1ca9a2d46d0680aaae  copy_user_64.o.after.asm

Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
Cc: <stable@vger.kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Luis R. Rodriguez <mcgrof@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Toshi Kani <toshi.kani@hp.com>
Cc: brian.boylston@hpe.com
Cc: dan.j.williams@intel.com
Cc: linux-nvdimm@lists.01.org
Cc: micah.parrish@hpe.com
Cc: ross.zwisler@linux.intel.com
Cc: vishal.l.verma@intel.com
Link: http://lkml.kernel.org/r/1455225857-12039-2-git-send-email-toshi.kani@hpe.com
[ Small readability edits and added object file comparison. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-17 09:10:22 +01:00
Borislav Petkov
cd4d09ec6f x86/cpufeature: Carve out X86_FEATURE_*
Move them to a separate header and have the following
dependency:

  x86/cpufeatures.h <- x86/processor.h <- x86/cpufeature.h

This makes it easier to use the header in asm code and not
include the whole cpufeature.h and add guards for asm.

Suggested-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1453842730-28463-5-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-01-30 11:22:17 +01:00
Ingo Molnar
131484c8da x86/debug: Remove perpetually broken, unmaintainable dwarf annotations
So the dwarf2 annotations in low level assembly code have
become an increasing hindrance: unreadable, messy macros
mixed into some of the most security sensitive code paths
of the Linux kernel.

These debug info annotations don't even buy the upstream
kernel anything: dwarf driven stack unwinding has caused
problems in the past so it's out of tree, and the upstream
kernel only uses the much more robust framepointers based
stack unwinding method.

In addition to that there's a steady, slow bitrot going
on with these annotations, requiring frequent fixups.
There's no tooling and no functionality upstream that
keeps it correct.

So burn down the sick forest, allowing new, healthier growth:

   27 files changed, 350 insertions(+), 1101 deletions(-)

Someone who has the willingness and time to do this
properly can attempt to reintroduce dwarf debuginfo in x86
assembly code plus dwarf unwinding from first principles,
with the following conditions:

 - it should be maximally readable, and maximally low-key to
   'ordinary' code reading and maintenance.

 - find a build time method to insert dwarf annotations
   automatically in the most common cases, for pop/push
   instructions that manipulate the stack pointer. This could
   be done for example via a preprocessing step that just
   looks for common patterns - plus special annotations for
   the few cases where we want to depart from the default.
   We have hundreds of CFI annotations, so automating most of
   that makes sense.

 - it should come with build tooling checks that ensure that
   CFI annotations are sensible. We've seen such efforts from
   the framepointer side, and there's no reason it couldn't be
   done on the dwarf side.

Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Frédéric Weisbecker <fweisbec@gmail.com
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jan Beulich <JBeulich@suse.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-06-02 07:57:48 +02:00
Borislav Petkov
b41e6ec242 x86/asm/uaccess: Get rid of copy_user_nocache_64.S
Move __copy_user_nocache() to arch/x86/lib/copy_user_64.S and
kill the containing file.

No functionality change.

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1431538944-27724-4-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-14 07:25:35 +02:00
Borislav Petkov
9e6b13f761 x86/asm/uaccess: Unify the ALIGN_DESTINATION macro
Pull it up into the header and kill duplicate versions.
Separately, both macros are identical:

 35948b2bd3431aee7149e85cfe4becbc  /tmp/a
 35948b2bd3431aee7149e85cfe4becbc  /tmp/b

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1431538944-27724-3-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-14 07:25:34 +02:00
Borislav Petkov
de2ff88884 x86/lib/copy_user_64.S: Convert to ALTERNATIVE_2
Use the asm macro and drop the locally grown version.

Signed-off-by: Borislav Petkov <bp@suse.de>
2015-02-23 13:44:13 +01:00
Borislav Petkov
48c7a2509f x86/alternatives: Make JMPs more robust
Up until now we had to pay attention to relative JMPs in alternatives
about how their relative offset gets computed so that the jump target
is still correct. Or, as it is the case for near CALLs (opcode e8), we
still have to go and readjust the offset at patching time.

What is more, the static_cpu_has_safe() facility had to forcefully
generate 5-byte JMPs since we couldn't rely on the compiler to generate
properly sized ones so we had to force the longest ones. Worse than
that, sometimes it would generate a replacement JMP which is longer than
the original one, thus overwriting the beginning of the next instruction
at patching time.

So, in order to alleviate all that and make using JMPs more
straight-forward we go and pad the original instruction in an
alternative block with NOPs at build time, should the replacement(s) be
longer. This way, alternatives users shouldn't pay special attention
so that original and replacement instruction sizes are fine but the
assembler would simply add padding where needed and not do anything
otherwise.

As a second aspect, we go and recompute JMPs at patching time so that we
can try to make 5-byte JMPs into two-byte ones if possible. If not, we
still have to recompute the offsets as the replacement JMP gets put far
away in the .altinstr_replacement section leading to a wrong offset if
copied verbatim.

For example, on a locally generated kernel image

  old insn VA: 0xffffffff810014bd, CPU feat: X86_FEATURE_ALWAYS, size: 2
  __switch_to:
   ffffffff810014bd:      eb 21                   jmp ffffffff810014e0
  repl insn: size: 5
  ffffffff81d0b23c:       e9 b1 62 2f ff          jmpq ffffffff810014f2

gets corrected to a 2-byte JMP:

  apply_alternatives: feat: 3*32+21, old: (ffffffff810014bd, len: 2), repl: (ffffffff81d0b23c, len: 5)
  alt_insn: e9 b1 62 2f ff
  recompute_jumps: next_rip: ffffffff81d0b241, tgt_rip: ffffffff810014f2, new_displ: 0x00000033, ret len: 2
  converted to: eb 33 90 90 90

and a 5-byte JMP:

  old insn VA: 0xffffffff81001516, CPU feat: X86_FEATURE_ALWAYS, size: 2
  __switch_to:
   ffffffff81001516:      eb 30                   jmp ffffffff81001548
  repl insn: size: 5
   ffffffff81d0b241:      e9 10 63 2f ff          jmpq ffffffff81001556

gets shortened into a two-byte one:

  apply_alternatives: feat: 3*32+21, old: (ffffffff81001516, len: 2), repl: (ffffffff81d0b241, len: 5)
  alt_insn: e9 10 63 2f ff
  recompute_jumps: next_rip: ffffffff81d0b246, tgt_rip: ffffffff81001556, new_displ: 0x0000003e, ret len: 2
  converted to: eb 3e 90 90 90

... and so on.

This leads to a net win of around

40ish replacements * 3 bytes savings =~ 120 bytes of I$

on an AMD guest which means some savings of precious instruction cache
bandwidth. The padding to the shorter 2-byte JMPs are single-byte NOPs
which on smart microarchitectures means discarding NOPs at decode time
and thus freeing up execution bandwidth.

Signed-off-by: Borislav Petkov <bp@suse.de>
2015-02-23 13:44:11 +01:00
Borislav Petkov
4332195c56 x86/alternatives: Add instruction padding
Up until now we have always paid attention to make sure the length of
the new instruction replacing the old one is at least less or equal to
the length of the old instruction. If the new instruction is longer, at
the time it replaces the old instruction it will overwrite the beginning
of the next instruction in the kernel image and cause your pants to
catch fire.

So instead of having to pay attention, teach the alternatives framework
to pad shorter old instructions with NOPs at buildtime - but only in the
case when

  len(old instruction(s)) < len(new instruction(s))

and add nothing in the >= case. (In that case we do add_nops() when
patching).

This way the alternatives user shouldn't have to care about instruction
sizes and simply use the macros.

Add asm ALTERNATIVE* flavor macros too, while at it.

Also, we need to save the pad length in a separate struct alt_instr
member for NOP optimization and the way to do that reliably is to carry
the pad length instead of trying to detect whether we're looking at
single-byte NOPs or at pathological instruction offsets like e9 90 90 90
90, for example, which is a valid instruction.

Thanks to Michael Matz for the great help with toolchain questions.

Signed-off-by: Borislav Petkov <bp@suse.de>
2015-02-23 13:44:00 +01:00
Borislav Petkov
338ea55579 x86/lib/copy_user_64.S: Remove FIX_ALIGNMENT define
It is unconditionally enabled so remove it. No object file change.

Signed-off-by: Borislav Petkov <bp@suse.de>
2015-02-23 13:35:49 +01:00
H. Peter Anvin
661c80192d x86-64, copy_user: Use leal to produce 32-bit results
When we are using lea to produce a 32-bit result, we can use the leal
form, rather than using leaq and worry about truncation elsewhere.

Make the leal explicit, both to be more obvious and since that is what
gcc generates and thus is less likely to trigger obscure gas bugs.

Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/1384634221-6006-1-git-send-email-fenghua.yu@intel.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-11-20 13:57:07 -08:00
Fenghua Yu
f4cb1cc18f x86-64, copy_user: Remove zero byte check before copy user buffer.
Operation of rep movsb instruction handles zero byte copy. As pointed out by
Linus, there is no need to check zero size in kernel. Removing this redundant
check saves a few cycles in copy user functions.

Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Link: http://lkml.kernel.org/r/1384634221-6006-1-git-send-email-fenghua.yu@intel.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-11-16 18:00:58 -08:00
H. Peter Anvin
63bcff2a30 x86, smap: Add STAC and CLAC instructions to control user space access
When Supervisor Mode Access Prevention (SMAP) is enabled, access to
userspace from the kernel is controlled by the AC flag.  To make the
performance of manipulating that flag acceptable, there are two new
instructions, STAC and CLAC, to set and clear it.

This patch adds those instructions, via alternative(), when the SMAP
feature is enabled.  It also adds X86_EFLAGS_AC unconditionally to the
SYSCALL entry mask; there is simply no reason to make that one
conditional.

Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Link: http://lkml.kernel.org/r/1348256595-29119-9-git-send-email-hpa@linux.intel.com
2012-09-21 12:45:27 -07:00
H. Peter Anvin
9732da8ca8 x86, extable: Remove open-coded exception table entries in arch/x86/lib/copy_user_64.S
Remove open-coded exception table entries in arch/x86/lib/copy_user_64.S,
and replace them with _ASM_EXTABLE() macros; this will allow us to
change the format and type of the exception table entries.

Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Cc: David Daney <david.daney@cavium.com>
Link: http://lkml.kernel.org/r/CA%2B55aFyijf43qSu3N9nWHEBwaGbb7T2Oq9A=9EyR=Jtyqfq_cQ@mail.gmail.com
2012-04-20 13:51:38 -07:00
Jiri Olsa
26afb7c661 x86, 64-bit: Fix copy_[to/from]_user() checks for the userspace address limit
As reported in BZ #30352:

  https://bugzilla.kernel.org/show_bug.cgi?id=30352

there's a kernel bug related to reading the last allowed page on x86_64.

The _copy_to_user() and _copy_from_user() functions use the following
check for address limit:

  if (buf + size >= limit)
	fail();

while it should be more permissive:

  if (buf + size > limit)
	fail();

That's because the size represents the number of bytes being
read/write from/to buf address AND including the buf address.
So the copy function will actually never touch the limit
address even if "buf + size == limit".

Following program fails to use the last page as buffer
due to the wrong limit check:

 #include <sys/mman.h>
 #include <sys/socket.h>
 #include <assert.h>

 #define PAGE_SIZE       (4096)
 #define LAST_PAGE       ((void*)(0x7fffffffe000))

 int main()
 {
        int fds[2], err;
        void * ptr = mmap(LAST_PAGE, PAGE_SIZE, PROT_READ | PROT_WRITE,
                          MAP_ANONYMOUS | MAP_PRIVATE | MAP_FIXED, -1, 0);
        assert(ptr == LAST_PAGE);
        err = socketpair(AF_LOCAL, SOCK_STREAM, 0, fds);
        assert(err == 0);
        err = send(fds[0], ptr, PAGE_SIZE, 0);
        perror("send");
        assert(err == PAGE_SIZE);
        err = recv(fds[1], ptr, PAGE_SIZE, MSG_WAITALL);
        perror("recv");
        assert(err == PAGE_SIZE);
        return 0;
 }

The other place checking the addr limit is the access_ok() function,
which is working properly. There's just a misleading comment
for the __range_not_ok() macro - which this patch fixes as well.

The last page of the user-space address range is a guard page and
Brian Gerst observed that the guard page itself due to an erratum on K8 cpus
(#121 Sequential Execution Across Non-Canonical Boundary Causes Processor
Hang).

However, the test code is using the last valid page before the guard page.
The bug is that the last byte before the guard page can't be read
because of the off-by-one error. The guard page is left in place.

This bug would normally not show up because the last page is
part of the process stack and never accessed via syscalls.

Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Acked-by: Brian Gerst <brgerst@gmail.com>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: <stable@kernel.org>
Link: http://lkml.kernel.org/r/1305210630-7136-1-git-send-email-jolsa@redhat.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-05-18 12:49:00 +02:00
Fenghua Yu
4307bec934 x86, mem: copy_user_64.S: Support copy_to/from_user by enhanced REP MOVSB/STOSB
Support copy_to_user/copy_from_user() by enhanced REP MOVSB/STOSB.
On processors supporting enhanced REP MOVSB/STOSB, the alternative
copy_user_enhanced_fast_string function using enhanced rep movsb overrides the
original function and the fast string function.

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Link: http://lkml.kernel.org/r/1305671358-14478-7-git-send-email-fenghua.yu@intel.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2011-05-17 15:40:28 -07:00
Lucas De Marchi
0d2eb44f63 x86: Fix common misspellings
They were generated by 'codespell' and then manually reviewed.

Signed-off-by: Lucas De Marchi <lucas.demarchi@profusion.mobi>
Cc: trivial@kernel.org
LKML-Reference: <1300389856-1099-3-git-send-email-lucas.demarchi@profusion.mobi>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-03-18 10:39:30 +01:00
H. Peter Anvin
df378ccfc4 x86, alternatives: Fix one more open-coded 8-bit alternative number
Fix a missing case of an 8-bit alternative number, buried inside an
assembly macro.

Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Reported-by: Yinghai Lu <yinhai@kernel.org>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
LKML-Reference: <4C3BDDA3.2060900@kernel.org>
2010-07-13 14:56:16 -07:00