/* SPDX-License-Identifier: GPL-2.0-only */
/*
 *  linux/arch/arm/lib/copy_to_user.S
 *
 *  Author:	Nicolas Pitre
 *  Created:	Sep 29, 2005
 *  Copyright:	MontaVista Software, Inc.
 */

#include <linux/linkage.h>
#include <asm/assembler.h>
#include <asm/unwind.h>

/*
 * Prototype:
 *
 *	size_t arm_copy_to_user(void *to, const void *from, size_t n)
 *
 * Purpose:
 *
 *	copy a block to user memory from kernel memory
 *
 * Params:
 *
 *	to = user memory
 *	from = kernel memory
 *	n = number of bytes to copy
 *
 * Return value:
 *
 *	Number of bytes NOT copied.
 */
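A minimal C sketch of the calling convention documented above. The helper name and the `fault_at` parameter are illustrative stand-ins, not kernel API: the point is that the return value counts bytes NOT copied, so 0 means complete success.

```c
#include <stddef.h>
#include <string.h>

/* Illustrative stand-in for arm_copy_to_user(): the mock "faults"
 * after fault_at bytes, the way a fault in user memory would stop
 * the real copy early, and returns the bytes left uncopied. */
static size_t mock_copy_to_user(void *to, const void *from, size_t n,
				size_t fault_at)
{
	size_t done = n < fault_at ? n : fault_at;

	memcpy(to, from, done);
	return n - done;	/* 0 on full success */
}
```

A caller that sees a non-zero return typically treats the copy as failed and maps it to -EFAULT.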

#define LDR1W_SHIFT	0

	.macro ldr1w ptr reg abort
	W(ldr) \reg, [\ptr], #4
	.endm

	.macro ldr4w ptr reg1 reg2 reg3 reg4 abort
	ldmia \ptr!, {\reg1, \reg2, \reg3, \reg4}
	.endm

	.macro ldr8w ptr reg1 reg2 reg3 reg4 reg5 reg6 reg7 reg8 abort
	ldmia \ptr!, {\reg1, \reg2, \reg3, \reg4, \reg5, \reg6, \reg7, \reg8}
	.endm

	.macro ldr1b ptr reg cond=al abort
	ldrb\cond \reg, [\ptr], #1
	.endm

/*
 * ARM: 8812/1: Optimise copy_{from/to}_user for !CPU_USE_DOMAINS
 *
 * ARMv6+ processors do not use CONFIG_CPU_USE_DOMAINS and use privileged
 * ldr/str instructions in copy_{from/to}_user.  They were previously
 * using single ldr/str instructions unnecessarily and can use ldm/stm
 * instructions instead, like memcpy does (but with appropriate fixup
 * tables).
 *
 * This speeds up a "dd if=foo of=bar bs=32k" on a tmpfs filesystem by
 * about 4% on a Cortex-A9: across ten 128 MB runs each, throughput went
 * from 234.6-240.0 MB/s before to 242.5-246.9 MB/s after.
 *
 * Reviewed-by: Nicolas Pitre <nico@linaro.org>
 * Signed-off-by: Vincent Whitchurch <vincent.whitchurch@axis.com>
 * Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
 */
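The ldm/stm batching described in that commit message can be sketched in C. This is a sketch of the access pattern only, assuming nothing beyond the standard library; it is not the kernel's actual code path.

```c
#include <stddef.h>
#include <string.h>

/* Copy eight 32-bit words per iteration (one ldm/stm pair moves
 * 32 bytes at once), then mop up the remainder one byte at a
 * time, the way the ldr1b/str1b tail does. */
static void copy_words8(unsigned char *to, const unsigned char *from,
			size_t n)
{
	while (n >= 32) {
		memcpy(to, from, 32);	/* stands in for an 8-register ldm/stm */
		to += 32;
		from += 32;
		n -= 32;
	}
	while (n--)
		*to++ = *from++;
}
```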

#ifdef CONFIG_CPU_USE_DOMAINS

#ifndef CONFIG_THUMB2_KERNEL
#define STR1W_SHIFT	0
#else
#define STR1W_SHIFT	1
#endif

	.macro str1w ptr reg abort
	strusr	\reg, \ptr, 4, abort=\abort
	.endm

	.macro str8w ptr reg1 reg2 reg3 reg4 reg5 reg6 reg7 reg8 abort
	str1w \ptr, \reg1, \abort
	str1w \ptr, \reg2, \abort
	str1w \ptr, \reg3, \abort
	str1w \ptr, \reg4, \abort
	str1w \ptr, \reg5, \abort
	str1w \ptr, \reg6, \abort
	str1w \ptr, \reg7, \abort
	str1w \ptr, \reg8, \abort
	.endm

#else

#define STR1W_SHIFT	0

	.macro str1w ptr reg abort
	USERL(\abort, W(str) \reg, [\ptr], #4)
	.endm

	.macro str8w ptr reg1 reg2 reg3 reg4 reg5 reg6 reg7 reg8 abort
	USERL(\abort, stmia \ptr!, {\reg1, \reg2, \reg3, \reg4, \reg5, \reg6, \reg7, \reg8})
	.endm

#endif /* CONFIG_CPU_USE_DOMAINS */

	.macro str1b ptr reg cond=al abort
	strusr	\reg, \ptr, 1, \cond, abort=\abort
	.endm

/*
 * ARM: memcpy: use frame pointer as unwind anchor
 *
 * The memcpy template is a bit unusual in the way it manages the stack
 * pointer: depending on the execution path through the function, the SP
 * assumes different values as different subsets of the register file are
 * preserved and restored again.  This is problematic when it comes to
 * EHABI unwind info, as it is not instruction accurate and does not
 * allow tracking the SP value as it changes.
 *
 * Commit 279f487e0b471 ("ARM: 8225/1: Add unwinding support for memory
 * copy functions") addressed this by carving up the function into
 * different chunks as far as the unwinder is concerned, keeping a set of
 * unwind directives for each chunk, each corresponding with the state of
 * the stack pointer during that chunk's execution.  This not only
 * duplicates unwind info unnecessarily, but also complicates unwinding
 * the stack upon overflow.
 *
 * Instead, do what the compiler does when the SP is updated halfway
 * through a function: use a frame pointer and emit the appropriate
 * unwind directives to communicate this to the unwinder.
 *
 * Note that Thumb-2 uses R7 for this, while ARM uses R11 aka FP, so
 * avoid touching R7 in the body of the template, leaving it free for
 * Thumb-2 to use as the frame pointer.  R11 was not modified in the
 * first place.
 *
 * Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
 * Tested-by: Keith Packard <keithpac@amazon.com>
 * Tested-by: Marc Zyngier <maz@kernel.org>
 * Tested-by: Vladimir Murzin <vladimir.murzin@arm.com> # ARMv7M
 */

	.macro enter regs:vararg
	mov	r3, #0
UNWIND(	.save	{r0, r2, r3, \regs}		)
	stmdb	sp!, {r0, r2, r3, \regs}
	.endm

	.macro exit regs:vararg
	add	sp, sp, #8
	ldmfd	sp!, {r0, \regs}
	.endm

	.text

ENTRY(__copy_to_user_std)
WEAK(arm_copy_to_user)
#ifdef CONFIG_CPU_SPECTRE
	ldr	r3, =TASK_SIZE
	uaccess_mask_range_ptr r0, r2, r3, ip
#endif
#include "copy_template.S"

ENDPROC(arm_copy_to_user)
ENDPROC(__copy_to_user_std)
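Under CONFIG_CPU_SPECTRE the entry point masks the destination range before copying, so a mispredicted bounds check cannot speculatively write through a kernel address. A hedged C sketch of that idea follows; `TASK_SIZE_SKETCH` and the helper name are assumptions for illustration only, not the kernel's actual `uaccess_mask_range_ptr` implementation.

```c
#include <stdint.h>

#define TASK_SIZE_SKETCH 0xbf000000u	/* illustrative boundary */

/* Clamp without a data-dependent branch on the result: return ptr
 * when [ptr, ptr+size) lies entirely below the user/kernel boundary
 * and does not wrap, otherwise return 0 so any access faults
 * harmlessly instead of reaching a kernel address. */
static uintptr_t mask_user_range(uintptr_t ptr, uintptr_t size)
{
	uintptr_t end = ptr + size;
	uintptr_t ok = (uintptr_t)0 - (end <= TASK_SIZE_SKETCH && end >= ptr);

	return ptr & ok;	/* ok is all-ones or all-zeros */
}
```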

	.pushsection .text.fixup,"ax"
	.align 0
	copy_abort_preamble
	ldmfd	sp!, {r1, r2, r3}
	sub	r0, r0, r1
	rsb	r0, r0, r2
	copy_abort_end
	.popsection
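The fixup above pops the original destination into r1 and the requested count into r2, then computes r2 - (r0 - r1): the number of bytes NOT copied when the fault hit. The same arithmetic rendered in C, with register names kept as parameter names purely for illustration:

```c
#include <stddef.h>

/* r0 holds the destination pointer at the time of the fault;
 * r1 is the original destination; r2 is the requested count. */
static size_t abort_fixup(size_t r0_cur_to, size_t r1_orig_to,
			  size_t r2_count)
{
	size_t copied = r0_cur_to - r1_orig_to;	/* sub r0, r0, r1 */

	return r2_count - copied;		/* rsb r0, r0, r2 */
}
```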