powerpc/lib: optimise 32 bits __clear_user()
Rewrite clear_user() on the same principle as memset(0), making use
of dcbz to clear complete cache lines.
This code is a copy/paste of memset(), modified to keep track of the
number of bytes remaining to be cleared, as that count must be
returned in case of error.
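The principle described above can be modelled in portable C. This is only a rough sketch under stated assumptions, not the kernel routine: `clear_sketch`, `LINE_BYTES` and `fault_off` are hypothetical names, the cache line size of 32 is an assumed value (the real code uses L1_CACHE_BYTES), `memset()` stands in for one `dcbz` per complete cache line, and a fault is simulated by an offset past which stores are treated as failing.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define LINE_BYTES 32u	/* assumed cache line size, hypothetical value */

/*
 * Zero n bytes at dst and return the number of bytes NOT cleared.
 * fault_off is the offset of the first byte that cannot be written
 * (a stand-in for a user-access fault); pass SIZE_MAX for "no fault".
 */
static size_t clear_sketch(unsigned char *dst, size_t n, size_t fault_off)
{
	size_t i = 0;

	/* Head: plain stores up to the next cache line boundary. */
	while (i < n && ((uintptr_t)(dst + i) % LINE_BYTES) != 0) {
		if (i >= fault_off)
			return n - i;
		dst[i++] = 0;
	}

	/* Body: complete cache lines, one memset() where the asm uses dcbz. */
	while (n - i >= LINE_BYTES) {
		if (i + LINE_BYTES > fault_off)
			return n - i;
		memset(dst + i, 0, LINE_BYTES);
		i += LINE_BYTES;
	}

	/* Tail: whatever is left after the last complete line. */
	while (i < n) {
		if (i >= fault_off)
			return n - i;
		dst[i++] = 0;
	}

	return 0;
}
```

Calling `clear_sketch(buf, len, SIZE_MAX)` clears the whole range and returns 0; with a smaller `fault_off` the return value is the count of bytes left untouched, which is the quantity the assembly fixup code has to reconstruct.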
In the same way as was done on PPC64 in commit 17968fbbd19f1
("powerpc: 64bit optimised __clear_user"), the patch moves
__clear_user() into a dedicated file, string_32.S.
On an MPC885, throughput is almost doubled:
Before:
~# dd if=/dev/zero of=/dev/null bs=1M count=1000
1048576000 bytes (1000.0MB) copied, 18.990779 seconds, 52.7MB/s
After:
~# dd if=/dev/zero of=/dev/null bs=1M count=1000
1048576000 bytes (1000.0MB) copied, 9.611468 seconds, 104.0MB/s
On an MPC8321, throughput is multiplied by 2.12:
Before:
root@vgoippro:~# dd if=/dev/zero of=/dev/null bs=1M count=1000
1048576000 bytes (1000.0MB) copied, 6.844352 seconds, 146.1MB/s
After:
root@vgoippro:~# dd if=/dev/zero of=/dev/null bs=1M count=1000
1048576000 bytes (1000.0MB) copied, 3.218854 seconds, 310.7MB/s
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-05-30 07:06:13 +00:00
/* SPDX-License-Identifier: GPL-2.0 */
/*
 * String handling functions for PowerPC32
 *
 * Copyright (C) 1996 Paul Mackerras.
 *
 */
#include <asm/ppc_asm.h>
#include <asm/export.h>
#include <asm/cache.h>
	.text
CACHELINE_BYTES = L1_CACHE_BYTES
LG_CACHELINE_BYTES = L1_CACHE_SHIFT
CACHELINE_MASK = (L1_CACHE_BYTES-1)
2019-12-10 00:22:21 +11:00
_GLOBAL(__arch_clear_user)
/*
 * Use dcbz on the complete cache lines in the destination
 * to set them to zero.  This requires that the destination
 * area is cacheable.
 */
	cmplwi	cr0, r4, 4
	mr	r10, r3
	li	r3, 0
	blt	7f
11:	stw	r3, 0(r10)
	beqlr
	andi.	r0, r10, 3
	add	r11, r0, r4
	subf	r6, r0, r10
	clrlwi	r7, r6, 32 - LG_CACHELINE_BYTES
	add	r8, r7, r11
	srwi	r9, r8, LG_CACHELINE_BYTES
	addic.	r9, r9, -1	/* total number of complete cachelines */
	ble	2f
	xori	r0, r7, CACHELINE_MASK & ~3
	srwi.	r0, r0, 2
	beq	3f
	mtctr	r0
4:	stwu	r3, 4(r6)
	bdnz	4b
3:	mtctr	r9
	li	r7, 4
10:	dcbz	r7, r6
	addi	r6, r6, CACHELINE_BYTES
	bdnz	10b
	clrlwi	r11, r8, 32 - LG_CACHELINE_BYTES
	addi	r11, r11, 4
2:	srwi	r0, r11, 2
	mtctr	r0
	bdz	6f
1:	stwu	r3, 4(r6)
	bdnz	1b
6:	andi.	r11, r11, 3
	beqlr
	mtctr	r11
	addi	r6, r6, 3
8:	stbu	r3, 1(r6)
	bdnz	8b
	blr
7:	cmpwi	cr0, r4, 0
	beqlr
	mtctr	r4
	addi	r6, r10, -1
9:	stbu	r3, 1(r6)
	bdnz	9b
	blr
90:	mr	r3, r4
	blr
91:	add	r3, r10, r4
	subf	r3, r6, r3
	blr
	EX_TABLE(11b, 90b)
	EX_TABLE(4b, 91b)
	EX_TABLE(10b, 91b)
	EX_TABLE(1b, 91b)
	EX_TABLE(8b, 91b)
	EX_TABLE(9b, 91b)
EXPORT_SYMBOL(__arch_clear_user)