/*
 * Accelerated CRC32(C) using ARM CRC, NEON and Crypto Extensions instructions
 *
 * Copyright (C) 2016 Linaro Ltd <ard.biesheuvel@linaro.org>
 *
 * This program is free software; you can redistribute it and/or modify
 * it under the terms of the GNU General Public License version 2 as
 * published by the Free Software Foundation.
 */
/* GPL HEADER START
 *
 * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.
 *
 * This program is free software; you can redistribute it and/or modify
 * it under the terms of the GNU General Public License version 2 only,
 * as published by the Free Software Foundation.
 *
 * This program is distributed in the hope that it will be useful, but
 * WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
 * General Public License version 2 for more details (a copy is included
 * in the LICENSE file that accompanied this code).
 *
 * You should have received a copy of the GNU General Public License
 * version 2 along with this program; If not, see http://www.gnu.org/licenses
 *
 * Please visit http://www.xyratex.com/contact if you need additional
 * information or have any questions.
 *
 * GPL HEADER END
 */
/*
 * Copyright 2012 Xyratex Technology Limited
 *
 * Using hardware provided PCLMULQDQ instruction to accelerate the CRC32
 * calculation.
 * CRC32 polynomial: 0x04c11db7 (BE) / 0xEDB88320 (LE)
 * PCLMULQDQ is a new instruction in Intel SSE4.2, the reference can be found
 * at:
 * http://www.intel.com/products/processor/manuals/
 * Intel(R) 64 and IA-32 Architectures Software Developer's Manual
 * Volume 2B: Instruction Set Reference, N-Z
 *
 * Authors:   Gregory Prestas <Gregory_Prestas@us.xyratex.com>
 *	      Alexander Boyko <Alexander_Boyko@xyratex.com>
 */
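/*
 * For reference only (not assembled): a minimal C sketch of the
 * bit-reflected CRC32 update that the folding code below accelerates,
 * assuming the little-endian polynomial 0xEDB88320 noted above and
 * leaving the usual init/final inversions to the caller. The function
 * name is illustrative, not a kernel API.
 *
 *	#include <stddef.h>
 *	#include <stdint.h>
 *
 *	static uint32_t crc32_le_ref(uint32_t crc, const uint8_t *p, size_t len)
 *	{
 *		while (len--) {
 *			crc ^= *p++;
 *			for (int i = 0; i < 8; i++)
 *				crc = (crc >> 1) ^ (0xEDB88320 & -(crc & 1));
 *		}
 *		return crc;
 *	}
 */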
#include <linux/linkage.h>
#include <asm/assembler.h>

	.text
	.align		6
	.arch		armv8-a
	.arch_extension	crc
	.fpu		crypto-neon-fp-armv8
.Lcrc32_constants:
	/*
	 * [(x4*128+32 mod P(x) << 32)]'  << 1   = 0x154442bd4
	 * #define CONSTANT_R1  0x154442bd4LL
	 *
	 * [(x4*128-32 mod P(x) << 32)]'  << 1   = 0x1c6e41596
	 * #define CONSTANT_R2  0x1c6e41596LL
	 */
.quad 0x0000000154442bd4
.quad 0x00000001c6e41596
	/*
	 * [(x128+32 mod P(x) << 32)]'   << 1   = 0x1751997d0
	 * #define CONSTANT_R3  0x1751997d0LL
	 *
	 * [(x128-32 mod P(x) << 32)]'   << 1   = 0x0ccaa009e
	 * #define CONSTANT_R4  0x0ccaa009eLL
	 */
.quad 0x00000001751997d0
.quad 0x00000000ccaa009e
	/*
	 * [(x64 mod P(x) << 32)]'       << 1   = 0x163cd6124
	 * #define CONSTANT_R5  0x163cd6124LL
	 */
.quad 0x0000000163cd6124
.quad 0x00000000FFFFFFFF
	/*
	 * #define CRCPOLY_TRUE_LE_FULL 0x1DB710641LL
	 *
	 * Barrett Reduction constant (u64`) = u` = (x**64 / P(x))`
	 *                                        = 0x1F7011641LL
	 * #define CONSTANT_RU  0x1F7011641LL
	 */
.quad 0x00000001DB710641
.quad 0x00000001F7011641
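	/*
	 * The table below mirrors the layout above (R1/R2, R3/R4, R5 plus
	 * mask, polynomial and Barrett constant), but with values derived
	 * from the CRC-32C (Castagnoli) polynomial 0x1EDC6F41
	 * (bit-reflected: 0x82F63B78).
	 */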
.Lcrc32c_constants:
.quad 0x00000000740eef02
.quad 0x000000009e4addf8
.quad 0x00000000f20c0dfe
.quad 0x000000014cd00bd6
.quad 0x00000000dd45aab8
.quad 0x00000000FFFFFFFF
.quad 0x0000000105ec76f0
.quad 0x00000000dea713f1
	dCONSTANTl	.req	d0
	dCONSTANTh	.req	d1
	qCONSTANT	.req	q0

	BUF		.req	r0
	LEN		.req	r1
	CRC		.req	r2

	qzr		.req	q9
	/**
	 * Calculate crc32
	 * BUF - buffer
	 * LEN - size of buffer (multiple of 16 bytes), LEN should be > 63
	 * CRC - initial crc32
	 * return r0 crc32
	 * uint crc32_pmull_le(unsigned char const *buffer,
	 *	              size_t len, uint crc32)
	 */
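	/*
	 * Illustrative caller-side sketch (not the kernel glue code; NEON
	 * context handling via kernel_neon_begin()/kernel_neon_end() is
	 * omitted): given the constraints above, a wrapper would feed this
	 * routine only a large 16-byte-multiple chunk and finish any
	 * remainder with the generic crc32_le(). The wrapper name is
	 * hypothetical.
	 *
	 *	u32 crc32_pmull_le(unsigned char const *buffer, size_t len, u32 crc);
	 *
	 *	static u32 crc32_update_sketch(u32 crc, const u8 *p, size_t len)
	 *	{
	 *		if (len >= 64) {
	 *			size_t chunk = len & ~(size_t)15;
	 *
	 *			crc = crc32_pmull_le(p, chunk, crc);
	 *			p += chunk;
	 *			len -= chunk;
	 *		}
	 *		return crc32_le(crc, p, len);
	 *	}
	 */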
ENTRY(crc32_pmull_le)
	adr		r3, .Lcrc32_constants
	b		0f

ENTRY(crc32c_pmull_le)
	adr		r3, .Lcrc32c_constants

0:	bic		LEN, LEN, #15
	vld1.8		{q1-q2}, [BUF, :128]!
	vld1.8		{q3-q4}, [BUF, :128]!
	vmov.i8		qzr, #0
	vmov.i8		qCONSTANT, #0
	vmov.32		dCONSTANTl[0], CRC
	veor.8		d2, d2, dCONSTANTl
	sub		LEN, LEN, #0x40
	cmp		LEN, #0x40
	blt		less_64

	vld1.64		{qCONSTANT}, [r3]
loop_64:		/* 64 bytes Full cache line folding */
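	/*
	 * Each pass folds the four 128-bit accumulators q1-q4 forward by
	 * 512 bits: the low/high 64-bit halves of each accumulator are
	 * carry-less multiplied by the R1/R2 constants held in qCONSTANT
	 * and XORed with the next 64 bytes of input.
	 */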
	sub		LEN, LEN, #0x40

	vmull.p64	q5, d3, dCONSTANTh
	vmull.p64	q6, d5, dCONSTANTh
	vmull.p64	q7, d7, dCONSTANTh
	vmull.p64	q8, d9, dCONSTANTh

	vmull.p64	q1, d2, dCONSTANTl
	vmull.p64	q2, d4, dCONSTANTl
	vmull.p64	q3, d6, dCONSTANTl
	vmull.p64	q4, d8, dCONSTANTl

	veor.8		q1, q1, q5
	vld1.8		{q5}, [BUF, :128]!
	veor.8		q2, q2, q6
	vld1.8		{q6}, [BUF, :128]!
	veor.8		q3, q3, q7
	vld1.8		{q7}, [BUF, :128]!
	veor.8		q4, q4, q8
	vld1.8		{q8}, [BUF, :128]!

	veor.8		q1, q1, q5
	veor.8		q2, q2, q6
	veor.8		q3, q3, q7
	veor.8		q4, q4, q8

	cmp		LEN, #0x40
	bge		loop_64
less_64:		/* Folding cache line into 128bit */
	vldr		dCONSTANTl, [r3, #16]
	vldr		dCONSTANTh, [r3, #24]

	vmull.p64	q5, d3, dCONSTANTh
	vmull.p64	q1, d2, dCONSTANTl
	veor.8		q1, q1, q5
	veor.8		q1, q1, q2

	vmull.p64	q5, d3, dCONSTANTh
	vmull.p64	q1, d2, dCONSTANTl
	veor.8		q1, q1, q5
	veor.8		q1, q1, q3

	vmull.p64	q5, d3, dCONSTANTh
	vmull.p64	q1, d2, dCONSTANTl
	veor.8		q1, q1, q5
	veor.8		q1, q1, q4

	teq		LEN, #0
	beq		fold_64
loop_16:		/* Folding rest buffer into 128bit */
	subs		LEN, LEN, #0x10

	vld1.8		{q2}, [BUF, :128]!
	vmull.p64	q5, d3, dCONSTANTh
	vmull.p64	q1, d2, dCONSTANTl
	veor.8		q1, q1, q5
	veor.8		q1, q1, q2

	bne		loop_16
fold_64:
	/* perform the last 64 bit fold, also adds 32 zeroes
	 * to the input stream */
	vmull.p64	q2, d2, dCONSTANTh
	vext.8		q1, q1, qzr, #8
	veor.8		q1, q1, q2

	/* final 32-bit fold */
	vldr		dCONSTANTl, [r3, #32]
	vldr		d6, [r3, #40]
	vmov.i8		d7, #0

	vext.8		q2, q1, qzr, #4
	vand.8		d2, d2, d6
	vmull.p64	q1, d2, dCONSTANTl
	veor.8		q1, q1, q2
/* Finish up with the bit-reversed barrett reduction 64 ==> 32 bits */
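	/*
	 * This is the usual two-multiply Barrett step (cf. the Intel white
	 * paper on CRC computation using PCLMULQDQ), mirrored for the
	 * bit-reflected representation used here: multiply the 32 bits to
	 * be reduced by the precomputed u = x^64/P(x), mask back down to
	 * 32 bits, multiply by P(x), and XOR into the remainder. q3 holds
	 * the 32-bit mask set up via d6/d7 above.
	 */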
	vldr		dCONSTANTl, [r3, #48]
	vldr		dCONSTANTh, [r3, #56]

	vand.8		q2, q1, q3
	vext.8		q2, qzr, q2, #8
	vmull.p64	q2, d5, dCONSTANTh
	vand.8		q2, q2, q3
	vmull.p64	q2, d4, dCONSTANTl
	veor.8		q1, q1, q2
	vmov		r0, s5

	bx		lr
ENDPROC(crc32_pmull_le)
ENDPROC(crc32c_pmull_le)
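/*
 * The __crc32 macro below implements the scalar path using the ARMv8
 * CRC32 instructions. For reference (a sketch, not assembled, assuming
 * little-endian data and omitting the alignment handling done below),
 * it is roughly equivalent to this C loop built on the ACLE intrinsics
 * from <arm_acle.h>; the function name is illustrative.
 *
 *	#include <arm_acle.h>
 *	#include <stddef.h>
 *	#include <stdint.h>
 *
 *	static uint32_t crc32_scalar_sketch(uint32_t crc,
 *					    const uint8_t *p, size_t len)
 *	{
 *		for (; len >= 4; len -= 4, p += 4) {
 *			uint32_t w;
 *
 *			__builtin_memcpy(&w, p, sizeof(w));
 *			crc = __crc32w(crc, w);
 *		}
 *		while (len--)
 *			crc = __crc32b(crc, *p++);
 *		return crc;
 *	}
 *
 * The "c" variant (crc32c_armv8_le) uses the __crc32c forms of the same
 * intrinsics.
 */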
	.macro		__crc32, c
	subs		ip, r2, #8
	bmi		.Ltail\c

	tst		r1, #3
	bne		.Lunaligned\c

	teq		ip, #0
.Laligned8\c:
	ldrd		r2, r3, [r1], #8
ARM_BE8(rev		r2, r2		)
ARM_BE8(rev		r3, r3		)
	crc32\c\()w	r0, r0, r2
	crc32\c\()w	r0, r0, r3
	bxeq		lr
	subs		ip, ip, #8
	bpl		.Laligned8\c
.Ltail\c:
	tst		ip, #4
	beq		2f
	ldr		r3, [r1], #4
ARM_BE8(rev		r3, r3		)
	crc32\c\()w	r0, r0, r3

2:	tst		ip, #2
	beq		1f
	ldrh		r3, [r1], #2
ARM_BE8(rev16		r3, r3		)
	crc32\c\()h	r0, r0, r3

1:	tst		ip, #1
	bxeq		lr
	ldrb		r3, [r1]
	crc32\c\()b	r0, r0, r3
	bx		lr

.Lunaligned\c:
	tst		r1, #1
	beq		2f
	ldrb		r3, [r1], #1
	subs		r2, r2, #1
	crc32\c\()b	r0, r0, r3

	tst		r1, #2
	beq		0f
2:	ldrh		r3, [r1], #2
	subs		r2, r2, #2
ARM_BE8(rev16		r3, r3		)
	crc32\c\()h	r0, r0, r3

0:	subs		ip, r2, #8
	bpl		.Laligned8\c
	b		.Ltail\c
	.endm
	.align		5
ENTRY(crc32_armv8_le)
	__crc32
ENDPROC(crc32_armv8_le)

	.align		5
ENTRY(crc32c_armv8_le)
	__crc32	c
ENDPROC(crc32c_armv8_le)