/* SPDX-License-Identifier: GPL-2.0-only */
/*
 * Scalar AES core transform
 *
 * Copyright (C) 2017 Linaro Ltd <ard.biesheuvel@linaro.org>
 */

#include <linux/linkage.h>
#include <asm/assembler.h>
#include <asm/cache.h>

	.text
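
	/*
	 * Arguments, as passed in x0-x3 by the C glue code (a sketch of the
	 * assumed prototype: void __aes_arm64_encrypt(u32 *rk, u8 *out,
	 * const u8 *in, int rounds), and likewise for decryption):
	 */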
	rk		.req	x0
	out		.req	x1
	in		.req	x2
	rounds		.req	x3
	tt		.req	x2
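
	/*
	 * Note that tt aliases x2, i.e., the 'in' pointer: this is safe
	 * because do_crypt below loads the whole input block into w4-w7
	 * before tt is first assigned.
	 */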
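
	/*
	 * __pair1 handles the encryption direction (enc == 1): extract one
	 * byte from each of two input words and use it to index the lookup
	 * table tt, either as a word load or, in the final round, as a
	 * byte load with the index pre-scaled by 4.
	 */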
	.macro		__pair1, sz, op, reg0, reg1, in0, in1e, in1d, shift
	.ifc		\op\shift, b0
	ubfiz		\reg0, \in0, #2, #8
	ubfiz		\reg1, \in1e, #2, #8
	.else
	ubfx		\reg0, \in0, #\shift, #8
	ubfx		\reg1, \in1e, #\shift, #8
	.endif

	/*
	 * AArch64 cannot do byte size indexed loads from a table containing
	 * 32-bit quantities, i.e., 'ldrb w12, [tt, w12, uxtw #2]' is not a
	 * valid instruction. So perform the shift explicitly first for the
	 * high bytes (the low byte is shifted implicitly by using ubfiz rather
	 * than ubfx above)
	 */
	.ifnc		\op, b
	ldr		\reg0, [tt, \reg0, uxtw #2]
	ldr		\reg1, [tt, \reg1, uxtw #2]
	.else
	.if		\shift > 0
	lsl		\reg0, \reg0, #2
	lsl		\reg1, \reg1, #2
	.endif
	ldrb		\reg0, [tt, \reg0, uxtw]
	ldrb		\reg1, [tt, \reg1, uxtw]
	.endif
	.endm
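
	/*
	 * __pair0 is the decryption-direction (enc == 0) counterpart: it
	 * indexes tt with bytes taken from \in0 and \in1d, scaling the
	 * index by \sz (2 for word tables, 0 for the byte-sized inverse
	 * S-box used in the final round).
	 */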
	.macro		__pair0, sz, op, reg0, reg1, in0, in1e, in1d, shift
	ubfx		\reg0, \in0, #\shift, #8
	ubfx		\reg1, \in1d, #\shift, #8
	ldr\op		\reg0, [tt, \reg0, uxtw #\sz]
	ldr\op		\reg1, [tt, \reg1, uxtw #\sz]
	.endm
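
	/*
	 * __hround computes half a round, i.e., two of the four output
	 * columns: eight table lookups are combined with the two round key
	 * words loaded from [rk], using ror #24/#16/#8 to place each
	 * looked-up value in the right byte positions.
	 */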
	.macro		__hround, out0, out1, in0, in1, in2, in3, t0, t1, enc, sz, op
	ldp		\out0, \out1, [rk], #8

	__pair\enc	\sz, \op, w12, w13, \in0, \in1, \in3, 0
	__pair\enc	\sz, \op, w14, w15, \in1, \in2, \in0, 8
	__pair\enc	\sz, \op, w16, w17, \in2, \in3, \in1, 16
	__pair\enc	\sz, \op, \t0, \t1, \in3, \in0, \in2, 24

	eor		\out0, \out0, w12
	eor		\out1, \out1, w13
	eor		\out0, \out0, w14, ror #24
	eor		\out1, \out1, w15, ror #24
	eor		\out0, \out0, w16, ror #16
	eor		\out1, \out1, w17, ror #16
	eor		\out0, \out0, \t0, ror #8
	eor		\out1, \out1, \t1, ror #8
	.endm
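
	/*
	 * A full forward (encryption) round: two __hround invocations with
	 * enc == 1 compute all four output columns.
	 */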
	.macro		fround, out0, out1, out2, out3, in0, in1, in2, in3, sz=2, op
	__hround	\out0, \out1, \in0, \in1, \in2, \in3, \out2, \out3, 1, \sz, \op
	__hround	\out2, \out3, \in2, \in3, \in0, \in1, \in1, \in2, 1, \sz, \op
	.endm
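
	/*
	 * A full inverse (decryption) round: same structure as fround, but
	 * with enc == 0 and the input column order adjusted for the
	 * inverse ShiftRows.
	 */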
	.macro		iround, out0, out1, out2, out3, in0, in1, in2, in3, sz=2, op
	__hround	\out0, \out1, \in0, \in3, \in2, \in1, \out2, \out3, 0, \sz, \op
	__hround	\out2, \out3, \in2, \in1, \in0, \in3, \in1, \in0, 0, \sz, \op
	.endm
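
	/*
	 * do_crypt is the shared driver: load the 16-byte input block, XOR
	 * in the first round key, run the inner rounds out of \ttab (the
	 * tbnz branch handles the round-count parity for the different key
	 * sizes), then perform the final round using byte loads from \ltab
	 * and store the result, with byte swaps around the loads and stores
	 * on big-endian configurations.
	 */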
	.macro		do_crypt, round, ttab, ltab, bsz
	ldp		w4, w5, [in]
	ldp		w6, w7, [in, #8]
	ldp		w8, w9, [rk], #16
	ldp		w10, w11, [rk, #-8]

CPU_BE(	rev		w4, w4		)
CPU_BE(	rev		w5, w5		)
CPU_BE(	rev		w6, w6		)
CPU_BE(	rev		w7, w7		)

	eor		w4, w4, w8
	eor		w5, w5, w9
	eor		w6, w6, w10
	eor		w7, w7, w11

	adr_l		tt, \ttab

	tbnz		rounds, #1, 1f

0:	\round		w8, w9, w10, w11, w4, w5, w6, w7
	\round		w4, w5, w6, w7, w8, w9, w10, w11

1:	subs		rounds, rounds, #4
	\round		w8, w9, w10, w11, w4, w5, w6, w7
	b.ls		3f
2:	\round		w4, w5, w6, w7, w8, w9, w10, w11
	b		0b
3:	adr_l		tt, \ltab
	\round		w4, w5, w6, w7, w8, w9, w10, w11, \bsz, b

CPU_BE(	rev		w4, w4		)
CPU_BE(	rev		w5, w5		)
CPU_BE(	rev		w6, w6		)
CPU_BE(	rev		w7, w7		)

	stp		w4, w5, [out]
	stp		w6, w7, [out, #8]
	ret
	.endm
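
	/*
	 * For the final encryption round, byte loads are performed from
	 * crypto_ft_tab + 1 with the index scaled by 4: the assumption here
	 * is that byte 1 of each 32-bit forward-table entry holds the plain
	 * S-box value, so the word table doubles as an S-box without a
	 * separate final-round table.
	 */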
SYM_FUNC_START(__aes_arm64_encrypt)
	do_crypt	fround, crypto_ft_tab, crypto_ft_tab + 1, 2
SYM_FUNC_END(__aes_arm64_encrypt)

	.align		5
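	/*
	 * Decryption takes the same argument layout. Its final round
	 * indexes crypto_aes_inv_sbox directly (bsz == 0, byte loads),
	 * since the inverse S-box is exported as a plain 256-byte table.
	 */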
SYM_FUNC_START(__aes_arm64_decrypt)
	do_crypt	iround, crypto_it_tab, crypto_aes_inv_sbox, 0
SYM_FUNC_END(__aes_arm64_decrypt)