IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
Today's implementation of csum_shift() leads to branching based on
parity of 'offset'
000002f8 <csum_block_add>:
2f8: 70 a5 00 01 andi. r5,r5,1
2fc: 41 a2 00 08 beq 304 <csum_block_add+0xc>
300: 54 84 c0 3e rotlwi r4,r4,24
304: 7c 63 20 14 addc r3,r3,r4
308: 7c 63 01 94 addze r3,r3
30c: 4e 80 00 20 blr
Use first bit of 'offset' directly as input of the rotation instead of
branching.
000002f8 <csum_block_add>:
2f8: 54 a5 1f 38 rlwinm r5,r5,3,28,28
2fc: 20 a5 00 20 subfic r5,r5,32
300: 5c 84 28 3e rotlw r4,r4,r5
304: 7c 63 20 14 addc r3,r3,r4
308: 7c 63 01 94 addze r3,r3
30c: 4e 80 00 20 blr
And change to left shift instead of right shift to skip one more
instruction. This has no impact on the final sum.
000002f8 <csum_block_add>:
2f8: 54 a5 1f 38 rlwinm r5,r5,3,28,28
2fc: 5c 84 28 3e rotlw r4,r4,r5
300: 7c 63 20 14 addc r3,r3,r4
304: 7c 63 01 94 addze r3,r3
308: 4e 80 00 20 blr
Seems like only powerpc benefits from a branchless implementation.
Other main architectures like ARM or X86 get better code with
the generic implementation and its branch.
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: David S. Miller <davem@davemloft.net>