Christophe Leroy f867d556dd powerpc32: optimise csum_partial() loop
On the 8xx, load latency is 2 cycles and taking branches also takes
2 cycles. So let's unroll the loop.

This patch improves csum_partial() speed by around 10% on both:
* 8xx (single issue processor with parallel execution)
* 83xx (superscalar 6xx processor with dual instruction fetch
and parallel execution)

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Scott Wood <oss@buserror.net>
2016-03-04 23:03:45 -06:00
..
2016-02-07 15:23:20 -08:00
2016-01-26 10:18:29 +02:00