IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
The generic memcpy routine provided in kernel does only byte copies.
Using word copies we can lower boot time and cycles spend in memcpy
quite significantly.
Booting on my de0 nano I see boot times go from 7.2 to 5.6 seconds.
The avg cycles in memcpy during boot go from 6467 to 1887.
I tested several algorithms (see code in previous patch mails)
The implementations I tested and avg cycles:
- Word Copies + Loop Unrolls + Non Aligned 1882
- Word Copies + Loop Unrolls 1887
- Word Copies 2441
- Byte Copies + Loop Unrolls 6467
- Byte Copies 7600
In the end I ended up going with Word Copies + Loop Unrolls as it
provides best tradeoff between simplicity and boot speedups.
Signed-off-by: Stafford Horne <shorne@gmail.com>
This adds a hand-optimized assembler version of memset and sets
__HAVE_ARCH_MEMSET to use this version instead of the generic C
routine
Signed-off-by: Olof Kindgren <olof.kindgren@gmail.com>
Signed-off-by: Stafford Horne <shorne@gmail.com>
If the counter overflows during a __delay operation, we will exit the
loop prematurely. For example, calling __delay(0x100) with the counter
at 0xffffff00 gives us a target of 0x0. The unsigned comparison in the
while loop will likely be false on the first iteration (if the counter
is now anything other than 0) and we will return early.
This patch fixes the problem by comparing deltas in the loop rather than
absolute values.
Cc: Jon Masters <jcm@redhat.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Jonas Bonn <jonas@southpole.se>
The openrisc implementation of __const_udelay casts the result of a
32-bit multiplication to 64 bits and passes the top 32 bits to __delay.
Since there are no casts on the arguments, this results in a __delay of
zero, regardless of the xloops parameter.
This patch fixes the problem by casting xloops to (unsigned long long),
ensuring that the multiplication is not truncated.
Cc: Jon Masters <jcm@redhat.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Jonas Bonn <jonas@southpole.se>
As per commits 2922585b93 ("lib: Sparc's strncpy_from_user is generic
enough, move under lib/") and 92ae03f2ef ("x86: merge 32/64-bit
versions of 'strncpy_from_user()' and speed it up"), and corresponding
discussion on linux-arch.
Signed-off-by: Jonas Bonn <jonas@southpole.se>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>