WANG Rui 8941e93ca5 LoongArch: Optimize memory ops (memset/memcpy/memmove)
To optimize memset()/memcpy()/memmove() and so on, we use a jump table
to dispatch cases for short data lengths; and for long data lengths, we
split the destination into head part (first 8 bytes), tail part (last 8
bytes) and middle part. The head part and tail part may be at unaligned
addresses, while the middle part is always aligned (the middle part is
allowed to overlap the head/tail part). In this way, the first and last
8 bytes may be unaligned accesses, but we can make sure the data in the
middle is processed at an aligned destination address.

We have tested micro-bench[1] on a Loongson-3C5000 16-core machine (2.2GHz):

1. memset

| length | src offset | dst offset | speed before | speed after | %       |
|--------|------------|------------|--------------|-------------|---------|
| 8      | 0          | 0          | 696.191      | 1518.785    | 118.16% |
| 8      | 0          | 1          | 696.325      | 1518.937    | 118.14% |
| 50     | 0          | 0          | 969.976      | 8053.902    | 730.32% |
| 50     | 0          | 1          | 970.034      | 8058.475    | 730.74% |
| 300    | 0          | 0          | 5876.612     | 16544.703   | 181.53% |
| 300    | 0          | 1          | 5030.849     | 16549.011   | 228.95% |
| 1200   | 0          | 0          | 11797.077    | 16752.137   | 42.00%  |
| 1200   | 0          | 1          | 5687.141     | 16645.233   | 192.68% |
| 4000   | 0          | 0          | 15723.27     | 16761.557   | 6.60%   |
| 4000   | 0          | 1          | 5906.114     | 16732.316   | 183.30% |
| 8000   | 0          | 0          | 16751.403    | 16770.002   | 0.11%   |
| 8000   | 0          | 1          | 5995.449     | 16754.07    | 179.45% |

2. memcpy

| length | src offset | dst offset | speed before | speed after | %       |
|--------|------------|------------|--------------|-------------|---------|
| 8      | 0          | 0          | 696.2        | 1670.605    | 139.96% |
| 8      | 0          | 1          | 696.325      | 1671.138    | 139.99% |
| 50     | 0          | 0          | 969.974      | 8724.999    | 799.51% |
| 50     | 0          | 1          | 970.032      | 8730.138    | 799.98% |
| 300    | 0          | 0          | 5564.662     | 16272.652   | 192.43% |
| 300    | 0          | 1          | 4670.436     | 14972.842   | 220.59% |
| 1200   | 0          | 0          | 10740.23     | 16751.728   | 55.97%  |
| 1200   | 0          | 1          | 5027.741     | 14874.564   | 195.85% |
| 4000   | 0          | 0          | 15122.367    | 16737.642   | 10.68%  |
| 4000   | 0          | 1          | 5536.918     | 14890.397   | 168.93% |
| 8000   | 0          | 0          | 16505.453    | 16553.543   | 0.29%   |
| 8000   | 0          | 1          | 5821.619     | 14841.804   | 154.94% |

3. memmove

| length | src offset | dst offset | speed before | speed after | %       |
|--------|------------|------------|--------------|-------------|---------|
| 8      | 0          | 0          | 982.693      | 1670.568    | 70.00%  |
| 8      | 0          | 1          | 983.023      | 1671.174    | 70.00%  |
| 50     | 0          | 0          | 1230.87      | 8727.625    | 609.06% |
| 50     | 0          | 1          | 1232.515     | 8730.138    | 608.32% |
| 300    | 0          | 0          | 6490.375     | 16296.993   | 151.09% |
| 300    | 0          | 1          | 4282.687     | 14972.842   | 249.61% |
| 1200   | 0          | 0          | 11742.755    | 16752.546   | 42.66%  |
| 1200   | 0          | 1          | 5039.338     | 14872.951   | 195.14% |
| 4000   | 0          | 0          | 15467.786    | 16737.09    | 8.21%   |
| 4000   | 0          | 1          | 5009.905     | 14890.542   | 197.22% |
| 8000   | 0          | 0          | 16489.664    | 16553.273   | 0.39%   |
| 8000   | 0          | 1          | 5823.786     | 14858.646   | 155.14% |

* speed: MB/s
* length: byte

[1] https://github.com/heiher/mem-bench

Signed-off-by: WANG Rui <wangrui@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2023-05-01 17:19:43 +08:00
2023-02-26 11:53:25 -08:00
2023-04-20 12:46:18 -07:00
2023-04-22 09:18:35 -07:00
2023-04-15 10:29:53 -07:00
2023-03-01 09:27:00 -08:00
2023-04-23 08:22:25 -07:00
2023-04-19 17:55:45 -07:00
2022-09-28 09:02:20 +02:00
2022-10-10 12:00:45 -07:00
2023-04-23 12:02:52 -07:00

Linux kernel
============

There are several guides for kernel developers and users. These guides can
be rendered in a number of formats, like HTML and PDF. Please read
Documentation/admin-guide/README.rst first.

In order to build the documentation, use ``make htmldocs`` or
``make pdfdocs``.  The formatted documentation can also be read online at:

    https://www.kernel.org/doc/html/latest/

There are various text files in the Documentation/ subdirectory,
several of them using the Restructured Text markup notation.

Please read the Documentation/process/changes.rst file, as it contains the
requirements for building and running the kernel, and information about
the problems which may result by upgrading your kernel.
Description
No description provided
Readme 5.7 GiB
Languages
C 97.6%
Assembly 1%
Shell 0.5%
Python 0.3%
Makefile 0.3%