Add arch_canon statements for "armh", "armv7l", "armv8l". Re-order
them to be more similar to the current upstream rpmrc.in (say,
rpm-4.15 or rpm-4.13.0.1-alt22). Note, however, that this change
doesn't seem to be essential for anything, since "the arch in the lead
[the arch number] is not used for any purpose for most of this
century".[1]
[1]: https://stackoverflow.com/a/39426935/94687 "answer by Jeff Johnson"
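For reference, arch_canon lines in rpmrc.in look roughly like this
(an illustrative sketch, not the exact lines from our or upstream
rpmrc.in; upstream assigns the whole ARM family the canonical number
12, but, as noted above, the value hardly matters):

    # format: arch_canon: <arch>: <canonical name> <number>
    arch_canon: armv7l: armv7l  12
    arch_canon: armv8l: armv8l  12
    arch_canon: armh:   armh    12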
* * *
Re-order ARM arch_compat statements to get an order similar to the
upstream one (say, rpm-4.15 or rpm-4.13.0.1-alt22), so that they are
easier to compare with our rpmrc.in. The upstream order is (a rough
sketch of the corresponding arch_compat syntax follows the list):
* the upstream first lists the big-endian archs (noarch -> armv4b);
* then the little-endian ones without hardfloat (noarch -> armv3l ->
  armv4l -> armv4tl -> armv5tl -> armv5tel -> armv5tejl -> armv6l ->
  armv7l -> armv8l);
* then the little-endian ones with hardfloat
  (noarch -> armv6hl -> armv7hl -> armv7hnl -> armv8hl);
* and separately the 64-bit one (noarch -> aarch64).
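For reference, this is roughly how such a chain is expressed in
rpmrc.in (syntax sketch only; the actual lines and spacing in upstream
or in our rpmrc.in may differ):

    # format: arch_compat: <arch>: <arch(s) it can fall back to>
    arch_compat: armv8l: armv7l
    arch_compat: armv7l: armv6l
    arch_compat: armv6l: armv5tejl
    arch_compat: armv3l: noarch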
In our version of rpm-build we don't have any code for the detection
of the 'h' (hardfloat) or 'n' (neon) CPU features, but we actually
insert our "armh" arch (with 'h', hardfloat) into this chain between
what ought to be "armv6hl" and "armv7hl", and drop "armv6hl" from our
chain; in our compatibility chain, 'h' is silently assumed for "armv7l"
and "armv8l".
However, note a bad consequence of this discrepancy: when our rpm-build
builds a package on armv8l targeting this arch, its arch is armv8l.
But when installing such a package on a normal ARMv8-A machine,
the 'h' feature would presumably be detected by "rpm -i", so our
just-built package would not match the system arch, and the
installation would be denied. In practice, though, I don't see such
bad behavior in our Girar builder when
rpminstall-test-archcompat-checkinstall is invoked... (I don't know
why. Perhaps the 'h' detection code doesn't work as expected in rpm.)
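For the record, one way to check what rpm detects on a given machine
(the exact output format varies between rpm versions, so treat the
grep pattern as approximate):

    $ rpm --eval '%{_target_cpu}'                 # arch rpm targets by default
    $ rpm --showrc | grep -i 'compatible arch'    # install-time compatibility list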
* * *
Similarly, re-order the buildarch_compat statements.
Targeting armv8l should trigger all the general ARM conditions in
specfiles and set the custom optflags.
Fixes: c18f1b7d ("installplatform, rpmrc.in: made armv8l compatible with armh")
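As an illustration, a hypothetical specfile fragment (not taken from
any real package) with the kind of condition that should now also
fire when targeting armv8l:

    %ifarch %{arm}
    # ARM-only adjustments would go here
    %global with_neon 0
    %endif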
* * *
Add arm and armb CANONARCH: a common CANONARCH for general-purpose
distributions. It's armv4 without Thumb now.
* * *
Update %optflags for ARM
* * *
Add %arm macros: list of all ARM processors
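A sketch of what such a macro definition looks like (illustrative
only; the exact list of archs in our macros file may differ):

    # illustrative, not the exact line from our macros file
    %arm    armv3l armv4b armv4l armv4tl armv5tel armv5tejl armv6l armv7l armv8l armh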
* * *
-mtune=i686 does not differ from -mtune=generic for gcc-4.1.x
(see gcc/config/i386/i386.c for details),
but -mtune=generic is not implemented in older gcc.
We use i586 as our default generic arch for x86 processors, and the
default tuning so far has been -mtune=pentium4. But -mtune=pentium4 is
preferable only for Intel processors, and possibly disadvantageous
for AMD chips.
I suggest we use -mtune=generic instead. Here is what "man gcc" says
about -mtune=generic:
Produce code optimized for the most common IA32/AMD64/EM64T
processors. If you know the CPU on which your code will run, then
you should use the corresponding -mtune option instead of
-mtune=generic. But, if you do not know exactly what CPU users of
your application will have, then you should use this option.
As new processors are deployed in the marketplace, the behavior of
this option will change. Therefore, if you upgrade to a newer
version of GCC, the code generated by this option will change to reflect the
processors that were most common when that version of GCC was
released.
Now if you're willing to take a look at gcc/gcc/config/i386/i386.c,
you can see that the -mtune= option affects only "instruction costs".
For example, AMD chips take fewer cycles to execute some divide/mod
instructions than Intel processors. Instruction costs can affect the
peephole optimizer and the like, making the resulting instruction
sequence take fewer cycles. It appears that "generic32_cost" provides
a reasonable compromise, so that the resulting code runs quite well
on all modern CPUs.
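To see the kind of difference in question, one can compare the
assembly generated under the two tunings for some file foo.c (purely
illustrative; since -march stays i586, the differences come only from
instruction selection and scheduling, not from the instruction set):

    $ gcc -O2 -march=i586 -mtune=pentium4 -S -o foo-p4.s foo.c
    $ gcc -O2 -march=i586 -mtune=generic -S -o foo-generic.s foo.c
    $ diff -u foo-p4.s foo-generic.s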
Update. I've been requested to provide some numbers.
I use the perlbench-0.93 suite to measure libperl.so performance.
A) libperl.so compiled with -march=i586 -mtune=pentium4
B) libperl.so compiled with -march=i586 -mtune=generic
AMD Athlon 64 A B
------------- --- ---
arith/mixed 100 106
arith/trig 100 100
array/copy 100 104
array/foreach 100 94
array/index 100 108
array/pop 100 109
array/shift 100 107
array/sort-num 100 103
array/sort 100 100
call/0arg 100 105
call/1arg 100 96
call/2arg 100 101
call/9arg 100 107
call/empty 100 108
call/fib 100 103
call/method 100 106
call/wantarray 100 107
hash/copy 100 99
hash/each 100 91
hash/foreach-sort 100 96
hash/foreach 100 100
hash/get 100 102
hash/set 100 110
loop/for-c 100 104
loop/for-range-const 100 102
loop/for-range 100 103
loop/getline 100 106
loop/while-my 100 109
loop/while 100 113
re/const 100 104
re/w 100 102
startup/fewmod 100 104
startup/lotsofsub 100 107
startup/noprog 100 100
string/base64 100 102
string/htmlparser 100 102
string/index-const 100 110
string/index-var 100 74
string/ipol 100 105
string/tr 100 102
AVERAGE 100 103
Intel Xeon A B
---------- --- ---
arith/mixed 100 98
arith/trig 100 138
array/copy 100 101
array/foreach 100 100
array/index 100 94
array/pop 100 99
array/shift 100 117
array/sort-num 100 103
array/sort 100 105
call/0arg 100 101
call/1arg 100 97
call/2arg 100 93
call/9arg 100 98
call/empty 100 100
call/fib 100 116
call/method 100 92
call/wantarray 100 101
hash/copy 100 104
hash/each 100 102
hash/foreach-sort 100 102
hash/foreach 100 98
hash/get 100 102
hash/set 100 96
loop/for-c 100 128
loop/for-range-const 100 100
loop/for-range 100 103
loop/getline 100 94
loop/while-my 100 107
loop/while 100 102
re/const 100 99
re/w 100 92
startup/fewmod 100 101
startup/lotsofsub 100 98
startup/noprog 100 101
string/base64 100 100
string/htmlparser 100 70
string/index-const 100 103
string/index-var 100 101
string/ipol 100 105
string/tr 100 94
AVERAGE 100 101
Look ma, I've got about a 3% performance boost on Athlon64 and even
some minor improvement on Intel Xeon! Also notice that, on Xeon, the
numbers are more diverse. I believe these numbers prove that,
compared to -mtune=pentium4, -mtune=generic is beneficial for Athlon64
and at least does no harm for Xeon.
Here is how to run perlbench:
$ echo ${PWD##*/}
perlbench-0.93
$ cat perl1 perl2
LD_LIBRARY_PATH=$PWD/lib1 exec /usr/bin/perl "$@"
LD_LIBRARY_PATH=$PWD/lib2 exec /usr/bin/perl "$@"
$ ls -l lib?/libperl*
-rw-r--r-- 1 at at 1173944 Jan 9 03:42 lib1/libperl.so.5.8
-rw-r--r-- 1 at at 1204984 Jan 9 03:46 lib2/libperl.so.5.8
$ ./perlbench-run ./perl1 ./perl2
...