13 Commits

Author SHA1 Message Date
Alexey Tourbin
0e8ab4e05c rpmrc.in: use -mtune=generic instead of -mtune=pentium4 for i[3456]86
We use i586 as our default generic arch for x86 processors.
But -mtune=pentium4 is preferable only for Intel processors,
and possibly disadvantageous for AMD chips.

I suggest we use -mtune=generic instead.  Here is what "man gcc" says
about -mtune=generic:

    Produce code optimized for the most common IA32/AMD64/EM64T
    processors.  If you know the CPU on which your code will run, then
    you should use the corresponding -mtune option instead of
    -mtune=generic.  But, if you do not know exactly what CPU users of
    your application will have, then you should use this option.

    As new processors are deployed in the marketplace, the behavior of
    this option will change.  Therefore, if you upgrade to a newer
    version of GCC, the code generated option will change to reflect the
    processors that were most common when that version of GCC was
    released.

Now, if you take a look at gcc/gcc/config/i386/i386.c, you can see that
the -mtune= option affects only "instruction costs".  For example, AMD
chips take fewer cycles than Intel processors to execute some divide/mod
instructions.  Instruction costs can then steer the optimizer (the
peephole pass, perhaps) toward an instruction sequence that takes fewer
cycles.  It appears that "generic32_cost" provides a reasonable
compromise, so that the resulting code runs quite well on all modern
CPUs.
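
To illustrate how a per-CPU cost table can change which instruction
sequence gets chosen, here is a toy sketch.  It is not GCC's algorithm:
the tuning names, the cycle counts and the two candidate sequences are
made-up placeholders, not values taken from i386.c.

```python
# Toy model of cost-driven instruction selection.
# All cycle counts are hypothetical placeholders, NOT values from GCC.
COSTS = {
    "tune_a": {"idiv": 40, "imul": 4, "shift": 1, "add": 1},
    "tune_b": {"idiv": 6, "imul": 5, "shift": 3, "add": 1},
}

# Two hypothetical sequences assumed to compute the same result.
CANDIDATES = {
    "divide": ["idiv"],
    "multiply_shift": ["imul", "shift", "add"],
}


def sequence_cost(tune, seq):
    """Sum the per-instruction costs of a sequence under one cost table."""
    table = COSTS[tune]
    return sum(table[op] for op in seq)


def pick_sequence(tune):
    """Return the name of the cheapest candidate for the given tuning."""
    return min(CANDIDATES, key=lambda name: sequence_cost(tune, CANDIDATES[name]))


for tune in COSTS:
    print(tune, "->", pick_sequence(tune))
```

With these placeholder numbers the two tunings pick different
sequences, which is the kind of effect a different cost table can have.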

Update.  I have been asked to provide some numbers.
I used the perlbench-0.93 suite to measure libperl.so performance.
In the tables below, A is normalized to 100 and a higher score means
faster.

A) libperl.so compiled with -march=i586 -mtune=pentium4
B) libperl.so compiled with -march=i586 -mtune=generic

AMD Athlon 64              A       B
-------------            ---     ---
arith/mixed              100     106
arith/trig               100     100
array/copy               100     104
array/foreach            100      94
array/index              100     108
array/pop                100     109
array/shift              100     107
array/sort-num           100     103
array/sort               100     100
call/0arg                100     105
call/1arg                100      96
call/2arg                100     101
call/9arg                100     107
call/empty               100     108
call/fib                 100     103
call/method              100     106
call/wantarray           100     107
hash/copy                100      99
hash/each                100      91
hash/foreach-sort        100      96
hash/foreach             100     100
hash/get                 100     102
hash/set                 100     110
loop/for-c               100     104
loop/for-range-const     100     102
loop/for-range           100     103
loop/getline             100     106
loop/while-my            100     109
loop/while               100     113
re/const                 100     104
re/w                     100     102
startup/fewmod           100     104
startup/lotsofsub        100     107
startup/noprog           100     100
string/base64            100     102
string/htmlparser        100     102
string/index-const       100     110
string/index-var         100      74
string/ipol              100     105
string/tr                100     102
AVERAGE                  100     103

Intel Xeon                 A       B
----------               ---     ---
arith/mixed              100      98
arith/trig               100     138
array/copy               100     101
array/foreach            100     100
array/index              100      94
array/pop                100      99
array/shift              100     117
array/sort-num           100     103
array/sort               100     105
call/0arg                100     101
call/1arg                100      97
call/2arg                100      93
call/9arg                100      98
call/empty               100     100
call/fib                 100     116
call/method              100      92
call/wantarray           100     101
hash/copy                100     104
hash/each                100     102
hash/foreach-sort        100     102
hash/foreach             100      98
hash/get                 100     102
hash/set                 100      96
loop/for-c               100     128
loop/for-range-const     100     100
loop/for-range           100     103
loop/getline             100      94
loop/while-my            100     107
loop/while               100     102
re/const                 100      99
re/w                     100      92
startup/fewmod           100     101
startup/lotsofsub        100      98
startup/noprog           100     101
string/base64            100     100
string/htmlparser        100      70
string/index-const       100     103
string/index-var         100     101
string/ipol              100     105
string/tr                100      94
AVERAGE                  100     101

Look ma, I've got a performance boost of about 3% on Athlon 64 and even
a minor improvement (about 1%) on Intel Xeon!  Also notice that, on
Xeon, the numbers vary more widely (from 70 to 138).  I believe the
numbers show that, compared to -mtune=pentium4, -mtune=generic is
beneficial for Athlon 64 and at least does no harm on Xeon.
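
The summary above can be rechecked from the B columns with a short
script.  This is only a sketch: it assumes that the AVERAGE row is a
plain arithmetic mean rounded to an integer (perlbench may compute it
differently), and it uses the population standard deviation as one
possible measure of how much the numbers vary.

```python
from statistics import mean, pstdev

# Column B (-mtune=generic) from the two tables above, in table order.
# Column A (-mtune=pentium4) is the baseline, 100 in every row.
ATHLON64_B = [106, 100, 104, 94, 108, 109, 107, 103, 100, 105,
              96, 101, 107, 108, 103, 106, 107, 99, 91, 96,
              100, 102, 110, 104, 102, 103, 106, 109, 113, 104,
              102, 104, 107, 100, 102, 102, 110, 74, 105, 102]
XEON_B = [98, 138, 101, 100, 94, 99, 117, 103, 105, 101,
          97, 93, 98, 100, 116, 92, 101, 104, 102, 102,
          98, 102, 96, 128, 100, 103, 94, 107, 102, 99,
          92, 101, 98, 101, 100, 70, 103, 101, 105, 94]


def summarize(scores):
    """Return the rounded mean, spread, and extremes of one B column."""
    return {
        "mean": round(mean(scores)),   # assumed to match the AVERAGE row
        "spread": pstdev(scores),      # population standard deviation
        "min": min(scores),
        "max": max(scores),
    }


for name, scores in (("AMD Athlon 64", ATHLON64_B), ("Intel Xeon", XEON_B)):
    s = summarize(scores)
    print(f"{name}: mean {s['mean']}, stdev {s['spread']:.1f}, "
          f"range {s['min']}..{s['max']}")
```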

Here is how to run perlbench:

$ echo ${PWD##*/}
perlbench-0.93
$ cat perl1 perl2
LD_LIBRARY_PATH=$PWD/lib1 exec /usr/bin/perl "$@"
LD_LIBRARY_PATH=$PWD/lib2 exec /usr/bin/perl "$@"
$ ls -l lib?/libperl*
-rw-r--r-- 1 at at 1173944 Jan  9 03:42 lib1/libperl.so.5.8
-rw-r--r-- 1 at at 1204984 Jan  9 03:46 lib2/libperl.so.5.8
$ ./perlbench-run ./perl1 ./perl2
...
2007-01-09 04:54:07 +03:00
894d09e31b Rename athlonxp platform to athlon_xp (#9991). 2006-09-16 21:50:43 +00:00
5416277102 Removed cvsid tags. 2006-05-14 17:05:34 +04:00
7412cdf5ca add athlonxp 2006-03-20 14:50:02 +00:00
c5f9cd8146 added pentium2, pentium3, sparcv8, darwin, macosx; replaced -mcpu=i686 with -mtune=pentium4; added -mtune=athlon-4 for k6-compatibles 2006-03-20 01:18:49 +00:00
7296d18e9a define optflags for noarch 2005-10-10 15:30:58 +00:00
18ec76875b removed never used @SYSCONFIGDIR@/%{_target_platform}/macros from macrofiles list 2005-06-29 18:09:22 +00:00
9835799623 Added x86_64 support. 2005-06-16 16:18:15 +00:00
65aaf3b23d Added pentium4 arch support (Raorn, #5259) 2004-10-31 19:08:39 +00:00
83ed90bde3 added armv5 arch support 2003-09-12 16:09:59 +00:00
84b95c5f91 added %_sysconfdir/%name/macros.d/* to macrofiles search list 2002-08-29 16:17:08 +00:00
7609dfc177 sync with rpm4 branch 2002-03-25 22:32:55 +00:00
82a4763c66 Initial revision 2002-03-25 20:16:26 +00:00