
This chapter enumerates the best-known peak switches (as of May 2004) for SPEC®-CPU2000 benchmarks compiled for AMD Athlon 64™ and AMD Opteron™ processor-based platforms by different compilers.

5.1 SuSE GCC 3.3.3 (64-Bit) C/C++ Compiler for Linux

Table 12 shows the best-known peak switches for various benchmarks in the SPEC-CPU2000 suite for the SuSE 64-bit GCC C/C++ compiler for Linux on AMD Athlon 64 processor-based platforms and AMD Opteron processor-based platforms.

Table 12. Best-Known Peak Switches for the 64-Bit SuSE GCC 3.3.3 C/C++ Compiler for Linux

Benchmark Program Best-Known Peak Switches

164.gzip: -O3 -funroll-all-loops -finline-limit=900 -freduce-all-givs and -fprofile-arcs/-fbranch-probabilities

175.vpr: -O3 -funroll-all-loops -finline-limit=1000 and -fprofile-arcs/-fbranch-probabilities

176.gcc: -O3 -funroll-all-loops -finline-limit=900 and -fprofile-arcs/-fbranch-probabilities

181.mcf: -O3 -funroll-all-loops -m32, and -fprofile-arcs/-fbranch-probabilities

186.crafty: -O3 -funroll-all-loops and -fprefetch-loop-arrays

197.parser: -O3 -funroll-all-loops -m32, and -fprofile-arcs/-fbranch-probabilities

252.eon: -O3 -funroll-all-loops -ffast-math -finline-limit=3000 and -fprofile-arcs/-fbranch-probabilities

253.perlbmk: -O3 -funroll-all-loops -finline-limit=1000 and -fprofile-arcs/-fbranch-probabilities

254.gap: -O3 -funroll-all-loops and -fprofile-arcs/-fbranch-probabilities

255.vortex: -O3 -funroll-all-loops -finline-limit=1000 and -fprofile-arcs/-fbranch-probabilities

256.bzip2: -O3 -funroll-all-loops -freduce-all-givs -finline-limit=2700 and -fprofile-arcs/-fbranch-probabilities

300.twolf: -O3 -funroll-all-loops -freduce-all-givs -finline-limit=2000 and -fprofile-arcs/-fbranch-probabilities

177.mesa: -O3 -funroll-all-loops -finline-limit=2000 and -fprofile-arcs/-fbranch-probabilities

179.art: -O3 -funroll-all-loops -ffast-math -finline-limit=1500 and -fprofile-arcs/-fbranch-probabilities

183.equake: -O3 -funroll-all-loops -ffast-math -finline-limit=2000 and -fprofile-arcs/-fbranch-probabilities

188.ammp: -O3 -funroll-all-loops -ffast-math -finline-limit=2000 and -fprofile-arcs/-fbranch-probabilities

Note: The -m32 switch improves the performance of 181.mcf, 197.parser and 300.twolf by reducing memory footprint.
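The paired notation -fprofile-arcs/-fbranch-probabilities in Table 12 denotes a two-pass, profile-guided build: the first switch instruments the binary, and the second uses the profile gathered from a training run. The following Python sketch is an illustration only and is not part of GCC or the SPEC tools. It splits a table entry into one switch list per pass. The function name split_passes and the treatment of the connective "and" are assumptions made for this example.

```python
def split_passes(entry):
    """Split a Table 12 style entry into (pass1, pass2) switch lists.

    A token of the form "-a/-b" contributes "-a" to pass 1 and "-b" to
    pass 2; every other switch is used in both passes.
    """
    pass1, pass2 = [], []
    for token in entry.split():
        token = token.rstrip(",")          # drop list punctuation
        if not token or token == "and":    # skip the connective word
            continue
        first, sep, second = token.partition("/-")
        if sep and first.startswith("-"):
            pass1.append(first)            # e.g. -fprofile-arcs
            pass2.append("-" + second)     # e.g. -fbranch-probabilities
        else:
            pass1.append(token)
            pass2.append(token)
    return pass1, pass2


# Entry for 181.mcf from Table 12.
p1, p2 = split_passes(
    "-O3 -funroll-all-loops -m32, and -fprofile-arcs/-fbranch-probabilities"
)
print("PASS1:", " ".join(p1))
print("PASS2:", " ".join(p2))
```

An entry without a paired switch, such as 186.crafty, yields two identical lists. The sketch assumes that no switch value contains the sequence "/-", which holds for every entry in Table 12.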

5.2 Pathscale EKO 2.1 C/C++ Compiler (64-Bit) for Linux

Table 13 shows the best-known peak switches for various benchmarks in the SPEC-CPU2000 suite for the PathScale C/C++ compiler (64-bit) for Linux on AMD Athlon 64 processor-based platforms and AMD Opteron processor-based platforms.

Table 13. Best-Known Peak Switches for the Pathscale 1.4 C/C++ Compiler for Linux

Benchmark Program Best-Known Peak Switches

164.gzip: -O3 -ipa -m3dnow -WOPT:val=0 +FDO

175.vpr: -O2 -ipa -OPT:alias=disjoint -CG:p2align_freq=500000 +FDO -INLINE:aggressive=on -IPA:space=300:plimit=10000:callee_limit=5000:linear=on

176.gcc: -O3 -ipa -OPT:goto=off +FDO

181.mcf: -O3 -ipa -IPA:field_reorder=on -m32 +FDO

32035 Rev. 3.18 June 2005 Compiler Usage Guidelines for 64-Bit Operating Systems on AMD64 Platforms

186.crafty: -O3 -OPT:goto=off +FDO

197.parser: -O3 -ipa -m32 -IPA:ctype=on +FDO

252.eon: -Ofast -CG:gcm=off:p2align_freq=1:prefetch=off -OPT:treeheight=on -LNO:fu=10:full_unroll_outer=on -TENV:X=4:frame_pointer=off -fno-exceptions -IPA:plimit=4000 +FDO

253.perlbmk: -O3 -ipa +FDO -IPA:plimit=10000

254.gap: -Ofast -WOPT:aggstr=off +FDO

255.vortex: -Ofast -IPA:plimit=1800 -OPT:goto=off -CG:p2align=on +FDO

256.bzip2: -Ofast +FDO

300.twolf: -O2 -CG:gcm=off:p2align_freq=100000 -WOPT:mem_opnds=on +FDO -m32 -OPT:unroll_times_max=8:unroll_size=256:alias=disjoint:Ofast

177.mesa: -O2 -ipa -OPT:Ofast -fno-math-errno -CG:local_fwd_sched=on +FDO

179.art: -O3 -OPT:ro=2:div_split=on:alias=typed -fno-math-errno -m32 +FDO

183.equake: -Ofast -WOPT:mem_opnds=on -m32

188.ammp: -O3 -OPT:alias=disjoint:unroll_times_max=8:Ofast:ro=3 -fno-math-errno -TENV:X=4 +FDO

Note: +FDO is feedback optimization—PASS1= -fb_create fbdata and PASS2= -fb_opt fbdata.
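As a minimal sketch of how the +FDO shorthand in Table 13 might be expanded, the Python fragment below replaces the marker with the pass-specific switches given in the note. The function name expand_fdo is hypothetical; only the switches -fb_create fbdata and -fb_opt fbdata come from this document.

```python
# Pass-specific switches taken from the note above; "fbdata" is the
# feedback file name used there.
FDO_PASS1 = ["-fb_create", "fbdata"]
FDO_PASS2 = ["-fb_opt", "fbdata"]


def expand_fdo(switches):
    """Return (pass1, pass2) switch lists with +FDO expanded in place."""
    pass1, pass2 = [], []
    for token in switches.split():
        if token == "+FDO":
            pass1.extend(FDO_PASS1)   # first pass creates the feedback data
            pass2.extend(FDO_PASS2)   # second pass optimizes with it
        else:
            pass1.append(token)       # ordinary switch: used in both passes
            pass2.append(token)
    return pass1, pass2


# Entry for 181.mcf from Table 13.
pass1, pass2 = expand_fdo("-O3 -ipa -IPA:field_reorder=on -m32 +FDO")
print("PASS1:", " ".join(pass1))
print("PASS2:", " ".join(pass2))
```

For an entry that does not use feedback optimization, such as 183.equake, both returned lists are identical and a single build pass suffices.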

5.3 Pathscale EKO 2.1 Fortran Compiler (64-bit) for Linux

Table 14 shows the best-known peak switches for various benchmarks in the SPEC-CPU2000 suite for the PathScale Fortran compiler (64-bit) for Linux on AMD Athlon 64 processor-based platforms and AMD Opteron processor-based platforms.

Table 14. Best-Known Peak Switches for the 64-bit Pathscale 1.4 Fortran Compiler for Linux

Benchmark Program Best-Known Peak Switches

168.wupwise: -Ofast -LNO:prefetch_ahead=5:prefetch=3 -OPT:unroll_times_max=8:unroll_size=128:IEEE_NaN_Inf=off:ro=3 -TENV:X=4 +FDO -IPA:space=1000:linear=on:plimit=50000:callee_limit=5000 -INLINE:aggressive=on

171.swim: -Ofast -LNO:fusion=2 -m3dnow

172.mgrid: -O3 -LNO:fusion=2:blocking=off -OPT:Ofast:unroll_times_max=8:unroll_size=256:ro=3 -CG:gcm=off:cflow=off -m3dnow

173.applu: -Ofast -CG:local_fwd_sched=on -LNO:fusion=2:fission=2:full_unroll_size=10000:prefetch=3 -OPT:ro=3 -TENV:X=3 -WOPT:val=2

178.galgel: -Ofast -OPT:fast_complex -CG:use_movlpd=on +ACML

187.facerec: -Ofast -OPT:treeheight=on:IEEE_NaN_Inf=off:ro=3 -CG:load_exe=0 -LNO:fusion=2 -IPA:plimit=1500 -WOPT:if_conv=off +FDO

189.lucas: -Ofast -CG:local_fwd_sched=on -LNO:fusion=2 +FDO

191.fma3d: -O2 -ipa -WOPT:mem_opnds=on:retype_expr=on -CG:load_exe=1 -OPT:Ofast:IEEE_arith=3:ro=3 +FDO -IPA:pu_reorder=1

200.sixtrack: -O3 -CG:load_exe=1 -OPT:Ofast:Olimit=6000 -fno-math-errno +FDO

301.apsi: -Ofast -TENV:X=4 -LNO:fusion=2:prefetch=0

Note: +FDO is feedback optimization—PASS1= -fb_create fbdata and PASS2= -fb_opt fbdata.

5.4 PGI 6.0 Fortran Compiler (64-Bit) for Linux

Table 15 shows the best-known peak switches for various benchmarks in the SPEC-CPU2000 suite for the 64-bit PGI Fortran compiler (version 6.0) for Linux on AMD Athlon 64 processor-based platforms and AMD Opteron processor-based platforms.

Table 15. Best-Known Peak Switches for the 64-Bit PGI Fortran Compiler for Linux

Benchmark Program Best-Known Peak Switches

168.wupwise: -fastsse -Mipa=fast,inline

171.swim: -fastsse and -Mipa=fast,inline

172.mgrid: -fastsse and -Mipa=fast,inline

173.applu: -fastsse and -Mipa=fast

178.galgel:* -fastsse -mp and -lacml (link with ACML)

187.facerec: -fastsse and -Mipa=fast

189.lucas: -fastsse and -Mipa=fast,inline

191.fma3d: -fastsse and -Mipa=fast,inline

200.sixtrack: -fastsse and -Mipa=fast,inline

301.apsi: -fastsse and -Mipa=fast

Note:* The benchmark, 178.galgel, uses some LAPACK routines. Performance is improved by using the ACML routines for these functions instead of the generic Fortran implementation included in galgel source.

5.5 Intel 8.0 C/C++ Compiler (32-Bit) for Microsoft® Windows®

Table 16 shows the best-known peak switches for various programs in the SPEC-CPU2000 benchmarks for the 32-bit Intel 8.0 C/C++ compiler for Microsoft Windows on AMD Athlon 64 processor-based platforms and AMD Opteron processor-based platforms.

Table 16. Best-Known Peak Switches for the 32-Bit Intel 8.0 C/C++ Compiler for Microsoft® Windows®

Benchmark Program Best-Known Peak Switches

164.gzip: -fast, -arch:SSE, shlW32M6.lib, and -prof_gen/-prof_use

175.vpr: -fast, -arch:SSE2, -prof_gen/-prof_use, -Qoption,c,-ip_ninl_max_stats=2000, and -Qoption,c,-ip_ninl_max_total_stats=4500

176.gcc: -fast, -arch:SSE2, -prof_gen/-prof_use, -Oi-, and -Qunroll3

181.mcf: -fast, -QaxN, and -prof_gen/-prof_use

186.crafty: -fast, -arch:SSE2, and -prof_gen/-prof_use

197.parser: -arch:SSE2, -prof_gen/-prof_use, -Oi-, and -Qipo

252.eon: -fast -arch:SSE2 -prof_gen/-prof_use -Qansi_alias, -Qoption,c,-ip_ninl_max_stats=2000 and -Qoption,c,-ip_ninl_max_total_stats=4500

253.perlbmk: -arch:SSE2 -prof_gen/-prof_use -Qipo and shlW32M6.lib

254.gap: -fast -arch:SSE2 -prof_gen/-prof_use -Oi- -Oa -Qoption,c,-ip_ninl_max_stats=500 and -Qoption,c,-ip_ninl_max_total_stats=3000

255.vortex: -fast -arch:SSE -prof_gen/-prof_use -Oi- shlW32M6.lib -Qoption,c,-ip_ninl_max_stats=2000 and -Qoption,c,-ip_ninl_max_total_stats=4500

256.bzip2: -fast and -Qunroll2

300.twolf: -fast -arch:SSE2 -prof_gen/-prof_use -Qunroll3 shlW32M6.lib and -Qansi_alias

177.mesa: -Qipo -arch:SSE2 -Qunroll1 -Qansi_alias -Qoption,f,-ip_ninl_max_stats=1500 -Qoption,f,-ip_ninl_max_total_stats=4500 and -Qprof_gen/-Qprof_use

179.art: -Qipo and -Zp4

183.equake: -fast -arch:SSE2 -QaxW -Qansi_alias and -Qprof_gen/-Qprof_use

188.ammp: -Oa -arch:SSE2 -Zp4 -Qansi_alias and -Qprof_gen/-Qprof_use

5.6 PGI 6.0 Fortran Compiler (32-Bit) for Microsoft® Windows®

Table 17 shows the best-known peak switches for various programs in the SPEC-CPU2000 benchmarks for the 32-bit PGI Fortran compiler (version 6.0) for Microsoft Windows on AMD Athlon 64 processor-based platforms and AMD Opteron processor-based platforms.

Table 17. Best-Known Peak Switches for the 32-Bit PGI Fortran Compiler for Microsoft® Windows®

Benchmark Program Best-Known Peak Switches

168.wupwise: -fastsse and -Mipa=fast,inline

171.swim: -fastsse and -Mipa=fast,inline

172.mgrid: -fastsse and -Mipa=fast,inline

173.applu: -fastsse and -Mipa=fast,inline

178.galgel: -fastsse -Mipa=fast,safe -Munix -mp and -lacml (linking with ACML)

187.facerec: -fastsse and -Mipa=fast,inline

189.lucas: -fastsse and -Mipa=fast,inline

191.fma3d: -fastsse and -Mipa=fast,inline

200.sixtrack: -fastsse and -Mipa=fast,inline

301.apsi: -fastsse and -Mipa=fast,align

5.7 Sun C/C++ Compiler (64-bit) for Solaris

Table 18 shows the best-known peak switches for various programs in the SPEC-CPU2000 benchmarks for the 64-bit Sun C and C++ compilers (version 5.7) for Solaris on AMD Athlon 64 processor-based platforms and AMD Opteron processor-based platforms.

Table 18. Best-Known Peak Switches for the 64-bit Sun C/C++ Compilers for Solaris

Benchmark Program Best-Known Peak Switches

164.gzip: -fast -xpagesize=2m -xcrossfile -M /usr/lib/ld/map.bssalign

175.vpr: -fast -xpagesize=2m -W2,-Ainline:inc=200:cs=500 -M /usr/lib/ld/map.bssalign -lmopt

176.gcc: -fast -xpagesize=2m -M /usr/lib/ld/map.bssalign

181.mcf: -fast -xpagesize=2m -xcrossfile -M /usr/lib/ld/map.bssalign

186.crafty: -fast -xpagesize=2m -xcrossfile -xarch=amd64 -M /usr/lib/ld/map.bssalign -lbsdmalloc

197.parser: -fast -xpagesize=2m -xipo=2 -W2,-Ainline:inc=200:cs=500 -M /usr/lib/ld/map.bssalign

252.eon: -fast -xpagesize=2m -xcrossfile -Qoption ube -ZB -Qoption ube -xcallee=yes -xarch=amd64 -M /usr/lib/ld/map.bssalign

253.perlbmk: -fast -xcrossfile -M /usr/lib/ld/map.bssalign -lbsdmalloc

254.gap: -Xc -fast -xipo=2 -M /usr/lib/ld/map.bssalign

255.vortex: -fast -xcrossfile -xarch=amd64 -Xc -Wu,-ZB -Wu,-xcallee=yes -M /usr/lib/ld/map.bssalign

256.bzip2: -fast -xpagesize=2m -xcrossfile -xarch=sse2 -Xc -M /usr/lib/ld/map.bssalign -lbsdmalloc

300.twolf: -fast -xpagesize=2m -xcrossfile -M /usr/lib/ld/map.bssalign

177.mesa: -fast -xipo=2 -xarch=amd64 -xalias_level=strong -xpagesize=2m +FDO

179.art: -fast -xipo=2 -xarch=amd64 -xalias_level=std -xpagesize=2m -Xc -M /usr/lib/ld/map.bssalign -lm

183.equake: -fast -xipo=2 -xprefetch -xalias_level=strong -xpagesize=2m -lmopt -lm +FDO

188.ammp: -fast -xipo=2 -xarch=amd64 -xalias_level=std -xpagesize_heap=2m -lmopt -lm

Note: +FDO denotes feedback optimization, with PASS1= -xprofile=collect and PASS2= -xprofile=use.
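For the Sun compilers, feedback optimization implies a three-step workflow: an instrumented build, a training run, and a rebuild that uses the collected profile. The Python sketch below lays out those steps as command lines for the 177.mesa entry. It is an illustration under stated assumptions: the function name two_pass_plan, the driver name cc, the source file mesa.c, and the training command ./a.out are all hypothetical. Real builds may need additional arguments to the profile switches, which are used here exactly as given in the note.

```python
import shlex


def two_pass_plan(compiler, switches, sources, train_cmd):
    """Return the three command lines of a feedback-directed build."""
    # Switches shared by both passes: everything except the +FDO marker.
    common = [s for s in switches.split() if s != "+FDO"]

    def compile_cmd(profile_switch):
        return shlex.join([compiler, *common, profile_switch, *sources])

    return [
        compile_cmd("-xprofile=collect"),  # PASS1: instrumented build
        train_cmd,                         # training run produces profile data
        compile_cmd("-xprofile=use"),      # PASS2: rebuild using the profile
    ]


plan = two_pass_plan(
    compiler="cc",                 # assumed driver name
    switches="-fast -xipo=2 -xarch=amd64 -xalias_level=strong -xpagesize=2m +FDO",
    sources=["mesa.c"],            # hypothetical source file
    train_cmd="./a.out",           # hypothetical training run
)
for step in plan:
    print(step)
```

Keeping the shared switches identical across both passes is deliberate: only the profile switch differs between PASS1 and PASS2.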

5.8 Sun Fortran Compiler (64-bit) for Solaris

Table 19 shows the best-known peak switches for various programs in the SPEC-CPU2000 benchmarks for the 64-bit Sun Fortran compiler (version 5.7) for Solaris on AMD Athlon 64 processor-based platforms and AMD Opteron processor-based platforms.

Table 19. Best-Known Peak Switches for the 64-bit Sun Fortran Compiler for Solaris

Benchmark Program Best-Known Peak Switches

168.wupwise: -fast -xipo=2 -xarch=amd64 -xprefetch_level=3 -xpagesize_heap=2m

171.swim: -fast -xipo=2 -xprefetch_level=3 -Qoption iropt -Atile:skewp,-Ainline:cs=700 -xarch=amd64 -qoption ube_ipa -inl_alt -xpagesize_stack=2m

172.mgrid: -fast -xipo=2 -xarch=amd64 -xprefetch_level=3 -xvector -xpagesize=2m -Qoption ld -M,/usr/lib/ld/map.bssalign

173.applu: -fast -xipo=2 -xprefetch_level=3 -xarch=amd64 -Qoption iropt -Aujam:inner=g -xpagesize_heap=2m

178.galgel: -fast -xipo=2 -xprefetch_level=3 -xvector=simd -xarch=amd64 -xpagesize_heap=2m -Qoption ld -M,/usr/lib/ld/map.bssalign -xlic_lib=sunperf

187.facerec: -fast -xipo=2 -xprefetch_level=3 -xpagesize=2m -xlic_lib=sunperf

189.lucas: -fast -xprefetch_level=3 -Qoption ld -M,/usr/lib/ld/map.bssalign -xpagesize_stack=2m

191.fma3d: -fast -xipo=2 -xprefetch_level=3 -xarch=amd64 -xpagesize_heap=2m +FDO

200.sixtrack: -fast -xipo=2 -xprefetch_level=3 -xarch=amd64 -xpagesize_heap=2m -Qoption ld -M,/usr/lib/ld/map.bssalign +FDO

301.apsi: -fast -xipo=2 -xprefetch_level=3 -xarch=amd64 -xpagesize=2m

Note: +FDO denotes feedback optimization, with PASS1= -xprofile=collect and PASS2= -xprofile=use.
