
Chapter 8 Conclusions and Future Work

B.2 Sweep3D Validations

Table B.6: Sweep3D Model Validation on Jaguar (Cray XT4) - 1000^3 total problem size, Htile = 2, mmi = 6

| NPE | n | m | Nx/n | Ny/m | Wg,nf (Sec) | Wg,f (Sec) | Pred (Sec) | Exec (Sec) | Error (%) | Compute (Sec) | Comm (Sec) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1K | 32 | 32 | 32 | 32 | 3.16E-7 | 3.69E-7 | 36.01 | 38.56 | -6.62 | 34.99 | 1.03 |
| 2K | 64 | 32 | 16 | 32 | 3.71E-7 | 4.26E-7 | 21.78 | 24.98 | -12.81 | 20.78 | 1.00 |
| 4K | 64 | 64 | 16 | 16 | 3.71E-7 | 4.26E-7 | 11.78 | 13.36 | -11.83 | 10.79 | 0.99 |
| 8K | 128 | 64 | 8 | 16 | 4.14E-7 | 5.00E-7 | 7.34 | 8.43 | -12.87 | 6.42 | 0.92 |
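The Error column in these tables is consistent with the signed relative difference between predicted and measured run time, (Pred - Exec) / Exec × 100. That definition is an inference from the tabulated values rather than something stated in this appendix, and the function names below are illustrative. A minimal Python sketch that recomputes the column for Table B.6 from the rounded Pred and Exec values:

```python
def percent_error(pred_s, exec_s):
    """Signed relative error of the prediction in percent (assumed definition)."""
    return (pred_s - exec_s) / exec_s * 100.0


# (NPE, Pred in seconds, Exec in seconds, tabulated Error in %) from Table B.6
TABLE_B6 = [
    ("1K", 36.01, 38.56, -6.62),
    ("2K", 21.78, 24.98, -12.81),
    ("4K", 11.78, 13.36, -11.83),
    ("8K", 7.34, 8.43, -12.87),
]


def recompute(rows):
    """Return (NPE, recomputed error, tabulated error) for each row."""
    return [(npe, round(percent_error(p, e), 2), err) for npe, p, e, err in rows]


if __name__ == "__main__":
    for npe, computed, tabulated in recompute(TABLE_B6):
        print(f"{npe}: recomputed {computed:+.2f}%, tabulated {tabulated:+.2f}%")
```

Because Pred and Exec are tabulated to two decimal places, the recomputed values can differ from the printed Error column in the second decimal place. A negative error means the model under-predicts the measured execution time.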

Table B.7: Sweep3D Model Validation on Jaguar (Cray XT4) - 20×10^6 total problem size, Htile = 2, mmi = 6

| NPE | n | m | Nx/n | Ny/m | Wg,nf (Sec) | Wg,f (Sec) | Pred (Sec) | Exec (Sec) | Error (%) | Compute (Sec) | Comm (Sec) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1K | 32 | 32 | 9 | 9 | 3.58E-7 | 3.89E-7 | 1.23 | 1.38 | -10.68 | 0.99 | 0.24 |
| 2K | 64 | 32 | 5 | 9 | 4.33E-7 | 4.78E-7 | 0.97 | 1.11 | -13.05 | 0.72 | 0.25 |
| 4K | 64 | 64 | 5 | 5 | 4.40E-7 | 5.00E-7 | 0.73 | 0.94 | -22.41 | 0.47 | 0.26 |
| 8K | 128 | 64 | 3 | 5 | 6.00E-7 | 6.33E-7 | 0.68 | 0.90 | -23.85 | 0.41 | 0.27 |
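The Nx/n and Ny/m columns of the two fixed-size tables (B.6 and B.7) match a ceiling division of the global grid dimension by the processor-grid dimension. The sketch below checks that reading in Python. It rests on two assumptions that are not stated in this appendix: that the partition rounds up, and that the 20×10^6-cell problem is a cube with roughly 271 cells per side (the rounded cube root). The function name is illustrative.

```python
def cells_per_proc(global_cells, procs_along_axis):
    """Ceiling division: cells owned by one processor along one axis (assumed rule)."""
    return -(-global_cells // procs_along_axis)


if __name__ == "__main__":
    # Table B.6: 1000 cells per side, processor-grid dimensions 32, 64 and 128.
    print([cells_per_proc(1000, p) for p in (32, 64, 128)])

    # Table B.7: side length assumed to be the rounded cube root of 20e6.
    side = round(20e6 ** (1.0 / 3.0))
    print(side, [cells_per_proc(side, p) for p in (32, 64, 128)])
```

Under these assumptions the computed values agree with the tabulated entries (32, 16, 8 for Table B.6 and 9, 5, 3 for Table B.7).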

Table B.8: Sweep3D Model Validation on Jaguar (Cray XT4) - 5×5×400 per processor problem size, Htile = 5, mmi = 6

| NPE | n | m | Nx/n | Ny/m | Wg,nf (Sec) | Wg,f (Sec) | Pred (Sec) | Exec (Sec) | Error (%) | Compute (Sec) | Comm (Sec) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 4 | 2 | 2 | 5 | 5 | 4.64E-7 | 5.19E-7 | 0.62 | 0.59 | 4.81 | 0.47 | 0.15 |
| 8 | 4 | 2 | 5 | 5 | 4.64E-7 | 5.19E-7 | 0.62 | 0.62 | 0.06 | 0.48 | 0.15 |
| 16 | 4 | 4 | 5 | 5 | 4.64E-7 | 5.19E-7 | 0.63 | 0.67 | -6.1 | 0.48 | 0.15 |
| 32 | 8 | 4 | 5 | 5 | 4.64E-7 | 5.19E-7 | 0.64 | 0.67 | -4.79 | 0.49 | 0.15 |
| 64 | 8 | 8 | 5 | 5 | 4.64E-7 | 5.19E-7 | 0.65 | 0.7 | -6.16 | 0.5 | 0.16 |
| 128 | 16 | 8 | 5 | 5 | 4.64E-7 | 5.19E-7 | 0.67 | 0.71 | -6.47 | 0.51 | 0.16 |
| 256 | 16 | 16 | 5 | 5 | 4.64E-7 | 5.19E-7 | 0.7 | 0.77 | -9.17 | 0.53 | 0.16 |
| 1K | 32 | 32 | 5 | 5 | 4.64E-7 | 5.19E-7 | 0.78 | 0.84 | -7.35 | 0.6 | 0.18 |
| 2K | 64 | 32 | 5 | 5 | 4.64E-7 | 5.19E-7 | 0.84 | 0.95 | -11.49 | 0.65 | 0.19 |
| 4K | 64 | 64 | 5 | 5 | 4.64E-7 | 5.19E-7 | 0.96 | 1.07 | -10.95 | 0.74 | 0.21 |

Table B.9: Sweep3D Model Validation on Jaguar (Cray XT4) - 14×14×255 per processor problem size, Htile = 2.5, mmi = 6

| NPE | n | m | Nx/n | Ny/m | Wg,nf (Sec) | Wg,f (Sec) | Pred (Sec) | Exec (Sec) | Error (%) | Compute (Sec) | Comm (Sec) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 4 | 2 | 2 | 14 | 14 | 3.61E-7 | 3.69E-7 | 1.99 | 1.97 | 1.17 | 1.8 | 0.19 |
| 16 | 4 | 4 | 14 | 14 | 3.61E-7 | 3.69E-7 | 2.02 | 2.07 | -2.54 | 1.83 | 0.2 |
| 64 | 8 | 8 | 14 | 14 | 3.61E-7 | 3.69E-7 | 2.08 | 2.19 | -5.31 | 1.88 | 0.2 |
| 256 | 16 | 16 | 14 | 14 | 3.61E-7 | 3.69E-7 | 2.19 | 2.31 | -5.27 | 1.98 | 0.21 |
| 1K | 32 | 32 | 14 | 14 | 3.61E-7 | 3.69E-7 | 2.41 | 2.68 | -10.22 | 2.19 | 0.22 |

Table B.10: Sweep3D Model Validation on Jaguar (Cray XT4) - 20×20×1000 per processor problem size, Htile = 5, mmi = 6

| NPE | n | m | Nx/n | Ny/m | Wg,nf (Sec) | Wg,f (Sec) | Pred (Sec) | Exec (Sec) | Error (%) | Compute (Sec) | Comm (Sec) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 4 | 2 | 2 | 20 | 20 | 3.91E-7 | 4.35E-7 | 16.29 | 16.55 | -1.58 | 15.87 | 0.43 |
| 16 | 4 | 4 | 20 | 20 | 3.91E-7 | 4.35E-7 | 16.41 | 16.75 | -2.01 | 15.98 | 0.43 |
| 64 | 8 | 8 | 20 | 20 | 3.91E-7 | 4.35E-7 | 16.65 | 16.96 | -1.85 | 16.22 | 0.43 |
| 256 | 16 | 16 | 20 | 20 | 3.91E-7 | 4.35E-7 | 17.13 | 17.87 | -4.14 | 16.69 | 0.44 |
| 1K | 32 | 32 | 20 | 20 | 3.91E-7 | 4.35E-7 | 18.09 | 19.52 | -7.33 | 17.63 | 0.46 |

Table B.11: Sweep3D Model Validation on Jaguar (Cray XT4) - 45×45×1000 per processor problem size, Htile = 5, mmi = 6

| NPE | n | m | Nx/n | Ny/m | Wg,nf (Sec) | Wg,f (Sec) | Pred (Sec) | Exec (Sec) | Error (%) | Compute (Sec) | Comm (Sec) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 32 | 8 | 4 | 45 | 45 | 3.32E-7 | 3.63E-7 | 68.77 | 69.28 | -0.73 | 68.25 | 0.52 |
| 256 | 16 | 16 | 45 | 45 | 3.32E-7 | 3.63E-7 | 71.47 | 74.23 | -3.72 | 70.93 | 0.54 |
| 512 | 32 | 16 | 45 | 45 | 3.32E-7 | 3.63E-7 | 72.82 | 75.08 | -3.01 | 72.27 | 0.54 |
| 1K | 32 | 32 | 45 | 45 | 3.32E-7 | 3.63E-7 | 75.51 | 78.33 | -3.6 | 74.96 | 0.55 |

C cflow work from sweep.x

Listing C.1: sweep.x

(*
 * CHIP3S
 * Application Characterisation Tool
 * Source: sweep.c
 * RUV Type: clc
 *)
........
proc cflow work { (* Defined at sweep.c:697 *)
  [697] compute <is clc, FCAL>;
  [741] case (<is clc, IFBR>) {
    do dsa:
      [743] compute <is clc, AILL, TILL, SILL>;
      [745] loop (<is clc, LFOR>, mmi) {
        [745] compute <is clc, CMLL, AILL, TILL, SILL>;
        [749] loop (<is clc, LFOR>, nk) {
          [749] compute <is clc, CMLL, AILL>;
          [751] compute <is clc, AILL>;
          [751] call cflow sign;
          [751] compute <is clc, TILL, SILL>;
          [753] loop (<is clc, LFOR>, it) {
            [753] compute <is clc, CMLL, 3*ARD3, ARD1, MFDL, AFDL, TFDL, INLL>;
          }
          [749] compute <is clc, INLL>;
        }
        [745] compute <is clc, INLL>;
      }
  }
  [765] compute <is clc, SILL>;
  [765] loop (<is clc, LFOR>, mmi) {
    [765] compute <is clc, CMLL, ARL1, SILL, INLL>;
  }
  [768] compute <is clc, SILL>;
  [768] loop (<is clc, LFOR>, jt+nk-1+mmi-1) {
    [768] compute <is clc, 4*AILL, CMLL, SILL, TILL>;
    [772] loop (<is clc, LFOR>, mmi-1) {
      [772] compute <is clc, CMLL, 3*ARL1, 2*TILL, AILL, INLL>;
    }
    [777] compute <is clc, 2*AILL>;
    [777] call cflow min;
    [777] call cflow min;
    [777] call cflow min;
    [777] call cflow max;
    [777] compute <is clc, 2*ARL1, 2*TILL, AILL, 2*SILL>;
    [800] loop (<is clc, LFOR>, ndiag) {
      [804] loop (<is clc, LFOR>, mmi-1) {
        [804] compute <is clc, 2*AILL, CMLL, ARL1, TILL, INLL>;
      }
      [811] compute <is clc, 2*TILL, 3*AILL>;
      [813] call cflow min;
      [813] compute <is clc, AILL>;
      [813] call cflow sign;
      [813] compute <is clc, TILL, 3*AILL>;
      [814] call cflow max;
      [814] compute <is clc, AILL>;
      [814] call cflow sign;
      [814] compute <is clc, 3*TILL, 2*AILL, ABSI, 5*ARD1, 2*MFDL, 4*TFDL, ARD3, SILL>;
      [840] loop (<is clc, LFOR>, it) {
        [840] compute <is clc, CMLL, ARD3, ARD1, TFDL, INLL>;
      }
      [842] compute <is clc, SILL>;
      [842] loop (<is clc, LFOR>, nm-1) {
        [842] compute <is clc, CMLL, SILL>;
        [844] loop (<is clc, LFOR>, it) {
          [844] compute <is clc, CMLL, 2*ARD1, 2*ARD3, MFDL, AFDL, TFDL, INLL>;
        }
        [842] compute <is clc, INLL>;
      }
      [848] case (<is clc, IFBR>) {
        (-ifixups)/(-epsi):
          [855] compute <is clc, TILL>;
          [855] loop (<is clc, LFOR>, it) {
            [855] compute <is clc, 4*CMLL, 3*ANDL, 8*ARD1, 8*MFDL, 9*TFDL, 7*ARD3, 9*AFDL, DFDL, AILL, TILL>;
          }
        1-((-ifixups)/(-epsi)):
          [881] compute <is clc, TILL>;
          [881] loop (<is clc, LFOR>, it) {
            [881] compute <is clc, 4*CMLL, 3*ANDL, 7*ARD1, 8*MFDL, 8*TFDL, 5*ARD3, 9*AFDL, DFDL, SILL, CMDL>;
            [902] case (<is clc, IFBR>) {
              0.5:
                [904] compute <is clc, 2*AFDL, 4*TFDL, DFDL, 3*MFDL, ARD1, SFDL, CMDL>;
                [910] case (<is clc, IFBR>) {
                  0.5:
                    [910] compute <is clc, ARD1, MFDL, ARD3, AFDL, TFDL>;
                }
                [912] compute <is clc, CMDL>;
                [912] case (<is clc, IFBR>) {
                  0.5:
                    [912] compute <is clc, ARD1, MFDL, ARD3, AFDL, TFDL>;
                }
                [913] compute <is clc, SILL>;
            }

            [916] compute <is clc, CMDL>;
            [916] case (<is clc, IFBR>) {
              0.5:
                [918] compute <is clc, 2*AFDL, 4*TFDL, DFDL, 3*MFDL, ARD3, ARD1, SFDL, CMDL>;
                [924] case (<is clc, IFBR>) {
                  0.5:
                    [924] compute <is clc, ARD1, MFDL, ARD3, AFDL, TFDL>;
                }
                [926] compute <is clc, CMDL>;
                [926] case (<is clc, IFBR>) {
                  0.5:
                    [926] compute <is clc, ARD1, MFDL, AFDL, TFDL>;
                }
                [927] compute <is clc, SILL>;
            }

            [931] compute <is clc, CMDL>;
            [931] case (<is clc, IFBR>) {
              0.5:
                [933] compute <is clc, 2*AFDL, 4*TFDL, DFDL, 3*MFDL, ARD3, ARD1, SFDL, CMDL>;
                [939] case (<is clc, IFBR>) {
                  0.5:
                    [939] compute <is clc, ARD1, MFDL, AFDL, TFDL>;
                }
                [941] compute <is clc, CMDL>;
                [941] case (<is clc, IFBR>) {
                  0.5:
                    [941] compute <is clc, ARD1, MFDL, ARD3, AFDL, TFDL>;
                }
                [942] compute <is clc, SILL>;
            }
            [945] compute <is clc, 4*TFDL, ARD1, 2*ARD3, 2*AILL, 2*TILL>;
          }
      }
      [956] compute <is clc, SILL>;
      [956] loop (<is clc, LFOR>, it) {
        [956] compute <is clc, CMLL, 2*ARD3, 2*ARD1, MFDL, AFDL, TFDL, INLL>;
      }
      [959] compute <is clc, SILL>;
      [959] loop (<is clc, LFOR>, nm-1) {
        [959] compute <is clc, CMLL, SILL>;
        [961] loop (<is clc, LFOR>, it) {
          [961] compute <is clc, CMLL, 3*ARD3, 2*ARD1, 2*MFDL, AFDL, TFDL, INLL>;
        }
        [959] compute <is clc, INLL>;
      }
      [967] case (<is clc, IFBR>) {
        do dsa:
          [970] compute <is clc, SILL>;
          [970] loop (<is clc, LFOR>, it) {
            [970] compute <is clc, CMLL, 8*ARD3, 4*ARD1, 3*MFDL, 3*AFDL, 3*TFDL, INLL>;
          }
      }
      [981] compute <is clc, ARD3, TFDL, INLL>;
      [987] compute <is clc, 2*POL1, AILL, TILL, INLL>;
    }
  }
} (* End of work *)
....
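Each compute statement in the listing carries a resource usage vector: a comma-separated list of operation mnemonics, some prefixed with a repeat count such as 3*ARD3. As a reading aid, the following Python sketch turns one such statement into per-mnemonic counts. It is not part of the CHIP3S tooling; the function name is hypothetical, and it assumes the statement text has been de-spaced, uses an ASCII `*`, and starts its vector with the `is clc` type tag.

```python
import re
from collections import Counter


def parse_compute(statement):
    """Return {mnemonic: count} for one 'compute <is clc, ...>;' statement."""
    match = re.search(r"compute\s*<([^>]*)>", statement)
    if match is None:
        raise ValueError("not a compute statement")
    items = [item.strip() for item in match.group(1).split(",")]
    counts = Counter()
    for item in items[1:]:  # items[0] is the 'is clc' type tag, not an operation
        if not item:
            continue
        if "*" in item:
            repeat, name = item.split("*", 1)
            counts[name.strip()] += int(repeat)
        else:
            counts[item] += 1
    return dict(counts)


if __name__ == "__main__":
    line = "[753] compute <is clc, CMLL, 3*ARD3, ARD1, MFDL, AFDL, TFDL, INLL>;"
    print(parse_compute(line))
```

Summing these counts over a loop body, weighted by the loop trip count, gives the per-iteration operation mix that the characterisation describes. The sketch handles single statements only and does not interpret loop or case nesting.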

D Wavefront Model and Extensions

D.1 Model Parameters

Table 4.1: Plug-and-Play Reusable Model Application Parameters

| Parameter | LU | Sweep3D | Chimaera |
|---|---|---|---|
| Nx, Ny, Nz | Input size | Input size | Input size |
| Wg | measured | measured | measured |
| Wg,pre | measured | 0 | 0 |
| Htile (cells) | 1 | mk×mmi/mmo | 1 |
| nsweeps | 2 | 8 | 8 |
| nfull | 2 | 2 | 4 |
| ndiag | 0 | 2 | 2 |
| Tnonwavefront | Tstencil+δh | 2Tallreduce+δh | Tallreduce+δh |
| MessageSizeEW (Bytes) | 40Ny/m | 8Htile×#angles×Ny/m | 8Htile×#angles×Ny/m |
| MessageSizeNS (Bytes) | 40Nx/m | 8Htile×#angles×Nx/m | 8Htile×#angles×Nx/m |
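The Htile and message-size rows of the table reduce to simple arithmetic. The Python sketch below evaluates them directly. The function names are illustrative, and the sample values for mk, mmi, mmo and #angles are assumptions chosen only to exercise the formulas; they are not taken from the validation runs. The per-processor cell count is passed in as a single argument (Ny/m for east-west messages, the corresponding x-dimension quotient for north-south).

```python
def sweep3d_htile(mk, mmi, mmo):
    """Sweep3D tile height in cells: mk * mmi / mmo (may be fractional)."""
    return mk * mmi / mmo


def wavefront_message_bytes(htile, n_angles, cells_per_proc):
    """Sweep3D/Chimaera entry: 8 * Htile * #angles * cells per processor."""
    return 8 * htile * n_angles * cells_per_proc


def lu_message_bytes(cells_per_proc):
    """LU entry: 40 bytes per cell on the shared face."""
    return 40 * cells_per_proc


if __name__ == "__main__":
    # Illustrative inputs only (assumed, not from the tables).
    htile = sweep3d_htile(mk=10, mmi=3, mmo=6)
    print("Htile:", htile)
    print("Sweep3D message (bytes):", wavefront_message_bytes(htile, 6, 5))
    print("LU message (bytes):", lu_message_bytes(32))
```

Because Htile can be fractional (Table B.9 uses Htile = 2.5), the Sweep3D message size is returned as a float rather than forced to an integer.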
