Chapter 8 Conclusions and Future Work
B.2 Sweep3D Validations
Table B.6: Sweep3D Model Validation on Jaguar (Cray XT4), 1000³ total problem size, Htile = 2, mmi = 6
NPE   n     m    Nx/n  Ny/m  Wg,nf    Wg,f     Pred   Exec   Error   Compute  Comm
                             (Sec)    (Sec)    (Sec)  (Sec)  (%)     (Sec)    (Sec)
1K    32    32   32    32    3.16E−7  3.69E−7  36.01  38.56  -6.62   34.99    1.03
2K    64    32   16    32    3.71E−7  4.26E−7  21.78  24.98  -12.81  20.78    1.00
4K    64    64   16    16    3.71E−7  4.26E−7  11.78  13.36  -11.83  10.79    0.99
8K    128   64   8     16    4.14E−7  5.00E−7  7.34   8.43   -12.87  6.42     0.92
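The NPE, n, m, Nx/n and Ny/m columns are tied together by the processor decomposition. A minimal Python sketch, assuming an n × m processor array over the x-y plane with each processor's extent rounded up (the rounding rule is inferred from the tabulated values rather than stated in this appendix), recovers those columns for Table B.6:

```python
import math

# Columns of Table B.6 (1000^3 cells): NPE label, n, m, Nx/n, Ny/m.
TABLE_B6 = [
    ("1K", 32, 32, 32, 32),
    ("2K", 64, 32, 16, 32),
    ("4K", 64, 64, 16, 16),
    ("8K", 128, 64, 8, 16),
]
NX = NY = 1000  # global cells in x and y


def subgrid(nx: int, ny: int, n: int, m: int) -> tuple[int, int]:
    """Cells per processor in x and y on an n x m processor array,
    rounded up when the extent does not divide evenly (assumed rule)."""
    return math.ceil(nx / n), math.ceil(ny / m)


for label, n, m, sx, sy in TABLE_B6:
    # NPE is the product n * m; the subgrid should match the Nx/n, Ny/m columns.
    print(label, n * m, subgrid(NX, NY, n, m), (sx, sy))
```

For example, 1000/64 = 15.625 cells, which rounds up to the 16 shown in the 2K and 4K rows.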
Table B.7: Sweep3D Model Validation on Jaguar (Cray XT4), 20×10⁶ total problem size, Htile = 2, mmi = 6
NPE   n     m    Nx/n  Ny/m  Wg,nf    Wg,f     Pred   Exec   Error   Compute  Comm
                             (Sec)    (Sec)    (Sec)  (Sec)  (%)     (Sec)    (Sec)
1K    32    32   9     9     3.58E−7  3.89E−7  1.23   1.38   -10.68  0.99     0.24
2K    64    32   5     9     4.33E−7  4.78E−7  0.97   1.11   -13.05  0.72     0.25
4K    64    64   5     5     4.40E−7  5.00E−7  0.73   0.94   -22.41  0.47     0.26
8K    128   64   3     5     6.00E−7  6.33E−7  0.68   0.90   -23.85  0.41     0.27
Table B.8: Sweep3D Model Validation on Jaguar (Cray XT4), 5×5×400 per-processor problem size, Htile = 5, mmi = 6
NPE   n     m    Nx/n  Ny/m  Wg,nf    Wg,f     Pred   Exec   Error   Compute  Comm
                             (Sec)    (Sec)    (Sec)  (Sec)  (%)     (Sec)    (Sec)
4     2     2    5     5     4.64E−7  5.19E−7  0.62   0.59   4.81    0.47     0.15
8     4     2    5     5     4.64E−7  5.19E−7  0.62   0.62   0.06    0.48     0.15
16    4     4    5     5     4.64E−7  5.19E−7  0.63   0.67   -6.1    0.48     0.15
32    8     4    5     5     4.64E−7  5.19E−7  0.64   0.67   -4.79   0.49     0.15
64    8     8    5     5     4.64E−7  5.19E−7  0.65   0.7    -6.16   0.5      0.16
128   16    8    5     5     4.64E−7  5.19E−7  0.67   0.71   -6.47   0.51     0.16
256   16    16   5     5     4.64E−7  5.19E−7  0.7    0.77   -9.17   0.53     0.16
1K    32    32   5     5     4.64E−7  5.19E−7  0.78   0.84   -7.35   0.6      0.18
2K    64    32   5     5     4.64E−7  5.19E−7  0.84   0.95   -11.49  0.65     0.19
4K    64    64   5     5     4.64E−7  5.19E−7  0.96   1.07   -10.95  0.74     0.21
Table B.9: Sweep3D Model Validation on Jaguar (Cray XT4), 14×14×255 per-processor problem size, Htile = 2.5, mmi = 6
NPE   n     m    Nx/n  Ny/m  Wg,nf    Wg,f     Pred   Exec   Error   Compute  Comm
                             (Sec)    (Sec)    (Sec)  (Sec)  (%)     (Sec)    (Sec)
4     2     2    14    14    3.61E−7  3.69E−7  1.99   1.97   1.17    1.8      0.19
16    4     4    14    14    3.61E−7  3.69E−7  2.02   2.07   -2.54   1.83     0.2
64    8     8    14    14    3.61E−7  3.69E−7  2.08   2.19   -5.31   1.88     0.2
256   16    16   14    14    3.61E−7  3.69E−7  2.19   2.31   -5.27   1.98     0.21
1K    32    32   14    14    3.61E−7  3.69E−7  2.41   2.68   -10.22  2.19     0.22
Table B.10: Sweep3D Model Validation on Jaguar (Cray XT4), 20×20×1000 per-processor problem size, Htile = 5, mmi = 6
NPE   n     m    Nx/n  Ny/m  Wg,nf    Wg,f     Pred   Exec   Error   Compute  Comm
                             (Sec)    (Sec)    (Sec)  (Sec)  (%)     (Sec)    (Sec)
4     2     2    20    20    3.91E−7  4.35E−7  16.29  16.55  -1.58   15.87    0.43
16    4     4    20    20    3.91E−7  4.35E−7  16.41  16.75  -2.01   15.98    0.43
64    8     8    20    20    3.91E−7  4.35E−7  16.65  16.96  -1.85   16.22    0.43
256   16    16   20    20    3.91E−7  4.35E−7  17.13  17.87  -4.14   16.69    0.44
1K    32    32   20    20    3.91E−7  4.35E−7  18.09  19.52  -7.33   17.63    0.46
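The Error column is not defined in the tables themselves. Assuming it is the signed relative error of the prediction, 100 × (Pred − Exec) / Exec, so that negative values mean the model under-predicts, the Python sketch below reproduces the Table B.10 values to within rounding and also checks that Pred is, again to within rounding, the sum of the Compute and Comm columns:

```python
# Rows of Table B.10: (NPE, Pred, Exec, tabulated Error %, Compute, Comm).
TABLE_B10 = [
    (4, 16.29, 16.55, -1.58, 15.87, 0.43),
    (16, 16.41, 16.75, -2.01, 15.98, 0.43),
    (64, 16.65, 16.96, -1.85, 16.22, 0.43),
    (256, 17.13, 17.87, -4.14, 16.69, 0.44),
    (1024, 18.09, 19.52, -7.33, 17.63, 0.46),
]


def relative_error(pred: float, measured: float) -> float:
    """Signed prediction error in percent (assumed definition of the Error column)."""
    return 100.0 * (pred - measured) / measured


for npe, pred, exec_time, err, compute, comm in TABLE_B10:
    print(f"NPE={npe:5d}  error={relative_error(pred, exec_time):6.2f}% "
          f"(table {err:6.2f}%)  compute+comm={compute + comm:.2f}s vs pred={pred:.2f}s")
```

Small residual differences are expected because Pred and Exec are themselves rounded to two decimals before the error is recomputed.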
Table B.11: Sweep3D Model Validation on Jaguar (Cray XT4), 45×45×1000 per-processor problem size, Htile = 5, mmi = 6
NPE   n     m    Nx/n  Ny/m  Wg,nf    Wg,f     Pred   Exec   Error   Compute  Comm
                             (Sec)    (Sec)    (Sec)  (Sec)  (%)     (Sec)    (Sec)
32    8     4    45    45    3.32E−7  3.63E−7  68.77  69.28  -0.73   68.25    0.52
256   16    16   45    45    3.32E−7  3.63E−7  71.47  74.23  -3.72   70.93    0.54
512   32    16   45    45    3.32E−7  3.63E−7  72.82  75.08  -3.01   72.27    0.54
1K    32    32   45    45    3.32E−7  3.63E−7  75.51  78.33  -3.6    74.96    0.55
C cflow work from sweep.x
Listing C.1: sweep.x
(*
 * CHIP3S Application Characterisation Tool
 * Source: sweep.c
 * RUV Type: clc
 *)
...
proc cflow work {
  (* Defined at sweep.c:697 *)
  [697] compute <is clc, FCAL>;
  [741] case (<is clc, IFBR>) {
    do_dsa:
      [743] compute <is clc, AILL, TILL, SILL>;
      [745] loop (<is clc, LFOR>, mmi) {
        [745] compute <is clc, CMLL, AILL, TILL, SILL>;
        [749] loop (<is clc, LFOR>, nk) {
          [749] compute <is clc, CMLL, AILL>;
          [751] compute <is clc, AILL>;
          [751] call cflow sign;
          [751] compute <is clc, TILL, SILL>;
          [753] loop (<is clc, LFOR>, it) {
            [753] compute <is clc, CMLL, 3*ARD3, ARD1, MFDL, AFDL, TFDL, INLL>;
          }
          [749] compute <is clc, INLL>;
        }
        [745] compute <is clc, INLL>;
      }
  }
  [765] compute <is clc, SILL>;
  [765] loop (<is clc, LFOR>, mmi) {
    [765] compute <is clc, CMLL, ARL1, SILL, INLL>;
  }
  [768] compute <is clc, SILL>;
  [768] loop (<is clc, LFOR>, jt+nk-1+mmi-1) {
    [768] compute <is clc, 4*AILL, CMLL, SILL, TILL>;
    [772] loop (<is clc, LFOR>, mmi-1) {
      [772] compute <is clc, CMLL, 3*ARL1, 2*TILL, AILL, INLL>;
    }
    [777] compute <is clc, 2*AILL>;
    [777] call cflow min;
    [777] call cflow min;
    [777] call cflow min;
    [777] call cflow max;
    [777] compute <is clc, 2*ARL1, 2*TILL, AILL, 2*SILL>;
    [800] loop (<is clc, LFOR>, ndiag) {
      [804] loop (<is clc, LFOR>, mmi-1) {
        [804] compute <is clc, 2*AILL, CMLL, ARL1, TILL, INLL>;
      }
      [811] compute <is clc, 2*TILL, 3*AILL>;
      [813] call cflow min;
      [813] compute <is clc, AILL>;
      [813] call cflow sign;
      [813] compute <is clc, TILL, 3*AILL>;
      [814] call cflow max;
      [814] compute <is clc, AILL>;
      [814] call cflow sign;
      [814] compute <is clc, 3*TILL, 2*AILL, ABSI, 5*ARD1, 2*MFDL, 4*TFDL, ARD3, SILL>;
      [840] loop (<is clc, LFOR>, it) {
        [840] compute <is clc, CMLL, ARD3, ARD1, TFDL, INLL>;
      }
      [842] compute <is clc, SILL>;
      [842] loop (<is clc, LFOR>, nm-1) {
        [842] compute <is clc, CMLL, SILL>;
        [844] loop (<is clc, LFOR>, it) {
          [844] compute <is clc, CMLL, 2*ARD1, 2*ARD3, MFDL, AFDL, TFDL, INLL>;
        }
        [842] compute <is clc, INLL>;
      }
      [848] case (<is clc, IFBR>) {
        (-ifixups)/(-epsi):
          [855] compute <is clc, TILL>;
          [855] loop (<is clc, LFOR>, it) {
            [855] compute <is clc, 4*CMLL, 3*ANDL, 8*ARD1, 8*MFDL, 9*TFDL, 7*ARD3, 9*AFDL, DFDL, AILL, TILL>;
          }
        1-((-ifixups)/(-epsi)):
          [881] compute <is clc, TILL>;
          [881] loop (<is clc, LFOR>, it) {
            [881] compute <is clc, 4*CMLL, 3*ANDL, 7*ARD1, 8*MFDL, 8*TFDL, 5*ARD3, 9*AFDL, DFDL, SILL, CMDL>;
            [902] case (<is clc, IFBR>) {
              0.5:
                [904] compute <is clc, 2*AFDL, 4*TFDL, DFDL, 3*MFDL, ARD1, SFDL, CMDL>;
                [910] case (<is clc, IFBR>) {
                  0.5:
                    [910] compute <is clc, ARD1, MFDL, ARD3, AFDL, TFDL>;
                }
                [912] compute <is clc, CMDL>;
                [912] case (<is clc, IFBR>) {
                  0.5:
                    [912] compute <is clc, ARD1, MFDL, ARD3, AFDL, TFDL>;
                }
                [913] compute <is clc, SILL>;
            }
            [916] compute <is clc, CMDL>;
            [916] case (<is clc, IFBR>) {
              0.5:
                [918] compute <is clc, 2*AFDL, 4*TFDL, DFDL, 3*MFDL, ARD3, ARD1, SFDL, CMDL>;
                [924] case (<is clc, IFBR>) {
                  0.5:
                    [924] compute <is clc, ARD1, MFDL, ARD3, AFDL, TFDL>;
                }
                [926] compute <is clc, CMDL>;
                [926] case (<is clc, IFBR>) {
                  0.5:
                    [926] compute <is clc, ARD1, MFDL, AFDL, TFDL>;
                }
                [927] compute <is clc, SILL>;
            }
            [931] compute <is clc, CMDL>;
            [931] case (<is clc, IFBR>) {
              0.5:
                [933] compute <is clc, 2*AFDL, 4*TFDL, DFDL, 3*MFDL, ARD3, ARD1, SFDL, CMDL>;
                [939] case (<is clc, IFBR>) {
                  0.5:
                    [939] compute <is clc, ARD1, MFDL, AFDL, TFDL>;
                }
                [941] compute <is clc, CMDL>;
                [941] case (<is clc, IFBR>) {
                  0.5:
                    [941] compute <is clc, ARD1, MFDL, ARD3, AFDL, TFDL>;
                }
                [942] compute <is clc, SILL>;
            }
            [945] compute <is clc, 4*TFDL, ARD1, 2*ARD3, 2*AILL, 2*TILL>;
          }
      }
      [956] compute <is clc, SILL>;
      [956] loop (<is clc, LFOR>, it) {
        [956] compute <is clc, CMLL, 2*ARD3, 2*ARD1, MFDL, AFDL, TFDL, INLL>;
      }
      [959] compute <is clc, SILL>;
      [959] loop (<is clc, LFOR>, nm-1) {
        [959] compute <is clc, CMLL, SILL>;
        [961] loop (<is clc, LFOR>, it) {
          [961] compute <is clc, CMLL, 3*ARD3, 2*ARD1, 2*MFDL, AFDL, TFDL, INLL>;
        }
        [959] compute <is clc, INLL>;
      }
      [967] case (<is clc, IFBR>) {
        do_dsa:
          [970] compute <is clc, SILL>;
          [970] loop (<is clc, LFOR>, it) {
            [970] compute <is clc, CMLL, 8*ARD3, 4*ARD1, 3*MFDL, 3*AFDL, 3*TFDL, INLL>;
          }
      }
      [981] compute <is clc, ARD3, TFDL, INLL>;
      [987] compute <is clc, 2*POL1, AILL, TILL, INLL>;
    }
  }
} (* End of work *)
...
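The listing describes control flow only: compute statements list counts of abstract operations (the clc characterisation), loop statements carry iteration counts, and case statements attach branch probabilities. The Python sketch below shows one way such a tree could be reduced to a time estimate. It is not the PACE/CHIP3S evaluator: the per-operation timings, the class names, and the rules of charging LFOR once per iteration and IFBR once per case are all illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Union

# Hypothetical per-operation costs in seconds. Real values would come from a
# benchmark of the target machine; these numbers are placeholders only.
OP_TIME = {
    "AILL": 1e-9, "TILL": 1e-9, "SILL": 1e-9, "CMLL": 1e-9, "INLL": 1e-9,
    "ARD1": 2e-9, "ARD3": 3e-9, "MFDL": 2e-9, "AFDL": 2e-9, "TFDL": 1e-9,
    "LFOR": 2e-9, "IFBR": 2e-9,
}


@dataclass
class Compute:
    ops: dict[str, int]  # operation -> multiplicity, e.g. {"AILL": 2} for 2*AILL


@dataclass
class Loop:
    count: int  # iteration count, e.g. a resolved value of `it`
    body: list["Node"]


@dataclass
class Case:
    branches: list[tuple[float, list["Node"]]]  # (branch probability, statements)


Node = Union[Compute, Loop, Case]


def cost(nodes: list[Node]) -> float:
    """Estimated time of a statement list under the assumed rules below."""
    total = 0.0
    for node in nodes:
        if isinstance(node, Compute):
            total += sum(OP_TIME[op] * k for op, k in node.ops.items())
        elif isinstance(node, Loop):
            # Assumption: the LFOR overhead and the body are charged per iteration.
            total += node.count * (OP_TIME["LFOR"] + cost(node.body))
        elif isinstance(node, Case):
            # Assumption: IFBR is charged once; branches are probability-weighted.
            total += OP_TIME["IFBR"] + sum(p * cost(b) for p, b in node.branches)
    return total


# The [840] fragment of the listing, with a hypothetical it = 10:
#   [840] loop (<is clc, LFOR>, it) {
#     [840] compute <is clc, CMLL, ARD3, ARD1, TFDL, INLL>; }
fragment = [Loop(10, [Compute({"CMLL": 1, "ARD3": 1, "ARD1": 1, "TFDL": 1, "INLL": 1})])]
print(cost(fragment))
```

Because loop counts such as it, nk and mmi are symbolic in the listing, they must be bound to the problem's input values before a numeric estimate is possible.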
D Wavefront Model and Extensions
D.1 Model Parameters
Table 4.1: Plug-and-Play Reusable Model Application Parameters
Parameter              LU                Sweep3D                   Chimaera
Nx, Ny, Nz             Input size        Input size                Input size
Wg                     measured          measured                  measured
Wg,pre                 measured          0                         0
Htile (cells)          1                 mk×mmi/mmo                1
nsweeps                2                 8                         8
nfull                  2                 2                         4
ndiag                  0                 2                         2
Tnonwavefront          Tstencil + δh     2Tallreduce + δh          Tallreduce + δh
MessageSizeEW (Bytes)  40Ny/m            8Htile × #angles × Ny/m   8Htile × #angles × Ny/m
MessageSizeNS (Bytes)  40Nx/n            8Htile × #angles × Nx/n   8Htile × #angles × Nx/n
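The message-size rows of Table 4.1 can be evaluated directly once the per-processor edge length (for example Ny/m for east-west messages) is known. The Python sketch below is illustrative only: the function names are hypothetical, the 8-byte factor is read as one double-precision value, and the example inputs (a 32-cell edge, Htile = 2, 6 angles) are not taken from any measured configuration.

```python
def lu_message_bytes(edge_cells: int) -> int:
    """LU: 40 bytes per cell along the shared subgrid edge."""
    return 40 * edge_cells


def wavefront_message_bytes(edge_cells: int, h_tile: float, n_angles: int) -> float:
    """Sweep3D/Chimaera: 8 * Htile * #angles bytes per cell along the shared edge
    (8 bytes assumed to be one double-precision value)."""
    return 8 * h_tile * n_angles * edge_cells


# Hypothetical example: a 32-cell edge (e.g. Ny/m = 32), Htile = 2, 6 angles.
print(lu_message_bytes(32))               # 1280
print(wavefront_message_bytes(32, 2, 6))  # 3072
```

Because Htile can be fractional (Table B.9 uses Htile = 2.5), the Sweep3D/Chimaera size is returned as a float in general.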