Scalable Distributed Schur Complement Solvers for Internal and External Flow Computations on Many-Core Architectures

Academic year: 2021

(1)

Dr.-Ing. Achim Basermann, Dr. Hans-Peter Kersken, Melven Zöllner

German Aerospace Center (DLR)

Simulation- and Software Technology

Distributed Systems and Component Software

Porz-Wahnheide, Linder Höhe, D-51147 Cologne, Germany

(2)

Survey

Internal and external flow computations at DLR

Storage Schemes for sparse matrices

The Distributed Schur Complement method (DSC)

Experiments with TRACE and TAU matrices

(3)

Parallel Simulation System TRACE

TRACE: Turbo-machinery Research Aerodynamic Computational Environment

Developed by the Institute for Propulsion Technology of the German Aerospace Center (DLR-AT)

Calculates internal turbo-machinery flows

Finite volume method with block-structured grids

The linearized TRACE modules require the parallel, iterative solution with preconditioning of large, sparse, non-symmetric real or

(4)

TAU: developed for the aerodynamic design of aircraft by the DLR Institute of Aerodynamics and Flow Technology

Unstructured RANS solver (Reynolds-averaged Navier-Stokes), exploits finite volumes

Requires the parallel, iterative solution with preconditioning of large, sparse, real, non-symmetric systems of linear equations

Solvers used: geometric multigrid, AMG-preconditioned GMRes

Here: experiments with DSC methods

(5)

Storage Schemes for Sparse Matrices

TRACE and TAU apply BCSR with 5x5 blocks.

Advantage: less indirect addressing

Disadvantage: a few zeros are stored.

Compressed Row Storage (CSR) and Block Compressed Row Storage (BCSR)

[Figure: example matrix with its non-zero values and column indices, both listed row-wise]
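The two storage schemes can be illustrated with a minimal pure-Python sketch. The function names `to_csr` and `to_bcsr`, the 4x4 example matrix, and the block size 2 are assumptions chosen for illustration (TRACE and TAU use 5x5 blocks); this is not the data structure code of either solver.

```python
def to_csr(dense):
    """Return (values, col_idx, row_ptr) for a dense list-of-lists matrix."""
    values, col_idx, row_ptr = [], [], [0]
    for row in dense:
        for j, a in enumerate(row):
            if a != 0:
                values.append(a)      # non-zero values, row-wise
                col_idx.append(j)     # one column index per stored value
        row_ptr.append(len(values))   # start of the next row in `values`
    return values, col_idx, row_ptr


def to_bcsr(dense, bs):
    """Return (blocks, block_col_idx, block_row_ptr) with dense bs x bs blocks.

    A block is stored as soon as it holds one non-zero entry, so some
    explicit zeros end up in storage.
    """
    n = len(dense)
    # Assumption of this sketch: square matrix, dimension divisible by bs.
    assert n % bs == 0 and all(len(r) == n for r in dense)
    blocks, block_col_idx, block_row_ptr = [], [], [0]
    for bi in range(0, n, bs):
        for bj in range(0, n, bs):
            block = [dense[i][bj:bj + bs] for i in range(bi, bi + bs)]
            if any(a != 0 for r in block for a in r):
                blocks.append(block)
                block_col_idx.append(bj // bs)  # one index per block
        block_row_ptr.append(len(blocks))
    return blocks, block_col_idx, block_row_ptr


# Hypothetical 4x4 example matrix.
A = [
    [1, 2, 0, 0],
    [0, 3, 0, 0],
    [0, 0, 4, 0],
    [5, 0, 0, 6],
]
values, col_idx, row_ptr = to_csr(A)
blocks, block_col_idx, block_row_ptr = to_bcsr(A, bs=2)
```

For this example, CSR stores 6 values with 6 column indices, while BCSR stores 3 blocks (12 values, 6 of them explicit zeros) with only 3 block column indices. This is the trade-off named on the slide: fewer indirect addresses at the cost of a few stored zeros.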

(6)

DSC Method (1)

[Figure: distributed matrix, 2 processors]

(7)

DSC Method (2)

DSC Algorithm

Schematic view on each processor

BiCGstab or GMRes iteration for the local interface rows (unknowns)
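The idea behind a Schur complement solve can be sketched on a single small block system with interior unknowns u and interface unknowns y: (B F; E C)(u; y) = (f; g). The sketch below is a strong simplification and not the DSC implementation: it uses exact dense solves, whereas the slide indicates an iterative method (BiCGstab or GMRes) for the interface unknowns. The block names B, F, E, C, the helper functions, and the example numbers are assumptions chosen for illustration.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting (small, dense)."""
    n = len(A)
    M = [list(map(float, A[i])) + [float(b[i])] for i in range(n)]
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(M[i][k]))
        M[k], M[p] = M[p], M[k]
        for i in range(k + 1, n):
            factor = M[i][k] / M[k][k]
            for j in range(k, n + 1):
                M[i][j] -= factor * M[k][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def matvec(A, x):
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def schur_solve(B, F, E, C, f, g):
    """Eliminate the interior unknowns, solve for the interface, back-substitute."""
    n_int, n_ifc = len(B), len(C)
    # Columns of B^{-1} F: one interior solve per interface unknown.
    BinvF = [solve(B, [row[j] for row in F]) for j in range(n_ifc)]
    Binv_f = solve(B, f)
    # Schur complement S = C - E B^{-1} F and reduced right-hand side.
    S = [[C[i][j] - sum(E[i][k] * BinvF[j][k] for k in range(n_int))
          for j in range(n_ifc)] for i in range(n_ifc)]
    g_hat = [gi - ei for gi, ei in zip(g, matvec(E, Binv_f))]
    y = solve(S, g_hat)                  # interface unknowns
    u = solve(B, [fi - vi for fi, vi in zip(f, matvec(F, y))])  # interior unknowns
    return u, y


# Hypothetical example: 2 interior unknowns, 1 interface unknown.
B = [[4, 1], [1, 3]]
F = [[1], [0]]
E = [[0, 2]]
C = [[5]]
f = [9, 7]
g = [19]
u, y = schur_solve(B, F, E, C, f, g)
```

The right-hand side in the example was constructed from the solution u = (1, 2), y = (3), which the sketch recovers up to rounding.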

(8)

DSC Method (3)

(9)

DSC Method: Strong Scaling (CSR, real)

(Dual-processor nodes; Quad-Core Intel Harpertown; 2.83 GHz)

[Plot: time in seconds (axis ticks 1, 10, 100, 1000) versus processor cores (8, 16, 32, 64, 128)]

(10)

DSC Method: CSR versus BCSR Format (real)

(2x Intel XEON E5520 with 4 cores each, 2.26 GHz)

TAU matrix: n=541,980; nz=170,610,950; threshold = 10^-3; |rel. residual| < 10^-7

Results on 8 cores

[Bar chart: time in seconds (axis 0 to 120) for dsc2011 with bs=1 and dsc2011 with bs=5; bar labels in extraction order: 102, 19, 29, 20, 9, 121]

(11)

DSC Method: Strong Scaling (CSR, complex)

(Dual-processor nodes; Quad-Core Intel Harpertown; 2.83 GHz)

TRACE matrix THD (n=378,400; nz=45,456,500; threshold = 10^-3; |rel. residual| < 10^-5)

[Plot: time in seconds (axis ticks 1 to 10000) versus processor cores (4, 8, 16, 32, 64, 128, 192)]

TRACE matrix UHBR (n=4,497,520; nz=552,324,700;

[Plot: time in seconds (axis ticks 100 to 10000) versus processor cores (32, 64, 128, 192)]

(12)

DSC Solver: CSR versus BCSR Format

linearTRACE matrix (8 processes, dim = 56,240, nnz = 2.6 million)

[Bar chart: time in seconds (axis 0 to 5) for total, ilut construction, and solver iteration; series: real, real blocked (bs=5), complex, complex blocked (bs=5); iteration count labels: # 76, # 40, # 32, # 29]

(13)

dsc2011 solver for linearTRACE

(8 processes, test case "THD stator": dim = 0.8 million, nnz = 90 million)

[Bar chart: time in seconds (axis 0 to 70), split into trace (setup matrix etc.), solver iteration, and prec. preparation (ilut); bar labels: # 140 iterations, # 57 iterations]

(14)

Conclusions

Complex computations distinctly faster than real ones

Applying the BCSR format is significantly faster than applying the CSR format for TRACE and TAU problems.

DSC method distinctly faster than block-local methods

DSC method very suitable for TRACE and TAU problems

(15)

Questions?


(16)

DLR

German Aerospace Center

Research Institution Space Agency

(17)

Locations and employees

Germany: 6,900 employees across 33 research institutes and facilities at 15 sites.

Offices in Brussels, Paris and Washington.

[Map labels: Koeln, Lampoldshausen, Stuttgart, Oberpfaffenhofen, Braunschweig, Goettingen, Berlin, Bonn, Trauen, Hamburg, Neustrelitz, Weilheim, Bremen]

(18)

DSC Method: Effect of the Interface Iteration (real)

(2x Intel XEON E5520 with 4 cores each, 2.26 GHz)

TAU matrix: n=541,980; nz=170,610,950; threshold = 10^-3; |rel. residual| < 10^-7

Results on 8 cores

[Plot: solver iteration time in seconds (axis 0 to 35) over an x-axis with ticks 0 to 10; series: interf-bicgstab, bs=1; interf-gmres, bs=1; interf-bicgstab, bs=5; interf-gmres, bs=5]
