Performance Comparison of ISV Simulation Codes on Microsoft Windows HPC Server 2008 and SUSE Linux Enterprise Server 10.2


Karsten Reineck and Horst Schwichtenberg, 31 March 2009


The Fraunhofer* Society

Founded in 1949, non-profit organization

Focus on application-oriented basic and industrial research

57 research institutes throughout Germany

Staff of approx. 12,500 people, the majority of them qualified scientists and engineers

Annual research volume of around 1 billion euros

*Joseph von Fraunhofer (1787-1826) Researcher, inventor and entrepreneur


Benchmark – Introduction

ISV-defined test cases:

CFX: Internal flow through a flow channel with 5 to 20 million elements

FLUENT: External flow over a truck body of around 14 million cells

LS-DYNA: "Neon-Refined Crash Test simulation" (frontal crash with an initial speed of 31.5 mph)

PAM-CRASH: Front crash of a Neon car with 1 million cells

The same test cases (problems) are solved on identical hardware under Windows and Linux.


Benchmark – ISV Simulation Software

SIMULIA: Abaqus/Standard: "implicit solutions and a range of contact and nonlinear material options for static, dynamics, thermal, and multiphysics analyses"; Abaqus/Explicit: "the explicit method for high-speed, nonlinear, transient response and multi-physics applications."

ANSYS: "CFX is a powerful and flexible general-purpose computational fluid dynamics (CFD) package used for engineering simulations of all levels of complexity."

ANSYS: "FLUENT is a powerful and flexible general-purpose computational fluid dynamics (CFD) package used for engineering simulations of all levels of complexity."

DYNAmore: "LS-DYNA is a multi-purpose, explicit and implicit finite element program used to analyze linear and nonlinear static and dynamic behavior of physical procedures."

ESI Group: "PAM-CRASH is the most widely used crash simulation

Benchmark – Hardware

"Twin Servers" with Supermicro X7DWT mainboards and quad-core CPUs

Caution when only one local node is involved: how does the scheduler assign 4 processes to 8 cores?

[Figure: one node with two quad-core CPUs, CPU 1 and CPU 2, each with cores 1 to 4]
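To make the placement question concrete, the sketch below compares two common placement policies for 4 processes on one dual-socket, quad-core node: "compact" fills one CPU before using the other, "scatter" alternates between CPUs. It assumes a hypothetical core numbering (cores 0 to 3 on CPU 1, 4 to 7 on CPU 2, written as 0-based socket/core pairs); real numbering depends on OS and BIOS, and `place` is an illustrative helper, not part of any scheduler API.

```python
"""Sketch: two ways to place MPI ranks on one dual-socket, quad-core node.

Assumption (hypothetical): socket 0 = CPU 1, socket 1 = CPU 2, and cores
are numbered 0-3 within each socket. Real numbering depends on OS/BIOS.
"""

SOCKETS = 2
CORES_PER_SOCKET = 4


def place(num_procs: int, policy: str) -> list[tuple[int, int]]:
    """Return (socket, core_within_socket) for each rank, 0-based."""
    total = SOCKETS * CORES_PER_SOCKET
    if not 0 < num_procs <= total:
        raise ValueError(f"need 1..{total} processes, got {num_procs}")
    placement = []
    for rank in range(num_procs):
        if policy == "compact":
            # Fill socket 0 completely before using socket 1.
            socket, core = divmod(rank, CORES_PER_SOCKET)
        elif policy == "scatter":
            # Alternate sockets rank by rank (round-robin).
            core, socket = divmod(rank, SOCKETS)
        else:
            raise ValueError(f"unknown policy {policy!r}")
        placement.append((socket, core))
    return placement


def procs_per_socket(placement: list[tuple[int, int]]) -> list[int]:
    """Count how many ranks ended up on each socket."""
    counts = [0] * SOCKETS
    for socket, _ in placement:
        counts[socket] += 1
    return counts


if __name__ == "__main__":
    for policy in ("compact", "scatter"):
        p = place(4, policy)
        print(policy, p, procs_per_socket(p))
```

With compact placement all four processes sit on one CPU and share its caches and bus bandwidth, while the second CPU idles; scatter puts two processes on each CPU. Which policy a given scheduler applies by default is not stated in the slides.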

CFX Benchmark – Results

[Figure: CFX run time in minutes, Windows vs. Linux; lower run time is better. Panels: Infiniband and Ethernet (x-axis: number of processes - number of nodes, 4-4, 8-8, 16-2, 32-4, 64-8); Local (4-1, 8-1).]
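The x-axis labels in the results charts read as "number of processes - number of nodes", so 16-2 means 16 processes on 2 nodes. Assuming 8 cores per compute node (two quad-core Xeon E5472 per Supermicro X7DWT node, as listed in the hardware appendix), a small helper can turn each label into processes per node and the fraction of cores in use. The function name `decode` is hypothetical.

```python
"""Sketch: decode benchmark labels of the form "<processes>-<nodes>".

Assumption: every compute node has 8 cores (2 sockets x 4 cores).
"""

CORES_PER_NODE = 8


def decode(label: str) -> tuple[int, int, float, float]:
    """Return (processes, nodes, processes_per_node, core_utilization)."""
    procs_str, nodes_str = label.split("-")
    procs, nodes = int(procs_str), int(nodes_str)
    per_node = procs / nodes
    if per_node > CORES_PER_NODE:
        raise ValueError(f"{label}: more processes per node than cores")
    return procs, nodes, per_node, per_node / CORES_PER_NODE


if __name__ == "__main__":
    for label in ["4-4", "8-8", "16-2", "32-4", "64-8", "4-1", "8-1"]:
        procs, nodes, per_node, util = decode(label)
        print(f"{label:>5}: {per_node:.0f} per node, {util:.1%} of cores busy")
```

Under this reading, 4-4 and 8-8 run one process per node, so every MPI message crosses the network, while 16-2, 32-4, 64-8 and 8-1 occupy all cores of each node they use.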

FLUENT Benchmark – Results

[Figure: FLUENT run time in minutes, Windows vs. Linux; lower run time is better. Panels: Ethernet and Infiniband (x-axis: number of processes - number of nodes, 4-4, 8-8, 16-2, 32-4, 64-8); Local (4-1, 8-1).]

LS-DYNA Benchmark – Results (single precision)

[Figure: LS-DYNA run time in minutes, Windows vs. Linux; lower run time is better. Panels: Ethernet and Infiniband (x-axis: number of processes - number of nodes, 4-4, 8-8, 16-2, 32-4, 64-8); Local (4-1, 8-1).]

PAM-CRASH Benchmark – Results

[Figure: PAM-CRASH run time in minutes, Windows vs. Linux; lower run time is better. Panels: Infiniband (x-axis: number of processes - number of nodes, 4-4, 8-8, 16-2, 32-4, 64-8); Local (4-1, 8-1).]

Abaqus/Explicit Benchmark – Results

[Figure: Abaqus/Explicit run time in minutes, Windows vs. Linux; lower run time is better. Panels: Infiniband and Ethernet (x-axis: number of processes - number of nodes, 4-4, 8-8, 16-2, 32-4, 64-8); Local (4-1, 8-1).]

Note: The scheduler pauses the jobs in Windows after about 12 minutes because there are not enough available cores.

Abaqus/Standard Benchmark – Results

[Figure: Abaqus/Standard run time in minutes, Windows vs. Linux; lower run time is better. Panels: Ethernet and Infiniband (x-axis: number of processes - number of nodes, 4-4, 8-8, 16-2, 32-4, 64-8); Local (4-1, 8-1).]

Note: The scheduler pauses the jobs in Windows after about 12 minutes because there are not enough available cores.

Abaqus/Standard – New Version 6.8-4

During the benchmark, the Abaqus beta version 6.8-4 was released for Windows

Some Windows-specific issues were resolved

Abaqus 6.8-4 shows a performance improvement of about 30% in our scenarios

[Figure: Abaqus/Standard run time in minutes over Ethernet, Windows 6.8-2 vs. Windows 6.8-4; lower run time is better. X-axis: number of processes - number of nodes (4-4, 8-8, 16-2, 32-4, 64-8). Percentage labels: 28%, 38%, 26%, 30%, 29%.]
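"About 30% performance improvement" can mean a 30% shorter run time or 30% more throughput, and the two are not the same. The slide does not say which is meant; the sketch below shows both measures using hypothetical run times (not results from this benchmark).

```python
"""Sketch: two common ways to express a performance improvement.

The run times used below are hypothetical, not benchmark results.
"""


def runtime_reduction(t_old: float, t_new: float) -> float:
    """Fraction by which the run time shrank, e.g. 0.3 for 30% shorter."""
    return (t_old - t_new) / t_old


def speedup(t_old: float, t_new: float) -> float:
    """Ratio of old to new run time; equals 1 / (1 - runtime_reduction)."""
    return t_old / t_new


if __name__ == "__main__":
    t_682, t_684 = 50.0, 35.0  # hypothetical minutes for 6.8-2 and 6.8-4
    print(f"run time reduced by {runtime_reduction(t_682, t_684):.0%}")
    print(f"speedup {speedup(t_682, t_684):.2f}x")
```

A 30% shorter run time corresponds to a speedup of 1 / (1 - 0.3), roughly 1.43.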

Conclusion – Deviation of Windows from Linux (Linux = 0%)

Deviation by connection type:
Infiniband: 7%
Ethernet: 13%
Local: 18%

Deviation by code:
CFX: 4%
FLUENT: 13%
LS-DYNA: 11%
PAM-CRASH: 11%
Abaqus Standard: 17%
Abaqus Explicit: 22%
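The conclusion uses Linux as the 0% baseline. The slides do not give the formula; assuming deviation is the relative run-time difference (T_Windows - T_Linux) / T_Linux, so that positive values mean Windows took longer, it can be computed as below. Aggregating several configurations with a plain mean is also an assumption, and the run times are hypothetical placeholders, not benchmark data.

```python
"""Sketch: relative run-time deviation of Windows from a Linux baseline.

Assumed definition (not stated on the slides):
    deviation = (t_windows - t_linux) / t_linux
so Linux is 0% and positive values mean Windows took longer.
"""

from statistics import mean


def deviation(t_windows: float, t_linux: float) -> float:
    return (t_windows - t_linux) / t_linux


def mean_deviation(pairs: list[tuple[float, float]]) -> float:
    """Average deviation over several (t_windows, t_linux) runs (assumed aggregation)."""
    return mean(deviation(win, lin) for win, lin in pairs)


if __name__ == "__main__":
    # Hypothetical run times in minutes: (Windows, Linux) per configuration.
    runs = [(11.0, 10.0), (10.5, 10.0), (9.0, 10.0)]
    for win, lin in runs:
        print(f"Windows {win:5.1f} min vs Linux {lin:5.1f} min: {deviation(win, lin):+.0%}")
    print(f"mean deviation: {mean_deviation(runs):+.1%}")
```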

"Open MS HPC" Portal

Porting Open Source HPC Software to Microsoft Windows Platforms

Portal for open source software developed and ported by Fraunhofer-SCAI (Elmer, OpenFOAM)

In the future: uploads and downloads of YOUR open source software

Best practices

Thanks for your attention!

Appendix: Cluster Configuration – Hardware and Network

Head Node: MICRO-STAR MS-9172-01S

2x Intel Xeon E5330 @ 2.13 GHz (ES), 4 GB FB-DDR2 RAM

2x 1000 Mbps LAN

Mellanox ConnectX (MT25418) Infiniband DDR Channel Adapter

Compute Nodes: Supermicro X7DWT

2x Intel Xeon E5472 @ 3.00 GHz (Quad Core), 16 GB FB-DDR2 RAM

2x 1000 Mbps LAN

Mellanox ConnectX (MT26418) Infiniband, 20 Gbps, PCI-E 2.0 (onboard)

Network Hardware

Switch 1: HP ProCurve Switch 2724 J4897A, 1000 Mbps Ethernet, 24 ports

Switch 2: Extreme Networks Summit X450-24t, 1000 Mbps Ethernet, 24 ports

Switch 3: Voltaire ISR-9024D_M, 4x DDR Infiniband, 24 ports

Network Configuration

Network 1: 1 Gbit/s Ethernet (management)

Network 2: 1 Gbit/s Ethernet (MPI)

Appendix: Operating Systems and ISV Software

Windows
Windows Server HPC Edition, Build 6001, 64 bit
Server Manager Version 6.0.6001.18078
HPC Cluster Manager Version 2.0.1551.0
Infiniband driver: Mellanox Version 1.4.1.3223

Linux
SUSE Linux Enterprise Server 10.2
Kernel 2.6.16.60-0.21, libc 2.9
Infiniband: OFED Version 1.3.1

ISV software
Abaqus 6.8-2; 6.8-4 (for Windows only)
Fluent 12.07 beta
CFX 11 SP1 with Arch detect fix for quad-core CPUs
PAM-CRASH v2008.0 with modified pamworld on Linux
LS-DYNA 971_R3.2.1 double precision

Linux partitioning setup

Device     Boot  Start  End    Blocks      Id  File system
/dev/sdb1  *     1      26     208813+     83  Ext3 (/boot)
/dev/sdb2        27     26135  209720542+  83  XFS (/scratch)
/dev/sdb3        26136  30313  33559785    82  swap
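The Blocks column of this partition listing counts 1 KiB blocks, the unit classic Linux fdisk uses; a trailing + indicates that the size is not a whole number of blocks. Assuming that unit, a short conversion gives the partition sizes in GiB.

```python
"""Sketch: convert the fdisk "Blocks" column (1 KiB units) to GiB.

Assumption: Blocks are 1024-byte blocks, as printed by classic Linux fdisk.
"""

KIB_PER_GIB = 1024 * 1024

partitions = {
    "/dev/sdb1 (/boot)": 208813,
    "/dev/sdb2 (/scratch)": 209720542,
    "/dev/sdb3 (swap)": 33559785,
}


def blocks_to_gib(blocks: int) -> float:
    """Convert a count of 1 KiB blocks to GiB."""
    return blocks / KIB_PER_GIB


if __name__ == "__main__":
    for name, blocks in partitions.items():
        print(f"{name:22} {blocks_to_gib(blocks):8.2f} GiB")
```

This puts /scratch at about 200 GiB, swap at about 32 GiB and /boot at about 204 MiB.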

3rd Meeting of the German-speaking Windows HPC User Group

8–9 March 2010

at the Institutszentrum "Schloss Birlinghoven" of the Fraunhofer-Gesellschaft in St. Augustin near Bonn

www.izb.fraunhofer.de
