Performance Comparison of ISV Simulation Codes on
Microsoft Windows HPC Server 2008 and
SUSE Linux Enterprise Server 10.2
Karsten Reineck and Horst Schwichtenberg, 31.03.2009
The Fraunhofer* Society
Founded in 1949, non-profit organization
Focus on application-oriented basic and industrial research
57 research institutes throughout Germany
Staff of approx. 12,500 people, the majority of them qualified scientists and engineers
Annual research volume of around 1 billion euros
*Joseph von Fraunhofer (1787-1826) Researcher, inventor and entrepreneur
Benchmark – Introduction
ISV-defined test cases
CFX: Internal flow through a flow channel with 5 to 20 million elements
FLUENT: External flow over a truck body of around 14 million cells
The same test cases (problems) are solved on identical hardware under Windows and Linux
LS-DYNA: “Neon-Refined Crash Test simulation” (frontal crash with an initial speed of 31.5 mph)
PAM-CRASH: Frontal crash of a Neon car with 1 million cells
Benchmark – ISV Simulation Software
SIMULIA Abaqus/Standard: “implicit solutions and a range of contact
and nonlinear material options for static, dynamics, thermal, and multiphysics analyses”, Abaqus/Explicit: “the explicit method for high-speed, nonlinear, transient response and multi-physics applications.”
ANSYS “CFX is a powerful and flexible general-purpose computational fluid dynamics (CFD) package used for engineering simulations of all levels of complexity.”
ANSYS “FLUENT is a powerful and flexible general-purpose computational fluid dynamics (CFD) package used for engineering simulations of all levels of complexity.”
DYNAMORE “LS-DYNA is a multi-purpose, explicit and implicit finite
element program used to analyze linear and nonlinear static and dynamic behavior of physical procedures.”
ESI Group “PAM-CRASH is the most widely used crash simulation
Benchmark – Hardware
“Twin” servers with Supermicro X7DWT mainboards and quad-core CPUs
Caution when only one local node is involved: how does the scheduler place 4 processes on 8 cores?
[Figure: one node with two quad-core CPUs, CPU 1 and CPU 2, each with cores 1 to 4]
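The placement question above can be made concrete with a small sketch. It is a hypothetical illustration, not the logic of the Windows HPC scheduler or of any MPI implementation: `compact` and `scatter` are names invented here for two common policies on a node with 2 CPUs of 4 cores each. Which one is faster for a given code typically depends on how much memory bandwidth and cache each process needs.

```python
# Hypothetical illustration (not the HPC Server or MPI scheduler code):
# two common policies for placing MPI processes on one compute node with
# 2 CPUs x 4 cores, matching the benchmark nodes. Cores are (cpu, core) pairs.
CPUS = 2
CORES_PER_CPU = 4


def compact(n_procs):
    """Fill all cores of CPU 1 before using CPU 2."""
    slots = [(cpu, core)
             for cpu in range(1, CPUS + 1)
             for core in range(1, CORES_PER_CPU + 1)]
    return slots[:n_procs]


def scatter(n_procs):
    """Alternate between the CPUs (round-robin over sockets)."""
    slots = [(cpu, core)
             for core in range(1, CORES_PER_CPU + 1)
             for cpu in range(1, CPUS + 1)]
    return slots[:n_procs]


print("compact:", compact(4))  # all 4 processes share CPU 1
print("scatter:", scatter(4))  # 2 processes per CPU
```

With 4 processes, `compact` leaves CPU 2 idle, while `scatter` splits the processes 2 + 2 across both CPUs; with 8 processes both policies occupy every core.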
CFX Benchmark – Results
[Figure: run time in minutes, Windows vs. Linux, for Infiniband, Ethernet and Local runs; x-axis: number of processes - number of nodes (4-4, 8-8, 16-2, 32-4, 64-8; Local: 4-1, 8-1); bars annotated with the percentage difference between Windows and Linux]
– Lower numbers (a lower run time) are better –
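The x-axis labels on the result charts read "number of processes - number of nodes". The sketch below decodes them, assuming 8 cores per compute node (2x quad-core Xeon E5472, as listed in the hardware appendix). `parse_config` is a hypothetical helper introduced for illustration. Read this way, 4-4 and 8-8 run one process per node, 16-2, 32-4, 64-8 and 8-1 fill all 8 cores of each node, and 4-1 uses only half a node, which is the case raised on the hardware slide.

```python
# The result charts label each run "processes-nodes", e.g. "32-4".
# Assumption: 8 cores per compute node (2x quad-core Xeon E5472, as listed
# in the hardware appendix).
CORES_PER_NODE = 8


def parse_config(label):
    """Return (processes, nodes, processes per node, fraction of node cores used)."""
    procs, nodes = (int(part) for part in label.split("-"))
    per_node = procs / nodes
    return procs, nodes, per_node, per_node / CORES_PER_NODE


for label in ["4-4", "8-8", "16-2", "32-4", "64-8", "4-1", "8-1"]:
    procs, nodes, per_node, usage = parse_config(label)
    print(f"{label:>5}: {per_node:g} process(es) per node, {usage:.1%} of cores busy")
```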
FLUENT Benchmark – Results
[Figure: run time in minutes, Windows vs. Linux, for Infiniband, Ethernet and Local runs; x-axis: number of processes - number of nodes (4-4, 8-8, 16-2, 32-4, 64-8; Local: 4-1, 8-1); bars annotated with the percentage difference between Windows and Linux]
– Lower numbers (a lower run time) are better –
LS-DYNA Benchmark – Results (single precision)
[Figure: run time in minutes, Windows vs. Linux, for Infiniband, Ethernet and Local runs; x-axis: number of processes - number of nodes (4-4, 8-8, 16-2, 32-4, 64-8; Local: 4-1, 8-1); bars annotated with the percentage difference between Windows and Linux]
– Lower numbers (a lower run time) are better –
PAM-CRASH Benchmark – Results
[Figure: run time in minutes, Windows vs. Linux, for Infiniband and Local runs; x-axis: number of processes - number of nodes (4-4, 8-8, 16-2, 32-4, 64-8; Local: 4-1, 8-1); bars annotated with the percentage difference between Windows and Linux]
– Lower numbers (a lower run time) are better –
Abaqus/Explicit Benchmark – Results
[Figure: run time in minutes, Windows vs. Linux, for Infiniband, Ethernet and Local runs; x-axis: number of processes - number of nodes (4-4, 8-8, 16-2, 32-4, 64-8; Local: 4-1, 8-1); bars annotated with the percentage difference between Windows and Linux]
The scheduler pauses the jobs in Windows after about 12 minutes because there are not enough available cores.
– Lower numbers (a lower run time) are better –
Abaqus/Standard Benchmark – Results
[Figure: run time in minutes, Windows vs. Linux, for Infiniband, Ethernet and Local runs; x-axis: number of processes - number of nodes (4-4, 8-8, 16-2, 32-4, 64-8; Local: 4-1, 8-1); bars annotated with the percentage difference between Windows and Linux]
The scheduler pauses the jobs in Windows after about 12 minutes because there are not enough available cores.
– Lower numbers (a lower run time) are better –
Abaqus/Standard – New Version 6.8-4
During the benchmark, the Abaqus beta version 6.8-4 was released for Windows
Some Windows-specific issues were resolved
Abaqus 6.8-4 shows a performance improvement of about 30% in our scenarios
[Figure: run time in minutes with Ethernet, Windows 6.8-2 vs. Windows 6.8-4; x-axis: number of processes - number of nodes (4-4, 8-8, 16-2, 32-4, 64-8); bar labels: 28%, 38%, 26%, 30%, 29%]
– Lower numbers (a lower run time) are better –
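The "about 30%" figure is consistent with the five percentage labels on the Ethernet chart (28%, 38%, 26%, 30%, 29%). The sketch assumes each label is the run-time reduction of 6.8-4 relative to 6.8-2, i.e. (t_old - t_new) / t_old; the slides do not state the formula, and `improvement_pct` is a hypothetical helper. The run times passed to it are made up purely to show the formula.

```python
# Assumption: each bar label is the reduction in run time of Abaqus 6.8-4
# relative to 6.8-2 on Windows, i.e. (t_old - t_new) / t_old * 100.
def improvement_pct(t_old, t_new):
    """Percentage run-time reduction from t_old to t_new."""
    return (t_old - t_new) / t_old * 100.0


# Bar labels from the Ethernet chart (their assignment to bars is not preserved).
labels = [28, 38, 26, 30, 29]
mean_improvement = sum(labels) / len(labels)
print(f"mean of the bar labels: {mean_improvement:.1f}%")

# Hypothetical run times in minutes, only to show the formula.
print(f"{improvement_pct(50.0, 35.0):.1f}%")
```

The unweighted mean of the labels is 30.2%, which matches the rounded claim on the slide.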
Conclusion – Deviation of Windows from Linux (Linux = 0%)
Deviation by configuration:
Infiniband 7%
Ethernet 13%
Local 18%
Deviation by application:
CFX 4%
FLUENT 13%
LS-DYNA 11%
PAM-CRASH 11%
Abaqus/Standard 17%
Abaqus/Explicit 22%
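The slides do not spell out how the deviation is computed. A natural reading, assumed here, is the relative difference of the Windows run time from the Linux baseline, so that a positive value means Windows was slower. The sketch encodes that assumed formula in a hypothetical helper `deviation_pct` and takes the unweighted mean of the six per-application values, which comes to 13%. Whether the slides' own summary values are averaged this way is not stated.

```python
# Assumed definition (not stated in the slides): deviation of the Windows
# run time from the Linux baseline in percent; positive means Windows is slower.
def deviation_pct(t_windows, t_linux):
    return (t_windows - t_linux) / t_linux * 100.0


# Per-application deviations as reported in the conclusion.
reported = {
    "CFX": 4, "FLUENT": 13, "LS-DYNA": 11, "PAM-CRASH": 11,
    "Abaqus/Standard": 17, "Abaqus/Explicit": 22,
}
mean_deviation = sum(reported.values()) / len(reported)
print(f"unweighted mean over applications: {mean_deviation:.1f}%")

# Hypothetical run times in minutes, only to illustrate the formula.
print(f"{deviation_pct(11.3, 10.0):.1f}%")
```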
„Open MS HPC“ Portal
Porting Open Source HPC Software to Microsoft Windows Platforms
Portal for open source software developed and ported by Fraunhofer-SCAI (Elmer, OpenFOAM)
In the future: uploads and downloads of YOUR open source software
Best practices
Thanks for your attention!
Appendix: Cluster Configuration – Hardware and Network
Head Node
MICRO-STAR MS-9172-01S
2x Intel Xeon E5330 @ 2.13 GHz (ES), 4 GB FB-DDR2 RAM
2x 1000 Mbps LAN
Mellanox ConnectX (MT25418) Infiniband DDR Channel Adapter
Compute Nodes
Supermicro X7DWT
2x Intel Xeon E5472 @ 3.00 GHz (quad-core), 16 GB FB-DDR2 RAM
2x 1000 Mbps LAN
Mellanox ConnectX (MT26418) Infiniband, 20 Gbps, PCI-E 2.0 (onboard)
Network Hardware
Switch 1: HP ProCurve Switch 2724 J4897A, 1000 Mbps Ethernet, 24 ports
Switch 2: Extreme Networks Summit X450-24t, 1000 Mbps Ethernet, 24 ports
Switch 3: Voltaire ISR-9024D_M, 4x DDR Infiniband, 24 ports
Network Configuration
Network 1: 1 Gbit/s Ethernet (management)
Network 2: 1 Gbit/s Ethernet (MPI)
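The nominal link rates above can be compared with a little arithmetic. The sketch assumes the Infiniband links are 4x DDR (20 Gbit/s signalling, as listed) and that they use 8b/10b line coding, which is standard for SDR and DDR InfiniBand. The results are peak link rates, not measured MPI bandwidth, and they say nothing about latency, which often matters as much for MPI traffic.

```python
# Nominal link rates from the hardware appendix, in Gbit/s.
IB_DDR_4X_SIGNAL_GBPS = 20.0  # 4 lanes x 5 Gbit/s signalling
ETHERNET_GBPS = 1.0

# SDR/DDR InfiniBand uses 8b/10b line coding:
# every 10 transmitted bits carry 8 data bits.
ib_data_gbps = IB_DDR_4X_SIGNAL_GBPS * 8 / 10


def gbps_to_mbytes_per_s(gbps):
    """Convert Gbit/s to MB/s (decimal units, 8 bits per byte)."""
    return gbps * 1000 / 8


print(f"Infiniband DDR 4x: {ib_data_gbps:.0f} Gbit/s = {gbps_to_mbytes_per_s(ib_data_gbps):.0f} MB/s")
print(f"Gigabit Ethernet:  {ETHERNET_GBPS:.0f} Gbit/s = {gbps_to_mbytes_per_s(ETHERNET_GBPS):.0f} MB/s")
print(f"ratio: {ib_data_gbps / ETHERNET_GBPS:.0f}x")
```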
Appendix: Operating Systems and ISV Software
Windows
Windows Server HPC Edition, Build 6001, 64 bit
Server Manager Version 6.0.6001.18078
HPC Cluster Manager Version 2.0.1551.0
Infiniband driver: Mellanox Version 1.4.1.3223
Linux
SUSE Linux Enterprise Server 10.2
Kernel 2.6.16.60-0.21, libc 2.9
Infiniband: OFED Version 1.3.1
ISV Software
Abaqus 6.8-2; 6.8-4 (for Windows only)
Fluent 12.07 beta
CFX 11 SP1 with Arch detect fix for quad-core CPUs
Pamcrash v2008.0 with modified pamworld on Linux
LS-Dyna 971_R3.2.1 double precision
Linux partitioning setup
Device     Boot  Start  End    Blocks      Id  File system
/dev/sdb1  *     1      26     208813+     83  Ext3 (/boot)
/dev/sdb2        27     26135  209720542+  83  XFS (/scratch)
/dev/sdb3        26136  30313  33559785    82  swap
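The Blocks column (1 KiB blocks) can be reproduced from the cylinder ranges. This sketch assumes the classic fdisk CHS geometry of 255 heads and 63 sectors per track with 512-byte sectors, which is not stated in the slides. It also assumes the first partition starts one track (63 sectors) into the disk. Under these assumptions one cylinder is 8032.5 KiB. The half-block results for sdb1 and sdb2 would then explain the trailing "+" in fdisk's output, which marks an odd number of sectors. `partition_kib` is a hypothetical helper.

```python
# Assumed disk geometry (not stated in the original): 255 heads,
# 63 sectors per track, 512-byte sectors.
HEADS = 255
SECTORS_PER_TRACK = 63
SECTOR_BYTES = 512
CYLINDER_KIB = HEADS * SECTORS_PER_TRACK * SECTOR_BYTES / 1024  # 8032.5 KiB
TRACK_KIB = SECTORS_PER_TRACK * SECTOR_BYTES / 1024             # 31.5 KiB


def partition_kib(start, end, first_partition=False):
    """Size in 1 KiB blocks of a partition spanning cylinders start..end (inclusive)."""
    kib = (end - start + 1) * CYLINDER_KIB
    if first_partition:
        # Assumption: the first partition begins after track 0, which holds the MBR.
        kib -= TRACK_KIB
    return kib


partitions = [
    ("/dev/sdb1", "/boot", 1, 26, True),
    ("/dev/sdb2", "/scratch", 27, 26135, False),
    ("/dev/sdb3", "swap", 26136, 30313, False),
]

for dev, mount, start, end, first in partitions:
    kib = partition_kib(start, end, first)
    print(f"{dev} {mount}: {kib:.1f} KiB = {kib / 1024**2:.2f} GiB")
```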