Clusters: Mainstream Technology for CAE
Alanna Dwyer
HPC Division, HP
MSC.Software VPD Conference | July 17-19, 2006 | Huntington Beach, California
Linux and Clusters Sparked a
Revolution in High Performance
Computing!
•
Supercomputing performance now affordable and accessible
•
Linux enabled the use of industry-standard technologies
•
Many more users and new applications
•
Cluster growth rate is over 50% per year! (volume is half of HPC)
•
Now a critical resource in meeting today’s CAE challenges
•
Increasingly complex CAE analysis demands more
•
larger models; more jobs to run; longer runs
•
Market is responding, adding enterprise RAS features to clusters
•
Treating CLUSTERS like PRODUCTS, not custom deployments
•
Integration with large SMP systems allows one to optimize resource
deployment
Why cluster?
•
Budget:
•
Price-performance (a 10+ GFLOPS system today costs < $4K)
•
Scale beyond practical SMP limits
•
Faster time to market and profit, improved insights
•
Resource consolidation
•
Centralized management, optimize utilization
•
Clusters aren’t just for compute engines
•
Can apply same principles to file systems and visualization
Application Experience – User Application (Courtesy of NTUST)
• A large-scale FE model (nonlinear continuum mechanics)
• Computing time of 80 days was necessary on 1 CPU in the year 2000
• 14 AMD Athlon 1600+ processors with Myrinet → 67 hours
• 96 HP Opteron 270 processor cores on the NTUST cluster → < 12 hours
• A home-made application, ported in less than a day
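The timings quoted for the NTUST application can be turned into wall-clock speedups. The sketch below is only arithmetic on the slide's figures: the function names are illustrative, and the 96-core run is entered as 12 hours although the slide says "< 12 hours", so its speedup is a lower bound. Because the three runs used different processor generations, the ratio mixes parallel scaling with per-core hardware gains and should not be read as parallel efficiency.

```python
# Timings quoted on the slide, converted to hours.
BASELINE_HOURS = 80 * 24  # 80 days on 1 CPU (year 2000)

RUNS = [
    # (label, processor count, elapsed hours)
    ("14 AMD Athlon 1600+ with Myrinet", 14, 67.0),
    ("96 Opteron 270 cores (time is an upper bound)", 96, 12.0),
]


def speedup(baseline_hours, elapsed_hours):
    """Wall-clock speedup relative to the single-CPU baseline."""
    return baseline_hours / elapsed_hours


def speedup_per_processor(baseline_hours, elapsed_hours, processors):
    """Speedup divided by processor count.

    Values above 1.0 indicate that the newer processors are also
    faster per core than the baseline CPU.
    """
    return speedup(baseline_hours, elapsed_hours) / processors


for label, procs, hours in RUNS:
    s = speedup(BASELINE_HOURS, hours)
    r = speedup_per_processor(BASELINE_HOURS, hours, procs)
    print(f"{label}: speedup {s:.1f}x, {r:.2f}x per processor")
```

Both runs come out above 1.0x per processor, which is consistent with faster CPUs contributing alongside the added parallelism.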
SMP vs. Cluster (farm) Example
[Chart: MSC.Nastran: XLTDF Comparison. X-axis: number of processes (1, 2, 4). Y-axis: total elapsed time (0 to 7000). Series: Integrity rx5670 4 way SMP; Integrity rx2620 - 2 node cluster; ProLiant DL145 G2 - 2 node cluster.]
CAE Application Sub-Segments
CAE Domain:        Pre/Post                    Structures               Impact                Fluids
Parallelized       Serial (SMP*)               SMP (MPI*)               MPI                   MPI
Job Scalability    32 – 64 GB                  1 – 4 (8*) cores         2 – 16 (32*) cores    4 – 128 (256*) cores
Typical Solution   Workstation or SMP server   Integrity SMP or Farm    X64 Cluster           X64 Cluster
Share rows (CPU cycles – Auto; CPU cycles – Aero; All jobs), column placement not recoverable: 30%, 30%, 60%, 20%, 10%, 50%
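The sub-segment breakdown above can be expressed as a small lookup table. The sketch below is one reading of the slide, not HP's sizing tool. The pairing of each domain with its parallelization, core range and typical solution is an assumption taken from the slide layout. The starred upper limits (8*, 32*, 256*) are left out. Pre/Post is given no core range because the slide lists memory (32 – 64 GB) for it instead. `SubSegment`, `SUB_SEGMENTS` and `within_typical_cores` are illustrative names.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class SubSegment:
    parallelization: str
    core_range: Optional[Tuple[int, int]]  # inclusive (min, max); None if not given
    typical_solution: str


# Assumed pairing of slide entries; starred limits are omitted.
SUB_SEGMENTS = {
    "Pre/Post": SubSegment("Serial (SMP*)", None, "Workstation or SMP server"),
    "Structures": SubSegment("SMP (MPI*)", (1, 4), "Integrity SMP or Farm"),
    "Impact": SubSegment("MPI", (2, 16), "X64 Cluster"),
    "Fluids": SubSegment("MPI", (4, 128), "X64 Cluster"),
}


def within_typical_cores(domain: str, cores: int) -> Optional[bool]:
    """Return True/False if the domain has a core range, else None."""
    segment = SUB_SEGMENTS[domain]
    if segment.core_range is None:
        return None
    low, high = segment.core_range
    return low <= cores <= high


for name, seg in SUB_SEGMENTS.items():
    print(f"{name}: {seg.parallelization}, {seg.typical_solution}")
```

For example, `within_typical_cores("Fluids", 64)` is true under these assumptions, while a 16-core Structures job falls outside the unstarred range.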
HPC Cluster
Implementation challenges
•
System and workload
management
•
Scalable performance
•
Scalable data management
•
Interconnect/Network
Complexity
•
Application availability and
scalability
•
Power and cooling
Latest Advancements in Clustering
•
Multi-core delivering continued price-performance
improvements
•
Improvements in clustering software and tools
•
More applications are being developed and tuned to
leverage cluster/DMP solutions
•
Principles of compute clusters being applied to storage
and visualization
•
InfiniBand now established in HPC
•
Solutions now coming to market that address power
Applications:
ISVs standardizing on HP-MPI
Powerful Solver Technology
AMLS
Molpro
University of Cardiff
“One of the top reasons that we went with HP-MPI is that we've had a great working relationship with HP. It was a win-win for ANSYS, HP and our customers - in terms of cost, interconnects, support and performance compared to other message passing interfaces for Linux and Unix. In addition, I've always had great turnaround from HP in response to hardware and software issues.”
Lisa Fordanich, Senior Systems Specialist, ANSYS
www.ansys.com/services/ss-interconnects.htm
“HP-MPI is an absolute godsend,” notes Keith Glassford, director of the Materials Science division at San Diego, CA-based Accelrys Software Inc. “It allows us to focus our energy and resources on doing what we’re good at, which is developing scientific and engineering software to solve customer problems.”
CAE Reference Architecture
[Diagram components: Client Workstations; Remote Workstations; LAN; Front End HA with job scheduler; pre/post SMP; compute SMPs; compute clusters; visualization cluster; direct attached Disk Array (or use SFS); Scalable File Share (meta data, object data); InfiniBand switched fabric interconnect.]
A Cluster Alternative to Direct Attached
Storage: HP Scalable File Share (SFS)
•
Applying principles of clusters to file systems and storage
enables the sharing of data sets without performance penalty
•
MSC.Nastran is Fast on HP SFS:
Replace extra-disk fat-nodes with flexible storage
•
Traditional approach:
•
Special nodes in the cluster w/ multiple local JBOD disks
•
Expensive and hard to manage
•
New approach
•
Use fast centralized, virtualized HP SFS filesystem
•
Similar performance
•
Lower cost
•
Shared rather than dedicated storage
•
Easier to use
•
Any node in the cluster can run Nastran
MSC.Nastran Benchmark XXCMD
•
Standard MSC benchmark
•
XXCMD: solution of the natural
frequencies of an automotive body
•
Performs a medium amount of I/O
compared to industry real-life
customer datasets (4 TB of I/O with
blocksize of 256 KB)
•
Multiple jobs running simultaneously:
no shared data
•
Customers typically use direct
attached storage for each host
•
1 controller and 5 drives per job are
recommended for good throughput
•
SFS performance
•
1 Object Storage Server node and
4 enclosures (with array of SATA
drives) for every 4 hosts achieved
excellent performance
•
No degradation for up to 16 hosts,
and small degradation from 16 to
32 hosts
•
Significant (~6 times) advantage vs.
small SCSI configuration
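The two sizing rules quoted above (1 controller and 5 drives per job for direct attached storage; 1 Object Storage Server node and 4 enclosures for every 4 hosts on SFS) can be compared with a short calculation. This is a sketch under stated assumptions: the function names are illustrative, the default of 2 jobs per host is taken from the benchmark chart legend, and rounding the SFS host groups up to a whole group is an assumption the slide does not state.

```python
import math


def das_requirements(hosts, jobs_per_host=2):
    """Direct attached storage: 1 controller and 5 drives per job."""
    jobs = hosts * jobs_per_host
    return {"controllers": jobs, "drives": 5 * jobs}


def sfs_requirements(hosts):
    """HP SFS: 1 Object Storage Server node and 4 enclosures per 4 hosts.

    Partial groups are rounded up (assumption).
    """
    groups = math.ceil(hosts / 4)
    return {"oss_nodes": groups, "enclosures": 4 * groups}


for hosts in (4, 16, 32):
    print(hosts, "hosts:", das_requirements(hosts), sfs_requirements(hosts))
```

The counts only compare component numbers. They say nothing about cost or throughput, which the slide reports separately.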
[Chart: MSC.Nastran benchmark XXCMD - performs medium I/O (small is better). Axis: time (sec), 0 to 140000. Series: SFS 2 jobs per host; MSA 2 jobs per host; SCSI 2 jobs per host.]
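To give a sense of the I/O volume quoted for XXCMD (4 TB of I/O with a blocksize of 256 KB), the sketch below counts the block transfers involved. It assumes binary units (TiB and KiB), which the slide does not specify. The `sustained_mib_per_s` helper takes a hypothetical elapsed time as input, since the slide gives no per-job I/O time.

```python
TOTAL_IO_BYTES = 4 * 2**40   # 4 TB, read here as 4 TiB (assumption)
BLOCK_BYTES = 256 * 2**10    # 256 KB, read here as 256 KiB (assumption)


def block_transfers(total_bytes, block_bytes):
    """Number of whole block-sized transfers needed to move total_bytes."""
    return total_bytes // block_bytes


def sustained_mib_per_s(total_bytes, elapsed_seconds):
    """Average throughput in MiB/s if the I/O were spread over elapsed_seconds."""
    return total_bytes / 2**20 / elapsed_seconds


print(block_transfers(TOTAL_IO_BYTES, BLOCK_BYTES))
```

Under these assumptions one job issues about 16.8 million block transfers.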
Key Considerations in Designing a
Solution
•
What processor and interconnect for the mix of jobs
•
Centralized resource or single purpose systems
•
Can applications co-exist?
•
Economics of consolidation
•
Environmentals: power, cooling, weight, space
•
Roll your own system or acquire a total solution
•
Production scalability requirements
•
Performance
•
Availability and Reliability
•
Manageability (provisioning, booting, monitoring, upgrades)
For more information see
www.hp.com/go/hptc
Cluster Platform Express:
www.hp.com/go/cp-express
Implementations of
CAE Reference Architecture:
AMD Opteron example
Fast: HP xw9300 Workstation
Opteron Workstation for Pre/Post (XW9300)
•
2 Dual Core Opteron 2.6 GHz CPUs
•
2 internal 146 GB drives
•
32 GB memory
•
DVD
Faster: ProLiant DL585 Server with Disk Array
Opteron Server for Structural Analysis (DL585)
•
22U Rack with Factory integration
•
4 Dual Core Opteron CPUs
•
2 internal 146 GB drives
•
32 GB memory
•
MSA30 Dual Bus
Fastest: CP 4000 Cluster (HP Cluster Platform 4000 compute cluster)
Opteron Cluster for CFD and Impact Analysis
•
42U Rack Sidewinder option
•
DL385 head node for cluster administration
•
DL145G2 with two Dual Core Opteron CPUs, each with 1 internal drive and 4 GB memory (1GB/core)
•
DL585 front end node with 64GB for grid generation and domain decomposition
•
XC Software Operating Environment support