Clusters: Mainstream Technology for CAE

(1)

Alanna Dwyer

HPC Division, HP

(2)

MSC.Software VPD Conference | July 17-19, 2006 | Huntington Beach, California

Linux and Clusters Sparked a Revolution in High Performance Computing!

• Supercomputing performance now affordable and accessible

• Linux enabled the use of industry-standard technologies

• Many more users and new applications

• Cluster growth rate is over 50% per year! (volume is half of HPC)

• Now a critical resource in meeting today’s CAE challenges

• Increasingly complex CAE analysis demands more

• larger models; more jobs to run; longer runs

• Market is responding, adding enterprise RAS features to clusters

• Treating CLUSTERS like PRODUCTS, not custom deployments

• Integration with large SMP systems allows one to optimize resource deployment

(3)

Why cluster?

• Budget:

• Price-performance (a 10+ GFLOPS system today costs < $4K)

• Scale beyond practical SMP limits

• Faster time to market and profit, improved insights

• Resource consolidation

• Centralized management, optimize utilization

• Clusters aren’t just for compute engines

• Can apply same principles to file systems and visualization

(4)

Application Experience

User Application (Courtesy of NTUST)

• A large-scale FE model (nonlinear continuum mechanics)

• Computing time of 80 days was needed on 1 CPU in 2000

• 14 AMD Athlon 1600+ processors with Myrinet → 67 hours

• 96 HP Opteron 270 processor cores on the NTUST cluster → < 12 hours

• A home-made application, ported in less than a day
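
The run times quoted above can be turned into overall speedup ratios. The short Python sketch below does that arithmetic, assuming the "< 12 hours" figure is taken at its 12-hour upper bound. Because each run used a different processor generation, the ratios mix hardware gains with parallel scaling and should not be read as pure parallel efficiency.

```python
# Run times quoted on the slide, converted to hours.
baseline_hours = 80 * 24   # 1 CPU in 2000: 80 days
athlon_hours = 67          # 14 Athlon 1600+ processors with Myrinet
opteron_hours = 12         # 96 Opteron 270 cores: "< 12 hours" (upper bound, assumed)


def speedup(reference_hours, new_hours):
    """Return the ratio of the reference run time to the new run time."""
    return reference_hours / new_hours


athlon_speedup = speedup(baseline_hours, athlon_hours)
opteron_speedup = speedup(baseline_hours, opteron_hours)

print(f"14-processor Athlon cluster: about {athlon_speedup:.1f}x faster")
print(f"96-core Opteron cluster: at least {opteron_speedup:.0f}x faster")
```

Since the Opteron run finished in under 12 hours, the second ratio is a lower bound.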

(5)

SMP vs. Cluster (farm) Example

MSC.Nastran: XLTDF Comparison

[Chart: total elapsed time (axis 0 to 7000) versus number of processes (1, 2, 4)]

• Integrity rx5670 4-way SMP

• Integrity rx2620, 2-node cluster

• ProLiant DL145 G2, 2-node cluster

(6)

CAE Application Sub-Segments

CAE Domain: Pre/Post, Structures, Impact, Fluids

• Parallelized: Serial (SMP*); SMP (MPI*); MPI; MPI

• Job Scalability: 32 – 64 GB; 1 – 4 (8*) cores; 2 – 16 (32*) cores; 4 – 128 (256*) cores

• Typical Solution: Workstation or SMP server; Integrity SMP or Farm; X64 Cluster; X64 Cluster

• CPU cycles – Auto and CPU cycles – Aero: All jobs; 30%, 60%, 20%, 10%, 50%

(7)

HPC Cluster Implementation challenges

• System and workload management

• Scalable performance

• Scalable data management

• Interconnect/Network Complexity

• Application availability and scalability

• Power and cooling

(8)

Latest Advancements in Clustering

• Multi-core delivering continued price-performance improvements

• Improvements in clustering software and tools

• More applications are being developed and tuned to leverage cluster/DMP solutions

• Principles of compute clusters being applied to storage and visualization

• InfiniBand now established in HPC

• Solutions now coming to market that address power

(9)

(10)

Applications:

ISVs standardizing on HP-MPI

Powerful Solver Technology

AMLS

Molpro

University of Cardiff

“One of the top reasons that we went with HP-MPI is that we've had a great working relationship with HP. It was a win-win for ANSYS, HP and our customers - in terms of cost, interconnects, support and performance compared to other message passing interfaces for Linux and Unix. In addition, I've always had great turnaround from HP in response to hardware and software issues.”

Lisa Fordanich, Senior Systems Specialist, ANSYS

www.ansys.com/services/ss-interconnects.htm

“HP-MPI is an absolute godsend,” notes Keith Glassford, director of the Materials Science division at San Diego, CA-based Accelrys Software Inc. “It allows us to focus our energy and resources on doing what we’re good at, which is developing scientific and engineering software to solve customer problems.”

(11)

CAE Reference Architecture

[Diagram: CAE reference architecture]

• Client Workstations and Remote Workstations, connected over the LAN

• Front End HA: job scheduler

• pre/post SMP

• compute SMPs

• compute clusters

• visualization cluster

• Scalable File Share: meta data, object data

• direct attached Disk Array (or use SFS)

• InfiniBand switched fabric interconnect
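
One way to picture the front-end job scheduler in this architecture is as a dispatcher that sends each job to a resource pool by CAE domain. The Python sketch below is hypothetical: the pool names come from the diagram, but the domain-to-pool pairing is an assumption based on the Typical Solution row of the sub-segments slide, and the function name `route_job` is invented for illustration. A real scheduler would also weigh core counts, memory, and queue load.

```python
# Assumed pairing of CAE domain to resource pool (not stated in the diagram).
POOL_FOR_DOMAIN = {
    "pre/post": "pre/post SMP",
    "structures": "compute SMPs",
    "impact": "compute clusters",
    "fluids": "compute clusters",
}


def route_job(domain):
    """Return the resource pool assumed for a CAE domain name."""
    key = domain.strip().lower()
    if key not in POOL_FOR_DOMAIN:
        raise ValueError(f"unknown CAE domain: {domain!r}")
    return POOL_FOR_DOMAIN[key]


for domain in ("Pre/Post", "Structures", "Impact", "Fluids"):
    print(f"{domain} -> {route_job(domain)}")
```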

(12)

A Cluster Alternative to Direct Attached Storage: HP Scalable File Share (SFS)

• Applying principles of clusters to file systems and storage enables the sharing of data sets without performance penalty

• MSC.Nastran is Fast on HP SFS: Replace extra-disk fat-nodes with flexible storage

• Traditional approach:

• Special nodes in the cluster w/ multiple local JBOD disks

• Expensive and hard to manage

• New approach

• Use fast centralized, virtualized HP SFS filesystem

• Similar performance

• Lower cost

• Shared rather than dedicated storage

• Easier to use

• Any node in the cluster can run Nastran

(13)

MSC.Nastran Benchmark XXCMD

• Standard MSC benchmark

• XXCMD: solution of the natural frequencies of an automotive body

• Performs a medium amount of I/O compared to industry real-life customer datasets (4 TB of I/O with blocksize of 256 KB)

• Multiple jobs running simultaneously: no shared data

• Customers typically use direct attached storage for each host

• 1 controller and 5 drives per job are recommended for good throughput

• SFS performance

• 1 Object Storage Server node and 4 enclosures (with array of SATA drives) for every 4 hosts achieved excellent performance

• No degradation for up to 16 hosts, and small degradation from 16 to 32 hosts

• Significant (~6 times) advantage vs. small SCSI configuration

[Chart: MSC.Nastran benchmark XXCMD (performs medium I/O); time in seconds, axis 0 to 140000, smaller is better. Series: SFS, MSA, and SCSI, each with 2 jobs per host.]
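
The sizing guidance on this slide reduces to simple arithmetic. The Python sketch below is illustrative only: it assumes binary units (1 TB = 2^40 bytes, 1 KB = 2^10 bytes) and uses hypothetical helper names. The ratios (1 controller and 5 drives per job for direct attached storage, 1 Object Storage Server node and 4 enclosures per 4 hosts for SFS) and the 2 jobs per host are taken from the slide.

```python
import math

KB = 2**10  # assumed binary kilobyte
TB = 2**40  # assumed binary terabyte


def io_requests(total_bytes, block_bytes):
    """Number of block-sized I/O requests needed to move total_bytes."""
    return total_bytes // block_bytes


def das_hardware(jobs):
    """Direct attached storage: 1 controller and 5 drives per job."""
    return {"controllers": jobs, "drives": 5 * jobs}


def sfs_hardware(hosts):
    """SFS: 1 Object Storage Server node and 4 enclosures per 4 hosts."""
    groups = math.ceil(hosts / 4)
    return {"oss_nodes": groups, "enclosures": 4 * groups}


requests = io_requests(4 * TB, 256 * KB)
hosts, jobs_per_host = 16, 2
das = das_hardware(hosts * jobs_per_host)
sfs = sfs_hardware(hosts)

print(f"I/O requests for 4 TB at 256 KB: {requests:,}")
print(f"Direct attached, {hosts} hosts x {jobs_per_host} jobs: {das}")
print(f"SFS, {hosts} hosts: {sfs}")
```

For 16 hosts, the point up to which the slide reports no degradation, the comparison is 32 controllers and 160 drives dedicated to individual jobs versus 4 Object Storage Server nodes and 16 shared enclosures.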

(14)

Key Considerations in Designing a Solution

• What processor and interconnect for the mix of jobs

• Centralized resource or single purpose systems

• Can applications co-exist?

• Economics of consolidation

• Environmentals: power, cooling, weight, space

• Roll your own system or acquire a total solution

• Production scalability requirements

• Performance

• Availability and Reliability

• Manageability (provisioning, booting, monitoring, upgrades)

(15)

For more information see www.hp.com/go/hptc

Cluster Platform Express: www.hp.com/go/cp-express

(16)

Implementations of CAE Reference Architecture: AMD Opteron example

Fast: Opteron Workstation for Pre/Post (HP xw9300 Workstation)

• 2 Dual Core Opteron 2.6 GHz CPUs

• 2 internal 146 GB drives

• 32 GB memory

• DVD

Faster: Opteron Server for Structural Analysis (ProLiant DL585 Server with Disk Array)

• 22U Rack with Factory integration

• 4 Dual Core Opteron CPUs

• 2 internal 146 GB drives

• 32 GB memory

• MSA30 Dual Bus

Fastest: Opteron Cluster for CFD and Impact Analysis (HP Cluster Platform 4000 compute cluster)

• 42U Rack Sidewinder option

• DL385 head node for cluster administration

• DL145G2 with two Dual Core Opteron CPUs, each with 1 internal drive and 4 GB memory (1GB/core)

• DL585 front end node with 64GB for grid generation and domain decomposition

• XC Software Operating Environment support