Cloud Computing for HPC

Academic year: 2021


Extending Clusters to Clouds

Solution Briefing

Platform Computing, Inc.

The leader in cluster, grid and cloud management software:

o 19 years of profitable growth
o 2,000 of the world’s most demanding client organizations
o 5,000,000 CPUs under management
o 500 professionals working across 13 global centers
o Strategic relationships with Cray, Dell, Fujitsu, HP, IBM, Intel, Microsoft, Red Hat and SAS

Platform

Clusters, Grids, Clouds

Product Leadership

[Diagram: Platform Computing product areas. Labels: Workload Management, Resource Management; Clusters, Grids, Clouds. Products shown: Platform HPC, Platform MPI, Platform LSF, Platform Symphony, Platform ISF.]

“We believe Platform ISF is perhaps the most complete internal cloud software solution we’ve seen so far,” Staten says.


Key Trends

Extending HPC to the cloud paradigm

Management, Analysis, and Reporting

Graphic Processing Units

ISV Application Integration


HPC Challenges

Users need:

• More grid resources to run applications faster
• Flexible resources to support multiple applications
• Lower cost per performance

IT needs:

• Contain costs without compromising grid size and performance
• Grid data security
• Better meet their users' needs

Optimally Making Use Of The Cloud

Send workload to the cloud:

o When workload queues exceed tolerable thresholds for pending jobs
o When more capacity is required to meet SLAs
o To help small groups easily run their first cluster-enabled jobs

Not all workload is suitable for the cloud, including:

o When data transfer required exceeds the acceptable wait time
o Data-intensive computing applications
o When cloud resources become unreliable or unavailable
o When privacy and/or security risks are too high
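The criteria on this slide can be read as a per-job policy check. The following is a minimal sketch under stated assumptions, not Platform's policy engine: the `Job` fields, the `should_burst` function, and every threshold parameter are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    input_gb: float          # data that must be copied to the cloud first
    sensitive: bool = False  # privacy or security constraints apply


def should_burst(job: Job, pending_jobs: int, *, pending_threshold: int,
                 uplink_gb_per_hour: float, max_transfer_hours: float,
                 cloud_available: bool = True) -> bool:
    """Return True if this job is a reasonable candidate for the cloud."""
    if not cloud_available:
        return False   # cloud resources unreliable or unavailable
    if job.sensitive:
        return False   # privacy and/or security risk too high
    if pending_jobs <= pending_threshold:
        return False   # local queue is still within tolerable limits
    # Reject jobs whose data transfer would exceed the acceptable wait time.
    transfer_hours = job.input_gb / uplink_gb_per_hour
    return transfer_hours <= max_transfer_hours


# Hypothetical thresholds and jobs, for illustration only.
policy = dict(pending_threshold=100, uplink_gb_per_hour=20.0, max_transfer_hours=1.0)
small = Job("docking-run", input_gb=5.0)
large = Job("seismic-stack", input_gb=500.0)
print(should_burst(small, 250, **policy))
print(should_burst(large, 250, **policy))
```

With a backlog of 250 pending jobs, the small job qualifies, while the data-intensive one is held back because its transfer alone would take longer than the allowed wait.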

HPC Cloud - Differentiating the products

Cloud Connect

• LSF, HPC, Symphony plugin
• Schedules infrastructure in multi-tenant environments
• Provides for request fulfillment for sandboxing
• Creates external cloud connection from local clusters
• Accounts for infrastructure consumed

On Demand

• Customer self service
• Pay-per-use
• Amazon EC2 only (no local)
• Web access only
• HPC vertical specific (Life Sciences offering now available)
• Applications pre-installed

Problem #1: Connecting to the cloud

Problem

• Customer workloads are spiky
• Provisioning for peak is highly wasteful (utilization)
• Relying on desktops or existing servers wastes user time and can be very slow

Alternatives

• Take advantage of cloud resources by building their own solution
• Provision for peak (and live with the cost)
• Wait (wastes valuable engineering time, slow TTM)

Desired solution

• Provide a simple-to-use IaaS connection from an LSF Cluster
• Provide a simple policy engine to decide which jobs burst and

Problem #2: Only use what is needed

Problem

• IaaS providers usually charge by the instance-hour, which is very cost effective for short bursts but expensive over long durations
• Workload varies all the time. Cloud should only be used for peak demand
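A small calculation makes the instance-hour point concrete. This is a sketch under assumptions: the `burst_cost` helper is hypothetical, the hourly rate is illustrative, and rounding every started hour up to a full hour is an assumed billing rule rather than something the slide states.

```python
import math
from decimal import Decimal


def burst_cost(instances: int, hours: float, price_per_hour: Decimal) -> Decimal:
    """Cost of a burst when every started instance-hour is billed in full."""
    billed_hours = math.ceil(hours)  # assumption: partial hours round up
    return instances * billed_hours * price_per_hour


price = Decimal("0.34")                     # illustrative hourly rate in dollars
short_burst = burst_cost(8, 2.5, price)     # 8 nodes for a 2.5-hour peak
always_on = burst_cost(8, 24 * 30, price)   # the same 8 nodes left running for 30 days
print(short_burst, always_on)
```

The short burst costs a few dollars, while leaving the same nodes up for a month costs roughly two thousand, which is why the cloud capacity should track peak demand only.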

Alternatives

• Open Source products: OpenNebula, Nimbus
• Competitive products: AdaptiveCloud & Unicloud
• In-house ELIM integrated with IaaS APIs

Desired solution

• LSF/HPC/Symphony Plugin architecture
• Automated flexup/flexdown based on pending jobs and TTL for idle resources
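The flexup/flexdown behavior in the desired solution could look roughly like the following. This is a sketch, not the plugin's actual logic: `CloudNode`, `plan_flex`, and all parameters (`jobs_per_node`, `max_nodes`, `idle_ttl_seconds`) are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass
class CloudNode:
    node_id: str
    idle_seconds: float  # 0 while the node is running a job


def plan_flex(pending_jobs: int, nodes: list[CloudNode], *, jobs_per_node: int,
              max_nodes: int, idle_ttl_seconds: float) -> tuple[int, list[str]]:
    """Return (number of nodes to launch, ids of nodes to terminate)."""
    if pending_jobs > 0:
        # Flex up: size the request to the backlog, reuse idle nodes first,
        # and never exceed the configured node cap.
        idle = sum(1 for n in nodes if n.idle_seconds > 0)
        wanted = -(-pending_jobs // jobs_per_node)  # ceiling division
        headroom = max_nodes - len(nodes)
        return max(0, min(wanted - idle, headroom)), []
    # Flex down: nothing is pending, so release nodes idle past their TTL.
    expired = [n.node_id for n in nodes if n.idle_seconds >= idle_ttl_seconds]
    return 0, expired


# Hypothetical state: one busy node and one node idle for 15 minutes.
nodes = [CloudNode("a", 0), CloudNode("b", 900)]
limits = dict(jobs_per_node=4, max_nodes=5, idle_ttl_seconds=600)
print(plan_flex(10, nodes, **limits))  # backlog present: launch more nodes
print(plan_flex(0, nodes, **limits))   # queue empty: release expired idle nodes
```

Keeping a TTL on idle nodes, rather than terminating them the moment they go idle, avoids paying for a fresh instance-hour when the next burst of jobs arrives shortly afterwards.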

Platform’s Product Line

[Diagram: product line. Labels: Test/Dev Private Cloud, Java, Cloud Extension, Workstation Cluster Extension, HPC Appl. Integrations, Advanced Analytics, EGO, GPU, Symphony, HPC, ISF, LSF, PCM, MPI.]

What is Platform HPC?

The easiest and most complete HPC cluster solution

Complete Product

• Feature-rich workload management
• Unified web portal for access anywhere
• Heterogeneous cluster management

Integrated Application Support

• Easy to use job submission portal
• Customizable application templates

Certified Platform

• Certified with server, storage & interconnect vendors
• Best customer support

What is Platform LSF?

The HPC Workload Management Standard

Complete

• Advanced, feature-rich workload scheduling
• Robust set of add-on features
• Integrated application support

Powerful

• Policy & resource-aware scheduling
• Resource consolidation for max performance
• Advanced self-management

Most Scalable

• Support for thousands of concurrent users & jobs
• Delivers a virtualized pool of shared resources to support multiple apps
• Flexible control to support multiple policy centers

[Group label missing in source]

• Optimal utilization reduces infrastructure costs
• Improves user productivity for faster time to solution
• Robust operational capabilities improve administrative productivity

What is Platform ISF?

Dynamic Resource Management

Separate applications from infrastructure by creating an independent management platform to achieve resource sharing, vendor independence, and commodity computing.

[Diagram: layered stack. Application workloads; Private Cloud management platform; VM management; Provisioning; Server, Storage, Network.]

Platform’s Cloud Solutions for HPC

Cloud Bursting: Making it easy to extend to the cloud

1 Integrated Cluster with the Cloud (Platform LSF with VPN cluster management)
2 Multi-Cluster to the Cloud (Platform LSF with Platform MultiCluster)
3 Dynamic Cluster Extension to the Cloud (Platform LSF with Platform ISF)

Use Case #1

Integrated Cluster with the Cloud

[Diagram: User and Jobs; Workload Manager (Platform LSF); Internal Resources; Cloud provider VPC connection. Steps are numbered 1 to 6 in the original figure.]

• End user submits more jobs
• The existing cluster nodes are already too busy
• A policy determines that the jobs can go to the cloud
• Platform LSF contacts cloud provider to launch VMs
• Additional resources from Amazon join automatically

Use Case #2

Multi-Cluster to the Cloud

[Diagram: User and Jobs; Workload Manager (Platform LSF); MultiCluster orchestrator; Internal Resources; Cloud on-demand instances. Steps are numbered 1 to 5 in the original figure. Jobs that may go to the cloud are shown in red.]

• LSF asks cloud provider to create a new cluster of VMs (possibly CCI)
• MCO automatically forwards jobs to the new cluster based upon policies
• Transparent for end users
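The forwarding step in this use case can be pictured as a routing pass over the queue. The sketch below is an illustration only and does not reflect how Platform MultiCluster is configured: the `Job` fields and the `route_jobs` function are hypothetical, and `cloud_ok` stands in for the jobs drawn in red on the slide.

```python
from dataclasses import dataclass


@dataclass
class Job:
    job_id: int
    slots: int
    cloud_ok: bool  # policy allows forwarding (the "red" jobs in the diagram)


def route_jobs(jobs: list[Job], free_local_slots: int) -> dict[str, list[int]]:
    """Assign each job to 'local', 'cloud', or 'pending', in submission order."""
    routes: dict[str, list[int]] = {"local": [], "cloud": [], "pending": []}
    for job in jobs:
        if job.slots <= free_local_slots:
            # Prefer internal resources while they last.
            free_local_slots -= job.slots
            routes["local"].append(job.job_id)
        elif job.cloud_ok:
            # No local room: forward eligible jobs to the cloud cluster.
            routes["cloud"].append(job.job_id)
        else:
            # Not eligible for the cloud: wait for local capacity.
            routes["pending"].append(job.job_id)
    return routes


# Hypothetical queue with 6 free local slots.
queue = [Job(1, 4, False), Job(2, 4, True), Job(3, 2, False), Job(4, 8, True)]
print(route_jobs(queue, free_local_slots=6))
```

Because the user submits to a single queue and the routing happens afterwards, the forwarding stays transparent to end users, as the slide describes.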

Use Case #3

Dynamic Cluster Extension to the Cloud

[Diagram: User and Jobs; Workload Manager (Platform LSF); Dynamic Resource Manager (Platform ISF); cloud provider. Steps are numbered 1 to 7 in the original figure.]

• User request: "I need a cluster: 1 master and 3 compute nodes to test my new project"
• Workload manager requests resources from dynamic resource manager
• ISF determines that no internal resources are available and by policy QA/TST can go to cloud
• Cloud provider gets request from ISF to build new test cluster
• ISF gets master location from cloud provider
• ISF passes master location of new nodes to LSF
• Grid is created and then jobs get submitted
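The hand-offs between the workload manager, the dynamic resource manager, and the cloud provider can be sketched as three small classes. This is one plausible reading of the diagram, not the products' real interfaces: every class, method, and host name below is hypothetical.

```python
from dataclasses import dataclass, field


class CloudProvider:
    """Stand-in for an IaaS API; a real provider would boot VMs here."""

    def build_cluster(self, masters: int, computes: int) -> str:
        return "master.cloud.example"  # hypothetical master location


@dataclass
class ResourceManager:  # plays the role of Platform ISF
    provider: CloudProvider
    free_internal_hosts: int
    cloud_allowed: set[str] = field(default_factory=lambda: {"QA", "TST"})

    def request_cluster(self, project_type: str, masters: int, computes: int) -> str:
        needed = masters + computes
        if self.free_internal_hosts >= needed:
            self.free_internal_hosts -= needed
            return "master.internal.example"  # hypothetical internal master
        if project_type not in self.cloud_allowed:
            raise RuntimeError("no internal capacity and policy forbids cloud")
        # Ask the provider to build the cluster and hand back the master location.
        return self.provider.build_cluster(masters, computes)


@dataclass
class WorkloadManager:  # plays the role of Platform LSF
    resources: ResourceManager
    masters: list[str] = field(default_factory=list)

    def handle_request(self, project_type: str, masters: int, computes: int) -> str:
        location = self.resources.request_cluster(project_type, masters, computes)
        self.masters.append(location)  # jobs can now be submitted to the new grid
        return location


# The request from the slide: 1 master and 3 compute nodes for a QA project.
lsf = WorkloadManager(ResourceManager(CloudProvider(), free_internal_hosts=0))
print(lsf.handle_request("QA", masters=1, computes=3))
```

Keeping the policy decision inside the resource manager means the workload manager only ever sees a master location, whether the cluster was built internally or in the cloud.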

Designed for Extensibility

• Customizable scheduling algorithm
• Open adapter architecture
• Flexible architectural options

On Demand Scalability

• Grow the grid when needed, shrink when not
• Contain capital costs, keep utilization high

Industry-Leading Support

• Large, worldwide development and support team
• Extensive partner ecosystem
• Nearly two decades of HPC experience

Problem #1: No HPC Infrastructure locally

Problem

• Smallest companies and consultants have no access to HPC
• Provisioning for HPC need is unwise / impractical
• Desktop is insufficient to service the workload
• Little or no IT/HPC expertise

Alternatives

• Very few
• Contract with Cycle Computing, Univa, others to build cloud infrastructure (expensive, long lead time)

Desired solution

• Self service, near instantaneous availability, security
• Provide pre-configured SaaS (open source apps) for anyone with an IaaS account

Platform OnDemand

[Diagram: Workstation and User connected by VPN or MultiCluster; data flows labeled LSF DATA and FILE DATA. Phase I: Life Sciences. Phase IB: GRE.]

Roadmap – On Demand

[Table: roadmap phases (Phase I, Phase IB, Phase II, Phase III, Phase IIIB, Phase IV) against HPC Cluster; Linux, Win; Joined, Indep; and the verticals L/S & Chem, CAE & IM, O/G GEO, GRE, DCC, FS, EDA. The per-cell marks are not recoverable from the extraction.]

[Timeline: CYQ3 2011, CYQ4 2011, CYQ1 2012, CYQ2 2011. Phase I, IB, Phase II, Phase III, IIIB, Phase IV. Marketplace GA. Independent Offering.]

EC2 Instance Sizes

Vertical            Master Host       HPC Cluster
Life Sci. / Chem    Standard Large    HPC
CAE / IM            Standard Large    HPC
Oil & Gas / GEO     Standard Large    HPC
GRE                 Standard Small    High-Mem 4XL
DCC                 TBD               TBD

Size              Memory (GB)   Cores   HDD (GB)   Price $ (Lin / Win)
Standard-Small    1.7           1       160        0.085 / 0.12
Standard-Large    7.5           2       850        0.34 / 0.48
Standard-XL       15            4       1690       0.68 / 0.96
Micro             0.613         1       EBS only   0.02 / 0.03
High-Mem XL       17.1          2       420        0.50 / 0.62
High-Mem 2XL      34.2          4       850        1.00 / 1.24
High-Mem 4XL      68.4          8       1690       2.00 / 2.48
High CPU-med      1.7           2       350        0.17 / 0.29
[missing]         [missing]     8       1690       0.68 / 1.16
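The two tables can be combined to estimate what an on-demand cluster costs to run. The sketch below is illustrative: the `cluster_cost` helper is hypothetical, prices are assumed to be US dollars per instance-hour, and only the GRE vertical is included because its master and compute types both appear in the size table.

```python
from decimal import Decimal

# Linux prices copied from the size table (assumed to be dollars per instance-hour).
LINUX_PRICE = {
    "Standard-Small": Decimal("0.085"),
    "Standard-Large": Decimal("0.34"),
    "High-Mem 4XL": Decimal("2.00"),
}

# (master type, compute type) per vertical, from the vertical table.
VERTICALS = {"GRE": ("Standard-Small", "High-Mem 4XL")}


def cluster_cost(vertical: str, compute_nodes: int, hours: int) -> Decimal:
    """Total Linux cost of one master plus N compute nodes for a number of hours."""
    master, compute = VERTICALS[vertical]
    hourly = LINUX_PRICE[master] + compute_nodes * LINUX_PRICE[compute]
    return hourly * hours


# Hypothetical run: a GRE cluster with 3 compute nodes for 10 hours.
print(cluster_cost("GRE", compute_nodes=3, hours=10))
```

For this configuration the compute nodes dominate the bill, so the choice of master size matters far less than how long the compute nodes stay up.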
