• No results found

Cloud Computing through Virtualization and HPC technologies

N/A
N/A
Protected

Academic year: 2021

Share "Cloud Computing through Virtualization and HPC technologies"

Copied!
21
0
0

Loading.... (view fulltext now)

Full text

(1)

Copyright © 2010 Platform Computing Corporation. All Rights Reserved.

1

Cloud Computing through

Virtualization and HPC technologies

William Lu, Ph.D.

(2)

Copyright © 2010 Platform Computing Corporation. All Rights Reserved.

2

Cloud Computing & HPC

A Case of HPC Implementation

Application Performance in VM

Summary

Agenda

(3)

Copyright © 2010 Platform Computing Corporation. All Rights Reserved.

3

• HPC users are always looking for more

capacities

• Cloud Computing concept is very attractive for

HPC sites with dynamic workload

Questions

• How cloud implementation look like in HPC?

• VM or not VM?

• What is the VM performance cost for HPC

applications?

Cloud Computing & HPC

(4)

Copyright © 2010 Platform Computing Corporation. All Rights Reserved.

Portland11/19/2009

A Case of HPC

Implementation

(5)

Copyright © 2010 Platform Computing Corporation. All Rights Reserved.

5

Stage 1: Commodity Cluster

• Problems:

– Existing UNIX server does not have enough

capacity to run CFD jobs

– The maintenance cost and administration cost

is about same as the original hardware

• Solution:

– X86 Cluster

– Open source cluster stack (Linux, MPI, open

source scheduler)

• Result:

– Capacity increases 5 times

– Cost decrease 50%

– Clusters are adopted in many departments

(6)

Copyright © 2010 Platform Computing Corporation. All Rights Reserved.

6

Stage 2: Centralized clusters

• Problems:

– 50 clusters in an organization with

different management practice

– Each cluster needs budget to refresh

• Solution:

– Consolidate 50 clusters into 2 HPC

centers

– Use robust workload scheduler

(Platform LSF)

• Result:

– Centralized budgeting and capacity

planning

– Cut hardware cost by 20%

– Cut administration cost by 70%

(7)

Copyright © 2010 Platform Computing Corporation. All Rights Reserved.

7

Stage 3: Breaking Group Silos

• Problems:

– Each group has dedicated

queue and a partition of a

cluster

– The utilization is low: 60%

– No funding from management

for increasing capacity before

getting higher utilization

• Solution

– Leverage advanced scheduling

features in Platform LSF

• Result

– Improved utilization from 60% to

75%

(8)

Copyright © 2010 Platform Computing Corporation. All Rights Reserved.

8

Stage 4: Make Users Happy

• Problems

– Conflict scheduling policies among

different groups

– Conflict of OS/app environment

across groups

– Utilization is not optimal

• Solution

– Multiple elastic logical clusters

managed by Platform ISF

• Result

– Utilization reaches to 85%

– Users are much happier

VM

(optional)

(9)

Copyright © 2010 Platform Computing Corporation. All Rights Reserved.

9

Stage 5: A Shared Dynamic HPC

Environment

• Problems:

– Can we get near 100% utilization of HPC

cluster?

– How to get capacity without limit?

• Solution:

– “Bursting”

• Result:

– HPC cluster reaches near 100%

utilization

– Peak workload can go beyond what HPC

cluster provides

(10)

Copyright © 2010 Platform Computing Corporation. All Rights Reserved.

10

Comparison of User Experiences

Public Cloud IaaS

Resource available

on demand

Provisioning app

stack manually

Pay per use

VM

A dynamic shared

HPC infrastructure

Resource available

on demand

On demand pre-

defined stack

Shared cost with

other user groups

Physical & VM

(11)

Copyright © 2010 Platform Computing Corporation. All Rights Reserved.

11

• Application performance in a shared

environment

• Application performance inside a VM

on single node

• Application performance inside a VM

using interconnects

Cloud Computing Performance Tests

via HPC Advisory Council

(12)

Copyright © 2010 Platform Computing Corporation. All Rights Reserved.

12

• CPU: Dual socket Quad core 2.83 GHz Intel

• Memory: 8 GB RAM

• OS : CentOS Linux 5.3 x86_64

• Hypervisor: Citirx Xen 5.5

• VM Configuration

• 1x(8 GB VM with 8 Cores)

• 8x(1 GB VMs with 1 core)

Environments

(13)

Copyright © 2010 Platform Computing Corporation. All Rights Reserved.

13

Serial runs show the overhead of VM is small

Serial Runs

0

0.2

0.4

0.6

0.8

1

1.2

LSDyna Compile ABAQUS Fluent

Physical

Virtual

(14)

Copyright © 2010 Platform Computing Corporation. All Rights Reserved.

14

Memory Bandwidth Test

Copy Scale Add Triad

Physical Machine 3471.4773 3513.829 3761.7779 3729.6516

Virtual Machine 3472.9145 3518.0658 3760.3024 3725.5795

3300

3350

3400

3450

3500

3550

3600

3650

3700

3750

3800

Band w idt h ( M B/s) Hig h er is b ett er

Memory Bandwidth (Stream - 1.8 GB size)

VM adds small overhead to memory bandwidth

(15)

Copyright © 2010 Platform Computing Corporation. All Rights Reserved.

15

I/O Bandwidth Test

write rewrite read reread

Physical Machine 21.99 20.74 38.94 39.08

Virtual Machine 7.39 6.35 42.03 42.17

0.00

5.00

10.00

15.00

20.00

25.00

30.00

35.00

40.00

45.00

I/O Band w idt h ( M B/s)

Disk I/O IOZone - 32 GB file

VM adds overhead to I/O bandwidth

(16)

Copyright © 2010 Platform Computing Corporation. All Rights Reserved.

16

LS-Dyna

serial parallel-2 parallel-4 parallel-8

Physical 13058 6996 4677 3947

Virtual 13199 8070 5955 5477

0

2000

4000

6000

8000

10000

12000

14000

E lapsed T ime (s) L o w er is b ett er

LSDyna - MPP971 - Refined Neon

(30ms)

39%

27%

15%

1%

Parallel LSDyna does not performance in VM

(17)

Copyright © 2010 Platform Computing Corporation. All Rights Reserved.

17

Parallel Fluent performs well in VM

Fluent

serial parallel-2 parallel-4 parallel-8

Physical 5479 3569 2275 2113

Virtual 5556 3202 2389 2151

0

1000

2000

3000

4000

5000

6000

A xis T it le

Fluent 12.1 - sedam_4m

(18)

Copyright © 2010 Platform Computing Corporation. All Rights Reserved.

18

Abaqus

serial parallel-2 parallel-4 parallel-6

Physical 12768 6693 3958 3384

Virtual 12953 6836 4131 3397

Virtual - different VMs 12953 7175 4813 4370

0

2000

4000

6000

8000

10000

12000

14000

E lapsed T ime ( s) L o w er is b ett er

ABAQUS - Explicit - e2.inp

2-4%

Parallel Abaqus performs well in VM

(19)

Copyright © 2010 Platform Computing Corporation. All Rights Reserved.

19

• For common CAE applications running in serial

mode, the VM overhead is small (< 4%)

• For common CAE applications running in

parallel mode, the VM overhead depends

• CPU intensive or memory intensive application

could be good candidates to run in VM

• I/O intensive application performance in VM

depends

• We expect the VM overhead is unacceptable for

MPI jobs over interconnects

Conclusion of the test

(20)

Copyright © 2010 Platform Computing Corporation. All Rights Reserved.

20

• Technology is ready for implementation of

private HPC cloud without VM

• A full HPC cloud solution running on VM is very

application dependent

• We will continue to do application performance

test and share the results in HPC Advisory

Council

Summary

(21)

Copyright © 2010 Platform Computing Corporation. All Rights Reserved.

21

Thank you

References

Related documents

To what extent does the implementation of Response to Intervention (RTI) impact nonclassified kindergarten students’ reading ability as measured by scores on the STAR Early

The ControlLogix controller can control these distributed I/O modules using the I/O Configuration tree in RSLogix 5000 programming software. Co om mm mu un niic ca atte e w wiitth h

Assumptions in the ground of option pricing should correspond to general relations between economic and financial variables and market transactions that describe random

The Department of Information Technology (DoIT) builds, manages and maintains City government information technology infrastructure and systems used by City departments to

In fact, the WGS–based approaches (both the wgMLST and the bioinformatics script) already implemented in the National Reference Laboratory will be thereafter applied for the

Meanwhile, a significant increase of coefficient of variation in East Nusa Tenggara caused this province to become the second highest inequality among provinces.. An increase in

The final design incorporated high levels of thermal efficiency using a clay block with external insulation to provide thermal mass, highly insulated roof

In the present study we have tried to analyze the change in HRV in night shift workers due to the cumulative sleep deprivation following one week of continuous night shift duty