• No results found

Performance of HPC Applications on the Amazon Web Services Cloud

N/A
N/A
Protected

Academic year: 2021

Share "Performance of HPC Applications on the Amazon Web Services Cloud"

Copied!
21
0
0

Loading.... (view fulltext now)

Full text

(1)

Cloudcom 2010

November 1, 2010

Indianapolis, IN

Keith R. Jackson, Lavanya Ramakrishnan, Krishna

Muriki, Shane Canon, Shreyas Cholia, Harvey J. Wasserman, Nicholas J. Wright

Lawrence Berkeley National Lab

Performance of HPC Applications

on the Amazon Web Services

(2)

Goals

•  Understand the performance of Amazon EC2 for realistic HPC workloads

•  Cover both the application space and algorithmic space of typical HPC workloads

•  Characterize EC2 performance based on the communication patterns of applications

(3)

Methodology

•  Create cloud virtual clusters

•  configure a file server, head node, and a

series of worker nodes.

•  Compile codes on local LBNL system with Intel Compilers / OpenMPI, move binary to EC2

(4)

Hardware Platforms

•  Franklin:

•  Cray XT4

•  Linux environment / Quad-core, AMD Opteron / Seastar interconnect, Lustre parallel filesystem

•  Integrated HPC system for jobs scaling to tens of thousands of processors; 38,640 total cores

•  Carver:

•  Quad-core, dual-socket Linux / Nehalem / QDR IB cluster

•  Medium-sized cluster for jobs scaling to hundreds of processors; 3,200 total cores

(5)

Hardware Platforms

•  Lawrencium:

•  Quad-core, dual-socket Linux / Harpertown / DDR IB cluster •  Designed for jobs scaling to tens-hundreds of processors; 1,584

total cores

•  Amazon EC2:

•  m1.large instance type: four EC2 Compute Units, two virtual cores with two EC2 Compute Units each, and 7.5 GB of memory

(6)

HPC Challenge

!" #" $" %" &" '!" '#" ()*+,*" -*)./01." 2)3*,.4156" 7(#" !"#$$%&"'()*+% 897::";9<=>?@" !" !#$" %" %#$" &" &#$" '" '#$" (" (#$" $" )*+,-+" .+*/012/" 3*4+-/5267" 8)&" !"#$%&'()*+,-' 9:;8<=">?@ABC"

(7)

HPC Challenge (cont.)

!" #!!" $!!!" $#!!" %!!!" %#!!" &'()*(" +(',-./," 0'1(*,2/34" 5&%" !"#$!%#&'(")*'+,-.' 6',76/,8"0'9:";<=>" !" !#!$" !#%" !#%$" !#&" !#&$" !#'" !#'$" ()*+,*" -*)./01." 2)3*,.4156" 7(&" !"#$!%#&'()'*+(,-.' 8).981.:";<"=>;?@A" !" #!" $!" %!" &!" '!!" '#!" '$!" '%!" ()*+,*" -*)./01." 2)3*,.4156" 7(#" !"#$%&'()*+,( 2)8,.49":;<=" !" !#$" %" %#$" &" &#$" '" '#$" (" )*+,-+" .+*/012/" 3*4+-/5267" 8)&" !"#$%&$'()*+!,-.) 9*/:42:;<"=>9?@A"

(8)

NERSC 6 Benchmarks

•  A set of applications selected to be representative of the broader NERSC workload

•  Covers the science domains, parallelization schemes, and concurrencies, as well as machine-based

characteristics that influence performance such as

message size, memory access pattern, and working set sizes

(9)

Applications

•  CAM: The Community Atmospheric Model

•  Lower computational intensity

•  Large point-to-point & collective MPI messages

•  GAMESS: General Atomic and Molecular Electronic Structure System

•  Memory access

•  No collectives, very little communication •  GTC: GyrokineticTurbulence Code

•  High computational intensity

•  Bandwidth-bound nearest-neighbor communication plus collectives with small data payload

(10)

Applications (cont.)

•  IMPACT-T: Integrated Map and Particle Accelerator Tracking Time

•  Memory bandwidth & moderate computational intensity

•  Collective performance with small to moderate message sizes

•  MAESTRO: A Low Mach Number Stellar Hydrodynamics Code

•  Low computational intensity

•  Irregular communication patterns

•  MILC: QCD

•  High computation intensity

•  Global communication with small messages

•  PARATEC: PARAllel Total Energy Code

(11)

Application and Algorithmic Coverage

Science areas Dense linear algebra Sparse linear algebra Spectral Methods (FFT)s Particle Methods Structured Grids Unstructured or AMR Grids Accelerator Science X X IMPACT-T X IMPACT-T X IMPACT-T X Astrophysics X X MAESTRO X X X MAESTRO X MAESTRO Chemistry GAMESS X X X X

Climate CAM X CAM X X

Fusion X X X GTC X GTC X Lattice Gauge X MILC X MILC X MILC X MILC Material Science X PARATEC X PARATEC X X PARATEC

(12)

Performance Comparison

(Time relative to Carver)

!" #" $" %" &" '!" '#" '$" '%" '&" ()*+ ,," (-." /*0). -" 12.)*" *)+ ,-34#5%" !"#$%&'! &()$* &'+ ,'-). *&.' )6789:"+.#" ;7<=>:?@A6" B=7:CD@:" !" #!" $!" %!" &!" '!" (!" )*+," -./.01," !"#$%&'! &()$* &'+ ,'-). *&.' .23456"1,$" +37896:;<2" =836>?;6"

(13)

NERSC Sustained System

Performance (SSP)

SSP represents delivered performance for a real workload

CAM Climate GAMESS Quantum Chemistry GTC Fusion IMPACT-T Accelerator Physics MAESTRO Astro- physics MILC Nuclear Physics PARATEC Material Science •  SSP: aggregate measure of the workload-specific, delivered

performance of a computing system

•  For each code measure

•  FLOP counts on a reference system •  Wall clock run time on various systems •  N chosen to be 3,200

(14)

Application Rates and SSP

Problem sets drastically reduced for cloud benchmarking !"!# !"$# %"!# %"$# &"!# &"$# '()*+)# ,)(-./0-# 1(2)+-3045# 6'&# !"# $%&'()*! +# $(,*-(./ 0.,%'1(* 234 5#6* !"!# !"$# !"%# !"&# !"'# ("!# ("$# ("%# )*+, --# )./# 0+1*/ .# 23/*+# +*, -.45# +06/# 1*4* .,/# !" #$%&'() *+,- .-/01 $2% /78398# :87;<=>;# 67?89;@>AB# ,/$#

(15)

Application Scaling & Variability

!" #!!!" $!!!!" $#!!!" %!!!!" !" %!" &!" '!" (!" !"#$%&'(% )*#+$,%-./%01'2'% )*%" +,-./01234" !" #!!!" $!!!" %!!!" &!!!" '!!!!" '#!!!" '$!!!" !" #!" $!" %!" &!" !"#$%&'(% )*#+$,%-./%01'2'% ()#" *+,-./0123" MILC PARATEC

(16)

PARATEC Variability

!" #!!!" $!!!!" $#!!!" %!!!!" %#!!!"

&'("$" &'("%" &'(")" &'("*" &'("#" &'("+" &'(","

!"##$%&' ()"%*+&#,*&%*-,' "%./* !" #!" $!!" $#!" %!!" %#!" &!!" &#!" '!!" '#!" #!!" ()*"$" ()*"%" ()*"&" ()*"'" ()*"#" ()*"+" ()*"," !"#$%& '(")*+,#-*,)*.-/ ")01*

(17)

!" #" $!" $#" %!" %#" !" $!" %!" &!" '!" #!" (!" )!" *!" !"#$%&'! &()$* &'+ ,'-) ./ &#01"%' 23,%%"#10)$,#' +,-."+/-."0123-45" +,-."+/-."61-78,729" :1-7/;":,-."0123-45" :1-7/;":,-."61-78,729" +<:<=>?" @A0?" @<>B=:C" DE?<@" A@+<?=F=" G=?"

Communication/Perf Correlation

(18)

Observations

•  Significant performance variability

•  Heterogeneous cluster: 2.0-2.6GHz AMD, 2.7GHz Xeon •  Shared interconnect

•  sharing un-virtualized hardware?

•  Significantly higher MTBF

•  Variety of transient failures, including

•  inability to access user data during image startup; •  failure to properly configure the network;

•  failure to boot properly;

•  intermittent virtual machine hangs; •  Inability to obtain requested resources.

(19)

Amazon Cluster Compute Instances

!" !#$" %" %#$" &" &#$" '()* ++" ',-" .)/ (-," 01-()" )(*+ ,23& $4" !" #$ % &' !& () $* &' +, '-). *& .' 56789:;<=>" *-&?@9A6" *-&?@9A6?3BA" C86:DE<:" -68198" !" !#$" %" %#$" &" &#$" '" '#$" ()*+" ,-.-/0+" !" #$ % &' (& )* $+ &' ,-'. *( +& (' *123456789" 0+&:;4<1" 0+&:;4<1:=><" ?315@A75" +13B43"

(20)

Conclusions

•  Standard EC2 performance degrades significantly as applications spend more time communicating

•  Applications that stress global, all-to-all communication perform worse then those that mostly use point-to-point communication

•  The Amazon Cluster Compute offering has significantly

(21)

Acknowledgements

•  This work was funded in part by the Advanced Scientific Computing Research (ASCR) in the DOE Office of Science under contract number DE-C02-05CH11231

•  Nicholas J Wright was supported by the NSF under award OCI-0721397

•  Special thanks to Masoud Nikravesh and CITRIS, UC

References

Related documents

In this paper we estimated the potential that PHEVs have for reducing energy consumption and GHG emissions relative to current gasoline vehicles in China 2020

In 2012, Widodo researching on Analysis and Design Website For Media Promotion And Sales Selfish Cloting Company E - Commerce or electronic marketing is one of the

The program offered conditional cash transfers to the rural poor in exchange for sending their children to school and for regular attendance at health clinics and pláticas (small

It is clear that nurses will not continue to work in environments that take little account of their professional and personal needs and aspirations. Research on why nurses are

The source-filter model for speech production was implemented; where the filter was estimated by the LPC coefficients computed from the MFCCs vectors, and the

have a unique capacity to initiate extra-follicular B cell responses through rapid activation of Ag-specific B cells [18] , suggesting the possibility that the two DC subsets

Using the Theory of Planned Behavior (TPB), this study compares the contribution of each construct (i.e. attitudes, subjective norms, and.. perceived behavioral control) to

Make sure the water is coming through all the parts at the faucet, such as the timer, water splitter, pressure regulator, filters (including filter washers that may be at