
Research of High Performance Computing with Clouds

Ye Xiaotao¹, Lv Aili¹, Zhao Lin²
¹Modern Education Technology Center, Henan Polytechnic University, Jiaozuo, China
²Neusoft Institute of Information, Dalian, China

Abstract—HPC is today most commonly associated with computing for scientific research, typically using supercomputers and computer clusters. Cloud computing, a relatively recent paradigm, builds on decades of research in virtualization, distributed computing and utility computing, and more recently in networking, the web and software services. Cloud computing comprises three service models: SaaS, PaaS and IaaS. Popular general-purpose cloud services such as EC2 allow users to provision compute clusters fairly quickly, paying only for the duration for which the resources are used. Recently, HPC has turned to cloud computing for cheaper economic solutions, and more enterprises have announced HPC on-demand services. This paper also summarizes some recent HPC applications that have been deployed on clouds. Experiments illustrate the feasibility of using cloud computing for HPC, and more and more application types are proving well suited to the cloud.

Index Terms—HPC, Cloud Computing, EC2, SaaS

I. INTRODUCTION

High-performance computing (HPC) is the use of parallel processing to run advanced application programs efficiently, reliably and quickly. HPC uses supercomputers and computer clusters to solve advanced computational problems. Both the application and its data need to be moved to an available computational resource in order to be executed [1], and these infrastructures are highly efficient at this compute-intensive data movement. Today, computer systems approaching the teraflop region are counted as HPC computers.

Cloud computing is the latest and perhaps the most dramatic trend in advanced computing paradigms since the introduction of commodity clusters, which have dominated HPC for more than a decade. Clouds offer an amorphous distributed environment of computing resources and services to a dynamic distributed user base. Like clusters, cloud computing exploits economies of scale to deliver advanced capabilities. Unlike clusters, cloud resources are nonspecific and provide basic capabilities but guarantee neither identical properties from run to run nor high availability of specialized system types.

At present, the use of cloud computing in computational science is still limited, but the first steps towards this goal have already been taken. Last year, the Department of Energy (DOE) National Laboratories started exploring the use of cloud services for scientific computing. In April 2009, Yahoo! Inc. announced that it had extended its partnership with major top universities in the United States of America to advance cloud computing research and its applications to computational science and engineering.

II. CLOUD COMPUTING

A. Cloud Definition

Cloud computing is Internet-based computing, whereby shared resources, software and information are provided to computers and other devices on-demand, like a public utility. A technical definition [2] is "a computing capability that provides an abstraction between the computing resource and its underlying technical architecture (e.g., servers, storage, networks), enabling convenient, on-demand network access to a shared pool of configurable computing resources that can be rapidly provisioned and released with minimal management effort or service provider interaction." This definition states that clouds have five essential characteristics: on-demand self-service, broad network access, resource pooling, rapid elasticity, and measured service.

B. Cloud Technologies

Cloud technologies such as MapReduce and Dryad have created new trends in parallel programming [3]. Their support for handling large data sets, the concept of moving computation to the data, and the better quality of service they provide make them a favorable choice of technologies for solving large-scale data- and compute-intensive problems.

Cloud technologies such as Google MapReduce, the Google File System (GFS), Hadoop and the Hadoop Distributed File System (HDFS), Microsoft Dryad, and CGL-MapReduce adopt a more data-centered approach to parallel runtimes [4][5]. In these frameworks, the data is staged on the data/compute nodes of clusters or large-scale data centers, as in the case of Google, and the computation moves to the data in order to perform the processing. Distributed file systems such as GFS and HDFS allow Google MapReduce and Hadoop to access data via distributed storage systems built on heterogeneous compute nodes, while Dryad and CGL-MapReduce support reading data from local disks. The simplicity of the programming model enables better support for qualities of service such as fault tolerance and monitoring.
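To make the data-centered MapReduce model concrete, here is a minimal, framework-free Python sketch of the map, combine (the extra phase CGL-MapReduce exposes) and reduce steps for a word count; actual runtimes such as Hadoop distribute these phases across the nodes holding the data and add fault tolerance and monitoring on top.

```python
from collections import defaultdict

def map_phase(split):
    """Map: emit a (word, 1) pair for every word in an input split."""
    for line in split:
        for word in line.split():
            yield word.lower(), 1

def combine(pairs):
    """Combine: pre-aggregate locally to cut intermediate data traffic."""
    local = defaultdict(int)
    for word, count in pairs:
        local[word] += count
    return list(local.items())

def reduce_phase(partitions):
    """Reduce: merge the combined output of every node into final counts."""
    totals = defaultdict(int)
    for partition in partitions:
        for word, count in partition:
            totals[word] += count
    return dict(totals)

# Two "splits" standing in for blocks stored on HDFS/GFS data nodes.
splits = [["the cloud runs the job"], ["the job runs on the cloud"]]
combined = [combine(map_phase(s)) for s in splits]
print(reduce_phase(combined))  # {'the': 4, 'cloud': 2, 'runs': 2, ...}
```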

Ye Xiaotao (b. 1980), male, Han, from Qinyang, Henan; master's degree; lecturer; research area: high performance computing.

Supported by the Henan Higher Education Informationization Project (project number 506062).

ISBN 978-952-5726-10-7. Proceedings of the Third International Symposium on Computer Science and Computational Technology (ISCSCT ’10), Jiaozuo, P. R. China, 14-15 August 2010, pp. 289-293.

(2)

Figure 1. Cloud computing offerings by services.

Table I summarizes the different characteristics of Hadoop, Dryad, CGL-MapReduce, and MPI.

C. Cloud Computing Service Offerings

Cloud computing is typically divided into three levels of service offerings: Infrastructure as a Service (IaaS), Platform as a Service (PaaS), and Software as a Service (SaaS) [6]. Figure 1 provides this categorization.

Infrastructure as a Service - Traditional computing resources such as servers, storage, and other forms of low-level network and hardware resources, offered in a virtual, on-demand fashion over the Internet. IaaS, in a general sense, provides the ability to ’summon’ resources in specific configurations at will, delivering value similar to what one might find in a traditional datacenter. IaaS’ power lies in its massive on-the-fly flexibility and configurability; it can be likened to a magic wand that conjures up a variety of network and server resources in zero time while occupying zero space. Examples include services like GoGrid, Amazon’s EC2 [7] and even S3 [8] (as a storage infrastructure play).
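As a concrete illustration of this "summon resources at will" model, the sketch below provisions and then releases a single EC2 instance using Amazon's boto3 Python SDK (which postdates this paper); the AMI ID is a placeholder, and real use requires configured AWS credentials and incurs charges.

```python
import boto3

# Assumes AWS credentials and region are configured in the environment.
ec2 = boto3.client("ec2", region_name="us-east-1")

# "Summon" a server: the AMI ID below is a hypothetical placeholder,
# and running instances incurs real charges.
response = ec2.run_instances(
    ImageId="ami-00000000000000000",  # hypothetical AMI ID
    InstanceType="t2.micro",
    MinCount=1,
    MaxCount=1,
)
instance_id = response["Instances"][0]["InstanceId"]
print(f"Started {instance_id}")

# Wait until the instance is running, then release it; billing stops
# once the instance is terminated.
ec2.get_waiter("instance_running").wait(InstanceIds=[instance_id])
ec2.terminate_instances(InstanceIds=[instance_id])
```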

Platform as a Service - A PaaS implementation provides users with an application framework and a set of APIs that developers can use to program or compose applications for the cloud. PaaS solutions are generally delivered as an integrated system offering both a development platform and an IT infrastructure on top of which applications are executed. The two major players adopting this strategy are Google and Microsoft.

Software as a Service - Specialized software functionality delivered over the Internet to users who intend to use the delivered functionality to augment or replace real-world processes. Generally speaking, users within the SaaS space are aggregated into ‘tenants’, bodies of one or more categorically related users. Think of Salesforce.com CRM or SugarCRM.

Table II gives a feature comparison of some of the most representative players delivering IaaS/PaaS solutions for cloud computing.

III. HIGH PERFORMANCE COMPUTING WITH CLOUDS

Cloud computing presents a unique opportunity for batch-processing and analytics jobs that analyze terabytes of data and can take hours to finish. If there is enough data parallelism in the application, users can take advantage of the cloud's new "cost associativity": using hundreds of computers for a short time costs the same as using a few computers for a long time. Programming abstractions such as Google's MapReduce and its open-source counterpart Hadoop allow programmers to express such tasks while hiding the operational complexity of choreographing parallel execution across hundreds of cloud computing servers, and some work with MapReduce has already been done and tested over the clouds. Again, the cost/benefit analysis must weigh the cost of moving large datasets into the cloud against the benefit of potential speedup in the data analysis. Part of Amazon's motivation to host large public datasets for free may be to mitigate the cost side of this analysis and thereby attract users to purchase cloud computing cycles near this data.
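A back-of-the-envelope calculation makes cost associativity concrete; the per-instance-hour rate below is a hypothetical figure, not a quoted EC2 price.

```python
RATE = 0.10  # hypothetical price per instance-hour, in dollars

def cost(instances: int, hours: float) -> float:
    """On-demand cloud cost: pay only for the instance-hours consumed."""
    return instances * hours * RATE

# The same 1,000 instance-hours of work, sliced two different ways:
serial = cost(instances=1, hours=1000)    # one machine for about six weeks
parallel = cost(instances=1000, hours=1)  # a thousand machines for an hour

# Identical spend; only the wall-clock time differs (assuming the job
# parallelizes with negligible overhead).
assert serial == parallel
print(f"Both schedules cost ${serial:.2f}")
```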

Some commercial HPC applications that have been deployed on clouds have been described elsewhere, focusing on the nature of the application and the commercial benefits of deployment on clouds; examples include The Server Labs, Pathwork Diagnostics, Cycle Computing, and Atbrox with Lingit.

Nonetheless, the cloud computing model, in spite of its promise, either imposes constraints in conflict with some HPC requirements or simply fails to adequately support them [9]. Among these constraints is the underlying hardware architecture virtualization, which is valuable for generic usage of diverse cloud resources. Such resources generally provide portability but obstruct targeting algorithm optimizations to specific hardware structures, as is typical of HPC applications. The time-critical overhead that virtualization layers add further degrades the performance efficiency and scalability of some HPC workloads. Another performance issue related to clouds is that users share resources among multiple tasks for both computational and networking functionality. The resulting resource contention inserts sporadic and unpredictable delays, further degrading performance and making optimizations more difficult.

TABLE II. CLOUD COMPUTING SOLUTION FEATURE COMPARISON

| Property | Amazon EC2 | Google AppEngine | Microsoft Azure |
|---|---|---|---|
| Service type | IaaS | IaaS-PaaS | IaaS-PaaS |
| Support for (value offer) | compute/storage | compute (web applications) | compute/storage |
| Value-added provider | Yes | Yes | Yes |
| User access interface | Web APIs and command-line tools | Web APIs and command-line tools | Azure Web Portal |
| Virtualization | OS on Xen hypervisor | Application container | Service container |
| Platform (OS & runtime) | Linux, Windows | Linux | .NET on Windows |
| Deployment model: if PaaS, ability to deploy on 3rd-party IaaS | N.A. | No | No |

Networking is critical to HPC facility operations. The availability of network infrastructure enables, and potentially limits, collaboration among geographically distributed groups. This is also true for computing systems that support execution of distributed tasks. Because the Internet is a key component of the cloud computing model, this new computing regime will exacerbate any pre-existing limitations in the network infrastructure. The ability to manage costs and acceptable application response times will determine operational effectiveness. The network determines the distance between the data and the computation, which means that in the cloud model, if bandwidth is low, the user must procure additional data storage near the computation. This increased reliance on data communication will likely be the first deciding criterion for whether an organization adopts cloud computing.
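A rough calculation shows why bandwidth decides where the data must live; the dataset size and link speeds below are illustrative assumptions, not measurements.

```python
def transfer_hours(dataset_gb: float, bandwidth_mbps: float) -> float:
    """Hours needed to move a dataset over a link of the given capacity."""
    bits = dataset_gb * 8e9                  # decimal gigabytes -> bits
    seconds = bits / (bandwidth_mbps * 1e6)  # megabits per second -> bits/s
    return seconds / 3600

# Moving 1 TB of input data at assumed sustained WAN speeds:
for mbps in (10, 100, 1000):
    print(f"1 TB at {mbps:>4} Mbit/s: {transfer_hours(1000, mbps):6.1f} h")
```

At a sustained 10 Mbit/s the transfer takes over nine days, which is why placing storage near the computation can dominate the adoption decision.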

External I/O can become a serious bottleneck to application performance if not balanced with application needs, buffering, and contention for these resources from other concurrent demands. Checkpoint and restart requirements for purposes of long-term reliability can impose further demands on I/O bandwidth which, if not met, might seriously degrade overall delivered performance. Thus, I/O could further reduce the value of cloud computing to HPC users.

Beyond performance are the critical issues of security and reliability. Much data is highly sensitive, such as intellectual property, competitive planning information, or highly classified intelligence from mission-critical agencies with strong national security responsibilities. In these cases, users won't trust remote networking, storage, and processing resources, no matter how trustworthy they assume the encryption and other implemented measures to be. Therefore, such organizations are unlikely to employ clouds for these purposes, which comprise a significant portion of HPC activity. Similarly, clouds might not provide sufficient reliability to adequately minimize risk, a particularly sensitive issue in time-bounded applications. Again, dedicated systems are more likely to be the preferred platform in these cases.

Edward Walker, a research scientist with the Texas Advanced Computing Center at the University of Texas at Austin, has done performance analysis of Amazon EC2 for high performance scientific applications. His results show a significant performance gap in the examined clusters that system builders, computational scientists, and commercial cloud computing vendors need to be aware of.

IV. EXPERIMENTS AND EVALUATION

HPC as a Service [10] is a computing model in which users have on-demand access over the Internet to HPC resources, including the expertise needed to set up, optimize and run their applications. The traditional barriers associated with high-performance computing, such as the initial capital outlay, the time to procure the system, the effort to optimize the software environment, engineering the system for peak demand, and continuing operating costs, are removed. Instead, an HPC as a Service user has a scalable cluster available on demand that operates with the same performance characteristics as a physical HPC cluster located in their own data room. There are different definitions of cloud computing, but at the core, "cloud computing is a style of computing in which dynamically scalable and often virtualized resources are provided as a service over the Internet." HPC as a Service extends this model by making concentrated, non-virtualized high-performance computing resources available in the cloud.

A. Benefits

HPC as a Service provides users with a number of key benefits as follows.

• HPC resources scale with demand and are available with no capital outlay; only the resources actually used are paid for.

• Experts in high-performance computing help set up and optimize the software environment and can help troubleshoot issues that might occur.

• Faster time-to-results, especially for computational requirements that greatly exceed the existing computing capacity.

• Accounts are provided on an individual user basis, and users are billed for the time they use the service.

• An HPC platform for users and their applications: support for ANSYS, OpenFOAM, LSTC, etc., plus third-party support.

• Access from anywhere in the world, with high-speed data transfer in and out.

B. HPC On-demand Service

More enterprises have announced computing-on-demand solutions aimed specifically at the HPC market, e.g. Penguin Computing's Penguin on Demand (POD), NewServers' Bare Metal Cloud, Gompute, SGI's Cyclone, and various middleware products such as Platform ISF.

Linux cluster maker Penguin Computing hopped on the HPC-in-a-cloud bandwagon with the announcement of its HPC on-demand service in August 2009. POD provides a computing infrastructure of highly optimized Linux clusters with specialized hardware interconnects and software configurations tuned specifically for HPC. Rather than utilizing machine virtualization, as is typical in traditional cloud computing, POD gives users access to a server’s full resources at one time for maximum performance and I/O on massive HPC workloads.

Comprising high-density Xeon-based compute nodes coupled with high-speed storage, POD provides a persistent compute environment that runs on a head node and executes directly on the compute nodes’ physical cores. Both GigE and DDR high-performance InfiniBand network fabrics are available, and POD customers also get access to state-of-the-art GPU supercomputing with NVIDIA Tesla processor technology. Jobs typically run over a localized network topology to maximize inter-process communication bandwidth and minimize latency.

Penguin has also been working with a new biomedical startup to understand the performance characteristics of its application on the POD system. Results on an 8-node configuration (using Amazon's High-CPU instances on the EC2 side) show a runtime of 31.2 minutes on POD versus 18.5 hours on EC2, as shown in Table III, putting POD about 32x faster than EC2 for this particular application. Table III also gives an infrastructure comparison between POD and EC2 in this test.

Another example is Gompute, which provides on-demand HPC for technical and scientific computing. Gompute's services let users exploit HPC resources over the Internet, paying for what they actually use. Gompute also provides its users with high-quality training for the applications supported by its on-demand service, and consultants and independent software vendors can sell their services and software licenses through Gompute.

HPC in the cloud is a mixed bag. Unless you use a specially designed HPC cloud, the I/O resources critical to HPC performance can be quite variable. This may be changing, however, as individual servers contain more cores. IDC recently reported that 57% of all HPC applications/users surveyed use 32 processors (cores) or fewer. When clouds start forming around 48-core servers using the imminent Magny-Cours processor from AMD, many applications may fit on a single server, eliminating the variability of server-to-server communication. HPC may start to take a very different form as dense multi-core servers enter the cloud. A user may sit at her desk submitting jobs to her own SGE (Sun Grid Engine) desktop; the resource scheduler will then reach out to local or cloud resources that can run virtualized or bare-metal versions of her applications.
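Such a workflow can be scripted today: the sketch below submits a batch job to a Sun Grid Engine queue from Python, assuming an SGE installation with qsub on the PATH; the job script's application binary and input file are hypothetical placeholders.

```python
import subprocess
import tempfile

# A minimal SGE batch script. "#$" lines are qsub directives (-N names the
# job, -cwd runs it from the submission directory); the application binary
# and input file are hypothetical placeholders.
job_script = """#!/bin/sh
#$ -N demo_job
#$ -cwd
./my_hpc_app input.dat
"""

# Write the script to a temporary file so qsub can read it.
with tempfile.NamedTemporaryFile("w", suffix=".sh", delete=False) as f:
    f.write(job_script)
    script_path = f.name

# Hand the job to the scheduler; qsub prints the assigned job id on success.
result = subprocess.run(
    ["qsub", script_path], capture_output=True, text=True, check=True
)
print(result.stdout.strip())
```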

V. CONCLUSION AND FUTURE WORK

Cloud computing's potential for the particularly challenging domain of HPC is promising. In fact, many application types in the overall HPC workflow are well suited to near-term exploitation of cloud services. Furthermore, institutions that take advantage of clouds might benefit substantially in operational and cost effectiveness as well as in flexibility and responsiveness to internal workload demands. But don't assume that clouds will easily replace the HPC systems that organizations currently deploy to provide the most extreme capabilities; rather, the two world views must coexist, seeking benefits from clouds while meeting HPC's mission-critical requirements.

However, many anticipated properties of distributed cloud environments strongly suggest that clouds can only partly address HPC user needs and that some workload subdomains will remain beyond the capabilities of cloud services. Virtualization, uncertainty about hardware structural details, lack of network control, memory access contention, repeatability, and protection and security all inhibit adoption of the cloud paradigm for certain critical uses. Also, it is unlikely that a general business model, implicit with clouds, will provide the extreme computing capability and peak performance such uses demand. Finally, protected access to such facilities is a potential source of competitive edge for science, markets, and national security, and the agencies that employ them will therefore limit or entirely preclude offering such systems to a cloud-covered processing world.

TABLE III. INFRASTRUCTURE COMPARISON AND PERFORMANCE FOR THE APPLICATION, POD VS. EC2

| Property | POD | EC2 |
|---|---|---|
| Network | 1 GbE and DDR InfiniBand | Shared memory, 300-400 MB/s transfer rate |
| Computing unit | Xeon 5400 | 1.0-1.2 GHz, 2007 Opteron or 2007 Xeon processor |
| OS | Linux | Linux, OpenSolaris, Windows Server and others |
| Run time | 31.2 min | 18.5 hours |
| Latency | 47 ms | 185 ms |
| Throughput | 20 MB/s | 5 MB/s |

TABLE I. COMPARISON OF FEATURES SUPPORTED BY DIFFERENT PARALLEL PROGRAMMING RUNTIMES

| Feature | Hadoop | Dryad | CGL-MapReduce | MPI |
|---|---|---|---|---|
| Programming model | MapReduce | DAG-based execution flows | MapReduce with a Combine phase | Variety of topologies constructed using the rich set of parallel constructs |
| Data handling | HDFS | Shared directories/local disks | Shared directories/local disks | Shared directories |
| Intermediate data communication | HDFS/point-to-point via HTTP | Files/TCP pipes/shared-memory FIFO | Content distribution network (NaradaBrokering; Pallickara and Fox 2003) | Low-latency communication channels |
| Scheduling | Data locality/rack aware | Data locality/network-topology-based run-time graph optimizations | Data locality | Available processing capabilities |
| Failure handling | Persistence via HDFS; re-execution of map and reduce tasks | Re-execution of vertices | Currently not implemented (re-executing map tasks, redundant reduce tasks) | Program-level checkpointing; OpenMPI (Gabriel, E., G. E. Fagg, et al. 2004), FT-MPI |
| Monitoring | Monitoring support of HDFS; monitoring of MapReduce computations | Monitoring support for execution graphs | Programming interface to monitor the progress of jobs | Minimal support for task-level monitoring |
| Language support | Implemented in Java; other languages supported via Hadoop Streaming | Programmable via C#; DryadLINQ provides a LINQ programming API for Dryad | Implemented in Java; other languages supported via Java wrappers | C, C++, Fortran, Java, C# |

VI. ACKNOWLEDGMENTS

I would like to thank my colleagues on the HPU-HPC team for their contributions, insights, and support.

This paper is supported by the high-performance computing platform of Henan Polytechnic University.

REFERENCES

[1] Wikipedia, "High-performance computing", http://en.wikipedia.org/wiki/High-performance_computing

[2] "Cloud Computing Definition", National Institute of Standards and Technology, Version 15, http://csrc.nist.gov/groups/SNS/cloud-computing/index.html

[3] Jaliya Ekanayake and Geoffrey Fox, "High Performance Parallel Computing with Clouds and Cloud Technologies", 1st International Conference on Cloud Computing, Oct. 19-21, 2009.

[4] ASF, "Apache Hadoop Core", 2009, http://hadoop.apache.org/core

[5] ASF, "Apache Hadoop Pig", 2009, http://hadoop.apache.org/pig/

[6] C. Vecchiola, S. Pandey, and R. Buyya, "High-performance cloud computing: A view of scientific applications", CoRR, vol. abs/0910.1979, 2009.

[7] Amazon Elastic Compute Cloud (EC2), http://aws.amazon.com/ec2/

[8] Amazon.com, Inc., "Simple Storage Service (S3)", 2009, http://aws.amazon.com/s3

[9] Thomas Sterling and Dylan Stark, "A High-Performance Computing Forecast: Partly Cloudy", Computing in Science & Engineering, July/August 2009, pp. 42-49.

[10] "HPC as a Service", http://www.penguincomputing.com/POD/HPC_as_a_service
