1.1.3. Cloud Computing

Cloud computing [14,38] is the evolution of a collection of technologies that have been brought together to redefine the approach to building an IT infrastructure. None of the technologies employed in cloud computing is essentially new, since most of them were already in use. The term cloud computing describes a computing paradigm in which a large pool of systems is connected in private or public networks to provide dynamically scalable infrastructure for application, data and file storage. To this end, clouds are built using virtualized infrastructure technology. Virtualization is the process of converting a physical IT resource into a virtual one. Thus, cloud computing follows a fundamental principle of reusability of IT capabilities, relying on the sharing of various resources (e.g., networks, servers, storage, applications, and services). With the advent of this technology, the cost of computation, application hosting, content storage and delivery has been reduced significantly.

A public cloud offers access to external users, who are usually billed by consumption using the pay-as-you-go model. Cloud providers offer services that can be grouped into three categories [100]: Software as a Service (SaaS), where a complete application is offered to the customer as a service on demand; Platform as a Service (PaaS), where a layer of software is encapsulated and offered as a service; and Infrastructure as a Service (IaaS), which provides basic storage and computing capabilities as standardized services over the network. Public IaaS cloud providers typically make huge investments in data centers and then rent them out, allowing consumers to avoid substantial capital investments and to obtain both cost-effective and energy-efficient solutions [180]. IaaS accounted for less than ten percent of the cloud market in 2016 [77]. However, it was the fastest growing cloud-based service (see Figure 1.7), and it is expected to repeat the strong growth in 2017 as well.

Figure 1.7: Growth of IaaS market. Source: [77]

Considering the demanding and dynamic nature of HPC applications, cloud computing technologies represent a powerful approach to managing technical computing resources [178]. The ability to scale out elastically to meet increased capacity demands is the obvious benefit of the cloud, but other features also make cloud computing an attractive option for meeting the needs of HPC applications. The cost savings in the cloud can be significant. The cloud also supports rapid provisioning for particular workloads: the ability to provision new environments or clusters in minutes is key to the success and practicality of many HPC applications, compared to the time it typically takes to provision new hardware on-premise. In summary, combining scale and elasticity creates a capability for HPC cloud users that does not exist for centralized shared HPC resources. Each HPC user in the cloud can have access to their own set of HPC resources, such as compute, networking, and storage resources for their own specific applications, with no need to share the resources with other users. They have zero queue time and can create the system architectures that their applications need.

Despite these benefits, some challenges remain for the adoption of cloud in HPC applications [178]. The most important are security and performance. Security remains a significant barrier to adoption; however, the issue resides primarily in users’ trust and perception rather than in limitations in the capability and architecture of the various cloud platforms. Regarding performance, in the last decade several researchers have studied the performance of HPC applications in cloud environments [63, 66, 70, 102, 147]. Most of these studies use classic MPI benchmarks to compare the performance of MPI on public cloud platforms. These works conclude that the lack of high-bandwidth, low-latency networks, as well as the virtualization overhead, has a large effect on the performance of HPC applications on the cloud. In response to these issues, some cloud providers, such as Amazon [11] or Microsoft Azure [139], have recently provided compute nodes that use hardware found in HPC clusters and that are claimed to be optimized for running HPC applications.

Programming frameworks in the cloud

New programming environments are being proposed to deal with large-scale computations on the cloud. These new distributed frameworks provide high-level programming abstractions that simplify the development of distributed applications, including implicit support for deployment, data distribution, parallel processing and run-time features such as fault tolerance or load balancing.

Among the new programming models that have been proposed to deal with large-scale computations on cloud systems, MapReduce [53] is the one that has attracted the most attention since its appearance in 2004. In short, MapReduce executes in parallel several instances of a pair of user-provided map and reduce functions over a distributed network of worker processes driven by a single master. Executions in MapReduce are made in batches, using a distributed filesystem (typically HDFS) to take the input and store the output. MapReduce has been applied to a wide range of applications, including distributed pattern-based searching, distributed sorting, graph processing, document clustering and statistical machine translation, among others. However, when it comes to iterative algorithms, MapReduce has shown serious performance bottlenecks [64], mainly because there is no way of efficiently reusing data or computation from previous iterations. New proposals not based on MapReduce, like Spark [235] or Flink, which has its roots in Stratosphere [9], are designed from the very beginning to provide efficient support for iterative algorithms.
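The map and reduce pattern described above can be illustrated with a minimal single-process sketch in Python. This is not the MapReduce implementation of [53]: the names `map_fn`, `reduce_fn` and `run_mapreduce` are hypothetical, the word-count task and the input strings are chosen only for illustration, and the master, the workers and the distributed filesystem are all collapsed into one local function.

```python
from collections import defaultdict
from itertools import chain


def map_fn(document):
    """Map step: emit a (word, 1) pair for every word in one input record."""
    return [(word.lower(), 1) for word in document.split()]


def reduce_fn(word, counts):
    """Reduce step: combine all values emitted for one key."""
    return word, sum(counts)


def run_mapreduce(records, mapper, reducer):
    # Map phase: in a real deployment each record would go to a worker process.
    mapped = chain.from_iterable(mapper(r) for r in records)
    # Shuffle phase: group the intermediate values by key.
    groups = defaultdict(list)
    for key, value in mapped:
        groups[key].append(value)
    # Reduce phase: one reducer call per distinct key.
    return dict(reducer(k, v) for k, v in groups.items())


# Illustrative input only; a real job would read its input from HDFS.
documents = ["the cloud scales out", "the cloud scales in"]
word_counts = run_mapreduce(documents, map_fn, reduce_fn)
print(word_counts)
```

The user supplies only the two functions; grouping by key and scheduling are left to the framework, which is what makes the model easy to parallelize over many workers.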

Spark provides a language-integrated programming interface to resilient distributed datasets (RDDs), a distributed memory abstraction for supporting fault-tolerant and efficient in-memory computations. According to [235], the performance of iterative algorithms can be improved by an order of magnitude when compared to MapReduce.
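To make the benefit of in-memory reuse for iterative algorithms concrete, the following toy sketch in plain Python counts how often the input is read from a simulated storage layer. It does not use the Spark API and does not measure performance: `SimulatedStorage`, `iterate_batch`, `iterate_in_memory` and the update rule in `step` are hypothetical names and choices made only for this illustration.

```python
class SimulatedStorage:
    """Stands in for a distributed filesystem; counts full reads of the input."""

    def __init__(self, data):
        self._data = list(data)
        self.reads = 0

    def load(self):
        self.reads += 1
        return list(self._data)


def step(values, estimate):
    """One iteration: move the estimate halfway toward the mean of the data."""
    mean = sum(values) / len(values)
    return estimate + 0.5 * (mean - estimate)


def iterate_batch(storage, iterations):
    # Batch style: every iteration is a separate job that reloads its input.
    estimate = 0.0
    for _ in range(iterations):
        estimate = step(storage.load(), estimate)
    return estimate


def iterate_in_memory(storage, iterations):
    # In-memory style: load once, then reuse the cached data in every iteration.
    cached = storage.load()
    estimate = 0.0
    for _ in range(iterations):
        estimate = step(cached, estimate)
    return estimate


batch_storage = SimulatedStorage([2.0, 4.0, 6.0])
memory_storage = SimulatedStorage([2.0, 4.0, 6.0])
batch_result = iterate_batch(batch_storage, 10)
memory_result = iterate_in_memory(memory_storage, 10)
print(batch_storage.reads, memory_storage.reads)
```

Both variants compute the same estimate, but the batch variant reads the input once per iteration while the in-memory variant reads it once in total. On a real cluster each of those reads involves disk and network traffic, which is the overhead that in-memory abstractions are designed to avoid.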

In Chapter 5 of this Thesis we explore the feasibility of deploying our experiments on clouds, specifically on the Microsoft Azure public cloud. A performance evaluation has been carried out, comparing the results obtained with those of the local clusters. In addition, we present a preliminary comparison of one of the proposed metaheuristics, which is HPC oriented, with a similar implementation using Spark, which is throughput oriented.

1.2. Optimization in computational systems