
DOI: dx.doi.org/14.9831/1444-8939.2014/2-4/MAGNT.21

Cloud Computing for Big Data

Hanan Elazhary

Department of Computing and Information Technology, King Abdulaziz University, Jeddah, Saudi Arabia

Abstract: Big Data is characterized by large data sets and compute-intensive applications. Examples include computational biology applications such as genome or DNA sequencing, proteomics, computational neuroscience, computational pharmacology, and metagenomics. Physics, business, and government also have many applications. Such data and the corresponding applications challenge traditional storage and computing solutions. This is in addition to the problem of sharing such large amounts of data among researchers in a controlled fashion. Cloud computing is a promising solution that offers virtually unlimited on-demand elastic storage and compute capacity at an affordable cost. The purpose of this paper is to discuss the opportunities and challenges of using cloud computing for processing Big Data. Additionally, it provides a comprehensive survey of existing tools for Big Data and classifies them using a criterion specific to Big Data. Example applications utilizing these tools are also provided.

Keywords: Big Data, Computational Biology, Cloud Computing

1. Introduction

In recent years, there has been increasing interest in Big Data applications. For example, computational biology [1] aims at gaining deeper insight into biology. Computational biology applications include the Human Genome Project (HGP) [2], which aims at a complete understanding of the human genome¹. Possible techniques for such a project involve DNA sequencing or full genome sequencing [3], whose goal is to determine the complete DNA sequence of a given genome at a single time. Another application is proteomics [4], which aims at a complete understanding of proteomes². Computational neuroscience [5] refers to the study of the structure of the nervous system and its information-processing functions. The Mouse Brain Atlas [6, 7] and the Human Brain Atlas [8] are example projects carried out by the Allen Institute for Brain Science. Metagenomics [9] is a field that aims at studying genetic material obtained from environmental samples. Metagenomics data is both huge and noisy, as it contains fragmented data that can represent about 10,000 species. Computational pharmacology [1] is another field, concerned with finding linkages between genes and diseases in order to identify potential drugs.

¹ The term "genome" refers to the whole set of genes of a given organism.

² The term "proteome" is a blend of the terms "protein" and "genome," and refers to the whole set of proteins of a given organism.

Physics has its Big Data applications too. For example, the European Organization for Nuclear Research (CERN) built the world's largest and most powerful particle collider, the Large Hadron Collider (LHC) [10], to allow physicists to test the predictions of different theories of particle physics and high-energy physics. Data produced by the LHC and LHC-related simulations has been estimated at approximately fifteen petabytes per year. The NASA Center for Climate Simulation (NCCS) [11] processes as much as 32 petabytes of climate observations and simulations [12]. The Sloan Digital Sky Survey (SDSS) [13] uses a dedicated optical telescope for sky surveying. Data collection began in 2000, and the images collected so far cover over 35% of the sky.

Amazon [14], eBay [15], Walmart [16], and Facebook [17] are examples of business applications of Big Data. Governmental applications of Big Data include the analysis of cargo traffic from entry ports to exit ports to ensure the security of the global supply chain [18]. Obama's campaign, for example, used Big Data to rally individual voters during the 2012 elections [19].

Big Data applications involve both the storage and the compute-intensive analysis and processing of tremendous amounts of data. In the best case, most analyses are O(N), and this gets worse when pairwise or higher-order associations are examined [20]; examining all pairwise associations among N items, for example, requires on the order of N(N-1)/2 comparisons. Unfortunately, traditional storage and computing solutions are inadequate for satisfying the requirements of such data and applications. Another problem is the need to share such data among researchers at different locations in a restricted and controlled fashion, in addition to the bandwidth required to transfer the data. Cloud computing offers promising solutions to most of these problems, so the goal of this paper is to define cloud computing and to highlight the opportunities and challenges of using it for Big Data. A comprehensive survey of Big Data tools is provided, and the tools are classified using a criterion suitable for Big Data. The paper also provides example Big Data applications utilizing the cloud.

The paper is organized as follows: Section 2 provides a definition of cloud computing. Sections 3 and 4 discuss the opportunities and challenges of cloud computing for Big Data, respectively. Section 5 discusses and classifies existing tools for Big Data and presents example applications using these tools. Finally, Section 6 provides the conclusions.

2. Cloud Computing Definition

So far, there is no agreement in the literature about the definition of cloud computing. To the best of our knowledge, the only formal definition in the literature was published, after years of work and 15 drafts, by the National Institute of Standards and Technology (NIST) in September 2011 [21]. According to NIST [22], cloud computing is a model with five essential characteristics, three service models, and four deployment models. The five essential characteristics are:

Network access: Resources are available over the network and accessed through standard mechanisms using different types of clients such as mobile phones, tablets, laptops, PCs, and workstations.

Convenient resource access: A consumer can self-configure resources on demand as needed, with minimal interaction with the service provider.

Resource pooling: Resources are pooled to appear unlimited and to serve multiple consumers; this is achieved by dynamically assigning and reassigning resources according to demand.

Rapid elastic provisioning of resources with minimal management effort: Resources can be elastically provisioned to scale rapidly outward and inward with demand.

Metered service: Provided services are metered on a pay-per-use basis at some level of abstraction appropriate to the type of service.

The three service models are:

Infrastructure as a Service (IaaS): The consumer is provided with computing resources (such as processors, storage, and networks) on which to deploy and run arbitrary software, including operating systems and applications, with limited configuration of the computing resources.

Platform as a Service (PaaS): The consumer can deploy and run applications created using programming languages, libraries, services, and tools supported by the provider, with limited configuration of the application-hosting environment and no configuration of the underlying infrastructure.

Software as a Service (SaaS): The consumer can use applications provided by the provider and running on a cloud infrastructure, with limited consumer-specific application configuration.

The four deployment models are:

Private cloud: The cloud infrastructure is intended for exclusive use by a single organization with multiple consumers.

Community cloud: The cloud infrastructure is intended for exclusive use by a specific community of consumers belonging to different organizations but having common concerns and interests.

Public cloud: The cloud infrastructure is intended for use by the general public.

Hybrid cloud: The cloud infrastructure is composed of distinct cloud infrastructures (private, community, or public) that are linked together using standards that enable and facilitate portability as needed.

The problem with this definition is that it is over-specified. This makes the definition both overwhelming (it uses too many terms) and un-extendable (it is very specific). Accordingly, in spite of the effort exerted to formulate it, the definition has been criticized several times in the literature. According to Daconta [23], the definition is "incomplete, distorted and short-sighted" for many reasons. For example, it limits itself to three out of several possible "things as a service." Besides, it assumes that the three service models (IaaS, PaaS, and SaaS) are layered, which is not always true. It also assumes that the three models are equally important, which is also largely false.

Chou [24] noted that "the classification and some definitions of the four deployment models are redundant and inconsistent." For example, a community cloud is in fact a private cloud, but for a specific community. He also criticized the inconsistent classification criteria: a hybrid cloud is defined by being formed of different clouds, whereas private and public clouds are classified according to their consumers.

We redefine cloud computing as a computational model that provides metered convenient access to shared services. The five terms employed in this definition can be discussed as follows:

The term "model" is a general term that can describe different possible implementations and deployments; this implies that the deployment models of NIST (IaaS, PaaS and SaaS) should not be included as a part of the definition just as Personal Area Network (PAN), Local Area Network (LAN), Metropolitan Area Network (MAN), and Wide Area Network (WAN) are not included as a part of the definition of computer networks.

The term "services" is another general term that covers any type of service including physical and virtual services, hardware resources, software solutions, Infrastructure as a Service (IaaS), Platform as a Service (PaaS), and Software as a Service (SaaS).

The term "shared" implies pooled hosted networked ubiquitous services within the range of the cloud.

The term "metered" implies pay-per-use service for the benefit of both the consumers and the service providers. The term "convenient" is an extendable

term that incorporates as many features as needed such as on-demand rapid (possibly self-) configuration and access to services matching the consumer’s needs using different types of clients with minimal service provider interaction. It also incorporates pushing risk out of the business (from the point of view of the consumer) and elastic provisioning with

(4)

(DOI: dx.doi.org/14.9831/1444-8939.2014/2-4/MAGNT.21)

minimal management effort (from the point of view of the service provider).

3. Opportunities of Cloud Computing

Cloud computing offers tremendous opportunities for Big Data. It has several promising capabilities; for example:

Scalability: In cloud computing, capacity is virtually unlimited, so scaling is always possible; instead of running a job on a single computer for 10 hours, it can be run on 10 computers for a single hour.

Elasticity: Resources are provisioned and de-provisioned dynamically according to workload changes. Elasticity has three dimensions: cost, quality, and resources [25].

Pay-per-use Capability: Since resources are dynamically provisioned according to workload changes, payment is made according to actual utilization, so money is not wasted.

Sharing: Cloud computing allows the transparent sharing of resources. For example, cloud data stores allow the sharing of large datasets instead of caching copies on different individual clusters.

Data Reliability: Copies of data can be backed up in different geographical locations to guard against data loss, even from natural disasters.

Big Data Paradigms: A set of paradigms such as MapReduce [26, 27] and Dremel [28] have been developed specifically for processing and analyzing Big Data; a minimal sketch of the MapReduce idea is given after this list.

Easier Maintenance and Upgrade: Maintenance is done by the service provider, allowing researchers to concentrate on research.
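To make the MapReduce paradigm concrete, the following minimal sketch simulates its three phases (map, shuffle, and reduce) in plain Python on the canonical word-count task. The toy documents and function names are illustrative assumptions rather than any product's API; in a real deployment such as Hadoop, the framework runs the map and reduce calls in parallel across many nodes and performs the shuffle itself.

# A minimal, self-contained simulation of the MapReduce phases [26, 27].
from collections import defaultdict
from itertools import chain

def map_phase(document):
    """Mapper: emit a (key, value) pair for every word in one document."""
    for word in document.split():
        yield (word.lower(), 1)

def shuffle(pairs):
    """Shuffle: group all emitted values by key (the framework's job in practice)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reducer: combine all values for one key into a single result."""
    return (key, sum(values))

documents = ["big data on the cloud", "cloud computing for big data"]
pairs = chain.from_iterable(map_phase(d) for d in documents)          # map step
counts = dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())  # reduce step
print(counts)  # {'big': 2, 'data': 2, 'on': 1, 'the': 1, 'cloud': 2, 'computing': 1, 'for': 1}

Because each mapper sees only its own slice of the input and each reducer sees only one key's values, both phases can be spread over as many nodes as the data requires, which is what makes the paradigm suit Big Data.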

4. Challenges of Cloud Computing

In spite of the many opportunities offered by cloud computing for Big Data, there are many challenges that need to be addressed; these include:

Security issues: There is reduced control over the location of sensitive data and a possibility of data leakage, since data belonging to different customers can be stored in the same location. There is also the problem of the privacy of human data on the cloud.

Internet connection: An application with heavy communication requires a stable Internet connection with high bandwidth, which is not always available. This is in addition to the time and cost required to transfer large datasets to the cloud or between clouds.

Big Data computational paradigms and tools such as MapReduce perform poorly and become costly as data size grows, requiring algorithm rethinking and code refactoring [20].

Portability of applications and data among service providers.

Complicated pricing models that make pricing difficult to assess and monitor.

Quality of Service (QoS) assurance.

5. Tools for Big Data

Many useful tools already exist for Big Data. In this paper, these are classified using a criterion suitable for Big Data. It is worth noting that OpenCrowd [29] maintains a Cloud Taxonomy of some of these tools, but provides a more general classification and ignores some very important tools. Our proposed classes are provided in the following sub-sections:

5.1 High-Performance Infrastructure as a Service

Infrastructure as a Service tools can be used for deploying and running arbitrary software including operating systems and applications. Big Data requires high-performance Infrastructure as a Service tools for running its compute-intensive applications and processing tremendous amounts of data. These include:

IBM SoftLayer [30]: SoftLayer Bare Metal Servers offer exceptional performance and storage capacity, with the speed, power, and flexibility needed for Big Data applications.


ProfitBricks [31]: It provides high-performance IaaS suitable for Big Data applications.

Amazon EC2 [32]: High Performance Computing (HPC) required for Big Data is enabled via Cluster Compute or Cluster GPU servers in Amazon Web Services (AWS) cloud [33].

5.2 Storage as a Service

Big Data applications require huge storage capacity for tremendous amounts of data. Many tools are suitable for this purpose. They include:

Amazon Elastic Block Store (Amazon EBS) [34]: It provides block-level storage to be used with Amazon Elastic Compute Cloud (Amazon EC2) in the AWS Cloud.

Amazon S3 [35]: It provides a simple interface for the ubiquitous storage and retrieval of any amount of data on the Web.

AT&T Synaptic Storage [36]: It provides an elastic capacity allows ubiquitous access to data via an application program interface (API).

Google BigTable [37]: It provides storage for applications utilizing Google Platform as a Service tools provided by Google App Engine discussed in the following sub-sections.

HP Cloud Object Store [38]: It allows customers to create an unlimited number of containers with an unlimited number of objects on high-performance HP servers. Internap Cloud Storage [39]: This is an

object-storage system located in high-availability secure data centers and designed to scale to millions of objects. Zetta [40]: It offers a complete server

backup solution.

5.3 Big Data as a Service

Platform as a Service tools can be used for deploying and running applications created using provided programming languages, libraries, services, and tools. Big Data as a Service tools can be considered a sub-category of Platform as a Service tools specific to Big Data. A prominent platform for processing Big Data is Apache Hadoop [41], an open-source platform with libraries and utilities for the storage and processing of Big Data. It utilizes the MapReduce paradigm to distribute the processing of data among nodes. Big Data as a Service tools include:

Actian DataCloud [42]: A platform that allows the development of integration and management solutions for data and applications of any size.

Altiscale [43]: It provides Hadoop as a Service.

Amazon Kinesis [44]: It allows developing applications that respond to changes in streaming Big Data with a few lines of code.

BigML [45]: It is a cloud-based machine-learning platform that allows the development of predictions for online Big Data. BigML PredictServer [46] is a dedicated cloud image that can be used to develop blazingly fast predictions.

Datameer [47]: It is a platform for Hadoop with pre-built functionalities that can be extended through plug-ins and open APIs.

Mortar Data [48]: It provides solutions, code, and tools for high-scale data science. It has been exploited by several customers such as the Associated Press [49].

Qubole [50]: It offers several tools, including Hadoop MapReduce, for a complete Big Data service. It has been exploited by 50 customers, such as NextDoor [51].

Cloudera [52]: A platform for Hadoop running on the AWS cloud.

MapR [53]: A platform based on Hadoop to allow customers to easily store and process Big Data. It has been adopted by a large number of partners and customers including Google and Amazon.


Pig [54]: It is a high-level programming platform for creating MapReduce programs used with Hadoop.

Hadoop-BAM [55]: It is a library that acts as an integration layer between analysis applications and sequencing data that are processed using Hadoop in computational biology.

5.4 Data as a Service

Data as a Service tools provide the data needed for specific applications. Such services are especially needed for Big Data applications, since collecting large datasets is not an easy task. Data as a Service tools include:

AWS Public Datasets [56]: It offers datasets from eight different domains.

BrightPlanet [57]: It offers data from selected sites on the Web.

5.5 Data Stores as a Service

Big Data cannot be efficiently manipulated using traditional relational database management systems that rely on SQL queries for data management. Thus, about fifty NoSQL data stores [58] have been proposed and developed specifically for Big Data to achieve both speedup and elasticity. These data stores can be broadly classified into:

Key-Value Stores: They are the simplest NoSQL data stores; they store pairs of keys and values and retrieve values based on the keys. They can also sort the keys to enable range queries and ordered processing of keys. They are fast and can easily scale with data size, sustaining huge numbers of changes per second from millions of simultaneous users in online, gaming, and mobile applications [59]. Example tools include Redis Cloud [60] and Amazon DynamoDB [61]; a minimal sketch of this model follows this list.

Document Stores: They pair each key with a document, which is a complex data structure that can contain different key-value pairs, key-array pairs, and nested documents. They are suitable for storing unstructured data, such as social media posts and multimedia. Example tools include MongoDB [62] and CouchDB [63].

Column Stores: They store columns rather than rows of data. They are suitable for business intelligence applications and data warehouses, where new values of a column are supplied for all rows at once (see the layout sketch at the end of this sub-section). Example tools include Cassandra [64] and Google BigQuery [65].

Graph Stores: They are used to store network data, such as social connections. Example tools include Neo4j [66] and Microsoft Horton [67].
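As a rough illustration of the key-value model described above, the following self-contained Python sketch keeps its keys sorted so that, besides put and get, it can answer range queries. The class and method names are hypothetical, chosen for illustration only; production stores such as Redis Cloud [60] and Amazon DynamoDB [61] add the distribution, replication, and persistence that this sketch omits.

# A minimal in-memory sketch of a sorted key-value store (hypothetical API).
import bisect

class TinyKVStore:
    def __init__(self):
        self._keys = []   # keys kept in sorted order, enabling range queries
        self._data = {}   # key -> value mapping

    def put(self, key, value):
        if key not in self._data:
            bisect.insort(self._keys, key)  # insert key at its sorted position
        self._data[key] = value

    def get(self, key):
        return self._data.get(key)

    def range(self, lo, hi):
        """Return (key, value) pairs with lo <= key <= hi, in key order."""
        start = bisect.bisect_left(self._keys, lo)
        end = bisect.bisect_right(self._keys, hi)
        return [(k, self._data[k]) for k in self._keys[start:end]]

store = TinyKVStore()
store.put("gene:BRCA1", {"chromosome": 17})  # values can be arbitrary objects
store.put("gene:BRCA2", {"chromosome": 13})
store.put("gene:TP53", {"chromosome": 17})
print(store.get("gene:TP53"))                   # {'chromosome': 17}
print(store.range("gene:BRCA1", "gene:BRCA2"))  # both BRCA entries, in order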

SpliceMachine [68] is claimed to be the only Hadoop RDBMS, allowing both scaling up on larger servers and scaling out horizontally. It can support computational biology by handling huge amounts of data such as genomic data.
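To make the row- versus column-orientation distinction above concrete, the following sketch (with made-up data) lays out the same small table both ways. In the column layout, an aggregate over one column scans a single contiguous array, which is why column stores suit business-intelligence workloads and data warehouses.

# The same toy table in row-oriented and column-oriented layouts.
# Row layout: one record per entry; convenient for fetching whole records.
rows = [
    {"region": "east", "sales": 120},
    {"region": "west", "sales": 340},
    {"region": "east", "sales": 200},
]

# Column layout: one array per column; convenient for whole-column analytics.
columns = {
    "region": ["east", "west", "east"],
    "sales":  [120, 340, 200],
}

# A column aggregate touches one contiguous array in the column layout...
total = sum(columns["sales"])
# ...but must extract one field from every record in the row layout.
total_from_rows = sum(r["sales"] for r in rows)
print(total, total_from_rows)  # 660 660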

5.6 Software as a Service

A set of Software as a Service tools have been developed, and more are being developed, to aid in the processing of Big Data. These include:

Plex [69]: It is a Software as a Service (SaaS) ERP system for connecting and managing an entire manufacturing process.

Opani [70]: It is a Software as a Service tool for the analysis of Big Data such as MRI images, microscope images of cancer cells, and MySQL databases. It has also been applied to data from Facebook status updates, Twitter, and Yahoo Finance.

Many Software as a Service tools have been developed specifically for processing biological Big Data, for tasks such as sequence analysis, alignment, and mapping. These tools may be classified as Biology as a Service tools and include ArrayExpressHTS [71], BGI [72], Bioscope [73], CloudAligner [74], Cloud BioLinux [75], CloudBurst [76], Cloud-Coffee [77], Cloud-MAQ [78], CloVR [79], Crossbow [80], Eoulsan [81, 82], FX [83], Jnomics [84], Myrna [85], PeakRanger [86], SEAL [87], SeqWare [88], YunBe [89], and VAT [90]. It is worth noting that some of these tools can be further classified according to their specific tasks [91, 92, 93].

6. Conclusions

Though traditional storage and computing solutions cannot meet the requirements of Big Data applications, cloud computing is a promising candidate for this purpose. Cloud computing has several inherent capabilities that offer real opportunities for Big Data. These include scalability, elasticity, metered pay-per-use capability, sharing, data reliability, and Big Data paradigms, in addition to easier maintenance and upgrades. On the other hand, there are many challenges, such as security and privacy issues, relatively slow Internet connections, the performance of Big Data paradigms on extremely large data sizes, complicated pricing models, and quality-of-service assurance, in addition to the portability of applications and data among different service providers. A large number of tools already exist for several different types of Big Data applications; these have been surveyed in this paper and classified using a criterion suitable for Big Data, and example applications that have already benefited from cloud capabilities have been provided.

References

[1] http://en.wikipedia.org/wiki/Computational_biology; accessed July 2014.
[2] http://www.genome.gov/12011238; accessed July 2014.
[3] http://en.wikipedia.org/wiki/Whole_genome_sequencing; accessed July 2014.
[4] http://en.wikipedia.org/wiki/Proteomics; accessed July 2014.
[5] http://en.wikipedia.org/wiki/Computational_neuroscience; accessed July 2014.
[6] http://mouse.brain-map.org/; accessed July 2014.
[7] Lein E. et al., "Genome-Wide Atlas of Gene Expression in the Adult Mouse Brain," Nature 445 (pp. 168-176, 2007).
[8] http://human.brain-map.org/; accessed July 2014.
[9] http://en.wikipedia.org/wiki/Metagenomics; accessed July 2014.
[10] http://en.wikipedia.org/wiki/Large_Hadron_Collider; accessed July 2014.
[11] http://www.nccs.nasa.gov/index.html; accessed July 2014.
[12] http://en.wikipedia.org/wiki/Big_data; accessed July 2014.
[13] http://en.wikipedia.org/wiki/Sloan_Digital_Sky_Survey; accessed July 2014.
[14] http://www.amazon.com/; accessed July 2014.
[15] http://www.ebay.com/; accessed July 2014.
[16] http://www.walmart.com/; accessed July 2014.
[17] http://www.facebook.com/; accessed July 2014.
[18] http://fcw.com/articles/2013/09/25/big-data-transform-government.aspx; accessed July 2014.
[19] Issenberg S., "How President Obama's Campaign Used Big Data to Rally Individual Voters, Part 1," http://www.technologyreview.com/featuredstory/508836/how-obama-used-big-data-to-rally-voters-part-1/; accessed July 2014.
[20] Kasson P., "Computational Biology in the Cloud: Methods and New Insights from Computing at Scale," Proc. Pacific Symposium on Biocomputing (pp. 451-453, 2013).
[21] http://www.nist.gov/itl/csd/cloud-102511.cfm; accessed July 2014.
[22] Mell P. and Grance T., "The NIST Definition of Cloud Computing," Special Publication 800-145, National Institute of Standards and Technology (NIST), U.S. Department of Commerce (2011).
[23] Daconta M., "Why NIST's Cloud Definition is Fatally Flawed," http://gcn.com/articles/2012/04/02/reality-check-nist-flawed-cloud-framework.aspx; accessed July 2014.
[24] Chou Y., "An Inconvenient Truth of the NIST Definition of Cloud Computing," http://cloudcomputing.sys-con.com/node/2131995; accessed July 2014.
[25] http://en.wikipedia.org/wiki/Elasticity_(cloud_computing); accessed July 2014.
[26] Dean J. and Ghemawat S., "MapReduce: Simplified Data Processing on Large Clusters," Communications of the ACM 51(1) (pp. 107-113, 2008).
[27] Dean J. and Ghemawat S., "MapReduce: A Flexible Data Processing Tool," Communications of the ACM 53(1) (pp. 72-77, 2010).
[28] Melnik S. et al., "Dremel: Interactive Analysis of Web-Scale Datasets," Communications of the ACM 54(6) (pp. 114-123, 2011).
[29] http://cloudtaxonomy.opencrowd.com/; accessed July 2014.
[30] http://www.softlayer.com/bare-metal-servers; accessed July 2014.
[31] http://www.profitbricks.com/high-performance-computing-hpc; accessed July 2014.
[32] http://aws.amazon.com/hpc/; accessed July 2014.
[33] http://aws.amazon.com/; accessed July 2014.
[34] http://aws.amazon.com/ebs/; accessed July 2014.
[35] http://aws.amazon.com/s3/; accessed July 2014.
[36] https://www.synaptic.att.com/clouduser/html/productdetail/Storage_as_a_Service.htm; accessed July 2014.
[37] http://en.wikipedia.org/wiki/BigTable; accessed July 2014.
[38] http://www.hpcloud.com/products-services/object-storage; accessed July 2014.
[39] http://www.internap.com/cloud/cloud-storage/; accessed July 2014.
[40] http://www.zetta.net/; accessed July 2014.
[41] http://en.wikipedia.org/wiki/Apache_Hadoop; accessed July 2014.
[42] http://cloud.pervasive.com/; accessed July 2014.
[43] https://www.altiscale.com/hadoop-cloud/solution-comparison/; accessed July 2014.
[44] http://aws.amazon.com/kinesis/; accessed July 2014.
[45] http://www.bigdata-startups.com/BigData-startup/bigml/; accessed July 2014.
[46] https://bigml.com/predictserver; accessed July 2014.
[47] http://www.datameer.com/; accessed July 2014.
[48] http://www.mortardata.com/; accessed July 2014.
[49] http://www.ap.org/; accessed July 2014.
[50] http://www.qubole.com/; accessed July 2014.
[51] https://nextdoor.com/; accessed July 2014.
[52] http://www.cloudera.com/content/cloudera/en/solutions/partner/Amazon-Web-Services.html; accessed July 2014.
[53] http://www.mapr.com/products/product-overview/overview; accessed July 2014.
[54] http://pig.apache.org/; accessed July 2014.
[55] Niemenmaa M. et al., "Hadoop-BAM: Directly Manipulating Next Generation Sequencing Data in the Cloud," Bioinformatics 28(6) (pp. 876-877, 2012).
[56] http://aws.amazon.com/publicdatasets/; accessed July 2014.
[57] http://www.brightplanet.com/solutions/data-as-a-service/; accessed July 2014.
[58] http://www.mongodb.com/nosql-explained; accessed July 2014.
[59] https://infocus.emc.com/april_reeve/big-data-architectures-nosql-use-cases-for-key-value-databases/; accessed July 2014.
[60] http://redislabs.com/redis-cloud; accessed July 2014.
[61] http://aws.amazon.com/dynamodb/; accessed July 2014.
[62] https://mongolab.com/welcome/; accessed July 2014.
[63] http://couchdb.apache.org/; accessed July 2014.
[64] http://cassandra.apache.org/; accessed July 2014.
[65] https://cloud.google.com/products/bigquery/; accessed July 2014.
[66] http://www.neo4j.org/; accessed July 2014.
[67] http://research.microsoft.com/en-us/projects/ldg/; accessed July 2014.
[68] http://www.splicemachine.com/; accessed July 2014.
[69] http://www.plex.com/; accessed July 2014.
[70] http://readwrite.com/2011/05/06/opani-social-supercomputing-in; accessed July 2014.
[71] Goncalves A. et al., "A Pipeline for RNA-Seq Data Processing and Quality Assessment," Bioinformatics 27(6) (pp. 867-869, 2011).
[72] http://www.genomics.cn/en/index; accessed July 2014.
[73] http://www.lifetechnologies.com/eg/en/home/life-science.html; accessed July 2014.
[74] Nguyen T. et al., "CloudAligner: A Fast and Full-Featured MapReduce-Based Tool for Sequence Mapping," BMC Research Notes 4(171) (2011).
[75] http://cloudbiolinux.org/; accessed July 2014.
[76] Schatz M., "CloudBurst: Highly Sensitive Read Mapping with MapReduce," Bioinformatics 25(11) (pp. 1363-1369, 2009).
[77] Tommaso P. et al., "Cloud-Coffee: Implementation of a Parallel Consistency-Based Multiple Alignment Algorithm in the T-Coffee Package and its Benchmarking on the Amazon Elastic Cloud," Bioinformatics 26(15) (pp. 1903-1904, 2010).
[78] Talukder A. et al., "Cloud-MAQ: The Cloud-Enabled Scalable Whole Genome Reference Assembly Application," Proc. 7th International Conference on Wireless and Optical Communications Networks (pp. 1-5, 2010).
[79] http://clovr.org/; accessed July 2014.
[80] Langmead B. et al., "Searching for SNPs with Cloud Computing," Genome Biology 10(11) (2009).
[81] http://transcriptome.ens.fr/eoulsan/; accessed July 2014.
[82] Jourdren L. et al., "Eoulsan: A Cloud Computing-Based Framework Facilitating High Throughput Sequencing Analyses," Bioinformatics 28(11) (pp. 1542-1543, 2012).
[83] Hong D. et al., "FX: An RNA-Seq Analysis Tool on the Cloud," Bioinformatics 28(5) (pp. 721-723, 2012).
[84] http://www.mybiosoftware.com/sequence-analysis/10943; accessed July 2014.
[85] Langmead B. et al., "Cloud-Scale RNA-Sequencing Differential Expression Analysis with Myrna," Genome Biology 11(R83) (2010).
[86] Feng X. et al., "PeakRanger: A Cloud-Enabled Peak Caller for ChIP-Seq Data," BMC Bioinformatics 12(139) (2011).
[87] Pireddu L. et al., "SEAL: A Distributed Short Read Mapping and Duplicate Removal Tool," Bioinformatics 27(15) (pp. 2159-2160, 2011).
[88] O'Connor B. et al., "SeqWare Query Engine: Storing and Searching Sequence Data in the Cloud," BMC Bioinformatics 11(Suppl 12):S2 (2010).
[89] Zhang L. et al., "Gene Set Analysis in the Cloud," Bioinformatics (2011).
[90] Habegger L. et al., "VAT: A Computational Framework to Functionally Annotate Variants in Personal Genomes within a Cloud-Computing Environment," Bioinformatics 28(17) (pp. 2267-2269, 2012).
[91] Lin Y., Yu C. and Lin Y., "Enabling Large-Scale Biomedical Analysis in the Cloud," BioMed Research International 2013(185679) (2013).
[92] Dai L. et al., "Bioinformatics Clouds for Big Data Manipulation," Biology Direct 7(43) (2012).
[93] Chen J. et al., "Translational Biomedical Informatics in the Cloud: Present and Future," BioMed Research International (2013).