
Data-intensive Computing on the Cloud: Concepts, Technologies and Applications

B. Ramamurthy

This talk is partially supported by National Science Foundation grants DUE: #0920335, OCI: #1041280

12/5/2011

ECC 1

(2)

Presenter’s Background in cloud computing

• Bina

o Is a PI on two current NSF* grants related to cloud computing:

o 2009-2012: Data-Intensive computing education: CCLI Phase 2: $250K

o 2010-2012: Cloud-enabled Evolutionary Genetics Testbed: OCI-CI-TEAM: $250K

o Faculty at the CSE department at University at Buffalo.

*National Science Foundation


(3)

Outline of the talk

• Introduction to Data-intensive computing on the cloud

o Technology context: multi-core, virtualization, 64-bit processors, parallel computing models, big-data storage, …

o Cloud models: IaaS (Amazon AWS), PaaS (Microsoft Azure), SaaS (Google App Engine)

• Demonstration of cloud capabilities

o Cloud models: demos on the Amazon EC2 cloud

o Data-intensive computing: MapReduce

• A Certificate Program in Data-intensive Computing offered by SUNY (yes, SUNY approved)

• Questions and Answers


(4)

Introduction: A Golden Era in Computing

Powerful multi-core processors

General purpose graphic processors

Superior software methodologies

Virtualization leveraging the powerful hardware

Wider bandwidth for communication

Proliferation of devices

Explosion of domain applications

(5)

Top Ten Largest Databases

[Figure: bar chart of the top ten largest databases (2007), in terabytes, on a scale of 0 to 7000. Entries: LOC, CIA, Amazon, YouTube, ChoicePt, Sprint, Google, AT&T, NERSC, Climate.]

Ref: http://www.focus.com/fyi/operations/10-largest-databases-in-the-world/


(6)

Top Ten Largest Databases in 2007 vs Facebook’s cluster in 2010

[Figure: the same bar chart of the top ten largest databases (2007), in terabytes, on a scale of 0 to 7000 (LOC, CIA, Amazon, YouTube, ChoicePt, Sprint, Google, AT&T, NERSC, Climate), annotated with Facebook: 21 petabytes in 2010.]

Ref: http://www.focus.com/fyi/operations/10-largest-databases-in-the-world/
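For a sense of scale, the comparison on this slide can be checked with simple arithmetic. The sketch below assumes decimal units (1 PB = 1000 TB) and uses the chart's axis maximum of 7000 TB as an upper bound on any single 2007 entry; the variable names are illustrative only.

```python
# Compare Facebook's 2010 cluster with the 2007 chart's axis maximum.
# Assumption: decimal units, 1 PB = 1000 TB.
TB_PER_PB = 1000

facebook_2010_pb = 21       # figure quoted on the slide
chart_axis_max_tb = 7000    # largest tick on the 2007 chart

facebook_2010_tb = facebook_2010_pb * TB_PER_PB
ratio = facebook_2010_tb / chart_axis_max_tb

print(f"Facebook 2010 cluster: {facebook_2010_tb} TB")
print(f"Ratio to the 2007 chart maximum: {ratio:.1f}x")
```

Under these assumptions, a single cluster in 2010 held about three times the chart's full scale from 2007.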

(7)

Big-data Challenges

• Scalability issues: large-scale data, high-performance computing, automation, response time, rapid prototyping, and rapid time to production

• Need to effectively address (i) ever-shortening cycles of obsolescence, (ii) heterogeneity and (iii) rapid changes in requirements

• Transform data from diverse sources into intelligence and deliver that intelligence to the right people, users and systems

• How to store the big-data? What new computing models are needed?

• What about providing all this in a cost-effective manner?


(8)

Enter the cloud

• Cloud computing is Internet-based computing, whereby shared resources, software and information are provided to computers and other devices on demand, like the electricity grid.

• Cloud computing is the culmination of numerous attempts at large-scale computing with seamless access to virtually limitless resources.

o on-demand computing, utility computing, ubiquitous computing, autonomic computing, platform computing, edge computing, elastic computing, grid computing, …


(9)

The Cloud Computing

• Cloud provides processor, software, operating systems, storage, monitoring, load balancing, clusters and other requirements as a service

• Pay as you go model of business

• When using a public cloud, the model is more like renting a property than owning one.

• An organization could also maintain a private cloud, or use both public and private clouds.

• Cloud computing models:

o platform (PaaS),

o software (SaaS),

o infrastructure (IaaS),

o Services-based application programming interface (API)
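The rent-versus-own comparison behind the pay-as-you-go model can be sketched as a break-even calculation. This is a minimal illustration, not a pricing model from the talk: the function name `break_even_hours` and all dollar figures are hypothetical, and real costs (staff, power, discounts, data transfer) are ignored.

```python
def break_even_hours(purchase_cost, owned_hourly_cost, rented_hourly_rate):
    """Hours of use after which owning becomes cheaper than renting.

    Returns None when renting is never more expensive per hour,
    in which case owning never pays off under this simple model.
    """
    saving_per_hour = rented_hourly_rate - owned_hourly_cost
    if saving_per_hour <= 0:
        return None
    return purchase_cost / saving_per_hour


# Hypothetical figures, for illustration only.
hours = break_even_hours(
    purchase_cost=3000.0,      # up-front cost of owning a server
    owned_hourly_cost=0.25,    # running cost per hour when owned
    rented_hourly_rate=0.50,   # pay-as-you-go price per hour
)
print(f"Owning pays off after about {hours:.0f} hours of use")
```

With light or bursty usage the break-even point may never be reached, which is the case where renting from a public cloud is attractive.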


(10)

Windows Azure

• Enterprise-level on-demand capacity builder

• Fabric of cycles and storage available on-request for a cost

• You have to use Azure API to work with the infrastructure offered by Microsoft

• Significant features: web role, worker role, blob storage, table storage and drive storage

• Platform as a service


(11)

Google App Engine

• This is more a web interface for a development environment that offers a one-stop facility for the design, development and deployment of applications in Java, Go and Python.

• Google offers reliability, availability and scalability on par with Google’s own applications

• Interface is software programming based

• Comprehensive programming platform irrespective of the size (small or large)

• Signature features: templates and appspot, excellent monitoring and management console

• Free version to explore at: http://code.google.com/appengine/

• Software as a service: Evolutionary Genetics Testbed


(12)

Amazon EC2

• Amazon EC2 is one large complex web service.

• EC2 provides an API for instantiating computing instances with any of the operating systems supported.

• It can facilitate computations through Amazon Machine Images (AMIs) for various other models.

• Signature features: S3, Cloud Management Console, MapReduce Cloud, Amazon Machine Image (AMI)

• Excellent distribution, load balancing, cloud monitoring tools

• You can explore Amazon using the free account at: http://aws.amazon.com/free/

6/23/2010

Wipro Chennai 2011 12

(13)

Demos

Amazon AWS: EC2 & S3 (among the many infrastructure services)

o Archiving on the cloud (Windows instance)

o Rescuing legacy applications using the cloud (Windows instance)

o A three-tier enterprise application: Tomcat, MySQL and a web server on a Linux instance, using a Bitnami AMI (Amazon Machine Image)

o A big-data application on a distributed cluster (data-intensive computing): a word count application on a cluster, using the MapReduce programming model on a Hadoop cluster
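The word count demo follows the MapReduce pattern of map, shuffle and reduce. The sketch below imitates those three phases in a single Python process so the data flow is easy to follow; it is not the Hadoop code used in the demo, and the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) are chosen here for illustration. On a real cluster the framework runs the map and reduce steps in parallel across nodes and performs the shuffle itself.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input split."""
    for word in document.lower().split():
        yield word, 1


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by their key (the word)."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: sum the partial counts collected for one word."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle and reduce over a list of input splits."""
    pairs = (pair for doc in documents for pair in map_phase(doc))
    grouped = shuffle_phase(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    splits = ["the cloud stores data", "the cluster processes the data"]
    print(word_count(splits))
```

Because each map call sees only its own split and each reduce call sees only one key, both phases can be distributed without shared state, which is what lets the model scale to a cluster.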


(14)

Summary

• We explored the need for data-intensive or big-data computing

• We discussed three popular cloud models that are delivered as services

• We illustrated cloud concepts and demonstrated the cloud capabilities through simple applications

• Data-intensive computing on the cloud is an essential and indispensable skill for the workforce of today and tomorrow

• UB has implemented a SUNY-wide Certificate Program in Data-intensive Computing


(15)

References & useful links

• Amazon AWS: http://aws.amazon.com/free/

• AWS Cost Calculator: http://calculator.s3.amazonaws.com/calc5.html

• Windows Azure: http://www.azurepilot.com/

• Google App Engine (GAE): http://code.google.com/appengine/docs/whatisgoogleappengine.html

• For miscellaneous information: http://www.cse.buffalo.edu/~bina
