
(1)

Technical Aspects of Cloud Computing

Chaitan Baru

San Diego Supercomputer Center

Competitive Advantage Through Cloud Computing

(2)

News flash…

“Now Available in the Cloud…”

What does this mean?

For the developers

For the users

(3)

Outline

The changing context

Cloud computing definitions

Implementation considerations

Application case studies

Future directions

Course materials: http://clds.sdsc.edu, click on Education

Files stored in the SDSC Cloud

(4)

The Changing Context

Rapid growth in data

data-driven business decisions

Scientific workloads as a predictor of future business workloads

Sensor-based systems, remote sensing, genome sequencing

A point of inflexion

Changing software: from RDBMS, to NoSQL, to streaming, to scientific data, …

Changing hardware: multi-core, solid-state disk, large memory and new types of memory

Changing platforms: wholly-owned (“on premises”) systems vs clouds

Changing business costs / models: ultra-high productivity, energy efficiency, rent vs own, …

(5)

New sources of increasing data volumes

Sensor Networks Top Social Networks for Big Data, Stacey Higginbotham, Sep. 13, 2010,

http://gigaom.com/cloud/sensor-networks-top-social-networks-for-big-data-2/

(6)

Gene Sequencing

~2 TB/experiment in next-generation gene sequencing

Sequencing of individuals

Multiple runs per individual

Multiple sequencing over time

Managing and Analyzing Next-Generation Sequence Data, Richter BG, Sexton DP (2009). PLoS Comput Biol 5(6): e1000369. doi:10.1371/journal.pcbi.1000369, June 2009

(7)

Remote Sensing

~1 TB of high-resolution topographic data for the San Andreas Fault

10x more for imagery

Repeated scans for ecological applications

OpenTopography.org

LaSDI Initiative: Laser Spatial Data Infrastructure

Sub-meter to 10 cm scale 3D models of the Earth

(8)

Data is the new oil!

The data ecosystem

From acquisition, to transfer, storage, creation of derived products, and exploitation

Those with data are better off than those without

Those who can exploit data have the competitive advantage

Walmart, FedEx, Wall Street trading, Internet companies (Google, Amazon, Facebook, Twitter, …)

And you cannot find oil without data…!

Oil exploration data is growing from tens of TBs to PBs over the next few years

(9)

Where should all the data reside?

All in your private systems (private cloud)?

All in a public cloud?

Hybrid model: private + public?
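Where data resides matters largely because moving it is slow. A back-of-the-envelope sketch of transfer time over a network link follows; the 2 TB dataset (echoing the gene-sequencing slide) and the 1 Gbit/s link are illustrative assumptions, and protocol overhead is folded into a single assumed efficiency factor.

```python
def transfer_hours(num_bytes: float, link_bits_per_sec: float, efficiency: float = 1.0) -> float:
    """Hours needed to move num_bytes over a link at a sustained rate.

    efficiency is an assumed fraction (0, 1] of the nominal bandwidth actually achieved.
    """
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    seconds = (num_bytes * 8) / (link_bits_per_sec * efficiency)
    return seconds / 3600


TB = 10**12   # decimal terabyte, in bytes
GBPS = 10**9  # 1 Gbit/s, in bits per second

# 2 TB over a 1 Gbit/s link at full nominal speed takes 16,000 s.
print(round(transfer_hours(2 * TB, 1 * GBPS), 2))  # 4.44
```

At a more realistic 50% efficiency the same move takes close to 9 hours per experiment, which is why repeated sequencing runs push toward keeping computation near the data.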

(10)

Cloud Computing: Definition

Cloud computing is a model for enabling convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction

Another turn of the screw in our push towards productivity

(11)

Cloud Computing: NIST Definition

On-demand self-service

You get the resource when you ask for it, using APIs

Broad network access

Accessible from anywhere

Resource pooling

Shared resource provisioning

Rapid elasticity

Uniform scale-out

Measured service

Monitoring of usage, reporting of usage
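“Measured service” means the provider meters usage per consumer and reports it. A minimal metering sketch, assuming a hypothetical usage-record format of (tenant, start hour, end hour) and an illustrative policy of charging every started hour in full; this is not any specific provider's billing rule.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical usage record: (tenant, start_hour, end_hour), times in hours.
UsageRecord = Tuple[str, float, float]


def billable_hours(records: Iterable[UsageRecord], round_up: bool = True) -> Dict[str, float]:
    """Sum usage per tenant; optionally round each record up to whole hours."""
    totals: Dict[str, float] = defaultdict(float)
    for tenant, start, end in records:
        if end < start:
            raise ValueError(f"record for {tenant} ends before it starts")
        duration = end - start
        # Illustrative policy only: every started hour is charged in full.
        totals[tenant] += math.ceil(duration) if round_up else duration
    return dict(totals)


usage = [("alice", 0.0, 1.5), ("alice", 3.0, 3.2), ("bob", 0.0, 10.0)]
print(billable_hours(usage))  # {'alice': 3.0, 'bob': 10.0}
```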

(12)

Delivery Models

Software as a Service (SaaS)

Platform as a Service (PaaS)

Infrastructure as a Service (IaaS)

Importance?

Implications on what type of programming work needs to be done

Who do you have (or want to hire) to work on

(13)

http://thoughtsoncloud.com/

(14)

Delivery Model: Software as a Service

Software as a Service (SaaS)

The capability provided to the consumer is to use the provider’s applications running on a cloud infrastructure

Accessible from various client devices through a thin client interface such as a Web browser (e.g., web-based email)

The consumer does not manage or control the underlying cloud infrastructure

With the possible exception of limited user-specific application configuration settings

(15)

Software as a Service: Example

Google Maps API

http://code.google.com/apis/maps/

Users are provided with simple APIs for maps

Uses cloud resources at the back-end

Facebook

http://www.facebook.com

Social networking site using a suite of cloud-based tools at the back-end

Animoto

http://www.animoto.com

Service that makes videos from user-uploaded images

(16)

Salesforce.com: SaaS

(17)

Delivery Models: Platform as a Service

Platform as a Service (PaaS)

The capability provided to the consumer is to deploy consumer-created applications onto the cloud infrastructure, using programming languages and tools supported by the provider (e.g., Java, Python, .NET)

The consumer does not manage or control the underlying cloud infrastructure, network, servers, operating systems, or storage

But the consumer has control over the deployed applications

(18)

Platform as a Service: Google AppEngine

(19)

Delivery Model: Infrastructure as a Service

Infrastructure as a Service (IaaS)

The capability provided to the consumer is to provision processing, storage, networks, and other fundamental computing resources

The consumer is able to deploy and run arbitrary software, which can include operating systems and applications

The consumer does not manage or control the underlying cloud infrastructure

But has control over operating systems, storage, deployed applications, and possibly select networking components (e.g., firewalls, load balancers)
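The three delivery models can be read side by side as a split of the stack between consumer and provider, which is what drives “what type of programming work needs to be done.” The sketch below only paraphrases the slide text; the layer names come from it, and IaaS networking (where the consumer may control select components) is simplified to provider-managed.

```python
from enum import Enum
from typing import Dict, FrozenSet, List


class Model(Enum):
    SAAS = "SaaS"
    PAAS = "PaaS"
    IAAS = "IaaS"


# Stack layers as named on the delivery-model slides.
STACK = ("network", "servers", "operating systems", "storage", "deployed applications")

# Layers the consumer controls in each model (paraphrased from the slides).
CONSUMER_CONTROLS: Dict[Model, FrozenSet[str]] = {
    Model.SAAS: frozenset(),  # at most limited user-specific settings
    Model.PAAS: frozenset({"deployed applications"}),
    Model.IAAS: frozenset({"operating systems", "storage", "deployed applications"}),
}


def provider_manages(model: Model) -> List[str]:
    """Layers left entirely to the provider, in stack order."""
    return [layer for layer in STACK if layer not in CONSUMER_CONTROLS[model]]


for m in Model:
    print(f"{m.value}: provider manages {provider_manages(m)}")
```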

(20)

Infrastructure as a Service: Amazon Web Services (AWS)

Amazon Elastic Compute Cloud (EC2)

▫ A web service that provides resizable compute capacity in the cloud.

▫ Configure an Amazon Machine Image (AMI) and load it into the Amazon EC2 service

▫ Quickly scale capacity, both up and down, as your computing requirements change

Amazon Simple Storage Service (S3)

▫ A simple web services interface that can be used to store and retrieve large amounts of data, at any time, from anywhere on the web

▫ It gives any developer access to the same highly scalable, reliable, fast, inexpensive data storage infrastructure that Amazon uses to run its own global network of web sites
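Scaling capacity “both up and down” is usually automated by a rule that maps observed load to a fleet size. A minimal sketch of a proportional, target-tracking style rule follows; the 60% CPU target and the instance bounds are hypothetical parameters, and this is not the Amazon EC2 Auto Scaling API.

```python
import math


def desired_instances(current: int, avg_cpu_pct: float, target_pct: float = 60.0,
                      min_n: int = 1, max_n: int = 20) -> int:
    """Size the fleet so that average CPU would move toward target_pct.

    Assumes load spreads evenly across instances (a simplification).
    """
    if current < 1:
        raise ValueError("need at least one running instance to measure load")
    needed = math.ceil(current * avg_cpu_pct / target_pct)
    return max(min_n, min(max_n, needed))


print(desired_instances(4, 90.0))  # 6: scale out
print(desired_instances(4, 15.0))  # 1: scale in
```

Clamping to minimum and maximum bounds keeps a bad metric reading from shrinking the fleet to zero or growing it without limit.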

(21)

AWS, aws.amazon.com

(22)

Microsoft Azure

www.microsoft.com/windowsazure/

(23)

OpenStack, www.openstack.org

Receiving attention, e.g., Cisco support for OpenStack

http://www.slideshare.net/CiscoSP360/velocity-2011-cisco-and-open-stack

(24)

Eucalyptus


Target market

On-premise (private) IaaS

Use existing infrastructure to create AWS-compatible cloud

Products:

Eucalyptus IaaS

Eucalyptus OpenSource

(25)

Eucalyptus IaaS

(26)

Nirvanix

CloudComplete

Can vary among private, hybrid, and public cloud implementations, using Nirvanix’s public cloud

(27)

Virtualization and Cloud Computing

Virtualization is the ability to run “virtual machines” on top of a “hypervisor.”

A virtual machine (VM) is a software implementation of a machine (i.e., a computer) that executes programs like a physical machine.

Each VM includes its own kernel, operating system, supporting libraries and applications.

A hypervisor provides a uniform abstraction of the underlying physical machine. Multiple VMs can execute simultaneously on a single hypervisor.

The decoupling of the VM from the underlying physical hardware allows the same VM to be started on different physical machines.

Virtualization is an enabler for cloud computing

Gives the cloud computing provider the flexibility to move and allocate the computing resources requested by the user wherever the physical resources are available.
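Because a VM is decoupled from the hardware, the provider can treat “put it wherever resources are available” as a packing problem. A minimal sketch, assuming VMs are described only by cores and memory and using a simple first-fit heuristic; the class and function names are hypothetical, and this does not describe any particular provider's scheduler.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Host:
    name: str
    free_cores: int
    free_mem_gb: int
    vms: List[str] = field(default_factory=list)


@dataclass(frozen=True)
class VMRequest:
    name: str
    cores: int
    mem_gb: int


def place_first_fit(requests: List[VMRequest], hosts: List[Host]) -> Dict[str, Optional[str]]:
    """Assign each VM to the first host with enough free cores and memory.

    Returns VM name -> host name (None if nothing fits). Hosts are updated in place.
    """
    placement: Dict[str, Optional[str]] = {}
    for vm in requests:
        placement[vm.name] = None
        for host in hosts:
            if host.free_cores >= vm.cores and host.free_mem_gb >= vm.mem_gb:
                host.free_cores -= vm.cores
                host.free_mem_gb -= vm.mem_gb
                host.vms.append(vm.name)
                placement[vm.name] = host.name
                break
    return placement


hosts = [Host("h1", 8, 32), Host("h2", 16, 64)]
reqs = [VMRequest("web", 4, 8), VMRequest("db", 8, 48),
        VMRequest("batch", 4, 16), VMRequest("huge", 32, 8)]
print(place_first_fit(reqs, hosts))
# {'web': 'h1', 'db': 'h2', 'batch': 'h1', 'huge': None}
```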

(28)

SNIA CDMI: Cloud Data Management Interface

Standardizing at the IaaS level
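CDMI standardizes storage access as a RESTful HTTP interface. A minimal sketch that builds (but does not send) a request to create a data object, following our reading of the CDMI specification; the server URL, container and object names are hypothetical, and the version header value is an assumption to check against the spec version a given server supports.

```python
import json
import urllib.request

# Assumption: the CDMI specification version the target server speaks.
CDMI_VERSION = "1.1"


def cdmi_create_object_request(base_url: str, container: str, name: str,
                               text: str) -> urllib.request.Request:
    """Build a CDMI PUT request that would create a text data object."""
    body = json.dumps({"mimetype": "text/plain", "metadata": {}, "value": text})
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/{container}/{name}",
        data=body.encode("utf-8"),
        method="PUT",
        headers={
            "Content-Type": "application/cdmi-object",
            "Accept": "application/cdmi-object",
            "X-CDMI-Specification-Version": CDMI_VERSION,
        },
    )


req = cdmi_create_object_request("https://cdmi.example.org/", "mycontainer",
                                 "hello.txt", "Hello, cloud")
print(req.get_method(), req.full_url)
```

Sending it would be a matter of `urllib.request.urlopen(req)` against a real CDMI endpoint with whatever authentication that endpoint requires.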

(29)

Some Take Home Lessons

Cloud providers are providing you a service, not just a product

Product model: sell product, support product

Service model: provide service, become intimately exposed to all aspects of the service that the customer sees

Seeing this from a customer’s viewpoint

(30)

Cloud Computing Costs*

(31)

Cloud Computing: The Rationale

Flatten out the peaks and valleys of utilization to get higher overall utilization of the entire infrastructure

Bring together workloads with different peak / valley behaviors

But… is running a high-utilization operation the same as running a low-utilization operation?

Velocity 2010: Datacenter Infrastructure Innovation, James Hamilton, VP & Distinguished Engineer, Amazon AWS

http://www.youtube.com/watch?v=kHW-ayt_Urk
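The rationale can be checked with a small calculation: two workloads whose peaks fall at different times need less capacity when pooled than when each is provisioned for its own peak. The demand numbers below are hypothetical.

```python
from typing import List, Sequence


def combine(*workloads: Sequence[float]) -> List[float]:
    """Element-wise sum of workloads sampled at the same time steps."""
    return [sum(step) for step in zip(*workloads)]


def utilization(load: Sequence[float], capacity: float) -> float:
    """Average load as a fraction of provisioned capacity."""
    return sum(load) / (len(load) * capacity)


# Hypothetical demand, in servers, over six time slots; the peaks are offset.
daytime = [2, 2, 8, 10, 8, 2]
nightly = [8, 10, 3, 1, 2, 9]
pooled = combine(daytime, nightly)  # [10, 12, 11, 11, 10, 11]

# Separate infrastructures: each provisioned for its own peak (10 + 10 servers).
separate = utilization(pooled, max(daytime) + max(nightly))
# Shared infrastructure: provisioned for the peak of the combined load (12 servers).
shared = utilization(pooled, max(pooled))
print(round(separate, 2), round(shared, 2))  # 0.54 0.9
```

The same work runs on 12 servers instead of 20; the benefit depends entirely on how well the peaks are offset.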

(32)

Dealing with Peaks

Old approach:

Provision for peak workload. Low utilization at other times

HPC approach:

Build a machine for a certain max job size. Provide a job queue and “on-demand”, pre-emptible access at other times.

Cloud approach:

Charge different rates for use at different times, based on usage

E.g., Amazon Spot Instances

Typical server utilization: 10-15%
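Time-varying prices, as with Amazon Spot Instances, reward shifting deferrable work into cheap hours. A minimal sketch, assuming a known hourly price forecast (hypothetical numbers, in cents per instance-hour) and a job that must run as one contiguous block; it ignores the pre-emption risk that real spot capacity carries.

```python
from typing import Sequence, Tuple


def cheapest_window(hourly_price: Sequence[float], job_hours: int) -> Tuple[int, float]:
    """Return (start_hour, total_cost) of the cheapest contiguous run of job_hours."""
    if not 0 < job_hours <= len(hourly_price):
        raise ValueError("job must fit inside the price horizon")
    window = sum(hourly_price[:job_hours])
    best_start, best_cost = 0, window
    for start in range(1, len(hourly_price) - job_hours + 1):
        # Slide the window by one hour: add the new last hour, drop the old first hour.
        window += hourly_price[start + job_hours - 1] - hourly_price[start - 1]
        if window < best_cost:
            best_start, best_cost = start, window
    return best_start, best_cost


prices = [10, 9, 5, 3, 4, 8, 12, 15]  # hypothetical cents per instance-hour
print(cheapest_window(prices, 3))  # (2, 12)
```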

(33)

Some Application Case Studies from XLDB

XLDB11: 5th Extremely Large Databases Workshop, Oct 18-19, 2011, SLAC, Palo Alto, CA

“State of practice” workshop

Presentations on current system implementations and challenges, and on needs and requirements

E.g., presentations from: Facebook, LinkedIn, eBay, Google, Netflix, Novartis, Quora, Metamarkets, Microsoft, …

http://www-conf.slac.stanford.edu/xldb2011/Program.asp

(34)

Quora.com

“Scaling Up Quickly in the Cloud,” Edmond Lau, Quora, XLDB Workshop

(35)

Quora.com

(36)
(37)

Quora

(38)

Quora

(39)
(40)

Metamarkets, Michael Driscoll, co-founder, CTO

(41)
(42)
(43)

Metamarkets: Performance at scale

Evaluating the online ad market

Billions of microtransactions per day

Requires billion-rows/second performance

Fast analytics over hundreds of terabytes

Metamarkets’ Druid system

Partial aggregates + In-memory data + Indexes

Distributed data + Parallelizable queries = Horizontal scalability

Real-time analytics

Implemented in the cloud (AWS)
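One ingredient named above, partial aggregates, can be illustrated with a roll-up at ingestion: raw events are collapsed into one row per time bucket and dimension combination, and queries scan the much smaller rolled-up data. A minimal sketch with a hypothetical ad-event schema; it shows the idea only, not Druid's storage format or query API.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical raw ad event: (unix_seconds, publisher, country, revenue).
Event = Tuple[int, str, str, float]
Key = Tuple[int, str, str]  # (time bucket start, publisher, country)


def rollup(events: Iterable[Event], bucket_seconds: int = 3600) -> Dict[Key, Tuple[int, float]]:
    """Collapse raw events into one row per (time bucket, dimensions): (count, revenue)."""
    acc: Dict[Key, List[float]] = defaultdict(lambda: [0, 0.0])
    for ts, publisher, country, revenue in events:
        key = (ts - ts % bucket_seconds, publisher, country)
        acc[key][0] += 1
        acc[key][1] += revenue
    return {key: (int(count), total) for key, (count, total) in acc.items()}


def revenue_for(rows: Dict[Key, Tuple[int, float]], publisher: str) -> float:
    """Answer a query from the rolled-up rows instead of the raw events."""
    return sum(total for (bucket, pub, country), (count, total) in rows.items()
               if pub == publisher)


events = [(3600, "pubA", "US", 0.5), (3700, "pubA", "US", 0.25),
          (3800, "pubB", "DE", 1.0), (7300, "pubA", "US", 0.25)]
rows = rollup(events)
print(len(events), "raw events ->", len(rows), "rolled-up rows")  # 4 raw events -> 3 rolled-up rows
print(revenue_for(rows, "pubA"))  # 1.0
```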

(44)

Cloud Data Analytics, Roger Barga, Microsoft Azure

(45)
(46)
(47)
(48)

Netflix

Presentation by Eric Colson, VP, Netflix

All Netflix processing (DVD rentals and streaming video) is in the cloud (AWS)

The total cost of the Netflix implementation may be higher than an in-house solution, but

Netflix made a business strategy decision: they are not in the business of running IT infrastructure

Cloud computing required them to build a distributed IT team, which did not match their culture of building close-knit teams.

(49)

Application Examples: Bioinformatics

Crossbow

Genotyping from short reads using cloud computing

http://bowtie-bio.sourceforge.net/crossbow/index.shtml

SDSC project to implement Hadoop-based processing for next-generation sequencing on SDSC’s HPC systems as well as clouds (AWS)

(50)

Role of Big Data

What is the connection between cloud and big data?

Cloud: scaling, lots of data

Big data: lots of data

Hadoop

A software (eco)system for efficient processing of very large data sets

Uses MapReduce, which has become a convenient, low-barrier-to-entry programming model for very large-scale data processing

Could use cloud computing resources to implement Hadoop

$1M Question: data movement and data locality
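MapReduce splits a job into a map step that emits key/value pairs, a shuffle that groups the pairs by key, and a reduce step per key. A single-process sketch of the classic word-count example follows; in Hadoop the same phases run in parallel across a cluster, and the shuffle is where the data-movement and locality costs show up.

```python
from collections import defaultdict
from itertools import chain
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(doc: str) -> Iterator[Tuple[str, int]]:
    """Emit (word, 1) for every word in one input record."""
    for word in doc.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group emitted values by key (done over the network in a real cluster)."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Combine all values for one key."""
    return key, sum(values)


def word_count(docs: Iterable[str]) -> Dict[str, int]:
    mapped = chain.from_iterable(map_phase(d) for d in docs)
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())


print(word_count(["the cloud", "The data and the cloud"]))
# {'the': 3, 'cloud': 2, 'data': 1, 'and': 1}
```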

(51)

Discussion of Survey

Results from “Cloud Storage: Adoption, Practice and Deployment”, a survey conducted for the Storage Networking Industry Association (SNIA)

(52)

Research and Markets Survey

Cloud Computing in HPC: Rationale for Adoption

Top reason: access to extra resources to meet peak system load requirements

Cost Avoidance

Continued demand for HPC compute cycles. Cloud computing could deliver low-cost computing cycles.

Capacity Management

Deal with periodic demand peaks and better manage data center growth, power, and cooling issues.

Collaboration

Integration of internet-based applications and communications may allow HPC users to work better with those both inside and outside their organizations.

Evaluation of Cloud Systems

Evaluating cloud systems as an alternative, to determine whether and how they can make use of the technology and concepts.

Organizational Requirement

Making sure that the competition does not “steal a march” with a new technology.

(53)

Monitoring and Benchmarking

Monitoring of resource usage is essential for cloud environments

What about SLAs and QoS?

“Cloud computing means get your legal teams lined up”!

Resource monitoring is a well-recognized need

But we need to ensure the right level of monitoring and reporting is available

Benchmarking is a new frontier
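Monitoring data is what lets a customer check an SLA independently. A minimal sketch, assuming availability is measured as the fraction of successful one-minute probes over a 30-day month and a hypothetical 99.95% target; real SLAs define availability, exclusions and measurement windows contractually.

```python
from typing import Sequence


def availability(probe_ok: Sequence[bool]) -> float:
    """Fraction of monitoring probes that succeeded."""
    if not probe_ok:
        raise ValueError("no monitoring samples")
    return sum(probe_ok) / len(probe_ok)


def meets_sla(probe_ok: Sequence[bool], target: float) -> bool:
    return availability(probe_ok) >= target


# Hypothetical month: 30 days of one-minute probes, 30 of which failed.
samples = [True] * (30 * 24 * 60 - 30) + [False] * 30
print(round(availability(samples), 5), meets_sla(samples, 0.9995))  # 0.99931 False
```

A 99.95% target allows only about 21.6 minutes of downtime in a 30-day month, so 30 failed minutes breaks it.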

(54)

Amazon CloudWatch

http://aws.amazon.com/cloudwatch/

(55)
(56)

Azurescope


(57)

Azurescope application “probe”

(58)

Azurescope write “probe”

(59)

Future Directions: Benchmarking

The need for Big Data and Cloud benchmarks

The changing software landscape

From RDBMS and Data Warehousing to NoSQL, Hadoop, Unstructured data, Stream Processing, Graph Processing, …

The changing hardware landscape

Multi-core, SSD, new types of memory, large memory, different networking options, commodity vs high-end, …

Multiple platform choices

Dedicated data platforms

(60)

Benchmarking Issues

“Reference benchmarks” for big data (TPC style)

Define modalities of big data

Define end-to-end flows of big data

Identify key real-world characteristics, e.g., multi-rack, heterogeneous hardware

Identify which existing benchmarks can be reused

“Probe” benchmarks for clouds

Cloud performance can be variable

“Application-level” performance probes

The Cloud Weather Service™
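Since cloud performance can vary from run to run, an application-level probe repeats a representative operation and reports the spread as well as the mean. A minimal sketch; the 1 MB temporary-file write stands in for a real application operation and is an assumption, and the coefficient of variation (stdev / mean) is one simple variability measure among many.

```python
import statistics
import tempfile
import time
from typing import Callable, Dict


def probe(op: Callable[[], None], runs: int = 20) -> Dict[str, float]:
    """Time op repeatedly and summarize the distribution of run times."""
    if runs < 2:
        raise ValueError("need at least two runs to measure variability")
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        op()
        samples.append(time.perf_counter() - start)
    mean = statistics.mean(samples)
    stdev = statistics.stdev(samples)
    return {"mean_s": mean, "stdev_s": stdev, "cv": stdev / mean if mean else 0.0,
            "min_s": min(samples), "max_s": max(samples)}


def write_1mb() -> None:
    # Hypothetical "write probe": write and flush 1 MB to a temporary file.
    with tempfile.TemporaryFile() as f:
        f.write(b"\0" * 1_000_000)
        f.flush()


print(probe(write_1mb, runs=10))
```

Run periodically from several locations, results like these are the raw material for the kind of “weather report” on cloud performance suggested above.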

(61)

Future Directions: “Vertical” Clouds

Clouds that are aimed at major markets

Should have something unique about them, and a sizable market

E.g.

Collocated clouds for performance

e.g., Wall Street trading systems, online advertising systems, etc.

Collocated clouds for security / privacy

(62)

References
