INFO5011. Cloud Computing Semester 2, 2011 Lecture 1, Cloud Computing Introduction

(1)


Some slides were developed using the original Berkeley RAD lab Above the Clouds Presentation

The original PPT slides and the Berkeley paper can be found from:

(2)

Outline

› Cloud computing – a broad definition and views

› The road to cloud computing

› Cloud killer apps

› From users’ perspective

› From providers’ perspective

› Challenges and opportunities

(3)

Cloud computing: a broad definition

› A definition by the US government's National Institute of Standards and Technology (NIST):

- Cloud computing is a model for enabling ubiquitous, convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, and services)

(4)

Cloud computing delivery models

› In this definition, cloud computing has three delivery models:

- Software as a Service (SaaS): The consumer uses an application, but does not control the operating system, hardware or network infrastructure on which it's running.

- Applications are restricted to business applications or applications that would normally be installed in a business network or on a personal computer

- Examples

- Business applications: CRM solutions from salesforce.com

(5)

Cloud computing delivery models (II)

- Platform as a Service (PaaS): The consumer uses a hosting environment for their applications. The consumer controls the applications that run in the environment (and possibly has some control over the hosting environment), but does not control the operating system, hardware or network infrastructure on which they are running. The platform is typically an application framework.

(6)

Cloud computing delivery models (III)

› Infrastructure as a Service (IaaS): The consumer uses "fundamental computing resources" such as processing power, storage, networking components or middleware. The consumer can control the operating system, storage, deployed applications and possibly networking components such as firewalls and load balancers, but not the cloud infrastructure beneath them.

Cloud Server and Data Center Map:

(7)

(8)

All cloud services exist in a spectrum

[Figure: services placed on a spectrum from "Lower-level, less management" to "Higher-level, more management": EC2, Azure, AppEngine, Force.com, GoogleApps; label: Utility computing]

(9)

Stack of Services: Berkeley's view

› Example

- Netflix: world's leading Internet subscription service for movies and TV shows

- Netflix migrated from its own data centers to AWS in 2010

- Capacity growth rate is accelerating and unpredictable

- Year-on-year customer growth is 52%; year-on-year growth in customers using streaming is 145% (from ~4M to ~11M).

- Product launch spikes: iPhone, Wii, PS3, Xbox

- A datacenter is a large, inflexible capital commitment

[Figure: stack of roles: SaaS User, SaaS Provider / Cloud User, Cloud Provider; layer labels: Web Application, Utility Computing]

(10)

Netflix example: reasons for moving to cloud

› We needed to re-architect, which allowed us to question everything, including whether to keep building out our own datacenter solution.

› Letting Amazon focus on datacenter infrastructure allows our engineers to focus on building and improving our business.

› We’re not very good at predicting customer growth or device engagement.

› We think cloud computing is the future.

http://techblog.netflix.com/2010/12/four-reasons-we-choose-amazons-cloud-as.html

Q&A with a Cloud Architect at Netflix.

“Many folks claim that, they can deliver a private cloud at a similar price point to AWS. I assume you ran the numbers yourself. In whatever detail you can share, what does the ROI look like for Netflix?”

› “Oracle on IBM is very expensive, so AWS looks cheap in comparison”

› “AWS costs are fully burdened, and we could not have hired enough SAs and DBAs to build out our own datacenter this fast.”

› “costs are elastic, you start paying for a resource just before it goes live, and if you stop using a resource you stop paying for it”

(11)

Netflix example: moving to cloud is not a simple platform migration

› Public cloud is less reliable than private datacenter

- May require migrating state in volatile memory between instances

› Co-tenancy is hard

- Multi-tenancy is an important feature of cloud platform

- Co-tenancy can introduce variance in throughput at any level of the stack.

› The best way to avoid failure is to fail constantly

› Learn with real scale, not toy models

http://techblog.netflix.com/2010/12/5-lessons-weve-learned-using-aws.html

A downside of current cloud services is the SLA, along with the effort of monitoring it, measuring against it, and claiming credit back.

(12)

The road to cloud computing

› Public Cloud Uptake Will Be Driven by Informal Buyers

› Enterprise IT Buyers Will Stay Focused on Virtualization Over Cloud

› New Cloud Offerings Will Increase a Typical Enterprise's Service Usage

› “Data from the 2010 Forrester survey shows that 80 percent of enterprise decision makers surveyed said that consolidating IT infrastructure through server virtualization is a high priority. In contrast, 29 percent of respondents said that building an internal private cloud operated by IT (not a service provider) is a high priority at their organization and 28 percent said that using a [public] cloud service provider for storage or server consolidation was a high priority.”

Shane O’Neill, Cloud Computing in 2011: 3 Trends Changing Business Adoption, PCWorld, Feb 2011

(13)

Server Sprawl

› Server sprawl (a large number of underutilized servers) has become a major problem in many IT departments

› Main causes:

- Requirement by vendors to run their applications in isolation

- Operating system heterogeneity

- A mail server may require Windows Server; a database may be best run on Linux or Solaris.

- Mergers and acquisitions and other integration projects may end up with a large collection of servers, each dedicated to a single task

› “81 percent of CIOs were using virtualization technologies to drive consolidation, according to a recent survey by CIO” (2008 survey)

(14)

Increasing utilization is hard!

[Figure: average CPU utilization of 5000+ servers at Google during a six-month period]

Beyond Server Consolidation. Werner Vogels. ACM Queue, Jan/Feb 2008.

(15)

Virtualization-based server consolidation

› Benefits of virtualization:

- It breaks the 1:1 relationship between applications and the operating system and between the operating system and the hardware

- It creates N:1 relationships so that we can run multiple isolated applications on a single shared resource

- It also enables 1:N relationships where applications can span multiple physical resources more easily by providing elasticity in their resource usage.

› Challenges in server consolidation

- How to accurately characterize an application’s resource requirements

- How to optimally distribute the virtual machines hosting the applications over the physical resources
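
In its simplest form, the second challenge above is a bin-packing problem. The following is a minimal sketch, assuming each VM is described by a single CPU demand (in percent of one host) and all hosts are identical. The first-fit-decreasing heuristic is a common approximation chosen here for illustration, not a method from the lecture, and the VM names and numbers are hypothetical.

```python
def place_vms(vm_demands, host_capacity):
    """Assign VMs to hosts with the first-fit-decreasing heuristic.

    vm_demands: dict mapping a (hypothetical) VM name to its CPU demand,
                in percent of one host's CPU.
    host_capacity: CPU capacity of every host (assumed identical).
    Returns a list of hosts, each a list of VM names.
    """
    hosts = []  # VM names placed on each host
    free = []   # remaining capacity of each host
    # Consider the largest VMs first; they are the hardest to fit.
    for name, demand in sorted(vm_demands.items(), key=lambda kv: -kv[1]):
        if demand > host_capacity:
            raise ValueError(f"{name} does not fit on any host")
        for i, remaining in enumerate(free):
            if demand <= remaining:
                hosts[i].append(name)
                free[i] -= demand
                break
        else:
            # No existing host has room, so open a new one.
            hosts.append([name])
            free.append(host_capacity - demand)
    return hosts


# Hypothetical workloads, not measurements from the lecture.
demands = {"mail": 50, "db": 70, "web": 20, "crm": 30, "build": 10}
placement = place_vms(demands, host_capacity=100)
print(placement)
```

A real consolidation tool would also have to handle memory, I/O, and demand spikes, which is why the first challenge (characterizing resource requirements) comes before placement.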

(16)

100% utilization is not the goal

› Workloads in the enterprise are heterogeneous

› Demand is uncertain and often occurs in spikes

› The operating system starts to behave unpredictably under high CPU and I/O load

› For pure CPU-bound environments, 70 percent seems to be achievable for highly tuned applications; for environments with mixed workloads, 40 percent is a major success, and 50 percent has become the Holy Grail.

› Real-world estimates of server utilization in datacenters range from 5% to 20%!

(17)

Private Cloud and Hosted private cloud

› “Anecdotally and from surveys, it's becoming clear that most enterprises are first looking to the private cloud as a way to play with cloud tools and concepts in the safety of their own secure sandbox.”

[http://www.pcworld.com/businesscenter/article/224228/public_cloud_vs_private_cloud_why_not_both.html]

› It uses similar technologies as those in the public cloud, but is behind a firewall and is only open to departments and people within the organization

› May be useful for large enterprises, but the upfront cost would be too much for small organizations. “The level of expertise we would have needed in-house to make this happen doesn't make sense for a company of our size, and it doesn't even make sense for our road map for the next three to five years.”

› Amazon offers a hosted private cloud, Virtual Private Cloud

(18)

Public Cloud computing

› The obvious benefits:

- Illusion of infinite resources

- No up-front cost

- Fine-grained billing (e.g. hourly)

› The driving technologies

- Experience with very large datacenters (Warehouse Scale Computer)

- Unprecedented economies of scale

- Pervasive broadband Internet

- Fast x86 virtualization

- Pay-as-you-go billing model

(19)

(20)

Cloud Killer Apps

› Mobile and web applications

› Extensions of desktop software

- Matlab, Mathematica

› One-off batch processing / MapReduce

- A Washington Post engineer used 200 EC2 instances (1,407 server hours) to convert 17,481 pages of Hillary Clinton’s travel documents into a form more friendly to use in WWW presentation

- NY Times used 100 instances of Amazon EC2 to convert 11 million historical articles from TIFF to PDF within 24 hours; the articles, 4 TB of data, were converted into 1.5 TB of PDF.

- NY Times builds its own Hadoop Toolkit to enable easy writing of MapReduce jobs

- Motivated by huge volume of data log and the difficulties of running it
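
The arithmetic behind a one-off batch job like the Washington Post example can be sketched as follows. The instance count and server hours come from the slide; the hourly rate is a hypothetical figure used only for illustration and is not stated in the lecture.

```python
def batch_job_summary(instances, server_hours, hourly_rate):
    """Return (average hours per instance, total cost) for a batch job.

    hourly_rate is an assumed price per instance-hour, in dollars.
    """
    avg_hours = server_hours / instances
    cost = server_hours * hourly_rate
    return avg_hours, cost


# 200 instances and 1,407 server hours are from the slide;
# the $0.10 per instance-hour rate is an assumption.
avg_hours, cost = batch_job_summary(200, 1407, hourly_rate=0.10)
print(f"{avg_hours:.1f} hours per instance, ${cost:.2f} in total")
```

Under these assumptions each instance runs for only a few hours, which is the pattern that hourly billing with no up-front cost makes practical.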

(21)

Cloud users’ incentive

› There is significant overhead in acquiring IT resources

› Server acquisition times often run into several months

› In large organizations, once a resource has been allocated to a project, teams are unwilling to release it given the long lead times in reacquiring the resource when needed again.

› This in turn increases wasted server time.

(22)

The famous Animoto example

“They had 25,000 members on Monday, 50,000 on Tuesday, and 250,000 on Thursday. Their EC2 usage grew as well.

For the last month or so they had been using between 50 and 100 instances. On Tuesday their usage peaked at around 400, Wednesday it was 900, and then 3400 instances as of Friday morning.”

(23)

Economics of Cloud Users

• Pay by use instead of provisioning for peak

[Figure: two panels, "Static data center" and "Data center in the cloud", each plotting demand and capacity (resources) against time; the gap between capacity and demand is labelled "Unused resources"]

(24)

Economics of Cloud Users

• Risk of over-provisioning: underutilization

[Figure: demand and capacity (resources) against time; the region labelled "Unused resources" marks capacity above demand]

Data center/cloud server distribution: http://www.datacentermap.com/

(25)

Economics of Cloud Users

• Heavy penalty for under-provisioning

[Figure: three panels plotting demand and capacity (resources) against time (days 1 to 3), labelled "Lost revenue" and "Lost users"]
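
The three cases on these slides (provisioning for peak, under-provisioning, and paying by use) can be compared with a small calculation. This is a sketch under stated assumptions: the demand series and capacities are hypothetical, and resources are counted in abstract unit-steps rather than dollars.

```python
def provisioning_outcome(demand, capacity):
    """Compare a fixed capacity against a demand curve.

    demand: resource demand per time step (hypothetical units, e.g. servers).
    capacity: fixed number of units provisioned in every time step.
    Returns (used, unused, unserved) totals in unit-steps.
    """
    used = sum(min(d, capacity) for d in demand)
    unused = sum(max(capacity - d, 0) for d in demand)    # over-provisioning waste
    unserved = sum(max(d - capacity, 0) for d in demand)  # demand turned away
    return used, unused, unserved


# Hypothetical demand with a spike in the middle of the period.
demand = [20, 30, 100, 40, 10]

peak = provisioning_outcome(demand, capacity=max(demand))
under = provisioning_outcome(demand, capacity=40)
pay_per_use = sum(demand)  # elastic capacity that follows demand exactly

print("provision for peak:", peak)
print("under-provision:   ", under)
print("pay by use:        ", pay_per_use)
```

With this demand curve, provisioning for the peak leaves more capacity unused than used, while capping capacity at 40 turns away part of the spike, which corresponds to the lost revenue and lost users on the slide.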

(26)

Economics of Cloud Providers

› 5-7x economies of scale [Hamilton 2008]

› Extra benefits

- Amazon: utilize off-peak capacity

- Microsoft: sell .NET tools

- Google: reuse existing infrastructure

Resource        | Cost in Medium DC   | Cost in Very Large DC | Ratio
Network         | $95 / Mbps / month  | $13 / Mbps / month    | 7.1x
Storage         | $2.20 / GB / month  | $0.40 / GB / month    | 5.7x
Administration  | ≈140 servers/admin  | >1000 servers/admin   | 7.1x

(27)

Adoption Challenges

Challenge                              | Opportunity
Availability                           | Multiple providers & DCs
Data lock-in                           | Standardization
Data Confidentiality and Auditability  | Encryption, VLANs, Firewalls; Geographical Data Storage

(28)

Datacenter blackout is “common”

Amazon data center lists: In US:

* Ashburn, Virginia
* Dallas/Fort Worth
* Los Angeles
* Miami
* Newark, New Jersey
* Palo Alto, California
* Seattle
* St. Louis

Amazon data center lists: In other countries:

* Amsterdam
* Dublin
* Frankfurt
* London
* Hong Kong
* Singapore
* Tokyo

http://www.datacenterknowledge.com/

“Because our costs vary by location, pricing for data served from edge locations outside of the US varies, and is currently slightly higher,” – Jeff Barr in AWS blog

“Last week’s lengthy outage for Amazon Web Services cloud computing platform was caused by a network configuration error as Amazon was attempting to upgrade capacity on its network” [April 21st, 2011]

“More than 18 million blogs [on] WordPress.com were down for several hours Tuesday [March 22nd, 2011] night. “A fix for a server issue caused a series of failures,” Automattic reported on its Twitter feed.”

“The social news site Reddit is revising how it uses Amazon’s cloud computing service following performance problems that contributed to six hours of downtime for the Reddit site this week [March 18th, 2011].” “Several hours after the latencies were reported as fixed, AWS reported that connectivity problems related to a “misbehaving network device.””

“Hundreds of thousands of UK customers of Vodafone lost service this morning [February 28th, 2011] after switch

(29)

Growth Challenges

Challenge                          | Opportunity
Data transfer bottlenecks          | FedEx-ing disks, Data Backup/Archival
Performance unpredictability       | Improved VM support, flash memory, scheduling VMs
Scalable storage                   | Invent scalable store
Bugs in large distributed systems  | Invent Debugger that relies on Distributed VMs
Scaling quickly                    | Invent Auto-Scaler that relies

(30)

Policy and Business Challenges

Challenge                | Opportunity
Reputation Fate Sharing  | Offer reputation-guarding services like those for email
Software Licensing       | Pay-for-use licenses; Bulk

(31)

Cloud Services SLA

› Most cloud providers' SLAs do not contain fine-grained Service Level Objectives (SLOs)

› It is the customer's responsibility to notice problems and to collect evidence

(32)

Cloud Services SLA comparison

Services compared: Windows Azure Compute, Amazon EC2, Rackspace Cloud Servers

Uptime/availability guarantee

- Windows Azure Compute: 99.95% (overall), 99.9% (role instance)
- Amazon EC2: 99.95%
- Rackspace Cloud Servers: 100%

Notification onus

- Windows Azure Compute: customer
- Amazon EC2: customer
- Rackspace Cloud Servers: customer

Time window

- Windows Azure Compute: notify incidents within 5 days, submit claim before next billing month
- Amazon EC2: 30 days after incident
- Rackspace Cloud Servers: 30 days after incident

Credit back

- Windows Azure Compute: below 99.95% (99.9%): 10% credit; below 99%: 25% credit
- Amazon EC2: 10% of bill per eligible credit period
- Rackspace Cloud Servers: 5% of the fees for each 30 minutes of network or data center downtime, up to 100% of the fees; 5% of the fees for each additional hour of downtime past time-to-resolve, up to 100% of the fees
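
To see how the credit tiers above translate into a claim, the following sketch computes a monthly uptime percentage from total downtime and looks up the credit. It is an illustration under stated assumptions: a 30-day month, downtime measured as a single total in minutes, and the Windows Azure compute tiers from the comparison (10% below 99.95%, 25% below 99%). Each provider's SLA defines downtime in its own way, so this is not any provider's actual formula.

```python
def monthly_uptime_percent(downtime_minutes, days_in_month=30):
    """Uptime as a percentage of all minutes in the month."""
    total_minutes = days_in_month * 24 * 60
    return 100.0 * (total_minutes - downtime_minutes) / total_minutes


def service_credit_percent(uptime_percent, tiers):
    """Return the credit for the lowest threshold the uptime falls below.

    tiers: list of (uptime_threshold_percent, credit_percent) pairs.
    """
    credit = 0
    # Walk thresholds from highest to lowest so a lower tier overrides a higher one.
    for threshold, tier_credit in sorted(tiers, reverse=True):
        if uptime_percent < threshold:
            credit = tier_credit
    return credit


# Tiers taken from the Windows Azure compute column of the comparison.
AZURE_COMPUTE_TIERS = [(99.95, 10), (99.0, 25)]

for minutes in (10, 360, 480):  # hypothetical downtime totals
    uptime = monthly_uptime_percent(minutes)
    credit = service_credit_percent(uptime, AZURE_COMPUTE_TIERS)
    print(f"{minutes} min down -> {uptime:.3f}% uptime -> {credit}% credit")
```

Note that the customer still has to detect the downtime, collect evidence, and file the claim within the time window; the calculation only shows what could be claimed.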

(33)

Cloud Services SLA comparison (II)

Services compared: Windows Azure Storage, Amazon S3, Rackspace Cloud Files

Uptime/availability guarantee

- Windows Azure Storage: 99.9%. “Error rate” is defined as the number of failed transactions divided by the total number of transactions. The definition of “failed transaction” considers maximum processing time, e.g. for transaction type “copy blob”, a processing time over 90 seconds is considered failed; for a normal query, a processing time over 10 seconds is considered failed.
- Amazon S3: 99.9%. “Error Rate” means: (i) the total number of internal server errors returned by Amazon S3 as error status “InternalError” or “ServiceUnavailable” divided by (ii) the total number of requests during that five minute period. We will calculate the Error Rate for each Amazon S3 account as a percentage for each five minute period in the monthly billing cycle.
- Rackspace Cloud Files: 99.9%. (i) The Rackspace Cloud network is down, or (ii) the Cloud Files service returns a server error response to a valid user request during two or more consecutive 90 second intervals, or (iii) the Content Delivery Network fails to deliver an average download time for a 1-byte reference document of 0.3 seconds or less, as measured by The Rackspace Cloud's third party measuring

Notification onus

- Windows Azure Storage: customer
- Amazon S3: customer
- Rackspace Cloud Files: customer

Time window

- Windows Azure Storage: notify incidents within 5 days, submit claim before next billing month
- Amazon S3: 10 business days after the current billing cycle
- Rackspace Cloud Files: 30 days after incident

Credit back

- Windows Azure Storage: below 99.9%: 10% credit; below 99%: 25% credit
- Amazon S3: 99% to 99.9%: 10%; less than 99%: 25%
- Rackspace Cloud Files: 99.89% to 99.5%: 10%; 99.49% to 99.0%: 25%; …; less than 96.5%: 100%
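
The per-period error rate described for Amazon S3 can be sketched as below. The error-rate formula follows the quoted definition (failed requests divided by total requests in each five-minute period). Turning those rates into a monthly uptime figure by subtracting their average from 100% is an assumption made for this illustration, as is treating a period with no requests as error-free. The request counts are hypothetical.

```python
def error_rate(failed_requests, total_requests):
    """Error rate of one five-minute period, as a percentage."""
    if total_requests == 0:
        return 0.0  # assumption: a period with no requests counts as error-free
    return 100.0 * failed_requests / total_requests


def monthly_uptime_from_error_rates(periods):
    """Estimate monthly uptime from per-period request counts.

    periods: list of (failed_requests, total_requests), one per
             five-minute period in the billing cycle.
    Assumes uptime = 100% minus the average per-period error rate.
    """
    rates = [error_rate(failed, total) for failed, total in periods]
    return 100.0 - sum(rates) / len(rates)


# Hypothetical sample: nine clean periods and one with 5% failed requests.
periods = [(0, 1000)] * 9 + [(50, 1000)]
uptime = monthly_uptime_from_error_rates(periods)
print(f"estimated uptime: {uptime:.2f}%")
```

Because the rate is averaged per period rather than per request, a short burst of errors during a quiet period weighs as much as one during a busy period, which is worth keeping in mind when collecting evidence for a claim.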

(34)

Monitoring Services

› Cloud providers have mechanisms for monitoring services

- E.g. Amazon CloudWatch

- A web service that provides monitoring for AWS cloud resources, starting with Amazon EC2. It provides customers with visibility into resource utilization, operational performance, and overall demand patterns, including metrics such as CPU utilization, disk reads and writes, and network traffic.

- Such tools are aimed more at helping customers decide how to use the services, e.g. how many instances to run and at what capacity
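
As an illustration of how monitoring data can feed a capacity decision, the sketch below estimates how many instances would be needed to bring average CPU utilization to a target level. It does not call the CloudWatch API; the CPU samples are hard-coded hypothetical values, and the sizing rule (total load divided by target utilization) is a simplifying assumption that treats load as evenly divisible across instances.

```python
import math


def instances_needed(current_instances, avg_cpu_percent, target_cpu_percent):
    """Estimate the instance count that brings average CPU to the target.

    Assumes total load is current_instances * avg_cpu_percent and that
    it spreads evenly across however many instances are running.
    """
    if not 0 < target_cpu_percent <= 100:
        raise ValueError("target_cpu_percent must be in (0, 100]")
    total_load = current_instances * avg_cpu_percent
    return max(1, math.ceil(total_load / target_cpu_percent))


# Hypothetical CPU utilization samples (percent), as a monitoring
# service might report them for a group of 10 instances.
cpu_samples = [70, 75, 71, 72]
avg_cpu = sum(cpu_samples) / len(cpu_samples)

scale_out = instances_needed(10, avg_cpu, target_cpu_percent=40)
scale_in = instances_needed(10, 15, target_cpu_percent=40)
print(scale_out, scale_in)
```

The 40 percent target is only an example value; choosing it is a policy decision that trades cost against headroom for demand spikes.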

(35)

Main Resources

› Arik Hesseldahl, Seven Questions for Adam Selipsky, VP at Amazon Web Services, All Things Digital, March 7, 2011 [accessible from: http://newenterprise.allthingsd.com/20110307/seven-questions-for-adam-selipsky-head-of-amazon-web-services/ ]

› Cloud Computing Use Case Discussion Group, Cloud Computing Use Case White Paper (version 4.0), July 2010 [accessible from: http://opencloudmanifesto.org/Cloud_Computing_Use_Cases_Whitepaper-4_0.pdf ]

› "Above the Clouds: A Berkeley View of Cloud Computing", 2009 [accessible from: http://berkeleyclouds.blogspot.com/ ]
