January 25, 2014
1
Our team: UndergradTim Hegeman, Stefan Hugtenburg, Jesse Donkevliet …
GradSiqi Shen, Guo Yong, Nezih Yigitbasi StaffHenk Sips, Dick Epema, Alexandru Iosup, Otto Visser CollaboratorsIon Stoica and the Mesos team (UC Berkeley), Thomas Fahringer, Radu Prodan, Vlad Nae (U. Innsbruck), Nicolae Tapus, Mihaela Balint, Vlad Posea, Alexandru Olteanu (UPB), Derrick Kondo (INRIA), Assaf Schuster, Mark Silberstein, Orna Ben-Yehuda (Technion), Kefeng Deng (NUDT, CN), ...
Introduction to
Cloud Computing
Alexandru Iosup
Parallel and Distributed Systems Group Delft University of Technology The Netherlands
SPEC RG Cloud Meeting
January 25, 2014
2
January 25, 2014
3
What is Cloud Computing?
3. A Useful IT Service
“Use only when you want! Pay only for what you use!”
January 25, 2014
4
Q:What do youuse?
Q:Why not thislevel? Q:Why not thislevel?
January 25, 2014
5
Agenda
1. What is Cloud Computing?
2. IaaS Clouds, the Core Idea
3. The IaaS Owner Perspective
4. The IaaS User Perspective
5. Reality Check
6. Conclusion
IaaS Cloud Computing
Joe Has an Idea ($$$)
(Source: A. Antoniou, MSc Defense, TU Delft, 2012. Original idea: A. Iosup, 2011.)
MusicWave
•
Big up-front commitment
•
Load variability: NOT supported
Solution #1
Buy or Rent
…
10%
(Source: A. Antoniou, MSc Defense, TU Delft, 2012. Original idea: A. Iosup, 2011.)
Solution #2
Deploy on IaaS Cloud
(Source: A. Antoniou, MSc Defense, TU Delft, 2012. Original idea: V. Nae, 2008.)
Q:So are we just shifting the problem to somebody else, that is, the IaaS cloud owner?
•
NO
big up-front commitment
•
Load variability: supported
Inside an IaaS Cloud Data Center
(Source: A. Antoniou, MSc Defense, TU Delft, 2012. Original idea: A. Iosup, 2011.)
Time and Cost Sharing Among Users
User C
User B MusicWave
(Source: A. Antoniou, MSc Defense, TU Delft, 2012.)
Main Characteristics of IaaS Clouds
1. On-Demand Pay-per-Use
2. Elasticity (cloud concept of Scalability)
3. Resource Pooling
4. Fully automated IT services
5. Quality of Service
January 25, 2014
January 25, 2014
13
Agenda
1. What is Cloud Computing?
2. IaaS Clouds, the Core Idea
3. The IaaS Owner Perspective:
How to Deploy a Cloud?
4. The IaaS User Perspective
5. Reality Check
6. Conclusion
IaaS Cloud Deployment Models
PrivateOn-premises
Public
Off-premises
Hybrid
(Source: A. Antoniou, MSc Defense, TU Delft, 2012. Original idea: Mell and Grance, NIST Spec.Pub. 800-145, Sep 2011.)
Resource Sharing Models
MusicWave January 25, 2014 15 MusicWave OtherApp
Space-Sharing
Time-Sharing
IaaS Clouds
MusicWave OtherAppQ:Which one is better?
Grids
Host OS Host OS OtherAppVirtualization
January 25, 2014 16 Virtualization Host OSMusicWave OtherApp OtherApp
Q:What to do now? Guest OS Virtual Resources VM Instance Applications Guest OS Virtual Resources VM Instance Applications
Q:What is the problem?
January 25, 2014
Virtualization and The Full IaaS Stack
17 Guest OS Virtual Resources VM Instance Applications Physical Infrastructure Virtual Infrastructure Manager
Virtual Machine Manager
Guest OS Virtual Resources
VM Instance Applications
Virtual Machine Manager
Guest OS Virtual Resources
VM Instance Applications
The Virtual Machine Lifecycle
January 25, 2014
18 (Source: A. Antoniou, MSc Defense, TU Delft, 2012.)
Use Case:
Amazon Elastic Compute Cloud (EC2)
•
Prominent IaaS provider
•
Datacenters all over the world
•
Many VM instance types
•
Per-hour
charging
January 25, 2014
19
Instance Capacity US$/hour
m1.small 0.10 m1.large 0.38 c1.xlarge 0.76 January 25, 2014 20
Agenda
1. What is Cloud Computing?
2. IaaS Clouds, the Core Idea
3. The IaaS Owner Perspective
4. The IaaS User Perspective:
How to Use Clouds? How to Choose Clouds?
5. Reality Check
6. Conclusion
Workload
January 25, 2014 21 MusicWave OtherApp Time MusicWave OtherApp OtherApp OtherApp Load = 4 RunTime= 6Use Case: Workloads of Zynga
(Massively Social Gaming)
January 25, 2014
22 Sources: CNN, Zynga.
Source: InsideSocialGames.com
“Zynga made more than $600M in 2010 from selling in-game virtual goods.” S. Greengard, CACM, Apr 2011 Selling in-game virtual
goods: “Zynga made est. $270M in 2009 from.”
http://techcrunch.com/2010/ 05/03/zynga-revenue/
Use Case: Workloads of Zynga
(Massively Social Gaming)
•
Load can grow very quickly
January 25, 2014 23 L o a dProvisioning and Allocation of Resources
January 25, 2014 24 L o a d Time
Provisioning
Allocation
Provisioning and Allocation of Resources
January 25, 2014 25 L o a d TimeProvisioning
Allocation
Q:What is the interplaybetween provisioning and allocation?
Provisioning and Allocation
Policies
January 25, 2014 26
Where?
When?
How many?
Time L o a dProvisioning
Allocation
From where?
Which type?
etc.
When?
etc.
Q:How manypolicies exist? Q: How to selecta policy?(Source: A. Antoniou, MSc Defense, TU Delft, 2012.)
Use Case:
Two Provisioning Policies, Compared
January 25, 2014
27
Startup
OnDemand
Villegas, Antoniou, Sadjadi, Iosup. An Analysis of Provisioning and Allocation Policies for Infrastructure-as-a-Service Clouds, (submitted). PDS Tech.Rep.2011-009
Use Case:
Two Provisioning Policies, Compared
Metrics for comparison
•
Job Slowdown (
JSD
): Ratio of actual runtime in the
cloud and the runtime in a dedicated non-virtualized
environment
•
Charged Cost (
C
c)
•
Utility (
U
)
January 25, 2014
28
Q:Charged cost vs Total RunTime?
Villegas, Antoniou, Sadjadi, Iosup. An Analysis of Provisioning and Allocation Policies for Infrastructure-as-a-Service Clouds, (submitted). PDS Tech.Rep.2011-009
Use Case:
Two Provisioning Policies, Compared
Workloads
January 25, 2014 29 0.0 0.5 1.0 1.5 2.0 0 20 40 60 CPU Load (%) Time (min) 0.0 0.5 1.0 1.5 2.0 0 20 40 60 Time (min) 0.0 0.5 1.0 1.5 2.0 0 20 40 60 Time (min)Uniform Increasing Bursty
Villegas, Antoniou, Sadjadi, Iosup. An Analysis of Provisioning and Allocation Policies for Infrastructure-as-a-Service Clouds, (submitted). PDS Tech.Rep.2011-009
System Hardware VIM Hypervisor Max VMs
DAS4/Delft 20 Dual quad-core 2.4 GHz 24 GB RAM 2x1 TB storage 64 FIU 7 Pentium 4 3.0 GHz 5 GB RAM 340 GB Storage 7
Amazon EC2 unkown/various - 20
Use Case:
Two Provisioning Policies, Compared
Environments
January 25, 2014
30
Villegas, Antoniou, Sadjadi, Iosup. An Analysis of Provisioning and Allocation Policies for Infrastructure-as-a-Service Clouds, (submitted). PDS Tech.Rep.2011-009
Use Case:
Many Provisioning Policies, Compared
Job Slowdown (JSD)
January 25, 2014 31 0 5 10 15 20 25 30 35 40 Job Slowdown DAS4 0 5 10 15 20 25 30 Job Slowdown FIU 0 5 10Uniform Increasing Bursty
Job Slowdown Workload EC2 Startup OnDemand ExecTime ExecAvg ExecKN QueueWait Q:Why is OnDemand worsethan Startup?
A:waiting for machines to boot
Use Case:
Many Provisioning Policies, Compared
Charged Cost (C
c)
January 25, 2014 32 0 50 100 150 200 250 300 Hours DAS4 0 5 10 15 20 25 30 Hours FIU 0 5 10 15 20 25 30 35 40 45Uniform Increasing Bursty
Hours Workload EC2 Startup OnDemand ExecTime ExecAvg ExecKN QueueWait Q:Why is OnDemand worsethan Startup?
A:VM thrashing
Q:Why no OnDemand on Amazon EC2?
Use Case:
Many Provisioning Policies, Compared
Utility (U
)
33 0 1 2 3 4 5 6 7 8 Utility DAS4 0 1 2 3 4 5 Utility FIU 0 1 2 3 4 5Uniform Increasing Bursty
Utility
Workload EC2
Startup
OnDemand ExecTimeExecAvg QueueWaitExecKN January 25, 2014
34
Agenda
1. What is Cloud Computing?
2. IaaS Clouds, the Core Idea
3. The IaaS Owner Perspective
4. The IaaS User Perspective
5. Reality Check:
Who Uses Public Commercial Clouds?
6. Conclusion
January 25, 2014
35 35
The Real IaaS Cloud
•
“The path to abundance”
•
On-demand capacity
•
Cheap for short-term tasks
•
Great for web apps (EIP, web
crawl, DB ops, I/O)
•
“The killer cyclone”
•
Not so great performance
for scientific applications
(compute- or data-intensive)
http://www.flickr.com/photos/dimitrisotiropoulos/4204766418/ Tropical Cyclone Nargis (NASA, ISSS, 04/29/08)
VS
January 25, 2014 January 25, 2014
36 (Source: http://www.cca08.org/files/slides/w_vogel.pdf)
Zynga zCloud: Hybrid Self-Hosted/EC2
•
After Zynga had large scale
•
More efficient self-hosted servers
•
Run at high utilization
•
Use EC2 for unexpected demand
January 25, 2014
37 (Sources: http://seekingalpha.com/article/609141-how-amazon-s-aws-can-attract-ugly-economics and http://www.undertheradarblog.com/blog/3-reasons-zynga-is-moving-to-a-private-cloud/)
Other Cloud Customers
•
218 virtual CPUs
•
9TB/2TB block/S3 storage
•
6.5TB/2TB I/O per month
January 25, 2014 38 (Source: http://markbuhagiar.com/technical/businessinthecloud/) January 25, 2014 39
Agenda
1. What is Cloud Computing?
2. IaaS Clouds, the Core Idea
3. The IaaS Owner Perspective
4. The IaaS User Perspective
5. Reality Check
6. Conclusion
January 25, 2014
40
Conclusion Take-Home Message
•
Cloud Computing = IaaS + PaaS + SaaS
•
Core idea = lease vs self-own
•
On-Demand, Pay-per-Use, Elastic, Pooled, Automated, QoS
•
The Owner Perspective
•
Time-Sharing
•
Virtualization
•
The User Perspective
•
Variable workloads
•
Provisioning and Allocation policies
•
Reality Check: 100s of users
http://www.flickr.com/photos/dimitrisotiropoulos/4204766418/
January 25, 2014
41
Thank you for your attention!
Questions? Suggestions? Observations?
Alexandru Iosup
A.Iosup@tudelft.nl
http://www.pds.ewi.tudelft.nl/~iosup/
(or google “iosup”)
Parallel and Distributed Systems Group
Delft University of Technology
-http://www.st.ewi.tudelft.nl/~iosup/research.html -http://www.st.ewi.tudelft.nl/~iosup/research_cloud.html -http://www.pds.ewi.tudelft.nl/More Info:
Do not hesitate to contact me…… And Now For Something Different
•
Ongoing projects in Cloud Computing at TUD
January 25, 2014
Massivizing Social Games: Distributed Computing
Challenges and High Quality Time – A. Iosup 43
Cloudification: PaaS for MSGs
(Platform Challenge) Build Social Gaming
platform that uses (mostly) cloud resources
•
Close to players
•
No upfront costs, no maintenance
•
Compute platforms: multi-cores, GPUs, clusters, all-in-one!
•
Performance guarantees
•
Hybrid deployment model
•
Code for various compute platforms—platform profiling
•
Load prediction miscalculation costs real money
•
What are the services?
•
Vendor lock-in?
•
My
data
Data, Data, Data
Understanding the Behavior of
Players in Online Social Games
January 25, 2014 44 460 73 66 596 871 198 1075 1004 32 127 51 240 (w/ E.Dias, A. Olteanu, F. Kuipers)
Iosup et al. An Analysis of Implicit Social Networks in Multiplayer Online Games.IEEE Internet Computing
45
• Using data centers for
dynamic
resource allocation
• Main advantages:
1. Significantly lower over-provisioning
2. Efficient coverage of the world is possible
Proposed hosting model: dynamic
Massive join
Massive join Massive leave
Nae, Iosup, Prodan. Dynamic Resource Provisioning in Massively Multiplayer Online Games. IEEE TPDS 2011
46
Resource Provisioning and Allocation
Static
vs.
Dynamic
Provisioning
250%
25%
[Source: Nae, Iosup, and Prodan, ACM SC 2008]
O v e r-p ro v is io n in g = W A S T E (% )
Nae, Iosup, Prodan. Dynamic Resource Provisioning in Massively Multiplayer Online Games. IEEE TPDS 2011
• Old scheduling aspects
• Workloads evolve over time and exhibit periods of distinct characteristics
• No one-size-fits-all policy: hundreds exist, each good for specific conditions
• Data centers increasingly popular (also not new)
• Constant deployment since mid-1990s
• Users moving their computation to IaaS-cloud data centers
• Consolidation efforts in mid- and large-scale companies
• New scheduling aspects
• New workloads
• New data center architectures
• New cost models
• Developing a scheduling policy is risky and ephemeral • Selecting a scheduling policy for your data center is difficult • Combining the strengths of multiple scheduling policies is …
Why Portfolio Scheduling?
What is Portfolio Scheduling?
In a Nutshell, for Data Centers
• Create a set of scheduling policies
• Resource provisioning and allocation policies, in this work
• Online selection of the active policy, at important moments
• Periodic selection, in this work
• Same principle for other changes: pricing model, system, …
Deng, Song, Ren, Iosup: Exploring portfolio scheduling for long-term execution of scientific workloads in IaaS clouds. SC 2013: 55
Performance Evaluation
1) Effect of Portfolio Scheduling (1)
• Portfolio scheduling is 8%, 11%, 45%, and 30%better than the best constituent policy
A portfolio scheduler can be better than
any of its constituent policies
Q:
What can prevent a portfolio scheduler from being
better than any of its constituent policies?
Deng, Song, Ren, Iosup: Exploring portfolio scheduling for long-term execution of scientific workloads in IaaS clouds. SC 2013: 55
Deng, Song, Ren, Iosup: Exploring portfolio scheduling for long-term execution of scientific workloads in IaaS clouds. SC 2013: 55
Performance Evaluation
1) Effect of Portfolio Scheduling (2)
• Job selection policies such as UNICEF and LXF that favor
short jobs have the best performance
(provisioning, job selection, VM selection) policy
Q:
How well do you think a single
would perform? Will it be dominant? (Rhetorical)
Massivizing Social Games: Distributed Computing
Challenges and High Quality Time – A. Iosup 51
Mobile Social Gaming and the SuperServer
(Platform Challenge) Support
Social Gaming on mobiles
•
Mobiles everywhere (2bn+ users)
•
Gaming industry for mobiles is new
Growing Market
•
SuperServer to generate content for
low-capability devices?
•
Battery for 3D/Networked games?
•
Where is my server? (Ad-hoc mobile
gaming networks?)
•
Security, cheat-prevention
Extending the Capabilities of Mobile
Devices through Cloud Offloading …
with Application to Online Social Games
Metrics:
• processing time
• packet size
• inter-arrival rate
Design & Implementation:
• based on OpenTTD
• repeatability
• offloading mechanisms
• instrumentation for metrics
53
Social Everything! So Analytics
•
Social Network=undirected graph, relationship=edge
•
Community=sub-graph, density of edges between its nodes higher
than density of edges outside sub-graph
(Analytics Challenge)
Improve gaming experience
•
Ranking / Rating
•
Matchmaking / Recommendations
•
Play Style/Tutoring
Self-Organizing Gaming Communities
•
Player Behavior
Asterix B-tree
How to Analyze? Ecosystems of
Big-Data Programming Models
Dremel Service Tree
SQL JAQL Hive Pig
MapReduce Model Algebrix
PACT MPI/ Erlang L F S
Nephele Hadoop/ Dryad Hyracks
YARN Haloop DryadLINQ Scope Pregel HDFS AQL CosmosFS Azure Engine Tera Data Engine
Adapted from: Dagstuhl Seminar on Information Management in the Cloud,
http://www.dagstuhl.de/program/calendar/partlist/?semnr=11321&SUOG Azure Data Store Tera Data Store Storage Engine Execution Engine Voldemort High-Level Language Programming Model GFS BigQuery Flume Flume Engine S3 Dataflow Giraph Sawzall Meteor
January 25, 2014
55
Benchmarking suite
Platforms and Process
•
Platforms
•
Process
•
Evaluate baseline (out of the box) and tuned performance
•
Evaluate performance on fixed-size system
•
Future: evaluate performance on elastic-size system
•
Evaluate scalability
YARN
Giraph
Guo, Biczak, Varbanescu, Iosup, Martella, Willke. How Well do Graph-Processing Platforms Perform? An Empirical Performance Evaluation and Analysis
January 25, 2014
56
BFS: results for all platforms, all data sets
•
No platform can runs fastest of every graph
•
Not all platforms can process all graphs
•
Hadoop is the worst performer
Guo, Biczak, Varbanescu, Iosup, Martella, Willke. How Well do Graph-Processing Platforms Perform? An Empirical Performance Evaluation and Analysis
January 25, 2014
57
Giraph: results for
all algorithms, all data sets
•
Storing the whole graph in memory helps Giraph perform well
•
Giraph may crash when
graphs
or messages
become larger
Guo, Biczak, Varbanescu, Iosup, Martella, Willke. How Well do Graph-Processing Platforms Perform? An Empirical Performance Evaluation and Analysis
January 25, 2014
58
Conclusion and ongoing work
•
Performance is f(Data set, Algorithm, Platform, Deployment)
•
Cannot tell yet which of (Data set, Algorithm, Platform) the
most important (also depends on Platform)
•
Platforms have their own drawbacks
•
Some platforms can scale up reasonably with cluster size
(horizontally) or number of cores (vertically)
•
Ongoing work
•
Benchmarking
suite
•
Build a performance boundary model
•
Explore performance variability
Guo, Biczak, Varbanescu, Iosup, Martella, Willke. How Well do Graph-Processing Platforms Perform? An Empirical Performance Evaluation and Analysis.IPDPS
http://bit.ly/10hYdIU
DotA communities
•
Players are loosely organised in communities
•Operate game servers•Maintain lists of tournaments and results
•Publish statistics and rankings on websites
•Dota-League: players join a queue and matchmaking forms teams •DotAlicious: players can choose which match/team to join R. van de Bovenkamp, S. Shen, A. Iosup, F. A. Kuipers:
Understanding and recommending play relationships in online social gaming. COMSNETS 2013: 1-10
Relationships in the gaming graph
•
Players who regularly play together in DotAlicious do
so in more diverse combinations than in Dota-League
•
Contrary to Dota-League, DotAlicious players tend to
play on the same side: playing together intensifies the
social bond
•
Winning together increases friendship relationships,
while loosing together weakens friendship relationships
•
Small clusters of friends with very strong social ties
exist
R. van de Bovenkamp, S. Shen, A. Iosup, F. A. Kuipers: Understanding and recommending play relationships in online social gaming. COMSNETS 2013: 1-10
Matchmaking application
•
Replay match list, but also consider clusters in gaming graph
•
Scoring methodology:
•
Points per cluster: Number of players in the match that
are part of the same cluster
•
Excluding largest cluster of the network and clusters of size 1
Player Cluster a 1 b 2 c 1 d 3 e 4 Player Cluster f 2 g 5 h 3 i 6 j 3 Team 1 Team 2 Cluster Points 1 2 2 2 3 3 4 0 5 0Total of 7 pointsfor this match R. van de Bovenkamp, S. Shen, A. Iosup, F. A. Kuipers:
Understanding and recommending play relationships in online social gaming. COMSNETS 2013: 1-10
Results matchmaking
Can already improve original matchmaking algorithm
for all gaming graphs!
Dota-League DotAlicious
R. van de Bovenkamp, S. Shen, A. Iosup, F. A. Kuipers: Understanding and recommending play relationships in online social gaming. COMSNETS 2013: 1-10