IaaS-Clouds in the MaDgIK Sky

(1)

IaaS-Clouds in the MaDgIK Sky



Konstantinos Tsakalozos



PhD candidate

(2)

Research Topics

1. Nefeli: Hint-based deployment of virtual infrastructures

2. How profit maximization drives resource allocation in highly scalable infrastructures

3. MigrateFS: towards a true shared-nothing cloud

4. Tackling cloud heterogeneity

(3)

Nefeli, VM placement



The Idea behind Nefeli:



The consumer/user of a virtual infrastructure is aware of the operation and data flows among its VMs. Can we harvest this information to tackle performance bottlenecks?



BUT: The physical cloud infrastructure must

(4)

Interfacing with Nefeli



The consumer/user expresses a set of

constraints/hints describing an ideal

deployment



Nefeli takes these user constraints/wishes into consideration when VMs are mapped to physical machines (PMs)



Consider VMs holding Database replicas. They

have to be deployed on different PMs.



Consider VMs producing excessive network traffic.

(5)

Constraints



User constraints



VMs to be co-deployed, spread across physical machines (PMs), or favored over others; data gravity



Administrative constraints



Offload a PM, Power save



Solver: Simulated annealing
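A minimal sketch of how a simulated annealing solver could search for a placement that honors such hints. This is an illustration under stated assumptions, not Nefeli's actual solver: the names `placement_cost` and `anneal`, the "at most `capacity` VMs per PM" rule, the cost function (a count of violated hints) and the geometric cooling schedule are all hypothetical choices made for the example.

```python
import math
import random


def placement_cost(placement, spread_groups, colocate_groups):
    """Count violated hints (hypothetical cost function): lower is better."""
    cost = 0
    for group in spread_groups:      # VMs that should sit on different PMs
        pms = [placement[vm] for vm in group]
        cost += len(pms) - len(set(pms))
    for group in colocate_groups:    # VMs that should share one PM
        cost += len({placement[vm] for vm in group}) - 1
    return cost


def anneal(vms, pms, capacity, spread_groups, colocate_groups,
           steps=5000, t_start=2.0, t_end=0.01, seed=0):
    """Search for a low-cost VM-to-PM mapping.

    Assumes len(vms) <= capacity * len(pms), so that the round-robin
    starting placement already respects the per-PM capacity.
    """
    rng = random.Random(seed)
    placement = {vm: pms[i % len(pms)] for i, vm in enumerate(vms)}
    load = {pm: 0 for pm in pms}
    for pm in placement.values():
        load[pm] += 1
    cost = placement_cost(placement, spread_groups, colocate_groups)
    best, best_cost = dict(placement), cost
    for step in range(steps):
        # Geometric cooling from t_start towards t_end
        temp = t_start * (t_end / t_start) ** (step / steps)
        vm, new_pm = rng.choice(vms), rng.choice(pms)
        old_pm = placement[vm]
        if new_pm == old_pm or load[new_pm] >= capacity:
            continue                 # no-op move or target PM is full
        placement[vm] = new_pm
        new_cost = placement_cost(placement, spread_groups, colocate_groups)
        delta = new_cost - cost
        # Always accept improvements; accept worse moves with a
        # probability that shrinks as the temperature drops
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            cost = new_cost
            load[old_pm] -= 1
            load[new_pm] += 1
            if cost < best_cost:
                best, best_cost = dict(placement), cost
        else:
            placement[vm] = old_pm   # undo the rejected move
    return best, best_cost


best, best_cost = anneal(
    vms=["db1", "db2", "web", "cache"],
    pms=["pmA", "pmB", "pmC"],
    capacity=2,
    spread_groups=[["db1", "db2"]],      # database replicas on different PMs
    colocate_groups=[["web", "cache"]],  # chatty VMs on the same PM
)
```

Administrative constraints such as offloading a PM could be expressed in the same way, as extra terms added to the cost function.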

(6)

Runtime Interaction



The consumer/user expresses a set of states

for her infrastructure. These states “activate”

different constraints.



States are “trapped”. Nefeli migrates VMs to

accommodate user wishes



Active hints may change over time offering a

(7)

Nefeli vs other placement policies



Simulation measuring the end node throughput



Random VM placement, Balanced VM placement,

(8)

Nefeli in a real cloud

Nefeli achieves a 17% improvement in the time required to complete video and audio transcoding, compared to default OpenNebula 1.2.

(9)

2. Resource allocation in highly

scalable infrastructures



Highly scalable frameworks:



The more resources consumed, the higher the performance



Scale linearly?



Clouds, seemingly endless resources



Performance guarantees?

How many resources (e.g., satellites, VMs) should we use for a scalable infrastructure?

(10)

Clouds... It is all about money



Cost: Pay for the resources you consume.



Revenue: Sell products coming from the processing taking place within the cloud



Budget Function: Response time to revenue



Pay more -> Reduce response time -> Increase your

(11)

Finding the maximum profit point

Max profit B changes at runtime.

Why?



Some cloud resources are shared

among users (Disk, Net I/O, CPU)



Workloads (processing time)

change based on input

To specify B’ we assume recurring user workloads

•

DB loads Day-Night,

•

Index updates

(12)

Finding the maximum profit point

Recurring user workload:



In each iteration compute MR (marginal revenue) and MC (marginal cost)



We increase or decrease the number of VMs used accordingly, so that MR == MC

B’ “too far away” from B:

•

increase/decrease VMs exponentially

When B’ close to B:
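The adjustment loop on this slide can be sketched as follows. This is a hedged illustration, not the published algorithm: the function names, the stopping test, and the example revenue and cost curves are hypothetical. The sketch assumes that "too far away" means the search has not yet overshot the optimum, and that unit (linear) steps are used after the first overshoot, in line with the linear adjustments mentioned in the evaluation.

```python
import math


def marginal_profit(revenue, cost, n):
    """MR - MC for adding one VM to a pool of n VMs."""
    mr = revenue(n + 1) - revenue(n)   # marginal revenue
    mc = cost(n + 1) - cost(n)         # marginal cost
    return mr - mc


def find_profit_point(revenue, cost, n=1, n_max=10_000, max_iter=1_000):
    """Move the VM count towards the point where MR and MC meet."""
    step, last_dir, near = 1, 0, False
    for _ in range(max_iter):
        gain_up = marginal_profit(revenue, cost, n)
        gain_down = marginal_profit(revenue, cost, n - 1) if n > 1 else 1.0
        # Stop when neither adding nor removing one VM raises profit
        if gain_up <= 0 and gain_down >= 0:
            return n
        direction = 1 if gain_up > 0 else -1
        if last_dir and direction != last_dir:
            near = True                # overshot: the optimum is close
        # Far away: double the step each iteration; close: unit steps
        step = 1 if near or last_dir == 0 else step * 2
        n = min(n_max, max(1, n + direction * step))
        last_dir = direction
    return n


# Hypothetical curves: concave revenue, linear cost per VM
def example_revenue(n):
    return 120 * math.sqrt(n)


def example_cost(n):
    return 10 * n


best_vm_count = find_profit_point(example_revenue, example_cost)
```

With these example curves the profit 120·sqrt(n) − 10·n peaks at n = 36, which is where the loop stops.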

(13)

Applications - Evaluation

Used by the cloud provider



Cost: cloud’s operational cost,



Revenue: per VM

Used by each consumer separately



Revenue: the degree of satisfaction the service

offers



Resources shared proportionally to the money

(14)

Evaluation - Two users

Evaluated using



Real infrastructures: elastic Hadoop/Condor



Simulated for large infrastructures

•

A single user computing Pi

over and over again

•

Exponential and linear VM

adjustments

•

Second user entering the

(15)

3. A true shared-nothing cloud



Suspend/resume VM migration is a show

stopper for load balancing



You must have shared storage facilities



Shared storage is:



A single point of failure



Performance bottleneck



Clouds are based on commodity hardware to

(16)

MigrateFS. Why?



Distributed file systems:



Scaling issues



Have relaxed semantics



Offer much more than what clouds need



Migration operation



Sync VM disk image between target and source PM



Sync VM RAM between target and source PM



Instantly suspend VM from source and resume it to the

(17)

MigrateFS prototype

Two modes of operation:



“I need to move VM v from PM A to PM B in less

than t seconds”



“I need to move VM v from PM A to PM B with

guaranteed VM I/O performance”



Respect SLAs



At any time you can get an estimate on the time

the migration will take (depends on the I/O load

of the VM)
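A rough sketch of how such an estimate could depend on the VM's I/O load. The model is an assumption made for illustration, not MigrateFS's actual estimator: it supposes the VM keeps dirtying data at a steady rate while the remaining data is synced over a link of fixed bandwidth. Both function names and all parameters are hypothetical.

```python
import math


def estimate_migration_seconds(remaining_bytes, sync_bandwidth, dirty_rate):
    """Estimated time to finish syncing (assumed steady-state model).

    remaining_bytes: data still to be copied to the target PM
    sync_bandwidth:  bytes/s the migration is allowed to use
    dirty_rate:      bytes/s the running VM keeps modifying
    """
    if sync_bandwidth <= dirty_rate:
        return math.inf                # the sync never catches up
    return remaining_bytes / (sync_bandwidth - dirty_rate)


def bandwidth_for_deadline(remaining_bytes, dirty_rate, deadline_seconds):
    """Bandwidth needed to finish within a deadline (first mode above)."""
    return remaining_bytes / deadline_seconds + dirty_rate


# 10 GB left, 120 MB/s for the sync, VM dirtying 20 MB/s
eta = estimate_migration_seconds(10e9, 120e6, 20e6)
needed = bandwidth_for_deadline(10e9, 20e6, 50)
```

Under this model the second mode ("guaranteed VM I/O performance") would cap `sync_bandwidth` and accept whatever completion time results.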

(18)

4. Handling Heterogeneity

How we dealt with heterogeneity



Organize physical nodes into “sites”



Specially crafted VMs to boot in multiple “sites”



Universal instantiation configuration schema

Heterogeneity: a challenge



Sky computing: Cloud of clouds

(19)

Load Balancing in IaaS-Clouds

Load balancing through VM migration



Live migration: almost no downtime



Copy RAM while the VM is online



Requirement: PMs share storage, compatible

hypervisors



Suspend-resume: have to copy memory and disk

content before resuming

Load balancing is itself a costly (time &

resources) operation

(20)

VM Scheduling - Placement



Physical and virtual infrastructure properties



Resource availability, VM requirements (CPU, RAM,

network)



Topology: “distance” from repositories, neighboring nodes



Future load balancing prospects



User provided hints/constraints



System properties: Compatibility (kernel, virtualization),

Features (high availability, RAID)

(21)

Two Phase VM Scheduling

How to form a site:



Load balancing prospects: favor site formation among PMs that allow live migration. When there are not enough live-migration-enabled nodes, allow suspend/resume migration



The site's resources must exceed those requested



Site formation is formulated as a constraint satisfaction problem

VM-to-PM mapping is also a constraint satisfaction

problem (Nefeli)
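The first phase (site formation) can be illustrated with a simple greedy stand-in. This is not the constraint solver the slides refer to: it only mirrors the two stated rules, preferring live-migration-enabled PMs and requiring the site's resources to exceed the request. The `PM` record, its fields and the `form_site` function are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PM:
    name: str
    free_cores: int
    live_migration: bool   # True if the PM supports live migration


def form_site(pms, requested_cores):
    """Pick PMs for a site, or return None if the request cannot be met."""
    # Live-migration-enabled PMs first, larger PMs first within each class
    ordered = sorted(pms, key=lambda pm: (not pm.live_migration, -pm.free_cores))
    site, total = [], 0
    for pm in ordered:
        if total > requested_cores:    # site resources exceed the request
            break
        site.append(pm)
        total += pm.free_cores
    return site if total > requested_cores else None


pms = [PM("a", 4, True), PM("b", 8, False), PM("c", 2, True)]
small_site = form_site(pms, 5)    # live-migration PMs alone suffice
large_site = form_site(pms, 10)   # falls back to a suspend/resume PM
```

The second phase would then map VMs onto the PMs of the chosen site only, which is what shrinks the search space.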

(22)

Elastic Solver

•

Consume resources from the cloud – fill out

underutilized, isolated physical nodes

•

Simulated annealing easily parallelizable through

simultaneous executions

•

More resources better site formation and VM-to-PM

(23)

Results?

Reduction of the search space yields:



Improvements in the time consumed



No degradation in the VM-to-PM quality when

(24)

Related work

[Tsak11] K. Tsakalozos, H. Kllapi, E. Sitaridi, M. Roussopoulos, D. Paparas and A. Delis, “Flexible Use of Cloud Resources through Profit Maximization and Price Discrimination”, ICDE 2011, Hannover, Germany, April 2011.

[Tsak10] K. Tsakalozos, M. Roussopoulos, V. Floros and A. Delis, “Nefeli: Hint-based Execution of Workloads in Clouds”, ICDCS 2010, Genoa, Italy, June 2010.

[TsakF] K. Tsakalozos, M. Roussopoulos, and A. Delis, “VM Placement in non-Homogeneous IaaS-Clouds”, under review.

J. O. Kephart and D. M. Chess, “The Vision of Autonomic Computing”, IEEE Computer, vol. 36, no. 1, pp. 41–50, 2003.

K. Lee, N. Paton, R. Sakellariou, and A. Fernandes, “Utility Driven Adaptive Workflow Execution”, in Proc. of the 2009 9th IEEE/ACM Int. Symposium on Cluster Computing and the Grid, Shanghai, PR China.

J. O. Kephart and R. Das, “Achieving Self-Management via Utility Functions”, IEEE Internet Computing, 2007.

D. Grosu and A. Das, “Auctioning resources in Grids: model and protocols: Research Articles,”

(25)

Related work

K. Subramoniam, M. Maheswaran, and M. Toulouse, “Towards a MicroEconomic Model for Resource Allocation”, in IEEE Canadian Conference on Electrical and Computer Engineering, IEEE Press, 2002.

H. R. Varian, Intermediate Microeconomics: A Modern Approach, 7th ed., W. W. Norton and Company, Dec. 2005, ch. 25, Monopoly.

Y. Luo, B. Zhang, X. Wang, Z. Wang, Y. Sun, and H. Chen, “Live and incremental whole-system migration of virtual machines using block-bitmap”, 2008 IEEE International Conference on Cluster Computing, pp. 99–106, Sept. 29 – Oct. 1, 2008.

R. Bradford, E. Kotsovinos, A. Feldmann, and H. Schioberg, “Live wide-area migration of virtual machines including local persistent state”, in Proceedings of the 3rd International Conference on Virtual Execution Environments (VEE ’07), 2007.

K. Keahey, M. Tsugawa, A. Matsunaga, and J. Fortes, “Sky Computing”, IEEE Internet Computing, Sept.–Oct. 2009.

F. Hermenier, X. Lorca, J.-M. Menaud, G. Muller, and J. Lawall, “Entropy: a consolidation manager for clusters”, in Proceedings of the 2009 ACM SIGPLAN/SIGOPS International Conference on Virtual Execution Environments (VEE ’09).