Starting for the cloud
-- two issues in cluster:
resource allocation and overload
management
Ziyou Wang, Yan Li, Chao You, Minghui Zhou
Peking University
Agenda
Cloud Computing: Challenges
Resource Allocation
Shared cluster
Resource allocation planning
Overload Management
Examples
Automatic degradation mechanism
Cloud Computing: Challenges
Cloud computing gives application providers a cost-efficient way
to lease computing resources from a third-party provider
Benefits: higher resource utilization, improved business agility,
lower power consumption…
But how to allocate the cloud's various resources effectively to
different applications is still an open problem.
When applications hosted in the cloud face overload, meaning that
the demand on at least one of the cloud’s resources exceeds
the capacity of that resource, what can we do to handle the
situation?
Shared Cluster
Consider one kind of cloud implementation: because the workloads of
different web applications are not correlated, a large-scale cluster,
called a shared cluster or data center, is maintained to host a large
number of applications simultaneously
Each application runs on a subset of nodes
Each node may run multiple applications
[Figure labels: Users, Enterprises]
Resource Allocation: a scenario
Because the cluster’s resources are no longer dedicated to a single
application, the cluster must allocate resources on demand
For example:
[Figure: application users send requests through a dispatcher to the nodes of a shared cluster, which are connected by a high-throughput, low-latency network and served by an application repository. Each node runs middleware and hosts a few applications (labels: Node 1, Node 16, Node 99, Node 150, other nodes; apps A, B, C, D). When the workload of apps A and C increases, new instances are placed in the data center and the workload is re-allocated.]
Self-adaptive Resource Allocation Model
[Figure: requests feed self-adaptive resource allocation, which comprises resource allocation planning and resource allocation execution]
Our Resource Allocation Work
[Figure: middleware architecture. Labelled elements: requests, dispatcher, repository, management console (commands), virtual machine monitor, VMs each running a customized JOnAS with one app (app a … app x), resource partitioner, app deployer, communicator, local valuator, resource allocation planning, resource allocation execution, and cooperation messages between middleware instances.]
For resource allocation planning, we propose a
decentralized approach
• Nodes decide their own resource allocation
• Market-based coordination helps them
make their resource decisions
So far, the approach has been evaluated with a series of
simulated experiments and is being implemented in
the cluster with JOnAS
Resource Allocation Planning
To support application prioritization, applications can be assigned
different utility values. Accordingly, the goal of resource
management is to maximize the total utility value of the requests
satisfied
Inspired by human markets, we model the shared cluster as a
market, where shares of application requests are treated as goods
and nodes as dealers that exchange goods
Based on its local valuation of the goods, each node autonomously
and continuously trades with others to find an application
share combination that fits the node’s resource constraints and
maximizes its income
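The local decision described above, choosing a share combination that fits the node's resources and maximizes income, can be illustrated with a small sketch. This is not the authors' valuation method: the greedy income-per-resource heuristic, the `Share` fields, and all numbers are assumptions made for illustration, and every share is assumed to have a positive resource demand.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Share:
    app: str        # application name
    rate: float     # requests per second carried by this share
    utility: float  # utility earned per satisfied request (hypothetical)
    cost: float     # resource units consumed per request (hypothetical)

    @property
    def income(self) -> float:
        return self.rate * self.utility

    @property
    def demand(self) -> float:
        return self.rate * self.cost


def choose_shares(candidates, capacity):
    """Greedily keep the shares with the best income per resource unit
    for as long as they still fit into the node's capacity."""
    chosen, used = [], 0.0
    ranked = sorted(candidates, key=lambda s: s.income / s.demand, reverse=True)
    for share in ranked:
        if used + share.demand <= capacity:
            chosen.append(share)
            used += share.demand
    return chosen


candidates = [
    Share("A", rate=100, utility=3.0, cost=1.0),  # income 300, demand 100
    Share("B", rate=50, utility=1.0, cost=2.0),   # income 50, demand 100
    Share("C", rate=80, utility=2.0, cost=0.5),   # income 160, demand 40
]
kept = choose_shares(candidates, capacity=150)
print([s.app for s in kept], sum(s.income for s in kept))
```

With these made-up numbers the node keeps the shares of apps C and A and would offer the share of app B for sale. A greedy choice is only a heuristic for this knapsack-style problem; it is used here because it is cheap enough to run continuously on every node.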
Resource Allocation Planning
When a node wants to sell, more than one node may want to buy.
To make the seller transfer the goods to the appropriate buyers, an
auction mechanism is adopted
[Figure: auction for shares of app C among Node 1 (apps A, C), Node 50 (apps A, B), Node 65 (apps B, C) and Node 100 (apps B, D), each running the middleware. Step 1: the seller multicasts an offer (app C, 50%, 100 req/s). Step 2.1: the other nodes valuate the offer. Step 2.2: bids and sales are exchanged (labels: "want C, 35%", "want C, 20%", "Sell C 30%", "Sell C 20%"). Step 3: the bids are sorted. Step 4: the buyers are notified and the dispatcher is informed, which updates its table (app C: 30% to N50, 20% to N65). The share of app C changes from N1 70%, N65 10% to N1 20%, N50 30%, N65 30%.]
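A minimal sketch of the auction steps follows. It reproduces the percentages shown in the figure (50% of app C offered, bids for 35% and 20%, a dispatcher table of N1 70% and N65 10% before the sale). The per-percent prices, the function name `run_auction`, and the rule "serve the highest valuation first" are assumptions; the slides do not specify how bids are ranked.

```python
def run_auction(offered, bids):
    """Allocate `offered` percentage points of an application share.

    bids: list of (node, wanted_percent, price_per_percent); the price
    is a hypothetical stand-in for each buyer's local valuation.
    Returns {node: granted_percent} for buyers that receive something.
    """
    grants = {}
    remaining = offered
    # Step 3 (sort): serve the highest valuation first.
    for node, wanted, _price in sorted(bids, key=lambda b: b[2], reverse=True):
        if remaining <= 0:
            break
        granted = min(wanted, remaining)
        grants[node] = granted
        remaining -= granted
    return grants


# Step 1: Node 1 offers 50% of app C. Step 2: two buyers respond.
bids = [("N50", 35, 1.2), ("N65", 20, 1.5)]  # prices are made up
grants = run_auction(50, bids)

# Step 4: the dispatcher table for app C is updated with the outcome.
table = {"N1": 70, "N65": 10}
table["N1"] -= sum(grants.values())
for node, pct in grants.items():
    table[node] = table.get(node, 0) + pct
print(grants, table)
```

Under these assumed prices, N65 receives its full 20% and N50 the remaining 30%, which gives the same final table as the figure (N1 20%, N50 30%, N65 30%).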
Our Resource Allocation Work
For resource allocation execution:
• Integrate a VMM into the middleware
• Automatically load the app and partition the resources at
runtime via the VMM
• Customize JOnAS for the app, and store the customized
image in the repository
• Dispatch the workload proportionally
Currently, we use OpenVZ, a lightweight OS-level VMM, as a
case study and are working to integrate it into the
middleware
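Proportional workload dispatching could be sketched as follows. The slides do not say which dispatching algorithm is used; this example swaps in smooth weighted round-robin, and the class name and the weights 2:3:3 (chosen to mirror shares of 20%, 30% and 30%) are illustrative assumptions.

```python
class ProportionalDispatcher:
    """Smooth weighted round-robin over the nodes hosting one application."""

    def __init__(self, shares):
        # shares: {node: weight}, e.g. each node's share of the app's workload
        self.shares = dict(shares)
        self.current = {node: 0 for node in shares}

    def next_node(self):
        total = sum(self.shares.values())
        # Raise every node's credit by its weight, pick the largest credit,
        # then charge the winner the total weight.
        for node, weight in self.shares.items():
            self.current[node] += weight
        winner = max(self.current, key=self.current.get)
        self.current[winner] -= total
        return winner


dispatcher = ProportionalDispatcher({"N1": 2, "N50": 3, "N65": 3})
picks = [dispatcher.next_node() for _ in range(8)]
counts = {node: picks.count(node) for node in ("N1", "N50", "N65")}
print(picks)
print(counts)
```

Over any eight consecutive requests each node is picked in proportion to its weight, and the picks are interleaved rather than sent in bursts to one node.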
Overload Management
Examples
On September 11th 2001, for instance, the workload on a
popular news web site increased by an order of magnitude in
30 min, with the workload doubling every 7 min in that
period.
April 21st, 2010 was China's National Day of Mourning for the Yushu
earthquake victims. Theatre and sporting performances were
cancelled, karaoke bars were shut, and the culture ministry
ordered the suspension of all online music, games, comics, films
and TV shows.
When does overload happen?
Overload prevention is a critical goal: a system should remain
operational in the presence of overload, even when the incoming
request rate is several times greater than the system’s capacity.
It is well known that the workload seen by Internet applications
varies over multiple time-scales and often in an unpredictable
fashion.
Unexpected events happen all the time, for example:
Being featured on national television or in a major newspaper.
The TaoBao Architecture
Apache + Application Server + MySQL
200+ applications, thousands of components
12k servers
2k to 3k Java servers
Search
Product
Browsing
Product Recommendation
Shop Cart
The Reality – Manual Service Degradation
In response to overload:
CNN replaced its front page with a simple HTML page that could
be transmitted in a single Ethernet packet.
Taobao turned off a sub-system.
All these techniques are implemented manually, though a better
approach would be to degrade service gracefully and automatically
in response to load.
Which point causes overload?
Which resource is the bottleneck?
Which service should be degraded or turned off?
Automatic Degradation Mechanism
Overload Priority defines the priorities of the different services and
the degradation actions that can be taken.
Overload Detection is responsible for signaling that the application
has entered an unstable state.
Overload Localization is then triggered to locate the bottleneck resource.
Overload Controller takes appropriate actions to degrade
non-essential services, releasing resources to support key services.
[Figure: the mechanism. Performance metrics from the application's services feed Overload Detection, followed by Overload Localization and the Overload Controller, which uses the Overload Priority to apply degradation actions to the services.]
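The four parts of the mechanism could fit together as in the sketch below. Everything concrete here is an assumption: the 0.9 utilization threshold, the rule that the most utilized resource is the bottleneck, the service records, and the usage figures. The service names are borrowed from the Taobao slide purely as labels.

```python
THRESHOLD = 0.9  # hypothetical utilization level treated as overload


def detect_overload(utilization):
    """Overload Detection: signal when any resource exceeds the threshold."""
    return any(u > THRESHOLD for u in utilization.values())


def localize_bottleneck(utilization):
    """Overload Localization: report the most heavily utilized resource."""
    return max(utilization, key=utilization.get)


def plan_degradation(services, bottleneck, utilization):
    """Overload Controller: switch off low-priority services until the
    bottleneck resource is back at or below the threshold."""
    level = utilization[bottleneck]
    turned_off = []
    # Overload Priority: a lower number means less important, degraded first.
    for svc in sorted(services, key=lambda s: s["priority"]):
        if level <= THRESHOLD:
            break
        if svc["key"]:
            continue  # key services are never degraded
        level -= svc["usage"].get(bottleneck, 0.0)
        turned_off.append(svc["name"])
    return turned_off, level


services = [
    {"name": "search", "priority": 3, "key": True, "usage": {"cpu": 0.40}},
    {"name": "recommendation", "priority": 1, "key": False, "usage": {"cpu": 0.25}},
    {"name": "browsing", "priority": 2, "key": False, "usage": {"cpu": 0.30}},
]
utilization = {"cpu": 0.95, "memory": 0.60}

if detect_overload(utilization):
    bottleneck = localize_bottleneck(utilization)
    turned_off, level = plan_degradation(services, bottleneck, utilization)
    print(bottleneck, turned_off, round(level, 2))
```

With these invented figures the CPU is identified as the bottleneck and turning off the recommendation service alone brings it back under the threshold, so the browsing and search services keep running.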
Automatic Application Degradation
Cluster level degradation
Coarse-grained
Sub-system level degradation
Resource management
Service differentiation
Node level degradation
Fine-grained
Component level degradation
Considerations
Hard to make transparent to the user (what can be degraded,
and sometimes how?)
Used alone, it can help delay overload, but it needs to be
combined with other techniques to be fully effective:
Dynamic resource allocation
Admission control
Service differentiation