DDS-Enabled Cloud Management Support for
Fast Task Offloading
IEEE ISCC 2012, Cappadocia Turkey
Antonio Corradi (1), Luca Foschini (1), Javier Povedano-Molina (2), Juan M. Lopez-Soler (2)
(1) Dipartimento di Elettronica, Informatica, e Sistemistica, Università di Bologna (Italy)
(2) Departamento de Teoría de la Señal, Telemática y Comunicaciones, Universidad de Granada (Spain)
Agenda
Cloud Monitoring and Management
DARGOS
Data-Centric Publish-Subscribe
Architecture
Experimental Results
Testbed Description
Results
Cloud monitoring
Cloud monitoring systems can be categorized by:
- Architectural model: centralized vs. decentralized
- Communication model: pull vs. push
Resource monitoring in Clouds
- A typical approach: centralized pull
  - A central node queries and stores remote resource usage
  - Pros: easy to implement
  - Cons: central point of failure, request-reply overhead, poor scalability in N:M scenarios, weak support for different update rates, no notifications
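The N:M scalability concern can be illustrated with a deliberately simplified message-count model. It is a sketch under stated assumptions, not a measurement: every poll costs one request plus one reply, every push update reaches all consumers as a single multicast message, and the function names are hypothetical.

```python
def pull_messages(nodes: int, consumers: int, rounds: int) -> int:
    """Request-reply polling: every consumer polls every node each round.

    Assumption: one request and one reply per (consumer, node) pair.
    """
    return 2 * nodes * consumers * rounds


def push_multicast_messages(nodes: int, rounds: int) -> int:
    """Push with multicast: each node sends one update per round.

    Assumption: a single multicast message reaches all consumers.
    """
    return nodes * rounds


# Three monitored nodes, two consumers, ten update rounds.
pull_total = pull_messages(nodes=3, consumers=2, rounds=10)
push_total = push_multicast_messages(nodes=3, rounds=10)
print(pull_total, push_total)
```

Under these assumptions the pull cost grows with the product of nodes and consumers, while the push cost grows only with the number of nodes.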
Centralized Cloud management
Centralized Cloud management (II)
Types of loads in Clouds
- Services
  - Long-term duration
  - Load is (almost) stable (e.g. Web servers, databases, ...)
- Tasks
  - Short duration (from seconds to a few minutes)
  - Load of each task is unknown a priori
Cloud resource monitoring in dynamic scenarios
- Short- to mid-duration tasks with dynamic load
  - Bag of Tasks (BoT)
  - Media transcoding
  - Computation offloading
- Require an accurate and reliable snapshot of available resources (real-time updates)
  - CPU load, memory usage, system load, hypervisor, ...
- Different goals: maximize throughput, minimize power
DARGOS
- Distributed Architecture for Resource manaGement and mOnitoring in cloudS
- A distributed monitoring system
- "Argos Panoptes": Argos, the "100-eyed" guardian
- Uses a publish-subscribe approach
- Used to collect real-time monitoring data for taking scheduling decisions
DCPS
Data-Centric Publish-Subscribe
- Entities share a data model instead of using interfaces
- Producers publish data conforming to this data model
- Subscribers receive data matching their interests
- Publishers and subscribers are decoupled in space and time
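The data-centric idea can be sketched with a toy in-process bus. This is a minimal illustration, not DDS and not DARGOS code: the `Bus` class, the `CpuSample` type, and the topic names are hypothetical. It shows decoupling in space only (publisher and subscriber never reference each other); decoupling in time would additionally need sample history, which is omitted here.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Any, Callable, DefaultDict, List


@dataclass(frozen=True)
class CpuSample:
    """Shared data model: both sides agree on this type, not on an interface."""
    node: str
    load: float


class Bus:
    """Toy topic-based bus (hypothetical stand-in for a pub-sub middleware)."""

    def __init__(self) -> None:
        self._subs: DefaultDict[str, List[Callable[[Any], None]]] = defaultdict(list)

    def subscribe(self, topic: str, callback: Callable[[Any], None]) -> None:
        self._subs[topic].append(callback)

    def publish(self, topic: str, sample: Any) -> None:
        # The publisher does not know who (if anyone) is listening.
        for callback in list(self._subs[topic]):
            callback(sample)


bus = Bus()
received: List[Any] = []
bus.subscribe("cpu", received.append)           # interest expressed by topic
bus.publish("cpu", CpuSample("node-1", 0.42))   # delivered to the subscriber
bus.publish("memory", {"node": "node-1"})       # no subscriber: not delivered
print(received)
```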
Data Distribution Service (DDS)
- OMG specification for Data-Centric Publish-Subscribe
  - Data model
  - Wire protocol
- Entities exchange Topics (e.g. temperature, 2D position, ...)
  - Topics are defined by their name and data type
  - Topic samples can contain key data to identify them
  - Publishers push Topic updates into the Subscribers' local cache
- QoS control and management
- Partition mechanisms
- Unicast and multicast support
- Adopted in time-critical systems (avionics, stock exchange quotations, ...)
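The role of key data and of the subscriber-side local cache can be sketched as follows. This is plain Python, not the DDS API: `Sample`, `LocalCache`, and the topic name are hypothetical, and the cache keeps only the latest sample per (topic, key) pair as a simplifying assumption.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class Sample:
    key: str      # key data, e.g. the identifier of the reporting node
    value: float  # payload, e.g. CPU load


class LocalCache:
    """Subscriber-side cache: pushed updates overwrite the entry for their key."""

    def __init__(self) -> None:
        self._latest: Dict[Tuple[str, str], Sample] = {}

    def on_update(self, topic: str, sample: Sample) -> None:
        # Called when a publisher pushes a new sample for this topic.
        self._latest[(topic, sample.key)] = sample

    def read(self, topic: str, key: str) -> Optional[Sample]:
        # Reading is local: no request is sent to the publisher.
        return self._latest.get((topic, key))


cache = LocalCache()
cache.on_update("cpu_load", Sample("node-1", 0.30))
cache.on_update("cpu_load", Sample("node-2", 0.75))
cache.on_update("cpu_load", Sample("node-1", 0.55))  # replaces node-1's entry
print(cache.read("cpu_load", "node-1"))
```

Because updates are pushed into the cache, a reader such as a scheduler can consult the latest known value per node without a request-reply round trip.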
Architecture
DARGOS Entities
- DARGOS has two kinds of entities:
  - Node Monitoring Agent (NMA): collects and publishes local resource usage
    - Installed at each node (resources: e.g. CPU, system load, memory, ...)
    - 1 resource, 1 topic
  - Cloud Monitoring Supervisor (CMS): interested in remote monitoring data
    - Discovers and subscribes to remote resources
    - Defines its own requirements (reliability, acceptable deadlines)
    - Installed in every application interested in resource data (schedulers, dashboards, ...)
Node Monitoring Agents (NMA)
- Collects local resource data and publishes it as DARGOS Topics
- DARGOS NMAs have two operation modes:
  - Periodic
    - NMA periodically pushes resource usage information
    - Maximizes accuracy
  - Event-based
    - NMA pushes resource information under certain conditions (e.g. resource usage delta exceeds a threshold)
Periodic vs. Event based
Periodic
- Period = 1 second
- Samples published = 10
Event-based
- Samples sent when usage changes range
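The difference between the two modes can be sketched on a made-up usage trace of ten one-second readings. The trace values, the 25-point range width, and the function names are assumptions chosen for illustration; they are not taken from the slides or from the DARGOS implementation.

```python
from typing import List, Optional


def periodic_samples(usage: List[float]) -> List[float]:
    """Periodic mode: one sample is published per period, whatever its value."""
    return list(usage)


def event_based_samples(usage: List[float], range_width: float = 25.0) -> List[float]:
    """Event-based mode: publish only when usage enters a different range.

    Assumption: ranges are fixed-width buckets [0, 25), [25, 50), ...
    """
    sent: List[float] = []
    last_range: Optional[int] = None
    for value in usage:
        current_range = int(value // range_width)
        if current_range != last_range:
            sent.append(value)
            last_range = current_range
    return sent


# Hypothetical CPU usage (%) sampled once per second for 10 seconds.
trace = [10, 12, 15, 30, 32, 31, 60, 62, 61, 20]
print(len(periodic_samples(trace)))   # one sample per reading
print(event_based_samples(trace))     # only the readings that change range
```

On this trace the periodic mode publishes all ten readings, while the event-based mode publishes only the four readings that cross a range boundary, trading accuracy for fewer messages.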
Cloud Monitoring Supervisor (CMS)
- CMS discovers available nodes and their available sensors (DARGOS Topics)
- CMS subscribes to sensor information of interest (CPU, memory, ...)
- Applications that use CMS: Cloud dashboards, schedulers
- Each CMS defines its own quality of service (QoS) requirements
  - Reliability or best effort (RELIABILITY)
  - Maximum allowable delay between updates (DEADLINE)
  - Maximum refresh rate (TIME_BASED_FILTER)
- CMSs establish subscription contracts with NMAs
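How DEADLINE and TIME_BASED_FILTER shape what a subscriber sees can be sketched with a small simulation. This is a simplified model, not the DDS API and not DARGOS code: `SubscriptionQos`, its field names, and `apply_qos` are hypothetical, and the deadline is checked on raw arrival times as a simplifying assumption.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class SubscriptionQos:
    deadline: float            # max allowable delay between updates (seconds)
    minimum_separation: float  # time-based filter: min gap between deliveries


def apply_qos(arrivals: List[float], qos: SubscriptionQos) -> Tuple[List[float], List[float]]:
    """Return (delivered timestamps, timestamps at which a deadline miss is detected)."""
    delivered: List[float] = []
    missed: List[float] = []
    last_arrival: Optional[float] = None
    last_delivered: Optional[float] = None
    for t in arrivals:
        # DEADLINE: flag an update that arrives later than the agreed delay.
        if last_arrival is not None and t - last_arrival > qos.deadline:
            missed.append(t)
        last_arrival = t
        # TIME_BASED_FILTER: drop updates that arrive faster than requested.
        if last_delivered is None or t - last_delivered >= qos.minimum_separation:
            delivered.append(t)
            last_delivered = t
    return delivered, missed


qos = SubscriptionQos(deadline=2.0, minimum_separation=1.0)
arrivals = [0.0, 0.5, 1.0, 1.5, 4.0, 4.2]
delivered, missed = apply_qos(arrivals, qos)
print(delivered, missed)
```

A dashboard might pick a large `minimum_separation` to limit its refresh rate, while a scheduler might pick a tight `deadline` so that stale nodes are noticed quickly.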
DARGOS-based Cloud management
Testbed Description
Experimental testbed
- Testbed with DARGOS-enabled OpenStack Cloud
- DARGOS-based OpenStack scheduler
  - Server consolidation
  - Load balancing
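The two scheduling goals can be contrasted with a small placement sketch. It is an illustration under assumptions, not the OpenStack or DARGOS scheduler: the per-node VM count stands in for the monitored load, and `pick_node`, the capacity limit, and the node names are hypothetical.

```python
from typing import Dict, Optional


def pick_node(vm_counts: Dict[str, int], capacity: int, policy: str) -> Optional[str]:
    """Choose a node for the next VM, or None if every node is full.

    consolidation:  pack VMs onto the fullest node that still has room.
    load_balancing: spread VMs by choosing the emptiest node.
    Ties are broken by node name so the result is deterministic.
    """
    candidates = {node: count for node, count in vm_counts.items() if count < capacity}
    if not candidates:
        return None
    if policy == "consolidation":
        return min(candidates, key=lambda node: (-candidates[node], node))
    if policy == "load_balancing":
        return min(candidates, key=lambda node: (candidates[node], node))
    raise ValueError(f"unknown policy: {policy}")


# Three compute nodes with hypothetical VM counts.
state = {"compute-1": 2, "compute-2": 0, "compute-3": 3}
print(pick_node(state, capacity=4, policy="consolidation"))
print(pick_node(state, capacity=4, policy="load_balancing"))
```

Consolidation tends to leave whole nodes idle, which matches a power-saving goal, whereas load balancing tends to even out usage, which matches a throughput goal.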
Testbed description
- OpenStack Cloud fabric
- DARGOS-enabled scheduler service
- Three DARGOS-enabled compute nodes
- RTI DDS 4.5d middleware
Results
Results (VM per node)
OpenStack out-of-the-box scheduler
OpenStack DARGOS-based scheduler (consolidation)