Scalable, Fault-tolerant Management of Grid Services:
Application to Messaging Middleware
Harshawardhan Gadgil
Talk Outline
- Use Cases and Motivation
- Architecture
- Handling Consistency and Security Issues
- Performance Evaluation
- Application: Managing Grid Messaging Middleware
- Conclusion
Gri
Large Number of Distributed Resources
- Applications are distributed and composed of a large number and type (hardware, software) of resources
- Components widely dispersed and
[Figure: Sensor Grid]
Example
Audio Video Conferencing
- GlobalMMCS project, which uses NaradaBrokering as an event delivery substrate
- Consider a scenario where there is a teacher and 10,000 students. One way is to form a TREE-shaped hierarchy of brokers
- One broker can support up to 400 simultaneous video clients and 1500 simultaneous audio clients with acceptable quality*. So one would need (10000 / 400 ≈ 25
[Figure: tree-shaped hierarchy of brokers; labels: 400 participants (repeated per leaf broker), A single participant]
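The broker count quoted on this slide can be reproduced with a short calculation. The sketch below is only illustrative: the function name is hypothetical, and it counts leaf brokers only, ignoring any interior brokers that a tree hierarchy would add.

```python
import math


def leaf_brokers_needed(participants: int, clients_per_broker: int) -> int:
    """Number of leaf brokers needed so that no broker exceeds its client limit."""
    return math.ceil(participants / clients_per_broker)


# Figures from the slide: 10,000 students, 400 video clients per broker.
video_brokers = leaf_brokers_needed(10_000, 400)
# Same scenario with the 1500-client audio limit.
audio_brokers = leaf_brokers_needed(10_000, 1_500)
```

Even for a single session, the video limit alone implies tens of brokers that must be configured and kept running, which is the management problem the talk goes on to address.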
Definition
Use of term: Resource
- Consider a Digital Entity on the network
  - Specific case where this entity can be controlled by modest external state
  - Can be captured via a few messages (typically 1)
  - The digital entity in turn can bootstrap and manage components that may require more state.
- Thus, Digital entity + Component = Manageable Resource
  - Thus could be hardware or software (services)
  - Henceforth, we refer to the Service being managed as a Manageable Resource
Definition
What is Management?
- Resource Management: Maintaining the System's ability to provide its specified services with a prescribed QoS
- Management Operations* include
  - Configuration and Lifecycle operations (CREATE, DELETE)
  - Handle RUNTIME events
  - Monitor status and performance
  - Maintain system state (according to user defined criteria)
- This thesis addresses:
  - Configuring, Deploying and Maintaining Valid Runtime Configuration
  - Crucial to successful working of applications
Existing Systems
- Distributed Monitoring frameworks
  - NWS, Ganglia, MonALISA
  - Primarily serve to gather metrics (which is one aspect of resource management, as we defined)
- Management Frameworks
  - SNMP: primarily for hardware (hubs, routers)
  - CMIP: Improved security & logging over SNMP
  - JMX: Managing and monitoring for Java applications
  - WBEM: System management to unify management of distributed computing environments
- Management systems are not interoperable: Move to Web Services based management of resources
  - XML based interactions that facilitate implementation in different languages, running on different platforms and over multiple transports
Motivation:
Issues in Management
- Resources must meet
  - General QoS and Life-cycle features
  - (User defined) Application specific criteria
  - Improper management such as wrong configuration: major cause of service downtime
- Large number of widely dispersed Resources
  - Decreasing hardware cost => Easier to replicate for fault-tolerance (Esp. Software replication)
  - Presence of firewalls may restrict direct access to resources
- Resource specific management systems have evolved independently (different platform / language / protocol)
  - Requires use of proprietary technologies
- Central management System
Desired Features of the Management Framework
- Fault Tolerance
  - Failures are normal: resources may fail, but so may components of the management framework.
  - Framework MUST recover from failure
- Scalability
  - With growing complexity of applications, the number of resources (application components) increases
    - E.g. LHC Grid consists of a large number of CPUs, disks and mass storage servers (on the order of ~30K)
  - In future, much larger systems will be built
  - MUST cope with large number of resources in terms of
- Performance
  - Initialization Cost, Recovery from failure, Responding to run-time events
- Interoperability
  - Services exist on different platforms, are written in different languages, and are managed using system specific protocols, and hence are not INTEROPERABLE
  - Framework must implement interoperable protocols, such as those based on Web-Service standards
- Generality
  - Management framework must be a generic framework.
  - Should be applicable to any type of resource (hardware / software)
- Usability
  - Autonomous operation (as much as possible)
Architecture
- We assume resource specific external state to be maintained by a Registry (assumed scalable, fault-tolerant by known techniques)
- We leverage well-known strategies for providing
  - Fault-tolerance (E.g. Replication, periodic check-pointing, request-retry)
  - Fault-detection (E.g. Service heartbeats)
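As an illustration of heartbeat-based fault detection, here is a minimal sketch. It is not the framework's implementation: the class name, the timeout value and the component identifiers are all assumptions made for the example.

```python
import time


class HeartbeatMonitor:
    """Flags components whose last heartbeat is older than a timeout (hypothetical helper)."""

    def __init__(self, timeout_s: float, clock=time.monotonic):
        self.timeout_s = timeout_s
        self.clock = clock          # injectable so the logic can be tested without sleeping
        self.last_seen = {}         # component id -> time of last heartbeat

    def beat(self, component_id: str) -> None:
        """Record a heartbeat from a component."""
        self.last_seen[component_id] = self.clock()

    def failed(self) -> list:
        """Return ids of components that have been silent for longer than the timeout."""
        now = self.clock()
        return sorted(c for c, t in self.last_seen.items() if now - t > self.timeout_s)


# Example with a fake clock; "manager-1" and "broker-1" are made-up ids.
fake_now = [0.0]
monitor = HeartbeatMonitor(timeout_s=5.0, clock=lambda: fake_now[0])
monitor.beat("manager-1")
monitor.beat("broker-1")
fake_now[0] = 4.0
monitor.beat("broker-1")        # broker-1 keeps reporting
fake_now[0] = 7.0
suspects = monitor.failed()     # manager-1 has been silent for 7 s
```

A component flagged this way would then be handled by one of the fault-tolerance strategies listed above, for example by retrying the request or restarting from a checkpoint.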
Management Architecture built in terms of
- Hierarchical Bootstrap System
  - Resources in different domains can be managed with separate policies for each domain
  - Periodically spawns a System Health Check that ensures components are up and running
- Registry for metadata (distributed database): Robust by standard database techniques and our system itself for Service Interfaces
  - Stores resource specific information (User-defined configuration / policies, external state required to properly manage a resource)
  - Generates a unique ID per instance of registered component
- Messaging Nodes form a scalable messaging substrate
  - Provides transport protocol independent messaging between components
  - Can provide Secure delivery of messages
  - In our case, we use NaradaBrokering Broker as a messaging node
- Managers: Active stateless agents that manage resources.
  - Since they don't maintain state, they are robust
  - Actual management functions are performed by a Resource specific manager component
- Resources: what you are managing
  - Wrapped by a Service Adapter which provides a Web Service interface.
  - Service Adapter connects to messaging node to leverage transport independent publish subscribe communication with other components
Architecture:
Scalability: Hierarchical distribution
[Figure: bootstrap node hierarchy with nodes ROOT, US, EUROPE, CG, ... and leaf path /ROOT/EUROPE/CARDIFF; arrow label: Spawns if not present and ensures up and running]
Active Bootstrap Nodes
- Always the leaf nodes in the hierarchy
- Responsible for maintaining a working set of management components in the
Passive Bootstrap Nodes
- Only ensure that all child bootstrap nodes are always up and running
Architecture:
Conceptual Idea (Internals)
Manager processes periodically check for available resources to manage. They also read/write resource specific external state from/to the registry.
[Figure label: Always ensure up and running]
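The "always ensure up and running" relationship between bootstrap nodes and the components below them can be sketched as a recursive health check. This is a simplified model under stated assumptions: the `BootstrapNode` class, its fields and the component names are hypothetical, and "respawning" is reduced to flipping a flag.

```python
from dataclasses import dataclass, field


@dataclass
class BootstrapNode:
    """Hypothetical model of a bootstrap node in the hierarchy."""
    path: str
    children: list = field(default_factory=list)    # child bootstrap nodes (passive nodes)
    components: dict = field(default_factory=dict)  # leaf nodes: component name -> is running
    running: bool = True

    def health_check(self) -> list:
        """Respawn anything that is down; return the paths that were respawned."""
        respawned = []
        for child in self.children:
            if not child.running:
                child.running = True                 # stand-in for spawning a process
                respawned.append(child.path)
            respawned.extend(child.health_check())
        for name, up in list(self.components.items()):
            if not up:
                self.components[name] = True
                respawned.append(f"{self.path}/{name}")
        return respawned


# Illustrative hierarchy using the paths shown on the slide.
cardiff = BootstrapNode("/ROOT/EUROPE/CARDIFF",
                        components={"manager": False, "messaging-node": True})
europe = BootstrapNode("/ROOT/EUROPE", children=[cardiff], running=False)
root = BootstrapNode("/ROOT", children=[europe])
first_pass = root.health_check()
second_pass = root.health_check()
```

In this model a passive node only looks after its child bootstrap nodes, while a leaf (active) node looks after the management components in its domain, mirroring the division of responsibility described above.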
Architecture:
User Component
- "Resource Characteristics" are determined by the user.
- Events generated by the Resources are handled by the manager
- Event processing is determined via WS-Policy constructs
  - E.g., automatically instantiate a failed resource instance
<pol:Policy xmlns:pol="http://schemas.xmlsoap.org/ws/2004/09/policy"
    xmlns:pol1="http://www.hpsearch.org/schemas/2006/07/policy">
  <pol:All>
    <pol1:AUTOInstantiate forkProcessLocator="udp://156.56.104.152:65535"/>
  </pol:All>
</pol:Policy>
- Managers can set up services
  - Writing information to registry can be used to start up a set of
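To show how a manager might read the AUTOInstantiate policy from this slide, here is a minimal sketch using Python's standard `xml.etree.ElementTree`. The function name is hypothetical and the framework's own policy handling is not shown in the slides; only the namespaces, element name and attribute are taken from the example policy.

```python
import xml.etree.ElementTree as ET

# Policy document as shown on the slide (with both namespace URIs quoted).
POLICY_XML = """<pol:Policy xmlns:pol="http://schemas.xmlsoap.org/ws/2004/09/policy"
    xmlns:pol1="http://www.hpsearch.org/schemas/2006/07/policy">
  <pol:All>
    <pol1:AUTOInstantiate forkProcessLocator="udp://156.56.104.152:65535"/>
  </pol:All>
</pol:Policy>"""

NAMESPACES = {
    "pol": "http://schemas.xmlsoap.org/ws/2004/09/policy",
    "pol1": "http://www.hpsearch.org/schemas/2006/07/policy",
}


def auto_instantiate_locator(policy_xml: str):
    """Return the fork-process locator if the policy asks for auto-instantiation, else None."""
    root = ET.fromstring(policy_xml)
    node = root.find("pol:All/pol1:AUTOInstantiate", NAMESPACES)
    return None if node is None else node.get("forkProcessLocator")


locator = auto_instantiate_locator(POLICY_XML)
```

A manager that finds a locator could contact that endpoint to re-create a failed resource instance; a policy without the element would leave the failure to be handled some other way.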
Issues in the distributed system
Consistency
- Examples of inconsistent behavior
  - Two or more managers managing the same resource
  - Old messages / requests reaching after new requests
  - Multiple copies of resources existing at the same time / Orphan Resources leading to inconsistent system state
- Use a Registry generated monotonically increasing Unique Instance ID (IID) to distinguish between new and old instances
  - Requests from manager A are considered obsolete IF IID(A) < IID(B)
  - Service Adapter stores the last known MessageID (IID:seqNo) allowing it to differentiate between duplicates AND obsolete messages
  - Service adapter periodically renews with registry
    - IF IID(ResourceInstance_1) < IID(ResourceInstance_2)
    - THEN ResourceInstance_1 is OBSOLETE
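The IID rule above can be sketched as a small message filter on the Service Adapter side. This is an illustrative reading of the slide, not the actual implementation: the class and method names are hypothetical, and treating a lower sequence number under the same IID as obsolete is an assumption.

```python
from dataclasses import dataclass


@dataclass
class ServiceAdapterFilter:
    """Tracks the last accepted MessageID (IID:seqNo); hypothetical helper."""
    last_iid: int = -1
    last_seq: int = -1

    def classify(self, iid: int, seq_no: int) -> str:
        """Return 'obsolete', 'duplicate' or 'accept' for an incoming request."""
        if iid < self.last_iid:
            return "obsolete"        # sent by an older manager instance
        if iid == self.last_iid:
            if seq_no == self.last_seq:
                return "duplicate"   # same message delivered again
            if seq_no < self.last_seq:
                return "obsolete"    # old request arriving after a newer one (assumption)
        self.last_iid, self.last_seq = iid, seq_no
        return "accept"


adapter = ServiceAdapterFilter()
results = [
    adapter.classify(7, 1),   # first request from manager instance 7
    adapter.classify(7, 1),   # retransmission of the same request
    adapter.classify(9, 1),   # a newer manager instance takes over
    adapter.classify(7, 2),   # late request from the old instance
]
```

Because the registry hands out monotonically increasing IIDs, the adapter needs only the last accepted MessageID to make these decisions, which keeps its state small.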
Issues in the distributed system
Security
- NaradaBrokering's Topic Creation and Discovery* and Security Scheme# address
  - Message level security
  - Provenance, Lifetime, Unique Topics
  - Secure Discovery of endpoints
  - Prevent unauthorized access to services
  - Prevent malicious users from modifying messages
* NB-Topic Creation and Discovery - Grid2005 / IJHPCN http://grids.ucs.indiana.edu/ptliupages/publications/NB-TopicDiscovery-IJHPCN.pdf
Implemented:
- Management framework
- Management of NaradaBrokering Brokers
- Released with NaradaBrokering in Feb 2007
- WS-Specifications
  - WS-Management (could use WS-DM) - June 2005 parts (WS-Transfer [Sep 2004], WS-Enumeration [Sep 2004]) and WS-Policy [Sep 2004], SOAP v 1.2 (needed for WS-Management)
  - WS-Eventing (Leveraged from the WS-Eventing support in NaradaBrokering)
Performance Evaluation
Measurement Model - Test Setup
- Multithreaded manager process - Spawns a Resource specific management thread (A single manager can manage multiple different types of resources)
- Limit on maximum resources that can be managed
  - Limited by maximum threads per JVM possible (memory constraints)
  - Theoretically ~1800 resources (refer: Thesis writeup)
Performance Evaluation
Results
- Scenario illustrating a case with multiple concurrent events
- Response time increases with increasing number of concurrent requests
- Response time is RESOURCE-DEPENDENT and the shown times are illustrative
- Increases rapidly as no. of requests > 210
- MAY involve dependency on external services such as Registry access which will increase overall
Performance Evaluation
Research Question: How much infrastructure is required to manage N resources?
- N = Number of resources to manage
- M = Max. no. of resources that connect to a single messaging node
- D = Maximum concurrent requests that can be processed by a single manager process before saturating
  - For analysis, we set this as the number of resources assigned per manager
- R = Min. no. of registry service components required to provide desired level of fault-tolerance
- Assume every leaf domain has 1 messaging node. Hence we have N/M leaf domains
- Further, no. of managers required per leaf domain is M/D
- Other passive bootstrap nodes are not counted here since << N
- Total Components in lowest level
  = (R registry + 1 Bootstrap Service + 1 Messaging Node + M/D Managers) * (N/M such leaf domains)
Performance Evaluation
Research Question: How much infrastructure is required to manage N resources?
- Thus percentage of additional infrastructure is
  = [(2 + R + M/D) * N/M] * 100 / N %
  = [(2 + R)/M + 1/D] * 100 %
- A Few Cases
  - If D = 200, M = 800 and R = 4, then Additional Infrastructure
    = [(2+4)/800 + 1/200] * 100 % = 1.25 %
  - If the Registry is shared, then there is one registry interface per domain, R = 1, and Additional Infrastructure
    = [(2+1)/800 + 1/200] * 100 % ≈ 0.88 %
  - If NO messaging node is used (assume D = 200), then Additional Infrastructure
    = [(R registry + 1 bootstrap node + N/D managers)/N] * 100 %
    = [(1+R)/N + 1/D] * 100 %
    ≈ 100/D % (for N >> R)
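The overhead formulas above are easy to check numerically. The following sketch encodes them directly; the function names are hypothetical, and N = 8000 in the cross-check is an arbitrary illustrative value.

```python
def total_components(N: int, M: int, D: int, R: int) -> float:
    """Components in the lowest level: (R + 1 bootstrap + 1 messaging node + M/D managers) per leaf domain."""
    return (R + 2 + M / D) * (N / M)


def overhead_percent(M: int, D: int, R: int) -> float:
    """Additional infrastructure as a percentage of N: [(2 + R)/M + 1/D] * 100."""
    return ((2 + R) / M + 1 / D) * 100


def overhead_percent_no_messaging(N: int, D: int, R: int) -> float:
    """Variant without messaging nodes: [(1 + R)/N + 1/D] * 100."""
    return ((1 + R) / N + 1 / D) * 100


case_replicated_registry = overhead_percent(M=800, D=200, R=4)   # first case on the slide
case_shared_registry = overhead_percent(M=800, D=200, R=1)       # shared registry case
case_no_messaging = overhead_percent_no_messaging(N=1_000_000, D=200, R=4)
```

Note that the closed form does not depend on N, so under these assumptions the relative overhead stays constant as the number of managed resources grows.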
Prototype:
Managing Grid Messaging Middleware
- We illustrate the architecture by managing the distributed messaging middleware: NaradaBrokering
  - This example is motivated by the presence of a large number of dynamic peers (brokers) that need configuration and deployment in specific topologies
  - Runtime metrics provide dynamic hints on improving routing, which leads to redeployment of the messaging system (possibly) using a different configuration and topology, or to the use of (dynamically) optimized protocols (UDP v TCP v Parallel TCP) and going through firewalls
- Broker Service Adapter
  - NB illustrates an electronic entity that didn't start off with an administrative service interface
  - So add a wrapper over the basic NB BrokerNode object that provides a WS-Management front-end
Performance Evaluation
XML Processing Overhead
- XML Processing overhead is measured as the total marshalling and un-marshalling time required, including validation against schema
- In case of Broker Management interactions, typical processing time (includes validation against schema) ≈ 5 ms
  - Broker Management operations invoked only during initialization and recovery from failure
  - Reading Broker State using a GET operation involves 5 ms overhead and is invoked periodically (E.g. every 1 minute, depending on policy)
  - Further, for most operations dealing with changing broker state,
Prototype:
Observed Operation Costs (Individual Resources - Brokers)
Time (msec) (average values)

Operation           Un-Initialized (First time)   Initialized (Later modifications)
Set Configuration   778 ± 5                       33 ± 3
Create Broker       610 ± 6                       57 ± 2
Create Link         160 ± 2                       27 ± 2
Delete Link         104 ± 2                       20 ± 1
Recovery
Estimated Recovery Cost per broker

Recovery Time
= T(Read State From Registry) + T(Bring Resource up to speed)
= T(Read State) + T[SetConfig + Create Broker + CreateLink(s)]

Topology: Ring (N nodes, N links, 1 outgoing link per node)
  Number of Resource specific Configuration: 2 Resource Objects per node
  Recovery cost: 10 + (778 + 610 + 160) = 1558 msec
Topology: Cluster (N nodes, links per broker vary from 0 - 3)
  Number of Resource specific Configuration: 1 - 4 Resource Objects per node
  Recovery cost: Min: 5 + 778 + 610 = 1393 msec
                 Max: 20 + 778 + 610 + 160 + 2 * 27 = 1622 msec
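The recovery estimates above can be reproduced from the measured operation costs. The sketch below is a reading of the slide's arithmetic, not code from the thesis: the function and constant names are hypothetical, the 5 msec per configuration entry is inferred from the 5, 10 and 20 msec read-state terms, and charging the first link at the first-time rate and further links at the later-modification rate follows the slide's "Max" expression.

```python
# Average first-time (un-initialized) operation costs in msec, from the operation-cost slide.
FIRST_TIME_MS = {"set_config": 778, "create_broker": 610, "create_link": 160}
LATER_CREATE_LINK_MS = 27        # cost of creating a link on an already initialized broker
READ_STATE_MS_PER_ENTRY = 5      # inferred: 5 msec per resource specific configuration entry


def estimated_recovery_ms(config_entries: int, links: int) -> int:
    """T(Read State) + T[SetConfig + Create Broker + CreateLink(s)] for one broker."""
    cost = config_entries * READ_STATE_MS_PER_ENTRY
    cost += FIRST_TIME_MS["set_config"] + FIRST_TIME_MS["create_broker"]
    if links >= 1:
        cost += FIRST_TIME_MS["create_link"] + (links - 1) * LATER_CREATE_LINK_MS
    return cost


ring = estimated_recovery_ms(config_entries=2, links=1)
cluster_min = estimated_recovery_ms(config_entries=1, links=0)
cluster_max = estimated_recovery_ms(config_entries=4, links=3)
```

The estimate is dominated by the Set Configuration and Create Broker terms, so the number of links and configuration entries changes the total by only a few hundred milliseconds.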
Prototype:
Observed Recovery Cost per Broker

Operation                      Average (msec)
*Spawn Process                 2362 ± 18
Read State                     8 ± 1
Restore (1 Broker + 1 Link)    1421 ± 9
Time for Create Broker depends on the number & type of transports
opened by the broker
E.g. SSL transport requires negotiation of keys and would require more time than simply establishing a TCP connection
1616 Ā± 82
Contributions
- Designed and implemented a Resource Management Framework
  - Scalable to manage large number of resources
  - Tolerant to failures in framework itself
  - Can handle failures in managed resources via user defined policies
- We have shown that a Management framework can be built on top of a publish subscribe framework to provide transport independent messaging between framework components
- Implemented Web Service Management to manage resources
- Detailed evaluation of the system components to show that the proposed architecture has acceptable costs
- Implemented Prototype to illustrate management of a
Future Work
- Apply the framework to broader domains
- Investigate application of architecture to resources with significant runtime state that needs to be maintained
  - Higher frequency and size of messages
  - XML processing overhead becomes significant
- Investigate strategies to distribute framework components (load balance) considering factors such as locality of resources and runtime
Publications
- On the presented work:
  - Scalable, Fault-tolerant Management in a Service Oriented Architecture
    Harshawardhan Gadgil, Geoffrey Fox, Shrideep Pallickara, Marlon Pierce
    Accepted as poster in HPDC 2007
  - Managing Grid Messaging Middleware
    Harshawardhan Gadgil, Geoffrey Fox, Shrideep Pallickara, Marlon Pierce
    In Proceedings of "Challenges of Large Applications in Distributed Environments" (CLADE 2006), pp. 83 - 91, June 19, 2006, Paris, France
  - A Retrospective on the Development of Web Service Specifications
    Shrideep Pallickara, Geoffrey Fox, Mehmet Aktas, Harshawardhan Gadgil, Beytullah Yildiz, Sangyoon Oh, Sima Patel, Marlon Pierce and Damodar Yemme
    Chapter in Book Securing Web Services: Practical Usage of Standards and Specifications
- On NaradaBrokering:
  - On the Secure Creation, Organization and Discovery of Topics in Distributed Publish/Subscribe systems
    Shrideep Pallickara, Geoffrey Fox, Harshawardhan Gadgil
    (To Appear) International Journal of High Performance Computing and Networking (IJHPCN), 2006. Special Issue of extended versions of the 6 best papers at the ACM/IEEE Grid 2005 Workshop
  - On the Discovery of Brokers in Distributed Messaging Infrastructure
    Shrideep Pallickara, Harshawardhan Gadgil, Geoffrey Fox
Publications (contd...)
- On NaradaBrokering (contd...):
  - On the Discovery of Topics in Distributed Publish/Subscribe systems
    Shrideep Pallickara, Geoffrey Fox, Harshawardhan Gadgil
    In Proceedings of the 6th IEEE/ACM International Workshop on Grid Computing Grid 2005 Conference, pp. 25-32, Seattle, WA (Selected as one of six Best Papers)
  - A Framework for Secure End-to-End Delivery of Messages in Publish/Subscribe Systems
    Shrideep Pallickara, Marlon Pierce, Harshawardhan Gadgil, Geoffrey Fox, Yan Yan, Yi Huang
    (To Appear) In Proceedings of "The 7th IEEE/ACM International Conference on Grid Computing" (Grid 2006), Barcelona, September 28th-29th, 2006
- HPSearch, GIS and Misc:
  - A Scripting based Architecture for Management of Streams and Services in Real-time Grid Applications
    Harshawardhan Gadgil, Geoffrey Fox, Shrideep Pallickara, Marlon Pierce, Robert Granat
    In Proceedings of the IEEE/ACM Cluster Computing and Grid 2005 Conference, CCGrid 2005, Vol. 2, pp. 710-717, Cardiff, UK
  - Building Messaging Substrates for Web and Grid Applications
    Geoffrey Fox, Shrideep Pallickara, Marlon Pierce, Harshawardhan Gadgil
Publications (contd...)
- HPSearch, GIS and Misc (contd...):
  - Building and Applying Geographical Information System Grids
    Galip Aydin, Ahmet Sayar, Harshawardhan Gadgil, Mehmet S. Aktas, Geoffrey C. Fox, Sunghoon Ko, Hasan Bulut, and Marlon E. Pierce
    12 January 2006, Special Issue on Geographical Information Systems and Grids based on GGF15 workshop, Concurrency and Computation: Practice and Experience
  - SERVOGrid Complexity Computational Environments (CCE) Integrated Performance Analysis