Managing Dynamic Metadata and Context
Mehmet S. Aktas
Context as Service Metadata
• Context can be
  ◦ interaction-independent
    ▪ slowly varying, quasi-static service metadata
  ◦ interaction-dependent
    ▪ dynamically generated metadata resulting from the interaction of services
    ▪ information associated with a single service, a session (service activity), or both
• Dynamic Grid/Web Service Collections
  ◦ assembled to support a specific task
  ◦ can be workflow and audio/video collaborative sessions
  ◦ generate metadata and have a limited lifetime
Motivating Cases
• Multimedia collaboration domain
  ◦ Global Multimedia Collaboration System (Global-MMCS) provides an A/V conferencing system.
  ◦ collaborative A/V sessions with varying types of metadata, such as real-time metadata describing audio/video streams
  ◦ characteristics: widely distributed services, metadata of events (archival data), mostly read-only
• Workflow-style applications in GIS/Sensor Grids
  ◦ Pattern Informatics (PI) is an earthquake forecasting system.
  ◦ sensor grid data services generate events when an event of a certain magnitude (such as fault displacement) occurs
  ◦ firing off various services: filtering, analyzing raw data, generating images, maps
  ◦ characteristics: any number of widely distributed services can
[Figure: PI workflow involving WMS GUI, WFS, HP Search, two Data Filters, PI Code and the Context Information Service; numbered arrows mark the interaction steps; labels: http://..../..../..txt, http://..../..../tmp.xml]

Session context:
<context xsd:type="ContextType" timeout="100">
  <context-service>http://.../WMS</context-service>
  <activity-list mustUnderstand="true" mustPropagate="true">
    <service>http://.../WMS</service>
    <service>http://.../HPSearch</service>
  </activity-list>
</context>

User profile context:
<context xsd:type="ContextType" timeout="100">
  <context-service>http://.../HPSearch</context-service>
  <parent-context>http://../abcdef:012345</parent-context>
  <content> profile information related WMS </content>
</context>

Activity context:
<context xsd:type="ContextType" timeout="100">
  <context-service>http://.../HPSearch</context-service>
  <parent-context>http://../abcdef:012345</parent-context>
  <content> shared data for HPSearch activity </content>
  <activity-list mustUnderstand="true" mustPropagate="true">
    <service>http://.../DataFilter1</service>
    <service>http://.../PICode</service>
    <service>http://.../DataFilter2</service>
  </activity-list>
</context>

Shared state context:
<context xsd:type="ContextType" timeout="100">
  <context-id>http://../abcdef:012345</context-id>
  <context-service>http://.../HPSearch</context-service>
  <content>http://danube.ucs.indiana.edu:8080\x.xml</content>
</context>
SOAP header for Context:

<?xml version="1.0" encoding="UTF-8"?>
<soap:Envelope xmlns:soap="http://www.w3...">
  <soap:Header encodingStyle="WSCTX URL" mustUnderstand="true">
    <context xmlns="ctxt schema" timeout="100">
      <context-id>http..</context-id>
      <context-service>http..</context-service>
      <context-manager>http..</context-manager>
      <activity-list mustUnderstand="true" mustPropagate="true">
        <p-service>http://../WMS</p-service>
        <p-service>http://../HPSearch</p-service>
      </activity-list>
    </context>
  </soap:Header>
  ...
What are examples of dynamically generated metadata in a real-life scenario?
• session-associated dynamic metadata
• user profile
• activity-associated dynamic metadata
• service-associated dynamically generated metadata
3, 4: WMS starts a session and invokes HPSearch, with a session id, to run the workflow script for PI Code
5, 6, 7: HPSearch runs the workflow script and generates an output file in GML format (and PDF format) as a result
8: HPSearch writes the URI of the output file into the Context
9: WMS polls the information from the Context Service
10: WMS retrieves the output file generated by the workflow script and generates a map
<context xsd:type="ContextType" timeout="100">
  <context-service>http://.../HPSearch</context-service>
  <content> HPSearch associated additional data generated during execution of workflow. </content>
</context>
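The hand-off in steps 8 and 9 (HPSearch records the output URI in a context, WMS polls for it) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `ContextService`, `create_session`, `set_content`, `get_content` and `poll_for_content` are hypothetical names, the store is a plain in-memory dictionary, and none of it reflects the actual WS-Context programming interface.

```python
import time
import uuid


class ContextService:
    """Toy in-memory stand-in for a context information service (hypothetical API)."""

    def __init__(self):
        self._contexts = {}  # context id -> dict of context fields

    def create_session(self, services, timeout=100):
        # The id format is an assumption made for this sketch.
        context_id = "urn:ctx:" + uuid.uuid4().hex
        self._contexts[context_id] = {
            "activity_list": list(services),
            "timeout": timeout,
            "content": None,
        }
        return context_id

    def set_content(self, context_id, content):
        self._contexts[context_id]["content"] = content

    def get_content(self, context_id):
        return self._contexts[context_id]["content"]


def poll_for_content(service, context_id, attempts=5, delay=0.01):
    """Poll until the shared content is set, or give up after `attempts` tries."""
    for _ in range(attempts):
        content = service.get_content(context_id)
        if content is not None:
            return content
        time.sleep(delay)
    return None


ctx = ContextService()
session_id = ctx.create_session(["http://.../WMS", "http://.../HPSearch"])
# Step 8: HPSearch records where the workflow output was written.
ctx.set_content(session_id, "http://..../..../tmp.xml")
# Step 9: WMS polls the context service for that location.
output_uri = poll_for_content(ctx, session_id)
```

Polling keeps the two services decoupled: WMS only needs the session id, not a direct connection to HPSearch.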
Practical Problem
• We need a Grid Information Service for managing all information associated with services in Gaggles, for:
  ◦ correlating activities of widely distributed services
    ▪ workflow-style applications
  ◦ management of events in multimedia collaboration
    ▪ providing information to enable
      - real-time replay/playback
      - session failure recovery
  ◦ enabling uniform query capabilities
    ▪ “Give me list of services satisfying C:{a,b,c..} QoS
Motivations
• Managing small-scale, highly dynamic metadata as in dynamic Grid/Web Service collections
• Performance limitations in point-to-point based service communication approaches for managing stateful service information
• Lack of support for uniform hybrid query capabilities over both static and dynamic context information
• Lack of support for adaptation to instantaneous changes in client demands
• Lack of support for distributed session
Research Issues I
• Performance
  ◦ Efficient mediator metadata strategies for service communication: high performance and persistency
• Efficient access request distribution
  ◦ How to choose a replica server to best serve a client request?
  ◦ How to provide adaptation to instantaneous changes in client demands?
• Fault-tolerance
  ◦ High availability of information
Research Issues II
• Consistency
  ◦ Provide consistency across the copies of a replica
• Flexibility
  ◦ Accommodating a broad range of application domains, such as read-dominated, read/write dominated
• Interoperability
  ◦ Being compatible with a wide range of applications
  ◦ Providing data models and programming interfaces
    ▪ to perform hybrid queries over all service metadata
    ▪ to enable real-time replay/playback or session
Proposed System
Hybrid WS-Context Service
• Fault-tolerant and high-performance Grid Information Service
  ◦ Caching module
  ◦ Publish/Subscribe for fault tolerance, distribution, consistency enforcement
  ◦ Database backend and Extended UDDI Registry
• WS-I compatible uniform programming interface
  ◦ Specification with abstract data models and a programming interface that combines WS-Context and UDDI in one hybrid service to manage service metadata
  ◦ Hybrid functions operate on both metadata spaces
  ◦ Extended WS-Context functions operate on session metadata
  ◦ Extended UDDI functions operate on
Distributed HYBRID Grid Information Services
[Figure labels: Subscriber, Publisher, Replica Server-2, Replica Server-N, Topic Based Publish-Subscribe Messaging System, HTTP(S), WSDL, Client, HYBRID]
Detailed architecture of the system
[Figure labels: Client, WSDL, HTTP(S), Ext-UDDI, WS-Context, Access, JDBC Handlers, Expeditor, Querying and Publishing, Storage, Sequencer]
Key Design Features
• External Metadata Service
  ◦ Extended UDDI Service for handling interaction-independent metadata
• Cache
  ◦ Integrated cache for all service metadata
• Access
  ◦ Redirecting client requests to an appropriate replica server
• Storage
  ◦ Replicating data on an appropriate replica server
• Consistency enforcement
Extended UDDI XML Metadata Service
• An extended UDDI XML Metadata Service
  ◦ Alternative to OGC Web Registry Services
• It supports different types of metadata
  ◦ GIS Metadata Catalog (functional metadata)
  ◦ User-defined metadata ((name, value) pairs)
• It provides unique capabilities
  ◦ Up-to-date service registry information (leasing)
  ◦ Dynamic aggregation of geospatial services
• It enables advanced query capabilities
  ◦ Geo-spatial queries
TupleSpaces Paradigm and JavaSpaces
• TupleSpaces [Gelernter-99]
  ◦ a data-centric asynchronous communication paradigm
  ◦ communication units are tuples (data structure)
• JavaSpaces [Sun Microsystems]
  ◦ Java-based object-oriented implementation
  ◦ spaces are transactional and secure
    ▪ mutually exclusive access to objects
  ◦ spaces are persistent
    ▪ temporal, spatial uncoupling
  ◦ spaces are associative
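To make "associative" concrete, here is a minimal single-process tuple space in Python. It is a sketch, not JavaSpaces: the operation names `write`, `read` and `take` mirror the JavaSpaces vocabulary, but the class, the use of `None` as a wildcard, and the absence of transactions, persistence and blocking are all simplifying assumptions.

```python
class TupleSpace:
    """Minimal single-process tuple space with associative (template) matching."""

    def __init__(self):
        self._tuples = []

    def write(self, tup):
        self._tuples.append(tuple(tup))

    @staticmethod
    def _matches(template, tup):
        # None in the template acts as a wildcard for that field.
        return len(template) == len(tup) and all(
            t is None or t == v for t, v in zip(template, tup)
        )

    def read(self, template):
        """Return a matching tuple without removing it, or None."""
        for tup in self._tuples:
            if self._matches(template, tup):
                return tup
        return None

    def take(self, template):
        """Remove and return a matching tuple, or None."""
        for i, tup in enumerate(self._tuples):
            if self._matches(template, tup):
                return self._tuples.pop(i)
        return None


space = TupleSpace()
space.write(("context", "session-1", "http://.../WMS"))
space.write(("context", "session-2", "http://.../HPSearch"))
# Look up by content rather than by address: any field may be left open.
found = space.read(("context", "session-2", None))
taken = space.take(("context", None, "http://.../WMS"))
```

Readers and writers never name each other; they only agree on the shape of the tuples, which is the uncoupling the slide refers to.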
Publish/Subscribe Paradigm and NaradaBrokering
• Publish-Subscribe communication paradigm
  ◦ Message-based asynchronous communication
  ◦ Participants are decoupled both in space and in time
• Open source NaradaBrokering software
  ◦ topic-based publish/subscribe messaging system
  ◦ runs on a network of cooperating broker nodes
  ◦ provides support for a variety of QoSs, such as low
Caching Strategy
• Integrated caching capability for both UDDI-type and WS-Context-type metadata
  ◦ light-weight implementation of JavaSpaces
  ◦ data sharing, associative lookup, and persistency
  ◦ both WS-Context-type and common UDDI-type standard operations
• The system stores all keys and modest-size values in memory, while large values are stored in the database.
  ◦ We assume that today’s servers are capable of holding such small-size metadata in cache.
  ◦ All modest-size metadata accesses happen in memory
• WS-Context type metadata is backed up into MySQL
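The split between in-memory values and database-resident values might look roughly like the sketch below. The names `HybridCache`, `put`, `get` and `in_memory` are hypothetical, a plain dictionary stands in for the MySQL backend, and the 10 KB cut-off is an assumed threshold chosen for illustration; the slides do not state the actual limit.

```python
class HybridCache:
    """Keeps every key in memory; only values above a size limit go to the backing store."""

    def __init__(self, backing_store, max_inline_bytes=10_000):
        # key -> value, or None when the value lives in the backing store
        self._memory = {}
        self._store = backing_store           # stand-in for a database table
        self._max_inline = max_inline_bytes   # assumed threshold, not from the source

    def put(self, key, value: bytes):
        if len(value) <= self._max_inline:
            self._memory[key] = value
            self._store.pop(key, None)        # drop any stale large copy
        else:
            self._memory[key] = None
            self._store[key] = value

    def get(self, key):
        if key not in self._memory:
            raise KeyError(key)
        value = self._memory[key]
        return value if value is not None else self._store[key]

    def in_memory(self, key):
        return self._memory.get(key) is not None


database = {}
cache = HybridCache(database)
cache.put("small", b"x" * 1700)     # modest-size metadata stays in memory
cache.put("big", b"x" * 50_000)     # large value is pushed to the backing store
```

Because every key is held in memory, a lookup for a missing key never touches the database.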
Performance Model and Measurements

Operation                    Average ± error (ms)   Stddev (ms)
Hybrid-WS-Context Inquiry    12.29 ± 0.02           0.48
Extended UDDI Inquiry        17.68 ± 0.06           0.84

Test setup: P4, 3.4 GHz, 1 GB memory, Java SDK 1.4.2; both client and services on the same machine
Simulation parameters: metadata size 1.7 KB
Hybrid WS-Context Caching Approach
Persistency investigation
• The figure shows the average execution time for varying backup frequency.
• The system shows stable performance until the backup frequency reaches every 10 seconds.
Hybrid WS-Context Caching Approach
Performance investigation
• 49% performance increase in inquiry and 53% performance gain in publication functions compared to the database solution.
• System processing overhead is less than 1 millisecond.
Simulation parameters: backup
Hybrid WS-Context Caching Approach
Message rate scalability investigation
• This figure shows the system behavior under increasing message rates.
• The system scales up to 940 inquiry messages/second and 480 publication messages/second.
Simulation parameters: backup
Hybrid WS-Context Caching Approach
Message size scalability investigation
• This figure shows the system behavior under increasing message sizes.
• The system performs well for small-size contexts. Performance remains the same between 100 bytes and 10 KB.
Simulation parameters: backup frequency every 10 seconds; registry size 5000 metadata; observation 200
• This figure shows the system behavior under increasing message sizes between 10 KB and 100 KB.
• The system spends an additional ~7
Access: Request Distribution
• Pub-sub system based message distribution
• Broadcast-based request dissemination based on a hashing scheme
  ◦ Keys are hashed to values (topics) that run from 1 to 1000
  ◦ Each replica holder subscribes to the topics (the hash values) of the keys it holds
  ◦ Each access request is broadcast on the topic corresponding to the key
  ◦ Replica holders unicast a response with a copy of the requested context
• Advantages
  ◦ does not flood the network with access request messages
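The hashing scheme above can be illustrated with a small in-process simulation. Everything named here is an assumption for the sketch: `Broker` is a toy stand-in for a topic-based broker (not the NaradaBrokering API), SHA-256 is an arbitrary choice of hash since the slides do not name one, and the unicast response is modelled as a direct callback.

```python
import hashlib
from collections import defaultdict

NUM_TOPICS = 1000  # topics run from 1 to 1000, as in the scheme above


def topic_for(key: str) -> int:
    """Map a context key to a topic number in 1..NUM_TOPICS (hash choice is assumed)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_TOPICS + 1


class Broker:
    """Toy in-process topic-based publish/subscribe broker."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, handler):
        self._subscribers[topic].append(handler)

    def publish(self, topic, message):
        for handler in list(self._subscribers[topic]):
            handler(message)


class ReplicaHolder:
    def __init__(self, name, broker):
        self.name = name
        self._broker = broker
        self._contexts = {}
        self._topics = set()

    def store(self, key, value):
        self._contexts[key] = value
        topic = topic_for(key)
        if topic not in self._topics:  # subscribe once per topic
            self._topics.add(topic)
            self._broker.subscribe(topic, self._on_request)

    def _on_request(self, request):
        key, reply = request
        # Different keys can share a topic, so check that this key is held here.
        if key in self._contexts:
            reply((self.name, self._contexts[key]))  # stands in for a unicast response


def request_context(broker, key):
    """Broadcast a request on the key's topic and collect the responses."""
    responses = []
    broker.publish(topic_for(key), (key, responses.append))
    return responses


broker = Broker()
replica_1 = ReplicaHolder("replica-1", broker)
replica_2 = ReplicaHolder("replica-2", broker)
replica_1.store("ctx-42", "shared state for ctx-42")
replica_2.store("ctx-7", "shared state for ctx-7")
responses = request_context(broker, "ctx-42")
```

Only holders subscribed to the key's topic see the request, which is why the scheme avoids flooding every server with every lookup.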
Access Distribution Experiment
Test Methodology
[Figure: end-to-end access time split into segments T1, T2 and T3; Time = T1 + T2 + T3]
Simulation parameters: backup frequency every 10 seconds
• The test consists of a NaradaBrokering server and two hybrid WS-Context instances for access request distribution.
• We determine the average end-to-end time of a metadata access.
• We run the system for 25000 observations.
• Gridfarm and
Distribution experiment result
• The figure shows average results for every 1000 observations. We have 25000 continuous observations.
• The average transfer time shows that continuous access distribution operation does not degrade performance.
• The figure shows the time required for the various activities of access request distribution.
• The average overhead of distribution using the pub-sub
Optimizing Performance
Dynamic migration/replication
• Dynamic migration/replication
  ◦ A methodology for creating temporary copies of a context in the proximity of its requestors.
  ◦ Autonomous decisions
    ▪ the replication decision belongs to the server
• Algorithm based on [Rabinovich et al, 1999]
  ◦ The system keeps a popularity record (number of access requests) for each copy and flushes it at regular time intervals
  ◦ The system periodically checks local data for dynamic migration or replication
  ◦ Unpopular server-initiated copies are deleted
  ◦ Popular copies are moved to where they are wanted
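A single decision round of such a popularity check could be sketched as follows. This is a simplified illustration, not the algorithm of [Rabinovich et al, 1999]: the function name `decide` and its return labels are hypothetical, the default thresholds (0.03 and 0.18 requests/second) and the 100-second interval are borrowed from the simulation parameters of the replication experiment, and choosing where a popular copy should go is left out.

```python
def decide(request_counts, interval_seconds, server_initiated,
           deletion_threshold=0.03, replication_threshold=0.18):
    """Return {key: 'delete' | 'replicate' | 'keep'} for one decision round.

    request_counts    -- access requests seen per key since the last flush
    interval_seconds  -- length of the observation interval
    server_initiated  -- keys whose local copy was created by replication
    """
    decisions = {}
    for key, count in request_counts.items():
        rate = count / interval_seconds  # requests per second
        if rate < deletion_threshold and key in server_initiated:
            decisions[key] = "delete"      # unpopular temporary copy
        elif rate > replication_threshold:
            decisions[key] = "replicate"   # popular enough to copy or move
        else:
            decisions[key] = "keep"
    return decisions


counts = {"ctx-a": 2, "ctx-b": 25, "ctx-c": 10, "ctx-d": 1}
decisions = decide(counts, interval_seconds=100, server_initiated={"ctx-a"})
# After each round the popularity record would be flushed for the next interval.
counts = {key: 0 for key in counts}
```

Note that `ctx-d` is just as unpopular as `ctx-a` but is kept, because only server-initiated copies are candidates for deletion.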
[Figure: end-to-end access time split into segments T1, T2 and T3; Time = T1 + T2 + T3]
Simulation parameters:
  message size / message rate       2.7 KB / 10 msg/sec
  replication decision frequency    every 100 seconds
  deletion threshold                0.03 request/second
  replication threshold             0.18 request/second
  registry size                     1000 metadata in Indianapolis
• The test consists of a NaradaBrokering server and two hybrid WS-Context instances for access request distribution.
• We determine the mean end-to-end metadata access time.
• We run the system for approximately 45 minutes on the Gridfarm and complexity machines.
• The figure shows average results for every 100 seconds.
• The decrease in average latency shows that
Storage: Replica content placement
• Pub-sub system for replica content placement
• Each node keeps a Replica Server Map
  ◦ A newly arriving node sends a multicast probe message when it joins a network
  ◦ Each network node responds with a unicast message to make itself discoverable
• Selection of replica server(s) for content placement
  ◦ Select a node based on a proximity weighting factor
• Sending storage request to selected replica servers
  ◦ 1st step: the initiator unicasts a storage request to each selected replica server
  ◦ 2nd step: the recipient server stores the context and subscribes to the topic of that context
Fault-tolerance experiment
Testing Setup
Simulation parameters: backup frequency every 10 seconds
• The test system consists of NaradaBrokering server(s) and four hybrid WS-Context instances separated by significant network distances.
• We determine the average end-to-end time for replica content creation.
• We run the system continuously for 25000 observations.
• Gridfarm and Teragrid
Fault-tolerance experiment result
• The figure shows average results for every 1000 observations. The system was continuously tested for 25000 observations.
• The results indicate that continuous operation does not degrade performance.
• The figure shows the results gathered from the fault-tolerance experiment data.
• Overhead of replica creation increases in the order of
Consistency enforcement
• Pub-sub system for enforcing consistency
• Primary-copy approach
  ◦ Updates of the same data are carried out at a single server
  ◦ Use of NTP-based synchronized timestamps to impose an order on write operations on the same data
• Update distribution
  ◦ 1st step: an update request is forwarded (unicast) to the primary-copy holder by the initiator
  ◦ 2nd step: the primary-copy holder performs the update request and returns an acknowledgement
• Update propagation
  ◦ The primary copy pushes (broadcasts) updates of a context on the topic (hash value) corresponding to the key of the context,
  ◦ if the primary copy realizes that there exists a stale copy in
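The primary-copy idea can be sketched in Python as a single holder that orders writes by timestamp and pushes accepted updates to its replicas. This is an illustration under assumptions: `PrimaryCopy`, `Replica`, `attach`, `update` and `apply` are hypothetical names, the "newest timestamp wins" rule is one plausible reading of ordering by synchronized timestamps, and the push is a direct method call rather than a broadcast on a topic.

```python
class Replica:
    """Holds a copy of each context together with the timestamp of its last update."""

    def __init__(self):
        self.data = {}  # key -> (timestamp, value)

    def apply(self, key, value, timestamp):
        current = self.data.get(key)
        # Ignore updates that are not newer than what is already stored.
        if current is None or timestamp > current[0]:
            self.data[key] = (timestamp, value)


class PrimaryCopy:
    """Single point where all updates to a given context are carried out."""

    def __init__(self):
        self._data = {}      # key -> (timestamp, value)
        self._replicas = []

    def attach(self, replica):
        self._replicas.append(replica)

    def update(self, key, value, timestamp):
        """Apply the write if it is newer; return True as the acknowledgement."""
        current = self._data.get(key)
        if current is not None and timestamp <= current[0]:
            return False  # older or duplicate write is rejected
        self._data[key] = (timestamp, value)
        for replica in self._replicas:  # stands in for a broadcast on the key's topic
            replica.apply(key, value, timestamp)
        return True


primary = PrimaryCopy()
replica = Replica()
primary.attach(replica)
accepted_new = primary.update("ctx-42", "v1", timestamp=100.0)
accepted_old = primary.update("ctx-42", "v0", timestamp=90.0)  # arrives late
```

Funnelling every write for a key through one holder means replicas never have to resolve conflicting updates among themselves.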
Consistency Enforcement Experiment
Test Methodology
[Figure: end-to-end time split into segments T1, T2 and T3; Time = T1 + T2 + T3]
Simulation parameters: backup frequency every 10 seconds; message size 2.7 KB
• The test system consists of a NaradaBrokering server and two hybrid WS-Context instances for access request distribution.
• We determine the average time required for enforcing consistency.
• We run the system for 25000 observations.
• Gridfarm and
Consistency Enforcement Test Result
• The figure shows average results for every 1000 observations. We have 25000 continuous observations.
• The average transfer time shows that continuous operation does not degrade performance.
• The figure shows the results gathered from the consistency experiment data.
• The results indicate that the overhead of consistency
Comparison of Experiment Results
• The figure shows the results gathered from the distribution, fault-tolerance and consistency experiment data.
• The results indicate that the overhead of integrating JavaSpaces
Contribution
• We have shown that communication among services can be achieved with efficient mediator metadata strategies.
  ◦ Efficient mediator services allow us to perform collective operations, such as queries, on subsets of all available metadata in a service conversation.
• We have shown that an efficient decentralized metadata system can be built by integrating JavaSpaces with the Publish/Subscribe paradigm.
  ◦ Fault tolerance, distribution and consistency can be achieved with a few milliseconds of system processing overhead.
• We have shown that adaptation to instantaneous changes in client demands can be achieved in decentralized metadata management.
• We have introduced data models and programming