Managing Dynamic Metadata and Context
Mehmet S. Aktas
Computer Science, Informatics, Pervasive Technology Laboratories
Indiana University, Bloomington, IN 47401
Outline
▪ Motivation
▪ Research Issues
▪ Proposed Approach
▪ Evaluation
Context as Service Metadata in Gaggle of Services
▪ Context is metadata associated with both services and their activities
  • interaction-independent
    ▪ slowly varying, quasi-static context
    ▪ Ex: type or endpoint of a service, less likely to change
  • interaction-dependent, generated as a result of interactions between services
    ▪ dynamic, frequently updated context
    ▪ information associated with a single service, a session (service activity), or both
    ▪ Ex: session-id, URI of the coordinator of a workflow session
▪ Gaggle of Services
  • set of actively collaborating managed services dynamically assembled for specific tasks
  • generate events as a result of interactions
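A minimal Java sketch of the two kinds of context described above. It assumes a hypothetical ContextEntry class (not part of WS-Context or of the FTHPIS code): each entry is tagged as interaction-independent or interaction-dependent, and a dynamic entry must be scoped to a service, a session, or both. The example.org URIs and the session-42 id are placeholders.

```java
import java.util.*;

// Hypothetical classification of context entries; names are illustrative, not from WS-Context.
enum ContextKind { INTERACTION_INDEPENDENT, INTERACTION_DEPENDENT }

final class ContextEntry {
    final String key;         // e.g. "endpoint" or "coordinator"
    final String value;
    final ContextKind kind;
    final String serviceUri;  // null when the entry is not tied to a single service
    final String sessionId;   // null when the entry is not tied to a session

    ContextEntry(String key, String value, ContextKind kind, String serviceUri, String sessionId) {
        this.key = Objects.requireNonNull(key);
        this.value = Objects.requireNonNull(value);
        this.kind = Objects.requireNonNull(kind);
        this.serviceUri = serviceUri;
        this.sessionId = sessionId;
        // Interaction-dependent context belongs to a service, a session, or both.
        if (kind == ContextKind.INTERACTION_DEPENDENT && serviceUri == null && sessionId == null) {
            throw new IllegalArgumentException("dynamic context needs a service or session scope");
        }
    }
}

final class ContextExamples {
    static List<ContextEntry> examples() {
        List<ContextEntry> list = new ArrayList<>();
        // Quasi-static: the endpoint of a service is unlikely to change.
        list.add(new ContextEntry("endpoint", "http://example.org/WMS",
                ContextKind.INTERACTION_INDEPENDENT, "http://example.org/WMS", null));
        // Dynamic: the coordinator URI of one workflow session.
        list.add(new ContextEntry("coordinator", "http://example.org/HPSearch",
                ContextKind.INTERACTION_DEPENDENT, null, "session-42"));
        return list;
    }
}
```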
Collaboration Grids
▪ Multimedia Collaboration domain
  • collaborative A/V sessions with varying types of dynamic metadata describing groups of participants
    ▪ real-time metadata describing audio/video streams
  • Collaboration Grids also have static metadata
    ▪ information about services, available sessions, and media servers
  • needs a distributed real-time session metadata management system
▪ Characteristics of the domain
  • widely distributed services
  • metadata of events (archival data)
    ▪ mostly read-only
    ▪ persistent, but lifetime is bounded by the lifetime of events
GIS/Sensor Grids
▪ Workflow-style applications in Geographic Information System and Sensor Grids
  • sensor grid data services generate events when an event of a certain magnitude occurs
  • firing off various codes: filtering and analyzing raw data, generating images and maps
  • needs a distributed workflow session metadata management system to correlate workflow activities
▪ Characteristics of the domain
  • any number of widely distributed services can be involved
  • conversation metadata
    ▪ transient
    ▪ multiple writers
Problem Space and Requirements
▪ Practical Problem: we need management of all information associated with services in a Gaggle of Services for:
  • correlating activities of widely distributed services (1, 2)
  • enabling uniform query capabilities over both dialog and monolog context information (3, 4)
    ▪ "Give me the list of services satisfying C:{a,b,c..} QoS requirements and participating in S:{x,y,z..} sessions"
  • management of events, especially in multimedia collaboration
    ▪ providing information to enable (5)
      • real-time replay/playback and
      • session failure recovery capabilities
▪ Requirements
  • dynamism
  • performance
  • uniformity
  • interoperability
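The example query combines both metadata spaces: QoS attributes are interaction-independent, while session membership is interaction-dependent. The Java sketch below shows one way such a uniform query could be evaluated in memory. It is a sketch under stated assumptions: HybridQuery is a hypothetical name, QoS requirements are treated as upper bounds (e.g. a maximum latency), and a service must participate in all of the listed sessions.

```java
import java.util.*;
import java.util.stream.Collectors;

// Hypothetical in-memory view of the two metadata spaces (not the FTHPIS API).
final class HybridQuery {
    // Interaction-independent metadata: service URI -> QoS attributes (e.g. "latencyMs" -> 20).
    final Map<String, Map<String, Integer>> qosByService = new HashMap<>();
    // Interaction-dependent metadata: session id -> participating service URIs.
    final Map<String, Set<String>> servicesBySession = new HashMap<>();

    /** Services meeting every QoS upper bound and participating in all of the given sessions. */
    List<String> find(Map<String, Integer> qosLimits, Set<String> sessions) {
        return qosByService.entrySet().stream()
                .filter(e -> qosLimits.entrySet().stream().allMatch(limit -> {
                    Integer actual = e.getValue().get(limit.getKey());
                    return actual != null && actual <= limit.getValue();
                }))
                .map(Map.Entry::getKey)
                .filter(uri -> sessions.stream().allMatch(s ->
                        servicesBySession.getOrDefault(s, Collections.emptySet()).contains(uri)))
                .sorted()
                .collect(Collectors.toList());
    }
}
```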
Different Metadata Systems - I
▪ There are different standards defining interaction-independent metadata, such as UDDI and its extensions
▪ And many different implementations, from (extended) UDDI through MCAT of the Storage Resource Broker
▪ And of course representations including RDF and OWL
▪ Further, there is system metadata (such as UDDI for core services) and metadata catalogs for each application domain, such as WRS (Web Registry Service) for GIS
▪ They have different scope and different QoS trade-offs
  • e.g. Distributed Hash Tables (Chord) to achieve scalability in …
Different Metadata Systems - II
▪ There are various technologies addressing interaction-dependent metadata.
▪ Point-to-Point
  • WS-Metadata Exchange
  • WS-Resource Framework
▪ Point-to-Point methodologies
  • are limited to exchanging metadata between only the two services involved
  • do not scale in managing activities of widely distributed services in workflow-style Grid applications
▪ WS-Context is promising, but it has limitations
  • limited query capability
  • lack of support for interaction-independent metadata
  • centralized: single point of failure, performance bottleneck
WS-Context vs. UDDI & Its Extensions
▪ Both are centralized
▪ purpose
  • WS-Context: standard way of maintaining distributed session state information
  • UDDI & its extensions: standard way of publishing and discovering generic Web Service information
▪ most desired features
  • WS-Context: high performance, light-weight storage, up-to-date entries, notification (members of an activity should be notified of the distributed state information), synchronous
  • UDDI & its extensions: better expressive power (e.g., RDF-enabled UDDI registries), up-to-date service entries, metadata-oriented discovery capabilities, domain-specific capabilities (e.g., geospatial query capabilities)
▪ scalability
  • WS-Context: Sub-Grids; a modest number of interacting Web Services participating in an activity
  • UDDI & its extensions: Whole Grid; UDDI is a domain-independent service for generic service metadata
▪ types of typical queries
  • WS-Context: simple inquiry arguments; mostly key-based retrieval queries; query selectivity is one
  • UDDI & its extensions: highly complex inquiry arguments, to improve selectivity and increase the precision of search results
▪ metadata characteristics
  • WS-Context: interaction-dependent, highly dynamic, small-size
  • UDDI & its extensions: interaction-independent, rarely changing, small-size
Motivations
▪ Lack of support for providing a uniform programming interface (with advanced query capabilities) to:
  • large-scale, relatively static metadata, as in a searchable repository of all the world's services, and session-related dynamic metadata
▪ Lack of support for managing small-scale, highly dynamic metadata, as in dynamic workflows for sensor integration and collaboration
  • fault tolerance and the ability to support dynamic changes with a delay of a few milliseconds
  • but only a modest number of involved services (up to 1000's in a session)
  • ability to adapt to instantaneous changes in client demands
Research Issues
▪ How can we achieve a standard way of publishing and inquiring both interaction-independent and conversation-based service metadata through a uniform programming interface?
▪ What is a novel architecture for a decentralized Information Service managing dynamic session-related metadata of widely distributed services?
▪ For building a decentralized metadata system, we investigate research issues related to:
  • performance
  • scalability
  • fault tolerance
Our approach: Hybrid WS-Context XML Metadata Service
▪ We designed and built a WS-Context compliant XML Metadata Service supporting distributed or centralized paradigms. This service is a Fault Tolerant and High Performance Information Service (FTHPIS).
▪ It supports the extensive metadata requirements of richly interacting systems, such as
  • correlating activities of widely distributed services, EX: workflow-style GIS Service Oriented Architectures, AND
  • optimizing Grid/Web Service messaging performance, EX: mobile computing environments, AND
  • managing dynamic events, especially in multimedia collaboration, EX: collaboration Grid/Web service applications, AND
  • providing information to enable session failure recovery
Hybrid XML Metadata Service: WS-Context + UDDI
▪ We combine the extended functionalities of two services, WS-Context AND UDDI, in one hybrid service to manage Context (service metadata).
  • extended WS-Context controlling a workflow
  • extended UDDI providing a searchable repository for services
  • This approach meets the interoperability and uniformity requirements of the problem.
▪ Our approach enables advanced query capabilities on service metadata
  • hybrid functions operating on both metadata spaces
  • extended WS-Context functions operating on session metadata (parent-child relationships are implemented)
  • extended UDDI functions operating on interaction-independent metadata
  • information security functions providing a simple …
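The parent-child relationships mentioned for session metadata can be pictured as a tree of contexts, where a session context owns its activity contexts. A minimal Java sketch, assuming a hypothetical SessionContextStore (not the FTHPIS interface) that stores context content by id, records each child's parent, and removes a context's whole subtree when the context itself is removed:

```java
import java.util.*;

// Hypothetical hierarchical context store (not the FTHPIS API).
final class SessionContextStore {
    private final Map<String, String> content = new HashMap<>();
    private final Map<String, String> parentOf = new HashMap<>();
    private final Map<String, List<String>> childrenOf = new HashMap<>();

    /** Stores a context; parentId may be null for a top-level session context. */
    void put(String contextId, String parentId, String value) {
        if (parentId != null && !content.containsKey(parentId)) {
            throw new IllegalArgumentException("unknown parent: " + parentId);
        }
        content.put(contextId, value);
        if (parentId != null) {
            parentOf.put(contextId, parentId);
            childrenOf.computeIfAbsent(parentId, k -> new ArrayList<>()).add(contextId);
        }
    }

    String get(String contextId) { return content.get(contextId); }

    /** Ids of all descendants of a context, depth-first. */
    List<String> descendants(String contextId) {
        List<String> out = new ArrayList<>();
        for (String child : childrenOf.getOrDefault(contextId, Collections.emptyList())) {
            out.add(child);
            out.addAll(descendants(child));
        }
        return out;
    }

    /** Removing a context also removes all of its children. */
    void remove(String contextId) {
        for (String child : new ArrayList<>(childrenOf.getOrDefault(contextId, Collections.emptyList()))) {
            remove(child);
        }
        content.remove(contextId);
        childrenOf.remove(contextId);
        String parent = parentOf.remove(contextId);
        if (parent != null) {
            List<String> siblings = childrenOf.get(parent);
            if (siblings != null) siblings.remove(contextId);
        }
    }
}
```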
[Figure: FTHPIS architecture. FTHPIS clients use WSDL interfaces over HTTP(S) to reach the Extended UDDI Service (uddi_extended.wsdl: Extended UDDI WSDL Service Interface Descriptions) and the Hybrid WSContext Service (uddi_wscontext.wsdl: Hybrid WSContext Service interface combining Extended UDDI and WS-Context WSDL Descriptions); each service stores its metadata in a database accessed through JDBC.]
▪ We also designed and implemented an extended UDDI XML Metadata Service (an alternative to OGC Web Registry Services).
▪ This service supports a GIS Metadata Catalog (functional metadata), user-defined metadata ((name, value) pairs), up-to-date service information (leasing), and dynamic aggregation of geospatial services.
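Leasing keeps registry entries up to date: a provider must renew its lease, and entries whose lease has expired are no longer returned. A minimal Java sketch, assuming a hypothetical LeasedRegistry that stores user-defined (name, value) pairs per service; times are passed in explicitly so the behaviour is deterministic.

```java
import java.util.*;

// Hypothetical lease-based registry; not the extended UDDI API.
final class LeasedRegistry {
    private static final class Entry {
        final Map<String, String> metadata; // user-defined (name, value) pairs
        long expiresAtMillis;
        Entry(Map<String, String> metadata, long expiresAtMillis) {
            this.metadata = metadata;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final Map<String, Entry> entries = new HashMap<>();

    void register(String serviceUri, Map<String, String> metadata, long nowMillis, long leaseMillis) {
        entries.put(serviceUri, new Entry(new HashMap<>(metadata), nowMillis + leaseMillis));
    }

    /** A provider renews its lease; returns false if the lease had already expired. */
    boolean renew(String serviceUri, long nowMillis, long leaseMillis) {
        Entry e = entries.get(serviceUri);
        if (e == null || e.expiresAtMillis <= nowMillis) return false;
        e.expiresAtMillis = nowMillis + leaseMillis;
        return true;
    }

    /** Drops expired entries, then returns services whose metadata has name == value. */
    List<String> find(String name, String value, long nowMillis) {
        entries.values().removeIf(e -> e.expiresAtMillis <= nowMillis);
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, Entry> me : entries.entrySet()) {
            if (value.equals(me.getValue().metadata.get(name))) out.add(me.getKey());
        }
        Collections.sort(out);
        return out;
    }
}
```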
▪ Our approach enables advanced query capabilities:
  • geospatial and temporal queries,
  • metadata-oriented queries,
  • domain-independent queries, such as XPath queries on the metadata catalog.
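Because the catalog is XML, a domain-independent XPath query can also express a simple geospatial predicate. The Java sketch below uses the standard javax.xml.xpath API; the catalog format (service elements with a bbox child) is a hypothetical stand-in, since real GIS catalog schemas are richer. The second query in the test selects services whose bounding box contains the point (15, 35).

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class CatalogXPath {
    // Hypothetical catalog format used only for illustration.
    static final String CATALOG =
        "<catalog>"
      + "<service name='WMS-1' type='WMS'><bbox minx='10' miny='30' maxx='20' maxy='40'/></service>"
      + "<service name='WFS-1' type='WFS'><bbox minx='50' miny='35' maxx='60' maxy='45'/></service>"
      + "</catalog>";

    /** Evaluates an XPath expression and returns the name attribute of each matched element. */
    static List<String> query(String xml, String expr) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(expr, doc, XPathConstants.NODESET);
            List<String> names = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                names.add(((Element) nodes.item(i)).getAttribute("name"));
            }
            return names;
        } catch (Exception e) {
            throw new IllegalStateException("catalog query failed: " + expr, e);
        }
    }
}
```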
Key Design Features
▪ Message Dissemination
  • communication method among the nodes of the network
▪ Caching
  • use of built-in in-memory storage running on each node to minimize latency and meet the performance requirement
▪ Access
  • methodology for redirecting a client request to an appropriate replica server to meet the dynamism and performance requirements
▪ Storage
  • methodology for replicating data to meet fault-tolerance and performance requirements
▪ Consistency enforcement
Message Dissemination
▪ Publish-Subscribe exploited to support replicated storage, e.g.
  • initial storage of context
  • dissemination of context access requests
  • dissemination of updates to make copies consistent
▪ We used the open-source NaradaBrokering software to provide a multi-publisher multicast communication mechanism
  • topic-based publish/subscribe messaging system
  • runs on a network of cooperating broker nodes
  • provides support for a variety of QoSs, such as low latency, …
[Figure: Distributed Hybrid WS-Context XML Metadata Services. WSDL clients connect over HTTP(S) to replica servers (Replica Server-2 … Replica Server-N shown), each hosting a Hybrid-WSContext Service backed by a database through JDBC, together with an Extended UDDI Service; the replica servers act as publishers and subscribers on a Topic Based Publish-Subscribe Messaging System.]
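To illustrate topic-based dissemination, here is an in-process Java stand-in for a broker: every subscriber to a topic receives each message published on it, which is how replica servers can all receive the same update. This is NOT the NaradaBrokering API; TopicBroker and the topic names such as context/update are hypothetical.

```java
import java.util.*;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Consumer;

// Hypothetical in-process topic broker; a real deployment would use a broker network.
final class TopicBroker {
    private final Map<String, List<Consumer<String>>> subscribers = new ConcurrentHashMap<>();

    void subscribe(String topic, Consumer<String> handler) {
        subscribers.computeIfAbsent(topic, t -> new CopyOnWriteArrayList<>()).add(handler);
    }

    /** Delivers the message to every subscriber of the topic (multicast); returns the fan-out. */
    int publish(String topic, String message) {
        List<Consumer<String>> handlers = subscribers.getOrDefault(topic, Collections.emptyList());
        for (Consumer<String> h : handlers) h.accept(message);
        return handlers.size();
    }
}
```

Separate topics for storage, access requests, and updates would let each replica subscribe only to the traffic it needs.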
Caching Strategy
▪ TupleSpaces paradigm exploited to support caching
  • asynchronous communication
  • pioneered by David Gelernter
  • communication units are tuples
    ▪ data structure consisting of one or more typed fields
▪ Hybrid WS-Context Service employs/extends TupleSpaces:
  • use of a light-weight implementation of JavaSpaces
  • all memory accesses; overhead is negligible (less than 1 msec. for inquiries)
  • data sharing: mutually exclusive access to tuples
  • associative lookup: content-based search, appropriate for key-based caching
  • temporal and spatial uncoupling of communicating parties
  • e.g. a tuple: ("context_id", Context). This indicates a tuple with two fields: a) a string, "context_id", and b) a Java object, "Context".
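A minimal tuple space in the spirit of the description above, written in plain Java. It is not the JavaSpaces API: TupleSpace is a hypothetical class, lookups are non-blocking (they return null when nothing matches), and a null field in a template acts as a wildcard for associative lookup. Synchronized methods give mutually exclusive access, so two clients cannot take the same tuple.

```java
import java.util.*;

// Hypothetical minimal tuple space; not the JavaSpaces interface.
final class TupleSpace {
    private final List<Object[]> tuples = new ArrayList<>();

    synchronized void write(Object... fields) {
        tuples.add(fields.clone());
    }

    /** Non-destructive associative lookup; null fields in the template are wildcards. */
    synchronized Object[] read(Object... template) {
        for (Object[] t : tuples) if (matches(t, template)) return t.clone();
        return null;
    }

    /** Destructive lookup: the matching tuple is removed from the space. */
    synchronized Object[] take(Object... template) {
        Iterator<Object[]> it = tuples.iterator();
        while (it.hasNext()) {
            Object[] t = it.next();
            if (matches(t, template)) { it.remove(); return t; }
        }
        return null;
    }

    private static boolean matches(Object[] tuple, Object[] template) {
        if (tuple.length != template.length) return false;
        for (int i = 0; i < tuple.length; i++) {
            if (template[i] != null && !template[i].equals(tuple[i])) return false;
        }
        return true;
    }
}
```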
Access: Request Distribution
▪ Peer-to-peer based message distribution methodology exploited for redirecting a client request to the appropriate replica server
  • use of the pub-sub system for request distribution
  • broadcast-based dissemination of Context access requests
  • servers that can satisfy the query unicast a response with a copy of the requested context
▪ Advantages: does not keep track of the location of every single data item, makes use of redundant copies kept only for fault-tolerance reasons, improves responsiveness
▪ Practical Problem: if the number of repetitive queries that require probing the network increases, this may amplify network consumption and affect system performance
▪ Approach: use of dynamic replication for moving/replicating …
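The broadcast-and-first-answer pattern can be simulated in Java. In this sketch, under stated assumptions, each replica server is modelled as an in-memory map and the pub-sub broadcast is replaced by submitting one task per replica to a thread pool; ExecutorService.invokeAny returns the result of the first task that completes successfully. BroadcastAccess is a hypothetical name, and this is not how NaradaBrokering is invoked.

```java
import java.util.*;
import java.util.concurrent.*;

// Hypothetical simulation of broadcast-based context access.
final class BroadcastAccess {
    static String lookup(List<Map<String, String>> replicas, String contextId, ExecutorService pool) {
        List<Callable<String>> requests = new ArrayList<>();
        for (Map<String, String> replica : replicas) {
            requests.add(() -> {
                String copy = replica.get(contextId);
                // Servers without a copy stay silent (modelled as a failed task).
                if (copy == null) throw new NoSuchElementException(contextId);
                return copy; // unicast response carrying a copy of the context
            });
        }
        try {
            return pool.invokeAny(requests); // first successful answer wins
        } catch (ExecutionException noReplicaHadIt) {
            return null; // no replica could satisfy the request
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return null;
        }
    }
}
```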
Storage: Replica placement
▪ Peer-to-peer based message distribution methodology exploited for creating the initial permanent copies of a context
  • use of the pub-sub system for permanent replication
  • use of non-blocking replica placement
  • 1st step: the initiator creates a temporary copy at every capable replica server
  • 2nd step: the initiator keeps permanent copies only at the first few answering replica servers, for fault tolerance
▪ Advantages: [1] the publishing client does not block until replication is completed, [2] a temporary full-replication methodology is exploited to improve responsiveness, [3] permanent copies remain as a backup facility to meet the …
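The two-step, non-blocking placement can be sketched in Java with a CompletionService, which yields acknowledgements in the order servers finish. Assumptions: servers are simulated as in-memory maps, ReplicaPlacement and Server are hypothetical names, and the number of permanent copies k is a parameter rather than a value from the prototype. The publisher gets a CompletableFuture back immediately, so it does not block.

```java
import java.util.*;
import java.util.concurrent.*;

// Hypothetical two-step replica placement over simulated servers.
final class ReplicaPlacement {
    static final class Server {
        final String name;
        final Map<String, String> temporary = new ConcurrentHashMap<>();
        final Map<String, String> permanent = new ConcurrentHashMap<>();
        Server(String name) { this.name = name; }
    }

    /** Step 1: temporary copy everywhere. Step 2: first k acknowledgers keep permanent copies. */
    static CompletableFuture<List<String>> place(List<Server> servers, String id, String ctx,
                                                 int k, ExecutorService pool) {
        return CompletableFuture.supplyAsync(() -> {
            CompletionService<Server> acks = new ExecutorCompletionService<>(pool);
            for (Server s : servers) {
                acks.submit(() -> { s.temporary.put(id, ctx); return s; });
            }
            List<String> permanentAt = new ArrayList<>();
            try {
                for (int i = 0; i < servers.size(); i++) {
                    Server s = acks.take().get(); // acknowledgements arrive in completion order
                    if (permanentAt.size() < k) {
                        s.permanent.put(id, ctx);
                        permanentAt.add(s.name);
                    }
                }
            } catch (InterruptedException | ExecutionException e) {
                throw new CompletionException(e);
            }
            return permanentAt;
        });
    }
}
```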
Storage: Dynamic replication
▪ Dynamic replication methodology exploited for creating server-initiated (temporary) copies of a context
  • use of the pub-sub system for server-initiated replication
  • the replication decision belongs to the server (autonomous)
  • we keep a popularity record (# of access requests) for each copy of a context and flush it at regular time intervals
  • unpopular server-initiated copies of a context are deleted
  • popular copies of a context are moved into the proximity of their requestors (where the requests originate)
  • very popular copies of a context are replicated in the proximity of their requestors (where the requests originate)
▪ Advantages: [1] this strategy exploits locality, which in turn …
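The popularity rules above amount to a per-interval decision for each server-initiated copy. A Java sketch, assuming a hypothetical DynamicReplicationPolicy with three illustrative thresholds (the prototype's actual thresholds are not given here): counts are recorded per access, a decision is made at the end of each interval, and the counters are then flushed.

```java
import java.util.*;

// Hypothetical popularity-driven policy; thresholds are illustrative, not from FTHPIS.
final class DynamicReplicationPolicy {
    enum Action { DELETE, KEEP, MIGRATE, REPLICATE }

    private final int deleteBelow, migrateAtLeast, replicateAtLeast;
    private final Map<String, Integer> accessCounts = new HashMap<>();

    DynamicReplicationPolicy(int deleteBelow, int migrateAtLeast, int replicateAtLeast) {
        if (!(deleteBelow <= migrateAtLeast && migrateAtLeast <= replicateAtLeast))
            throw new IllegalArgumentException("thresholds must be ordered");
        this.deleteBelow = deleteBelow;
        this.migrateAtLeast = migrateAtLeast;
        this.replicateAtLeast = replicateAtLeast;
    }

    void recordAccess(String contextId) { accessCounts.merge(contextId, 1, Integer::sum); }

    /** Called once per interval: decide for each server-initiated copy, then flush the counters. */
    Map<String, Action> endOfInterval(Collection<String> serverInitiatedCopies) {
        Map<String, Action> decisions = new TreeMap<>();
        for (String id : serverInitiatedCopies) {
            int n = accessCounts.getOrDefault(id, 0);
            Action a;
            if (n >= replicateAtLeast) a = Action.REPLICATE;   // very popular: copy near requestors
            else if (n >= migrateAtLeast) a = Action.MIGRATE;  // popular: move near requestors
            else if (n < deleteBelow) a = Action.DELETE;       // unpopular: drop the copy
            else a = Action.KEEP;
            decisions.put(id, a);
        }
        accessCounts.clear(); // popularity records are flushed at regular intervals
        return decisions;
    }
}
```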
Consistency enforcement
▪ Consistency enforcement methodologies exploited to keep copies of a context consistent
  • use of a weak consistency model: copies of a context can differ; however, updates are propagated to replicas whenever needed for a consistent view of the information
  • use of the pub-sub system for update propagation
  • use of a primary-copy approach: all updates for a specific context are initiated at a single server
  • use of synchronized timestamps (as versions) to give a sequence number to each published context, imposing an order on concurrent write operations on the same data
  • updates are pulled by a replica server from the primary copy if the replica server realizes that it has a stale copy
  • updates are pushed (broadcast) by the primary copy if it …
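A Java sketch of the push and pull paths with timestamps as versions. Assumptions: VersionedReplica is a hypothetical class; a pushed update is applied only if its timestamp is newer than the local copy; and how a replica learns that its copy is stale is modelled by passing in the latest known timestamp, which is an illustrative simplification.

```java
import java.util.*;

// Hypothetical primary-copy replica with timestamp versions (weak consistency sketch).
final class VersionedReplica {
    static final class Versioned {
        final String value;
        final long timestamp; // synchronized timestamp used as the version
        Versioned(String value, long timestamp) { this.value = value; this.timestamp = timestamp; }
    }

    private final Map<String, Versioned> store = new HashMap<>();

    /** Push path: apply a broadcast update only if it is newer than the local copy. */
    boolean onUpdate(String id, Versioned update) {
        Versioned local = store.get(id);
        if (local != null && local.timestamp >= update.timestamp) return false; // stale or duplicate
        store.put(id, update);
        return true;
    }

    /** Pull path: if the local copy is older than the latest known version, fetch from the primary. */
    Versioned read(String id, long latestKnownTimestamp, VersionedReplica primary) {
        Versioned local = store.get(id);
        if (local == null || local.timestamp < latestKnownTimestamp) {
            Versioned fresh = primary.store.get(id);
            if (fresh != null) onUpdate(id, fresh);
            return fresh;
        }
        return local;
    }
}
```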
Consistency enforcement - II
▪ Advantage: this strategy employs a non-blocking primary-copy approach, so the publisher does not block until an update operation is completed, which in turn improves responsiveness
▪ Practical Problems: [1] with this strategy, one cannot update a data item more frequently than once per 30 milliseconds, which is the accuracy of the NaradaBrokering NTP-based synchronized timestamps. [2] with this strategy, a client cannot be sure that the update operation was carried out correctly.
▪ Approach: 1 update operation per 30 milliseconds is acceptable
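The 30 ms limit follows from timestamp accuracy: two updates stamped less than 30 ms apart cannot be ordered reliably. A hedged Java sketch of a guard that enforces this at the primary copy, assuming a hypothetical UpdateRateGuard keyed by context id; the resolution is a constructor parameter.

```java
import java.util.*;

// Hypothetical guard: accept at most one update per context per resolution window,
// because timestamps closer than the synchronization accuracy cannot be ordered reliably.
final class UpdateRateGuard {
    private final long resolutionMillis;
    private final Map<String, Long> lastAccepted = new HashMap<>();

    UpdateRateGuard(long resolutionMillis) { this.resolutionMillis = resolutionMillis; }

    synchronized boolean tryAccept(String contextId, long timestampMillis) {
        Long last = lastAccepted.get(contextId);
        if (last != null && timestampMillis - last < resolutionMillis) return false;
        lastAccepted.put(contextId, timestampMillis);
        return true;
    }
}
```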
Prototype Evaluation
▪ We evaluated the prototype implementation for three distinct aspects of distributed systems:
  ➢ Performance
    ▪ baseline performance
    ▪ effect of network latency on the baseline performance
  ➢ Scalability
    ▪ performance degradation of the system under increasing message sizes or message rates
    ▪ scalability gain, both in numbers and in performance, when moving from a centralized system to a distributed system under the same workload
  ➢ Fault tolerance
    ▪ the empirical cost of fault tolerance in terms of …
TESTBED: Cluster node configuration
▪ Processor: Intel® Xeon™ CPU (2.40GHz)
▪ RAM: 2GB total
▪ Network Bandwidth: 900 Mbits/sec. (among the cluster nodes) [1]
▪ OS: GNU/Linux (kernel release 2.4.22)
▪ Java Version: Java 2 platform, Standard Edition (1.4.2-beta-b19)
▪ SOAP Engine: Axis 2 (in Tomcat 5.5.8)
Test-4. Extended UDDI inquiry/publication: single-threaded extended UDDI WSDL client (1 user/1000 transactions) against the Extended UDDI Server
Test-1. Dummy Server: single-threaded WSDL client (1 user/1000 transactions) against a Dummy Server
Test-2. Hybrid-WSContext inquiry/publication without database access
▪ If a query can be satisfied by the JavaSpaces cache, it is answered in < 1 ms plus a few milliseconds of Web service overhead
▪ Performance of standard operations is comparable with existing metadata management services
  Avg. latency for inquiries (Metadata Services): JWSDUDDI-MT 18.99 ms, 20.37 ms; JUDDI 40 ms
TEST-1: Hybrid-WSContext inquiry/publication with increasing message sizes
[Figure: single-threaded WS-Context WSDL client (1 user/100 transactions) sending requests over HTTP(S) to Hybrid FTHPIS-WSContext Service instances, each with a WSDL thread pool, a Publishing/Querying Module, an Expeditor, and a JDBC Handler.]
5 clients distributed to cluster nodes 1 to 5, each running 1 to 15 threads
➢ The results indicate that the system performs well for small-size context payloads.
➢ The results also indicate that the cost of inquiry and publication operations remains the same as the context's payload size increases from 100 Bytes up to 10 KBytes.
➢ The system can scale up to 940 simultaneous querying clients and 222 simultaneous publishing clients, where each client sends one query per second, for small-size context payloads with a 30-millisecond backup interval for fault tolerance.
➢ Multi-core hosts are expected to improve performance substantially.
5 clients distributed to cluster nodes 1 to 5, each running 1 to 15 threads firing messages to randomly selected servers.
➢ We investigate scalability when moving from a centralized server to a distributed one under heavy workloads.
➢ Numbered rectangles correspond to an N-node FTHPIS system with various Publish-Subscribe topologies (this does NOT affect performance).
➢ 5 different FTHPIS systems were tested, with N ranging from 1 to 5, under the same workload.
➢ In each test case, the same volume of data is evenly distributed among the nodes.
[Figure: N-node FTHPIS topologies for N = 1 to 5 (node-1 … node-5)]
➢ The scalability of the metadata store can be increased by moving from a centralized service to a distributed system.
Hybrid WS-Context inquiry operation
  # of nodes | message rate | mean ± error (ms) | Stdev (ms)
  1 | 940 | 47.05 ± 0.24 | 33.52
  2 | 1005 | 40.76 ± 0.43 | 38.22
Non-optimal caching algorithm, as it does database access BEFORE Publish-Subscribe. Reversing this choice should lead to throughput …
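The relative gain from the Hybrid WS-Context inquiry measurements (1 node: message rate 940, mean 47.05 ms; 2 nodes: message rate 1005, mean 40.76 ms) can be worked out directly from those two rows; no other data is assumed.

```latex
\begin{aligned}
\frac{1005}{940} &\approx 1.069 &&\Rightarrow\ \text{about } 6.9\%\ \text{higher message rate}\\
\frac{47.05 - 40.76}{47.05} = \frac{6.29}{47.05} &\approx 0.134 &&\Rightarrow\ \text{about } 13.4\%\ \text{lower mean latency}
\end{aligned}
```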
FAULT-TOLERANCE TEST RESULTS
▪ Fault tolerance vs. performance trade-off
▪ The lower the level of fault tolerance, the higher the performance.
Summary of Contributions
▪ specification for managing all service metadata
  • a method to achieve a uniform programming interface to both interaction-independent and session-related metadata. This method also introduces a data model for storing session-related metadata
▪ specification for managing interaction-independent service metadata
  • a method to achieve Geographical Information Systems compatible, domain-independent, and metadata-oriented management of interaction-independent service metadata
▪ fault-tolerant and high-performance information service
  • a method to achieve management of dynamic metadata and Context in …
Future work
▪ transaction scheduling
  • investigate how to minimize the time required to complete transactions on two different metadata systems with different time constraints
▪ evaluation of dynamic replication
  • carry out simulations to evaluate dynamic replication
▪ optimal caching methodologies
  • implement and test more optimal caching methodologies
▪ smoothing the impact of backups on performance
  • investigate how to minimize the impact of the time spent (high …
Summary of machine configurations
▪ gf6.ucs.indiana.edu (Bloomington, IN, USA): Intel® Xeon™ CPU (2.40GHz); 2GB RAM; GNU/Linux (kernel release 2.4.22); Java 2 Standard Edition (1.4.2-beta-b19)
▪ complexity.ucs.indiana.edu (Indianapolis, IN, USA): Sun-Fire-880, sun4u sparc SUNW; 16GB RAM; SunOS 5.9; Java HotSpot(TM) 64-Bit Server VM (1.4.2-01)
▪ lonestar.tacc.utexas.edu (Austin, TX, USA): Intel(R) Xeon(TM) CPU 3.20GHz; 4GB RAM; GNU/Linux (kernel release 2.6.9); Java 2 Standard Edition (1.4.2-beta-b19)
▪ tg-login.sdsc.teragrid.org (San Diego, CA, USA): GenuineIntel IA-64, Itanium 2, 4 processors; 8GB RAM; GNU/Linux; Java 2 Standard Edition (1.4.2-beta-b19)
▪ vlab2.scs.fsu.edu (Tallahassee, FL, USA): Dual Core AMD Opteron(tm) Processor 270; 2GB RAM; GNU/Linux (kernel release 2.6.16); Java 2 Standard Edition (1.4.2-beta-b19)
SOAP header:
<?xml version="1.0" encoding="UTF-8"?>
<soap:Envelope xmlns:soap="http://www.w3...">
  <soap:Header soap:encodingStyle="URL">
    <context xmlns="ctxt schema" timeout="100" soap:mustUnderstand="true">
      <context-id>http..</context-id>
      <context-service>http..</context-service>
      <context-manager>http..</context-manager>
      <activity-list mustUnderstand="true" mustPropagate="true">
        <p-service>http://../WMS</p-service>
        <p-service>http://../HPSearch</p-service>
      </activity-list>
    </context>
  </soap:Header>
</soap:Envelope>
The Pattern Informatics GIS-SOA based workflow application
5, 6: WMS starts a session and invokes HPSearch to run the workflow script for the PI Code with a session id
7, 8, 9: HPSearch runs the workflow script and generates an output file in GML format (and PDF format) as a result
10: HPSearch writes the URI of the output file into the Context
11: WMS polls the information from the Context Service
12: WMS retrieves the output file generated by the workflow script and generates a map
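Steps 5 to 12 can be simulated in Java to show how the Context Service decouples WMS from HPSearch: WMS never waits on HPSearch directly, it only polls the shared context for the output URI. Everything here is hypothetical: PatternInformaticsFlow, the key naming scheme sessionId + "/output-uri", and the example.org output location; the workflow itself is reduced to a single write.

```java
import java.util.*;
import java.util.concurrent.*;

// Hypothetical simulation of steps 5-12 with an in-memory stand-in for the Context Service.
final class PatternInformaticsFlow {
    private final Map<String, String> contextService = new ConcurrentHashMap<>();

    /** Steps 7-10: HPSearch runs the workflow, then writes the output URI into the session context. */
    void hpSearch(String sessionId) {
        String outputUri = "http://example.org/output/" + sessionId + ".gml"; // hypothetical location
        contextService.put(sessionId + "/output-uri", outputUri);
    }

    /** Steps 5, 6, 11, 12: WMS starts a session, invokes HPSearch, then polls the Context Service. */
    String wms(long timeoutMillis) {
        String sessionId = UUID.randomUUID().toString();             // step 5: new session id
        CompletableFuture.runAsync(() -> hpSearch(sessionId));        // step 6: asynchronous invocation
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (System.currentTimeMillis() < deadline) {
            String uri = contextService.get(sessionId + "/output-uri"); // step 11: poll
            if (uri != null) return uri; // step 12: fetch this file and render a map (not shown)
            try {
                Thread.sleep(10);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return null;
            }
        }
        return null; // timed out
    }
}
```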
service:
<context xsd:type="ContextType" timeout="100">
  <context-service>http://.../HPSearch</context-service>
  <content>HPSearch-associated additional data generated during execution of the workflow.</content>
</context>

session:
<context xsd:type="ContextType" timeout="100">
  <context-service>http://.../WMS</context-service>
  <activity-list mustUnderstand="true" mustPropagate="true">
    <service>http://.../WMS</service>
    <service>http://.../HPSearch</service>
  </activity-list>
</context>

user profile:
<context xsd:type="ContextType" timeout="100">
  <context-service>http://.../HPSearch</context-service>
  <parent-context>http://../abcdef:012345</parent-context>
  <content>profile information related to WMS</content>
</context>

shared state:
<context xsd:type="ContextType" timeout="100">
  <context-id>http://../abcdef:012345</context-id>
  <context-service>http://.../HPSearch</context-service>
  <content>http://danube.ucs.indiana.edu:8080/x.xml</content>
</context>

activity:
<context xsd:type="ContextType" timeout="100">
  <context-service>http://.../HPSearch</context-service>
  <parent-context>http://../abcdef:012345</parent-context>
  <content>shared data for HPSearch activity</content>
  <activity-list mustUnderstand="true" mustPropagate="true">
    <service>http://.../DataFilter1</service>
    <service>http://.../PICode</service>
    <service>http://.../DataFilter2</service>
  </activity-list>
</context>

[Figure: Pattern Informatics workflow. WMS Client, WMS, WFS, Extended UDDI, HPSearch, Data Filter, PI Code, Data Filter, and the Context Information Service, connected by numbered interaction steps (0 to 11); intermediate data at http://..../..../..txt and output at http://..../..../x.gml.]