Managing Dynamic Metadata and Context

(1)

Mehmet S. Aktas

Computer Science, Informatics, Pervasive Technology Laboratories

Indiana University, Bloomington, IN 47401

(2)

Outline

• Motivation
• Research Issues
• Proposed Approach
• Evaluation

(3)

Context as Service Metadata in Gaggle of Services

• Context is metadata associated with both services and their activities
  - interaction-independent
    - slowly varying, quasi-static context
    - Ex: type or endpoint of a service, which is less likely to change
  - interaction-dependent, generated as a result of the interaction of services
    - dynamic, frequently updated context
    - information associated with a single service, a session (service activity), or both
    - Ex: session-id, URI of the coordinator of a workflow session
• Gaggle of Services
  - set of actively collaborating managed services dynamically assembled for specific tasks
  - generate events as a result of interactions

(4)

Collaboration Grids

• Multimedia Collaboration domain
  - collaborative A/V sessions with varying types of dynamic metadata describing the group of participants
    - real-time metadata describing audio/video streams
  - Collaboration Grids also have static metadata
    - information about services, available sessions, and media servers
  - needs a distributed real-time session metadata management system
• Characteristics of the domain
  - widely distributed services
  - metadata of events (archival data)
    - mostly read-only
    - persistent, but lifetime is bounded by the lifetime of events

(5)

GIS/Sensor Grids

• Workflow-style applications in Geographic Information System and Sensor Grids
  - sensor grid data services generate events when an event of a certain magnitude occurs
  - firing off various codes, filtering, analyzing raw data, generating images, maps
  - needs a distributed workflow session metadata management system to correlate workflow activities
• Characteristics of the domain
  - any number of widely distributed services can be involved
  - conversation metadata
    - transient
    - multiple writers

(6)

Problem Space and Requirements

• Practical Problem: We need management of all information associated with services in a Gaggle of Services for:
  - correlating activities of widely distributed services (1, 2)
  - enabling uniform query capabilities over both dialog and monolog context information (3, 4)
    - "Give me the list of services satisfying C:{a,b,c..} QoS requirements and participating in S:{x,y,z..} sessions"
  - management of events, especially in multimedia collaboration
    - providing information to enable (5)
      - real-time replay/playback and
      - session failure recovery capabilities
• Requirements
  - dynamism
  - performance
  - uniformity
  - interoperability
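The example query on this slide spans both metadata spaces. A minimal sketch of how such a uniform query might look, assuming a hypothetical in-memory `ServiceRecord` that carries QoS attributes (interaction-independent) alongside session memberships (interaction-dependent). The names, and the reading that a service must meet every QoS requirement and take part in at least one listed session, are assumptions made for illustration and are not taken from the FTHPIS interface.

```python
from dataclasses import dataclass, field


@dataclass
class ServiceRecord:
    """Hypothetical record joining both metadata spaces for one service."""
    name: str
    qos: set = field(default_factory=set)       # interaction-independent metadata
    sessions: set = field(default_factory=set)  # interaction-dependent metadata


def find_services(records, required_qos, sessions):
    """Return names of services that meet every QoS requirement and
    participate in at least one of the given sessions (assumed semantics)."""
    required_qos, sessions = set(required_qos), set(sessions)
    return [
        r.name
        for r in records
        if required_qos <= r.qos and r.sessions & sessions
    ]


# Illustrative data only.
registry = [
    ServiceRecord("WMS", {"a", "b", "c"}, {"x"}),
    ServiceRecord("WFS", {"a", "b"}, {"x", "y"}),
    ServiceRecord("HPSearch", {"a", "b", "c"}, {"q"}),
]
matching = find_services(registry, {"a", "b", "c"}, {"x", "y"})
```

Here only the first record satisfies both conditions: the second lacks QoS attribute `c`, and the third participates in neither session.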

(7)

Different Metadata Systems- I

• There are different standards defining interaction-independent metadata, such as UDDI and its extensions
• And many different implementations, from (extended) UDDI through MCAT of the Storage Resource Broker
• And of course representations including RDF and OWL
• Further, there is system metadata (such as UDDI for core services) and metadata catalogs for each application domain, such as WRS (Web Registry Service) for GIS
• They have different scopes and different QoS trade-offs
  - e.g. Distributed Hash Tables (Chord) to achieve scalability in

(8)

Different Metadata Systems- II

• There are various technologies addressing interaction-dependent metadata.
• Point-to-Point
  - WS-Metadata Exchange
  - WS-Resource Framework
• Point-to-Point methodologies
  - are limited to metadata communicated between only the two services involved.
  - do not scale in managing activities of widely distributed services in workflow-style grid applications
• WS-Context is promising, but it has limitations
  - limited query capability
  - lack of support for interaction-independent metadata
  - centralized: single point of failure, performance bottleneck

• Centralized
WS-Context
• Centralized

(9)

WS-Context vs. UDDI & Its Extensions

Metadata characteristics
  WS-Context: interaction-dependent, highly dynamic, small-size
  UDDI & its extensions: interaction-independent, rarely-changing, small-size

Types of typical queries
  WS-Context: simplicity in inquiry arguments, mostly key-based retrieval queries, selectivity of queries is one.
  UDDI & its extensions: high degree of complexity in inquiry arguments to improve the selectivity and increase the precision in the search results

Scalability
  WS-Context: Sub-Grids, modest number of interacting Web Services participating in an activity
  UDDI & its extensions: Whole Grid, UDDI is a domain-independent service for generic service metadata

Most desired features
  WS-Context: high performance, light-weight storage, up-to-date entries, notification (members of an activity should be notified of the distributed state information), synchronous
  UDDI & its extensions: better expressiveness power (e.g., RDF-enabled UDDI Registries), up-to-date service entries, metadata-oriented discovery capabilities, domain-specific capabilities (e.g., geospatial query capabilities)

Purpose
  WS-Context: standard way of maintaining distributed session state information
  UDDI & its extensions: standard way of publishing, discovering generic Web Service information

(10)

Motivations

• Lack of support for providing a uniform programming interface (with advanced query capabilities) to
  - large-scale, relatively static metadata, as in a searchable repository of all the world's services, and session-related dynamic metadata
• Lack of support for managing small-scale, highly dynamic metadata, as in dynamic workflows for sensor integration and collaboration
  - fault-tolerance and ability to support dynamic changes with a delay of a few milliseconds
  - but only a modest number of involved services (up to thousands in a session)
  - ability to adapt to instantaneous changes in client demands

(11)

Research Issues

• How can we achieve a standard way of publishing and inquiring both interaction-independent and conversation-based service metadata through a uniform programming interface?
• What is a novel architecture for a decentralized Information Service managing dynamic session-related metadata of widely distributed services?
• For building a decentralized metadata system, we investigate research issues related to:
  - performance
  - scalability
  - fault-tolerance

(12)

Our approach: Hybrid WS-Context XML Metadata Service

• We designed and built a WS-Context compliant XML Metadata Service supporting distributed or central paradigms. This service is a Fault Tolerant and High Performance Information Service (FTHPIS).
• It supports the extensive metadata requirements of rich interacting systems, such as
  - correlating activities of widely distributed services, Ex: workflow-style GIS Service Oriented Architectures, AND
  - optimizing Grid/Web Service messaging performance, Ex: mobile computing environment, AND
  - managing dynamic events, especially in multimedia collaboration, Ex: collaboration Grid/Web service applications, AND
  - providing information to enable session failure recovery

(13)

Hybrid XML Metadata Service: WS-Context + UDDI

• We combine the extended functionalities of these two services, WS-Context and UDDI, in one hybrid service to manage Context (service metadata).
  - extended WS-Context controlling a workflow
  - extended UDDI providing a searchable repository for services
  - This approach meets the interoperability and uniformity requirements of the problem.
• Our approach enables advanced query capabilities on service metadata
  - hybrid functions operating on both metadata spaces
  - extended WS-Context functions operating on session metadata (parent-child relationships are implemented)
  - extended UDDI functions operating on interaction-independent metadata
  - information security functions providing a simple

(14)

[Figure: service architecture. Diagram labels: FTHPIS Client; HTTP(S); Hybrid WSContext Service with its WSDL interface (uddi_wscontext.wsdl, the Hybrid WSContext Service interface combining the Extended UDDI and WS-Context WSDL descriptions); Extended UDDI Service with its WSDL interface (uddi_extended.wsdl, the Extended UDDI WSDL service interface descriptions), reached over HTTP; each service connected to a Database through JDBC.]

(15)

• We also designed and implemented an extended UDDI XML Metadata Service (an alternative to OGC Web Registry Services). This service
  - supports a GIS Metadata Catalog (functional metadata), user-defined metadata ((name, value) pairs), up-to-date service information (leasing), and dynamic aggregation of geospatial services.
• Our approach enables advanced query capabilities
  - geospatial and temporal queries,
  - metadata-oriented queries,
  - domain-independent queries, such as XPath queries on the metadata catalog.

(16)

Key Design Features

• Message Dissemination
  - communication method among the nodes of the network
• Caching
  - use of in-memory storage running on each node to minimize latency and meet the performance requirement
• Access
  - methodology for redirecting a client request to an appropriate replica server to meet the dynamism and performance requirements
• Storage
  - methodology for replicating data to meet the fault-tolerance and performance requirements
• Consistency enforcement

(17)

Message Dissemination

• Publish-Subscribe is exploited to support replicated storage, e.g.
  - Initial storage of context
  - Dissemination of context access requests
  - Dissemination of updates to make copies consistent
• We used the open source NaradaBrokering software to provide a multi-publisher multicast communication mechanism
  - topic-based publish/subscribe messaging system
  - runs on a network of cooperating broker nodes.
  - provides support for a variety of QoSs, such as low latency,
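To make the topic-based dissemination idea concrete, here is a minimal in-process sketch of publish/subscribe used for the initial storage of a context at several replicas. The `TopicBroker` class, the topic name, and the message layout are hypothetical and written only for illustration; this is not the NaradaBrokering API, and it ignores the network of cooperating broker nodes.

```python
from collections import defaultdict


class TopicBroker:
    """Toy in-process broker for illustration; NOT the NaradaBrokering API."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, callback):
        self._subscribers[topic].append(callback)

    def publish(self, topic, message):
        """Deliver message to every subscriber of topic; return the delivery count."""
        callbacks = self._subscribers.get(topic, [])
        for callback in callbacks:
            callback(message)
        return len(callbacks)


def make_store_handler(store):
    """Build a subscriber that stores each published context locally."""
    def handle(message):
        store[message["context_id"]] = message["content"]
    return handle


broker = TopicBroker()
replica_stores = {"replica-1": {}, "replica-2": {}}  # hypothetical replica servers
for store in replica_stores.values():
    broker.subscribe("context/publish", make_store_handler(store))

# One publication reaches every subscribed replica.
delivered = broker.publish(
    "context/publish", {"context_id": "ctx-1", "content": "session data"}
)
```

The same topic mechanism could carry access requests and update notifications by using separate topics for each message kind.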

(18)

[Figure: Distributed Hybrid WS-Context XML Metadata Services. Diagram labels: clients connecting over HTTP(S) through WSDL interfaces; an Extended UDDI Service with its Database (JDBC); Hybrid-WSContext Services, each with its own Database (JDBC), deployed as Replica Server-2 ... Replica Server-N; each service acting as Subscriber and Publisher on a Topic Based Publish-Subscribe Messaging System.]

(19)

Caching Strategy

• The TupleSpaces paradigm is exploited to support caching
  - asynchronous communication
  - pioneered by David Gelernter
  - communication units are tuples
    - a data structure consisting of one or more typed fields
• The Hybrid WS-Context Service employs/extends TupleSpaces:
  - use of a light-weight implementation of JavaSpaces
  - all accesses are in memory; overhead is negligible (less than 1 msec for inquiries)
  - data sharing: mutually exclusive access to tuples
  - associative lookup: content-based search, appropriate for key-based caching
  - temporal and spatial uncoupling of communicating parties
  - e.g. a tuple: ("context_id", Context). This indicates a tuple with two fields: a) a string, "context_id", and b) a Java object, "Context".
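The tuple-based cache described above can be sketched as a small in-memory tuple space. This is an illustration under stated assumptions, not the JavaSpaces API: the `TupleSpace` class is hypothetical, `None` in a template is assumed to act as a wildcard, and `read`/`take` return immediately instead of blocking with a timeout.

```python
import threading


class TupleSpace:
    """Toy in-memory tuple space for illustration; not the JavaSpaces API."""

    def __init__(self):
        self._tuples = []
        self._lock = threading.Lock()  # mutually exclusive access to tuples

    @staticmethod
    def _matches(template, tup):
        # A None field in the template is treated as a wildcard (associative lookup).
        return len(template) == len(tup) and all(
            t is None or t == v for t, v in zip(template, tup)
        )

    def write(self, tup):
        with self._lock:
            self._tuples.append(tuple(tup))

    def read(self, template):
        """Return the first matching tuple without removing it, or None."""
        with self._lock:
            for tup in self._tuples:
                if self._matches(template, tup):
                    return tup
        return None

    def take(self, template):
        """Remove and return the first matching tuple, or None."""
        with self._lock:
            for i, tup in enumerate(self._tuples):
                if self._matches(template, tup):
                    return self._tuples.pop(i)
        return None


space = TupleSpace()
# A two-field tuple in the spirit of ("context_id", Context).
space.write(("ctx-1", {"content": "shared state"}))
cached = space.read(("ctx-1", None))  # key-based lookup with a wildcard value
```

Because the writer and the reader only share the space, neither needs to know the other's identity or be active at the same time, which is the temporal and spatial uncoupling mentioned in the list.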

(20)

Access: Request Distribution

• A Peer-to-Peer based message distribution methodology is exploited for redirecting a client request to the appropriate replica server
  - Use of the pub-sub system for request distribution
  - broadcast-based dissemination of Context access requests
  - servers that can satisfy the query unicast a response with a copy of the requested context
• Advantages: does not keep track of the location of every single data item, makes use of redundant copies kept only for fault-tolerance reasons, improves responsiveness
• Practical Problem: If the number of repetitive queries that require probing the network increases, this may amplify network consumption and affect system performance
• Approach: use of dynamic replication for moving/replicating
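The broadcast-and-unicast pattern on this slide can be sketched as follows. The `ReplicaServer` class and `broadcast_request` function are hypothetical names for illustration; a plain loop stands in for the pub-sub broadcast, and return values stand in for unicast replies.

```python
class ReplicaServer:
    """Hypothetical replica server holding local copies of some contexts."""

    def __init__(self, name, contexts=None):
        self.name = name
        self.contexts = dict(contexts or {})

    def on_request(self, context_id):
        """Answer only if a local copy exists; otherwise stay silent."""
        if context_id in self.contexts:
            return (self.name, self.contexts[context_id])
        return None


def broadcast_request(servers, context_id):
    """Send the request to every server and collect the replies that come back."""
    replies = (server.on_request(context_id) for server in servers)
    return [reply for reply in replies if reply is not None]


servers = [
    ReplicaServer("replica-1", {"ctx-1": "v1"}),
    ReplicaServer("replica-2"),
    ReplicaServer("replica-3", {"ctx-1": "v1"}),
]
replies = broadcast_request(servers, "ctx-1")
```

No directory of data locations is consulted: the requester learns where copies live from whoever answers, which is why every unsatisfied local lookup costs a network-wide probe.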

(21)

Storage: Replica placement

• A Peer-to-Peer based message distribution methodology is exploited for creating the initial permanent copies of a context
  - Use of the pub-sub system for permanent replication
  - Use of non-blocking replica placement
  - 1st step: the initiator creates a temporary copy at every capable replica server
  - 2nd step: the initiator keeps permanent copies only at the first few answering replica servers for fault-tolerance
• Advantages: [1] the publishing client does not block until the replication is completed, [2] a temporary full-replication methodology is exploited to improve responsiveness, [3] permanent copies remain as a backup facility to meet the
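A minimal sketch of the two placement steps, assuming each replica server is modelled as a pair of dictionaries and that the order in which servers answered is already known. The function name, the data layout, and the `permanent_count` parameter are hypothetical; the non-blocking behaviour of the real placement is not modelled here.

```python
def place_replicas(context_id, content, servers, answer_order, permanent_count=2):
    """Two-step placement sketch.

    servers: dict mapping server name -> {"temporary": {}, "permanent": {}}
    answer_order: server names in the order their answers arrived
    permanent_count: hypothetical number of permanent copies to keep
    """
    # Step 1: create a temporary copy at every capable replica server.
    for store in servers.values():
        store["temporary"][context_id] = content

    # Step 2: keep permanent copies only at the first few answering servers.
    chosen = answer_order[:permanent_count]
    for name in chosen:
        store = servers[name]
        store["permanent"][context_id] = store["temporary"].pop(context_id)
    return chosen


servers = {name: {"temporary": {}, "permanent": {}} for name in ("r1", "r2", "r3")}
chosen = place_replicas("ctx-1", "data", servers, answer_order=["r3", "r1", "r2"])
```

In this run the two fastest responders hold the permanent copies, while the remaining server keeps only its temporary copy.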

(22)

Storage: Dynamic replication

• A dynamic replication methodology is exploited for creating server-initiated (temporary) copies of a context
  - Use of the pub-sub system for server-initiated replication
  - the replication decision belongs to the server (autonomous)
  - we keep a popularity record (# of access requests) for each copy of a context and flush it at regular time intervals
  - unpopular server-initiated copies of a context are deleted
  - popular copies of a context are moved into the proximity of their requestors (where the requests originate)
  - very popular copies of a context are replicated in the proximity of their requestors (where the requests originate)
• Advantages: [1] this strategy exploits locality which in turn
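The popularity-driven decision can be sketched as a simple threshold rule evaluated at the end of each interval. The two threshold values and the three-band split (delete, move, replicate) are assumptions chosen for illustration; the slides do not state the actual thresholds.

```python
from collections import Counter

# Hypothetical thresholds, in access requests per interval.
DELETE_BELOW = 2
REPLICATE_ABOVE = 10


def decide(access_counts):
    """Map each context id to 'delete', 'move', or 'replicate'."""
    decisions = {}
    for context_id, count in access_counts.items():
        if count < DELETE_BELOW:
            decisions[context_id] = "delete"      # unpopular copy
        elif count > REPLICATE_ABOVE:
            decisions[context_id] = "replicate"   # very popular copy
        else:
            decisions[context_id] = "move"        # popular copy
    return decisions


def end_of_interval(access_counts):
    """Decide on every copy, then flush the popularity record."""
    decisions = decide(access_counts)
    access_counts.clear()
    return decisions


counts = Counter({"ctx-a": 1, "ctx-b": 5, "ctx-c": 40})
decisions = end_of_interval(counts)
```

Each server would run such a rule on its own counters, which keeps the replication decision autonomous.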

(23)

Consistency enforcement

• Consistency enforcement methodologies are exploited to keep copies of a context consistent.
  - Use of a weak consistency model: copies of a context can differ; however, updates are propagated to replicas whenever needed for a consistent view of the information.
  - Use of the pub-sub system for update propagation
  - Use of a primary-copy approach: all updates for a specific context are initiated at a single server
  - Use of synchronized timestamps (as versions) to give a sequence to each published context, imposing an order on concurrent write operations on the same data
  - updates are pulled by a replica server from the primary copy if the replica server realizes that it has a stale copy
  - updates are pushed (broadcast) by the primary copy if it
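A minimal sketch of timestamp-versioned copies under a primary-copy scheme, assuming integer millisecond timestamps and in-memory stores. The `Replica` and `PrimaryCopy` classes are hypothetical; direct method calls stand in for pub-sub propagation.

```python
class Replica:
    """Hypothetical replica storing context_id -> (timestamp, content)."""

    def __init__(self):
        self.store = {}

    def apply(self, context_id, timestamp, content):
        """Accept an update only if it is newer than the local copy."""
        current = self.store.get(context_id)
        if current is None or timestamp > current[0]:
            self.store[context_id] = (timestamp, content)
            return True
        return False

    def is_stale(self, context_id, latest_timestamp):
        """A copy is stale if it is missing or older than the latest version."""
        current = self.store.get(context_id)
        return current is None or current[0] < latest_timestamp


class PrimaryCopy(Replica):
    """The single server at which all updates for a context are initiated."""

    def push(self, context_id, replicas):
        """Propagate the primary's current version to the given replicas."""
        timestamp, content = self.store[context_id]
        return [r.apply(context_id, timestamp, content) for r in replicas]


primary, replica = PrimaryCopy(), Replica()
primary.apply("ctx-1", 1000, "v1")
primary.apply("ctx-1", 1030, "v2")
primary.apply("ctx-1", 1010, "late write")  # older timestamp, so it is ignored
stale_before = replica.is_stale("ctx-1", 1030)
pushed = primary.push("ctx-1", [replica])
```

Because versions are ordered by timestamp, a write that arrives late cannot overwrite a newer one, and a replica can detect on its own that it needs to pull.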

(24)

Consistency enforcement - II

• Advantage: this strategy employs a non-blocking primary-copy approach, so the publisher does not block until an update operation is completed, which in turn improves responsiveness
• Practical Problems: [1] with this strategy, one cannot update a data item more frequently than one operation per 30 milliseconds, which is the accuracy of the NaradaBrokering NTP-protocol based synchronized timestamps. [2] with this strategy, a client cannot make sure that the update operation was carried out correctly.
• Approach: 1 update operation per 30 milliseconds is acceptable

(25)

Prototype Evaluation

• We evaluated the prototype implementation for three distinct aspects of distributed systems:
  - Performance
    - baseline performance
    - effect of the network latency on the baseline performance
  - Scalability
    - performance degradation of the system under increasing message sizes or message rates
    - scalability gain both in numbers and in performance when moving from a centralized system to a distributed system under the same workload.
  - Fault-tolerance
    - the empirical cost of the fault-tolerance in terms of

(26)

TESTBED: Cluster node configuration

Processor: Intel® Xeon™ CPU (2.40GHz)
RAM: 2GB total
Network Bandwidth: 900 Mbits/sec [1] (among the cluster nodes)
OS: GNU/Linux (kernel release 2.4.22)
Java Version: Java 2 platform, Standard Edition (1.4.2-beta-b19)
SOAP Engine: Axis 2 (in Tomcat 5.5.8)

(27)

Test-1. Dummy Server
[Figure: single-threaded client (1 user/1000 transactions) calling the Dummy Server through WSDL]

Test-2. Hybrid-WSContext inquiry/publication without database access

Test-4. extended UDDI inquiry/publication
[Figure: single-threaded extended UDDI client (1 user/1000 transactions) calling the Extended UDDI Server through WSDL]

(28)

• If a query can be satisfied by the JavaSpaces cache, it can be satisfied in < 1 ms plus the few milliseconds of Web service overhead
• comparable performance for standard operations with the existing metadata management services.

Avg. latency for inquiries, by Metadata Service
JUDDI: 40 ms
JWSDUDDI-MT 18.99 ms 20.37 ms

(29)

TEST-1 - Hybrid-WSContext inquiry/publication with increasing message sizes

[Figure: test setups. (a) A single-threaded WS-Context client (1 user/100 transactions) calling the Hybrid FTHPIS-WSContext Service through WSDL. (b) 5 clients distributed to cluster nodes 1 to 5, each running 1 to 15 threads, calling the Hybrid-WSContext Service over HTTP(S). Service components shown: WSDL Thread Pool, Publishing and Querying Module, JDBC Handler, Expeditor.]

(30)

• The results indicate that the system performs well for small-size context payloads.
• The results also indicate that the cost of inquiry and publication operations remains the same as the context's payload size increases from 100 Bytes up to 10 KBytes.

[Figure: latency charts; data labels: Stdev=1.42, 2.68, 3.09, 11.03, 11.54, 8.27, 6.95, 6.72, 10.07]

(31)

• The system can scale up to 940 simultaneous querying clients and 222 simultaneous publishing clients, where each client sends one query per second, for small-size context payloads with a 30-millisecond backup interval for fault tolerance.
• Multi-core hosts will improve performance dramatically.

[Figure: scalability charts; data labels: Stdev=10.31, 39.49, 53, 0.65, 0.97, 0.91]

(32)

[Figure: clients connecting over HTTP(S) to WSDL thread pools; numbered rectangles 1 to 5 showing FTHPIS systems built from node-1 ... node-5.]

• 5 clients are distributed to cluster nodes 1 to 5, each running 1 to 15 threads firing messages to randomly selected servers.
• We investigate scalability when moving from a centralized server to a distributed one under heavy workloads.
• Numbered rectangles correspond to an N-node FTHPIS system with various Publish-Subscribe topologies (this does NOT affect performance)
• 5 different FTHPIS systems were tested, with N ranging from 1 to 5, under the same workload.
• In each test case, the same volume of data is evenly distributed among the nodes.

(33)

• The scalability of the metadata store can be increased when moving from a centralized service to a distributed system.

Hybrid WS-Context inquiry operation

# of nodes   message rate   mean ± error (ms)   Stdev (ms)
1            940            47.05 ± 0.24        33.52
2            1005           40.76 ± 0.43        38.22

Non-optimal caching algorithm, as it does database access BEFORE Publish-Subscribe. Reversing this choice should lead to throughput

(34)

(35)

FAULT-TOLERANCE TEST RESULTS

• Fault-tolerance vs. Performance.
• The lower the level of fault-tolerance, the higher the performance.

(36)

Summary of Contributions

• specification for managing all service metadata
  - a method to achieve a uniform programming interface to both interaction-independent and session-related metadata. This method also introduces a data model for storing session-related metadata
• specification for managing interaction-independent service metadata
  - a method to achieve Geographical Information Systems compatible, domain-independent, and metadata-oriented management of interaction-independent service metadata
• fault-tolerant and high-performance information service
  - a method to achieve management of dynamic metadata and Context in

(37)

Future work

• transaction scheduling
  - Investigate how to minimize the time required to complete transactions on two different metadata systems with different time constraints
• evaluation of dynamic replication
  - Carry out simulations for the evaluation of dynamic replication
• optimal caching methodologies
  - Implement and test more optimal caching methodologies
• smoothing the impact of backups on performance
  - Investigate how to minimize the impact of the time spent (high

(38)

(39)

(40)

Summary of machine configurations

gf6.ucs.indiana.edu
  Location: Bloomington, IN, USA
  Processor: Intel® Xeon™ CPU (2.40GHz)

complexity.ucs.indiana.edu
  Location: Indianapolis, IN, USA
  Processor: Sun-Fire-880, sun4u sparc SUNW
  RAM: 16GB
  OS: SunOS 5.9
  Java Version: Java HotSpot(TM) 64-Bit Server VM (1.4.2-01)

lonestar.tacc.utexas.edu
  Location: Austin, TX, USA
  Processor: Intel(R) Xeon(TM) CPU 3.20GHz

tg-login.sdsc.teragrid.org
  Location: San Diego, CA, USA
  Processor: GenuineIntel IA-64, Itanium 2, 4 processors
  RAM: 8GB
  OS: GNU/Linux
  Java Version: Java 2, STE (1.4.2-beta-b19)

vlab2.scs.fsu.edu
  Location: Tallahassee, FL, USA
  Processor: Dual Core AMD Opteron(tm) Processor 270
  RAM: 2GB
  OS: GNU/Linux (kernel release 2.6.16)
  Java Version: Java 2, STE (1.4.2-beta-b19)

(41)

SOAP header

<?xml version="1.0" encoding="UTF-8"?>
<soap:Envelope xmlns:soap="http://www.w3...">
  <soap:Header encodingStyle="URL" mustUnderstand="true">
    <context xmlns="ctxt schema" timeout="100">
      <context-id>http..</context-id>
      <context-service> http.. </context-service>
      <context-manager> http.. </context-manager>
      <activity-list mustUnderstand="true" mustPropagate="true">
        <p-service>http://../WMS</p-service>
        <p-service>http://../HPSearch</p-service>
      </activity-list>
    </context>
  </soap:Header>

The Pattern Informatics GIS-SOA based workflow application

5,6: WMS starts a session and invokes HPSearch, with a session id, to run the workflow script for PI Code
7,8,9: HPSearch runs the workflow script and generates an output file in GML format (& PDF format) as the result
10: HPSearch writes the URI of the output file into Context
11: WMS polls the information from the Context Service
12: WMS retrieves the output file generated by the workflow script and generates a map

service

<context xsd:type="ContextType" timeout="100">
  <context-service>http://.../HPSearch</context-service>
  <content> HPSearch associated additional data generated during execution of workflow. </content>
</context>

session

<context xsd:type="ContextType" timeout="100">
  <context-service>http://.../WMS</context-service>
  <activity-list mustUnderstand="true" mustPropagate="true">
    <service>http://.../WMS</service>
    <service>http://.../HPSearch</service>
  </activity-list>
</context>

user profile

<context xsd:type="ContextType" timeout="100">
  <context-service>http://.../HPSearch</context-service>
  <parent-context>http://../abcdef:012345</parent-context>
  <content> profile information related WMS </content>
</context>

shared state

<context xsd:type="ContextType" timeout="100">
  <context-id>http://../abcdef:012345</context-id>
  <context-service>http://.../HPSearch</context-service>
  <content>http://danube.ucs.indiana.edu:8080\x.xml</content>
</context>

activity

<context xsd:type="ContextType" timeout="100">
  <context-service>http://.../HPSearch</context-service>
  <parent-context>http://../abcdef:012345</parent-context>
  <content> shared data for HPSearch activity </content>
  <activity-list mustUnderstand="true" mustPropagate="true">
    <service>http://.../DataFilter1</service>
    <service>http://.../PICode</service>
    <service>http://.../DataFilter2</service>
  </activity-list>
</context>

[Figure: workflow diagram with numbered interactions (0 to 11) among WMS Client, Extended UDDI, WMS, WFS, HP Search, Data Filter, PI Code, Data Filter, and the Context Information Service; input http://..../..../..txt, output http://..../..../x.gml]
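The parent-child relationship between contexts (via the `parent-context` element) can be explored with a short parsing sketch. The XML strings below are simplified, hypothetical stand-ins for the contexts on this slide: the `xsd:type` attribute is dropped so the snippets parse without a namespace declaration, and the `urn:example:` identifiers are invented for illustration.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Simplified, hypothetical contexts; identifiers are illustrative only.
CONTEXTS = [
    "<context><context-id>urn:example:session-1</context-id>"
    "<content>session</content></context>",
    "<context><parent-context>urn:example:session-1</parent-context>"
    "<content> user profile </content></context>",
    "<context><parent-context>urn:example:session-1</parent-context>"
    "<content> activity data </content></context>",
]


def index_children(xml_documents):
    """Group the <content> text of child contexts under their parent id."""
    children = defaultdict(list)
    for document in xml_documents:
        root = ET.fromstring(document)
        parent = root.findtext("parent-context")
        if parent is not None:
            content = (root.findtext("content") or "").strip()
            children[parent.strip()].append(content)
    return dict(children)


children_by_parent = index_children(CONTEXTS)
```

An index like this would let a client fetch everything attached to a session (user profile, activity data) starting from the session's context id alone.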